Justification
A machine-readable license is rated Essential by the RDA (indicator RDA-R1.1-01M). Without a license, data is legally unusable. DataCite includes Rights as an optional property, Dublin Core includes Rights as a core element, and schema.org lists license as a recommended property, so all four sources converge on this signal.
Practical Guide
Add a license. 11.3x citation lift; unlicensed data is legally unusable.
A machine-readable license is not just good practice; it is a legal prerequisite for reuse. Data without a license is copyrighted by default, so others have no explicit permission to reuse it. Datasets with SPDX licenses receive 11.3 times as many citations as unlicensed ones (RR = 11.32, p < 0.001), and the 2.1% of Zenodo datasets without a license receive almost no citations. The RDA rates this indicator as Essential.
For Repositories
- Make license selection required during deposit
- Provide a curated list of SPDX-compatible licenses
- Default to CC-BY-4.0 or CC0 if the depositor doesn't choose
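The three repository recommendations above can be sketched as a single deposit-time check. This is a minimal illustration, not any repository's actual implementation: the function name `resolve_license`, the contents of `CURATED_LICENSES`, and the choice of CC-BY-4.0 as the fallback are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical curated list of SPDX identifiers; a real repository
# would maintain its own list and keep it in sync with spdx.org.
CURATED_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "ODbL-1.0", "MIT", "Apache-2.0"}

# Assumed fallback when the depositor makes no choice.
DEFAULT_LICENSE = "CC-BY-4.0"


def resolve_license(choice: Optional[str]) -> str:
    """Return the SPDX identifier to store for a deposit.

    - No choice: fall back to the default license.
    - Choice on the curated list (case-insensitive): return its canonical form.
    - Anything else: reject, so free-text licenses never reach the record.
    """
    if choice is None or not choice.strip():
        return DEFAULT_LICENSE

    canonical_by_lower = {lic.lower(): lic for lic in CURATED_LICENSES}
    key = choice.strip().lower()
    if key not in canonical_by_lower:
        raise ValueError(f"License {choice!r} is not on the curated SPDX list")
    return canonical_by_lower[key]
```

Rejecting unknown values rather than storing them as free text keeps every stored license machine-readable, which is the property this signal measures.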
For Depositors
- Always select a license — unlicensed data is legally unusable
- Prefer CC0 (public domain) or CC-BY-4.0 for maximum reuse
- If your funder requires a specific license, check before depositing
Second strongest positive signal (11.3x lift). Essential RDA priority. Legal prerequisite for reuse.
Standards Sources
Convergence score: 4/4 independent sources
| Standard | Field / Property | Obligation Level |
|---|---|---|
| DataCite 4.6 | #16 Rights | Optional |
| Dublin Core | Rights | Core Element |
| schema.org | license | Recommended |
| RDA FAIR | RDA-R1.1-01M | Essential |
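To make the table concrete, the sketch below expresses one license in the field each metadata standard provides. The key names (`rightsList`, `rightsUri`, `rightsIdentifier`, `rightsIdentifierScheme`, `schemeUri`, `dc:rights`) reflect a common JSON rendering of these standards and should be checked against the schema version in use; the helper `license_metadata` is hypothetical.

```python
import json


def license_metadata(spdx_id: str, url: str, name: str) -> dict:
    """Build license fragments for three metadata standards (illustrative only)."""
    return {
        # DataCite property #16 Rights, JSON-style keys (assumed rendering).
        "datacite": {
            "rightsList": [
                {
                    "rights": name,
                    "rightsUri": url,
                    "rightsIdentifier": spdx_id,
                    "rightsIdentifierScheme": "SPDX",
                    "schemeUri": "https://spdx.org/licenses/",
                }
            ]
        },
        # Dublin Core Rights element, here carrying the license URL.
        "dublin_core": {"dc:rights": url},
        # schema.org Dataset with the license property set to a URL.
        "schema_org": {
            "@context": "https://schema.org",
            "@type": "Dataset",
            "license": url,
        },
    }


fragments = license_metadata(
    spdx_id="CC-BY-4.0",
    url="https://creativecommons.org/licenses/by/4.0/",
    name="Creative Commons Attribution 4.0 International",
)
print(json.dumps(fragments, indent=2))
```

In each case the license is a resolvable URL or a controlled identifier rather than free text, which is what makes it machine-readable.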
FAIR Principle Alignment
Primary mapping: Reusable (R1.1)
- R1.1: (Meta)data are released with a clear and accessible data usage license
RDA FAIR Data Maturity Model Indicators:
- RDA-R1.1-01M: Metadata includes information about the licence under which the data can be reused
How This Signal Is Measured
Presence of SPDX license identifier or license URL. Binary: machine-readable license present or absent.
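A binary check of this kind might look like the sketch below. The source does not specify which record fields are inspected, so the DataCite-style `rightsList` layout, the key names, and the function `has_machine_readable_license` are assumptions for illustration. The identifier pattern only checks that the value is shaped like an SPDX ID; it does not confirm the ID exists on the SPDX list.

```python
import re
from typing import Any, Mapping

# Shape check only: letters, digits, '.', '+', '-' (e.g. "CC-BY-4.0").
SPDX_ID_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.+-]*$")


def has_machine_readable_license(record: Mapping[str, Any]) -> bool:
    """Return True if the record carries an SPDX identifier or a license URL."""
    for rights in record.get("rightsList", []):
        scheme = (rights.get("rightsIdentifierScheme") or "").upper()
        identifier = rights.get("rightsIdentifier") or ""
        uri = rights.get("rightsUri") or ""

        # Case 1: an identifier explicitly declared under the SPDX scheme.
        if scheme == "SPDX" and SPDX_ID_PATTERN.match(identifier):
            return True
        # Case 2: a license URL, regardless of scheme.
        if uri.startswith(("http://", "https://")):
            return True
    # Free-text rights statements alone do not count as machine-readable.
    return False
```

A record whose only rights entry is free text such as "All rights reserved" returns `False` under this sketch, matching the binary present-or-absent definition.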
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
| Metric | Value | Detail |
|---|---|---|
| Prevalence | 97.9% | of Zenodo datasets |
| Citation lift | 11.3x | vs. datasets without |
| Data source | Zenodo (CERN) | 1,328,100 records analyzed |
Interpretation: Near-universal on Zenodo with strong lift. The 2.1% without licenses receive almost no citations — confirming that license clarity is essential for reuse.
Quantitative Evidence
Scoring Formula
license_spdx ∈ record → 4 pts
Contribution: 4 of 100 points · Access bucket (0–20)
| Group | Datasets | Share | Mean citations per dataset (μ) |
|---|---|---|---|
| With signal present | 1,300,772 | 97.9% | 0.249 |
| Without signal | 27,328 | 2.1% | 0.022 |

Rate ratio: 11.32 (95% CI: 10.45–12.26)
P-value: < 0.001 (z = 59.44)
Method: Poisson rate ratio · Source: Zenodo (n = 1,328,100)
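The rate ratio, confidence interval, and z statistic reported above can be approximately reproduced with a standard Poisson rate ratio and a Wald interval on the log scale. The sketch below is one common way to do this and is not necessarily the exact procedure used for the published figures. Total citation counts are not given in the source, so they are reconstructed here from the rounded per-dataset means, which is an assumption that introduces small rounding differences.

```python
import math


def poisson_rate_ratio(events_a, units_a, events_b, units_b, z_crit=1.959964):
    """Rate ratio of group A over group B with a Wald CI on the log scale."""
    rr = (events_a / units_a) / (events_b / units_b)
    log_rr = math.log(rr)
    # Standard error of log(RR) for two independent Poisson counts.
    se_log = math.sqrt(1.0 / events_a + 1.0 / events_b)
    lower = math.exp(log_rr - z_crit * se_log)
    upper = math.exp(log_rr + z_crit * se_log)
    z = log_rr / se_log
    return rr, lower, upper, z


# Dataset counts from the source.
n_with, n_without = 1_300_772, 27_328
# Assumption: total citations reconstructed from the rounded means.
citations_with = 0.249 * n_with
citations_without = 0.022 * n_without

rr, lower, upper, z = poisson_rate_ratio(
    citations_with, n_with, citations_without, n_without
)
print(f"RR = {rr:.2f}, 95% CI [{lower:.2f}, {upper:.2f}], z = {z:.2f}")
```

The width of the interval is driven almost entirely by the unlicensed group, since its reconstructed citation count (roughly 600) is far smaller than that of the licensed group.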
A — Access Bucket
All signals in this bucket: