License Permissiveness
Open licenses: CC0, CC-BY, MIT, Apache preferred
Justification
Beyond having a license, permissiveness directly impacts reuse potential. CC0 and CC-BY maximize reusability. RDA-R1.1-02M (Important) calls for "standard reuse licence." The practical impact on reuse is well-established in the open science literature.
Practical Guide
Choose open licenses (CC0, CC-BY). Maximizes reuse potential.
Beyond having a license, permissiveness directly impacts who can reuse your data. The 0.21x citation ratio seems counterintuitive — but restrictive licenses (CC-BY-NC, CC-BY-ND) are often used for high-value datasets in competitive fields that naturally attract more citations. SHARE values permissiveness because CC0 and CC-BY remove legal barriers to reuse.
Why this signal matters despite the numbers
The 0.21x citation ratio is expected: restrictive licenses (CC-BY-NC, CC-BY-ND) are used for high-value datasets in competitive fields that attract more citations despite access barriers. SHARE rewards permissiveness because it maximizes reuse potential, not citation counts.
For Repositories
- Recommend CC0 or CC-BY as default license choices
- Explain the reuse implications of each license tier to depositors
- Flag restrictive licenses (NC, ND) with a reuse impact warning
For Depositors
- Choose CC0 (public domain) when possible — it maximizes reuse
- Use CC-BY if you want attribution but no other restrictions
- Only use NC/ND restrictions when legally required by your institution or funder
FAIR-justified: maximizes reuse potential. Negative citation ratio reflects competitive fields, not lower quality.
Standards Sources
Convergence score: 1/4 independent sources —
| Standard | Field / Property | Obligation Level |
|---|---|---|
| RDA FAIR | RDA-R1.1-02M | Important |
FAIR Principle Alignment
Primary mapping: Reusable (R1.1)
- R1.1: (Meta)data are released with a clear and accessible data usage license
RDA FAIR Data Maturity Model Indicators:
- RDA-R1.1-02M: Metadata refers to a standard reuse licence
How This Signal Is Measured
License SPDX code mapped to tier: CC0 > CC-BY > CC-BY-SA > CC-BY-NC > other. Binary: permissive license or not.
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
Prevalence
64.9%
of Zenodo datasets
Citation Lift
0.2x
vs. datasets without
Data Source
Zenodo (CERN)
1,328,100 records analyzed
Interpretation: Permissive licenses (CC0, CC-BY) are common but show lower naive lift because Dryad's CC0 requirement shows this signal's value best — eliminating access variance isolates other quality dimensions.
Quantitative Evidence
Scoring Formula
license ∈ {CC0, CC-BY, MIT, Apache} → 4 pts
Contribution: 4 of 100 points · Access bucket (0–20)
With Signal Present
862,143
datasets (64.9%)
μ = 0.109 citations/dataset
Without Signal
465,957
datasets (35.1%)
μ = 0.525 citations/dataset
Rate Ratio
0.21
95% CI: [0.21–0.21]
P-value
< 0.001
z = -409.62
Significance
Method: Poisson rate ratio · Source: Zenodo (n = 1,328,100)
Note: Negative lift expected: restrictive licenses (CC-BY-NC, CC-BY-ND) are used for high-value datasets in competitive fields, which attract more citations despite access barriers.
A — Access Bucket
All signals in this bucket: