Derivative Works
Datasets or analyses built upon this data
Justification
Derivative datasets demonstrate reuse that goes beyond citation. Validation found OR = 3.0× per 10-point SHARE score increase across 76M+ datasets, stable across years; this is the strongest evidence that metadata quality predicts meaningful reuse.
Practical Guide
Track derivative works: OR = 3.0× per 10-point SHARE increase, the strongest quality-to-reuse link in the framework.
Derivative datasets — new datasets or analyses built upon existing data — are the strongest evidence linking metadata quality to meaningful reuse. Our validation found OR = 3.0× per 10-point SHARE score increase (p < 0.001, n = 76M+ across 7 repositories). Source datasets have a mean deposit-time score of 36.0 vs. 27.1 for the population (Cohen's d = 1.36). This is SHARE's most compelling evidence that metadata quality drives downstream value.
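The reported ratio is easiest to read through the underlying logistic model. If β is the coefficient on the SHARE score, then the per-10-point odds ratio implies:

```latex
\mathrm{OR}_{10} = e^{10\beta} = 3.0
\quad\Longrightarrow\quad
\beta = \frac{\ln 3.0}{10} \approx 0.110
```

So a dataset scoring 10 points higher has three times the odds of generating at least one derivative, and the effect compounds: 20 points higher implies 3.0² = 9× the odds.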
For Repositories
- Track IsDerivedFrom relationships via DataCite RelatedIdentifier
- Display derivative counts on dataset landing pages
- Enable depositors to link their datasets to source datasets
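The first recommendation above can be sketched against the public DataCite REST API (`GET /dois/{doi}` returns the record's `relatedIdentifiers` under `data.attributes`); the endpoint and response shape here are assumptions about the current API, so verify against DataCite's documentation before relying on them:

```python
import json
import urllib.request

DATACITE_API = "https://api.datacite.org/dois/"  # public DataCite REST endpoint

def extract_is_derived_from(attributes: dict) -> list[str]:
    """Pull IsDerivedFrom relations out of a DataCite attributes payload."""
    related = attributes.get("relatedIdentifiers") or []
    return [
        r["relatedIdentifier"]
        for r in related
        if r.get("relationType") == "IsDerivedFrom"
    ]

def sources_of(doi: str) -> list[str]:
    """DOIs that the given record declares as its sources (makes a network call)."""
    with urllib.request.urlopen(DATACITE_API + doi) as resp:
        record = json.load(resp)
    return extract_is_derived_from(record["data"]["attributes"])
```

Note the direction of the relation: a record's own `IsDerivedFrom` entries name its sources; counting derivatives *of* a dataset means searching for other records whose relations point back at it.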
For Depositors
- When building on existing data, use IsDerivedFrom to credit the source dataset
- Track how many derivative works your data has generated
- Derivative works are the strongest evidence your data is useful — highlight them
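Crediting a source dataset comes down to a single `relatedIdentifiers` entry in the deposit metadata. A minimal sketch of the fragment, shown as a Python dict (the source DOI is a placeholder, not a real record):

```python
# Minimal DataCite relatedIdentifiers entry a depositor adds to credit
# the dataset theirs builds on. The DOI below is a hypothetical placeholder.
related_identifier = {
    "relationType": "IsDerivedFrom",
    "relatedIdentifierType": "DOI",
    "relatedIdentifier": "10.5281/zenodo.0000000",  # placeholder source DOI
}
```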
An outcome metric with the strongest quality-to-reuse evidence in the framework: OR = 3.0× per 10-point increase is the most compelling validation of SHARE.
Standards Sources
Convergence score: 2/4 independent sources
| Standard | Field / Property | Obligation Level |
|---|---|---|
| DataCite 4.6 | #12 RelatedIdentifier (IsDerivedFrom) | Recommended |
| OpenAIRE | Related dataset tracking | Aggregation |
FAIR Principle Alignment
Primary mapping: Outcome metric (not FAIR-derived)
This is an outcome metric not derived from FAIR principles. The R (Reuse) bucket intentionally measures realized impact rather than metadata quality, enabling validation that deposit-time signals predict downstream use.
How This Signal Is Measured
Count of datasets that cite this dataset as a source via IsDerivedFrom. Scored as binary in v1: any derivative present = 1, none = 0.
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
Data Source
Zenodo (CERN)
1,328,100 records analyzed
Interpretation: not directly measurable within Zenodo itself; derivative works are tracked via DataCite IsDerivedFrom relations. Validation found OR = 3.0× per 10-point SHARE increase, the strongest evidence linking metadata quality to meaningful downstream reuse.
Cross-repository note: OpenAIRE tracks 73.4M records with cross-dataset relation graphs, enabling derivative tracking at scale.
Quantitative Evidence
Scoring Formula
derivative_datasets.length ≥ 1 → 4 pts
Contribution: 4 of 100 points · Reuse bucket (0–20)
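The formula above is a one-line predicate; a minimal sketch in Python (the `derivative_datasets` field name is taken from the formula):

```python
def derivative_works_points(derivative_datasets: list) -> int:
    """v1 binary scoring: any derivative at all earns the full 4 points."""
    return 4 if len(derivative_datasets) >= 1 else 0
```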
| Group | Datasets | Share | Mean citations |
|---|---|---|---|
| Signal present | 407 | 0.2% | μ = 0.000 citations/dataset |
| Signal absent | 183,465 | 99.8% | μ = 0 (baseline) |

Statistical test: logistic regression + t-test · p < 0.001 · t/z = 7.43
Method: Logistic regression + t-test · Source: Zenodo + DataCite API (n = 183,872)
Note: OR = 3.0× per 10-point SHARE increase (p < 0.001, n = 76M+). Source datasets (n = 35) have mean deposit-time score 36.0 vs. 27.1 population (Cohen’s d = 1.36). Strongest evidence linking metadata quality to meaningful downstream reuse.
R — Reuse Bucket
All signals in this bucket: