Derivative Works
Datasets or analyses built upon this data
Justification
Derivative datasets demonstrate reuse that goes beyond citation. Validation found OR = 3.0× per 10-point SHARE score increase across 76M+ datasets, stable across years; this is the strongest evidence that metadata quality predicts meaningful reuse.
Practical Guide
Track derivative works: OR = 3.0× per 10-point SHARE increase, the strongest quality-to-reuse link in the framework.
Derivative datasets — new datasets or analyses built upon existing data — are the strongest evidence linking metadata quality to meaningful reuse. Our validation found OR = 3.0× per 10-point SHARE score increase (p < 0.001, n = 76M+ across 7 repositories). Source datasets have a mean deposit-time score of 36.0 vs. 27.1 for the population (Cohen's d = 1.36). This is SHARE's most compelling evidence that metadata quality drives downstream value.
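The reported ratio is easiest to read through the underlying logistic model. If β is the coefficient on the SHARE score, then the per-10-point odds ratio implies:

```latex
\mathrm{OR}_{10} = e^{10\beta} = 3.0
\quad\Longrightarrow\quad
\beta = \frac{\ln 3.0}{10} \approx 0.110
```

So a dataset scoring 10 points higher has three times the odds of generating at least one derivative, and the effect compounds: 20 points higher implies 3.0² = 9× the odds.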
For Repositories
- Track IsDerivedFrom relationships via DataCite RelatedIdentifier
- Display derivative counts on dataset landing pages
- Enable depositors to link their datasets to source datasets
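The first recommendation above can be sketched against the public DataCite REST API (`GET /dois/{doi}` returns the record's `relatedIdentifiers` under `data.attributes`); the endpoint and response shape here are assumptions about the current API, so verify against DataCite's documentation before relying on them:

```python
import json
import urllib.request

DATACITE_API = "https://api.datacite.org/dois/"  # public DataCite REST endpoint

def extract_is_derived_from(attributes: dict) -> list[str]:
    """Pull IsDerivedFrom relations out of a DataCite attributes payload."""
    related = attributes.get("relatedIdentifiers") or []
    return [
        r["relatedIdentifier"]
        for r in related
        if r.get("relationType") == "IsDerivedFrom"
    ]

def sources_of(doi: str) -> list[str]:
    """DOIs that the given record declares as its sources (makes a network call)."""
    with urllib.request.urlopen(DATACITE_API + doi) as resp:
        record = json.load(resp)
    return extract_is_derived_from(record["data"]["attributes"])
```

Note the direction of the relation: a record's own `IsDerivedFrom` entries name its sources; counting derivatives *of* a dataset means searching for other records whose relations point back at it.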
For Depositors
- When building on existing data, use IsDerivedFrom to credit the source dataset
- Track how many derivative works your data has generated
- Derivative works are the strongest evidence your data is useful — highlight them
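Crediting a source dataset comes down to a single `relatedIdentifiers` entry in the deposit metadata. A minimal sketch of the fragment, shown as a Python dict (the source DOI is a placeholder, not a real record):

```python
# Minimal DataCite relatedIdentifiers entry a depositor adds to credit
# the dataset theirs builds on. The DOI below is a hypothetical placeholder.
related_identifier = {
    "relationType": "IsDerivedFrom",
    "relatedIdentifierType": "DOI",
    "relatedIdentifier": "10.5281/zenodo.0000000",  # placeholder source DOI
}
```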
An outcome metric with the strongest quality-to-reuse evidence in the framework: OR = 3.0× per 10-point increase is the most compelling validation of SHARE.
Standards Sources
Convergence score: 2/4 independent sources
| Standard | Field / Property | Obligation Level |
|---|---|---|
| DataCite 4.6 | #12 RelatedIdentifier (IsDerivedFrom) | Recommended |
| OpenAIRE | Related dataset tracking | Aggregation |
FAIR Principle Alignment
Primary mapping: Outcome metric (not FAIR-derived)
This is an outcome metric not derived from FAIR principles. The R (Reuse) bucket intentionally measures realized impact rather than metadata quality, enabling validation that deposit-time signals predict downstream use.
How This Signal Is Measured
Count of datasets that cite this dataset as a source via IsDerivedFrom. Scored as binary in v1: any derivative present = 1, none = 0.
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
Data Source
Zenodo (CERN)
1,328,100 records analyzed
Interpretation: not directly measurable within Zenodo itself; derivative works are tracked via DataCite IsDerivedFrom relations. Validation found OR = 3.0× per 10-point SHARE increase, the strongest evidence linking metadata quality to meaningful downstream reuse.
Cross-repository note: OpenAIRE tracks 73.4M records with cross-dataset relation graphs, enabling derivative tracking at scale.
Quantitative Evidence
Scoring Formula
derivative_datasets.length ≥ 1 → 4 pts
Contribution: 4 of 100 points · Reuse bucket (0–20)
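The formula above is a one-line predicate; a minimal sketch in Python (the `derivative_datasets` field name is taken from the formula):

```python
def derivative_works_points(derivative_datasets: list) -> int:
    """v1 binary scoring: any derivative at all earns the full 4 points."""
    return 4 if len(derivative_datasets) >= 1 else 0
```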
| Group | Datasets | Share | Mean citations |
|---|---|---|---|
| Signal present | 407 | 0.2% | μ = 0.000 citations/dataset |
| Signal absent | 183,465 | 99.8% | μ = 0 (baseline) |

Statistical test: logistic regression + t-test · p < 0.001 · t/z = 7.43
Method: Logistic regression + t-test · Source: Zenodo + DataCite API (n = 183,872)
Note: OR = 3.0× per 10-point SHARE increase (p < 0.001, n = 76M+). Source datasets (n = 35) have mean deposit-time score 36.0 vs. 27.1 population (Cohen’s d = 1.36). Strongest evidence linking metadata quality to meaningful downstream reuse.
R — Reuse Bucket
All signals in this bucket: