SHARE Score


R4

Derivative Works

Datasets or analyses built upon this data

Reuse (R) · Outcome metric (not FAIR-derived)

Justification

Derivative datasets demonstrate reuse that goes beyond citation. Validation found OR = 3.0× per 10-point SHARE score increase across 76M+ datasets, stable across years. This is the strongest evidence that metadata quality predicts meaningful reuse.

Practical Guide

Track derivatives. OR = 3.0× per 10-point SHARE increase is the strongest quality-to-reuse link.

Derivative datasets — new datasets or analyses built upon existing data — are the strongest evidence linking metadata quality to meaningful reuse. Our validation found OR = 3.0× per 10-point SHARE score increase (p < 0.001, n = 76M+ across 7 repositories). Source datasets have a mean deposit-time score of 36.0 vs. 27.1 for the population (Cohen's d = 1.36). This is SHARE's most compelling evidence that metadata quality drives downstream value.
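To make the odds ratio concrete: OR = 3.0× per 10 points corresponds to a per-point logistic coefficient of ln(3)/10 ≈ 0.11, so the odds of having derivatives multiply smoothly with score. A minimal sketch (the coefficient is back-derived from the published OR; the fitted regression itself is not reproduced here):

```python
import math

# OR = 3.0 per 10-point SHARE increase implies a per-point logistic
# coefficient beta = ln(3) / 10 (back-derived; illustrative only).
OR_PER_10 = 3.0
beta = math.log(OR_PER_10) / 10  # ~0.1099 per point

def odds_multiplier(delta_score: float) -> float:
    """Multiplicative change in the odds of having derivative works
    for a given change in SHARE score."""
    return math.exp(beta * delta_score)

# A 10-point increase triples the odds; 20 points multiplies them ~9x.
print(round(odds_multiplier(10), 2))  # 3.0
print(round(odds_multiplier(20), 2))  # 9.0
```

Because the model is logistic, effects compound: two 10-point steps multiply the odds by 3 × 3 = 9, not add to 6.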

For Repositories

  • Track IsDerivedFrom relationships via DataCite RelatedIdentifier
  • Display derivative counts on dataset landing pages
  • Enable depositors to link their datasets to source datasets

For Depositors

  • When building on existing data, use IsDerivedFrom to credit the source dataset
  • Track how many derivative works your data has generated
  • Derivative works are the strongest evidence your data is useful — highlight them
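For depositors working with DataCite metadata, the IsDerivedFrom link is a single entry in the relatedIdentifiers array. A sketch of the relevant fragment, shown here as a Python dict (both DOIs are placeholders, not real identifiers):

```python
# DataCite 4.x metadata fragment declaring that this deposit was
# derived from another dataset. Both DOIs below are placeholders.
metadata = {
    "doi": "10.5072/example-derivative",
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.5072/example-source",
            "relatedIdentifierType": "DOI",
            "relationType": "IsDerivedFrom",
        }
    ],
}

# The relation reads: this record IsDerivedFrom the source dataset.
rel = metadata["relatedIdentifiers"][0]
print(rel["relationType"], rel["relatedIdentifier"])
```

The direction matters: the derivative record points at the source, which is what lets repositories and aggregators credit the source dataset.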

Outcome metric with the strongest quality-to-reuse evidence. OR = 3.0× is the most compelling validation of the SHARE framework.

Standards Sources

Convergence score: 2/4 independent sources (adequately justified)

Standard        Field / Property                         Obligation Level
DataCite 4.6    #12 RelatedIdentifier (IsDerivedFrom)    Recommended
OpenAIRE        Related dataset tracking                 Aggregation

FAIR Principle Alignment

Primary mapping: Outcome metric (not FAIR-derived)

This is an outcome metric not derived from FAIR principles. The R (Reuse) bucket intentionally measures realized impact rather than metadata quality, enabling validation that deposit-time signals predict downstream use.

How This Signal Is Measured

Count of datasets that cite this dataset as a source via IsDerivedFrom relations. Scored as binary for v1: one or more derivatives = 1; none = 0.
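On the repository side, counting means scanning harvested DataCite records for IsDerivedFrom relations that point at the dataset under evaluation. A minimal sketch over already-harvested records (the record shape follows the DataCite relatedIdentifiers schema; the helper names are ours):

```python
def count_derivatives(records, source_doi):
    """Count harvested records whose DataCite relatedIdentifiers
    declare IsDerivedFrom -> source_doi (case-insensitive DOI match)."""
    target = source_doi.lower()
    count = 0
    for rec in records:
        for rel in rec.get("relatedIdentifiers", []):
            if (rel.get("relationType") == "IsDerivedFrom"
                    and rel.get("relatedIdentifier", "").lower() == target):
                count += 1
                break  # each derivative record counts once
    return count

def has_derivatives(records, source_doi):
    """v1 binary form of the signal: any derivative at all = 1."""
    return 1 if count_derivatives(records, source_doi) >= 1 else 0
```

In practice the records would come from a harvested DataCite corpus or relation graph; the matching logic stays the same.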

Empirical Evidence (Zenodo, n=1.3M)

Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.

Data Source

Zenodo (CERN)

1,328,100 records analyzed

Interpretation: Not directly measurable in Zenodo. Derivative works tracked via DataCite IsDerivedFrom relations. Validation found OR=3.0× per 10-point SHARE increase — the strongest evidence linking metadata quality to meaningful downstream reuse.

Cross-repository note: OpenAIRE tracks 73.4M records with cross-dataset relation graphs, enabling derivative tracking at scale.

Quantitative Evidence

Scoring Formula

derivative_datasets.length ≥ 1 → 4 pts

Contribution: 4 of 100 points · Reuse bucket (0–20)
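The formula above is a simple threshold on the derivative count, sketched here as a scoring function (point values taken directly from the formula):

```python
def r4_points(derivative_count: int) -> int:
    """R4 'Derivative Works': 4 of 100 points, within the Reuse
    bucket (0-20). Binary in v1: any derivative earns full points."""
    return 4 if derivative_count >= 1 else 0

print(r4_points(0), r4_points(1), r4_points(12))  # 0 4 4
```

A count-sensitive variant (more derivatives, more points) would be a natural v2 refinement, but v1 deliberately keeps the signal binary.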

With signal present: 407 datasets (0.2%) · μ = 0.000 citations/dataset

Without signal: 183,465 datasets (99.8%) · μ = 0 citations/dataset (baseline)

Statistical Test

Logistic regression + t-test · p < 0.001 · t/z = 7.43 · Source: Zenodo + DataCite API (n = 183,872)

Note: OR = 3.0× per 10-point SHARE increase (p < 0.001, n = 76M+). Source datasets (n = 35) have mean deposit-time score 36.0 vs. 27.1 population (Cohen’s d = 1.36). Strongest evidence linking metadata quality to meaningful downstream reuse.
