Version Tracking
Semantic versioning or version history links
Justification
Version information enables reproducibility by allowing citation of specific data states. Three of the four standards sources consulted (DataCite, schema.org, and RDA FAIR) include it.
Practical Guide
Version your data so that users can cite specific data states.
The 0.28x citation ratio reflects citation fragmentation: multi-version datasets split citations across versions. At 8.3% prevalence on Zenodo, versioning is uncommon but growing. SHARE values this as a stewardship commitment signal, not a citation predictor.
Why this signal matters despite the numbers
The 0.28x citation ratio reflects citation fragmentation across versions, not lower quality. Versioned datasets enable reproducibility by letting users cite exact data states, a value that per-record citation counts undercount.
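The fragmentation effect can be illustrated with a toy calculation. The records below are entirely hypothetical (they are not Zenodo data), and the helper names are invented for this sketch. It only shows how counting citations per version record lowers the mean even when the dataset as a whole is cited just as often.

```python
from collections import defaultdict

# Hypothetical records: (dataset concept, version label, citations).
# "dataset-A" has three version records; "dataset-B" is unversioned.
records = [
    ("dataset-A", "v1.0", 2),
    ("dataset-A", "v2.0", 1),
    ("dataset-A", "v3.0", 0),
    ("dataset-B", None, 3),
]

def mean_citations_per_record(rows):
    """Average citations over individual records (one row per version)."""
    return sum(citations for _, _, citations in rows) / len(rows)

def citations_per_dataset(rows):
    """Total citations after merging all versions of the same dataset."""
    totals = defaultdict(int)
    for concept, _, citations in rows:
        totals[concept] += citations
    return dict(totals)

versioned = [r for r in records if r[1] is not None]
unversioned = [r for r in records if r[1] is None]

per_record_versioned = mean_citations_per_record(versioned)
per_record_unversioned = mean_citations_per_record(unversioned)
totals = citations_per_dataset(records)
```

In this toy case both datasets receive three citations in total, yet the per-record mean for the versioned dataset is one third of the unversioned one.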
For Repositories
- Support semantic versioning or IsNewVersionOf linking
- Display version history on dataset landing pages
- Map to DataCite #15 Version and schema.org version
For Depositors
- Use version numbers when updating datasets (v1.0, v2.0)
- Link new versions to previous ones using IsNewVersionOf
- Document what changed in each version to help users choose the right one
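As a rough illustration of the depositor steps above, the sketch below builds a DataCite-style metadata fragment that carries a version number, an IsNewVersionOf link, and a short change note. The helper name `build_version_metadata`, the example DOI, and the choice of a description entry for the change note are assumptions made for illustration; a given repository's deposit interface may lay these fields out differently.

```python
def build_version_metadata(version, previous_doi=None, changes=None):
    """Return a DataCite-style metadata fragment for one dataset version.

    Hypothetical helper: field names follow DataCite JSON conventions,
    but the overall record layout is an assumption.
    """
    metadata = {"version": version}
    if previous_doi:
        # Link this version back to the one it supersedes.
        metadata["relatedIdentifiers"] = [{
            "relatedIdentifier": previous_doi,
            "relatedIdentifierType": "DOI",
            "relationType": "IsNewVersionOf",
        }]
    if changes:
        # Record what changed so users can choose the right version.
        metadata["descriptions"] = [{
            "description": changes,
            "descriptionType": "Other",
        }]
    return metadata

# Example DOI is a placeholder, not a real identifier.
v2 = build_version_metadata(
    "2.0",
    previous_doi="10.1234/example.v1",
    changes="Corrected units in column 3.",
)
```

A first release would call the helper with only a version number and so carry no IsNewVersionOf link.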
Reproducibility value is clear. The citation ratio below 1 reflects fragmentation, not quality. Prevalence is low (8.3%).
Standards Sources
Convergence score: 3/4 independent sources
| Standard | Field / Property | Obligation Level |
|---|---|---|
| DataCite 4.6 | #15 Version | Optional |
| schema.org | version | Recommended |
| RDA FAIR | RDA-R1.2-01M | Important |
FAIR Principle Alignment
Primary mapping: Reusable (R1.2)
- R1.2: (Meta)data are associated with detailed provenance
RDA FAIR Data Maturity Model Indicators:
- RDA-R1.2-01M: Metadata includes provenance information according to community-specific standards
How This Signal Is Measured
Presence of version number or IsNewVersionOf relation. Binary: version info present or absent.
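A minimal sketch of this binary check follows, assuming each record is a dictionary shaped like DataCite JSON with a `version` string and a `relatedIdentifiers` list. The function name `has_version_info` and the record layout are assumptions; the actual measurement pipeline is not described in this section.

```python
def has_version_info(record):
    """Return True if the record has a version number or an IsNewVersionOf link."""
    version = record.get("version")
    if isinstance(version, str) and version.strip():
        return True
    # Fall back to checking for an explicit link to a previous version.
    for related in record.get("relatedIdentifiers") or []:
        if related.get("relationType") == "IsNewVersionOf":
            return True
    return False

# Illustrative records, not real Zenodo entries.
with_number = {"version": "1.0"}
with_link = {"relatedIdentifiers": [{"relationType": "IsNewVersionOf"}]}
without_either = {"version": "  ", "relatedIdentifiers": []}
```

Whitespace-only version strings are treated as absent here, which is a design choice of this sketch rather than a documented rule.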
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
Prevalence: 8.3% of Zenodo datasets
Citation Lift: 0.3x vs. datasets without
Data Source: Zenodo (CERN), 1,328,100 records analyzed
Interpretation: Version tracking shows a naive citation ratio below 1 (0.28x). Multi-version datasets tend to be living datasets that fragment citations across versions. The signal measures data stewardship commitment, not citation optimization: versioned datasets enable reproducibility by allowing citation of specific data states.
Quantitative Evidence
Scoring Formula
version_number ∈ record → 4 pts
Contribution: 4 of 100 points · Engagement bucket (0–20)
| Group | Datasets | Share | Mean citations per dataset (μ) |
|---|---|---|---|
| With Signal Present | 110,091 | 8.3% | 0.073 |
| Without Signal | 1,218,009 | 91.7% | 0.260 |
Rate Ratio: 0.28 (95% CI: [0.28–0.29])
Significance: P-value < 0.001 (z = -112.45)
Method: Poisson rate ratio · Source: Zenodo (n = 1,328,100)
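The rate ratio and z statistic can be approximately reproduced from the reported group sizes and means. The sketch below assumes a Wald test on the log rate ratio, which is one common way to compute a Poisson rate ratio; the source does not state the exact procedure. Citation totals are reconstructed from the rounded means, so the interval bounds may differ slightly from the published ones.

```python
import math

def poisson_rate_ratio(events_a, exposure_a, events_b, exposure_b, z_crit=1.96):
    """Rate ratio of group A to group B with a Wald interval on the log scale."""
    rate_ratio = (events_a / exposure_a) / (events_b / exposure_b)
    # Standard error of log(rate ratio) for two independent Poisson counts.
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    log_rr = math.log(rate_ratio)
    z = log_rr / se_log
    ci = (math.exp(log_rr - z_crit * se_log), math.exp(log_rr + z_crit * se_log))
    return rate_ratio, ci, z

# Group sizes and means as reported; totals are reconstructed (assumption).
n_with, mean_with = 110_091, 0.073
n_without, mean_without = 1_218_009, 0.260

rr, ci, z = poisson_rate_ratio(
    mean_with * n_with, n_with,
    mean_without * n_without, n_without,
)
```

With these inputs the ratio rounds to 0.28 and the z statistic lands close to the reported value.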
Note: The ratio below 1 reflects citation fragmentation across versions. Versioned datasets enable reproducibility; the signal measures stewardship commitment, not citation optimization.
E — Engagement Bucket
All signals in this bucket: