Formal Citations
Citation count in published literature
Justification
Dataset citations are the strongest evidence of scholarly reuse. Cross-repository validation demonstrated that deposit-time SHARE signals predict reuse outcomes (OR = 3.0× per 10-point increase), providing empirical evidence for the framework's predictive validity.
Practical Guide
Track citations: the ultimate reuse metric, achieved by only 15% of Zenodo datasets.
Dataset citations are the strongest evidence of scholarly reuse. Only 15% of Zenodo datasets have any citations, and cited datasets average 1.63 citations each. Cross-repository validation (OR = 3.0× per 10-point increase, n = 76M+) showed that deposit-time SHARE signals predict reuse outcomes, which is the core evidence for the framework's validity.
For Repositories
- Track citations via DataCite Event Data, OpenCitations, or Crossref
- Display citation counts on dataset landing pages
- Provide citation export formats (BibTeX, RIS) for easy citing
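One way to track citations is to query the DataCite Event Data service for events targeting a dataset DOI. A minimal sketch is below; the endpoint URL and the `relation-type-id` / `page[size]` parameters reflect the public DataCite Event Data API as documented, but exact parameter names should be verified against the current API reference before relying on them.

```python
import urllib.parse

# Public DataCite Event Data endpoint (verify against current DataCite docs).
EVENTS_API = "https://api.datacite.org/events"

def citation_query_url(doi, rows=25):
    """Build a query URL for citation events pointing at a dataset DOI."""
    params = {"doi": doi, "relation-type-id": "is-cited-by", "page[size]": rows}
    return EVENTS_API + "?" + urllib.parse.urlencode(params)

def count_citation_events(payload):
    """Count events in a decoded JSON:API response from the events endpoint.

    Prefers the `meta.total` field; falls back to the length of `data`.
    """
    return payload.get("meta", {}).get("total", len(payload.get("data", [])))
```

Fetching the URL with any HTTP client and passing the decoded JSON to `count_citation_events` yields a citation-event count suitable for display on a landing page.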
For Depositors
- Include a suggested citation format in your dataset description
- Use your dataset DOI consistently in all publications that use the data
- Cite other datasets you use — this builds the citation ecosystem
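A suggested citation can be generated mechanically from a dataset's own metadata. The helper below is a hypothetical illustration (function name, arguments, and the example DOI are invented for this sketch); the field names follow common BibLaTeX usage for `@dataset` entries.

```python
def suggested_bibtex(key, author, title, year, doi, publisher="Zenodo"):
    """Render a suggested @dataset BibTeX entry from dataset metadata.

    Hypothetical helper: field layout follows common BibLaTeX conventions.
    """
    return (
        f"@dataset{{{key},\n"
        f"  author    = {{{author}}},\n"
        f"  title     = {{{title}}},\n"
        f"  year      = {{{year}}},\n"
        f"  publisher = {{{publisher}}},\n"
        f"  doi       = {{{doi}}}\n"
        f"}}"
    )
```

Embedding the rendered entry in the dataset description gives downstream users a copy-paste citation that keeps the DOI consistent across publications.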
Outcome metric — the gold standard for data reuse. Validates the predictive power of deposit-time signals (OR = 3.0×).
Standards Sources
Convergence score: 1/4 independent sources
| Standard | Field / Property | Source Type |
|---|---|---|
| DataCite Event Data | Citation events | API |
| OpenCitations | Open citation data | Open dataset |
| Crossref | Reference linking | Standard |
FAIR Principle Alignment
Primary mapping: Outcome metric (not FAIR-derived)
This is an outcome metric not derived from FAIR principles. The R (Reuse) bucket intentionally measures realized impact rather than metadata quality, enabling validation that deposit-time signals predict downstream use.
How This Signal Is Measured
Citation count retrieved via DOI lookup. Scored as binary in v1: any citations = 1, none = 0.
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
Prevalence
15.0%
of Zenodo datasets
Data Source
Zenodo (CERN)
1,328,100 records analyzed
Interpretation: Only 15% of Zenodo datasets have any citations; cited datasets average 1.63 citations. Cross-repository validation (OR = 3.0× per 10-point increase, n = 76M+) demonstrated that deposit-time SHARE signals predict reuse outcomes, the core evidence for framework validity.
Quantitative Evidence
Scoring Formula
log₁₀(citations + 1) × (4 / log₁₀(max_citations))
Contribution: 4 of 100 points · Reuse bucket (0–20)
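The scoring formula can be sketched directly, under the assumption that `max_citations` is the maximum citation count observed across the corpus (the section does not publish this value). Note one quirk of the published formula: because the numerator uses `citations + 1` while the denominator uses `max_citations`, the score at the maximum slightly exceeds the 4-point allotment, so the sketch caps it.

```python
import math

def citation_score(citations, max_citations, bucket_points=4):
    """log10(citations + 1) * (bucket_points / log10(max_citations)),
    capped at bucket_points.

    `max_citations` is the corpus maximum (assumed known; not published
    in this section). The cap absorbs the +1 asymmetry in the formula.
    """
    if max_citations <= 1:
        # Degenerate corpus: log10 scale is undefined or zero.
        return float(bucket_points) if citations > 0 else 0.0
    scale = bucket_points / math.log10(max_citations)
    return min(float(bucket_points), math.log10(citations + 1) * scale)
```

For example, with a corpus maximum of 1,000 citations, an uncited dataset scores 0 and the most-cited dataset scores the full 4 points.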
With Signal Present
199,259
datasets (15.0%)
μ = 1.627 citations/dataset
Without Signal
1,128,841
datasets (85.0%)
μ = 0 (baseline)
Method: N/A — baseline = 0 by definition · Source: Zenodo (n = 1,328,100)
Note: 15% of Zenodo datasets have ≥1 citation (mean 1.63 among cited datasets). The baseline group has 0 citations by definition, so relative risk (RR) is undefined. Core validation: deposit-time SHARE signals predict reuse outcomes (OR = 3.0× per 10-point increase, n = 76M+).
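The reported counts are internally consistent, which a quick check confirms (all figures taken from this section):

```python
n_total = 1_328_100    # Zenodo records analyzed
n_cited = 199_259      # datasets with >= 1 citation (signal present)
n_uncited = 1_128_841  # datasets with 0 citations (signal absent)

# The two groups partition the corpus exactly.
assert n_cited + n_uncited == n_total

prevalence = n_cited / n_total
print(f"{prevalence:.1%}")  # prints "15.0%", matching the reported prevalence
```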
R — Reuse Bucket
All signals in this bucket: