Formal Citations
Citation count in published literature
Justification
Dataset citations are the strongest evidence of scholarly reuse. Cross-repository validation demonstrated that deposit-time SHARE signals predict reuse outcomes (OR = 3.0× per 10-point increase), providing empirical evidence for the framework's predictive validity.
Practical Guide
Track citations: the ultimate reuse metric, achieved by only 15% of Zenodo datasets.
Dataset citations are the strongest evidence of scholarly reuse. Only 15% of Zenodo datasets have any citations, and cited datasets average 1.63 citations each. Cross-repository validation (OR = 3.0× per 10-point increase, n = 76M+) showed that deposit-time SHARE signals predict reuse outcomes, which is the core evidence for the framework's validity.
For Repositories
- Track citations via DataCite Event Data, OpenCitations, or Crossref
- Display citation counts on dataset landing pages
- Provide citation export formats (BibTeX, RIS) for easy citing
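One way to track citations is to query the DataCite Event Data service for events targeting a dataset DOI. A minimal sketch is below; the endpoint URL and the `relation-type-id` / `page[size]` parameters reflect the public DataCite Event Data API as documented, but exact parameter names should be verified against the current API reference before relying on them.

```python
import urllib.parse

# Public DataCite Event Data endpoint (verify against current DataCite docs).
EVENTS_API = "https://api.datacite.org/events"

def citation_query_url(doi, rows=25):
    """Build a query URL for citation events pointing at a dataset DOI."""
    params = {"doi": doi, "relation-type-id": "is-cited-by", "page[size]": rows}
    return EVENTS_API + "?" + urllib.parse.urlencode(params)

def count_citation_events(payload):
    """Count events in a decoded JSON:API response from the events endpoint.

    Prefers the `meta.total` field; falls back to the length of `data`.
    """
    return payload.get("meta", {}).get("total", len(payload.get("data", [])))
```

Fetching the URL with any HTTP client and passing the decoded JSON to `count_citation_events` yields a citation-event count suitable for display on a landing page.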
For Depositors
- Include a suggested citation format in your dataset description
- Use your dataset DOI consistently in all publications that use the data
- Cite other datasets you use — this builds the citation ecosystem
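A suggested citation can be generated mechanically from a dataset's own metadata. The helper below is a hypothetical illustration (function name, arguments, and the example DOI are invented for this sketch); the field names follow common BibLaTeX usage for `@dataset` entries.

```python
def suggested_bibtex(key, author, title, year, doi, publisher="Zenodo"):
    """Render a suggested @dataset BibTeX entry from dataset metadata.

    Hypothetical helper: field layout follows common BibLaTeX conventions.
    """
    return (
        f"@dataset{{{key},\n"
        f"  author    = {{{author}}},\n"
        f"  title     = {{{title}}},\n"
        f"  year      = {{{year}}},\n"
        f"  publisher = {{{publisher}}},\n"
        f"  doi       = {{{doi}}}\n"
        f"}}"
    )
```

Embedding the rendered entry in the dataset description gives downstream users a copy-paste citation that keeps the DOI consistent across publications.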
Outcome metric — the gold standard for data reuse. Validates the predictive power of deposit-time signals (OR = 3.0×).
Standards Sources
Convergence score: 1/4 independent sources
| Standard | Field / Property | Source Type |
|---|---|---|
| DataCite Event Data | Citation events | API |
| OpenCitations | Open citation data | Open dataset |
| Crossref | Reference linking | Standard |
FAIR Principle Alignment
Primary mapping: Outcome metric (not FAIR-derived)
This is an outcome metric not derived from FAIR principles. The R (Reuse) bucket intentionally measures realized impact rather than metadata quality, enabling validation that deposit-time signals predict downstream use.
How This Signal Is Measured
Citation count retrieved via DOI lookup. Scored as binary in v1: any citations = 1, none = 0.
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
Prevalence
15.0%
of Zenodo datasets
Data Source
Zenodo (CERN)
1,328,100 records analyzed
Interpretation: Only 15% of Zenodo datasets have any citations; cited datasets average 1.63 citations. Cross-repository validation (OR = 3.0× per 10-point increase, n = 76M+) demonstrated that deposit-time SHARE signals predict reuse outcomes, the core evidence for framework validity.
Quantitative Evidence
Scoring Formula
log₁₀(citations + 1) × (4 / log₁₀(max_citations))
Contribution: 4 of 100 points · Reuse bucket (0–20)
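The scoring formula can be sketched directly, under the assumption that `max_citations` is the maximum citation count observed across the corpus (the section does not publish this value). Note one quirk of the published formula: because the numerator uses `citations + 1` while the denominator uses `max_citations`, the score at the maximum slightly exceeds the 4-point allotment, so the sketch caps it.

```python
import math

def citation_score(citations, max_citations, bucket_points=4):
    """log10(citations + 1) * (bucket_points / log10(max_citations)),
    capped at bucket_points.

    `max_citations` is the corpus maximum (assumed known; not published
    in this section). The cap absorbs the +1 asymmetry in the formula.
    """
    if max_citations <= 1:
        # Degenerate corpus: log10 scale is undefined or zero.
        return float(bucket_points) if citations > 0 else 0.0
    scale = bucket_points / math.log10(max_citations)
    return min(float(bucket_points), math.log10(citations + 1) * scale)
```

For example, with a corpus maximum of 1,000 citations, an uncited dataset scores 0 and the most-cited dataset scores the full 4 points.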
With Signal Present
199,259
datasets (15.0%)
μ = 1.627 citations/dataset
Without Signal
1,128,841
datasets (85.0%)
μ = 0 (baseline)
Method: N/A — baseline = 0 by definition · Source: Zenodo (n = 1,328,100)
Note: 15% of Zenodo datasets have ≥1 citation (mean 1.63 among cited datasets). The baseline group has 0 citations by definition, so relative risk (RR) is undefined. Core validation: deposit-time SHARE signals predict reuse outcomes (OR = 3.0× per 10-point increase, n = 76M+).
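The reported counts are internally consistent, which a quick check confirms (all figures taken from this section):

```python
n_total = 1_328_100    # Zenodo records analyzed
n_cited = 199_259      # datasets with >= 1 citation (signal present)
n_uncited = 1_128_841  # datasets with 0 citations (signal absent)

# The two groups partition the corpus exactly.
assert n_cited + n_uncited == n_total

prevalence = n_cited / n_total
print(f"{prevalence:.1%}")  # prints "15.0%", matching the reported prevalence
```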
R — Reuse Bucket
All signals in this bucket: