SHARE Framework
The constitutional principles behind the SHARE data sharing metric. How we enable fair, cross-repository comparison of research datasets.
Version 2.0 | Conduct Science, Inc. | NIH Data Sharing Index Challenge Phase 2
25 Universal Signals
The signals, scoring formula, FAIR crosswalk, and scoring tiers.
Researcher-Level Metric
H-index analogy for data sharing. Rewards quantity and quality.
Citation Impact Metric
Measures how much datasets are cited and reused by the community.
The Problem We Solve
Research data repositories differ fundamentally in their metadata schemas. Zenodo supports 15+ optional fields; Dryad has 8; domain-specific repositories vary widely. This creates a critical challenge: how do you fairly compare datasets across repositories with different metadata capabilities?
The SHARE Framework defines 25 universal signals derived from FAIR principles, lets each repository map its fields to these signals via a public "pledge," then scores all datasets against the same standard using a fixed denominator of 25.
Three-Layer Architecture
The SHARE Framework consists of three layers that work together to enable cross-repository scoring:
LAYER 1
Universal Signal Vocabulary
25 signals derived from FAIR principles, published as an open standard. Defines what metadata elements are measured across all repositories.
LAYER 2
Repository Pledge
Each repository publishes a JSON mapping from their metadata fields to universal signals. Self-service onboarding with community validation.
LAYER 3
Scoring Engine
Applies pledges to dataset metadata. For each signal, checks if the mapped field contains valid data. Aggregates results into bucket scores.
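The three layers compose into a small amount of code. The sketch below is illustrative only: the signal names, the pledge's dotted-path convention, and the example mappings are hypothetical, not the published vocabulary or pledge schema.

```python
TOTAL_SIGNALS = 25  # Layer 1: fixed-size universal vocabulary

# Layer 2: a hypothetical pledge mapping universal signals to a
# repository's metadata fields; unsupported signals are simply absent.
zenodo_pledge = {
    "title_present": "metadata.title",
    "license_declared": "metadata.license",
    "methods_described": "metadata.notes",
}

def get_path(record, dotted_path):
    """Follow a dotted path like 'metadata.title' into nested dicts."""
    for key in dotted_path.split("."):
        if not isinstance(record, dict) or key not in record:
            return None
        record = record[key]
    return record

# Layer 3: apply the pledge to one dataset's metadata.
def share_score(record, pledge):
    """Count signals whose mapped field holds non-empty data.

    The denominator is always 25, so signals the repository does not
    map (or fields left empty) score zero rather than being excluded.
    """
    present = sum(
        1 for signal, path in pledge.items()
        if get_path(record, path) not in (None, "", [], {})
    )
    return round(present / TOTAL_SIGNALS * 100, 1)
```

For a record carrying only a title and a license, two signals are present, giving 2/25 × 100 = 8.0, regardless of which repository hosts it.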
Key Design Principles
Fixed Denominator
Score = (signals present / 25) × 100. The denominator is always 25, regardless of how many signals a repository supports. This prevents gaming by declaring signals "not applicable."
Unsupported = Zero
If a repository doesn't support a signal, it scores 0. The denominator stays 25. This accurately reflects metadata capabilities and incentivizes repositories to support more signals.
Same Effort = Same Score
Filling the same signals in any repository earns identical points, which eliminates the incentive to shop between repositories for an easier score.
Transparent Ceilings
Each repository's maximum possible score is documented. Researchers know what's achievable per repository.
Self-Service Onboarding
Repositories join by publishing a pledge. Community validation ensures accuracy without bottlenecks.
Federated Governance
Pull request model for pledges. Repository attestation required for official status. Change control gates protect production scoring.
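Taken together, the fixed-denominator principles above reduce to simple arithmetic. A minimal sketch, with hypothetical signal counts:

```python
TOTAL_SIGNALS = 25  # fixed denominator, identical for every repository

def score(signals_present: int) -> float:
    """Score = (signals present / 25) x 100; unsupported signals stay at 0."""
    return signals_present / TOTAL_SIGNALS * 100

# Transparent ceiling: a repository that maps only 18 of the 25 signals
# can never award more than 72.0, because the 7 unmapped signals count
# as zero but remain in the denominator.
ceiling = score(18)  # 72.0

# Same effort = same score: filling the same 10 signals yields 40.0
# in any repository.
per_dataset = score(10)  # 40.0
```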
Design Choices
Why 25 signals (5 per bucket)?
The 5-per-bucket structure is a pragmatic design choice for clarity and communication, not a standards-derived requirement. The FAIR principles themselves do not prescribe any particular distribution of signals across the four principles. We chose equal bucket sizes as the least assumption-laden approach: in the absence of evidence favoring specific weights, equal weighting is the most defensible default.
Why separate outcome metrics (R bucket)?
The R (Reuse) bucket intentionally measures outcomes (views, downloads, citations) rather than metadata quality. This separation is SHARE's key innovation: it enables validation that deposit-time metadata quality (S+H+A+E) actually predicts downstream reuse. The Zenodo validation (OR=5.73× for derivative creation, 95% CI: 4.97–6.61) confirms this predictive relationship.
System-level vs. depositor-level signals
SHARE scores measure what depositors can control. System-assigned properties (DOIs, timestamps, API availability) are excluded from dataset scoring and instead handled by the repository pledge system. This ensures the Constitutional Rule: “No depositor work = no score.”
Governance & Integrity
The framework includes robust governance mechanisms to ensure accuracy and prevent gaming:
Pledges follow a lifecycle: Community-Assessed (built from public documentation) → Self-Attested (repository confirms mappings) → Verified (independently audited, used for production scoring).
- Community Review: PRs are reviewed for semantic accuracy before merging
- Repository Attestation: DNS TXT, .well-known URL, or cryptographic signature proves ownership
- Version Control: All changes tracked in Git with 7-day propagation delay for breaking changes
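The pledge lifecycle described above can be sketched as a small forward-only state machine. The status names come from the text; how transitions are enforced in practice is an assumption of this sketch.

```python
from enum import Enum

class PledgeStatus(Enum):
    COMMUNITY_ASSESSED = "community-assessed"  # built from public documentation
    SELF_ATTESTED = "self-attested"            # repository confirms mappings
    VERIFIED = "verified"                      # independently audited; production scoring

# Pledges only move forward through the lifecycle; no backward transitions.
ALLOWED = {
    PledgeStatus.COMMUNITY_ASSESSED: {PledgeStatus.SELF_ATTESTED},
    PledgeStatus.SELF_ATTESTED: {PledgeStatus.VERIFIED},
    PledgeStatus.VERIFIED: set(),
}

def promote(current: PledgeStatus, target: PledgeStatus) -> PledgeStatus:
    """Advance a pledge one step, rejecting any non-lifecycle transition."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"Cannot move pledge from {current.value} to {target.value}"
        )
    return target
```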
Anti-Gaming Design
- Fixed denominator (/25) — Cannot inflate score by declaring signals “not applicable”
- Unsupported = 0 — Missing signals count against score, not removed
- Outcome signals separate — R bucket distinct from deposit-time signals
- Ratchet protection — Scores never decrease: new_score = max(current, recalculated)
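The ratchet rule stated above is a single comparison; the sketch below only adds the surrounding function and illustrative scores.

```python
def apply_ratchet(current_score: float, recalculated_score: float) -> float:
    """Ratchet protection: a dataset's published score never decreases.

    new_score = max(current, recalculated), so a pledge change or signal
    revision that lowers the recalculated value leaves the score intact,
    while improvements take effect immediately.
    """
    return max(current_score, recalculated_score)

# A recalculation that would drop a dataset from 64.0 to 56.0 leaves it
# at 64.0; an improvement to 72.0 is taken as-is.
```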
Explore the SHARE Framework
Dive into the signals, view scored datasets, or register your repository.