SHARE Framework

The constitutional principles behind the SHARE data sharing metric. How we enable fair, cross-repository comparison of research datasets.

Version 2.0 | Conduct Science, Inc. | NIH Data Sharing Index Challenge Phase 2

The Problem We Solve

Research data repositories differ fundamentally in their metadata schemas. Zenodo supports 15+ optional fields; Dryad has 8; domain-specific repositories vary widely. This creates a critical challenge: how do you fairly compare datasets across repositories with different metadata capabilities?

The SHARE Framework defines 25 universal signals derived from FAIR principles, lets each repository map its fields to these signals via a public "pledge," and then scores all datasets against the same standard using a fixed denominator of 25.

Three-Layer Architecture

The SHARE Framework consists of three layers that work together to enable cross-repository scoring:

LAYER 1

Universal Signal Vocabulary

25 signals derived from FAIR principles, published as an open standard. Defines what metadata elements are measured across all repositories.

LAYER 2

Repository Pledge

Each repository publishes a JSON mapping from its metadata fields to the universal signals. Self-service onboarding with community validation.

LAYER 3

Scoring Engine

Applies pledges to dataset metadata. For each signal, checks if the mapped field contains valid data. Aggregates results into bucket scores.
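The interplay of Layers 2 and 3 can be sketched in a few lines of Python. The signal IDs, bucket labels, repository name, and field paths below are illustrative assumptions, not the official SHARE vocabulary; the real engine checks 25 signals and applies richer validity rules than a simple truthiness test.

```python
# Toy universal vocabulary (Layer 1): signal -> bucket. 25 signals in the
# real vocabulary; 4 shown here for illustration.
UNIVERSAL_SIGNALS = {
    "S1_title": "S",
    "S2_description": "S",
    "H1_keywords": "H",
    "A1_license": "A",
}

# Layer 2: a hypothetical pledge mapping repository fields to signals.
pledge = {
    "repository": "examplerepo",
    "vocabulary_version": "2.0",
    "mappings": {
        "S1_title": "metadata.title",
        "H1_keywords": "metadata.tags",
        # A1_license is absent: this repository has no license field,
        # so the signal is unsupported and scores 0.
    },
}

def lookup(record: dict, dotted_path: str):
    """Follow a dotted path such as 'metadata.title' into nested dicts."""
    value = record
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

def score_dataset(record: dict, pledge: dict, denominator: int = 25):
    """Layer 3: check each signal via the pledge; unmapped or empty = 0."""
    hits = {}
    for signal in UNIVERSAL_SIGNALS:
        path = pledge["mappings"].get(signal)   # None -> unsupported
        hits[signal] = bool(lookup(record, path)) if path else False
    total = sum(hits.values()) * 100 / denominator  # fixed denominator of 25
    buckets = {}
    for signal, present in hits.items():
        b = UNIVERSAL_SIGNALS[signal]
        buckets[b] = buckets.get(b, 0) + int(present)
    return total, buckets

record = {"metadata": {"title": "Coral survey 2023", "tags": ["reef", "coral"]}}
total, buckets = score_dataset(record, pledge)  # 2 of 25 signals -> 8.0
```

Note that the unsupported `A1_license` signal still sits in the denominator: the dataset scores 0 for it rather than having it excluded.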

Key Design Principles

Fixed Denominator

Score = (signals present / 25) × 100. The denominator is always 25, regardless of how many signals a repository supports. This prevents gaming by declaring signals "not applicable."
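A two-line comparison (with hypothetical numbers) shows why the fixed denominator resists gaming: with a variable denominator, a repository could raise scores simply by mapping fewer signals.

```python
present = 10    # signals with valid data in this dataset (hypothetical)
supported = 15  # signals the repository maps at all (hypothetical)

share_score = present * 100 / 25         # fixed denominator: 40.0
gameable = present * 100 / supported     # variable denominator: ~66.7, inflatable
```

Shrinking `supported` raises `gameable` without any improvement in metadata; `share_score` is unaffected by what the repository declares.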

Unsupported = Zero

If a repository doesn't support a signal, it scores 0. The denominator stays 25. This accurately reflects metadata capabilities and incentivizes repositories to support more signals.

Same Effort = Same Score

Filling the same 2 signals in any repository yields identical points. Eliminates the incentive for repository shopping.

Transparent Ceilings

Each repository's maximum possible score is documented. Researchers know what's achievable per repository.

Self-Service Onboarding

Repositories join by publishing a pledge. Community validation ensures accuracy without bottlenecks.

Federated Governance

Pull request model for pledges. Repository attestation required for official status. Change control gates protect production scoring.

Design Choices

Why 25 signals (5 per bucket)?

The 5-per-bucket structure is a pragmatic design choice for clarity and communication, not a standards-derived requirement. The FAIR principles themselves do not prescribe any particular distribution of signals across the four letters, so we chose equal bucket sizes as the least assumption-laden approach: in the absence of evidence favoring specific weights, equal weighting is the most defensible default.

Why separate outcome metrics (R bucket)?

The R (Reuse) bucket intentionally measures outcomes (views, downloads, citations) rather than metadata quality. This separation is SHARE's key innovation: it enables validation that deposit-time metadata quality (S+H+A+E) actually predicts downstream reuse. The Zenodo validation (OR=5.73× for derivative creation, 95% CI: 4.97–6.61) confirms this predictive relationship.

System-level vs. depositor-level signals

SHARE scores measure what depositors can control. System-assigned properties (DOIs, timestamps, API availability) are excluded from dataset scoring and instead handled by the repository pledge system. This ensures the Constitutional Rule: “No depositor work = no score.”

Governance & Integrity

The framework includes robust governance mechanisms to ensure accuracy and prevent gaming:

  • Pledge Status Levels: Community-Assessed → Self-Attested → Verified → Deprecated. Only Verified pledges (attested by the repository) are used for production scoring.
  • Repository Attestation: Cryptographic signature, DNS verification, or well-known URL confirmation ensures repositories own their mappings.
  • Third-Party Audit: Independent verification before Official status. Checks semantic accuracy, distribution plausibility, and gaming resistance.
  • Semantic Validation: Automated tests verify mappings behave as claimed. Format validation, distribution checks, and sample dataset tests.
  • Change Control Gates: Tagged releases only, a 7-day propagation delay, and impact assessment for all pledge changes.
  • Vocabulary Versioning: Explicit versioning allows vocabulary evolution without breaking historical score comparability.
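As one concrete illustration of semantic validation, a check like the following could test whether a field mapped to a license signal actually holds license identifiers. The SPDX-style pattern and the 90% threshold are assumptions for the sketch, not the official audit suite.

```python
import re

# Hypothetical semantic-validation check: values mapped to a license signal
# should mostly look like SPDX-style identifiers (no spaces, limited charset).
SPDX_LIKE = re.compile(r"^[A-Za-z0-9.+\-]+$")

def plausible_license_mapping(sample_values, threshold=0.9):
    """Return True if >= threshold of sampled values look like license IDs."""
    values = list(sample_values)
    hits = sum(1 for v in values if isinstance(v, str) and SPDX_LIKE.match(v))
    return len(values) > 0 and hits / len(values) >= threshold
```

A mapping that points a "license" signal at a free-text notes field would fail this distribution check even though the mapping is syntactically valid.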

Pledges follow a lifecycle: Community-Assessed (built from public documentation) → Self-Attested (repository confirms mappings) → Verified (independently audited, used for production scoring).

  • Community Review: PRs are reviewed for semantic accuracy before merging
  • Repository Attestation: DNS TXT, .well-known URL, or cryptographic signature proves ownership
  • Version Control: All changes tracked in Git with 7-day propagation delay for breaking changes
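The pledge lifecycle above can be sketched as a small state machine. The transition table is an assumption inferred from the stages named; only the "Verified pledges feed production scoring" rule is stated by the framework.

```python
from enum import Enum

class PledgeStatus(Enum):
    COMMUNITY_ASSESSED = "Community-Assessed"
    SELF_ATTESTED = "Self-Attested"
    VERIFIED = "Verified"
    DEPRECATED = "Deprecated"

# Assumed transitions: forward through the lifecycle, or retirement at any stage.
TRANSITIONS = {
    PledgeStatus.COMMUNITY_ASSESSED: {PledgeStatus.SELF_ATTESTED, PledgeStatus.DEPRECATED},
    PledgeStatus.SELF_ATTESTED: {PledgeStatus.VERIFIED, PledgeStatus.DEPRECATED},
    PledgeStatus.VERIFIED: {PledgeStatus.DEPRECATED},
    PledgeStatus.DEPRECATED: set(),
}

def usable_for_production(status: PledgeStatus) -> bool:
    """Only Verified pledges feed production scoring."""
    return status is PledgeStatus.VERIFIED
```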

Anti-Gaming Design

  • Fixed denominator (/25) — Cannot inflate score by declaring signals “not applicable”
  • Unsupported = 0 — Missing signals count against score, not removed
  • Outcome signals separate — R bucket distinct from deposit-time signals
  • Ratchet protection — Scores never decrease: new_score = max(current, recalculated)
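The ratchet rule in the last bullet reduces to a single `max`: recalculation (for example after a pledge or vocabulary change) can raise a published score but never lower it.

```python
# Ratchet protection: new_score = max(current, recalculated).
def apply_ratchet(current: float, recalculated: float) -> float:
    return max(current, recalculated)

protected = apply_ratchet(72.0, 68.0)  # a stricter recalculation cannot lower it
improved = apply_ratchet(72.0, 80.0)   # improvements still flow through
```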

Benefits for All Stakeholders

  • Researchers: Provide complete, high-quality metadata to maximize score. Repository choice based on data fit, not score optimization.
  • Repositories: Add support for more universal signals to increase ceiling and attract depositors. Compete on metadata richness.
  • Funders: Comparable scores across repositories enable meaningful compliance tracking and policy evaluation.
  • Institutions: Fair cross-repository comparison enables meaningful researcher profiles (S-Index) and benchmarking.

Explore the SHARE Framework

Dive into the signals, view scored datasets, or register your repository.