SHARE Score

S4

Subject Classification

Controlled vocabulary terms (MeSH, LCSH, FOR codes)

Stewardship (S)
Findable (F2), Interoperable (I2)

Justification

Controlled vocabularies enable cross-repository discovery and semantic interoperability. DataCite recommends the Subject property, which supports classification schemes. schema.org includes the keywords property, which Google Dataset Search uses. Dublin Core includes Subject as a core element. The RDA FAIR Data Maturity Model addresses vocabulary use directly in indicator RDA-I2-01M (Important priority).

Practical Guide

must-have

Add keywords. 5.2x citation lift — the easiest high-impact action.

Subject keywords from controlled vocabularies are one of the simplest metadata fields to add and one of the most impactful. Datasets with keywords receive about 5.2x as many citations as those without (RR = 5.23, p < 0.001). Keywords are already common on Zenodo (85.2% prevalence), but the remaining 14.8% of datasets are essentially invisible to keyword-based search.

For Repositories

  • Make subject keywords a required or strongly prompted field
  • Provide controlled vocabulary suggestions (MeSH, LCSH, FOR codes)
  • Map to DataCite #6 Subject and schema.org keywords
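The mapping in the last bullet can be sketched as follows. This is a minimal illustration, not a SHARE Score or repository implementation: the helper name `build_subject_metadata` and its input keys (`term`, `scheme`, `scheme_uri`, `value_uri`) are hypothetical, and the example terms are illustrative. The output keys `subject`, `subjectScheme`, `schemeUri`, and `valueUri` follow the DataCite JSON representation of the Subject property; `keywords` is the schema.org Dataset property.

```python
import json

def build_subject_metadata(terms):
    """Map subject terms to DataCite subjects and schema.org keywords.

    `terms` is a list of dicts with a required 'term' key and optional
    'scheme', 'scheme_uri', and 'value_uri' keys (hypothetical input shape).
    """
    # Input key -> DataCite JSON key for the optional scheme details.
    optional = {
        "scheme": "subjectScheme",
        "scheme_uri": "schemeUri",
        "value_uri": "valueUri",
    }
    datacite_subjects = []
    for t in terms:
        entry = {"subject": t["term"]}
        for src, dst in optional.items():
            if t.get(src):
                entry[dst] = t[src]
        datacite_subjects.append(entry)

    schema_org = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "keywords": [t["term"] for t in terms],
    }
    return {"datacite": {"subjects": datacite_subjects}, "schema_org": schema_org}

# Illustrative input: one controlled term and one free-text keyword.
result = build_subject_metadata([
    {
        "term": "Neoplasms",
        "scheme": "MeSH",
        "scheme_uri": "https://id.nlm.nih.gov/mesh/",
        "value_uri": "https://id.nlm.nih.gov/mesh/D009369",
    },
    {"term": "tumor imaging"},
])
print(json.dumps(result, indent=2))
```

Keeping the scheme name and URIs alongside each controlled term is what lets downstream systems tell a MeSH descriptor apart from a free-text keyword.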

For Depositors

  • Add at least 3-5 subject keywords from your field's standard vocabulary
  • Use controlled terms (MeSH for biomedical, LCSH for general) when possible
  • Include both broad and specific terms to maximize discoverability

Strongest positive signal in the Stewardship bucket. Easy to implement, high impact, and well adopted (85.2%).

Standards Sources

Convergence score: 4/4 independent sources (strongly justified)

Standard        Field / Property    Obligation Level
DataCite 4.6    #6 Subject          Recommended
schema.org      keywords            Recommended
Dublin Core     Subject             Core Element

FAIR Principle Alignment

Primary mapping: Findable (F2), Interoperable (I2)

  • F2: Data are described with rich metadata
  • I2: (Meta)data use vocabularies that follow FAIR principles

RDA FAIR Data Maturity Model Indicators:

  • RDA-I2-01M: Metadata uses FAIR-compliant vocabularies

How This Signal Is Measured

Presence of subject keywords, ideally from controlled vocabularies with scheme identifiers. Binary: at least one subject term present.

Empirical Evidence (Zenodo, n=1.3M)

Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.

Prevalence

85.2%

of Zenodo datasets

Citation Lift

5.2x

vs. datasets without

Data Source

Zenodo (CERN)

1,328,100 records analyzed

Interpretation: Strong positive signal. Datasets with subject classification receive about 5.2x as many citations as those without. Keywords enable cross-repository discovery, which makes this one of the highest-impact metadata fields.

Quantitative Evidence

Scoring Formula

subject_keywords.length ≥ 1 → 4 pts

Contribution: 4 of 100 points · Stewardship bucket (0–20)
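The binary rule above can be sketched in a few lines. This is an illustration only: the function name `score_s4` and the record shape (a dict with a `subject_keywords` list) are assumptions, and ignoring blank or non-string entries is an extra choice made here that the formula itself does not state.

```python
S4_POINTS = 4  # contribution of this signal, per the scoring formula

def score_s4(record):
    """Return 4 if the record has at least one subject keyword, else 0.

    `record` is assumed to be a dict with an optional 'subject_keywords'
    list of strings (hypothetical shape).
    """
    keywords = record.get("subject_keywords") or []
    # Assumption: whitespace-only or non-string entries do not count as keywords.
    present = [k for k in keywords if isinstance(k, str) and k.strip()]
    return S4_POINTS if len(present) >= 1 else 0

print(score_s4({"subject_keywords": ["Neoplasms", "tumor imaging"]}))
print(score_s4({"subject_keywords": []}))
```

Because the rule is binary, a record with one keyword scores the same as a record with ten; the depositor guidance of 3-5 terms is about discoverability, not about earning more points.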

With Signal Present

1,132,179

datasets (85.2%)

μ = 0.277 citations/dataset

Without Signal

195,921

datasets (14.8%)

μ = 0.053 citations/dataset

Rate Ratio

5.23

95% CI: [5.13, 5.33]

P-value

< 0.001

z = 165.79

Significance

Positive association

Method: Poisson rate ratio · Source: Zenodo (n = 1,328,100)
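The figures above can be approximately reproduced with a Wald test on the log rate ratio. This is a sketch under stated assumptions, not the authors' analysis code: the page does not publish total citation counts, so they are reconstructed here from the rounded group means (μ × n), and the standard error formula sqrt(1/c1 + 1/c0) is the usual large-sample approximation for a Poisson rate ratio. Results therefore match the reported values only up to rounding.

```python
import math

# Reported group sizes and rounded mean citations per dataset.
n_with, mean_with = 1_132_179, 0.277
n_without, mean_without = 195_921, 0.053

# Assumption: total citation counts reconstructed from the rounded means.
c_with = mean_with * n_with
c_without = mean_without * n_without

# Rate ratio: citations per dataset with keywords vs. without.
rate_ratio = (c_with / n_with) / (c_without / n_without)

# Large-sample standard error of log(RR) for two Poisson counts.
se_log_rr = math.sqrt(1 / c_with + 1 / c_without)

# Wald z statistic and 95% confidence interval on the log scale.
z = math.log(rate_ratio) / se_log_rr
ci_low = math.exp(math.log(rate_ratio) - 1.96 * se_log_rr)
ci_high = math.exp(math.log(rate_ratio) + 1.96 * se_log_rr)

print(f"RR = {rate_ratio:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}], z = {z:.1f}")
```

The interval is narrow because both groups are large; the width is driven almost entirely by the smaller citation count in the group without keywords.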
