Subject Classification

Controlled vocabulary terms (MeSH, LCSH, FOR codes)

Stewardship (S)

Findable (F2), Interoperable (I2)

Justification

Controlled vocabularies enable cross-repository discovery and semantic interoperability. DataCite lists Subject as a Recommended property and supports classification schemes. schema.org provides the keywords property, which Google Dataset Search uses. Dublin Core includes Subject as a core element. The RDA FAIR Data Maturity Model addresses vocabulary use directly in indicator RDA-I2-01M (Important priority).

Practical Guide

must-have

Add keywords. 5.2x citation lift: the easiest high-impact action.

Subject keywords from controlled vocabularies are among the simplest metadata fields to add and among the most impactful. Datasets with keywords receive 5.2x more citations (RR = 5.23, p < 0.001). At 85.2% prevalence on Zenodo, keywords are already common, but the roughly 15% of datasets without them are much harder to find through keyword-based search.

For Repositories

Make subject keywords a required or strongly prompted field
Provide controlled vocabulary suggestions (MeSH, LCSH, FOR codes)
Map to DataCite #6 Subject and schema.org keywords
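
The mapping in the last item can be sketched as follows. This is a minimal illustration, not a reference implementation: the input dictionaries and the helper names `to_datacite_subjects` and `to_schema_org_keywords` are hypothetical, and the output keys (`subject`, `subjectScheme`, `schemeUri`, `valueUri`) assume the DataCite JSON representation of the Subject property.

```python
def to_datacite_subjects(terms):
    """Build a DataCite-style `subjects` list from internal term dicts.

    Each input dict is assumed (hypothetically) to have a "label" and
    optional "scheme", "scheme_uri", and "value_uri" entries.
    """
    subjects = []
    for term in terms:
        entry = {"subject": term["label"]}
        # Emit scheme attributes only for terms from a controlled vocabulary.
        if term.get("scheme"):
            entry["subjectScheme"] = term["scheme"]
        if term.get("scheme_uri"):
            entry["schemeUri"] = term["scheme_uri"]
        if term.get("value_uri"):
            entry["valueUri"] = term["value_uri"]
        subjects.append(entry)
    return subjects


def to_schema_org_keywords(terms):
    """Flatten the same terms into a plain list for schema.org `keywords`."""
    return [term["label"] for term in terms]


# Illustrative input: one controlled term and one free-text keyword.
terms = [
    {"label": "Genomics", "scheme": "MeSH",
     "scheme_uri": "https://id.nlm.nih.gov/mesh/"},
    {"label": "soil microbiome"},
]

datacite_subjects = to_datacite_subjects(terms)
schema_org_keywords = to_schema_org_keywords(terms)
```

Keeping one internal list of terms and deriving both outputs from it means the DataCite record and the schema.org markup cannot drift apart.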

For Depositors

Add at least 3-5 subject keywords from your field's standard vocabulary
Use controlled terms (MeSH for biomedical, LCSH for general) when possible
Include both broad and specific terms to maximize discoverability

Strongest positive signal in the Stewardship bucket. Easy to implement, high impact, and well adopted (85.2%).

Standards Sources

Convergence score: 4/4 independent sources (strongly justified)

Standard	Field / Property	Obligation Level
DataCite 4.6	#6 Subject	Recommended
schema.org	keywords	Recommended
Dublin Core	Subject	Core Element

FAIR Principle Alignment

Primary mapping: Findable (F2), Interoperable (I2)

F2: Data are described with rich metadata
I2: (Meta)data use vocabularies that follow FAIR principles

RDA FAIR Data Maturity Model Indicators:

RDA-I2-01M: Metadata uses FAIR-compliant vocabularies

How This Signal Is Measured

Presence of subject keywords, ideally from controlled vocabularies with scheme identifiers. Binary: at least one subject term present.
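
A minimal sketch of this check, assuming records are dictionaries with a DataCite-style `subjects` list. The function names and the stricter "controlled" variant are illustrative assumptions, not the scoring code used for the statistics reported here.

```python
def has_subject(record):
    """Binary signal: True if at least one non-empty subject term is present."""
    subjects = record.get("subjects") or []
    return any((s.get("subject") or "").strip() for s in subjects)


def has_controlled_subject(record):
    """Stricter variant: at least one term also carries a scheme identifier."""
    subjects = record.get("subjects") or []
    return any(
        (s.get("subject") or "").strip()
        and (s.get("subjectScheme") or s.get("schemeUri"))
        for s in subjects
    )
```

Whitespace-only terms are treated as absent, so an empty keyword field submitted by a form does not count as a subject.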

Empirical Evidence (Zenodo, n=1.3M)

Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.

Prevalence: 85.2% of Zenodo datasets

Citation lift: 5.2x vs. datasets without subject keywords

Data source: Zenodo (CERN), 1,328,100 records analyzed

Interpretation: Strong positive signal. Datasets with subject classification receive 5.2x more citations than datasets without. Keywords enable cross-repository discovery, which makes this one of the highest-impact metadata fields.
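
For readers unfamiliar with the statistic, a relative risk (RR) of this kind can be computed as sketched below. This assumes the lift compares the share of cited datasets with and without subject keywords; the exact definition behind the reported figure is not stated here, and the counts in the example are purely illustrative, not the Zenodo data.

```python
def relative_risk(cited_with, total_with, cited_without, total_without):
    """Ratio of the citation rate with the signal to the rate without it."""
    # Cross-multiplied form of (cited_with / total_with) / (cited_without / total_without).
    return (cited_with * total_without) / (cited_without * total_with)


# Illustrative counts only: 5% cited with keywords vs. 1% cited without.
rr = relative_risk(cited_with=50, total_with=1000,
                   cited_without=10, total_without=1000)
```

An RR above 1 means datasets with the signal are cited at a higher rate. It describes an association and does not by itself show that adding keywords causes citations.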