Description Quality

Substantive abstract (100+ characters)

Harmonization (H)

Findable (F2)

Justification

An informative abstract is the single most impactful metadata field for discovery. DataCite explicitly calls Description "the most important recommended property." Google requires a description for Dataset Search indexing. Dublin Core includes Description as a core element. The RDA indicator RDA-F2-01M (Essential) requires rich metadata for discovery.

Practical Guide

must-have

Write a real description. It carries a 6x citation lift and is the single most important field.

DataCite calls description "the most important recommended property," and our data confirms it. Datasets with substantive descriptions (100+ characters) receive 6x more citations than datasets without (RR = 6.00, p < 0.001). About 67% of Zenodo datasets meet this threshold, so roughly a third still have inadequate descriptions. Google requires a description for Dataset Search indexing. Writing one is the highest-impact action any depositor can take.

For Repositories

Set minimum description length (100+ characters) with a quality prompt
Display character count and quality indicator during deposit
Map to DataCite #17 Description and schema.org description (required)
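As a rough illustration of the mapping in the last item, the sketch below emits the same description text in both shapes. The function names and the sample text are hypothetical. The DataCite shape assumes the JSON serialization in which property #17 appears as a `descriptions` list with `descriptionType` set to `Abstract`. The schema.org shape is a minimal JSON-LD `Dataset`, not a complete record.

```python
import json


def description_to_datacite(text: str) -> dict:
    """Wrap a description as a DataCite-style `descriptions` entry (property #17).

    Assumes the DataCite JSON layout; "Abstract" is one of the controlled
    descriptionType values.
    """
    return {"descriptions": [{"description": text, "descriptionType": "Abstract"}]}


def description_to_schema_org(name: str, text: str) -> dict:
    """Build a minimal schema.org Dataset object carrying the same description."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": text,
    }


if __name__ == "__main__":
    # Hypothetical sample text, used only to show the output shape.
    sample = (
        "Hourly air temperature readings from 12 rooftop sensors, collected "
        "over one year and quality-checked against a reference station."
    )
    print(json.dumps(description_to_datacite(sample), indent=2))
    print(json.dumps(description_to_schema_org("Rooftop temperatures", sample), indent=2))
```

Keeping one source string and deriving both representations from it avoids the two exports drifting apart.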

For Depositors

Write at least 2-3 sentences describing what the dataset contains and how it was collected
Explain the scientific context and potential uses of the data
Include key parameters, sample sizes, and methodology summary in the description

Strongest positive deposit-time signal (6x lift). Required by Google for search indexing. Called "most important" by DataCite.

Standards Sources

Convergence score: 4/4 independent sources (strongly justified)

Standard	Field / Property	Obligation Level
DataCite 4.6	#17 Description	Recommended ("most important")
schema.org	description	Required
Dublin Core	Description	Core Element

FAIR Principle Alignment

Primary mapping: Findable (F2)

F2: Data are described with rich metadata

RDA FAIR Data Maturity Model Indicators:

RDA-F2-01M: Rich metadata is provided to allow discovery

How This Signal Is Measured

The signal is the character count of the description field. A record passes when its description contains 100 or more characters of substantive content, so the result is binary with a quality gate.
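A minimal sketch of such a check follows. The text does not define the quality gate, so the normalization here (stripping HTML tags, decoding entities, collapsing whitespace) is an assumption, and the function names are hypothetical. Only the 100-character threshold comes from the text.

```python
import html
import re

MIN_CHARS = 100  # threshold stated in the text

_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"\s+")


def normalize_description(raw):
    """Reduce a raw description to plain text (assumed quality gate).

    Strips HTML tags, decodes entities, and collapses runs of whitespace so
    that markup and padding do not count toward the threshold.
    """
    if not raw:
        return ""
    text = html.unescape(_TAG_RE.sub(" ", raw))
    return _WS_RE.sub(" ", text).strip()


def has_substantive_description(raw, min_chars=MIN_CHARS):
    """Return True when the normalized description meets the length threshold."""
    return len(normalize_description(raw)) >= min_chars


if __name__ == "__main__":
    print(has_substantive_description("<p>Data for the paper.</p>"))
    print(has_substantive_description("<p>" + "word " * 30 + "</p>"))
```

The same function could back the character counter shown to depositors, so the number they see matches the number that is scored.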

Empirical Evidence (Zenodo, n=1.3M)

Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.

Prevalence: 67.2% of Zenodo datasets

Citation Lift: 6.0x vs. datasets without

Data Source: Zenodo (CERN), 1,328,100 records analyzed

Interpretation: This is the strongest positive signal. Datasets with substantive descriptions (100+ characters) receive 6x more citations than datasets without. DataCite calls description "the most important recommended property," and our data confirms it.
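For readers unfamiliar with the RR figure, the sketch below shows one common way a relative risk is computed from a two-group count table. It assumes the lift compares the share of cited datasets with the signal against the share without it; the text does not spell out the exact computation. The function name is hypothetical and the counts in the usage lines are made up for illustration, not taken from the Zenodo analysis.

```python
def relative_risk(cited_with, total_with, cited_without, total_without):
    """Ratio of the citation rate with the signal to the rate without it.

    RR = (cited_with / total_with) / (cited_without / total_without),
    evaluated in cross-multiplied form to limit floating-point error.
    """
    if total_with <= 0 or total_without <= 0:
        raise ValueError("group totals must be positive")
    if cited_without <= 0:
        raise ValueError("RR is undefined when the comparison group has no cited records")
    return (cited_with * total_without) / (cited_without * total_with)


if __name__ == "__main__":
    # Illustrative counts only: 60 of 1,000 cited with the signal,
    # 10 of 1,000 cited without it.
    print(relative_risk(60, 1000, 10, 1000))
```

An RR above 1 means the group with the signal is cited at a higher rate. It describes an association in the sample and does not by itself show that the description causes the citations.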