Description Quality
Substantive abstract (100+ characters)
Justification
An informative abstract is the single most impactful metadata field for discovery. DataCite explicitly calls Description "the most important recommended property." Google REQUIRES description for Dataset Search indexing. Dublin Core includes Description as a core element. RDA-F2-01M (Essential) requires rich metadata for discovery.
Practical Guide
Write a real description. It carries a 6x citation lift and is the single most important field.
DataCite calls description "the most important recommended property," and our data confirms it. Datasets with substantive descriptions (100+ characters) receive 6 times as many citations as datasets without them (RR = 6.00, p < 0.001). Prevalence on Zenodo is 67%, so a third of datasets still have inadequate descriptions. Google REQUIRES description for Dataset Search indexing. This is the highest-impact action any depositor can take.
For Repositories
- Set minimum description length (100+ characters) with a quality prompt
- Display character count and quality indicator during deposit
- Map to DataCite #17 Description and schema.org description (required)
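The repository-side recommendations above can be sketched as a small deposit-time check. This is a minimal illustration, not any repository's actual implementation: the function names `description_feedback` and `to_metadata`, the feedback wording, and the whitespace normalization are assumptions. Only the 100-character minimum and the two target fields come from the guidance above.

```python
import json

MIN_DESCRIPTION_CHARS = 100  # minimum length recommended in this guide


def description_feedback(description: str) -> dict:
    """Return a character count and a simple indicator to show during deposit."""
    # Assumption: collapse runs of whitespace so padding does not count.
    text = " ".join(description.split())
    count = len(text)
    ok = count >= MIN_DESCRIPTION_CHARS
    if ok:
        message = "Description meets the minimum length."
    else:
        message = (
            f"Add at least {MIN_DESCRIPTION_CHARS - count} more characters: "
            "say what the dataset contains and how it was collected."
        )
    return {"char_count": count, "meets_minimum": ok, "message": message}


def to_metadata(title: str, description: str) -> dict:
    """Map one description string to both target schemas."""
    return {
        # DataCite property 17 (Description); "Abstract" is one of the
        # controlled descriptionType values.
        "datacite": {
            "descriptions": [
                {"description": description, "descriptionType": "Abstract"}
            ]
        },
        # schema.org Dataset markup, serializable as JSON-LD.
        "schema_org": {
            "@context": "https://schema.org",
            "@type": "Dataset",
            "name": title,
            "description": description,
        },
    }


if __name__ == "__main__":
    draft = "Survey data."
    print(description_feedback(draft)["message"])
    print(json.dumps(to_metadata("Example survey", draft), indent=2))
```

Keeping the length check separate from the schema mapping lets a deposit form call the check on every keystroke while the mapping runs once at submission.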
For Depositors
- Write at least 2-3 sentences describing what the dataset contains and how it was collected
- Explain the scientific context and potential uses of the data
- Include key parameters, sample sizes, and methodology summary in the description
Strongest positive deposit-time signal (6x lift). Required by Google for search indexing. Called "most important" by DataCite.
Standards Sources
Convergence score: 4/4 independent sources
| Standard | Field / Property | Obligation Level |
|---|---|---|
| DataCite 4.6 | #17 Description | Recommended ("most important") |
| schema.org | description | Required |
| Dublin Core | Description | Core Element |
| RDA FAIR Data Maturity Model | RDA-F2-01M | Essential |
FAIR Principle Alignment
Primary mapping: Findable (F2)
- F2: Data are described with rich metadata
RDA FAIR Data Maturity Model Indicators:
- RDA-F2-01M: Rich metadata is provided to allow discovery
How This Signal Is Measured
Character count of description field. Threshold: 100+ characters of substantive content. Binary with quality gate.
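A rough sketch of how such a binary check might look is given below. The source does not define the quality gate, so the gate used here (tag stripping plus a minimum number of distinct words) and the names `has_description_signal` and `MIN_DISTINCT_WORDS` are purely illustrative assumptions. Only the 100-character threshold is taken from the definition above.

```python
import re
from typing import Optional

THRESHOLD = 100          # characters, from the measurement definition
MIN_DISTINCT_WORDS = 10  # hypothetical quality gate, not from the source


def has_description_signal(description: Optional[str]) -> bool:
    """Return True when the description counts as substantive."""
    if description is None:
        return False
    # Assumption: markup is not content, so strip anything that looks like a tag.
    text = re.sub(r"<[^>]+>", " ", description)
    # Normalize whitespace so padding does not inflate the count.
    text = " ".join(text.split())
    if len(text) < THRESHOLD:
        return False
    # Hypothetical gate: reject filler such as one token repeated many times.
    distinct_words = set(text.lower().split())
    return len(distinct_words) >= MIN_DISTINCT_WORDS
```

The result is deliberately binary: a record either carries the signal or it does not, which matches the with/without comparison used in the evidence sections.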
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
Prevalence: 67.2% of Zenodo datasets
Citation Lift: 6.0x vs. datasets without
Data Source: Zenodo (CERN), 1,328,100 records analyzed
Interpretation: The strongest positive signal. Datasets with substantive descriptions (100+ chars) receive 6 times as many citations as those without. DataCite calls this "the most important recommended property," and our data confirms it.
Quantitative Evidence
Scoring Formula
description.length ≥ 100 → 4 pts
Contribution: 4 of 100 points · Harmonization bucket (0–20)
| Group | Datasets | Share | Mean citations/dataset (μ) |
|---|---|---|---|
| With Signal Present | 892,754 | 67.2% | 0.336 |
| Without Signal | 435,346 | 32.8% | 0.056 |
Rate Ratio: 6.00 (95% CI: 5.92–6.08)
P-value: < 0.001 (z = 269.04)
Significance
Method: Poisson rate ratio · Source: Zenodo (n = 1,328,100)
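The reported ratio, interval, and z statistic can be approximately reproduced from the figures above with a Wald test on the log rate ratio. This is a sketch of one standard way to compute a Poisson rate ratio, not necessarily the exact procedure used for this analysis. Total citation counts are not reported, so they are reconstructed here from the rounded means, which is an assumption. The function name `poisson_rate_ratio` is illustrative.

```python
import math
from statistics import NormalDist


def poisson_rate_ratio(events_a, exposure_a, events_b, exposure_b, level=0.95):
    """Wald estimate of a Poisson rate ratio with confidence interval and z."""
    rr = (events_a / exposure_a) / (events_b / exposure_b)
    # Standard error of log(RR) for two independent Poisson counts.
    se = math.sqrt(1 / events_a + 1 / events_b)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    log_rr = math.log(rr)
    ci = (math.exp(log_rr - z_crit * se), math.exp(log_rr + z_crit * se))
    z = log_rr / se  # test statistic for H0: RR = 1
    return rr, ci, z


# Dataset counts and mean citations per dataset as reported above.
n_with, n_without = 892_754, 435_346
# Assumption: total citations reconstructed as mean x datasets (means are rounded).
cites_with = 0.336 * n_with
cites_without = 0.056 * n_without

rr, (lo, hi), z = poisson_rate_ratio(cites_with, n_with, cites_without, n_without)
print(f"RR = {rr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], z = {z:.2f}")
```

Because the inputs are rounded, the output should land close to the reported values rather than match them exactly. A rate ratio describes an association between description length and citations; it does not by itself establish that lengthening a description causes more citations.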
H — Harmonization Bucket
All signals in this bucket: