SHARE Score

H5

Description Quality

Substantive abstract (100+ characters)

Harmonization (H)
Findable (F2)

Justification

An informative abstract is the single most impactful metadata field for discovery. DataCite explicitly calls Description "the most important recommended property." Google requires description for Dataset Search indexing. Dublin Core includes Description as a core element. RDA-F2-01M (Essential) requires rich metadata for discovery.

Practical Guide

must-have

Write a real description. 6x citation lift — the single most important field.

DataCite calls description "the most important recommended property", and our data confirms it. Datasets with substantive descriptions (100+ characters) receive 6x more citations (RR = 6.00, p < 0.001). Prevalence on Zenodo is 67%, so about a third of datasets still have inadequate descriptions. Google requires description for Dataset Search indexing. This is the highest-impact action any depositor can take.

For Repositories

  • Set minimum description length (100+ characters) with a quality prompt
  • Display character count and quality indicator during deposit
  • Map to DataCite #17 Description and schema.org description (required)
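The minimum-length check and the mapping in the bullets above can be sketched as follows. This is an illustration under stated assumptions, not any repository's actual deposit code: the helper name `build_description_metadata` and the returned layout are hypothetical, the DataCite side assumes the DataCite JSON form in which descriptions are a list of objects with `description` and `descriptionType`, and the schema.org side assumes a `Dataset` serialized as JSON-LD.

```python
MIN_DESCRIPTION_CHARS = 100  # minimum length recommended in this guide


def build_description_metadata(title: str, abstract: str) -> dict:
    """Hypothetical helper: expose one abstract in both vocabularies."""
    # Collapse runs of whitespace so padding does not inflate the count.
    text = " ".join(abstract.split())
    if len(text) < MIN_DESCRIPTION_CHARS:
        raise ValueError(
            f"Description has {len(text)} characters; "
            f"at least {MIN_DESCRIPTION_CHARS} are recommended."
        )

    # DataCite JSON: property #17 Description, typed as an abstract.
    datacite = {
        "titles": [{"title": title}],
        "descriptions": [
            {"description": text, "descriptionType": "Abstract"}
        ],
    }

    # schema.org Dataset as JSON-LD: `description` is a plain string.
    schema_org = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": text,
    }
    return {"datacite": datacite, "schema_org": schema_org}
```

Raising an error is only one option; a deposit form could instead surface the same character count as a live quality indicator and let the depositor proceed.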

For Depositors

  • Write at least 2-3 sentences describing what the dataset contains and how it was collected
  • Explain the scientific context and potential uses of the data
  • Include key parameters, sample sizes, and methodology summary in the description

Strongest positive deposit-time signal (6x lift). Required by Google for search indexing. Called "most important" by DataCite.

Standards Sources

Convergence score: 4/4 independent sources (strongly justified)

Standard        Field / Property    Obligation Level
DataCite 4.6    #17 Description     Recommended ("most important")
schema.org      description         Required
Dublin Core     Description         Core Element

FAIR Principle Alignment

Primary mapping: Findable (F2)

  • F2: Data are described with rich metadata

RDA FAIR Data Maturity Model Indicators:

  • RDA-F2-01M: Rich metadata is provided to allow discovery

How This Signal Is Measured

Character count of description field. Threshold: 100+ characters of substantive content. Binary with quality gate.
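A minimal sketch of such a measurement is shown below. The 100-character threshold comes from the text; everything else is an assumption made for illustration. In particular, the source does not define the quality gate, so the distinct-word check (`MIN_DISTINCT_WORDS`) is a hypothetical stand-in, as is the decision to strip HTML markup before counting.

```python
import html
import re

MIN_CHARS = 100          # threshold stated in the text
MIN_DISTINCT_WORDS = 10  # assumed quality gate; not specified in the source

_TAG_RE = re.compile(r"<[^>]+>")      # crude HTML tag matcher
_WORD_RE = re.compile(r"[^\W\d_]+")   # runs of letters


def normalize(description):
    """Strip markup, decode entities, and collapse whitespace."""
    if not description:
        return ""
    text = _TAG_RE.sub(" ", description)
    text = html.unescape(text)
    return " ".join(text.split())


def has_substantive_description(description) -> bool:
    """Binary signal: long enough AND passes the assumed quality gate."""
    text = normalize(description)
    if len(text) < MIN_CHARS:
        return False
    distinct = {word.lower() for word in _WORD_RE.findall(text)}
    return len(distinct) >= MIN_DISTINCT_WORDS
```

Under these assumptions, a description made of one word repeated many times clears the length threshold but fails the gate, which is the kind of filler a pure character count would let through.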

Empirical Evidence (Zenodo, n=1.3M)

Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.

Prevalence: 67.2% of Zenodo datasets

Citation Lift: 6.0x vs. datasets without

Data Source: Zenodo (CERN), 1,328,100 records analyzed

Interpretation: The strongest positive signal. Datasets with substantive descriptions (100+ chars) receive 6x more citations. DataCite calls this 'the most important recommended property' — our data confirms it.

Quantitative Evidence

Scoring Formula

description.length ≥ 100 → 4 pts

Contribution: 4 of 100 points · Harmonization bucket (0–20)

With Signal Present: 892,754 datasets (67.2%), μ = 0.336 citations/dataset

Without Signal: 435,346 datasets (32.8%), μ = 0.056 citations/dataset

Rate Ratio: 6.00 (95% CI: [5.92, 6.08])

P-value: < 0.001 (z = 269.04)

Significance: Positive association

Method: Poisson rate ratio · Source: Zenodo (n = 1,328,100)
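For readers who want to check the arithmetic, the sketch below computes a Poisson rate ratio with a Wald interval on the log scale. The source names only "Poisson rate ratio", so the Wald construction is an assumption, as is reconstructing total citation counts from the rounded per-dataset means reported above. The function name `poisson_rate_ratio` is hypothetical.

```python
import math


def poisson_rate_ratio(n_with, mean_with, n_without, mean_without,
                       z_crit=1.96):
    """Rate ratio, Wald CI on the log scale, and z statistic.

    Assumes citation counts in each group are Poisson, with total
    events approximated as group size times mean citations.
    """
    events_with = n_with * mean_with
    events_without = n_without * mean_without

    rr = mean_with / mean_without
    log_rr = math.log(rr)

    # Standard error of log(RR) for two independent Poisson counts.
    se_log_rr = math.sqrt(1.0 / events_with + 1.0 / events_without)

    ci_low = math.exp(log_rr - z_crit * se_log_rr)
    ci_high = math.exp(log_rr + z_crit * se_log_rr)
    z = log_rr / se_log_rr
    return rr, (ci_low, ci_high), z


# Group sizes and means as reported on this page.
rr, (ci_low, ci_high), z = poisson_rate_ratio(
    892_754, 0.336, 435_346, 0.056
)
print(f"RR = {rr:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}], z = {z:.1f}")
```

Because the inputs are rounded means, the result should be read as a consistency check against the reported figures rather than an exact reproduction of the original analysis.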
