Justification
Geographic metadata enables spatial discovery and contextual understanding of datasets. DataCite lists GeoLocation as a Recommended property, supporting coordinates, place names, and bounding boxes. Google Dataset Search lists schema.org's spatialCoverage among its recommended dataset properties. Dublin Core's Coverage element encompasses spatial topics. Together, these three independent standards converge on the importance of geographic context for dataset findability.
Practical Guide
Add location data. Essential for spatial datasets, optional otherwise.
Geographic metadata (coordinates, place names, bounding boxes) helps users discover datasets through spatial searches. Our data shows a 0.34x citation ratio: geo-tagged datasets receive about a third as many citations per dataset as datasets without location data. This is not because location data hurts, but because geo-tagged datasets serve specialized communities (ecology, geosciences) with lower citation norms. If your data has a geographic dimension, tag it.
Why this signal matters despite the numbers
The citation ratio below 1 (0.34x) reflects community size, not data quality. Geo-tagged datasets serve niche domains with fewer citers. Geographic metadata enables spatial discovery that citation counts don't capture.
For Repositories
- Add optional GeoLocation fields (lat/lon, place name, bounding box)
- Map to DataCite #18 GeoLocation or schema.org spatialCoverage
- Auto-suggest geographic enrichment for ecology and environmental datasets
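For repositories that store DataCite-style records, the mapping between the two vocabularies can be sketched as below. This is a minimal illustration, not a reference implementation: the function name `datacite_to_spatial_coverage` is hypothetical, the input is assumed to follow the DataCite JSON field names (`geoLocationPlace`, `geoLocationPoint`, `geoLocationBox`), and the output assumes the schema.org convention of writing a `GeoShape` box as two latitude/longitude pairs separated by spaces.

```python
def datacite_to_spatial_coverage(geo_location: dict) -> dict:
    """Convert one DataCite geoLocation entry into a schema.org Place.

    Hypothetical helper: assumes DataCite JSON key names on input.
    """
    place = {"@type": "Place"}

    name = geo_location.get("geoLocationPlace")
    if name:
        place["name"] = name

    box = geo_location.get("geoLocationBox")
    point = geo_location.get("geoLocationPoint")

    if box:
        # Assumed box order: lower corner first, then upper corner,
        # each written as "latitude longitude".
        place["geo"] = {
            "@type": "GeoShape",
            "box": (
                f'{box["southBoundLatitude"]} {box["westBoundLongitude"]} '
                f'{box["northBoundLatitude"]} {box["eastBoundLongitude"]}'
            ),
        }
    elif point:
        place["geo"] = {
            "@type": "GeoCoordinates",
            "latitude": point["pointLatitude"],
            "longitude": point["pointLongitude"],
        }
    return place


# Usage: the returned Place would be placed under "spatialCoverage"
# in the dataset's JSON-LD description.
example = {
    "geoLocationPlace": "Lake Geneva",
    "geoLocationBox": {
        "westBoundLongitude": 6.1,
        "eastBoundLongitude": 6.9,
        "southBoundLatitude": 46.2,
        "northBoundLatitude": 46.5,
    },
}
spatial_coverage = datacite_to_spatial_coverage(example)
```

When both a box and a point are present, this sketch prefers the box; a repository could equally emit both, which is a design choice the source does not specify.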
For Depositors
- Include coordinates or place names if your data has a geographic component
- Use ISO 3166 country codes for standardized geographic tagging
- Add bounding boxes for region-level datasets
This signal has high value for spatial datasets, but its low prevalence (5.7%) means most general repositories can skip it.
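For depositors whose data does have a geographic component, the guidance above amounts to supplying a place name, a point, a bounding box, or some combination, with coordinates in valid ranges. The sketch below assembles one such entry using DataCite JSON key names; the helper `build_geolocation` and its `(west, south, east, north)` argument order are assumptions made for illustration.

```python
def build_geolocation(place=None, lat=None, lon=None, bbox=None) -> dict:
    """Build one DataCite-style geoLocation entry with basic range checks.

    Hypothetical helper. `bbox` is assumed to be (west, south, east, north)
    in decimal degrees.
    """
    def check_lat(value):
        if not -90 <= value <= 90:
            raise ValueError(f"latitude out of range: {value}")

    def check_lon(value):
        if not -180 <= value <= 180:
            raise ValueError(f"longitude out of range: {value}")

    entry = {}
    if place:
        entry["geoLocationPlace"] = place

    if lat is not None or lon is not None:
        if lat is None or lon is None:
            raise ValueError("a point needs both lat and lon")
        check_lat(lat)
        check_lon(lon)
        entry["geoLocationPoint"] = {"pointLatitude": lat, "pointLongitude": lon}

    if bbox is not None:
        west, south, east, north = bbox
        check_lon(west)
        check_lon(east)
        check_lat(south)
        check_lat(north)
        if south > north:
            raise ValueError("south bound must not exceed north bound")
        # west > east is allowed: the box may cross the antimeridian.
        entry["geoLocationBox"] = {
            "westBoundLongitude": west,
            "eastBoundLongitude": east,
            "southBoundLatitude": south,
            "northBoundLatitude": north,
        }

    if not entry:
        raise ValueError("provide a place name, a point, or a bounding box")
    return entry


# Usage: a region-level dataset described by name and bounding box.
geo = build_geolocation(place="Iceland", bbox=(-24.5, 63.3, -13.5, 66.6))
```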
Standards Sources
Convergence score: 4/4 independent sources
| Standard | Field / Property | Obligation Level |
|---|---|---|
| DataCite 4.6 | #18 GeoLocation | Recommended |
| schema.org | spatialCoverage | Recommended |
| Dublin Core | Coverage (spatial) | Core Element |
FAIR Principle Alignment
Primary mapping: Findable (F2)
- F2: Data are described with rich metadata
RDA FAIR Data Maturity Model Indicators:
- RDA-F2-01M: Rich metadata is provided to allow discovery
How This Signal Is Measured
Presence of geographic coordinates (lat/lon), place names, country codes, or bounding boxes in dataset metadata. Binary: present or absent.
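A binary check of this kind might look like the sketch below. The record layout is an assumption: `geoLocations` follows DataCite JSON naming, `spatialCoverage` follows schema.org, and the top-level `country` key is a hypothetical stand-in for wherever a repository stores country codes. The source does not state which fields the actual measurement inspects.

```python
# DataCite geoLocation sub-properties that count as geographic content.
GEO_KEYS = (
    "geoLocationPoint",
    "geoLocationBox",
    "geoLocationPlace",
    "geoLocationPolygon",
)


def has_geographic_metadata(record: dict) -> bool:
    """Return True if the record carries any non-empty geographic field."""
    # DataCite-style list of geoLocation entries (may be missing or None).
    for entry in record.get("geoLocations") or []:
        if any(entry.get(key) for key in GEO_KEYS):
            return True
    # schema.org-style spatial coverage.
    if record.get("spatialCoverage"):
        return True
    # Hypothetical top-level country code field.
    if record.get("country"):
        return True
    return False


# Usage: empty containers do not count as present.
print(has_geographic_metadata({"geoLocations": [{"geoLocationPlace": "Nairobi"}]}))
print(has_geographic_metadata({"geoLocations": [{}], "title": "Survey data"}))
```

Treating empty entries as absent avoids counting records where a repository template emitted the field without a value.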
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
Prevalence: 5.7% of Zenodo datasets
Citation Lift: 0.34x vs. datasets without
Data Source: Zenodo (CERN), 1,328,100 records analyzed
Interpretation: Datasets with geographic context tend to serve specialized communities (ecology, geosciences) with lower citation norms. The lift below 1 reflects domain specificity, not lower quality. Geographic metadata enables spatial discovery that citations don't capture.
Quantitative Evidence
Scoring Formula
geographic_metadata ∈ record → 4 pts
Contribution: 4 of 100 points · Stewardship bucket (0–20)
With Signal Present: 75,888 datasets (5.7%), μ = 0.085 citations/dataset
Without Signal: 1,252,212 datasets (94.3%), μ = 0.254 citations/dataset
Rate Ratio: 0.34 (95% CI: [0.33–0.34])
Significance: p < 0.001, z = -87.04
Method: Poisson rate ratio · Source: Zenodo (n = 1,328,100)
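The figures above are consistent with a Wald test on the log of a Poisson rate ratio, sketched below. The source names only "Poisson rate ratio", so the Wald construction is an assumption, as is reconstructing total citation counts by multiplying each group's rounded mean by its dataset count. The function name `poisson_rate_ratio` is hypothetical.

```python
import math


def poisson_rate_ratio(events_a, n_a, events_b, n_b, z_crit=1.959964):
    """Rate ratio of group A vs. group B with a Wald CI on the log scale.

    events_*: total event counts (here, citations); n_*: exposure (datasets).
    """
    rate_ratio = (events_a / n_a) / (events_b / n_b)
    log_rr = math.log(rate_ratio)
    # Standard error of log(RR) for two independent Poisson counts.
    se = math.sqrt(1.0 / events_a + 1.0 / events_b)
    z = log_rr / se
    ci = (math.exp(log_rr - z_crit * se), math.exp(log_rr + z_crit * se))
    return rate_ratio, ci, z


# Reported group sizes and (rounded) mean citations per dataset.
n_with, n_without = 75_888, 1_252_212
events_with = 0.085 * n_with          # approximate total citations, signal present
events_without = 0.254 * n_without    # approximate total citations, signal absent

rr, (ci_low, ci_high), z = poisson_rate_ratio(
    events_with, n_with, events_without, n_without
)
print(f"rate ratio = {rr:.3f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}], z = {z:.1f}")
```

Because the means are rounded to three decimals, the result should land close to the reported interval and z-score rather than match them digit for digit.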
S — Stewardship Bucket
All signals in this bucket: