Related Data
Linked datasets, software, or code repositories
Justification
Cross-dataset and data-software links enable integration and computational reproducibility. Four sources converge on this signal.
Practical Guide
Link related datasets and code to build a connected knowledge ecosystem.
Datasets in connected knowledge graphs receive about 16.5x more citations (RR = 16.46, p < 0.001). This is the same lift as for publication links, because Zenodo captures both kinds of link in the same field. The real value is ecosystem connectivity: linked datasets are discoverable through multiple entry points.
For Repositories
- Support linked dataset fields (IsPartOf, HasPart, IsSupplementTo)
- Enable linking to code repositories (GitHub, GitLab)
- Map to DataCite #12 RelatedIdentifier with rich relation types
For Depositors
- Link to related datasets, especially if your data is part of a collection
- Link to code repositories (GitHub) used to process or analyze the data
- Use HasPart/IsPartOf for multi-file datasets spanning multiple records
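As an illustration of the depositor guidance above, the following sketch builds a DataCite-style `relatedIdentifiers` list in Python. The DOIs and the GitHub URL are placeholders, the helper name `related_identifier` is hypothetical, and using `IsSupplementedBy` for the code link is one reasonable choice rather than a prescribed one. A given repository's deposit API may use different key names.

```python
import json


def related_identifier(identifier, identifier_type, relation_type, resource_type=None):
    """Build one entry shaped like DataCite property #12 (RelatedIdentifier)."""
    entry = {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": identifier_type,  # e.g. "DOI" or "URL"
        "relationType": relation_type,             # e.g. "IsPartOf", "HasPart"
    }
    if resource_type is not None:
        entry["resourceTypeGeneral"] = resource_type
    return entry


# All identifiers below are placeholders, not real records.
related_identifiers = [
    # This record is one part of a larger collection.
    related_identifier("10.5072/example-collection", "DOI", "IsPartOf", "Dataset"),
    # This record supplements another dataset.
    related_identifier("10.5072/example-companion", "DOI", "IsSupplementTo", "Dataset"),
    # Code used to process or analyze the data.
    related_identifier(
        "https://github.com/example-org/example-analysis", "URL", "IsSupplementedBy", "Software"
    ),
]

print(json.dumps({"relatedIdentifiers": related_identifiers}, indent=2))
```

Each entry names the target, the identifier scheme, and the direction of the relationship, which is what makes the reference "qualified" rather than a bare link.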
The citation lift is the same 16.5x as for E1 because both signals share one field in Zenodo. Four standards converge on this signal, which builds a knowledge ecosystem beyond publication links.
Standards Sources
Convergence score: 4/4 independent sources
| Standard | Field / Property | Obligation Level |
|---|---|---|
| DataCite 4.6 | #12 RelatedIdentifier (IsPartOf, HasPart) | Recommended |
| Dublin Core | Relation | Core Element |
| schema.org | hasPart / isPartOf | Recommended |
FAIR Principle Alignment
Primary mapping: Interoperable (I3)
- I3: (Meta)data include qualified references to other (meta)data
RDA FAIR Data Maturity Model Indicators:
- RDA-I3-01D: Data includes references to other data
How This Signal Is Measured
Presence of related dataset DOIs or code repository URLs. Binary: at least one linked.
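A minimal sketch of such a binary check is shown below. It assumes a record is a dict with a `related_identifiers` list whose entries carry an `identifier` string. The key names, the DOI pattern, and the list of code hosts are illustrative assumptions, not the scoring pipeline's actual implementation.

```python
import re
from urllib.parse import urlparse

# Matches a bare DOI ("10.1234/abc"), a "doi:" prefix, or a DOI resolver URL.
DOI_RE = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?10\.\d{4,9}/\S+$", re.IGNORECASE)

# Hosts treated as code repositories (assumption; extend as needed).
CODE_HOSTS = {"github.com", "gitlab.com"}


def is_doi(identifier: str) -> bool:
    """Return True if the string looks like a DOI or a DOI resolver URL."""
    return bool(DOI_RE.match(identifier.strip()))


def is_code_repo_url(identifier: str) -> bool:
    """Return True if the string is a URL on one of the known code hosts."""
    host = urlparse(identifier.strip()).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host in CODE_HOSTS


def has_related_data(record: dict) -> bool:
    """Binary signal: at least one related DOI or code repository URL."""
    links = record.get("related_identifiers") or []
    return any(
        is_doi(link.get("identifier", "")) or is_code_repo_url(link.get("identifier", ""))
        for link in links
    )


example = {"related_identifiers": [{"identifier": "https://github.com/example-org/example-repo"}]}
print(has_related_data(example))
```

A check of this kind cannot tell a dataset DOI from a publication DOI, because both appear in the same field.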
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
Prevalence: 54.3% of Zenodo datasets
Citation lift: 16.5x vs. datasets without the signal
Data source: Zenodo (CERN), 1,328,100 records analyzed
Interpretation: The prevalence is the same as for E1 because Zenodo's related_identifiers field captures both publication and data links. Datasets in connected knowledge graphs receive dramatically more citations, confirming that ecosystem connectivity drives reuse.
Quantitative Evidence
Scoring Formula
related_dataset_doi ∈ record → 4 pts
Contribution: 4 of 100 points · Engagement bucket (0–20)
| Group | Datasets | Share | Mean citations per dataset (μ) |
|---|---|---|---|
| With signal present | 720,512 | 54.3% | 0.428 |
| Without signal | 607,588 | 45.7% | 0.026 |

| Statistic | Value |
|---|---|
| Rate ratio | 16.46 |
| 95% CI | [16.20–16.73] |
| P-value | < 0.001 |
| z | 343.37 |

Method: Poisson rate ratio · Source: Zenodo (n = 1,328,100)
Note: Same prevalence as E1: Zenodo’s related_identifiers field captures both publication and data links.
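The reported rate ratio, confidence interval, and z-score can be approximately reproduced from the group sizes and mean citation counts above. The sketch below assumes a Wald interval on the log scale, which is a common way to compute a Poisson rate ratio, though the source does not state the exact method. Total citation counts are not published here, so they are back-calculated from the rounded means and are only approximate.

```python
import math


def poisson_rate_ratio(events_a, exposure_a, events_b, exposure_b, z_crit=1.96):
    """Rate ratio of group A vs. group B with a log-scale Wald interval."""
    rate_ratio = (events_a / exposure_a) / (events_b / exposure_b)
    log_rr = math.log(rate_ratio)
    # Standard error of log(RR) for two independent Poisson counts.
    se = math.sqrt(1.0 / events_a + 1.0 / events_b)
    ci = (math.exp(log_rr - z_crit * se), math.exp(log_rr + z_crit * se))
    z = log_rr / se
    return rate_ratio, ci, z


# Group sizes and means from the figures above; citation totals are estimates.
n_with, mean_with = 720_512, 0.428
n_without, mean_without = 607_588, 0.026
citations_with = n_with * mean_with            # approximate, from a rounded mean
citations_without = n_without * mean_without   # approximate, from a rounded mean

rr, (ci_low, ci_high), z = poisson_rate_ratio(
    citations_with, n_with, citations_without, n_without
)
print(f"RR = {rr:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}], z = {z:.2f}")
```

Under these assumptions the results land close to the reported values. Small differences would be expected from rounding in the published means.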