Contributor Roles
Editors, curators, data collectors beyond the creator
Justification
Beyond the primary creator, contributors such as curators, data collectors, and supervisors provide critical provenance context. DataCite recommends the Contributor property with a rich contributorType vocabulary (DataCurator, DataCollector, Editor, etc.). Dublin Core includes Contributor as one of its 15 core elements. Contributor roles also address RDA-R1.2 provenance requirements.
Practical Guide
List all contributors to show team effort and provenance.
Listing contributors beyond the primary creator (curators, data collectors, supervisors) provides critical provenance. Datasets with this signal receive about 0.07 times as many citations per dataset as those without, which reflects that multi-contributor datasets tend to be complex, specialized resources. SHARE values this signal for provenance completeness: knowing who curated, collected, and quality-checked the data builds trust.
Why this signal matters despite the numbers
The 0.07x citation ratio reflects complexity, not poor documentation. Multi-contributor datasets are often large, specialized resources that are harder to reuse but represent higher-quality data stewardship.
For Repositories
- Support contributor fields with role types (DataCurator, DataCollector, Editor)
- Map to DataCite #7 Contributor with contributorType vocabulary
- Prompt depositors to credit all team members
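The mapping step for repositories could be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `name` and `contributorType` keys follow the DataCite JSON representation, the vocabulary shown is only a subset of the contributorType list, and `LOCAL_ROLE_MAP` and `to_datacite_contributor` are hypothetical names standing in for whatever role labels a repository already uses.

```python
# Subset of the DataCite contributorType controlled vocabulary.
DATACITE_CONTRIBUTOR_TYPES = {
    "ContactPerson", "DataCollector", "DataCurator", "DataManager",
    "Editor", "ProjectLeader", "ProjectMember", "Researcher",
    "Supervisor", "Other",
}

# Hypothetical mapping from a repository's own role labels (assumption).
LOCAL_ROLE_MAP = {
    "curator": "DataCurator",
    "data collector": "DataCollector",
    "editor": "Editor",
    "supervisor": "Supervisor",
}


def to_datacite_contributor(name: str, local_role: str) -> dict:
    """Build one contributor entry; unmapped roles fall back to 'Other'."""
    contributor_type = LOCAL_ROLE_MAP.get(local_role.strip().lower(), "Other")
    if contributor_type not in DATACITE_CONTRIBUTOR_TYPES:
        raise ValueError(f"Unknown contributorType: {contributor_type}")
    return {"name": name, "contributorType": contributor_type}


print(to_datacite_contributor("Doe, Jane", "Curator"))
```

Falling back to `Other` keeps the contributor credited even when the local role has no obvious DataCite equivalent; a repository may prefer to prompt the depositor instead.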
For Depositors
- Credit curators, data collectors, and supervisors — not just PI names
- Specify roles (DataCurator, DataCollector, Editor) when the repository supports it
- Include at least one non-creator contributor if applicable
Contributor roles add provenance depth. Only 8.1% of datasets include them, so most depositors skip this step, and including them sets a well-documented dataset apart.
Standards Sources
Convergence score: 3/4 independent sources
| Standard | Field / Property | Obligation Level |
|---|---|---|
| DataCite 4.6 | #7 Contributor | Recommended |
| Dublin Core | Contributor | Core Element |
FAIR Principle Alignment
Primary mapping: Findable (F2), Reusable (R1.2)
- F2: Data are described with rich metadata
- R1.2: (Meta)data are associated with detailed provenance
RDA FAIR Data Maturity Model Indicators:
- RDA-R1.2-01M: Metadata includes provenance information according to community-specific standards
How This Signal Is Measured
Presence of contributor entries beyond the primary creator, ideally with role types specified. Binary: at least one non-creator contributor listed.
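The binary check described here might look like the sketch below. It assumes a record shaped like DataCite JSON, with `creators` and `contributors` lists whose entries carry a `name` key. How the actual measurement treats a contributor who is also listed as a creator is not stated in this section, so the name comparison is an assumption, and `has_non_creator_contributor` is a hypothetical helper name.

```python
def has_non_creator_contributor(record: dict) -> bool:
    """Return True if at least one contributor is not also listed as a creator."""
    creator_names = {
        creator.get("name", "").strip().lower()
        for creator in record.get("creators", [])
    }
    for contributor in record.get("contributors", []):
        name = contributor.get("name", "").strip().lower()
        # Assumption: a contributor counts only if the name is non-empty
        # and does not match any creator name.
        if name and name not in creator_names:
            return True
    return False


example = {
    "creators": [{"name": "Doe, Jane"}],
    "contributors": [{"name": "Roe, Sam", "contributorType": "DataCurator"}],
}
print(has_non_creator_contributor(example))
```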
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
Prevalence: 8.1% of Zenodo datasets
Citation Lift: 0.07x vs. datasets without
Data Source: Zenodo (CERN), 1,328,100 records analyzed
Interpretation: Complex datasets requiring multiple contributors (curators, data collectors) may be harder to reuse but represent higher-quality data stewardship. The lift below 1x reflects reuse complexity, not poor documentation quality.
Quantitative Evidence
Scoring Formula
non_creator_contributors ≥ 1 → 4 pts
Contribution: 4 of 100 points · Stewardship bucket (0–20)
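Read as code, the scoring rule is a simple threshold. The sketch below restates it directly; the function and constant names are hypothetical, and the 4-point value is taken from the formula above.

```python
CONTRIBUTOR_ROLE_POINTS = 4  # contribution of this signal out of 100 points


def contributor_role_score(non_creator_contributors: int) -> int:
    """Award the full 4 points when at least one non-creator contributor is listed."""
    return CONTRIBUTOR_ROLE_POINTS if non_creator_contributors >= 1 else 0


print(contributor_role_score(0), contributor_role_score(3))
```

The score does not grow with additional contributors, which matches the binary definition of the signal.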
| Group | Datasets | Share | Mean citations/dataset |
|---|---|---|---|
| With signal present | 107,672 | 8.1% | 0.018 |
| Without signal | 1,220,428 | 91.9% | 0.264 |

Rate ratio: 0.07 (95% CI: 0.07–0.07)
P-value: < 0.001 (z = -117.88)
Significance
Method: Poisson rate ratio · Source: Zenodo (n = 1,328,100)
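The reported figures can be approximately reproduced with a Wald test on the log rate ratio. The sketch below is an illustration under two assumptions: the source names only a "Poisson rate ratio", so the log-scale Wald interval is a guess at the exact procedure, and the citation counts are reconstructed from the rounded per-dataset means, so they are approximate.

```python
import math


def poisson_rate_ratio(events_a, n_a, events_b, n_b, z_crit=1.96):
    """Rate ratio of group A vs. group B with a Wald CI and z statistic on the log scale."""
    rate_ratio = (events_a / n_a) / (events_b / n_b)
    log_rr = math.log(rate_ratio)
    # Standard error of the log rate ratio for two independent Poisson counts.
    se = math.sqrt(1 / events_a + 1 / events_b)
    ci = (math.exp(log_rr - z_crit * se), math.exp(log_rr + z_crit * se))
    return rate_ratio, ci, log_rr / se


n_with, n_without = 107_672, 1_220_428
# Assumption: total citations approximated as (rounded mean) x (number of datasets).
events_with = 0.018 * n_with
events_without = 0.264 * n_without

rr, (ci_low, ci_high), z = poisson_rate_ratio(
    events_with, n_with, events_without, n_without
)
print(f"rate ratio = {rr:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}], z = {z:.2f}")
```

With these reconstructed counts the ratio and both interval bounds round to 0.07, and the z statistic lands close to the reported -117.88. The interval is narrow because the sample is very large, not because the ratio is known exactly.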
S — Stewardship Bucket
All signals in this bucket: