Ethical Transparency
IRB approval, consent statements, ethics documentation
Justification
This signal is policy-derived rather than standards-derived. While not present in traditional metadata standards, ethical transparency is required by the NIH Data Management and Sharing Policy (2023), the Common Rule for human subjects research, and GDPR for sensitive data. IRB approval numbers and consent statements provide critical provenance for clinical and biobank datasets.
Practical Guide
Add IRB and ethics information for human subjects data. This is a policy requirement.
Ethical transparency (IRB approval numbers, consent statements) is rare in general repositories (0.2% prevalence) but mandatory for clinical and biobank datasets under NIH policy, the Common Rule, and GDPR. The 0.39x citation ratio reflects the small, specialized audience for human subjects data. This signal is policy-driven, not citation-driven.
Why this signal matters despite the numbers
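Datasets carrying this signal are cited at 0.39 times the rate of datasets without it. That ratio reflects where the signal occurs: ethical transparency is concentrated in clinical datasets, which have smaller, specialized audiences. The signal is included because it is policy-mandated: NIH, the Common Rule, and GDPR all require it for human subjects data.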
For Repositories
- Add optional fields for IRB approval number and consent statement
- Make ethics documentation required for datasets tagged as human subjects
- Link to NIH DMS Policy guidance for depositors
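The conditional requirement in the list above could be enforced at deposit time with a check along the following lines. This is a minimal sketch, not any repository's actual validation logic: the field names `human_subjects`, `irb_approval_number`, and `consent_statement` are hypothetical, and treating both ethics fields as mandatory is an assumption made for illustration.

```python
from typing import Any

# Hypothetical field names; adapt them to the repository's own metadata schema.
ETHICS_FIELDS = ("irb_approval_number", "consent_statement")


def validate_ethics_fields(record: dict[str, Any]) -> list[str]:
    """Return validation errors; an empty list means the record passes."""
    # Ethics fields stay optional unless the dataset is tagged as human subjects.
    if not record.get("human_subjects", False):
        return []
    errors = []
    for field in ETHICS_FIELDS:
        value = record.get(field)
        # Assumption: a missing, non-string, or blank value counts as absent.
        if not isinstance(value, str) or not value.strip():
            errors.append(f"'{field}' is required for human subjects datasets")
    return errors
```

A record without the human subjects tag passes unchanged, so the fields remain optional for physical science or software deposits.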
For Depositors
- Include IRB approval number if your study involved human subjects
- Add a consent statement describing data use permissions
- Reference your institution's IRB protocol number for verifiability
Required for clinical/biobank datasets by law and NIH policy. Not applicable to most physical science or software datasets.
Standards Sources
Convergence score: 1/4 independent sources
| Standard | Field / Property | Obligation Level |
|---|---|---|
| NIH DMS Policy (2023) | Ethical oversight documentation | Required (policy) |
| Common Rule (45 CFR 46) | IRB approval | Required (regulation) |
| GDPR Article 9 | Consent for sensitive data | Required (regulation) |
FAIR Principle Alignment
Primary mapping: Reusable (R1.2) [Policy-driven]
- R1.2: (Meta)data are associated with detailed provenance
How This Signal Is Measured
Presence of IRB/ethics approval number, consent statement, or ethics documentation reference. Binary: present or absent.
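A binary check of this kind might look like the sketch below. The source does not specify how presence is detected, so both the structured field names and the free-text patterns here are assumptions chosen for illustration, not the detector actually used.

```python
import re
from typing import Any

# Hypothetical structured fields that would count as ethics documentation.
STRUCTURED_FIELDS = ("irb_approval_number", "consent_statement", "ethics_documentation")

# Illustrative patterns only; the source does not say how free text is matched.
ETHICS_PATTERN = re.compile(
    r"\b(IRB|institutional review board|ethics (committee|approval)|informed consent)\b",
    re.IGNORECASE,
)


def has_ethics_signal(record: dict[str, Any]) -> bool:
    """Binary signal: True if any ethics documentation is present, else False."""
    for field in STRUCTURED_FIELDS:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            return True
    # Fall back to scanning the free-text description, if there is one.
    description = record.get("description")
    if not isinstance(description, str):
        return False
    return bool(ETHICS_PATTERN.search(description))
```

Keyword matching on free text can produce false positives (for example, a dataset that merely discusses IRB procedures), so a production detector would likely need stricter rules.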
Empirical Evidence (Zenodo, n=1.3M)
Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.
Prevalence: 0.2% of Zenodo datasets
Citation Lift: 0.4x vs. datasets without
Data Source: Zenodo (CERN), 1,328,100 records analyzed
Interpretation: Very rare in general-purpose repositories. Ethical transparency is primarily relevant for clinical and biobank datasets (validated via MGB Biobank partnership with 165K patients).
Quantitative Evidence
Scoring Formula
ethics_documentation ∈ record → 4 pts
Contribution: 4 of 100 points · Stewardship bucket (0–20)
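Read literally, the scoring formula is a membership test that awards a fixed number of points. A minimal sketch follows, assuming the record is a dictionary and that a blank value counts as absent (the source does not say how empty values are treated).

```python
ETHICS_POINTS = 4  # contribution stated in the scoring formula


def ethics_points(record: dict) -> int:
    """Return 4 if the record carries ethics documentation, otherwise 0."""
    value = record.get("ethics_documentation")
    # Assumption: a missing, non-string, or whitespace-only value earns no points.
    present = isinstance(value, str) and bool(value.strip())
    return ETHICS_POINTS if present else 0
```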
| Group | Datasets | Share of records | Mean citations per dataset (μ) |
|---|---|---|---|
| With signal present | 2,583 | 0.2% | 0.096 |
| Without signal | 1,325,517 | 99.8% | 0.244 |

Rate Ratio: 0.39 (95% CI: 0.35–0.45)
P-value: < 0.001 (z = -14.68)
Significance
Method: Poisson rate ratio · Source: Zenodo (n = 1,328,100)
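The reported figures can be checked with a short calculation. The sketch below assumes a Wald interval on the log rate ratio, which is one common way to compute a Poisson rate ratio test; the source names only "Poisson rate ratio", so the exact procedure is an assumption. Citation totals are reconstructed from the rounded means above and are therefore approximate.

```python
import math


def poisson_rate_ratio(events_a, n_a, events_b, n_b, z_crit=1.96):
    """Rate ratio of group A to group B with a Wald CI on the log scale."""
    rate_ratio = (events_a / n_a) / (events_b / n_b)
    log_rr = math.log(rate_ratio)
    # Standard error of log(RR) for two independent Poisson counts.
    se = math.sqrt(1 / events_a + 1 / events_b)
    z = log_rr / se
    lower = math.exp(log_rr - z_crit * se)
    upper = math.exp(log_rr + z_crit * se)
    return rate_ratio, lower, upper, z


# Dataset counts and mean citations per dataset as reported above.
with_n, without_n = 2583, 1325517
# Assumption: total citations = rounded mean * count, so these are approximate.
with_citations = 0.096 * with_n
without_citations = 0.244 * without_n

rr, lower, upper, z = poisson_rate_ratio(
    with_citations, with_n, without_citations, without_n
)
print(f"RR = {rr:.2f}, 95% CI [{lower:.2f}, {upper:.2f}], z = {z:.2f}")
```

Under these assumptions the calculation agrees with the reported rate ratio, confidence interval, and z statistic to two decimal places. The interval is driven almost entirely by the small number of citations in the with-signal group.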
Note: Policy-driven signal. Low prevalence (0.2%) reflects general-purpose repository; higher in clinical/biobank contexts (validated via MGB Biobank, 165K patients).
S — Stewardship Bucket
All signals in this bucket: