SHARE Score

A5

Format Openness

Open formats (CSV, JSON, HDF5) vs. proprietary formats

Access (A)
Interoperable (I1) — placed in Access for practical reasons

Justification

Open formats determine whether users can access data without proprietary software. RDA-I1-01D (Important) requires data in a standardised format, and both DataCite and Dublin Core include a Format property. Note: this signal maps to FAIR I1 (Interoperability) but is placed in Access because format determines practical accessibility.

Practical Guide

should-have

Use open formats (CSV, JSON, HDF5) to ensure long-term accessibility.

Open formats determine whether users can access data without proprietary software. We could not measure format impact from Zenodo's record-level metadata (format is stored at the file level), but domain repositories suggest its value: OpenNeuro requires NIfTI and SRA requires FASTQ, both open formats, and these repositories show consistently higher SHARE scores. Open formats are a practical accessibility requirement.

Why this signal matters despite the numbers

No citation data is available because Zenodo stores format information at the file level, not in record-level metadata. Domain repositories that enforce open formats (OpenNeuro, SRA, GEO) show consistently higher SHARE scores.

For Repositories

  • Accept and recommend open formats (CSV, JSON, HDF5, NIfTI, FASTQ)
  • Flag proprietary formats with a warning during upload
  • Map the format to DataCite property #14 (Format) and the Dublin Core Format element
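As an illustration of the upload check described above, here is a minimal Python sketch that classifies uploaded files by extension, collects a format label for the Format field, and warns on proprietary formats. The extension lists, labels, and the `check_upload` function are hypothetical; a real repository would use its own format registry and might record IANA media types instead of plain labels.

```python
from pathlib import PurePath

# Hypothetical extension-to-label maps based on the formats named in this guide.
OPEN_FORMATS = {".csv": "CSV", ".json": "JSON", ".h5": "HDF5", ".hdf5": "HDF5",
                ".nii": "NIfTI", ".fastq": "FASTQ"}
PROPRIETARY_FORMATS = {".xlsx": "Excel", ".xls": "Excel", ".mat": "MATLAB"}


def _extension(name):
    """Lowercase extension, looking past a trailing ".gz" (e.g. "scan.nii.gz")."""
    suffixes = [s.lower() for s in PurePath(name).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes.pop()
    return suffixes[-1] if suffixes else ""


def check_upload(filenames):
    """Return (sorted format labels, warnings) for a batch of uploaded files."""
    labels, warnings = set(), []
    for name in filenames:
        ext = _extension(name)
        if ext in OPEN_FORMATS:
            labels.add(OPEN_FORMATS[ext])
        elif ext in PROPRIETARY_FORMATS:
            label = PROPRIETARY_FORMATS[ext]
            labels.add(label)
            warnings.append(f"{name}: {label} is proprietary; consider an open alternative")
        else:
            warnings.append(f"{name}: unrecognised format")
    return sorted(labels), warnings
```

For example, `check_upload(["table.csv", "results.xlsx", "scan.nii.gz"])` returns the labels `['CSV', 'Excel', 'NIfTI']` and a single warning for `results.xlsx`.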

For Depositors

  • Convert proprietary formats to open alternatives before depositing
  • Prefer CSV over Excel, JSON over proprietary schemas, and HDF5 over MATLAB .mat files
  • Include format documentation (codebook, data dictionary) with your files
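A minimal sketch of the last point, using only the Python standard library: write the data as CSV and place a JSON data dictionary alongside it. The file names `data.csv` and `data_dictionary.json`, the codebook layout, and `write_with_codebook` are hypothetical conventions, not a required SHARE format.

```python
import csv
import json
from pathlib import Path


def write_with_codebook(rows, codebook, out_dir):
    """Write rows to data.csv and column documentation to data_dictionary.json.

    rows: list of dicts keyed by column name.
    codebook: dict mapping each column name to its documentation (description, unit, ...).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    columns = list(codebook)  # the codebook's key order defines the CSV header
    csv_path = out / "data.csv"
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)  # raises ValueError if a row has an undocumented column
    dict_path = out / "data_dictionary.json"
    with open(dict_path, "w", encoding="utf-8") as fh:
        json.dump({"file": csv_path.name, "columns": codebook}, fh, indent=2)
    return csv_path, dict_path
```

Because `csv.DictWriter` uses the codebook's keys as the header, a row containing a column the codebook does not document raises `ValueError`, which keeps the data and its documentation in step.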

Three standards converge (DataCite, Dublin Core, RDA). The signal is not yet measured, but domain repositories support it.

Standards Sources

Convergence score: 3/4 independent sources (well justified)

Standard       Field / Property   Obligation Level
DataCite 4.6   #14 Format         Optional
Dublin Core    Format             Core Element
RDA FAIR       RDA-I1-01D         Important

FAIR Principle Alignment

Primary mapping: Interoperable (I1) — placed in Access for practical reasons

  • I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation

RDA FAIR Data Maturity Model Indicators:

  • RDA-I1-01D: Data uses knowledge representation expressed in standardised format

How This Signal Is Measured

Each file's format is classified against a list of open formats. The signal is binary: it is met when at least one open format is present.
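The measurement description and the scoring formula are not worded identically: "at least one open format present" is satisfied by a deposit mixing CSV and Excel files, whereas `file_formats ⊆ open_formats` reads as requiring every format to be open. This minimal Python sketch implements both readings so the difference is explicit. The extension list, the compression handling, and treating a deposit with no files as not satisfying the subset rule are assumptions.

```python
from pathlib import PurePath

# Hypothetical open-format list keyed by lowercase extension (not an official SHARE list).
OPEN_EXTENSIONS = {".csv", ".json", ".h5", ".hdf5", ".nii", ".fastq"}
# Compression suffixes to skip so that "reads.fastq.gz" is classified as FASTQ.
COMPRESSION_SUFFIXES = {".gz", ".bz2"}


def extension(filename):
    """Return the lowercase format extension, ignoring trailing compression suffixes."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    while suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    return suffixes[-1] if suffixes else ""


def at_least_one_open(filenames):
    """Measurement wording: true if any file has an open format."""
    return any(extension(name) in OPEN_EXTENSIONS for name in filenames)


def all_open(filenames):
    """Formula wording (file_formats ⊆ open_formats); an empty deposit returns False."""
    formats = {extension(name) for name in filenames}
    return bool(formats) and formats <= OPEN_EXTENSIONS
```

For a deposit of `["table.csv", "analysis.xlsx"]`, `at_least_one_open` returns `True` while `all_open` returns `False`.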

Empirical Evidence (Zenodo, n=1.3M)

Per-signal statistics use Zenodo as the primary validation source because it is the largest general-purpose repository with structured DataCite metadata, natural variance across all 25 signals, and available citation/usage data. Domain-specific repositories exhibit ceiling effects or restricted variance that preclude per-signal discrimination. Cross-repository validation is reported separately.

Data Source

Zenodo (CERN)

1,328,100 records analyzed

Interpretation: Not directly measurable in the Zenodo metadata schema (format is stored at the file level, not the record level).

Cross-repository note: Format openness is best measured in domain repositories. OpenNeuro requires NIfTI (open) and SRA requires FASTQ (open), while Dryad tracks file formats explicitly.

Quantitative Evidence

Scoring Formula

file_formats ⊆ open_formats → 4 pts

Contribution: 4 of 100 points · Access bucket (0–20)
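A minimal Python sketch of the scoring formula, assuming format labels have already been normalized (for example by an extension classifier) so they can be compared as sets. `format_openness_points` is a hypothetical name, and scoring an empty format set as 0 is an assumption, since the subset test alone is vacuously true for an empty set.

```python
# Points awarded by this signal, taken from the scoring formula.
A5_POINTS = 4


def format_openness_points(file_formats, open_formats):
    """Return 4 if every file format is in the open set (subset test), otherwise 0.

    An empty set of file formats scores 0 (an assumption; the bare subset
    test would be vacuously true for a deposit with no files).
    """
    formats = set(file_formats)
    return A5_POINTS if (formats and formats <= set(open_formats)) else 0


open_set = {"CSV", "JSON", "HDF5"}
print(format_openness_points({"CSV", "JSON"}, open_set))  # 4
print(format_openness_points({"CSV", "XLSX"}, open_set))  # 0
```

In the worked example, a deposit of CSV and JSON files earns 4 of the Access bucket's 20 points, while adding a single XLSX file drops the signal to 0.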

Data Gap

Empirical validation not yet available for this signal

File format information is stored at the individual file level in Zenodo, not in record-level metadata. It is computable from file-extension analysis of the 1.3M records but has not yet been processed. Domain repositories enforce open formats by design: OpenNeuro (NIfTI), SRA (FASTQ), GEO (CEL/TXT).

Method: Not yet computed · Source: Zenodo (format at file level)
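The pending file-extension analysis could take roughly this shape: stream records one at a time and count how many contain at least one open format. This is a sketch under stated assumptions: each record is modeled as a dict with a `files` list of file names, which is a simplified stand-in rather than Zenodo's actual API schema, and the extension list is hypothetical.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical open-format list keyed by lowercase extension.
OPEN_EXTENSIONS = {".csv", ".json", ".h5", ".hdf5", ".nii", ".fastq"}


def _extension(name):
    """Lowercase extension, looking past a trailing ".gz"."""
    suffixes = [s.lower() for s in PurePath(name).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes.pop()
    return suffixes[-1] if suffixes else ""


def summarise(records):
    """Count records as "open", "not_open", or "no_files".

    Each record is assumed to be a dict with a "files" list of file names
    (a simplified stand-in, not Zenodo's actual API schema). A record counts
    as "open" if at least one file has an open extension.
    """
    counts = Counter()
    for record in records:
        names = record.get("files", [])
        if not names:
            counts["no_files"] += 1
        elif any(_extension(n) in OPEN_EXTENSIONS for n in names):
            counts["open"] += 1
        else:
            counts["not_open"] += 1
    return counts
```

Because `summarise` accepts any iterable, a generator reading records from disk or an export file can feed it without holding all 1.3M records in memory.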
