A researcher-level metric for data sharing practices, directly analogous to the h-index for publications.
A researcher has an S-Index of n if n of their datasets have SHARE scores of at least n.
Just like the h-index rewards both quantity and quality of publications, the S-Index rewards researchers who consistently share well-documented datasets.
Example Calculation
A researcher has 8 datasets with SHARE scores, sorted descending: [78, 72, 68, 61, 55, 45, 42, 31]

Check each position against its score:

- Position 1: 78 ≥ 1? Yes ✓
- Position 2: 72 ≥ 2? Yes ✓
- ...
- Position 8: 31 ≥ 8? Yes ✓

S-Index = 8 (limited by the number of datasets)

Citation: S-Index(v1.0) = 8 (as of 2026-01-24, n=8 datasets)
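The calculation above is mechanical enough to automate. A minimal sketch in Python (the function name `s_index` and the example scores are illustrative, not part of any published tooling):

```python
def s_index(share_scores):
    """Largest n such that n datasets each have a SHARE score >= n."""
    # Sort scores descending so position i holds the i-th best dataset.
    ranked = sorted(share_scores, reverse=True)
    result = 0
    for position, score in enumerate(ranked, start=1):
        if score >= position:
            result = position  # this position still satisfies score >= position
        else:
            break  # scores only decrease, so no later position can qualify
    return result

# The worked example from above:
scores = [78, 72, 68, 61, 55, 45, 42, 31]
print(s_index(scores))  # → 8 (limited by the number of datasets)
```

Note that the loop can stop at the first failing position: because scores are sorted descending while the position threshold keeps rising, no later dataset can qualify.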
Rewards Quantity + Quality
Achieving a high S-Index requires multiple high-scoring datasets. One perfect dataset isn't enough.
Robust to Outliers
Low-scoring datasets don't drag down a high S-Index, so early-career experiments don't penalize you.
Familiar Model
Works like the h-index researchers already understand. Easy adoption and intuitive comparison.
Gaming Resistant
The index can't be inflated by adding many low-quality datasets, which encourages genuine improvement.
| S-Index | Rating |
|---|---|
| 50+ | Exceptional - Prolific with consistently strong practices |
| 30-49 | Strong - Substantial portfolio of well-documented datasets |
| 15-29 | Developing - Growing portfolio, typical for mid-career |
| 5-14 | Early - Building portfolio, new to open data sharing |
| 1-4 | Beginning - Just starting to share data openly |
| Property | h-Index (Publications) | S-Index (Data Sharing) |
|---|---|---|
| Definition | h papers with ≥ h citations each | n datasets with SHARE score ≥ n each |
| What it measures | Publication impact | Data sharing quality and consistency |
| Scale | Unbounded (typically 0-100+) | 0-100 (bounded by SHARE score max) |
| Cross-field comparison | Difficult (citation norms vary) | Fair (universal signal vocabulary) |
Note: The S-Index measures data sharing practices, not research quality. A high S-Index indicates excellent metadata practices across many datasets. It should be used alongside, not instead of, traditional research quality indicators.