Datasets are ranked primarily on two measures: SHARE Score (metadata quality) and Reuse Signals (impact indicators), with a small adjustment for freshness. This approach ensures that highly discoverable, well-documented datasets with demonstrated utility rise to the top.
Ranking Algorithm Components:
- SHARE Score (60% weight): The 0-100 score measuring metadata completeness across the 25 universal signals (S, H, A, E, R buckets)
- Reuse Indicator (30% weight): Log-scaled composite of views, downloads, and citations drawn from the data sources listed below
- Freshness Factor (10% weight): Slight boost for recently published datasets to surface new high-quality contributions
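One way the three weighted components could be combined is sketched below. Only the 60/30/10 weights and the log scaling come from the list above. The equal averaging of the three reuse counts, the `cap` value, the one-year half-life for freshness, and every function name are assumptions made for illustration, not the production formula.

```python
import math

# Weights taken from the component list above.
W_SHARE, W_REUSE, W_FRESH = 0.60, 0.30, 0.10


def reuse_indicator(views, downloads, citations, cap=1_000_000):
    """Log-scaled composite of the three counts on a 0-100 scale.

    Assumption: counts are clamped to [0, cap], log1p-scaled, averaged
    with equal weight, and divided by log1p(cap).
    """
    logs = [math.log1p(min(max(n, 0), cap)) for n in (views, downloads, citations)]
    return 100.0 * sum(logs) / (len(logs) * math.log1p(cap))


def freshness_factor(age_days, half_life_days=365.0):
    """Assumed exponential decay: 100 at publication, 50 after one half-life."""
    return 100.0 * 0.5 ** (max(age_days, 0.0) / half_life_days)


def ranking_score(share_score, views, downloads, citations, age_days):
    """Weighted sum of the three components, each on a 0-100 scale."""
    return (
        W_SHARE * share_score
        + W_REUSE * reuse_indicator(views, downloads, citations)
        + W_FRESH * freshness_factor(age_days)
    )
```

Because every component is kept on the same 0-100 scale, the weights can be read directly as each component's maximum contribution to the final score: at most 60 points from SHARE Score, 30 from reuse, and 10 from freshness.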
Data Sources:
- Zenodo API (zenodo.org) - Updated daily
- Dryad API (datadryad.org) - Updated daily
- Figshare API (figshare.com) - Updated daily
- DataCite Commons (commons.datacite.org) - Source of citation data
Normalization: Reuse metrics are normalized within each repository to account for platform-specific traffic patterns. Because the platforms differ in overall traffic, 1,000 downloads on Dryad can represent a different relative performance than 1,000 downloads on Zenodo.
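The idea of per-repository normalization can be illustrated with a percentile rank computed only against datasets from the same repository. The text above does not say which normalization method is used, so the percentile approach, the function name, and the record layout here are all assumptions.

```python
from bisect import bisect_right
from collections import defaultdict


def normalize_within_repository(records):
    """Return a 0-100 percentile rank per dataset, relative to its own repository.

    `records` is an iterable of (repository, dataset_id, downloads) tuples;
    this layout is hypothetical and chosen only for the example.
    """
    records = list(records)

    # Collect and sort the download counts of each repository separately.
    by_repo = defaultdict(list)
    for repo, _dataset_id, downloads in records:
        by_repo[repo].append(downloads)
    for values in by_repo.values():
        values.sort()

    scores = {}
    for repo, dataset_id, downloads in records:
        values = by_repo[repo]
        # Share of same-repository datasets with downloads <= this dataset's.
        scores[(repo, dataset_id)] = 100.0 * bisect_right(values, downloads) / len(values)
    return scores
```

With made-up counts in which 1,000 downloads is the highest figure among the Dryad records but the lowest among the Zenodo records, the same raw number would map to the top percentile in one repository and the bottom third in the other, which is the effect the normalization is meant to capture.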
Algorithm Updates: The ranking formula is continuously refined based on community feedback. Major changes are documented in the changelog.