Published April 22, 2026 | Version v1
Dataset Open

Crossref funder names to ROR IDs

Authors/Creators

  • 1. Crossref

Description

This dataset contains funder names from the metadata of scholarly works, matched with ROR IDs for the funding organizations. It is a sample of funder name strings from the Crossref metadata, manually labeled with the correct ROR ID(s). Each funder name can be matched to zero, one, or multiple ROR IDs.

The funder names were extracted from a July 2025 snapshot of the Crossref works data. There are 25,698,253 funder entries with names across 12,433,534 different works. (A single work can have more than one funder entry.) There are 3,004,870 unique names among these. The data is skewed: some names occur much more often than others. This dataset comprises a weighted sample of funder names, with each weight representing the count of entries with that name that do not already have a funder ID asserted in the Crossref metadata.

A human evaluator manually matched all funder names to ROR IDs using ROR's online search, with ROR data up to April 2026. Active and inactive ROR records were considered, but withdrawn ROR records were not.

Some funder names also have "alternate" matches. These handle cases where a funder string is ambiguous even for a human evaluator, or where a matching strategy might identify a parent organization rather than the direct target, which may be acceptable depending on the use case. For funder names with alternate matches, the dataset includes mappings between those alternate IDs (or "no_match") and the primary matched ID. This enables "relaxed" evaluation of matching methods that does not penalize these ambiguous cases. See the documentation for the crossref-matcher library for more information.

The dataset contains:

  • 3,505 unique funder name strings
    • Total weight: 2,138,538
  • 1,895 (54%) of the names have at least one ROR ID match
  • 151 (4.3%) of the names have at least one “alternate” match

The dataset is provided in two formats: a single JSON-lines (.jsonl) file, and two CSV files.

The JSON-lines file funders-crossref-weighted-with-alternates-2025-07-05.jsonl contains one JSON object per line, with the fields:

  • seq_no (int): zero-based sequence number (index) of the item
  • input (string): the funder name from Crossref’s metadata
  • output (list of strings): matched ROR ID(s) for this item, or an empty list if no match exists
  • alternates (list of (string, string) pairs): alternate matches for this item, each pairing an alternate ID (or "no_match") with a primary matched ID (see above)
  • weight (number): number of occurrences of this funder name in the Crossref data submitted without an ID
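
As an illustration of the field list above, the following is a minimal sketch of reading JSON-lines records and computing the share of total weight carried by names that have at least one match. The function names and the two sample records are hypothetical, invented for this example. The ID values are placeholders and do not reflect how IDs are written in the actual file.

```python
import json
from typing import Iterable


def read_records(lines: Iterable[str]) -> list[dict]:
    """Parse JSON-lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def weighted_match_share(records: list[dict]) -> float:
    """Share of the total weight carried by names with a non-empty output list."""
    total = sum(r["weight"] for r in records)
    matched = sum(r["weight"] for r in records if r["output"])
    return matched / total if total else 0.0


# Hypothetical sample lines, invented for illustration (not taken from the dataset).
SAMPLE_LINES = [
    '{"seq_no": 0, "input": "Example Science Foundation", '
    '"output": ["ROR_A"], "alternates": [], "weight": 30}',
    '{"seq_no": 1, "input": "Self-funded", '
    '"output": [], "alternates": [], "weight": 10}',
]

records = read_records(SAMPLE_LINES)
print(weighted_match_share(records))  # 30 of 40 weight units matched: 0.75
```

To read the real file, an open file handle can be passed in place of the sample list, since iterating over a text file yields one line at a time.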

The two CSV files provide the same data in a tabular format:

funder_matches.csv contains the primary funder name matches, with one row per unique funder name string:

  • name (string): the funder name from Crossref's metadata
  • num_occurrences (int): the count of funder entries with this name in Crossref data (including those submitted with an ID)
  • weight (number): number of occurrences of this funder name in the Crossref data submitted without an ID
  • matched_id (string): the matched ROR ID(s) for this funder name, or "no_match" if no match exists. If there are multiple matched IDs, they are semicolon-separated.
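
The matched_id column packs zero, one, or several IDs into a single cell, so it usually needs unpacking before use. The sketch below shows one way to do that with Python's standard csv module. It assumes the file has a header row whose column names equal the field names listed above, and it treats an empty cell like "no_match"; both are assumptions, not documented behavior. The sample rows and placeholder IDs are invented for illustration.

```python
import csv
import io


def parse_matched_id(cell: str) -> list[str]:
    """Turn a matched_id cell into a list of IDs; "no_match" becomes []."""
    cell = cell.strip()
    if not cell or cell == "no_match":  # empty-cell handling is an assumption
        return []
    return [part.strip() for part in cell.split(";") if part.strip()]


def read_matches(csv_text: str) -> dict[str, list[str]]:
    """Map each funder name to its list of primary matched IDs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"]: parse_matched_id(row["matched_id"]) for row in reader}


# Hypothetical sample rows, invented for illustration (not taken from the dataset).
SAMPLE_CSV = """name,num_occurrences,weight,matched_id
Example Science Foundation,40,30,ROR_A
Joint Example Programme,12,12,ROR_A;ROR_B
Self-funded,10,10,no_match
"""

matches = read_matches(SAMPLE_CSV)
for name, ids in matches.items():
    print(name, ids)
```

Representing "no_match" as an empty list keeps the CSV view consistent with the output field of the JSON-lines file, which uses an empty list for the same case.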

funder_alternate_matches.csv contains the alternate match mappings for funder names that have them:

  • name (string): the funder name
  • relaxed_match_id (string): an alternate ROR ID (or "no_match") that could be considered a valid match for this funder name
  • map_to_id (string): the primary matched ROR ID for this funder name (from funder_matches.csv), or "no_match"

The 151 funder names with alternate matches each have one or more rows in funder_alternate_matches.csv.
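
One possible reading of "relaxed" evaluation is sketched below: before comparing a prediction with the primary match, each predicted ID that appears as a relaxed_match_id for that name is rewritten to its map_to_id. This is an interpretation of the description above, not the procedure used by the crossref-matcher library, whose documentation remains the reference. The function names and placeholder IDs are hypothetical.

```python
from typing import Optional

NO_MATCH = "no_match"


def relax(predicted: set[str], alternates: dict[str, str]) -> set[str]:
    """Rewrite a prediction by mapping alternate IDs to their primary IDs.

    `alternates` maps relaxed_match_id -> map_to_id for a single funder name.
    An empty prediction is treated as the "no_match" token so that it can be
    mapped as well.
    """
    tokens = predicted or {NO_MATCH}
    mapped = {alternates.get(token, token) for token in tokens}
    mapped.discard(NO_MATCH)  # "no_match" corresponds to the empty set
    return mapped


def is_correct(
    predicted: set[str],
    primary: set[str],
    alternates: Optional[dict[str, str]] = None,
) -> bool:
    """Strict check when `alternates` is omitted, relaxed check otherwise."""
    return relax(predicted, alternates or {}) == primary


# Hypothetical case: the matcher returned a parent organization ("ROR_PARENT")
# that is listed as an acceptable alternate for the primary match ("ROR_A").
primary = {"ROR_A"}
alternates = {"ROR_PARENT": "ROR_A"}

print(is_correct({"ROR_PARENT"}, primary))              # strict: False
print(is_correct({"ROR_PARENT"}, primary, alternates))  # relaxed: True
```

Under this reading, a prediction that was already correct under strict evaluation stays correct, because IDs without an alternate mapping pass through unchanged.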

Files

Files (765.8 kB)

  • md5:c8ea7ed3be37568df1ccb25e67d07c21 (15.3 kB)
  • md5:2a20e90af900b41001d69a71026b85db (255.1 kB)
  • md5:dd5c986e78bb3cdf49a47673e4db81d8 (495.4 kB)