There is a newer version of the record available.

Published May 28, 2024 | Version 2023.2
Dataset Open

Easy ORCID

Description

The first-party ORCID data dump uses a data structure that is overly complex for most use cases. This Zenodo record contains a derived version that is much more straightforward, more accessible, and smaller. So far, it includes employers, education, external identifiers, and publications linked to PubMed. Additional processing grounds employers and educational institutions using the Research Organization Registry (ROR). Minor string processing is also applied, such as standardization of education types (e.g., Bachelor of Science, Master of Science).
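As an illustration of what standardizing education types can look like, here is a minimal sketch. The lookup table `DEGREE_SYNONYMS` and the function `standardize_degree` are hypothetical names invented for this example; the actual rules used to build this dataset are not described in this record and may differ.

```python
import re
from typing import Optional

# Hypothetical lookup table for illustration only; the real pipeline's
# rules are not given in this record.
DEGREE_SYNONYMS = {
    "bs": "Bachelor of Science",
    "bsc": "Bachelor of Science",
    "bachelor of science": "Bachelor of Science",
    "ms": "Master of Science",
    "msc": "Master of Science",
    "master of science": "Master of Science",
}


def standardize_degree(raw: str) -> Optional[str]:
    """Return a canonical degree name, or None if the string is not recognized."""
    # Drop periods, collapse runs of whitespace, trim, and lowercase,
    # so that "B.Sc. " becomes "bsc".
    key = re.sub(r"\s+", " ", raw.replace(".", "")).strip().lower()
    return DEGREE_SYNONYMS.get(key)
```

Under these assumptions, variants such as "B.Sc." and "BS" would both map to "Bachelor of Science", while unrecognized strings return `None` and can be left untouched.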

It includes a pre-built Gilda index for named entity recognition (NER) and named entity normalization (NEN).

The records_hq.json.gz file is a subset of the full records file that only contains records that have at least one ROR-grounded employer, at least one ROR-grounded education, at least one standardized external identifier, or at least one publication indexed in PubMed. The point of this subset is to remove ORCID records that generally cannot be matched to any external information.
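The inclusion rule for the subset can be sketched as a predicate over a single record. The schema below is an assumption made for illustration: the keys `employments`, `educations`, `xrefs`, `works`, `ror`, and `pubmed` are hypothetical and may not match the field names in the published files.

```python
from typing import Any


def is_high_quality(record: dict[str, Any]) -> bool:
    """Apply the four inclusion criteria to one record (hypothetical schema)."""
    # At least one employer grounded to a ROR identifier
    has_ror_employer = any(e.get("ror") for e in record.get("employments", []))
    # At least one education entry grounded to a ROR identifier
    has_ror_education = any(e.get("ror") for e in record.get("educations", []))
    # At least one standardized external identifier
    has_xref = bool(record.get("xrefs"))
    # At least one publication with a PubMed identifier
    has_pubmed_work = any(w.get("pubmed") for w in record.get("works", []))
    return has_ror_employer or has_ror_education or has_xref or has_pubmed_work
```

A record passes if any one of the four conditions holds, so a record with only a name and no groundable information would be excluded.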

This dataset is generated automatically with the code at https://github.com/cthoyt/orcid_downloader.

Files

Files (1.7 GB)

Checksum Size
md5:c47066daa72f4f55972a54e5d709c46d 464.2 MB
md5:c678e05e31114753d61f85cb55f82569 749.7 MB
md5:10abe8f3f68a4ffda29e6d44dd752128 494.7 MB
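After downloading a file, its integrity can be checked against the MD5 value listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` are made up for this example, and the file is read in chunks because the archives are several hundred megabytes each.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum(local_path, "md5:c47066daa72f4f55972a54e5d709c46d")` should return `True` when `local_path` points to an intact copy of the file listed with that checksum.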

Additional details

Related works

Is derived from
Dataset: 10.23640/07243.24204912.v1 (DOI)
Requires
Software: 10.5281/zenodo.11371784 (DOI)