Easy ORCID

Hoyt, Charles Tapley

doi:10.5281/zenodo.11371268

Published May 28, 2024 | Version 2023.2

Dataset Open

The first-party ORCID data dump uses a data structure that is overly complex for most use cases. This Zenodo record contains a derived version that is more straightforward, more accessible, and smaller. So far, it includes employers, education, external identifiers, and publications linked to PubMed. Additional processing grounds employers and educational institutions using the Research Organization Registry (ROR). Minor string processing is also applied, such as standardizing education types (e.g., Bachelor of Science, Master of Science).

The record also includes a pre-built Gilda index for named entity recognition (NER) and named entity normalization (NEN).

The records_hq.json.gz file is a subset of the full records file that only contains records that have at least one ROR-grounded employer, at least one ROR-grounded education, at least one standardized external identifier, or at least one publication indexed in PubMed. The point of this subset is to remove ORCID records that are generally not possible to match up to any external information.
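The inclusion rule for records_hq.json.gz can be sketched as a simple predicate. This is only an illustration of the four criteria stated above, not the code used to build the file: the record layout and the field names (`employers`, `educations`, `ror`, `ids`, `pubmeds`) are assumptions made for this example and may not match the actual schema.

```python
from typing import Any, Mapping


def is_high_quality(record: Mapping[str, Any]) -> bool:
    """Return True if a record meets at least one of the four criteria.

    All field names below are hypothetical; adjust them to the real schema.
    """
    # At least one employer grounded to ROR (assumed to carry a "ror" key).
    has_ror_employer = any(e.get("ror") for e in record.get("employers", []))
    # At least one education entry grounded to ROR.
    has_ror_education = any(e.get("ror") for e in record.get("educations", []))
    # At least one standardized external identifier.
    has_external_id = bool(record.get("ids"))
    # At least one publication indexed in PubMed.
    has_pubmed = bool(record.get("pubmeds"))
    return has_ror_employer or has_ror_education or has_external_id or has_pubmed
```

Under these assumptions, a record with only ungrounded employers and no identifiers or PubMed publications would be excluded from the subset.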

This record is automatically generated with the code at https://github.com/cthoyt/orcid_downloader.

Files (1.7 GB)

Name	MD5	Size
gilda.tsv.gz	c47066daa72f4f55972a54e5d709c46d	464.2 MB
records.json.gz	c678e05e31114753d61f85cb55f82569	749.7 MB
records_hq.json.gz	10abe8f3f68a4ffda29e6d44dd752128	494.7 MB

Additional details

Is derived from: Dataset: 10.23640/07243.24204912.v1 (DOI)
Requires: Software: 10.5281/zenodo.11371784 (DOI)

	All versions	This version
Views	980	218
Downloads	1,237	82
Data volume	807.2 GB	47.5 GB
