Easy ORCID
Description
The first-party ORCID data dump uses a data structure that is overly complex for most use cases. This Zenodo record contains a derived version that is much more straightforward, accessible, and smaller. So far, it includes employers, education, external identifiers, and publications linked to PubMed. It adds processing to ground employers and educational institutions using the Research Organization Registry (ROR). It also does some minor string processing, such as standardizing education types (e.g., Bachelor of Science, Master of Science).
It includes a pre-built Gilda index for named entity recognition (NER) and named entity normalization (NEN).
The records_hq.json.gz file is a subset of the full records file that only contains records that have at least one ROR-grounded employer, at least one ROR-grounded education, at least one standardized external identifier, or at least one publication indexed in PubMed. The point of this subset is to remove ORCID records that generally cannot be matched to any external information.
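The inclusion rule for the subset can be sketched as a simple predicate over one record. This is only an illustration of the four criteria described above, not the code used to build the file: the key names (`employers`, `educations`, `ror`, `xrefs`, `pubmeds`) are hypothetical placeholders and may not match the actual schema of the records files.

```python
from typing import Any


def is_high_quality(record: dict[str, Any]) -> bool:
    """Return True if a record meets at least one of the four subset criteria.

    All key names used here are hypothetical placeholders, not the
    documented schema of the records files.
    """
    # At least one employer that was grounded to a ROR identifier
    has_ror_employer = any("ror" in e for e in record.get("employers", []))
    # At least one education entry that was grounded to a ROR identifier
    has_ror_education = any("ror" in e for e in record.get("educations", []))
    # At least one standardized external identifier
    has_external_id = bool(record.get("xrefs"))
    # At least one publication indexed in PubMed
    has_pubmed = bool(record.get("pubmeds"))
    return has_ror_employer or has_ror_education or has_external_id or has_pubmed
```

A record with none of these four kinds of information is rejected, which mirrors the stated goal of dropping records that cannot be matched to external information.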
This dataset is automatically generated with code in https://github.com/cthoyt/orcid_downloader.
Files
(1.7 GB)
MD5 checksum | Size |
---|---|
md5:c47066daa72f4f55972a54e5d709c46d | 464.2 MB |
md5:c678e05e31114753d61f85cb55f82569 | 749.7 MB |
md5:10abe8f3f68a4ffda29e6d44dd752128 | 494.7 MB |
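The `md5:` values listed above can be used to check that a download completed without corruption. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and since the listing does not show which checksum belongs to which file, the pairing of a local path with an expected checksum is left to the reader.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as file:
        # Read fixed-size chunks so large files do not need to fit in memory
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:c47066daa72f4f55972a54e5d709c46d")` returns `True` only if `local_path` (a placeholder for wherever the file was saved) has exactly that digest.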
Additional details
Related works
- Is derived from
- Dataset: 10.23640/07243.24204912.v1 (DOI)
- Requires
- Software: 10.5281/zenodo.11371784 (DOI)