DOIBoost Dataset Dump

La Bruzzo, Sandro; Manghi, Paolo; Mannocci, Andrea

doi:10.5281/zenodo.3559699

Published December 2, 2019 | Version 3.0

Dataset Open

1. Institute of Information Science and Technology - CNR
2. Knowledge Media Institute - Open University

Research in information science and scholarly communication relies heavily on openly accessible datasets of metadata and, where possible, their related payloads. To this end, CrossRef plays a pivotal role by providing free access to its entire metadata collection and by allowing other initiatives to link and enrich its information. As a consequence, a number of key pieces of information end up scattered across diverse datasets and resources freely available online. Because of this fragmentation, researchers in this domain struggle with daily integration problems and produce a plethora of ad hoc datasets, which wastes time and resources and runs counter to open science best practices.

The latest DOIBoost release is a metadata collection that enriches CrossRef (October 2019 release: 108,048,986 publication records) with inputs from Microsoft Academic Graph (October 2019 release: 76,171,072 publication records), ORCID (October 2019 release: 12,642,131 publication records), and Unpaywall (August 2019 release: 26,589,869 publication records) for the purpose of supporting high-quality and robust research experiments. As a result of DOIBoost, CrossRef records have been "boosted" as follows:

47,254,618 CrossRef records have been enriched with an abstract from MAG;
33,279,428 CrossRef records have been enriched with an affiliation from MAG and/or ORCID;
509,588 CrossRef records have been enriched with an ORCID identifier from ORCID.
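To put these counts in proportion, the following minimal sketch computes each one as a share of the 108,048,986 CrossRef records quoted above. The function name `share` and the dictionary labels are illustrative only. The three categories may overlap, so the shares should not be summed.

```python
# Counts taken from the record description above.
CROSSREF_TOTAL = 108_048_986

ENRICHED = {
    "abstract (MAG)": 47_254_618,
    "affiliation (MAG and/or ORCID)": 33_279_428,
    "ORCID identifier (ORCID)": 509_588,
}


def share(count, total=CROSSREF_TOTAL):
    """Return count as a percentage of total."""
    return 100.0 * count / total


for label, count in ENRICHED.items():
    print(f"{label}: {count:,} records, {share(count):.2f}% of CrossRef")
```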

This entry consists of three files: doiboost_dump-2019-11-27.tar (a set of partXYZ.gz files, each one containing the JSON files for the enriched CrossRef records), schemaAndSample.zip, and termsOfUse_dataset.docx (details on the terms of use of DOIBoost).
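The archive layout described above can be read without unpacking all 54.1 GB to disk. The sketch below streams each .gz member of the tar and parses its content. It assumes that every part file holds one JSON record per line; the record does not state this, so check it against schemaAndSample.zip first. The function name `iter_records` is hypothetical.

```python
import gzip
import json
import tarfile


def iter_records(tar_path):
    """Yield parsed JSON records from every .gz member of the tar archive.

    Assumption (not confirmed by the record): each .gz member is a text
    file with one JSON object per line.
    """
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            # Skip directories and anything that is not a gzip part file.
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            raw = archive.extractfile(member)
            # Decompress on the fly and decode as UTF-8 text.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

For a first look at the data, `next(iter_records("doiboost_dump-2019-11-27.tar"))` returns the first record, provided the assumption about the layout holds.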

Note that this record comes with two relationships to other results of this experiment:

link to the data paper: for more information on how the dataset is (and can be) generated;
link to the software: to repeat the experiment

Notes

When citing this dataset please cite this record in Zenodo and the relative article: La Bruzzo S., Manghi P., Mannocci A. (2019) OpenAIRE's DOIBoost - Boosting CrossRef for Research. In: Manghi P., Candela L., Silvello G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, doi:10.1007/978-3-030-11226-4_11

Files

Files (54.1 GB)

Name	MD5	Size
doiboost_dump-2019-11-27.tar	ce681a06289c1ec6c6b66ef08dd3c7df	54.1 GB
schemaAndSample.zip	1fa427d04764bc60d6dd77b6071c685e	3.9 kB
termsOfUse_dataset.docx	d53028310151bed623389fea7fc47baf	72.4 kB
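Because the main archive is large, it is worth checking a download against the MD5 checksums listed above before processing it. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of_file` and `verify` are illustrative, and the file is read in chunks so that the 54.1 GB tar never has to fit in memory.

```python
import hashlib

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "doiboost_dump-2019-11-27.tar": "ce681a06289c1ec6c6b66ef08dd3c7df",
    "schemaAndSample.zip": "1fa427d04764bc60d6dd77b6071c685e",
    "termsOfUse_dataset.docx": "d53028310151bed623389fea7fc47baf",
}


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches expected_md5."""
    return md5_of_file(path) == expected_md5.lower()
```

A call such as `verify("schemaAndSample.zip", EXPECTED_MD5["schemaAndSample.zip"])` should return `True` for an intact download. MD5 is used here only as a transfer integrity check, matching what the record publishes.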

Additional details

Is compiled by: Software: 10.5281/zenodo.1441058 (DOI)
Is supplement to: Preprint: 10.5281/zenodo.1441072 (DOI)

European Commission
OpenAIRE-Advance - OpenAIRE Advancing Open Scholarship 777541

	All versions	This version
Views	5,815	3,864
Downloads	1,697	1,180
Data volume	191.9 TB	165.5 TB
