Dataset Open Access
La Bruzzo, Sandro;
Manghi, Paolo;
Mannocci, Andrea
Research in information science and scholarly communication relies strongly on the availability of openly accessible metadata datasets and, where possible, their related payloads. To this end, CrossRef plays a pivotal role by providing free access to its entire metadata collection and by allowing other initiatives to link to and enrich its information. As a consequence, a number of key pieces of information end up scattered across diverse datasets and resources freely available online. Because of this fragmentation, researchers in this domain struggle with daily integration problems and produce a plethora of ad-hoc datasets, wasting time and resources and infringing open science best practices.
The latest DOIBoost release is a metadata collection that enriches CrossRef (October 2019 release: 108,048,986 publication records) with inputs from Microsoft Academic Graph (October 2019 release: 76,171,072 publication records), ORCID (October 2019 release: 12,642,131 publication records), and Unpaywall (August 2019 release: 26,589,869 publication records) for the purpose of supporting high-quality and robust research experiments. As a result of DOIBoost, CrossRef records have been "boosted" as follows:
This entry consists of three files: doiboost_dump-2019-11-27.tar (contains a set of partXYZ.gz files, each one containing the JSON files relative to the enriched CrossRef records), schemaAndSample.zip, and termsOfUse_dataset.docx (contains details on the terms of use of DOIBoost).
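Because the dump is a 54.1 GB tar of gzipped parts, it can be convenient to stream records rather than unpack everything to disk. The following is a minimal sketch, not the authors' tooling: it assumes each partXYZ.gz file holds one JSON record per line, which the description does not confirm, and the function name `iter_records` is hypothetical.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield records one at a time without unpacking the archive to disk.

    ASSUMPTION: every *.gz member is gzip-compressed text with one JSON
    object per line. Adjust the parsing if the parts are laid out differently.
    """
    # "r:" opens the outer archive as a plain (uncompressed) tar.
    with tarfile.open(tar_path, mode="r:") as archive:
        for member in archive:
            # Skip directories and anything that is not a gzipped part.
            if not (member.isfile() and member.name.endswith(".gz")):
                continue
            raw = archive.extractfile(member)
            # Decompress the member on the fly and decode it as UTF-8 text.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:  # ignore blank lines
                        yield json.loads(line)


# Example usage (requires the downloaded archive to be present locally):
#
#     for i, record in enumerate(iter_records("doiboost_dump-2019-11-27.tar")):
#         print(sorted(record.keys()))
#         if i == 4:
#             break
```

Inspecting the keys of the first few records this way is a quick check against the schema shipped in schemaAndSample.zip before committing to a full pass over the dump.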
Note that this record comes with two relationships to other results of this experiment:
Name | MD5 | Size |
---|---|---|
doiboost_dump-2019-11-27.tar | ce681a06289c1ec6c6b66ef08dd3c7df | 54.1 GB |
schemaAndSample.zip | 1fa427d04764bc60d6dd77b6071c685e | 3.9 kB |
termsOfUse_dataset.docx | d53028310151bed623389fea7fc47baf | 72.4 kB |
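The MD5 checksums listed for each file can be used to confirm that a download completed intact, which matters most for the 54.1 GB archive. The sketch below uses only the Python standard library; the helper names `md5_of_file` and `verify` are hypothetical, and the file paths assume the files sit in the current directory under the names listed on this page.

```python
import hashlib

# Checksums as listed on this record.
EXPECTED_MD5 = {
    "doiboost_dump-2019-11-27.tar": "ce681a06289c1ec6c6b66ef08dd3c7df",
    "schemaAndSample.zip": "1fa427d04764bc60d6dd77b6071c685e",
    "termsOfUse_dataset.docx": "d53028310151bed623389fea7fc47baf",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that very large files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a file's digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example usage (requires the files to be present locally):
#
#     for name, checksum in EXPECTED_MD5.items():
#         print(name, "OK" if verify(name, checksum) else "MISMATCH")
```

MD5 is used here only because it is the checksum the record publishes; it detects corrupted transfers but is not a defence against deliberate tampering.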
Metric | All versions | This version |
---|---|---|
Views | 3,457 | 2,175 |
Downloads | 3,689 | 3,036 |
Data volume | 174.6 TB | 152.4 TB |
Unique views | 2,975 | 1,966 |
Unique downloads | 817 | 583 |