Dataset Open Access

DOIBoost Dataset Dump

La Bruzzo, Sandro; Manghi, Paolo; Mannocci, Andrea


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>La Bruzzo, Sandro</dc:creator>
  <dc:creator>Manghi, Paolo</dc:creator>
  <dc:creator>Mannocci, Andrea</dc:creator>
  <dc:date>2019-12-02</dc:date>
  <dc:description>Research in information science and scholarly communication relies strongly on the availability of openly accessible datasets of metadata and, where possible, their related payloads. To this end, CrossRef plays a pivotal role by providing free access to its entire metadata collection and allowing other initiatives to link and enrich its information. As a consequence, a number of key pieces of information end up scattered across diverse datasets and resources freely available online. Because of this fragmentation, researchers in this domain struggle with daily integration problems and produce a plethora of ad-hoc datasets, wasting time and resources and infringing open science best practices.

The latest DOIBoost release is a metadata collection that enriches CrossRef (October 2019 release: 108,048,986 publication records) with inputs from Microsoft Academic Graph (October 2019 release: 76,171,072 publication records), ORCID (October 2019 release: 12,642,131 publication records), and Unpaywall (August 2019 release: 26,589,869 publication records) for the purpose of supporting high-quality and robust research experiments. As a result of DOIBoost, CrossRef records have been "boosted" as follows:


	47,254,618 CrossRef records have been enriched with an abstract from MAG;
	33,279,428 CrossRef records have been enriched with an affiliation from MAG and/or ORCID;
	509,588 CrossRef records have been enriched with an ORCID identifier from ORCID.


This entry consists of three files: doiboost_dump-2019-11-27.tar (contains a set of partXYZ.gz files, each one containing the JSON files for the enriched CrossRef records), schemaAndSample.zip, and termsOfUse.doc (contains details on the terms of use of DOIBoost).

Note that this record comes with two relationships to other results of this experiment:


	link to the data paper: for more information on how the dataset is (and can be) generated;
	link to the software: to repeat the experiment.
</dc:description>
  <dc:description>When citing this dataset, please cite this record in Zenodo and the related article: La Bruzzo S., Manghi P., Mannocci A. (2019) OpenAIRE's DOIBoost - Boosting CrossRef for Research. In: Manghi P., Candela L., Silvello G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, doi:10.1007/978-3-030-11226-4_11</dc:description>
  <dc:identifier>https://zenodo.org/record/3559699</dc:identifier>
  <dc:identifier>10.5281/zenodo.3559699</dc:identifier>
  <dc:identifier>oai:zenodo.org:3559699</dc:identifier>
  <dc:language>eng</dc:language>
  <dc:relation>info:eu-repo/grantAgreement/EC/H2020/777541/</dc:relation>
  <dc:relation>doi:10.5281/zenodo.1441058</dc:relation>
  <dc:relation>doi:10.5281/zenodo.1441072</dc:relation>
  <dc:relation>doi:10.5281/zenodo.1438355</dc:relation>
  <dc:relation>url:https://zenodo.org/communities/openaire</dc:relation>
  <dc:relation>url:https://zenodo.org/communities/openaire-research-graph</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:subject>dataset</dc:subject>
  <dc:subject>CrossRef</dc:subject>
  <dc:subject>Microsoft Academic Graph</dc:subject>
  <dc:subject>Unpaywall</dc:subject>
  <dc:subject>Spark</dc:subject>
  <dc:subject>aggregation</dc:subject>
  <dc:subject>metadata</dc:subject>
  <dc:subject>enrichment</dc:subject>
  <dc:subject>ORCID</dc:subject>
  <dc:title>DOIBoost Dataset Dump</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
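
The description above says the dump is a tar archive of partXYZ.gz files holding JSON for the enriched CrossRef records. A minimal Python sketch for streaming those records without unpacking the archive might look like the following. It assumes each .gz part stores one JSON object per line, which the record does not state; check schemaAndSample.zip for the actual layout. The function name `iter_records` is hypothetical and not part of any DOIBoost tooling.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of every .gz member in the tar.

    Assumption: each part file is gzip-compressed, line-delimited JSON.
    """
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            # Skip directories and anything that is not a gzip part file.
            if not (member.isfile() and member.name.endswith(".gz")):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decompress the member on the fly and parse it line by line.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

A caller could then loop with `for record in iter_records("doiboost_dump-2019-11-27.tar"): ...`, which keeps only one record in memory at a time. That matters for a dump of this size.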
                   All versions   This version
Views              1,774          923
Downloads          976            370
Data volume        35.9 TB        14.3 TB
Unique views       1,469          801
Unique downloads   380            170
