Dataset Open Access

DOIBoost Dataset Dump

La Bruzzo, Sandro; Manghi, Paolo; Mannocci, Andrea


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3559699", 
  "language": "eng", 
  "title": "DOIBoost Dataset Dump", 
  "issued": {
    "date-parts": [
      [
        2019, 
        12, 
        2
      ]
    ]
  }, 
  "abstract": "<p>Research in information science and scholarly communication strongly relies on the availability of openly accessible datasets of metadata and, where possible, their relative payloads. To this end, CrossRef plays a pivotal role by providing free access to its entire metadata collection, and allowing other initiatives to link and enrich its information. Therefore, a number of key pieces of information result scattered across diverse datasets and resources freely available online. As a result of this fragmentation, researchers in this domain end up struggling with daily integration problems producing a plethora of ad-hoc datasets, therefore incurring in a waste of time, resources, and infringing open science best practices.&nbsp;</p>\n\n<p>The latest DOIBoost release is&nbsp;a metadata collection that enriches CrossRef (October 2019 release: 108,048,986 publication records) with inputs from Microsoft Academic Graph (October 2019 release: 76,171,072 publication records), ORCID (October 2019 release: 12,642,131 publication records), and Unpaywall (August 2019 release: 26,589,869 publication records) for the purpose of supporting high-quality and robust research experiments. 
As a result of DOIBoost, CrossRef records have been &quot;boosted&quot; as follows:</p>\n\n<ul>\n\t<li>47,254,618 CrossRef records have been enriched with an abstract from MAG;</li>\n\t<li>33,279,428 CrossRef records have been enriched with an affiliation&nbsp;from MAG and/or ORCID;</li>\n\t<li>509,588 CrossRef records have been enriched with an ORCID identifier from ORCID.</li>\n</ul>\n\n<p>This entry consists of two files: <strong>doiboost_dump-2019-11-27.tar&nbsp;</strong>(contains a set of <strong>partXYZ.gz</strong> files, each one containing the JSON files relative to the enriched CrossRef records), a&nbsp;<strong>schemaAndSample.zip</strong>, and <strong>termsOfUse.doc&nbsp;</strong>(contains details on the terms of use of DOIBoost).</p>\n\n<p>Note that this records comes with two relationships to other results of this experiment:&nbsp;</p>\n\n<ol>\n\t<li>link to the data paper: for more information on how the dataset is (and can be) generated;</li>\n\t<li>link to the software: to repeat the experiment</li>\n</ol>", 
  "author": [
    {
      "family": "La Bruzzo, Sandro"
    }, 
    {
      "family": "Manghi, Paolo"
    }, 
    {
      "family": "Mannocci, Andrea"
    }
  ], 
  "note": "When citing this dataset please cite this record in Zenodo and the relative article: La Bruzzo S., Manghi P., Mannocci A. (2019) OpenAIRE's DOIBoost - Boosting CrossRef for Research. In: Manghi P., Candela L., Silvello G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, doi:10.1007/978-3-030-11226-4_11", 
  "version": "3.0", 
  "type": "dataset", 
  "id": "3559699"
}
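
The abstract describes the dump as a tar archive holding a set of partXYZ.gz files with the enriched records in JSON. A minimal sketch of how such an archive might be streamed is given below. It assumes that each .gz member is gzip-compressed text with one JSON object per line; the record does not state this, so check it against the contents of schemaAndSample.zip before relying on it. The function name `iter_records` is hypothetical and is not part of any DOIBoost tooling.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one dict per record from every .gz member of a tar archive.

    Assumption (not confirmed by the record): each member is gzip-compressed
    UTF-8 text containing one JSON object per line.
    """
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            # Skip directories and anything that is not a gzip part file.
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decompress on the fly so no part is loaded fully into memory.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

Under the same assumption, calling `iter_records("doiboost_dump-2019-11-27.tar")` would yield the records one at a time, which matters for a dump of this size because nothing beyond the current line has to be held in memory.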
                   All versions   This version
Views              1,770          920
Downloads          976            370
Data volume        35.9 TB        14.3 TB
Unique views       1,466          798
Unique downloads   380            170
