Dataset Open Access

DOIBoost Dataset Dump

La Bruzzo, Sandro; Manghi, Paolo; Mannocci, Andrea


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/65bca59f-100c-4d3f-a2d8-9ed425aeb779/doiboost_dump-2019-11-27.tar"
      }, 
      "checksum": "md5:ce681a06289c1ec6c6b66ef08dd3c7df", 
      "bucket": "65bca59f-100c-4d3f-a2d8-9ed425aeb779", 
      "key": "doiboost_dump-2019-11-27.tar", 
      "type": "tar", 
      "size": 54094131200
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/65bca59f-100c-4d3f-a2d8-9ed425aeb779/schemaAndSample.zip"
      }, 
      "checksum": "md5:1fa427d04764bc60d6dd77b6071c685e", 
      "bucket": "65bca59f-100c-4d3f-a2d8-9ed425aeb779", 
      "key": "schemaAndSample.zip", 
      "type": "zip", 
      "size": 3891
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/65bca59f-100c-4d3f-a2d8-9ed425aeb779/termsOfUse_dataset.docx"
      }, 
      "checksum": "md5:d53028310151bed623389fea7fc47baf", 
      "bucket": "65bca59f-100c-4d3f-a2d8-9ed425aeb779", 
      "key": "termsOfUse_dataset.docx", 
      "type": "docx", 
      "size": 72421
    }
  ], 
  "owners": [
    80373
  ], 
  "doi": "10.5281/zenodo.3559699", 
  "stats": {
    "version_unique_downloads": 380.0, 
    "unique_views": 798.0, 
    "views": 920.0, 
    "version_views": 1770.0, 
    "unique_downloads": 170.0, 
    "version_unique_views": 1466.0, 
    "volume": 14280855572226.0, 
    "version_downloads": 976.0, 
    "downloads": 370.0, 
    "version_volume": 35910713396036.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.3559699", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.1438355", 
    "bucket": "https://zenodo.org/api/files/65bca59f-100c-4d3f-a2d8-9ed425aeb779", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.1438355.svg", 
    "html": "https://zenodo.org/record/3559699", 
    "latest_html": "https://zenodo.org/record/3559699", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.3559699.svg", 
    "latest": "https://zenodo.org/api/records/3559699"
  }, 
  "conceptdoi": "10.5281/zenodo.1438355", 
  "created": "2019-12-06T15:31:35.081772+00:00", 
  "updated": "2020-01-24T19:25:09.477994+00:00", 
  "conceptrecid": "1438355", 
  "revision": 3, 
  "id": 3559699, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.3559699", 
    "version": "3.0", 
    "language": "eng", 
    "title": "DOIBoost Dataset Dump", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.1441058", 
        "relation": "isCompiledBy", 
        "resource_type": "software"
      }, 
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.1441072", 
        "relation": "isSupplementTo", 
        "resource_type": "publication-preprint"
      }, 
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.1438355", 
        "relation": "isVersionOf"
      }
    ], 
    "notes": "When citing this dataset, please cite this Zenodo record and the related article: La Bruzzo S., Manghi P., Mannocci A. (2019) OpenAIRE's DOIBoost - Boosting CrossRef for Research. In: Manghi P., Candela L., Silvello G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, doi:10.1007/978-3-030-11226-4_11", 
    "relations": {
      "version": [
        {
          "count": 2, 
          "index": 1, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "1438355"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "3559699"
          }
        }
      ]
    }, 
    "communities": [
      {
        "id": "openaire"
      }, 
      {
        "id": "openaire-research-graph"
      }
    ], 
    "grants": [
      {
        "code": "777541", 
        "links": {
          "self": "https://zenodo.org/api/grants/10.13039/501100000780::777541"
        }, 
        "title": "OpenAIRE Advancing Open Scholarship", 
        "acronym": "OpenAIRE-Advance", 
        "program": "H2020", 
        "funder": {
          "doi": "10.13039/501100000780", 
          "acronyms": [], 
          "name": "European Commission", 
          "links": {
            "self": "https://zenodo.org/api/funders/10.13039/501100000780"
          }
        }
      }
    ], 
    "keywords": [
      "dataset", 
      "CrossRef", 
      "Microsoft Academic Graph", 
      "Unpaywall", 
      "Spark", 
      "aggregation", 
      "metadata", 
      "enrichment", 
      "ORCID"
    ], 
    "publication_date": "2019-12-02", 
    "creators": [
      {
        "orcid": "0000-0003-2855-1245", 
        "affiliation": "Institute of Information Science and Technology - CNR", 
        "name": "La Bruzzo, Sandro"
      }, 
      {
        "orcid": "0000-0001-7291-3210", 
        "affiliation": "Institute of Information Science and Technology - CNR", 
        "name": "Manghi, Paolo"
      }, 
      {
        "orcid": "0000-0002-5193-7851", 
        "affiliation": "Knowledge Media Institute - Open University", 
        "name": "Mannocci, Andrea"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "description": "<p>Research in information science and scholarly communication strongly relies on the availability of openly accessible datasets of metadata and, where possible, their respective payloads. To this end, CrossRef plays a pivotal role by providing free access to its entire metadata collection and allowing other initiatives to link and enrich its information. Consequently, many key pieces of information are scattered across diverse datasets and resources freely available online. As a result of this fragmentation, researchers in this domain struggle with daily integration problems and produce a plethora of ad-hoc datasets, thereby wasting time and resources and infringing open science best practices.</p>\n\n<p>The latest DOIBoost release is&nbsp;a metadata collection that enriches CrossRef (October 2019 release: 108,048,986 publication records) with inputs from Microsoft Academic Graph (October 2019 release: 76,171,072 publication records), ORCID (October 2019 release: 12,642,131 publication records), and Unpaywall (August 2019 release: 26,589,869 publication records) for the purpose of supporting high-quality and robust research experiments. As a result of DOIBoost, CrossRef records have been &quot;boosted&quot; as follows:</p>\n\n<ul>\n\t<li>47,254,618 CrossRef records have been enriched with an abstract from MAG;</li>\n\t<li>33,279,428 CrossRef records have been enriched with an affiliation&nbsp;from MAG and/or ORCID;</li>\n\t<li>509,588 CrossRef records have been enriched with an ORCID identifier from ORCID.</li>\n</ul>\n\n<p>This entry consists of three files: <strong>doiboost_dump-2019-11-27.tar</strong> (contains a set of <strong>partXYZ.gz</strong> files, each one containing the JSON files relative to the enriched CrossRef records), <strong>schemaAndSample.zip</strong>, and <strong>termsOfUse_dataset.docx</strong> (contains details on the terms of use of DOIBoost).</p>\n\n<p>Note that this record comes with two relationships to other results of this experiment:</p>\n\n<ol>\n\t<li>link to the data paper: for more information on how the dataset is (and can be) generated;</li>\n\t<li>link to the software: to repeat the experiment.</li>\n</ol>"
  }
}
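
Each entry under "files" in the record above carries a "checksum" of the form "md5:<hex digest>" and a "size" in bytes. A minimal sketch of checking a downloaded file against that field is given below. The function names (parse_checksum, file_digest, verify) are illustrative and not part of any Zenodo tooling, and the file is assumed to have been downloaded to a local path already.

```python
import hashlib


def parse_checksum(field):
    """Split a record checksum such as 'md5:ce68...' into (algorithm, hex digest)."""
    algorithm, _, digest = field.partition(":")
    return algorithm, digest.lower()


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a ~54 GB tar never has to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path, checksum_field):
    """Return True when the local file matches the checksum field of the record."""
    algorithm, expected = parse_checksum(checksum_field)
    return file_digest(path, algorithm) == expected
```

For the dump itself, the call would be verify("doiboost_dump-2019-11-27.tar", "md5:ce681a06289c1ec6c6b66ef08dd3c7df"), using the key and checksum listed in the record.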
                  All versions   This version
Views             1,770          920
Downloads         976            370
Data volume       35.9 TB        14.3 TB
Unique views      1,466          798
Unique downloads  380            170
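
The record description states that doiboost_dump-2019-11-27.tar contains a set of partXYZ.gz files holding the JSON for the enriched CrossRef records. The sketch below streams those records without unpacking the archive to disk. It rests on an assumption the record does not confirm: that each .gz member is gzip-compressed UTF-8 text with one JSON object per line. The authoritative layout is in schemaAndSample.zip, so the parsing step may need adjusting. The name iter_records is illustrative.

```python
import gzip
import json
import tarfile


def iter_records(tar_path):
    """Yield one parsed JSON object per non-empty line of every .gz member.

    Assumption: members are gzip-compressed, line-delimited JSON.
    """
    # "r:" opens the archive as a plain, uncompressed tar.
    with tarfile.open(tar_path, mode="r:") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".gz")):
                continue  # skip directories and any non-gzip members
            raw = archive.extractfile(member)
            # Decompress lazily, line by line, instead of loading a whole part.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

Because iter_records is a generator, a loop such as "for record in iter_records('doiboost_dump-2019-11-27.tar')" keeps only one record in memory at a time, which matters for an archive of this size.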
