There is a newer version of this record available.

Dataset Open Access

Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks

Saier, Tarek; Färber, Michael

JSON-LD Export

{
  "description": "<p>We propose a new data set based on <strong>all publications from all scientific fields available on</strong>. Apart from providing the <strong>papers&#39; plain text</strong>, <strong>in-text citations</strong> were annotated via global identifiers. As far as possible, cited publications were linked to the <strong>Microsoft Academic Graph</strong>. Our data set consists of <strong>over one million documents</strong> and <strong>29.2 million citation contexts</strong>. The data set, which is made freely available for research purposes, can not only enhance the future evaluation of research paper-based and citation context-based approaches but also serve as a basis for novel ideas to analyze papers.</p>\n\n<p>More information can be found in our paper <a href=\"\">Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks</a>.</p>\n\n<p>See <a href=\"\"></a> for the source code which has been used for creating the data set.</p>",
  "license": "",
  "creator": [
    {
      "affiliation": "University of Freiburg",
      "@type": "Person",
      "name": "Saier, Tarek"
    },
    {
      "affiliation": "University of Freiburg",
      "@id": "",
      "@type": "Person",
      "name": "F\u00e4rber, Michael"
    }
  ],
  "url": "",
  "datePublished": "2019-02-01",
  "keywords": [
    "scholarly data",
    "digital libraries"
  ],
  "@context": "",
  "distribution": [
    {
      "contentUrl": "",
      "encodingFormat": "gz",
      "@type": "DataDownload"
    }
  ],
  "identifier": "",
  "@id": "",
  "@type": "Dataset",
  "name": "Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks"
}
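
As a minimal sketch of how a JSON-LD export like this one might be consumed, the following Python snippet parses a Dataset record and pulls out the title, creators, publication year, and download formats. The `summarize_dataset` helper and the abbreviated `record` string are illustrative assumptions, not part of any Zenodo tooling; only the field names are taken from the export itself.

```python
import json

# Abbreviated, hypothetical copy of the export; URL fields are omitted
# because they are empty in the record as shown.
record = r"""
{
  "creator": [
    {"affiliation": "University of Freiburg", "@type": "Person", "name": "Saier, Tarek"},
    {"affiliation": "University of Freiburg", "@type": "Person", "name": "F\u00e4rber, Michael"}
  ],
  "datePublished": "2019-02-01",
  "keywords": ["scholarly data", "digital libraries"],
  "distribution": [
    {"contentUrl": "", "encodingFormat": "gz", "@type": "DataDownload"}
  ],
  "@type": "Dataset",
  "name": "Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks"
}
"""


def summarize_dataset(raw):
    """Return the fields a citation tool would typically need (hypothetical helper)."""
    data = json.loads(raw)  # also decodes the \u00e4 escape to "ä"
    if data.get("@type") != "Dataset":
        raise ValueError("expected a record with @type 'Dataset'")
    return {
        "title": data["name"],
        "authors": [person["name"] for person in data.get("creator", [])],
        "year": data["datePublished"][:4],
        "formats": [d["encodingFormat"] for d in data.get("distribution", [])],
    }


summary = summarize_dataset(record)
print(summary["authors"])
print(summary["year"], summary["formats"])
```

Using `.get(..., [])` for `creator` and `distribution` keeps the helper tolerant of records that omit those optional lists, while `name` and `datePublished` are treated as required here.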
                   All versions    This version
Views              3,437           557
Downloads          24,197          11,642
Data volume        494.3 TB        266.5 TB
Unique views       2,750           494
Unique downloads   3,067           902

