{
  "DOI": "10.5281/zenodo.2609187",
  "abstract": "Description\n\nunarXive is a scholarly data set containing publications' full-text, annotated in-text citations, and a citation network.\n\nThe data is generated from all LaTeX sources on arXiv and therefore of higher quality than data generated from PDF files.\n\nTypical use cases are\n\n\n\nCitation recommendation\n\nCitation context analysis\n\nBibliographic analyses\n\nReference string parsing\n\n\nNote: This Zenodo record is an old version of unarXive. You can find the most recent version at https://zenodo.org/record/7752754 and https://zenodo.org/record/7752615\n\nAccess\n\n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\u2503 \u00a0D O W N L O A D \u00a0 S A M P L E \u00a0\u2009\u2503\u2517\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u251b\n\nTo download the whole data set send an access request and note the following:\n\n\n\nNote: this Zenodo record is a \"full\" version of unarXive, which was generated from all of arXiv.org including non-permissively licensed papers. Make sure that your use of the data is compliant with the paper's licensing terms.\u00b9\n\n\u00b9 For information on papers' licenses use arXiv's bulk metadata access.\n\n\nThe code used for generating the data set is publicly available.\n\nUsage examples for our data set are provided at here on GitHub.\n\nCiting\n\nThis initial version of unarXive is described in the following journal article.\n\nTarek Saier, Michael F\u00e4rber: \"unarXive: A Large Scholarly Data Set with Publications' Full-Text, Annotated In-Text Citations, and Links to Metadata\", Scientometrics, 2020,[link to an author copy]\n\nThe updated version is described in the following conference paper.\n\nTarek Saier, Michael F\u00e4rber. \"unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network\", JCDL 2023.[link to an author copy]",
  "author": [
    {
      "family": "Saier",
      "given": "Tarek"
    },
    {
      "family": "F\u00e4rber",
      "given": "Michael"
    }
  ],
  "id": "2609187",
  "issued": {
    "date-parts": [
      [
        "2019",
        "02",
        "01"
      ]
    ]
  },
  "publisher": "Zenodo",
  "title": "Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks",
  "type": "dataset"
}