{
  "access": {
    "embargo": {
      "active": false,
      "reason": null
    },
    "files": "restricted",
    "record": "public",
    "status": "restricted"
  },
  "created": "2019-03-30T09:28:29.813865+00:00",
  "custom_fields": {},
  "deletion_status": {
    "is_deleted": false,
    "status": "P"
  },
  "files": {
    "enabled": true
  },
  "id": "2609187",
  "is_draft": false,
  "is_published": true,
  "links": {
    "access": "https://zenodo.org/api/records/2609187/access",
    "access_grants": "https://zenodo.org/api/records/2609187/access/grants",
    "access_links": "https://zenodo.org/api/records/2609187/access/links",
    "access_request": "https://zenodo.org/api/records/2609187/access/request",
    "access_users": "https://zenodo.org/api/records/2609187/access/users",
    "archive": "https://zenodo.org/api/records/2609187/files-archive",
    "archive_media": "https://zenodo.org/api/records/2609187/media-files-archive",
    "communities": "https://zenodo.org/api/records/2609187/communities",
    "communities-suggestions": "https://zenodo.org/api/records/2609187/communities-suggestions",
    "doi": "https://doi.org/10.5281/zenodo.2609187",
    "draft": "https://zenodo.org/api/records/2609187/draft",
    "files": "https://zenodo.org/api/records/2609187/files",
    "latest": "https://zenodo.org/api/records/2609187/versions/latest",
    "latest_html": "https://zenodo.org/records/2609187/latest",
    "media_files": "https://zenodo.org/api/records/2609187/media-files",
    "parent": "https://zenodo.org/api/records/2553522",
    "parent_doi": "https://zenodo.org/doi/10.5281/zenodo.2553522",
    "parent_html": "https://zenodo.org/records/2553522",
    "requests": "https://zenodo.org/api/records/2609187/requests",
    "reserve_doi": "https://zenodo.org/api/records/2609187/draft/pids/doi",
    "self": "https://zenodo.org/api/records/2609187",
    "self_doi": "https://zenodo.org/doi/10.5281/zenodo.2609187",
    "self_html": "https://zenodo.org/records/2609187",
    "self_iiif_manifest": "https://zenodo.org/api/iiif/record:2609187/manifest",
    "self_iiif_sequence": "https://zenodo.org/api/iiif/record:2609187/sequence/default",
    "versions": "https://zenodo.org/api/records/2609187/versions"
  },
  "media_files": {
    "enabled": false
  },
  "metadata": {
    "creators": [
      {
        "affiliations": [
          {
            "name": "University of Freiburg"
          }
        ],
        "person_or_org": {
          "family_name": "Saier",
          "given_name": "Tarek",
          "name": "Saier, Tarek",
          "type": "personal"
        }
      },
      {
        "affiliations": [
          {
            "name": "University of Freiburg"
          }
        ],
        "person_or_org": {
          "family_name": "F\u00e4rber",
          "given_name": "Michael",
          "identifiers": [
            {
              "identifier": "0000-0001-5458-8645",
              "scheme": "orcid"
            }
          ],
          "name": "F\u00e4rber, Michael",
          "type": "personal"
        }
      }
    ],
    "description": "<h2><strong>Description</strong></h2>\n<p><strong>unarXive</strong> is a scholarly data set containing <strong>publications' full-text</strong>, annotated <strong>in-text citations</strong>, and a <strong>citation network</strong>.</p>\n<p>The data is <strong>generated from all LaTeX sources on </strong><a href=\"https://arxiv.org/\"><strong>arXiv</strong></a> and is therefore of higher quality than data generated from PDF files.</p>\n<p>Typical <strong>use cases</strong> are</p>\n<ul>\n<li>Citation recommendation</li>\n<li>Citation context analysis</li>\n<li>Bibliographic analyses</li>\n<li>Reference string parsing</li>\n</ul>\n<p><strong>Note:</strong> This Zenodo record is an old version of unarXive. You can find the <strong>most recent version</strong> at <a href=\"../record/7752754\">https://zenodo.org/record/7752754</a> and <a href=\"../record/7752615\">https://zenodo.org/record/7752615</a></p>\n<h2><strong>Access</strong></h2>\n<p>\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513<br>\u2503 &nbsp;<a href=\"https://github.com/IllDepence/unarXive/blob/legacy_2020/doc/unarXive_sample.tar.bz2\"><strong>D O W N L O A D &nbsp; S A M P L E</strong></a> &nbsp;&thinsp;\u2503<br>\u2517\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u251b</p>\n<p>To download the whole data set, send an access request and note the following:</p>\n<blockquote>\n<p><strong>Note</strong>: this Zenodo record is a \"full\" version of unarXive, which was generated from all of arXiv.org <em>including non-permissively licensed papers</em>. Make sure that your use of the data is compliant with each paper's licensing terms.&sup1;</p>\n<p>&sup1; For information on papers' licenses use <a href=\"https://info.arxiv.org/help/bulk_data/index.html\">arXiv's bulk metadata access</a>.</p>\n</blockquote>\n<p>The <strong>code</strong> used for generating the data set is <a href=\"https://github.com/IllDepence/unarXive/tree/legacy_2020/\">publicly available</a>.</p>\n<p><strong>Usage examples</strong> for our data set are provided <a href=\"https://github.com/IllDepence/unarXive/tree/legacy_2020/#usage-examples\">here on GitHub</a>.</p>\n<h2><strong>Citing</strong></h2>\n<p>This initial version of unarXive is described in the following journal article.</p>\n<p><em>Tarek Saier, Michael F&auml;rber: \"</em><a href=\"http://dx.doi.org/10.1007/s11192-020-03382-z\"><em>unarXive: A Large Scholarly Data Set with Publications' Full-Text, Annotated In-Text Citations, and Links to Metadata</em></a><em>\", Scientometrics, 2020,</em><br>[<a href=\"https://www.aifb.kit.edu/images/f/f9/UnarXive_Scientometrics2020.pdf\">link to an author copy</a>]</p>\n<p>The <strong>updated version</strong> is described in the following conference paper.</p>\n<p><em>Tarek Saier, Michael F&auml;rber. \"</em><a href=\"https://doi.org/10.1109/JCDL57899.2023.00020\"><em>unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network</em></a><em>\", JCDL 2023.</em><br>[<a href=\"https://doi.org/10.48550/arXiv.2303.14957\">link to an author copy</a>]</p>",
    "publication_date": "2019-02-01",
    "publisher": "Zenodo",
    "related_identifiers": [
      {
        "identifier": "http://ceur-ws.org/Vol-2345/paper2.pdf",
        "relation_type": {
          "id": "isdocumentedby",
          "title": {
            "de": "Wird dokumentiert von",
            "en": "Is documented by"
          }
        },
        "scheme": "url"
      }
    ],
    "resource_type": {
      "id": "dataset",
      "title": {
        "de": "Datensatz",
        "en": "Dataset"
      }
    },
    "rights": [
      {
        "description": {
          "en": ""
        },
        "id": "other-at",
        "title": {
          "en": "Other (Attribution)"
        }
      }
    ],
    "subjects": [
      {
        "subject": "scholarly data"
      },
      {
        "subject": "citations"
      },
      {
        "subject": "papers"
      },
      {
        "subject": "arXiv.org"
      },
      {
        "subject": "digital libraries"
      },
      {
        "subject": "dataset"
      }
    ],
    "title": "Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks"
  },
  "parent": {
    "access": {
      "owned_by": {
        "user": "76230"
      },
      "settings": {
        "accept_conditions_text": "<p>Ensure that your use of the data is compliant with the individual paper's license terms. For information on papers' licenses use <a href=\"https://info.arxiv.org/help/bulk_data/index.html\">arXiv's bulk metadata access</a>.</p>",
        "allow_guest_requests": true,
        "allow_user_requests": true,
        "secret_link_expiration": 0
      }
    },
    "communities": {
      "entries": [
        {
          "access": {
            "member_policy": "open",
            "members_visibility": "public",
            "record_policy": "open",
            "review_policy": "open",
            "visibility": "public"
          },
          "children": {
            "allow": false
          },
          "created": "2019-04-12T13:51:58.685710+00:00",
          "custom_fields": {},
          "deletion_status": {
            "is_deleted": false,
            "status": "P"
          },
          "id": "3eec7ae6-7230-439d-b9eb-58b23297fa67",
          "links": {},
          "metadata": {
            "curation_policy": "<p>Manual curation.</p>\r\n",
            "page": "<p>The idea of natural language processing (NLP) is to process texts and to understand their conveyed meaning.</p>\r\n\r\n<p>Natural Language Understanding can be seen as a subtask of Natural Language Processing. However, the two terms are often used synonymously.</p>\r\n\r\n<p>https://en.wikipedia.org/wiki/Natural_language_processing</p>",
            "title": "Natural Language Processing"
          },
          "revision_id": 0,
          "slug": "natural-language-processing",
          "updated": "2020-06-08T21:07:14.336530+00:00"
        },
        {
          "access": {
            "member_policy": "open",
            "members_visibility": "public",
            "record_policy": "open",
            "review_policy": "open",
            "visibility": "public"
          },
          "children": {
            "allow": false
          },
          "created": "2017-10-02T10:32:04.538991+00:00",
          "custom_fields": {},
          "deletion_status": {
            "is_deleted": false,
            "status": "P"
          },
          "id": "720f2dee-ca44-4843-a1f0-2545f312d7bb",
          "links": {},
          "metadata": {
            "curation_policy": "",
            "description": "A collection of datasets and papers on methods of exploration of artifacts of scientific communication and development of science (bibliometrics, webometrics, altmetrics).",
            "page": "",
            "title": "scientometrics"
          },
          "revision_id": 0,
          "slug": "bibliometrics",
          "updated": "2017-10-03T07:20:34.055432+00:00"
        },
        {
          "access": {
            "member_policy": "open",
            "members_visibility": "public",
            "record_policy": "open",
            "review_policy": "open",
            "visibility": "public"
          },
          "children": {
            "allow": false
          },
          "created": "2019-05-01T19:45:24.860028+00:00",
          "custom_fields": {},
          "deletion_status": {
            "is_deleted": false,
            "status": "P"
          },
          "id": "868c7e8c-c7dd-481a-b9a9-b8828299ffee",
          "links": {},
          "metadata": {
            "curation_policy": "<p>Curation is done manually by the community initiators.</p>\r\n",
            "page": "<p>This community deals with datasets, papers, code, and other resources in the scholarly field. Resources concerning bibliometrics and scientometrics (e.g., publications&#39; metadata, publications&#39; full-texts, analyses of citation contexts, representations of publications) are also welcome.</p>",
            "title": "Scholarly Data"
          },
          "revision_id": 0,
          "slug": "scholarly-data",
          "updated": "2019-11-11T09:21:51.855384+00:00"
        }
      ],
      "ids": [
        "3eec7ae6-7230-439d-b9eb-58b23297fa67",
        "720f2dee-ca44-4843-a1f0-2545f312d7bb",
        "868c7e8c-c7dd-481a-b9a9-b8828299ffee"
      ]
    },
    "id": "2553522",
    "pids": {
      "doi": {
        "client": "datacite",
        "identifier": "10.5281/zenodo.2553522",
        "provider": "datacite"
      }
    }
  },
  "pids": {
    "doi": {
      "client": "datacite",
      "identifier": "10.5281/zenodo.2609187",
      "provider": "datacite"
    },
    "oai": {
      "identifier": "oai:zenodo.org:2609187",
      "provider": "oai"
    }
  },
  "revision_id": 24,
  "stats": {
    "all_versions": {
      "data_volume": 498375660582410.0,
      "downloads": 24412,
      "unique_downloads": 3262,
      "unique_views": 4920,
      "views": 5385
    },
    "this_version": {
      "data_volume": 266726028781512.0,
      "downloads": 11652,
      "unique_downloads": 912,
      "unique_views": 916,
      "views": 985
    }
  },
  "status": "published",
  "updated": "2024-04-17T10:47:21.644411+00:00",
  "versions": {
    "index": 2,
    "is_latest": false
  }
}