Published July 22, 2021 | Version 20210722a
Dataset | Open

SAGE Rejected Article Tracker Training Data

Creators

  • SAGE Publishing

Description

The enclosed dataset contains metadata for ArXiv preprints uploaded in 2012.

For each preprint, there are 2 rows of search data:

  • ArXiv preprint metadata plus the CrossRef API metadata for the correct search result, i.e. the published version of that preprint.
  • ArXiv preprint metadata plus the CrossRef API metadata for the top incorrect search result returned for the preprint's title and author names.

ArXiv preprints are referred to as 'query' documents and CrossRef documents are referred to as 'match' documents.
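The two-rows-per-preprint layout described above can be sanity-checked with a short script. The following is a minimal sketch only: the column names (`query_id`, `query_title`, `match_title`, `is_correct`) and the toy rows are hypothetical stand-ins, not taken from the dataset, so the real CSV header should be inspected and the names adjusted before use.

```python
import csv
import io
from collections import defaultdict

# Toy rows standing in for the real CSV. Every column name and value here
# is a hypothetical placeholder, not copied from the dataset.
SAMPLE = """query_id,query_title,match_title,is_correct
q1,A toy title,A toy title,1
q1,A toy title,An unrelated title,0
q2,Another toy title,Another toy title,1
q2,Another toy title,Some other paper,0
"""


def group_by_query(handle, key="query_id"):
    """Collect the rows that belong to each query (preprint) document."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row[key]].append(row)
    return dict(groups)


groups = group_by_query(io.StringIO(SAMPLE))

# Each query should have exactly two rows: one correct match, one incorrect.
pairs_ok = all(
    len(rows) == 2 and sorted(r["is_correct"] for r in rows) == ["0", "1"]
    for rows in groups.values()
)
```

To run the same check on the downloaded file, an open file handle can be passed to `group_by_query` in place of the in-memory sample, with `key` set to whichever column identifies the query document.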

This dataset was created using the SAGE Rejected Article Tracker and is supplementary to that project. Similar custom datasets can be created with the SAGE Rejected Article Tracker using different parameters (e.g. different timeframes).

Files

clean_training_dataframe.csv

Files (93.1 MB)

Size Checksum
93.1 MB md5:733428968141b531202f3266071469aa
4.9 kB md5:ff48dc0427da2f0d46a386beb4c79735
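
The MD5 checksums listed above can be used to confirm that a download completed intact. Below is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_checksum` are illustrative, and the local file path is whatever the file was saved as. The sketch does not assume which checksum belongs to which file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum(local_path, "md5:...")` returns `True` when the digest of the file at `local_path` equals the listed value. Reading in chunks keeps memory use low for the larger file.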