Dataset Open Access

SAGE Rejected Article Tracker Training Data

Adam Day

This dataset contains metadata for preprints uploaded to ArXiv in 2012.

For each preprint, there are two rows of search data:

  • ArXiv preprint metadata plus the CrossRef API metadata for the correct search result, i.e. the published version of that preprint.
  • ArXiv preprint metadata plus the CrossRef API metadata for the top incorrect search result returned for the preprint's title and author names.

ArXiv preprints are referred to as 'query' documents and CrossRef documents are referred to as 'match' documents.
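The two-rows-per-preprint layout described above can be sketched as follows. This is only an illustration: the column names (`query_id`, `query_title`, `match_title`, `is_correct`) and the sample values are hypothetical placeholders, not the real headers or contents of clean_training_dataframe.csv, which are documented in Readme.md.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names and placeholder values, for illustration only.
# The real headers are defined in clean_training_dataframe.csv / Readme.md.
SAMPLE_CSV = """query_id,query_title,match_title,is_correct
q1,Example preprint title A,Example preprint title A,1
q1,Example preprint title A,Unrelated article title,0
q2,Example preprint title B,Example preprint title B (published),1
q2,Example preprint title B,Another unrelated title,0
"""


def group_pairs(csv_file):
    """Group rows by query document, separating correct and incorrect matches."""
    pairs = defaultdict(lambda: {"correct": [], "incorrect": []})
    for row in csv.DictReader(csv_file):
        key = "correct" if row["is_correct"] == "1" else "incorrect"
        pairs[row["query_id"]][key].append(row["match_title"])
    return dict(pairs)


pairs = group_pairs(io.StringIO(SAMPLE_CSV))
for query_id, matches in pairs.items():
    # Each query document should have one correct and one incorrect match.
    print(query_id, len(matches["correct"]), len(matches["incorrect"]))
```

Grouping by the query document makes it easy to check that every preprint contributes exactly one positive and one negative example before the rows are used for training.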

This dataset was created using the SAGE Rejected Article Tracker and is supplementary to that project. Similar custom datasets can be created with the tracker using different parameters (e.g. different timeframes).

Files (93.1 MB)

Name                          Size     Checksum
clean_training_dataframe.csv  93.1 MB  md5:733428968141b531202f3266071469aa
Readme.md                     4.9 kB   md5:ff48dc0427da2f0d46a386beb4c79735
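
The MD5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard hashlib module; the helper names `md5_of_file` and `verify` are illustrative, and the local path passed to `verify` is assumed to be wherever the file was saved.

```python
import hashlib

# Checksums as listed on this page for each file.
EXPECTED_MD5 = {
    "clean_training_dataframe.csv": "733428968141b531202f3266071469aa",
    "Readme.md": "ff48dc0427da2f0d46a386beb4c79735",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, filename):
    """Compare a local file's MD5 with the checksum listed for `filename`."""
    return md5_of_file(path) == EXPECTED_MD5[filename]
```

For example, `verify("clean_training_dataframe.csv", "clean_training_dataframe.csv")` should return True for an uncorrupted copy saved in the current directory. Reading in chunks keeps memory use low for the 93.1 MB CSV.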
