Published July 22, 2021 | Version 20210722a
Dataset
Open
SAGE Rejected Article Tracker Training Data
Description
The enclosed dataset contains metadata for ArXiv preprints uploaded in 2012.
For each preprint, there are 2 rows of search data:
- ArXiv preprint metadata plus the CrossRef API metadata for the correct search result (i.e. the published version of that preprint).
- ArXiv preprint metadata plus the CrossRef API metadata for the top incorrect search result returned for the preprint's title and author names.
ArXiv preprints are referred to as 'query' documents and CrossRef documents are referred to as 'match' documents.
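The two-rows-per-preprint layout described above can be sanity-checked with a short script. The following is a minimal sketch only: the column names `query_id`, `match_title`, and `correct` are hypothetical (the actual CSV headers are not listed on this page), and the toy rows are placeholders rather than values from the dataset.

```python
from collections import defaultdict


def group_by_query(rows, query_key):
    """Group search rows by the identifier of their query (ArXiv) document."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[query_key]].append(row)
    return dict(groups)


def check_pairs(rows, query_key, label_key):
    """Return query ids that lack exactly one correct and one incorrect row."""
    bad = []
    for query_id, group in group_by_query(rows, query_key).items():
        labels = sorted(bool(row[label_key]) for row in group)
        if labels != [False, True]:
            bad.append(query_id)
    return bad


# Toy rows with placeholder values; column names are assumptions.
toy_rows = [
    {"query_id": "A", "match_title": "published version of A", "correct": True},
    {"query_id": "A", "match_title": "unrelated paper", "correct": False},
    {"query_id": "B", "match_title": "published version of B", "correct": True},
]

# Query "B" has only one row, so it is reported as incomplete.
print(check_pairs(toy_rows, "query_id", "correct"))
```

To run the same check on the real file, the rows could be read with `csv.DictReader` and the two key arguments replaced with whatever the CSV actually calls its query identifier and correctness label, if it has one.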
This dataset was created using the SAGE Rejected Article Tracker and is supplementary to that project. Similar custom datasets can be created with the SAGE Rejected Article Tracker using different parameters (e.g. different timeframes).
Files
clean_training_dataframe.csv
Total size: 93.1 MB
Checksum | Size |
---|---|
md5:733428968141b531202f3266071469aa | 93.1 MB |
md5:ff48dc0427da2f0d46a386beb4c79735 | 4.9 kB |
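The MD5 checksums in the file listing can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib`. It assumes that the 93.1 MB entry corresponds to `clean_training_dataframe.csv` and that the file sits in the current directory; neither is stated explicitly in the listing.

```python
import hashlib
import os

# Checksum copied from the file listing; pairing it with the CSV is an assumption.
EXPECTED_MD5 = "733428968141b531202f3266071469aa"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    csv_path = "clean_training_dataframe.csv"  # assumed local download location
    if os.path.exists(csv_path):
        print("checksum ok:", matches_checksum(csv_path, EXPECTED_MD5))
    else:
        print("file not found:", csv_path)
```

Reading in chunks keeps memory use flat regardless of file size, which matters for a file of roughly 93 MB.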