OAGT Paper Topic Dataset
Description
OAGT is a paper topic dataset of 6,942,930 records that describe scientific publications through attributes such as abstracts, titles, keywords, publication years, and venues. The last two fields of each record are a topic id, taken from a taxonomy of 27 topics created from the entire collection, and the 20 most significant topic words. Each record (sample) is stored as a single line of JSON in the text file.
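Because each record is one JSON object on its own line, the file can be processed record by record without loading all 6,942,930 records at once. The Python sketch below assumes, per the description above, that the topic id and the topic words are the last two keys of each object. The actual key names are not given on this page, so the code reads them from the data instead of hard-coding them; `last_two_fields` and `topic_counts` are hypothetical helper names.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def last_two_fields(line: str) -> Tuple[Tuple[str, object], Tuple[str, object]]:
    """Parse one JSON line and return its last two (key, value) pairs.

    Assumption: the topic id and topic words are the final two fields.
    json.loads builds a dict, which preserves the key order of the text.
    """
    record = json.loads(line)
    items = list(record.items())
    if len(items) < 2:
        raise ValueError("record has fewer than two fields")
    return items[-2], items[-1]


def topic_counts(lines: Iterable[str]) -> Counter:
    """Count how many records fall under each topic id."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        (_, topic_id), _ = last_two_fields(line)
        counts[topic_id] += 1
    return counts
```

For example, with a made-up record whose key names are purely illustrative, `last_two_fields('{"title": "A", "year": 2019, "topic": 3, "words": ["x", "y"]}')` returns `(("topic", 3), ("words", ["x", "y"]))`. Relying on key order instead of key names keeps the sketch usable even though the real field names are not documented here.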
The data is derived from the Open Academic Graph (OAG) data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY license.
This data (OAGT Paper Topic Dataset) is released under the CC-BY license (https://creativecommons.org/licenses/by/4.0/).
If you use it, please cite the following paper:
Erion Çano, Benjamin Roth: Topic Segmentation of Research Article Collections. arXiv 2022, CoRR abs/2205.11249, https://doi.org/10.48550/arXiv.2205.11249
Files
Name | Size | Checksum
---|---|---
oagt.zip | 3.7 GB | md5:c17b33e275c0406206c7d8ca439df867
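Given the 3.7 GB archive size, it may help to verify the download against the listed MD5 checksum and to read lines straight from the archive without unpacking it. The sketch below uses only the Python standard library. The name of the text file inside oagt.zip is not stated on this page, so it is passed in as a parameter, and UTF-8 encoding is an assumption; `md5_hex` and `iter_json_lines` are hypothetical helper names.

```python
import hashlib
import io
import zipfile
from typing import BinaryIO, Iterator, Union

# Checksum listed for oagt.zip on this page.
EXPECTED_MD5 = "c17b33e275c0406206c7d8ca439df867"


def md5_hex(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a binary stream, read in chunks
    (1 MiB by default) so the whole archive never sits in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def iter_json_lines(zip_source: Union[str, BinaryIO], member: str) -> Iterator[str]:
    """Yield the lines of one archive member without extracting it to disk.

    The member name is a parameter because it is not documented here;
    UTF-8 is an assumed encoding.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with io.TextIOWrapper(archive.open(member), encoding="utf-8") as text:
            for line in text:
                yield line.rstrip("\n")
```

A possible usage: open the archive with `open("oagt.zip", "rb")` and compare `md5_hex(...)` with `EXPECTED_MD5`, then list the members with `zipfile.ZipFile("oagt.zip").namelist()` and pass the right one to `iter_json_lines`. Streaming through `TextIOWrapper` avoids writing the decompressed text to disk.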
Additional details
Related works
- Is described by: Preprint 10.48550/arXiv.2205.11249 (DOI)