Published May 18, 2022 | Version v1
Dataset Open

OAGT Paper Topic Dataset

  • 1. University of Vienna

Description

OAGT is a paper topic dataset of 6,942,930 records, each comprising attributes of a scientific publication such as abstract, title, keywords, publication year, and venue. The last two fields of each record are the topic id, drawn from a taxonomy of 27 topics created from the entire collection, and the 20 most significant topic words. Each record (sample) is stored as one JSON line in the text file.
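Because each record is one JSON line, the file can be read as a stream instead of being loaded whole. The following is a minimal sketch under stated assumptions: the helper names `iter_records` and `peek` are illustrative and not part of the dataset, and the exact field names are not listed in this description, so inspect a record before relying on any key.

```python
import json
from itertools import islice


def iter_records(path):
    """Yield one dict per non-empty JSON line in the file at `path`."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, if any
                yield json.loads(line)


def peek(path, n=3):
    """Return the first `n` records without reading the rest of the file."""
    return list(islice(iter_records(path), n))


def last_two_fields(record):
    """Return the names of the last two keys of a record.

    json.loads keeps the key order of the JSON object, so these should be
    the topic id and topic words fields described above (assumption: the
    actual key names must be checked against the data).
    """
    return list(record)[-2:]
```

For example, `last_two_fields(peek(path, 1)[0])` shows the two topic-related keys of the first record, where `path` is the location of the text file extracted from the archive (its name is not stated here).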

The data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY license.

This data (OAGT Paper Topic Dataset) is released under the CC-BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).

If you use it, please cite the following paper:

Erion Çano, Benjamin Roth: Topic Segmentation of Research Article Collections. ArXiv 2022, CoRR abs/2205.11249, https://doi.org/10.48550/arXiv.2205.11249

Files

Files (3.7 GB)

Name: oagt.zip
Size: 3.7 GB
md5: c17b33e275c0406206c7d8ca439df867
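The md5 value listed above can be used to check that the 3.7 GB download arrived intact. This is a minimal sketch, assuming the archive was saved locally; the function names and the local path are illustrative. The file is hashed in chunks so it never has to fit in memory.

```python
import hashlib

# Checksum listed for the archive in the Files section above.
EXPECTED_MD5 = "c17b33e275c0406206c7d8ca439df867"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `is_intact("oagt.zip")` returns `False` for a truncated or corrupted download, in which case the file should be fetched again (the path assumes the archive sits in the current directory).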

Additional details

Related works

Is described by
Preprint: 10.48550/arXiv.2205.11249 (DOI)