Published August 25, 2021 | Version v2
Dataset Open

AG's news corpus (AGNEWS)

Authors/Creators

Description

AG’s news corpus (AGNEWS): AG’s corpus is a collection of news articles gathered from the web. The whole corpus contains 496,835 categorized news articles from more than 2,000 news sources. The four largest classes (World, Sports, Business, and Sci/Tech) were chosen to construct the dataset used in the experiments, using only the title and description fields.

The files:
texts.txt: Document set (text), one document per line.
score.txt: Document classes, index-aligned with the documents in texts.txt.
split_<k>.pkl: pandas DataFrame with the k-fold cross-validation partition.
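
Because texts.txt and score.txt are paired by position, a loader should check that both files have the same number of entries before using them. The following is a minimal sketch, not the authors' code: the function name load_corpus is hypothetical, and it assumes UTF-8 text and one class value per line in score.txt, neither of which is stated on this page.

```python
def load_corpus(texts_path, labels_path):
    """Read documents and class labels, pairing them by line position.

    Assumptions (not confirmed by the dataset page): both files are UTF-8,
    and score.txt holds exactly one class value per line.
    """
    with open(texts_path, encoding="utf-8") as f:
        # Strip only the trailing newline so document text stays intact.
        texts = [line.rstrip("\n") for line in f]
    with open(labels_path, encoding="utf-8") as f:
        # Labels are kept as strings because their encoding is not documented.
        labels = [line.strip() for line in f]

    if len(texts) != len(labels):
        raise ValueError(
            f"{len(texts)} documents but {len(labels)} labels; "
            "the files are expected to be aligned line by line"
        )
    return texts, labels


# Example call, once the archive has been extracted:
# texts, labels = load_corpus("texts.txt", "score.txt")
```

The split_<k>.pkl files are described as pandas DataFrames, so pandas.read_pickle is the natural way to open them. Their column layout is not documented here, so it is worth inspecting a frame before relying on it.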

The .zip contains all of the files above plus the TF-IDF representation in CSR matrix format.
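
For readers unfamiliar with CSR (compressed sparse row), the format keeps only the nonzero entries of a matrix in three arrays: the values, their column indices, and row pointers marking where each row starts. The sketch below illustrates the layout with a made-up 3 x 4 matrix. The helper name csr_row and the numbers are illustrative only, and how the archive serializes its matrix on disk is not specified on this page.

```python
def csr_row(data, indices, indptr, row, n_cols):
    """Expand one row of a CSR matrix into a dense list of floats."""
    dense = [0.0] * n_cols
    # Entries of `row` occupy positions indptr[row] .. indptr[row + 1] - 1.
    for k in range(indptr[row], indptr[row + 1]):
        dense[indices[k]] = data[k]
    return dense


# Toy matrix (made-up weights, not taken from the dataset):
# [[0.5, 0.0, 0.0, 0.2],
#  [0.0, 0.0, 0.0, 0.0],
#  [0.0, 0.7, 0.0, 0.0]]
data = [0.5, 0.2, 0.7]     # nonzero values, row by row
indices = [0, 3, 1]        # column index of each value
indptr = [0, 2, 2, 3]      # row r spans data[indptr[r]:indptr[r + 1]]

rows = [csr_row(data, indices, indptr, r, 4) for r in range(3)]
```

scipy.sparse.csr_matrix exposes the same three arrays as its data, indices, and indptr attributes, so a matrix loaded with SciPy can be read the same way.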

Files

agnews.zip

Files (5.3 GB)

md5:9c21ddb5ad53559db64d3951fdba2404 (5.3 GB)
md5:2adb67c10940abea245a4c49abc10be0 (255.2 kB)
md5:f66413c38b8321ea209603c5b10fdd6e (30.8 MB)

Additional details