Multi-label Datasets used in "Adapting Transformers for Multi-Label Text Classification"

doi:10.5281/zenodo.7298581

Published March 10, 2022 | Version v2

Dataset | Open access

1. Fallah
2. Bellot
3. Bruno
4. Murisasco

This record contains the three multi-label datasets used in the article "Adapting Transformers for Multi-Label Text Classification":

- AAPD Dataset (ArXiv Academic Paper Dataset) [Yang et al. 2018]¹

- Reuters-21578 Dataset: https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection

- MFHAD (Multilabel French HAL Abstracts Dataset)

¹ Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu, and Houfeng Wang. 2018. SGM: Sequence Generation Model for Multi-label Classification. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, Santa Fe, New Mexico, USA, 3915–3926.

Files (27.3 MB)

Name	MD5	Size
AAPD.zip	11b9a8782c3b017f31b932afcb2a1eeb	18.1 MB
MFHAD.zip	bd3e4f97480144f35fc3c0ae7bb58ea2	6.0 MB
Reuters-21578.zip	47e9dc181f1446e742c91d519987531e	3.3 MB
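The MD5 digests listed with each archive can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch in Python, assuming the three archives were saved under their original names in one local directory; the helper names `md5_of_file` and `verify_downloads` are illustrative and not part of the record.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "AAPD.zip": "11b9a8782c3b017f31b932afcb2a1eeb",
    "MFHAD.zip": "bd3e4f97480144f35fc3c0ae7bb58ea2",
    "Reuters-21578.zip": "47e9dc181f1446e742c91d519987531e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive name to True (match), False (mismatch) or None (missing).

    The directory layout is an assumption: archives are expected to sit
    directly inside `directory` under their original file names.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    for name, status in verify_downloads().items():
        label = {True: "OK", False: "MISMATCH", None: "not found"}[status]
        print(f"{name}: {label}")
```

A mismatch most likely indicates an interrupted or corrupted download, in which case fetching the archive again is the simplest remedy.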

	All versions	This version
Views	900	119
Downloads	317	34
Data volume	5.2 GB	439.7 MB


Resource type: Dataset
Publisher: Zenodo
Languages: English

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows redistribution and reuse of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 7, 2022
Modified: November 7, 2022
