TeCla: Text Classification Catalan dataset

Carrino, Casimiro Pio; Rodriguez-Penagos, Carlos Gerardo; Armentano-Oller, Carme

doi:10.5281/zenodo.4761505

Published March 22, 2021 | Version 1.0.1

Dataset | Open

1. BSC

If you use this resource in your work, please cite our latest paper:

@inproceedings{armengol-estape-etal-2021-multilingual,
title = "Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? {A} Comprehensive Assessment for {C}atalan",
author = "Armengol-Estap{\'e}, Jordi and
Carrino, Casimiro Pio and
Rodriguez-Penagos, Carlos and
de Gibert Bonet, Ona and
Armentano-Oller, Carme and
Gonzalez-Agirre, Aitor and
Melero, Maite and
Villegas, Marta",
booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-acl.437",
doi = "10.18653/v1/2021.findings-acl.437",
pages = "4933--4946",
}

Catalan news corpus for text classification, extracted from the website of the Agència Catalana de Notícies under a CC-BY-NC-ND licence.

TeCla is a Catalan news corpus for thematic text classification tasks. It contains 153,265 articles classified under 30 different categories.

The source data was crawled from the ACN (Catalan News Agency) site, http://www.acn.cat, and is used under the CC-BY-NC-ND 4.0 licence. The dataset is released under the same licence and is intended exclusively for training machine learning models.

This dataset was developed by BSC TeMU as part of the AINA project and is intended to form part of CLUB (Catalan Language Understanding Benchmark).

Files

Files (109.8 MB)

Name	Size	MD5
TeCla_v.1.0.1.zip	109.8 MB	46f1ebe0a6a55b90a05912ad602093f2
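
The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the local path and the helper names md5_of_file and verify_archive are illustrative assumptions, not part of the dataset release.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed on this page for TeCla_v.1.0.1.zip.
EXPECTED_MD5 = "46f1ebe0a6a55b90a05912ad602093f2"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded archive (hypothetical path).
    archive = Path("TeCla_v.1.0.1.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use low for the 109.8 MB archive. MD5 is used here only because it is the checksum the page publishes; it detects corrupted downloads but is not a security guarantee.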

	All versions	This version
Views	1,639	557
Downloads	261	98
Data volume	28.4 GB	11.0 GB
