TECA: Textual Entailment Catalan dataset
Description
If you use this resource in your work, please cite our latest paper:
@inproceedings{armengol-estape-etal-2021-multilingual,
title = "Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? {A} Comprehensive Assessment for {C}atalan",
author = "Armengol-Estap{\'e}, Jordi and
Carrino, Casimiro Pio and
Rodriguez-Penagos, Carlos and
de Gibert Bonet, Ona and
Armentano-Oller, Carme and
Gonzalez-Agirre, Aitor and
Melero, Maite and
Villegas, Marta",
booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-acl.437",
doi = "10.18653/v1/2021.findings-acl.437",
pages = "4933--4946",
}
TECA consists of two Catalan textual entailment (TE) sub-datasets, catalan_TE1 and vilaweb_TE, containing 14997 and 6166 premise-hypothesis pairs respectively. Each pair is annotated with the inference relation that holds between premise and hypothesis (entailment, contradiction, or neutral).
"Textual entailment (TE) in natural language processing is a directional relation between text fragments. The relation holds whenever the truth of one text fragment follows from another text. In the TE framework, the entailing and entailed texts are termed text (t) and hypothesis (h), respectively." From Wikipedia.
In the TECA datasets, each source sentence (the text) has three hypotheses, labeled as follows:
* "0": positive TE (Inference, text entails hypothesis)
* "1": non-TE (Neutral, text neither entails nor contradicts hypothesis)
* "2": negative TE (Contradiction, text contradicts hypothesis).
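The label scheme above can be sketched as a small lookup in Python. This is only an illustration: the names `TECA_LABELS` and `label_name`, and the field names in the example record, are assumptions and not the actual schema of the released files.

```python
# Label codes as listed in the description; the relation names are illustrative.
TECA_LABELS = {
    "0": "entailment",     # positive TE: text entails hypothesis
    "1": "neutral",        # non-TE: text neither entails nor contradicts
    "2": "contradiction",  # negative TE: text contradicts hypothesis
}


def label_name(label):
    """Return the relation name for a label given as a str or an int."""
    return TECA_LABELS[str(label)]


# Hypothetical record: the keys below are assumed, not taken from the dataset.
example = {"premise": "...", "hypothesis": "...", "label": "2"}
print(label_name(example["label"]))
```

Accepting both string and integer labels is a convenience, since the description quotes the codes as strings but loaders often parse them as integers.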
Source sentences are extracted from the Catalan Textual Corpus (https://doi.org/10.5281/zenodo.4519349), and from Vilaweb newswire.
Both sub-datasets are released under the CC BY 4.0 licence.
Files
TECA_v.1.0.2.zip (1.0 MB)
md5:b6fa4a1e5443868f4e58918460a76883
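After downloading, the archive can be checked against the md5 value listed for it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify_archive`, and the assumption that the archive sits in the current directory, are illustrative.

```python
import hashlib

# Checksum as published in the file listing for TECA_v.1.0.2.zip.
EXPECTED_MD5 = "b6fa4a1e5443868f4e58918460a76883"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected md5 digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("TECA_v.1.0.2.zip")` should return `True` for an intact download, assuming the file was saved under that name.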