There is a newer version of the record available.

Published February 9, 2021 | Version v1
Dataset | Open Access

XQuAD-ca

Description

Professional translation of XQuAD into Catalan

XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. It consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1 (Rajpurkar et al., 2016), together with their professional translations into ten languages: Spanish, German, Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Romanian was added later. We added Catalan as the 13th language of the corpus (counting the English original), again using native, professional Catalan translators.
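XQuAD reuses the SQuAD v1.1 JSON layout: a list of articles, each holding paragraphs, each with its own list of question-answer pairs. If the Catalan file follows the same layout (an assumption, since this record does not list the contents of XQuAD-ca.zip), a minimal Python sketch like the one below can confirm the paragraph and question counts given above. The function names are illustrative only.

```python
import json
from typing import Any


def count_squad_items(dataset: dict[str, Any]) -> tuple[int, int]:
    """Return (number of paragraphs, number of question-answer pairs)
    for data in the SQuAD v1.1 layout:
    {"data": [{"title": ..., "paragraphs": [{"context": ..., "qas": [...]}]}]}
    """
    n_paragraphs = 0
    n_qas = 0
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            n_paragraphs += 1
            n_qas += len(paragraph["qas"])
    return n_paragraphs, n_qas


def load_squad_json(path: str) -> dict[str, Any]:
    """Load a SQuAD-format JSON file.

    The path is hypothetical: the name of the JSON file inside
    XQuAD-ca.zip is not stated on this record.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `count_squad_items(load_squad_json("xquad.ca.json"))` (hypothetical file name) should agree with the 240 paragraphs and 1190 question-answer pairs described above if the file mirrors the original XQuAD split.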

For more information on how XQuAD was created, refer to the paper On the Cross-lingual Transferability of Monolingual Representations (https://arxiv.org/abs/1910.11856) or visit https://github.com/deepmind/xquad.

Translation into Catalan was commissioned by BSC TeMU (https://temu.bsc.es/) within the AINA project. 

Files

XQuAD-ca.zip (137.8 kB)
md5:8c0727616a378e95377b5d0cd2d80087
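Because the record publishes an MD5 checksum for XQuAD-ca.zip, a downloaded copy can be checked against it. The following is a minimal Python sketch using the standard-library `hashlib` module; the function names and the local path are illustrative assumptions, not part of any published tooling for this dataset.

```python
import hashlib

# Checksum listed for XQuAD-ca.zip on this record.
EXPECTED_MD5 = "8c0727616a378e95377b5d0cd2d80087"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("XQuAD-ca.zip")` on the downloaded archive should return `True` if the file is intact. MD5 is adequate for detecting accidental corruption in transit, though not for resisting deliberate tampering.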