Published February 29, 2024 | Version v1
Dataset Open

Multi30k_train-ca

  • 1. Barcelona Supercomputing Center

Description

The Multi30k_train-ca dataset is a professional translation of the train.en.multi30k dataset into Catalan, commissioned by the BSC LangTech Unit.

Multi30k is built on Flickr30k, a dataset for sentence-based image description. Flickr30k includes 31,000 images collected from Flickr, each with 5 reference captions provided by human annotators (https://paperswithcode.com/dataset/flickr30k).

This work was funded by the Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya within the framework of Projecte AINA.

Files

Files (2.4 MB)

Name: train.ca.multi30k.txt
Size: 2.4 MB
md5: f66e61881719ed30f95e37548935b67d
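
The md5 checksum listed above can be used to confirm that a downloaded copy of train.ca.multi30k.txt is intact before using it. The following is a minimal sketch, not an official loader: it assumes the file has already been saved locally, that it is UTF-8 encoded, and that it holds one Catalan caption per line (the layout is an assumption, not stated on this page). The helper names md5_of and load_captions are hypothetical.

```python
import hashlib

# Checksum as published in the file listing for train.ca.multi30k.txt.
EXPECTED_MD5 = "f66e61881719ed30f95e37548935b67d"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_captions(path, expected_md5=EXPECTED_MD5):
    """Verify the checksum, then return the file's lines as a list.

    Assumption: one caption per line, UTF-8 encoded. Lines are kept in
    order and none are dropped, so indices stay comparable with the
    English source file if the two are line-aligned.
    """
    actual = md5_of(path)
    if actual != expected_md5:
        raise ValueError(f"md5 mismatch: expected {expected_md5}, got {actual}")
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh]
```

A call such as load_captions("train.ca.multi30k.txt") would then either return the captions or raise ValueError if the local copy does not match the published checksum.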