Published June 8, 2020 | Version 1.0.0
Dataset Open

Wiki-MLM: Multiple Languages and Modalities

  • 1. University of Bonn
  • 2. Technical Information Library (TIB)
  • 3. Jožef Stefan Institute (JSI)

Description

International organizations and companies encounter web data in a range of modalities and languages. At the same time, the applications these users develop are built on pipelines that perform multiple tasks. Systems that handle diverse inputs and multiple objectives hold the promise of limiting complexity in the application workflow and improving generalization. We present a Wikidata-generated resource designed to train and evaluate multitask systems on samples in four modalities and three languages.

Files

data-description.txt

Files (14.0 GB)

Checksum | Size
md5:e0370476d992138e21926f664ecbcaf5 | 330 Bytes
md5:f9f2f3a63b1ffffd9e872953781b427f | 1.2 kB
md5:3db2764b0d2fda1feee893c5e198f0c6 | 1.8 kB
md5:ecf66b26ab5959feb025d499adde1fa5 | 12.7 GB
md5:a000a8f2f6c8b0b860f7e06cd6fb1d07 | 1.3 GB
md5:860ab7404e4c9069ce348c89d319a182 | 417 Bytes
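
The MD5 checksums listed for the files can be used to confirm that a download completed without corruption. The following is a minimal sketch, not part of the dataset itself: the function names are hypothetical, and because this record does not list the file names, the path in the usage comment is a placeholder. It uses only Python's standard hashlib module and reads the file in chunks so that the multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (placeholder path; substitute the file you actually downloaded):
# ok = matches_listed_checksum("path/to/downloaded_file",
#                              "md5:a000a8f2f6c8b0b860f7e06cd6fb1d07")
```

If the comparison returns False, the download is most likely incomplete or corrupted and is worth repeating before the archive is unpacked.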

Additional details

Funding

Cleopatra – Cross-lingual Event-centric Open Analytics Research Academy 812997
European Commission