There is a newer version of the record available.

Published June 8, 2020 | Version 1.0.0
Dataset Open

Wiki-MLM: Multiple Languages and Modalities

  • 1. University of Bonn
  • 2. Technical Information Library (TIB)
  • 3. Jožef Stefan Institute (JSI)

Description

International organizations and companies encounter web data in a range of modalities and languages. At the same time, the applications these users develop are built from pipelines that perform multiple tasks. Systems that handle diverse inputs and multiple objectives hold the promise of reducing complexity in the application workflow and improving generalization. We present a resource generated from Wikidata and designed to train and evaluate multitask systems on samples in four modalities and three languages.

Files

MLM_v1_eu.zip

Files (14.0 GB)

Checksum Size
md5:a000a8f2f6c8b0b860f7e06cd6fb1d07 1.3 GB
md5:a5c41775e451fdba3177be8f1c1be72c 12.7 GB
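
The MD5 checksums listed above can be used to confirm that a download arrived intact. The following is a minimal sketch in Python, using only the standard library. The function names are illustrative and not part of the dataset, and the record does not state which archive name corresponds to which checksum, so any local path passed in is an assumption on the reader's side.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    # Drop an optional 'md5:' prefix and normalize case before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, calling `matches_listed_checksum(local_path, "md5:a000a8f2f6c8b0b860f7e06cd6fb1d07")` on the downloaded 1.3 GB archive should return `True` if the file is complete, where `local_path` is wherever that archive was saved.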

Additional details

Funding

European Commission
Cleopatra – Cross-lingual Event-centric Open Analytics Research Academy (grant 812997)