Published June 30, 2021 | Version v1
Dataset Open

Automatic translation and multilingual cultural heritage retrieval: a case study with transcriptions in Europeana (dataset)

  • 1. Europeana Foundation
  • 2. INESC-ID

Description

The dataset contains all the data required to reproduce the experiments reported in the paper "Automatic translation and multilingual cultural heritage retrieval: a case study with transcriptions in Europeana", published at the 25th International Conference on Theory and Practice of Digital Libraries (TPDL'21). In that work we ran an experiment using the Europeana cultural heritage (CH) digital library as a use case, and evaluated the effectiveness of a multilingual information retrieval strategy that uses machine translation to English as the pivot language. We used the CEF translation service (eTranslation) to translate queries and content to English (https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/eTranslation).

The dataset is also available at https://rnd-2.eanadev.org/share/crosslingual-search/, and it is organized in four main folders:

  • queries: sample of 68 queries and their translations to English. The queries were issued in languages other than English from the Europeana Portal, using Europeana's 1914-1918 thematic collection, between January and August 2019.
  • transcriptions: sample of 18,257 handwriting transcriptions and their translations to English. The transcriptions are taken from the Europeana 1914-1918 thematic collection and were obtained from the Transcribathon crowdsourcing platform (https://europeana.transcribathon.eu/).
  • solr_configuration: Apache Solr search engine configuration used in the experiments (which replicates the one used in Europeana).
  • results: manual evaluation of the query translations, and automatic evaluation of the multilingual retrieval.

 

Files

Files (34.2 MB)

crosslingual-search.zip (34.2 MB)
md5:bae6701105224fddc33da9907accd27e
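
The MD5 checksum listed above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch in Python using only the standard library; the local path `crosslingual-search.zip` and the helper names `md5_of_file` and `verify_archive` are illustrative assumptions, not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 digest as published on the dataset page.
EXPECTED_MD5 = "bae6701105224fddc33da9907accd27e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded archive (hypothetical local path).
    archive = Path("crosslingual-search.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.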

Additional details

Related works

Is referenced by
Journal article: 10.1007/978-3-030-86324-1_17 (DOI)
Poster: 10.5281/zenodo.5497892 (DOI)
Report: 10.5281/zenodo.5497841 (DOI)