Automatic translation and multilingual cultural heritage retrieval: a case study with transcriptions in Europeana (dataset)
Description
The dataset contains all the data required to reproduce the experiments in the paper "Automatic translation and multilingual cultural heritage retrieval: a case study with transcriptions in Europeana", published at the 25th International Conference on Theory and Practice of Digital Libraries (TPDL'21). In that work we ran an experiment using the Europeana cultural heritage (CH) digital library as a use case, and evaluated the effectiveness of a multilingual information retrieval strategy that uses machine translation to English as a pivot language. We used the CEF translation service (eTranslation) to translate queries and content to English (https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/eTranslation).
The dataset is also available at https://rnd-2.eanadev.org/share/crosslingual-search/, and it is organized into four main folders:
- queries: a sample of 68 queries and their translations to English. The queries were issued in languages other than English from the Europeana Portal, using Europeana's 1914-1918 thematic collection, between January and August 2019.
- transcriptions: a sample of 18,257 handwriting transcriptions and their translations to English. The transcriptions are taken from the Europeana 1914-1918 thematic collection and were obtained from the Transcribathon crowdsourcing platform (https://europeana.transcribathon.eu/).
- solr_configuration: Apache Solr search engine configuration used in the experiments (which replicates the one used in Europeana).
- results: manual evaluation of the query translations, and automatic evaluation of the multilingual retrieval.
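As a quick sanity check after downloading, the four folders listed above can be looked for inside the archive. The following is a minimal sketch: the helper names `folders_in_archive` and `missing_folders` are hypothetical, and it assumes only that the folder names appear somewhere in the archive's member paths (the exact nesting inside the zip is not specified here).

```python
import zipfile

# Folder names taken from the dataset description.
EXPECTED_FOLDERS = {"queries", "transcriptions", "solr_configuration", "results"}


def folders_in_archive(zip_path):
    """Return every directory name that appears in the archive's member paths."""
    names = set()
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            parts = [part for part in member.split("/") if part]
            # For file entries, the last component is the file name, not a folder.
            if not member.endswith("/"):
                parts = parts[:-1]
            names.update(parts)
    return names


def missing_folders(zip_path, expected=EXPECTED_FOLDERS):
    """Return the expected folder names that were not found in the archive."""
    return set(expected) - folders_in_archive(zip_path)


if __name__ == "__main__":
    import os

    archive_name = "crosslingual-search.zip"  # file name as listed under Files
    if os.path.exists(archive_name):
        print("missing folders:", sorted(missing_folders(archive_name)))
    else:
        print(f"{archive_name} not found; download it first")
```

An empty result from `missing_folders` suggests the archive has the layout described above; it does not validate the contents of the folders.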
Files
Name | Size | MD5 |
---|---|---|
crosslingual-search.zip | 34.2 MB | bae6701105224fddc33da9907accd27e |
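The MD5 checksum published for crosslingual-search.zip can be used to confirm that a download is intact. The sketch below uses only the Python standard library; the function names `md5_of_file` and `verify_archive` are hypothetical, and it assumes the archive has been saved in the current directory under its listed name.

```python
import hashlib
from pathlib import Path

# Checksum as published on this record for crosslingual-search.zip.
EXPECTED_MD5 = "bae6701105224fddc33da9907accd27e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("crosslingual-search.zip")  # assumed local file name
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.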
Additional details
Related works
- Is referenced by
- Journal article: 10.1007/978-3-030-86324-1_17 (DOI)
- Poster: 10.5281/zenodo.5497892 (DOI)
- Report: 10.5281/zenodo.5497841 (DOI)