D4.4 Report on Cross-Lingual Content Retrieval Based on Automatic Translation

Sulubacak, Umut; Koponen, Maarit; Laaksonen, Jorma; van Rijsselbergen, Dieter; Tiedemann, Jörg

doi:10.5281/zenodo.4639513

Published March 24, 2021 | Version v1

Project deliverable Open

1. University of Helsinki
2. Aalto University
3. Limecraft

In this deliverable, we report on our automatic content retrieval experiments and their implications for improving the discoverability of archive content. The focus is on cross-lingual retrieval, but we also cover our additional cross-modal retrieval tests.

First, we introduce the methods we used to simulate a realistic mixed-language media archive using the raw data from a publicly available collection of annotated images. We discuss the ways in which automatic content retrieval on this archive parallels or diverges from content search in the MeMAD prototype platform (Limecraft Flow), to clarify the extent to which they overlap. Afterwards, we describe how we further processed the data, drawing on our expertise in machine translation and image processing, in order to enrich the archive and improve content retrieval performance. Next, we present our experimental findings from using textual metadata translations and automatically generated image captions to expand the metadata, as well as our tests of retrieval that goes beyond simple textual search queries. Our findings unequivocally validate the utility of metadata translations for cross-lingual content retrieval, and point to further avenues for cross-modal and multimodal retrieval methods. We describe these findings in detail alongside the empirical scores we obtained from our own evaluations, and conclude the report with our general impressions and the lessons we learned from this study.

Files

Name	Size
D4.4-Report on Cross-Lingual Content Retrieval Based on Automatic Translation.pdf md5:2468b43c68da4ac2fb066b4a4c5419b3	5.9 MB

Additional details

European Commission
MeMAD - Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy 780069
