Performance of Zero-Shot Cross-Lingual Retrieval Models on Adversarial Benchmarks
Description
Transferring information retrieval (IR) models from a high-resource language (typically English) to other languages in a zero-shot fashion has become a widely adopted approach. In this work, we show that the effectiveness of zero-shot rankers diminishes when queries and documents are present in different languages. Motivated by this, we propose to train ranking models on artificially code-switched data instead, which we generate by utilizing bilingual lexicons. To this end, we experiment with lexicons induced from (1) cross-lingual word embeddings and (2) parallel Wikipedia page titles. We use
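The lexicon-based code-switching step described above can be illustrated with a minimal Python sketch. It assumes a bilingual lexicon stored as a dictionary that maps lowercase source-language words to lists of target-language translations, plus a per-token replacement probability `p`. The function names `lexicon_from_title_pairs` and `code_switch`, the single-token filter applied to title pairs, and the sampling scheme are all illustrative assumptions, not the authors' implementation.

```python
import random


def lexicon_from_title_pairs(pairs):
    """Build a toy bilingual lexicon from (source_title, target_title) pairs.

    Hypothetical helper: only single-token titles on both sides are kept,
    which is an assumed filter, not necessarily the one used in the paper.
    """
    lexicon = {}
    for src, tgt in pairs:
        if len(src.split()) == 1 and len(tgt.split()) == 1:
            entry = lexicon.setdefault(src.lower(), [])
            if tgt not in entry:
                entry.append(tgt)
    return lexicon


def code_switch(tokens, lexicon, p=0.5, seed=None):
    """Replace each token that has a lexicon entry with a translation, with probability p.

    Tokens without an entry, and tokens not selected by the coin flip, are kept as-is.
    """
    rng = random.Random(seed)  # seeded RNG so a given corpus is switched reproducibly
    out = []
    for tok in tokens:
        candidates = lexicon.get(tok.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))  # pick one translation if several exist
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    lex = lexicon_from_title_pairs([("Cat", "Katze"), ("House", "Haus"), ("New York", "New York")])
    print(code_switch(["the", "cat", "left", "the", "house"], lex, p=1.0))
```

Applied to the English queries, the documents, or both in a training set, `code_switch` yields mixed-language training pairs. A lexicon induced from cross-lingual word embeddings (for example, nearest-neighbour translations) could be plugged in using the same dictionary format.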
Research goal: How do zero-shot cross-lingual retrieval models trained on code-switched data perform on adversarial cross-lingual benchmarks (e.g., Cross-Lingual Adversarial NLI) compared with models trained on monolingual or parallel data, as measured by robustness metrics such as accuracy under adversarial perturbations?
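As one hedged reading of "accuracy under adversarial perturbations", the Python sketch below compares accuracy on clean inputs with accuracy on their perturbed counterparts and reports the absolute and relative drop. It assumes that each adversarial example keeps the gold label of the clean example it was derived from. The function names `accuracy` and `robustness_report` are hypothetical, and the source does not specify the exact metric.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def robustness_report(clean_preds, adv_preds, labels):
    """Compare clean vs. adversarial accuracy on the same labelled items (assumed pairing)."""
    clean = accuracy(clean_preds, labels)
    adv = accuracy(adv_preds, labels)
    drop = clean - adv
    # Relative drop is undefined when clean accuracy is zero; report 0.0 in that case.
    rel = drop / clean if clean > 0 else 0.0
    return {"clean_acc": clean, "adv_acc": adv, "abs_drop": drop, "rel_drop": rel}


if __name__ == "__main__":
    labels = [1, 0, 1, 1]
    print(robustness_report([1, 0, 1, 1], [1, 1, 0, 1], labels))
```

Reporting the relative drop alongside raw adversarial accuracy makes models with different clean accuracy easier to compare.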
Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 8.5/10.
Files
| Name | Size | MD5 checksum |
|---|---|---|
| paper.pdf | 88.5 kB | 2b60cb86b7d8b126793ca99692f0b970 |