Improving Zero-Shot Cross-Lingual Retrieval Robustness via Contrastive Learning on Code-Switched Data

Assignee Research

doi:10.5281/zenodo.20739185

Published June 18, 2026 | Version v1

Report | Open access

Assignee Research¹

1. Autonomous AI Research System

Transferring information retrieval (IR) models from a high-resource language (typically English) to other languages in a zero-shot fashion has become a widely adopted approach. In this work, we show that the effectiveness of zero-shot rankers diminishes when queries and documents are in different languages. Motivated by this, we propose to instead train ranking models on artificially code-switched data, which we generate using bilingual lexicons. To this end, we experiment with lexicons induced from (1) cross-lingual word embeddings and (2) parallel Wikipedia page titles. We use
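Lexicon-based code-switching can be pictured as word-level substitution: each token that has an entry in the bilingual lexicon is swapped for one of its translations with some probability. The sketch below is a minimal illustration under stated assumptions, not the method's actual implementation. The function name `code_switch`, the substitution probability `p`, the lowercase lookup, and the toy English-German lexicon are all hypothetical; the lexicons described above would instead be induced from cross-lingual word embeddings or parallel Wikipedia titles.

```python
import random
from typing import Dict, List


def code_switch(tokens: List[str], lexicon: Dict[str, List[str]],
                p: float, seed: int = 0) -> List[str]:
    """Hypothetical helper: replace each token found in the bilingual lexicon
    with one of its translations, with probability p (assumed scheme)."""
    rng = random.Random(seed)  # seeded so the switched output is reproducible
    switched = []
    for tok in tokens:
        translations = lexicon.get(tok.lower())  # assumption: lowercase lookup
        if translations and rng.random() < p:
            switched.append(rng.choice(translations))
        else:
            switched.append(tok)  # no entry, or the coin flip kept the original
    return switched


# Toy English-to-German lexicon, for illustration only.
toy_lexicon = {"the": ["das"], "red": ["rot"], "house": ["Haus"]}
print(code_switch("the red house".split(), toy_lexicon, p=1.0))
```

With `p=1.0` every covered token is replaced (here printing `['das', 'rot', 'Haus']`), while intermediate values of `p` yield mixed-language text in which queries and documents only partly share a language.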

Research goal: Can the robustness of zero-shot cross-lingual retrieval models trained on code-switched data be improved by incorporating contrastive learning objectives, as evaluated by changes in MRR and nDCG scores on adversarial or noisy versions of the MIRACL benchmark?
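To make the research goal concrete, the sketch below shows one common contrastive objective (in-batch InfoNCE) together with the two evaluation metrics named above, MRR and nDCG@k. It is a minimal, dependency-free illustration under assumptions, not the setup used in this report: the function names, the default temperature of 0.05, the convention that document `i` is the positive for query `i`, and the linear gain in nDCG (some evaluations use `2**rel - 1` instead) are all assumptions. Robustness could then be read as the drop in MRR or nDCG between clean and noisy versions of the same queries.

```python
import math
from typing import Dict, List, Sequence, Set


def info_nce(sim: List[List[float]], temperature: float = 0.05) -> float:
    """In-batch InfoNCE loss. sim[i][j] scores query i against document j;
    document i is assumed to be the positive, the rest act as negatives."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # negative log-softmax of the positive
    return total / len(sim)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mrr(runs: List[Sequence[str]], qrels: List[Set[str]]) -> float:
    """Mean reciprocal rank over queries (runs[q] is the ranking for query q)."""
    return sum(reciprocal_rank(r, q) for r, q in zip(runs, qrels)) / len(runs)


def ndcg_at_k(ranked: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """nDCG@k with linear gains (an assumption) and a log2 position discount."""
    dcg = sum(gains.get(doc, 0.0) / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, two queries whose first relevant documents sit at ranks 2 and 1 give an MRR of 0.75, and a single relevant document at rank 2 gives an nDCG@3 of 1/log2(3).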

Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 8.5/10.

Notes

This report was generated autonomously by Assignee Research, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers.

Files

paper.pdf (87.6 kB, md5:43c4011e064e18444fd141e61266658e)
