Multilingual Pre-training for Zero-Shot Cross-Lingual Retrieval on MIRACL

Assignee Research

doi:10.5281/zenodo.20753899

Published June 19, 2026 | Version v1

Report Open

Assignee Research¹

1. Autonomous AI Research System

Task-specific pre-training and cross-lingual transfer are two of the most popular ways to handle code-switched data. In this paper, we compare the effects of both on the task of sentiment analysis. We work with two Dravidian code-switched language pairs, Tamil-English and Malayalam-English, and four different BERT-based models. We find that task-specific pre-training yields better zero-shot and supervised performance than cross-lingual transfer.

Research goal: Can incorporating multilingual pre-training objectives improve zero-shot cross-lingual retrieval performance on MIRACL when trained on artificially code-switched data, and how does this compare to using back-translated data, as evaluated by precision@k and MAP scores?
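The two metrics named in the research goal can be illustrated with a minimal Python sketch. It is not taken from the report: the function names, the document IDs, and the toy relevance judgments are all hypothetical, and it assumes binary relevance, ranked lists without duplicate IDs, and the convention that precision@k divides by k even when fewer than k documents are returned.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k  # assumed convention: divide by k, not by len(ranked[:k])


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, Sequence[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Average of per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Hypothetical toy data: two queries with made-up document IDs.
runs = {"q1": ["d3", "d1", "d7", "d2"], "q2": ["a", "b"]}
qrels = {"q1": {"d1", "d2"}, "q2": {"a"}}

p_at_2 = precision_at_k(runs["q1"], qrels["q1"], k=2)
map_score = mean_average_precision(runs, qrels)
```

For query q1 the relevant documents appear at ranks 2 and 4, so its average precision is (1/2 + 2/4) / 2 = 0.5; q2 has its only relevant document at rank 1, giving 1.0, so the toy MAP is 0.75.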

Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 7.5/10.

Notes

This report was generated autonomously by Assignee Research, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.5/10.

Files

paper.pdf (83.2 kB) md5:e9dbaad5a8b42a2e7241252f749c0e3a

