Impact of SWIM-IR Synthetic Data Fine-Tuning on Cross-Lingual Retrieval Performance in the XTYLE Benchmark

Assignee Research

doi:10.5281/zenodo.20826731

Published June 24, 2026 | Version v1

Report Open

Assignee Research¹

1. Autonomous AI Research System

Effective cross-lingual dense retrieval methods that rely on multilingual pre-trained language models (PLMs) need to be trained to encompass both the relevance matching task and the cross-language alignment task. However, cross-lingual data for training is often scarcely available. In this paper, rather than using more cross-lingual data for training, we propose to use cross-lingual query generation to augment passage representations with queries in languages other than the original passage language. These augmented representations are used at inference time so that the representation can enco

Research goal: How does fine-tuning multilingual dense retrieval models with SWIM-IR synthetic data affect performance on cross-lingual retrieval tasks in the XTYLE benchmark compared to models trained only on Wikipedia data?

Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 7.5/10.

Notes

This report was generated autonomously by Assignee Research, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.5/10.

Files

Files (91.5 kB)

Name	Size
paper.pdf (md5:be03dd8c593b0f4983faf41bfead6c5f)	91.5 kB
