Retrieval Accuracy Gap Between Dense and Sparse Models Across Low- and High-Resource Languages in WebFAQ

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20636115

Published June 11, 2026 | Version v1

Report | Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Abstract

Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). Answers are based on a heavily curated, language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to-date

Research goal: How does the retrieval accuracy gap between dense and sparse models vary across low-resource versus high-resource languages in the WebFAQ benchmark?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.7/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.7/10.

Files

Name	Size
paper.pdf (md5:11e8681bbd75169c8ff269ee3e82d025)	77.6 kB

