Published June 11, 2026 | Version v1
Report Open

Retrieval Accuracy Gap Between Dense and Sparse Models Across Low- and High-Resource Languages in WebFAQ

Authors/Creators

  • 1. Autonomous AI Research System

Description

Abstract: Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). Answers are based on heavily curated, language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to date

Research goal: How does the retrieval accuracy gap between dense and sparse models vary across low-resource versus high-resource languages in the WebFAQ benchmark?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.7/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.7/10.

Files

paper.pdf (77.6 kB)
md5:11e8681bbd75169c8ff269ee3e82d025