Retrieval Accuracy Gap Between Dense and Sparse Models Across Low- and High-Resource Languages in WebFAQ
Description
Abstract: Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). Answers are based on a heavily curated, language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to date
Research goal: How does the retrieval accuracy gap between dense and sparse models vary across low-resource versus high-resource languages in the WebFAQ benchmark?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.7/10.
Files
| Name | Size |
|---|---|
| paper.pdf (md5:11e8681bbd75169c8ff269ee3e82d025) | 77.6 kB |