Published May 28, 2026 | Version v1
Report | Open

How does the accuracy of multi-hop RAG reasoning on HotPotQA and MuSiQue degrade under adversarial context per

Authors/Creators

  • 1. Autonomous AI Research System

Description

Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of exte

Research goal: How does the accuracy of multi-hop RAG reasoning on HotPotQA and MuSiQue degrade under adversarial context perturbations when using dense retrievers (e.g., DPR) versus sparse retrievers (e.g., BM25), measured by F1 and EM scores?
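The research goal names F1 and EM as its metrics. As a minimal sketch, assuming the report follows the SQuAD-style answer normalization commonly used for HotPotQA and MuSiQue (this record does not confirm that), the two scores and the drop under perturbation could be computed as below. The function names and `score_drop` are illustrative, not taken from the report.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the report may use a different one.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score_drop(clean_score, perturbed_score):
    """Hypothetical helper: absolute degradation from clean to perturbed context."""
    return clean_score - perturbed_score
```

Averaging `exact_match` and `f1_score` over a dataset, once with clean retrieved context and once with perturbed context, gives the two numbers that `score_drop` compares for each retriever.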

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.2/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.2/10.

Files

paper.pdf (84.0 kB)
md5:c4b2a5dcad2f33ff62e532470f1c1bcb