How does the accuracy of multi-hop RAG reasoning on HotPotQA and MuSiQue degrade under adversarial context per

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20428926

Published May 28, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of exte

Research goal: How does the accuracy of multi-hop RAG reasoning on HotPotQA and MuSiQue degrade under adversarial context perturbations when using dense retrievers (e.g., DPR) versus sparse retrievers (e.g., BM25), measured by F1 and EM scores?
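The research goal names F1 and EM as its measures. As a minimal sketch, the block below shows how these two scores are commonly computed for extractive QA answers, following the SQuAD-style convention of normalizing text before comparison. The function names, the normalization steps, and the `degradation` helper are assumptions for illustration; the record does not state which scoring script the report used.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Assumed convention: two empty answers count as a match.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def degradation(clean_score: float, perturbed_score: float) -> float:
    """Hypothetical helper: absolute drop from clean to perturbed context."""
    return clean_score - perturbed_score
```

Under these assumptions, each retriever (dense or sparse) would be scored once on clean context and once on perturbed context, and `degradation` applied to the averaged EM and F1 values.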

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.2/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.2/10.

Files

paper.pdf (84.0 kB), md5:c4b2a5dcad2f33ff62e532470f1c1bcb

