How does the accuracy of multi-hop RAG reasoning on HotPotQA and MuSiQue degrade under adversarial context per
Description
Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of exte
Research goal: How does the accuracy of multi-hop RAG reasoning on HotPotQA and MuSiQue degrade under adversarial context perturbations when using dense retrievers (e.g., DPR) versus sparse retrievers (e.g., BM25), measured by F1 and EM scores?
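The research goal names F1 and EM as its metrics but does not define them. The following is a minimal sketch of how these two scores are commonly computed for extractive QA, assuming SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles). The function names and the `degradation` helper are illustrative assumptions, not taken from the report, and the official HotPotQA and MuSiQue evaluation scripts may differ in details.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def degradation(clean_score: float, perturbed_score: float) -> float:
    """Hypothetical helper: absolute score drop from clean to perturbed context."""
    return clean_score - perturbed_score
```

Under these assumptions, each metric would be averaged over all questions twice, once with clean retrieved context and once with perturbed context, separately for each retriever. The difference between the two averages gives the degradation for that retriever.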
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.2/10.
Notes
Files
| Name | Size | Checksum |
|---|---|---|
| paper.pdf | 84.0 kB | md5:c4b2a5dcad2f33ff62e532470f1c1bcb |