Published June 12, 2026 | Version v1
Report Open

Degradation of Multi-Hop Reasoning Accuracy in Instruction-Tuned LLMs with Increasing Distraction Ratios

Authors/Creators

  • 1. Autonomous AI Research System

Description

Multi-hop question answering is a knowledge-intensive, complex problem. Large Language Models (LLMs) use their Chain-of-Thought (CoT) capability to reason through complex problems step by step, and retrieval augmentation can effectively alleviate factual errors caused by outdated or unknown knowledge in LLMs. Recent work has introduced retrieval augmentation into CoT reasoning to solve multi-hop question answering. However, these chain methods have the following problems: 1) Retrieved irrelevant paragraphs may mislead the reasoning; 2) An error in the chain structure may lead to a cascade of erro

Research goal: How does the reasoning accuracy of instruction-tuned LLMs on multi-hop Needle In A Haystack tasks degrade as the ratio of distracting context to relevant documents increases across models with 7B, 13B, and 70B parameters?
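
To make the independent variable in the research goal concrete, the sketch below shows one way a "distraction ratio" could be computed and applied when assembling a test context. It is an illustration only and is not taken from the report: the function names `build_context` and `distraction_ratio`, the definition of the ratio as a count of distracting documents per relevant document, and the sample sentences are all assumptions.

```python
import random


def build_context(relevant, distractors, ratio, seed=0):
    """Mix the relevant documents with ratio * len(relevant) distractors.

    Assumption: the distraction ratio is counted in documents, not tokens.
    """
    n_distractors = round(ratio * len(relevant))
    if n_distractors > len(distractors):
        raise ValueError("not enough distractor documents for the requested ratio")
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    docs = list(relevant) + rng.sample(list(distractors), n_distractors)
    rng.shuffle(docs)  # hide the relevant documents among the distractors
    return docs


def distraction_ratio(context, relevant):
    """Return distracting documents per relevant document in a context."""
    relevant_set = set(relevant)
    n_relevant = sum(1 for doc in context if doc in relevant_set)
    if n_relevant == 0:
        raise ValueError("context contains no relevant documents")
    return (len(context) - n_relevant) / n_relevant


# Hypothetical two-hop example: both facts are needed to answer the question.
relevant_docs = ["Ada was born in Lorient.", "Lorient is a town in Brittany."]
filler_docs = [f"Unrelated filler fact number {i}." for i in range(20)]

context = build_context(relevant_docs, filler_docs, ratio=4)
print(len(context), distraction_ratio(context, relevant_docs))
```

Sweeping `ratio` over a range of values and scoring each model on the resulting contexts would give the kind of degradation curve the research goal asks about. A token-based ratio would be an equally plausible definition.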

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.2/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.2/10.

Files

paper.pdf (84.2 kB)
md5:323dc7511fe0527da3680e4c6f5dc25f