Degradation of Multi-Hop Reasoning Accuracy in Instruction-Tuned LLMs with Increasing Distraction Ratios

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20658485

Published June 12, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Multi-hop question answering is a complex, knowledge-intensive problem. Large language models (LLMs) use chain-of-thought (CoT) reasoning to work through complex problems step by step, and retrieval augmentation can alleviate factual errors caused by outdated or unknown knowledge in LLMs. Recent work has introduced retrieval augmentation into CoT reasoning to solve multi-hop question answering. However, these chain methods have the following problems: 1) retrieved irrelevant paragraphs may mislead the reasoning; 2) an error in the chain structure may lead to a cascade of erro

Research goal: How does the reasoning accuracy of instruction-tuned LLMs on multi-hop Needle In A Haystack tasks degrade as the ratio of distracting context to relevant documents increases across models with 7B, 13B, and 70B parameters?
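The independent variable in this research goal, the ratio of distracting context to relevant documents, can be made concrete with a small sketch. The code below is illustrative only and is not taken from the report: the function names `build_context` and `distraction_ratio`, the rounding rule, and the fixed seed are all assumptions made here for the example.

```python
import random


def build_context(relevant, distractor_pool, ratio, seed=0):
    """Mix the relevant documents with round(ratio * len(relevant)) distractors.

    Hypothetical helper: the report does not specify how contexts are built.
    """
    n_distractors = round(ratio * len(relevant))
    if n_distractors > len(distractor_pool):
        raise ValueError("not enough distractors for the requested ratio")
    rng = random.Random(seed)  # fixed seed so the shuffle is reproducible
    docs = list(relevant) + rng.sample(distractor_pool, n_distractors)
    rng.shuffle(docs)  # scatter the relevant documents among the distractors
    return docs


def distraction_ratio(context, relevant):
    """Return (# distracting documents) / (# relevant documents) in a context."""
    relevant_set = set(relevant)
    n_relevant = sum(1 for doc in context if doc in relevant_set)
    if n_relevant == 0:
        raise ValueError("context contains no relevant documents")
    return (len(context) - n_relevant) / n_relevant


relevant_docs = ["hop 1 fact", "hop 2 fact"]
pool = [f"distractor {i}" for i in range(10)]
context = build_context(relevant_docs, pool, ratio=4)
```

Under these assumptions, accuracy for each model size would be measured on contexts built at several values of `ratio`, and the degradation curve is accuracy plotted against that ratio.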

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.2/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.2/10.

Files (84.2 kB)

Name	Size
paper.pdf (md5:323dc7511fe0527da3680e4c6f5dc25f)	84.2 kB

