How does the accuracy of Tree of Reviews on MuSiQue at 128K context degrade when the number of distractor pass

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20437593

Published May 29, 2026 | Version v1

Report | Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Long-context capability is considered one of the most important abilities of LLMs, as a truly long-context-capable LLM should let its users effortlessly handle many otherwise exhausting tasks, e.g., digesting a long-form document to find answers vs. directly asking an LLM about it. However, existing real-task-based long-context evaluation benchmarks have a few major shortcomings. For instance, some Needle-in-a-Haystack-like benchmarks are too synthetic and therefore do not represent the real-world usage of LLMs. While some real-task-based benchmarks like Long-Bench avoid this problem, su

Research goal: How does the accuracy of Tree of Reviews on MuSiQue at 128K context degrade when the number of distractor passages is increased from 5 to 20, relative to chain-based retrieval, using Llama-3-128K?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.0/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.0/10.

Files

paper.pdf (85.2 kB) md5:4c4722aa3d911a98ec63319133605afe