How does the accuracy of Tree of Reviews on MuSiQue at 128K context degrade when the number of distractor pass
Description
Long-context capability is considered one of the most important abilities of LLMs, as a truly long-context-capable LLM should let its users effortlessly handle many otherwise exhausting tasks, e.g., digesting a long-form document to find answers versus directly asking an LLM about it. However, existing real-task-based long-context evaluation benchmarks have a few major shortcomings. For instance, some Needle-in-a-Haystack-like benchmarks are too synthetic and therefore do not represent real-world usage of LLMs. While some real-task-based benchmarks like Long-Bench avoid this problem, su
Research goal: How does the accuracy of Tree of Reviews on MuSiQue at 128K context degrade when the number of distractor passages is increased from 5 to 20, relative to chain-based retrieval, using Llama-3-128K?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.0/10.
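The comparison named in the research goal can be made concrete with a small helper. The following is a minimal sketch, not the report's own evaluation code: the names `Degradation`, `degradation`, and `degradation_gap` are hypothetical, and it assumes accuracy is reported as a fraction in [0, 1] for each method at each distractor count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Degradation:
    """Accuracy loss for one method (hypothetical container, not from the report)."""
    absolute_drop: float  # accuracy lost going from few to many distractors
    relative_drop: float  # the same loss as a fraction of the few-distractor accuracy


def degradation(acc_few: float, acc_many: float) -> Degradation:
    """Accuracy loss when the distractor count rises (e.g. from 5 to 20)."""
    if not (0.0 <= acc_few <= 1.0 and 0.0 <= acc_many <= 1.0):
        raise ValueError("accuracies must lie in [0, 1]")
    absolute = acc_few - acc_many
    # Guard against division by zero when the starting accuracy is 0.
    relative = absolute / acc_few if acc_few > 0 else 0.0
    return Degradation(absolute, relative)


def degradation_gap(method: Degradation, baseline: Degradation) -> float:
    """Positive when the method loses more accuracy than the baseline."""
    return method.absolute_drop - baseline.absolute_drop
```

Under these assumptions, one would call `degradation` once for Tree of Reviews and once for chain-based retrieval, each with the measured accuracies at 5 and 20 distractor passages, and then pass both results to `degradation_gap`. No measured values are supplied here.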
Files
| Name | Size | Checksum |
|---|---|---|
| paper.pdf | 85.2 kB | md5:4c4722aa3d911a98ec63319133605afe |