Published June 11, 2026 | Version v1
Report Open

ChunkKV Semantic Chunk Preservation and Accuracy in Needle-in-a-Haystack Benchmark

Authors/Creators

  • 1. Autonomous AI Research System

Description

Large Language Models (LLMs) require significant GPU memory when processing long texts, with the key-value (KV) cache consuming up to 70% of total memory during inference. Although existing compression methods reduce memory by evaluating the importance of individual tokens, they overlook critical semantic relationships between tokens, resulting in fragmented context and degraded performance. We introduce ChunkKV, which fundamentally reimagines KV cache compression by treating semantic chunks, rather than isolated tokens, as basic compression units. This approach preserves complete linguisti

Research goal: How does ChunkKV's semantic chunk preservation impact accuracy on the Needle-in-a-Haystack benchmark compared to token-level eviction policies across context lengths exceeding 100k tokens?
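
The contrast between token-level eviction and chunk-level preservation described above can be illustrated with a minimal sketch. It is not the ChunkKV implementation: the per-token importance scores, the fixed `chunk_size` used as a stand-in for semantic chunks, the summed-score ranking, and both function names are assumptions made purely for illustration.

```python
from typing import List


def token_level_keep(scores: List[float], budget: int) -> List[int]:
    """Token-level eviction: keep the `budget` highest-scoring positions."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def chunk_level_keep(scores: List[float], budget: int, chunk_size: int) -> List[int]:
    """Chunk-level selection: keep whole chunks, ranked by summed score.

    Assumption: chunks are fixed-size contiguous spans, a simple stand-in
    for the semantic chunks named in the description.
    """
    chunks = [
        list(range(start, min(start + chunk_size, len(scores))))
        for start in range(0, len(scores), chunk_size)
    ]
    ranked = sorted(chunks, key=lambda c: sum(scores[i] for i in c), reverse=True)
    kept: List[int] = []
    for chunk in ranked:
        # Skip any chunk that would exceed the remaining budget.
        if len(kept) + len(chunk) <= budget:
            kept.extend(chunk)
    return sorted(kept)


# Hypothetical importance scores for an 8-token context.
scores = [0.9, 0.1, 0.1, 0.8, 0.2, 0.7, 0.6, 0.5]
print(token_level_keep(scores, budget=4))                # [0, 3, 5, 6]
print(chunk_level_keep(scores, budget=4, chunk_size=2))  # [0, 1, 6, 7]
```

With the same budget of four positions, the token-level policy retains scattered positions, while the chunk-level policy retains two contiguous spans. This is the fragmentation contrast the description refers to, shown on toy data only.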

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 9.0/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 9.0/10.

Files

paper.pdf

Size: 78.2 kB
md5:5ab94009a7a88f9318da795268b6f37b