ChunkKV Semantic Chunk Preservation and Accuracy in Needle-in-a-Haystack Benchmark

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20636486

Published June 11, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Large Language Models (LLMs) require significant GPU memory when processing long texts, with the key-value (KV) cache consuming up to 70% of total memory during inference. Although existing compression methods reduce memory by evaluating the importance of individual tokens, they overlook critical semantic relationships between tokens, resulting in fragmented context and degraded performance. We introduce ChunkKV, which fundamentally reimagines KV cache compression by treating semantic chunks, rather than isolated tokens, as basic compression units. This approach preserves complete linguisti

Research goal: How does ChunkKV's semantic chunk preservation impact accuracy on the Needle-in-a-Haystack benchmark compared to token-level eviction policies across context lengths exceeding 100k tokens?
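The contrast in the research goal, between token-level eviction and chunk-level preservation, can be illustrated with a minimal sketch. This is not the ChunkKV implementation: the importance scores are made-up values, fixed-size windows stand in for semantic chunks, and the function names `token_level_keep` and `chunk_level_keep` are hypothetical.

```python
from typing import List


def token_level_keep(scores: List[float], budget: int) -> List[int]:
    """Keep the `budget` highest-scoring token positions, each judged on its own."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def chunk_level_keep(scores: List[float], budget: int, chunk_size: int) -> List[int]:
    """Keep whole chunks, ranked by summed score, while they fit in the budget.

    Assumption: fixed-size windows approximate "semantic chunks" here.
    """
    chunks = [
        list(range(start, min(start + chunk_size, len(scores))))
        for start in range(0, len(scores), chunk_size)
    ]
    ranked = sorted(chunks, key=lambda c: sum(scores[i] for i in c), reverse=True)
    kept: List[int] = []
    for chunk in ranked:
        if len(kept) + len(chunk) > budget:
            continue  # skip chunks that would overflow the budget
        kept.extend(chunk)
    return sorted(kept)


# Hypothetical per-token importance scores for a 12-token context.
scores = [0.9, 0.1, 0.1, 0.8, 0.2, 0.7, 0.6, 0.5, 0.1, 0.1, 0.9, 0.1]

token_kept = token_level_keep(scores, budget=4)
chunk_kept = chunk_level_keep(scores, budget=4, chunk_size=4)

print("token-level keeps:", token_kept)  # scattered positions
print("chunk-level keeps:", chunk_kept)  # one contiguous span
```

With the same budget of four cache entries, the token-level policy retains isolated positions from across the context, while the chunk-level policy retains one contiguous span. In a needle-in-a-haystack setting, the concern raised in the abstract is that scattered retention can split the tokens of the needle sentence.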

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 9.0/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 9.0/10.

Files

paper.pdf (78.2 kB), md5:5ab94009a7a88f9318da795268b6f37b

