Think Less, Store Smarter: A Theoretical Framework for Type-Aware KV Cache Quantization in Large Reasoning Models
Authors/Creators
Description
This paper introduces the Think-Answer Quantization Gap (TAQG), a theoretical framework proving that uniform KV cache quantization is provably suboptimal for large reasoning models whenever think-phase and answer-phase tokens differ in pairwise cosine redundancy. The framework is direction-agnostic: it prescribes fewer bits for whichever phase exhibits higher redundancy. Empirical validation on DeepSeek-R1-Distill-Qwen-1.5B reveals a surprising model-size-dependent redundancy reversal, in which answer-phase tokens exhibit higher redundancy than think-phase tokens, the opposite of findings on the full 671B model. Code and experimental data are included.
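The direction-agnostic rule described above can be sketched in a few lines: measure each phase's redundancy as the mean pairwise cosine similarity of its cached vectors, then assign fewer bits to the more redundant phase. This is an illustrative sketch only, not the paper's released code; the function names, the 2-bit/4-bit budgets, and the toy data are all assumptions made for the example.

```python
import numpy as np

def pairwise_cosine_redundancy(vectors: np.ndarray) -> float:
    """Mean pairwise cosine similarity among one phase's cached vectors
    (hypothetical proxy for the paper's redundancy measure)."""
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(vectors)
    # Average over off-diagonal pairs only (exclude each vector with itself).
    return float((sims.sum() - n) / (n * (n - 1)))

def allocate_bits(think_vecs, answer_vecs, low_bits=2, high_bits=4):
    """Direction-agnostic rule: the higher-redundancy phase gets fewer bits."""
    r_think = pairwise_cosine_redundancy(think_vecs)
    r_answer = pairwise_cosine_redundancy(answer_vecs)
    if r_think > r_answer:
        return {"think": low_bits, "answer": high_bits}
    return {"think": high_bits, "answer": low_bits}

# Toy demo: spread-out think vectors vs. tightly clustered answer vectors,
# mimicking the redundancy reversal reported for the 1.5B distilled model.
rng = np.random.default_rng(0)
think = rng.normal(size=(32, 64))                                    # near-isotropic
answer = rng.normal(size=(1, 64)) + 0.1 * rng.normal(size=(32, 64))  # clustered
print(allocate_bits(think, answer))
```

On the toy data the clustered answer-phase vectors score higher redundancy, so they receive the low bit budget, matching the direction-agnostic prescription.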
Files
(556.5 kB)

| Name | Size | MD5 |
|---|---|---|
| taqg_paper.pdf | 289.7 kB | dacadcd1a809bb823395225539b63c34 |
| | 266.8 kB | 4f85b7343c747ba1533cbbd2e9fcf140 |