Published April 27, 2026 | Version v1
Preprint Open

CrystalCache: Cross-Domain Transfer from Cognitive Memory Crystallization to KV Cache Eviction in Long-Context LLMs

Description

Abstract

  The Key–Value (KV) cache of long-context Large Language Models (LLMs) grows linearly with context length and is now the dominant memory bottleneck of long-context inference; at 128K tokens a single batch of bf16 KV for Llama-3-8B
  already exceeds the model weights themselves. Existing eviction methods fall into two generations. The first
  generation (H2O, SnapKV, StreamingLLM, Scissorhands) summarises each token by a single scalar and evicts at token
  granularity, producing "coverage holes" over semantically coherent passages. The second generation (ChunkKV, EpiCache,
   CAOTE, DefensiveKV, PyramidKV) each advances along a single axis — fixed-size grouping, signal fusion, or robust
  aggregation of repeated observations — but none simultaneously satisfies the four structural requirements of dynamic
  semantic boundaries, two independent scoring dimensions, an explicit rarity signal, and progressive (rather than
  binary) retention.
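The opening claim about KV memory at 128K tokens can be checked with back-of-the-envelope arithmetic. The sketch below assumes the published Llama-3-8B configuration (32 layers, 8 grouped-query KV heads, head dimension 128) and bf16 (2 bytes per value); the function name and batch handling are illustrative, not from the paper.

```python
def kv_cache_bytes(seq_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, dtype_bytes: int = 2, batch: int = 1) -> int:
    # Each token stores one key and one value vector per layer per KV head,
    # hence the factor of 2 for K and V.
    return 2 * batch * seq_len * n_layers * n_kv_heads * head_dim * dtype_bytes

gib = kv_cache_bytes(128 * 1024) / 2**30  # 16.0 GiB at batch 1
```

At batch 1 this already matches the roughly 16 GB of bf16 model weights, and it scales linearly with batch size and context length, which is the bottleneck the abstract describes.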

  We propose CrystalCache, a KV-cache eviction algorithm derived from the structural predictions of the Crystallization
  Memory Framework: that any system serving a memory function should describe each item along at least two independent
  axes (analogous to a crystal's structural extent and formation strength) and should organise items as a multi-branch
  trunk rather than a single block. CrystalCache instantiates these predictions in four concurrent design moves: (1) it
  builds trunks — semantic units bounded by sentence punctuation and refined by co-attention — rather than fixed-size
  chunks or utterance clusters; (2) it scores each trunk along two independently computed dimensions, an associative
  crystallization term D (structural centrality in the trunk graph) and an encoding impact term M_i (attention salience
  plus a Von Restorff rarity term), and composes them as Score = max(D, α · normalize(log(1 + M_i))), providing two
  independent survival paths; (3) it injects an explicit token-frequency rarity signal U_i = 1 / (1 + log(1 + c_i))
  directly into the score, a signal absent from all four contemporaneous works; and (4) it replaces binary retention
  with a two-stage branch dissolution procedure that performs proportional retention between trunks and M_i-ranked
  retention within trunks.
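  The formulas stated above can be sketched as follows. This is a minimal illustration, not the paper's implementation: how attention salience and the rarity term U_i combine inside M_i is not specified in the abstract, the min–max normalization of log(1 + M_i) is an assumption, and the per-trunk quota floor of 1 in the dissolution sketch is hypothetical.

```python
import math

def rarity(c_i: int) -> float:
    """Von Restorff rarity signal U_i = 1 / (1 + log(1 + c_i)),
    where c_i is the token's observed frequency count."""
    return 1.0 / (1.0 + math.log(1.0 + c_i))

def trunk_score(D: float, M_i: float, alpha: float,
                m_min: float, m_max: float) -> float:
    """Score = max(D, alpha * normalize(log(1 + M_i))).
    Min-max normalization over the batch of trunks is assumed."""
    m = math.log(1.0 + M_i)
    norm = (m - m_min) / (m_max - m_min) if m_max > m_min else 0.0
    return max(D, alpha * norm)  # two independent survival paths

def dissolve(trunks, budget):
    """Two-stage branch dissolution (sketch).
    trunks: list of (trunk_score, [per-token M_i]); budget: tokens to keep.
    Stage 1: proportional retention between trunks by score.
    Stage 2: M_i-ranked retention within each trunk."""
    total = sum(s for s, _ in trunks) or 1.0
    kept = []
    for t, (score, token_m) in enumerate(trunks):
        quota = max(1, round(budget * score / total))  # assumed floor of 1
        order = sorted(range(len(token_m)),
                       key=lambda i: token_m[i], reverse=True)
        kept.extend((t, i) for i in order[:quota])
    return kept
```

  The max composition is the key design choice: a trunk survives either by being structurally central (high D) or by containing salient or rare tokens (high M_i), so neither signal can single-handedly evict it.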

  On Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3, and Qwen3-8B, across Needle-in-a-Haystack and a Delayed
  Association diagnostic at retention budgets β ∈ {0.3, 0.5}, CrystalCache wins all 3 × 2 × 2 = 12 retrieval comparisons
   against H2O, SnapKV, ChunkKV, StreamingLLM, and PyramidKV; on Qwen3-8B Needle (β = 0.5) it doubles the best baseline
  (0.333 vs. 0.167) and quadruples the weakest (vs. 0.083). Ablations identify the Von Restorff rarity term as the
  single most impactful component (−0.383 when removed), confirm that trunk-level eviction outperforms token-level
  (−0.317 when T_max = 1), and confirm that the dual-dimension max composition strictly beats either dimension alone. On
   the broader-coverage LongBench suite, CrystalCache is competitive but not leading, a trade-off we attribute to the
  spatial-coverage cost of trunk-level retention and discuss honestly as a limitation. The end-to-end system delivers
  50–70% steady-state decode memory savings; the prefill overhead (54–64% at 16K–32K) stems entirely from a CPU-NumPy
  O(n²) co-attention edge-extraction step and is an engineering limitation rather than an algorithmic one.

  Beyond the empirical result, the consistency of the 12/12 cross-model, cross-task, cross-budget gains constitutes a
  computational corroboration of the structural predictions of the Crystallization Memory Framework: when a system
  serves a memory function, structural principles derived from biological memory transfer non-trivially to its design.

Files

CRYSTALCACHE__CROSS_DOMAIN_TRANSFER_FROM_COGNITIVE_MEMORY_CRYSTALLIZATION_TO_KV_CACHE_EVICTION_IN_LONG_CONTEXT_LLMS.pdf