Published January 16, 2026 | Version v1
Preprint Open

Implementation of Persistent Latent Memory for Decoder Transformers

Authors/Creators

Description

Persistent memory is crucial for enabling Large Language Models (LLMs) to retain and expand knowledge over the long term, beyond the limits of a restricted context window. This work builds upon the theoretical Neuromorphic Cognitive Architecture and presents an implementation of persistent latent memory for Transformers. The memory combines a sharp representation of memory traces as latent vector centers (LTM: 64-dimensional keys, STM: 16-dimensional keys) with a compressed 3D terrain (48³), in which information diffuses and is homeostatically balanced. A two-phase reading mechanism uses a TerrainPrior module (3D prior) and MemoryAttention (RBF kernel attention), with gated integration into the decoder. Memory is written segment by segment, and each write is weighted by a combination of novelty, prediction error, and emotional salience. Short-term memory (STM) and long-term memory (LTM) are kept separate, with periodic consolidation ("sleep") instead of hard deletion. In the experimental section, we simulate long-term operation (on the order of months) using synthetic data and measure key metrics: retention (information preservation), interference (mixing of traces), growth of memory centers, fatigue (need for consolidation), and the evolution of the distributed memory terrain (H3). The results show that the proposed memory can preserve knowledge outside the context window for thousands of interactions without significant degradation.
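The read path described above (RBF kernel attention over stored key centers, followed by gated integration into the decoder) can be illustrated with a minimal sketch. This is not the authors' code: the function names `rbf_memory_read` and `gated_integrate`, the bandwidth `sigma`, the scalar gate, and the assumption that the readout already has the same dimension as the decoder hidden state are all hypothetical choices made for illustration.

```python
import math


def rbf_memory_read(query, keys, values, sigma=1.0):
    """Read from memory with RBF-kernel attention (illustrative sketch).

    Each stored key gets a weight proportional to
    exp(-||query - key||^2 / (2 * sigma^2)); weights are normalized to sum to 1.
    """
    sq_dists = [sum((q - k) ** 2 for q, k in zip(query, key)) for key in keys]
    # Subtracting the smallest distance leaves the normalized weights unchanged
    # but guarantees at least one score equals 1, so the sum is never zero.
    d_min = min(sq_dists)
    scores = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(scores)
    weights = [s / total for s in scores]
    dim = len(values[0])
    readout = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return readout, weights


def stable_sigmoid(x):
    """Logistic function that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_integrate(hidden, readout, gate_logit):
    """Add the memory readout to the hidden state, scaled by a gate in (0, 1).

    Assumption: a single scalar gate; a real model would likely learn the gate.
    """
    g = stable_sigmoid(gate_logit)
    return [h + g * r for h, r in zip(hidden, readout)]


if __name__ == "__main__":
    keys = [[0.0, 0.0], [10.0, 10.0]]    # hypothetical 2-D key centers
    values = [[1.0, 0.0], [0.0, 1.0]]    # payload stored with each key
    readout, weights = rbf_memory_read([0.1, -0.1], keys, values)
    print(weights, gated_integrate([0.0, 0.0], readout, gate_logit=0.0))
```

Because the kernel depends on squared distance, a query close to one center reads almost exclusively from that trace, which is one plausible reading of the "sharp representation" mentioned above. The gate lets the decoder ignore the readout entirely when it is driven strongly negative.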
An ablation study confirms the benefit of diffusion (which eliminates local saturation) and of the STM layer (which filters noise before writing to LTM), while the TerrainPrior module, surprisingly, did not improve retrieval accuracy. We discuss the implications of these findings and outline further research directions, particularly the integration of memory into full LLMs and lifelong learning without retraining the model’s main weights. The main contributions of this work are two operational invariants that are key for long-term LLM memory:
• Constant data size: The memory module is pre-allocated to a fixed volume (the entire structure is ∼500 MB in the prototype architecture, of which effectively used data is ∼1.3 MB/user) and does not increase with operating time or the amount of stored data.
• Constant low read/write latency: Read/write operations occur in microseconds and are independent of the memory "age" (number of interactions) and the volume of stored data, thanks to fixed capacity and local computation.
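The two invariants above follow naturally from a pre-allocated grid with purely local updates. The sketch below illustrates that idea on a small cubic terrain, under stated assumptions: the update rule (an explicit 6-neighbor diffusion step with periodic boundaries), the cap-based homeostatic rescaling, and the names `make_terrain`, `cell_index`, `diffuse_step`, and `homeostatic_rescale` are hypothetical and are not taken from the preprint.

```python
def make_terrain(n):
    """Pre-allocate an n x n x n terrain as a flat list; its size never changes."""
    return [0.0] * (n * n * n)


def cell_index(n, x, y, z):
    """Map 3D coordinates to an index in the flat terrain list."""
    return (x * n + y) * n + z


def diffuse_step(terrain, n, rate=0.1):
    """One explicit diffusion step with a 6-neighbor stencil.

    Assumptions: periodic boundaries and rate <= 1/6, which keeps the step
    stable and non-negative. The total activation is conserved.
    """
    out = [0.0] * len(terrain)
    for x in range(n):
        for y in range(n):
            for z in range(n):
                c = terrain[cell_index(n, x, y, z)]
                neighbors = (
                    terrain[cell_index(n, (x + 1) % n, y, z)]
                    + terrain[cell_index(n, (x - 1) % n, y, z)]
                    + terrain[cell_index(n, x, (y + 1) % n, z)]
                    + terrain[cell_index(n, x, (y - 1) % n, z)]
                    + terrain[cell_index(n, x, y, (z + 1) % n)]
                    + terrain[cell_index(n, x, y, (z - 1) % n)]
                )
                out[cell_index(n, x, y, z)] = c + rate * (neighbors - 6.0 * c)
    return out


def homeostatic_rescale(terrain, target_total):
    """Scale the terrain down if total activation exceeds target_total."""
    total = sum(terrain)
    if total <= target_total:
        return list(terrain)
    scale = target_total / total
    return [v * scale for v in terrain]


if __name__ == "__main__":
    n = 8                                     # hypothetical small grid side
    terrain = make_terrain(n)
    terrain[cell_index(n, 4, 4, 4)] = 1.0     # a single localized write
    for _ in range(5):
        terrain = homeostatic_rescale(diffuse_step(terrain, n), target_total=1.0)
    print(len(terrain), sum(terrain))
```

In this sketch the terrain length is fixed at allocation time, and the cost of a step depends only on the grid size, not on how many writes have occurred. A write or a point read touches a single cell and its neighbors, which is the kind of local computation the latency invariant relies on.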

Files

Cognitive_memory_tests_EN.pdf

Files (5.2 MB)

Name Size
md5:72b2933fba082bafedc36bec847faf16 5.2 MB
md5:b8c9a441f4261922c4be36fbaea8126c 14.3 kB
md5:7031d34e35167bb5554f4b31c3a26606 20.1 kB

Additional details

Dates

Other
2026-01-16