Published May 2, 2026 | Version v1
Preprint Open

Benchmarking Persistent Project Memory in Local Language Models: CAG vs RAG vs DAG over 30 Sequential Tasks

Description

We present CAG-Bench, a longitudinal benchmark for evaluating how well local language models maintain project context across a sequence of 30 interdependent software development tasks. We compare three context strategies: fresh source-only retrieval (RAG), fixed-workflow generation (DAG), and Context Accumulation Generation (CAG), in which validated project decisions are written to a persistent memory store and reused on subsequent tasks. On qwen2.5-coder:7b (3 trials), CAG achieves a composite score of 48.5 vs. 29.6 (RAG) and 28.5 (DAG), with continuity recall of 54.2% vs. 17.1% and 17.0% respectively; the effect is replicated directionally at 3B scale (10 trials). Within the CAG family we evaluate four memory-selection variants: an unbounded dump (cag), a label-free deployable retriever (cag_scoped_promptonly), and two diagnostic upper bounds that use answer-key metadata in their selection logic (cag_scoped, cag_oracle_memory). The deployable retriever still substantially outperforms RAG/DAG (composite 42.6, continuity 41.0), but trails the label-informed diagnostics by ~20 percentage points of memory recall, a real and previously hidden retrieval gap. We introduce memory_usage_rate, a diagnostic metric measuring whether selected memory concepts appear in the model's final answer. Across CAG variants, memory usage declines over later project phases (62.6% → 45.8% → 38.1% for base CAG), suggesting that memory uptake, not merely memory retrieval, is a central bottleneck. Because our benchmark uses grounded task-defined memory rather than model-generated memory, these results isolate retrieval and uptake from memory-formation errors. Benchmark data, scoring code, raw outputs, and figures are released under AGPL-3.0-or-later.
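
As a rough illustration of the memory_usage_rate idea described above, the following minimal sketch computes the fraction of selected memory concepts that appear in a model's answer. It is not taken from the released scoring code: the function signature, the case-insensitive substring matching rule, and the example concepts are all assumptions made for illustration, and the benchmark's actual matching logic may differ.

```python
from typing import Iterable


def memory_usage_rate(selected_concepts: Iterable[str], answer: str) -> float:
    """Fraction of selected memory concepts that appear in the answer.

    Assumption (not from the paper): a concept counts as "used" when its
    text occurs as a case-insensitive substring of the final answer.
    """
    # Normalize concepts and drop empty entries.
    concepts = [c.strip().lower() for c in selected_concepts if c.strip()]
    if not concepts:
        # No memory was selected, so there is nothing to take up.
        return 0.0
    text = answer.lower()
    used = sum(1 for c in concepts if c in text)
    return used / len(concepts)


# Hypothetical example: three selected concepts, two of which are used.
selected = ["PostgreSQL", "JWT auth", "snake_case"]
answer = "The handler validates the JWT auth token, then queries PostgreSQL."
rate = memory_usage_rate(selected, answer)
```

Under this reading, a low rate on a task where the right concepts were selected points to an uptake problem rather than a retrieval problem, which is the distinction the description draws.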

Files

paper.pdf (7.2 MB)
md5:aac398b70987c213c273c56e00db6a60

Additional details

Related works

Is supplemented by
Software: https://github.com/GuideboardLabs/cag-bench (Other)