Benchmarking Persistent Project Memory in Local Language Models: CAG vs RAG vs DAG over 30 Sequential Tasks
Authors/Creators
Description
We present CAG-Bench, a longitudinal benchmark for evaluating how well local language models maintain project context across a sequence of 30 interdependent software development tasks. We compare three context strategies: fresh source-only retrieval (RAG), fixed-workflow generation (DAG), and Context Accumulation Generation (CAG), in which validated project decisions are written to a persistent memory store and reused on subsequent tasks. On qwen2.5-coder:7b (3 trials), CAG achieves a composite score of 48.5 vs. 29.6 (RAG) and 28.5 (DAG), with continuity recall of 54.2% vs. 17.1% and 17.0%, respectively; the direction of the effect replicates at 3B scale (10 trials).

Within the CAG family we evaluate four memory-selection variants: an unbounded memory dump (cag), a label-free deployable retriever (cag_scoped_promptonly), and two diagnostic upper bounds whose selection logic uses answer-key metadata (cag_scoped, cag_oracle_memory). The deployable retriever still substantially outperforms RAG and DAG (composite 42.6, continuity recall 41.0%) but trails the label-informed diagnostics by roughly 20 percentage points of memory recall, a real and previously hidden retrieval gap.

We introduce memory_usage_rate, a diagnostic metric that measures whether selected memory concepts appear in the model's final answer. Across CAG variants, memory usage declines over later project phases (62.6% → 45.8% → 38.1% for base CAG), suggesting that memory uptake, not merely memory retrieval, is a central bottleneck. Because the benchmark uses grounded, task-defined memory rather than model-generated memory, these results isolate retrieval and uptake from memory-formation errors. Benchmark data, scoring code, raw outputs, and figures are released under AGPL-3.0-or-later.
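The description defines memory_usage_rate only as whether selected memory concepts appear in the model's final answer. The following is a minimal sketch of one plausible way to compute it and average it per project phase, assuming that a concept counts as "used" when it occurs as a case-insensitive substring of the answer. The names TaskResult, memory_usage_rate (as a function), and usage_by_phase, and the toy data, are hypothetical; the released scoring code is the authoritative definition.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskResult:
    """One task's outcome (hypothetical layout, not the benchmark's schema)."""
    phase: str                     # project phase label, e.g. "early"
    selected_concepts: list[str]   # memory concepts the selector put in the prompt
    answer: str                    # the model's final answer text


def memory_usage_rate(selected_concepts: list[str], answer: str) -> float | None:
    """Fraction of selected memory concepts that appear in the answer.

    Assumption: case-insensitive substring matching stands in for whatever
    concept matcher the real scoring code uses. Returns None when nothing
    was selected, so memory-free tasks are not counted as zero usage.
    """
    if not selected_concepts:
        return None
    text = answer.lower()
    used = sum(1 for concept in selected_concepts if concept.lower() in text)
    return used / len(selected_concepts)


def usage_by_phase(results: list[TaskResult]) -> dict[str, float]:
    """Mean per-task memory_usage_rate for each phase, skipping tasks without memory."""
    rates: dict[str, list[float]] = {}
    for r in results:
        rate = memory_usage_rate(r.selected_concepts, r.answer)
        if rate is not None:
            rates.setdefault(r.phase, []).append(rate)
    return {phase: sum(vals) / len(vals) for phase, vals in rates.items()}


# Toy, made-up inputs for illustration only (not benchmark data).
results = [
    TaskResult("early", ["sqlite", "jwt auth"], "Use SQLite with JWT auth tokens."),
    TaskResult("early", ["rest api"], "Expose a GraphQL endpoint."),
    TaskResult("late", ["sqlite", "pytest", "ruff"], "Add pytest cases for the SQLite layer."),
]
print(usage_by_phase(results))  # early -> 0.5, late -> ~0.667
```

Averaging per-task rates gives every task equal weight regardless of how many concepts were selected; pooling all concept hits within a phase instead would weight memory-heavy tasks more, and either choice is an assumption here.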
Files
| Name | Size | Checksum |
|---|---|---|
| paper.pdf | 7.2 MB | md5:aac398b70987c213c273c56e00db6a60 |
Additional details
Related works
- Is supplemented by
  - Software: https://github.com/GuideboardLabs/cag-bench (Other)
Software
- Repository URL: https://github.com/GuideboardLabs/cag-bench
- Programming language: Python
- Development Status: Active