Benchmarking Persistent Project Memory in Local Language Models: CAG vs RAG vs DAG over 30 Sequential Tasks

Canfield, Seth

doi:10.5281/zenodo.19979272

Published May 2, 2026 | Version v1

Preprint | Open Access

We present CAG-Bench, a longitudinal benchmark for evaluating how well local language models maintain project context across a sequence of 30 interdependent software development tasks. We compare three context strategies: fresh source-only retrieval (RAG), fixed-workflow generation (DAG), and Context Accumulation Generation (CAG), in which validated project decisions are written to a persistent memory store and reused on subsequent tasks. On qwen2.5-coder:7b (3 trials), CAG achieves a composite score of 48.5 vs. 29.6 (RAG) and 28.5 (DAG), with continuity recall of 54.2% vs. 17.1% and 17.0%, respectively; the effect replicates directionally at 3B scale (10 trials).

Within the CAG family we evaluate four memory-selection variants: an unbounded dump (cag), a label-free deployable retriever (cag_scoped_promptonly), and two diagnostic upper bounds that use answer-key metadata in their selection logic (cag_scoped, cag_oracle_memory). The deployable retriever still substantially outperforms RAG and DAG (composite 42.6, continuity 41.0), but trails the label-informed diagnostics by roughly 20 percentage points of memory recall: a real and previously hidden retrieval gap.

We introduce memory_usage_rate, a diagnostic metric measuring whether selected memory concepts appear in the model's final answer. Across CAG variants, memory usage declines over later project phases (62.6% → 45.8% → 38.1% for base CAG), suggesting that memory uptake, not merely memory retrieval, is a central bottleneck. Because our benchmark uses grounded, task-defined memory rather than model-generated memory, these results isolate retrieval and uptake from memory-formation errors. Benchmark data, scoring code, raw outputs, and figures are released under AGPL-3.0-or-later.
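The CAG loop (write validated decisions to persistent memory, select from it on later tasks) and the memory_usage_rate metric described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CAG-Bench implementation: `Decision`, `MemoryStore`, `select`, `build_prompt`, and the `memory_usage_rate` function are hypothetical names; selection is a plain word-overlap ranking standing in for a label-free retriever (not necessarily how cag_scoped_promptonly works); and a concept counts as "used" if its keyword appears as a case-insensitive substring of the final answer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Decision:
    """A validated project decision (hypothetical schema)."""
    concept: str  # short keyword used to check uptake, e.g. "sqlite"
    text: str     # the decision as stored in memory


@dataclass
class MemoryStore:
    """Hypothetical persistent memory that accumulates validated decisions."""
    decisions: list[Decision] = field(default_factory=list)

    def write(self, decision: Decision) -> None:
        # Store each validated decision once; exact duplicates are ignored.
        if decision not in self.decisions:
            self.decisions.append(decision)

    def select(self, task: str, k: int = 3) -> list[Decision]:
        # Assumed label-free retriever: rank decisions by word overlap with
        # the task prompt only (no answer-key metadata) and keep the top k.
        task_words = set(task.lower().split())
        scored = []
        for idx, d in enumerate(self.decisions):
            overlap = len(task_words & set(d.text.lower().split()))
            if overlap > 0:
                scored.append((overlap, idx, d))
        # Highest overlap first; ties keep insertion order.
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [d for _, _, d in scored[:k]]


def build_prompt(task: str, memory: list[Decision]) -> str:
    """Prepend the selected memory entries to the task prompt."""
    lines = "\n".join(f"- {d.text}" for d in memory)
    return f"Project memory:\n{lines}\n\nTask: {task}"


def memory_usage_rate(selected: list[Decision], answer: str) -> float:
    """Fraction of selected memory concepts that appear in the final answer."""
    if not selected:
        return 0.0
    text = answer.lower()
    used = sum(1 for d in selected if d.concept.lower() in text)
    return used / len(selected)


if __name__ == "__main__":
    store = MemoryStore()
    store.write(Decision("sqlite", "store tasks in sqlite via the repository layer"))
    store.write(Decision("pydantic", "validate export request bodies with pydantic models"))

    task = "add a tasks export endpoint that reads from the repository"
    selected = store.select(task)
    print(build_prompt(task, selected))

    answer = "Query SQLite through the repository and stream CSV."
    print([d.concept for d in selected], memory_usage_rate(selected, answer))
    # last line prints: ['sqlite', 'pydantic'] 0.5
```

In this toy run both decisions are selected, but only "sqlite" shows up in the answer, so the rate is 0.5: the memory was retrieved yet only partly taken up. Averaging per-task rates within each project phase would give a phase-level curve of the kind reported above; that aggregation step is likewise an assumption.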

Files

paper.pdf (7.2 MB; md5:aac398b70987c213c273c56e00db6a60)

Additional details

Is supplemented by: Software: https://github.com/GuideboardLabs/cag-bench (Other)

Repository URL: https://github.com/GuideboardLabs/cag-bench
Programming language: Python
Development Status: Active
