Memory Archive: A Memory-Grounded Training Paradigm for Computer Use Agents
Authors/Creators
Description
This paper introduces the Memory Archive training paradigm, an end-to-end data architecture and training pipeline that addresses the structural failures of standard Computer Use Agent (CUA) training. Most CUA systems currently rely on behavioural cloning followed by outcome-supervised RL, which leads to intent blindness and a severe representational mismatch between training and deployment formats.
The central thesis of this paradigm is Format Consistency. The system centers on a compiled task guide called 'memory.md', a structured document containing step-by-step procedural reasoning, execution commands, and visual state references. The architecture threads this single artifact through four critical stages of the agent lifecycle:
- Pre-Training (Format Internalization): The base model learns the grammar of GUI actuation events and step-level multimodal alignment.
- Supervised Fine-Tuning (SFT): The model is trained with retrieved memories in context, treating actuation artifacts ('CommandEvent' JSONs) as first-class training targets alongside reasoning.
- Post-Training (Memory Adherence RL): The model is optimized with Group Relative Policy Optimization (GRPO), driven by a novel three-component reward function (Step Alignment, Visual Grounding, and Outcome Consistency) and a VLM-generated Process Reward Model (PRM).
- Inference-Time Retrieval: A two-stage retrieval stack (Bi-encoder HNSW + Cross-encoder) dynamically pulls relevant memories. The agent tracks execution deviation and autonomously compiles new 'memory.md' files upon task success, endogenously growing its own training corpus.
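The Memory Adherence RL stage above can be illustrated with a minimal sketch of a group-relative advantage computation. The record does not state how the three reward components are combined, so the weighted sum, the weights, the `[0, 1]` score ranges, and the names `RewardParts`, `combined_reward`, and `group_relative_advantages` are all hypothetical; only the idea of standardizing rewards within a rollout group follows the usual GRPO formulation, and the choice of population standard deviation is likewise an assumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RewardParts:
    """Hypothetical per-rollout scores, each assumed to lie in [0, 1]."""
    step_alignment: float       # agreement with the steps listed in memory.md
    visual_grounding: float     # agreement with the referenced visual states
    outcome_consistency: float  # agreement of the final state with the memory


def combined_reward(parts, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three components (weights are illustrative only)."""
    w_step, w_vis, w_out = weights
    return (w_step * parts.step_alignment
            + w_vis * parts.visual_grounding
            + w_out * parts.outcome_consistency)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, not from the source
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three rollouts sampled for the same task and the same retrieved memory.
group = [
    RewardParts(1.0, 1.0, 1.0),  # followed the memory and succeeded
    RewardParts(0.5, 0.5, 0.0),  # partially followed, wrong outcome
    RewardParts(0.0, 0.0, 0.0),  # ignored the memory
]
rewards = [combined_reward(p) for p in group]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Because each advantage is measured relative to the group, a rollout is rewarded for adhering to the memory better than its siblings rather than for reaching an absolute score, and a group with identical rewards yields zero advantage for every member.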
Furthermore, the paradigm introduces a mechanism for in-training evaluation via self-generated memories, allowing researchers to detect overfitting and underfitting and to assess context-awareness without relying on static external benchmarks. This document provides full mathematical formulations, data construction specifications, algorithm details, and hyperparameter guidance for implementing the architecture.
Files
| Name | Size | Checksum |
|---|---|---|
| memory_archive_paradigm.pdf | 1.3 MB | md5:aeca3b45d2587db62e22dac1a249d085 |
Additional details
Related works
- Is supplemented by
- Software: https://github.com/nullvoider07/Memory-Archive (URL)
Software
- Repository URL
- https://github.com/nullvoider07/Memory-Archive
- Programming language
- Rust, Python
- Development Status
- Active