Published May 14, 2026 | Version v1.0

Memory Archive: A Memory-Grounded Training Paradigm for Computer Use Agents

Authors/Creators

Description

This paper introduces the Memory Archive training paradigm, an end-to-end data architecture and training pipeline that addresses the structural failures of standard Computer Use Agent (CUA) training. Currently, most CUA systems rely on behavioural cloning followed by outcome-supervised RL, leading to intent blindness and a severe representational mismatch between training and deployment formats.

The central thesis of this paradigm is Format Consistency. The system centers on a compiled task guide called 'memory.md', a structured document containing step-by-step procedural reasoning, execution commands, and visual state references. The architecture threads this single artifact through four critical stages of the agent lifecycle:

  • Pre-Training (Format Internalization): The base model learns the grammar of GUI actuation events and step-level multimodal alignment.
  • Supervised Fine-Tuning (SFT): The model is trained with retrieved memories in context, treating actuation artifacts ('CommandEvent' JSONs) as first-class training targets alongside reasoning.
  • Post-Training (Memory Adherence RL): The policy is optimized with Group Relative Policy Optimization (GRPO), driven by a novel three-component reward function (Step Alignment, Visual Grounding, and Outcome Consistency) and a VLM-generated Process Reward Model (PRM).
  • Inference-Time Retrieval: A two-stage retrieval stack (Bi-encoder HNSW + Cross-encoder) dynamically pulls relevant memories. The agent tracks execution deviation and autonomously compiles new 'memory.md' files upon task success, endogenously growing its own training corpus.
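
As a rough illustration of the two-stage retrieval idea in the last item above, the sketch below first shortlists memories with a cheap vector similarity and then reranks the shortlist with a joint (query, memory) scorer. It is not the paper's implementation: `embed`, `overlap_score`, `retrieve`, and the parameters `k_candidates` and `k_final` are hypothetical names, the bag-of-words vectors stand in for a learned bi-encoder, the Jaccard overlap stands in for a cross-encoder, and the brute-force scan stands in for an HNSW index.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def embed(text: str) -> Counter:
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: Jaccard overlap of token sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def retrieve(
    query: str,
    memories: Sequence[str],
    k_candidates: int = 3,
    k_final: int = 1,
    rerank: Callable[[str, str], float] = overlap_score,
) -> List[str]:
    # Stage 1: shortlist by vector similarity. This is a brute-force scan;
    # an approximate index such as HNSW would replace it at scale.
    q_vec = embed(query)
    shortlist = sorted(
        memories, key=lambda m: cosine(q_vec, embed(m)), reverse=True
    )[:k_candidates]
    # Stage 2: score each (query, memory) pair jointly and keep the best.
    return sorted(shortlist, key=lambda m: rerank(query, m), reverse=True)[:k_final]
```

The split reflects the usual trade-off: the first stage is cheap enough to run over the whole archive, while the second stage is more expensive per pair and is applied only to the shortlist.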

Furthermore, the paradigm introduces a mechanism for in-training evaluation via self-generated memories, allowing researchers to detect overfitting and underfitting and to assess context-awareness without relying on static external benchmarks. This document provides full mathematical formulations, data construction specifications, algorithm details, and hyperparameter guidance for implementing the architecture.
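
The paper's formulations are not reproduced in this record. As a hedged sketch of the Memory Adherence RL stage, the code below combines the three named reward components into a scalar and then applies the group-relative normalization that GRPO uses (each rollout's reward minus the group mean, divided by the group standard deviation). The weighted-sum form, the weights, the `eps` term, and the names `RolloutScores`, `combined_reward`, and `group_relative_advantages` are illustrative assumptions, not values or APIs taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class RolloutScores:
    """Per-rollout scores for the three reward components (assumed in [0, 1])."""
    step_alignment: float
    visual_grounding: float
    outcome_consistency: float


def combined_reward(
    s: RolloutScores,
    w_step: float = 0.4,     # hypothetical weight
    w_visual: float = 0.3,   # hypothetical weight
    w_outcome: float = 0.3,  # hypothetical weight
) -> float:
    """Assumed combination rule: a weighted sum of the three components."""
    return (
        w_step * s.step_alignment
        + w_visual * s.visual_grounding
        + w_outcome * s.outcome_consistency
    )


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of rollouts sampled for the same task."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is an equally valid choice
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Because advantages are computed relative to the other rollouts in the same group, no separate value network is needed; a group whose rollouts all receive the same reward contributes zero advantage.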

Files

memory_archive_paradigm.pdf (1.3 MB)
md5:aeca3b45d2587db62e22dac1a249d085

Additional details

Related works

Is supplemented by
Software: https://github.com/nullvoider07/Memory-Archive

Software

Repository URL
https://github.com/nullvoider07/Memory-Archive
Programming language
Rust, Python
Development Status
Active
