Published May 12, 2026 | Version v4.18.0

Autonomous Development Skills Suite: The Memory Model for AI Agents

Description

Autonomous Development Skills Suite

AI agents forget everything between sessions, and many things within a session. No memory of what was tried. No memory of why a decision was made. No memory of where you were heading. No memory of what was already created.

These five skills fix that. Together they form a Memory Model: a persistent layer of context that survives session resets and model swaps. The agent reads it before every run. You read it to stay in control.

These are the skills I use daily as a software engineer to safely delegate complex goals to AI agents. When an agent runs without constraints, it creates massive technical debt. These skills force it to stay on track, double-check its assumptions, and leave a clear record of why it made each change.

The Suite Improved Itself Over 200+ Iterations

The suite ran on itself 200+ times. Along the way, it autonomously decided to rewrite itself from scratch. Twice.

Convergence was declared only when three independent evaluators from distinct model families (Claude, GPT, Gemini) each ran the loop and found nothing left to change. The full evidence trail is in .trail/log.md.

> "LLMs struggle to self-correct their responses without external feedback, and at times, their performance even degrades after self-correction."
>
> -- Jie Huang et al., Large Language Models Cannot Self-Correct Reasoning Yet (ICLR 2024)

If the loop can't improve itself, the claim that it improves anything else is empty. It can.

The Skills

| Skill 🛠️ | Problem ⚠️ | Solution ✅ |
| :--- | :--- | :--- |
| 🛡️ Intent | The agent did what you said, not what you meant | Forces the agent to understand the intent behind your prompt |
| 👁️ Vision | The agent doesn't know your vision, because it's in your head | Draws out the vision in your head and writes it to vision.md, which the other skills use |
| 📜 Trail | The work is unauditable | Logs every autonomous decision made by the agent and the reason behind it |
| ⚔️ Improve | The agent makes superficial, undisciplined edits | Runs a structured, iterative improvement loop that reflects and learns before acting |
| 🗺️ Retrospect | The agent can't see its own arc | Self-evaluates the progress of all iterations and determines what is next |

Validation skill

🧪 Probe: included for research and validation use. It constructs a "spot the difference" test to measure whether the agent is genuinely reasoning or pattern-matching. It is used to validate Autonomous Reasoning Fidelity and is not a skill you'd run in daily development.

The Memory Model

Each skill externalizes what normally lives only inside a single model session: the goal, the destination, the decisions, the arc. Together they form a persistent memory layer that no model reset can erase.

The files (.trail/log.md, .trail/vision.md, .trail/retrospect.md) provide the literal storage, but the interaction of the skills with those files creates contextual awareness.

Memory alone is just retrieval; awareness is orientation. Because Retrospect reads the arc, Vision uncovers the destination, and Intent aligns the goal, the suite uses that memory to understand where it is and where it is going.

When you swap from Claude to GPT to Gemini, the next model picks up this exact orientation. That accumulation is what makes the suite get smarter over time.
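A minimal sketch of how an agent harness might assemble this memory before a run, assuming the three files named above sit under .trail/ at the repository root. The function name `load_memory`, the read order, and the section headers are illustrative assumptions, not the suite's actual loader.

```python
from pathlib import Path

# File names come from the description above; the read order is an assumption.
MEMORY_FILES = ("vision.md", "retrospect.md", "log.md")


def load_memory(repo_root: str, trail_dir: str = ".trail") -> str:
    """Concatenate whichever memory files exist into one context string."""
    sections = []
    for name in MEMORY_FILES:
        path = Path(repo_root) / trail_dir / name
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            # Label each section so the reader (human or model) knows its source.
            sections.append(f"## {trail_dir}/{name}\n{text}")
    return "\n\n".join(sections)
```

Because the result is plain text, the same string can be handed to any model, which is what lets the orientation carry over across a model swap.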

Why These Skills Exist

#1: INTENT - The Agent Did What You Wrote, Not What You Meant

Problem: The agent did exactly what you wrote, word for word, not what you actually meant.

Solution: Intent forces the agent to explicitly state its interpretation of your task before executing anything. It acts as an early warning system for misaligned assumptions.

Rooted in Commander's Intent (U.S. Army doctrine) · Coaching Kata (Mike Rother, Toyota Kata) · Socratic Method (Stanford Encyclopedia of Philosophy)

#2: VISION - The Agent Drifted Over Time

Problem: During a long autonomous run, the agent loses the plot, fixing minor issues rather than addressing the core architectural problem.

Solution: Vision surfaces the agent's implicit assumptions about your destination, letting you course-correct early. Retrospect steps back, analyzes the full history of the work, and re-orients the loop.

> "No-one knows exactly what they want."
>
> -- David Thomas & Andrew Hunt, The Pragmatic Programmer

#3: TRAIL - The Work is Unauditable

Problem: The agent modified dozens of files. You have no idea why it chose one implementation over another, making it impossible to confidently take ownership of the code.

Solution: Trail enforces observable autonomy. Every decision, rationale, and discarded alternative is appended to a readable .trail/log.md. If it isn't logged, it didn't happen.

> "Without data, you're just another person with an opinion."
>
> -- W. Edwards Deming
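To make the append-only idea behind Trail concrete, here is a minimal sketch of writing one entry. The entry layout, the `TrailEntry` fields, and the function names are hypothetical; the suite's real log format is whatever the trail skill itself defines.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class TrailEntry:
    decision: str
    rationale: str
    discarded: list[str] = field(default_factory=list)  # alternatives considered and rejected


def format_entry(entry: TrailEntry, when: datetime) -> str:
    """Render one entry as a small Markdown section (hypothetical layout)."""
    lines = [
        f"## {when.isoformat(timespec='seconds')}",
        f"- Decision: {entry.decision}",
        f"- Rationale: {entry.rationale}",
    ]
    lines += [f"- Discarded: {alt}" for alt in entry.discarded]
    return "\n".join(lines) + "\n\n"


def append_entry(log_path: str, entry: TrailEntry) -> None:
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Open in append mode so earlier entries are never rewritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(format_entry(entry, datetime.now(timezone.utc)))
```

Recording the discarded alternatives next to the decision is the part that lets a later reader judge the choice rather than just see it.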

#4: IMPROVE - The Agent Makes Superficial Edits

Problem: The agent edits what's easy. It fixes typos and whitespace, does obvious renames, writes tests. The real problems stay untouched.

Solution: Improve is the workhorse of this suite. Point it at any target and run it repeatedly. Each iteration: it examines what's there, challenges its own first instinct, makes exactly one high-leverage change, and reflects. It reads the full memory suite before every run, so it never wastes an iteration on something already tried.

> "Invest in the design of the system every day."
>
> -- Kent Beck, Extreme Programming Explained

#5: RETROSPECT - The Agent Can't See Its Own Arc

Problem: After 50 iterations, the agent has been diligently improving, but nobody stepped back to ask whether those 50 iterations were solving the right problem. Each step looked locally optimal. The overall arc drifted.

Solution: Retrospect reads the entire trail history as a single document and forms arc-level claims: what is the target becoming, where has the loop's attention been, and is that where the real weight lies? It surfaces what no individual iteration would reveal.

> "Life can only be understood backwards; but it must be lived forwards."
>
> -- Søren Kierkegaard, Journals (1843)

The Workflow

1. Set the Target: Run vision first to determine the destination before starting work.
2. Execute: Run improve for as many iterations as it takes to reach a plateau.
3. Reflect: Run retrospect to evaluate the entire loop history and reflect on progress.

Quickstart

1. Read INSTALLING.md for configuration instructions.
2. Copy the skill directories (intent/, vision/, improve/, trail/, retrospect/) into your repository's .copilot/skills/ folder.
3. Start assigning verifiable, autonomous tasks.
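The copy step can be scripted. The sketch below assumes the five skill directories sit at the top level of the unpacked suite; `install_skills` is a hypothetical helper, and INSTALLING.md remains the authoritative guide.

```python
import shutil
from pathlib import Path

# Directory names as listed in the Quickstart.
SKILLS = ("intent", "vision", "improve", "trail", "retrospect")


def install_skills(suite_dir: str, repo_dir: str) -> list[str]:
    """Copy each skill directory into <repo>/.copilot/skills/ and return the names copied."""
    target = Path(repo_dir) / ".copilot" / "skills"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in SKILLS:
        src = Path(suite_dir) / name
        if not src.is_dir():
            continue  # skip a missing directory rather than guess at its location
        # dirs_exist_ok (Python 3.8+) lets a re-run refresh an existing install.
        shutil.copytree(src, target / name, dirs_exist_ok=True)
        copied.append(name)
    return copied
```

Checking the returned list against the five expected names is a quick way to confirm nothing was skipped.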

Reference

- Convergence: The agent loop converges only when 3 independent models (e.g., Claude, GPT, Gemini) confirm no further improvements exist.
- Principles: Built on the Autonomous Agent Principles.
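The convergence rule can be expressed as a small predicate. This is a sketch under stated assumptions: each evaluator is modeled as a function that returns its list of proposed changes, which stands in for a full loop run by that model. The `Evaluator` type and `has_converged` are hypothetical names.

```python
from typing import Callable, Mapping

# Assumption: an evaluator takes the target and returns its proposed changes;
# an empty list means it found nothing left to change.
Evaluator = Callable[[str], list[str]]


def has_converged(target: str, evaluators: Mapping[str, Evaluator], required: int = 3) -> bool:
    """True only if at least `required` evaluators ran and none proposed a change."""
    if len(evaluators) < required:
        return False
    return all(len(evaluate(target)) == 0 for evaluate in evaluators.values())
```

Requiring unanimity means a single evaluator that still proposes a change keeps the loop open.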

Known Limitation: Stated Reasoning ≠ True Reasoning

Trail logs what the agent says it decided. Research shows this is not always the same as what actually drove the decision.

> "CoT explanations can be plausible yet misleading, which risks increasing our trust in LLMs without guaranteeing their safety."
>
> -- Miles Turpin et al., Language Models Don't Always Say What They Think (NeurIPS 2023)

> "CoT monitoring is a promising way of noticing undesired behaviors during training and evaluations, but that it is not sufficient to rule them out."
>
> -- Yanda Chen et al., Reasoning Models Don't Always Say What They Think (2025)

How this suite mitigates it (The Rationalization Loop Mitigations): To prevent LLMs from generating post-hoc justifications to fit decisions already made (the core threat identified in the research above), the suite enforces structural constraints:

1. Pre-commit prediction (Improve, Trail): The agent must record a falsifiable prediction of what a change will and will not achieve before acting or observing the actual outcome.
2. Outcome anchoring (Retrospect): Subsequent arc-reads systematically evaluate actual outcomes against those prior pre-commit predictions to expose localized confabulation.
3. Reversal density (Trail, Retrospect): A uniform, unbroken trail of "successes" is actively flagged as suspect rationalization. True reasoning leaves a trail of reversals, dead ends, and tested predictions.
4. Adversarial Audit (Retrospect): A dedicated lens to actively hunt for outcome mismatch and logical discontinuities across the trail history.
5. Separating Writer and Decider (Improve, Trail): In maximum-trust sequences (High-Fidelity Mode), the agent making the change is procedurally forbidden from writing the final trail for the change, handing off the raw artifact evidence to a second independent evaluator.

Together, these force the agent to lock in its reasoning before acquiring evidence, and introduce explicit adversarial structures to break the post-hoc rationalization loop.
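As an illustration of how pre-commit prediction and reversal density fit together, the sketch below stores a prediction before the change and scores it afterward. The `Iteration` record, the function names, and the 10-iteration threshold are all assumptions made for the example; the suite's own criteria may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Iteration:
    prediction: str                 # recorded before the change is made
    outcome: Optional[bool] = None  # filled in later: did the prediction hold?


def reversal_density(history: list[Iteration]) -> float:
    """Fraction of evaluated iterations whose prediction failed."""
    evaluated = [it for it in history if it.outcome is not None]
    if not evaluated:
        return 0.0
    failed = sum(1 for it in evaluated if it.outcome is False)
    return failed / len(evaluated)


def looks_rationalized(history: list[Iteration], min_evaluated: int = 10) -> bool:
    """Flag a long trail with zero failed predictions as suspect (threshold is an assumption)."""
    evaluated = [it for it in history if it.outcome is not None]
    return len(evaluated) >= min_evaluated and reversal_density(history) == 0.0
```

A short trail is never flagged here, since a handful of correct predictions is not yet evidence of rationalization.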

Citation & License

MIT License. CITATION.cff | Zenodo: 10.5281/zenodo.19842994

Files

ntholm86/principles-of-earned-autonomy-skills-suite-v4.18.0.zip

Additional details

Related works

Is derived from
Software: 10.5281/zenodo.19732890 (DOI)