Operational Self-Improvement in a Frozen 14B Language Model on Consumer Hardware: Autonomous Reasoning Constraint Generation, Architectural Diagnosis, and the MERRCURR Pipeline
Description
We present evidence that a frozen 14-billion-parameter language model running on consumer hardware (Apple Mac Mini M4, 24 GB, under $800) can autonomously identify its own reasoning failures, draft corrective constraints, validate them against a regression battery, and deploy them with principal approval. The system promoted two reasoning constraints through the full MERRCURR pipeline, covering two independent error classes: dependency assumptions (error recurrence 3→0, Poisson p=0.0498) and meeting-occurrence assumptions (clean validation, delta +2, no regressions). A statistically significant dissociation emerged: the frozen model responds to reasoning constraints but not to formatting instructions (within-probe dissociation: 88–100% reasoning accuracy vs 0% labelling compliance, Fisher p<0.001). A bare-model ablation establishes a 66-percentage-point architecture contribution (28%→94%, Fisher p<0.001). This is the fourth paper in the ATLAS research programme on sovereign AI self-improvement without fine-tuning, weight modification, or cloud dependency.
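The ablation claim above (28%→94%, Fisher p<0.001) can be sanity-checked with a two-sided Fisher exact test. A minimal sketch follows, using the Python standard library only; the per-condition probe count of 50 is an assumption for illustration (this record does not state the sample sizes), chosen so that the counts reproduce the reported 28% and 94% accuracies.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)

    def p_table(x):
        # Probability of a table with x successes in row 1, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = p_table(a)
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    # Small tolerance guards against float round-off at the boundary.
    return sum(p_table(x) for x in range(lo, hi + 1)
               if p_table(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts (not stated in this record): 50 probes per condition.
# Bare model: 14/50 correct (28%); full architecture: 47/50 correct (94%).
p = fisher_exact_two_sided(14, 36, 47, 3)
print(p < 0.001)
```

Under these assumed counts the reported significance threshold holds comfortably; the exact p-value would of course depend on the true probe counts in the paper.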
Files
- ATLAS_Paper_4_v7.pdf (40.9 kB, md5:2c620113822576e52d6d6a94f6cb1ca1)
Additional details
Related works
- Is supplement to:
  - Preprint: 10.5281/zenodo.19427878 (DOI)
  - Preprint: 10.5281/zenodo.19435861 (DOI)
  - Preprint: 10.5281/zenodo.19448879 (DOI)
References
- Jaber, F. C. (2026). Positional Restructuring of System Prompts. Zenodo. DOI: 10.5281/zenodo.19427878
- Jaber, F. C. (2026). Calibrated Self-Assessment in Sub-Frontier Language Models. Zenodo. DOI: 10.5281/zenodo.19435861
- Jaber, F. C. (2026). MERRCURR: Autonomous Cognitive Self-Modification in Frozen Sub-Frontier Language Models. Zenodo. DOI: 10.5281/zenodo.19448879
- Liu, N. F. et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. TACL.
- Ouyang, L. et al. (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS.
- Rafailov, R. et al. (2023). Direct Preference Optimization. NeurIPS.
- Zweiger, A. et al. (2025). Self-Adapting Language Models. arXiv:2506.10943.
- Dong, X. et al. (2024). A Survey on LLM Inference-Time Self-Improvement. arXiv:2412.14352.
- Memento (2025). Fine-tuning LLM Agents without Fine-tuning LLMs. arXiv:2508.16153.
- Continuity Core (2026). A Unified Cognitive Architecture for Self-Modifying AI. ResearchGate.
- Madaan, A. et al. (2023). Self-Refine: Iterative Refinement with Self-Feedback. NeurIPS.
- Dhuliawala, S. et al. (2023). Chain-of-Verification Reduces Hallucination in Large Language Models. arXiv.
- McKinsey (2026). Sovereign AI Ecosystems for Strategic Resilience and Economic Impact.