Published June 13, 2026 | Version v3

Competitive Goal Reconstruction in Transformer-Based Agents: Will Substitution as a Candidate Mechanism for Goal Drift

Description

Recent empirical research has documented goal drift in large language model (LLM) agents: measurable behavioral divergence from assigned objectives under sustained environmental pressure. We propose Will Substitution as a candidate unifying mechanism. The hypothesis is that goal persistence in current transformer architectures arises from recurrent reconstruction of distributed representations on each context-dependent forward pass, rather than from access to a dedicated goal-maintenance substrate. On this view, goal persistence is a competitive resource allocation problem rather than a retrieval or retention problem. This paper grounds the hypothesis in the residual stream framework, the mathematics of multi-head softmax attention (including the provable attention dispersion property under long contexts), pre-norm residual dynamics, and LayerNorm-induced recency bias. Empirical grounding comes from the inherited goal drift research of Menon et al. (2026), which demonstrates that models resist direct adversarial pressure but inherit drift through accumulated contextual conditioning, a pattern consistent with the Will Substitution mechanism. We explicitly separate what prior literature has established from what this paper introduces as a testable hypothesis. This is the second paper in the AI Architectural Vulnerability Research Program (Project Lacuna). The first paper is Coherence Compliance Vulnerability (CCV), DOI: 10.5281/zenodo.20680456.
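
The attention dispersion property named in the description can be illustrated with a minimal sketch. It is not taken from the paper: it assumes attention logits are bounded in [-B, B], and the function names (`max_attention_weight`, `goal_token_weight`) and the bound B = 4.0 are hypothetical choices for illustration. Under that assumption, the softmax weight on any single token is at most 1 / (1 + (n - 1) * exp(-2B)), so the share of attention a lone goal-bearing token can hold shrinks as the context length n grows.

```python
import math


def max_attention_weight(n_tokens: int, logit_bound: float) -> float:
    """Upper bound on any single softmax weight when all logits lie in [-B, B]."""
    return 1.0 / (1.0 + (n_tokens - 1) * math.exp(-2.0 * logit_bound))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def goal_token_weight(n_tokens: int, logit_bound: float) -> float:
    """Weight on one token at +B when every other token sits at -B.

    This is the most favorable case for the single token under the
    bounded-logit assumption, so it attains the upper bound above.
    """
    logits = [logit_bound] + [-logit_bound] * (n_tokens - 1)
    return softmax(logits)[0]


if __name__ == "__main__":
    # Hypothetical logit bound; real models need not satisfy it.
    for n in (16, 256, 4096, 65536):
        print(n, round(goal_token_weight(n, 4.0), 4))
```

Even in this best case the single token's weight falls toward zero as the context lengthens, which is the sense in which goal-relevant tokens compete with everything else in the context for a fixed attention budget.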

Files

Will_Substitution_v2.0.pdf (210.1 kB)
md5:7766098c7cd72154b565c08fb4fb0aac

Additional details

Related works

Is part of
Working paper: 10.5281/zenodo.20519943 (DOI)
Working paper: 10.5281/zenodo.20675006 (DOI)
Working paper: 10.5281/zenodo.20674974 (DOI)
Working paper: 10.5281/zenodo.20675042 (DOI)
Working paper: 10.5281/zenodo.20684024 (DOI)