MAGUS v3.0: A Governance Architecture for Structural Alignment Drift in Long-Running Agentic AI Systems
Description
Long-running agentic AI deployments experience a governance failure mode that training-time alignment and single-session safety work are not designed to address: structural alignment drift — the cumulative deviation of a deployed system's effective operating policy from operator intent, arising through normal operation across multiple sessions without any single identifiable failure event.
We define this failure class precisely, decompose it into three structural mechanisms (instruction drift, autonomy accumulation, and authority laundering), and propose MAGUS v3.0 — a governance architecture built specifically around it. MAGUS's three primary architectural contributions are: (1) Behavioral State as a formal governance class, in which model parameter updates are treated as cryptographic governance events requiring dual-authority signing and append-only trail anchoring before activation; (2) a mathematically bounded risk state machine with formal boundary conditions, asymptotic damping, and a hard escalation floor that no authority can override; and (3) a pre-execution RT requirement, in which the audit trail is constitutive of the governance act rather than a post-hoc record of it.
The architecture is presented as a theoretical specification and open problem register, intended to catalyse community development rather than report on a deployed codebase. A formally categorised issues register — produced through structured adversarial elicitation and internal human review — documents two Category 3 items (no solution pathway) and one Category 4 item (requires foundational change), reported without minimisation.
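As a rough illustration of contributions (1) and (3) in the description above, the following minimal Python sketch gates a behavioral-state update on two distinct valid authority signatures and writes a hash-chained trail entry before reporting the update as activatable. It is not taken from the MAGUS specification: the names `AppendOnlyTrail`, `sign`, and `activate_update` are hypothetical, the record layout is an assumption, and HMAC with shared keys stands in for whatever asymmetric signature scheme a real deployment would use.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder digest for the first entry


class AppendOnlyTrail:
    """Hypothetical hash-chained log: each entry commits to its predecessor."""

    def __init__(self):
        self._entries = []

    def append(self, record: dict) -> str:
        prev = self._entries[-1]["digest"] if self._entries else GENESIS
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self._entries.append({"prev": prev, "payload": payload, "digest": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            expected = hashlib.sha256((prev + entry["payload"]).encode()).hexdigest()
            if entry["prev"] != prev or entry["digest"] != expected:
                return False
            prev = entry["digest"]
        return True

    def __len__(self):
        return len(self._entries)


def sign(key: bytes, update_id: str) -> str:
    """Stand-in signature: HMAC-SHA256 over the update identifier."""
    return hmac.new(key, update_id.encode(), hashlib.sha256).hexdigest()


def activate_update(update_id, signatures, authority_keys, trail) -> bool:
    """Return True only if two distinct authorities signed and the trail was written.

    signatures: dict mapping authority name -> hex signature (assumed format).
    """
    valid = {
        name
        for name, sig in signatures.items()
        if name in authority_keys
        and hmac.compare_digest(sig, sign(authority_keys[name], update_id))
    }
    if len(valid) < 2:
        return False  # dual-authority requirement not met; nothing is recorded
    # Pre-execution anchoring: the trail entry is written before the caller is
    # told the update may take effect. If append() raises, nothing activates.
    trail.append(
        {"event": "behavioral_state_update", "update_id": update_id, "signers": sorted(valid)}
    )
    return True
```

In this sketch a caller would apply the parameter update only after `activate_update` returns `True`, so the trail entry always precedes the change it describes. Note that the hash chain makes tampering detectable through `verify()`; it does not by itself prevent deletion of the whole log, which would need external anchoring.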
Files
MAGUS_v3_arXiv 3.pdf
(375.7 kB)
| Name | Size |
|---|---|
| MAGUS_v3_arXiv 3.pdf (md5:2ec3ab0b22fdb7ba3d92541e50c582e4) | 375.7 kB |