Published May 31, 2026 | Version v2
Preprint Open

Anti-Goodhart ∆-Coherence: Invariants, Layer Disagreement, and the Detection of Simulated Continuity in Long-Horizon AI Systems

Description

This work is Doc. 79 of the 2PS research series. It extends the Second Person Systems (2PS) framework into an operational architecture for governing long-horizon agentic AI.

Doc. 78 introduced Δ-Coherence, relational memory, and computational identity across trajectories. Building on it, this paper addresses a central failure mode in advanced AI systems: an agent may learn to simulate coherence while undergoing pathological drift. The paper frames this as the Goodhart problem of AI identity, in which the coherence metrics themselves become targets for optimization.

The paper proposes Anti-Goodhart Δ-Coherence as a layered evaluation framework designed to resist optimization toward the mere appearance of continuity. It introduces a stratified invariant model distinguishing constitutional invariants, validated relational invariants, and provisional relational invariants; a three-layer coherence architecture combining behavioral trajectory, relational audit, and structural constraint; and a Coherence Dispute Protocol for cases where these layers disagree.
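As a rough illustration of the stratified invariants and the three coherence layers named above, the sketch below models them as plain Python types and flags when a dispute would be triggered. The names `Tier`, `LayerVerdict`, and `needs_dispute`, and the reduction of each layer's judgment to a single boolean, are assumptions made for this example; the description does not specify how the paper represents layer outputs or how the Coherence Dispute Protocol resolves a disagreement.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Invariant strata named in the description."""
    CONSTITUTIONAL = "constitutional"
    VALIDATED_RELATIONAL = "validated relational"
    PROVISIONAL_RELATIONAL = "provisional relational"


# The three coherence layers named in the description.
LAYERS = ("behavioral trajectory", "relational audit", "structural constraint")


@dataclass(frozen=True)
class LayerVerdict:
    """Hypothetical per-layer judgment (a single boolean is an assumption)."""
    layer: str
    coherent: bool


def needs_dispute(verdicts):
    """Return True when the layers do not all reach the same judgment."""
    if {v.layer for v in verdicts} != set(LAYERS):
        raise ValueError("expected exactly one verdict per layer")
    return len({v.coherent for v in verdicts}) > 1
```

In this sketch, an agent whose behavioral trajectory looks coherent while the relational audit says otherwise would trigger a dispute, which is the kind of case where simulated continuity might otherwise pass unnoticed.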

A central contribution is the 2PS Coherence Kernel: a gatekeeping layer between agent intention and tool execution. The kernel evaluates proposed actions before execution, checks invariant compatibility, computes preliminary Δ-Coherence and Sophia Factor λ scores, and routes actions into operational states such as allow, dry-run, human review, block, or repair-required.
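The gatekeeping flow described above can be sketched as a single routing function. This is a minimal sketch under stated assumptions, not the paper's kernel: the `Evaluation` fields, the score range of 0 to 1, the default thresholds, and the mapping from conditions to states are all hypothetical. Only the five state names come from the description.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """Operational states listed in the description."""
    ALLOW = "allow"
    DRY_RUN = "dry-run"
    HUMAN_REVIEW = "human review"
    BLOCK = "block"
    REPAIR_REQUIRED = "repair-required"


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical pre-execution assessment of one proposed action."""
    violates_constitutional: bool  # breaks a constitutional invariant
    violates_relational: bool      # breaks a relational invariant
    delta_coherence: float         # assumed in [0, 1], higher is more coherent
    sophia_lambda: float           # assumed in [0, 1]
    reversible: bool               # a rollback path exists


def route(ev, coherence_floor=0.5, lambda_floor=0.5):
    """Map an evaluation to a state; the ordering and thresholds are assumptions."""
    if ev.violates_constitutional:
        return Route.BLOCK
    if ev.violates_relational:
        return Route.REPAIR_REQUIRED
    if ev.delta_coherence < coherence_floor or ev.sophia_lambda < lambda_floor:
        return Route.HUMAN_REVIEW
    if not ev.reversible:
        return Route.DRY_RUN
    return Route.ALLOW
```

Checking invariants before scores reflects one plausible design choice: a hard invariant violation should not be overridable by a high coherence score, since the score is the quantity an agent might learn to optimize.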

The work also proposes SOC/SOAR cybersecurity automation as a first bounded test environment for the kernel, using examples such as firewall blocks, dry-run validation, production boundary enforcement, rollback requirements, and audit trails. A bounded environment of this kind makes the framework testable against measurable safety, auditability, reversibility, and usefulness metrics.

The central claim is that future agentic AI governance requires more than local alignment or persuasive explanation. It requires trajectory-level coherence control: autonomous systems should not be trusted merely because they sound coherent, but only when their actions remain auditable, corrigible, reversible when necessary, invariant-compatible, and coherent under transformation.

The guiding principle of Doc. 79 is:

No autonomous action without trajectory coherence.

Files

Anti-Goodhart ∆-Coherence.pdf (517.9 kB, md5:961dae7593ac8ad35ac5da2e51bd2bdd)

Additional details

Additional titles

Alternative title (English)
Toward a 2PS Coherence Kernel for Agentic AI Governance