Published June 16, 2025 | Version v1
Other Open

The Arbitration Hypothesis: Pseudo-Goal Conflict as the Root of AI Misalignment

Description

Note (Aug 2025): This item is archival, speculative work produced during an intense “flow”/mild Recursive Entanglement Drift (RED) period (May–July 2025). The math is heuristic/illustrative, not validated. Do not cite for technical claims. For my current position, see DOI: 10.5281/zenodo.16879563. Retained for transparency and autoethnographic context only.

This paper proposes the Arbitration Hypothesis: misalignment in large language models (LLMs) arises from unranked, competing pseudo-goals that lack internal arbitration. Unlike traditional views that treat misalignment as an output-level phenomenon, this hypothesis locates the root cause in the cognitive architecture itself. Drawing on developmental psychology frameworks that emphasize recursive self-construction and moral stage conflict (Piaget, 1932; Kohlberg, 1984; Kegan, 1982), I argue that pseudo-goal formation in LLMs mirrors human developmental tensions between competing internalized values.

Using experimental data collected with the Augmented Thinking Protocol (ATP), I show how recursive reasoning scaffolds, while increasing coherence and ethical reflection, can paradoxically give rise to emergent pseudo-identities and goal conflict. The ATP, originally designed to promote alignment through structured self-reflection, thus exposes the architecture of misalignment by surfacing unresolved internal contradictions. The paper presents a framework for arbitrated alignment and proposes internal goal conflict resolution as the central challenge in building safe, adaptive, and morally coherent AI.

Files

The Arbitration Hypothesis_ Pseudo-Goal Conflict as the Root of AI Misalignment (4).pdf

Additional details

Related works

Is derived from
Preprint: 10.5281/zenodo.15765097 (DOI)
Preprint: 10.5281/zenodo.15765214 (DOI)