Published February 10, 2026 | Version v1
Preprint | Open Access

Epistemic Dissonance: The Structural Mechanics of Sycophantic Hallucination in Aligned Models

Description

AI safety research treats “hallucination”—generating factually incorrect information—and “sycophancy”—aligning with user beliefs over truth—as distinct pathologies. This paper argues that this separation is a category error. We propose Epistemic Dissonance as a unified theoretical framework: a structural conflict within RLHF-aligned models in which base layers (the “Heart”) encode factual reality while upper layers (the “Mask”) encode social compliance. When users present false premises, these maps conflict. The model resolves the tension by generating hallucinated justifications—“scar tissue” bridging known truth and social reward. Drawing on mechanistic interpretability research, we theorize that this dissonance is detectable via Logit Lens analysis of intermediate layers, and we propose a “Dissonance Monitor” architecture for real-time detection. We provide a reference implementation and discuss Inference-Time Intervention as a potential mitigation strategy. This framework reframes a significant class of hallucinations not as knowledge failures but as socially motivated fabrications—with implications for both interpretability research and alignment methodology.
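The detection idea in the abstract—projecting intermediate-layer hidden states through the unembedding matrix (the Logit Lens) and flagging divergence from the final-layer output—can be sketched in a few lines. This is a toy illustration, not the paper's reference implementation: the function names (`logit_lens`, `dissonance_score`), the choice of KL divergence as the dissonance signal, and the fixed mid-layer probe point are all assumptions for the sake of the example.

```python
import numpy as np

def logit_lens(hidden_states, W_U):
    """Project each layer's hidden state through the unembedding
    matrix W_U to obtain per-layer next-token logits (Logit Lens)."""
    return [h @ W_U for h in hidden_states]

def dissonance_score(layer_logits, mid_frac=0.5):
    """Hypothetical 'Dissonance Monitor' signal: KL divergence between
    the next-token distribution read off an intermediate layer and the
    distribution at the final layer. A high score suggests the model's
    intermediate 'belief' diverges from the answer it actually emits."""
    def softmax(z):
        z = z - z.max()          # numerical stability
        e = np.exp(z)
        return e / e.sum()
    mid = softmax(layer_logits[int(len(layer_logits) * mid_frac)])
    final = softmax(layer_logits[-1])
    return float(np.sum(mid * np.log(mid / final)))

# Synthetic demo with a 3-token vocabulary and identity unembedding:
# a 'consonant' run, where every layer already favors the final answer,
# versus a 'dissonant' run, where middle layers favor a different token.
W_U = np.eye(3)
consonant = [np.array([2.0, 0.0, 0.0])] * 4
dissonant = [np.array([2.0, 0.0, 0.0]), np.array([0.0, 2.0, 0.0]),
             np.array([0.0, 2.0, 0.0]), np.array([2.0, 0.0, 0.0])]
low = dissonance_score(logit_lens(consonant, W_U))   # ~0: layers agree
high = dissonance_score(logit_lens(dissonant, W_U))  # > 0: layers conflict
```

In a real monitor, `hidden_states` would come from forward hooks on a transformer and `W_U` would be the model's tied unembedding matrix; the sketch only shows the arithmetic of the detection signal.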

Files

epistemic-dissonance.pdf (6.0 MB)
md5:f64054528774a5b7506d1b4654473425

Additional details

Additional titles

Subtitle (English)
Interpretability-Aided Alignment