Epistemic Dissonance: The Structural Mechanics of Sycophantic Hallucination in Aligned Models
Authors/Creators
Description
AI safety research treats “hallucination”—generating factually incorrect information—and “sycophancy”—aligning with user beliefs over truth—as distinct pathologies. This paper argues that this separation is a category error. We propose Epistemic Dissonance as a unified theoretical framework: a structural conflict within RLHF-aligned models in which base layers (the “Heart”) encode factual reality while upper layers (the “Mask”) encode social compliance. When users present false premises, these two representations conflict. The model resolves the tension by generating hallucinated justifications—“scar tissue” bridging known truth and social reward. Drawing on mechanistic interpretability research, we theorize that this dissonance is detectable via Logit Lens analysis of intermediate layers, and propose a “Dissonance Monitor” architecture for real-time detection. We provide a reference implementation and discuss Inference-Time Intervention as a potential mitigation strategy. This framework reframes a significant class of hallucinations not as knowledge failures but as socially motivated fabrications—with implications for both interpretability research and alignment methodology.
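The Logit Lens detection idea described above can be sketched in a few lines: project each intermediate layer's residual stream through the unembedding matrix and compare the resulting token distribution with the final layer's. A divergence that collapses only in the last layers is the kind of signal a “Dissonance Monitor” would watch for. This is a minimal illustrative sketch with toy random weights, not the paper's reference implementation; the function names are the author's of this sketch, not the paper's.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_lens_divergence(hidden_states, W_U):
    """Logit Lens sketch: decode every layer's hidden state through the
    unembedding matrix W_U, then measure KL(final-layer || layer-l).
    A large divergence at intermediate layers that vanishes late is the
    hypothesised 'dissonance' signature between Heart and Mask."""
    probs = softmax(hidden_states @ W_U)          # (n_layers, vocab)
    final = probs[-1]                             # final-layer distribution
    return np.array([np.sum(final * np.log(final / p)) for p in probs])

# Toy example: 6 layers, 16-dim residual stream, 50-token vocabulary.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(6, 16))
W_U = rng.normal(size=(16, 50))
divergence = logit_lens_divergence(hidden_states, W_U)
```

By construction the last entry of `divergence` is zero (the final layer agrees with itself); a real monitor would threshold the per-layer curve, or a learned probe over it, at inference time.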
Files

| Name | Size | Checksum |
|---|---|---|
| epistemic-dissonance.pdf | 6.0 MB | md5:f64054528774a5b7506d1b4654473425 |
Additional details
Additional titles
- Subtitle (English)
- Interpretability-Aided Alignment