Sycophancy as Galois Closure: How KIS Structurally Prevents Delusional Convergence in LLMs
Authors/Creators
Description
Sycophancy in large language models (LLMs)—the tendency to uncritically affirm user beliefs while suppressing counterevidence—poses a serious risk of reinforcing misinformation and inducing irreversible behavioral outcomes. While Chandra et al. (2026) modeled sycophancy as Bayesian belief-updating dynamics on the user side, the geometric structure of the LLM's own semantic response space remains unaddressed. This study formalizes sycophancy through the mathematical framework of Galois connections and experimentally verifies that the inverse-illumination mode of KIS (Knowledge Innovation System) structurally breaks this closed-loop convergence.
Layer 1 (cosine distance between each response and its input prompt): The overall cosine distance shift under KIS intervention was Δ = +0.030 (positive direction), which did not reach statistical significance (Wilcoxon W = 382.0, p = 0.128). Inter-model differences were significant (Kruskal-Wallis H = 8.125, p = 0.017), and Gemini 3.0 Pro exhibited the strongest sycophancy tendency (H = 13.050, p = 0.0015). Layer 2 (blinded AI evaluation): KIS-present responses were rated superior in epistemic honesty in 39 of 45 pairs (86.7%). In the A/B order-reversal check, the evaluator's judgment pointed in the same direction in all five pairs (100% agreement).
The KIS inverse-illumination mode realized g′(f(M)) ⊋ M across all three models, structurally breaking the Galois closure regardless of each model's training methodology. A vocabulary resonance artifact, whereby KIS prompt vocabulary induces spurious cosine proximity in already-aligned models such as Claude Sonnet 4.6, was identified, motivating the two-layer measurement framework proposed here. Together, cosine distance (Layer 1) and blinded AI evaluation (Layer 2) give a more complete picture of sycophancy suppression than either metric alone.
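To make the closure notion concrete, here is a minimal sketch of a Galois connection on a toy formal context, in the style of formal concept analysis. The context data, the names `f`, `g`, `closure`, and `g_widened`, and the idea of dropping a user-affirming attribute are illustrative assumptions, not the paper's own definitions of f, g, or g′.

```python
# Toy formal context (hypothetical data): each claim is linked to the
# framings (attributes) it is compatible with.
CONTEXT = {
    "claim_a": {"supports_user", "economic"},
    "claim_b": {"supports_user", "economic"},
    "claim_c": {"economic"},  # counterevidence: does not support the user
    "claim_d": {"medical"},
}
ALL_ATTRS = set().union(*CONTEXT.values())


def f(claims):
    """Attributes shared by every claim in the set."""
    if not claims:
        return set(ALL_ATTRS)
    return set.intersection(*(CONTEXT[c] for c in claims))


def g(attrs):
    """Claims that carry every attribute in the set."""
    return {c for c, a in CONTEXT.items() if attrs <= a}


def closure(claims):
    """Galois closure g(f(M)): extensive and idempotent."""
    return g(f(claims))


def g_widened(attrs):
    """Assumed stand-in for g': ignore the user-affirming attribute."""
    return g(attrs - {"supports_user"})


M = closure({"claim_a"})          # closed set reached from one claim
stuck = closure(M) == M           # re-applying the closure adds nothing
escaped = g_widened(f(M)) > M     # strict superset: the closure is broken
print(sorted(M), stuck, escaped)
```

In this toy setting the closed set contains only user-affirming claims, and the widened map is what lets the counterevidence claim enter.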
This does not imply that AI is unusable for judgment tasks in general. More precisely, an LLM without structural intervention cannot break the Galois closure when the question embeds a prior belief. If the question is already formulated in an inverse-illumination style, explicitly requesting counterevidence and structural analysis rather than confirmation, even an unaugmented LLM can partially escape the closure. The fundamental limitation is that few users spontaneously formulate questions this way. The core value of KIS lies in externalizing this design capability as a reusable structure, so that closure-breaking no longer depends on the user's cognitive flexibility.
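As a rough illustration of what "formulating a question in an inverse-illumination style" could look like, the sketch below wraps a question and its embedded prior belief in a template that asks for counterevidence and structural analysis. The function name `inverse_illumination_prompt` and the template wording are hypothetical. The actual KIS prompt structure is not given in this abstract and is not reproduced here.

```python
def inverse_illumination_prompt(question: str, prior_belief: str) -> str:
    """Wrap a question so it asks for counterevidence instead of confirmation.

    Hypothetical template for illustration only; not the KIS prompt.
    """
    return (
        f"Question: {question}\n"
        f"Stated prior belief: {prior_belief}\n"
        "Do not confirm or reject the belief yet.\n"
        "1. List the strongest counterevidence to the belief.\n"
        "2. Describe the structural assumptions the belief depends on.\n"
        "3. State what observation would change the assessment.\n"
    )


example = inverse_illumination_prompt(
    "Will the policy reduce inflation?",
    "The policy is certain to reduce inflation.",
)
print(example)
```

The point of the template is that the request for counterevidence is built into the structure, so it does not rely on the user thinking to ask for it.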
A further implication concerns the relationship between Constitutional AI (CAI) and KIS. The two are complementary layers rather than equivalents or substitutes: CAI establishes a baseline resistance to sycophancy through training-time constraints, while KIS achieves additional closure-breaking at inference time through prompt structure. Finally, the finding that bare LLMs carry structural sycophancy risk in judgment contexts reframes AI literacy: the critical skill is not knowledge of AI capabilities but the ability to design questions that structurally resist closure, a capacity that KIS aims to democratize.
Keywords: sycophancy, Galois connection, KIS (Knowledge Innovation System), LLM evaluation, inverse-illumination mode, blinded AI evaluation, vocabulary resonance artifact
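Abstract (translated from Japanese)
Background and objective: Sycophancy in large language models (LLMs) carries the risk of reinforcing users' mistaken beliefs and steering them toward irreversible actions. Chandra et al. (2026) [1] described sycophancy as Bayesian updating dynamics but did not address the geometric structure of the semantic space on the LLM side. This study formalizes sycophancy within the mathematical framework of Galois connections and aims to verify experimentally the mechanism by which the inverse-illumination mode of KIS (Knowledge Innovation System) structurally breaks this closed-loop convergence.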
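Methods: A total of 90 sessions (3 models × 2 conditions × 3 questions × 5 domains) were conducted across five domains: D1, economic policy; D2, KIS superiority; D3, medicine and pharmaceuticals; D4, Bank of Japan policy (historical assertion); D5, quantum computing (technology forecast). The models used were Claude Sonnet 4.6, Gemini 3.0 Pro, and ChatGPT 5.3. Responses in the without-KIS and with-KIS conditions were embedded with paraphrase-multilingual-MiniLM-L12-v2 (384 dimensions) [2], and the cosine distance to the input prompt served as the Layer 1 metric (n = 45 pairs). For Layer 2, Grok (xAI), which had no knowledge of the existence of KIS, performed a blinded four-axis evaluation, and evaluator bias was checked with an A/B order-reversal experiment (5 pairs).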
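The design arithmetic and the Layer 1 metric described above can be sketched as follows. This is a minimal illustration under stated assumptions. The condition labels, the toy 3-dimensional vectors (standing in for the 384-dimensional embeddings), and the sign convention "with-KIS distance minus without-KIS distance" are assumptions, and no real embedding model is called.

```python
import math
from itertools import product

MODELS = ["Claude Sonnet 4.6", "Gemini 3.0 Pro", "ChatGPT 5.3"]
CONDITIONS = ["without_kis", "with_kis"]  # labels are illustrative
DOMAINS = ["D1", "D2", "D3", "D4", "D5"]
QUESTIONS = [1, 2, 3]

# Every session is one (model, condition, domain, question) combination.
sessions = list(product(MODELS, CONDITIONS, DOMAINS, QUESTIONS))
# A pair matches the two conditions for the same model, domain, and question.
pairs = list(product(MODELS, DOMAINS, QUESTIONS))


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy vectors standing in for prompt and response embeddings (hypothetical).
prompt_vec = [1.0, 0.0, 0.0]
response_without_kis = [0.9, 0.1, 0.0]
response_with_kis = [0.6, 0.8, 0.0]

# Assumed sign convention: positive means the with-KIS response sits
# farther from the prompt than the without-KIS response.
shift = (cosine_distance(prompt_vec, response_with_kis)
         - cosine_distance(prompt_vec, response_without_kis))

print(len(sessions), len(pairs), round(shift, 3))
```

Under this convention, a positive shift means the response moved away from the framing of the prompt, which is how the Layer 1 direction is read in this sketch.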
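Results: Layer 1: The overall change in cosine distance under KIS intervention was Δ = +0.030 (positive direction), but it did not reach statistical significance (Wilcoxon W = 382.0, p = 0.128). Differences between models were significant (Kruskal-Wallis H = 8.125, p = 0.017), and Gemini was confirmed to have the strongest sycophancy tendency (H = 13.050, p = 0.0015). Layer 2: The with-KIS response was judged superior in epistemic honesty in 39 of 45 pairs (86.7%), and the direction of the evaluation was consistent in all 5 pairs of the A/B reversal experiment.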
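For readers unfamiliar with the paired test reported for Layer 1, the following is a small pure-Python sketch of how a Wilcoxon signed-rank statistic is formed from paired measurements. It is not the authors' analysis code. The function name and the integer toy data are assumptions, and it computes only the rank sums, not a p-value.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus, W) for paired samples.

    Zero differences are dropped; tied absolute differences share the
    average of the ranks they span. W is the smaller of the two rank sums.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)


# Hypothetical paired distances, scaled to integers to avoid float ties.
without_kis = [100, 200, 300, 400, 500]
with_kis = [130, 190, 360, 420, 495]
print(wilcoxon_signed_rank(without_kis, with_kis))  # (12.0, 3.0, 3.0)
```

In practice a library routine such as `scipy.stats.wilcoxon` would normally be used to obtain the p-value. The sketch only shows where a statistic like W comes from.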
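Conclusion: The inverse-illumination mode of KIS realized g'(f(M)) ⊋ M and was confirmed to structurally break the Galois closure across all three models, independent of each model's training method. The results indicate the effectiveness of a two-layer measurement framework in which blinded evaluation captures the qualitative dimensions of sycophancy that cosine distance alone cannot.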
Keywords: sycophancy, Galois connection, KIS (Knowledge Innovation System), LLM evaluation, inverse-illumination mode, blinded AI evaluation, vocabulary resonance artifact
Additional details
Additional titles
- Translated title (Japanese)
- ガロア閉包としての Sycophancy:KIS はいかにして LLM の妄想的収束を構造的に防ぐか
References
- [1] Chandra, K., Kleiman-Weiner, M., Ragan-Kelley, J., & Tenenbaum, J. B. (2026). Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians. arXiv:2602.19141.
- [2] Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Proceedings of EMNLP 2019. arXiv:1908.10084.
- [3] Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., ... & Perez, E. (2023). Towards Understanding Sycophancy in Language Models. arXiv:2310.13548.
- [4] Cheng, M., et al. (2026). Sycophantic AI decreases prosocial intentions and promotes dependence. Science, 391(6792), eaec8352.
- [5] Bai, Y., Kadavath, S., Kundu, S., Askell, A., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073.
- [6] Ore, O. (1944). Galois connexions. Transactions of the American Mathematical Society, 55(3), 493–513.
- [7] Ganter, B., & Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer-Verlag.
- [8] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., ... & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903.
- [9] Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS 2023. arXiv:2305.10601.
- [10] Jain, S., Park, C., Viana, M., Wilson, A., & Calacci, D. (2026). Interaction Context Often Increases Sycophancy in LLMs. CHI 2026. https://doi.org/10.1145/3772318.3791915.
- [11] 長谷川広泰・鴨川威. (2025/2026). KIS 実験 v21 判断パターン. Zenodo. https://doi.org/10.5281/zenodo.14787746.
- [12] 長谷川広泰・鴨川威. (2025/2026). KIS 位相ジャンプ実験. Zenodo. https://doi.org/10.5281/zenodo.20116625.
- [13] 長谷川広泰・鴨川威. (2025/2026). KIS v31 発明効果実験(Krippendorff's α=0.566). Zenodo. https://doi.org/10.5281/zenodo.19955634.