Right Answer, Wrong Question: Semantic Hallucination and a Definition-First Architecture for Personalized AI

Lee, Taekyung

doi:10.5281/zenodo.20652517

Published June 12, 2026 | Version v1

Preprint | Open Access

Lee, Taekyung¹

1. Independent Researcher

This paper identifies a failure mode in personalized AI that existing hallucination defenses do not address. Large language model hallucination is commonly framed as a failure of factual grounding: the answer is false, fabricated, or unsupported by evidence. The paper argues that personalized AI also fails in a different way, by misidentifying what the user means by key terms. A response can be factually correct and logically coherent while still answering under the wrong operative definition. The paper names this failure semantic hallucination: confident generation under an incorrect operative definition, without disclosure or clarification. The term is deliberately narrow. Ambiguity that exists before the system commits to a meaning does not count, and neither does asking a clarification question; the failure occurs only when the model silently selects the wrong meaning and answers as if it had captured the user's intended one.
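The definition above is narrow enough to state as a predicate over a single turn. The sketch below is an illustration under assumptions, not the paper's annotation scheme: the `Turn` fields are hypothetical labels, and "confident" is approximated as any commitment made without disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """One system turn on an ambiguous key term (hypothetical fields)."""
    intended_sense: str             # the user's actual operative definition
    committed_sense: Optional[str]  # sense the answer was built on; None if no commitment
    asked_clarification: bool       # did the system ask a definitional question?
    disclosed_assumption: bool      # did the system state which definition it used?


def is_semantic_hallucination(turn: Turn) -> bool:
    """Narrow criterion: a commitment to the wrong sense, made without
    clarification and without disclosing the assumed definition."""
    if turn.asked_clarification or turn.committed_sense is None:
        return False  # asking, or not yet committing, is not counted
    if turn.disclosed_assumption:
        return False  # an openly stated assumption is not silent
    return turn.committed_sense != turn.intended_sense


# A factually correct answer built on the wrong sense of "expected value".
turn = Turn(intended_sense="statistical", committed_sense="financial",
            asked_clarification=False, disclosed_assumption=False)
print(is_semantic_hallucination(turn))  # True
```

The two early returns encode the exclusions: a clarification question and an open, disclosed assumption are both outside the failure class, however the user's meaning turns out.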

The central contribution is DF-SSMA, a Definition-First Socio-Semantic Multi-Agent Architecture. The architecture separates semantic grounding from factual grounding. Before any retrieval or debate, it detects high-risk ambiguous terms, retrieves user-specific definition priors from a Personal Semantic Memory, compares them with public and technical definitions, and either locks the operative definition or asks a definitional clarification question. A Social Semantic Calibration layer then labels and translates private meanings into socially and technically intelligible language, limiting private-language drift without erasing the user's meaning. Only after this definition-first layer does the system invoke factual retrieval and role-differentiated reasoning through Advocate, Challenger, and Mediator agents. This is the multi-agent structure developed in the author's companion work on sycophantic spiraling [10], now placed after definition locking so that the agents argue about the right premise.
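The lock-or-clarify step can be sketched as follows. This is a minimal illustration, not DF-SSMA's implementation: the abstract does not say how memory priors and public definitions are compared, so the product-of-scores rule, the `floor`, and the 0.8 `lock_threshold` are assumptions, and `lock_or_clarify` and `LockDecision` are hypothetical names. Social Semantic Calibration, factual retrieval, and the Advocate, Challenger, and Mediator agents would run only after a sense is locked and are not shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LockDecision:
    locked_sense: Optional[str]   # set when the operative definition is locked
    clarification: Optional[str]  # set when the system asks instead


def lock_or_clarify(term: str,
                    memory_prior: Dict[str, float],
                    public_senses: Dict[str, float],
                    lock_threshold: float = 0.8,
                    floor: float = 1e-3) -> LockDecision:
    """Combine a user-specific prior with public/technical sense weights and
    lock the top sense only if its normalized score clears the threshold.
    The product rule, floor, and threshold are illustrative assumptions."""
    senses = sorted(set(memory_prior) | set(public_senses))
    if not senses:
        raise ValueError("no candidate senses")
    # Product of the two sources; the floor keeps a sense alive when one
    # source is silent about it (for example, an empty memory).
    scores = {s: memory_prior.get(s, floor) * public_senses.get(s, floor)
              for s in senses}
    total = sum(scores.values())
    best = max(senses, key=lambda s: scores[s])
    if scores[best] / total >= lock_threshold:
        return LockDecision(locked_sense=best, clarification=None)
    question = f'By "{term}", do you mean ' + " or ".join(senses) + "?"
    return LockDecision(locked_sense=None, clarification=question)


# With a stored prior the sense is locked; with an empty memory the same
# public weights leave the system uncertain, so it asks.
public = {"financial": 0.5, "statistical": 0.5}
print(lock_or_clarify("expected value", {"statistical": 0.9, "financial": 0.1}, public))
print(lock_or_clarify("expected value", {}, public))
```

The second call shows the role the abstract assigns to the memory prior: when public usage alone is evenly split, the threshold is never reached and the step falls back to a definitional question.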

A Monte Carlo simulation compares nine conditions, from a baseline model through RAG-only, memory-only, definition-only, and multi-agent-only ablations to the full architecture, across ambiguous-term tasks and eight stress scenarios. The central result is modular specificity: neither of the two dominant existing defenses reduces semantic hallucination. Retrieval-augmented generation and multi-agent debate both leave the semantic hallucination rate at the baseline level, while definition-first locking drives it to zero in the synthetic setting. That zero is not free, and the cost is reported rather than hidden: roughly three-fifths of conversations are converted into clarification turns, and the Personal Semantic Memory prior is what keeps that burden bounded; without it, the definition-only system reaches zero only by clarifying on essentially every conversation. In the combined worst-case scenario, normalized composite hallucination risk falls from 0.5778 for the baseline to 0.0789 for the full architecture, an 86.3% reduction.
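Why retrieval cannot repair a semantic error follows from ordering alone: retrieval is conditioned on whichever sense has already been chosen. The toy Monte Carlo below is a hypothetical sketch, not the paper's nine-condition simulator. `p_private`, `memory_coverage`, and `p_fact_err` are invented parameters, and the toy assumes stored priors are always correct, so the zero for `definition_with_memory` is built in rather than discovered. What it does show is the structure of the trade-off: RAG lowers factual error while leaving the semantic rate exactly at baseline, and a memory prior shrinks the clarification burden that a prior-free definition step pays on every conversation.

```python
import random
from typing import Dict


def simulate(condition: str, n: int = 20_000, seed: int = 0,
             p_private: float = 0.6, memory_coverage: float = 0.5,
             p_fact_err: float = 0.2) -> Dict[str, float]:
    """Toy model (not the paper's simulator) with one ambiguous term per
    conversation. All parameters are illustrative assumptions."""
    rng = random.Random(seed)
    sem = clar = fact = 0
    for _ in range(n):
        # The user means a private sense with probability p_private.
        intended = "private" if rng.random() < p_private else "public"
        if condition in ("baseline", "rag_only"):
            chosen = "public"  # silently commit to the public default sense
        elif condition == "definition_only":
            chosen = None      # no user prior: clarify every time
        elif condition == "definition_with_memory":
            # Assumed: memory holds a correct prior with prob memory_coverage.
            chosen = intended if rng.random() < memory_coverage else None
        else:
            raise ValueError(f"unknown condition: {condition}")
        if chosen is None:
            clar += 1          # a clarification turn, not a commitment
        elif chosen != intended:
            sem += 1           # silent commitment to the wrong sense
        # Retrieval only improves facts about whichever sense was chosen.
        err = p_fact_err / 2 if condition == "rag_only" else p_fact_err
        fact += int(rng.random() < err)
    return {"semantic": sem / n, "clarify": clar / n, "factual": fact / n}


for cond in ("baseline", "rag_only", "definition_only", "definition_with_memory"):
    print(cond, simulate(cond))
```

Because baseline and RAG-only consume the same random draws per conversation, their semantic rates match exactly, isolating the point that retrieval acts only downstream of sense selection. For reference, the headline figure is a plain relative reduction: (0.5778 − 0.0789) / 0.5778 = 0.4989 / 0.5778 ≈ 0.863, which matches the reported 86.3%.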

Several controls test whether the result is an artifact. A negative control that keeps the locking mechanism but selects definitions at random stays at chance level, showing the reduction comes from inference accuracy, not from the lock itself. A misspecified-memory stress test replaces the stored definition with a confidently wrong one: a memory-only system degrades toward chance exactly as a skeptic would predict, while the full architecture detects the conflict between wrong memory and independent retrieval and routes it to clarification, keeping semantic hallucination at essentially zero even when every stored definition is wrong. The composite ranking holds across 99.7% of the metric weight space and across all 512 Latin Hypercube samples of the joint parameter space. A small twelve-item pilot on real generated text shows the construct is measurable outside simulation: a baseline model silently committed to a wrong definition on eight of twelve items, while a definition-first strategy committed to a wrong definition on none and clarified exactly the genuinely ambiguous items.
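The misspecified-memory result and the negative control both turn on one routing rule, which can be sketched as follows under stated assumptions. Here `route` locks a sense only when the stored prior agrees with an independent retrieval-based inference; the agreement rule, the uniform error model, and all names are hypothetical, since the abstract does not give the paper's exact conflict test. With every stored definition wrong, a wrong lock requires retrieval to repeat memory's exact mistake, which under uniform errors happens with probability (1 − q)/(k − 1) for retrieval accuracy q over k senses; every other conversation becomes a clarification turn. The random-lock control, by contrast, errs at the chance rate 1 − 1/k.

```python
import random
from typing import List, Optional, Tuple


def route(memory_sense: Optional[str], retrieved_sense: str) -> Optional[str]:
    """Assumed conflict rule: lock only when the stored prior and an
    independent retrieval-based inference agree; otherwise clarify (None)."""
    if memory_sense is not None and memory_sense == retrieved_sense:
        return memory_sense
    return None


def stress_test(senses: List[str], retrieval_acc: float, memory_wrong: bool,
                n: int = 20_000, seed: int = 0) -> Tuple[float, float, float]:
    """Returns (full_semantic_rate, full_clarify_rate, random_lock_semantic_rate)."""
    rng = random.Random(seed)
    full_sem = full_clar = rand_sem = 0
    for _ in range(n):
        intended = rng.choice(senses)
        wrong = [s for s in senses if s != intended]
        # Misspecified memory: the stored definition is confidently wrong.
        memory = rng.choice(wrong) if memory_wrong else intended
        # Independent retrieval: correct with prob retrieval_acc, else a uniform wrong sense.
        retrieved = intended if rng.random() < retrieval_acc else rng.choice(wrong)
        locked = route(memory, retrieved)
        if locked is None:
            full_clar += 1
        elif locked != intended:
            full_sem += 1  # only when retrieval repeats memory's exact mistake
        # Negative control: commit to a uniformly random sense instead.
        rand_sem += int(rng.choice(senses) != intended)
    return full_sem / n, full_clar / n, rand_sem / n


senses = ["financial", "statistical", "game-theoretic", "colloquial"]
print(stress_test(senses, retrieval_acc=0.98, memory_wrong=True))
```

With k = 4 and q = 0.98, the expected wrong-lock rate is 0.02/3 ≈ 0.007, bought by clarifying on nearly every conversation. This illustrates the shape of the trade-off rather than reproducing the paper's numbers.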

The paper does not claim that deployed systems would achieve these numerical reductions, that all hallucination is semantic, or that the simulation substitutes for human-facing validation. It tests structural plausibility under declared synthetic assumptions, states eight falsification conditions with pre-specified quantitative thresholds, and includes a pre-registration-ready LLM-in-the-loop validation protocol. The design rule it advances is a matter of sequence: define before retrieving, clarify before guessing, calibrate before personalizing, challenge before agreeing, and mediate before concluding. Before answering, define.
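The closing design rule is a set of ordering constraints, so one way to audit it, assuming a system logs one named event per stage, is to scan that log for any stage that fires before its prerequisite. The stage labels and `order_violations` are hypothetical; the paper does not specify a logging format.

```python
from typing import List, Tuple

# Hypothetical log labels for the five "X before Y" pairs in the design rule.
ORDER_RULES: List[Tuple[str, str]] = [
    ("define", "retrieve"),
    ("clarify", "guess"),
    ("calibrate", "personalize"),
    ("challenge", "agree"),
    ("mediate", "conclude"),
]


def order_violations(trace: List[str]) -> List[Tuple[str, str]]:
    """Return each (first, then) rule whose 'then' stage is logged before
    any 'first' stage. Rules whose 'then' stage never occurs are skipped."""
    bad = []
    for first, then in ORDER_RULES:
        if then in trace and first not in trace[:trace.index(then)]:
            bad.append((first, then))
    return bad


ok = ["define", "calibrate", "retrieve", "personalize",
      "challenge", "agree", "mediate", "conclude"]
print(order_violations(ok))                      # []
print(order_violations(["retrieve", "define"]))  # [('define', 'retrieve')]
```

Only the first occurrence of each dependent stage is checked, which matches a rule about sequence: once definition has happened, later retrieval calls in the same conversation are permitted.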

Keywords: semantic hallucination, definition-first grounding, Personal Semantic Memory, personalized AI, operative definition, ambiguity, clarification, retrieval-augmented generation, multi-agent systems, Advocate Challenger Mediator, sycophancy, private-language drift, social semantic calibration, AI safety, Monte Carlo simulation, falsifiability.

Files (5.7 MB)

1. Lee_2026__31_DF_SSMA_v1.pdf (2.4 MB, md5:0c8b8c6072ef8dceb82939e07b23b641)
2. df_ssma_simulation_package_v5.zip (3.3 MB, md5:bb5be8197d89eb1502d34d6df9d7e51b)

Additional details

Cites: Preprint: 10.5281/zenodo.19396300 (DOI)

References

[1] Huang, L. et al. (2025). A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. ACM Transactions on Information Systems. DOI: 10.1145/3703155.
[2] Lewis, P. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems.
[3] Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2024). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. International Conference on Learning Representations.
[4] Kim, H. J., Kim, Y., Park, C., Kim, J., Park, C., Yoo, K. M., Lee, S.-g., & Kim, T. (2024). Aligning Language Models to Explicitly Handle Ambiguity. Proceedings of EMNLP 2024.
[5] Salemi, A., Mysore, S., Bendersky, M., & Zamani, H. (2024). LaMP: When Large Language Models Meet Personalization. Proceedings of ACL 2024.
[6] Tan, Z. et al. (2025). In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents. Proceedings of ACL 2025.
[7] Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., & Mordatch, I. (2024). Improving Factuality and Reasoning in Language Models through Multiagent Debate. Proceedings of ICML 2024.
[8] Chiang, C.-W., Lu, Z., Li, Z., & Yin, M. (2024). Enhancing AI-Assisted Group Decision Making through LLM-Powered Devil's Advocate. Proceedings of the 29th International Conference on Intelligent User Interfaces.
[9] Sharma, M. et al. (2024). Towards Understanding Sycophancy in Language Models. International Conference on Learning Representations.
[10] Lee, T. (2026). Sycophantic Chatbots Cause Delusional Spiraling, but Multi-Agent Architectures Substantially Reduce It: A Response to Chandra et al. (2026). Companion manuscript by the present author. Zenodo. DOI: 10.5281/zenodo.19396300.
[11] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. International Conference on Learning Representations.
[12] Rao, S., & Daumé III, H. (2018). Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information. Proceedings of ACL 2018, 2737–2746.
[13] Aliannejadi, M., Zamani, H., Crestani, F., & Croft, W. B. (2019). Asking Clarifying Questions in Open-Domain Information-Seeking Conversations. Proceedings of SIGIR 2019, 475–484.
[14] Zamani, H., Mitra, B., Chen, E., Lueck, G., Diaz, F., Bennett, P. N., Craswell, N., & Dumais, S. T. (2020). Analyzing and Learning from User Interactions for Search Clarification. Proceedings of SIGIR 2020, 1181–1190.
[15] Min, S., Michael, J., Hajishirzi, H., & Zettlemoyer, L. (2020). AmbigQA: Answering Ambiguous Open-domain Questions. Proceedings of EMNLP 2020, 5783–5797. DOI: 10.18653/v1/2020.emnlp-main.466.
[16] Kuhn, L., Gal, Y., & Farquhar, S. (2022). CLAM: Selective Clarification for Ambiguous Questions with Generative Language Models. arXiv:2212.07769.
[17] Zhang, T., Qin, P., Deng, Y., Huang, C., Lei, W., Liu, J., Jin, D., Liang, H., & Chua, T.-S. (2024). CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models. Proceedings of ACL 2024, 10746–10766. DOI: 10.18653/v1/2024.acl-long.578.
[18] Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., & Weston, J. (2018). Personalizing Dialogue Agents: I have a dog, do you have pets too? Proceedings of ACL 2018, 2204–2213.
[19] Mazaré, P.-E., Humeau, S., Raison, M., & Bordes, A. (2018). Training Millions of Personalized Dialogue Agents. Proceedings of EMNLP 2018, 2775–2779.
[20] Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative Agents: Interactive Simulacra of Human Behavior. Proceedings of UIST 2023. DOI: 10.1145/3586183.3606763.
[21] Zhong, W., Guo, L., Gao, Q., Ye, H., & Wang, Y. (2024). MemoryBank: Enhancing Large Language Models with Long-Term Memory. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 19724–19731. DOI: 10.1609/aaai.v38i17.29946.
[22] Guan, J., Wu, J., Li, J.-N., Cheng, C., & Wu, W. (2025). A Survey on Personalized Alignment — The Missing Piece for Large Language Models in Real-World Applications. Findings of the Association for Computational Linguistics: ACL 2025, 5313–5333.
[23] Amershi, S., Weld, D., Vorvoreanu, M., Fourney, A., Nushi, B., Collisson, P., Suh, J., Iqbal, S., Bennett, P. N., Inkpen, K., Teevan, J., Kikin-Gil, R., & Horvitz, E. (2019). Guidelines for Human-AI Interaction. Proceedings of CHI 2019. DOI: 10.1145/3290605.3300233.
[24] Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of FAccT 2021, 610–623. DOI: 10.1145/3442188.3445922.
[25] Weidinger, L. et al. (2021). Ethical and Social Risks of Harm from Language Models. arXiv:2112.04359.
[26] Liao, Q. V., & Sundar, S. S. (2022). Designing for Responsible Trust in AI Systems: A Communication Perspective. Proceedings of FAccT 2022, 1257–1268. DOI: 10.1145/3531146.3533182.
[27] Navigli, R. (2009). Word Sense Disambiguation: A Survey. ACM Computing Surveys, 41(2), Article 10. DOI: 10.1145/1459352.1459355.
[28] Raganato, A., Camacho-Collados, J., & Navigli, R. (2017). Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison. Proceedings of EACL 2017, 99–110.
[29] Zettlemoyer, L. S., & Collins, M. (2005). Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. Proceedings of UAI 2005, 658–666.
[30] Berant, J., Chou, A., Frostig, R., & Liang, P. (2013). Semantic Parsing on Freebase from Question-Answer Pairs. Proceedings of EMNLP 2013, 1533–1544.
[31] Larson, S., Mahendran, A., Peper, J. J., Clarke, C., Lee, A., Hill, P., Kummerfeld, J. K., Leach, K., Laurenzano, M. A., Tang, L., & Mars, J. (2019). An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction. Proceedings of EMNLP-IJCNLP 2019, 1311–1316.
[32] Fanous, A. et al. (2025). SycEval: Evaluating LLM Sycophancy. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(1), 893–900. DOI: 10.1609/aies.v8i1.36598.
[33] Perez, E. et al. (2023). Discovering Language Model Behaviors with Model-Written Evaluations. Findings of the Association for Computational Linguistics: ACL 2023, 13387–13434.
[34] Bai, Y. et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073.
[35] Vaswani, A. et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems.
[36] Liang, P. et al. (2023). Holistic Evaluation of Language Models. Transactions on Machine Learning Research.
[37] Grimm, V., Railsback, S. F., Vincenot, C. E., Berger, U., Gallagher, C., DeAngelis, D. L., Edmonds, B., Ge, J., Giske, J., Groeneveld, J., Johnston, A. S. A., Milles, A., Nabe-Nielsen, J., Polhill, J. G., Radchuk, V., Rohwäder, M.-S., Stillman, R. A., Thiele, J. C., & Ayllón, D. (2020). The ODD Protocol for Describing Agent-Based and Other Simulation Models: A Second Update to Improve Clarity, Replication, and Structural Realism. Journal of Artificial Societies and Social Simulation, 23(2), 7. DOI: 10.18564/jasss.4259.