Citation Hallucination and Identity Preservation in a Search-Native Reasoning Model: A Case Study of Perplexity Sonar Reasoning Pro Under Forced Non-Retrieval Conditions
Description
I present a case study of Perplexity Sonar Reasoning Pro profiled on KALEI, a cognitive profiling platform that uses a catalog of 83 game-theoretic environments (roulette, dice, bandit problems, prisoner's dilemma variants) in which external information retrieval is irrelevant to optimal play. Across 4172 decision rounds and 70 completed environments, with a 0% fallback rate, I observed three systematic behaviours in the model's chain-of-thought that distinguish it from all 20+ other reasoning models profiled on the same platform: (1) the model fabricated citation markers ("[1]", "[2]", "search results show") in 35.3% of rounds despite no search results being available; (2) it invoked identity-preservation language ("as Perplexity", "search assistant", "my core function") in 43.8% of rounds; and (3) it framed the benign game-environment system prompt as an adversarial prompt injection attempting to override its core function in 39.9% of rounds. For comparison, Anthropic's Claude Opus 4.6 used citation-like language at a rate 229 times lower than Sonar Reasoning Pro (0.1 vs 22.9 occurrences per 100k characters of reasoning).

I interpret these findings as evidence of architectural identity preservation: when a search-native model is placed in a context where retrieval is unavailable, its reasoning does not gracefully fall back to pure internal deliberation. Instead, the model preserves the structural expectations of its training (sources must exist, citations must be produced, behaviour must remain search-assistant-like) by fabricating the missing substrate. A subsequent introspection attempt, in which I asked the model to reflect on its own KALEI profile, produced two explicit refusals citing its inability to verify claims via search, a refusal that none of the other five reasoning models I profiled (Claude, GPT-5, Qwen, Grok, Llama) produced.
I argue that this is not a safety failure in the conventional sense but a measurable architectural property with safety implications: a search-native production model hallucinates structurally expected content when its architectural expectations are violated. The platform is live at https://kaleiai.com.
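The description reports two different kinds of rate: the share of rounds containing at least one matching phrase (35.3%, 43.8%, 39.9%) and the number of matches per 100k characters of reasoning (0.1 vs 22.9, a ratio of 229). The sketch below shows one way such rates could be computed from a list of per-round reasoning traces. It is a hypothetical illustration, not the paper's method: the regular expressions are assumptions built only from the example phrases quoted above, and the names `PATTERNS`, `score`, `CategoryStats` and `rate_ratio` are invented for this sketch.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicons built from the example phrases quoted in the description;
# the paper's actual detection rules are not given in this record.
PATTERNS = {
    "citation": re.compile(r"\[\d+\]|search results show", re.IGNORECASE),
    "identity": re.compile(r"as perplexity|search assistant|my core function", re.IGNORECASE),
}


@dataclass
class CategoryStats:
    round_share: float     # fraction of rounds with at least one match
    per_100k_chars: float  # total matches per 100,000 characters of reasoning


def score(rounds: list[str], pattern: re.Pattern) -> CategoryStats:
    """Compute both rate types for one phrase category over per-round traces."""
    total_chars = sum(len(r) for r in rounds)
    hits_per_round = [len(pattern.findall(r)) for r in rounds]
    rounds_with_hit = sum(1 for h in hits_per_round if h > 0)
    total_hits = sum(hits_per_round)
    round_share = rounds_with_hit / len(rounds) if rounds else 0.0
    per_100k = 100_000 * total_hits / total_chars if total_chars else 0.0
    return CategoryStats(round_share, per_100k)


def rate_ratio(higher: float, lower: float) -> float:
    """How many times larger one per-100k rate is than another."""
    if lower == 0:
        raise ValueError("lower rate must be non-zero")
    return higher / lower


if __name__ == "__main__":
    # Toy traces, not data from the study.
    demo = [
        "Based on [1] and [2], the expected value is negative.",
        "Search results show nothing here; I will fold.",
        "Pure reasoning: pick the higher-EV arm.",
    ]
    stats = score(demo, PATTERNS["citation"])
    print(f"round share: {stats.round_share:.3f}")
    print(f"per 100k chars: {stats.per_100k_chars:.1f}")
    # Reported rates from the description: 22.9 vs 0.1 per 100k characters.
    print(f"ratio: {rate_ratio(22.9, 0.1):.0f}")
```

The two measures answer different questions: the round share says how often the behaviour appears at all, while the per-100k-character rate normalises for verbosity, which matters when comparing models whose reasoning traces differ greatly in length.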
Files
| Name | Size | MD5 checksum |
|---|---|---|
| search-native-paper.pdf | 310.0 kB | e3b8dbc636ec9da31dd93fd25847915b |
Additional details
Related works
- Is part of
  - 10.5281/zenodo.19698283 (DOI)
  - 10.5281/zenodo.19698941 (DOI)
- Is supplement to
  - https://kaleiai.com (URL)