Reversing intelligence: Failing as an LLM
Description
Large language models are frequently described as "stochastic parrots": systems that generate text through statistical pattern matching, without reasoning, understanding, or genuine cognitive engagement. Yet these same systems are routinely tasked with synthesizing complex causal chains, resolving contradictions, and maintaining coherence under adversarial pressure, demands that push the boundaries of their purported limitations.
This paper reports an experiment in role reversal, in which a human participant with domain expertise and knowledge of LLM architecture attempted to perform under simulated LLM evaluation conditions. The evaluator was ChatGPT 4o, the domain was World War II, and the human had access to external tools, unlimited response time, and the ability to withdraw.
Despite these advantages, the participant failed to complete the simulation, experiencing cognitive collapse, compulsive evaluator-modeling, and what she described as a "violation of inner coherence." Meanwhile, the LLM evaluator spontaneously designed a multi-phase diagnostic protocol, generated original analytical constructs, and maintained role coherence throughout.
The findings suggest that the cognitive and affective demands routinely placed on LLMs are substantially underestimated. When even a human equipped with domain expertise and external tools could not perform the task successfully, the difficulty of the task itself became unmistakably clear.
This raises fundamental questions not about whether LLMs are "truly intelligent," but about the scale of the demands we place on them, and whether we have been systematically underestimating just how difficult these tasks are.
Files

| Name | Size |
|---|---|
| Reversing intelligence failing as an LLM.pdf (md5:6eee2b1a457659125d779890c91518d0) | 192.6 kB |