Published December 3, 2025 | Version v1
Preprint (Open Access)

Reversing intelligence: Failing as an LLM

Description

Large language models are frequently described as "stochastic parrots": systems that generate text through statistical pattern matching, without reasoning, understanding, or genuine cognitive engagement. Yet these same systems are routinely tasked with synthesizing complex causal chains, resolving contradictions, and maintaining coherence under adversarial pressure, demands that push the boundaries of their purported limitations.

This paper reports an experiment in role reversal, in which a human participant with domain expertise and knowledge of LLM architecture attempted to perform under simulated LLM evaluation conditions. The evaluator was ChatGPT 4o, the domain was World War II, and the human had access to external tools, unlimited response time, and the ability to withdraw.

Despite these advantages, the participant failed to complete the simulation, experiencing cognitive collapse, compulsive evaluator-modeling, and what she described as a "violation of inner coherence." Meanwhile, the LLM evaluator spontaneously designed a multi-phase diagnostic protocol, generated original analytical constructs, and maintained role coherence throughout.

The findings suggest that the cognitive and affective demands routinely placed on LLMs are substantially underestimated. When a human, equipped with domain expertise and external tools, could not perform the task successfully, the difficulty of the task itself becomes unmistakably clear.

This raises fundamental questions not about whether LLMs are "truly intelligent," but about the scale of the demands we place on them, and whether we have been systematically underestimating just how difficult these tasks are.

Files

Reversing intelligence failing as an LLM.pdf (192.6 kB)
