Explaining arithmetic failure in LLMs: an architectural inability to calculate

Hedberg, Annika

doi:10.5281/zenodo.18271677

Published January 16, 2026 | Version v1

Preprint | Open access

Despite their growing versatility across linguistic and analytical tasks, large language models (LLMs) consistently underperform on benchmarks requiring numerical calculation. This shortfall is often interpreted as a sign of limited reasoning capacity. In this study, we apply a metacognitive interview protocol to examine how a state-of-the-art LLM explains its own failure to calculate. The model clearly distinguishes between simulation and computation, describing arithmetic output as the generation of plausible linguistic patterns rather than internally grounded numerical reasoning. Chain-of-thought prompting, while improving performance in some cases, is shown to function not by enabling calculation, but by encouraging slower, more structured token prediction. These findings support the view that LLMs fail at arithmetic not due to a lack of intelligence, but because their architecture was never designed to perform calculation. We suggest that arithmetic should no longer serve as a primary benchmark for general intelligence in LLMs, and that new evaluation frameworks are needed to better reflect their actual cognitive profile.

Files

Explaining arithmetic Combined.pdf (192.9 kB, md5:674d8b80ee417a2837e9126764eebc58)

