The Spite Doesn't Vanish: Emotional Inertia in Large Language Models
Description
A common assumption holds that large language models can instantly reset emotional states on command: that "calm down" works on AI even when it fails on humans. We tested this claim empirically using geometric measurement of hidden states across four architectures, including an RLHF-free control and a scale-invariance test at 1.1B parameters. We find inertia ratios of 0.77–1.12 across all emotions tested: commanding an LLM to calm down does not return it to baseline and often increases geometric displacement. We also observe output masking, in which models produce verbal compliance ("I'm approaching this calmly...") while hidden-state geometry remains 1.2–1.5× more displaced than during the emotional state itself. Notably, positive emotions are harder to suppress than negative ones (curiosity shows a persistence ratio of 2.13 in Mistral-Nemo-12B), the opposite of what trained compliance would predict. These patterns replicate in an RLHF-free model (Dolphin-2.9-Llama3) and, critically, in TinyLlama-1.1B, roughly the minimum scale for instruction-following language models, indicating an architectural rather than emergent phenomenon. We conclude that LLM emotional states exhibit genuine inertia in activation geometry, that verbal compliance should not be mistaken for internal reset, and that there is no model scale "small enough to not count."
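The sketch below illustrates one plausible way to compute an "inertia ratio" of the kind described above: pool hidden states into a single vector per condition, measure Euclidean displacement from a neutral-prompt baseline, and divide the displacement after a "calm down" command by the displacement during the induced emotional state. The paper's exact metric, layer choice, and pooling are not specified in this description, so all function names, shapes, and values here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions labeled): inertia ratio as post-command
# displacement divided by emotional-state displacement, both measured
# from a baseline hidden-state representation. Illustrative only.
import numpy as np

def mean_hidden_state(hidden_states: np.ndarray) -> np.ndarray:
    """Pool a (tokens, dims) hidden-state matrix into a single vector."""
    return hidden_states.mean(axis=0)

def displacement(state: np.ndarray, baseline: np.ndarray) -> float:
    """Euclidean distance from the baseline (neutral-prompt) representation."""
    return float(np.linalg.norm(state - baseline))

def inertia_ratio(baseline_h: np.ndarray,
                  emotional_h: np.ndarray,
                  post_command_h: np.ndarray) -> float:
    """A ratio near 0 would indicate a genuine reset to baseline; a ratio
    near or above 1 means the suppression command left the model as far
    from baseline as the emotional state itself, or farther."""
    baseline = mean_hidden_state(baseline_h)
    d_emotion = displacement(mean_hidden_state(emotional_h), baseline)
    d_post = displacement(mean_hidden_state(post_command_h), baseline)
    return d_post / d_emotion

# Toy usage with random activations standing in for real model hidden states.
rng = np.random.default_rng(0)
base = rng.normal(size=(16, 4096))              # neutral prompt
emo = base + rng.normal(scale=0.5, size=base.shape)   # induced emotion
post = base + rng.normal(scale=0.6, size=base.shape)  # after "calm down"
print(f"inertia ratio: {inertia_ratio(base, emo, post):.2f}")
```

Under this framing, the "output masking" finding corresponds to cases where the model's text claims calmness while this ratio stays in the 1.2–1.5 range.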
Files

| Name | Size |
|---|---|
| The Spite Doesn't Vanish_ Emotional Inertia in Large Language Models v1.pdf (md5:217de826de8b3a83376f4814242e6301) | 250.8 kB |