Published February 15, 2026 | Version v1
Preprint | Open Access

EEG for LLMs: A Telemetry Layer for Online Uncertainty Monitoring and Decision Policies

Description

Token streams are a human-oriented interface that can obscure generation dynamics and encourage brittle analyses (e.g., relying on chain-of-thought text). We introduce an "EEG-like" telemetry layer for autoregressive decoding that records lightweight internal signals during generation (uncertainty, surprisal, distribution shift, and sparse layer summaries), yielding real-time traces of model-state evolution without parsing chain-of-thought text. Across three model families and three task types (27 runs = 3 models x 3 tasks x 3 seeds), we find that telemetry signatures vary strongly across models and tasks, and that early-window uncertainty predicts failures above chance on labeled tasks (entropy AUC 0.61-0.74 on GSM8K and 0.35-0.75 on TriviaQA). As an application demo, we show how telemetry can gate simple downstream policies (accept / retry / route) on a Llama-8B -> Qwen-32B (4-bit) pair, improving accuracy by up to +10.5 points on GSM8K (route-only; 1.42x cost proxy) and +1.5 points on TriviaQA (cascade; 2.81x cost proxy). We release a reproducible pipeline, canonical benchmarks, and visualization tools. We view this work as a first step toward systems that progressively reduce reliance on human token interfaces.
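To make the telemetry-and-gating idea concrete, here is a minimal sketch of the kind of per-token signals the abstract names (entropy, surprisal, distribution shift) and an early-window decision rule. This is an illustrative reconstruction, not the released pipeline: the function names (`telemetry_step`, `decide`), the KL-based shift measure, and all thresholds are hypothetical choices of ours, assumed for exposition.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary.
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def telemetry_step(logits, token_id, prev_probs=None):
    """One telemetry record per decoded token.

    Signals (mirroring the abstract's list):
      entropy   -- uncertainty of the next-token distribution
      surprisal -- -log p of the token actually sampled
      kl_shift  -- KL(current || previous) as a distribution-shift proxy
    """
    p = softmax(logits)
    eps = 1e-12  # guard against log(0)
    entropy = float(-np.sum(p * np.log(p + eps)))
    surprisal = float(-np.log(p[token_id] + eps))
    kl_shift = None
    if prev_probs is not None:
        kl_shift = float(np.sum(p * (np.log(p + eps) - np.log(prev_probs + eps))))
    return {"entropy": entropy, "surprisal": surprisal, "kl_shift": kl_shift}, p

def decide(trace, window=8, accept_thresh=1.0, route_thresh=2.5):
    """Gate on mean early-window entropy: accept / retry / route.

    Thresholds here are placeholders; in practice they would be
    calibrated per model and task (e.g., against entropy AUC on a
    labeled split, as the paper does for GSM8K and TriviaQA).
    """
    early = [t["entropy"] for t in trace[:window]]
    mean_entropy = float(np.mean(early))
    if mean_entropy < accept_thresh:
        return "accept"           # keep the small model's answer
    if mean_entropy < route_thresh:
        return "retry"            # resample with the same model
    return "route"                # escalate, e.g. Llama-8B -> Qwen-32B
```

A route-only policy, as in the GSM8K demo, would simply collapse the retry branch into route; the cost proxy then reflects how often the large model is invoked.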

Files

eeg_for_llms.pdf (1.1 MB)
