Published June 2, 2026 | Version v1
Preprint Open

LLM Token Entropy as a Leading Indicator of Realized Volatility: Evidence from Mid-Cap Equities

  • 1. Independent Researcher

Description

This preprint investigates whether the per-token entropy of a pretrained large language model (LLM), computed as the model processes a fixed weekly macroeconomic prompt, is a leading indicator of forward realized volatility in mid-cap US equities. The primary contribution is a contamination protocol based on model training cutoffs. It partitions observed entropy signals into model-agnostic and model-specific components, distinguishing genuine out-of-distribution detection from training-data leakage encoded in a large model's weights. As an illustrative application, a proof-of-concept system passes energy prices, semiconductor prices, and political news through GPT-4o with logprobs enabled, and the resulting signal is evaluated against forward realized volatility for 24 mid-cap US equities across seven sectors over 363 weeks (2018–2024). The signal is best characterized as a sector-specific, medium-horizon indicator that is competitive with VIX in commodity-exposed sectors. The contamination-clean evidence rests on a GPT-2 signal in the post-WHO-report COVID window (January 2020, n ≈ 4 weeks), which peaks at z = +2.124 seven weeks before the market crash, while VIX remained at 12–15.
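The quantities named in the description can be illustrated with a minimal sketch. It is not taken from the paper or its repository. It assumes that the entropy at each token position is approximated from the top-k log-probabilities returned by the model (renormalized, so it is a truncated estimate), that the z-score is taken against a trailing window of earlier weekly values, and that realized volatility is the annualized root mean square of daily log returns. The function names, the window length, and the 252-day annualization factor are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, stdev


def token_entropy(top_logprobs):
    """Shannon entropy (in nats) at one token position.

    `top_logprobs` holds the natural-log probabilities of the top-k
    candidate tokens. The probabilities are renormalized to sum to 1,
    so the result is a truncated approximation of the full entropy.
    """
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def mean_entropy(per_token_top_logprobs):
    """Average per-token entropy over one weekly prompt or response."""
    return mean(token_entropy(t) for t in per_token_top_logprobs)


def trailing_zscore(series, window):
    """z-score of the latest value against the `window` values before it."""
    history = series[-window - 1:-1]
    if len(history) < 2:
        raise ValueError("need at least two earlier observations")
    sd = stdev(history)
    if sd == 0:
        raise ValueError("trailing window has zero variance")
    return (series[-1] - mean(history)) / sd


def realized_volatility(daily_log_returns, periods_per_year=252):
    """Annualized realized volatility from daily log returns."""
    return math.sqrt(periods_per_year * mean(r * r for r in daily_log_returns))


if __name__ == "__main__":
    # Made-up numbers, purely to show the call pattern.
    week = [[math.log(0.7), math.log(0.2), math.log(0.1)],
            [math.log(0.5), math.log(0.5)]]
    weekly_entropy = [0.61, 0.58, 0.63, 0.60, 0.59, mean_entropy(week)]
    print(trailing_zscore(weekly_entropy, window=5))
    print(realized_volatility([0.012, -0.008, 0.004, -0.015, 0.009]))
```

A trailing window keeps the z-score free of look-ahead, which matters when the signal is evaluated as a leading indicator. The paper may standardize differently.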

Files

llm_entropy_paper_v8.pdf

Files (365.0 kB)

md5:37a56119eeb5b927f59dc99bc341e8e6

Additional details

Software

Repository URL
https://github.com/OleksandrPodoliako/entropy-risk
Programming language
Python
Development Status
Concept