Entropic Deviation as a Measure of Systematic Non-Randomness in Large Language Model Token Generation
Description
Large language models (LLMs) generate text by sampling from token probability distributions, yet the degree to which these distributions deviate from randomness remains underexplored. This paper introduces Entropic Deviation (ED)—a normalized information-theoretic metric quantifying the divergence of a model’s output
distribution from uniform randomness at each generation step. We present a multi-architecture experimental framework that measures ED across three model
families (Llama-3-8B, Phi-3-mini-4K, Mistral-7B), four content domains, and three temperature settings, yielding 7,200 generation traces.
A pre-registered battery of eight falsification tests reveals that six strongly reject the stochastic baseline hypothesis (p < 0.01), with cross-architectural
consensus on temperature-dependent effects, autoregressive persistence, and domain sensitivity. These results provide evidence for systematic, structured non-randomness
in token generation that transcends individual architectures.
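The description defines ED as a normalized information-theoretic measure of how far a per-step token distribution sits from uniform randomness. The exact formula is not given here; a minimal sketch, assuming ED is the KL divergence from the uniform distribution normalized by log V (so ED = 0 for a uniform distribution and ED = 1 for a one-hot distribution), might look like:

```python
import math

def entropic_deviation(probs):
    """Per-step Entropic Deviation sketch (assumed definition).

    Assumes ED = KL(p || uniform) / log(V) = 1 - H(p) / log(V),
    which maps to [0, 1]. The paper's exact normalization may differ.
    """
    v = len(probs)
    # Shannon entropy H(p); terms with p == 0 contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # KL(p || uniform) = log(V) - H(p), normalized by log(V).
    return (math.log(v) - entropy) / math.log(v)

# A uniform distribution is maximally random (ED ≈ 0);
# a one-hot distribution is fully deterministic (ED = 1).
print(entropic_deviation([0.25] * 4))            # ≈ 0.0
print(entropic_deviation([1.0, 0.0, 0.0, 0.0]))  # = 1.0
```

Under this reading, averaging ED over a generation trace gives a per-trace non-randomness score that can be compared across temperatures and model families.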
Note: These are preliminary findings. The current prompt set consists of stimuli that inherently elicit non-random responses (encyclopedic, narrative, and code-related
content). A follow-up study incorporating prompts designed to elicit maximally random outputs (e.g., random string generation, dice rolls) is underway and
will be reported separately. The full implications of the observed non-randomness patterns can only be assessed once both prompt categories have been analyzed.
Files
| Name | Size |
|---|---|
| main.pdf (md5:0d86f4618c0b2397fb493b261f33d097) | 305.6 kB |
Additional details
Software
- Repository URL
- https://github.com/JaroslawHryszko/entropic-deviation
- Programming language
- Python
- Development Status
- Active