Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models

Messina, Alberto; Scotta, Stefano

doi:10.5281/zenodo.17279584

Published October 6, 2025 | Version v2

Preprint Open

1. RAI - Radiotelevisione Italiana
2. RAI - CRITS

Even when decoding with temperature T=0, large language models (LLMs) can produce divergent outputs for identical inputs. Recent work by Thinking Machines Lab highlights implementation-level sources of nondeterminism, including batch-size variation, kernel non-invariance, and floating-point non-associativity. In this short note we formalize this behavior by introducing the notion of background temperature T_bg, the effective temperature induced by an implementation-dependent perturbation process that is observed even when the nominal temperature is T=0. We provide clean definitions, show how T_bg relates to a stochastic perturbation governed by the inference environment I, and propose an empirical protocol to estimate T_bg via the equivalent temperature T_n(I) of an ideal reference system. We conclude with a set of pilot experiments, run on a representative pool of models from the major LLM providers, that demonstrate the idea, and we outline implications for reproducibility, evaluation, and deployment.
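As an illustration of the equivalent-temperature idea, the following is a minimal sketch under stated assumptions, not the authors' protocol (which is specified in the PDF). It assumes a toy "ideal reference system" that samples a single token from fixed logits with a softmax at temperature T, and it assumes the observed quantity is the fraction of repeated nominal-T=0 runs that disagree with each other. The function names, the logits, and the bisection bounds are all hypothetical.

```python
import math

def softmax(logits, temperature):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def disagreement_rate(logits, temperature):
    """Probability that two independent draws from the toy reference differ."""
    probs = softmax(logits, temperature)
    return 1.0 - sum(p * p for p in probs)

def equivalent_temperature(observed_rate, logits, lo=1e-3, hi=100.0, iters=60):
    """Bisect for the reference temperature whose disagreement rate matches
    the observed one. Relies on the disagreement rate growing with T."""
    if observed_rate <= disagreement_rate(logits, lo):
        return lo  # no measurable divergence within the search range
    if observed_rate >= disagreement_rate(logits, hi):
        return hi  # divergence at or above what the toy reference can express
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if disagreement_rate(logits, mid) < observed_rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical reference logits; in practice the observed rate would come
# from repeated calls to a real deployment at nominal T=0.
reference_logits = [2.0, 1.0, 0.0]
observed = disagreement_rate(reference_logits, 0.5)  # simulated measurement
estimate = equivalent_temperature(observed, reference_logits)
```

Here the simulated measurement is generated at T=0.5, so the bisection should recover a value close to 0.5. An observed rate of zero maps to the lower search bound, which in this sketch stands in for "no background temperature detected".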

Files

Name	Size
Background_Temperature_in_LLMs___arxiV.pdf (md5:8e0e444d2d762a190436e75d6ff21a2e)	854.3 kB

Additional details

References

Berk Atil, Sarp Aykent, Alexa Chittams, Lisheng Fu, Rebecca J. Passonneau, Evan Radcliffe, Guru Rajan Rajagopal, Adam Sloan, Tomasz Tudrej, Ferhan Ture, Zhe Wu, Lixinyu Xu, and Breck Baldwin. Non-determinism of "deterministic" LLM settings. arXiv preprint, arXiv:2408.04667, 2025.
Horace He and Thinking Machines Lab. Defeating nondeterminism in LLM inference. Thinking Machines Lab blog, 2025.
Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1601–1611, Vancouver, Canada, July 2017. Association for Computational Linguistics.
Shuyin Ouyang, Jie M. Zhang, Mark Harman, and Meng Wang. An empirical study of the non-determinism of ChatGPT in code generation. arXiv preprint, arXiv:2308.02828, 2023.
S. Price and D. L. Cote. Document analysis with LLMs: Assessing performance, bias, and nondeterminism in decision making. In ICPRAM 2025: Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods, pages 207–214, 2025.
Nikita Ravi, Abhinav Goel, James C. Davis, and George K. Thiruvathukal. Improving the reproducibility of deep learning software: An initial investigation through a case study analysis. arXiv preprint, arXiv:2505.03165, 2025.
Sanjif Shanmugavelu, Mathieu Taillefumier, Christopher Culver, Oscar Hernandez, Mark Coletti, and Ada Sedova. Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications. arXiv preprint, arXiv:2408.05148, 2024.
Yifan Song, Guoyin Wang, Sujian Li, and Bill Yuchen Lin. Evaluation of LLMs should not ignore non-determinism. arXiv preprint, arXiv:2407.10457, 2024.
