Fathom Monitor: Per-Token Hallucination Detection via Coherence Divergence in Sparse Autoencoder Feature Space
Description
This technical disclosure describes Fathom Monitor, a system and method for detecting hallucination-risk tokens in large language model (LLM) outputs at the time of generation, using a mechanistic signal derived from the geometric structure of sparse autoencoder (SAE) feature activations.
The core innovation is the use of C_delta, the divergence between late-layer and early-layer feature coherence, as a per-token hallucination indicator. When C_delta exceeds a calibrated threshold at a given token position, that token is flagged as uncertain or high-risk and annotated inline.
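The thresholding step can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes a per-token coherence score is already available for an early and a late layer, takes C_delta as the simple difference `late - early` (the sign convention and the coherence metric itself are assumptions, since this description does not define them), and uses hypothetical names (`flag_tokens`, `annotate`, the `[?]` marker) and toy values throughout.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TokenFlag:
    token: str
    c_delta: float
    flagged: bool


def flag_tokens(
    tokens: Sequence[str],
    early_coherence: Sequence[float],
    late_coherence: Sequence[float],
    threshold: float,
) -> List[TokenFlag]:
    """Flag each token whose C_delta exceeds the threshold.

    Assumption: C_delta = late-layer coherence minus early-layer coherence.
    """
    if not (len(tokens) == len(early_coherence) == len(late_coherence)):
        raise ValueError("tokens and coherence sequences must have equal length")
    flags = []
    for tok, early, late in zip(tokens, early_coherence, late_coherence):
        c_delta = late - early
        flags.append(TokenFlag(tok, c_delta, c_delta > threshold))
    return flags


def annotate(flags: Sequence[TokenFlag], marker: str = "[?]") -> str:
    """Render tokens inline, appending a marker to each flagged token."""
    return " ".join(f.token + marker if f.flagged else f.token for f in flags)


if __name__ == "__main__":
    # Toy values for illustration only; they are not measurements.
    tokens = ["Paris", "is", "in", "Peru"]
    early = [0.80, 0.75, 0.78, 0.40]
    late = [0.82, 0.76, 0.80, 0.95]
    print(annotate(flag_tokens(tokens, early, late, threshold=0.3)))
    # prints: Paris is in Peru[?]
```

In practice the threshold would come from calibration on labeled data, as the description states; the value 0.3 above is arbitrary.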
In an empirical validation on TruthfulQA (n=50, Gemma-2-2B), C_delta discriminates hallucination with p=0.040 and Cohen's d=0.407, a small-to-medium effect by the usual convention. Depth (K), by contrast, shows no sensitivity to hallucination (p=0.931).
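For readers who want to reproduce an effect size of this kind, the sketch below computes Cohen's d with a pooled standard deviation. This record does not say which variant of Cohen's d or which significance test produced the reported figures, so the pooled-variance form is an assumption, and the function name `cohens_d` is illustrative.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation.

    d = (mean_a - mean_b) / sqrt(pooled variance), where the pooled
    variance weights each group's sample variance by its degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

Here `group_a` would hold the C_delta values for responses labeled as hallucinated and `group_b` those for the remaining responses.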
This document constitutes a public technical disclosure establishing prior art. Related provisional patents: US 64/020,489 (March 29, 2026) and US 64/021,113 (March 30, 2026). It builds on Zenodo records doi:10.5281/zenodo.19326175 and doi:10.5281/zenodo.19364702.
Files
| Name | Size | Checksum |
|---|---|---|
| fathom_monitor_disclosure.pdf | 12.6 kB | md5:49dcdc4cb7c33830a140c469a16365a9 |
Additional details
Related works
- Is supplement to
- Preprint: 10.5281/zenodo.19364702 (DOI)
- Preprint: 10.5281/zenodo.19326175 (DOI)