DeepDrift/ODD Kinetic Diagnosis of Representations in Deep Neural Networks
Description
This work presents a self-contained study on fail-fast monitoring of neural networks via hidden-state dynamics, extending and substantially reframing an earlier exploratory preprint on hidden-state trajectories.
We introduce Semantic Velocity — a kinetic measure of representation drift in latent space — and show that it serves as a leading indicator of model unreliability, preceding observable failures such as accuracy drops, hallucinations, policy collapse, or reward hacking. Unlike confidence- or output-based signals, the proposed approach operates on internal model dynamics and is therefore agnostic to task labels and downstream objectives.
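The record does not reproduce the formal definition, but the core quantity can be illustrated with a minimal sketch. Assuming Semantic Velocity is the per-step displacement of a pooled hidden state in latent space, a hypothetical `semantic_velocity` helper might look like the following; the pooling, L2 metric, and scale normalization are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def semantic_velocity(hidden_states: np.ndarray) -> np.ndarray:
    """Per-step drift speed of pooled hidden states.

    Illustrative sketch only: treats Semantic Velocity as the L2
    displacement between consecutive latent representations,
    normalized by the previous state's norm to stay scale-free.

    hidden_states: array of shape (T, d) -- one pooled hidden
        vector per generation step / forward pass.
    Returns: array of shape (T - 1,), one velocity per step.
    """
    deltas = np.linalg.norm(np.diff(hidden_states, axis=0), axis=1)
    scales = np.linalg.norm(hidden_states[:-1], axis=1) + 1e-8
    return deltas / scales
```

Because the signal is computed purely from internal states, it needs no labels, logits, or task-specific calibration, which is what makes it output-agnostic.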
The method is evaluated across a broad range of settings, including:
- large language models (OOD prompts, jailbreak attempts),
- vision transformers under corruption and distribution shift,
- reinforcement learning agents under policy destabilization,
- production-oriented constraints (latency, overhead, sparse sampling).
Empirically, Semantic Velocity demonstrates strong early-warning capability (6–12 steps lead time), robust separation between nominal and failure regimes, and low computational overhead (<0.5%), making it suitable for real-time deployment. Notably, jailbreak and adversarial behaviors manifest as internal conflict signatures, revealing tension between pretraining and alignment objectives before surface-level violations occur.
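As a rough illustration of how a velocity trace could back such an early-warning monitor, the sketch below flags steps whose velocity exceeds a rolling nominal baseline. The window length and threshold multiplier are arbitrary assumptions for illustration, not values reported in the paper:

```python
from collections import deque

import numpy as np

class VelocityMonitor:
    """Flags anomalous drift against a rolling nominal baseline.

    Hypothetical monitor: warns when the current Semantic Velocity
    exceeds mean + k * std of a sliding window of recent nominal
    readings. Window size and k are illustrative choices.
    """

    def __init__(self, window: int = 64, k: float = 3.0):
        self.history = deque(maxlen=window)
        self.k = k

    def update(self, velocity: float) -> bool:
        """Return True if `velocity` is anomalously high."""
        if len(self.history) >= 8:  # need a minimal baseline first
            mu = float(np.mean(self.history))
            sigma = float(np.std(self.history)) + 1e-8
            if velocity > mu + self.k * sigma:
                return True  # early warning; keep baseline unpolluted
        self.history.append(velocity)
        return False
```

Since the monitor consumes a single scalar per step, sampling it sparsely (every few layers or tokens) is one plausible way to stay within the sub-0.5% overhead quoted above.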
This paper positions hidden-state dynamics as a practical and interpretable foundation for out-of-distribution detection, reliability monitoring, and AI safety infrastructure, bridging theoretical intuition with production-scale feasibility.
The study builds upon prior conceptual work by the author, but constitutes a substantially new and independent contribution, introducing a new monitoring paradigm, expanded empirical validation, and a system-level perspective on neural network reliability.
Files
| Name | Size | MD5 |
|---|---|---|
| DeepDrift ODD Kinetic Diagnosis.pdf | 5.9 MB | 45baac48717a4176e14aa9cc044354d8 |
Additional details
Related works
- Is variant form of: Preprint, DOI 10.5281/zenodo.18300586
Dates
- Updated: 2026-01-27
Software
- Repository URL: https://github.com/Eutonics/DeepDrift
- Programming language: Python