TEL-OS v2.0: Inference-Only Latent Governance and Attention Guillotine for LLM Security
Authors/Creators
Description
Traditional AI alignment strategies such as RLHF and system prompts rely on "semantic guardrails" that are structurally vulnerable to adversarial jailbreaks, including prefix-injection and many-shot attacks. We present TEL-OS v2.0, a mechanistic interpretability framework that neutralizes these threats by intervening directly in the model's residual stream at inference time. Combining Latent Refinement, Attention Guillotines, and the Love Equation for tensor governance, TEL-OS achieves a 0.0% Attack Success Rate (ASR) on Llama-3.1-8B while keeping 100% of outputs fluent. Our results indicate that safety can be guaranteed as an intrinsic physical invariant of the model's latent manifold, independent of prompt-based filtering.
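The description does not spell out how Latent Refinement, Attention Guillotines, or the Love Equation operate, so the following is not the TEL-OS method. It is only a minimal sketch of one generic way to intervene in a residual stream: removing the component of each hidden state that lies along a chosen direction. The helper names (`dot`, `ablate_direction`, `ablate_stream`), the toy vectors, and the direction itself are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ablate_direction(hidden: Sequence[float], direction: Sequence[float]) -> Vector:
    """Return `hidden` with its component along `direction` removed.

    Hypothetical helper: a plain orthogonal projection, used here as a
    stand-in for "editing the residual stream". It is an assumption, not
    the intervention described in the TEL-OS paper.
    """
    norm_sq = dot(direction, direction)
    if norm_sq == 0.0:
        # A zero direction carries no information; leave the state unchanged.
        return list(hidden)
    scale = dot(hidden, direction) / norm_sq
    return [h - scale * d for h, d in zip(hidden, direction)]


def ablate_stream(stream: Sequence[Sequence[float]], direction: Sequence[float]) -> List[Vector]:
    """Apply the same projection to every token position in a toy stream."""
    return [ablate_direction(h, direction) for h in stream]


# Toy example: two token positions, three hidden dimensions.
stream = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]]
direction = [0.0, 1.0, 0.0]  # assumed "unwanted" direction
cleaned = ablate_stream(stream, direction)
```

In a real model, an edit of this kind would typically be applied to per-layer activations during the forward pass rather than to plain Python lists; which layers and which directions to use are questions the description leaves open.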
Files
| Name | Size | Checksum |
|---|---|---|
| TELOS.pdf | 8.6 kB | md5:96c2f942c21a83c54f481b9a277d9861 |
Additional details
Dates
- Submitted
- 2026-03-07
Software
- Repository URL
- https://github.com/jostoz/tel-os
- Programming language
- Python
- Development Status
- Active