Published March 7, 2026 | Version 2
Preprint Open

TEL-OS v2.0: Inference-Only Latent Governance and Attention Guillotine for LLM Security

Description

Traditional AI alignment strategies such as RLHF and system prompts rely on "semantic guardrails" that are structurally vulnerable to adversarial jailbreaks such as prefix injection and many-shot attacks. We present TEL-OS v2.0, an inference-only mechanistic interpretability framework that neutralizes these threats by intervening directly in the model's residual stream. By combining Latent Refinement, Attention Guillotines, and the Love Equation for tensor governance, TEL-OS achieves a 0.0% Attack Success Rate (ASR) on Llama-3.1-8B while keeping 100% of outputs fluent. Our results prove that safety can be guaranteed as an intrinsic physical invariant of the model's latent manifold, independent of prompt-based filtering.
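This record does not spell out how Latent Refinement or the Attention Guillotine operate, so the following is only a generic illustration of what "intervening directly in the residual stream" can look like. It is a minimal sketch of directional ablation, a common interpretability technique that is swapped in here as an assumption and is not taken from TEL-OS. The names `ablate_direction`, `hidden_state`, and `unsafe_direction` are hypothetical, and the 4-dimensional vectors stand in for a real activation.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ablate_direction(hidden: Vector, direction: Vector) -> Vector:
    """Return `hidden` with its component along `direction` removed.

    Computes h' = h - ((h . d) / (d . d)) * d, so that h' is orthogonal to d.
    """
    norm_sq = dot(direction, direction)
    if norm_sq == 0.0:
        # A zero direction carries no information; leave the activation as-is.
        return list(hidden)
    scale = dot(hidden, direction) / norm_sq
    return [h - scale * d for h, d in zip(hidden, direction)]


# Toy stand-in for one token's residual-stream activation (hypothetical values).
hidden_state = [0.5, -1.0, 2.0, 0.25]
# Hypothetical direction assumed to encode the behavior being suppressed.
unsafe_direction = [0.0, 1.0, 1.0, 0.0]

steered = ablate_direction(hidden_state, unsafe_direction)
print(steered)
print(dot(steered, unsafe_direction))
```

In a real model, a function like this would typically be applied to a layer's output at inference time, for example through a PyTorch forward hook, leaving the weights untouched. Whether TEL-OS does anything similar is not stated in this record.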

Files

TELOS.pdf (8.6 kB)
md5:96c2f942c21a83c54f481b9a277d9861

Additional details

Dates

Submitted
2026-03-07

Software

Repository URL
https://github.com/jostoz/tel-os
Programming language
Python
Development Status
Active