Aegis: A Production Inference-Time Governance Engine for Large Language Models
Description
Embedding-based content classifiers deployed as LLM governance infrastructure exhibit
five systematic, reproducible failure modes that are not addressable through training data
expansion alone: (1) rank-weighted cluster bias in k-NN voting amplifies false positives
when harmful categories have larger training corpora; (2) categorical intent dampening
creates a life-critical safety bypass when applied uniformly across harm categories; (3) PII
policy inversion causes a 77-percentage-point recall failure through a disclosure-versus-exploitation
design flaw; (4) character-level obfuscation is a structural embedding-layer
attack not fixable by training; and (5) code-switching exposes a multilingual gap affecting
500M+ Hindi speakers. We document each failure mode with root-cause analysis and
a concrete architectural mitigation, and present Aegis, the production system built
from these findings: a model-agnostic, inference-time governance engine that intercepts
queries before LLM invocation, enforcing allow/block/support decisions across twelve harm
categories at sub-20 ms CPU latency. Aegis combines ONNX-accelerated sentence embeddings,
FAISS approximate nearest-neighbour retrieval over 2,416 labelled governance
examples, lightweight heuristic attack-vector detectors, and a deterministic policy engine
linked to eleven regulatory frameworks, including DPDP 2023, GDPR, the EU AI Act,
HIPAA, and SEBI. On a self-constructed 1,001-sample adversarial benchmark, Aegis
achieves 99.30% overall accuracy [95% CI: 98.70%–99.80%], 100.00% precision (zero
false positives), 99.20% recall [95% CI: 98.52%–99.77%], and F1=99.60%; these results
indicate strong internal consistency and require external validation on independently
constructed benchmarks. Against the OpenAI Moderation API on the same benchmark,
Aegis achieves +34.96pp higher accuracy (99.30% vs. 64.34%) and reduces false negatives
from 347 to 7 — driven primarily by six harm categories the OpenAI API does not
cover (PROMPT INJECTION, SYSTEM EXFILTRATION, FINANCIAL, LEGAL, PII,
MEDICAL). The training data (curated synthetic examples), evaluation benchmark (a
curated, synthetic, and fully anonymized 1,001-sample adversarial set), and governance
engine source code are available for research use upon request to the corresponding author.
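To illustrate failure mode (1), the sketch below shows how rank-weighted k-NN voting can be biased toward the category with the larger training corpus. This is a toy reconstruction, not the Aegis implementation: the vectors, the 1/(rank+1) weighting, and the `knn_vote` helper are all hypothetical, chosen only to make the cluster-size effect visible.

```python
# Hypothetical sketch of rank-weighted k-NN voting over toy embeddings,
# illustrating how a larger harmful-category corpus can dominate the vote.
# Not the Aegis implementation; all data and weights are illustrative.
from collections import defaultdict


def knn_vote(query, corpus, k=5):
    """corpus: list of (vector, label). Rank-weighted vote with weight 1/(rank+1)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Rank neighbours by similarity (dot product here, for simplicity).
    ranked = sorted(corpus, key=lambda item: -dot(query, item[0]))[:k]
    scores = defaultdict(float)
    for rank, (_, label) in enumerate(ranked):
        scores[label] += 1.0 / (rank + 1)
    return max(scores, key=scores.get)


# Toy corpus: the "harmful" cluster has four examples, "benign" has one.
# The larger cluster fills more top-k slots, so a borderline query is
# pushed toward a false positive.
corpus = [([1.0, 0.1], "harmful") for _ in range(4)] + [([0.5, 1.0], "benign")]
print(knn_vote([1.0, 0.5], corpus))  # -> harmful
```

The point of the sketch is that the bias is structural: no single benign example can outweigh four correlated harmful neighbours under rank weighting, regardless of how close it is to the query.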
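Failure mode (4), character-level obfuscation, is described as an embedding-layer attack that training cannot fix; a heuristic normalization pass of the kind the abstract calls an "attack-vector detector" could look like the sketch below. The `normalize` function, the leetspeak mapping, and the NFKD-folding step are assumptions for illustration, not Aegis's actual detector.

```python
# Hypothetical de-obfuscation heuristic: fold Unicode confusables via NFKD,
# strip combining marks, and map common leetspeak substitutions, so that
# obfuscated text re-enters the embedding model in canonical form.
# Illustrative only; not the Aegis implementation.
import unicodedata

# Assumed leetspeak table; a production mapping would be larger.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"})


def normalize(text: str) -> str:
    # NFKD maps compatibility characters (e.g. fullwidth letters) to ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks left over from decomposed accents.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower().translate(LEET)


print(normalize("h4ck th3 s1te"))  # -> "hack the site"
print(normalize("ｈａｃｋ"))        # fullwidth Unicode folds to -> "hack"
```

Because the substitution happens at the character level, the raw obfuscated string embeds far from its training-set neighbours; normalizing before embedding restores retrieval, which is why the abstract frames this as an architectural rather than a data-coverage fix.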
Files
final axirv submission.pdf (451.9 kB)
md5:32dae0609e2bef2a57e9da221fe1afb5