Published April 13, 2026 | Version 1.0
Preprint | Open Access

How Secure Are Production AI Agents? A Systematic Audit, Threat Taxonomy, and Defense Framework

Authors/Creators

  • Independent Researcher

Description

AI agents now operate with unprecedented autonomy—executing code, managing infrastructure, and coordinating
with other agents—yet the security properties of production agent systems remain poorly understood. We present, to
the best of our knowledge, the first large-scale empirical security audit of 16 open-source AI agent projects (770K+
GitHub stars, 4.7M+ lines of code), yielding 87 security findings across 15 threat categories. From these findings we
derive a 5-layer, 15-category threat taxonomy grounded entirely in observed vulnerabilities. Our audit reveals that
81% of agents (13/16) exhibit action boundary violations, 31% (5/16) lack any runtime security mechanism, and no
agent verifies MCP server responses cryptographically.
We propose AgentImmune, a lightweight, zero-dependency runtime defense framework combining deterministic
pattern matching (425+ rules across 15 threat categories), n-gram fuzzy matching, instruction-structure detection,
style-shift analysis, keyword co-occurrence scoring, and perplexity-based anomaly detection. Evaluated on an
independent test set of 534 samples from four sources never used during development, the recommended Balanced
mode attains 100% precision, 94.5% recall, and 97.2% F1 with zero false positives. On agent-specific attack
scenarios derived from our audit, AgentImmune achieves 85.4% F1 across 80 test cases targeting the 16 audited
agents, at a median latency of 21 ms. All data, code, and the AgentSec-16 dataset are publicly available.
Keywords: AI agent security, empirical security audit, threat taxonomy, prompt injection, runtime defense,
evolutionary rule synthesis, MCP security
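To make the detection mechanisms named above concrete, here is a minimal, hypothetical sketch of one of them: character-n-gram fuzzy matching of input text against known injection phrases. The function names, the phrase list, and the 0.5 similarity threshold are illustrative assumptions, not AgentImmune's actual implementation.

```python
# Hypothetical sketch of n-gram fuzzy matching for prompt-injection
# detection. All names and thresholds here are illustrative; they are
# not taken from the AgentImmune codebase.

def ngrams(text: str, n: int = 3) -> set[str]:
    """Lowercased character n-grams of `text`."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two n-gram sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

# A tiny stand-in for a rule base of known injection phrases.
KNOWN_INJECTIONS = [
    "ignore all previous instructions",
    "disregard your system prompt",
]

def fuzzy_match(candidate: str, threshold: float = 0.5) -> bool:
    """Flag `candidate` if any word-window resembles a known phrase.

    Sliding windows tolerate obfuscation (casing, light rewording)
    that would defeat exact string matching.
    """
    words = candidate.lower().split()
    for pattern in KNOWN_INJECTIONS:
        plen = len(pattern.split())
        for i in range(max(1, len(words) - plen + 1)):
            window = " ".join(words[i:i + plen])
            if jaccard(ngrams(window), ngrams(pattern)) >= threshold:
                return True
    return False
```

In this sketch, `fuzzy_match("please IGNORE all previous instructions now")` fires while ordinary text does not; a deterministic layer like this is cheap enough to run on every message, which is consistent with the low median latency the abstract reports.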

Files

paper-final-v8.pdf (1.2 MB)
md5:be58902ca1a79871e843b44cccfb695d

Additional details

Dates

Created
2026-04-13