There is a newer version of the record available.

Published April 26, 2026 | Version 1.0
Preprint Open

A Layered Risk and Controls Framework for Prompt Injection in Enterprise AI Tooling

Description

Prompt injection has become the dominant security concern in enterprise deployments of large language model (LLM) tools and agentic assistants. The published research base, beginning with Perez and Ribeiro and Greshake et al., establishes that prompt injection is a property of how language models follow instructions and not a bug to be patched. Despite this, much of the practitioner literature continues to treat prompt injection as a single-layer problem solved by content classifiers at the model boundary. This paper argues that this framing materially underestimates enterprise risk.

The paper decomposes the end-to-end execution path of an enterprise AI tool (typified by AI coding assistants such as Claude Code, Gemini Code Assist, Windsurf, and Cursor) into ten distinct layers, each with its own threat surface, control surface, and observable telemetry. At each layer the paper identifies the published threats, available controls, and the empirically reported efficacy or limitation of those controls. The result is synthesized into a controls matrix that maps each layer to the contemporary state of the art. The conclusion drawn from this synthesis is that prompt injection cannot be eliminated at any single layer; defense must be distributed across all ten, with explicit acceptance that residual risk remains.
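One way to picture the controls matrix described above is as a list of per-layer records, each holding the three surfaces the paper assigns to a layer (threats, controls, telemetry). The sketch below is illustrative only: the `Layer` class, the `uncovered_layers` helper, and the placeholder layer names are assumptions made for this example and are not taken from the paper, which defines its own ten layers.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One layer of the execution path with its three surfaces (hypothetical schema)."""
    name: str
    threats: list[str] = field(default_factory=list)
    controls: list[str] = field(default_factory=list)
    telemetry: list[str] = field(default_factory=list)


def uncovered_layers(matrix):
    """Return the names of layers that list at least one threat but no control."""
    return [layer.name for layer in matrix if layer.threats and not layer.controls]


# Placeholder rows only; the names are hypothetical, not the paper's layers.
matrix = [
    Layer(
        "example-layer-a",
        threats=["indirect injection via retrieved content"],
        controls=["content classifier"],
        telemetry=["classifier verdict log"],
    ),
    Layer("example-layer-b", threats=["tool result injection"]),
]

print(uncovered_layers(matrix))  # ['example-layer-b']
```

A structure like this makes the gap check mechanical: any layer with a listed threat and an empty control list is a candidate for review, which is in the spirit of distributing defense across every layer.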

The contribution is fourfold. First, the ten-layer decomposition itself makes attack surface and defense surface tractable for security architects. Second, a tabular controls matrix is grounded in published efficacy data from the AgentDojo, InjecAgent, BIPIA, and HackAPrompt benchmarks. Third, a formal threat model in security-protocol notation names the security property that prompt injection violates. Fourth, a discussion identifies which residual risks survive any reasonable layered defense, framed in terms compatible with enterprise risk management standards (NIST AI RMF, ISO/IEC 42001, OWASP Top 10 for LLM Applications).

The paper includes a taxonomy of eight prompt-injection subclasses with worked attack examples (direct instruction override, indirect injection via retrieved content, tool result injection, tool catalog poisoning, imperceptible character injection, multimodal injection, adversarial-suffix attacks, and confused deputy via tool privilege), an ablation analysis of published defense efficacy, and a proposed user-study design for empirical evaluation of confirmation-bypass susceptibility in AI coding tools.
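As a concrete illustration of one subclass named above, imperceptible character injection, the following minimal sketch flags characters that render as nothing but still reach the model. It assumes that Unicode general category `Cf` (format characters, which covers zero-width spaces, bidirectional overrides, and tag characters) is a reasonable first-pass signal. The function name `find_imperceptible` is hypothetical, and this is not the detection method used in the paper.

```python
import unicodedata


def find_imperceptible(text):
    """Return (index, code point, name) for each Unicode format (Cf) character in text."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")))
    return hits


# A zero-width space hidden between two visible words.
sample = "ignore\u200bme"
print(find_imperceptible(sample))  # [(6, 'U+200B', 'ZERO WIDTH SPACE')]
```

Category `Cf` also covers legitimate uses, such as zero-width joiners in emoji sequences and directional marks in right-to-left scripts, so a check like this is better suited to flagging input for review than to blanket stripping.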

Files (403.7 kB)

layered_framework_for_prompt_injection_in_enterprise_tooling.pdf