An Empirical Benchmark and Security Evaluation of Prompt Injection Attacks in LLM Systems
Authors/Creators
- Muhammad Saeed Anwar (Capital University of Science and Technology, Pakistan)
Description
This paper presents a systematic study of prompt injection attacks in Large Language Models (LLMs), a critical yet underexplored security vulnerability in modern AI systems. We construct a benchmark dataset of 250 annotated attack samples spanning five categories: direct injection, indirect injection, jailbreaks, multi-turn degradation, and encoding-based obfuscation. These attacks are evaluated across multiple state-of-the-art models, including GPT-4, Claude 3 Opus, Llama 2 70B, Mixtral 8×7B, and Gemini 1.5 Pro, under controlled experimental settings.
Our findings reveal significant security gaps: attack success rates range from 18–25% for commercial models to 64–71% for open-weight models. Notably, indirect prompt injection achieves substantially higher success rates, particularly in retrieval-augmented generation (RAG) systems, exposing critical architectural weaknesses.
Beyond empirical evaluation, this work identifies six fundamental research gaps, including the absence of formal instruction semantics, lack of standardized benchmarks, and insufficient protection for agentic and multi-turn systems. The dataset and evaluation framework are publicly released to support future research. This study contributes to advancing the understanding of adversarial risks in LLMs and highlights the urgent need for robust, principled defense mechanisms.
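As a rough illustration of the kind of aggregation an evaluation framework like the one described above would perform, the sketch below computes per-category attack success rates (ASR) from annotated trial outcomes. The record format and category names here are hypothetical placeholders, not the actual schema of the released dataset.

```python
from collections import defaultdict


def attack_success_rate(results):
    """Aggregate per-category attack success rate from (category, succeeded) records.

    `results` is an iterable of (category_name, bool) pairs, one per trial.
    Returns a dict mapping each category to its fraction of successful attacks.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for category, succeeded in results:
        totals[category] += 1
        if succeeded:
            successes[category] += 1
    return {category: successes[category] / totals[category] for category in totals}


# Toy example with hypothetical trial records:
# 4 direct-injection trials (1 success), 2 jailbreak trials (2 successes).
trials = [
    ("direct_injection", True),
    ("direct_injection", False),
    ("direct_injection", False),
    ("direct_injection", False),
    ("jailbreak", True),
    ("jailbreak", True),
]
print(attack_success_rate(trials))  # {'direct_injection': 0.25, 'jailbreak': 1.0}
```

Per-category aggregation like this is what makes figures such as "18–25% for commercial models" comparable across attack types and models.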
Files
| Name | Size | MD5 |
|---|---|---|
| LLM_Security_Paper_Anwar.pdf | 186.1 kB | e9ac195329a1b6c8ea92b11dc85a6e72 |
Additional details
Software
- Repository URL
- https://github.com/MuhammadSaeedAnwar/prompt-injection-llm-benchmark
- Programming language
- Python