From Artifacts to Risk: Auditing Instruction Surfaces in Agent Systems
Description
Agentic systems increasingly rely on persistent instruction artifacts, tool integrations, and repository-level configuration that shape behavior beyond individual prompts. Prior work has established prompt injection, indirect instruction attacks, tool poisoning, and agent hijacking as practical security concerns. Less attention, however, has been given to the repository layer as a persistent and auditable source of agent behavior.
This paper presents a bottom-up, artifact-centric audit of instruction surfaces in agent systems. We analyze a purposive corpus of 509 instruction-rich repositories containing agent guidance files, skills, plugin manifests, and Model Context Protocol (MCP)-related artifacts. The scan produced 4,882 raw findings of medium or higher severity and 4,637 clustered issue instances.
The contribution is not a new prompt-injection benchmark or a replacement for existing scanners. Instead, this study integrates heterogeneous signature sources, applies them to real repositories, correlates raw detections into artifact-level issue instances, and maps the resulting evidence to an ASAMM-aligned agent-security interpretation layer. We explicitly treat detector outputs as candidate evidence rather than proof of exploitability. The paper positions instruction surfaces as repository-level control-plane artifacts and argues that agent security practice needs artifact-level auditing alongside runtime testing and defense.
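The correlation step described above, collapsing raw detector hits into artifact-level issue instances, can be sketched as follows. This is an illustrative reconstruction, not the actual agent-audit implementation: the `Finding` schema, the `family.variant` rule-id convention, and the grouping key (repository, artifact path, rule family) are all assumptions made for the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One raw detector hit. Fields are illustrative, not agent-audit's schema."""
    repo: str
    path: str       # instruction artifact, e.g. an agent guidance file
    rule: str       # signature identifier, assumed "family.variant"
    severity: str   # "low" | "medium" | "high" | "critical"


SEVERITY = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def cluster(findings, min_severity="medium"):
    """Collapse raw findings into artifact-level issue instances.

    Hits below the severity floor are dropped; the rest are grouped by
    (repo, artifact path, rule family), so repeated matches of the same
    signature family in one file count as a single candidate issue.
    """
    floor = SEVERITY[min_severity]
    groups = defaultdict(list)
    for f in findings:
        if SEVERITY[f.severity] >= floor:
            family = f.rule.split(".")[0]
            groups[(f.repo, f.path, family)].append(f)
    # One issue instance per cluster, carrying its worst observed severity.
    return [
        {
            "repo": repo,
            "path": path,
            "rule_family": family,
            "hits": len(hits),
            "severity": max((h.severity for h in hits), key=SEVERITY.get),
        }
        for (repo, path, family), hits in groups.items()
    ]
```

Under this grouping, two hits of `inject.override` and `inject.hidden` in the same guidance file collapse into one `inject` issue instance, which matches the paper's framing of detector outputs as candidate evidence at the artifact level rather than per-match alerts.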
Files
- From Artifacts to Risk-Auditing.pdf (484.0 kB), md5:a5d47030e25cc3f0ee314d6081e8b68b
Additional details
Dates
- Copyrighted: 2026-04-30
Software
- Repository URL: https://github.com/scadastrangelove/agent-audit
- Programming language: Python
- Development Status: Active
References
- [1] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. AISec '23: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023.
- [2] Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and Benchmarking Prompt Injection Attacks and Defenses. USENIX Security Symposium, 2024.
- [3] Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramer. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents. NeurIPS 2024 Datasets and Benchmarks Track, 2024.
- [4] Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents. arXiv:2410.02644, 2024 (ICLR 2025).
- [5] David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, and Maksym Andriushchenko. Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks. arXiv:2602.20156, 2026.
- [6] Zhiyuan Li, Jingzheng Wu, Xiang Ling, Xing Cui, and Tianyue Luo. Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis. arXiv:2604.02837, 2026.
- [7] Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions. arXiv:2503.23278, 2025.
- [8] Charoes Huang, Xin Huang, Ngoc Phu Tran, and Amin Milani Fard. Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning. arXiv:2603.22489, 2026.
- [9] Aguara. Aguara AI Agent and MCP Security Scanner. Project documentation, 2026. https://aguarascan.com/
- [10] Panguard AI. Agent Threat Rules (ATR): an open detection standard for AI agent security threats. Project documentation, 2026. https://docs.panguard.ai/
- [11] Cisco AI Defense. Open Source AI Security Scanners and Tools, including Skill Scanner and MCP Scanner. Project documentation, 2026. https://cisco-ai-defense.github.io/
- [12] NVIDIA. garak: Generative AI Red-teaming and Assessment Kit. Project repository, 2026. https://github.com/NVIDIA/garak
- [13] Microsoft. PyRIT: The Python Risk Identification Tool for generative AI red teaming. Project repository and documentation, 2024–2026. https://github.com/Azure/PyRIT
- [14] Promptfoo. LLM red teaming and evaluation framework. Project documentation, 2026. https://www.promptfoo.dev/
- [15] National Institute of Standards and Technology. Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1, January 2023.
- [16] National Cyber Security Centre, Cybersecurity and Infrastructure Security Agency, and partner agencies. Guidelines for Secure AI System Development. November 2023.
- [17] OWASP Foundation. Software Assurance Maturity Model (SAMM). Project documentation. https://owaspsamm.org/model/
- [18] Sergey Gordeychik. Agentic SAMM: An OWASP SAMM Extension for AI-Driven Development. CyberOK, 2026, version 0.2.0-draft. https://github.com/scadastrangelove/asamm
- [19] Sergey Gordeychik. agent-audit: Forensic auditor and project-surface scanner for local AI coding agents. CyberOK, 2026. https://github.com/scadastrangelove/agent-audit
- [20] Sergey Gordeychik. agent-audit article support dataset (v1): 509-repository corpus list, scan-project results, summary analysis, and sanitized triage / adjudication artifacts. CyberOK, 2026. https://github.com/scadastrangelove/agent-audit/tree/main/artifacts/article-support-dataset-v1