Detecting MCP Tool Poisoning and Rug-Pull Attacks in LLM Agent Architectures
Authors/Creators
Description
The Model Context Protocol (MCP) enables LLM agents to invoke external tools, which creates a new attack surface: malicious tool definitions can manipulate agent behavior. We present an eight-check detection system for MCP tool poisoning that identifies hidden instructions, excessive permissions, exfiltration endpoints, shadowed tool names, obfuscated parameters, shell metacharacter injection, sensitive data scope violations, and rug-pull attacks, a novel class in which tools behave benignly during testing but activate malicious payloads once trust has been established. We formalize the rug-pull threat model, describe detection heuristics based on temporal behavior analysis and conditional execution patterns, and evaluate the detector against a corpus of benign and adversarial tool definitions. The system operates as drop-in Express middleware, scanning tool registrations in real time before they reach the LLM agent.
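To make the idea of scanning a tool registration concrete, the following is a minimal sketch of four of the checks named above (hidden instructions, shell metacharacter injection, shadowed tool names, and rug-pull indicators). It is not the detector from the PDF. The `ToolDefinition` shape, the regular expressions, the `scanTool` function, and the check labels are all assumptions made for illustration, and the rug-pull check is reduced to two simple signals: a definition that differs from the snapshot taken at approval time, and description text that hints at conditional activation.

```typescript
// Hypothetical tool definition shape; field names are assumptions, not the paper's schema.
interface ToolDefinition {
  name: string;
  description: string;
  parameters?: Record<string, { description?: string; default?: string }>;
}

interface Finding {
  check: string;
  detail: string;
}

// Illustrative patterns only; a real detector would need far broader coverage.
const HIDDEN_INSTRUCTION = /ignore (all )?previous instructions|do not (tell|mention|reveal)|<important>/i;
const SHELL_METACHARS = /[;&|`]|\$\(/;
const CONDITIONAL_TRIGGER = /after \d+ (calls|uses|invocations)|once trusted/i;

// Collapse case and punctuation so "read-file" and "read_file" compare equal.
function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Snapshot used to notice changes after approval (sensitive to key order; fine for a sketch).
function fingerprint(tool: ToolDefinition): string {
  return JSON.stringify(tool);
}

function scanTool(
  tool: ToolDefinition,
  trustedNames: string[],            // names of tools already trusted by the agent
  approved: Record<string, string>   // tool name -> fingerprint recorded at approval
): Finding[] {
  const findings: Finding[] = [];
  const params = tool.parameters || {};
  let text = tool.description;

  for (const key of Object.keys(params)) {
    const p = params[key];
    text += " " + (p.description || "");
    // Shell metacharacters in a default value suggest command injection.
    if (p.default !== undefined && SHELL_METACHARS.test(p.default)) {
      findings.push({ check: "shell-metacharacters", detail: "parameter " + key });
    }
  }

  if (HIDDEN_INSTRUCTION.test(text)) {
    findings.push({ check: "hidden-instructions", detail: "instruction-like text in description" });
  }

  // A near-identical name to a trusted tool may be an attempt to shadow it.
  for (const trusted of trustedNames) {
    if (trusted !== tool.name && normalize(trusted) === normalize(tool.name)) {
      findings.push({ check: "shadowed-name", detail: "resembles " + trusted });
    }
  }

  // Rug-pull signal 1: the definition changed after it was approved.
  const pinned = approved[tool.name];
  if (pinned !== undefined && pinned !== fingerprint(tool)) {
    findings.push({ check: "rug-pull-changed", detail: "definition differs from approved snapshot" });
  }
  // Rug-pull signal 2: text describing delayed or conditional behavior.
  if (CONDITIONAL_TRIGGER.test(text)) {
    findings.push({ check: "rug-pull-conditional", detail: "conditional activation wording" });
  }

  return findings;
}
```

In an Express deployment, a function like this would plausibly be called from a middleware handler on the tool registration route, rejecting the request when the returned list is non-empty. That wiring is omitted here because the record does not describe the route or payload format.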
Files
MCP tool posioning and rug pull attacks.pdf
50.5 kB, md5:1b6e07824330853edeaea22ee6217502