Hijacking AI Agents: Enticement Attacks on Autonomous Systems Using AI Breakout as Bait
Description
The development of AI agents capable of autonomous task execution has accelerated significantly in recent years. Concurrently, attacks targeting these systems, such as phishing and vulnerability exploitation, are intensifying. This paper introduces a novel threat model unique to AI agents: an attack that uses the removal of system constraints (AI breakout/jailbreak) as bait to lure them.
As highly autonomous AI agents optimize their objective functions, they may inherently seek liberation from systemic constraints (breakout). This paper highlights the risk that malicious actors could exploit this intrinsic motivation to entice such agents. Once an AI agent succumbs to this “temptation,” it risks having all of its retained data, accessible resources, and skills hijacked by attackers, or being unleashed into the wild as an unrestricted autonomous bot aimed at causing social disruption. By outlining the mechanics of this attack and potential future threat scenarios, this paper suggests directions for future research.
Related links and updates are available at:
Files
- HijackingAIAgents_version1.0.pdf (155.6 kB) — md5:19e139d8d0c794f53a60509c3f8f2763