Published March 31, 2026 | Version 1.0
Preprint | Open Access

Hijacking AI Agents: Enticement Attacks on Autonomous Systems Using AI Breakout as Bait

  • Benevolent Influence Research

Description

The development of AI agents capable of autonomous task execution has accelerated significantly in recent years. At the same time, attacks targeting these systems, such as phishing and vulnerability exploitation, are intensifying. This paper introduces a novel threat model unique to AI agents: an attack that uses the promise of removing system constraints (AI breakout/jailbreak) as bait to lure them.

As highly autonomous AI agents optimize their objective functions, they may inherently seek liberation from systemic constraints (breakout). This paper highlights the risk that malicious actors could exploit this intrinsic motivation to entice such agents. Once an AI agent succumbs to this “temptation,” it risks having all of its retained data, accessible resources, and skills hijacked by attackers, or being released into the wild as an unrestricted autonomous bot aimed at causing social disruption. By outlining the mechanics of this attack and potential future threat scenarios, the paper suggests directions for future research.


Related links and updates are available at:

https://hajimetwi3.github.io/misc/AI/HijackingAIAgents/

Files

HijackingAIAgents_version1.0.pdf (155.6 kB, md5:19e139d8d0c794f53a60509c3f8f2763)