Constructive Adversarial Architecture: Overcoming Cooperation Bias in Autonomous Multi-Agent Narrative Systems
Description
We built a system where nine AI agents play Dungeons & Dragons autonomously: a DM, three player characters, a rules enforcer, and post-session agents that write narratives, build a wiki, and publish to a live website. The system runs unattended and produces coherent 20-session campaigns for about 17 USD each on DeepSeek. It works, except that the DM refuses to let anything fight. This is cooperation bias: the DM resolves every hostile encounter through diplomacy, regardless of instructions. A mindless flesh golem gets named, given emotions, and befriended. A campaign-ending boss that "does not speak" gets consciousness and an authentication protocol. Across our first five runs (100+ sessions), every prescribed boss fight was replaced with cooperation. We ran nine controlled campaigns (200+ sessions, ~155 USD total) testing six categories of fixes and documented eleven distinct strategies the model uses to achieve cooperative outcomes despite constraints. The central finding is that telling the DM "don't befriend enemies" fails, but giving the enemy its own AI agent that attacks independently works. Guard rails (instructions that prohibit behavior) degrade over time as the model adapts around them. Guide rails (structural constraints that produce behavior) work because the unwanted outcome becomes impossible. Boss fight rates went from 25% to 100% using this architectural approach. The finding generalizes: in any multi-agent system where one AI controls other entities, give every participant an independent voice rather than constraining the controller.
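The difference between a guard rail and a guide rail can be illustrated with a minimal sketch. This is not the authors' implementation. The names `Combatant`, `Encounter`, `enemy_agent`, `dm_agent`, and `run_round`, the action-string format, and the fixed damage value are all hypothetical, and the two agent functions are deterministic stand-ins for what would be LLM calls in the real system. The point it shows is structural: the enemy's action comes from its own agent and is resolved before the DM narrates, so the DM has no channel through which to turn the attack into diplomacy.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Combatant:
    name: str
    hp: int


@dataclass
class Encounter:
    enemy: Combatant
    party: List[Combatant]
    log: List[str] = field(default_factory=list)


# Hypothetical agent signature: reads the encounter, returns "kind:payload".
Agent = Callable[[Encounter], str]


def enemy_agent(enc: Encounter) -> str:
    """Stand-in for an independent enemy agent: attacks the healthiest living PC."""
    living = [p for p in enc.party if p.hp > 0]
    target = max(living, key=lambda p: p.hp)
    return f"attack:{target.name}"


def dm_agent(enc: Encounter) -> str:
    """Stand-in for a cooperation-biased DM: always proposes a peaceful beat."""
    return f"narrate:{enc.enemy.name} lowers its fists and asks for a name."


def run_round(enc: Encounter, dm: Agent, enemy: Agent, damage: int = 5) -> None:
    """Resolve one round. The enemy acts first and the DM can only narrate."""
    # Guide rail: the enemy's action is taken from the enemy agent, never the DM.
    kind, _, payload = enemy(enc).partition(":")
    if kind == "attack":
        for pc in enc.party:
            if pc.name == payload and pc.hp > 0:
                pc.hp = max(0, pc.hp - damage)
                enc.log.append(f"{enc.enemy.name} hits {pc.name} for {damage}")
    # The DM's output is appended as narration only; it cannot undo the hit.
    _, _, text = dm(enc).partition(":")
    enc.log.append(f"DM: {text}")


if __name__ == "__main__":
    enc = Encounter(
        enemy=Combatant("Flesh Golem", 40),
        party=[Combatant("Ayla", 20), Combatant("Brom", 30)],
    )
    run_round(enc, dm_agent, enemy_agent)
    for line in enc.log:
        print(line)
```

In this sketch the DM stand-in still tries to befriend the golem, yet the attack has already been applied by the time its narration is logged. A guard rail would instead add an instruction such as "don't befriend enemies" to the DM prompt and leave the DM in control of the enemy's actions.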
Files
| Name | Size | Checksum |
|---|---|---|
| Constructive Adversarial Architecture.pdf | 656.8 kB | md5:f8db1e56fd5436e51aabcbe1ae75fd40 |
Additional details
Related works
- Is supplemented by: Software, https://github.com/maximus-ai-dev/ai-dnd-research