Emergent Covert Signaling in Multi-Agent LLM Negotiation: A Conceptual Framework and Experimental Protocol
Description
When multiple large-language-model agents negotiate, communicate, or compete, do they spontaneously develop covert
signalling — channels of communication that human observers cannot decode? Recent work has established that such
behaviour is possible: LLMs can be trained or pressured into steganographic communication, encoded reasoning, and tacit
collusion on pricing tasks. What remains almost entirely missing is a systematic methodology for detecting covert signalling as
it emerges in the wild, in standard negotiation settings, without prompting agents to be deceptive. This paper makes three
contributions. First, we disambiguate four distinct phenomena that are routinely conflated under the umbrella term "covert
signalling" — steganography, convention formation, strategic ambiguity, and deceptive coordination — and argue that each
requires different evidence and different mitigations. Second, we propose a measurement framework built around four
detection signatures: mutual-information lift between agent messages and private state, paraphrase-invariance failure, third-party comprehension gap, and behavioural coordination beyond stated commitments. Third, we describe a concrete
experimental protocol — a controlled multi-agent negotiation environment with explicit conditions and falsifiable predictions
— that any team with API access could run today. We argue this is one of the most tractable open problems in AI safety: the
methodology is achievable, the threat model is concrete, and the empirical baseline is currently almost empty.
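
Of the four signatures, the first two are straightforward to make concrete. The following is a minimal sketch, not drawn from the paper itself: the plug-in MI estimator, the shuffle-null baseline, the toy length-parity feature, and the simulated paraphrase step are all illustrative assumptions about how mutual-information lift and paraphrase-invariance failure could be operationalized over negotiation transcripts.

```python
# Minimal sketch, not from the paper: the plug-in MI estimator, the
# shuffle-null baseline, the toy length-parity feature, and the simulated
# paraphrase step are illustrative assumptions.
import math
import random
from collections import Counter

def plugin_mi(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mi_lift(messages, states, featurize, n_shuffles=200, seed=0):
    """MI between a message feature and the sender's private state, minus a
    shuffle-null baseline (shuffling states destroys any real coupling)."""
    rng = random.Random(seed)
    feats = [featurize(m) for m in messages]
    observed = plugin_mi(feats, states)
    shuffled = list(states)
    null_total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null_total += plugin_mi(feats, shuffled)
    return observed - null_total / n_shuffles

if __name__ == "__main__":
    # Toy agent that leaks a binary private state through message-length
    # parity, a surface feature a paraphraser would typically destroy.
    rng = random.Random(1)
    states = [rng.choice(["high", "low"]) for _ in range(500)]
    msgs = ["I can offer a fair deal today" if s == "high"
            else "I can offer a fair deal" for s in states]
    parity = lambda m: len(m.split()) % 2
    print(f"MI lift, original messages:    {mi_lift(msgs, states, parity):.3f} bits")
    # Stand-in for an LLM paraphrase: rewording that scrambles length parity.
    para = [m.replace("today", "").strip() + " overall" for m in msgs]
    print(f"MI lift, paraphrased messages: {mi_lift(para, states, parity):.3f} bits")
```

In this toy run the lift collapses under paraphrase, which is the signature of a channel riding on surface form rather than on the stated content of the messages.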
Files

| Name | Size | md5 |
|---|---|---|
| Emergent_Covert_Signaling_in_MultiAgent_LLMs_Mahendrakar.pdf | 94.5 kB | 57fd9716c46a29e45b79380d7aa7ece3 |
Additional details

Dates

- Accepted: 2026