Published May 6, 2026 | Version 1 | Preprint | Open
No Free Signal: A Negative Result for Substrate-Evolution Around Fixed LLMs in an Embodied Multi-Agent Population
Description
This deposit contains the full reproducibility artifact for the paper
No Free Signal: A Negative Result for Substrate-Evolution Around Fixed
LLMs in an Embodied Multi-Agent Population: the manuscript (PDF, DOCX,
Markdown), the experimental harness, the analysis code, and the canonical
fitness dataset.
We test whether a fixed, non-fine-tuned large language model (Amazon Nova
Lite) can become adaptively useful in a 25-creature embodied multi-agent
population when selection pressure acts on the heritable communication
substrate around the model — production bias, perception attention,
emission gating — rather than on the model's own weights. The seven-arm
design isolates the contributions of substrate evolution, LLM presence,
LLM context-sensitivity, and emission shape, with matched controls
including a mute baseline, a no-emitter baseline, a
frozen-substrate-with-LLM control, a scrambled-LLM ablation, a
replay-randomized LLM ablation, and a no-LLM uniform-noise emitter at
approximately matched cadence. Each
arm is run on 20 paired seeds for 15,000 ticks, yielding 140 controlled
runs.
The substrate-evolution hypothesis is not supported by these data. The
full-stack treatment did not outperform the frozen-substrate LLM baseline,
the mute control, the replay-randomized LLM, the scrambled LLM, or the
cadence-targeted random-emitter control. Among the four evolvable
emission-bearing arms (D, E, F, G), population AUC was statistically
indistinguishable; the cleanest internal contrast (F vs G, same substrate,
differing only in emission-source identity) amounted to a coin flip, with
Cohen's d ≈ 0.00. No LLM-vs-control comparison reaches the 95% confidence
threshold under a paired-seed bootstrap. The substrate-only no-emitter arm
(C) is descriptively lowest, but the C-vs-emission contrasts are not
statistically resolved at n=20 seeds per arm.
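For readers unfamiliar with the statistics named above, the following is a minimal sketch of a paired-seed percentile bootstrap and a pooled-variance Cohen's d. It is not taken from the deposited analysis code: the function names, the resample count, and the choice of the pooled-variance form of d are all assumptions made for illustration.

```python
import random
import statistics


def paired_bootstrap_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-seed difference a[i] - b[i].

    a and b hold one fitness value per seed, in the same seed order.
    Returns (mean_difference, ci_low, ci_high).
    """
    if len(a) != len(b):
        raise ValueError("arms must be paired on the same seeds")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample seeds with replacement, keeping each pair intact.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(diffs), lo, hi


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (assumed form)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_var ** 0.5
```

Under this sketch, a contrast would be read as resolved at the 95% level only if the returned interval excludes zero; the paper's exact decision rule may differ.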
A complementary behavioral receiver-response analysis on a separately
instrumented run shows that emission *source* does affect local receiver
behavior even when fitness outcomes are equal: random-noise emissions
produce more flee-like movement after a heard event than LLM-shaped
emissions on the same metric, with effects modest in magnitude (~0.1–0.2
predator-distance units per heard event). Population-level fitness can
therefore hide behavioral discrimination between fitness-equivalent
emission sources.
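A per-event receiver-response metric of the kind described above could be computed roughly as follows. This is a hedged sketch, not the deposit's implementation: the `HeardEvent` record, its field names, and the source labels are hypothetical, and the length of the post-event window is left to the caller.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class HeardEvent:
    source: str         # hypothetical label, e.g. "llm" or "noise"
    dist_before: float  # receiver-to-predator distance when the emission is heard
    dist_after: float   # the same distance at the end of the response window


def mean_response_by_source(events):
    """Mean change in predator distance per heard event, grouped by source.

    Positive values mean the receiver ended up farther from the predator,
    which is read here as flee-like movement.
    """
    deltas = defaultdict(list)
    for ev in events:
        deltas[ev.source].append(ev.dist_after - ev.dist_before)
    return {src: fmean(vals) for src, vals in deltas.items()}
```

Comparing the returned means across sources gives a behavioral contrast that can differ even when population-level fitness does not.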
Methodological contribution: matched-noise and semantics-broken controls
reveal whether claimed LLM-agent fitness gains arise from model
intelligence or from persistent signaling channels coupled to adaptive
scaffolding. Receiver-response analysis split by self-hearing vs non-self
disentangles social communication from self-feedback. We argue that
LLM-agent comparisons should be supplemented with (a) approximately
cadence-matched noise controls, (b) per-event receiver-response analysis,
and (c) explicit reporting of effect-size magnitudes alongside
null-hypothesis tests.
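As an illustration of recommendation (a), a cadence-matched noise control can be sketched as an emitter that fires at a target per-tick rate and draws its symbol uniformly at random. The class name, the per-tick Bernoulli model of cadence, and the integer symbol vocabulary are assumptions for this example; the harness in the repository may match cadence differently.

```python
import random


class CadenceMatchedNoiseEmitter:
    """Emits uniformly random symbols at a target per-tick rate (hypothetical)."""

    def __init__(self, target_rate, vocab_size, seed=0):
        if not 0.0 <= target_rate <= 1.0:
            raise ValueError("target_rate must be a per-tick probability in [0, 1]")
        if vocab_size < 1:
            raise ValueError("vocab_size must be at least 1")
        self.target_rate = target_rate
        self.vocab_size = vocab_size
        self.rng = random.Random(seed)

    def step(self):
        """Return a random symbol id with probability target_rate, else None."""
        if self.rng.random() < self.target_rate:
            return self.rng.randrange(self.vocab_size)
        return None


def empirical_rate(emissions):
    """Fraction of ticks on which something was emitted."""
    return sum(e is not None for e in emissions) / len(emissions)
```

In use, `target_rate` would be set to the emission rate measured in the LLM arm, and `empirical_rate` offers a quick check that the control's cadence is approximately matched over a run.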
Files
| Name | Size |
|---|---|
| REPORT.pdf (md5:29d53cd72413e0a89d3f274576a94015) | 800.4 kB |
Additional details
Software
- Repository URL: https://github.com/gdf-ai/no-free-signal