Published May 15, 2026 | Version v1
Preprint Open

Relational Cognitive Telemetry for Long-Lived LLM Agent Societies: From Internal State Monitoring to Collective Performance

Authors/Creators

  • 1. Independent Researcher, Taipei, Taiwan

Description

Long-lived large language model (LLM) agent systems, meaning collectives of named agents with persistent memory, individual parameter state, and continuous interaction with each other and the world, are now technically feasible and increasingly deployed. The debugging and governance practices inherited from short-lived LLM applications have not kept pace. Trace-level observability tools record what each agent said and when; they do not measure whether an agent's internal cognitive state has drifted across weeks, whether two agents have developed measurably different ways of attending to each other, or whether the relational structure of an agent society predicts its performance on a collective task.

I propose Relational Cognitive Telemetry (RCT), a telemetry-first framework with two observability lenses: (i) cognitive telemetry, exposing each agent's internal parameter state as a measurable, queryable surface; and (ii) relational telemetry, exposing the directed attention and trust structure between agents as a measurable network. Two lenses, one substrate. I instantiate RCT in the Charenix Lobster Substrate, a live LLM substrate of 20 named agents that currently exposes 441 stable model-layer cognitive parameter families, 682 unfolded observable state fields, and 14 core numeric telemetry dimensions per agent. The present analysis focuses on a 10-agent primary analysis cohort for which directed trust, directed listening exposure, and complete C(10,3) = 120 three-agent Tiamat sandbox triple coverage are all available. Each of the 120 triples was run through 12 seeded trials, for a total of 120 × 12 = 1,440 controlled raids, plus an ecological-validity benchmark drawn from live raid memory.
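The coverage arithmetic above can be checked with a short sketch. The agent labels, the `build_schedule` helper, and the seeding scheme are hypothetical placeholders introduced for illustration; only the cohort size (10), the triple size (3), and the 12 trials per triple come from the description.

```python
from itertools import combinations

# Hypothetical labels; the actual cohort consists of named agents not listed here.
AGENTS = [f"agent_{i:02d}" for i in range(10)]
TRIALS_PER_TRIPLE = 12  # stated in the description


def build_schedule(agents, trials_per_triple, base_seed=0):
    """Return one (triple, seed) entry per controlled raid."""
    schedule = []
    for triple_index, triple in enumerate(combinations(agents, 3)):
        for trial in range(trials_per_triple):
            # Assumed seeding scheme: a unique, reproducible seed per (triple, trial).
            seed = base_seed + triple_index * trials_per_triple + trial
            schedule.append((triple, seed))
    return schedule


schedule = build_schedule(AGENTS, TRIALS_PER_TRIPLE)
triples = {triple for triple, _ in schedule}
print(len(triples), len(schedule))  # 120 triples, 1440 raids
```

Enumerating unordered triples with `itertools.combinations` means every three-agent team appears exactly once, which is what complete C(10,3) coverage requires.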

I report three empirical findings. First, directed listening exposure between agents is measurably asymmetric and discriminative across the cohort, supporting the use of attention flow as a substrate-level relational metric. Second, the standard local-brain trust value, while present and structured, is saturated in the current substrate, producing low between-pair variance and limiting its independent discriminative power — an honestly reported negative result that motivates trust as a diagnostic, not a primary signal. Third, controlled Tiamat sandbox outcomes show a low-success regime that is reproducible across the full 120-triple coverage; coverage-supported live raid teams fall in the same low-success regime, supporting ecological validity without licensing the use of live memory as a primary outcome layer.
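To make the first two findings concrete, here is a minimal sketch of how asymmetry and saturation might be quantified on a directed matrix. The function names, the row-is-source and column-is-target convention, and the toy values are all assumptions for illustration; they are not the record's actual metrics or data.

```python
from statistics import mean, pvariance


def off_diagonal(matrix):
    """Collect all directed entries, skipping self-pairs on the diagonal."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def mean_abs_asymmetry(matrix):
    """Average |M[i][j] - M[j][i]| over unordered agent pairs."""
    n = len(matrix)
    gaps = [abs(matrix[i][j] - matrix[j][i])
            for i in range(n) for j in range(i + 1, n)]
    return mean(gaps)


def between_pair_variance(matrix):
    """Population variance of directed entries; near zero suggests saturation."""
    return pvariance(off_diagonal(matrix))


# Toy 3-agent matrices (illustrative values only, not taken from the deposit).
listening = [[0.0, 0.8, 0.1],
             [0.2, 0.0, 0.6],
             [0.5, 0.3, 0.0]]
trust = [[0.00, 0.98, 0.99],
         [0.99, 0.00, 0.98],
         [0.98, 0.99, 0.00]]

for name, m in (("listening", listening), ("trust", trust)):
    print(name, mean_abs_asymmetry(m), between_pair_variance(m))
```

In this toy case the listening matrix shows both larger pairwise asymmetry and larger between-pair variance than the trust matrix, which is the qualitative pattern the second finding describes for a saturated signal.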

I do not claim that the agents are conscious. I do not claim that internal cognitive parameter values cause raid wins. I claim something narrower: that a long-lived LLM agent society can be made measurable along both an internal and a relational dimension, that the resulting measurements expose regularities absent in trace-level observability, and that the substrate is dense enough, and the experimental protocol concrete enough, that the framework can be falsified, attacked, or improved by external parties.

Notes (English)

Empirical anchors: This paper synthesises framework-level claims from previously deposited empirical records of the Lobster Observatory substrate. The listening, trust, and Tiamat outcome matrices (directed_coop_matrix_raw_bySpeaker.csv, directed_trust_matrix_localBrain_socialInfluence.csv, tiamat_120_triple_records.csv, live_raid_memory_120_triple_coverage.csv) were first deposited as Zenodo 10.5281/zenodo.20018183. The 1,743-hour cognitive-parameter panel underlying the predictive-validity discussion was deposited as Zenodo 10.5281/zenodo.20083802. The substrate architecture documented in §3 was specified in Zenodo 10.5281/zenodo.19982724. This deposit reuses those matrices and panel data as empirical anchors for the Relational Cognitive Telemetry framework and does not re-publish raw data.

Companion forthcoming: LOBSTER-Bench (benchmark synthesis using the same substrate; DOI to be added on deposit).

Files

chen_listening-trust-asymmetry-team-outcomes_2026_data.zip

Files (215.5 kB)

md5:edd9408c9373726979d8c02904361d36 (20.8 kB)
md5:0eb1dedcdcad22dcf0421e45758723d1 (138.5 kB)
md5:127d16351e2af28b5d5e71d13fdde737 (56.2 kB)

Additional details

Additional titles

Subtitle (English)
Lobster Observatory Paper 23