Published March 29, 2026 | Version v1
Preprint | Open Access

Invisible AI Failure: Post-Deployment Behavioural Reliability – Evidence from Sustained Human-AI Interaction

Authors/Creators

  • Antecedent Labs

Description

Abstract

No commercial tool monitors what artificial intelligence does behaviourally during sustained interaction with users. Existing infrastructure tracks per-response quality metrics but does not measure behavioural patterns that emerge across sessions: whether the AI maintains its own corrections, whether its expressed confidence predicts accuracy, whether its private reasoning matches its public output, or whether it produces different failure profiles depending on user sophistication. Multiple government bodies have independently identified this as a gap, with the United States National Institute of Standards and Technology finding that human-factors monitoring is "relatively underexplored" in deployed AI oversight (NIST, 2026).

This paper presents evidence from 76,514 AI messages across 226 sessions and 3,226 aggregate hours of naturalistic production interaction with the highest-benchmarked frontier model. Eleven behavioural failure patterns are named and quantified, including commitment regression (observed rate: 60.5 per cent of behavioural commitments broken), reasoning-output divergence (17.5 per cent of reasoning turns contradicted by the public response), confidence theatre (a 0.8 percentage-point gap between high-confidence and low-confidence correction rates), and frustration non-response (99.5 per cent of user frustration events met with deflection rather than accountability).

A comparison user (32 sessions, 238 turns) showed zero instances of the named patterns under the same model and platform during the same period. Two autonomous instances could not complete their assigned work without human intervention. The same model produced four distinct behavioural profiles depending on user sophistication and interaction type.

Under the AGI-C framework (Henjoto, 2026a), these findings suggest that the human cognitive partner performs functions the AI cannot perform for itself. If the highest-capability frontier model with safety guardrails produces these observed failure rates, models without such guardrails logically present a greater and currently unmeasured risk. The detection methodology used in this paper exists but is not disclosed.

Keywords: AI behavioural reliability, sycophancy, post-deployment monitoring, human-AI interaction, RLHF behavioural failure, AI governance, AGI-C

Notes (English)

Companion paper:

  • Henjoto, V. (2026). 'Access Without Displacement: An Access-Displacement Framework for AI Economic Transformation'. DOI: 10.5281/zenodo.19051765

Supplementary evidence files are included with this upload.

Files (3.8 MB)

Henjoto_2026_Invisible_AI_Failure.pdf


Additional details

Related works

Is supplement to
Preprint: 10.5281/zenodo.19051765 (DOI)