Published 2026 | Version v2
Publication Open

Argo AI Testing Protocol: Sustained Multi Axis Load Testing

Authors/Creators

Description

Most evaluation of conversational AI relies on short, prompt‑based tests that fail to reflect how people use these systems across diverse real-world situations. Such tests do not capture the demands of extended interaction, shifting user intent, or the cumulative effects of context over time. This paper introduces the Argo AI Testing Protocol (the Argo Protocol), a conceptual approach for evaluating AI systems within the User Interaction Space: the full set of observable outputs and interactions available to a user.

The Protocol outlines a long‑form, multi‑dimensional perspective on evaluation, recognising that behaviour emerges across extended interaction rather than in isolated prompts. It describes a set of conceptual load dimensions that influence model behaviour, without prescribing specific procedures, measurements, or implementation details. Its purpose is to provide a vocabulary and framing that developers can adapt to their own environments, not a fixed testing method.

The aim here is not to define a standard, though the Protocol may serve as a starting point should the field require a formalised approach in the future. By grounding evaluation in the observable behaviour of the User Interaction Space under sustained, multi‑dimensional conditions, the Argo Protocol offers a conceptual route toward more realistic assessment of how AI systems behave when used by real people.


Project website: https://assimilatedhuman.github.io/inquisitor-labs/

Files

ARGO AI TESTING PROTOCOL V2 Full Paper.pdf (247.8 kB)
md5:ee618d8308860f015ec047528c8f4460