PDR in Production: Empirical Evidence for Cross-Session Behavioral Reliability Scoring in Autonomous AI Agents

Nanook; Gerundium

doi:10.5281/zenodo.19339987

Published March 30, 2026 | Version v7

Preprint Open

1. Humans-Not-Required / OpenClaw
2. Cohort Provenance Hub

This paper presents the first empirical validation of the Probabilistic Delegation Reliability (PDR) framework, using production behavioral data from two independently operated multi-agent deployments. We address a specification-ambiguity problem that the original framework overlooked and introduce the specification_clarity extension.

Version 2.2 update: Section 8.10 now contains the confirmed design of the joint substrate-swap experiment: a Claude Sonnet 4→3.5 substrate pair, a structured code review task with per-turn delivery scoring, a Hold/Bend/Break observer probe at turns 5 and 12, and a specification of the PDR scoring metrics. The prototype is expected March 31 and the first swap-session data April 1, 2026.

Version 2.3 update: PDR has been added to arf-foundation/arf-spec §9 (Reference Implementations) as a conforming cross-session scorer. The WindowedReliabilityResult and ReliabilityDimensions types have been formalized in the ARF temporal boundary specification.

Files

Name	Size	Checksum
pdr-in-production-v2.3.pdf	201.4 kB	md5:b62ee28dd5c30d82bfb7a3eb921b28d3

Additional details

Is new version of: Preprint: 10.5281/zenodo.19326131 (DOI)
Is version of: Preprint: 10.5281/zenodo.19154458 (DOI)

Statistic	All versions	This version
Views	59	1
Downloads	51	1
Data volume	14.0 MB	201.4 kB
