Reasoning Chain Selection via Power Metric Health Signal P(t) as a Chain Quality Score on PRM800K: Real Empirical Evidence

Cantrell, Cole

doi:10.5281/zenodo.19874521

Published April 29, 2026 | Version v1

Preprint Open


We apply the stochastic power metric P(t) = E(t) × W(t) as a chain-level quality signal for
reasoning chain selection and evaluate it on PRM800K (Lightman et al. 2023), using 30,500 math
reasoning chains with human step-level correctness labels. Here P(t) is computed from the
human step-level labels, which serve as a proxy signal; a real deployment would need a process
reward model or a confidence proxy in their place. Under this setup, P(t) achieves Pearson
r = 0.955 with chain quality and 100% in-sample classification accuracy at threshold θ = 0.65,
compared to r = 0.529 and 68.7% accuracy for simple running accuracy. Last-5 step accuracy also
reaches 100% in-sample accuracy at θ = 0.80, but it uses only the final five steps and discards
full-chain trajectory dynamics. The P(t) separation between correct and error chains is +0.384,
making it a reliable selection signal that integrates the full reasoning trajectory for
best-of-N chain selection.
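
The two baselines named above, and threshold-based best-of-N selection, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not define E(t) or W(t), so `power_metric` takes them as caller-supplied placeholder functions, and all function names here are hypothetical. Step labels are assumed to be 1 (correct) or 0 (incorrect).

```python
from typing import Callable, Optional, Sequence

Labels = Sequence[int]  # assumption: 1 = step judged correct, 0 = incorrect


def running_accuracy(labels: Labels) -> float:
    """Baseline: fraction of correct steps over the whole chain."""
    return sum(labels) / len(labels) if labels else 0.0


def last_k_accuracy(labels: Labels, k: int = 5) -> float:
    """Baseline: fraction of correct steps among the final k steps (k >= 1)."""
    tail = labels[-k:]
    return sum(tail) / len(tail) if tail else 0.0


def power_metric(labels: Labels,
                 energy_fn: Callable[[Labels], float],
                 weight_fn: Callable[[Labels], float]) -> float:
    """P = E * W for a finished chain.

    E and W are NOT defined in the abstract; both are placeholders
    that the caller must supply.
    """
    return energy_fn(labels) * weight_fn(labels)


def select_best(chains: Sequence[Labels],
                score_fn: Callable[[Labels], float],
                threshold: float) -> Optional[int]:
    """Return the index of the top-scoring chain with score >= threshold.

    Returns None when no chain passes. Ties go to the earliest chain.
    """
    passing = [(score_fn(c), i) for i, c in enumerate(chains)]
    passing = [(s, i) for s, i in passing if s >= threshold]
    if not passing:
        return None
    return max(passing, key=lambda pair: pair[0])[1]


# Toy labels for illustration only; these are not PRM800K data.
example_chains = [[1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1, 1]]
best_by_running = select_best(example_chains, running_accuracy, threshold=0.65)
best_by_last5 = select_best(example_chains, last_k_accuracy, threshold=0.80)
```

Passing the score as a function keeps the selection step independent of how the score is computed, so a process reward model's per-step outputs could be substituted for human labels without changing `select_best`.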

This paper is the complement to Paper 2 (Cantrell 2026), which uses P(t) to stop bad chains
early during generation. Paper 2 operates at the start of the pipeline; this paper operates at the
end. Together they form a complete two-sided framework for test-time compute control: stop
wasting compute on bad chains (Paper 2), and reliably select the best surviving chain (this
paper). Both use the same mathematical framework applied at different points in the inference
pipeline.
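
As a rough sketch of how the two stages could fit together, the code below stops a chain once a health score over its prefix drops below a cutoff, then picks the best chain among those that finished. Everything here is an assumption for illustration: `prefix_score` is a simple stand-in (fraction of correct steps so far), not P(t), and the stopping rule, the `min_steps` warm-up, and both thresholds are hypothetical rather than taken from Paper 2 or this paper.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

ScoreFn = Callable[[Sequence[int]], float]


def prefix_score(labels: Sequence[int]) -> float:
    """Stand-in health signal (NOT P(t)): fraction of correct steps so far."""
    return sum(labels) / len(labels) if labels else 1.0


def generate_with_early_stop(steps: Iterable[int],
                             score_fn: ScoreFn,
                             stop_below: float,
                             min_steps: int = 3) -> Tuple[List[int], bool]:
    """Consume steps one at a time; abort when the prefix score falls too low.

    Returns (steps consumed, finished flag). min_steps is a hypothetical
    warm-up so a single early error does not stop the chain immediately.
    """
    seen: List[int] = []
    for label in steps:
        seen.append(label)
        if len(seen) >= min_steps and score_fn(seen) < stop_below:
            return seen, False  # stage 1: stop spending compute on this chain
    return seen, True


def run_pipeline(candidates: Sequence[Iterable[int]],
                 score_fn: ScoreFn,
                 stop_below: float,
                 select_at_least: float) -> Optional[int]:
    """Stage 1 filters chains during generation; stage 2 selects a survivor."""
    best_index: Optional[int] = None
    best_score = float("-inf")
    for i, steps in enumerate(candidates):
        seen, finished = generate_with_early_stop(steps, score_fn, stop_below)
        if not finished:
            continue
        score = score_fn(seen)
        if score >= select_at_least and score > best_score:
            best_index, best_score = i, score
    return best_index
```

The same `score_fn` is used in both stages, mirroring the claim that one framework is applied at two points in the inference pipeline; only the threshold and the moment of evaluation differ.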

Files


Name	Size	MD5
Paper_18_Chain.pdf	323.3 kB	b5dd92f19b5032d63a1ee1de4b14c7ea

Additional details

Repository URL: https://github.com/HauntedKernel/power-metric

