Reasoning Chain Selection via Power Metric Health Signal P(t) as a Chain Quality Score on PRM800K: Real Empirical Evidence
Authors/Creators
Description
We apply the stochastic power metric P(t) = E(t) × W(t) as a chain-level quality signal for
reasoning chain selection, evaluated on PRM800K (Lightman et al. 2023), a set of 30,500 math
reasoning chains with human step-level correctness labels. P(t) is computed on the human-labeled
step-by-step correctness, which serves here as a proxy signal; a real deployment requires a process
reward model or a confidence proxy. P(t) achieves Pearson r = 0.955 with chain quality and 100%
in-sample classification accuracy at threshold θ = 0.65, compared to r = 0.529 and 68.7% accuracy
for simple running accuracy. Last-5 step accuracy also achieves 100% in-sample accuracy at
θ = 0.80, but it relies only on the final five steps and discards full-chain trajectory dynamics.
The P(t) separation between correct and error chains is +0.384, making it a reliable signal for
best-of-N chain selection that integrates the full reasoning trajectory.
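The two baselines named above, and the threshold and best-of-N steps, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it does not compute P(t), because E(t) and W(t) are not defined in this description. All function names are hypothetical, step labels are assumed to be 1 (correct) or 0 (incorrect), and the `>=` comparison at the threshold is an assumption.

```python
from typing import Callable, Sequence

# Assumption: one label per reasoning step, 1 = correct, 0 = incorrect.
StepLabels = Sequence[int]


def running_accuracy(labels: StepLabels) -> float:
    """Fraction of all steps in the chain that are labeled correct."""
    return sum(labels) / len(labels) if labels else 0.0


def last_k_accuracy(labels: StepLabels, k: int = 5) -> float:
    """Fraction of the final k steps labeled correct.

    Chains shorter than k are scored on all of their steps.
    """
    tail = labels[-k:]
    return sum(tail) / len(tail) if tail else 0.0


def accept(score: float, threshold: float) -> bool:
    """Classify a chain as good when its score reaches the threshold.

    Assumption: the comparison is inclusive (>=).
    """
    return score >= threshold


def best_of_n(chains: Sequence[StepLabels],
              scorer: Callable[[StepLabels], float]) -> int:
    """Return the index of the highest-scoring chain.

    `scorer` is pluggable: a P(t)-based scorer would slot in here.
    Raises ValueError if `chains` is empty.
    """
    return max(range(len(chains)), key=lambda i: scorer(chains[i]))


if __name__ == "__main__":
    # Toy labels for illustration only; not drawn from PRM800K.
    candidates = [
        [1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [0, 1, 1, 1, 1, 1],
    ]
    print(best_of_n(candidates, running_accuracy))
    print([accept(last_k_accuracy(c), 0.80) for c in candidates])
```

In the toy data, the third chain has an early error that last-5 accuracy cannot see, which is the limitation the description attributes to that baseline. When scores tie, `max` returns the first maximal index, so ties go to the earliest chain.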
This paper is the complement to Paper 2 (Cantrell 2026), which uses P(t) to stop bad chains
early during generation. Paper 2 operates at the start of the pipeline; this paper operates at the
end. Together they form a complete two-sided framework for test-time compute control: stop
wasting compute on bad chains (Paper 2), and reliably select the best surviving chain (this
paper). Both use the same mathematical framework applied at different points in the inference
pipeline.
Files
Paper_18_Chain.pdf (323.3 kB)
md5:b5dd92f19b5032d63a1ee1de4b14c7ea
Additional details
Software
- Repository URL: https://github.com/HauntedKernel/power-metric