Monitoring Verifier Health in Test-Time Scaling Using Stochastic Power Metrics
Authors/Creators
Description
Test-time scaling methods such as LLM-as-a-Verifier (Mirhoseini et al., 2026) improve answer selection
quality by using log-probability rank signals to score candidate outputs. These methods assume the verifier
remains reliably discriminative throughout the sampling process. We identify a gap: no existing method
monitors whether the verifier is currently healthy, that is, whether it is still producing a meaningful discriminative
signal or has begun to plateau, drift, or produce flat rankings. This paper proposes applying the stochastic
power metric P(t) = E(t) × W(t) as a real-time verifier health signal. E(t) measures whether the verifier's
current score spread exceeds its own adaptive expected spread. W(t) measures consistency of that
outperformance. When P(t) drops below a threshold, the verifier is judged to have lost discrimination power, and
continued sampling is expected to yield diminishing returns. In a stylized simulation calibrated to published TerminalBench 2.0 results, the power metric correctly identifies verifier plateau states and reduces unnecessary
candidate generation by 84–96% while retaining quality scores of 0.944–0.976 relative to full-budget verification. This
framing is consistent with sequential decision-making theory: the verifier health signal is an instance of the
Resource Commitment Principle applied to the verification layer of test-time scaling.
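
The description above gives P(t) = E(t) × W(t) only in words, so the following Python sketch is one possible reading rather than the paper's implementation. Every concrete choice in it is an assumption made for illustration: the class name `VerifierHealthMonitor`, the use of the population standard deviation as the "score spread", an exponential moving average as the "adaptive expected spread", a binary E(t), a rolling-mean W(t), and the default values of `alpha`, `window`, and `threshold`.

```python
from collections import deque
from statistics import pstdev


class VerifierHealthMonitor:
    """Hypothetical sketch of a P(t) = E(t) * W(t) health signal.

    All definitions below are assumptions; the abstract does not
    specify how E(t), W(t), or the threshold are computed.
    """

    def __init__(self, alpha=0.5, window=3, threshold=0.5):
        self.alpha = alpha                # EMA smoothing factor (assumed)
        self.threshold = threshold        # stop when P(t) falls below this (assumed)
        self.expected = 0.0               # adaptive expected spread, starts at zero
        self.wins = deque(maxlen=window)  # recent E(t) values used for W(t)

    def update(self, scores):
        """Consume one round of verifier scores and return P(t)."""
        spread = pstdev(scores)  # assumed spread measure
        # E(t): 1 if the current spread exceeds the expected spread, else 0.
        e = 1.0 if spread > self.expected else 0.0
        self.wins.append(e)
        # W(t): fraction of recent rounds in which E(t) was 1.
        w = sum(self.wins) / len(self.wins)
        # Update the expected spread only after the comparison.
        self.expected = (1 - self.alpha) * self.expected + self.alpha * spread
        return e * w

    def should_stop(self, p):
        """True when P(t) is below the threshold."""
        return p < self.threshold


# Usage: two rounds with well-separated scores, then a flat round.
monitor = VerifierHealthMonitor()
for round_scores in ([0.1, 0.9], [0.0, 1.0], [0.5, 0.5]):
    p = monitor.update(round_scores)
    print(round_scores, p, monitor.should_stop(p))
```

With a binary E(t), P(t) is zero whenever the current spread falls at or below the moving baseline, and otherwise equals W(t). A graded E(t), such as a clipped ratio of current to expected spread, would be an equally plausible reading of the description.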
Files
Paper_17_FINAL-4_260421_205326.pdf (67.7 kB, md5:1f63bc98ba18f635f65d1ea4d1b39721)