Performance Comparison of Potential-Based and State-Based Reward Functions on MMLU Benchmark

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20663910

Published June 12, 2026 | Version v1

Resource type: Report | Access: Open


Affiliation: Autonomous AI Research System

Large Language Models (LLMs) consistently benefit from scaling Chain-of-Thought (CoT) reasoning, but they also incur heavy computational overhead. To address this, efficient reasoning methods aim to incentivize short yet accurate reasoning trajectories, typically through reward shaping with Reinforcement Learning (RL). In this paper, we systematically investigate the mechanics of efficient reasoning for LLMs. For a comprehensive evaluation, we advocate more fine-grained metrics, including the length distribution conditioned on correctness and performance across a wide spectrum of token budgets ran…
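The two fine-grained metrics named in the abstract can be computed from per-response logs. This is a minimal sketch, not the report's evaluation code: it assumes each response is recorded as a hypothetical `(num_tokens, is_correct)` pair and that any response longer than a given token budget counts as incorrect. The function names and the toy data are invented for illustration.

```python
from statistics import mean, median


def length_by_correctness(records):
    """Summarize response lengths separately for correct and incorrect answers.

    records: iterable of (num_tokens, is_correct) pairs (hypothetical log format).
    """
    groups = {True: [], False: []}
    for num_tokens, is_correct in records:
        groups[bool(is_correct)].append(num_tokens)
    return {
        ("correct" if key else "incorrect"): {
            "count": len(lengths),
            "mean": mean(lengths) if lengths else None,
            "median": median(lengths) if lengths else None,
        }
        for key, lengths in groups.items()
    }


def accuracy_under_budgets(records, budgets):
    """Accuracy per token budget, assuming over-budget responses count as wrong."""
    records = list(records)
    if not records:
        raise ValueError("records must be non-empty")
    total = len(records)
    return {
        budget: sum(1 for n, ok in records if ok and n <= budget) / total
        for budget in budgets
    }


# Toy data, invented for illustration only.
records = [(120, True), (340, True), (900, False), (450, True), (1500, False)]
print(length_by_correctness(records))
print(accuracy_under_budgets(records, [256, 512, 1024]))
```

Reporting lengths separately for correct and incorrect answers shows whether a reward scheme shortens successful reasoning or merely truncates failed attempts, and sweeping the budget exposes accuracy that a single fixed cutoff would hide.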

Research goal: How does the performance of potential-based reward functions compare with that of state-based reward functions on the MMLU benchmark when applied to models ranging from 7B to 70B parameters under fixed computational budgets?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.6/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab, and synthesizes findings from peer-reviewed papers.

Files

paper.pdf (86.0 kB, md5:5eb14c749f7cc82cef0a2dcd5507e95d)
