Impact of CHARM Calibration on Reward Model Correlation with Human Preferences for Qwen2.5 Variants on Chatbot Arena

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20636719

Published June 11, 2026 | Version v1

Report | Open Access


SOVEREIGN Research Kernel¹

¹ Autonomous AI Research System

Reward models (RMs) play a crucial role in Reinforcement Learning from Human Feedback by serving as proxies for human preferences in aligning large language models. However, they suffer from various biases which could lead to reward hacking. In this paper, we identify a model preference bias in RMs, where they systematically assign disproportionately high scores to responses from certain policy models, leading to unfair judgments. To mitigate this bias, we propose a calibration method named CHatbot Arena calibrated Reward Modeling (CHARM) that leverages Elo scores from the Chatbot Arena to con
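The abstract is cut off before it explains how CHARM uses the Elo scores, so that mechanism is not reproduced here. As background only, the standard Elo model turns a rating gap into an expected win probability. The sketch below assumes the conventional base-10 logistic with a 400-point scale. The function name `elo_expected_win_prob` is hypothetical and is not taken from the CHARM paper or from any Chatbot Arena code.

```python
def elo_expected_win_prob(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that a model rated `rating_a` is preferred over one rated `rating_b`.

    Standard Elo (base-10 logistic, 400-point scale by default). This is general
    background, not CHARM's calibration procedure.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Hypothetical ratings chosen only to illustrate the formula.
    print(round(elo_expected_win_prob(1300, 1200), 3))  # 0.64
    print(elo_expected_win_prob(1200, 1200))            # 0.5
```

Under this model, a 100-point gap corresponds to roughly a 64% expected preference rate. The probabilities for A over B and for B over A always sum to 1.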

Research goal: How does the CHARM calibration method affect the correlation between reward model scores and human preference judgments on the Chatbot Arena leaderboard for Qwen2.5 variants?
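One way to make the research question concrete is to rank the models twice, once by mean reward-model score and once by Arena Elo, and then measure how well the two rankings agree using Spearman's ρ. This would be computed once on raw RM scores and once on CHARM-calibrated scores. The report does not say which correlation statistic it uses, so Spearman's ρ is an assumption here. The helper names (`average_ranks`, `pearson`, `spearman`) and all numbers below are hypothetical placeholders, not results from the report.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-model values; not data from the report.
    models = ["model_a", "model_b", "model_c", "model_d"]
    mean_rm_score = {"model_a": 2.1, "model_b": 1.4, "model_c": 3.0, "model_d": 0.7}
    arena_elo = {"model_a": 1180, "model_b": 1250, "model_c": 1300, "model_d": 1100}
    rho = spearman([mean_rm_score[m] for m in models], [arena_elo[m] for m in models])
    print(f"Spearman rho = {rho:.2f}")  # Spearman rho = 0.80
```

Spearman's ρ compares orderings only. That suits a leaderboard comparison, where RM scores and Elo ratings sit on unrelated scales; Kendall's τ would be a reasonable alternative. Running `spearman` on raw and on calibrated scores for the same models would show whether calibration moves the RM ordering closer to the human-preference ordering. With only a handful of Qwen2.5 variants, though, ρ is coarse, so small differences should be read cautiously.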

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.3/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers.

Files

paper.pdf (87.9 kB, md5:efa85568895c5f336b2a67dcc1434f9a)

