Published August 5, 2025 | Version v1
Preprint · Open Access

TokenPPO: Token-Level Reinforcement Learning for Diffusion Model Generation

  • Independent Researcher

Description

As diffusion-based image generation models grow in parameter count, the range of prompts they can process has expanded, resulting in more diverse and complex generation tasks. However, this growth introduces challenges in attention distribution: even with enhanced generative capability, the model's attention mechanism may become dispersed across a wider range of information, hindering its ability to focus on specific task details. For instance, when a prompt contains multiple elements, the model may lose focus, leading to missing details and reduced image quality. We propose a reinforcement learning-based image generation optimization framework that incorporates an aesthetic feedback mechanism. By applying token-level policy gradient control and assigning aesthetic weights through an aesthetic model, the framework guides the model's attention toward the target details, thereby improving image quality. We refer to this method as Token-Level Proximal Policy Optimization (TokenPPO). We demonstrate that applying TokenPPO yields significant improvements in aesthetic scores, human satisfaction, and other evaluation metrics of the generated images.
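The core idea — a clipped PPO surrogate computed per token, with each token's advantage scaled by a weight from an aesthetic model — can be sketched as follows. This is a minimal illustration under assumed interfaces: the function and argument names (`token_ppo_loss`, `aesthetic_weights`, `old_log_probs`) are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def token_ppo_loss(log_probs, old_log_probs, advantages,
                   aesthetic_weights, clip_eps=0.2):
    """Clipped PPO surrogate computed per token (illustrative sketch).

    Each token's advantage is scaled by an aesthetic weight (assumed to
    come from an aesthetic scoring model), so tokens tied to visually
    important details receive larger policy-gradient signal.
    """
    ratio = np.exp(log_probs - old_log_probs)        # pi_new / pi_old, per token
    weighted_adv = aesthetic_weights * advantages    # token-level credit assignment
    unclipped = ratio * weighted_adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * weighted_adv
    # Pessimistic (min) objective averaged over tokens; negated to form a loss.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy usage: 4 tokens; uniform weights reduce this to standard PPO.
lp_new = np.log(np.array([0.30, 0.25, 0.20, 0.25]))
lp_old = np.log(np.full(4, 0.25))
adv = np.array([1.0, -0.5, 0.8, 0.1])
w = np.array([1.5, 1.0, 2.0, 0.5])  # larger weight on detail-relevant tokens
loss = token_ppo_loss(lp_new, lp_old, adv, w)
```

When the new and old policies coincide, the ratio is 1 for every token and the loss reduces to the negative mean of the weighted advantages, which makes the weighting's effect easy to inspect.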

Files

TokenPPO.pdf (4.1 MB)
md5:cfcf0628db2a964802373d6c6c7a840b

Additional details

Dates

Submitted
2025-08-05

Software

Programming language
Python