Published August 5, 2025 | Version v1
Preprint · Open Access

TokenPPO: Token-Level Reinforcement Learning for Diffusion Model Generation

  • Independent Researcher

Description

As diffusion-based image generation models grow in parameter count, the range of prompts they can process has expanded, resulting in more diverse and complex generation tasks. However, this growth introduces challenges in attention distribution: even with enhanced generative capability, the model's attention mechanism may become dispersed across a wider range of information, hindering its ability to focus on specific task details. For instance, when a prompt contains multiple elements, the model may lose focus, leading to missing details and reduced image quality. We propose a reinforcement learning-based image generation optimization framework that incorporates an aesthetic feedback mechanism. By applying token-level policy gradient control and assigning aesthetic weights through an aesthetic model, the framework guides the model's attention toward the target details, thereby improving image quality. We refer to this method as Token-Level Proximal Policy Optimization (TokenPPO). We demonstrate that applying TokenPPO yields significant improvements in aesthetic scores, human satisfaction, and other evaluation metrics of the generated images.
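The core idea — a clipped PPO surrogate computed per token, with each token's advantage scaled by a weight from an aesthetic model — can be sketched as follows. This is a minimal illustration under assumed interfaces: the function and argument names (`token_ppo_loss`, `aesthetic_weights`, `old_log_probs`) are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def token_ppo_loss(log_probs, old_log_probs, advantages,
                   aesthetic_weights, clip_eps=0.2):
    """Clipped PPO surrogate computed per token (illustrative sketch).

    Each token's advantage is scaled by an aesthetic weight (assumed to
    come from an aesthetic scoring model), so tokens tied to visually
    important details receive larger policy-gradient signal.
    """
    ratio = np.exp(log_probs - old_log_probs)        # pi_new / pi_old, per token
    weighted_adv = aesthetic_weights * advantages    # token-level credit assignment
    unclipped = ratio * weighted_adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * weighted_adv
    # Pessimistic (min) objective averaged over tokens; negated to form a loss.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy usage: 4 tokens; uniform weights reduce this to standard PPO.
lp_new = np.log(np.array([0.30, 0.25, 0.20, 0.25]))
lp_old = np.log(np.full(4, 0.25))
adv = np.array([1.0, -0.5, 0.8, 0.1])
w = np.array([1.5, 1.0, 2.0, 0.5])  # larger weight on detail-relevant tokens
loss = token_ppo_loss(lp_new, lp_old, adv, w)
```

When the new and old policies coincide, the ratio is 1 for every token and the loss reduces to the negative mean of the weighted advantages, which makes the weighting's effect easy to inspect.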

Files

TokenPPO.pdf (4.1 MB)
md5:cfcf0628db2a964802373d6c6c7a840b

Additional details

Dates

Submitted
2025-08-05

Software

Programming language
Python