Published August 5, 2025 | Version v1 | Preprint | Open
TokenPPO: Token-Level Reinforcement Learning for Diffusion Model Generation
Description
With the increasing parameterization of diffusion-based image generation models, the range of prompts they can process has expanded, enabling more diverse and complex generation tasks. However, this growth introduces challenges in attention distribution: even with enhanced generative capability, the model's attention may become dispersed across a wider range of information, hindering its focus on specific task details. For instance, when a prompt contains multiple elements, the model may lose focus, leading to missing details and degraded image quality. We propose a reinforcement learning-based image generation optimization framework that incorporates an aesthetic feedback mechanism. Using token-level policy gradient control, the framework assigns aesthetic weights through an aesthetic model, guiding the model's attention toward the target details and thereby improving image quality. We call this method Token-Level Proximal Policy Optimization (TokenPPO). We demonstrate that applying TokenPPO yields significant improvements in aesthetic scores, human satisfaction, and other evaluation metrics of the generated images.
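
The record itself contains no implementation details, so the following is only a minimal sketch of what a per-token clipped PPO objective with aesthetic weighting might look like, assuming PyTorch tensors of shape (batch, tokens). The function name `token_ppo_loss`, the clipping constant, and the multiplicative weighting are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a per-token clipped PPO objective with aesthetic weights,
# based only on the abstract above. Function name, tensor shapes, clipping
# constant, and the multiplicative weighting are illustrative assumptions,
# NOT the authors' released implementation.
import torch

def token_ppo_loss(logp_new, logp_old, advantages, aesthetic_weights, clip_eps=0.2):
    """Clipped PPO surrogate computed per token instead of per sequence.

    All arguments are (batch, tokens) tensors:
      logp_new / logp_old -- token log-probs under the current / old policy
      advantages          -- per-token advantage estimates
      aesthetic_weights   -- per-token weights from an aesthetic model
                             (assumed to up-weight detail-bearing tokens)
    """
    ratio = torch.exp(logp_new - logp_old)                    # importance ratio per token
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    per_token = torch.minimum(unclipped, clipped)             # standard PPO clipping
    weighted = aesthetic_weights * per_token                  # focus gradient on key tokens
    return -weighted.mean()                                   # maximize surrogate => negate

# Toy call with random tensors, just to show the expected shapes.
B, T = 4, 16
loss = token_ppo_loss(torch.randn(B, T), torch.randn(B, T),
                      torch.randn(B, T), torch.rand(B, T))
print(float(loss))
```

Multiplying the clipped surrogate by the aesthetic weights is one natural reading of "assigning aesthetic weights"; folding the weights into the advantages or the reward itself would be equally consistent with the abstract.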
Files
Name | Size | Checksum
---|---|---
TokenPPO.pdf | 4.1 MB | md5:cfcf0628db2a964802373d6c6c7a840b
Additional details
Dates
- Submitted: 2025-08-05