Hyperparameters are all you need: Using five-step inference for an original diffusion model to generate images comparable to the latest distillation model
Description
Diffusion probabilistic models are state-of-the-art generative models that synthesize an image by applying a neural network iteratively; this generation process can be viewed as solving a diffusion ordinary differential equation (ODE) or stochastic differential equation (SDE). Based on an analysis of the truncation error of the diffusion ODE and SDE, our study proposes a training-free algorithm that generates high-quality 512 x 512 and 1024 x 1024 images in eight steps, with flexible guidance scales. To the best of our knowledge, our algorithm is the first to sample a 1024 x 1024 image in eight steps with FID performance comparable to that of the latest distillation models, without additional training. Our algorithm can also generate a 512 x 512 image in eight steps with better FID performance than the state-of-the-art ODE solver DPM++ 2M achieves in 20 steps. With five-step and six-step inference, the 512 x 512 and 1024 x 1024 images our algorithm generates likewise remain comparable to those of the latest distillation models. Moreover, unlike most distillation algorithms, which achieve state-of-the-art FID performance only at a fixed guidance scale and sometimes cannot improve by adding inference steps, our algorithm supports a flexible guidance scale under classifier-free guidance, and its FID performance improves as the number of inference steps increases. Additionally, the algorithm can be used as a plug-in component compatible with most ODE solvers and latent diffusion models. Extensive experiments are performed on the COCO 2014, COCO 2017, and LAION datasets. Specifically, we validate our eight-step image generation algorithm on the COCO 2014, COCO 2017, and LAION validation sets with guidance scales of 5.5 and 7.5, respectively.
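The record does not reproduce the solver code inline. As a rough orientation only, the combination the description refers to, a few-step first-order ODE solve with classifier-free guidance at a flexible scale, can be sketched as below. Every name here (the toy model, the sigma schedule, the function names) is illustrative and is not the paper's actual implementation:

```python
import numpy as np

def guided_eps(model, x, sigma, cond, scale):
    """Classifier-free guidance: blend the unconditional and conditional
    noise predictions with a (flexible) guidance scale."""
    eps_u = model(x, sigma, None)
    eps_c = model(x, sigma, cond)
    return eps_u + scale * (eps_c - eps_u)

def few_step_sample(model, x, sigmas, cond, scale):
    """First-order (Euler) solve of the diffusion ODE dx/dsigma = eps(x, sigma)
    over a short sigma schedule (e.g. five to eight steps)."""
    for s_cur, s_nxt in zip(sigmas[:-1], sigmas[1:]):
        eps = guided_eps(model, x, s_cur, cond, scale)
        x = x + (s_nxt - s_cur) * eps  # Euler update; higher-order solvers refine this
    return x

# Toy stand-in for a trained noise-prediction network.
def toy_model(x, sigma, cond):
    bias = 0.0 if cond is None else 0.05
    return x / (sigma + 1.0) + bias

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4)) * 10.0             # start from noise at sigma_max
sigmas = np.array([10.0, 5.0, 2.0, 1.0, 0.3, 0.0])  # illustrative 5-step schedule
img = few_step_sample(toy_model, x0, sigmas, cond="prompt", scale=5.5)
```

In a real pipeline the Euler update would be replaced by a higher-order solver step (e.g. a DPM++-style multistep update), which is where the truncation-error analysis in the description applies.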
At 512 x 512 resolution with a 5.5 guidance scale, the FID of our eight-step synthesis is 15.7, 22.35, and 17.52 on COCO 2014, COCO 2017, and LAION, respectively, comparable to the state-of-the-art ODE solver DPM++ in 20 steps, whose best FID is 17.3, 23.75, and 17.33. It also outperforms the state-of-the-art AMED-plugin solver, whose FID is 19.07, 25.50, and 18.06. We also apply the algorithm with five-step inference and no additional training; its best FID on COCO 2014, COCO 2017, and LAION is 19.18, 23.24, and 19.61, respectively, comparable to the state-of-the-art AMED-plugin solver in eight steps, SDXL-Turbo in four steps, and the state-of-the-art diffusion distillation model Flash Diffusion in five steps. We then validate our algorithm on 1024 x 1024 image synthesis, where its eight-step FID on COCO 2014, COCO 2017, and LAION is 17.84, 24.42, and 19.25, respectively, outperforming SDXL-Lightning in eight steps, Flash Diffusion XL in eight steps, and DMD2 in four steps. With six-step inference at 1024 x 1024, our algorithm reaches an FID of 23, only slightly behind the state-of-the-art distillation models mentioned above. Finally, we use information theory to explain the advantage of our algorithm and why it achieves strong FID performance.
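For reference, the FID numbers above are Frechet distances between Gaussian fits of Inception features of real and generated images. A minimal sketch of the distance itself (feature extraction omitted; the eigendecomposition route assumes the covariance product is diagonalizable, which holds in typical practice):

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet (Wasserstein-2) distance between two Gaussians:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    diff = mu1 - mu2
    # Matrix square root of S1 @ S2 via eigendecomposition
    # (valid when the product is diagonalizable).
    vals, vecs = np.linalg.eig(sigma1 @ sigma2)
    sqrt_prod = (vecs * np.sqrt(vals.astype(complex))) @ np.linalg.inv(vecs)
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * sqrt_prod).real)
```

FID applies this with means and covariances estimated from Inception-v3 activations of the two image sets; identical distributions give a distance of zero, and lower is better.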
Files
PRL_250916J908Y-final.pdf (12.2 MB)
md5:2c11ffab14ec8a480c208b2b985b5424
Additional details
Software
- Repository URL
- https://github.com/TheLovesOfLadyPurple/Hyperparameter-is-all-you-need
- Programming language
- Python
- Development Status
- Active