Published September 20, 2024 | Version v1
Preprint
Open
DCT-based Autoregressive Diffusion for Image Generation: A Novel Frequency Domain Approach
Authors/Creators
Description
This paper introduces an approach to image generation that operates directly in the frequency domain, on Discrete Cosine Transform (DCT) coefficients. We present the Aggressor model, which combines a transformer architecture with a diffusion process tailored to DCT coefficients. The method rests on two key innovations: applying the diffusion process to DCT coefficients rather than pixel values, and a decay-based loss weighting scheme that emphasizes lower-frequency components during training. This weighting aligns learning with the natural distribution of information in images, where lower frequencies typically carry more structural information. We evaluate the model on the CIFAR-10 dataset, restricting initial experiments to a single class. The results show promising image quality and structural coherence, suggesting potential advantages both in capturing global image structure and in computational efficiency. Because each DCT coefficient corresponds to a specific spatial frequency, the representation is inherently interpretable and offers insight into the generation process. The method bridges classical frequency-domain techniques and modern deep learning, opening new avenues for research in image generation. We also discuss possible extensions to video generation and audio processing, which illustrate how the frequency-domain formulation could carry over to other multimedia domains. This work is a step toward more efficient and interpretable generative models, with implications for multimedia processing and generative AI.
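The two ideas named in the description (diffusion applied to DCT coefficients, and a loss that decays with frequency) can be illustrated with a minimal sketch. It is not the paper's implementation: the exponential weight form, the `decay` parameter, the DDPM-style noising step, and all function names are assumptions made for illustration, and the transformer itself is omitted.

```python
import math
import random

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2(block):
    """2D DCT-II of a square block, computed as C @ X @ C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse of dct2; C is orthonormal, so the inverse is C^T @ Y @ C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def add_noise(coeffs, alpha_bar, rng):
    """Assumed DDPM-style forward step applied to DCT coefficients:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    eps = [[rng.gauss(0.0, 1.0) for _ in row] for row in coeffs]
    noisy = [[a * x + b * e for x, e in zip(x_row, e_row)]
             for x_row, e_row in zip(coeffs, eps)]
    return noisy, eps

def decay_weights(n, decay=0.5):
    """Hypothetical weighting exp(-decay * (u + v)); largest at the DC term."""
    return [[math.exp(-decay * (u + v)) for v in range(n)] for u in range(n)]

def weighted_mse(pred, target, weights):
    """Mean squared error with per-frequency weights, normalized by their sum."""
    total = 0.0
    norm = 0.0
    for p_row, t_row, w_row in zip(pred, target, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            total += w * (p - t) ** 2
            norm += w
    return total / norm

# Usage: transform a block, noise its coefficients, and score a noise estimate.
block = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
coeffs = dct2(block)
noisy, eps = add_noise(coeffs, alpha_bar=0.9, rng=random.Random(0))
zero_guess = [[0.0] * 4 for _ in range(4)]  # stand-in for a model's output
loss = weighted_mse(zero_guess, eps, decay_weights(4))
```

With weights of this shape, an error of a given size in a low-frequency coefficient costs more than the same error in a high-frequency one, which is the behavior the description attributes to the decay-based scheme.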
Files
| Name | Size |
|---|---|
| aggressor.pdf (md5:163da18819f233ac41ec04ba0ee2930a) | 72.7 kB |