There is a newer version of the record available.

Published September 20, 2024 | Version v1
Preprint Open

DCT-based Autoregressive Diffusion for Image Generation: A Novel Frequency Domain Approach

Authors/Creators

Description

This paper introduces a novel approach to image generation that operates directly in the frequency domain on Discrete Cosine Transform (DCT) coefficients. We present the Aggressor model, which combines a transformer architecture with a diffusion process tailored to DCT coefficients. The method rests on two key innovations: the diffusion process is applied to DCT coefficients instead of pixel values, and a decay-based loss weighting scheme emphasizes lower-frequency components during training. This aligns learning with the natural distribution of information in images, where lower frequencies typically carry more structural information. We evaluate the model on the CIFAR-10 dataset, restricting the initial experiments to a single class. Results show promising image quality and structural coherence, suggesting potential advantages in capturing global image structure and in computational efficiency. Because DCT coefficients are inherently interpretable, they also offer insight into the generation process. The method bridges classical frequency-domain techniques and modern deep learning, opening new avenues for research in image generation. We also discuss potential extensions to video generation and audio processing, which highlight the versatility of the frequency-domain approach across multimedia domains. This work is a step towards more efficient and interpretable generative models, with implications for multimedia processing and generative AI.
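The two ideas named in the description, diffusion in DCT space and decay-based loss weighting, can be illustrated with a minimal NumPy sketch. It is not the Aggressor implementation. The exponential form of the weights, the `decay` value, the DDPM-style noising step, and all function names are assumptions made for illustration; the description does not specify them.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row scaling keeps the matrix orthonormal
    return m


def dct2(x: np.ndarray) -> np.ndarray:
    """2D DCT-II of a single-channel image or block."""
    return dct_matrix(x.shape[0]) @ x @ dct_matrix(x.shape[1]).T


def idct2(c: np.ndarray) -> np.ndarray:
    """Inverse of dct2 (the transpose, since the basis is orthonormal)."""
    return dct_matrix(c.shape[0]).T @ c @ dct_matrix(c.shape[1])


def decay_weights(h: int, w: int, decay: float = 0.9) -> np.ndarray:
    """Hypothetical weighting: decay ** (u + v), largest at the DC term."""
    u = np.arange(h)[:, None]  # vertical frequency index
    v = np.arange(w)[None, :]  # horizontal frequency index
    return decay ** (u + v)


def add_noise(coeffs: np.ndarray, alpha_bar: float,
              rng: np.random.Generator) -> np.ndarray:
    """Standard DDPM-style forward step, applied to DCT coefficients
    instead of pixels (assumed here, not taken from the paper)."""
    eps = rng.standard_normal(coeffs.shape)
    return np.sqrt(alpha_bar) * coeffs + np.sqrt(1.0 - alpha_bar) * eps


def weighted_mse(pred: np.ndarray, target: np.ndarray,
                 weights: np.ndarray) -> float:
    """Squared error per coefficient, weighted and normalised by the weights."""
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))


# Example: an 8x8 block stands in for one image channel.
rng = np.random.default_rng(0)
block = rng.standard_normal((8, 8))
coeffs = dct2(block)
noisy = add_noise(coeffs, alpha_bar=0.5, rng=rng)
loss = weighted_mse(noisy, coeffs, decay_weights(8, 8))
```

With weights of this kind, an error in a low-frequency coefficient costs more than the same error in a high-frequency one, which is one way to realise the emphasis on structural information described above.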

Files

Name: aggressor.pdf
Size: 72.7 kB
md5:163da18819f233ac41ec04ba0ee2930a