DCT-based Autoregressive Diffusion for Image Generation: A Novel Frequency Domain Approach

Albers, Josef

doi:10.5281/zenodo.13819513

Published September 20, 2024 | Version v1

Preprint | Open access

This paper introduces a novel approach to image generation that operates directly in the frequency domain using Discrete Cosine Transform (DCT) coefficients. We present the Aggressor model, which combines a transformer architecture with a diffusion process tailored to DCT coefficients. Our method incorporates two key innovations: applying the diffusion process to DCT coefficients instead of pixel values, and a decay-based loss weighting scheme that emphasizes lower-frequency components during training. This approach aligns the learning process with the natural distribution of information in images, where lower frequencies typically carry more structural information. We demonstrate the efficacy of our model on the CIFAR-10 dataset, focusing on a single class for initial experiments. Results show promising image quality and structural coherence, suggesting potential advantages in capturing global image structure and in computational efficiency. The inherent interpretability of DCT coefficients also offers insight into the generation process. Our method bridges classical frequency-domain techniques with modern deep learning approaches, opening new avenues for research in image generation. We also discuss potential extensions of this approach to video generation and audio processing, highlighting the versatility of the frequency-domain method across multimedia domains. This work represents a step towards more efficient and interpretable generative models, with broad implications for multimedia processing and generative AI.
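
The two ideas named in the abstract, diffusion noise applied to DCT coefficients and a decay-based loss weighting that favors low frequencies, can be illustrated with a minimal sketch. This is not the Aggressor implementation: the transformer and the autoregressive component are omitted, and the function names, the DDPM-style noising formula, the `decay ** (u + v)` weighting, and the value `decay=0.9` are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.fft import dctn, idctn


def to_dct(image):
    """Orthonormal 2D type-II DCT of an (H, W) array."""
    return dctn(image, type=2, norm="ortho")


def from_dct(coeffs):
    """Inverse of to_dct: recovers the pixel-domain array."""
    return idctn(coeffs, type=2, norm="ortho")


def add_noise(coeffs, alpha_bar, rng):
    """Assumed DDPM-style forward step, applied to coefficients, not pixels."""
    eps = rng.standard_normal(coeffs.shape)
    noisy = np.sqrt(alpha_bar) * coeffs + np.sqrt(1.0 - alpha_bar) * eps
    return noisy, eps


def decay_weights(height, width, decay=0.9):
    """Hypothetical weighting: coefficient (u, v) gets decay ** (u + v),
    so the DC term has weight 1 and high frequencies are down-weighted."""
    u = np.arange(height)[:, None]
    v = np.arange(width)[None, :]
    return decay ** (u + v)


def weighted_mse(pred, target, weights):
    """Weighted mean squared error, normalized by the total weight."""
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))


rng = np.random.default_rng(0)
image = rng.random((32, 32))           # stand-in for one CIFAR-10 channel
coeffs = to_dct(image)
noisy, eps = add_noise(coeffs, alpha_bar=0.5, rng=rng)
weights = decay_weights(32, 32)

# A real model would predict eps from `noisy`; zeros are a placeholder here.
loss = weighted_mse(np.zeros_like(eps), eps, weights)
```

Because the transform is orthonormal, `from_dct(to_dct(image))` reproduces the input up to floating-point error, so a model trained on coefficients can be decoded back to pixels without loss.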

Files

aggressor.pdf (72.7 kB), md5:163da18819f233ac41ec04ba0ee2930a

