Multi-instrument Music Synthesis with Spectrogram Diffusion

Curtis Hawthorne; Ian Simon; Adam Roberts; Neil Zeghidour; Joshua Gardner; Ethan Manilow; Jesse Engel

doi:10.5281/zenodo.7316734

Published December 4, 2022 | Version v1

Conference paper (open access)

An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural synthesizers have exhibited a tradeoff between domain-specific models that offer detailed control of only specific instruments and raw waveform models that can train on any music but offer minimal control and slow generation. In this work, we focus on a middle ground of neural synthesizers that can generate audio from MIDI sequences with arbitrary combinations of instruments in realtime. This enables training on a wide range of transcription datasets with a single model, which in turn offers note-level control of composition and instrumentation across a wide range of instruments. We use a simple two-stage process: MIDI to spectrograms with an encoder-decoder Transformer, then spectrograms to audio with a generative adversarial network (GAN) spectrogram inverter. We compare training the decoder as an autoregressive model and as a Denoising Diffusion Probabilistic Model (DDPM) and find that the DDPM approach is superior both qualitatively and as measured by audio reconstruction and Fréchet distance metrics. Given the interactivity and generality of this approach, we find this to be a promising first step towards interactive and expressive neural synthesis for arbitrary combinations of instruments and notes.
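As a rough illustration of the DDPM idea mentioned in the abstract, the sketch below shows the standard forward (noising) step of a denoising diffusion model applied to a single toy spectrogram frame. It is not the authors' implementation: the linear noise schedule, the step count of 1000, the function names, and the four-value frame are all assumptions chosen for illustration, and the paper's actual schedule and network may differ.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (an assumption, not taken from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(frame, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) for one frame given as a list of floats.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in frame]
    noisy = [signal_scale * x + noise_scale * e for x, e in zip(frame, noise)]
    return noisy, noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

# Toy stand-in for one mel-spectrogram frame; real frames have many more bins.
clean_frame = [0.5, -0.2, 0.1, 0.9]
noisy_frame, noise = add_noise(clean_frame, 999, alpha_bars, rng)
```

In the standard DDPM formulation, a network is trained to recover the noise (or the clean signal) from the noisy frame, and generation runs the process in reverse from pure noise. In the setting described above, that network would be the Transformer decoder conditioned on the encoded MIDI.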

Files

Name	Size
000072.pdf (md5:54dfd76fe5db207ceaebbdcd938564cb)	2.8 MB

	All versions	This version
Views	96	94
Downloads	139	136
Data volume	432.0 MB	423.4 MB

Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 23rd International Society for Music Information Retrieval Conference, 598-607. Bengaluru, India.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India, December 4-8, 2022

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 13, 2022
Modified: July 15, 2024
