DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation

Da-Yi Wu; Wen-Yi Hsiao; Fu-Rong Yang; Oscar D Friedman; Warren Jackson; Scott Bruzenak; Yi-Wen Liu; Yi-Hsuan Yang

doi:10.5281/zenodo.7316600

Published December 4, 2022 | Version v1

Conference paper | Open access


A vocoder is a conditional audio generation model that converts acoustic features such as mel-spectrograms into waveforms. Taking inspiration from Differentiable Digital Signal Processing (DDSP), we propose a new vocoder named SawSing for singing voices. SawSing synthesizes the harmonic part of singing voices by filtering a sawtooth source signal with a linear time-variant finite impulse response filter whose coefficients are estimated from the input mel-spectrogram by a neural network. As this approach enforces phase continuity, SawSing can generate singing voices without the phase-discontinuity glitch of many existing vocoders. Moreover, the source-filter assumption provides an inductive bias that allows SawSing to be trained on a small amount of data. Our evaluation shows that SawSing converges much faster and outperforms state-of-the-art generative adversarial network- and diffusion-based vocoders in a resource-limited scenario with only 3 training recordings and a 3-hour training time.
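The source-filter idea in the abstract can be sketched in a few lines. The example below is a minimal, hypothetical illustration and not the authors' SawSing implementation. It builds a sawtooth by accumulating phase from a frame-level f0 contour, so the phase is never reset between frames, and then applies a frame-wise, linear time-variant FIR filter. Assumptions: f0 is given per frame (in Hz, greater than zero) and held constant for `hop` samples; the sawtooth is band-limited by summing harmonics up to Nyquist; and the per-frame filter coefficients, which SawSing estimates from the mel-spectrogram with a neural network, are replaced by fixed placeholder values. The function names `bandlimited_saw` and `ltv_fir` are made up for this sketch.

```python
import math


def bandlimited_saw(f0_frames, hop, sr):
    """Phase-continuous, band-limited rising sawtooth in roughly [-1, 1].

    f0_frames: one fundamental frequency in Hz (> 0) per frame; each value is
    held for `hop` samples (a simplification; interpolation would be smoother).
    """
    out = []
    phase = 0.0  # accumulated phase in cycles, wrapped to [0, 1)
    for f0 in f0_frames:
        # Number of harmonics that fit at or below the Nyquist frequency.
        n_harm = max(1, int((sr / 2) // f0))
        for _ in range(hop):
            # Fourier series of the ramp 2*phase - 1:
            #   -(2/pi) * sum_k sin(2*pi*k*phase) / k
            s = sum(math.sin(2 * math.pi * k * phase) / k
                    for k in range(1, n_harm + 1))
            out.append(-2.0 / math.pi * s)
            # Integrate instantaneous frequency; the phase carries over from
            # one frame to the next instead of being reset.
            phase = (phase + f0 / sr) % 1.0
    return out


def ltv_fir(x, coeffs_per_frame, hop):
    """Linear time-variant FIR filter.

    Output sample n uses the coefficients of its own frame (n // hop), while
    the past input samples it convolves with may come from earlier frames.
    """
    y = []
    for n in range(len(x)):
        h = coeffs_per_frame[n // hop]
        acc = 0.0
        for j, hj in enumerate(h):
            if n - j >= 0:
                acc += hj * x[n - j]
        y.append(acc)
    return y


if __name__ == "__main__":
    sr, hop = 16000, 80
    f0 = [220.0, 230.0, 240.0]  # hypothetical frame-level pitch contour
    source = bandlimited_saw(f0, hop, sr)
    # Placeholder coefficients standing in for the network's per-frame output.
    coeffs = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1], [0.7, 0.2, 0.1]]
    harmonic = ltv_fir(source, coeffs, hop)
    print(len(harmonic))  # 3 frames * 80 samples
```

The filter is applied in direct form in the time domain for readability; the DDSP library performs comparable filtering with FFT-based convolution, which scales much better to long impulse responses. Unvoiced frames (f0 = 0) and the non-harmonic part of the voice are not modeled in this sketch.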

Files

000008.pdf (1.4 MB, md5:9b34f205b8e23f4884638a55133182fa)


	All versions	This version
Views	198	197
Downloads	275	274
Data volume	425.0 MB	423.6 MB

Resource type: Conference paper

Publisher: ISMIR

Imprint: Proceedings of the 23rd International Society for Music Information Retrieval Conference, pp. 76-83. Bengaluru, India.

Conference: International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India, December 4-8, 2022

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 13, 2022
Modified: July 15, 2024
