There is a newer version of the record available.

Published September 21, 2025 | Version v1
Conference paper Open

Tuning Matters: Analyzing Musical Tuning Bias in Neural Vocoders

Description

Vocoders, which reconstruct time-domain waveforms from spectral representations such as mel-spectrograms, are essential in modern music and speech synthesis. Traditional signal-processing techniques like the Griffin-Lim algorithm have largely been replaced by neural vocoders, which leverage generative models to achieve superior audio quality. However, these models can introduce artifacts and biases that affect their output in unforeseen ways. In this study, we examine how different musical tunings affect neural mel-to-audio vocoders in the context of Western music, where performances do not necessarily adhere to the modern 440 Hz standard tuning. As a key contribution, we evaluate several recent neural vocoders on datasets containing piano, violin, and singing voice recordings. Our results reveal that different vocoders exhibit distinct biases, which cause deviations in tuning and affect waveform reconstruction quality in the case of non-standard tuning. Our work underscores the need for improved vocoder robustness in music synthesis and provides insights for refining future models.
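To make the notion of deviation from the 440 Hz standard concrete, the following minimal sketch expresses a tuning offset in cents (1200 cents per octave). It assumes 12-tone equal temperament, which the description above does not specify, and the function names `cents_between` and `deviation_from_grid` are hypothetical. It is an illustration of the unit only, not the evaluation procedure used in the paper.

```python
import math

A4_STANDARD_HZ = 440.0  # modern standard reference pitch for A4


def cents_between(f_hz, ref_hz=A4_STANDARD_HZ):
    """Signed interval from ref_hz to f_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def deviation_from_grid(f_hz, a4_hz=A4_STANDARD_HZ):
    """Signed distance in cents from f_hz to the nearest equal-tempered pitch.

    Assumes 12-tone equal temperament anchored at a4_hz (an assumption
    made for this sketch, not taken from the paper).
    """
    cents = cents_between(f_hz, a4_hz)
    return cents - 100.0 * round(cents / 100.0)


if __name__ == "__main__":
    # A recording tuned to A4 = 442 Hz sits slightly sharp of the 440 Hz grid.
    print(deviation_from_grid(442.0))
    # Measured against its own reference, the same pitch is exactly on the grid.
    print(deviation_from_grid(442.0, a4_hz=442.0))
```

Comparing such a deviation for an input recording and for its vocoded reconstruction would be one way to quantify a tuning shift, assuming a reliable fundamental-frequency estimate is available for both.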

Files

Files (1.8 MB)

000020.pdf (1.8 MB)
md5:55027a233c422584b4f1d7e10af31c37