Published September 21, 2025 | Version v1
Conference paper
Open
Tuning Matters: Analyzing Musical Tuning Bias in Neural Vocoders
Authors/Creators
Description
Vocoders, which reconstruct time-domain waveforms from spectral representations such as mel-spectrograms, are essential in modern music and speech synthesis. Traditional signal-processing techniques like the Griffin-Lim algorithm have largely been replaced by neural vocoders, which leverage generative models to achieve superior audio quality. However, these models can introduce artifacts and biases, potentially affecting their output in unforeseen ways.
In this study, we examine how different musical tunings affect neural mel-to-audio vocoders in the context of Western music, where performances do not necessarily adhere to the modern standard tuning of 440 Hz. As a key contribution, we evaluate several recent neural vocoders on datasets containing piano, violin, and singing voice recordings. Our results reveal that different vocoders exhibit distinct biases, causing deviations in tuning and affecting waveform reconstruction quality in the case of non-standard tunings.
Our work underscores the need for improved vocoder robustness in music synthesis and provides insights for refining future models.
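To make the notion of a deviation from the 440 Hz standard concrete, the following minimal sketch expresses a frequency's offset from the nearest equal-tempered pitch in cents (100 cents = 1 semitone). It is an illustration only, not the evaluation method used in the paper; the function names `cents_between` and `tuning_deviation_cents` are hypothetical, and twelve-tone equal temperament is assumed.

```python
import math

A4_STANDARD_HZ = 440.0  # modern standard reference pitch for A4


def cents_between(f_hz: float, ref_hz: float) -> float:
    """Signed interval from ref_hz to f_hz in cents (1200 cents = 1 octave)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def tuning_deviation_cents(f_hz: float, a4_hz: float = A4_STANDARD_HZ) -> float:
    """Offset of f_hz from the nearest equal-tempered pitch, in [-50, 50) cents.

    Assumes twelve-tone equal temperament anchored at a4_hz (hypothetical helper).
    """
    cents = cents_between(f_hz, a4_hz)
    # Wrap onto the nearest semitone of the equal-tempered grid.
    return (cents + 50.0) % 100.0 - 50.0


if __name__ == "__main__":
    # An A4 tuned to 432 Hz lies roughly 32 cents below the 440 Hz standard.
    print(round(tuning_deviation_cents(432.0), 2))
```

Under these assumptions, comparing this value for a vocoder's input and for its reconstructed output would indicate whether the reconstruction has drifted toward or away from the standard tuning.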
Files
000020.pdf (1.8 MB)
md5:55027a233c422584b4f1d7e10af31c37