Published October 11, 2020
| Version v1
Conference paper
Open
Investigating U-Nets with various intermediate blocks for spectrogram-based singing voice separation
Authors/Creators
Description
Singing Voice Separation (SVS) aims to separate the singing voice from a given mixed musical signal. Many U-Net-based models have recently been proposed for the SVS task, but no existing work evaluates and compares the various types of intermediate blocks that can be used in the U-Net architecture. In this paper, we introduce a variety of intermediate spectrogram transformation blocks. We implement U-Nets based on these blocks and train them on complex-valued spectrograms to consider both magnitude and phase. These networks are then compared on the SDR metric. A U-Net using a particular block composed of convolutional and fully-connected layers achieves state-of-the-art SDR on the MUSDB singing voice separation task by a large margin of 0.9 dB. Our code and models are available online.
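For readers unfamiliar with the metric, the following is a minimal sketch of a basic SDR computation (SDR is commonly expanded as signal-to-distortion or source-to-distortion ratio), defined here as the energy ratio between the reference signal and the estimation error, in dB. It is an illustration under simplifying assumptions, not the evaluation procedure used in the paper: MUSDB results are usually reported with a BSS Eval-style SDR, which differs from this simple formula. The function name `sdr`, the `eps` stabilizer, and the toy signals are hypothetical choices made for this example.

```python
import numpy as np


def sdr(reference, estimate, eps=1e-10):
    """Basic SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Simplified definition for illustration only; it is not the
    BSS Eval-style SDR usually reported on MUSDB.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps (an assumption of this sketch) avoids division by zero
    # and log(0) for silent or perfectly matched signals.
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))


# Toy data (hypothetical): a 440 Hz tone stands in for the true vocals,
# and the "separated" estimate is that tone plus a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
vocals = np.sin(2.0 * np.pi * 440.0 * t)
estimate = vocals + 0.1 * rng.standard_normal(t.shape)

print(f"SDR of noisy estimate: {sdr(vocals, estimate):.2f} dB")
```

Because the scale is logarithmic, the 0.9 dB gain reported above corresponds to a reduction of the error energy by a factor of roughly 10^0.09 ≈ 1.23 under this simplified definition.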
Files
| Name | Size |
|---|---|
| 46.pdf (md5:7a16d7947d2376d97268c0545ece243d) | 503.0 kB |