Published November 7, 2021 | Version v1
Conference paper
Open
VOCANO: A note transcription framework for singing voice in polyphonic music
Creators
Description
The high variability of the singing voice and the scarcity of note event annotations present a major bottleneck in singing voice transcription (SVT). In this paper, we present VOCANO, an open-source VOCAl NOte transcription framework built upon robust neural networks with multi-task and semi-supervised learning. Building on a state-of-the-art SVT method, we further consider virtual adversarial training (VAT), a semi-supervised learning (SSL) method, for SVT on both clean and accompanied singing voice data, the latter being pre-processed with a singing voice separation (SVS) technique. The proposed framework outperforms the state of the art on public benchmarks over a wide variety of evaluation metrics. The effects of the types of training models and the sizes of the unlabeled datasets on SVT performance are also discussed.
Files
Name | Size |
---|---|
000036.pdf (md5:3649cec177dc05cffb96335d5e165e7f) | 662.6 kB |