Published November 7, 2021 | Version v1
Conference paper Open

VOCANO: A note transcription framework for singing voice in polyphonic music

Description

The high variability of the singing voice and the scarcity of note event annotations present a major bottleneck in singing voice transcription (SVT). In this paper, we present VOCANO, an open-source VOCAl NOte transcription framework built upon robust neural networks with multi-task and semi-supervised learning. Building on a state-of-the-art SVT method, we further consider virtual adversarial training (VAT), a semi-supervised learning (SSL) method, for SVT on both clean and accompanied singing voice data, the latter being pre-processed with singing voice separation (SVS). The proposed framework outperforms the state of the art on public benchmarks across a wide variety of evaluation metrics. The effects of the types of training models and the sizes of the unlabeled datasets on SVT performance are also discussed.

Files

000036.pdf (662.6 kB)
md5:3649cec177dc05cffb96335d5e165e7f