Published September 23, 2018 | Version v1
Conference paper Open

Main Melody Estimation with Source-Filter NMF and CRNN

Description

Estimating the main melody of a polyphonic audio recording remains a challenging task. We approach the task from a classification perspective and adopt a convolutional recurrent neural network (CRNN) architecture that relies on a particular form of pretraining by source-filter nonnegative matrix factorisation (NMF). The source-filter NMF decomposition is chosen for its ability to capture the pitch and timbre content of the leading voice or instrument, providing a better initial pitch salience than standard time-frequency representations. Starting from this musically motivated representation, we propose to further enhance the NMF-based salience representations with CNN layers, then to model the temporal structure with an RNN, and finally to estimate the dominant melody with a classification layer. The results show that such a system achieves state-of-the-art performance on the MedleyDB dataset without any augmentation methods or large training sets.
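To illustrate what treating melody estimation as classification can look like at the output stage, here is a minimal sketch in plain Python. It is not the authors' implementation: the pitch grid (`F_MIN_HZ`, `BINS_PER_SEMITONE`), the use of class 0 as an "unvoiced" label, and the function names are all assumptions made for this example. It only shows how per-frame class scores from a final classification layer could be turned into a melody contour in Hz.

```python
import math

# Hypothetical pitch grid (not taken from the paper).
F_MIN_HZ = 55.0          # frequency of the lowest pitch class
BINS_PER_SEMITONE = 5    # resolution of the assumed pitch grid
UNVOICED = 0             # assumed convention: class 0 means "no melody"


def class_to_hz(k):
    """Map a class index to a frequency in Hz (0.0 for the unvoiced class)."""
    if k == UNVOICED:
        return 0.0
    # Class 1 is the lowest pitch bin, located at F_MIN_HZ.
    return F_MIN_HZ * 2.0 ** ((k - 1) / (12.0 * BINS_PER_SEMITONE))


def softmax(scores):
    """Numerically stable softmax over one frame of classifier scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_melody(frame_scores):
    """Pick the most probable class in each frame and convert it to Hz."""
    melody = []
    for scores in frame_scores:
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        melody.append(class_to_hz(best))
    return melody


# Toy scores for 3 frames and 62 classes (1 unvoiced class + 61 pitch bins).
n_classes = 62
frames = [[0.0] * n_classes for _ in range(3)]
frames[0][0] = 5.0    # frame dominated by the unvoiced class
frames[1][1] = 5.0    # lowest pitch bin
frames[2][61] = 5.0   # 60 bins above the lowest bin, i.e. one octave up
melody_hz = decode_melody(frames)
```

In this toy setting the three frames decode to an unvoiced frame, the lowest grid frequency, and the frequency one octave above it. A frame-wise argmax is the simplest possible decoder; it ignores the temporal modelling that the RNN stage described above is meant to provide.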

Files

273_Paper.pdf (385.0 kB)
md5:a22be8b6b73b37fb1065c7c69e2a2407