Published September 23, 2018 | Version v1
Conference paper
Open
Main Melody Estimation with Source-Filter NMF and CRNN
Creators
Description
Estimating the main melody of a polyphonic audio recording remains a challenging task. We approach the task from a classification perspective and adopt a convolutional recurrent neural network (CRNN) architecture that relies on a particular form of pretraining by source-filter nonnegative matrix factorisation (NMF). The source-filter NMF decomposition is chosen for its ability to capture the pitch and timbre content of the leading voice/instrument, providing a better initial pitch salience than standard time-frequency representations. Starting from such a musically motivated representation, we propose to further enhance the NMF-based salience representations with CNN layers, then to model the temporal structure with an RNN and to estimate the dominant melody with a final classification layer. The results show that such a system achieves state-of-the-art performance on the MedleyDB dataset without any augmentation methods or large training sets.
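To make the "classification perspective" concrete, the following is a minimal sketch of how per-frame class scores from a final classification layer could be decoded into a melody contour. It is not the authors' implementation: the lowest frequency `F_MIN_HZ`, the resolution `BINS_PER_OCTAVE`, the use of class 0 as an "unvoiced" class, and the function names are all assumptions chosen for illustration.

```python
import math

# Hypothetical settings for illustration; none of these come from the paper.
F_MIN_HZ = 55.0        # assumed frequency of the lowest pitch class
BINS_PER_OCTAVE = 12   # assumed resolution (one class per semitone)
UNVOICED = 0           # assumed index of the "no melody" class


def softmax(scores):
    """Turn raw per-class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_to_hz(index):
    """Map a pitch class index (>= 1) to its centre frequency in Hz."""
    return F_MIN_HZ * 2.0 ** ((index - 1) / BINS_PER_OCTAVE)


def decode_melody(frame_scores):
    """Pick the most probable class per frame; 0.0 Hz marks an unvoiced frame."""
    melody = []
    for scores in frame_scores:
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        melody.append(0.0 if best == UNVOICED else class_to_hz(best))
    return melody
```

Under these assumptions, each frame is assigned either a pitch class or the unvoiced class, so voicing detection and pitch estimation come from the same output layer. A real system would typically use a finer pitch resolution than one class per semitone.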
Files
| Name | Size |
|---|---|
| 273_Paper.pdf (md5:a22be8b6b73b37fb1065c7c69e2a2407) | 385.0 kB |