Attention-Based Predominant Instruments Recognition in Polyphonic Music
Creators
- 1. College of Engineering Trivandrum, APJ Abdul Kalam Technological University, Trivandrum, Kerala, India
Description
Predominant instrument recognition in polyphonic music is addressed using score-level fusion of two visual representations: the Mel-spectrogram and the modgdgram. The modgdgram is a visual representation obtained by stacking the modified group delay functions of consecutive frames. Convolutional neural networks (CNNs) with an attention mechanism learn the distinctive local characteristics and classify the instrument into the group to which it belongs. The proposed system is systematically evaluated on the IRMAS dataset, which has eleven classes. We train the network on fixed-length, single-labeled audio excerpts and estimate the predominant instruments from variable-length audio recordings. A wave generative adversarial network (WaveGAN) architecture is also employed to generate audio files for data augmentation. The proposed system reports micro and macro F1 scores of 0.65 and 0.60, respectively, which are 20.37% and 27.66% higher than those obtained by the state-of-the-art Han model. The experiments demonstrate the potential of CNNs with an attention mechanism in a Mel-spectrogram/modgdgram fusion framework for predominant instrument recognition.
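The score-level fusion and variable-length inference described above could be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function names `fuse_scores` and `predict_instruments`, the weighted-average fusion rule, the mean aggregation over segments, the max-normalization, and the threshold value are all assumptions, since the description does not specify them.

```python
import numpy as np


def fuse_scores(mel_scores, modgd_scores, weight=0.5):
    """Fuse per-segment class scores from two models (hypothetical helper).

    Both inputs have shape (n_segments, n_classes). The weighted average
    used here is an assumed fusion rule, not one confirmed by the source.
    """
    mel = np.asarray(mel_scores, dtype=float)
    modgd = np.asarray(modgd_scores, dtype=float)
    if mel.shape != modgd.shape:
        raise ValueError("score arrays must have the same shape")
    return weight * mel + (1.0 - weight) * modgd


def predict_instruments(segment_scores, threshold=0.5):
    """Estimate predominant instruments for one variable-length recording.

    Assumed procedure: average the fused scores over all fixed-length
    segments, normalize by the largest class score, and keep every class
    whose normalized score reaches the threshold.
    """
    scores = np.asarray(segment_scores, dtype=float)
    clip_scores = scores.mean(axis=0)
    peak = clip_scores.max()
    if peak > 0:
        clip_scores = clip_scores / peak
    return np.flatnonzero(clip_scores >= threshold)


# Toy example: 2 segments, 3 classes (IRMAS itself has eleven classes).
mel = [[0.8, 0.2, 0.0], [0.6, 0.2, 0.2]]
modgd = [[0.4, 0.6, 0.0], [0.6, 0.0, 0.4]]
fused = fuse_scores(mel, modgd)
labels = predict_instruments(fused, threshold=0.5)
```

Because the final step is a threshold rather than an argmax, a recording can receive several instrument labels, which matches the setting of training on single-labeled excerpts and testing on recordings with more than one predominant instrument.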
Files
SMC_2021_paper_56.pdf
1.1 MB, md5:ad39a80cfcb71577566983b8e54296f1