Published June 26, 2023 | Version v1
Journal article Open

Multimodal Deep Learning Architecture for Hindustani Raga Classification

  • 1. Hellenic Mediterranean University
  • 2. Ionian University

Description

This paper presents the design of a deep learning architecture for classifying Hindustani (North Indian classical music) ragas (musical modes). To address this task, we propose a modular deep learning architecture that processes data from two modalities: audio recordings and metadata. Our bimodal classifier uses convolutional and feed-forward neural networks and incorporates spectral information from the audio data together with metadata descriptors tailored to the distinctive melodic characteristics of Hindustani music. Specifically, audio recordings, together with manually annotated and automatically extracted metadata, were used for audio samples of both Hindustani improvisations and compositions available in the Saraga open dataset of Indian art music. Experiments are conducted on two Hindustani ragas, Yaman and Bhairavi. Results indicate that integrating multimodal data increases classification accuracy compared with using audio features alone. Additionally, for the specific task of raga classification, the swaragram feature, which is customized for Hindustani music, outperforms audio features commonly used for Eurocentric music genres.
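
The two-branch idea described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify layer sizes, the fusion strategy, or the input shapes, so the concatenation-based fusion, the 3x3 kernels, the pooling step, the untrained random weights, and all function names below are assumptions made purely for illustration.

```python
import math
import random

RAGAS = ["Yaman", "Bhairavi"]  # the two classes named in the abstract


def relu(v):
    return [max(0.0, x) for x in v]


def dense(x, weights, bias):
    """Fully connected layer: weights has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation, no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def audio_branch(spectrogram, kernels):
    """Convolve, apply ReLU, then global-average-pool each feature map to one number."""
    pooled = []
    for kernel in kernels:
        fmap = conv2d_valid(spectrogram, kernel)
        flat = relu([v for row in fmap for v in row])
        pooled.append(sum(flat) / len(flat))
    return pooled


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(rng, n_kernels=4, n_meta=5, n_hidden=3):
    """Random, untrained weights; all sizes are hypothetical."""
    return {
        "kernels": [rand_matrix(rng, 3, 3) for _ in range(n_kernels)],
        "meta_w": rand_matrix(rng, n_hidden, n_meta),
        "meta_b": [0.0] * n_hidden,
        "out_w": rand_matrix(rng, len(RAGAS), n_kernels + n_hidden),
        "out_b": [0.0] * len(RAGAS),
    }


def classify(spectrogram, metadata, params):
    audio_vec = audio_branch(spectrogram, params["kernels"])  # convolutional branch
    meta_vec = relu(dense(metadata, params["meta_w"], params["meta_b"]))  # feed-forward branch
    fused = audio_vec + meta_vec  # list concatenation (assumed fusion strategy)
    return softmax(dense(fused, params["out_w"], params["out_b"]))


rng = random.Random(0)
params = init_params(rng)
spectrogram = rand_matrix(rng, 8, 10)  # stand-in for 8 frequency bins x 10 frames
metadata = [rng.random() for _ in range(5)]  # stand-in for 5 metadata descriptors
probs = classify(spectrogram, metadata, params)
print(dict(zip(RAGAS, probs)))
```

Because the weights are random, the printed probabilities carry no musical meaning; the sketch only shows how a convolutional audio branch and a feed-forward metadata branch might be joined before a shared output layer.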

Files

P_3301.pdf (592.1 kB)
md5:bb9269390b8ae67ae2884a73df311aed

Additional details

Identifiers

ISSN
2306-8515

Dates

Available
2023-06-26
Journal publication