Dataset Open Access

Saraga: research datasets of Indian Art Music

Bozkurt, B.; Srinivasamurthy, A.; Gulati, S.; Serra, X.

Dataset introduction

This repository contains time-aligned melody, rhythm, and structural annotations for two large open corpora of Indian Art Music (Carnatic and Hindustani music).

The Carnatic and Hindustani collections are provided as separate zip files, and each collection is organized into songs grouped by artist concert or live performance. This organization follows the structure generated when downloading the data with the scripts available in the dataset's GitHub repository.

In addition, 168 tracks of the Carnatic collection include multitrack audio files alongside the mix audio. The instruments covered are Ghatam, Mridangam, Violin, Voice and Secondary Voice.


Annotations in the dataset

The dataset includes the following annotations:
Section and tempo annotations: start and end timestamps together with the name of the section and the tempo during the section (stored in a separate file).
Sama annotations: timestamps marking rhythmic cycle boundaries.
Phrase annotations: timestamps together with a transcription of each phrase in solfège symbols ({S, r, R, g, G, m, M, P, d, D, n, N}).
Automatically extracted audio features: pitch and tonic.
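As a rough illustration of how the sama, pitch and tonic annotations might be combined, the sketch below computes rhythmic cycle durations from sama timestamps and maps a pitch value to the nearest solfège symbol relative to the tonic. It rests on several assumptions that the description above does not state: that a sama file holds one timestamp in seconds per line, that pitch and tonic are given in Hz, and that the twelve symbols correspond to consecutive equal-tempered semitones above the tonic. All function names are hypothetical and are not part of the dataset's tooling.

```python
import math

# Solfège symbols from the dataset description, indexed by semitone offset
# from the tonic. ASSUMPTION: one symbol per equal-tempered semitone.
SVARAS = ["S", "r", "R", "g", "G", "m", "M", "P", "d", "D", "n", "N"]


def parse_timestamps(text):
    """Parse timestamps in seconds, one per non-empty line.

    ASSUMPTION: the first whitespace-separated field of each line is the time.
    """
    return [float(line.split()[0]) for line in text.splitlines() if line.strip()]


def cycle_durations(sama_times):
    """Return the duration of each rhythmic cycle between consecutive samas."""
    return [end - start for start, end in zip(sama_times, sama_times[1:])]


def hz_to_cents(freq_hz, tonic_hz):
    """Return pitch in cents relative to the tonic, or None if unvoiced (<= 0 Hz)."""
    if freq_hz <= 0:
        return None
    return 1200.0 * math.log2(freq_hz / tonic_hz)


def nearest_svara(freq_hz, tonic_hz):
    """Return the symbol whose semitone is closest to the pitch, ignoring octave."""
    cents = hz_to_cents(freq_hz, tonic_hz)
    if cents is None:
        return None
    return SVARAS[round(cents / 100.0) % 12]


if __name__ == "__main__":
    samas = parse_timestamps("1.5\n4.0\n6.5\n")  # made-up example values
    print(cycle_durations(samas))
    print(nearest_svara(330.0, 220.0))
```

Rounding to the nearest semitone is a deliberate simplification: intonation in Indian Art Music does not follow equal temperament, so this mapping is only a coarse label and not a transcription method.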

For more information about the dataset tracks and annotations, please refer to the Saraga website.


Using this dataset

We are interested in knowing if you find our datasets useful! If you use our dataset, please email us and tell us about your research.

Please note that you can also use this dataset through the mirdata library, where it is included in the list of available datasets.

Files (18.5 GB)

Two files: 14.4 GB and 4.1 GB.


Cite as