Published October 7, 2021 | Version 1.0
Dataset | Open access


  • 1. Music Technology Group
  • 2. Max Planck Institute for Empirical Aesthetics


This repository contains the Saraga-Carnatic-Melody-Synth (SCMS) dataset, which provides time-aligned, continuous vocal melody annotations for the Carnatic music tradition. The annotations were generated with an in-house Analysis/Synthesis framework whose input is the multi-track audio of the Saraga Dataset.

This dataset may be used for research on automatic melody extraction in the Carnatic music tradition.


Dataset contents

The dataset includes 30-second audio excerpts of the mixtures, together with vocal melody annotations (.csv) and activations (.lab). It also includes metadata files that define the train and test splits and relate each excerpt to its artists, as well as relevant metadata about each concert: the number of excerpts, the gender of the artists, and the tonic. Artists and genders are evenly distributed across the train and test splits, and no artist appears in both sets.
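The following Python sketch shows one way to read an excerpt's annotations, express the pitch track in cents relative to the concert tonic, and check that a train/test split is artist-disjoint. It is a minimal sketch under assumptions not confirmed on this page: each .csv row is assumed to hold a time in seconds and a frequency in Hz (0 for unvoiced frames), each .lab line is assumed to hold a start time, an end time, and an optional label separated by whitespace, and the tonic is assumed to be given in Hz. All function names are hypothetical; check the actual file layout before relying on them.

```python
import csv
import io
import math
from typing import Dict, List, Optional, Set, Tuple


def parse_melody_csv(text: str) -> List[Tuple[float, float]]:
    """Parse (time_s, freq_hz) rows. ASSUMED layout: two numeric columns."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) < 2:
            continue  # skip blank or malformed lines
        try:
            rows.append((float(record[0]), float(record[1])))
        except ValueError:
            continue  # skip a non-numeric row such as a header
    return rows


def parse_lab(text: str) -> List[Tuple[float, float, str]]:
    """Parse (start_s, end_s, label) segments. ASSUMED whitespace-separated layout."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines
        label = parts[2] if len(parts) > 2 else ""
        segments.append((float(parts[0]), float(parts[1]), label))
    return segments


def hz_to_cents(freq_hz: float, tonic_hz: float) -> Optional[float]:
    """Pitch in cents above the tonic; None for unvoiced frames (freq <= 0)."""
    if freq_hz <= 0:
        return None
    return 1200.0 * math.log2(freq_hz / tonic_hz)


def artists_disjoint(split: Dict[str, str], artists: Dict[str, Set[str]]) -> bool:
    """split: excerpt_id -> subset name; artists: excerpt_id -> artist names.

    Returns True if no artist appears in both the 'train' and 'test' subsets.
    """
    seen: Dict[str, Set[str]] = {}
    for excerpt_id, subset in split.items():
        seen.setdefault(subset, set()).update(artists.get(excerpt_id, set()))
    return not (seen.get("train", set()) & seen.get("test", set()))
```

For example, `[hz_to_cents(f, 220.0) for _, f in parse_melody_csv("0.0,0\n0.01,220.0\n0.02,440.0\n")]` gives `[None, 0.0, 1200.0]`. Normalizing to the tonic in this way makes pitch tracks from concerts with different tonics directly comparable.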



This dataset is associated with the following paper:

Plaja-Roglans, G., Nuttall, T., Pearson, L., Serra, X. and Miron, M., 2023. Repertoire-Specific Vocal Pitch Data Generation for Improved Melodic Analysis of Carnatic Music. Transactions of the International Society for Music Information Retrieval, 6(1), pp. 13–26. DOI:

Check the paper out for further details, benchmarks, and experiments!


Files (24.0 GB)

