There is a newer version of the record available.

Published November 10, 2024 | Version v1
Dataset Open

Saraga Audiovisual: a large multimodal open data collection for the analysis of carnatic music

  • 1. Pompeu Fabra University

Description

Carnatic music is a style of South Indian art music whose analysis using computational methods is an active area of research in Music Information Research (MIR). A core, open dataset for such analysis is the Saraga dataset, which includes multi-stem audio, expert annotations, and accompanying metadata. However, it has been noted that there are several limitations to the Saraga collections, and that additional relevant aspects of the tradition still need to be covered to facilitate musicologically important research lines. Saraga Audiovisual includes diverse renditions of Carnatic vocal performances, totalling 42 concerts and more than 60 hours of music. It includes video recordings for all concerts, allowing for a wide range of multimodal analyses. We also provide high-quality human pose estimation data of the musicians extracted from the video footage, and perform benchmarking experiments for the different modalities to validate the utility of the novel collection. Saraga Audiovisual, along with access tools and results of our experiments, is made available for research purposes.

Please cite the following publication if you use the material shared here in your research work.

Shankar A, Plaja-Roglans G, Nuttall T, Rocamora M, Serra X. Saraga Audiovisual: a large multimodal open data collection for the analysis of carnatic music. Paper presented at: 25th International Society for Music Information Retrieval Conference (ISMIR2024); 2024 November 10-14; San Francisco, USA. [Postprint PDF@MTG]

The dataset is organized into four separate files while maintaining the original directory structure:

  • saraga audio.zip – Contains all multi-track audio files along with their corresponding mixture files.

  • saraga gesture.zip – Includes pose estimation files extracted from videos corresponding to each audio track.

  • saraga metadata.zip – Provides metadata for all the audio files.

  • saraga video.zip – Features videos from three sample concerts. Due to size constraints, only these three concerts are included in this release. 
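If the archives are downloaded by hand rather than through a loader, they can be unpacked into one common folder. The following is a minimal sketch that assumes the four archives sit in a single local directory and that their internal paths are meant to be merged under one root; the function name `extract_archives` and both directory arguments are hypothetical and not part of the dataset tooling.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_dir, target_dir):
    """Extract every .zip file found in archive_dir into target_dir.

    Returns the names of the archives that were extracted, in sorted order.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # extractall keeps the relative paths stored inside the archive,
            # so files from different archives end up in the same tree.
            zf.extractall(target)
        extracted.append(archive.name)
    return extracted
```

Calling `extract_archives("downloads", "saraga_audiovisual")` would unpack every archive found in a local `downloads` folder; both folder names are placeholders.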

Mirdata

This dataset is included in mirdata. Use the following code snippet to access the dataset in mirdata.

# Import mirdata
import mirdata

# Initialize dataset
dataset_name = 'saraga_audiovisual'
data_home = 'mirdata/dataset'
dataset = mirdata.initialize(dataset_name, data_home=data_home)

# Download dataset
dataset.download()

# Validate dataset
dataset.validate()

# Load dataset as a dictionary with track ids as keys and track objects as values
data = dataset.load_tracks()
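
Loading every track at once may be more than is needed for a first look at the collection. As a sketch, the helper below builds a small dictionary from only the first few track ids. It relies on the `track_ids` attribute and the `track()` method that mirdata dataset objects expose; the helper name `preview_tracks` and its `limit` parameter are hypothetical, and the attributes available on each track object depend on the loader, so they are not shown here.

```python
def preview_tracks(dataset, limit=3):
    """Return the first `limit` tracks as a {track_id: track} dictionary.

    `dataset` is expected to provide `track_ids` (an iterable of ids)
    and `track(track_id)` (returning one track object).
    """
    track_ids = list(dataset.track_ids)[:limit]
    return {track_id: dataset.track(track_id) for track_id in track_ids}
```

With the `dataset` object initialized as in the snippet above, `preview_tracks(dataset)` would return a dictionary with at most three entries.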

Contact

If you have any questions or comments about the dataset, please feel free to email:

mtg-info@upf.edu

Files

Files (101.0 GB)

Checksum Size
md5:bf0a36f7fb59098ec0d4d303b407ea73 65.7 GB
md5:f58fe9e0c80760a0b31cb08b61624487 20.4 GB
md5:4f3a8e919593aa2b71f7a0b81cc8cc00 680.1 kB
md5:e411c0917e522c6ba527a9cfc8ee7e11 14.8 GB
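
The MD5 checksums listed above can be used to check that a manually downloaded archive arrived intact. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `matches_checksum` are hypothetical, and the file is read in chunks because the archives are too large to load into memory at once.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value.

    `expected` may be given with or without the "md5:" prefix.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

To use it, pass the local path of an archive together with the `md5:` value shown for that file on the record page; a result of `False` suggests an incomplete or corrupted download.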

Additional details

Related works

Is original form of
Conference proceeding: 10.5281/zenodo.14877274 (DOI)

Dates

Accepted
2024-11-10
