Saraga Audiovisual: a large multimodal open data collection for the analysis of carnatic music
Description
Carnatic music is a style of South Indian art music whose analysis using computational methods is an active area of research in Music Information Research (MIR). A core, open dataset for such analysis is the Saraga dataset, which includes multi-stem audio, expert annotations, and accompanying metadata. However, it has been noted that there are several limitations to the Saraga collections, and that additional relevant aspects of the tradition still need to be covered to facilitate musicologically important research lines. Saraga Audiovisual includes diverse renditions of Carnatic vocal performances, totalling 42 concerts and more than 60 hours of music. It includes video recordings for all concerts, allowing for a wide range of multimodal analyses. We also provide high-quality human pose estimation data of the musicians extracted from the video footage, and perform benchmarking experiments for the different modalities to validate the utility of the novel collection. Saraga Audiovisual, along with access tools and results of our experiments, is made available for research purposes.
Please cite the following publication if you use the material shared here in your research work.
Shankar A, Plaja-Roglans G, Nuttall T, Rocamora M, Serra X. Saraga Audiovisual: a large multimodal open data collection for the analysis of carnatic music. Paper presented at: 25th International Society for Music Information Retrieval Conference (ISMIR2024); 2024 November 10-14; San Francisco, USA. [Postprint PDF@MTG]
The dataset is organized into four separate files while maintaining the original directory structure:
- saraga audio.zip: Contains all multi-track audio files along with their corresponding mixture files.
- saraga gesture.zip: Includes pose estimation files extracted from videos corresponding to each audio track.
- saraga metadata.zip: Provides metadata for all the audio files.
- saraga video.zip: Features videos from three sample concerts. Due to size constraints, only these three concerts are included in this release.
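Because the archives keep the original directory structure, extracting them into one root directory should reassemble the collection. The following is a minimal sketch using only the Python standard library; the function name `extract_archives` and the target directory are illustrative assumptions, not part of the dataset's own tooling. Whether the archives share a common top-level folder is not stated here, so it is worth inspecting the result after extraction.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_paths, target_dir):
    """Extract each zip archive into target_dir, keeping internal paths.

    Returns a dict mapping archive file name -> number of files extracted.
    Note: `extract_archives` is a hypothetical helper, not a dataset tool.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    counts = {}
    for archive in map(Path, archive_paths):
        with zipfile.ZipFile(archive) as zf:
            # Count regular files only; directory entries are skipped.
            members = [m for m in zf.infolist() if not m.is_dir()]
            # extractall preserves the paths stored inside the archive.
            zf.extractall(target)
        counts[archive.name] = len(members)
    return counts
```

For example, `extract_archives(["saraga audio.zip", "saraga metadata.zip"], "saraga_audiovisual")` would unpack both archives under a single `saraga_audiovisual` folder, assuming the zip files are in the current working directory.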
Mirdata
This dataset is included in mirdata. Use the following code snippet to access the dataset in mirdata.
# Import mirdata
import mirdata
# Initialize dataset
dataset_name = 'saraga_audiovisual'
data_home = 'mirdata/dataset'
dataset = mirdata.initialize(dataset_name, data_home=data_home)
# Download dataset
dataset.download()
# Validate dataset
dataset.validate()
# Load dataset as a dictionary with track ids as keys and track objects as values
data = dataset.load_tracks()
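Once the dictionary of tracks is loaded, a quick overview of the track ids can help confirm that the download looks complete. The helper below is a minimal sketch that works on any mapping from ids to objects; `summarize_tracks` is a hypothetical name, not a mirdata function, and it makes no assumption about the attributes of the track objects.

```python
def summarize_tracks(tracks, n_preview=3):
    """Return the number of tracks and the first few ids in sorted order.

    `tracks` is any mapping of track id -> track object, such as the
    dictionary returned by dataset.load_tracks().
    """
    track_ids = sorted(tracks)
    return {"n_tracks": len(track_ids), "preview": track_ids[:n_preview]}
```

Calling `summarize_tracks(data)` on the dictionary loaded above returns the total track count together with a short preview of ids.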
Contact
If you have any questions or comments about the dataset, please feel free to email:
Additional details
Related works
- Is original form of
- Conference proceeding: 10.5281/zenodo.14877274 (DOI)
Dates
- Accepted: 2024-11-10