Software Open Access

MTG/otmm_makam_recognition_dataset: Ottoman-Turkish Makam Music Makam Recognition Dataset

Sertan Şentürk; Altuğ Karakurt; Hasan Sercan Atlı

OTMM Makam Recognition Dataset

This repository hosts a dataset designed to test makam recognition methodologies on Ottoman-Turkish makam music (OTMM). It is composed of 50 recordings from each of the 20 most common makams in the CompMusic Project's Dunya Ottoman-Turkish Makam Music collection, for a total of 1000 recordings. Currently, it is the largest makam recognition dataset.

Please cite the publication below if you use this dataset in your work:

Karakurt, A., Şentürk, S., & Serra, X. (2016). MORTY: A Toolbox for Mode Recognition and Tonic Identification. 3rd International Digital Libraries for Musicology Workshop. New York, USA.

The recordings are carefully selected from commercial releases so that they cover diverse musical forms, vocal/instrumentation settings, and recording qualities (e.g. historical vs. contemporary). Each recording in the dataset is identified by a 36-character unique identifier called an MBID, hosted in MusicBrainz. The makam and the tonic of each recording are annotated in the file annotations.json.

The audio-related data in the test dataset is organized by makam in the folder data. For copyright reasons, we are unable to distribute the audio. Instead, we provide the predominant melody of each recording, computed by a state-of-the-art predominant melody extraction algorithm optimized for OTMM. These features are saved as single-column text files (at the paths data/[makam]/[mbid].pitch) that contain the frequency values. The timestamps are removed to reduce the file sizes. The step size of the pitch track is approximately 0.0029 seconds (a hop size of 128 samples on an mp3 with a 44100 Hz sample rate), from which the timestamps of the samples can be recomputed.
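As a rough illustration of recomputing the timestamps, the sketch below multiplies each sample index by the step size of 128 / 44100 seconds. It assumes that the first pitch sample corresponds to time 0 and that each line of a .pitch file holds one numeric value; neither assumption is confirmed here, and the function names are hypothetical.

```python
HOP_SIZE = 128        # samples between consecutive pitch values (from the text above)
SAMPLE_RATE = 44100   # Hz, sample rate of the analyzed mp3 (from the text above)


def parse_pitch_lines(lines):
    """Parse the lines of a single-column .pitch file into a list of floats.

    Assumption: one numeric frequency value per line; blank lines are skipped.
    """
    return [float(line) for line in lines if line.strip()]


def add_timestamps(pitch_values, hop_size=HOP_SIZE, sample_rate=SAMPLE_RATE):
    """Pair each pitch value with a recomputed timestamp in seconds.

    Assumption: the first sample is at time 0, so sample i is at
    i * hop_size / sample_rate.
    """
    step = hop_size / sample_rate  # about 0.0029 seconds
    return [(i * step, value) for i, value in enumerate(pitch_values)]


if __name__ == "__main__":
    # Made-up values standing in for the contents of a .pitch file.
    example_lines = ["220.0\n", "221.5\n", "0.0\n"]
    for time, freq in add_timestamps(parse_pitch_lines(example_lines)):
        print(f"{time:.4f}\t{freq}")
```

To use this on a real file, pass the open file object to parse_pitch_lines, for example `with open(path) as f: track = add_timestamps(parse_pitch_lines(f))`.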

Moreover, the metadata of each recording is available in the repository, crawled from MusicBrainz using an open source tool developed by us. The metadata files are saved as data/[makam]/[mbid].json.
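The path conventions described above (data/[makam]/[mbid].pitch and data/[makam]/[mbid].json) can be sketched as a small helper. The function name, the dictionary keys, the repository root and the example makam folder name and MBID are hypothetical placeholders chosen for illustration, not actual entries of the dataset.

```python
from pathlib import Path


def recording_paths(dataset_root, makam, mbid):
    """Return the expected pitch and metadata paths of one recording.

    Follows the layout data/[makam]/[mbid].pitch and data/[makam]/[mbid].json.
    The function does not check whether the files exist.
    """
    base = Path(dataset_root) / "data" / makam
    return {
        "pitch": base / f"{mbid}.pitch",
        "metadata": base / f"{mbid}.json",
    }


if __name__ == "__main__":
    # Placeholder values: replace with a real makam folder name and MBID.
    paths = recording_paths(
        "otmm_makam_recognition_dataset",
        "Hicaz",
        "00000000-0000-0000-0000-000000000000",
    )
    print(paths["pitch"])
    print(paths["metadata"])
```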

For reproducibility purposes, we note the version of all tools we have used to generate this dataset in the file algorithms.json.

A complementary toolbox for this dataset is MORTY, which is a mode recognition and tonic identification toolbox. It can be used and optimized for any modal music culture. Further details are explained in the publication above.

For more information, please contact the authors.

Errata
  • April 2020: We replaced 2 recordings, which do not exist in CompMusic Dunya makam corpus, with their instrumental versions. We also patched the dunya_uid of a recording, which is a redirection to the MusicBrainz ID. None of the annotations has changed. (PR #1)
  • November 2016: We discovered several discrepancies in the tonic annotations while merging human and machine annotations to create the otmm_tonic_dataset. Please refer to that repository for further explanation. We advise using the tonic annotations in otmm_tonic_dataset.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-nc-sa/4.0/).

Files (101.0 MB)

MTG/otmm_makam_recognition_dataset-dlfm2016-fix1.zip (101.0 MB)
md5:83724c889d36f684cff3f15f20ce0d34