Published November 7, 2021 | Version v1
Dataset
Open
Multi-modal dataset for music genre recognition based on six different modalities for LMD-aligned and SLAC datasets
Authors/Creators
- 1. TU Dortmund University, Department of Computer Science, Germany
- 2. Marianopolis College, Department of Liberal and Creative Arts, Canada
Description
Multi-modal dataset for music genre recognition based on six different modalities for the LMD-aligned [1] and SLAC [2] datasets. Further details are provided in [3].
Descriptions of files
| Link | Description |
|---|---|
| LMD-aligned_Filelist.arff | File list of 1575 music tracks selected from the LMD-aligned dataset [1] with tagtraum genre annotations [4]. Only a subset of LMD-aligned is used: pieces for which all six modalities were accessible and which belong to well-represented genres |
| LMD-aligned_ExtractedFeatures.tar.gz | Raw audio signal and model-based features extracted with AMUSE [5] |
| LMD-aligned_ProcessedFeatures.tar.gz | Processed features: audio signal and model-based features are aggregated over 4 s time frames with a 2 s step size; all other features (see the table below) have the same values for all time frames |
| LMD-aligned_Datasets.tar.gz | Training, optimization, and test datasets for 3 splits for the recognition of 5 genres in [3] |
| SLAC_Filelist.arff | File list with 250 music tracks from the SLAC dataset [2] (genres and sub-genres are provided in the folder structure) |
| SLAC_ExtractedFeatures.tar.gz | Raw audio signal and model-based features extracted with AMUSE [5] |
| SLAC_ProcessedFeatures.tar.gz | Processed features: audio signal and model-based features are aggregated over 4 s time frames with a 2 s step size; all other features (see the table below) have the same values for all time frames |
| SLAC_Datasets.tar.gz | Training, optimization, and test datasets for 3 splits for the recognition of 5 genres and 10 sub-genres in [3] |
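The frame-wise aggregation described for the processed features can be illustrated with a minimal sketch. It assumes mean aggregation and a toy feature series with one value per second; the actual aggregation statistics and raw frame rates used by AMUSE are not stated here, and the function name `aggregate_windows` is hypothetical.

```python
import numpy as np

def aggregate_windows(values, times, duration, window_s=4.0, step_s=2.0):
    """Mean-aggregate short-frame features over overlapping time windows.

    values:   array of shape (n_frames, n_dims), one row per raw frame
    times:    start time in seconds of each raw frame
    duration: total length of the track in seconds
    Mean aggregation is an assumption made for this sketch.
    """
    values = np.asarray(values, dtype=float)
    times = np.asarray(times, dtype=float)
    rows = []
    start = 0.0
    # Keep only windows that fit entirely inside the track.
    while start + window_s <= duration:
        mask = (times >= start) & (times < start + window_s)
        rows.append(values[mask].mean(axis=0))
        start += step_s
    return np.array(rows)

# Toy series: one 1-dimensional value per second for 10 seconds (0, 1, ..., 9).
times = np.arange(10.0)
values = times.reshape(-1, 1)
agg = aggregate_windows(values, times, duration=10.0)

# Track-level features (e.g. album cover or lyrics descriptors) are constant
# over time, so the same vector is repeated for every 4 s window.
track_level = np.tile(np.array([0.2, 0.7]), (len(agg), 1))
combined = np.hstack([agg, track_level])
```

With a 10 s toy track this yields four windows starting at 0, 2, 4, and 6 s, each combined with the same two track-level values.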
Modalities and feature sub-groups
| Modality | Sub-group | Dimensions in processed features of LMD-aligned | Dimensions in processed features of SLAC |
|---|---|---|---|
| Audio signal | Low-level | 1-524 | 1-524 |
| Audio signal | Semantic | 525-810 | 525-810 |
| Audio signal | Structural complexity | 811-908 | 811-908 |
| Model-based | Instruments | 909-1018 | 909-1018 |
| Model-based | Moods | 1019-1146 | 1019-1146 |
| Model-based | Various | 1147-1402 | 1147-1402 |
| Playlists | Genres | 1403-1973 | 1403-1973 |
| Playlists | Styles | 1974-1695 | 1974-1695 |
| Symbolic | Pitch | 1696-1757 | 1696-1757 |
| Symbolic | Melodic | 1758-1781 | 1758-1781 |
| Symbolic | Chords | 1782-1836 | 1782-1836 |
| Symbolic | Rhythm | 1837-1935 | 1837-1935 |
| Symbolic | Tempo | 1936-1963 | 1936-1963 |
| Symbolic | Instrument presence | 1964-2441 | 1964-2441 |
| Symbolic | Instruments | 2442-2456 | 2442-2456 |
| Symbolic | Texture | 2457-2480 | 2457-2480 |
| Symbolic | Dynamics | 2481-2484 | 2481-2484 |
| Album covers | SIFT | 2485-2584 | 2485-2584 |
| Lyrics | jLyrics descriptors | 2585-2603 | 2585-2671 |
| Lyrics | Bag-of-Words | 2604-2703 | |
| Lyrics | Doc2Vec | 2704-2803 | |
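The dimension ranges in the table are 1-based and inclusive, so they need shifting before they can be used as array slices. The following sketch selects the columns of one modality from a feature matrix. It assumes the processed features have already been loaded into a 2-D NumPy array whose columns follow the table order with no extra ID or label columns, which should be checked against the actual files. Only the outer bounds of each modality for LMD-aligned are used, and the names `LMD_MODALITY_RANGES` and `select_modality` are hypothetical.

```python
import numpy as np

# 1-based, inclusive column ranges for LMD-aligned, merged per modality
# from the first and last sub-group rows of each modality in the table.
LMD_MODALITY_RANGES = {
    "audio_signal": (1, 908),
    "model_based": (909, 1402),
    "playlists": (1403, 1695),
    "symbolic": (1696, 2484),
    "album_covers": (2485, 2584),
    "lyrics": (2585, 2803),
}

def select_modality(features, modality, ranges=LMD_MODALITY_RANGES):
    """Return the columns of one modality from a (n_frames, n_dims) matrix."""
    first, last = ranges[modality]
    # Convert the 1-based inclusive range to a 0-based half-open slice.
    return features[:, first - 1:last]

# Placeholder matrix standing in for loaded features (3 time frames).
demo = np.zeros((3, 2803))
symbolic = select_modality(demo, "symbolic")
lyrics = select_modality(demo, "lyrics")
```

For SLAC the lyrics range in the table differs (2585-2671), so a separate range dictionary would be needed for that dataset.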
Files
(11.6 GB)
| Checksum | Size |
|---|---|
| md5:b12218bee9b89f59b3a5a618af15a2de | 339.0 kB |
| md5:64605dfb088f07e84fcdb0a58471f64c | 5.5 GB |
| md5:9302ccd86292c893c187df89c637484a | 162.3 kB |
| md5:8757b54912eb795ead87bc90a468cc41 | 148.3 MB |
| md5:faa8cbf325129f52bdf31904938b12ff | 85.4 kB |
| md5:b7991f385d2c4284b22f9c707f01ff7c | 5.7 GB |
| md5:4c06a526e11bd93bddfe436b49e3e7ef | 15.8 kB |
| md5:e3d716aa197904dec27f179bd8130693 | 247.8 MB |
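After downloading, an archive can be checked against its listed `md5:` value. The sketch below uses Python's standard `hashlib` module and reads the file in chunks so that multi-gigabyte archives do not have to fit in memory. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the local file path is whatever name the download was saved under.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:b122...'."""
    # Drop the optional 'md5:' prefix used in the file listing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(local_path, "md5:b12218bee9b89f59b3a5a618af15a2de")` returns `True` only if the downloaded file is byte-identical to the published one.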
Additional details
References
- [1] Raffel, C. (2016). Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching. PhD thesis, Graduate School of Arts and Sciences, Columbia University
- [2] McKay, C., Burgoyne, J. A., Hockman, J., Smith, J. B. L., Vigliensoni, G., and Fujinaga, I. (2010). Evaluating the genre classification performance of lyrical features relative to audio, symbolic and cultural features. In Proc. of the 11th International Society for Music Information Retrieval Conference, ISMIR, pp. 213–218
- [3] Vatolkin, I. and McKay, C. (2022). Multi-Objective Investigation of Six Feature Source Types for Multi-Modal Music Classification. Transactions of the International Society for Music Information Retrieval, 5(1), pp. 1–19
- [4] Schreiber, H. (2015). Improving genre annotations for the million song dataset. In Proc. of the 16th International Society for Music Information Retrieval Conference, ISMIR, pp. 241–247
- [5] Vatolkin, I., Theimer, W. M., and Botteck, M. (2010). AMUSE (advanced music explorer) - A multitool framework for music data analysis. In Proc. of the 11th International Society for Music Information Retrieval Conference, ISMIR, pp. 33–38