Multi-modal dataset for music genre recognition based on six different modalities for LMD-aligned and SLAC datasets

Igor Vatolkin (1); Cory McKay (2)

doi:10.5281/zenodo.5651429

Published November 7, 2021 | Version v1

Dataset Open

1. TU Dortmund University, Department of Computer Science, Germany
2. Marianopolis College, Department of Liberal and Creative Arts, Canada

This multi-modal dataset supports music genre recognition with features from six modalities (audio signal, model-based, playlists, symbolic, album covers, and lyrics) for the LMD-aligned [1] and SLAC [2] datasets. Further details are provided in [3].

Descriptions of files

Link	Description
LMD-aligned_Filelist.arff	File list with 1575 music tracks selected from the LMD-aligned dataset [1] with tagtraum genre annotations [4] (only a subset of LMD-aligned is used, which includes only pieces for which all six modalities were accessible, and which includes only well-represented genres)
LMD-aligned_ExtractedFeatures.tar.gz	Raw audio signal and model-based features extracted with AMUSE [5]
LMD-aligned_ProcessedFeatures.tar.gz	Processed features: audio signal and model-based features are aggregated over 4 s time frames with a 2 s step size; all other features (see the table below) have the same values for all time frames
LMD-aligned_Datasets.tar.gz	Training, optimization, and test datasets for 3 splits for the recognition of 5 genres in [3]
SLAC_Filelist.arff	File list with 250 music tracks from the SLAC dataset [2] (genres and sub-genres are provided in the folder structure)
SLAC_ExtractedFeatures.tar.gz	Raw audio signal and model-based features extracted with AMUSE [5]
SLAC_ProcessedFeatures.tar.gz	Processed features: audio signal and model-based features are aggregated over 4 s time frames with a 2 s step size; all other features (see the table below) have the same values for all time frames
SLAC_Datasets.tar.gz	Training, optimization, and test datasets for 3 splits for the recognition of 5 genres and 10 sub-genres in [3]
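
The 4 s time frames with a 2 s step size used for the processed features mean that consecutive frames overlap by half. The following is a minimal sketch of how such frame boundaries could be enumerated for a track of a given length. The function name `frame_windows` is hypothetical, and the rule that only complete frames are kept is an assumption; the actual handling of the final partial frame in AMUSE [5] may differ.

```python
def frame_windows(duration_s, frame_s=4.0, step_s=2.0):
    """Return (start, end) times in seconds of all complete frames.

    Assumption: a frame is kept only if it fits entirely inside the track.
    """
    windows = []
    index = 0
    while index * step_s + frame_s <= duration_s:
        start = index * step_s
        windows.append((start, start + frame_s))
        index += 1
    return windows


# A 10 s excerpt yields four overlapping 4 s frames.
for start, end in frame_windows(10.0):
    print(f"{start:.1f} s to {end:.1f} s")
```

Features that are not time-dependent (playlists, symbolic, album covers, lyrics) would simply be repeated for each of these frames, as stated in the file descriptions.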

Modalities and feature sub-groups

Modality	Sub-group	Dimensions in processed features of LMD-aligned	Dimensions in processed features of SLAC
Audio signal	Low-level	1-524	1-524
Audio signal	Semantic	525-810	525-810
Audio signal	Structural complexity	811-908	811-908
Model-based	Instruments	909-1018	909-1018
Model-based	Moods	1019-1146	1019-1146
Model-based	Various	1147-1402	1147-1402
Playlists	Genres	1403-1973	1403-1973
Playlists	Styles	1974-1695	1974-1695
Symbolic	Pitch	1696-1757	1696-1757
Symbolic	Melodic	1758-1781	1758-1781
Symbolic	Chords	1782-1836	1782-1836
Symbolic	Rhythm	1837-1935	1837-1935
Symbolic	Tempo	1936-1963	1936-1963
Symbolic	Instrument presence	1964-2441	1964-2441
Symbolic	Instruments	2442-2456	2442-2456
Symbolic	Texture	2457-2480	2457-2480
Symbolic	Dynamics	2481-2484	2481-2484
Album covers	SIFT	2485-2584	2485-2584
Lyrics	jLyrics descriptors	2585-2603	2585-2671
Lyrics	Bag-of-Words	2604-2703
Lyrics	Doc2Vec	2704-2803
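
The dimension ranges in the table are 1-based and inclusive, so they need to be shifted when slicing 0-based arrays. The sketch below shows one way to select the columns of a modality from a feature vector. It works at the modality level only, with ranges merged from the LMD-aligned column of the table. The overall playlist range 1403-1695 is inferred from the neighbouring rows (Model-based ends at 1402, Symbolic starts at 1696) and is an assumption. The names `MODALITY_RANGES_LMD` and `select_dimensions` are hypothetical, and the sketch assumes a vector that holds exactly the 2803 feature dimensions in table order, without additional ID or time-stamp columns.

```python
# 1-based, inclusive dimension ranges per modality (LMD-aligned column).
MODALITY_RANGES_LMD = {
    "audio_signal": (1, 908),
    "model_based": (909, 1402),
    "playlists": (1403, 1695),  # assumption: inferred from neighbouring rows
    "symbolic": (1696, 2484),
    "album_covers": (2485, 2584),
    "lyrics": (2585, 2803),
}


def select_dimensions(feature_vector, first, last):
    """Slice a 0-based sequence using a 1-based inclusive range."""
    return feature_vector[first - 1:last]


# Hypothetical stand-in for one processed time frame.
dummy_frame = list(range(1, 2804))

for name, (first, last) in MODALITY_RANGES_LMD.items():
    part = select_dimensions(dummy_frame, first, last)
    print(name, len(part))
```

For SLAC, the same ranges apply up to dimension 2584, while the lyrics range in the table is 2585-2671.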

Files

Files (11.6 GB)

Name	MD5	Size
LMD-aligned_Datasets.tar.gz	b12218bee9b89f59b3a5a618af15a2de	339.0 kB
LMD-aligned_ExtractedFeatures.tar.gz	64605dfb088f07e84fcdb0a58471f64c	5.5 GB
LMD-aligned_Filelist.arff	9302ccd86292c893c187df89c637484a	162.3 kB
LMD-aligned_ProcessedFeatures.tar.gz	8757b54912eb795ead87bc90a468cc41	148.3 MB
SLAC_Datasets.tar.gz	faa8cbf325129f52bdf31904938b12ff	85.4 kB
SLAC_ExtractedFeatures.tar.gz	b7991f385d2c4284b22f9c707f01ff7c	5.7 GB
SLAC_Filelist.arff	4c06a526e11bd93bddfe436b49e3e7ef	15.8 kB
SLAC_ProcessedFeatures.tar.gz	e3d716aa197904dec27f179bd8130693	247.8 MB
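
Because the feature archives are several gigabytes in size, it can be worth checking a download against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib` module. The helper name `md5_of_file` and the assumption that the files sit in the current working directory are illustrative only; just two of the listed checksums are included.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums copied from the file table (subset).
EXPECTED_MD5 = {
    "LMD-aligned_Filelist.arff": "9302ccd86292c893c187df89c637484a",
    "SLAC_Filelist.arff": "4c06a526e11bd93bddfe436b49e3e7ef",
}

# Assumption: downloaded files are in the current directory.
for name, expected in EXPECTED_MD5.items():
    if os.path.exists(name):
        status = "OK" if md5_of_file(name) == expected else "MISMATCH"
        print(name, status)
    else:
        print(name, "not found")
```

Reading in chunks keeps memory use constant, which matters for the 5.5 GB and 5.7 GB archives.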

Additional details

References

[1] Raffel, C. (2016). Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching. PhD thesis, Graduate School of Arts and Sciences, Columbia University
[2] McKay, C., Burgoyne, J. A., Hockman, J., Smith, J. B. L., Vigliensoni, G., and Fujinaga, I. (2010). Evaluating the genre classification performance of lyrical features relative to audio, symbolic and cultural features. In Proc. of the 11th International Society for Music Information Retrieval Conference, ISMIR, pp. 213–218
[3] Vatolkin, I. and McKay, C. (2022). Multi-Objective Investigation of Six Feature Source Types for Multi-Modal Music Classification. Transactions of the International Society for Music Information Retrieval, 5(1), pp. 1–19
[4] Schreiber, H. (2015). Improving genre annotations for the Million Song Dataset. In Proc. of the 16th International Society for Music Information Retrieval Conference, ISMIR, pp. 241–247
[5] Vatolkin, I., Theimer, W. M., and Botteck, M. (2010). AMUSE (advanced music explorer) - A multitool framework for music data analysis. In Proc. of the 11th International Society for Music Information Retrieval Conference, ISMIR, pp. 33–38
