There is a newer version of the record available.

Published July 16, 2025 | Version 1.1
Dataset Open

mshoxxDB - a Versioned Dataset for Electronic Music

Authors/Creators

Description

This dataset was presented as a Late-Breaking Demo at ISMIR 2024 in San Francisco, CA, together with a paper (an extended abstract), a poster, and a demo video. It was first studied in a EURASIP journal article (DOI: 10.1186/s13636-025-00398-2).

Description
mshoxxDB is an open-source dataset for research in the field of Music Information Retrieval (MIR), with a focus on Electronic Music. It was created by Michael Taenzer in the Reason Studios digital audio workstation (DAW). The dataset provides comprehensively annotated music audio data for a genre that has received comparatively limited attention in MIR research. With its combination of diverse synthetic timbres, acoustic and traditional classical instruments, and multitrack material, it supports tasks such as instrument detection, multi-pitch estimation, source separation, beat detection, and tempo estimation. It is particularly well suited for evaluating instrument-agnostic methods and model generalization. The music covers several sub-genres of Electronic Music, e.g. video game, 8-bit (chiptune), EDM, pop, house, and chillout/dreamy styles.

Contents
- 18 full-length pieces of music, 61 minutes of audio in total
- mixtures and multitrack stems in FLAC format (44.1 kHz, 16-bit, mono, compression level 6)
- track-level MIDI files
- CSV metadata including genre, tempo, time signature, and artist information
- ms12 and ms14 dataset splits in JSON format, as described in the initial study (see above)
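
As a starting point for working with these contents, the following minimal sketch indexes an extracted copy of the archive by file type. The folder layout inside the archive is not assumed; the function name `index_dataset` and the `.mid`/`.midi` extensions for the MIDI files are assumptions made for illustration, so adjust them to what the archive actually contains.

```python
from collections import defaultdict
from pathlib import Path

# File types taken from the Contents list. The MIDI extensions are an
# assumption; nothing is assumed about the directory structure.
KNOWN_SUFFIXES = {".flac", ".mid", ".midi", ".csv", ".json"}


def index_dataset(root):
    """Group all known dataset files under `root` by lowercase suffix."""
    index = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in KNOWN_SUFFIXES:
            index[path.suffix.lower()].append(path)
    return dict(index)
```

Calling `index_dataset` on the extraction directory returns a dictionary such as `{".flac": [...], ".csv": [...]}`, from which the audio, MIDI, metadata, and split files can be passed to whichever readers you prefer.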

Technical Properties
Not all mixtures are exact sums of their corresponding multitrack stems. Some mixtures may contain additional processing in the form of limiters and compression, e.g. applied to the full mix or through side-chain compression between tracks.
No effects such as reverb, echo, or delay were added to the mixtures, as these would introduce additional harmonic content and cause mismatches between the MIDI and the audio.
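
Because a mixture is not guaranteed to equal the sum of its stems, it can be useful to measure how far a given piece deviates before using it for source separation. The sketch below is one possible check, not part of the dataset tooling: the function name `residual_level_db` is hypothetical, and it expects mono signals that have already been decoded into NumPy arrays (for example with a third-party FLAC reader of your choice).

```python
import numpy as np


def residual_level_db(mixture, stems):
    """Energy of (mixture - sum of stems) relative to the mixture, in dB.

    Strongly negative values mean the mixture is close to the plain sum of
    its stems; values nearer 0 dB point to extra processing on the mix.
    """
    mixture = np.asarray(mixture, dtype=np.float64)
    # Trim everything to the shortest signal in case the lengths differ.
    n = min([len(mixture)] + [len(s) for s in stems])
    stem_sum = np.zeros(n)
    for stem in stems:
        stem_sum += np.asarray(stem, dtype=np.float64)[:n]
    residual = mixture[:n] - stem_sum
    eps = 1e-12  # avoids log(0) for exact sums and silent input
    ratio = (np.sum(residual ** 2) + eps) / (np.sum(mixture[:n] ** 2) + eps)
    return 10.0 * np.log10(ratio)
```

Any threshold for treating a mixture as an "exact sum" is a choice left to the user; the dataset description does not specify one.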

Demo Page & Repository
A demo page with selected listening examples is available on GitHub Pages: https://mic-tae.github.io/mshoxxdb/. The mshoxxDB repository is located at https://github.com/mic-tae/mshoxxdb. The canonical archived release of mshoxxDB is hosted on Zenodo. The GitHub repository and demo page provide supplementary documentation, examples, and project-related resources.

License
All contents are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0). See the LICENSE file for details.

Citation
Should you use this dataset in your work, please cite it as follows (BibTeX):

@misc {taenzer:mshoxxDB:2024,
  author = {Taenzer, Michael},
  title = {{mshoxxDB - a Versioned Dataset for Electronic Music}},
  booktitle = {{Late-Breaking and Demo Session of the 25th International Conference on Music Information Retrieval (ISMIR)}},
  address = {{San Francisco, CA, USA}},
  year = {2024},
}

Future versions
Future versions of mshoxxDB may include additional music, segmentation annotations for each piece, and possibly stereo audio data.

Community
Contributions to this dataset are welcome in all forms, e.g. new music, annotations, or other suggestions that could help improve mshoxxDB.

Changelog

Version 1.1 (16 July 2025)
- all files now reflect main dataset version number v1 (previous numbers referred to internal track session numbers)
- removed umlaut from “Güte” --> “Guete”
- added ms12 and ms14 dataset splits (JSON files), a LICENSE file, and a README file

Version 1.0 (9 August 2024)
- initial release

Files

mshoxxDB_v1.1.zip (629.6 MB)
md5:87a3fa9f9c940d66c55a4f0fba43d195
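
The MD5 checksum listed for the archive can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the local file path are illustrative assumptions.

```python
import hashlib

# Checksum as listed for mshoxxDB_v1.1.zip on this record.
EXPECTED_MD5 = "87a3fa9f9c940d66c55a4f0fba43d195"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("mshoxxDB_v1.1.zip")` should return `True` for an intact copy, assuming the archive was saved under that name in the current directory.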

Additional details

Related works

Is described by
Journal article: 10.1186/s13636-025-00398-2 (DOI)

Dates

- Available: 2024-08-09 (Initial Release)
- Updated: 2025-07-16 (Version 1.1)

Software

- Repository URL: https://github.com/mic-tae/mshoxxdb
- Development Status: WIP (work in progress)