Published February 3, 2025 | Version 1.0
Dataset Open

Pop-K: Augmented MIDI Dataset for Learning Constrained Modern Pop Melodies

Description

Pop-K MIDI Dataset

The Pop-K MIDI Dataset is an open collection of modern pop melodies for training and testing symbolic music models in a constrained musical domain. It contains 305,815 files, produced by augmenting a base dataset of 8-bar vocal lead, chord, and bass melody tracks. An accompanying model trained on this dataset can be found on GitHub.

The dataset was created to evaluate how far a limited amount of training data can be scaled through augmentation so that a model can be trained efficiently to generate a specific musical style. The melodies were transposed to C major and A minor, and timing information was normalized to 120 BPM at a 96-tick resolution. In total, the dataset amounts to approximately 1,360 hours of musical notation.
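The normalization described above can be illustrated with a minimal sketch. It is not the dataset's own tooling: the function names and the `(pitch, start_tick, duration_ticks)` note representation are hypothetical, the key of each source melody is assumed to be known in advance, and "96-tick resolution" is assumed to mean 96 ticks per quarter note. The last helper estimates total playing time under the further assumption that every file is exactly 8 bars of 4/4.

```python
# Illustrative sketch only; names and data layout are hypothetical.

TARGET_TICKS_PER_BEAT = 96   # from the description (assumed to be per quarter note)
TARGET_BPM = 120             # from the description


def semitones_to_target(tonic_pc, is_minor):
    """Smallest shift in semitones that moves the tonic to C (major) or A (minor).

    tonic_pc is a pitch class: C=0, C#=1, ..., B=11.
    """
    target_pc = 9 if is_minor else 0
    shift = (target_pc - tonic_pc) % 12
    # Prefer the nearer direction, giving shifts in the range -5..+6.
    return shift - 12 if shift > 6 else shift


def normalize(notes, tonic_pc, is_minor, source_ticks_per_beat):
    """Transpose notes and rescale their timing to the target resolution.

    notes is a list of (pitch, start_tick, duration_ticks) tuples.
    """
    shift = semitones_to_target(tonic_pc, is_minor)
    return [
        (
            pitch + shift,
            round(start * TARGET_TICKS_PER_BEAT / source_ticks_per_beat),
            round(duration * TARGET_TICKS_PER_BEAT / source_ticks_per_beat),
        )
        for pitch, start, duration in notes
    ]


def total_hours(num_files, bars=8, beats_per_bar=4, bpm=TARGET_BPM):
    """Playing time in hours if every file is `bars` long (4/4 is an assumption)."""
    return num_files * bars * beats_per_bar * 60 / bpm / 3600


if __name__ == "__main__":
    # A two-note fragment in D major at 480 ticks per beat.
    fragment = [(62, 0, 480), (66, 480, 240)]
    print(normalize(fragment, tonic_pc=2, is_minor=False, source_ticks_per_beat=480))
    print(round(total_hours(305_815), 1))
```

Under these assumptions each file lasts 16 seconds, so 305,815 files come to roughly 1,359 hours, which is consistent with the figure quoted above.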

License

The Pop-K MIDI Dataset is licensed under the Creative Commons Attribution-NonCommercial (CC BY-NC) license. While efforts have been made to augment and transform the original melodies, some segments may still resemble the source material.

Files

Size: 56.2 MB
Checksum: md5:25decb80e7c1e4395694e90aa6f39f76
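
The MD5 checksum listed for the file can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the archive name `pop-k.zip` is a placeholder, since the record does not show the actual file name.

```python
import hashlib

# Checksum as listed in the record's file section.
EXPECTED_MD5 = "25decb80e7c1e4395694e90aa6f39f76"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


# Example usage (placeholder file name):
# print(verify("pop-k.zip"))
```

Reading in chunks keeps memory use constant regardless of the archive size.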
