4 Bars Monophonic Melodies Dataset (Pitch Sequence)
Description
This dataset is designed for applications in music information retrieval, algorithmic composition, and machine learning tasks involving symbolic music data. It consists of a collection of unique 4-bar monophonic melodies represented as MIDI pitch sequences, each accompanied by thirteen attributes obtained with computational methods. The dataset was generated with the Resolv system's pipelines, starting from the full version of the Lakh MIDI Dataset (a collection of 176,581 unique MIDI files). At the moment it can be downloaded only upon request (just send an email).
The full article for this work contains all the details on how the attributes were obtained and on the implementation of the pipelines used for the generation. Note that melodies have been quantized to 4 steps per quarter note and only 4/4 time signatures have been considered, so each melody consists of N = 64 steps (4 bars × 4 quarter notes × 4 steps). Each step is either a number in the range [21, 108], the MIDI pitches available on a standard piano, or a token in the set {128, 129} for hold-note and note-off events respectively. No additional performance features (e.g., dynamics, duration, or timing) are included, making this dataset a purely pitch-based collection.
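The step encoding described above can be illustrated with a small decoder. This is a minimal sketch, not the Resolv implementation: the function name `decode_notes` is hypothetical, and it assumes that a hold token (128) extends the most recent pitch by one step, that a note-off token (129) marks a silent step, and that a hold token with no sounding note is simply ignored.

```python
from typing import List, Tuple

HOLD, NOTE_OFF = 128, 129        # special tokens named in the description
MIN_PITCH, MAX_PITCH = 21, 108   # MIDI pitches of a standard piano


def decode_notes(steps: List[int]) -> List[Tuple[int, int, int]]:
    """Turn a step sequence into (pitch, start_step, length_in_steps) tuples.

    Hypothetical helper: the handling of a hold token that follows silence
    is an assumption, since the description does not cover that case.
    """
    notes: List[Tuple[int, int, int]] = []
    current = None  # [pitch, start, length] of the note currently sounding
    for i, tok in enumerate(steps):
        if MIN_PITCH <= tok <= MAX_PITCH:
            # A pitch token starts a new note and ends the previous one.
            if current is not None:
                notes.append(tuple(current))
            current = [tok, i, 1]
        elif tok == HOLD:
            # Extend the sounding note by one step; ignored during silence.
            if current is not None:
                current[2] += 1
        elif tok == NOTE_OFF:
            # Silence: close the sounding note, if any.
            if current is not None:
                notes.append(tuple(current))
                current = None
        else:
            raise ValueError(f"unexpected token {tok} at step {i}")
    if current is not None:
        notes.append(tuple(current))
    return notes


# A 64-step melody: a quarter note C4, an eighth note D4, a rest, one step of E4.
melody = [60, 128, 128, 128, 62, 128, 129, 129, 64] + [NOTE_OFF] * 55
print(decode_notes(melody))  # [(60, 0, 4), (62, 4, 2), (64, 8, 1)]
```

With 4 steps per quarter note, a length of 4 steps corresponds to one quarter note, so the note lengths returned here can be mapped back to musical durations if needed.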
Three datasets (train, validation and test) are provided as TFRecord files divided into 8 shards. The data is stored in TensorFlow's SequenceExample format: the feature_lists field contains the pitch sequence as a list of integers, and the context field contains its attributes.
The table below shows the number of unique melodies contained in each of the three datasets.
| | Train | Validation | Test |
| Total unique melodies | 10,126,676 | 70,908 | 22,265 |
And here is the list of computed attributes for each melody:
| Attribute Name | SequenceExample Context Key | Description |
| Toussaint Metrical Complexity | toussaint | A metric that measures the degree of syncopation in rhythm patterns. |
| Note Density | note_density | Measures the density of note onsets within the melody. |
| Pitch Range | pitch_range | An indicator of how wide or narrow the melody is in terms of its pitch content. |
| Contour | contour | Measures the degree to which the melody moves up or down. |
| Note Change Ratio | note_change_ratio | The number of note changes normalized to the total number of steps N. |
| Dynamic Range | dynamic_range | The difference between the maximum and minimum note velocities. |
| Longest Repetitive Section | len_longest_rep_section | The length of the longest repetitive section in the melody, normalized to the total number of steps N. A repetitive section is defined as a note that repeats consecutively at least r = 4 times. |
| Repetitive Section Ratio | repetitive_section_ratio | The ratio between the total number of repetitive sections and a normalization factor N/r = 64/4 = 16. |
| Hold Note Steps Ratio | ratio_hold_note_steps | The ratio between the number of steps where a note is held and the total number of steps N. |
| Note Off Steps Ratio | ratio_note_off_steps | The ratio between the number of steps where no note is played and the total number of steps N. |
| Unique Notes Ratio | unique_notes_ratio | The ratio of unique notes, defined with respect to the total number of MIDI pitches considered (88) and the total number of steps N. |
| Unique Bigrams Ratio | unique_bigrams_ratio | The ratio of unique bigrams in the melody to the total number of steps N. |
| Unique Trigrams Ratio | unique_trigrams_ratio | The ratio of unique trigrams in the melody to the total number of steps N. |
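To make a few of the simpler attribute definitions concrete, here is a hedged sketch of how they might be computed from a step sequence. The function names are hypothetical, and the choice to build n-grams over the raw step tokens (including the hold and note-off tokens) is an assumption. The values stored in the dataset were produced by the Resolv pipelines and may follow different conventions.

```python
from typing import Sequence

HOLD, NOTE_OFF = 128, 129  # special tokens for hold-note and note-off steps


def hold_note_steps_ratio(steps: Sequence[int]) -> float:
    """Fraction of steps that carry the hold token."""
    return sum(tok == HOLD for tok in steps) / len(steps)


def note_off_steps_ratio(steps: Sequence[int]) -> float:
    """Fraction of steps that carry the note-off token."""
    return sum(tok == NOTE_OFF for tok in steps) / len(steps)


def unique_ngrams_ratio(steps: Sequence[int], n: int) -> float:
    """Number of distinct n-grams divided by the number of steps N.

    Assumption: n-grams are windows of n consecutive step tokens,
    with the special tokens treated like any other symbol.
    """
    grams = {tuple(steps[i:i + n]) for i in range(len(steps) - n + 1)}
    return len(grams) / len(steps)


# An 8-step pattern repeated 8 times gives N = 64 steps.
melody = [60, 128, 128, 128, 62, 128, 129, 129] * 8
print(hold_note_steps_ratio(melody))    # 0.5
print(note_off_steps_ratio(melody))     # 0.25
print(unique_ngrams_ratio(melody, 2))   # 0.109375
print(unique_ngrams_ratio(melody, 3))   # 0.125
```

Comparing such recomputed values against the context fields of a few records is a quick way to check which convention the stored attributes actually follow.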
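To access the content of a SequenceExample, use tf.io.parse_single_sequence_example, for instance:
    import tensorflow as tf

    # serialized_example: one serialized record read from a TFRecord shard
    context, sequence = tf.io.parse_single_sequence_example(
        serialized_example,
        # Scalar float attributes stored in the context field
        context_features={
            "toussaint": tf.io.FixedLenFeature([], dtype=tf.float32, default_value=0.0),
            "note_density": tf.io.FixedLenFeature([], dtype=tf.float32, default_value=0.0),
        },
        # sequence_features must be a dict; one int64 scalar per step is assumed here
        sequence_features={
            "pitch_seq": tf.io.FixedLenSequenceFeature([], dtype=tf.int64),
        },
    )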
Additional details
Related works
- Is derived from
- Thesis: https://colinraffel.com/publications/thesis.pdf (URL)
Software
- Repository URL
- https://github.com/resolv-libs
- Programming language
- Python
- Development Status
- Wip