CANTO-JRP Dataset: Audio Pitch Extractions from the Josquin Research Project
Description
The CANTO-JRP Dataset is based on compositions from the Josquin Research Project that were available on Spotify at the time this dataset was created. Due to copyright restrictions, the recordings are not publicly available. However, the dataset includes multiple f0 estimations (from various models), symbolic encodings, and metadata. CANTO-JRP is part of the CANTOSTREAM project. On Spotify, we created the CANTO-JRP playlist that is fully aligned with this dataset.
Please find a detailed description of the dataset in the README.md.
This dataset is described in our article:
Visscher, M., & Wiering, F. (2025). Fuzzy Frequencies: Finding Tonal Structures in Audio Recordings of Renaissance Polyphony. Heritage, 8(5), 164. https://doi.org/10.3390/heritage8050164
If you use this dataset for your research, please cite this paper.
Technical info (English)
General
This README.md file provides an overview of the CANTO-JRP Dataset, which is based on compositions from the Josquin Research Project, limited to those available on Spotify at the time the dataset was created. Due to copyright restrictions, the recordings themselves are not publicly available. Instead, the dataset includes multiple f0 estimations (from various models), symbolic encodings, and metadata.
Folder Structure
Each set of multiple f0 extractions is stored in its own folder. Due to their large size, these folders are compressed into separate tar.gz files. To use the data with the accompanying code from GitHub, download and extract the relevant folders into FuzzyFrequencies/data/raw/CANTO-JRP/. If you are only interested in the Multif0 extractions, you can download just experiment_metadata.csv and multif0.tar.gz.
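The download-and-extract step above can be sketched with the Python standard library. This is an illustrative helper, not part of the FuzzyFrequencies code; the function name and the download directory are assumptions, and the target path is the one named above.

```python
import tarfile
from pathlib import Path

# Sketch: extract the downloaded tar.gz archives into the directory layout
# expected by the FuzzyFrequencies code. The download directory and the
# function name are hypothetical; each archive contains its own folder.
def extract_archives(download_dir: str,
                     target: str = "FuzzyFrequencies/data/raw/CANTO-JRP"):
    dest = Path(target)
    dest.mkdir(parents=True, exist_ok=True)
    for archive in Path(download_dir).glob("*.tar.gz"):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest)  # e.g. multif0.tar.gz -> <target>/multif0/

# extract_archives("downloads")
```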
Folders and files in this dataset
- 195f
- 214c
- basicpitch
- MT3
- multif0
- symbolic
- experiment_metadata.csv
- README.md
Description of dataset items
This section provides an overview of the data in each folder, how the data should be used, and its purpose.
| Folder | File types | Source | Number of files | Format |
|---|---|---|---|---|
| 195f | Multipitch extractions, model 195f | Fuzzy Frequencies | 637 | CSV |
| 214c | Multipitch extractions, model 214c | Fuzzy Frequencies | 637 | CSV |
| basicpitch | Basicpitch extractions | Fuzzy Frequencies | 637 | CSV |
| MT3 | MT3 extractions | Fuzzy Frequencies | 172 | MIDI |
| multif0 | Multif0 extractions | Fuzzy Frequencies | 637 | CSV |
| symbolic | Symbolic encodings | Josquin Research Project (JRP) | 637 | MusicXML |
Note: The number of MT3 extractions is lower than for the other models. Due to the MT3 model's lower performance on our dataset and its high computational cost, we only processed audio files smaller than approximately 10 MB.
File Types
Metadata
The file experiment_metadata.csv contains information about each composition from the JRP that was available on Spotify at the time this dataset was created. This file serves both as a reference for users of the dataset and as a specification file for the GitHub code.
| Field | Format | Description |
|---|---|---|
| id | integer | row identifier |
| nr_playlist | string | position(s) in the playlist |
| composer | string | composer's surname |
| composition | string | name of the composition |
| voices | integer | number of voices |
| experiment | string | experiment name, needed for the code |
| performer | string | performer(s) of the recording |
| Album | string | album name of the recording |
| year_recording | integer | year of recording |
| audio_final | integer | MIDI tone of the lowest note of the final chord |
| symbolic_is_audio | string | extent to which recording and encoding are the same (yes, almost, no) |
| instrumentation | string | instrumentation: (v)ocal and/or (i)nstrumental |
| instrumentation_category | string | category of instrumentation (vocal, instrumental, mixed) |
| final_safe | string | extent to which the audio final is the same as the (transposed) encoded final (yes, no, pitch class profile) |
| not repeated | string | whether there is repetition of the encoding in the recording (yes, no) |
| repetitions | string | rough specification of the repetitions |
| comments | string | extra comments, mainly instruments used |
| symbolic | string | file name of the symbolic encoding |
| audio | string | file name of the recording |
| mf0 | string | file name of the Multif0 extraction |
| basicpitch | string | file name of the Basicpitch extraction |
| multipitch_214c | string | file name of the Multipitch extraction, model 214c |
| multipitch_195f | string | file name of the Multipitch extraction, model 195f |
| MT3 | string | file name of the MT3 extraction |
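As a minimal sketch of how the metadata can be used, the snippet below reads experiment_metadata.csv with the standard library and filters on two of the fields documented above. The function names are hypothetical, and the filter (purely vocal recordings whose audio final is safe) is just one example query.

```python
import csv

# Sketch: load experiment_metadata.csv and select, e.g., purely vocal
# recordings whose audio final matches the (transposed) encoded final.
# Function names are hypothetical, not part of the FuzzyFrequencies code.
def load_metadata(path: str):
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))

def vocal_and_final_safe(rows):
    return [r for r in rows
            if r["instrumentation_category"] == "vocal"
            and r["final_safe"] == "yes"]
```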
Multipitch extraction
The Multipitch extractions include a column for each MIDI tone, with cell values representing the loudness of the pitch at a given timestamp.
| Field | Format | Description |
|---|---|---|
| [empty] | integer | timestamp index; frame rate = 43.06640625 Hz |
| 1 | float | loudness of MIDI tone 1 + 24 = 25 |
| .. | .. | .. |
| 71 | float | loudness of MIDI tone 71 + 24 = 85 |
Basicpitch extraction
The Basicpitch extractions include a row for each detected note and its corresponding loudness.
| Field | Format | Description |
|---|---|---|
| start_time_s | float | start time of the pitch in seconds |
| end_time_s | float | end time of the pitch in seconds |
| pitch_midi | integer | MIDI tone of the pitch |
| velocity | integer | MIDI equivalent of loudness |
| pitch_bend | integer | multiple columns of microtonal pitch deviations |
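Because the number of pitch_bend columns varies per note, a plain row-wise reader is the simplest way to consume these files. The sketch below is an assumption about how to group the fixed and variable columns, not code from the accompanying repository.

```python
import csv

# Sketch: read a Basicpitch note CSV. The first four columns are fixed
# (start, end, MIDI tone, velocity); any remaining columns hold the
# microtonal pitch-bend series for that note. Function name is hypothetical.
def read_basicpitch(path: str):
    notes = []
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)  # skip header row
        for row in reader:
            notes.append({
                "start": float(row[0]),
                "end": float(row[1]),
                "midi": int(row[2]),
                "velocity": int(row[3]),
                "bends": [int(v) for v in row[4:] if v != ""],
            })
    return notes
```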
MT3 extraction
The MT3 extractions are provided in MIDI format (Musical Instrument Digital Interface). MIDI is an industry standard music technology protocol used to represent musical data and allow communication between musical devices. For more details, see the MIDI specifications.
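As a minimal illustration of the format, a Standard MIDI File starts with a 14-byte MThd header chunk holding the file format, track count, and time division. The sketch below parses that header with the standard library; it is a toy reader for the header only, not a full MIDI parser.

```python
import struct

# Sketch: read the MThd header chunk of a Standard MIDI File, such as an
# MT3 extraction. The 14-byte header holds the format (0, 1, or 2), the
# number of track chunks, and the time division (ticks per quarter note).
def read_midi_header(data: bytes):
    chunk_id, length, fmt, ntracks, division = struct.unpack(">4sIHHH", data[:14])
    assert chunk_id == b"MThd" and length == 6, "not a Standard MIDI File"
    return {"format": fmt, "ntracks": ntracks, "division": division}

# Example: a format-1 file with 5 tracks and 480 ticks per quarter note.
header = struct.pack(">4sIHHH", b"MThd", 6, 1, 5, 480)
print(read_midi_header(header))  # {'format': 1, 'ntracks': 5, 'division': 480}
```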
Multif0 extraction
The Multif0 extractions do not have meaningful headers: the first column contains the timestamps, and the subsequent columns are 'voice' columns without voice leading. By default, the leftmost voice column contains the lowest detected frequency.
| Field | Format | Description |
|---|---|---|
| 0.0 | float | timestamp in seconds; time sample rate = 86.1328125 Hz |
| [empty] | float | frequency (Hz) of the lowest voice at that timestamp; frequency resolution = 20 cents |
| .. | .. | .. |
| [empty] | float | frequency (Hz) of the highest voice at that timestamp; frequency resolution = 20 cents |
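Since the voice columns hold raw frequencies in Hz, comparing them with the symbolic encodings or with the other extractions requires converting to (fractional) MIDI numbers. The standard conversion, assuming A4 = 440 Hz, is shown below; the function name is our own.

```python
import math

# Sketch: convert a Multif0 frequency column value (Hz) to a fractional
# MIDI number, assuming equal temperament with A4 = 440 Hz.
def hz_to_midi(freq: float) -> float:
    return 69 + 12 * math.log2(freq / 440.0)

print(round(hz_to_midi(440.0)))      # 69 (A4)
print(round(hz_to_midi(261.63), 2))  # 60.0 (middle C)
```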
Symbolic encoding
The symbolic encodings are provided in MusicXML format. For an introduction to this format, please see the MusicXML tutorial.
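MusicXML is plain XML, so basic pitch information can be pulled out with the standard library alone. The snippet below is a toy example over a hand-written fragment, not a JRP encoding; real scores have parts, measures, durations, and accidentals that a full reader (e.g. music21) would handle.

```python
import xml.etree.ElementTree as ET

# Sketch: extract (step, octave) pairs from a MusicXML string with the
# standard library. SNIPPET is a toy fragment, not a JRP encoding.
SNIPPET = """<score-partwise><part id="P1"><measure number="1">
  <note><pitch><step>G</step><octave>4</octave></pitch></note>
  <note><pitch><step>D</step><octave>5</octave></pitch></note>
</measure></part></score-partwise>"""

def pitches(xml_text: str):
    root = ET.fromstring(xml_text)
    return [(p.findtext("step"), int(p.findtext("octave")))
            for p in root.iter("pitch")]

print(pitches(SNIPPET))  # [('G', 4), ('D', 5)]
```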
Codebook
In this section, we specify for each file type how the data was collected or created.
For 611 of the 902 works on the JRP website, usable recordings were found on Spotify; these are collected in the Spotify playlist.
The Basicpitch extractions are created by applying the model by Bittner et al. (2022) [1] to the set of audio recordings.
The Multipitch extractions are created by applying the model by Weiß and Müller (2024) [4] to the set of audio recordings, with model 214c and 195f.
The MT3 extractions are created by applying the Colab notebook provided by Gardner et al. (2022) [3] to the audio files smaller than ~110 MB.
The Multif0 extractions are created by applying the model by Cuesta et al. (2020) [2] to the audio files.
The symbolic encodings are downloaded from The Josquin Research Project.
The files experiment_metadata.csv and README.md have been handcrafted by the first author.
Files (12.3 GB in total)

| Size | MD5 checksum |
|---|---|
| 6.0 GB | md5:dace9c1d94767925494b9ea321a85165 |
| 6.1 GB | md5:3ca66b89fa879baef7bfd138e1b5c669 |
| 25.1 MB | md5:eb81b22e46cb9251f09fb59d6abba4f7 |
| 285.0 kB | md5:b7664d40e60e2379c47c2fc59b4726b2 |
| 622.1 kB | md5:8cf29e4b6774f177af9ecb45057d0728 |
| 142.6 MB | md5:83682900e7a70fbe743b6f5df2fe9469 |
| 12.9 kB | md5:c9d3fa91f73ceb6286c55baa3a9ef2ac |
| 9.8 MB | md5:9209fcb0bd8cb1f52b77c4a369ee8cea |
Additional details
Software
- Repository URL
- https://github.com/MirjamVisscher/FuzzyFrequencies
References
1. Bittner, R.M.; Bosch, J.J.; Rubinstein, D.; Meseguer-Brocal, G.; Ewert, S. A lightweight instrument-agnostic model for polyphonic note transcription and multipitch estimation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Singapore, 2022.
2. Cuesta, H.; McFee, B.; Gómez, E. Multiple f0 estimation in vocal ensembles using convolutional neural networks. In Proceedings of the International Society for Music Information Retrieval (ISMIR), Montréal, Canada, 2020.
3. Gardner, J.P.; Simon, I.; Manilow, E.; Hawthorne, C.; Engel, J. MT3: Multi-task multitrack music transcription. In Proceedings of the International Conference on Learning Representations (ICLR), 2022.
4. Weiß, C.; Müller, M. From music scores to audio recordings: Deep pitch-class representations for measuring tonal structures. ACM Journal on Computing and Cultural Heritage 2024.