Cadenza Challenge (CAD2): databases for rebalancing classical music task
Description
A new version of this dataset has been published. Please ensure you are working with the latest version.
Cadenza
This is the training and validation data for the rebalancing classical music task from the Second Cadenza Machine Learning Challenge (CAD2).
The Cadenza Challenges aim to improve music production and processing for people with a hearing loss. According to the World Health Organization, 430 million people worldwide have a disabling hearing loss. Hearing aid users report several issues when listening to music, including distortion in the bass, difficulty perceiving the full range of the music (especially high-frequency pitches), and a tendency to miss the impact of quieter parts of compositions [1]. In a pilot study, we found that giving listeners sliders to rebalance the different instruments in a classical music ensemble was desirable.
Overview of files:
- CadenzaWoodwind. Synthesized dataset of small ensembles of woodwind instruments for training and validation.
- EnsembleSetSmall. A subset of the synthesised EnsembleSet [7] for training and validation.
- Real Data for Tuning: Stereo_Reverb_Real_Data_For_Tuning.zip.
- metadata.zip contains audiograms, scene details, target gains and compressor settings.
The audio files are in FLAC format within the .zip archives. The JSON files contain metadata.
More details below.
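For reference, the .zip archives and their JSON metadata can be handled with Python's standard library alone. The sketch below builds a toy in-memory archive standing in for metadata.zip and reads a JSON entry back; the entry name and fields are illustrative placeholders, not the dataset's actual schema:

```python
import io
import json
import zipfile

# Build a toy archive in memory standing in for metadata.zip.
# The entry name "scenes.json" and its fields are invented for illustration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("scenes.json", json.dumps({"scene_0001": {"gain_dB": -3.0}}))

# Reading works the same way against the real archives on disk:
# replace `buf` with the path to the downloaded .zip file.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    scenes = json.loads(zf.read("scenes.json"))
```

The FLAC stems inside the audio archives can be extracted the same way and decoded with any FLAC-capable audio library.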
Methods
1) CadenzaWoodwind
This is a dataset of five woodwind instruments (flute, clarinet, oboe, alto saxophone and bassoon) created for two quartet orchestrations: (a) flute, clarinet, oboe, and bassoon and (b) flute, alto saxophone, oboe and bassoon. The stems for each solo instrument are presented along with the two mixtures.
The scores for the dataset came from the OpenScore String Quartet Corpus [6]. Twenty-one of these were selected at random, with the selection weighted by piece length so that longer scores with multiple movements were more likely to be chosen. This was done because the subsequent rendering process involved manual steps, and many short scores would have made the process much longer. A maximum of two pieces per composer was chosen. Two of the scores would not render in the music notation software and so were excluded. The 19 scores selected are listed in CadenzaWoodwind_openscores_selected.txt.
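A length-weighted draw with a two-pieces-per-composer cap can be sketched as below. The catalogue entries and the rejection-style loop are illustrative assumptions, not the selection code actually used for the dataset:

```python
import random

# Hypothetical catalogue entries: (title, composer, duration in seconds).
# The real selection drew from the OpenScore String Quartet Corpus.
catalogue = [
    ("Quartet A", "Composer 1", 1800),
    ("Quartet B", "Composer 1", 950),
    ("Quartet C", "Composer 2", 2400),
    ("Quartet D", "Composer 3", 700),
    ("Quartet E", "Composer 3", 2100),
]

def weighted_pick(pieces, n, max_per_composer=2, seed=0):
    """Draw up to n pieces without replacement, weighting by duration,
    keeping at most max_per_composer pieces from any one composer."""
    rng = random.Random(seed)
    remaining = list(pieces)
    per_composer = {}
    chosen = []
    while remaining and len(chosen) < n:
        weights = [duration for _, _, duration in remaining]
        piece = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(piece)  # no replacement, whether kept or rejected
        _, composer, _ = piece
        if per_composer.get(composer, 0) < max_per_composer:
            per_composer[composer] = per_composer.get(composer, 0) + 1
            chosen.append(piece)
    return chosen

selection = weighted_pick(catalogue, n=3)
```

Weighting by duration makes a long multi-movement score proportionally more likely on each draw, while the cap keeps any single composer from dominating the selection.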
The scores were interpreted using Dorico music notation software. The four string parts were allocated to flute, oboe, clarinet and bassoon, and a professional sample library was used to create the audio. Reverberation was added using a convolution reverb, using an impulse response from the Royal Tropical Institute, Amsterdam in Avid's Space Impulse Response IR Library.
2) EnsembleSetSmall
A subset of the synthesised EnsembleSet [7], containing only the Mix_1 stereo version, provided to reduce the download size. Please see the EnsembleSet page and paper for full details of the methods.
3) Real Data for Tuning
Stereo_Reverb_Real_Data_For_Tuning.zip is a small sample of real recordings that are intended to help entrants deal with the mismatch between the synthetic training data and the real recorded evaluation set. It was generated from:
- Bach10 Dataset [2]
- URMP Dataset [3]
Both databases have mono recordings of isolated instruments in anechoic conditions. We have taken these and created stereo versions in small halls using convolution reverb. This used ambisonic b-format impulse responses from the Openair Database [4].
Four impulse responses from small-to-medium-sized venues were chosen: the Arthur Sykes Rymer Auditorium, University of York; Central Hall, University of York; the Dixon Studio Theatre, University of York; and York Guildhall Council Chamber. Only impulse responses measured with a source-receiver distance greater than 5 m were included.
The following procedure was used to convert the b-format to stereo. For each instrument:
- The b-format representation was rotated in azimuth to face the instrument being rendered, with the instruments spaced 10 degrees apart.
- The b-format was converted to mid-side stereo using Eqn 1.7 from reference [5] (alpha=0.5).
- The mid-side stereo impulse responses were convolved with the anechoic mono music recordings.
- The mix was created by summing all the audio for the instruments in the ensemble.
- The mix was then normalised so that the peak absolute sample value was 1. The same scaling factor was then applied to all of the solo instrument audio tracks.
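The steps above can be sketched in Python. This is an illustrative re-implementation using synthetic placeholder signals, not the MATLAB code shipped in code_to_create_urmp_bach10_stereo_with_reverb.zip, and the mid-side formulation is a first-order virtual-microphone sketch of the Eqn 1.7 step in [5] rather than a verbatim copy:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 2000  # deliberately tiny sample rate so the sketch runs quickly

# Hypothetical stand-ins for loaded data: a decaying-noise b-format room
# impulse response (W, X, Y channels) and two anechoic mono recordings.
w, x, y = rng.standard_normal((3, fs)) * np.exp(-np.linspace(0.0, 8.0, fs))
instruments = {"violin": rng.standard_normal(2 * fs),
               "clarinet": rng.standard_normal(2 * fs)}

def render_stem(mono, azimuth_deg, alpha=0.5):
    """Rotate the horizontal b-format components towards the instrument,
    build a mid-side impulse-response pair (alpha blends the omni W
    against the forward figure-of-eight; an assumed sketch of the
    Eqn 1.7 step in [5]), then convolve the mono recording with it."""
    a = np.deg2rad(azimuth_deg)
    x_r = np.cos(a) * x + np.sin(a) * y   # W is rotation-invariant
    y_r = -np.sin(a) * x + np.cos(a) * y
    mid = alpha * w + (1.0 - alpha) * x_r
    side = (1.0 - alpha) * y_r
    return np.stack([np.convolve(mono, mid + side),    # left  = mid + side
                     np.convolve(mono, mid - side)])   # right = mid - side

# Instruments spaced 10 degrees apart, as described above.
stems = {name: render_stem(sig, azimuth_deg=10 * i)
         for i, (name, sig) in enumerate(instruments.items())}

# Sum the stems into the mix, normalise the mix's peak to 1, and apply
# the same scaling factor to every solo stem.
mix = sum(stems.values())
scale = 1.0 / np.max(np.abs(mix))
mix = mix * scale
stems = {name: stem * scale for name, stem in stems.items()}
```

Scaling the solo stems by the mix's normalisation factor, rather than normalising each stem individually, preserves the level balance between the stems and the mix.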
The readme.txt files in each folder give the filename of the impulse response used.
code_to_create_urmp_bach10_stereo_with_reverb.zip contains the MATLAB code used to create the dataset.
Additional details
Funding
- UK Research and Innovation
- EnhanceMusic: Machine Learning Challenges to Revolutionise Music Listening for People with Hearing Loss EP/W019434/1
Dates
- Created: 2024-07-05
Software
- Repository URL: https://github.com/claritychallenge/clarity
- Programming language: Python
- Development Status: Active
References
- [1] A. Greasley, H. Crook, and R. Fulford, "Music listening and hearing aids: perspectives from audiologists and their patients," International Journal of Audiology, vol. 59, no. 9, pp. 694–706, 2020.
- [2] Zhiyao Duan and Bryan Pardo, "Soundprism: an online system for score-informed source separation of music audio," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 6, pp. 1205-1215, 2011.
- [3] Bochen Li *, Xinzhao Liu *, Karthik Dinesh, Zhiyao Duan, Gaurav Sharma, "Creating a multi-track classical music performance dataset for multi-modal music analysis: Challenges, insights, and applications", IEEE Transactions on Multimedia, 2018. (* equal contribution).
- [4] Murphy, D.T. and Shelley, S., 2010, November. Openair: An interactive auralization web resource and database. In Audio Engineering Society Convention 129. Audio Engineering Society.
- [5] Zotter, F. and Frank, M., 2019. Ambisonics: A practical 3D audio theory for recording, studio production, sound reinforcement, and virtual reality (p. 5). Springer Nature.
- [6] Gotham, M., Redbond, M., Bower, B. and Jonas, P., 2023, November. The "OpenScore String Quartet" Corpus. In Proceedings of the 10th International Conference on Digital Libraries for Musicology (pp. 49-57).
- [7] Sarkar, S., Benetos, E. and Sandler, M., 2022. Ensembleset: A new high-quality synthesised dataset for chamber ensemble separation.