Audio Commons Ground Truth Data for deliverables D4.4, D4.10 and D4.12
This dataset contains the ground truth data used to evaluate the musical pitch, tempo and key estimation algorithms developed during the AudioCommons H2020 EU project, which are part of the Audio Commons Audio Extractor tool. It also includes ground truth information for the single-eventness audio descriptor, which was developed for the same tool.
This ground truth data has been used to generate the following documents:
Deliverable D4.4: Evaluation report on the first prototype tool for the automatic semantic description of music samples
Deliverable D4.10: Evaluation report on the second prototype tool for the automatic semantic description of music samples
Deliverable D4.12: Release of tool for the automatic semantic description of music samples
All ground truth data in this repository is provided in the form of CSV files. Each CSV file corresponds to one of the individual datasets used in one or more evaluation tasks of the aforementioned deliverables. This repository does not include the audio files of the individual datasets, but it does include references to them. The following sections describe the structure of the CSV files and give some notes on how to obtain the audio files if they are needed.
Structure of the CSV files
All CSV files in this repository (with the sole exception of SINGLE EVENT - Ground Truth.csv) feature the following 5 columns:
Audio reference: reference to the corresponding audio file. This will be either a string with the filename or the Freesound ID (for one dataset based on Freesound content). See below for details about how to obtain those files.
Audio reference type: will be one of Filename or Freesound ID, and specifies how the previous column should be interpreted.
Key annotation: tonality information as a string with the form "RootNote minor/major". Audio files with no ground truth annotation for tonality are left blank. Ground truth annotations are parsed from the original data source as described in the text of deliverables D4.4 and D4.10.
Tempo annotation: tempo information as an integer representing beats per minute. Audio files with no ground truth annotation for tempo are left blank. Ground truth annotations are parsed from the original data source as described in the text of deliverables D4.4 and D4.10. Note that integer values are used here because we only have tempo annotations for music loops, which typically only feature integer tempo values.
Pitch annotation: pitch information as an integer representing the MIDI note number corresponding to the annotated pitch's frequency. Audio files with no ground truth annotation for pitch are left blank. Ground truth annotations are parsed from the original data source as described in the text of deliverables D4.4 and D4.10.
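As an illustration of the column layout described above, the following Python sketch reads one of the five-column CSV files and converts blank cells to None. It assumes that the header row uses exactly the column names listed above and that the key string has the form "RootNote minor/major"; neither has been checked against the actual files. The names Annotation, read_annotations, parse_key, parse_optional_int and midi_to_hz are hypothetical helpers introduced for this example, and the MIDI-to-frequency conversion assumes standard tuning (A4 = 440 Hz, MIDI note 69).

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class Annotation:
    audio_ref: str                    # filename or Freesound ID, kept as text
    ref_type: str                     # "Filename" or "Freesound ID"
    key: Optional[Tuple[str, str]]    # (root note, "minor" or "major"), or None
    tempo_bpm: Optional[int]          # beats per minute, or None
    midi_pitch: Optional[int]         # MIDI note number, or None


def parse_key(text: str) -> Optional[Tuple[str, str]]:
    """Split "RootNote minor/major" into (root, scale); blank means no annotation."""
    text = text.strip()
    if not text:
        return None
    parts = text.rsplit(" ", 1)
    if len(parts) != 2:
        raise ValueError(f"unexpected key annotation: {text!r}")
    return parts[0], parts[1].lower()


def parse_optional_int(text: str) -> Optional[int]:
    """Return an int, or None for a blank cell. Also tolerates values like "120.0"."""
    text = text.strip()
    return int(float(text)) if text else None


def midi_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """Convert a MIDI note number to a frequency, assuming A4 (note 69) = 440 Hz."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


def read_annotations(lines: Iterable[str]) -> Iterator[Annotation]:
    """Yield one Annotation per CSV row. Column names are assumed, not verified."""
    for row in csv.DictReader(lines):
        yield Annotation(
            audio_ref=row["Audio reference"].strip(),
            ref_type=row["Audio reference type"].strip(),
            key=parse_key(row["Key annotation"]),
            tempo_bpm=parse_optional_int(row["Tempo annotation"]),
            midi_pitch=parse_optional_int(row["Pitch annotation"]),
        )
```

To read a real file, an open file handle can be passed in place of the iterable, for example `open(path, newline="", encoding="utf-8")`, where `path` points to one of the CSV files in this repository.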
The remaining CSV file, SINGLE EVENT - Ground Truth.csv, has only the following 2 columns:
Freesound ID: sound ID used in Freesound to identify the audio clip.
Single Event: boolean indicating whether the corresponding sound is considered to be a single event or not. Single event annotations were collected by the authors of the deliverables as described in deliverable D4.10.
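The two-column file can be read in a similar way. The sketch below is a minimal example that maps each Freesound ID to its single-event flag. How the boolean is written in the file is not specified here, so the parser accepts a few common spellings (true/false, 1/0, yes/no); this is an assumption, as are the exact header names. read_single_event and parse_bool are hypothetical helper names.

```python
import csv
import io
from typing import Dict, Iterable

# Assumed spellings of the boolean column; adjust after inspecting the file.
TRUE_TOKENS = {"true", "1", "yes"}
FALSE_TOKENS = {"false", "0", "no"}


def parse_bool(text: str) -> bool:
    """Interpret a cell as a boolean, raising on anything unrecognised."""
    token = text.strip().lower()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    raise ValueError(f"unrecognised boolean value: {text!r}")


def read_single_event(lines: Iterable[str]) -> Dict[int, bool]:
    """Map Freesound ID -> single-event flag. Column names are assumed."""
    return {
        int(row["Freesound ID"]): parse_bool(row["Single Event"])
        for row in csv.DictReader(lines)
    }
```

Raising on unknown values, rather than defaulting to False, makes a mismatch between the assumed and the actual encoding visible immediately.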
How to get the audio data
In this section we provide some notes about how to obtain the audio files corresponding to the ground truth annotations provided here. Note that due to licensing restrictions we are not allowed to re-distribute the audio data corresponding to most of these ground truth annotations.
Apple Loops (APPL): This dataset includes some of the music loops included in Apple's music software such as Logic or GarageBand. Access to these loops requires owning a license for the software. Detailed instructions about how to set up this dataset are provided here.
Carlos Vaquero Instruments Dataset (CVAQ): This dataset includes single instrument recordings carried out by Carlos Vaquero as part of this master's thesis. Sounds are available as Freesound packs and can be downloaded at this page: https://freesound.org/people/Carlos_Vaquero/packs
Freesound Loops 4k (FSL4): This dataset includes a selection of music loops taken from Freesound. Detailed instructions about how to set up this dataset are provided here.
Giant Steps Key Dataset (GSKY): This dataset includes a selection of previews from Beatport annotated by key. Audio and original annotations available here.
Good-sounds Dataset (GSND): This dataset contains monophonic recordings of instrument samples. Full description, original annotations and audio are available here.
University of Iowa Musical Instrument Samples (IOWA): This dataset was created by the Electronic Music Studios of the University of Iowa and contains recordings of instrument samples. The dataset is available upon request by visiting this website.
Mixcraft Loops (MIXL): This dataset includes some of the music loops included in Acoustica's Mixcraft music software. Access to these loops requires owning a license for the software. Detailed instructions about how to set up this dataset are provided here.
NSynth Dataset Test and Validation sets (NSYT and NSYV): NSynth is a large-scale and high-quality dataset of annotated musical notes built with synthesized sounds by Google's Magenta team. Full dataset description including original annotations and audio files is available here.
Philharmonia Orchestra Sound Samples Dataset (PHIL): This includes thousands of free, downloadable sound samples specially recorded by Philharmonia Orchestra players. Audio files are freely downloadable from the Philharmonia Orchestra website.
Freesound Single Events Dataset (SINGLE EVENT): This includes a selection of Freesound audio clips representing audio signals containing either a single audio event or multiple ones. Original audio files can be retrieved by downloading individual audio clips from Freesound using the ID provided in the CSV file. A similar procedure to that described here could be followed.
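For entries identified by a Freesound ID, one possible way to fetch audio is through the Freesound APIv2. The sketch below is not necessarily the procedure referred to above; it is a minimal example that assumes you have a personal Freesound API key. It downloads the high-quality MP3 preview of a sound rather than the original file, since downloading originals through the API requires OAuth2 authentication. sound_info_url and fetch_preview are hypothetical helper names.

```python
import json
import urllib.request

API_ROOT = "https://freesound.org/apiv2"


def sound_info_url(sound_id) -> str:
    """Build the APIv2 URL that describes a single sound."""
    return f"{API_ROOT}/sounds/{int(sound_id)}/"


def fetch_preview(sound_id, api_token: str, dest_path: str) -> str:
    """Download the HQ MP3 preview of a sound (not the original file).

    api_token is assumed to be a personal Freesound API key.
    """
    request = urllib.request.Request(
        sound_info_url(sound_id),
        headers={"Authorization": f"Token {api_token}"},
    )
    with urllib.request.urlopen(request) as response:
        info = json.load(response)
    # The sound description lists preview URLs under the "previews" field.
    preview_url = info["previews"]["preview-hq-mp3"]
    urllib.request.urlretrieve(preview_url, dest_path)
    return dest_path
```

Previews are lossy, so for evaluation purposes it may be preferable to obtain the original files, either manually from the Freesound website or via the OAuth2-protected download endpoint.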