Version 2.0, April 2020.
Carmine-Emanuele Cella (1), Daniele Ghisi (1), Vincent Lostanlen (2), Fabien Lévy (3), Joshua Fineberg (4), Yan Maresz (5)
(1): UC Berkeley
(2): New York University
(3): Columbia University
(4): Boston University
(5): Conservatoire de Paris
OrchideaSOL is a dataset of 13265 samples, each containing a single musical note from one of 14 different instruments:
These sounds were originally recorded at Ircam in Paris (France) between 1996 and 1999, as part of a larger project named Studio On Line (SOL). One asset of OrchideaSOL is that it contains many combinations of mutes and extended playing techniques.
The OrchideaSOL audio data can be used for creative purposes insofar as the use complies with the Ircam Forum License. Please visit: https://forum.ircam.fr/legal/contrat-de-licence-forum-ircam/
The OrchideaSOL metadata can be used for creative purposes insofar as the use complies with the Creative Commons Attribution 4.0 International license (see below).
OrchideaSOL can be used for education and research purposes. In particular, it can be employed as a dataset for training and/or evaluating music information retrieval (MIR) systems, for tasks such as instrument recognition, playing technique recognition, or fundamental frequency estimation. For this purpose, we provide an official 5-fold split of OrchideaSOL. This split has been carefully balanced in terms of instrumentation, pitch range, and dynamics. For the sake of research reproducibility, we encourage users of OrchideaSOL to adopt this split and report their results in terms of average performance across folds.
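As a minimal sketch of how results might be averaged across the five folds, the Python function below holds out each fold in turn and returns the mean score. The hold-one-fold-out protocol, the `fold_ids` list, and the `evaluate` callback are assumptions made for illustration; they are not part of the dataset or of any official evaluation code.

```python
from statistics import mean


def average_across_folds(fold_ids, evaluate):
    """Hold out each fold in turn and average the resulting scores.

    fold_ids: one fold label per audio clip (hypothetical input, e.g. read
              from the metadata file).
    evaluate: hypothetical callback taking (train_indices, test_indices)
              and returning a numeric score.
    """
    scores = []
    for held_out in sorted(set(fold_ids)):
        # Clips in the held-out fold are used for testing, the rest for training.
        test_idx = [i for i, f in enumerate(fold_ids) if f == held_out]
        train_idx = [i for i, f in enumerate(fold_ids) if f != held_out]
        scores.append(evaluate(train_idx, test_idx))
    # Return both the average and the per-fold scores for reporting.
    return mean(scores), scores
```

Returning the per-fold scores alongside the average makes it possible to report the spread across folds as well.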
OrchideaSOL contains 13265 audio clips as WAV files, sampled at 44.1 kHz, with a single channel (mono), at a bit depth of 16 bits. This is equivalent to the audio quality of a compact disc, except that the clips are mono. Audio clips vary in duration between two and ten seconds.
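The audio format described above can be checked on a downloaded clip with Python's standard `wave` module. This is only a sketch: the function names are hypothetical, and it assumes the files are uncompressed PCM WAV files that `wave` can read.

```python
import wave


def matches_orchideasol_format(path):
    """Return True if the WAV file is 44.1 kHz, mono, 16-bit."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 44100
            and wav.getnchannels() == 1
            and wav.getsampwidth() == 2  # sample width in bytes: 2 bytes = 16 bits
        )


def duration_seconds(path):
    """Return the clip duration in seconds (frame count / sampling rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```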
Every audio file has a file path of the form:
For example, "Strings/Violin+sordina/tremolo/Vn+S-trem-A4-mf-4c-T13d_R200d.wav" corresponds to:
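As a rough illustration, the example path above can be split programmatically. The sketch below reads the three directory levels and the hyphen-separated tokens of the file name. The key names (`family`, `instrument`, `mute`, `technique`) are our own labels inferred from the example, not an official schema, and the file-name tokens are returned uninterpreted. Splitting naively on hyphens is also an assumption and may fail for file names whose fields contain hyphens.

```python
from pathlib import PurePosixPath


def split_sol_path(path):
    """Split an OrchideaSOL-style relative path into labeled parts.

    The labels are assumptions inferred from the example path.
    """
    p = PurePosixPath(path)
    family, instrument_dir, technique_dir = p.parts[:3]
    # A "+" in the instrument directory is taken to separate the mute name.
    instrument, _, mute = instrument_dir.partition("+")
    return {
        "family": family,
        "instrument": instrument,
        "mute": mute or None,
        "technique": technique_dir,
        # File name without the ".wav" extension, split on hyphens.
        "filename_tokens": p.stem.split("-"),
    }


info = split_sol_path(
    "Strings/Violin+sordina/tremolo/Vn+S-trem-A4-mf-4c-T13d_R200d.wav"
)
```

For the example path, `info["instrument"]` is `"Violin"`, `info["mute"]` is `"sordina"`, and `info["filename_tokens"]` holds six tokens beginning with `"Vn+S"` and `"trem"`.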
The audio data for OrchideaSOL is not directly downloadable from Zenodo. Rather, it can be downloaded for free after registering on the Ircam Forum. Please visit: https://forum.ircam.fr/
The OrchideaSOL_metadata.csv file contains 13265 rows, one for each audio clip. It can be opened in a text editor or a spreadsheet application. It contains 13 columns:
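The metadata file can also be read with Python's standard `csv` module. The sketch below loads it and checks the expected shape. It assumes that the file is comma-delimited, UTF-8 encoded, and starts with a header line that is not counted among the 13265 rows; the function names are hypothetical.

```python
import csv


def load_metadata(csv_path):
    """Read the metadata CSV and return (column_names, rows_as_dicts)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # assumes the first line is a header
        rows = list(reader)
        columns = reader.fieldnames or []
    return columns, rows


def check_shape(columns, rows, n_rows=13265, n_columns=13):
    """Return True if the table has the expected number of rows and columns."""
    return len(rows) == n_rows and len(columns) == n_columns
```

A typical call would be `columns, rows = load_metadata("OrchideaSOL_metadata.csv")` followed by `check_shape(columns, rows)`.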
Conditions of Use
OrchideaSOL was created in 2020 by Carmine-Emanuele Cella, Daniele Ghisi, Vincent Lostanlen, Fabien Lévy, Joshua Fineberg, and Yan Maresz.
OrchideaSOL is a derivative of SOL. We wish to thank Hugues Vinet, Greg Beller, and all coordinators of the Ircam Forum for their authorization to upload the metadata of OrchideaSOL to Zenodo.
The audio samples in OrchideaSOL are offered free of charge under the Ircam Forum License. Please visit: https://forum.ircam.fr/legal/contrat-de-licence-forum-ircam/
The dataset and its contents are made available on an "as is" basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, the authors are not liable for, and expressly exclude all liability for, loss or damage however and whenever caused to anyone by any use of the OrchideaSOL dataset or any part of it.
Version 1.0 was released on February 24th, 2020.
Version 2.0 was released on April 4th, 2020. It fixes a bug in the instance IDs of oboe sounds in the "blow without reed" technique.
Please help us improve OrchideaSOL by sending your feedback to:
For issues regarding the metadata encoding, the five-fold split, or the OrchideaSOL module in mirdata, please write to:
In case of a problem, please include as many details as possible.