Version 6.0, February 2020.
Carmine-Emanuele Cella (1), Daniele Ghisi (1), Vincent Lostanlen (2), Fabien Lévy (3), Joshua Fineberg (4), Yan Maresz (5)
(1): UC Berkeley
(2): New York University
(3): Columbia University
(4): Boston University
(5): Conservatoire de Paris
TinySOL is a dataset of 2913 samples, each containing a single musical note from one of 14 different instruments:
These sounds were originally recorded at Ircam in Paris (France) between 1996 and 1999, as part of a larger project named Studio On Line (SOL). Although SOL contains many combinations of mutes and extended playing techniques, TinySOL consists only of sounds played in the so-called "ordinary" style, without a mute.
TinySOL can be used for creative purposes as long as such use complies with the Creative Commons Attribution 4.0 International license (see below).
TinySOL can be used for education and research purposes. In particular, it can be employed as a dataset for training and/or evaluating music information retrieval (MIR) systems, for tasks such as instrument recognition or fundamental frequency estimation. For this purpose, we provide an official 5-fold split of TinySOL. This split has been carefully balanced in terms of instrumentation, pitch range, and dynamics. For the sake of research reproducibility, we encourage users of TinySOL to adopt this split and report their results in terms of average performance across folds.
TinySOL contains 2913 audio clips as WAV files, sampled at 44.1 kHz, in a single channel (mono), with a bit depth of 16 bits. This matches the sample rate and bit depth of compact disc audio. Audio clips range from two to ten seconds in duration.
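The stated format can be checked with Python's standard-library `wave` module. This sketch rejects any clip that is not 44.1 kHz, mono, 16-bit, and returns its duration in seconds (number of frames divided by sample rate), which should fall between two and ten seconds. In this format the raw data rate is 44,100 × 2 bytes = 88,200 bytes per second, so a ten-second clip holds about 882 kB of samples. The function name `check_tinysol_format` is hypothetical and not part of any TinySOL tooling.

```python
import wave


def check_tinysol_format(path_or_file) -> float:
    """Check a clip against the stated format and return its duration in seconds.

    Expected: 44.1 kHz sample rate, one channel, 16-bit samples (2 bytes each).
    Accepts a file path or a binary file-like object.
    """
    with wave.open(path_or_file, "rb") as w:
        if w.getframerate() != 44100:
            raise ValueError(f"unexpected sample rate: {w.getframerate()} Hz")
        if w.getnchannels() != 1:
            raise ValueError(f"expected mono, got {w.getnchannels()} channels")
        if w.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * w.getsampwidth()}-bit")
        # One frame = one sample per channel, so frames / rate = seconds.
        return w.getnframes() / w.getframerate()
```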
Every audio file has a file path of the form:
For example, "Strings/Violin/ordinario/Vn-ord-D#7-mf-1c-T22d_R100u" corresponds to:
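The following Python sketch splits such a path into its fields. Its layout is inferred from this single example only: three directory levels (family, instrument, technique) followed by a file name whose first four hyphen-separated fields are the instrument abbreviation ("Vn"), technique abbreviation ("ord"), pitch ("D#7"), and dynamics ("mf"). Any remaining fields (here "1c" and "T22d_R100u") are kept uninterpreted. The pitch conversion assumes sharps-only note names with C4 = MIDI 60, and `midi_to_hz` assumes 12-tone equal temperament with A4 = 440 Hz; the tuning reference of the recordings is not stated in this document. All function names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Note names such as "D#7": letter, optional sharp, octave number.
# Assumption: flats (e.g. "Eb7") do not occur in file names.
_PITCH_RE = re.compile(r"^([A-G])(#?)(\d+)$")
_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_to_midi(name: str) -> int:
    """Convert a note name to a MIDI note number, with C4 = 60."""
    m = _PITCH_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized pitch name: {name!r}")
    letter, sharp, octave = m.groups()
    return 12 * (int(octave) + 1) + _SEMITONES[letter] + (1 if sharp else 0)


def midi_to_hz(midi: int, a4: float = 440.0) -> float:
    """Nominal equal-tempered frequency; A4 = 440 Hz is an assumption."""
    return a4 * 2.0 ** ((midi - 69) / 12)


def parse_tinysol_path(path: str) -> dict:
    """Split a TinySOL-style path into fields (layout inferred from one example)."""
    p = PurePosixPath(path)
    if len(p.parts) < 4:
        raise ValueError(f"expected Family/Instrument/Technique/Name, got {path!r}")
    # The last three directories; tolerates an extra leading root folder.
    family, instrument, technique = p.parts[-4:-1]
    # .stem drops a trailing ".wav" if present.
    fields = p.stem.split("-")
    if len(fields) < 4:
        raise ValueError(f"too few fields in file name: {p.name!r}")
    return {
        "family": family,
        "instrument": instrument,
        "technique": technique,
        "instrument_abbr": fields[0],
        "technique_abbr": fields[1],
        "pitch": fields[2],
        "dynamics": fields[3],
        "extra": fields[4:],  # e.g. "1c", "T22d_R100u": left uninterpreted
        "midi": pitch_to_midi(fields[2]),
    }
```

For the example path, the pitch field "D#7" maps to MIDI note 99, a nominal frequency of about 2489 Hz under the A4 = 440 Hz assumption.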
The TinySOL_metadata.csv file contains 2913 rows, one for each audio clip. It can be opened with a text editor or a spreadsheet application. It contains 13 columns:
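The metadata file can be loaded with Python's standard `csv` module. This is a minimal sketch: the column names `Path` and `Fold` below are hypothetical and should be checked against the actual header row of TinySOL_metadata.csv. The loader verifies that those columns exist and that the row count matches the 2913 clips, and `fold_sizes` counts clips per fold (fold labels are assumed to be integers).

```python
import csv
from collections import Counter
from typing import Iterable, List, Optional

# Hypothetical column names: check them against the header of TinySOL_metadata.csv.
PATH_COLUMN = "Path"
FOLD_COLUMN = "Fold"


def load_metadata(lines: Iterable[str], expected_rows: Optional[int] = 2913) -> List[dict]:
    """Parse the metadata CSV (one row per audio clip) into a list of dicts."""
    reader = csv.DictReader(lines)  # the first line is taken as the header
    missing = {PATH_COLUMN, FOLD_COLUMN} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")
    rows = list(reader)
    if expected_rows is not None and len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
    return rows


def fold_sizes(rows: List[dict]) -> Counter:
    """Count how many clips fall into each fold (labels assumed to be integers)."""
    return Counter(int(row[FOLD_COLUMN]) for row in rows)
```

A typical call opens the file with `open("TinySOL_metadata.csv", newline="", encoding="utf-8")` (UTF-8 is an assumption) and passes the file object to `load_metadata`; `newline=""` is what the `csv` module documentation recommends for files read by a CSV reader.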
Conditions of Use
TinySOL was created in 2020 by Carmine-Emanuele Cella, Daniele Ghisi, Vincent Lostanlen, Fabien Lévy, Joshua Fineberg, and Yan Maresz.
TinySOL is a derivative of SOL. We wish to thank Hugues Vinet, Greg Beller, and all coordinators of the Ircam Forum for their authorization to upload TinySOL to Zenodo.
TinySOL is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license:
The dataset and its contents are made available on an "as is" basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, the authors are not liable for, and expressly exclude all liability for, loss or damage however and whenever caused to anyone by any use of the TinySOL dataset or any part of it.
We encourage TinySOL users to subscribe to the Ircam Forum so that they can access larger versions of SOL. While downloading the full version of SOL requires a premium membership (for a yearly fee), a medium-sized version named OrchideaSOL is available free of charge to all members. Note, however, that TinySOL is the only subset of SOL released under a Creative Commons license. For more information, please visit: https://forum.ircam.fr/
1.0 was released on January 31st, 2020.
2.0 and 3.0 were released the same day, after fixing an issue in the metadata related to file paths.
4.0 was released on February 7th, 2020. The file structure of the tar.gz file was simplified to improve interoperability with the mirdata Python package.
5.0 was released on February 23rd, 2020. New audio samples were added (increasing the total from 2478 to 2913), and more details were supplied regarding retuning.
6.0 was released the same day, after fixing an issue in the metadata related to file paths.
Please help us improve TinySOL by sending your feedback to:
For issues regarding the metadata encoding, the five-fold split, or the TinySOL module in mirdata, please write to:
In case of a problem, please include as many details as possible.