TinySOL: an audio dataset of isolated musical notes
Version 6.0, February 2020.
Carmine-Emanuele Cella (1), Daniele Ghisi (1), Vincent Lostanlen (2), Fabien Lévy (3), Joshua Fineberg (4), Yan Maresz (5)
(1): UC Berkeley
(2): New York University
(3): Columbia University
(4): Boston University
(5): Conservatoire de Paris
TinySOL is a dataset of 2913 samples, each containing a single musical note from one of 14 different instruments:
- Bass Tuba
- French Horn
- Trumpet in C
- Clarinet in B-flat
- Alto Saxophone
These sounds were originally recorded at Ircam in Paris (France) between 1996 and 1999, as part of a larger project named Studio On Line (SOL). Although SOL contains many combinations of mutes and extended playing techniques, TinySOL consists solely of sounds played in the so-called "ordinary" style, without any mute.
TinySOL can be used for creative purposes insofar as the use complies with the Creative Commons Attribution 4.0 International license (see below).
TinySOL can be used for education and research purposes. In particular, it can be employed as a dataset for training and/or evaluating music information retrieval (MIR) systems, for tasks such as instrument recognition or fundamental frequency estimation. For this purpose, we provide an official 5-fold split of TinySOL. This split has been carefully balanced in terms of instrumentation, pitch range, and dynamics. For the sake of research reproducibility, we encourage users of TinySOL to adopt this split and report their results in terms of average performance across folds.
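As an illustration of fold-wise reporting, here is a minimal Python sketch. It assumes that each clip is represented as a (fold, item) pair, with the fold ID taken from the metadata, and that each fold is held out in turn as the test set while the other four are used for training. That protocol is one common reading of "average performance across folds", not something this document prescribes. The names split_by_fold, cross_validate, and evaluate are hypothetical and are not part of TinySOL.

```python
from statistics import mean

N_FOLDS = 5  # TinySOL fold IDs range from 0 to 4


def split_by_fold(clips, test_fold):
    """Split (fold, item) pairs into train and test lists for one held-out fold."""
    train = [item for fold, item in clips if fold != test_fold]
    test = [item for fold, item in clips if fold == test_fold]
    return train, test


def cross_validate(clips, evaluate):
    """Hold out each fold in turn; return (per-fold scores, their average).

    `evaluate(train, test)` is a user-supplied callable (hypothetical here)
    that trains a system on `train` and returns a numeric score on `test`.
    """
    scores = []
    for test_fold in range(N_FOLDS):
        train, test = split_by_fold(clips, test_fold)
        scores.append(evaluate(train, test))
    return scores, mean(scores)
```

Reporting the five per-fold scores alongside their average makes it easier to compare results across publications.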
TinySOL contains 2913 audio clips as WAV files, sampled at 44.1 kHz, with a single channel (mono), at a bit depth of 16. This is equivalent to the audio quality of a compact disc. Audio clips vary in duration between two and ten seconds.
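The format stated above can be checked with the wave module from the Python standard library. The sketch below is only an illustration: the function names and the EXPECTED dictionary are hypothetical, and the path passed in is whatever location the archive was extracted to.

```python
import wave

# Format stated in this document: 44.1 kHz, mono, 16-bit.
EXPECTED = {"sample_rate": 44100, "channels": 1, "bit_depth": 16}


def read_wav_format(path):
    """Return sample rate, channel count, bit depth, and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        return {
            "sample_rate": sample_rate,
            "channels": wav.getnchannels(),
            "bit_depth": 8 * wav.getsampwidth(),  # bytes per sample -> bits
            "duration": wav.getnframes() / sample_rate,  # in seconds
        }


def matches_tinysol_format(path):
    """Check a file against the sample rate, channel count, and bit depth above."""
    fmt = read_wav_format(path)
    return all(fmt[key] == value for key, value in EXPECTED.items())
```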
Every audio file has a file path of the form:
<FAMILY>/<INSTRUMENT>/ordinario/<INSTR>-ord-<PITCH>-<DYN>-<INSTANCE>-<MISC>.wav
- <FAMILY> corresponds to the instrument family: "Brass", "Keyboards" (includes accordion), "Strings", and "Winds" (i.e., woodwinds).
- <INSTRUMENT> is the full name of the instrument.
- "ordinario" denotes the ordinary playing technique. This is in contrast with the rest of the SOL dataset, which also encompasses extended playing techniques.
- <INSTR> is the abbreviation of the instrument.
- "ord" is the abbreviation of "ordinario".
- <PITCH> denotes the pitch of the musical note. This pitch is encoded in the American standard pitch notation: pitch class (C means "do") followed by pitch octave. According to this convention, A4 has a fundamental frequency of 440 Hz.
- <DYN> denotes the intensity dynamics, ranked from pp (pianissimo) to ff (fortissimo).
- <INSTANCE> contains additional information, when applicable. For example, for bowed string instruments, the same pitch may sometimes be achieved on different positions and different strings, resulting in small timbre differences. In this case the label "1c", "2c", "3c", or "4c" denotes the string which is being bowed. (The letter c originates from the word "corde", which means string in French.) By convention, the first string is the one with the highest pitch when played as an open string. Furthermore, on some wind instruments, the same note was played multiple times, e.g. at multiple durations. In this case, we use the label "alt1", "alt2", etc. to denote alternative instances of the note. If none of these tags apply, the <INSTANCE> field becomes "N", which stands for "Not Applicable".
- <MISC> contains additional information, if applicable. In TinySOL, some pitches were never recorded (about 1% of the whole dataset), and thus missing from the chromatic scale. In this case, the <MISC> tag contains a letter "R", to denote the fact that the corresponding WAV file has been obtained by transforming a different audio clip via some digital frequency transposition (similar to Auto-Tune). The letter "R" stands for "resampled". Furthermore, some pitches (about 20% of the whole dataset) were slightly out of tune in comparison with the A440 tuning standard. Again, we applied some digital frequency transposition to correct them and put them exactly in tune. The amount of frequency transposition is measured in "cents" of an equal-tempered semitone. The letter "T" stands for "tuned". Because we employed a high-fidelity algorithm for frequency transposition, and because the amount of digital frequency transposition is small, the timbre of pitch-corrected notes remains faithful to the instrument. If none of these tags apply, the <MISC> field becomes "N", which stands for "natural"; in this case, the note is distributed exactly as it was recorded in the studio.
For example, "Strings/Violin/ordinario/Vn-ord-D#7-mf-1c-T22d_R100u" corresponds to:
- a violin sound;
- played in the ordinary playing technique;
- at pitch D#7 (approximately 2489 Hz);
- with mezzoforte dynamics;
- on the first string;
- resampled from a D7 by raising the pitch by a semitone, i.e., 100 cents (R100u); and
- lowered by 22 cents (T22d) to match the A440 tuning standard.
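The naming scheme described above can be decoded programmatically. The following Python sketch is an illustration under stated assumptions, not an official parser: it assumes three folders followed by a file name with six hyphen-separated fields, and it assumes that tags in the <MISC> field are joined by an underscore and end in "u" (up) or "d" (down), as in the example. The names parse_misc and parse_path are hypothetical.

```python
import re

# One retuning tag: a letter (T = tuned, R = resampled), an amount in cents,
# and a direction (u = up, d = down), as in "T22d" or "R100u".
TAG_PATTERN = re.compile(r"^([TR])(\d+)([ud])$")


def parse_misc(misc):
    """Turn a <MISC> field such as "T22d_R100u" into signed shifts in cents."""
    if misc == "N":  # "natural": distributed exactly as recorded
        return {}
    shifts = {}
    for tag in misc.split("_"):
        match = TAG_PATTERN.match(tag)
        if match is None:
            raise ValueError(f"unrecognized tag: {tag!r}")
        kind, cents, direction = match.groups()
        shifts[kind] = int(cents) if direction == "u" else -int(cents)
    return shifts


def parse_path(path):
    """Split a path of the form described above into its fields."""
    *folders, name = path.split("/")
    if name.lower().endswith(".wav"):
        name = name[:-4]  # drop the file extension, if present
    family, instrument, technique = folders[-3:]
    abbreviation, _ord, pitch, dynamics, instance, misc = name.split("-")
    return {
        "family": family,
        "instrument": instrument,
        "technique": technique,
        "abbreviation": abbreviation,
        "pitch": pitch,
        "dynamics": dynamics,
        "instance": instance,
        "shifts": parse_misc(misc),
    }


info = parse_path("Strings/Violin/ordinario/Vn-ord-D#7-mf-1c-T22d_R100u")
# info["shifts"] == {"T": -22, "R": 100}
```

Splitting on hyphens is safe here because pitches are written with a sharp sign ("#") rather than a hyphen. A path that does not have exactly six fields in its file name raises a ValueError instead of being silently misread.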
The TinySOL_metadata.csv file contains 2913 rows, one for each audio clip. It can be opened with a text editor or a spreadsheet application. It contains 14 columns:
- Path to the WAV file, in UNIX filesystem format. For Windows compatibility, replace the slashes ("/") with backslashes ("\"). Ex: "Brass/BTb/BTb-ord-A#1-ff-N.wav"
- Fold ID. Either equal to 0, 1, 2, 3, or 4.
- Family. Ex: "Brass"
- Instrument abbreviation. Ex: "BTb"
- Instrument name in full. Ex: "Bass Tuba"
- Technique abbreviation. Always equal to "ord" in the case of TinySOL.
- Technique name in full. Always equal to "ordinario" in the case of TinySOL.
- Pitch. Ex: "A#1"
- Pitch ID in MIDI format. Ex: 34. Integer in the range 0-127.
- Dynamics. Ex: "ff".
- Dynamics ID. Integer. pp maps to 0 and ff maps to 4. The higher, the louder.
- Instance ID. Integer in the range 0-4.
- String ID. Equal to 1, 2, 3, 4, or empty if not applicable.
- "Needed digital retuning". TRUE if the file has been pitch-shifted with digital audio effects; FALSE otherwise.
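The relation between the "Pitch", "Pitch ID", "Dynamics", and "Dynamics ID" columns can be sketched in Python as follows. This is an illustration only. It assumes that pitches are spelled with sharps (as in "A#1" and "D#7") and that MIDI number 69 corresponds to A4, which is consistent with the "A#1" to 34 example above. The intermediate dynamics levels (p, mf, f mapped to 1, 2, 3) are an assumption, since only pp = 0 and ff = 4 are stated here. All function names are hypothetical.

```python
# Semitone offset of each natural note above C.
PITCH_CLASSES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Assumed ordering; only pp -> 0 and ff -> 4 are stated in this document.
DYNAMICS = ["pp", "p", "mf", "f", "ff"]


def pitch_to_midi(pitch):
    """Convert a pitch name such as "A#1" to a MIDI number (A4 = 69)."""
    letter, rest = pitch[0], pitch[1:]
    semitone = PITCH_CLASSES[letter]
    if rest.startswith("#"):  # sharp: one semitone higher
        semitone += 1
        rest = rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + semitone


def midi_to_hz(midi):
    """Equal-tempered frequency in Hz under the A440 standard."""
    return 440.0 * 2.0 ** ((midi - 69) / 12)


def dynamics_to_id(dynamics):
    """Map a dynamics label to an integer; the higher, the louder."""
    return DYNAMICS.index(dynamics)
```

With these definitions, pitch_to_midi("A#1") gives 34, as in the "Pitch ID" example, and midi_to_hz(pitch_to_midi("D#7")) rounds to 2489 Hz, the approximate value quoted for the violin example.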
Conditions of Use
TinySOL was created in 2020 by Carmine-Emanuele Cella, Daniele Ghisi, Vincent Lostanlen, Fabien Lévy, Joshua Fineberg, and Yan Maresz.
TinySOL is a derivative of SOL. We wish to thank Hugues Vinet, Greg Beller, and all coordinators of the Ircam Forum for their authorization to upload TinySOL to Zenodo.
TinySOL is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license:
The dataset and its contents are made available on an "as is" basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, the authors are not liable for, and expressly exclude all liability for, loss or damage however and whenever caused to anyone by any use of the TinySOL dataset or any part of it.
We encourage TinySOL users to subscribe to the Ircam Forum so that they can have access to larger versions of SOL. While downloading the full version of SOL requires a premium membership (for a yearly fee), a medium-sized version named OrchideaSOL is made available free of charge to all members. Note, however, that TinySOL is the only subset of SOL that is released under a Creative Commons license. For more information, please visit: https://forum.ircam.fr/
1.0 was released on January 31st, 2020.
2.0 and 3.0 were released the same day, after fixing an issue in the metadata related to file paths.
4.0 was released on February 7th, 2020. The file structure of the tar.gz file was simplified so as to improve the interoperability with the mirdata Python package.
5.0 was released on February 23rd, 2020. New audio samples were added (from 2478 to 2913) and more details were supplied regarding retuning.
6.0 was released the same day, after fixing an issue in the metadata related to file paths.
Please help us improve TinySOL by sending your feedback to:
For issues regarding the metadata encoding, the five-fold split, or the TinySOL module in mirdata, please write to:
In case of a problem, please include as many details as possible.