Dataset Open Access

# OrchideaSOL: an audio dataset of isolated musical notes, including mutes and extended playing techniques

Carmine-Emanuele Cella; Daniele Ghisi; Vincent Lostanlen; Fabien Lévy; Joshua Fineberg; Yan Maresz

<dc:description>OrchideaSOL
==========
Version 1.0, February 2020.

Created By
--------------

Carmine-Emanuele Cella (1), Daniele Ghisi (1), Vincent Lostanlen (2), Fabien Lévy (3), Joshua Fineberg (4), Yan Maresz (5)

(1): UC Berkeley
(2): New York University
(3): Columbia University
(4): Boston University
(5): Conservatoire de Paris

Description
---------------

OrchideaSOL is a dataset of 13265 samples, each containing a single musical note from one of 14 different instruments:

Bass Tuba
French Horn
Trombone
Trumpet in C
Accordion
Contrabass
Violin
Viola
Violoncello
Bassoon
Clarinet in B-flat
Flute
Oboe
Alto Saxophone

These sounds were originally recorded at Ircam in Paris (France) between 1996 and 1999, as part of a larger project named Studio On Line (SOL). One asset of OrchideaSOL is that it contains many combinations of mutes and extended playing techniques.

The OrchideaSOL audio data can be used for creative purposes insofar as the use complies with the Ircam Forum License. Please visit: https://forum.ircam.fr/legal/contrat-de-licence-forum-ircam/

The OrchideaSOL metadata can be used for creative purposes insofar as the use complies with the Creative Commons Attribution 4.0 International license (see below).

OrchideaSOL can be used for education and research purposes. In particular, it can be employed as a dataset for training and/or evaluating music information retrieval (MIR) systems, for tasks such as instrument recognition, playing technique recognition, or fundamental frequency estimation. For this purpose, we provide an official 5-fold split of OrchideaSOL. This split has been carefully balanced in terms of instrumentation, pitch range, and dynamics. For the sake of research reproducibility, we encourage users of OrchideaSOL to adopt this split and report their results in terms of average performance across folds.
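
As a minimal sketch of the protocol described above, the following Python code holds out each fold once and averages the resulting scores. The `records` list, the `evaluate` callback, and the toy clip names are hypothetical placeholders, not part of OrchideaSOL; in practice the fold IDs would come from the metadata file.

```python
from statistics import mean

def cross_validate(records, evaluate, n_folds=5):
    """Hold out each fold once; return (per-fold scores, average score).

    `records` is a list of (path, fold_id) pairs. `evaluate` is a
    user-supplied function (hypothetical here) that receives the training
    paths and the test paths and returns a numeric score.
    """
    scores = []
    for held_out in range(n_folds):
        train = [path for path, fold in records if fold != held_out]
        test = [path for path, fold in records if fold == held_out]
        scores.append(evaluate(train, test))
    return scores, mean(scores)

# Toy stand-in data: ten made-up clip names spread over folds 0 to 4.
records = [("clip%d.wav" % i, i % 5) for i in range(10)]

# Dummy evaluation: the fraction of all clips that fall in the test set.
def dummy_evaluate(train, test):
    return len(test) / (len(train) + len(test))

fold_scores, average = cross_validate(records, dummy_evaluate)
print(fold_scores, average)
```

Reporting both the per-fold scores and their average makes it easier to compare results across studies that use the same split.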

Data Files
--------------

OrchideaSOL contains 13265 audio clips as WAV files, sampled at 44.1 kHz, with a single channel (mono), at a bit depth of 16 bits. The sample rate and bit depth match those of compact disc audio. Audio clips vary in duration between two and ten seconds.
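
The format stated above can be checked with the `wave` module of the Python standard library. The sketch below is illustrative only: the helper name `wav_format` is hypothetical, and the clip it inspects is two seconds of silence generated in memory rather than a file from the dataset.

```python
import io
import wave

def wav_format(wav_file):
    """Return (sample rate in Hz, channels, bit depth, duration in seconds)."""
    with wave.open(wav_file, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        bits = 8 * wav.getsampwidth()  # sample width is given in bytes
        duration = wav.getnframes() / rate
    return rate, channels, bits, duration

# Stand-in clip: two seconds of 16-bit mono silence at 44.1 kHz.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00" * 44100 * 2)
buffer.seek(0)

rate, channels, bits, duration = wav_format(buffer)
print(rate, channels, bits, duration)
```

The same function accepts a file path in place of the in-memory buffer.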

Every audio file has a file path of the form:
&lt;FAMILY&gt;/&lt;INSTRUMENT&gt;&lt;+MUTE&gt;/&lt;TECHNIQUE&gt;/&lt;INSTR&gt;&lt;+M&gt;-&lt;TECH&gt;-&lt;PITCH&gt;-&lt;DYN&gt;-&lt;INSTANCE&gt;-&lt;MISC&gt;.wav

where:

&lt;FAMILY&gt; corresponds to the instrument family: "Brass", "Keyboards" (includes accordion), "Strings", and "Winds" (i.e., woodwinds).
&lt;INSTRUMENT&gt; is the full name of the instrument.
&lt;+MUTE&gt; is the type of mute being used, such as "wah", "harmon", "piombo", or "sordina". If there is no mute, this field is absent.
&lt;TECHNIQUE&gt; is the type of playing technique.
&lt;INSTR&gt; is the abbreviation of the instrument.
&lt;+M&gt; is the abbreviation of the type of mute, if applicable.
&lt;TECH&gt; is the abbreviation of playing technique.
&lt;PITCH&gt; denotes the pitch of the musical note. This pitch is encoded in the American standard pitch notation: pitch class (C means "do") followed by pitch octave. According to this convention, A4 has a fundamental frequency of 440 Hz.
&lt;DYN&gt; denotes the intensity dynamics, ranked from pp (pianissimo) to ff (fortissimo).
&lt;INSTANCE&gt; contains additional information, when applicable. For example, for bowed string instruments, the same pitch may sometimes be achieved on different positions and different strings, resulting in small timbre differences. In this case the label "1c", "2c", "3c", or "4c" denotes the string which is being bowed. (The letter c originates from the word "corde", which means string in French.) By convention, the first string is the one with the highest pitch when played as an open string. Furthermore, on some wind instruments, the same note was played multiple times, e.g. at multiple durations. In this case, we use the label "alt1", "alt2", etc. to denote alternative instances of the note. If none of these tags apply, the &lt;INSTANCE&gt; field becomes "N", which stands for "Not Applicable".
&lt;MISC&gt; contains additional information, if applicable. In OrchideaSOL, some pitches were never recorded, and are thus missing from the chromatic scale. In this case, the &lt;MISC&gt; tag contains a letter "R", to denote the fact that the corresponding WAV file has been obtained by transforming a different audio clip via digital frequency transposition (similar to Auto-Tune). The letter "R" stands for "resampled". Furthermore, some pitches were slightly out of tune in comparison with the A440 tuning standard. Again, we applied digital frequency transposition to put them exactly in tune; in this case, the &lt;MISC&gt; tag contains a letter "T", which stands for "tuned". For both tags, the amount of frequency transposition is measured in cents, i.e. hundredths of an equal-tempered semitone. Because we employed a high-fidelity algorithm for frequency transposition, and because the amount of digital frequency transposition is small, the timbre of pitch-corrected notes remains faithful to the instrument. If none of these tags apply, the &lt;MISC&gt; field becomes "N", which stands for "natural"; in this case, the note is distributed exactly as it was recorded in the studio.
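
The pitch and cent conventions above can be made concrete with a short Python sketch. It assumes that sharps are written with "#" (as in "A#1") and does not handle flats; the function names are hypothetical. It relies on the standard MIDI convention in which A4 is note 69, and on the fact that a transposition of c cents multiplies frequency by 2^(c/1200).

```python
import re

# Semitone offsets of the natural notes above C.
NATURAL_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def pitch_to_midi(pitch):
    """Convert a pitch name such as "A4" or "A#1" to a MIDI note number."""
    match = re.fullmatch(r"([A-G])(#?)(-?\d+)", pitch)
    if match is None:
        raise ValueError("unrecognized pitch name: %r" % pitch)
    letter, sharp, octave = match.groups()
    # Octave -1 starts at MIDI note 0, so octave 4 starts at note 60.
    return 12 * (int(octave) + 1) + NATURAL_OFFSETS[letter] + len(sharp)

def midi_to_hertz(midi, a4_hertz=440.0):
    """Fundamental frequency in equal temperament, with A4 = 440 Hz."""
    return a4_hertz * 2.0 ** ((midi - 69) / 12)

def cents_to_ratio(cents):
    """Frequency ratio of a transposition; negative cents lower the pitch."""
    return 2.0 ** (cents / 1200)

print(pitch_to_midi("A4"), midi_to_hertz(pitch_to_midi("A4")))
print(pitch_to_midi("A#1"))
# Lowering A#4 by 100 cents (one semitone) lands on A4, close to 440 Hz.
print(midi_to_hertz(pitch_to_midi("A#4")) * cents_to_ratio(-100))
```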

For example, "Strings/Violin+sordina/tremolo/Vn+S-trem-A4-mf-4c-T13d_R200d.wav" corresponds to:

a violin sound;
equipped with a sordina mute;
played in the tremolo playing technique;
at pitch A4 (440 Hz);
with mezzoforte dynamics;
on the fourth string (i.e. the lowest);
resampled from a B4 by lowering pitch by two semitones, i.e. 200 cents (R200d);
lowered by 13 cents (T13d) to match the A440 tuning standard.
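
The breakdown above can be automated. The following Python sketch splits a path into its fields; the function name and dictionary keys are hypothetical. It assumes that no field contains a hyphen of its own, and that multiple tags in the last field are joined by an underscore, as the example path suggests.

```python
from pathlib import PurePosixPath

def parse_orchideasol_path(path):
    """Split an OrchideaSOL-style path into its labeled fields."""
    family, instrument_dir, technique, filename = PurePosixPath(path).parts
    # The mute, if any, follows a "+" in the instrument directory name.
    instrument, _, mute = instrument_dir.partition("+")
    stem = PurePosixPath(filename).stem
    # Assumption: fields are separated by hyphens and contain none themselves.
    instr, tech, pitch, dyn, instance, misc = stem.split("-")
    instr_abbr, _, mute_abbr = instr.partition("+")
    return {
        "family": family,
        "instrument": instrument,
        "mute": mute or None,
        "technique": technique,
        "instrument_abbr": instr_abbr,
        "mute_abbr": mute_abbr or None,
        "technique_abbr": tech,
        "pitch": pitch,
        "dynamics": dyn,
        "instance": None if instance == "N" else instance,
        # Assumption: several tags are joined by "_"; "N" means none.
        "misc": [] if misc == "N" else misc.split("_"),
    }

fields = parse_orchideasol_path(
    "Strings/Violin+sordina/tremolo/Vn+S-trem-A4-mf-4c-T13d_R200d.wav"
)
print(fields)
```

A path that does not have exactly four components, or a file name that does not have exactly six hyphen-separated fields, raises a `ValueError` during unpacking, which helps to surface names that break the assumed pattern.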

-------------------

The OrchideaSOL_metadata.csv file contains 13265 rows, one for each audio clip. It can be opened with a text editor or a spreadsheet application. It contains 14 columns:

Path to the WAV file, in UNIX filesystem format. For Windows compatibility, replace the slashes ("/") by backslashes ("\"). Ex: "Strings/Violin+sordina/tremolo/Vn+S-trem-A4-mf-4c-T13d_R200d.wav"
Fold ID. Either equal to 0, 1, 2, 3, or 4.
Family. Ex: "Brass"
Instrument abbreviation. Ex: "BTb"
Instrument name in full. Ex: "Bass Tuba"
Technique abbreviation.
Technique name in full.
Pitch. Ex: "A#1"
Pitch ID in MIDI format. Ex: 34. Integer in the range 0-127.
Dynamics. Ex: "ff".
Dynamics ID. Integer. pp maps to 0 and ff maps to 4. The higher, the louder.
Instance ID. Integer in the range 0-4.
String ID. Equal to 1, 2, 3, 4, or empty if not applicable.
"Needed digital retuning". TRUE if the file has been pitch-shifted with digital audio effects; FALSE otherwise.
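
A few of these columns can be read with the `csv` module of the Python standard library, as in the sketch below. The column names, the second row, and the fold values are made up for illustration, since the actual header of OrchideaSOL_metadata.csv is not reproduced here. The intermediate dynamics levels (p, mf, f mapping to 1, 2, 3) are likewise an assumption; only pp and ff are specified above.

```python
import csv
import io
from pathlib import PureWindowsPath

# Toy excerpt with hypothetical column names; the real header may differ.
SAMPLE_CSV = """path,fold,dynamics,string_id,needed_retuning
Strings/Violin+sordina/tremolo/Vn+S-trem-A4-mf-4c-T13d_R200d.wav,0,mf,4,TRUE
Brass/Bass_Tuba/ordinario/BTb-ord-A#1-ff-N-N.wav,1,ff,,FALSE
"""

# pp and ff follow the description above; the levels in between are assumed.
DYNAMICS_ID = {"pp": 0, "p": 1, "mf": 2, "f": 3, "ff": 4}

def load_rows(csv_text):
    """Parse the CSV text and convert each field to a convenient type."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "path": row["path"],
            # PureWindowsPath renders the separators as backslashes.
            "windows_path": str(PureWindowsPath(row["path"])),
            "fold": int(row["fold"]),
            "dynamics_id": DYNAMICS_ID[row["dynamics"]],
            # An empty string ID means "not applicable".
            "string_id": int(row["string_id"]) if row["string_id"] else None,
            "needed_retuning": row["needed_retuning"] == "TRUE",
        })
    return rows

rows = load_rows(SAMPLE_CSV)
for row in rows:
    print(row)
```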

Conditions of Use
------------------------

OrchideaSOL was created in 2020 by Carmine-Emanuele Cella, Daniele Ghisi, Vincent Lostanlen, Fabien Lévy, Joshua Fineberg, and Yan Maresz.

OrchideaSOL is a derivative of SOL. We wish to thank Hugues Vinet, Greg Beller, and all coordinators of the Ircam Forum for their authorization to upload the metadata of OrchideaSOL to Zenodo.

The audio samples in OrchideaSOL are offered free of charge under the Ircam Forum License. Please visit: https://forum.ircam.fr/legal/contrat-de-licence-forum-ircam/

The dataset and its contents are made available on an "as is" basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, the authors are not liable for, and expressly exclude all liability for, loss or damage however and whenever caused to anyone by any use of the OrchideaSOL dataset or any part of it.

Versions
-----------
1.0 was released on February 24th, 2020.

Feedback
-------------

For general feedback on OrchideaSOL, please write to:
carmine.cella@berkeley.edu

For issues regarding the metadata encoding, the five-fold split, or the OrchideaSOL module in mirdata, please write to:
vincent.lostanlen@nyu.edu

In case of a problem, please include as many details as possible.
