Published January 14, 2022 | Version 1.0.0
Dataset Open

ESMUC Choir Dataset

  • 1. Music Technology Group, Universitat Pompeu Fabra
  • 2. Music Technology Group, Universitat Pompeu Fabra / European Commission, Joint Research Centre

Description

The ESMUC Choir Dataset (ECD) is a multi-track dataset of Western choral music that contains individual audio recordings of 12 singers who, at the time of recording, were all undergraduate students in vocal performance at the Escola Superior de Música de Catalunya (ESMUC), the professional music school in Barcelona (Spain).

ECD is released as part of the following Ph.D. dissertation:

Helena Cuesta. Data-driven Pitch Content Description of Choral Singing Recordings. PhD thesis, Universitat Pompeu Fabra, 2022 (to appear).

The singers were unevenly distributed into Soprano, Alto, Tenor, and Bass (SATB) sections and were recorded simultaneously. Each voice was captured by a close-up microphone, and the sound of the whole choir was captured by two stereo room microphones placed at different distances from the singers. ECD comprises all audio files from the multi-track recording, together with manually corrected annotations of F0 contours and notes.


ECD includes the following pieces:

  • Die Himmel erzählen die Ehre Gottes (Op. 11, SWV 386), written by Heinrich Schütz (Germany, 1585-1672).
  • Der Greis, written by Franz Joseph Haydn (Austria, 1732-1809).
  • Seele Christi, heilige mich, written by Anton Heiller (Austria, 1923-1979).

All singer tracks are mono audio files and the two room-microphone recordings are stereo; all files use a sampling rate of 44 100 Hz. In total, the dataset contains roughly 31 minutes of accumulated audio.
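
To check a local copy against these specifications, a minimal sketch using Python's standard-library `wave` module is shown below. It assumes the audio files are uncompressed PCM WAV files (the file format is not stated on this page), and the function name `check_track` is hypothetical.

```python
import wave

EXPECTED_RATE = 44100  # sampling rate stated for all ECD audio


def check_track(path: str, expected_channels: int) -> float:
    """Check sample rate and channel count of a WAV file; return its duration in seconds.

    Assumption: the track is a PCM WAV file readable by the `wave` module.
    Use expected_channels=1 for singer tracks and 2 for room-microphone tracks.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        n_frames = wav.getnframes()
    if rate != EXPECTED_RATE:
        raise ValueError(f"{path}: expected {EXPECTED_RATE} Hz, got {rate} Hz")
    if channels != expected_channels:
        raise ValueError(f"{path}: expected {expected_channels} channel(s), got {channels}")
    return n_frames / rate
```

Summing the returned durations over all tracks gives an accumulated duration that can be compared with the figure above.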

ECD contains three songs, two of which were recorded in shorter parts, as well as some brief vocal warm-up exercises. For each song, the dataset presents three modalities:

  1. Full takes (FT), where the song (or song part) is performed from beginning to end by the entire choir.
  2. Isolated sections (IS), where parts of the song are performed by some sections in isolation while the other sections remain silent.
  3. Short excerpts (SE): short passages of the songs performed by the full choir, mostly to practice challenging parts.

Songs and warm-up exercises are organized into numbered takes, e.g., take1 or take3. In the IS setting, filenames refer to the choir section. Similarly, short passages are indicated by SE and the passage number. Finally, each singer is denoted by S/A/T/B and a number; e.g., T3 refers to the third tenor.
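
The naming scheme can be handled programmatically. The sketch below assumes per-singer filenames of the form `<piece>_<modality>_take<N>_<singer>.<ext>`, a pattern inferred only from the corrected filename `SC1_FT_take3_S1.f0` given under Known issues; the dataset README is the authoritative reference, and files that do not follow this pattern (for instance, IS or SE files) simply return `None`. The names `parse_singer`, `parse_filename`, and `TrackInfo` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

SECTIONS = {"S": "Soprano", "A": "Alto", "T": "Tenor", "B": "Bass"}

# Assumed pattern, inferred from e.g. "SC1_FT_take3_S1.f0"; check the README.
FILENAME_RE = re.compile(
    r"^(?P<piece>[A-Za-z0-9]+)_(?P<modality>[A-Z]+)_take(?P<take>\d+)"
    r"_(?P<singer>[SATB]\d+)\.(?P<ext>\w+)$"
)


class TrackInfo(NamedTuple):
    piece: str
    modality: str
    take: int
    section: str
    singer_number: int
    extension: str


def parse_singer(code: str) -> tuple[str, int]:
    """Split a singer code such as 'T3' into ('Tenor', 3)."""
    return SECTIONS[code[0]], int(code[1:])


def parse_filename(name: str) -> Optional[TrackInfo]:
    """Parse a per-singer filename; return None if it does not match the assumed pattern."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    section, number = parse_singer(m["singer"])
    return TrackInfo(m["piece"], m["modality"], int(m["take"]), section, number, m["ext"])
```

Returning `None` instead of raising makes it easy to filter a directory listing down to the files the assumed pattern covers.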

All audio tracks in the dataset, except the room-microphone recordings, have two associated annotation files: one with the F0 contour and one with the note annotations. Tracks from the warm-up exercises have only F0 contours, since they have no associated score.
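
A minimal sketch for reading an F0 annotation and converting it to MIDI pitch is shown below. The file layout is an assumption not documented on this page: it supposes each line holds a timestamp in seconds and a frequency in Hz, separated by a comma or whitespace, with 0 Hz marking unvoiced frames. `load_f0`, `hz_to_midi`, and `voiced_midi` are hypothetical helpers.

```python
import math
import re
from typing import Iterable


def load_f0(lines: Iterable[str]) -> list[tuple[float, float]]:
    """Parse (time_s, f0_hz) pairs from text lines.

    Assumed layout: two numeric columns separated by a comma and/or whitespace.
    Blank lines and lines whose first two fields are not numeric (e.g. a header) are skipped.
    """
    contour = []
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) < 2:
            continue
        try:
            contour.append((float(fields[0]), float(fields[1])))
        except ValueError:
            continue  # header or malformed line
    return contour


def hz_to_midi(f0_hz: float) -> float:
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def voiced_midi(contour: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep voiced frames only (f0 > 0, an assumed convention) and express pitch in MIDI units."""
    return [(t, hz_to_midi(f)) for t, f in contour if f > 0]
```

Because `load_f0` accepts any iterable of lines, an open file handle can be passed to it directly.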

A README file accompanies the dataset with specific information about the filenames.

All dataset files and the README are compressed in the provided zip file.

Special cases:

  • Die Himmel has two soprano parts and two tenor parts. In the dataset:
    • Soprano part 1 is sung by S1 and S2
    • Soprano part 2 is sung by S3, S4, and S5
    • Tenor part 1 is sung by T1 and T2
    • Tenor part 2 is sung by T3
  • In some takes, singers were not grouped by section but mixed (i.e., randomly distributed). These takes are:
    • DG_take3
    • DG_take4
    • SC3_take2
    • SC2_take3
    • SC1_take3
    • SC1_short1
    • SC1_short2
    • SC3_short6
    • SC3_short7
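
When aggregating tracks by voice part or by singer placement, these special cases can be encoded as lookup tables. The sketch below transcribes the lists above into Python; the take identifiers are copied as they appear here and may need adapting to the exact filename conventions in the README. The names `DIE_HIMMEL_PARTS`, `MIXED_POSITION_TAKES`, `himmel_part`, and `is_mixed_position` are hypothetical.

```python
# Voice-part assignment for Die Himmel, transcribed from the list above.
# Only sopranos and tenors are split; other singers sing their section's single part.
DIE_HIMMEL_PARTS = {
    "S1": "Soprano 1", "S2": "Soprano 1",
    "S3": "Soprano 2", "S4": "Soprano 2", "S5": "Soprano 2",
    "T1": "Tenor 1", "T2": "Tenor 1",
    "T3": "Tenor 2",
}

# Takes in which singers were not grouped by section.
MIXED_POSITION_TAKES = frozenset({
    "DG_take3", "DG_take4",
    "SC3_take2", "SC2_take3", "SC1_take3",
    "SC1_short1", "SC1_short2",
    "SC3_short6", "SC3_short7",
})

SECTION_NAMES = {"S": "Soprano", "A": "Alto", "T": "Tenor", "B": "Bass"}


def himmel_part(singer: str) -> str:
    """Return the voice part sung by a singer (e.g. 'T3') in Die Himmel."""
    return DIE_HIMMEL_PARTS.get(singer, SECTION_NAMES[singer[0]])


def is_mixed_position(take_id: str) -> bool:
    """Return True if singers were mixed across sections in this take (e.g. 'DG_take3')."""
    return take_id in MIXED_POSITION_TAKES
```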

Known issues:

  • The annotation file SC1_FT_take3S1.f0 has an error in its filename: it should be named SC1_FT_take3_S1.f0.
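
A local copy can be patched before processing. The sketch below renames the misnamed annotation file wherever it appears under the extracted dataset directory, using `pathlib`; `fix_known_filenames`, `KNOWN_RENAMES`, and the `root` argument are hypothetical, and only the single issue listed above is handled.

```python
from pathlib import Path

# Known filename errors in v1.0.0: wrong name -> correct name.
KNOWN_RENAMES = {"SC1_FT_take3S1.f0": "SC1_FT_take3_S1.f0"}


def fix_known_filenames(root: str) -> list[Path]:
    """Rename known misnamed files anywhere under `root`; return the new paths.

    A rename is skipped if a file with the correct name already exists next to it.
    """
    renamed = []
    for wrong, right in KNOWN_RENAMES.items():
        # Materialize the matches first so renaming does not disturb the directory walk.
        for path in list(Path(root).rglob(wrong)):
            target = path.with_name(right)
            if target.exists():
                continue
            renamed.append(path.rename(target))
    return renamed
```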

Files

EsmucChoirDataset_v1.0.0.zip (2.3 GB)
md5:ba2b4b5c4326dbe0a6d391167fa30574
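
To confirm that the download is complete, the archive's MD5 checksum can be compared with the value listed above. A minimal sketch using Python's `hashlib`, reading the 2.3 GB file in chunks so it is never loaded into memory at once; the names `md5sum` and `EXPECTED_MD5` are hypothetical.

```python
import hashlib

EXPECTED_MD5 = "ba2b4b5c4326dbe0a6d391167fa30574"  # value listed for the zip archive


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5sum("EsmucChoirDataset_v1.0.0.zip") == EXPECTED_MD5` should be true for an intact download; a mismatch suggests the archive should be downloaded again.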

Additional details

Funding

TROMPA – Towards Richer Online Music Public-domain Archives (grant 770376), European Commission