Published March 28, 2022 | Version v1
Thesis Open

Data-driven Pitch Content Description of Choral Singing Recordings




Ensemble singing is a well-established practice across cultures, found in a great diversity of forms, languages, and levels. However, it has not been widely studied in the field of Music Information Retrieval (MIR), likely due to the lack of appropriate data. In this dissertation, we first address this data scarcity by building new open, multi-track datasets of ensemble singing. We then address three main research problems, all in the context of four-part vocal ensembles: multiple F0 estimation and streaming, voice assignment, and the characterization of vocal unisons. The first contribution of this thesis is the development and release of four multi-track datasets of vocal ensembles: Choral Singing Dataset, Dagstuhl ChoirSet, ESMUC Choir Dataset, and Cantoría Dataset, each comprising audio recordings and accompanying annotations. The second contribution is a set of deep learning models for multiple F0 estimation, streaming, and voice assignment of vocal quartets, mainly based on convolutional neural networks whose design draws on music domain knowledge. Finally, we propose two methods to characterize vocal unison performances in terms of pitch dispersion.


Cite as:
Helena Cuesta (2022). Data-Driven Pitch Content Description of Choral Singing Recordings. PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain.


This thesis is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Primary funding for this thesis was provided by AGAUR (Generalitat de Catalunya) through the FI grant for the recruitment of early stage research staff (2018FI_B_01015), by the European Commission under the TROMPA project (H2020 770376), and by the Ministry of Science and Innovation of the Spanish Government under the MusicalAI project (PID2019-111403GB-I00).


