Published August 30, 2021 | Version v1
Conference paper Open

Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit

  • 1. Unit of Computing Sciences, Tampere University, Finland
  • 2. Department of Clinical Medicine, University of Turku, Finland
  • 3. Unit of Computing Sciences, Tampere University, Finland & Department of Signal Processing and Acoustics, Aalto University, Finland


Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As part of this research, hundreds of hours of daylong recordings from preterm infants’ audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However, there are no emotion labels or existing in-domain SER systems to be used for this purpose. In this paper, we introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques for deploying a SER system to a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning on the task. We show that the best-performing models achieve an unweighted average recall (UAR) of 73.4% for binary classification of valence and 73.2% for binary classification of arousal. The results also show that active learning achieves the most consistent performance of the three approaches.
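For readers unfamiliar with the metric, UAR is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed only for classes that occur in y_true.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["negative"] * 8 + ["positive"] * 2
y_pred = ["negative"] * 8 + ["negative", "positive"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case the majority class is recalled perfectly and the minority class only half of the time, so UAR is lower than plain accuracy. This is why UAR is commonly preferred when class distributions are skewed.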


OR was funded by Academy of Finland grant no. 314602, and KD was partially funded by EU Horizon-2020 grant no. 957337 MARVEL. SAB was funded by Academy of Finland grant no. 332962. The authors thank the APPLE consortium for their help with the project.




Additional details


  • Academy of Finland, grant no. 314602: Computational basis of contextually grounded language acquisition in humans and machines
  • European Commission, grant no. 957337: MARVEL – Multimodal Extreme Scale Data Analytics for Smart Cities Environments
  • Academy of Finland, grant no. 332962: The change mechanisms of the Close Collaboration with Parents intervention and the adaptability of the intervention into the Australian neonatal care