Published October 5, 2020 | Version v1
Conference paper | Open Access

Transfer learning from speech to music: towards language-sensitive emotion recognition models

  • 1. Universitat Pompeu Fabra
  • 2. Social and Cognitive Computing Department, A*STAR, Singapore
  • 3. European Commission, Joint Research Centre; Universitat Pompeu Fabra

Description

In this study, we address emotion recognition using unsupervised feature learning from speech data and test its transferability to music. Our approach is to pre-train models on speech in English and Mandarin, and then fine-tune them on excerpts of music labeled with emotion categories.
Our initial hypothesis is that features automatically learned from speech should be transferable to music. Specifically, we expect the intra-linguistic setting (e.g., pre-training on speech in English and fine-tuning on music in English) to outperform the cross-linguistic setting (e.g., pre-training on speech in English and fine-tuning on music in Mandarin). Our results confirm previous research on cross-domain transferability and encourage further research towards language-sensitive Music Emotion Recognition (MER) models.
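As a concrete illustration of this two-stage setup, here is a minimal PyTorch sketch: an autoencoder is pre-trained to reconstruct speech spectrograms (unsupervised feature learning), and its encoder is then reused as the front end of an emotion classifier fine-tuned on labeled music. The architecture, input shapes, learning rates, and the four-category label set are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

# --- Stage 1: unsupervised pre-training on speech ---------------------------
# Encoder/decoder over (1, 128, 128) mel-spectrogram patches (shapes assumed).
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # -> (16, 64, 64)
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # -> (32, 32, 32)
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # -> (16, 64, 64)
    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),               # -> (1, 128, 128)
)
autoencoder = nn.Sequential(encoder, decoder)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
recon_loss = nn.MSELoss()

speech_batch = torch.randn(8, 1, 128, 128)  # stand-in for unlabeled speech spectrograms
for _ in range(10):  # training loop sketch
    opt.zero_grad()
    loss = recon_loss(autoencoder(speech_batch), speech_batch)
    loss.backward()
    opt.step()

# --- Stage 2: supervised fine-tuning on emotion-labeled music ---------------
NUM_EMOTIONS = 4  # assumed number of emotion categories
classifier = nn.Sequential(
    encoder,                                  # pre-trained speech features, now fine-tuned
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),    # (B, 32)
    nn.Linear(32, NUM_EMOTIONS),
)
opt = torch.optim.Adam(classifier.parameters(), lr=1e-4)  # smaller LR for fine-tuning
ce_loss = nn.CrossEntropyLoss()

music_batch = torch.randn(8, 1, 128, 128)            # stand-in for music spectrograms
music_labels = torch.randint(0, NUM_EMOTIONS, (8,))  # stand-in emotion labels
for _ in range(10):
    opt.zero_grad()
    loss = ce_loss(classifier(music_batch), music_labels)
    loss.backward()
    opt.step()
```

The intra- vs. cross-linguistic comparison then amounts to swapping which language's speech feeds stage 1 and which language's music feeds stage 2, holding everything else fixed.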

Files

EUSIPCO2020_JSGC_Transfer_Learning.pdf (189.3 kB)
md5:3d1d8da98f8fc134c70c62cf936e179f

Additional details

Funding

TROMPA – Towards Richer Online Music Public-domain Archives (grant 770376)
European Commission