BEE-MER: BIMODAL EMBEDDINGS ENSEMBLE FOR MUSIC EMOTION RECOGNITION

LOURO, Pedro; RIBEIRO, Tiago; MALHEIRO, Ricardo; PANDA, Renato; PAIVA, Rui Pedro

doi:10.5281/zenodo.15837365

Published July 8, 2025 | Version v1

Conference paper | Open access

1. University of Coimbra, CISUC/LASI – Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
2. Polytechnic Institute of Leiria, School of Technology and Management, Leiria, Portugal
3. Ci2 — Smart Cities Research Center, Polytechnic Institute of Tomar, Tomar, Portugal

Static music emotion recognition systems typically classify emotions from audio, although some research has also explored analyzing lyrics. Both approaches struggle to accurately discern emotions that have similar energy but different valence, or vice versa, depending on the modality used. Previous studies have introduced bimodal audio-lyrics systems that outperform single-modality solutions by combining information from standalone systems and performing joint classification. In this study, we propose and compare two bimodal approaches: one based strictly on embedding models (audio and word embeddings), and another that follows a standard spectrogram-based deep learning method for the audio part. Additionally, we explore various information fusion strategies to leverage both modalities effectively. The main conclusions of this work are the following: i) the two approaches show comparable overall classification performance; ii) the embedding-only approach leads to higher confusion between quadrants 3 and 4 of Russell's circumplex model; and iii) the embedding-only approach requires significantly less computation for training. We discuss the insights gained from the approaches we experimented with and highlight promising avenues for future research.
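The abstract refers to Russell's quadrants and to several information fusion strategies without detailing them on this page. The sketch below is a minimal illustration, not the authors' BEE-MER implementation. It assumes the common convention that quadrants are defined by the signs of valence and arousal, and it uses two generic fusion schemes as stand-ins: early fusion (concatenating per-song embeddings) and late fusion (a weighted average of per-modality class probabilities). The function names, the default weight of 0.5, and the embedding sizes in the example are hypothetical. The mapping also shows why quadrants 3 and 4 are easy to confuse: both have low arousal and differ only in valence.

```python
import numpy as np

# Assumed convention for Russell's circumplex quadrants (signs of valence, arousal):
# Q1: +valence/+arousal, Q2: -valence/+arousal, Q3: -valence/-arousal, Q4: +valence/-arousal.
QUADRANTS = ("Q1", "Q2", "Q3", "Q4")


def quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) point to its quadrant; zero counts as positive here."""
    if arousal >= 0:
        return "Q1" if valence >= 0 else "Q2"
    # Low arousal: Q3 and Q4 are separated only by the sign of valence.
    return "Q4" if valence >= 0 else "Q3"


def early_fusion(audio_emb: np.ndarray, lyrics_emb: np.ndarray) -> np.ndarray:
    """Hypothetical feature-level fusion: concatenate audio and lyrics embeddings per song."""
    return np.concatenate([audio_emb, lyrics_emb], axis=-1)


def late_fusion(audio_probs: np.ndarray, lyrics_probs: np.ndarray,
                audio_weight: float = 0.5) -> list:
    """Hypothetical decision-level fusion: weighted average of per-modality
    quadrant probabilities, then pick the most probable quadrant per song."""
    fused = audio_weight * audio_probs + (1.0 - audio_weight) * lyrics_probs
    return [QUADRANTS[i] for i in np.argmax(fused, axis=-1)]


if __name__ == "__main__":
    # Illustrative (made-up) probabilities for one song over Q1..Q4.
    audio = np.array([[0.1, 0.1, 0.5, 0.3]])   # audio leans towards Q3
    lyrics = np.array([[0.1, 0.1, 0.2, 0.6]])  # lyrics lean towards Q4
    print(late_fusion(audio, lyrics))                    # equal weights
    print(late_fusion(audio, lyrics, audio_weight=1.0))  # audio only
```

With equal weights the lyrics tip this low-arousal song from Q3 to Q4, while the audio-only prediction stays at Q3. This is the kind of valence-driven disagreement that a bimodal system is meant to resolve.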

Files

259_SMC25_proceedings_with_concerts.pdf (292.3 kB; md5:891ea96a20594f196aad6ac34ace91fc)

DOI

10.5281/zenodo.15837365

Resource type

Conference paper

Publisher

Institute of Electronic Music and Acoustics, University of Music and Performing Arts Graz

Imprint

Proceedings of the 22nd Sound and Music Computing Conference (SMC2025), Graz, Austria, July 2025, pp. 250-257. ISBN: 978-3-200-10642-0.

Conference

Sound and Music Computing 2025 (SMC 2025), Graz, Austria, 10-12 July 2025

Languages

English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows redistribution and reuse of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: July 9, 2025
Modified: July 9, 2025
