Emotional Voice Messages (EMOVOME) database
Creators
- HUMAN-tech Institute, Universitat Politècnica de València, Valencia, 46022, Spain
Description
The Emotional Voice Messages (EMOVOME) database is a speech dataset collected for emotion recognition in real-world conditions. It contains 999 spontaneous voice messages from 100 Spanish speakers, collected from real conversations on a messaging app. EMOVOME includes both expert and non-expert emotional annotations, covering valence and arousal dimensions, along with emotion categories for the expert annotations. Detailed participant information is provided, including sociodemographic data and personality trait assessments using the NEO-FFI questionnaire. Moreover, EMOVOME provides audio recordings of participants reading a given text, as well as transcriptions of all 999 voice messages. Additionally, baseline models for valence and arousal recognition are provided, utilizing both speech and audio transcriptions.
For details on the EMOVOME database, please refer to the article:
"EMOVOME Database: Advancing Emotion Recognition in Speech Beyond Staged Scenarios". Lucía Gómez-Zaragozá, Rocío del Amor, María José Castro-Bleda, Valery Naranjo, Mariano Alcañiz Raya, Javier Marín-Morales. (pre-print available in https://doi.org/10.48550/arXiv.2403.02167)
Content
The Zenodo repository contains four files:
- EMOVOME_agreement.pdf: the agreement required to access the original audio files, as detailed in the Usage Notes section.
- labels.csv: ratings from the three non-expert annotators and the expert annotator, both independently and combined.
- participants_ids.csv: table mapping each numerical file ID to its corresponding alphanumeric participant ID.
- transcriptions.csv: transcription of each voice message (see the loading sketch after this list).
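These CSV files can be linked through their shared identifiers. The snippet below is a minimal loading sketch using pandas; the column names "file_id" and "participant_id" are illustrative assumptions, so check the actual headers in your copy of the dataset.

```python
import pandas as pd

# Load the metadata tables (paths relative to the repository root).
labels = pd.read_csv("labels.csv")
participant_ids = pd.read_csv("participants_ids.csv")
transcriptions = pd.read_csv("transcriptions.csv")

# "file_id" and "participant_id" are assumed column names; adjust them
# to match the real CSV headers.
data = (
    labels
    .merge(transcriptions, on="file_id")   # attach the transcription text
    .merge(participant_ids, on="file_id")  # attach the participant ID
)
print(data.head())
```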
The repository also includes three folders:
- Audios: contains the file features_eGeMAPSv02.csv, the standard acoustic feature set used in the baseline models (see the extraction sketch below), and two folders:
- Lecture: contains the audio files of the text readings, each named after the participant's ID.
- Emotions: contains the voice messages recorded by the participants in the messaging app, each named with a file ID.
- Questionnaires: contains three files: 1) sociodemographic_spanish.csv and 2) sociodemographic_english.csv, the participants' sociodemographic data in Spanish and English, respectively; and 3) NEO-FFI_spanish.csv, the participants' answers to the Spanish version of the NEO-FFI questionnaire. All three files include a column with the participant's ID to link the information.
- Baseline_emotion_recognition: includes three files and two folders. The file partitions.csv specifies the proposed data partition: 80% of the dataset for development and 20% for testing, split in a speaker-independent way, i.e., no speaker appears in both development and test. The development set includes 80 participants (40 female, 40 male) with the following label distribution: 241 negative, 305 neutral and 261 positive valence; and 148 low, 328 neutral and 331 high arousal. The test set includes 20 participants (10 female, 10 male) with the following label distribution: 57 negative, 62 neutral and 73 positive valence; and 13 low, 70 neutral and 109 high arousal. The files baseline_speech.ipynb and baseline_text.ipynb contain the code used to build the baseline emotion recognition models from speech and text, respectively. The trained models for valence and arousal prediction are provided in the folders models_speech and models_text (a usage sketch follows this list).
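To illustrate how partitions.csv, the acoustic features and the labels fit together, here is a minimal sketch of training a simple valence classifier on the proposed split. It is not the authors' actual baseline (that code is in baseline_speech.ipynb); the column names "file_id", "partition" and "valence" and the partition values "development"/"test" are assumptions to be checked against the real files.

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed column names and partition values; verify against the CSVs.
features = pd.read_csv("Audios/features_eGeMAPSv02.csv")
labels = pd.read_csv("labels.csv")
partitions = pd.read_csv("Baseline_emotion_recognition/partitions.csv")

df = features.merge(labels, on="file_id").merge(partitions, on="file_id")
dev = df[df["partition"] == "development"]
test = df[df["partition"] == "test"]

# All columns except the identifier are acoustic features.
feature_cols = [c for c in features.columns if c != "file_id"]

# A simple stand-in classifier, not the published baseline model.
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(dev[feature_cols], dev["valence"])
print("Test accuracy:", clf.score(test[feature_cols], test["valence"]))
```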
Audio files in "Lecture" and "Emotions" are only provided to users who complete the agreement file described in the Usage Notes section. Audio files are in Ogg Vorbis format at 16-bit and 44.1 kHz or 48 kHz. The total size of the "Audios" folder is about 213 MB.
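The precomputed file features_eGeMAPSv02.csv corresponds to the eGeMAPS v02 acoustic feature set from openSMILE. Comparable features can in principle be recomputed from the released audio with the opensmile Python package, as in the sketch below; the file path is a placeholder (actual files in "Audios/Emotions" are named by file ID), and Ogg decoding depends on your local audio backend.

```python
import opensmile

# eGeMAPSv02 functionals: the feature set named by features_eGeMAPSv02.csv.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Placeholder path; replace with a real file ID from "Audios/Emotions".
features = smile.process_file("Audios/Emotions/example.ogg")
print(features.shape)  # one row with the 88 eGeMAPSv02 functionals
```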
Usage Notes
All the data included in the EMOVOME database is publicly available under the Creative Commons Attribution 4.0 International license. The only exception is the original raw audio files, for which an additional step is required as a security measure to safeguard the speakers' privacy. To request access, interested researchers should first complete and sign the agreement file EMOVOME_agreement.pdf and send it to the corresponding author (jamarmo@htech.upv.es).

The EMOVOME database is intended for research purposes only. The agreement therefore states that recipients may not share the data with profit-making companies or organisations, and that they should not distribute it to other research institutions; instead, they are kindly asked to refer interested colleagues to the corresponding author of this article. By agreeing to the terms, recipients also commit to refraining from publishing the audio content in the media (such as television and radio), in scientific journals or any other publications, and on other platforms on the internet.

The agreement must bear the signature of the legally authorised representative of the research institution (e.g., head of laboratory or department). Once the signed agreement is received and validated, the corresponding author will deliver the "Audios" folder containing the audio files through a download procedure. This direct connection between the EMOVOME authors and the applicants ensures that all EMOVOME users can receive updates regarding additional materials included in the database.
Files
Name | Size | MD5
---|---|---
emovome.zip | 1.3 GB | f5d9a21ed4852cc5e352a711a144699e