Published July 1, 2024 | Version 1.1
Dataset Open

Emozionalmente: a crowdsourced Italian speech emotional corpus

Affiliation: Politecnico di Milano

Description

This repository contains Emozionalmente, an extensive corpus of simulated emotional speech in Italian. The dataset comprises 6,902 labeled samples acted out by 431 amateur actors verbalizing 18 different sentences to express the Big Six emotions (anger, disgust, fear, joy, sadness, surprise) plus neutrality. Each label represents the emotional communicative intention of the actor (i.e., one of the seven emotional states).

Key details about the dataset:
- **Recording specifications**: The recordings were generally obtained with non-professional equipment. They are .wav files, mono-channel, with a sample size of 16 bits and a sample rate of 16,000 Hz. Each audio recording lasts 3.81 seconds on average (SD = 0.99 seconds).
- **Validation**: To validate the emotional content of the clips, 829 human evaluators rated the recordings, with each recording receiving five evaluations. The evaluators achieved an overall Unweighted Average Recall (UAR) of 66%, which is comparable to results reported in previous literature in the field.
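
For readers unfamiliar with the metric, UAR is the mean of the per-class recalls, so each emotion contributes equally regardless of how many samples it has. The following is a minimal sketch of that computation on toy data; the function name and the example labels are illustrative assumptions and are not taken from the dataset or its accompanying code.

```python
from collections import defaultdict


def unweighted_average_recall(true_labels, predicted_labels):
    """Return the mean of per-class recalls (every class weighs the same)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")

    totals = defaultdict(int)  # number of samples per intended emotion
    hits = defaultdict(int)    # correctly recognized samples per emotion
    for intended, perceived in zip(true_labels, predicted_labels):
        totals[intended] += 1
        if intended == perceived:
            hits[intended] += 1

    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): three "anger" clips and one "joy" clip.
intended = ["anger", "anger", "anger", "joy"]
perceived = ["anger", "anger", "fear", "joy"]
uar = unweighted_average_recall(intended, perceived)
```

In this toy case the recall is 2/3 for anger and 1 for joy, so the UAR is their mean (5/6), whereas plain accuracy would be 3/4. The two measures diverge whenever classes are unevenly represented.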

The repository includes the following additional resources:
1. **Demographic information**: Three .csv files describing the demographics of the actors and evaluators, as well as the emotions they expressed and recognized for each audio sample.
2. **Data splits**: A speaker-independent train-dev-test split, stratified by emotion, gender, and age.
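
A speaker-independent split means that no actor appears in more than one partition. The sketch below shows one way such a property could be checked. It assumes the split information has already been loaded as `(speaker_id, split)` pairs; the field names, split names, and speaker identifiers are hypothetical, since the actual layout of the .csv files is not described here.

```python
def find_speaker_leaks(rows):
    """Return speakers that occur in more than one split.

    rows: iterable of (speaker_id, split_name) pairs, one per audio sample.
    The result maps each leaking speaker to the sorted list of its splits;
    an empty dict means the split is speaker-independent.
    """
    splits_by_speaker = {}
    for speaker_id, split_name in rows:
        splits_by_speaker.setdefault(speaker_id, set()).add(split_name)
    return {
        speaker: sorted(splits)
        for speaker, splits in splits_by_speaker.items()
        if len(splits) > 1
    }


# Hypothetical rows: "spk2" wrongly appears in both dev and test.
rows = [
    ("spk1", "train"),
    ("spk1", "train"),
    ("spk2", "dev"),
    ("spk3", "test"),
    ("spk2", "test"),
]
leaks = find_speaker_leaks(rows)
```

An empty result from `find_speaker_leaks` is what a speaker-independent split should produce.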

If you use this dataset, please cite the following paper:

F. Catania, J. W. Wilke and F. Garzotto,
"Emozionalmente: A Crowdsourced Corpus of Simulated Emotional Speech in Italian,"
IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 1142–1155, 2025.
doi: 10.1109/TASLPRO.2025.3540662

Files

- emozionalmente.zip (558.9 MB), md5:398c6502dd2b0e1e810f927c12897547
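
After downloading, the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the local file path are assumptions made for illustration.

```python
import hashlib

# Checksum published on this page for emozionalmente.zip.
EXPECTED_MD5 = "398c6502dd2b0e1e810f927c12897547"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved in the current directory, `matches_expected("emozionalmente.zip")` should return `True` for an intact download.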