Enhanced RAVDESS Speech Dataset

Morrison, Max; Jin, Zeyu; Bryan, Nicholas J.; Caceres, Juan-Pablo; Pardo, Bryan

doi:10.5281/zenodo.4783521

Published May 24, 2021 | Version 1.0

Dataset Open

1. Northwestern University
2. Adobe Research

This is a modified version of the speech audio contained within the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset. The original dataset can be found here. The unmodified version of just the speech audio used as source material for this dataset can be found here. This dataset was produced by applying speech enhancement and bandwidth extension to the original speech using HiFi-GAN. HiFi-GAN produces high-quality speech at 48 kHz that contains significantly less noise and reverb than the original recordings.
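The 48 kHz sample rate stated above can be spot-checked once the archive is extracted. The following is a minimal sketch, assuming the files are stored as PCM WAV (the archive layout and file format are not described on this page); `sample_rate_hz` and `is_48khz` are hypothetical helper names, not part of the dataset.

```python
import wave

# Sample rate stated in the dataset description.
EXPECTED_RATE_HZ = 48_000


def sample_rate_hz(wav_path):
    """Return the sample rate of a PCM WAV file, read from its header."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getframerate()


def is_48khz(wav_path):
    """Check whether a file matches the 48 kHz rate stated in the description."""
    return sample_rate_hz(wav_path) == EXPECTED_RATE_HZ
```

Calling `is_48khz` on a handful of extracted files is usually enough to catch an accidental resampling step in a processing pipeline.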

If you use this work as part of an academic publication, please cite the papers corresponding to both the original dataset and HiFi-GAN:

Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

Su, Jiaqi, Zeyu Jin, and Adam Finkelstein. "HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks." Proc. Interspeech. October 2020.

Note that there are two recent papers with the name "HiFi-GAN". Please be sure to cite the correct paper as listed here.

Files (193.7 MB)

Name	Size	MD5
ravdess-hifi.tar.gz	193.7 MB	7ec457ec1b8b51a9d16ddd545eb976af
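The MD5 checksum listed for the archive can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library; `md5_of_file` and `verify_archive` are hypothetical helper names, and the local path passed in is whatever location the archive was saved to.

```python
import hashlib

# Checksum published alongside ravdess-hifi.tar.gz on this page.
EXPECTED_MD5 = "7ec457ec1b8b51a9d16ddd545eb976af"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read 1 MiB at a time so the archive is never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("ravdess-hifi.tar.gz")` should return `True` for an intact download saved in the current directory; a `False` result suggests the file should be downloaded again before extraction.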

Additional details

Is derived from: Dataset: 10.5281/zenodo.1188976 (DOI)

	All versions	This version
Views	1,194	1,189
Downloads	210	210
Data volume	48.8 GB	48.8 GB
