Model for Dimensional Speech Emotion Recognition based on Wav2vec 2.0
Creators
- 1. audEERING GmbH
- 2. EIHW, University of Augsburg
Description
The model expects a raw audio signal as input and outputs predictions for arousal, dominance, and valence in a range of approximately 0 to 1. In addition, it provides the pooled states of the last transformer layer. The model was created by fine-tuning a pre-trained wav2vec 2.0 model on MSP-Podcast (v1.7). As foundation we use wav2vec2-large-robust, released by Facebook under Apache 2.0, which we pruned from 24 to 12 transformer layers before fine-tuning. Afterwards, the model was exported to ONNX format. Further details are given in the associated paper. For an introduction to using the model, please visit our tutorial project. The original [Torch](https://pytorch.org/docs/stable/torch.html) model is hosted on Hugging Face.
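A minimal sketch of how the ONNX model might be loaded and called, assuming audEERING's audonnx package and the unzipped model directory; the directory name and output names are illustrative assumptions, and the model download itself is not shown:

```python
# Usage sketch (assumes `pip install audonnx` and the unzipped model folder;
# the actual loading calls are commented out since they need the files on disk).
import numpy as np
# import audonnx  # uncomment once audonnx is installed

sampling_rate = 16000  # the model expects 16 kHz mono audio
signal = np.zeros((1, sampling_rate), dtype=np.float32)  # 1 second of silence

# model = audonnx.load('w2v2-L-robust-12.6bc4a7fd-1.1.0')  # unzipped folder (assumed name)
# outputs = model(signal, sampling_rate)
# One output would hold arousal, dominance, and valence in roughly 0..1,
# another the pooled states of the last transformer layer (names assumed).
```

The input is a batch of float32 samples at 16 kHz; longer recordings can be passed as longer arrays, since wav2vec 2.0 handles variable-length input.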
Files
w2v2-L-robust-12.6bc4a7fd-1.1.0.zip (609.9 MB)

| Name | Size | md5 |
|---|---|---|
| w2v2-L-robust-12.6bc4a7fd-1.1.0.zip | 609.9 MB | 76fef8c090addd7fcee60c64e9536ced |