Model for Dimensional Speech Emotion Recognition based on Wav2vec 2.0
Creators
- 1. audEERING GmbH
- 2. EIHW, University of Augsburg
Description
The model expects a raw audio signal as input and outputs predictions for arousal, dominance, and valence in a range of approximately 0 to 1. In addition, it provides the pooled states of the last transformer layer. The model was created by fine-tuning a pre-trained wav2vec 2.0 model on MSP-Podcast (v1.7). As foundation we use wav2vec2-large-robust, released by Facebook under Apache 2.0, which we pruned from 24 to 12 transformer layers before fine-tuning. Afterwards, the model was exported to ONNX format. Further details are given in the associated paper. For an introduction to using the model, please visit our tutorial project. The original [Torch](https://pytorch.org/docs/stable/torch.html) model is hosted on Hugging Face.
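A minimal sketch of how the ONNX model might be loaded and called, assuming audEERING's audonnx package and the unzipped model directory; the directory name and output names are illustrative assumptions, and the model download itself is not shown:

```python
# Usage sketch (assumes `pip install audonnx` and the unzipped model folder;
# the actual loading calls are commented out since they need the files on disk).
import numpy as np
# import audonnx  # uncomment once audonnx is installed

sampling_rate = 16000  # the model expects 16 kHz mono audio
signal = np.zeros((1, sampling_rate), dtype=np.float32)  # 1 second of silence

# model = audonnx.load('w2v2-L-robust-12.6bc4a7fd-1.1.0')  # unzipped folder (assumed name)
# outputs = model(signal, sampling_rate)
# One output would hold arousal, dominance, and valence in roughly 0..1,
# another the pooled states of the last transformer layer (names assumed).
```

The input is a batch of float32 samples at 16 kHz; longer recordings can be passed as longer arrays, since wav2vec 2.0 handles variable-length input.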
Files
w2v2-L-robust-12.6bc4a7fd-1.1.0.zip (609.9 MB)

| Name | Size | md5 |
|---|---|---|
| w2v2-L-robust-12.6bc4a7fd-1.1.0.zip | 609.9 MB | 76fef8c090addd7fcee60c64e9536ced |