Published June 26, 2023 | Version v1
Journal article | Open access

Speech Emotion Recognition for multiclass classification using Hybrid CNN-LSTM

  • 1. Birla Institute of Technology, Mesra, Ranchi, India

Description

Emotions are biological states of the human nervous system that can be recorded in different signal forms, such as audio signals and electroencephalogram (EEG) signals. In this paper, cross-corpus emotion recognition is carried out on voice data, and a hybrid CNN–LSTM (Convolutional Neural Network–Long Short-Term Memory) model is proposed for recognizing emotions with respect to speaker gender. Three established corpora were considered: SAVEE, RAVDESS and TESS. By combining these corpora for cross-corpus implementation, three new corpora, referred to as mixed corpora, were constructed: two gender-specific (male and female) and one gender-independent. Seven emotions (happiness, sadness, anger, fear, neutral, disgust and surprise) were identified in all the corpora. Data augmentation, in the form of added noise and altered pitch, was applied to the signals to reduce over-fitting and increase the robustness of the deep neural networks. Mel-Frequency Cepstral Coefficients (MFCCs) were extracted as features before the hybrid network was applied to each database. Experimental results show that the female corpus yields higher accuracy than the male corpus.
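The abstract does not spell out how the three mixed corpora were assembled. A minimal sketch of one plausible approach: map each corpus's file-naming convention to the seven shared labels and a speaker gender, then split into male, female and gender-independent lists. It relies on well-known dataset conventions (RAVDESS names like `03-01-05-01-02-01-12.wav`, with the third field the emotion and odd-numbered actors male; SAVEE's four speakers are male; TESS's two speakers are female). The helper names, the `DC_sa01.wav` style of SAVEE names, and dropping RAVDESS "calm" (which has no counterpart among the seven classes) are assumptions, not the authors' documented procedure.

```python
from pathlib import PurePath

# RAVDESS emotion field (3rd); "02" = calm is absent here, so those files are skipped (assumption).
RAVDESS_CODES = {"01": "neutral", "03": "happiness", "04": "sadness",
                 "05": "anger", "06": "fear", "07": "disgust", "08": "surprise"}
# SAVEE letter codes; all four SAVEE speakers are male.
SAVEE_CODES = {"a": "anger", "d": "disgust", "f": "fear", "h": "happiness",
               "n": "neutral", "sa": "sadness", "su": "surprise"}
# TESS final name token; both TESS speakers are female ("ps" = pleasant surprise).
TESS_CODES = {"angry": "anger", "disgust": "disgust", "fear": "fear",
              "happy": "happiness", "neutral": "neutral", "sad": "sadness",
              "ps": "surprise"}

def label_file(path, corpus):
    """Hypothetical helper: return (emotion, gender), or None if the file is skipped."""
    stem = PurePath(path).stem
    if corpus == "ravdess":
        fields = stem.split("-")                      # e.g. 03-01-05-01-02-01-12
        emotion = RAVDESS_CODES.get(fields[2])
        gender = "male" if int(fields[6]) % 2 == 1 else "female"  # odd actor ids are male
    elif corpus == "savee":
        code = stem.split("_")[-1].rstrip("0123456789")  # "DC_sa01" -> "sa"
        emotion, gender = SAVEE_CODES.get(code), "male"
    elif corpus == "tess":
        emotion, gender = TESS_CODES.get(stem.split("_")[-1].lower()), "female"
    else:
        raise ValueError(f"unknown corpus: {corpus}")
    return None if emotion is None else (emotion, gender)

def build_mixed_corpora(files):
    """Hypothetical helper: files is an iterable of (path, corpus) pairs."""
    corpora = {"male": [], "female": [], "all": []}
    for path, corpus in files:
        labelled = label_file(path, corpus)
        if labelled is None:
            continue
        emotion, gender = labelled
        corpora[gender].append((path, emotion))   # gender-specific corpus
        corpora["all"].append((path, emotion))    # gender-independent corpus
    return corpora
```

For example, `build_mixed_corpora([("03-01-05-01-02-01-12.wav", "ravdess"), ("DC_sa01.wav", "savee")])` places the RAVDESS file (actor 12) in the female list and the SAVEE file in the male list, with both in `"all"`.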
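The augmentation described in the abstract (adding noise and altering pitch) can be sketched with the standard library alone. This assumes additive white Gaussian noise scaled to a target signal-to-noise ratio and a crude pitch shift by linear-interpolation resampling; the function names, the 20 dB SNR and the 2-semitone shift are hypothetical choices, not values from the paper. Resampling also changes duration; a duration-preserving shift such as `librosa.effects.pitch_shift` needs a phase vocoder or similar.

```python
import math
import random

def add_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so the noisy signal has roughly the requested SNR (dB)."""
    rng = rng or random.Random()
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]

def shift_pitch_by_resampling(signal, semitones):
    """Crude pitch shift: read the signal at a rate of 2**(semitones/12).

    Positive semitones raise pitch and shorten the signal (like a faster tape).
    """
    ratio = 2 ** (semitones / 12)
    out = []
    for i in range(int(len(signal) / ratio)):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        nxt = signal[j + 1] if j + 1 < len(signal) else signal[j]
        out.append((1 - frac) * signal[j] + frac * nxt)  # linear interpolation
    return out

def augment(signal, rng=None):
    """Return the original plus a noisy copy and a pitch-shifted copy (hypothetical settings)."""
    return [signal,
            add_noise(signal, snr_db=20, rng=rng),
            shift_pitch_by_resampling(signal, 2)]
```

Each training clip thus contributes three examples; a lower SNR or a wider range of shifts gives stronger regularization at the cost of less realistic audio.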
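The MFCC step mentioned in the abstract follows a standard pipeline: pre-emphasis, framing, windowing, power spectrum, mel filterbank, logarithm, and a discrete cosine transform. A minimal pure-Python sketch of that pipeline is below, using a naive DFT for clarity rather than speed. The parameter values (25 ms frames and 10 ms hop at 16 kHz, a 512-point DFT, 26 mel filters, 13 coefficients, 0.97 pre-emphasis) are common defaults assumed for illustration, not settings reported in the paper; in practice a library routine such as `librosa.feature.mfcc` would be used.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centres equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sr / 2)
    mel_pts = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sr)) for m in mel_pts]
    fbank = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):       # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling edge
            row[k] = (right - k) / (right - centre)
        fbank.append(row)
    return fbank

def mfcc(signal, sr=16000, n_mfcc=13, frame_len=400, hop=160, n_fft=512, n_filters=26):
    """Return one list of n_mfcc coefficients per frame (assumes frame_len <= n_fft)."""
    # Pre-emphasis: y[n] = x[n] - 0.97 * x[n-1]
    emph = [signal[0]] + [signal[n] - 0.97 * signal[n - 1] for n in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    cos_t = [math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    sin_t = [math.sin(2 * math.pi * i / n_fft) for i in range(n_fft)]
    fbank = mel_filterbank(n_filters, n_fft, sr)
    frames = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        x = [emph[start + n] * window[n] for n in range(frame_len)]  # Hamming-windowed frame
        power = []
        for k in range(n_fft // 2 + 1):     # naive DFT; samples beyond frame_len are zero
            re = sum(x[n] * cos_t[(k * n) % n_fft] for n in range(frame_len))
            im = sum(x[n] * sin_t[(k * n) % n_fft] for n in range(frame_len))
            power.append((re * re + im * im) / n_fft)
        # Log mel energies, floored to avoid log(0)
        log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10)) for row in fbank]
        # DCT-II of the log energies; keep the first n_mfcc coefficients
        frames.append([sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                           for m in range(n_filters))
                       for c in range(n_mfcc)])
    return frames
```

A useful property to note: scaling the input by a constant gain shifts every log mel energy by the same amount, so only the 0th coefficient changes; the higher coefficients describe spectral shape independently of loudness.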
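The abstract does not give the hybrid network's layer configuration. To show how a CNN–LSTM combines the two ideas, here is a minimal stdlib forward-pass sketch under hypothetical assumptions: one 1-D convolution with ReLU over the MFCC frames (capturing local spectral-temporal patterns), max pooling, a single LSTM layer over the pooled sequence (capturing longer-range temporal context), and a dense softmax over the seven emotions. Layer sizes, kernel width, and the class `TinyCnnLstm` are illustrative inventions, and the weights are random and untrained; a real model would be built and trained in a framework, for example with Keras's `Conv1D`, `MaxPooling1D`, `LSTM` and `Dense` layers.

```python
import math
import random

EMOTIONS = ("happiness", "sadness", "anger", "fear", "neutral", "disgust", "surprise")

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d_relu(frames, kernels, bias):
    """Valid 1-D convolution over time. frames: T x F; kernels: C x (K*F)."""
    k = len(kernels[0]) // len(frames[0])
    out = []
    for t in range(len(frames) - k + 1):
        window = [v for frame in frames[t:t + k] for v in frame]  # flatten K frames
        out.append([max(0.0, b + sum(w * x for w, x in zip(row, window)))
                    for row, b in zip(kernels, bias)])
    return out

def max_pool(seq, size=2):
    """Non-overlapping max pooling over time, per channel."""
    return [[max(col) for col in zip(*seq[i:i + size])]
            for i in range(0, len(seq) - size + 1, size)]

def lstm_last_hidden(seq, W, U, b, hidden):
    """Single LSTM layer; W: 4H x D, U: 4H x H, gate order input, forget, cell, output."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = [bj + sum(w * xi for w, xi in zip(Wj, x)) + sum(u * hi for u, hi in zip(Uj, h))
             for Wj, Uj, bj in zip(W, U, b)]
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        g = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]
        o = [sigmoid(v) for v in z[3 * hidden:]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h  # hidden state after the last time step

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

class TinyCnnLstm:
    """Hypothetical, untrained CNN-LSTM classifier over MFCC frames."""

    def __init__(self, n_features=13, n_filters=8, kernel=3, hidden=16, seed=0):
        rng = random.Random(seed)
        self.kernels = rand_matrix(n_filters, kernel * n_features, rng)
        self.conv_b = [0.0] * n_filters
        self.hidden = hidden
        self.W = rand_matrix(4 * hidden, n_filters, rng)
        self.U = rand_matrix(4 * hidden, hidden, rng)
        self.lstm_b = [0.0] * (4 * hidden)
        self.dense = rand_matrix(len(EMOTIONS), hidden, rng)
        self.dense_b = [0.0] * len(EMOTIONS)

    def predict_proba(self, mfcc_frames):
        feats = max_pool(conv1d_relu(mfcc_frames, self.kernels, self.conv_b))
        h = lstm_last_hidden(feats, self.W, self.U, self.lstm_b, self.hidden)
        logits = [b + sum(w * x for w, x in zip(row, h))
                  for row, b in zip(self.dense, self.dense_b)]
        return dict(zip(EMOTIONS, softmax(logits)))
```

With 20 MFCC frames of 13 coefficients, the convolution yields 18 time steps of 8 channels, pooling halves that to 9, and the LSTM summarizes those 9 steps into one hidden vector that the dense layer maps to seven class probabilities.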

Files

IJMIT20230033.pdf (568.5 kB), md5:752d8458d09b3cfdeacb72bf23a66a70