Emotion Detection and Voice-Emotion Conversions using Deep Learning
Description
Emotion, especially when conveyed through speech, carries far more information than text alone can describe. Using artificial intelligence to tap into this can benefit a variety of industries, including audio mining, customer service applications, and security and forensics. Spoken emotion recognition, a growing field of research, has relied heavily on models that use audio data to build effective classifiers. This paper presents a convolutional neural network as a deep learning classifier that recognizes 7 emotions with an accuracy of 69.45% on the combined Savee, Ravdess and Tess datasets. It also proposes a new system that replicates an emotion on neutral audio (voice conversion). The emotional audio is generated using MelGAN, a type of Generative Adversarial Network (GAN).
Files
Name | Size |
---|---|
md5:b82744f4a19683c2aeb4494d04ca3677 | 739.3 kB |