Published March 11, 2024 | Version v2
Journal article Open

Emotion Detection and Voice-Emotion Conversions using Deep Learning

Description

Emotion, especially when expressed through speech, is a powerful human tool that conveys far more information than text alone. Using artificial intelligence to tap into it can have a large positive impact on a variety of industries, including audio mining, customer service applications, security and forensics, and more. Spoken emotion recognition, a growing field of research, has relied heavily on models that use audio data to build effective classifiers. This paper presents a convolutional neural network as a deep learning classification algorithm that classifies 7 emotions with an accuracy of 69.45% on the combined SAVEE, RAVDESS and TESS datasets. It also proposes a new system for reproducing emotions on neutral audio (voice conversion). The emotional audio is generated using MelGAN, a type of Generative Adversarial Network (GAN).
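As a rough illustration of the classification step described above, the sketch below shows how the raw output scores of a 7-class classifier could be turned into an emotion label, and how an accuracy figure such as the reported 69.45% is computed. It is not the paper's implementation: the label names in `EMOTIONS` and the helper functions `softmax`, `predict`, and `accuracy` are assumptions made for illustration, since the description does not name the seven classes or the network's output layer.

```python
import math

# Assumed label set: the description states 7 emotions but does not list them.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most probable emotion label and its probability."""
    if len(logits) != len(EMOTIONS):
        raise ValueError("expected one score per emotion class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


def accuracy(predicted, actual):
    """Fraction of clips whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Hypothetical scores for one audio clip, one value per class.
    label, confidence = predict([0.1, 0.2, 0.3, 2.5, 0.0, -1.0, 0.4])
    print(label, round(confidence, 3))
```

In a real pipeline the scores would come from the final layer of the convolutional network applied to features extracted from the audio; here they are hard-coded so the example stays self-contained.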

Files

Files (739.3 kB)

md5:b82744f4a19683c2aeb4494d04ca3677
739.3 kB