Audio, Speech, and Vision Processing Lab Emotional Sound database (ASVP-ESD)
- 1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou, 510640, China
Description
The Audio, Speech, and Vision Processing Lab Emotional Sound database (ASVP-ESD)
Dejoli Tientcheu Touko Landry; Qianhua He; Wei Xie
Citing the ASVP-ESD
The ASVP-ESD emotional sound database is released by the Audio, Speech, and Vision Processing Lab (http://www.speech-led.com/main.htm) of the South China University of Technology. If you use it in your work in any form, please cite the paper "ASVP-ESD: A dataset and its benchmark for emotion recognition using both speech and non-speech utterances", a study conducted on the first batch of the collected database. Personal works, such as machine learning projects or blog posts, should link to this page as a reference.
Contact Information
For further information about the ASVP-ESD, or if you face any issues downloading the files, please contact us at 201722800077@mail.scut.edu.cn or 1197581424@qq.com.
Data labeling process
The first version of the dataset (63+2 folders) was labeled by 5 different annotators using a tagging application designed specifically for audio tagging; the most recently added folders were labeled by 3 other annotators. After listening to each audio file, the annotator chose the label that best matched their personal impression. After tagging, a simple voting algorithm assigned each audio file to the class that received the most votes.
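The voting step described above can be sketched as a plain majority vote. This is a minimal illustration, not the authors' code (their implementation is linked further down this page); the function name `majority_label` and the tie rule (return `None` so the file can be reviewed again) are assumptions, since the page does not say how ties were resolved.

```python
from collections import Counter


def majority_label(votes):
    """Return the label with the most votes, or None on a tie (assumed rule)."""
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    best_label, best_count = ranked[0]
    # Collect every label that reached the top count to detect ties.
    tied = [label for label, count in ranked if count == best_count]
    return best_label if len(tied) == 1 else None


# Five annotators voting on one audio file.
print(majority_label(["sad", "sad", "angry", "sad", "fearful"]))
```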
Construction and Validation
The ASVP-ESD contains 12625 audio files in the Audio directory, plus an additional 1204 files of babies' voices in 3 folders in the bonus directory. It is an emotion-based database containing both speech and non-speech emotional sounds. The audio was recorded and collected from movies, TV shows, YouTube channels, and other websites. Compared to other publicly available emotional databases, ASVP-ESD is more realistic and non-scripted, with no language restriction.
Description
The Audio, Speech, and Vision Processing Lab Emotional Sound database (ASVP-ESD) contains audio files grouped into 130 folders (total size: 2 GB). Although some folders are mixed, odd-numbered folders are mainly female voices and even-numbered folders are mainly male voices. As it is a realistic dataset, some folders contain dialog or several people interacting in the same audio. The speech and non-speech emotional sounds include boredom (sigh, yawn), neutral, happiness (laugh, gaggle), sadness (cry), anger, fear (scream, panic), surprise (amazed, gasp), disgust (contempt), excitement (triumph, elation), pleasure (desire), pain (groan), and disappointment: a total of 12 different emotions plus breath. Two levels of intensity are used (normal and high). Audio is provided as 16 kHz, single-channel .wav files; file length typically ranges from 0.5 to 20 seconds, for a total of more than 11 hours. Note that there are 3 additional folders (acteur_200, acteur_150, acteur_50) that contain only babies' audio (laugh, cry). Audio-only files are grouped as follows:
Actor_00 and Actor_50 contain mixed audio samples from movies and website sounds.
actor_01 to actor_19 and actor_31 to acteur_38 are different actors from 3 different movies.
Actor_100 to actor_102 are from the same website (actor_100 contains crowds or many people's voices); the same applies to actor_103 to actor_106.
Actor_120, like Actor_00, also has both genders interacting in the same audio.
actor_21 to actor_29 and actor_39 to actor_121 contain only sounds collected at random from online platforms such as:
https://www.epidemicsound.com
https://sfx.productioncrate.com
https://elements.envato.com/sound-effects/
https://www.aigei.com/view/121628-45344256.html
https://mixkit.co/free-sound-effects/human/?page=3
https://www.zapsplat.com/sound-effect-category/
http://freesoundeffect.net/tags/sigh?page=7
and others
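After downloading, it can be useful to confirm that a file matches the format stated in the description (16k, 1 channel, .wav). The sketch below uses Python's standard `wave` module; it assumes "16k" means a 16000 Hz sampling rate and that the files are plain PCM WAV, which is the only kind this module reads. The function name `check_wav_format` is illustrative.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Report whether a PCM WAV file has the expected rate and channel count."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        # Duration in seconds = number of frames / frames per second.
        duration = wav.getnframes() / rate
    return {
        "rate_ok": rate == expected_rate,
        "channels_ok": channels == expected_channels,
        "duration_s": duration,
    }
```

Calling `check_wav_format` on each file in a folder and collecting the entries where `rate_ok` or `channels_ok` is false gives a quick list of files that may need resampling before training.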
File naming convention
Each audio file has a unique filename consisting of numerical identifiers that define the stimulus characteristics (e.g., 02-01-06-01-02-105-02-01-02.wav for speech, 02-01-06-01-02-105-02-01-02-01-03.wav for non-speech).
Filename identifiers:
Modality (03 = audio-only).
Vocal channel (01 = speech, 02 = non-speech).
Emotion (01 = boredom, sigh | 02 = neutral, calm | 03 = happy, laugh, gaggle | 04 = sad, cry | 05 = angry, grunt, frustration | 06 = fearful, scream, panic | 07 = disgust, dislike, contempt | 08 = surprised, gasp, amazed | 09 = excited | 10 = pleasure | 11 = pain, groan | 12 = disappointment, disapproval | 13 = breath).
Emotional intensity (01 = normal, 02 = high).
Statement (as the data is non-scripted, this number roughly groups data collected from the same period or source, based on their rank).
Actor (even-numbered actors are male, odd-numbered actors are female).
Age (01 = above 65, 02 = between 20~64, 03 = under 20, 04 = baby).
Source of download (01-02 = website, YouTube channel | 03 = movies).
Language (01 = Chinese, 02 = English, 04 = French, others: Russian and others).
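The identifier tables above translate directly into lookup dictionaries. The sketch below keeps only the first name listed for each emotion code, and the helper `actor_gender` applies the even/odd rule; both names are illustrative. Because some folders are described as mixed, the gender it returns should be read as the folder's nominal label rather than a guarantee.

```python
# First-listed name for each emotion code in the identifier table.
EMOTIONS = {
    "01": "boredom", "02": "neutral", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
    "09": "excited", "10": "pleasure", "11": "pain",
    "12": "disappointment", "13": "breath",
}

INTENSITY = {"01": "normal", "02": "high"}


def actor_gender(actor_code):
    """Even-numbered actors are male, odd-numbered actors are female."""
    return "male" if int(actor_code) % 2 == 0 else "female"


print(EMOTIONS["06"], INTENSITY["01"], actor_gender("12"))
```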
Filename example: 03-01-06-01-02-12-02-01-01-16.wav:
1. Audio-only (03)
2. Speech (01)
3. Fearful (06)
4. Normal intensity (01)
5. Statement (02)
6. 12th actor (12): folder 12, male because the number is even
7. Age (02)
8. Source (01)
9. Language (01)
10. Similarity with other emotions/sounds (16)
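The breakdown above can be reproduced by splitting the filename on hyphens. This is a minimal sketch: the field names are labels chosen here for the nine documented positions, and anything after the ninth field is returned untouched in `extra`, because the sample filenames on this page carry nine, ten, or eleven fields.

```python
from pathlib import Path

# Labels for the nine documented positions (names chosen for this sketch).
FIELD_NAMES = (
    "modality", "vocal_channel", "emotion", "intensity", "statement",
    "actor", "age", "source", "language",
)


def parse_asvp_filename(filename):
    """Split an ASVP-ESD filename into named fields plus any trailing codes."""
    parts = Path(filename).stem.split("-")
    if len(parts) < len(FIELD_NAMES):
        raise ValueError(f"expected at least 9 fields, got {len(parts)}")
    fields = dict(zip(FIELD_NAMES, parts))
    # Keep undocumented trailing fields as-is instead of guessing their meaning.
    fields["extra"] = parts[len(FIELD_NAMES):]
    return fields


info = parse_asvp_filename("03-01-06-01-02-12-02-01-01-16.wav")
print(info["emotion"], info["actor"], info["extra"])  # 06 12 ['16']
```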
All audio files with 77 at the end were recorded in a high-noise environment.
Audio files with 66 at the end contain mixed voices (online platforms allow only a limited number of free downloads; downloading more yields mixed voices that can affect the sound).
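To exclude noisy or mixed-voice recordings from an experiment, the trailing code can be checked as follows. The sketch assumes that "at the end" refers to the last hyphen-separated field of the filename; the two filenames in the usage lines are made up for illustration and are not taken from the dataset.

```python
from pathlib import Path


def quality_flags(filename):
    """Flag files whose last field is 77 (high noise) or 66 (mixed voices)."""
    last_field = Path(filename).stem.split("-")[-1]
    return {"noisy": last_field == "77", "mixed_voices": last_field == "66"}


# Hypothetical filenames, used only to show the two flags.
print(quality_flags("03-01-06-01-02-12-02-01-01-77.wav"))
print(quality_flags("03-02-03-01-02-21-02-01-02-66.wav"))
```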
For non-speech data:
Happiness is a collection of (laugh = 13, gaggle = 23, others = 33)
Sadness is a collection of (cry = 14, sigh = 24, sniffle = 34, suffering = 44)
Fear is a collection of (scream = 16, panic = 36)
Anger (rage = 15, frustration = 25, other = 35, grunt)
Surprise (surprised = 18, amazed = 28, astonishment = 38, others = 48)
Disgust (disgust = 17, rejection = 27)
Pain (moan)
Boredom (sigh)
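The numbered sub-labels above can be kept in one lookup table. Only the codes actually listed are included; grunt, pain, and boredom have no code on this page and are left out rather than guessed. The page does not state which filename field carries these codes. In the earlier example filename the tenth field is 16, which matches the scream code, but treating that position as the sub-label field is an inference, so the helper below takes the code as an argument.

```python
# (emotion group, specific sound) for each listed non-speech code.
NON_SPEECH_SUBLABELS = {
    "13": ("happiness", "laugh"), "23": ("happiness", "gaggle"),
    "33": ("happiness", "others"),
    "14": ("sadness", "cry"), "24": ("sadness", "sigh"),
    "34": ("sadness", "sniffle"), "44": ("sadness", "suffering"),
    "16": ("fear", "scream"), "36": ("fear", "panic"),
    "15": ("anger", "rage"), "25": ("anger", "frustration"),
    "35": ("anger", "other"),
    "18": ("surprise", "surprised"), "28": ("surprise", "amazed"),
    "38": ("surprise", "astonishment"), "48": ("surprise", "others"),
    "17": ("disgust", "disgust"), "27": ("disgust", "rejection"),
}


def describe_sublabel(code):
    """Return (group, sound) for a listed code, or None if it is not listed."""
    return NON_SPEECH_SUBLABELS.get(code)


print(describe_sublabel("16"))
```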
For any suggestions, please don't hesitate to send us an email.
We wouldn't be here without the help of our lab mates and annotators. Below are links to the materials used, such as the annotation Exe app and the voting code, so that anyone interested can make their own annotation or tagging. For audio emotion analysis and classification, the audio files are the ones to use.
voting : https://github.com/landryroni/emotions-voting
tagging : https://github.com/landryroni/emotion_tagging
Files
ASVP-ESD-Update.zip (1.7 GB)
md5:01837d6e865415b8312bc1b9817a5fcb
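The md5 checksum listed for the archive can be used to confirm that the download completed without corruption. The sketch below reads the file in chunks so the 1.7 GB archive does not need to fit in memory; the function name `md5_of_file` is illustrative, and the archive is assumed to sit in the current directory under its listed name.

```python
import hashlib

# Checksum published on this page for ASVP-ESD-Update.zip.
EXPECTED_MD5 = "01837d6e865415b8312bc1b9817a5fcb"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage, once the archive has been downloaded:
# print(md5_of_file("ASVP-ESD-Update.zip") == EXPECTED_MD5)
```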