The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

Livingstone, Steven R.; Russo, Frank A.

  "abstract": "<p><strong>Contact Information</strong></p>\n\n<p>If you experience any issues downloading the RAVDESS, or if you would like further information about the database, please contact us at <a href=\"\"></a>.&nbsp;</p>\n\n<p><strong>Construction and Validation</strong></p>\n\n<p>Construction and validation of the RAVDESS are described in our paper: Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391.&nbsp;<a href=\"\"></a>.&nbsp;</p>\n\n<p>Our Open Access paper is made freely available&nbsp;and can be downloaded without restriction from <a href=\"\">PLoS ONE</a>.</p>\n\n<p>The RAVDESS contains 7356 files. Each file&nbsp;was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability,&nbsp;and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from <a href=\"\">PLOS ONE</a>.</p>\n\n<p><strong>Description</strong></p>\n\n<p>This dataset contains the complete set of 7356 RAVDESS files (total size: 24.8 GB). 
Files for each of the 24 actors are provided in three modality formats: Audio-only&nbsp;(16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound).&nbsp;Note that there are no song files for Actor_18.</p>\n\n<p><em>Audio-only&nbsp;files</em></p>\n\n<p>Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):</p>\n\n<ul>\n\t<li>Speech file (, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.&nbsp;</li>\n\t<li>Song file (, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.</li>\n</ul>\n\n<p><em>Audio-Visual and Video-only files</em></p>\n\n<p>Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:</p>\n\n<ul>\n\t<li>Speech files ( to collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x&nbsp;24 actors&nbsp;= 2880.</li>\n\t<li>Song files ( to collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x&nbsp;23 actors&nbsp;= 2024.</li>\n</ul>\n\n<p><em>File Summary</em></p>\n\n<p>In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).</p>\n\n<p><strong>License information</strong></p>\n\n<p>The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License,&nbsp;<a href=\"\">CC BY-NC-SA 4.0</a>&nbsp;</p>\n\n<p><strong>How to cite the RAVDESS</strong></p>\n\n<p><em>Academic citation&nbsp;</em><br>\nIf you use the RAVDESS in an academic publication, please use the following citation:&nbsp;</p>\n\n<p>Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. 
<a href=\"\"></a>.<br>\n<br>\n<em>All other attributions&nbsp;</em><br>\nIf you use the RAVDESS in a form other than an academic publication, such as in a blog post, school project, or non-commercial product, please use the following attribution: &quot;<a href=\"\">The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)</a>&quot; by Livingstone &amp; Russo is licensed under&nbsp;<a href=\"\">CC BY-NC-SA 4.0</a>.</p>\n\n<p><strong>File naming convention</strong></p>\n\n<p>Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:&nbsp;<br>\n<br>\n<em>Filename identifiers&nbsp;</em></p>\n\n<ul>\n\t<li>Modality (01 = full-AV, 02 = video-only, 03 = audio-only).</li>\n\t<li>Vocal channel (01 = speech, 02 = song).</li>\n\t<li>Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).</li>\n\t<li>Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the &#39;neutral&#39; emotion.</li>\n\t<li>Statement (01 = &quot;Kids are talking by the door&quot;, 02 = &quot;Dogs are sitting by the door&quot;).</li>\n\t<li>Repetition (01 = 1st repetition, 02 = 2nd repetition).</li>\n\t<li>Actor (01 to 24. Odd-numbered actors are male, even-numbered actors are female).</li>\n</ul>\n\n<p><br>\n<em>Filename example: 02-01-06-01-02-01-12.mp4&nbsp;</em></p>\n\n<ol>\n\t<li>Video-only (02)</li>\n\t<li>Speech (01)</li>\n\t<li>Fearful (06)</li>\n\t<li>Normal intensity (01)</li>\n\t<li>Statement &quot;dogs&quot; (02)</li>\n\t<li>1st Repetition (01)</li>\n\t<li>12th Actor (12)</li>\n\t<li>Female, as the actor ID number is even.</li>\n</ol>", 
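
The file naming convention described in the abstract can be decoded mechanically. The following is a minimal sketch, not part of the RAVDESS distribution: the names `RavdessLabel` and `parse_ravdess_filename` are hypothetical, and the lookup tables simply restate the seven identifier fields listed above.

```python
from dataclasses import dataclass
from pathlib import Path

# Lookup tables copied from the "Filename identifiers" list in the abstract.
MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITY = {"01": "normal", "02": "strong"}
STATEMENT = {
    "01": "Kids are talking by the door",
    "02": "Dogs are sitting by the door",
}


@dataclass(frozen=True)
class RavdessLabel:
    """Hypothetical container for the decoded fields of one filename."""
    modality: str
    vocal_channel: str
    emotion: str
    intensity: str
    statement: str
    repetition: int
    actor: int
    actor_sex: str


def parse_ravdess_filename(filename: str) -> RavdessLabel:
    """Decode a 7-part identifier such as '02-01-06-01-02-01-12.mp4'."""
    parts = Path(filename).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 hyphen-separated fields, got {len(parts)}")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    actor_id = int(actor)
    return RavdessLabel(
        modality=MODALITY[modality],
        vocal_channel=VOCAL_CHANNEL[channel],
        emotion=EMOTION[emotion],
        intensity=INTENSITY[intensity],
        statement=STATEMENT[statement],
        repetition=int(repetition),
        actor=actor_id,
        # Odd actor IDs are male, even actor IDs are female.
        actor_sex="female" if actor_id % 2 == 0 else "male",
    )
```

Calling `parse_ravdess_filename("02-01-06-01-02-01-12.mp4")` reproduces the worked example in the abstract: video-only, speech, fearful, normal intensity, the "dogs" statement, first repetition, actor 12 (female). Unknown codes raise `KeyError` in this sketch; a stricter loader might prefer to report them with the offending filename.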
