Dataset Open Access

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

Livingstone, Steven R.; Russo, Frank A.

DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="" xmlns="" xsi:schemaLocation="">
  <identifier identifierType="DOI">10.5281/zenodo.1188976</identifier>
      <creatorName>Livingstone, Steven R.</creatorName>
      <givenName>Steven R.</givenName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0002-6364-6410</nameIdentifier>
      <affiliation>University of Wisconsin, River Falls</affiliation>
      <creatorName>Russo, Frank A.</creatorName>
      <givenName>Frank A.</givenName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0002-2939-6358</nameIdentifier>
      <affiliation>Ryerson University</affiliation>
    <title>The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)</title>
    <subject>emotion expression</subject>
    <subject>emotion perception</subject>
    <subject>emotion database</subject>
    <subject>facial expressions</subject>
    <subject>vocal expressions</subject>
    <subject>stimulus validation</subject>
    <subject>multimodal communication</subject>
    <subject>emotion classification</subject>
    <date dateType="Issued">2018-04-05</date>
  <resourceType resourceTypeGeneral="Dataset"/>
    <alternateIdentifier alternateIdentifierType="url"></alternateIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsCitedBy">10.1371/journal.pone.0196391</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.1188975</relatedIdentifier>
    <rights rightsURI="">Creative Commons Attribution-NonCommercial-ShareAlike</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
    <description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;Contact Information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you experience any issues downloading the RAVDESS, or if you would like further information about the database, please contact us at &lt;a href=""&gt;;/a&gt;.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Construction and Validation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Construction and validation of the RAVDESS is described in our paper: Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391.&amp;nbsp;&lt;a href=""&gt;;/a&gt;.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Our Open Access paper is made freely available&amp;nbsp;and can be downloaded without restriction from &lt;a href=""&gt;PLoS ONE&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The RAVDESS contains 7356 files. Each file&amp;nbsp;was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability,&amp;nbsp;and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from &lt;a href=""&gt;PLOS ONE&lt;/a&gt;.&lt;/p&gt;


&lt;p&gt;This dataset contains the complete set of 7356 RAVDESS files (total size: 24.8 GB). Recordings of each of the 24 actors are provided in three modality formats: Audio-only&amp;nbsp;(16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound).&amp;nbsp;Note that there are no song files for Actor_18.&lt;/p&gt;


&lt;p&gt;Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):&lt;/p&gt;

	&lt;li&gt;Speech file (, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.&amp;nbsp;&lt;/li&gt;
	&lt;li&gt;Song file (, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.&lt;/li&gt;

&lt;p&gt;&lt;em&gt;Audio-Visual and Video-only files&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:&lt;/p&gt;

	&lt;li&gt;Speech files ( to collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x&amp;nbsp;24 actors&amp;nbsp;= 2880.&lt;/li&gt;
	&lt;li&gt;Song files ( to collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x&amp;nbsp;23 actors&amp;nbsp;= 2024.&lt;/li&gt;

&lt;p&gt;&lt;em&gt;File Summary&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;License information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License,&amp;nbsp;&lt;a href=""&gt;CC BY-NC-SA 4.0&lt;/a&gt;.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to cite the RAVDESS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Academic citation&amp;nbsp;&lt;/em&gt;&lt;br&gt;
If you use the RAVDESS in an academic publication, please use the following citation:&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. &lt;a href=""&gt;;/a&gt;.&lt;br&gt;
&lt;em&gt;All other attributions&amp;nbsp;&lt;/em&gt;&lt;br&gt;
If you use the RAVDESS in a form other than an academic publication, such as in a blog post, school project, or non-commercial product, please use the following attribution: &amp;quot;&lt;a href=""&gt;The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)&lt;/a&gt;&amp;quot; by Livingstone &amp;amp; Russo is licensed under&amp;nbsp;&lt;a href=""&gt;CC BY-NC-SA 4.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File naming convention&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:&amp;nbsp;&lt;br&gt;
&lt;em&gt;Filename identifiers&amp;nbsp;&lt;/em&gt;&lt;/p&gt;

	&lt;li&gt;Modality (01 = full-AV, 02 = video-only, 03 = audio-only).&lt;/li&gt;
	&lt;li&gt;Vocal channel (01 = speech, 02 = song).&lt;/li&gt;
	&lt;li&gt;Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).&lt;/li&gt;
	&lt;li&gt;Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the &amp;#39;neutral&amp;#39; emotion.&lt;/li&gt;
	&lt;li&gt;Statement (01 = &amp;quot;Kids are talking by the door&amp;quot;, 02 = &amp;quot;Dogs are sitting by the door&amp;quot;).&lt;/li&gt;
	&lt;li&gt;Repetition (01 = 1st repetition, 02 = 2nd repetition).&lt;/li&gt;
	&lt;li&gt;Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).&lt;/li&gt;

&lt;p&gt;&lt;em&gt;Filename example: 02-01-06-01-02-01-12.mp4&amp;nbsp;&lt;/em&gt;&lt;/p&gt;

	&lt;li&gt;Video-only (02)&lt;/li&gt;
	&lt;li&gt;Speech (01)&lt;/li&gt;
	&lt;li&gt;Fearful (06)&lt;/li&gt;
	&lt;li&gt;Normal intensity (01)&lt;/li&gt;
	&lt;li&gt;Statement &amp;quot;dogs&amp;quot; (02)&lt;/li&gt;
	&lt;li&gt;1st Repetition (01)&lt;/li&gt;
	&lt;li&gt;12th Actor (12)&lt;/li&gt;
	&lt;li&gt;Female, as the actor ID number is even.&lt;/li&gt;</description>
    <description descriptionType="Other">Funding Information
Natural Sciences and Engineering Research Council of Canada: 2012-341583 
Hear the world research chair in music and emotional speech from Phonak</description>
    <description descriptionType="Other">{"references": ["Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391."]}</description>
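
The file counts quoted in the abstract follow from simple per-actor arithmetic. The following Python sketch recomputes them from the figures stated in the abstract (60 speech trials and 44 song trials per actor, 24 actors for speech, 23 for song because Actor_18 has no song files). The function name and dictionary keys are illustrative assumptions and are not part of the dataset or any official tooling.

```python
def ravdess_file_counts() -> dict:
    """Recompute the RAVDESS file counts from the per-actor figures in the abstract."""
    speech_trials_per_actor = 60
    song_trials_per_actor = 44
    speech_actors = 24
    song_actors = 23          # Actor_18 has no song files
    video_modalities = 2      # full-AV and video-only

    counts = {
        "audio_speech": speech_trials_per_actor * speech_actors,
        "audio_song": song_trials_per_actor * song_actors,
        "video_speech": speech_trials_per_actor * video_modalities * speech_actors,
        "video_song": song_trials_per_actor * video_modalities * song_actors,
    }
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    for name, value in ravdess_file_counts().items():
        print(f"{name}: {value}")
```

A check of this kind can be useful after downloading, for example by comparing these numbers against the number of files actually extracted from each archive.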

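The file naming convention described in the abstract maps directly onto a small lookup-based parser. The following Python sketch decodes a filename into its seven identifiers using only the code tables listed in the abstract. The class `RavdessLabel`, the function `parse_ravdess_filename`, and the field names are hypothetical names chosen for illustration; they are not part of the dataset or of any official RAVDESS software.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Code tables copied from the "File naming convention" section of the abstract.
MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITY = {"01": "normal", "02": "strong"}
STATEMENT = {
    "01": "Kids are talking by the door",
    "02": "Dogs are sitting by the door",
}


@dataclass(frozen=True)
class RavdessLabel:
    """Decoded stimulus characteristics (hypothetical container for this sketch)."""
    modality: str
    vocal_channel: str
    emotion: str
    intensity: str
    statement: str
    repetition: int
    actor: int
    actor_sex: str


def parse_ravdess_filename(filename: str) -> RavdessLabel:
    """Decode a 7-part RAVDESS filename such as '02-01-06-01-02-01-12.mp4'."""
    parts = PurePath(filename).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 identifiers, got {len(parts)}: {filename!r}")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    actor_id = int(actor)
    return RavdessLabel(
        modality=MODALITY[modality],
        vocal_channel=VOCAL_CHANNEL[channel],
        emotion=EMOTION[emotion],
        intensity=INTENSITY[intensity],
        statement=STATEMENT[statement],
        repetition=int(repetition),
        actor=actor_id,
        # Odd-numbered actors are male, even-numbered actors are female.
        actor_sex="female" if actor_id % 2 == 0 else "male",
    )


if __name__ == "__main__":
    print(parse_ravdess_filename("02-01-06-01-02-01-12.mp4"))
```

In this sketch an unknown code raises `KeyError` rather than being silently mapped to a default, so that malformed or unexpected filenames surface early when building a label table.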
