Dataset Open Access

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

Livingstone, Steven R.; Russo, Frank A.


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.1188976</identifier>
  <creators>
    <creator>
      <creatorName>Livingstone, Steven R.</creatorName>
      <givenName>Steven R.</givenName>
      <familyName>Livingstone</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-6364-6410</nameIdentifier>
      <affiliation>University of Wisconsin, River Falls</affiliation>
    </creator>
    <creator>
      <creatorName>Russo, Frank A.</creatorName>
      <givenName>Frank A.</givenName>
      <familyName>Russo</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-2939-6358</nameIdentifier>
      <affiliation>Ryerson University</affiliation>
    </creator>
  </creators>
  <titles>
    <title>The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2018</publicationYear>
  <subjects>
    <subject>emotion</subject>
    <subject>emotion expression</subject>
    <subject>emotion perception</subject>
    <subject>emotion database</subject>
    <subject>facial expressions</subject>
    <subject>vocal expressions</subject>
    <subject>stimulus validation</subject>
    <subject>face</subject>
    <subject>voice</subject>
    <subject>multimodal communication</subject>
    <subject>RAVDESS</subject>
    <subject>emotion classification</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2018-04-05</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/1188976</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsCitedBy">10.1371/journal.pone.0196391</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsReferencedBy">10.5281/zenodo.3255102</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.1188975</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/ravdess</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/zenodo</relatedIdentifier>
  </relatedIdentifiers>
  <version>1.0.0</version>
  <rightsList>
    <rights rightsURI="http://creativecommons.org/licenses/by-nc-sa/4.0/legalcode">Creative Commons Attribution Non Commercial Share Alike 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;Citing the RAVDESS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike license, so please cite the RAVDESS if it is used in your work in any form. Published academic papers should use the academic paper citation for our PLoS ONE paper. Personal works, such as machine learning projects or blog posts, should provide a URL to this Zenodo page, though a reference to our PLoS ONE paper would also be appreciated.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Academic paper citation&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. &lt;a href="https://doi.org/10.1371/journal.pone.0196391"&gt;https://doi.org/10.1371/journal.pone.0196391&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;Personal use citation&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Include a link to this Zenodo page - &lt;a href="https://zenodo.org/record/1188976"&gt;https://zenodo.org/record/1188976&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Commercial Licenses&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Commercial licenses for the RAVDESS can be purchased. If you are interested in using the RAVDESS in a commercial setting, please contact us at &lt;a href="mailto:ravdess@gmail.com?subject=RAVDESS%20Commercial%20License"&gt;ravdess@gmail.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contact Information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at &lt;a href="mailto:ravdess@gmail.com?subject=RAVDESS%20feedback%20from%20Zenodo"&gt;ravdess@gmail.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Videos&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Watch a sample of the RAVDESS &lt;a href="https://www.youtube.com/watch?v=Y7OQoNEu3dY"&gt;speech&lt;/a&gt; and &lt;a href="https://www.youtube.com/watch?v=XQkmH4oYZkg"&gt;song&lt;/a&gt; videos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Emotion Classification Users&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you&amp;#39;re interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [&lt;a href="https://zenodo.org/record/3255102"&gt;Zenodo project page&lt;/a&gt;].&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Construction and Validation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - &lt;a href="https://doi.org/10.1371/journal.pone.0196391"&gt;https://doi.org/10.1371/journal.pone.0196391&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data are open access and can be downloaded along with our paper from &lt;a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196391"&gt;PLoS ONE&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The database contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16-bit, 48 kHz .wav), Audio-Video (720p H.264, AAC 48 kHz, .mp4), and Video-only (no sound). Note: there are no song files for Actor_18.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Audio-only&amp;nbsp;files&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.&lt;/li&gt;
	&lt;li&gt;Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Audio-Visual and Video-only files&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Speech files (Video_Speech_Actor_01.zip to Video_Speech_Actor_24.zip) collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x 24 actors = 2880.&lt;/li&gt;
	&lt;li&gt;Song files (Video_Song_Actor_01.zip to Video_Song_Actor_24.zip) collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x 23 actors = 2024.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;File Summary&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).&lt;/p&gt;
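
As a quick sanity check, the per-zip counts above reproduce the 7356 total. The minimal sketch below simply restates the arithmetic from this description (figures are taken from the text, not computed from the files):

```python
# Reproduce the published RAVDESS file counts from the per-actor trial figures.
speech_audio = 60 * 24       # 60 speech trials per actor, 24 actors
song_audio = 44 * 23         # 44 song trials per actor, 23 actors (no song for Actor_18)
speech_video = 60 * 2 * 24   # two modalities per trial: Audio-Video and Video-only
song_video = 44 * 2 * 23
total = speech_audio + song_audio + speech_video + song_video
print(total)  # 7356
```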

&lt;p&gt;&lt;strong&gt;File naming convention&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:&amp;nbsp;&lt;br&gt;
&lt;br&gt;
&lt;em&gt;Filename identifiers&amp;nbsp;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Modality (01 = full-AV, 02 = video-only, 03 = audio-only).&lt;/li&gt;
	&lt;li&gt;Vocal channel (01 = speech, 02 = song).&lt;/li&gt;
	&lt;li&gt;Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).&lt;/li&gt;
	&lt;li&gt;Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the &amp;#39;neutral&amp;#39; emotion.&lt;/li&gt;
	&lt;li&gt;Statement (01 = &amp;quot;Kids are talking by the door&amp;quot;, 02 = &amp;quot;Dogs are sitting by the door&amp;quot;).&lt;/li&gt;
	&lt;li&gt;Repetition (01 = 1st repetition, 02 = 2nd repetition).&lt;/li&gt;
	&lt;li&gt;Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;br&gt;
&lt;em&gt;Filename example: 02-01-06-01-02-01-12.mp4&amp;nbsp;&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;Video-only (02)&lt;/li&gt;
	&lt;li&gt;Speech (01)&lt;/li&gt;
	&lt;li&gt;Fearful (06)&lt;/li&gt;
	&lt;li&gt;Normal intensity (01)&lt;/li&gt;
	&lt;li&gt;Statement &amp;quot;dogs&amp;quot; (02)&lt;/li&gt;
	&lt;li&gt;1st Repetition (01)&lt;/li&gt;
	&lt;li&gt;12th Actor (12). Female, as the actor ID number is even.&lt;/li&gt;
&lt;/ol&gt;
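
The 7-part convention above can be decoded mechanically. The sketch below is a hypothetical Python helper (not part of the RAVDESS distribution, and the function name is our own); the lookup tables mirror the identifier lists in this description:

```python
# Lookup tables transcribed from the filename-identifier lists above.
MODALITY = {'01': 'full-AV', '02': 'video-only', '03': 'audio-only'}
CHANNEL = {'01': 'speech', '02': 'song'}
EMOTION = {'01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
           '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised'}
INTENSITY = {'01': 'normal', '02': 'strong'}
STATEMENT = {'01': 'Kids are talking by the door',
             '02': 'Dogs are sitting by the door'}

def parse_ravdess_filename(name):
    """Map a filename such as '02-01-06-01-02-01-12.mp4' to its stimulus attributes."""
    stem = name.rsplit('.', 1)[0]           # drop the extension
    mod, chan, emo, inten, stmt, rep, actor = stem.split('-')
    return {
        'modality': MODALITY[mod],
        'vocal_channel': CHANNEL[chan],
        'emotion': EMOTION[emo],
        'intensity': INTENSITY[inten],
        'statement': STATEMENT[stmt],
        'repetition': int(rep),
        'actor': int(actor),
        # Odd-numbered actors are male, even-numbered actors are female.
        'gender': 'female' if int(actor) % 2 == 0 else 'male',
    }

attrs = parse_ravdess_filename('02-01-06-01-02-01-12.mp4')
```

For the worked example above, the helper yields video-only, speech, fearful, normal intensity, the "dogs" statement, 1st repetition, actor 12, female.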

&lt;p&gt;&lt;strong&gt;License information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, &lt;a href="https://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;CC BY-NC-SA 4.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Commercial licenses for the RAVDESS can also be purchased. For more information, contact us at &lt;a href="mailto:ravdess@gmail.com?subject=RAVDESS%20Commercial%20License"&gt;ravdess@gmail.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related Data sets&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;RAVDESS Facial Landmark Tracking data set [&lt;a href="https://zenodo.org/record/3255102"&gt;Zenodo project page&lt;/a&gt;].&lt;/li&gt;
&lt;/ul&gt;</description>
    <description descriptionType="Other">Funding Information
Natural Sciences and Engineering Research Council of Canada: 2012-341583 
Hear the world research chair in music and emotional speech from Phonak</description>
    <description descriptionType="Other">{"references": ["Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391"]}</description>
  </descriptions>
</resource>