Dataset Open Access

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

Livingstone, Steven R.; Russo, Frank A.


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.1188976</identifier>
  <creators>
    <creator>
      <creatorName>Livingstone, Steven R.</creatorName>
      <givenName>Steven R.</givenName>
      <familyName>Livingstone</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-6364-6410</nameIdentifier>
      <affiliation>University of Wisconsin, River Falls</affiliation>
    </creator>
    <creator>
      <creatorName>Russo, Frank A.</creatorName>
      <givenName>Frank A.</givenName>
      <familyName>Russo</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-2939-6358</nameIdentifier>
      <affiliation>Ryerson University</affiliation>
    </creator>
  </creators>
  <titles>
    <title>The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2018</publicationYear>
  <subjects>
    <subject>emotion</subject>
    <subject>emotion expression</subject>
    <subject>emotion perception</subject>
    <subject>emotion database</subject>
    <subject>facial expressions</subject>
    <subject>vocal expressions</subject>
    <subject>stimulus validation</subject>
    <subject>face</subject>
    <subject>voice</subject>
    <subject>multimodal communication</subject>
    <subject>RAVDESS</subject>
    <subject>emotion classification</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2018-04-05</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/1188976</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsCitedBy">10.1371/journal.pone.0196391</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.1188975</relatedIdentifier>
  </relatedIdentifiers>
  <version>1.0.0</version>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;Contact Information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you experience any issues downloading the RAVDESS, or if you would like further information about the database, please contact us at &lt;a href="mailto:ravdess@gmail.com?subject=RAVDESS%20feedback%20from%20Zenodo"&gt;ravdess@gmail.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Construction and Validation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Construction and validation of the RAVDESS is described in our paper: Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391.&amp;nbsp;&lt;a href="https://doi.org/10.1371/journal.pone.0196391"&gt;https://doi.org/10.1371/journal.pone.0196391&lt;/a&gt;.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Our Open Access paper is made freely available&amp;nbsp;and can be downloaded without restriction from &lt;a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196391"&gt;PLoS ONE&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The RAVDESS contains 7356 files. Each file&amp;nbsp;was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability,&amp;nbsp;and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from &lt;a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196391"&gt;PLOS ONE&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This dataset contains the complete set of 7356 RAVDESS files (total size: 24.8 GB). Files for each of the 24 actors are provided in three modality formats: Audio-only (16-bit, 48 kHz .wav), Audio-Video (720p H.264, AAC 48 kHz, .mp4), and Video-only (no sound). Note that there are no song files for Actor_18.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Audio-only&amp;nbsp;files&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.&amp;nbsp;&lt;/li&gt;
	&lt;li&gt;Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Audio-Visual and Video-only files&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Speech files (Video_Speech_Actor_01.zip to Video_Speech_Actor_24.zip) collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x&amp;nbsp;24 actors&amp;nbsp;= 2880.&lt;/li&gt;
	&lt;li&gt;Song files (Video_Song_Actor_01.zip to Video_Song_Actor_24.zip) collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x&amp;nbsp;23 actors&amp;nbsp;= 2024.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;File Summary&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;License information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License,&amp;nbsp;&lt;a href="https://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;CC BY-NC-SA 4.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to cite the RAVDESS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Academic citation&amp;nbsp;&lt;/em&gt;&lt;br&gt;
If you use the RAVDESS in an academic publication, please use the following citation:&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. &lt;a href="https://doi.org/10.1371/journal.pone.0196391"&gt;https://doi.org/10.1371/journal.pone.0196391&lt;/a&gt;.&lt;br&gt;
&lt;br&gt;
&lt;em&gt;All other attributions&amp;nbsp;&lt;/em&gt;&lt;br&gt;
If you use the RAVDESS in a form other than an academic publication, such as in a blog post, school project, or non-commercial product, please use the following attribution: &amp;quot;&lt;a href="https://zenodo.org/record/1188976"&gt;The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)&lt;/a&gt;&amp;quot; by Livingstone &amp;amp; Russo is licensed under&amp;nbsp;&lt;a href="https://creativecommons.org/licenses/by-nc-sa/4.0/"&gt;CC BY-NA-SC 4.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File naming convention&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:&amp;nbsp;&lt;br&gt;
&lt;br&gt;
&lt;em&gt;Filename identifiers&amp;nbsp;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Modality (01 = full-AV, 02 = video-only, 03 = audio-only).&lt;/li&gt;
	&lt;li&gt;Vocal channel (01 = speech, 02 = song).&lt;/li&gt;
	&lt;li&gt;Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).&lt;/li&gt;
	&lt;li&gt;Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the &amp;#39;neutral&amp;#39; emotion.&lt;/li&gt;
	&lt;li&gt;Statement (01 = &amp;quot;Kids are talking by the door&amp;quot;, 02 = &amp;quot;Dogs are sitting by the door&amp;quot;).&lt;/li&gt;
	&lt;li&gt;Repetition (01 = 1st repetition, 02 = 2nd repetition).&lt;/li&gt;
	&lt;li&gt;Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;br&gt;
&lt;em&gt;Filename example: 02-01-06-01-02-01-12.mp4&amp;nbsp;&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;Video-only (02)&lt;/li&gt;
	&lt;li&gt;Speech (01)&lt;/li&gt;
	&lt;li&gt;Fearful (06)&lt;/li&gt;
	&lt;li&gt;Normal intensity (01)&lt;/li&gt;
	&lt;li&gt;Statement &amp;quot;dogs&amp;quot; (02)&lt;/li&gt;
	&lt;li&gt;1st Repetition (01)&lt;/li&gt;
	&lt;li&gt;12th Actor (12)&lt;/li&gt;
	&lt;li&gt;Female, as the actor ID number is even.&lt;/li&gt;
&lt;/ol&gt;</description>
    <description descriptionType="Other">Funding Information
Natural Sciences and Engineering Research Council of Canada: 2012-341583 
Hear the world research chair in music and emotional speech from Phonak</description>
    <description descriptionType="Other">{"references": ["Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391"]}</description>
  </descriptions>
</resource>
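
Parsing the file naming convention

The 7-part file naming convention documented in the record above maps directly onto a small lookup-table decoder. The Python sketch below is an editorial illustration only, not part of the official RAVDESS distribution; the function name parse_ravdess_filename and the dictionary layout are assumptions, while the code tables follow the "Filename identifiers" list in the record.

import os

# Code tables taken from the "Filename identifiers" list in the record above.
MODALITY  = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
CHANNEL   = {"01": "speech", "02": "song"}
EMOTION   = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
             "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}
INTENSITY = {"01": "normal", "02": "strong"}   # no strong intensity for 'neutral'
STATEMENT = {"01": "Kids are talking by the door",
             "02": "Dogs are sitting by the door"}

def parse_ravdess_filename(path):
    """Decode a RAVDESS filename such as '02-01-06-01-02-01-12.mp4'.

    Hypothetical helper (not part of the dataset); returns one dict per file.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected a 7-part identifier, got {stem!r}")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    return {
        "modality": MODALITY[modality],
        "vocal_channel": CHANNEL[channel],
        "emotion": EMOTION[emotion],
        "intensity": INTENSITY[intensity],
        "statement": STATEMENT[statement],
        "repetition": int(repetition),   # 01 = 1st repetition, 02 = 2nd repetition
        "actor": int(actor),             # 01 to 24
        "actor_sex": "male" if int(actor) % 2 else "female",  # odd = male, even = female
    }

if __name__ == "__main__":
    # Worked example from the record: video-only, speech, fearful, normal
    # intensity, "dogs" statement, 1st repetition, 12th actor (female).
    print(parse_ravdess_filename("02-01-06-01-02-01-12.mp4"))

For example, parse_ravdess_filename("02-01-06-01-02-01-12.mp4") reproduces the worked example in the record, and a decoder of this kind can be used to group the 7356 files by emotion, vocal channel, or actor before analysis.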
