Dataset Open Access
Swanson, Riley; Livingstone, Steven R.; Russo, Frank A.
If you would like further information about the RAVDESS Facial Landmark Tracking data set, or if you experience any issues downloading files, please contact us at firstname.lastname@example.org.
Watch a sample of the facial tracking results.
This data set contains tracked facial landmark movements from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [RAVDESS Zenodo page]. Facial motion tracking was performed with OpenFace 2.1.0 (Baltrusaitis, T., Zadeh, A., Lim, Y. C., & Morency, L. P., 2018). Tracked information includes: facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.
This data set contains tracking for all 2452 RAVDESS trials. All tracking movement data are provided in "FacialTracking_Actors_01-24.zip", which contains 2452 .CSV files. Each actor has 104 tracked trials (60 speech, 44 song), except Actor 18, for whom there are no song files.
Total Tracked Files = (24 Actors x 60 Speech trials) + (23 Actors x 44 Song trials) = 2452 files.
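The file count above can be verified with a trivial Python check (trial counts are taken directly from the dataset description):

```python
# Sanity check on the total number of tracked files.
# 24 actors recorded 60 speech trials each; 23 actors (all but
# Actor 18) recorded 44 song trials each.
speech_files = 24 * 60   # 1440 speech trials
song_files = 23 * 44     # 1012 song trials
total_files = speech_files + song_files
print(total_files)  # 2452
```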
Tracking results for each trial are provided as individual comma separated value files (CSV format). File naming convention of tracked files is identical to that of the RAVDESS. For example, tracked file "01-01-01-01-01-01-01.csv" corresponds to RAVDESS audio-video file "01-01-01-01-01-01-01.mp4". For a complete description of the RAVDESS file naming convention and experimental manipulations, please see the RAVDESS Zenodo page.
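As an illustration, a tracked filename can be split into its seven numeric identifiers. The field names below follow the RAVDESS naming convention documented on the Zenodo page; the helper function itself is a hypothetical sketch, not part of the dataset:

```python
def parse_ravdess_filename(filename):
    """Split a RAVDESS-style filename into its seven numeric fields.

    Field order follows the RAVDESS convention: modality,
    vocal channel, emotion, intensity, statement, repetition, actor.
    """
    stem = filename.rsplit(".", 1)[0]  # drop the .csv/.mp4 extension
    parts = stem.split("-")
    keys = ("modality", "vocal_channel", "emotion", "intensity",
            "statement", "repetition", "actor")
    return dict(zip(keys, parts))

info = parse_ravdess_filename("01-01-01-01-01-01-01.csv")
print(info["actor"])  # "01"
```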
Tracking overlay videos for all trials are also provided (720p Xvid, .avi), one zip file per Actor. As the RAVDESS does not contain "ground truth" facial landmark locations, the overlay videos provide a visual 'sanity check' for researchers to confirm the general accuracy of the tracking results. The file naming convention of tracking overlay videos also matches that of the RAVDESS. For example, tracking video "01-01-01-01-01-01-01.avi" corresponds to RAVDESS audio-video file "01-01-01-01-01-01-01.mp4".
Tracking File Output Format
This data set retains OpenFace's data output format, described here in detail. The resolution of all input videos was 1280x720; for tracking outputs in pixel units, values range from (0,0) at the top-left corner to (1280,720) at the bottom-right corner.
Columns 1-3 = Timing and Detection Confidence
Columns 4-291 = Eye Gaze Detection
Columns 292-297 = Head pose
Columns 298-433 = Facial Landmarks locations in 2D
Columns 434-637 = Facial Landmarks locations in 3D
Columns 638-677 = Rigid and non-rigid shape parameters
Parameters of a point distribution model (PDM) that describe the rigid face shape (location, scale, and rotation) and non-rigid face shape (deformation due to expression and identity). For more details, please refer to chapter 4.2 of Tadas Baltrusaitis's PhD thesis [download link].
Columns 678-712 = Facial Action Units
Facial Action Units (AUs) are a way to describe human facial movements (Ekman, Friesen, and Hager, 2002) [wiki link]. More information on OpenFace's implementation of AUs can be found here.
Note that OpenFace's columns 2 and 5 (face_id and success, respectively) were not included in this data set; these values were redundant, as a single face was successfully detected in every frame of all 2452 trials.
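Because the ranges above are contiguous, a raw column index can be labeled with a small lookup helper. A minimal sketch, assuming the Action Unit block begins at column 678, immediately after the last shape parameter:

```python
# Column ranges (1-based, inclusive) from the layout listed above.
COLUMN_GROUPS = [
    (1, 3, "Timing and Detection Confidence"),
    (4, 291, "Eye Gaze Detection"),
    (292, 297, "Head pose"),
    (298, 433, "Facial Landmarks locations in 2D"),
    (434, 637, "Facial Landmarks locations in 3D"),
    (638, 677, "Rigid and non-rigid shape parameters"),
    (678, 712, "Facial Action Units"),
]

def column_group(index):
    """Return the group name for a 1-based CSV column index."""
    for start, end, name in COLUMN_GROUPS:
        if start <= index <= end:
            return name
    raise ValueError(f"column {index} out of range (1-712)")

print(column_group(295))  # "Head pose"
```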
Tracking Overlay Videos
Tracking overlay videos visualize most aspects of the tracking output described above.
Camera Parameters and 3D Calibration Procedure
This data set contains accurate estimates of actors' 3D head poses. To produce these, camera parameters at the time of recording were required (distance from camera to actor, and camera field of view). These values were used with OpenCV's camera calibration procedure, described here, to produce estimates of the camera's focal length and optical center at the time of actor recordings. The four values produced by the calibration procedure (fx,fy,cx,cy) were input to OpenFace as command line arguments during facial tracking, described here, to produce accurate estimates of 3D head pose.
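The relationship between field of view and the four calibration values can be sketched with the standard pinhole camera model. The 60-degree field of view below is an illustrative placeholder, not the actual RAVDESS camera parameter:

```python
import math

def intrinsics_from_fov(width, height, horizontal_fov_deg):
    """Estimate pinhole intrinsics (fx, fy, cx, cy) in pixels.

    Assumes square pixels (fx == fy) and an optical center at the
    image midpoint -- a simplification of a full OpenCV calibration.
    """
    fx = (width / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    fy = fx                       # square-pixel assumption
    cx, cy = width / 2, height / 2
    return fx, fy, cx, cy

# 1280x720 matches the RAVDESS videos; the FOV value is illustrative.
fx, fy, cx, cy = intrinsics_from_fov(1280, 720, 60.0)
print(round(fx, 1), cx, cy)
```

The four returned values correspond to the (fx, fy, cx, cy) arguments passed to OpenFace on the command line during tracking.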
The use of OpenCV's calibration procedure was required as the video camera used in the RAVDESS recordings did not report focal length values. Unlike SLR cameras, most video cameras do not provide this information to the user due to their dynamic focus feature. For all RAVDESS recordings, camera distance, field of view, and focal point (manual fixed camera focus) were kept constant.
The RAVDESS Facial Landmark Tracking data set is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0.
How to cite the RAVDESS Facial Tracking data set
If you use the RAVDESS Facial Tracking data set in an academic publication, please cite both references:
All other attributions
If you use the RAVDESS Facial Landmark Tracking data set in a form other than an academic publication, such as in a blog post, school project, or non-commercial product, please use the following attribution: "RAVDESS Facial Landmark Tracking" by Swanson, Livingstone, & Russo is licensed under CC BY-NC-SA 4.0.
Related Data sets
Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391
Data volume: 5.3 TB (all versions); 5.3 TB (this version).