AVbook, a high-frame-rate corpus of narrative audiovisual speech for investigating multimodal speech perception

Enrico Varano; Tobias Reichenbach

doi:10.5281/zenodo.7387047

Published December 1, 2022 | Version 1

Video/Audio Open


1. Imperial College London
2. Friedrich-Alexander-University Erlangen-Nuremberg

Seeing a speaker's face can help substantially in understanding them, in particular in challenging listening conditions. Research into the neurobiological mechanisms behind audiovisual integration has recently begun to employ continuous natural speech. However, these efforts are impeded by a lack of high-quality audiovisual recordings of a speaker narrating a longer text. Here we seek to close this gap by developing AVbook, an audiovisual speech corpus designed for cognitive neuroscience studies and audiovisual speech recognition. The corpus consists of 3.6 hours of audiovisual recordings of two speakers, one male and one female, reading 59 passages from a narrative English text. The recordings were acquired at a high frame rate of 119.88 frames per second. The corpus includes sets of multiple-choice questions to test attention to the different passages. We verified the efficacy of these questions in a pilot study. A short written summary is also provided for each recording. To enable audiovisual synchronization when presenting the stimuli, four videos of an electronic clapperboard were recorded with the corpus. The corpus is available for download to support research into the neurobiology of audiovisual speech processing as well as the development of computer algorithms for audiovisual speech recognition.

Files


Files (10.4 GB)

Name	MD5	Size
All unedited recordings.zip	53383f9ae2920bc08c4f9876269f8d87	1.3 kB
AVbook.zip	06554792e61bc1f48fc832fcc0ff8ca0	7.6 GB
Code.zip	34818919a178705ac9acb70b6fe7f1c8	2.4 kB
Electronic clapperboard.zip	0c31b04be5792c2ad654b3aa9a396818	777.7 MB
OpenFace2.0 output.zip	a7e3fc9cd2de8f77eb3354f0fe361bde	2.0 GB
Permissions.zip	a9a2fe00b11f26deed3a63647766ee2e	374.3 kB
QandA.csv	82708f8d3118facdb78bb6ebbf7f1a1c	52.1 kB
Script.zip	ad931ca3fba2d86e4f1e2d1ff0fef174	3.3 MB
Sync signal.zip	22032ffe0d971777bc932768d68fcf31	47.9 kB
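
The MD5 checksums listed for each file can be used to confirm that a download completed without corruption. The following is a minimal Python sketch of such a check, not part of the corpus itself. The function names md5_of_file and verify_downloads, and the local folder name "downloads", are hypothetical and chosen for illustration; it assumes the files were saved under the names shown in this record.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "All unedited recordings.zip": "53383f9ae2920bc08c4f9876269f8d87",
    "AVbook.zip": "06554792e61bc1f48fc832fcc0ff8ca0",
    "Code.zip": "34818919a178705ac9acb70b6fe7f1c8",
    "Electronic clapperboard.zip": "0c31b04be5792c2ad654b3aa9a396818",
    "OpenFace2.0 output.zip": "a7e3fc9cd2de8f77eb3354f0fe361bde",
    "Permissions.zip": "a9a2fe00b11f26deed3a63647766ee2e",
    "QandA.csv": "82708f8d3118facdb78bb6ebbf7f1a1c",
    "Script.zip": "ad931ca3fba2d86e4f1e2d1ff0fef174",
    "Sync signal.zip": "22032ffe0d971777bc932768d68fcf31",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each expected file name to True if it exists in `directory`
    and its MD5 matches the listed value, otherwise False."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # "downloads" is an assumed local folder name (hypothetical).
    for name, ok in verify_downloads("downloads").items():
        print(f"{'OK      ' if ok else 'MISMATCH'}  {name}")
```

A file reported as MISMATCH is either missing from the folder or differs from the published version and should be downloaded again.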

	All versions	This version
Views	680	422
Downloads	1,055	650
Data volume	1.9 TB	1.3 TB
