Published April 22, 2024 | Version 1
Preprint Open

AVbook, a high-frame-rate corpus of narrative audiovisual speech for investigating multimodal speech perception

  • 1. Imperial College London, UK
  • 2. Friedrich-Alexander-University Erlangen-Nürnberg, Germany
  • 1. Lyon Neuroscience Research Center, CNRS UMR5292, Inserm U1028, Université Claude Bernard Lyon 1, Université Jean Monnet Saint-Étienne, Lyon, France
  • 2. ENTPE, Laboratoire Génie Civil et Bâtiment, Vaulx-en-Velin, France
  • 3. Starkey France, Créteil, France
  • 4. ENTPE, Laboratoire de Tribologie et Dynamique des Systèmes, Vaulx-en-Velin, France

Description

We present an audiovisual speech corpus designed for cognitive neuroscience studies that can also be used for research on audiovisual speech recognition. The corpus consists of 3.6 hours of audiovisual recordings of two speakers, one male and one female, reading passages from a narrative English text. The video was acquired at a high frame rate of 119.88 frames per second (fps) and exported at a high resolution of 528×718 pixels. The speech is pronounced with a neutral British accent and is directed at the camera. Both speakers read the same 59 passages of a book, for a total of 1 h 50 min each. The passage scripts, largely contiguous within a non-fiction source book chosen for its compelling content, were selected and lightly edited to keep listeners interested and alert. To test comprehension and attention, a set of four multiple-choice questions was written for each passage. A short written summary is also provided for each recording. To enable audiovisual synchronisation when presenting the stimuli, four videos of an electronic clapperboard were recorded in line with the corpus. Stimulus synchronisation of 0 ± 4 ms was achieved by pairing these with a high-frame-rate commercial monitor and a photo-sensor. The audiovisual speech material, the corresponding text, the synchronisation material, the comprehension questions and the written summaries are available on the web for research use.
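To put the timing figures in the description into perspective, the short sketch below works out the duration of a single video frame at 119.88 fps and expresses the reported ± 4 ms synchronisation tolerance as a fraction of that frame. It is illustrative arithmetic only, not part of the corpus tooling: the function and variable names are hypothetical, and the frame count assumes a recording time of exactly 1 h 50 min per speaker, which the description gives only as a rounded figure.

```python
# Figures taken from the description; all names here are illustrative.
FRAME_RATE_FPS = 119.88      # video frame rate stated in the description
SYNC_TOLERANCE_MS = 4.0      # reported stimulus synchronisation (0 ± 4 ms)
MINUTES_PER_SPEAKER = 110    # 1 h 50 min per speaker (assumed exact here)


def frame_period_ms(fps: float) -> float:
    """Duration of one video frame in milliseconds."""
    return 1000.0 / fps


def frame_count(duration_s: float, fps: float) -> int:
    """Approximate number of frames in a recording of the given duration."""
    return round(duration_s * fps)


period_ms = frame_period_ms(FRAME_RATE_FPS)
tolerance_in_frames = SYNC_TOLERANCE_MS / period_ms
frames_per_speaker = frame_count(MINUTES_PER_SPEAKER * 60, FRAME_RATE_FPS)

print(f"frame period: {period_ms:.2f} ms")
print(f"sync tolerance: {tolerance_in_frames:.2f} frames")
print(f"frames per speaker: about {frames_per_speaker}")
```

At this frame rate one frame lasts roughly 8.34 ms, so a tolerance of ± 4 ms corresponds to just under half a frame.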

Notes

Funding:
  • U.S. Army: 71931-LS-INT
  • Royal British Legion Centre for Blast Injury Studies

Files

ISH2022_Varano_Reichenbach.pdf (768.5 kB)
md5:1bb3facfc88cf0ddb1a7b3a86b20de00

Additional details

Funding

  • UK Research and Innovation: Personalized fitting and evaluation of hearing aids with EEG responses (EP/M026728/1)
  • UK Research and Innovation: Towards a multisensory hearing aid: Engineering synthetic audiovisual and audiotactile signals to aid hearing in noisy backgrounds (EP/R032602/1)