Published June 26, 2018 | Version v1
Video/Audio Open

The Grid Audio-Visual Lombard Speech Corpus

  • 1. University of Sheffield, UK
  • 2. Universite de Toulon, France

Description

Lombard Grid is a bi-view audiovisual Lombard speech corpus that can be used to support joint computational and behavioral studies in speech perception. The corpus includes 54 talkers, with 100 utterances per talker (50 Lombard and 50 plain utterances). The dataset follows the same sentence format as the audiovisual Grid corpus and can thus be considered an extension of that corpus. The sentence sets used in the Lombard Grid corpus are unique, however, and do not appear in the Grid corpus.

It offers two synchronised views of the talkers (front and side) to facilitate analysis of speech from different angles. A bespoke head-mounted camera system was used to collect both front and profile views of the talkers.

Statistics: 54 talkers: 30 female talkers and 24 male talkers; 5,400 (audio, front video and side video) utterances (16,200 files in total): 50% Lombard utterances, 50% plain reference utterances.

The dataset is described in detail in the paper:

Najwa Alghamdi, Steve Maddock, Ricard Marxer, Jon Barker and Guy J. Brown, "A corpus of audio-visual Lombard speech with frontal and profile views", The Journal of the Acoustical Society of America 143, EL523 (2018)

The paper is available online at White Rose Research Online.

------------------------------------------------------------------------------------

Notes on Filenaming

Filename format

SPKR_COND_UTTERANCE.wav|.mov - e.g., s8_p_sbbi9p.wav

*SPKR = s1 to s55

*COND = l or p, where l=> Lombard, p=> plain (i.e. non-Lombard)

*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'
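The filename convention above can be parsed mechanically. The following is a minimal Python sketch, not part of the corpus distribution. The function names `parse_filename` and `decode_utterance` are hypothetical, and the word tables are an assumption taken from the vocabulary of the original Grid corpus (command, colour, preposition, letter, digit, adverb), with `z` assumed to stand for zero. Only the mapping of 'pgag6a' to 'place green at g 6 again' is confirmed by the description above.

```python
import re

# SPKR_COND_UTTERANCE.wav|.mov, as described in the filename format above.
FILENAME_RE = re.compile(
    r"^(?P<speaker>s\d+)_(?P<condition>[lp])_(?P<utterance>[a-z0-9]{6})"
    r"\.(?P<extension>wav|mov)$"
)

# Assumed word tables, following the original Grid corpus vocabulary.
COMMANDS = {"b": "bin", "l": "lay", "p": "place", "s": "set"}
COLOURS = {"b": "blue", "g": "green", "r": "red", "w": "white"}
PREPOSITIONS = {"a": "at", "b": "by", "i": "in", "w": "with"}
ADVERBS = {"a": "again", "n": "now", "p": "please", "s": "soon"}


def parse_filename(name):
    """Split a corpus filename into its speaker, condition and utterance fields."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a SPKR_COND_UTTERANCE filename: {name!r}")
    return match.groupdict()


def decode_utterance(code):
    """Expand a 6-character Grid utterance code into words."""
    if len(code) != 6:
        raise ValueError(f"expected a 6-character code, got {code!r}")
    command, colour, preposition, letter, digit, adverb = code
    words = [
        COMMANDS[command],
        COLOURS[colour],
        PREPOSITIONS[preposition],
        letter,
        "0" if digit == "z" else digit,  # assumption: 'z' encodes zero
        ADVERBS[adverb],
    ]
    return " ".join(words)


fields = parse_filename("s8_p_sbbi9p.wav")
print(fields["speaker"], fields["condition"], decode_utterance(fields["utterance"]))
```

A code containing a character outside the assumed tables raises `KeyError`, which is a convenient way to spot files that do not follow the convention.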

Metadata format

*SPKR = s1 to s55

*SESSION = 1 or 2

*INDEX = 1 to 10 for ordering of the recording blocks

*SUBINDEX = 1 to 10 for ordering of utterance in a 10-utterance block.

*COND = l or r, where l=> Lombard, r=> plain reference (i.e. non-Lombard)

*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'

If a sentence is spoken incorrectly then the filename will be

SPKR_SESSION_INDEX_SUBINDEX_COND_UTTERANCE_WRONG_TRANS.wav e.g. s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav

*TRANS = the Grid utterance code for what was actually said.
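The metadata naming can be handled the same way. The sketch below is illustrative only. It assumes the fields appear in the order SPKR, SESSION, INDEX, SUBINDEX, COND, UTTERANCE, which is inferred from the example name rather than stated explicitly, and it accepts `l`, `r` or `p` as the condition letter because the example uses `r`. The function name `parse_metadata_name` and the derived `spoken` field are hypothetical.

```python
import re

# Assumed field order, inferred from the example
# s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav
METADATA_RE = re.compile(
    r"^(?P<speaker>s\d+)_(?P<session>\d+)_(?P<index>\d+)_(?P<subindex>\d+)"
    r"_(?P<condition>[lrp])_(?P<utterance>[a-z0-9]{6})"
    r"(?:_WRONG_(?P<transcript>[a-z0-9]{6}))?"  # present only for misread sentences
    r"(?:\.\w+)?$"
)


def parse_metadata_name(name):
    """Split a metadata-style name into typed fields."""
    match = METADATA_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised metadata name: {name!r}")
    fields = match.groupdict()
    for key in ("session", "index", "subindex"):
        fields[key] = int(fields[key])
    # What was actually said: TRANS if the sentence was misread,
    # otherwise the prompted utterance code.
    fields["spoken"] = fields["transcript"] or fields["utterance"]
    return fields


info = parse_metadata_name("s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav")
print(info["utterance"], "->", info["spoken"])
```

Keeping both the prompted code and the spoken code makes it straightforward to exclude or relabel misread sentences when building training or evaluation lists.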

Files

lombardgrid_alignment.zip

Files (2.5 GB)

Checksum Size
md5:0cd68f258d1dbf15f05c3681f8715bab 2.5 MB
md5:fa0cfc739705323b53ba50e148b3a144 652.6 MB
md5:63b546f53267ae3f4cffe0c772317ce1 837.2 MB
md5:070bf2b1570d9381215e7e57d8f58a21 64.9 kB
md5:7fdc4b94f1b04de896e890f4ade355e0 992.6 MB
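The MD5 checksums listed above can be used to confirm that a downloaded archive is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the pairing of each checksum with a particular archive is left to the reader, since the listing gives only checksums and sizes.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:0cd6...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("lombardgrid_alignment.zip", "md5:...")` would be called with the checksum that belongs to that archive; a result of `False` suggests an incomplete or corrupted download.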

Additional details

Related works

Is supplement to
Journal article: 10.1121/1.5042758 (DOI)
References
Dataset: 10.5281/zenodo.3625687 (DOI)

Funding

Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR) EP/M026981/1
UK Research and Innovation