Video/Audio Open Access

The Grid Audio-Visual Lombard Speech Corpus

Najwa Alghamdi; Steve Maddock; Ricard Marxer; Jon Barker; Guy J Brown

Lombard Grid is a bi-view audiovisual Lombard speech corpus that can be used to support joint computational/behavioral studies in speech perception. The corpus includes 54 talkers, with 100 utterances per talker (50 Lombard and 50 plain utterances). The dataset follows the same sentence format as the audiovisual Grid corpus and can thus be considered an extension of that corpus. The sentence sets used in the Lombard Grid corpus are unique, however, and do not appear in the Grid corpus.

The corpus offers two synchronised views of the talkers (front and side) to facilitate analysis of speech from different angles. Both views were recorded with a bespoke head-mounted camera system.

Statistics: 54 talkers: 30 female talkers and 24 male talkers; 5,400 (audio, front video and side video) utterances (16,200 files in total): 50% Lombard utterances, 50% plain reference utterances.

The dataset is described in detail in the following paper:

Najwa Alghamdi, Steve Maddock, Ricard Marxer, Jon Barker and Guy J. Brown, "A corpus of audio-visual Lombard speech with frontal and profile views", The Journal of the Acoustical Society of America 143, EL523 (2018).

The paper is available online at White Rose Research Online.


Notes on Filenaming

Filename format

SPKR_COND_UTTERANCE.wav|.mov - e.g., s8_p_sbbi9p.wav

*SPKR = s1 to s55

*COND = l or p, where l=> Lombard, p=> plain (i.e. non-Lombard)

*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'
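
As a minimal sketch, assuming the SPKR_COND_UTTERANCE pattern described above, a file name can be split into its fields with a regular expression. The helper names `parse_media_filename` and `decode_utterance` are hypothetical, and the word tables are assumed from the six-slot Grid sentence grammar (command, colour, preposition, letter, digit, adverb), with 'z' read as "zero" in the digit slot; check them against the corpus documentation before relying on them.

```python
import re
from typing import NamedTuple

# Word tables for the six-slot Grid sentence grammar. These are an
# assumption based on the original Grid corpus, not taken from this page.
COMMANDS = {"b": "bin", "l": "lay", "p": "place", "s": "set"}
COLOURS = {"b": "blue", "g": "green", "r": "red", "w": "white"}
PREPOSITIONS = {"a": "at", "b": "by", "i": "in", "w": "with"}
ADVERBS = {"a": "again", "n": "now", "p": "please", "s": "soon"}

# SPKR_COND_UTTERANCE.wav|.mov, e.g. s8_p_sbbi9p.wav
FILENAME_RE = re.compile(
    r"^(?P<spkr>s\d+)_(?P<cond>[lp])_(?P<utt>[a-z0-9]{6})\.(?P<ext>wav|mov)$"
)


class MediaFile(NamedTuple):
    speaker: str      # e.g. "s8"
    condition: str    # "Lombard" or "plain"
    utterance: str    # 6-character Grid utterance code
    extension: str    # "wav" or "mov"


def parse_media_filename(name: str) -> MediaFile:
    """Split a corpus file name into speaker, condition, utterance, extension."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a SPKR_COND_UTTERANCE file name: {name!r}")
    condition = "Lombard" if match["cond"] == "l" else "plain"
    return MediaFile(match["spkr"], condition, match["utt"], match["ext"])


def decode_utterance(code: str) -> str:
    """Expand a 6-character Grid utterance code into the spoken sentence."""
    if len(code) != 6:
        raise ValueError(f"expected a 6-character code, got {code!r}")
    command, colour, preposition, letter, digit, adverb = code
    # Assumption: 'z' in the digit slot stands for "zero".
    digit_word = "zero" if digit == "z" else digit
    return " ".join([
        COMMANDS[command],
        COLOURS[colour],
        PREPOSITIONS[preposition],
        letter,
        digit_word,
        ADVERBS[adverb],
    ])


if __name__ == "__main__":
    print(parse_media_filename("s8_p_sbbi9p.wav"))
    print(decode_utterance("pgag6a"))
```

A code containing a character outside these tables raises `KeyError`, which makes unexpected file names easy to spot when iterating over the corpus.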

Metadata format

*SPKR = s1 to s55

*SESSION = 1 or 2

*INDEX = 1 to 10 for ordering of the recording blocks

*SUBINDEX = 1 to 10 for ordering of utterance in a 10-utterance block.

*COND = l or r, where l=> Lombard, r=> plain (i.e. non-Lombard)

*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'

If a sentence is spoken incorrectly, then the filename will be

SPKR_SESSION_INDEX_SUBINDEX_COND_UTTERANCE_WRONG_TRANS.wav - e.g., s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav

*TRANS = the Grid utterance code for what was actually said.
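
The metadata fields can be read from a file name in a similar way. The sketch below is an assumption-laden illustration: the field order (speaker, session, index, subindex, condition, utterance) is inferred from the example s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav, the `.wav` extension is assumed for all metadata-style names, and the condition letter is accepted as l, p or r because the description and the example differ. `parse_meta_filename` and `MetaRecord` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Field order inferred from the example file name; the optional
# "_WRONG_<TRANS>" suffix marks an incorrectly spoken sentence.
META_RE = re.compile(
    r"^(?P<spkr>s\d+)_(?P<session>\d+)_(?P<index>\d+)_(?P<subindex>\d+)"
    r"_(?P<cond>[lpr])_(?P<utt>[a-z0-9]{6})"
    r"(?:_WRONG_(?P<trans>[a-z0-9]{6}))?\.wav$"
)


@dataclass(frozen=True)
class MetaRecord:
    speaker: str
    session: int
    index: int
    subindex: int
    condition: str          # raw condition letter as found in the name
    utterance: str          # the prompted Grid utterance code
    spoken: Optional[str]   # what was actually said, if marked WRONG

    @property
    def is_wrong(self) -> bool:
        """True when the name carries a _WRONG_<TRANS> suffix."""
        return self.spoken is not None


def parse_meta_filename(name: str) -> MetaRecord:
    """Parse a metadata-style file name into its fields."""
    match = META_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised metadata file name: {name!r}")
    return MetaRecord(
        speaker=match["spkr"],
        session=int(match["session"]),
        index=int(match["index"]),
        subindex=int(match["subindex"]),
        condition=match["cond"],
        utterance=match["utt"],
        spoken=match["trans"],  # None when there is no WRONG suffix
    )


if __name__ == "__main__":
    record = parse_meta_filename("s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav")
    print(record, record.is_wrong)
```

Keeping the prompted code and the spoken code as separate fields makes it straightforward to filter out misread sentences or to relabel them with what was actually said.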

Files (2.5 GB)

Size
2.5 MB
652.6 MB
837.2 MB
64.9 kB
992.6 MB