Video/Audio Open Access

The Grid Audio-Visual Lombard Speech Corpus

Najwa Alghamdi; Steve Maddock; Ricard Marxer; Jon Barker; Guy J Brown

Lombard Grid is a bi-view audiovisual Lombard speech corpus that can be used to support joint computational/behavioral studies in speech perception. The corpus includes 54 talkers, with 100 utterances per talker (50 Lombard and 50 plain utterances). This dataset follows the same sentence format as the audiovisual Grid corpus, and can thus be considered an extension of that corpus. The sentence sets used in the Lombard Grid corpus are unique, however, and do not appear in the Grid corpus.

It offers two synchronised views of the talkers (front and side) to facilitate analysis of speech from different angles. A bespoke head-mounted camera system was used to collect both front and profile views of the talkers.

Statistics: 54 talkers: 30 female talkers and 24 male talkers; 5,400 (audio, front video and side video) utterances (16,200 files in total): 50% Lombard utterances, 50% plain reference utterances.

The dataset is described in detail in the paper,

Najwa Alghamdi, Steve Maddock, Ricard Marxer, Jon Barker and Guy J. Brown, "A corpus of audio-visual Lombard speech with frontal and profile views", The Journal of the Acoustical Society of America 143, EL523 (2018)

The paper is available online at White Rose Research Online.

------------------------------------------------------------------------------------

Notes on Filenaming

Filename format

SPKR_COND_UTTERANCE.wav|.mov - e.g., s8_p_sbbi9p.wav

*SPKR = s1 to s55

*COND = l or p, where l=> Lombard, p=> plain (i.e. non-Lombard)

*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'
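
The filename format above can be parsed mechanically. The following is a minimal sketch, not part of the corpus tooling: the function names are hypothetical, the word tables are assumed from the original Grid corpus grammar (they are not listed on this page), and 'z' in the digit position is assumed to stand for zero.

```python
import re

# Letter-to-word tables for positions 1, 2, 3 and 6 of a Grid utterance code.
# ASSUMPTION: taken from the original Grid corpus grammar, not from this page.
COMMANDS = {"b": "bin", "l": "lay", "p": "place", "s": "set"}
COLOURS = {"b": "blue", "g": "green", "r": "red", "w": "white"}
PREPOSITIONS = {"a": "at", "b": "by", "i": "in", "w": "with"}
ADVERBS = {"a": "again", "n": "now", "p": "please", "s": "soon"}
CONDITIONS = {"l": "Lombard", "p": "plain"}

# SPKR_COND_UTTERANCE.wav|.mov, e.g. s8_p_sbbi9p.wav
MEDIA_NAME = re.compile(r"^(s\d+)_([lp])_([a-z0-9]{6})\.(wav|mov)$")


def decode_utterance(code):
    """Expand a 6-character Grid code into its word sequence."""
    if len(code) != 6:
        raise ValueError(f"expected 6 characters, got {code!r}")
    command, colour, prep, letter, digit, adverb = code
    # ASSUMPTION: 'z' in the digit slot stands for zero.
    digit = "0" if digit == "z" else digit
    return " ".join([
        COMMANDS[command], COLOURS[colour], PREPOSITIONS[prep],
        letter, digit, ADVERBS[adverb],
    ])


def parse_media_filename(name):
    """Split an audio/video filename into speaker, condition and utterance."""
    match = MEDIA_NAME.match(name)
    if match is None:
        raise ValueError(f"not a Lombard Grid media filename: {name!r}")
    speaker, cond, code, ext = match.groups()
    return {
        "speaker": speaker,
        "condition": CONDITIONS[cond],
        "utterance": code,
        "text": decode_utterance(code),
        "extension": ext,
    }
```

For example, `decode_utterance("pgag6a")` gives 'place green at g 6 again', matching the description above.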

Metadata format

*SPKR = s1 to s55

*SESSION = 1 or 2

*INDEX = 1 to 10 for ordering of the recording blocks

*SUBINDEX = 1 to 10 for ordering of utterance in a 10-utterance block.

*COND = l or r, where l=> Lombard, r=> plain (i.e. non-Lombard reference)

*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'

If a sentence is spoken incorrectly then the filename will be

SPKR_SESSION_INDEX_SUBINDEX_COND_UTTERANCE_WRONG_TRANS.wav - e.g., s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav

*TRANS = the Grid utterance code for what was actually said.
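
The metadata fields above can be recovered from a filename with a single regular expression. This is a rough sketch under stated assumptions: the function name is hypothetical, the field order is inferred from the example filename, 'r' is treated as the plain condition because the example uses it, any file extension is accepted, and no range checks are applied to the numeric fields.

```python
import re

# ASSUMPTION: field order inferred from the example
# s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav; the _WRONG_<TRANS> part is optional.
META_NAME = re.compile(
    r"^(?P<speaker>s\d+)_(?P<session>\d+)_(?P<index>\d+)_(?P<subindex>\d+)"
    r"_(?P<cond>[lr])_(?P<utterance>[a-z0-9]{6})"
    r"(?:_WRONG_(?P<spoken>[a-z0-9]{6}))?\.(?P<ext>\w+)$"
)


def parse_metadata_filename(name):
    """Return the metadata fields encoded in a filename as a dict."""
    match = META_NAME.match(name)
    if match is None:
        raise ValueError(f"not a Lombard Grid metadata filename: {name!r}")
    fields = match.groupdict()
    for key in ("session", "index", "subindex"):
        fields[key] = int(fields[key])
    # A _WRONG_ suffix marks an utterance that differs from the prompt.
    fields["misspoken"] = fields["spoken"] is not None
    if not fields["misspoken"]:
        # Spoken correctly: what was said equals the prompted code.
        fields["spoken"] = fields["utterance"]
    return fields
```

Keeping both the prompted code and the spoken code in the result makes it easy to filter out, or separately analyse, the misspoken recordings.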

Files (2.5 GB)

Name                         Size        md5
lombardgrid_alignment.zip    2.5 MB      0cd68f258d1dbf15f05c3681f8715bab
lombardgrid_audio.zip        652.6 MB    fa0cfc739705323b53ba50e148b3a144
lombardgrid_front.zip        837.2 MB    63b546f53267ae3f4cffe0c772317ce1
lombardgrid_json.zip         64.9 kB     070bf2b1570d9381215e7e57d8f58a21
lombardgrid_side.zip         992.6 MB    7fdc4b94f1b04de896e890f4ade355e0
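
The md5 checksums listed above can be used to check downloaded archives. The sketch below is one possible way to do that with the Python standard library; the function names and the idea of keeping all archives in one directory are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "lombardgrid_alignment.zip": "0cd68f258d1dbf15f05c3681f8715bab",
    "lombardgrid_audio.zip": "fa0cfc739705323b53ba50e148b3a144",
    "lombardgrid_front.zip": "63b546f53267ae3f4cffe0c772317ce1",
    "lombardgrid_json.zip": "070bf2b1570d9381215e7e57d8f58a21",
    "lombardgrid_side.zip": "7fdc4b94f1b04de896e890f4ade355e0",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each archive name to True/False, or None if it is missing."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results
```

Calling `verify_downloads(".")` in the download directory reports which archives match, which differ, and which are not present yet.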