The Grid Audio-Visual Lombard Speech Corpus

Najwa Alghamdi; Steve Maddock; Ricard Marxer; Jon Barker; Guy J Brown

doi:10.5281/zenodo.3736465

Published June 26, 2018 | Version v1

Video/Audio Open

1. University of Sheffield, UK
2. Université de Toulon, France

Lombard Grid is a bi-view audiovisual Lombard speech corpus which can be used to support joint computational/behavioral studies in speech perception. The corpus includes 54 talkers, with 100 utterances per talker (50 Lombard and 50 plain utterances). This dataset follows the same sentence format as the audiovisual Grid corpus, and can thus be considered as an extension of that corpus. The sentence sets used in the Lombard Grid corpus are unique, however, and have not been utilized by the Grid corpus.

It offers two synchronised views of the talkers (front and side) to facilitate analysis of speech from different angles. A bespoke head-mounted camera system was used to collect both front and profile views of the talkers.

Statistics: 54 talkers: 30 female talkers and 24 male talkers; 5,400 (audio, front video and side video) utterances (16,200 files in total): 50% Lombard utterances, 50% plain reference utterances.

The dataset is described in detail in the following paper:

Najwa Alghamdi, Steve Maddock, Ricard Marxer, Jon Barker and Guy J. Brown, "A corpus of audio-visual Lombard speech with frontal and profile views", The Journal of the Acoustical Society of America 143, EL523 (2018).

The paper is available online at White Rose Research Online.

------------------------------------------------------------------------------------

Notes on Filenaming

Filename format

SPKR_COND_UTTERANCE.wav|.mov - e.g., s8_p_sbbi9p.wav

*SPKR = s1 to s55

*COND = l or p, where l=> Lombard, p=> plain (i.e. non-Lombard)

*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'
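
The filename fields above can be split and decoded mechanically. The following is a minimal sketch, not part of the corpus distribution: the function names are hypothetical, and the word tables are assumed from the grammar of the original Grid corpus (this page only confirms the 'pgag6a' example).

```python
import re
from pathlib import Path

# Word tables assumed from the original Grid corpus grammar; this page
# itself only documents the single example 'pgag6a'.
COMMANDS = {"b": "bin", "l": "lay", "p": "place", "s": "set"}
COLOURS = {"b": "blue", "g": "green", "r": "red", "w": "white"}
PREPOSITIONS = {"a": "at", "b": "by", "i": "in", "w": "with"}
DIGITS = {"z": "zero", **{str(d): str(d) for d in range(1, 10)}}
ADVERBS = {"a": "again", "n": "now", "p": "please", "s": "soon"}
CONDITIONS = {"l": "Lombard", "p": "plain"}

# SPKR_COND_UTTERANCE.wav|.mov
FILENAME_RE = re.compile(r"^(s\d+)_([lp])_([a-z0-9]{6})\.(wav|mov)$")


def decode_utterance(code: str) -> str:
    """Expand a 6-character Grid utterance code into its sentence."""
    if len(code) != 6:
        raise ValueError(f"expected 6 characters, got {code!r}")
    command, colour, prep, letter, digit, adverb = code
    if not letter.isalpha():
        raise ValueError(f"fourth character must be a letter: {code!r}")
    try:
        words = [COMMANDS[command], COLOURS[colour], PREPOSITIONS[prep],
                 letter, DIGITS[digit], ADVERBS[adverb]]
    except KeyError as exc:
        raise ValueError(f"unknown symbol {exc} in {code!r}") from None
    return " ".join(words)


def parse_filename(name: str) -> dict:
    """Split SPKR_COND_UTTERANCE.ext into its documented fields."""
    match = FILENAME_RE.match(Path(name).name)
    if match is None:
        raise ValueError(f"not a Lombard Grid filename: {name!r}")
    speaker, cond, code, ext = match.groups()
    return {
        "speaker": speaker,
        "condition": CONDITIONS[cond],
        "code": code,
        "sentence": decode_utterance(code),
        "media": ext,
    }


print(decode_utterance("pgag6a"))  # place green at g 6 again
print(parse_filename("s8_p_sbbi9p.wav"))
```

Because the audio, front-video and side-video files share the same stem, the parsed `speaker`, `condition` and `code` values can serve as a key for pairing the three files of one utterance.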

Metadata format

SPKR_SESSION_INDEX_SUBINDEX_COND_UTTERANCE

*SPKR = s1 to s55

*SESSION = 1 or 2

*INDEX = 1 to 10 for ordering of the recording blocks

*SUBINDEX = 1 to 10 for ordering of utterance in a 10-utterance block.

*COND = l or r, where l=> Lombard, r=> plain reference (i.e. non-Lombard)

*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'

If a sentence is spoken incorrectly then the filename will be

SPKR_SESSION_INDEX_SUBINDEX_COND_UTTERANCE_WRONG_TRANS.wav - e.g. s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav

*TRANS = the Grid utterance code for what was actually said.
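
A metadata-style name can be split into these fields with a single regular expression. This is a sketch under stated assumptions: the underscore-separated field order is inferred from the example s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav, and the function and key names are hypothetical.

```python
import re

# Field order inferred from the example name on this page (an assumption).
# No range checks are applied to INDEX or SUBINDEX, because the example
# name carries an index of 38. COND accepts l, r and p: the example uses r,
# while the short filename format uses p for plain.
META_RE = re.compile(
    r"^(?P<speaker>s\d+)_(?P<session>\d+)_(?P<index>\d+)_(?P<subindex>\d+)"
    r"_(?P<cond>[lrp])_(?P<utterance>[a-z0-9]{6})"
    r"(?:_WRONG_(?P<trans>[a-z0-9]{6}))?(?:\.\w+)?$"
)


def parse_metadata_name(name: str) -> dict:
    """Split a metadata-style name into fields, handling the _WRONG_ suffix."""
    match = META_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised metadata name: {name!r}")
    fields = match.groupdict()
    for key in ("session", "index", "subindex"):
        fields[key] = int(fields[key])
    # 'utterance' is the prompted sentence; 'trans' is what was actually said.
    fields["correct"] = fields["trans"] is None
    fields["spoken"] = fields["trans"] or fields["utterance"]
    return fields


record = parse_metadata_name("s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav")
print(record["utterance"], record["spoken"], record["correct"])
```

Keeping both the prompted code and the spoken code makes it straightforward to either exclude misread sentences or relabel them with what the talker actually said.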

Files

Files (2.5 GB)

Name	MD5	Size
lombardgrid_alignment.zip	0cd68f258d1dbf15f05c3681f8715bab	2.5 MB
lombardgrid_audio.zip	fa0cfc739705323b53ba50e148b3a144	652.6 MB
lombardgrid_front.zip	63b546f53267ae3f4cffe0c772317ce1	837.2 MB
lombardgrid_json.zip	070bf2b1570d9381215e7e57d8f58a21	64.9 kB
lombardgrid_side.zip	7fdc4b94f1b04de896e890f4ade355e0	992.6 MB
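
The MD5 checksums listed above can be used to confirm that each archive downloaded intact. A minimal sketch using Python's standard `hashlib` follows; the function names and the assumption that all archives sit in one local directory are illustrative only.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "lombardgrid_alignment.zip": "0cd68f258d1dbf15f05c3681f8715bab",
    "lombardgrid_audio.zip": "fa0cfc739705323b53ba50e148b3a144",
    "lombardgrid_front.zip": "63b546f53267ae3f4cffe0c772317ce1",
    "lombardgrid_json.zip": "070bf2b1570d9381215e7e57d8f58a21",
    "lombardgrid_side.zip": "7fdc4b94f1b04de896e890f4ade355e0",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each archive name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify(".").items():
        print(f"{name}: {'OK' if ok else 'missing or corrupt'}")
```

A `False` entry means the archive is either absent from the directory or does not match the published checksum, in which case it should be downloaded again.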

Additional details

Is supplement to: Journal article: 10.1121/1.5042758 (DOI)
References: Dataset: 10.5281/zenodo.3625687 (DOI)

UK Research and Innovation
Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR) EP/M026981/1
