The Grid Audio-Visual Lombard Speech Corpus
- 1. University of Sheffield, UK
- 2. Universite de Toulon, France
Description
Lombard Grid is a bi-view audiovisual Lombard speech corpus that can be used to support joint computational and behavioral studies in speech perception. The corpus includes 54 talkers, with 100 utterances per talker (50 Lombard and 50 plain utterances). The dataset follows the same sentence format as the audiovisual Grid corpus and can thus be considered an extension of that corpus. The sentence sets used in Lombard Grid are unique, however, and do not appear in the Grid corpus.
It offers two synchronised views of the talkers (front and side) to facilitate analysis of speech from different angles. A bespoke head-mounted camera system was used to collect both front and profile views of the talkers.
Statistics: 54 talkers: 30 female talkers and 24 male talkers; 5,400 (audio, front video and side video) utterances (16,200 files in total): 50% Lombard utterances, 50% plain reference utterances.
The dataset is described in detail in the paper:
Najwa Alghamdi, Steve Maddock, Ricard Marxer, Jon Barker and Guy J. Brown, "A corpus of audio-visual Lombard speech with frontal and profile views", The Journal of the Acoustical Society of America 143, EL523 (2018)
The paper is available online at White Rose Research Online.
------------------------------------------------------------------------------------
Notes on Filenaming
Filename format
SPKR_COND_UTTERANCE.wav|.mov - e.g., s8_p_sbbi9p.wav
*SPKR = s1 to s55
*COND = l or p, where l=> Lombard, p=> plain (i.e. non-Lombard)
*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'
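A minimal sketch of how a file name in this format could be split into its fields and the utterance code expanded into words. The function names `parse_filename` and `decode_utterance` are hypothetical. The word tables are an assumption: they follow the standard Grid corpus grammar (command, colour, preposition, letter, digit, adverb, with `z` read as zero), which this corpus is described as sharing, and they are not read from the dataset itself.

```python
import re

# Assumed word tables (standard Grid grammar); not taken from this dataset's files.
COMMANDS = {"b": "bin", "l": "lay", "p": "place", "s": "set"}
COLOURS = {"b": "blue", "g": "green", "r": "red", "w": "white"}
PREPOSITIONS = {"a": "at", "b": "by", "i": "in", "w": "with"}
ADVERBS = {"a": "again", "n": "now", "p": "please", "s": "soon"}

# SPKR_COND_UTTERANCE.wav|.mov, e.g. s8_p_sbbi9p.wav
FILENAME_RE = re.compile(r"^(s\d+)_([lp])_([a-z0-9]{6})\.(wav|mov)$")


def decode_utterance(code):
    """Expand a 6-character Grid code such as 'pgag6a' into words."""
    if len(code) != 6:
        raise ValueError(f"expected 6 characters, got {code!r}")
    command, colour, prep, letter, digit, adverb = code
    try:
        words = [
            COMMANDS[command],
            COLOURS[colour],
            PREPOSITIONS[prep],
            letter,
            "zero" if digit == "z" else digit,  # assumption: 'z' encodes zero
            ADVERBS[adverb],
        ]
    except KeyError as err:
        raise ValueError(f"unknown symbol {err} in {code!r}") from None
    return " ".join(words)


def parse_filename(name):
    """Split a name like 's8_p_sbbi9p.wav' into its documented fields."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a SPKR_COND_UTTERANCE file name: {name!r}")
    spkr, cond, utt, ext = match.groups()
    return {
        "speaker": spkr,
        "condition": "Lombard" if cond == "l" else "plain",
        "utterance": utt,
        "text": decode_utterance(utt),
        "extension": ext,
    }


print(decode_utterance("pgag6a"))  # place green at g 6 again
```

Raising `ValueError` on unexpected names, rather than returning a partial result, makes stray files easy to spot when iterating over a directory.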
Metadata format
SPKR_SESSION_INDEX_SUBINDEX_COND_UTTERANCE
*SPKR = s1 to s55
*SESSION = 1 or 2
*INDEX = 1 to 10 for ordering of the recording blocks
*SUBINDEX = 1 to 10 for ordering of utterance in a 10-utterance block.
*COND = l or r, where l=> Lombard, r=> plain reference (i.e. non-Lombard)
*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'
If a sentence is spoken incorrectly then the filename will be
SPKR_SESSION_INDEX_SUBINDEX_COND_UTTERANCE_WRONG_TRANS.wav, e.g. s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav
*TRANS = the Grid utterance code for what was actually said.
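The metadata naming scheme, including the `_WRONG_` suffix for misspoken sentences, could be parsed along the following lines. This is a sketch under assumptions: the field order is inferred from the field list and the example name above, the function name `parse_metadata_name` is hypothetical, and the condition letter is returned unchanged rather than interpreted.

```python
import re

# Assumed layout: SPKR_SESSION_INDEX_SUBINDEX_COND_UTTERANCE[_WRONG_TRANS][.ext]
METADATA_RE = re.compile(
    r"^(?P<spkr>s\d+)_(?P<session>\d+)_(?P<index>\d+)_(?P<subindex>\d+)"
    r"_(?P<cond>[a-z])_(?P<utt>[a-z0-9]{6})"
    r"(?:_WRONG_(?P<trans>[a-z0-9]{6}))?"
    r"(?:\.\w+)?$"
)


def parse_metadata_name(name):
    """Split a metadata-style name into its fields.

    'spoken' holds what was actually said: the TRANS code when the
    sentence was misspoken, otherwise the prompted utterance code.
    Numeric ranges are deliberately not enforced here.
    """
    match = METADATA_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised metadata name: {name!r}")
    fields = match.groupdict()
    trans = fields["trans"]
    return {
        "speaker": fields["spkr"],
        "session": int(fields["session"]),
        "index": int(fields["index"]),
        "subindex": int(fields["subindex"]),
        "condition": fields["cond"],
        "prompted": fields["utt"],
        "spoken": trans if trans is not None else fields["utt"],
        "misspoken": trans is not None,
    }


record = parse_metadata_name("s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav")
print(record["prompted"], record["spoken"], record["misspoken"])
```

Keeping both the prompted and the spoken code makes it straightforward either to exclude misspoken items or to relabel them with what the talker actually said.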
Files (2.5 GB in total)
lombardgrid_alignment.zip
MD5 checksum | Size |
---|---|
md5:0cd68f258d1dbf15f05c3681f8715bab | 2.5 MB |
md5:fa0cfc739705323b53ba50e148b3a144 | 652.6 MB |
md5:63b546f53267ae3f4cffe0c772317ce1 | 837.2 MB |
md5:070bf2b1570d9381215e7e57d8f58a21 | 64.9 kB |
md5:7fdc4b94f1b04de896e890f4ade355e0 | 992.6 MB |
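After downloading, each archive can be checked against its listed MD5 checksum. The sketch below uses Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local path passed in is whatever name the downloaded file was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed checksum such as 'md5:0cd6...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum(local_path, "md5:0cd68f258d1dbf15f05c3681f8715bab")` would return `True` only if the file at `local_path` is a bit-exact copy of the corresponding archive.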
Additional details
Related works
- Is supplement to
- Journal article: 10.1121/1.5042758 (DOI)
- References
- Dataset: 10.5281/zenodo.3625687 (DOI)
Funding
- Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR) EP/M026981/1
- UK Research and Innovation