Published March 23, 2018 | Version 1.1
Dataset Open

CSF18

  • 1. Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab

Description

CSF18 - Multimodal database of French Cued Speech (revised in 2022)

Dataset used in "Visual recognition of continuous Cued Speech using a tandem CNN-HMM approach", by Liu, Hueber, Feng, Beautemps (submitted to Interspeech 2018)

476 sentences (2 repetitions of 238 sentences) uttered by a professional French Cued Speech coder

  • video/: PNG images, 576x720, 50 fps (after deinterlacing)
  • audio/: WAV, 16 kHz, 16-bit
  • prompt.txt: text prompts of the recorded sentences
  • corpus_mlf.txt: phonetic transcription aligned on the audio signal (HTK Master Label File format), obtained with the LiaPhon phonetizer and an HMM-based forced-alignment procedure (not manually checked)
  • corpus_mlf_updated_icassp2022.txt: manually checked and cleaned version of corpus_mlf.txt (see Sankar et al., ICASSP 2022)
  • phonelist.txt: list of the 34 labels used to encode French phonemes at GIPSA-lab
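The alignment files can be related to the video frames with a few lines of Python. The sketch below assumes that corpus_mlf.txt follows the standard HTK Master Label File layout (a `#!MLF!#` header, a quoted pattern per utterance, `start end label` lines with times in units of 100 ns, and a lone `.` closing each utterance) and that audio and video start at the same instant. The utterance name `sent_001` and the two segments are made up for illustration; they are not taken from the dataset.

```python
from typing import Dict, Iterable, List, Tuple

HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are expressed in 100 ns units
VIDEO_FPS = 50                     # frame rate given in the dataset description

# (start, end, label), with start/end kept in raw HTK units
Segment = Tuple[int, int, str]


def parse_mlf(lines: Iterable[str]) -> Dict[str, List[Segment]]:
    """Group the label lines of an HTK Master Label File by utterance pattern."""
    utterances: Dict[str, List[Segment]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # blank line or MLF header
        if line.startswith('"'):
            current = line.strip('"')  # e.g. */name.lab
            utterances[current] = []
        elif line == ".":
            current = None  # end of the current utterance
        elif current is not None:
            fields = line.split()
            if len(fields) >= 3:  # start, end, label; extra fields are ignored
                utterances[current].append(
                    (int(fields[0]), int(fields[1]), fields[2])
                )
    return utterances


def htk_time_to_frame(t: int, fps: int = VIDEO_FPS) -> int:
    """Zero-based index of the video frame that contains HTK time t."""
    return t * fps // HTK_UNITS_PER_SECOND


# Hypothetical two-segment utterance, for illustration only.
sample = '''#!MLF!#
"*/sent_001.lab"
0 2000000 sil
2000000 3500000 b
.
'''.splitlines()

segments = parse_mlf(sample)["*/sent_001.lab"]
for start, end, label in segments:
    print(label, htk_time_to_frame(start), htk_time_to_frame(end))
```

Keeping the times as integers in HTK units avoids floating-point rounding when converting to frame indices. How the resulting indices map onto the PNG file names in video/ is not stated in the description and should be checked against the archive.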

Files

Files (41.6 GB)

  • md5:41c138061132b01cc9fb445d4ec36a90, 49.5 MB
  • md5:8cd65f2baf85ffc9fcfa2b0ed340c18b, 258.6 kB
  • md5:df4f1819fa200422b0e32bd574f35539, 252.8 kB
  • md5:e165f7a5a5d080c670fc1d3b7b74be60, 78 Bytes
  • md5:579b033bc87ce1b72fd9ee0aaf3f0cd6, 10.4 kB
  • md5:84dcd604ecce862486bd8f1e86098ff4, 1.4 kB
  • md5:ebc356bd861159552458bc761da14cc5, 41.5 GB
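
The MD5 checksums listed above can be used to verify a download. The following is a minimal sketch using Python's standard `hashlib` module; it reads the file in chunks so that the 41.5 GB archive does not need to fit in memory. The helper names `md5_hex` and `matches_listing` are illustrative, and because the listing does not pair checksums with file names, the path in the usage comment is a placeholder.

```python
import hashlib
from typing import BinaryIO


def md5_hex(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a local file against a checksum written as 'md5:<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    with open(path, "rb") as stream:
        return md5_hex(stream) == expected


# Usage (placeholder path; pick the checksum that belongs to the file):
# matches_listing("path/to/downloaded_file", "md5:41c138061132b01cc9fb445d4ec36a90")
```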

Additional details

Funding

European Commission: Comm4CHILD – Communication for Children with Hearing Impairment to optimise Language Development (grant 860755)