DeepPredSpeech: computational models of predictive speech coding based on deep learning

Hueber, Thomas; Tatulli, Eric; Girin, Laurent; Schwartz, Jean-Luc

doi:10.5281/zenodo.1487974

Published November 14, 2018 | Version 1.0

Dataset | Open access

Affiliation: Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab

This dataset contains all data, source code, pre-trained computational predictive models and experimental results related to:

Hueber T., Tatulli E., Girin L., Schwartz J.-L., "How predictive can be predictions in the neurocognitive processing of auditory and audiovisual speech? A deep learning study." (bioRxiv preprint https://doi.org/10.1101/471581).

Raw data were extracted from the publicly available NTCD-TIMIT database (doi:10.5281/zenodo.260228):
- Audio recordings are available in the audio_clean/ directory
- Post-processed lip image sequences are available in the lips_roi/ directory (67x67 pixels, 8 bits, obtained by a lossless 2D inverse DCT of the DCT features provided in the original NTCD-TIMIT repository)
- Phonetic segmentation (extracted from the original NTCD-TIMIT Zenodo repository) is available in the HTK MLF file volunteer_labelfiles.mlf
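volunteer_labelfiles.mlf is an HTK Master Label File: a `#!MLF!#` header, then one entry per utterance that starts with a quoted file pattern and ends with a line holding a single `.`. The following is a minimal standard-library sketch for reading it, assuming every label line uses the time-aligned `start end label` form with times in HTK units of 100 ns. `parse_mlf` and `Segment` are hypothetical names, not taken from the DeepPredSpeech code, and the exact entry names used in this file are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    label: str


def parse_mlf(text):
    """Parse HTK Master Label File text into {entry_name: [Segment, ...]}.

    Assumes every label line has the time-aligned form "start end label ...",
    with times in HTK units of 100 ns (divided by 1e7 to get seconds).
    """
    lines = [line.strip() for line in text.splitlines()]
    if not lines or lines[0] != "#!MLF!#":
        raise ValueError("not an HTK MLF file: missing #!MLF!# header")
    entries = {}
    current = None
    for line in lines[1:]:
        if not line:
            continue
        if line.startswith('"'):
            # Entry header, e.g. "*/sa1.lab"; kept whole so that entries from
            # different speakers sharing a file name do not collide.
            current = entries.setdefault(line.strip('"'), [])
        elif line == ".":
            current = None  # a lone dot terminates the current entry
        elif current is not None:
            start, end, label = line.split()[:3]
            current.append(Segment(int(start) / 1e7, int(end) / 1e7, label))
    return entries
```

Keeping the quoted pattern as the key, rather than only the file stem, is a deliberate choice: short names such as `sa1` can recur across speakers.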
Audio features (MFCC-spectrogram and log-spectrogram) are available in the mfcc_16k/ and fft_16k/ directories, respectively.
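The exact feature-extraction settings live in the GitHub code and are not restated here. As a rough illustration of what a log-spectrogram is, the sketch below assumes 16 kHz audio (suggested by the `_16k` directory names) and uses hypothetical 25 ms frames (400 samples) with a 10 ms hop (160 samples). It relies only on the standard library, so it uses a naive DFT instead of an FFT, and it does not attempt the MFCC-spectrogram variant.

```python
import math


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a log-magnitude spectrogram as a list of frames.

    Each frame holds frame_len // 2 + 1 values (DC up to Nyquist).
    Uses a periodic Hann window and a naive O(frame_len**2) DFT, which is
    fine for illustration but far too slow for the full corpus.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    # Slide a frame_len window by hop samples; incomplete tail frames are dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        x = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            re = sum(x[n] * math.cos(2 * math.pi * k * n / frame_len)
                     for n in range(frame_len))
            im = -sum(x[n] * math.sin(2 * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # eps avoids log(0) on silent frames.
            row.append(math.log(math.hypot(re, im) + eps))
        frames.append(row)
    return frames
```

With these assumed settings, bin `k` corresponds to `k * 16000 / 400 = 40 * k` Hz, so a 1 kHz tone peaks in bin 25.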
Models (audio-only, video-only and audiovisual), based on deep feed-forward and/or convolutional neural networks, are provided in .h5 format (trained with the Keras 2.0 toolkit) in the models_mfcc/ and models_logspectro/ directories, together with the data normalization parameters (in .dat scikit-learn format).
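The description says the normalization parameters are stored in a scikit-learn format but does not name the estimator. Assuming, for illustration only, a per-dimension standardization like scikit-learn's `StandardScaler` (whose fitted `mean_` and `scale_` attributes drive its `transform` and `inverse_transform` methods), the arithmetic is shown below in plain Python. In practice one would load the stored object and the `.h5` network (for example with `keras.models.load_model`) instead of refitting anything. The function names are hypothetical.

```python
import math


def fit_standardizer(frames):
    """Per-dimension mean and (population) standard deviation of the frames.

    Zero-variance dimensions get a scale of 1.0, mirroring how scikit-learn's
    StandardScaler avoids dividing by zero.
    """
    n = len(frames)
    dims = len(frames[0])
    mean = [sum(row[d] for row in frames) / n for d in range(dims)]
    var = [sum((row[d] - mean[d]) ** 2 for row in frames) / n for d in range(dims)]
    scale = [math.sqrt(v) if v > 0 else 1.0 for v in var]
    return mean, scale


def standardize(frames, mean, scale):
    """(x - mean) / scale per dimension, the arithmetic of StandardScaler.transform."""
    return [[(x - m) / s for x, m, s in zip(row, mean, scale)] for row in frames]


def destandardize(frames, mean, scale):
    """x * scale + mean per dimension, the inverse of standardize()."""
    return [[x * s + m for x, m, s in zip(row, mean, scale)] for row in frames]
```

Network outputs computed in the normalized space would need `destandardize` (or `inverse_transform`) before being compared with features on their original scale.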
Predicted and target (ground-truth) MFCC-spectrograms (resp. log-spectrograms) for the test database (1909 sentences), for the different values of \(\tau_p\) or \(\tau_f\), are available in the pred_testdb_mfccspectro/ (resp. pred_testdb_logspectro/) directory.
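Given predicted and target features for one value of \(\tau_p\) or \(\tau_f\), prediction quality can be summarized frame by frame. The sketch below assumes that \(\tau_p\) counts the past frames given to the model as context and \(\tau_f\) how many frames ahead the target lies. Both that reading and the use of mean squared error are assumptions for illustration; see the preprint and the GitHub code for the definitions and metrics actually used. `make_pairs` and `mse` are hypothetical names.

```python
def make_pairs(frames, tau_p, tau_f):
    """Build (context, target) pairs for frame-level prediction.

    Hypothetical convention: the input at frame t concatenates frames
    t - tau_p, ..., t (tau_p + 1 frames) and the target is frame t + tau_f.
    """
    if tau_p < 0 or tau_f < 0:
        raise ValueError("tau_p and tau_f must be non-negative")
    pairs = []
    for t in range(tau_p, len(frames) - tau_f):
        context = [value for frame in frames[t - tau_p:t + 1] for value in frame]
        pairs.append((context, list(frames[t + tau_f])))
    return pairs


def mse(predicted, target):
    """Mean squared error over all frames and all feature dimensions."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same number of frames")
    total, count = 0.0, 0
    for p_row, t_row in zip(predicted, target):
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            count += 1
    return total / count
```

For example, with six one-dimensional frames, \(\tau_p = 2\) and \(\tau_f = 1\) yield three pairs, the first mapping frames 0 to 2 onto frame 3.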

Source code for extracting audio features, training and evaluating the models is available on GitHub https://github.com/thueber/DeepPredSpeech/

All directories have been zipped before upload.
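A minimal sketch for unpacking the downloaded archives with Python's standard `zipfile` module. `extract_all`, `download_dir` and `target_dir` are placeholder names. Whether each archive contains its own top-level folder is not stated here, so it is worth checking with `ZipFile.namelist()` first.

```python
import zipfile
from pathlib import Path


def extract_all(download_dir, target_dir):
    """Extract every *.zip archive found in download_dir into target_dir.

    Returns the names of the archives that were extracted, in sorted order.
    """
    download_dir, target_dir = Path(download_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(download_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
        extracted.append(archive.name)
    return extracted
```

Note that pred_testdb_logspectro.zip alone is 18.4 GB, so enough free disk space should be available before extracting everything.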

Feel free to contact me for more details.

Thomas Hueber, Ph. D., CNRS research fellow, GIPSA-lab, Grenoble, France, thomas.hueber@gipsa-lab.fr

Files (31.8 GB)

Name	Size	MD5 checksum
audio_clean.zip	854.4 MB	95760da33c73583a800a04512add3860
fft_16k.zip	1.1 GB	73cd4942408ec21b5d9b1cde001ee8d5
lips_roi.zip	2.1 GB	97697d79d3c83ca9b8376f29dafa85c2
mfcc_16k.zip	112.7 MB	bd422555e2241b7dda8032410116c743
models_logspectro.zip	2.7 GB	fb7c2825fa4e0adb42debc5594460fce
models_mfcc.zip	429.3 MB	89b551f3945aac090af4364d7d66c223
pred_testdb_logspectro.zip	18.4 GB	726e74a82f13791c4e5a4ebbaf9f3867
pred_testdb_mfcc.zip	6.1 GB	85cb6c27733849d71308ef21cd685a93
volunteer_labelfiles.mlf	4.3 MB	4f0749a409ec998cc64a7be440de343c
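Downloads of this size are worth checking against the published MD5 checksums. A minimal standard-library sketch follows. It reads files in 1 MiB chunks so that even the 18.4 GB archive never has to fit in memory. `md5_of_file`, `verify` and `EXPECTED_MD5` are hypothetical names. The checksums are copied from the file list for this version.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file list of this dataset version.
EXPECTED_MD5 = {
    "audio_clean.zip": "95760da33c73583a800a04512add3860",
    "fft_16k.zip": "73cd4942408ec21b5d9b1cde001ee8d5",
    "lips_roi.zip": "97697d79d3c83ca9b8376f29dafa85c2",
    "mfcc_16k.zip": "bd422555e2241b7dda8032410116c743",
    "models_logspectro.zip": "fb7c2825fa4e0adb42debc5594460fce",
    "models_mfcc.zip": "89b551f3945aac090af4364d7d66c223",
    "pred_testdb_logspectro.zip": "726e74a82f13791c4e5a4ebbaf9f3867",
    "pred_testdb_mfcc.zip": "85cb6c27733849d71308ef21cd685a93",
    "volunteer_labelfiles.mlf": "4f0749a409ec998cc64a7be440de343c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the published checksum.

    Raises KeyError if the file name is not in the expected table.
    """
    path = Path(path)
    return md5_of_file(path) == expected[path.name]
```

A mismatch usually means an interrupted or corrupted download, and the file should be fetched again before unzipping.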

Additional details

Funding: European Commission, SPEECH UNIT(E)S - The multisensory-motor unity of speech (grant 339152)
