Published November 14, 2018 | Version 1.0
Dataset Open

DeepPredSpeech: computational models of predictive speech coding based on deep learning

  • 1. Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab


This dataset contains all data, source code, pre-trained computational predictive models and experimental results related to:  

Hueber T., Tatulli E., Girin L., Schwartz J.-L. "How predictive can be predictions in the neurocognitive processing of auditory and audiovisual speech? A deep learning study." (bioRxiv preprint)

  • Raw data are extracted from the publicly available database NTCD-TIMIT (10.5281/zenodo.260228). 
    • Audio recordings are available in the audio_clean/ directory
    • Post-processed lip image sequences are available in the lips_roi/ directory (67x67 pixels, 8 bits, obtained by lossless inverse 2D-DCT from the DCT features available in the original NTCD-TIMIT repository)
    • Phonetic segmentation (extracted from the original NTCD-TIMIT Zenodo repository) is available in the HTK MLF file volunteer_labelfiles.mlf
  • Audio features (MFCC-spectrogram and log-spectrogram) are available in the mfcc_16k/ and fft_16k/ directories. 
  • Models (audio-only, video-only and audiovisual, based on deep feed-forward and/or convolutional neural networks, in .h5 format, trained with the Keras 2.0 toolkit) and data normalization parameters (in .dat scikit-learn format) are available in the models_mfcc/ and models_logspectro/ directories
  • Predicted and target (ground truth) MFCC-spectrograms (resp. log-spectrograms) for the test databases (1909 sentences), for the different values of \(\tau_p\) or \(\tau_f\), are available in the pred_testdb_mfccspectro/ (resp. pred_testdb_logspectro/) directory
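
The phonetic segmentation is distributed as an HTK Master Label File (MLF). As a minimal sketch of how such a file can be read, the Python function below parses the text of an MLF into a dictionary of segments. It assumes that each label line has the form "start end label" with times in HTK units of 100 ns; the utterance name and labels in the example string are made up for illustration and are not taken from volunteer_labelfiles.mlf.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start in seconds, end in seconds, label)

HTK_TIME_UNIT = 1e-7  # HTK label times are expressed in units of 100 ns


def parse_mlf(text: str) -> Dict[str, List[Segment]]:
    """Parse the contents of an HTK Master Label File into {utterance: segments}."""
    entries: Dict[str, List[Segment]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # skip blank lines and the MLF header
        if line.startswith('"') and line.endswith('"'):
            # A quoted pattern such as "*/name.lab" opens a new utterance entry.
            current = line.strip('"')
            entries[current] = []
        elif line == ".":
            # A single period closes the current entry.
            current = None
        elif current is not None:
            fields = line.split()
            if len(fields) >= 3:  # assumed layout: start end label
                start, end, label = int(fields[0]), int(fields[1]), fields[2]
                entries[current].append(
                    (start * HTK_TIME_UNIT, end * HTK_TIME_UNIT, label)
                )
    return entries


# Hypothetical two-phone utterance, for illustration only.
example = '''#!MLF!#
"*/sa1.lab"
0 1200000 sil
1200000 2500000 sh
2500000 4000000 iy
.
'''
segments = parse_mlf(example)
```

On the real file, the same function could be applied to the contents of volunteer_labelfiles.mlf read as text; lines that do not carry both a start and an end time are skipped by this sketch.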

Source code for extracting audio features, training and evaluating the models is available on GitHub
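
To make the role of \(\tau_p\) and \(\tau_f\) more concrete, here is a small illustrative sketch, not the authors' code. It assumes that \(\tau_p\) is the number of past frames given as context in addition to the current frame, and that \(\tau_f\) is the number of frames ahead at which the target is predicted; both interpretations, as well as the function name make_pairs, are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def make_pairs(
    frames: Sequence[Frame], tau_p: int, tau_f: int
) -> List[Tuple[List[Frame], Frame]]:
    """Pair each context window frames[t - tau_p : t + 1] with the target frames[t + tau_f]."""
    if tau_p < 0 or tau_f < 1:
        raise ValueError("tau_p must be >= 0 and tau_f must be >= 1")
    pairs = []
    # t is the index of the current frame; it needs tau_p frames before it
    # and tau_f frames after it.
    for t in range(tau_p, len(frames) - tau_f):
        context = list(frames[t - tau_p : t + 1])  # tau_p past frames + current frame
        target = frames[t + tau_f]                 # frame tau_f steps in the future
        pairs.append((context, target))
    return pairs


# Six toy one-dimensional "feature frames" (assumed data, for illustration).
frames = [[float(i)] for i in range(6)]
pairs = make_pairs(frames, tau_p=2, tau_f=1)
```

With these toy values, each context holds three frames and the target is the frame immediately after the context.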

All directories have been zipped before upload.
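
Since the directories are provided as zip archives, they have to be extracted after download. A minimal sketch using only the Python standard library is given below; it assumes the downloaded archives sit in one folder and end in .zip, and the function name extract_archives and both folder arguments are placeholders chosen for this example.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_archives(download_dir: str, target_dir: str) -> List[str]:
    """Extract every .zip archive found in download_dir into target_dir."""
    out = Path(target_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(download_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out)  # recreates the directory tree stored in the archive
        extracted.append(archive.name)
    return extracted
```

The function returns the names of the archives it processed, which makes it easy to check that nothing was missed given the total download size of several tens of gigabytes.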

Feel free to contact me for more details.

Thomas Hueber, Ph.D., CNRS research fellow, GIPSA-lab, Grenoble, France


Files (31.9 GB)

Size
854.4 MB
83.8 MB
1.1 GB
2.1 GB
112.7 MB
2.7 GB
429.3 MB
18.4 GB
6.1 GB
4.3 MB

Additional details


SPEECH UNIT(E)S – The multisensory-motor unity of speech 339152
European Commission