Synthetically Spoken STAIR

William N. Havard; Jean-Pierre Chevrot; Laurent Besacier

doi:10.5281/zenodo.1495070

Published November 23, 2018 | Version v1

Dataset Open

1. Université Grenoble Alpes, LIG/GETALP and LIDILEM
2. Université Grenoble Alpes, LIDILEM
3. Université Grenoble Alpes, LIG/GETALP

This dataset consists of synthetically spoken captions for the STAIR dataset. Following the same methodology as Chrupała et al. (see article | dataset | code), we generated speech for each caption of the STAIR dataset using Google's Text-to-Speech API.

This dataset was used for visually grounded speech experiments (see the article accepted at ICASSP 2019).

@INPROCEEDINGS{8683069,
  author={W. N. {Havard} and J. {Chevrot} and L. {Besacier}},
  booktitle={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Models of Visually Grounded Speech Signal Pay Attention to Nouns: A Bilingual Experiment on English and Japanese},
  year={2019},
  pages={8618-8622},
  keywords={information retrieval;natural language processing;neural nets;speech processing;word processing;artificial neural attention;human attention;monolingual models;part-of-speech tags;nouns;neural models;visually grounded speech signal;English language;Japanese language;word endings;cross-lingual speech-to-speech retrieval;grounded language learning;attention mechanism;cross-lingual speech retrieval;recurrent neural networks.},
  doi={10.1109/ICASSP.2019.8683069},
  ISSN={2379-190X},
  month={May},
}

The dataset comprises the following files:

mp3-stair.tar.gz : MP3 files of each caption in the STAIR dataset. Filenames follow the pattern imageID_captionID, where both imageID and captionID correspond to those provided in the original dataset (see annotation format here)
dataset.mfcc.npy : Numpy array with the MFCC vectors for each caption. MFCCs were extracted using python_speech_features with its default configuration. To find out which caption a given set of MFCC vectors belongs to, use the files dataset.words.txt and dataset.ids.txt.
dataset.words.txt : Captions corresponding to each entry of MFCC vectors (line number = position in the Numpy array, starting from 0)
dataset.ids.txt : IDs of the captions (imageID_captionID) corresponding to each entry of MFCC vectors (line number = position in the Numpy array, starting from 0)
Splits
- test
  - test.txt : captions comprising the test split
  - test_ids.txt : IDs of the captions in the test split
  - test_tagged.txt : tagged version of the test split
  - test-alignments.json.zip : forced alignments of all the captions in the test split, stored as a dictionary whose keys are the caption IDs in the STAIR dataset. The JSON file had to be zipped because of an unknown error during upload.
- train
  - train.txt : captions comprising the train split
  - train_ids.txt : IDs of the captions in the train split
  - train_tagged.txt : tagged version of the train split
- val
  - val.txt : captions comprising the val split
  - val_ids.txt : IDs of the captions in the val split
  - val_tagged.txt : tagged version of the val split
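
Because line numbers in dataset.ids.txt and dataset.words.txt give positions in the Numpy array, a split can be mapped back to array positions with a simple lookup. The following is a minimal sketch, not the authors' code. It assumes that each split ID file (for example test_ids.txt) holds one imageID_captionID per line in the same form as dataset.ids.txt, that IDs are unique, and that neither ID part contains an underscore. The helper names read_lines, build_index, split_rows and parse_caption_id are hypothetical.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without line endings."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\r\n") for line in handle]


def build_index(ids, captions):
    """Map each caption ID to (position, caption).

    Position is the 0-based line number, which the dataset description
    equates with the position in the Numpy array.
    """
    if len(ids) != len(captions):
        raise ValueError("ID list and caption list differ in length")
    return {cid: (pos, cap) for pos, (cid, cap) in enumerate(zip(ids, captions))}


def split_rows(index, split_ids):
    """Return the array positions of the caption IDs belonging to one split.

    Assumes split IDs use the same imageID_captionID form as dataset.ids.txt;
    an unknown ID raises KeyError instead of being skipped silently.
    """
    return [index[cid][0] for cid in split_ids]


def parse_caption_id(cid):
    """Split 'imageID_captionID' into its two parts (kept as strings)."""
    image_id, _, caption_id = cid.partition("_")
    return image_id, caption_id
```

With the real files, this would be called as build_index(read_lines("dataset.ids.txt"), read_lines("dataset.words.txt")) followed by split_rows(index, read_lines("test_ids.txt")). The resulting positions can then be used to pick entries out of the array stored in dataset.mfcc.npy.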

Files (25.7 GB)

Name	MD5	Size
dataset.ids.txt	acd4d84ecb4e40a53512151488284434	8.4 MB
dataset.mfcc.npy	7945bb5878278c2154cf8c8b059caa51	14.8 GB
dataset.words.txt	b3b4603377ef93c33c7428573ebecdd6	39.2 MB
mp3-stair.tar.gz	f9fcc4576565e8e64849c5ba9bbb4454	10.6 GB
test-alignments.json.zip	2edac7baf32f6de0cb1589a921508b3d	12.5 MB
test.txt	4c0f7d3febc04c886c4f185677214e0b	1.6 MB
test_ids.txt	5118ace7924914865c65a611da45b652	341.9 kB
test_tagged.txt	0d85ada7e154a9868ff3d5c6585c5e37	5.4 MB
train.txt	82ab832cff603d42a55862aaa4dca4d2	36.0 MB
train_ids.txt	6142c7127ce15254d4608db095aa2f1f	7.7 MB
train_tagged.txt	40f85a75fb21ebc5ee9e0de265c92714	122.9 MB
val.txt	5868433eabc01d507fdd01029b937e80	1.6 MB
val_ids.txt	0b252fe5d15db3dfd16146e2524097a8	342.1 kB
val_tagged.txt	723ff27c0c591d723158c8da2699d367	5.4 MB
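
Since the largest files are over 10 GB, it can be worth checking a download against the MD5 checksum listed with each file above. The following is a minimal sketch using only the Python standard library. The function names md5_of_file and verify_download are hypothetical, and the chunk size is an arbitrary choice that keeps memory use low on large files.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the listed checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, verify_download("dataset.ids.txt", "acd4d84ecb4e40a53512151488284434") should return True for an intact copy of that file.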

	All versions	This version
Views	457	455
Downloads	4,153	4,140
Data volume	1.8 TB	1.8 TB
