Synthetically Spoken COCO

doi:10.5281/zenodo.400926

Published April 5, 2017 | Version v1

Dataset Open

1. Tilburg University

Synthetically Spoken COCO

Version 1.0

This dataset contains synthetically generated spoken versions of MS COCO [1] captions. It
was created as part of the research reported in [5].
The speech was generated using gTTS [2]. The dataset consists of the following files:

- dataset.json: Captions associated with MS COCO images. This information comes from [3].
- sentid.txt: List of caption IDs. This file can be used to locate MFCC features of the MP3 files
in the numpy array stored in dataset.mfcc.npy.
- mp3.tgz: MP3 files with the audio. Each file name corresponds to a caption ID in dataset.json
and in sentid.txt.
- dataset.mfcc.npy: Numpy array with the Mel Frequency Cepstral Coefficients (MFCCs) extracted from
the audio. Each row corresponds to a caption. The order of the captions corresponds to the
ordering in the file sentid.txt. MFCCs were extracted using [4].
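
The row lookup described above (caption ID in sentid.txt to row in dataset.mfcc.npy) could be sketched roughly as follows. This is only an illustration under several assumptions that the README does not confirm: that sentid.txt holds one caption ID per line, that the array is an object array with one variable-length MFCC matrix per caption (so that allow_pickle=True is needed when loading), and that 13 coefficients per frame is a plausible width. The helper names load_sentids, build_index and mfcc_for_caption are hypothetical, and the demo uses a tiny in-memory stand-in rather than the real 23.3 GB file.

```python
import numpy as np


def load_sentids(path):
    """Read caption IDs in file order (assumes one ID per line)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def build_index(sentids):
    """Map each caption ID (as a string) to its row position."""
    return {str(sid): row for row, sid in enumerate(sentids)}


def mfcc_for_caption(features, index, sentid):
    """Return the feature row whose position matches the caption ID."""
    return features[index[str(sentid)]]


# Tiny stand-in for the real files: three captions with 5, 3 and 4 frames.
# The 13 columns are an assumed feature width, not taken from the README.
sentids = ["101", "7", "42"]
features = np.empty(len(sentids), dtype=object)
for i, n_frames in enumerate([5, 3, 4]):
    features[i] = np.full((n_frames, 13), float(i))

index = build_index(sentids)
row = mfcc_for_caption(features, index, 42)
print(row.shape)

# With the real files, loading might look like this (assumptions as above):
# sentids = load_sentids("sentid.txt")
# features = np.load("dataset.mfcc.npy", allow_pickle=True)
```

Building the dictionary once keeps each lookup constant-time, which matters when the ID list has several hundred thousand entries. Loading the full array needs enough memory to hold it.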

[1] http://mscoco.org/dataset/#overview
[2] https://pypi.python.org/pypi/gTTS
[3] https://github.com/karpathy/neuraltalk
[4] https://github.com/jameslyons/python_speech_features
[5] https://arxiv.org/abs/1702.01991

Files

Files (31.6 GB)

Name	MD5	Size
dataset.json	b136a751c7bdeb10f715d2cf7e554093	144.2 MB
dataset.mfcc.npy	e2e35865bf0ef0c1bfec44b97a50a309	23.3 GB
mp3.tgz	fab24afd1c69811260d0bd534e67db9a	8.1 GB
README	40a316c2a8cade80050b833c74761260	1.1 kB
sentid.txt	7e8f7a44398c06429f56f26cba121583	4.2 MB
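
Since the larger files take a long time to download, it may be worth checking them against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library. The function names md5_of_stream, md5_of_file and verify are hypothetical, and the local paths passed to them are assumed to be wherever the files were saved.

```python
import hashlib
import io

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "dataset.json": "b136a751c7bdeb10f715d2cf7e554093",
    "dataset.mfcc.npy": "e2e35865bf0ef0c1bfec44b97a50a309",
    "mp3.tgz": "fab24afd1c69811260d0bd534e67db9a",
    "README": "40a316c2a8cade80050b833c74761260",
    "sentid.txt": "7e8f7a44398c06429f56f26cba121583",
}


def md5_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Hash the file at a local path (the path is supplied by the caller)."""
    with open(path, "rb") as f:
        return md5_of_stream(f)


def verify(path, name):
    """Return True if the file at path matches the listed checksum for name."""
    return md5_of_file(path) == EXPECTED_MD5[name]


# Self-check on an in-memory stream; real use would be e.g.
# verify("sentid.txt", "sentid.txt") after downloading.
demo = md5_of_stream(io.BytesIO(b"hello"))
print(demo)
```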

Additional details

https://arxiv.org/abs/1702.01991

