Places Audio Captions (Japanese) 100k

Ohishi, Yasunori; Kimura, Akisato; Kawanishi, Takahito; Kashino, Kunio; Harwath, David; Glass, James

doi:10.5281/zenodo.5563425

Published November 18, 2021 | Version v1

Video/Audio Open

1. NTT Corporation
2. MIT CSAIL

The Places Audio Caption (Japanese) 100K Corpus contains approximately 100,000 Japanese spoken captions for natural images drawn from the Places 205 image dataset.

This speech corpus was collected to investigate the learning of spoken language (words, sub-word units, higher-level semantics, etc.) from visually-grounded speech. For a description of the corpus, see:

@INPROCEEDINGS{Ohishi2020trilingual,
  author={Ohishi, Yasunori and Kimura, Akisato and Kawanishi, Takahito and Kashino, Kunio and Harwath, David and Glass, James},
  booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms}, 
  year={2020},
  pages={4352-4356},
}

The corpus includes only the audio recordings, not the associated images. The Places image dataset must be downloaded separately.

The data is distributed under the Creative Commons Attribution-ShareAlike (CC BY-SA) license.

If you use this data in your own publications, please cite the paper above.

Files (43.5 GB)

Name	Size	MD5
PlacesJapanese100k.tar.gz	43.5 GB	4a4d1093363001e2a7c8c3ca5aa46533
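
Because the archive is large, it may be worth checking the download against the MD5 checksum listed for PlacesJapanese100k.tar.gz before extracting it. The following is a minimal sketch, not part of the official distribution: the function names `md5_of_file` and `verify_archive` are hypothetical, and the script assumes the archive was saved under its original name in the current working directory.

```python
import hashlib
from pathlib import Path

# MD5 checksum as published in the file listing for this record.
EXPECTED_MD5 = "4a4d1093363001e2a7c8c3ca5aa46533"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Chunked reading keeps memory use constant, which matters for a
    multi-gigabyte archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was downloaded.
    archive = Path("PlacesJapanese100k.tar.gz")
    if not archive.exists():
        print(f"{archive} not found")
    elif verify_archive(archive):
        print("checksum OK")
    else:
        print("checksum MISMATCH: the download may be incomplete or corrupted")
```

A mismatch usually indicates an interrupted or corrupted transfer, in which case downloading the archive again is the simplest remedy.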

