Published December 21, 2020 | Version 1.0
Dataset Open

A studyforrest extension, an annotation of spoken language in the German dubbed movie "Forrest Gump" and its audio-description (annotation)

  • 1. Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, 52425, Germany; Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University, Düsseldorf, 40225, Germany

Description

This dataset contains the annotation of speech in the research cut (Hanke et al., 2014; Hanke et al., 2016) of the movie "Forrest Gump" (Zemeckis, 1994) and in its audio-description, which was broadcast as an additional audio track (Koop et al., 2009) for visually impaired listeners on Swiss public television. The corresponding paper is hosted on GitHub (https://github.com/psychoinformatics-de/studyforrest-paper-speechannotation) and published in F1000Research (https://doi.org/10.12688/f1000research.27621.1).

Notes

Here we present an annotation of speech in the audio-visual movie "Forrest Gump" and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset (studyforrest.org). The annotation provides the exact timing of each of the more than 2500 spoken sentences, 16000 words (including 202 non-speech vocalizations), and 66000 phonemes, together with the corresponding speaker. Additionally, for every word, we provide lemmatization, a simple part-of-speech tagging (15 grammatical categories), a detailed part-of-speech tagging (43 grammatical categories), syntactic dependencies, and a semantic analysis based on a word embedding that represents each word in a 300-dimensional semantic space. To validate the dataset's quality, we build a model of hemodynamic brain activity based on information drawn from the annotation. Results suggest that the annotation's content and quality enable independent researchers to create models of brain activity correlating with a variety of linguistic aspects under conditions of near-real-life complexity.
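As an illustration of how timed annotations of this kind can feed a model of hemodynamic activity, the following minimal sketch turns a handful of word events into a predicted response time course. It is not the validation model from the paper: the function names, the example onsets and durations, the repetition time of 2 s, and the double-gamma shape parameters are all assumptions chosen for illustration.

```python
import math
import numpy as np


def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma response function sampled at times t (in seconds).

    The shape parameters are common textbook defaults, assumed here.
    """
    t = np.asarray(t, dtype=float)
    pos = t ** (peak - 1) * np.exp(-t) / math.gamma(peak)
    neg = t ** (undershoot - 1) * np.exp(-t) / math.gamma(undershoot)
    hrf = pos - ratio * neg
    return hrf / hrf.sum()


def events_to_regressor(onsets, durations, n_volumes, tr, dt=0.1):
    """Convolve a boxcar built from timed events with the response function.

    onsets and durations are in seconds; the result has one value per volume.
    """
    n_fine = int(round(n_volumes * tr / dt))
    boxcar = np.zeros(n_fine)
    for onset, duration in zip(onsets, durations):
        start = int(round(onset / dt))
        # Keep very short events (e.g. single phonemes) at least one sample long.
        stop = max(start + 1, int(round((onset + duration) / dt)))
        boxcar[start:stop] = 1.0
    hrf = double_gamma_hrf(np.arange(0, 32, dt))
    predicted = np.convolve(boxcar, hrf)[:n_fine]
    step = int(round(tr / dt))
    return predicted[::step][:n_volumes]


# Hypothetical word events (seconds); real values would come from the annotation.
word_onsets = [1.0, 1.4, 10.2]
word_durations = [0.3, 0.5, 0.4]
regressor = events_to_regressor(word_onsets, word_durations, n_volumes=20, tr=2.0)
```

The resulting vector could serve as one column of a design matrix in a general linear model. Restricting the events to a particular speaker or part-of-speech category would yield a regressor for that linguistic aspect.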

Files

Files (88.9 MB)

md5:353dcdc6bf4e54d6dbdbec37dcc1108a, 88.9 MB

Additional details

Related works

Continues
Dataset: 10.17605/OSF.IO/GFRME (DOI)
Is documented by
Preprint: https://github.com/psychoinformatics-de/studyforrest-paper-speechannotation (URL)
Journal article: 10.12688/f1000research.27621.1 (DOI)
Is supplemented by
Dataset: 10.5281/zenodo.4382188 (DOI)