Online film subtitles as a corpus: An ngram-based approach

Levshina, Natalia

doi:10.5281/zenodo.582336

Published May 22, 2017 | Version v1

Journal article Open

This paper investigates online film subtitles as a separate register of communication from a quantitative perspective. Subtitles from films in English and other languages translated into English are compared with registers of spoken and written communication represented by large corpora of British and American English. A series of quantitative analyses based on n-gram frequencies demonstrate that subtitles are not fundamentally different from other registers of English and that they represent a close approximation of British and American informal conversations. However, it is shown that the subtitles are different from the conversations with regard to several functional characteristics, which are typical of the language of scripted dialogues in films and TV series in general. Namely, the language of subtitles is more emotional and dynamic, but less spontaneous, vague and narrative than that of naturally occurring conversations. The paper also compares subtitles in original English and subtitles translated from other languages and detects variation that can be explained by differences in communicative styles.

Files

Levshina_SubtitlesAsCorpus_revised.pdf (740.7 kB, md5:0aadb7bf05ee866a99ab2ccba20467ff)

Additional details

Funding: European Commission, grant no. 670985: FormGram – Form-frequency correspondences in grammar
