Published May 22, 2017 | Version v1
Journal article Open

Online film subtitles as a corpus: An ngram-based approach

Description

This paper investigates online film subtitles as a separate register of communication from a quantitative perspective. Subtitles from films in English and other languages translated into English are compared with registers of spoken and written communication represented by large corpora of British and American English. A series of quantitative analyses based on n-gram frequencies demonstrates that subtitles are not fundamentally different from other registers of English and that they represent a close approximation of British and American informal conversations. However, it is shown that the subtitles are different from the conversations with regard to several functional characteristics, which are typical of the language of scripted dialogues in films and TV series in general. Namely, the language of subtitles is more emotional and dynamic, but less spontaneous, vague and narrative than that of normally occurring conversations. The paper also compares subtitles in original English and subtitles translated from other languages and detects variation that can be explained by differences in communicative styles.
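
As a rough illustration of what an n-gram-based comparison of registers can look like, the sketch below counts word bigrams in three toy texts and compares their relative-frequency profiles with cosine similarity. This is not the paper's actual procedure, which the description does not specify: the whitespace tokenization, the choice of cosine similarity, the function names, and the three sample strings are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_counts(text, n=2):
    """Count word n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def relative_freqs(counts):
    """Convert raw counts to relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Toy stand-ins for real corpora (invented for illustration only).
subtitles = "i don't know what you mean . i don't know ."
conversation = "well i don't know really . you know what i mean ."
news = "the government announced new measures on tuesday ."

sub = relative_freqs(ngram_counts(subtitles))
conv = relative_freqs(ngram_counts(conversation))
nws = relative_freqs(ngram_counts(news))

# A higher value means the two bigram profiles overlap more.
print("subtitles vs conversation:", cosine_similarity(sub, conv))
print("subtitles vs news:", cosine_similarity(sub, nws))
```

With real data, each string would be replaced by a full corpus, and a study of this kind would typically use a proper tokenizer and a larger set of n-gram sizes; the toy texts here only show the mechanics.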

Files

Levshina_SubtitlesAsCorpus_revised.pdf (740.7 kB)
md5:0aadb7bf05ee866a99ab2ccba20467ff

Additional details

Funding

FormGram – Form-frequency correspondences in grammar 670985
European Commission