Published September 7, 2022 | Version v1.0
Software Open

A corpus of Vladimir Putin's speeches 2012-2022

Description

This release contains transcripts of Putin's speeches delivered between May 7, 2012 and August 16, 2022. The texts were scraped from the official website kremlin.ru and are provided in the XML and CoNLL-U formats. The latter provides tokens, lemmas, Universal POS tags, Universal Dependencies relations and morphological features, all produced automatically with a Stanza model trained on the SynTagRus corpus. The collected texts are those classified as "Speeches and Addresses" on kremlin.ru, so they do not cover everything he said publicly in this period. They mostly contain Putin's prepared monologues, but also include some spontaneous speech (e.g., answers to journalists' questions). Most utterances are Putin's, but some were produced by other speakers (politicians, journalists, guests, etc.). The speaker's identity is given in the XML "speaker" tags and in the #speaker =... line in the CoNLL-U files.
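Because the corpus mixes Putin's utterances with those of other speakers, most analyses will first need to split the annotation by speaker. The following is a minimal sketch of how that could be done for the CoNLL-U files. It assumes the standard ten tab-separated CoNLL-U columns, that the speaker comment precedes each sentence it applies to, and that sentences without such a comment should be labelled "unknown". The function name `lemmas_by_speaker`, the speaker labels `SPEAKER_A` and `SPEAKER_B`, and the sample text are invented for illustration and are not taken from the corpus.

```python
import re
from collections import defaultdict

# Matches comment lines such as "#speaker = X" or "# speaker = X".
# The exact spacing used in the corpus files is an assumption.
SPEAKER_RE = re.compile(r"^#\s*speaker\s*=\s*(.*)$")


def lemmas_by_speaker(conllu_text):
    """Group (form, lemma, upos) triples by the speaker of each sentence."""
    result = defaultdict(list)
    speaker = "unknown"
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends a sentence; reset the speaker label.
            speaker = "unknown"
            continue
        if line.startswith("#"):
            match = SPEAKER_RE.match(line)
            if match:
                speaker = match.group(1).strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        result[speaker].append((cols[1], cols[2], cols[3]))
    return dict(result)


# Invented two-sentence sample, not real corpus content.
sample = (
    "# speaker = SPEAKER_A\n"
    "# text = Добрый день\n"
    "1\tДобрый\tдобрый\tADJ\t_\t_\t2\tamod\t_\t_\n"
    "2\tдень\tдень\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# speaker = SPEAKER_B\n"
    "1\tСпасибо\tспасибо\tPART\t_\t_\t0\troot\t_\t_\n"
)

grouped = lemmas_by_speaker(sample)
```

Filtering `grouped` by the label that the corpus uses for Putin would then restrict an analysis to his own utterances. That label should be checked in the files themselves.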

Files

levshina/Putin_Corpus-v1.0.zip

Files (34.7 MB)

md5:770f030a6091e239728db658cd175a59
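The md5 checksum listed for the archive can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `archive_is_intact` are hypothetical, and the local file name in the usage comment is an assumption about where the archive was saved.

```python
import hashlib

# Checksum as published on the record page for the v1.0 archive.
EXPECTED_MD5 = "770f030a6091e239728db658cd175a59"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected value."""
    return md5_of_file(path) == expected.lower()


# Usage (the path is an assumption about the local download location):
# archive_is_intact("Putin_Corpus-v1.0.zip")
```

Reading in chunks keeps memory use flat regardless of archive size.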

Additional details

Related works