Software Open Access

A corpus of Vladimir Putin's speeches 2012-2022

Levshina, Natalia

This release contains transcripts of Putin's speeches delivered between May 7, 2012 and August 16, 2022. The texts were scraped from the official website kremlin.ru and are provided in XML and CoNLL-U formats. The latter provides tokens, lemmas, Universal Parts of Speech, Universal Dependencies and morphological features, all generated by a Stanza model trained on the SynTagRus corpus. The collected texts are classified as "Speeches and Addresses" on kremlin.ru and do not cover everything he said in this period. They consist mostly of Putin's prepared monologues, but also include some spontaneous speech (e.g., answers to journalists' questions). Most utterances belong to Putin, but some were produced by other speakers (politicians, journalists, guests, etc.). The speaker's identity is given in the "speaker" tags of the XML files and in the #speaker =... line of the CoNLL-U files.
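Because other speakers appear in the transcripts, most analyses will need to filter sentences by speaker. The following is a minimal Python sketch of how the CoNLL-U files might be read for that purpose. It assumes the standard ten tab-separated CoNLL-U columns and assumes that a speaker comment applies to every sentence that follows it until the next speaker comment. The function name `read_conllu`, the sample text and the speaker labels are hypothetical illustrations, not material from the corpus.

```python
from collections import Counter

# Hypothetical sample: invented English tokens and speaker labels,
# NOT taken from the corpus. Columns follow the standard CoNLL-U order.
SAMPLE = """\
#speaker = Speaker A
# text = We agree.
1\tWe\twe\tPRON\t_\tCase=Nom|Number=Plur\t2\tnsubj\t_\t_
2\tagree\tagree\tVERB\t_\tTense=Pres\t0\troot\t_\t_
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

#speaker = Speaker B
# text = Why?
1\tWhy\twhy\tADV\t_\t_\t0\troot\t_\t_
2\t?\t?\tPUNCT\t_\t_\t1\tpunct\t_\t_
"""


def read_conllu(lines):
    """Yield (speaker, tokens) for each sentence in an iterable of lines.

    Assumption: a speaker comment stays in effect until the next one.
    """
    speaker = None
    tokens = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                yield speaker, tokens
                tokens = []
            continue
        if line.startswith("#"):
            # Comment lines look like "#key = value"; only "speaker" is used.
            key, sep, value = line[1:].partition("=")
            if sep and key.strip() == "speaker":
                speaker = value.strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            continue
        tokens.append({
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "feats": cols[5],
            "head": cols[6],
            "deprel": cols[7],
        })
    if tokens:
        # Emit the last sentence if the input lacks a trailing blank line.
        yield speaker, tokens


sentences = list(read_conllu(SAMPLE.splitlines()))

# Lemma frequencies for one speaker, ignoring punctuation.
lemma_counts = Counter(
    tok["lemma"]
    for spk, toks in sentences
    if spk == "Speaker A"
    for tok in toks
    if tok["upos"] != "PUNCT"
)
```

To run this on the release itself, one would pass an open file handle instead of `SAMPLE.splitlines()` and replace "Speaker A" with whatever label the corpus actually uses, which should be checked against the files.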

Files (34.7 MB)
Name: levshina/Putin_Corpus-v1.0.zip
Size: 34.7 MB
md5: 770f030a6091e239728db658cd175a59
