A corpus of Vladimir Putin's speeches 2012-2022
Creators
Description
This release contains transcripts of Putin's speeches delivered between May 7, 2012 and August 16, 2022. The texts were scraped from the official website kremlin.ru and are provided in XML and CoNLL-U formats. The CoNLL-U files contain tokens, lemmas, Universal Parts of Speech, Universal Dependencies and morphological features, all annotated automatically with a Stanza model trained on the SynTagRus corpus. The collected texts are those classified as "Speeches and Addresses" on kremlin.ru, so they do not cover everything Putin said in public during this period. They consist mostly of prepared monologues, but also include some spontaneous speech (e.g., answers to journalists' questions). Most utterances are Putin's, but some come from other speakers (politicians, journalists, guests, etc.). The speaker's identity is given in the XML "speaker" tags and in the #speaker =... line in the CoNLL-U files.
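Since speaker identity is stored in a comment line in the CoNLL-U files, a reader may want to separate Putin's tokens from those of other speakers. The following is a minimal sketch, not part of the release: the function name `count_tokens_by_speaker` and the sample data are hypothetical, and it assumes that a speaker comment applies to all following sentences until the next speaker comment appears. The exact spacing of the comment line in the real files may differ, so the parser tolerates whitespace around `=`.

```python
from collections import Counter

# Hypothetical sample in CoNLL-U layout (10 tab-separated columns per token).
# The speaker names and sentences are invented for illustration only.
SAMPLE = (
    "#speaker = Speaker A\n"
    "1\tДобрый\tдобрый\tADJ\t_\t_\t2\tamod\t_\t_\n"
    "2\tдень\tдень\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "#speaker = Speaker B\n"
    "1\tСпасибо\tспасибо\tPART\t_\t_\t0\troot\t_\t_\n"
)


def count_tokens_by_speaker(lines):
    """Count syntactic words per speaker in an iterable of CoNLL-U lines."""
    counts = Counter()
    speaker = None  # assumption: stays in effect until the next speaker comment
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            # Comment line: look for a "speaker = value" pair.
            key, sep, value = line[1:].partition("=")
            if sep and key.strip() == "speaker":
                speaker = value.strip()
            continue
        if not line.strip():
            continue  # blank line marks a sentence boundary
        cols = line.split("\t")
        # Skip malformed rows, multiword ranges ("1-2") and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        counts[speaker] += 1
    return counts


if __name__ == "__main__":
    for name, n in count_tokens_by_speaker(SAMPLE.splitlines()).items():
        print(name, n)
```

To run this on the actual corpus, pass an open file object (opened with `encoding="utf-8"`) instead of `SAMPLE.splitlines()`. The file names inside the archive are not listed here, so they would need to be checked after unpacking.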
Files
Name | Size |
---|---|
levshina/Putin_Corpus-v1.0.zip (md5:770f030a6091e239728db658cd175a59) | 34.7 MB |
Additional details
Related works
- Is supplement to: https://github.com/levshina/Putin_Corpus/tree/v1.0 (URL)