There is a newer version of the record available.

Published August 24, 2022 | Version 0.0.0
Resource type: Other | Access: Open

SVKCorp: Corpus of Debates in the National Council of the Slovak Republic

Description

This repository holds a corpus of transcripts of parliamentary debates in the National Council of the Slovak Republic (https://www.nrsr.sk/web/). The speech transcripts were text-mined and cleaned from machine-readable Word documents and from the official online database of the Parliament. The corpus covers the period 1994-2020 and comprises seven complete parliamentary terms with 375,024 speeches. The repository contains two types of data files. The main corpus is stored on a per-term basis in .RDS files named according to the pattern SK_term_*.RDS. For further information on the data structure, see the accompanying codebook. In addition to the main corpus, the repository contains annotated speeches in the full CoNLL-U format (see SK_speeches_anno_*.RDS). The annotation was produced with the Trankit pipeline using its default Slovak language model.
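For readers unfamiliar with CoNLL-U, the following is a minimal Python sketch of the standard ten-field token layout defined by Universal Dependencies. It assumes plain tab-separated CoNLL-U text; how these fields are actually laid out inside the SK_speeches_anno_*.RDS files is described in the codebook and may differ. The function names and the sample token are illustrative and are not taken from the corpus.

```python
# The ten standard CoNLL-U token fields, in order (Universal Dependencies).
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def parse_conllu_token(line):
    """Split one tab-separated CoNLL-U token line into a field dictionary."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(
            f"expected {len(CONLLU_FIELDS)} fields, got {len(values)}"
        )
    return dict(zip(CONLLU_FIELDS, values))


def parse_conllu_sentence(text):
    """Parse the token lines of one sentence, skipping comments and blanks."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        tokens.append(parse_conllu_token(line))
    return tokens


# Illustrative sample only (hypothetical, not drawn from the corpus).
SAMPLE = (
    "# text = Ďakujem.\n"
    "1\tĎakujem\tďakovať\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    for token in parse_conllu_sentence(SAMPLE):
        print(token["form"], token["lemma"], token["upos"])
```

An underscore in a field marks an unspecified value, so downstream code should treat "_" as missing rather than as a literal string.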

Notes

This is a "work in progress" project. Please notify the author if you notice any systematic issues with the corpus. The 375k entries in the corpus were not checked manually.

Files (2.0 GB)

Checksum                                Size
md5:e1bc0aebc03ac3c10627c46514ac10e2    178.1 MB
md5:9dd497380a50e97968a001f20965befe    275.9 MB
md5:d2cdb81f209ae66c36cb0d07e7d1c659    179.5 MB
md5:51f281c99e6ca9e5e6e54663de4c7d2a    217.8 MB
md5:c49d50b92f60ce5ae2ed7ebe9e49f572    99.8 MB
md5:96e2f08c67d6b94cb4204f06873cba81    413.6 MB
md5:fb7d773777194d17dd0d5494d402bda5    292.6 MB
md5:6755bbbfece56bc2f19d67e3fa3f2000    33.1 MB
md5:b2a1edcdb0a0e0ee7a5ef54549eb386b    50.7 MB
md5:9491865113d42a68a018467108675f43    31.4 MB
md5:2ec388d440c4f4c52be84284cdc667ef    39.9 MB
md5:3721f137ea4d1676a1d841f27b196661    18.4 MB
md5:a31a6d59eea898d5febf5491cfd05f0c    75.9 MB
md5:7c34a07cbd8be60f10f7cabaf47267fe    53.6 MB
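
The MD5 checksums listed for the files can be used to confirm that a download completed without corruption. The following is a minimal Python sketch using only the standard library. The helper names are illustrative, and the file path in the usage note is a placeholder: pair each checksum with its file as shown on the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large corpus files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("path/to/downloaded_file.RDS", "md5:<checksum from the record page>")` returns True only if the local file is byte-identical to the published one.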