CROCorp: Corpus of Parliamentary Debates in Croatia
Description
The repository contains a cleaned and pre-processed corpus of debates from the Croatian Parliament (Sabor), accompanied by metadata on elected representatives and their political parties. It covers the period 2003-2020 (five complete terms) and contains over 500,000 speeches.
If you use the dataset, please cite: Mochtak, Michal, Josip Glaurdić, and Christophe Lesschaeve (2022): CROCorp: Corpus of Parliamentary Debates in Croatia (v1.1.1), https://doi.org/10.5281/zenodo.6521372.
v1.1.1 (latest version)
- added the concept DOI to codebooks (DOI was generated only after the repository was published)
v1.1.0
- improved coding of the dummy variable "moderator" (using a less error-prone algorithm for detecting the moderator role)
- fixed an issue with agenda points which are concatenated while preserving a unique web link
- recoded agenda point tags using a better ML model (transformer architecture)
v1.0.0
- originally posted in the GESIS repository (migrated to Zenodo due to limitations concerning the concept DOI)
Notes
Files
CODEBOOK_CRO_corpus.pdf
Files (394.5 MB)
MD5 checksum | Size |
---|---|
md5:99299c11a65e0a4c28aca0f8e61d17b4 | 162.8 kB |
md5:7b2f74d99b37ea21119fa63fb81f72e9 | 160.5 kB |
md5:fec441be729e846fdc1cd3d8310e8995 | 159.1 kB |
md5:9f64701b5b365f617a5df677bbd66430 | 76.2 MB |
md5:910a26d15b3d8ac7cdfac45dea0a7e41 | 66.9 MB |
md5:fadd0e7f3e40dca5fa3f8000a3abefd2 | 93.6 MB |
md5:48b3d069967a30926697a46c9c2b9a8b | 8.9 MB |
md5:6b1c3b7e1fb0f850a3eaaf33c7b17648 | 148.3 MB |
md5:f44528cbe4320cde847d1f436d85b4c1 | 132.9 kB |
md5:895d878c4b200a0455b72c4d5b11de53 | 13.4 kB |
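
The md5 values in the file listing above can be used to check that a download arrived intact. The following is a minimal sketch in Python using only the standard library; the function names `md5_of_file` and `matches_listing` are illustrative and not part of the dataset, and the sketch assumes the listed values are ordinary hex MD5 digests of the file contents.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large corpus files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's MD5 with a listing entry such as 'md5:9f64...'.

    The 'md5:' prefix is optional; comparison is case-insensitive.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing("path/to/download", "md5:9f64701b5b365f617a5df677bbd66430")` would return `True` only if the local file (the path here is a placeholder) has the same checksum as the 76.2 MB entry in the listing.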