Published August 28, 2022 | Version 1.0.0
Other Open

Corpus of OSCE Elections Monitoring Reports


This repository holds a corpus of election monitoring reports collected from the official website of the Organization for Security and Co-operation in Europe (OSCE) and its subsidiary Office for Democratic Institutions and Human Rights (ODIHR). The monitoring reports, stored as machine-readable PDF files, were text-mined and reorganized into a convenient tabular format (a data frame). The corpus covers the period 1995-2020 and comprises 415 reports parsed into 11,529 topical sections. The corpus is stored in R's native .RDS format. Apart from the main corpus, the repository also contains annotated texts in the full CoNLL-U format. The annotation was done with the Trankit pipeline using its default English language model.
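For readers who want to work with the annotated part outside R, the following is a minimal sketch of reading CoNLL-U data in plain Python. It is not the authors' code. It relies only on the general CoNLL-U layout (ten tab-separated columns per token, `#` comment lines, a blank line between sentences). The function name `read_conllu` and the sample sentence are invented for illustration, and the sketch makes no assumption about the file names used in this repository.

```python
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-U columns, in the order fixed by the format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        # Handle a final sentence that is not followed by a blank line.
        yield sentence


# Invented sample sentence, NOT taken from the corpus.
SAMPLE = (
    "# text = Observers noted delays.\n"
    "1\tObservers\tobserver\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tnoted\tnote\tVERB\tVBD\tTense=Past\t0\troot\t_\t_\n"
    "3\tdelays\tdelay\tNOUN\tNNS\tNumber=Plur\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
lemmas = [token["lemma"] for token in sentences[0]]
print(lemmas)
```

To read a real file, the same function can be given an open file handle instead of `SAMPLE.splitlines()`. The main corpus in .RDS format is most directly loaded in R itself with `readRDS()`.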

If you use the dataset, please cite it with the meta link for the whole repository: Mochtak, Michal and Adam Drnovsky (2022): Corpus of OSCE Elections Monitoring Reports, v1 (1995-2020),

For the paper introducing the dataset, please cite: Mochtak, Michal, Adam Drnovsky, and Christophe Lesschaeve (2022): "Bias in the Eye of Beholder? 25 Years of Election Monitoring in Europe". Democratization, 29 (5): 899-917.


Please notify the authors if you notice any systematic problems with the corpus. Although we did our best to eliminate potential issues caused by text mining, it was impossible to check all data entries manually.



Files (85.8 MB)

Size
171.8 kB
71.2 MB
14.3 MB

Additional details

Related works

Is supplement to
Journal article: 10.1080/13510347.2021.2019219 (DOI)