AustroParl Corpus of Parliamentary Debates

doi:10.5281/zenodo.3819505

Published May 10, 2020 | Version 0.1.0

Dataset Open

1. University of Duisburg-Essen

The AustroParl Corpus of Parliamentary Debates, prepared in the PolMine Project, comprises all protocols of plenary sessions in the Austrian Nationalrat between 1996 and 2019. The corpus is built from PDF documents issued by the Nationalrat. The R package frappp was used to extract structural information from the original text and to prepare an XML version of the corpus (preliminary TEI format). The structural annotation comprises speaker, party affiliation, parliamentary group affiliation, role, legislative period, session, date, interjections, year and agenda item.

This release provides the corpus in a linguistically annotated and indexed format. As part of the corpus preparation pipeline, the data has been linguistically annotated (using the TreeTagger and StanfordNLP) and imported into the Corpus Workbench (CWB). The linguistic annotation comprises POS tagging and lemmatization.

This language resource is still very much in development and comes without any guarantees.

Files (1.1 GB)

Name	MD5	Size
austroparl_lda_250_2019-07-26.rds	fbc8e56e4a3afd930eaf6e5e4f5c0d60	169.3 MB
austroparl_v0.1.0.tar.gz	b09f835946ef438267e1b3f42d6db00e	904.0 MB
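
The MD5 checksums listed with the files can be used to check that a download arrived intact. The following is a minimal sketch in Python, assuming the files were saved under their original names in the current working directory; the helper names md5_of_file and verify_download are illustrative and not part of any PolMine tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "austroparl_lda_250_2019-07-26.rds": "fbc8e56e4a3afd930eaf6e5e4f5c0d60",
    "austroparl_v0.1.0.tar.gz": "b09f835946ef438267e1b3f42d6db00e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that the ~900 MB archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a file's MD5 digest with the value listed for its name.
    Raises KeyError if the file name is not one of the listed files."""
    path = Path(path)
    return md5_of_file(path) == EXPECTED_MD5[path.name]


if __name__ == "__main__":
    # Assumption: downloads sit in the current working directory.
    for name in EXPECTED_MD5:
        if Path(name).exists():
            print(name, "OK" if verify_download(name) else "MISMATCH")
        else:
            print(name, "not found in current directory")
```

A mismatch usually points to an interrupted or corrupted download, in which case fetching the file again is the simplest remedy.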

	All versions	This version
Views	363	363
Downloads	51	51
Data volume	49.9 GB	49.9 GB
