Published July 22, 2024 | Version v2.2.0-rc1
Dataset Restricted

GermaParl Corpus of Plenary Protocols

  • 1. University of Duisburg-Essen

Description

The GermaParl Corpus of Plenary Protocols has been prepared in the PolMine Project and covers debates in the German Bundestag since its first session on September 7, 1949. GermaParl v2.2.0-rc1 covers debates until June 28, 2024. This release candidate prepares a forthcoming public release. The most important new feature of GermaParl v2.2.0 is an additional annotation layer with DBpedia URIs.

GermaParl is a quality-tested resource on German parliamentarism. Material newly added for recent parliamentary sessions has not yet been tested as comprehensively as the data in previous public releases. We therefore ask beta users to be aware that errors may remain in the data and to help improve the resource by giving feedback.

Beta users of GermaParl v2.2.0-rc1 need to request access to the corpus and are kindly asked to give feedback on any issues they encounter, so that GermaParl v2.2.0 will be a trustworthy, high-quality resource for research on parliamentary proceedings.

The beta release includes a linguistically annotated indexed version (Corpus Workbench / CWB data format) of GermaParl.

Beta users are requested to proceed as follows:

Request access: Click the respective button on this page. You do not need a personal invitation to be eligible as a beta user. A short, informative note on the research interest you associate with GermaParl Beta will help us make a quick decision.

Confirm email: Zenodo will then send you an email asking you to verify your email address. This may take up to an hour. If the message does not arrive in your inbox, check your spam folder. Then confirm your email address.

Confirmation of access: We need to confirm data access manually. We will consider incoming requests on a continuous basis, but please allow 2-3 days for a response.

Join the GitHub issue tracker: To process feedback systematically, we use the issue tracker of a private GitHub repository and will invite you as a collaborator. To send the invitation, we will ask you for your GitHub username. If you do not have a GitHub account yet, please consider creating one.

Download and install corpus: Once we have confirmed data access, Zenodo will send you an email with a download link. Please keep this link. If you work with the CWB variant of GermaParl, we suggest installing the corpus with the functionality of the R package cwbtools, using the following code; replace the placeholder with your download link. A stable internet connection is advisable: the corpus tarball is about 2.6 GB.

# insert download link
zenodo_url <- "INSERT-ZENODO-LINK-HERE"

# install cwbtools
install.packages("cwbtools")

# install corpus
library(cwbtools)
tmp_tarball <- zenodo_get_tarball(url = zenodo_url)
corpus_install(tarball = tmp_tarball)

# install polmineR
install.packages("polmineR")

# check installation
library(polmineR)
corpus("GERMAPARL2")

If you have not used CWB-indexed corpora before, the installation process will suggest directories for data storage and create them. This includes setting the environment variable CORPUS_REGISTRY permanently, so that future R sessions can find the installed corpus.
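To check whether the registry is in place in a new R session, a small base R sketch like the following may help. It is an illustration under assumptions, not part of cwbtools or polmineR: the helper name check_corpus_registry() is hypothetical, and it only inspects the CORPUS_REGISTRY environment variable and lists the files in that directory. In CWB, each registry file describes one corpus and is usually named after the lowercase corpus ID (we assume germaparl2 here).

```r
# Hypothetical helper (not part of cwbtools or polmineR): inspect the
# CORPUS_REGISTRY environment variable using base R only.
check_corpus_registry <- function() {
  registry <- Sys.getenv("CORPUS_REGISTRY")
  if (!nzchar(registry)) {
    message("CORPUS_REGISTRY is not set in this R session.")
    return(invisible(NA_character_))
  }
  if (!dir.exists(registry)) {
    message("CORPUS_REGISTRY points to a directory that does not exist: ", registry)
    return(invisible(registry))
  }
  # Assumption: one registry file per installed corpus, named by lowercase corpus ID
  files <- list.files(registry)
  message("Registry files found: ", paste(files, collapse = ", "))
  invisible(registry)
}

check_corpus_registry()
```

If the variable is empty in a freshly started R session, the permanent setting may not have taken effect yet; restarting R after the installation is worth trying before calling corpus("GERMAPARL2") again.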

Explore GermaParl and give feedback: Given the size of the data, it is impossible to check all of it manually, so remaining errors are to be expected. Your feedback will help us prepare a consolidated official release of the updated version of GermaParl!

Acknowledgements:

  • We gratefully acknowledge funding from the German National Research Data Infrastructure (Nationale Forschungsdateninfrastruktur / NFDI). Funding from KonsortSWD has advanced the data preparation tool set, facilitating the robust addition of annotation layers (such as named entities) to large corpora. This is instrumental for linking parliamentary data with other data. KonsortSWD is funded by the German Research Foundation (DFG) as part of the NFDI under project number 442494171.
  • Funding from the Text+ consortium is instrumental for updating the corpus, for quality control, and for keeping data formats in line with current and future developments. Text+ is funded by the German Research Foundation (DFG) as part of the NFDI under project number 460033370.
  • The data quality we are able to offer at this stage has benefited significantly from a cooperation with the SOLDISK project at the University of Hildesheim and from the comprehensive manual quality control carried out by the SOLDISK team. A very special thanks goes to Hannes Schammann, Max Kisselew, Franziska Ziegler, Carina Böker, Jennifer Elsner and Carolin McCrea.
  • We would also like to thank our beta users, who provided invaluable feedback and greatly enhanced the quality of the data over the course of multiple release candidates.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

If you have not been invited to serve as a beta user, please write a short, informative note on the research interest you associate with GermaParl. Access is restricted to academic users. In line with the envisaged Creative Commons licence of the official release (CC BY-SA), any results you report should refer to the PolMine Project and the authors of the corpus. Beta users are not authorized to share the data and are asked to report any issues they encounter via the issue tracker we provide for this purpose.
