Leonhardt, Christoph
Blätte, Andreas
2020-05-10
<p>The <em>ParisParl Corpus of Parliamentary Debates</em>, prepared in the <a href="http://polmine.github.io">PolMine Project</a>, comprises all protocols of plenary sessions in the French <em>Assemblée nationale</em> between 1996 and 2019. The corpus is built from PDF documents issued by the <em>Assemblée nationale</em>. The R package <a href="https://polmine.github.io/frappp_slides/slides_en.html">frappp</a> was used to extract structural information from the original text and to prepare an XML version of the corpus (preliminary TEI format). The structural annotation comprises speaker, party affiliation, parliamentary group affiliation, role, legislative period, session, date, interjections, year and agenda item.</p>
<p>This release provides a linguistically annotated and indexed version of the corpus. As part of the corpus preparation pipeline, the data was annotated using the <a href="https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/">TreeTagger</a> and <a href="https://stanfordnlp.github.io/stanfordnlp/">StanfordNLP</a>, then imported into the <a href="http://cwb.sourceforge.net/">Corpus Workbench (CWB)</a>. The linguistic annotation comprises POS tagging and lemmatization.</p>
<p>This language resource is still under active development and comes without any guarantees.</p>
https://doi.org/10.5281/zenodo.3819374
oai:zenodo.org:3819374
fra
Zenodo
https://doi.org/10.5281/zenodo.3819373
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
corpus, parliamentary protocols, France, Assemblée nationale
ParisParl Corpus of Parliamentary Debates
info:eu-repo/semantics/other