3923917
doi
10.5281/zenodo.3923917
oai:zenodo.org:3923917
Zinsmeister, Heike
Universität Hamburg
LeiKo
Jablotschkin, Sarah
Universität Hamburg
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Leichte Sprache
einfache Sprache
easy-to-read
Plain Language
Corpus
ANNIS
text simplification
news texts
newspaper
annotation
linguistics
corpus linguistics
conll
Easy Language
Easy German
Plain German
<p>LeiKo is a comparable corpus of German easy-to-read news texts. This freely available resource is systematically compiled and linguistically annotated for research in linguistics and computational linguistics. LeiKo consists of 216 news and newspaper texts (approx. 56,600 tokens) and their metadata, structured in four subcorpora according to the websites on which they were published. All texts are tokenized, lemmatized, part-of-speech tagged, and dependency parsed, and can be queried in ANNIS (Krause/Zeldes 2016). A core corpus of 40 texts has been manually corrected.<br>
<br>
Version 0.9 contains only the core corpus with lemma and POS annotations and can be queried here: <a href="https://corpora.uni-hamburg.de/hzsk/de/hzsk_access/annis/leiko">https://corpora.uni-hamburg.de/hzsk/de/hzsk_access/annis/leiko</a><br>
<br>
Version 1.0 comprises all 216 texts and includes not only lemma and POS annotations but also syntactic annotations and metadata. The corpus is provided in the ANNIS format, which can be imported directly into ANNIS Kickstarter.</p>
<p>Version 1.1 is identical to version 1.0 but, in addition to the ANNIS files, contains all corpus texts in the CoNLL format.</p>
<p>In version 1.3, parts of the tazleicht subcorpus were added that had been missing because of an error in the Python script used to collect the texts from the websites. In addition, the syntactic segmentation of lists was revised, and the URL of each text was added to the metadata.<br>
<br>
Further versions with additional manual annotation levels will follow.</p>
<p>If you use the corpus, please cite:</p>
<p>Jablotschkin, Sarah / Heike Zinsmeister (2020): "LeiKo: A corpus of easy-to-read German". Poster presentation at the Computational Linguistics Poster Session in the course of the 42nd annual conference of the Deutsche Gesellschaft für Sprachwissenschaft (DGfS) in Hamburg.</p>
<p>(The version of the poster in the download section has been corrected for typos.)</p>
Zenodo
2020-06-30
info:eu-repo/semantics/other
3626763
1.3
1647451375.65788
3359810
md5:dd46ba13e79c975840c9996d4393a23b
https://zenodo.org/records/3923917/files/LeiKo_1.3.zip
public
10.5281/zenodo.3626763
isVersionOf
doi