There is a newer version of the record available.

Published November 18, 2019 | Version 1.0
Dataset Open

LeiKo

  • 1. Universität Hamburg

Description

LeiKo is a comparable corpus of German easy-to-read news texts. This freely available resource has been systematically compiled and linguistically annotated for research in linguistics and computational linguistics. LeiKo consists of 216 news and newspaper texts (approx. 56,600 tokens) together with their metadata, structured into four subcorpora according to the websites on which the texts were published. All texts are tokenized, lemmatized, part-of-speech tagged, and dependency parsed, and can be queried in ANNIS (Krause/Zeldes 2016). A core corpus of 40 texts has been manually corrected.

Version 0.9 contains only the core corpus with lemma and POS annotations and can be queried here: https://corpora.uni-hamburg.de/hzsk/de/hzsk_access/annis/leiko

Version 1.0 comprises all 216 texts and adds syntactic annotations and metadata to the lemma and POS annotations.

Further versions with additional manual annotation levels will follow.

The corpus is provided in the ANNIS format, which can be imported directly into ANNIS Kickstarter.

Files

Files (11.8 MB)

Checksum  Size
md5:bca66716b9ca2d13a4c28be69b276777  3 Bytes
md5:026c348f477693e7032239f1340718b6  290.8 kB
md5:3a9688d5399c2836f76979bdb9a1c4be  30.0 kB
md5:9cd89c8146e648a826efbc14ee443775  85.4 kB
md5:aed68c4d1dbc6213518bcc6ca0262223  1.2 MB
md5:2ea8207e5c8871e690007ab98d67e5f5  4.5 MB
md5:a3738dc5515ad2d54e2bccbfa9a99b10  3.7 MB
md5:b858edba95d89a7ea77852f40788b940  1.7 MB
md5:ed326c2f79965e98d908e7db53a7b76a  178 Bytes
md5:7ae5ff0e5979ba3d607755724e2955e8  344.9 kB
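
The MD5 checksums in the file listing can be used to confirm that a download arrived intact before importing it into ANNIS. The following is a minimal Python sketch of such a check, not part of the LeiKo distribution. The function names `md5_of_file` and `matches_listing` are illustrative assumptions. The listing does not show file names, so any path passed in is a placeholder for wherever the downloaded file was saved.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed_checksum.strip().lower()
    # Drop the 'md5:' prefix used in the file listing, if present.
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(Path(path)) == expected
```

A call such as `matches_listing(local_path, "md5:bca66716b9ca2d13a4c28be69b276777")` returns `True` only if the file at the hypothetical `local_path` has exactly that checksum. A `False` result suggests an incomplete or corrupted download, or that the checksum belongs to a different file in the listing.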