Manually Annotated Instances of Ich ('I') from the German KoLaS Corpus
Authors/Creators
- 1. Universität Stuttgart
- 2. Leuphana Universität Lüneburg
Description
Dataset used in Andresen/Knorr (2020). The dataset comprises 360 instances of ich ('I') taken from the German learner corpus KoLaS (Andresen/Knorr 2017, see http://hdl.handle.net/11022/0000-0001-B732-8 for full corpus access) and manually annotated with categories taken from Steinhoff (2007).
Column descriptions:
- document: name of the document by which it can be found in the KoLaS corpus
- code_annotator1 to code_annotator4: annotations by the four annotators. Possible values: Verfasser-Ich (author I), Forscher-Ich (researcher I), Erzähler-Ich (narrator I)
- max_agreement_freq: highest number of annotators that agreed on one label
- max_agreement_label: Label on which the highest number of annotators agreed
- context_before: 150 characters of context before the match
- match: the match itself (either ich or Ich)
- context_after: 150 characters of context after the match
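The relationship between the four annotator columns and the two agreement columns can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `max_agreement` and the example row are hypothetical, and the handling of ties (the label seen first wins) is an assumption, since the description does not say how ties were resolved in the dataset.

```python
from collections import Counter

def max_agreement(codes):
    """Return (max_agreement_freq, max_agreement_label) for one row.

    `codes` is the list of labels from code_annotator1 to code_annotator4.
    Tie handling is an assumption: the label encountered first wins.
    """
    counts = Counter(codes)
    label, freq = counts.most_common(1)[0]
    return freq, label

# Hypothetical row, invented for illustration (not taken from the dataset).
row = {
    "code_annotator1": "Verfasser-Ich",
    "code_annotator2": "Verfasser-Ich",
    "code_annotator3": "Forscher-Ich",
    "code_annotator4": "Verfasser-Ich",
}
codes = [row[f"code_annotator{i}"] for i in range(1, 5)]
freq, label = max_agreement(codes)
print(freq, label)  # 3 Verfasser-Ich
```

With four annotators, max_agreement_freq computed this way ranges from 1 (no two annotators agree) to 4 (full agreement).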
References
Andresen M, Knorr D. KoLaS – Ein Lernendenkorpus in der Schreibberatungsausbildung einsetzen. Zeitschrift Schreiben. Published online July 5, 2017:10-17.
Andresen M, Knorr D. Exploring the Use of the Pronoun I in German Academic Texts with Machine Learning. In: Burghardt M, Müller-Birn C, eds. Methoden und Anwendungen der Computational Humanities. Lecture Notes in Informatics (LNI). Gesellschaft für Informatik; 2020.
Steinhoff T. Zum ich-Gebrauch in Wissenschaftstexten. Zeitschrift für germanistische Linguistik. 2007;35(1-2):1-26.
Files
| Name | Size | Checksum |
|---|---|---|
| Andresen_Knorr_2020_Ich_Annotations.txt | 155.1 kB | md5:ea5a9257b8aac2513bec81f15dcb14fd |
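After downloading, the file can be checked against the md5 checksum listed for it. The following is a minimal sketch using Python's standard hashlib module. The helper names `md5_of_file` and `matches_expected` are hypothetical, and the sketch assumes the file was saved locally under its original name.

```python
import hashlib

# Checksum as listed for Andresen_Knorr_2020_Ich_Annotations.txt.
EXPECTED_MD5 = "ea5a9257b8aac2513bec81f15dcb14fd"

def md5_of_file(path, chunk_size=65536):
    """Compute the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()

# Usage, assuming the file is in the current directory:
# matches_expected("Andresen_Knorr_2020_Ich_Annotations.txt")
```

A mismatch usually points to an incomplete or altered download.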