Published August 25, 2020 | Version v1
Dataset Open

Manually Annotated Instances of Ich ('I') from the German KoLaS Corpus

  • 1. Universität Stuttgart
  • 2. Leuphana Universität Lüneburg

Description

Dataset used in Andresen/Knorr (2020). The dataset comprises 360 instances of ich ('I') taken from the German learner corpus KoLaS (Andresen/Knorr 2017, see http://hdl.handle.net/11022/0000-0001-B732-8 for full corpus access) and manually annotated with categories taken from Steinhoff (2007).

Column descriptions:

  • document: name of the document by which it can be found in the KoLaS corpus
  • code_annotator1 - code_annotator4: Annotations by four annotators. Possible values: Verfasser-Ich (author I), Forscher-Ich (researcher I), Erzähler-Ich (narrator I)
  • max_agreement_freq: Highest number of annotators that agreed on one label
  • max_agreement_label: Label on which the highest number of annotators agreed
  • context_before: 150 characters of context before the match
  • match: the match itself (either ich or Ich)
  • context_after: 150 characters of context after the match
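
The relationship between the four annotator columns and the two max_agreement columns can be illustrated with a minimal Python sketch. The example row is invented for illustration and is not taken from the dataset; the function name max_agreement is hypothetical. How the published file resolves ties (e.g. two annotators each on two labels) is not documented here, so the tie behaviour below (first label encountered wins) is an assumption.

```python
from collections import Counter

# Label inventory as listed in the column descriptions.
LABELS = ("Verfasser-Ich", "Forscher-Ich", "Erzähler-Ich")


def max_agreement(codes):
    """Return (frequency, label) for the label chosen by the most annotators.

    Assumption: on a tie, the label that appears first in `codes` is
    returned; the dataset itself may resolve ties differently.
    """
    counts = Counter(codes)
    label, freq = counts.most_common(1)[0]
    return freq, label


# Invented example row using the documented column names.
row = {
    "code_annotator1": "Verfasser-Ich",
    "code_annotator2": "Forscher-Ich",
    "code_annotator3": "Verfasser-Ich",
    "code_annotator4": "Verfasser-Ich",
}

codes = [row[f"code_annotator{i}"] for i in range(1, 5)]
freq, label = max_agreement(codes)
print(freq, label)
```

Recomputing these two values from the annotator columns may serve as a sanity check after loading the file, provided the tie handling is confirmed against the data first.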

References

Andresen M, Knorr D. KoLaS – Ein Lernendenkorpus in der Schreibberatungsausbildung einsetzen. Zeitschrift Schreiben. Published online July 5, 2017:10-17.

Andresen M, Knorr D. Exploring the Use of the Pronoun I in German Academic Texts with Machine Learning. In: Burghardt M, Müller-Birn C, eds. Methoden und Anwendungen der Computational Humanities. Lecture Notes in Informatics (LNI). Gesellschaft für Informatik; 2020.

Steinhoff T. Zum ich-Gebrauch in Wissenschaftstexten. Zeitschrift für germanistische Linguistik. 2007;35(1-2):1–26.

Files

Andresen_Knorr_2020_Ich_Annotations.txt

Files (155.1 kB)

md5:ea5a9257b8aac2513bec81f15dcb14fd
