Published September 23, 2019 | Version 1.0
Dataset Open

German Innsbruck Corpus (GermInnC) 1800-1950

  • University of Innsbruck

Description

A digital corpus on variation in German (1800-1950)

The German Innsbruck Corpus (GermInnC) 1800-1950 is a digitised corpus modelled on the German Manchester Corpus (GerManC) 1650-1800 (cf. Scheible et al. 2011; Durrell et al. 2012). Like its model, the GermInnC is balanced by period, region and genre.

The GermInnC comprises ca. 840,000 tokens, ca. 120,000 for each of its seven genres (Drama, Humanities, Legal texts, Narrative prose, Newspapers, Scientific texts, Sermons). It is subdivided into three periods (1800-1850, 1851-1900 and 1901-1950) and five regions (North German, West Central German, East Central German, West Upper German including Switzerland, and East Upper German including Austria).
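The design described above can be laid out as a grid of genre, period and region. The following is a minimal sketch of that arithmetic only. It assumes that tokens are spread evenly over all cells, which the description does not state explicitly, and the variable names are chosen here for illustration.

```python
from itertools import product

# Categories as listed in the corpus description.
GENRES = [
    "Drama", "Humanities", "Legal texts", "Narrative prose",
    "Newspapers", "Scientific texts", "Sermons",
]
PERIODS = ["1800-1850", "1851-1900", "1901-1950"]
REGIONS = [
    "North German", "West Central German", "East Central German",
    "West Upper German", "East Upper German",
]

# Approximate figure given in the description ("ca. 120,000 per genre").
TOKENS_PER_GENRE = 120_000

# Every combination of genre, period and region is one cell of the design.
cells = list(product(GENRES, PERIODS, REGIONS))
total_tokens = TOKENS_PER_GENRE * len(GENRES)

# Assumption: an even split across cells. The real per-cell sizes may differ.
tokens_per_cell = total_tokens // len(cells)

print(f"{len(cells)} cells, ca. {total_tokens:,} tokens, "
      f"ca. {tokens_per_cell:,} tokens per cell")
```

Under this even-split assumption the figures are consistent with the stated total of ca. 840,000 tokens. The documentation files in the download package remain the authoritative source for actual sample sizes.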

The corpus is available in three forms: a raw version, a lemmatised, fully annotated version, and an “all data” file (including metadata annotation of file names and periods) for further import and processing. Linguistic annotation was carried out with the POS tagger TreeTagger, using the Stuttgart-Tübingen Tagset (STTS).
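For further processing, an annotated file can be read into simple token records. The sketch below assumes a tab-separated layout of token, POS tag and lemma, one token per line, as TreeTagger commonly produces. The actual layout of the GermInnC files is not described here and should be checked against the documentation files. The sample sentence and the names `Token` and `parse_tagged` are invented for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    form: str
    pos: str    # STTS tag, e.g. NN or VVFIN
    lemma: str


def parse_tagged(text: str) -> list[Token]:
    """Parse lines of the assumed form token<TAB>tag<TAB>lemma."""
    tokens = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            # Skip blank lines, markup or anything not in three columns.
            continue
        tokens.append(Token(*parts))
    return tokens


# Invented sample data, not taken from the corpus.
SAMPLE = "Der\tART\td\nHund\tNN\tHund\nbellt\tVVFIN\tbellen\n.\t$.\t.\n"

tokens = parse_tagged(SAMPLE)
pos_counts = Counter(token.pos for token in tokens)
print(pos_counts.most_common())
```

Counting tags per file in this way gives a quick check on the annotation before any larger import.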

Two documentation files (Word and Excel, both included in the download package) provide a more detailed description of the corpus and the digitisation.

The corpus may be of interest to all scholars working on the history of the German language, standardisation of German, variation and change, historical sociolinguistics, and Germanic linguistics.

The corpus was generously supported by the early career funding of the University of Innsbruck (October 2018 to September 2019).

Files

GermInnC_release23092019.zip (11.8 MB)
md5:36c218c1965f7c0ead790294241063e8
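
The MD5 checksum listed for the archive can be used to confirm that a download is complete. This is a minimal sketch using Python's standard `hashlib` module. It assumes the archive has been saved under its original file name in the current directory, and the function names are chosen here for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed for GermInnC_release23092019.zip.
EXPECTED_MD5 = "36c218c1965f7c0ead790294241063e8"


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5) -> bool:
    """Compare the file's digest with the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path. Adjust to wherever the archive was saved.
    archive = Path("GermInnC_release23092019.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found")
```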