Published December 4, 2020 | Version v1
Dataset Open

Token-based data sets for the analysis of the academic language of literary studies and linguistics

  • 1. Universität Stuttgart

Description

These are the token-based data sets used for my PhD thesis ("Potentiale syntaktischer Annotationen für die datengeleitete Sprachbeschreibung am Beispiel der Wissenschaftssprachen der Germanistik", publication in progress).

For the Python scripts and further data, see https://github.com/melandresen/dissertation.

Due to copyright law, the annotated texts of the corpus could only be published without the token layer. The files provided here contain the token-based frequency data derived from the original texts and can be used as input to the analysis scripts in the GitHub repository.

Files

1_linear_token+pos.txt

Files (1.9 GB)

Checksum                               Size
md5:bde4fbff7d9ad0ef51010da55330a50b   48.4 MB
md5:4a18e39a245a94bce3858ba0c4de465e   42.5 MB
md5:26be587b9dbf6e4bbd1ca2dfe05cf27d   547.1 MB
md5:399238d106ebc4c005235dea1126ec20   641.5 MB
md5:4ecb85a8a7a5c0fa82a1e665adefdf8c   623.1 MB
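
Since several of the files are over 500 MB, it may be worth checking a download against the MD5 checksums listed above before running any analysis. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_listing` are hypothetical and are not part of the GitHub repository, and because the listing above does not show which checksum belongs to which file name, that pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large files do not have to fit into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file against a listed checksum.

    `listed` may be given with or without the 'md5:' prefix used in the
    file listing; the comparison ignores case and surrounding whitespace.
    """
    expected = listed.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected


# Example call (the path is a placeholder for a downloaded file; the
# checksum must be the one shown next to that file on the record page):
# matches_listing("path/to/downloaded_file.txt", "md5:<checksum from listing>")
```

A result of `False` most likely indicates an incomplete or corrupted download, in which case downloading the file again is the simplest remedy.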