Published December 4, 2020 | Version v1
Dataset
Open
Token-based data sets for the analysis of the academic language of literary studies and linguistics
Description
These are the token-based data sets used for my PhD thesis ("Potentiale syntaktischer Annotationen für die datengeleitete Sprachbeschreibung am Beispiel der Wissenschaftssprachen der Germanistik", publication in progress).
For Python scripts and further data, see https://github.com/melandresen/dissertation.
Due to copyright law, the annotated texts of the corpus could only be published without the token layer. The files provided here contain the token-based frequency data derived from the original texts and can be used as input to the analysis scripts in the GitHub repository.
Files (1.9 GB)
Previewed file: 1_linear_token+pos.txt
Checksum | Size |
---|---|
md5:bde4fbff7d9ad0ef51010da55330a50b | 48.4 MB |
md5:4a18e39a245a94bce3858ba0c4de465e | 42.5 MB |
md5:26be587b9dbf6e4bbd1ca2dfe05cf27d | 547.1 MB |
md5:399238d106ebc4c005235dea1126ec20 | 641.5 MB |
md5:4ecb85a8a7a5c0fa82a1e665adefdf8c | 623.1 MB |
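Because several of the files are over 500 MB, it may be worth checking a download against the md5 values listed above before running any analysis. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `matches_checksum` and the path `downloaded_file.txt` are hypothetical and are not part of the data set or the GitHub repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large files do not have to fit into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a listed value.

    `expected` may be given with or without the 'md5:' prefix
    used in the file listing.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Hypothetical usage; replace the path with the file you downloaded
# and the checksum with the value listed for that file:
# matches_checksum("downloaded_file.txt", "md5:bde4fbff7d9ad0ef51010da55330a50b")
```

A result of `False` usually indicates an incomplete or corrupted download, in which case the file should be fetched again.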