Token-based data sets for the analysis of the academic language of literary studies and linguistics

doi:10.5281/zenodo.4306015

Published December 4, 2020 | Version v1

Dataset | Open


Melanie Andresen¹

1. Universität Stuttgart

These are the token-based data sets used for my PhD thesis ("Potentiale syntaktischer Annotationen für die datengeleitete Sprachbeschreibung am Beispiel der Wissenschaftssprachen der Germanistik", publication in progress).

For Python scripts and further data, see https://github.com/melandresen/dissertation.

Due to copyright law, the annotated texts of the corpus could only be published without the token layer. The files provided here contain the token-based frequency data derived from the original texts and can be used as input to the analysis scripts in the GitHub repository.

Files (1.9 GB)

Name	MD5	Size
1_linear_token+pos.txt	bde4fbff7d9ad0ef51010da55330a50b	48.4 MB
1_linear_token.txt	4a18e39a245a94bce3858ba0c4de465e	42.5 MB
3_linear_token+pos.txt	26be587b9dbf6e4bbd1ca2dfe05cf27d	547.1 MB
3_syntactic_token+pos+deprel.txt	399238d106ebc4c005235dea1126ec20	641.5 MB
3_syntactic_token+pos.txt	4ecb85a8a7a5c0fa82a1e665adefdf8c	623.1 MB

	All versions	This version
Views	297	292
Downloads	131	126
Data volume	68.4 GB	63.9 GB
