Published December 15, 2022 | Version 2.3.0
Dataset | Open access

DWUG DE: Diachronic Word Usage Graphs for German

  • 1. University of Stuttgart
  • 2. King's College London, The Alan Turing Institute
  • 3. University of Gothenburg
  • 4. University of Cambridge

Description

This data collection contains diachronic Word Usage Graphs (WUGs) for German. A description of the data format, code to process the data, and further datasets can be found on the WUGsite.

We provide additional data under misc/:

  • dwug_de_sense: a subset of DWUG DE was annotated with classical word sense definitions (DWUG DE Sense; see misc/dwug_de_sense/data/*/judgments_senses.csv). This folder provides clusterings, change scores and inferred binary semantic proximity labels for this subset. The clusters, statistics and proximity labels under maj_2 and maj_3 were derived from the sense annotation after removing instances on which fewer than 2 (maj_2) or 3 (maj_3) annotators agree on the label. The binary proximity labels under misc/dwug_de_sense/data/*/maj_[2/3]/judgments.csv ('0' for different sense, '1' for same sense) were inferred from the sense annotation rather than judged directly by humans, which sets them apart from other WUG data sets. Consequently, the scores EARLIER, LATER and COMPARE are also computed from these inferred binary labels and not directly from human judgments, as they are for other WUG data sets. The code deriving the clusters, change scores and proximity labels is available in the WUG repository. The data set is described in more detail in Schlechtweg (2022).
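The derivation described above can be illustrated with a minimal sketch. It makes several assumptions: the toy usages and sense labels are invented, each usage belongs to period 1 (earlier) or period 2 (later), and EARLIER, LATER and COMPARE are taken as mean pairwise labels within period 1, within period 2, and across periods. This is not the code in the WUG repository, which remains the authoritative implementation and may differ in detail. All function names here are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical toy input: usage ID -> (period, sense labels from individual
# annotators). The real annotations are in
# misc/dwug_de_sense/data/*/judgments_senses.csv; its columns are not assumed here.
SENSE_LABELS = {
    "u1": (1, ["s1", "s1", "s1"]),
    "u2": (1, ["s1", "s1", "s2"]),
    "u3": (2, ["s2", "s2", "s1"]),
    "u4": (2, ["s1", "s2", "s3"]),  # no label reaches 2 votes
    "u5": (2, ["s1", "s1", "s2"]),
}


def majority_sense(labels, min_agree):
    """Most frequent label if at least `min_agree` annotators chose it, else None.

    Ties are broken by first occurrence (Counter ordering).
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


def filter_usages(sense_labels, min_agree):
    """Keep usages whose majority label meets the threshold (2 or 3, as in maj_2/maj_3)."""
    kept = {}
    for usage, (period, labels) in sense_labels.items():
        sense = majority_sense(labels, min_agree)
        if sense is not None:
            kept[usage] = (period, sense)
    return kept


def proximity_labels(kept):
    """Infer a binary label per usage pair: 1 = same sense, 0 = different sense."""
    return {
        (a, b): int(kept[a][1] == kept[b][1])
        for a, b in combinations(sorted(kept), 2)
    }


def change_scores(kept, pairs):
    """Mean inferred label within period 1, within period 2, and across periods."""
    groups = {"EARLIER": [], "LATER": [], "COMPARE": []}
    for (a, b), label in pairs.items():
        pa, pb = kept[a][0], kept[b][0]
        if pa == pb == 1:
            groups["EARLIER"].append(label)
        elif pa == pb == 2:
            groups["LATER"].append(label)
        else:
            groups["COMPARE"].append(label)
    return {name: mean(vals) if vals else None for name, vals in groups.items()}


if __name__ == "__main__":
    kept = filter_usages(SENSE_LABELS, min_agree=2)
    print(change_scores(kept, proximity_labels(kept)))
```

With a threshold of 2, usage u4 is dropped. The remaining pairs give EARLIER = 1, LATER = 0 and COMPARE = 0.5. With a threshold of 3, only u1 survives and no pairs are left to score.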

See previous versions for additional test sets.

More information on the provided data can be found in Schlechtweg et al. (2021), referenced below.

Version: 2.3.0, 15.12.2022. Contains additional clusterings, change scores and binary semantic proximity labels derived from DWUG DE Sense. Important: Version 2.0.0 extends previous versions with one more annotation round and new clusterings.

References

Dominik Schlechtweg, Nina Tahmasebi, Simon Hengchen, Haim Dubossarsky, Barbara McGillivray. 2021. DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.

Dominik Schlechtweg. 2022. Human and Computational Measurement of Lexical Semantic Change. PhD thesis. University of Stuttgart.

Files

dwug_de.zip (14.6 MB)
md5:0a5eea68d253f5343932b6055759a3b0
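A downloaded copy of dwug_de.zip can be checked against the MD5 checksum listed for it. The following minimal sketch uses Python's standard hashlib module. It assumes the archive sits in the current working directory under its original name, and the helper names md5sum and verify_download are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed for dwug_de.zip on this record.
EXPECTED_MD5 = "0a5eea68d253f5343932b6055759a3b0"


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path="dwug_de.zip"):
    """Return True if the file's MD5 digest matches the listed checksum."""
    return md5sum(path) == EXPECTED_MD5


if __name__ == "__main__":
    archive = Path("dwug_de.zip")  # assumed location of the download
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use constant regardless of archive size.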

Additional details

Related works

  • Continues: Dataset 10.5281/zenodo.5541274 (DOI)
  • Is published in: Conference paper arXiv:2104.08540 (arXiv)
  • Is supplement to: Dataset 10.5281/zenodo.5255227 (DOI)
  • Is supplement to: Dataset 10.5281/zenodo.5090647 (DOI)
  • Is supplement to: Dataset 10.5281/zenodo.5544443 (DOI)