DWUG DE: Diachronic Word Usage Graphs for German
Creators
- 1. University of Stuttgart
- 2. King's College London, The Alan Turing Institute
- 3. University of Gothenburg
- 4. University of Cambridge
Description
This data collection contains diachronic Word Usage Graphs (WUGs) for German. Find a description of the data format, code to process the data and further datasets on the WUGsite.
We provide additional data under misc/:
- dwug_de_sense: a subset of DWUG DE was annotated with classical word sense definitions (DWUG DE Sense, see misc/dwug_de_sense/data/*/judgments_senses.csv). This folder provides clusterings, change scores and inferred binary semantic proximity labels for this subset. The clusters, statistics and proximity labels under maj_2 and maj_3 were derived from the sense annotation by removing instances where fewer than 2 (maj_2) or 3 (maj_3) annotators agree on the label. Note that the binary proximity labels under misc/dwug_de_sense/data/*/maj_[2/3]/judgments.csv ('0' for different sense, '1' for same sense) were derived from the sense annotation and were not judged directly by humans, in contrast to other WUG data sets. Consequently, the scores EARLIER, LATER and COMPARE are also not calculated directly from human judgments (as they are for other WUG data sets), but from the inferred binary proximity labels. Please find the code deriving the clusters, change scores and proximity labels in the WUG repository. The data set is described in more detail in Schlechtweg (2022).
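The majority filtering and label inference described above can be illustrated with a minimal sketch. This is not the code from the WUG repository: the function names, the in-memory input format (a mapping from usage identifier to the sense labels given by each annotator) and the handling of ties are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def majority_labels(annotations, min_agree):
    """Keep usages whose top sense label was chosen by at least `min_agree` annotators.

    `annotations` maps a usage identifier to the list of sense labels it received
    (hypothetical format; the real data is in judgments_senses.csv).
    """
    kept = {}
    for usage_id, labels in annotations.items():
        if not labels:
            continue
        counts = Counter(labels).most_common()
        top_label, top_count = counts[0]
        # Assumption: a tie between the two most frequent labels is treated as no majority.
        tied = len(counts) > 1 and counts[1][1] == top_count
        if top_count >= min_agree and not tied:
            kept[usage_id] = top_label
    return kept


def proximity_labels(sense_by_usage):
    """Infer a binary label per usage pair: 1 for same sense, 0 for different sense."""
    pairs = {}
    for first, second in combinations(sorted(sense_by_usage), 2):
        same_sense = sense_by_usage[first] == sense_by_usage[second]
        pairs[(first, second)] = 1 if same_sense else 0
    return pairs


# Toy annotation with three annotators per usage (made-up identifiers and labels).
annotations = {
    "u1": ["s1", "s1", "s1"],
    "u2": ["s1", "s1", "s2"],
    "u3": ["s2", "s2", "s2"],
    "u4": ["s1", "s2", "s3"],
}

maj_2 = majority_labels(annotations, min_agree=2)  # drops u4
maj_3 = majority_labels(annotations, min_agree=3)  # drops u2 and u4
print(proximity_labels(maj_2))
```

In this toy example the stricter maj_3 setting keeps fewer usages and therefore yields fewer inferred usage pairs than maj_2.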
See previous versions for additional testsets.
Please find more information on the provided data in the paper referenced below.
Version: 2.3.0, 15.12.2022. Contains additional clusterings, change scores and binary semantic proximity labels derived from DWUG DE Sense. Important: Version 2.0.0 extends previous versions with one more annotation round and new clusterings.
Reference
Dominik Schlechtweg, Nina Tahmasebi, Simon Hengchen, Haim Dubossarsky, Barbara McGillivray. 2021. DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.
Dominik Schlechtweg. 2022. Human and Computational Measurement of Lexical Semantic Change. PhD thesis. University of Stuttgart.
Files
Name | Size | Checksum |
---|---|---|
dwug_de.zip | 14.6 MB | md5:0a5eea68d253f5343932b6055759a3b0 |
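The MD5 checksum listed for dwug_de.zip can be used to check that a download is intact. The following is a minimal sketch using only the Python standard library; the helper names and the local file path are assumptions, and only the checksum value is taken from this record.

```python
import hashlib

# Checksum as listed for dwug_de.zip in this record.
EXPECTED_MD5 = "0a5eea68d253f5343932b6055759a3b0"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected
```

For example, `verify_download("dwug_de.zip")` would return True for an unmodified copy, assuming the archive was saved under that name in the working directory.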
Additional details
Related works
- Continues
- Dataset: 10.5281/zenodo.5541274 (DOI)
- Is published in
- Conference paper: arXiv:2104.08540 (arXiv)
- Is supplement to
- Dataset: 10.5281/zenodo.5255227 (DOI)
- Dataset: 10.5281/zenodo.5090647 (DOI)
- Dataset: 10.5281/zenodo.5544443 (DOI)