TestWUG EN: Test Word Usage Graphs for English
Description
This data collection contains test Word Usage Graphs (WUGs) for English. Find a description of the data format, code to process the data and further datasets on the WUGsite.
The data is provided for testing purposes and thus contains specific data cases, which are sometimes artificially created, sometimes picked from existing data sets. The data contains the following cases:
- afternoon_nn: sampled from DWUG EN 2.0.1. 200 uses partly annotated by multiple annotators with 427 judgments. Has clear cluster structure with only one cluster, no graded change, no binary change, and medium agreement of 0.62 Krippendorff's alpha.
- arm: standard textbook example for semantic proximity (see reference below). Fully connected graph with six word uses, annotated by the author.
- plane_nn: sampled from DWUG EN 2.0.1. 200 uses partly annotated by multiple annotators with 1152 judgments. Has clear cluster structure, high graded change, binary change, and high agreement of 0.82 Krippendorff's alpha.
- target: similar to arm, but with only two repeated sentences. Fully connected graph with six word uses, annotated by the author. A pair of identical sentences (exactly the same string) is annotated with 4; a pair of different strings is annotated with 1.
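The annotation rule described for the target case can be sketched in a few lines of Python. This is only an illustration, not the code used to build the dataset: the use identifiers, the example sentences, the three-plus-three split of the six uses, and the function name `annotate_by_string_identity` are all assumptions made for the example.

```python
from itertools import combinations

def annotate_by_string_identity(uses):
    """Judge every unordered pair of uses: 4 if the strings are identical, else 1."""
    judgments = []
    for (id_a, text_a), (id_b, text_b) in combinations(uses.items(), 2):
        judgments.append((id_a, id_b, 4 if text_a == text_b else 1))
    return judgments

# Hypothetical uses: two sentences, each repeated three times (assumed split).
uses = {
    "u1": "The arrow hit the target.",
    "u2": "The arrow hit the target.",
    "u3": "The arrow hit the target.",
    "u4": "Sales fell short of the target.",
    "u5": "Sales fell short of the target.",
    "u6": "Sales fell short of the target.",
}

judgments = annotate_by_string_identity(uses)
for id_a, id_b, label in judgments:
    print(id_a, id_b, label)
```

Because the graph is fully connected, six uses yield 15 judged pairs. Under the assumed split, the pairs within each sentence group receive 4 and the pairs across groups receive 1.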
Please find more information in the paper referenced below.
Version: 1.0.0, 05.05.2023.
Reference
Dominik Schlechtweg. 2023. Human and Computational Measurement of Lexical Semantic Change. PhD thesis. University of Stuttgart.
Files
testwug_en.zip (519.7 kB), md5:b1bb63e7cc3029ecbcf526069b101ff0
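The MD5 checksum listed for testwug_en.zip can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the local path of the archive and the helper name `md5_of_file` are assumptions, so adjust them to your setup.

```python
import hashlib
from pathlib import Path

# Checksum as listed for testwug_en.zip.
EXPECTED_MD5 = "b1bb63e7cc3029ecbcf526069b101ff0"

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Assumed download location: the current working directory.
archive = Path("testwug_en.zip")
if archive.exists():
    if md5_of_file(archive) == EXPECTED_MD5:
        print("checksum OK")
    else:
        print("checksum mismatch")
else:
    print("archive not found:", archive)
```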