There is a newer version of the record available.

Published May 5, 2023 | Version 1.0.0
Dataset | Open access

TestWUG EN: Test Word Usage Graphs for English

  • 1. University of Stuttgart

Description

This data collection contains test Word Usage Graphs (WUGs) for English. A description of the data format, code for processing the data, and further datasets are available on the WUGsite.
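The WUGsite holds the authoritative description of the format. The sketch below is only a rough illustration. It assumes the layout used by the DWUG releases: one tab-separated `judgments.csv` per target word, with columns `identifier1`, `identifier2`, `annotator` and `judgment`. It also assumes judgments range from 1 to 4, with 0 meaning "cannot decide". The sketch turns these into edge weights by taking the median judgment for each unordered pair of uses. The file layout, column names, treatment of 0 and the median rule are all assumptions to check against the WUGsite documentation. `load_judgments` and `aggregate_edges` are hypothetical helper names.

```python
import csv
from collections import defaultdict
from statistics import median


def load_judgments(path):
    """Read a tab-separated judgments file into a list of dicts.

    Assumption: DWUG-style layout with a header row containing
    identifier1, identifier2, annotator, judgment.
    """
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


def aggregate_edges(rows):
    """Median judgment per unordered pair of use identifiers.

    Assumption: a judgment of 0 means "cannot decide" and is skipped.
    """
    scores = defaultdict(list)
    for row in rows:
        value = float(row["judgment"])
        if value == 0:
            continue  # assumed "cannot decide" label, not a relatedness score
        # Sort the identifiers so (a, b) and (b, a) count as the same edge.
        pair = tuple(sorted((row["identifier1"], row["identifier2"])))
        scores[pair].append(value)
    return {pair: median(values) for pair, values in scores.items()}
```

For example, `aggregate_edges(load_judgments("data/plane_nn/judgments.csv"))` would give one weight per judged pair. The path is hypothetical, so adjust it to the layout of the unpacked archive.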

The data is intended for testing and therefore contains specific cases, some created artificially and some sampled from existing datasets. It contains the following cases:

  • afternoon_nn: sampled from DWUG EN 2.0.1. 200 uses, partly annotated by multiple annotators, with 427 judgments. Clear cluster structure with a single cluster, no graded change, no binary change, and medium agreement (Krippendorff's alpha of 0.62).
  • arm: standard textbook example for semantic proximity (see reference below). Fully connected graph with six word uses, annotated by the author.
  • plane_nn: sampled from DWUG EN 2.0.1. 200 uses, partly annotated by multiple annotators, with 1152 judgments. Clear cluster structure, high graded change, binary change, and high agreement (Krippendorff's alpha of 0.82).
  • target: similar to arm, but built from only two distinct sentences, each repeated. Fully connected graph with six word uses, annotated by the author. Pairs of identical sentences (exactly the same string) are annotated with 4; pairs of different strings are annotated with 1.
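The `target` case is small enough to rebuild by hand. The sketch below is a minimal, hypothetical reconstruction with two assumptions. First, the six uses are assumed to split into three copies of each sentence, because the record does not state the split. Second, placeholder strings stand in for the real sentences. The sketch builds the complete graph with weight 4 for identical strings and weight 1 otherwise. It then groups uses into connected components over edges with weight of at least 2.5. This threshold-and-components step is a simple stand-in chosen for illustration and is not the clustering procedure used for WUGs.

```python
from itertools import combinations


def build_target_graph(uses):
    """Complete graph over use indices: weight 4 for identical strings, 1 otherwise."""
    return {
        (i, j): 4 if uses[i] == uses[j] else 1
        for i, j in combinations(range(len(uses)), 2)
    }


def threshold_clusters(n, edges, threshold=2.5):
    """Connected components over edges whose weight meets the threshold (union-find)."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), weight in edges.items():
        if weight >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical placeholders; the 3/3 split is an assumption, not taken from the data.
uses = ["SENTENCE_A"] * 3 + ["SENTENCE_B"] * 3
edges = build_target_graph(uses)
print(len(edges))  # 15
print(threshold_clusters(len(uses), edges))  # [[0, 1, 2], [3, 4, 5]]
```

Under these assumptions, the 15 edges separate the uses into two clusters, one per distinct sentence. This is the kind of unambiguous structure a test case should make easy to check.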

Please find more information in the paper referenced below.

Version: 1.0.0, 05.05.2023.

Reference

Dominik Schlechtweg. 2023. Human and Computational Measurement of Lexical Semantic Change. PhD thesis. University of Stuttgart.

Notes

For testing purposes only.

Files

testwug_en.zip (519.7 kB)
md5:b1bb63e7cc3029ecbcf526069b101ff0
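To confirm that a download is intact, compare the file's MD5 digest with the checksum listed above. The following minimal sketch uses Python's standard `hashlib` module. It assumes the archive was saved as `testwug_en.zip`. The function names are illustrative and are not part of the dataset.

```python
import hashlib

# Checksum listed for testwug_en.zip on this record.
EXPECTED_MD5 = "b1bb63e7cc3029ecbcf526069b101ff0"


def md5_of_chunks(chunks):
    """Hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 16):
    """Hex MD5 digest of a file, read in fixed-size chunks to bound memory use."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```

For an unmodified copy in the current directory, `verify_download("testwug_en.zip")` should return `True`.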