Published June 11, 2024 | Version 1.0.1
Dataset Open

Lexical Semantic Change Cause-Type-Definitions Benchmark

  • 1. University of Gothenburg
  • 2. Vrije Universiteit Brussel
  • 3. Research Foundation - Flanders
  • 4. KU Leuven

Description

The Lexical Semantic Change Cause-Type-Definitions (LSC-CTD) Benchmark is a digitised dataset that builds on and extends Blank's seminal 1997 taxonomy of semantic change. The collection categorises 657 instances of linguistic evolution across the vocabulary of the Romance languages, with additional German and English entries. Each entry is accompanied by a new pair of English definitions (Old Meaning and New Meaning), manually curated by a historical linguist.

The dataset includes a detailed classification of causes of change such as semantic wear, lexical gap, orphaned word, lexical complexity, atypical actant structure, frame, socio-cultural change, abstract concept, atypical part of speech, new concept, taboo, expressivity and prototype. It also includes types of semantic shift as classified by Blank, i.e. specialisation, generalisation, co-hyponymous transfer, auto-antonym, metaphor, antiphrasis, metonymy, auto-converse, ellipsis, folk etymology, analogy, meaning dilution, meaning reinforcement and doubtful cases. 
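
As a rough illustration of how the two label sets above could be used when working with the data, the following Python sketch encodes them as constants and checks an entry against them. The record layout (the `Entry` class and its field names) is hypothetical, since the file format is not described here, and the exact spelling of the labels in the file may differ from the wording in this description. The list of causes is introduced with "such as", so it may not be exhaustive.

```python
from dataclasses import dataclass

# Causes of change as named in the description above.
# Assumption: the labels in the actual file may be spelled differently,
# and the list may not be complete.
CAUSES = (
    "semantic wear", "lexical gap", "orphaned word", "lexical complexity",
    "atypical actant structure", "frame", "socio-cultural change",
    "abstract concept", "atypical part of speech", "new concept",
    "taboo", "expressivity", "prototype",
)

# Types of semantic shift as named in the description above (same caveat).
CHANGE_TYPES = (
    "specialisation", "generalisation", "co-hyponymous transfer",
    "auto-antonym", "metaphor", "antiphrasis", "metonymy", "auto-converse",
    "ellipsis", "folk etymology", "analogy", "meaning dilution",
    "meaning reinforcement", "doubtful cases",
)


@dataclass(frozen=True)
class Entry:
    """Hypothetical record layout; field names are illustrative only."""
    word: str
    language: str
    old_meaning: str   # English definition of the earlier sense
    new_meaning: str   # English definition of the later sense
    cause: str
    change_type: str


def validation_errors(entry: Entry) -> list[str]:
    """Return a message for each label that is not in the sets above."""
    errors = []
    if entry.cause not in CAUSES:
        errors.append(f"unknown cause: {entry.cause!r}")
    if entry.change_type not in CHANGE_TYPES:
        errors.append(f"unknown change type: {entry.change_type!r}")
    return errors


# Placeholder values, not taken from the dataset.
example = Entry(
    word="<word>",
    language="<language>",
    old_meaning="<old definition>",
    new_meaning="<new definition>",
    cause="taboo",
    change_type="metaphor",
)
print(validation_errors(example))
```

A check of this kind is mainly useful for catching label mismatches early, for instance when the file uses abbreviations or a different spelling than the description.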


Reference

The accompanying paper, which describes this resource in detail, is published in the proceedings of ACL 2024.

Pierluigi Cassotti, Stefano De Pascale, and Nina Tahmasebi. 2024. Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4539–4553, Bangkok, Thailand. Association for Computational Linguistics.

Files

Files (196.9 kB)

md5:cc64cc7d19dfe28467aa7e5662037185 (196.9 kB)