Published October 25, 2021 | Version v1
Conference paper Open

Heteronym Sense Linking

  • 1. Austrian Centre for Digital Humanities and Cultural Heritage
  • 2. German Research Center for Artificial Intelligence (DFKI)
  • 3. Data Science Institute, NUI Galway

Description

In this paper we present ongoing work that aims to semi-automatically connect pronunciation information to lexical semantic resources that currently lack it, with a focus on WordNet. This is particularly relevant for heteronyms (homographs whose different meanings are associated with different pronunciations), because they require the formal representation of the targeted resources to be redesigned and adapted: for heteronyms it is not enough to add a pronunciation slot to each WordNet entry. Because numerous tools and resources rely on WordNet, we hope that enriching it with pronunciation information will benefit many applications. Our work consists of compiling a small gold standard dataset of heteronymous words, containing a short document created for each WordNet sense, for a total of 136 senses matched with their pronunciation from Wiktionary. For the task of matching WordNet senses with their corresponding Wiktionary entries, we train several supervised classifiers that rely on various similarity metrics, and we examine both whether these metrics serve as useful features and how well the different classifiers perform on our dataset. Finally, we explain how these results could be stored in OntoLex-Lemon and integrated into the Open English WordNet.
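
To make the linking task concrete, the following is a minimal sketch, not the method used in the paper. It assumes a single similarity metric (Jaccard token overlap) and simply picks the highest-scoring Wiktionary entry, whereas the paper trains supervised classifiers over several similarity metrics. The function names (`tokens`, `jaccard`, `link_sense`) and the short glosses for the heteronym "bass" are illustrative assumptions and are not taken from the gold standard dataset.

```python
import re


def tokens(text):
    """Return the set of lowercase alphabetic tokens in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between the token sets of two texts (0.0 if either is empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def link_sense(gloss, entries):
    """Pick the entry whose definition is most similar to the given sense gloss."""
    return max(entries, key=lambda e: jaccard(gloss, e["definition"]))


# Illustrative stand-ins for Wiktionary entries of "bass" (definitions are paraphrased
# for this sketch; only the two pronunciations are the point of the example).
wiktionary_entries = [
    {"ipa": "/beɪs/",
     "definition": "the lowest male singing voice or a low-pitched musical instrument"},
    {"ipa": "/bæs/",
     "definition": "any of several freshwater or marine fish resembling the perch"},
]

# Illustrative stand-ins for two WordNet-style sense glosses of "bass".
fish_gloss = "the lean flesh of a saltwater fish of the family Serranidae"
music_gloss = "an adult male singer with the lowest voice"

fish_link = link_sense(fish_gloss, wiktionary_entries)
music_link = link_sense(music_gloss, wiktionary_entries)

print(fish_link["ipa"])
print(music_link["ipa"])
```

In a supervised setting such as the one described above, scores like `jaccard` would more plausibly serve as one feature among several for each sense-entry pair, with a classifier deciding whether the pair matches.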

Files

eLex_2021_32_pp503-513.pdf

633.4 kB
md5:553dff107e9df23093d50e4dc348d803