Heteronym Sense Linking
Creators
- 1. Austrian Centre for Digital Humanities and Cultural Heritage
- 2. German Research Center for Artificial Intelligence (DFKI)
- 3. Data Science Institute, NUI Galway
Description
In this paper we present ongoing work that aims to semi-automatically connect pronunciation information to lexical semantic resources that currently lack it, with a focus on WordNet. This is particularly relevant for heteronyms (homographs that have different meanings associated with different pronunciations), because they require a re-design and adaptation of the formal representation of the targeted lexical semantic resources: for heteronyms it is not enough to add a slot for pronunciation information to each WordNet entry. Numerous tools and resources rely on WordNet, so we hope that enriching it with pronunciation information will prove beneficial for many applications. Our work consists of compiling a small gold standard dataset of heteronymous words, which contains short documents created for each WordNet sense, 136 senses in total, matched with their pronunciation from Wiktionary. For the task of matching WordNet senses with their corresponding Wiktionary entries, we train several supervised classifiers that rely on various similarity metrics, and we explore whether these metrics can serve as useful features, as well as the quality of the different classifiers tested on our dataset. Finally, we explain how these results could be stored in OntoLex-Lemon and integrated into the Open English WordNet.
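To make the matching task more concrete, the following is a minimal sketch of how similarity metrics between a WordNet gloss and candidate Wiktionary definitions might be turned into features. It is not the paper's implementation: the two metrics (Jaccard and overlap coefficient over content words), the stopword list, the function names, and the toy glosses for "bass" are all assumptions chosen for illustration, and the final step simply picks the highest-scoring candidate where the paper trains supervised classifiers on such features.

```python
import re

# Assumed, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "or", "for", "and", "in", "to"}


def content_tokens(text):
    """Lowercase the text and return its set of alphabetic, non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    ta, tb = content_tokens(a), content_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def overlap_coefficient(a, b):
    """Overlap coefficient: shared tokens divided by the smaller token set."""
    ta, tb = content_tokens(a), content_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def features(wordnet_gloss, wiktionary_definition):
    """Feature vector that a supervised classifier could be trained on."""
    return [
        jaccard(wordnet_gloss, wiktionary_definition),
        overlap_coefficient(wordnet_gloss, wiktionary_definition),
    ]


def best_pronunciation(wordnet_gloss, candidates):
    """Unsupervised stand-in for a classifier: pick the candidate whose
    definition has the highest summed similarity to the gloss."""
    return max(candidates, key=lambda p: sum(features(wordnet_gloss, candidates[p])))


# Toy, hand-written definitions for the heteronym "bass"; these are
# illustrative and not copied from WordNet or Wiktionary.
BASS_CANDIDATES = {
    "/bæs/": "a freshwater or sea fish caught for food",
    "/beɪs/": "the lowest range of a musical voice or instrument",
}

fish_sense = best_pronunciation("the lean flesh of a saltwater fish", BASS_CANDIDATES)
music_sense = best_pronunciation("the lowest part of the musical range", BASS_CANDIDATES)
```

In this toy setup the fish-related gloss shares only the token "fish" with the first candidate and nothing with the second, so it is linked to /bæs/, while the music-related gloss is linked to /beɪs/. With a labelled gold standard, the vectors returned by `features` would instead be passed to a classifier.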
Files
Name | Size |
---|---|
eLex_2021_32_pp503-513.pdf (md5:553dff107e9df23093d50e4dc348d803) | 633.4 kB |