A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Creators
- Sina Ahmadi (1)
- John P. McCrae (1)
- Sanni Nimb (2)
- Thomas Troelsgård (2)
- Sussi Olsen (3)
- Bolette S. Pedersen (4)
- Thierry Declerck (5)
- Tanja Wissik (6)
- Monica Monachini (6)
- Andrea Bellandi (7)
- Fahad Khan (7)
- Irene Pisani (7)
- Simon Krek (8)
- Veronika Lipp (9)
- Tamás Váradi (9)
- László Simon (9)
- András Győrffy (9)
- Carole Tiberius (10)
- Tanneke Schoonheim (10)
- Yifat Ben Moshe (11)
- Maya Rudich (11)
- Raya Abu Ahmad (11)
- Dorielle Lonke (11)
- Kira Kovalenko (12)
- Margit Langemets (13)
- Jelena Kallas (13)
- Oksana Dereza (1)
- Theodorus Fransen (1)
- David Cillessen (1)
- David Lindemann (14)
- Mikel Alonso (14)
- Ana Salgado (15)
- José Luis Sancho (16)
- Rafael-J. Ureña-Ruiz (16)
- Kiril Simov (17)
- Petya Osenova (17)
- Zara Kancheva (17)
- Ivaylo Radev (17)
- Ranka Stanković (18)
- Cvetana Krstev (18)
- Biljana Lazić (18)
- Aleksandra Marković (19)
- Andrej Perdih (20)
- Dejan Gabrovšek (20)
- 1. National University of Ireland Galway
- 2. Society for Danish Language and Literature
- 3. Centre for Language Technology, University of Copenhagen
- 4. Centre for Language Technology
- 5. University of Copenhagen
- 6. Austrian Centre for Digital Humanities
- 7. Istituto di Linguistica Computazionale "A. Zampolli", CNR
- 8. Jožef Stefan Institute
- 9. Research Institute for Linguistics
- 10. Dutch Language Institute
- 11. K Dictionaries
- 12. Institute for Linguistic Studies of the Russian Academy of Sciences
- 13. Institute of the Estonian Language
- 14. Euskal Herriko Unibertsitatea, Universidad del País Vasco
- 15. Pórtico da Língua Portuguesa
- 16. Centro de estudios de la Real Academia Española
- 17. Bulgarian Academy of Sciences
- 18. University of Belgrade
- 19. Institute for Serbian Language SASA
- 20. Research Centre of the Slovenian Academy of Sciences and Arts, Fran Ramovš Institute of the Slovenian Language
Description
Aligning senses across resources and languages is a challenging task with beneficial applications in natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at the sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. Compared with previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in the alignment and evaluation of word senses by enabling new solutions, particularly those that notoriously require large amounts of data, such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.
Files
ahmadi2020multilingual.pdf (701.4 kB, md5:a5c5a828bdb98af83a4a6f0023ca3304)