Published May 11, 2020 | Version v1
Conference paper Open

A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment

  • 1. National University of Ireland Galway
  • 2. Society for Danish Language and Literature
  • 3. Centre for Language Technology, University of Copenhagen
  • 4. Centre for Language Technology
  • 5. University of Copenhagen
  • 6. Austrian Centre for Digital Humanities
  • 7. Istituto di Linguistica Computazionale "A. Zampolli" – CNR
  • 8. Jožef Stefan Institute
  • 9. Research Institute for Linguistics
  • 10. Dutch Language Institute
  • 11. K Dictionaries
  • 12. Institute for Linguistic Studies of the Russian Academy of Sciences
  • 13. Institute of the Estonian Language
  • 14. Euskal Herriko Unibertsitatea, Universidad del País Vasco
  • 15. Pórtico da Língua Portuguesa
  • 16. Centro de estudios de la Real Academia Española
  • 17. Bulgarian Academy of Sciences
  • 18. University of Belgrade
  • 19. Institute for Serbian Language SASA
  • 20. Research Centre of the Slovenian Academy of Sciences and Arts, Fran Ramovš Institute of the Slovenian Language

Description

Aligning senses across resources and languages is a challenging task with beneficial applications in natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at the sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in the alignment and evaluation of word senses by enabling new solutions, particularly data-hungry ones such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.

Files

ahmadi2020multilingual.pdf

Files (701.4 kB)

md5:a5c5a828bdb98af83a4a6f0023ca3304

Additional details

Funding

ELEXIS – European Lexicographic Infrastructure 731015
European Commission