MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations
Creators
- Gromann, Dagmar (Contact person) [1]
- Gonçalo Oliveira, Hugo (Contact person) [2]
- Pitarch, Lucia [3]
- Apostol, Elena-Simona [4]
- Bernad, Jordi
- Bytyçi, Eliot [5]
- Cantone, Chiara
- Carvalho, Sara [6, 7]
- Frontini, Francesca [8]
- Garabik, Radovan [9]
- Gracia, Jorge [3]
- Granata, Letizia
- Khan, Anas Fahad [8]
- Knez, Timotej [10]
- Labropoulou, Penny [11]
- Liebeskind, Chaya [12]
- di Buono, Maria Pia [13]
- Ostroški Anić, Ana [14]
- Rackevičienė, Sigita [15]
- Rodrigues, Ricardo
- Sérasset, Gilles [16, 17, 18]
- Selmistraitis, Linas [15]
- Sidibé, Mahammadou
- Silvano, Purificação [19]
- Spahiu, Blerina
- Sogutlu, Enriketa
- Stanković, Ranka
- Truică, Ciprian-Octavian [20, 21]
- Valūnaitė Oleškevičienė, Giedrė
- Zitnik, Slavko
- Zdravkova, Katerina
- 1. University of Vienna
- 2. University of Coimbra
- 3. Universidad de Zaragoza
- 4. Universitatea Națională de Știință și Tehnologie Politehnica București
- 5. University of Prishtina
- 6. University of Aveiro
- 7. Universidade Nova de Lisboa
- 8. Institute for Computational Linguistics "A. Zampolli"
- 9. Slovak Academy of Sciences
- 10. University of Ljubljana
- 11. Athena Research and Innovation Center In Information Communication & Knowledge Technologies
- 12. Jerusalem College of Technology
- 13. University of Naples - L'Orientale
- 14. Institute of Croatian Language and Linguistics
- 15. Mykolas Romeris University
- 16. Université Grenoble Alpes
- 17. Laboratoire d'Informatique de Grenoble
- 18. Universite Grenoble Alpes UFR Informatique Mathématiques et Mathématiques Appliquées de Grenoble
- 19. Universidade do Porto Faculdade de Letras
- 20. Uppsala Universitet
- 21. Universitatea Politehnica din Bucuresti
Description
Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has focused either on analysing lexical semantic relations in word embeddings or on probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages, including low-resource languages such as Bambara, Lithuanian, and Albanian. As an experiment on cross-lingual transfer of relational knowledge, we test the PLMs’ ability to (1) capture analogies across languages, and (2) predict translation targets. We find considerable differences across relation types and languages, with a clear preference for hypernymy and antonymy as well as for Romance languages.
Files
Name | Size | Checksum |
---|---|---|
2475_Paper.pdf | 233.9 kB | md5:b1ecbb671008b71b99849b3b3d4add7a |
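The MD5 checksum listed for 2475_Paper.pdf can be used to check that a downloaded copy is intact. The following is a minimal sketch, assuming the file has been saved under its original name in the current working directory; the helper names `md5_of_file` and `matches_expected` are illustrative and not part of the record or the repository.

```python
import hashlib
from pathlib import Path

# Checksum as listed for 2475_Paper.pdf in the Files section.
EXPECTED_MD5 = "b1ecbb671008b71b99849b3b3d4add7a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location (hypothetical): current working directory.
    local_copy = Path("2475_Paper.pdf")
    if local_copy.exists():
        print("checksum OK" if matches_expected(local_copy) else "checksum mismatch")
    else:
        print(f"{local_copy} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.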
Additional details
Software
- Repository URL: https://github.com/nexuslinguarum/MultiLexBATS