XL-WSD-LLM: Extending XL-WSD to evaluate Large Language Models
Contributors
Project manager:
Project members:
Description
This benchmark extends XL-WSD. Starting from XL-WSD, we build a set of prompts for evaluating Large Language Models (LLMs) in two settings. The first is a multiple-choice task, and the second is a generative task in which we assess the quality of the generated definition.
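As an illustration of the multiple-choice setting, the snippet below formats a toy word-sense-disambiguation question for an LLM. The prompt wording and data format are invented for illustration; they are not the benchmark's actual prompts.

```python
# Illustrative only: build a WSD multiple-choice prompt for an LLM.
def build_prompt(sentence: str, target: str, glosses: list[str]) -> str:
    """Present candidate sense definitions as numbered options."""
    options = "\n".join(f"{i + 1}. {g}" for i, g in enumerate(glosses))
    return (
        f'In the sentence "{sentence}", which definition best fits '
        f'the word "{target}"?\n{options}\nAnswer with the option number.'
    )

prompt = build_prompt(
    "She deposited the check at the bank.",
    "bank",
    ["a financial institution", "the sloping land beside a river"],
)
print(prompt)
```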
The benchmark consists of three compressed archives. Two archives contain the training and test data for each task and language, while the third contains the outputs of the LLMs we evaluate. Each dataset is split into two folders, FT and TT: FT contains only data whose glosses were originally available, while TT contains data whose missing glosses were filled in by machine translation.
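The FT/TT split described above could be traversed as follows once an archive is extracted. The directory and file names here (`multiple_choice`, `data.jsonl`, the language codes) are hypothetical placeholders, not taken from the benchmark itself:

```python
import tempfile
from pathlib import Path

# Build a toy layout mirroring the described structure:
# <task>/<language>/{FT,TT}/ -- all names below are hypothetical.
root = Path(tempfile.mkdtemp())
for lang in ("en", "it"):
    for split in ("FT", "TT"):
        d = root / "multiple_choice" / lang / split
        d.mkdir(parents=True)
        (d / "data.jsonl").write_text("{}\n")

def list_datasets(base: Path, split: str) -> list[Path]:
    """Collect dataset files belonging to one split (FT or TT)."""
    return sorted(p for p in base.rglob("data.jsonl") if p.parent.name == split)

ft_files = list_datasets(root, "FT")  # original-gloss data only
tt_files = list_datasets(root, "TT")  # includes machine-translated glosses
print(len(ft_files), len(tt_files))
```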
More details are available in the pre-print article "Exploring the Word Sense Disambiguation Capabilities of Large Language Models," available on arXiv.
Files
Total size: 394.5 MB
| Name | MD5 checksum | Size |
|---|---|---|
| | bcdf35f090c761179d11ff65f6ceec47 | 51.8 MB |
| | 59616a82ef2ac519a004cfa058439c5e | 4.1 MB |
| | c7c1918a3f86c428a5e267d2cc84f4f1 | 338.6 MB |
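A downloaded archive can be checked against the listed MD5 checksums with Python's standard `hashlib` module; the archive path below is a placeholder, since the file names are not listed here.

```python
import hashlib
from pathlib import Path

# MD5 checksums listed in the file table above.
EXPECTED = {
    "bcdf35f090c761179d11ff65f6ceec47",
    "59616a82ef2ac519a004cfa058439c5e",
    "c7c1918a3f86c428a5e267d2cc84f4f1",
}

def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives are not loaded into memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example usage with a placeholder path:
# archive = Path("downloaded_archive.tar.gz")
# assert md5sum(archive) in EXPECTED
```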
Additional details
Dates
- Updated: 2025-03-11