Published July 29, 2021 | Version v1
Conference paper Open

SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC)

  • 1. Sapienza NLP, Sapienza University of Rome

Description


Task Description

Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC) is the first SemEval task for Word-in-Context disambiguation that tackles the challenge of capturing the polysemous nature of words without relying on a fixed sense inventory, in both a multilingual and a cross-lingual setting. MCL-WiC provides a single high-quality framework for evaluating the ability of a wide range of systems to deeply understand word meaning. Compared to other datasets, MCL-WiC brings the following novelties:

  • it addresses multilinguality and cross-linguality,
  • it provides coverage of all parts of speech, and
  • it covers a high number of domains and genres.

Participating systems are asked to perform a binary classification task: for each sentence pair, they indicate whether the target word is used with the same meaning (tagged T for true) or with a different meaning (F for false), either within the same language (multilingual sub-task) or across different languages (cross-lingual sub-task). Below are two examples of sentence pairs, the first from the multilingual part and the second from the cross-lingual part:

  • la souris mange le fromage -- le chat court après la souris
  • click the right mouse button -- le chat court après la souris

In the first sentence pair, the target word souris is tagged T (True), since it is used with the same meaning in both sentences. In the second sentence pair, the target word mouse and its French translation souris are used with two distinct meanings, so the expected output is F (False).
MCL-WiC covers the following languages: Arabic, Chinese, English, French and Russian.
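
The binary classification setup described above can be illustrated with a minimal Python sketch. It encodes the two example pairs with their T/F tags and scores a trivial baseline. The `WicPair` class, its field names, the `always_true_baseline` function and the use of accuracy as the score are all assumptions made for this illustration; they are not taken from the released data format or from the official scoring script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WicPair:
    """One sentence pair (hypothetical layout, not the official file format)."""
    target_1: str    # target word as it appears in the first sentence
    sentence_1: str
    target_2: str    # target word as it appears in the second sentence
    sentence_2: str
    gold: str        # "T" if both uses share a meaning, "F" otherwise


def accuracy(gold_tags, predicted_tags):
    """Fraction of pairs whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted tags must have the same length")
    if not gold_tags:
        raise ValueError("at least one pair is required")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


# The two example pairs from the task description.
PAIRS = [
    WicPair("souris", "la souris mange le fromage",
            "souris", "le chat court après la souris", "T"),   # multilingual
    WicPair("mouse", "click the right mouse button",
            "souris", "le chat court après la souris", "F"),   # cross-lingual
]


def always_true_baseline(pair):
    """Hypothetical baseline: always answer 'same meaning'."""
    return "T"


gold = [pair.gold for pair in PAIRS]
predicted = [always_true_baseline(pair) for pair in PAIRS]
score = accuracy(gold, predicted)
print(f"accuracy = {score:.2f}")
```

A real system would replace `always_true_baseline` with a model that compares the two contextualised uses of the target word; the scoring step would stay the same.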

Files included

SemEval-2021_MCL-WiC_trial.zip: trial data
SemEval-2021_MCL-WiC_all-datasets.zip: training, development and test data
SemEval-2021_MCL-WiC_test-gold-data.zip: gold answers

Key links

GitHub data repository: https://github.com/SapienzaNLP/mcl-wic
CodaLab website: https://competitions.codalab.org/competitions/27054
Link to the paper: https://aclanthology.org/2021.semeval-1.3.pdf

Reference

Martelli, F., Kalach, N., Tola, G. and Navigli, R. SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC). Proc. of the 15th Workshop on Semantic Evaluation, 2021.

BibTeX:

@inproceedings{martelli-etal-2021-mclwic,
  title     = "{S}em{E}val-2021 {T}ask 2: {M}ultilingual and {C}ross-lingual {W}ord-in-{C}ontext {D}isambiguation ({MCL}-{W}i{C})",
  author    = "Martelli, Federico and Kalach, Najla and Tola, Gabriele and Navigli, Roberto",
  booktitle = "Proceedings of the Fifteenth Workshop on Semantic Evaluation (SemEval-2021)",
  year      = {2021}
}

Files

MCL-WiC.zip

Files (2.9 MB)

md5:b3b883ab107437477bcf5912c616ce29
2.9 MB
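
The MD5 checksum listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_expected` are made up for this example, and it assumes that the checksum refers to the archive saved locally as `MCL-WiC.zip`.

```python
import hashlib

# Checksum as shown in the file listing on this page.
EXPECTED_MD5 = "b3b883ab107437477bcf5912c616ce29"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("MCL-WiC.zip")` should return `True` for an uncorrupted copy, assuming the archive was saved under that name in the current directory.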

Additional details

Funding

ELEXIS – European Lexicographic Infrastructure 731015
European Commission