Published July 31, 2020 | Version v4
Dataset Open

SemEval-2020 Task 3: Graded Word Similarity in Context

  • 1. Queen Mary University of London
  • 2. Jožef Stefan Institute
  • 3. University of Ljubljana
  • 4. University of Cambridge
  • 5. Tehran Institute for Advanced Studies

Description

For this task, we ask participants to build systems that predict the effect that context has on human perception of the similarity of words.

There has been very interesting work that uses local context to predict discrete changes in meaning, namely the different senses of a polysemous word. However, context also has more subtle, continuous (graded) effects on meaning, even for words not necessarily considered polysemous.

To study these effects, we are building several datasets in which annotators score how similar two words are after reading a short paragraph that contains both words. Each pair is scored within two such paragraphs, allowing us to look at changes in similarity ratings due to context.
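The idea of a context-induced change in similarity can be sketched in a few lines of Python. This is only an illustration: the `PairRating` class, its field names, and the example values are hypothetical and are not taken from the dataset files, whose actual layout and rating scale are not described here.

```python
from dataclasses import dataclass


@dataclass
class PairRating:
    """One word pair rated in two different paragraphs (hypothetical layout)."""
    word1: str
    word2: str
    sim_context1: float  # similarity rating given after reading the first paragraph
    sim_context2: float  # similarity rating given after reading the second paragraph


def similarity_change(rating: PairRating) -> float:
    """Return how much the rating shifts from the first context to the second."""
    return rating.sim_context2 - rating.sim_context1


# Illustrative values only, not real annotations.
example = PairRating("word_a", "word_b", sim_context1=3.0, sim_context2=5.5)
print(similarity_change(example))
```

A positive value means the pair was judged more similar in the second paragraph than in the first; a value near zero means the context had little effect on the rating.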

CodaLab was used to run this task. The dedicated website and the participants' results are available at: https://competitions.codalab.org/competitions/20905

Files

cosimlex_dataset.zip

Files (710.1 kB)

Checksum Size
md5:7c475831789c4065645ddc2bf4252fc2 203.1 kB
md5:ec199c90b8246b4ec57c7804e2d01ddb 357.3 kB
md5:b1fe2bc09f3937c78cc48f0369284b81 146.9 kB
md5:e39699bfb73966f674aefb54feb338e2 2.8 kB
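The MD5 checksums listed above can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and because this listing does not show which checksum belongs to which file name, the sketch only checks membership in the set of listed checksums.

```python
import hashlib

# MD5 checksums copied from the file listing on this page.
LISTED_MD5 = {
    "7c475831789c4065645ddc2bf4252fc2",
    "ec199c90b8246b4ec57c7804e2d01ddb",
    "b1fe2bc09f3937c78cc48f0369284b81",
    "e39699bfb73966f674aefb54feb338e2",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str) -> bool:
    """Return True if the file's MD5 is one of the checksums listed above."""
    return md5_of_file(path) in LISTED_MD5


# Example usage (the path is a placeholder for wherever the file was saved):
# print(matches_listed_checksum("path/to/downloaded_file"))
```

MD5 is adequate for detecting accidental corruption during download, but it should not be relied on as protection against deliberate tampering.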

Additional details

Related works

Is cited by
Conference paper: https://www.aclweb.org/anthology/2020.lrec-1.720/

Funding

EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media 825153
European Commission