SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages

doi:10.5281/zenodo.1228905

Published May 12, 2018 | Version v1

Conference paper | Open access

1. Insight Centre for Data Analytics, National University of Ireland, Galway
2. Department of Computer Science, Maynooth University
3. Department of Mathematics and Computer Science, University of Passau
4. School of Computer Science, The University of Manchester

This work describes SemR-11, a multi-lingual dataset for evaluating semantic similarity and relatedness in 11 languages (German, French, Russian, Italian, Dutch, Chinese, Portuguese, Swedish, Spanish, Arabic and Persian). Semantic similarity and relatedness gold standards were initially used to support the evaluation of semantic distance measures in the context of linguistic and knowledge resources and distributional semantic models. SemR-11 builds upon the English gold standards of Miller & Charles (MC), Rubenstein & Goodenough (RG), WordSimilarity 353 (WS-353), and SimLex-999, providing a canonical translation for them. The final dataset consists of 15,917 word pairs and can be used to support the construction and evaluation of semantic similarity/relatedness and distributional semantic models. As a case study, the SemR-11 test collections were used to investigate how distributional semantic models built from corpora in different languages and of different sizes perform on semantic similarity and relatedness tasks.
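To illustrate how a gold standard of this kind is typically used, the following is a minimal sketch that scores a distributional model against human-rated word pairs. It assumes Spearman rank correlation between the human ratings and the model's cosine similarities, which is a common choice for such benchmarks but is not stated in the abstract. The function names, the (word1, word2, score) tuple layout, and the toy pairs and vectors are all hypothetical and are not taken from SemR-11.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def evaluate(pairs, vectors):
    """Return (Spearman correlation, number of pairs covered by the model).

    pairs: iterable of (word1, word2, human_score) tuples (assumed layout).
    vectors: dict mapping a word to its vector.
    Pairs with an out-of-vocabulary word are skipped.
    """
    gold, predicted = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            gold.append(score)
            predicted.append(cosine(vectors[w1], vectors[w2]))
    return spearman(gold, predicted), len(gold)


# Toy data for illustration only; the scores and vectors are made up.
toy_pairs = [
    ("car", "automobile", 3.9),
    ("coast", "shore", 3.6),
    ("noon", "string", 0.1),
    ("gem", "jewel", 3.8),  # skipped below: not in the toy vocabulary
]
toy_vectors = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "coast": [0.0, 1.0],
    "shore": [0.3, 0.9],
    "noon": [1.0, 0.0],
    "string": [0.0, 1.0],
}

rho, covered = evaluate(toy_pairs, toy_vectors)
print(f"Spearman rho = {rho:.3f} over {covered} covered pairs")
```

Reporting the number of covered pairs alongside the correlation matters when comparing models built from corpora of different sizes, since smaller corpora tend to leave more pairs out of vocabulary.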

Files

LREC-SemR-11.pdf

Files (479.3 kB)

Name	Size
LREC-SemR-11.pdf (md5:fb8b1788eaacd2b8226dab4b4cdb51d9)	479.3 kB
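The MD5 checksum listed for the file can be used to check a downloaded copy. The following is a minimal sketch using Python's standard hashlib module; the helper names and the local file path are assumptions, and only the checksum value comes from this record.

```python
import hashlib

# Checksum as listed on this record for LREC-SemR-11.pdf.
EXPECTED_MD5 = "fb8b1788eaacd2b8226dab4b4cdb51d9"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected
```

Assuming the PDF was saved as LREC-SemR-11.pdf in the working directory, calling matches_record("LREC-SemR-11.pdf") indicates whether the download is intact. MD5 is adequate for detecting accidental corruption but is not a safeguard against deliberate tampering.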

Additional details

Funding: European Commission, SSIX – Social Sentiment analysis financial IndeXes (645425)
