Published October 23, 2018 | Version v1
Book chapter Open

Exploiting multilingual lexical resources to predict MWE compositionality

  • 1. The University of Melbourne

Description

Semantic idiomaticity is the extent to which the meaning of a multiword expression (MWE) cannot be predicted from the meanings of its component words. Much
work in natural language processing on semantic idiomaticity has focused on compositionality prediction, wherein a binary or continuous-valued compositionality
score is predicted for an MWE as a whole, or for its individual component words. One
source of information for making compositionality predictions is the translation
of an MWE into other languages. This chapter extends two previously presented
studies – Salehi & Cook (2013) and Salehi et al. (2014) – that propose methods for
predicting compositionality that exploit translation information provided by multilingual lexical resources, and that are applicable to many kinds of MWEs in a
wide range of languages. These methods make use of distributional similarity of
an MWE and its component words under translation into many languages, as well
as string similarity measures applied to definitions of translations of an MWE and
its component words. We evaluate these methods over English noun compounds,
English verb-particle constructions, and German noun compounds. We show that
the estimation of compositionality is improved when using translations into multiple languages, as compared to simply using distributional similarity in the source
language. We further find that string similarity complements distributional similarity.
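The string-similarity idea described above can be illustrated with a minimal sketch. This is not the chapter's implementation: it assumes that `difflib.SequenceMatcher.ratio()` is an acceptable stand-in for the string similarity measures, that a per-language score is the mean similarity between the MWE's translation and each component's translation, and that languages are averaged with equal weight. The function names and the hand-entered French and Spanish translations are hypothetical and used only for illustration.

```python
from difflib import SequenceMatcher
from statistics import mean


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for the chapter's string measures (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compositionality_score(mwe_translations, component_translations):
    """Average, over languages, of the mean similarity between the MWE's
    translation and the translation of each component word.

    mwe_translations: dict mapping language code -> translation of the MWE
    component_translations: list of dicts (one per component word),
        each mapping language code -> translation of that component
    """
    per_language = []
    for lang, mwe_t in mwe_translations.items():
        # Only compare against components that have a translation in this language.
        sims = [
            string_similarity(mwe_t, comp[lang])
            for comp in component_translations
            if lang in comp
        ]
        if sims:
            per_language.append(mean(sims))
    if not per_language:
        raise ValueError("no language has translations for both MWE and components")
    return mean(per_language)


# Illustrative, hand-entered translations of "credit card" (not the chapter's data).
mwe = {"fr": "carte de crédit", "es": "tarjeta de crédito"}
components = [
    {"fr": "crédit", "es": "crédito"},  # "credit"
    {"fr": "carte", "es": "tarjeta"},   # "card"
]
score = compositionality_score(mwe, components)
print(f"estimated compositionality: {score:.2f}")
```

Under these assumptions, a higher score means the translations of the MWE share more surface material with the translations of its components, which is taken as evidence of compositionality. A fuller version would combine this score with a distributional similarity score, as the description suggests the two are complementary.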

 

Files

13.pdf (303.9 kB)
md5:86cc79dff36b98bcab48011daf70c38f