Published September 7, 2017 | Version v1
Conference paper Open

Sense-Aware Statistical Machine Translation using Adaptive Context-Dependent Clustering

  • 1. Idiap Research Institute

Description

Statistical machine translation (SMT) systems use local cues from n-gram translation and language models to select the translation of each source word. Such systems do not explicitly perform word sense disambiguation (WSD), although this would enable them to select translations depending on the hypothesized sense of each word. Previous attempts to constrain word translations based on the results of generic WSD systems have suffered from their limited accuracy. We demonstrate that WSD systems can be adapted to help SMT, thanks to three key achievements: (1) we consider a larger context for WSD than SMT can afford to consider; (2) we adapt the number of senses per word to the ones observed in the training data using clustering-based WSD with K-means; and (3) we initialize sense-clustering with definitions or examples extracted from WordNet. Our WSD system is competitive, and in combination with a factored SMT system improves noun and verb translation from English to Chinese, Dutch, French, German, and Spanish.
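The clustering idea in points (2) and (3) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes bag-of-words vectors and cosine similarity as stand-ins for whatever context features the paper uses, the sense definitions are hypothetical paraphrases rather than actual WordNet glosses, and dropping empty clusters is only one assumed way to adapt the number of senses to the data. The function names (`bow`, `cosine`, `cluster_senses`) are illustrative.

```python
from collections import Counter
import math


def bow(text):
    """Bag-of-words vector: token counts over lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts
    return Counter({t: c / len(vectors) for t, c in total.items()})


def assign(vectors, centroids):
    """Index of the most similar centroid for each vector (ties go to the lowest index)."""
    return [max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors]


def cluster_senses(contexts, definitions, iterations=10):
    """K-means-style clustering of the contexts of one ambiguous word.

    Centroids are seeded from sense definitions (assumption: one seed per
    candidate sense). Clusters that attract no context are dropped, so the
    final number of senses reflects only what occurs in the data.
    """
    vectors = [bow(c) for c in contexts]
    centroids = [bow(d) for d in definitions]
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        used = sorted(set(labels))
        updated = [mean([v for v, l in zip(vectors, labels) if l == k])
                   for k in used]
        if updated == centroids:  # converged
            break
        centroids = updated
    return assign(vectors, centroids), len(centroids)


# Hypothetical seed definitions for "bank" (not real WordNet glosses).
definitions = [
    "financial institution that accepts deposits money",
    "sloping land beside a river water",
    "a flight maneuver aircraft tips laterally",
]
contexts = [
    "she deposits money at the bank",
    "the bank approved the loan money",
    "we walked along the river bank",
    "the river bank was covered in water",
]
labels, n_senses = cluster_senses(contexts, definitions)
```

In this toy run the third seed sense never attracts a context, so only two clusters remain. In a factored SMT setup, labels of this kind could be attached to each source token as an additional factor; how the paper does this is not stated in the abstract.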

Files

wmt2017.pdf (744.0 kB)
md5:f9b06e55d6252e7cf5397da0bf8c6ddf

Additional details

Funding

European Commission: SUMMA – Scalable Understanding of Multilingual Media (grant 688139)