Published June 18, 2018 | Version v1
Book chapter Open

Integration of machine translation in on-line multilingual applications: Domain adaptation

Description

Statistical machine translation systems are trained on large amounts of bilingual
text, usually drawn from a general-domain corpus. When such a system is tested on
data from the same domain, the results are satisfactory, but when the test set
belongs to a different domain, translation quality decreases. This is due to
insufficient lexical coverage, wrong sense choices for polysemous words, and
differences in discourse style between the two domains. Adapting the system to a
new domain is therefore an ongoing research task in machine translation. Two of
the challenges in domain adaptation are deciding which part of the system requires
adaptation and choosing which method to apply. In this paper, we used language
model interpolation as a domain adaptation method and showed that it is a fast,
state-of-the-art method that can be used to build adapted translation systems even
when only sparse domain-specific material is available (especially in the case of
low-resource language pairs). The best improvement was 15 BLEU points over the
baseline system.
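
To make the idea of language model interpolation concrete, here is a minimal sketch, not the chapter's actual setup. It assumes linear interpolation, p(w) = lam * p_in(w) + (1 - lam) * p_out(w), of two add-alpha smoothed unigram models, with the weight lam chosen by grid search to minimise perplexity on a small in-domain development set. The toy sentences and all function names (train_unigram, interpolate, perplexity, best_lambda) are hypothetical; a real system would interpolate higher-order n-gram models built with a language modelling toolkit.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_in, p_out, lam):
    """Linear interpolation: lam * in-domain + (1 - lam) * general-domain."""
    return {w: lam * p_in[w] + (1.0 - lam) * p_out[w] for w in p_in}


def perplexity(model, tokens):
    """Per-token perplexity of a unigram model on a token sequence."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


def best_lambda(p_in, p_out, dev_tokens, steps=20):
    """Grid search over [0, 1] for the weight with the lowest dev perplexity."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda lam: perplexity(interpolate(p_in, p_out, lam), dev_tokens),
    )


# Toy data (hypothetical): "bank" is polysemous across the two domains.
general = "the bank approved the loan and the bank raised the rate".split()
domain = "the river bank flooded the field near the river".split()
dev = "the river flooded the bank".split()
vocab = set(general) | set(domain) | set(dev)

p_out = train_unigram(general, vocab)  # general-domain model
p_in = train_unigram(domain, vocab)    # small in-domain model
lam = best_lambda(p_in, p_out, dev)
mixed = interpolate(p_in, p_out, lam)

print(f"lambda={lam:.2f}  dev perplexity={perplexity(mixed, dev):.3f}")
```

Because the grid includes lam = 0 and lam = 1, the interpolated model is never worse on the development set than either component model alone, which is one reason the approach can work with very little in-domain text.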

 
