Integration of machine translation in on-line multilingual applications: Domain adaptation
Creators
Description
Statistical machine translation systems are trained on large bilingual corpora. Usually, a general-domain corpus is used for training. When the system is tested on data from the same domain, the results are satisfactory, but if the test set belongs to a different domain, translation quality decreases. This is due to insufficient lexical coverage, wrong choices for polysemous words, and differences in discourse style between the two domains. Adapting the system to a new domain is therefore an ongoing research task in machine translation. Two challenges in domain adaptation are deciding which part of the system requires adaptation and choosing which method to apply. In this paper, we used language model interpolation as a domain adaptation method and showed that it is a fast, state-of-the-art method for building adapted translation systems even when only sparse domain-specific material is available (as is often the case for low-resourced language pairs). The best improvement was 15 BLEU points over the baseline system.
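As an illustration of the general idea behind language model interpolation, the following minimal sketch linearly mixes a general-domain and an in-domain model and picks the mixing weight that minimizes perplexity on a small in-domain development set. It is not the system described in the paper: the unigram models, the toy sentences, the grid search, and all function names (`unigram_model`, `interpolate`, `perplexity`, `best_weight`) are assumptions made for the example, and a real setup would typically use higher-order n-gram models built with a language modeling toolkit.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_general, p_domain, lam):
    """Linear interpolation: lam * p_domain + (1 - lam) * p_general."""
    return {w: lam * p_domain[w] + (1.0 - lam) * p_general[w] for w in p_general}


def perplexity(model, tokens):
    """Per-token perplexity of a token sequence under a unigram model."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


def best_weight(p_general, p_domain, dev_tokens, steps=20):
    """Grid search for the weight with the lowest development-set perplexity."""
    grid = [i / steps for i in range(steps + 1)]
    return min(
        grid,
        key=lambda lam: perplexity(interpolate(p_general, p_domain, lam), dev_tokens),
    )


# Toy data (hypothetical): "bank" appears in both domains with different senses.
general = "the bank raised the interest rate while the market fell".split()
domain = "the river bank was flooded and the water rose".split()
dev = "the water near the river bank rose".split()
vocab = set(general) | set(domain) | set(dev)

p_gen = unigram_model(general, vocab)
p_dom = unigram_model(domain, vocab)
lam = best_weight(p_gen, p_dom, dev)
mixed = interpolate(p_gen, p_dom, lam)

print(f"chosen weight: {lam:.2f}")
print(f"general-only perplexity: {perplexity(p_gen, dev):.2f}")
print(f"interpolated perplexity: {perplexity(mixed, dev):.2f}")
```

Because the grid includes a weight of 0, the interpolated model can never score worse on the development set than the general-domain model alone, which is one reason the approach remains usable when very little in-domain text is available.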
Files
7.pdf (395.5 kB)
md5:790b274f1937f8edabee808ff664d731