Conference paper Open Access

Domain Adaptation of Document-Level NMT in IWSLT19

Popel, Martin; Federmann, Christian

We describe our four NMT systems submitted to the IWSLT19 shared task in English→Czech text-to-text translation of TED talks. The goal of this study is to understand the interactions between document-level NMT and domain adaptation. All our systems are based on the Transformer model implemented in the Tensor2Tensor framework. Two of the systems serve as baselines, which are not adapted to the TED talks domain: SENTBASE is trained on single sentences, DOCBASE on multi-sentence (document-level) sequences. The other two submitted systems are adapted to TED talks: SENTFINE is fine-tuned on single sentences, DOCFINE is fine-tuned on multi-sentence sequences. We present both automatic-metrics evaluation and manual analysis of the translation quality, focusing on the differences between the four systems.
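The abstract contrasts training on single sentences with training on multi-sentence (document-level) sequences. As a minimal illustration of the latter idea (not the paper's actual preprocessing, whose details and parameters are assumptions here), consecutive sentences of a document can be concatenated into longer training sequences:

```python
def make_doc_sequences(sentences, max_sents=3):
    """Group consecutive sentences of one document into
    multi-sentence training sequences (a hypothetical sketch)."""
    return [
        " ".join(sentences[i:i + max_sents])
        for i in range(0, len(sentences), max_sents)
    ]

# Example: a four-sentence "document" grouped into pairs.
doc = ["Hello.", "How are you?", "I am fine.", "Thanks."]
print(make_doc_sequences(doc, max_sents=2))
```

A sentence-level system would instead feed each element of `doc` to the model separately; the document-level variant sees multiple sentences per sequence, exposing cross-sentence context during training.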

Files (153.5 kB)
IWSLT2019_paper_35.pdf — 153.5 kB (md5:7d9efc618d2862f227e4c86d309c2590)
