Generic and Specialized Word Embeddings for Multi-Domain Machine Translation

doi:10.5281/zenodo.3524979

Published November 2, 2019 | Version v1

Conference paper | Open access

1. SYSTRAN / 5 rue Feydeau, 75002 Paris, France & LIMSI, CNRS, Université Paris-Saclay 91405 Orsay, France
2. SYSTRAN / 5 rue Feydeau, 75002 Paris, France
3. LIMSI, CNRS, Université Paris-Saclay 91405 Orsay, France

Supervised machine translation works well when the train and test data are sampled from the same distribution. When this is not the case, adaptation techniques help ensure that the knowledge learned from out-of-domain texts generalises to in-domain sentences. We study here a related setting, multi-domain adaptation, where the number of domains is potentially large and adapting separately to each domain would waste training resources. Our proposal transposes to neural machine translation the feature expansion technique of Daumé III (2007): it isolates domain-agnostic from domain-specific lexical representations, while sharing most of the network across domains. Our experiments use two architectures and two language pairs: they show that our approach, while simple and computationally inexpensive, outperforms several strong baselines and delivers a multi-domain system that successfully translates texts from diverse sources.
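
As an illustration only, the following minimal sketch shows one way the feature expansion idea of Daumé III (2007) could be applied to a single word vector: a shared, domain-agnostic part is followed by one block per domain, and only the block of the sentence's domain is filled in. The function name `expand_embedding`, the toy dimensions and the zeroing of inactive blocks are assumptions made for this example; the paper itself should be consulted for the actual model.

```python
from typing import List


def expand_embedding(generic: List[float],
                     specific: List[float],
                     domain: int,
                     num_domains: int) -> List[float]:
    """Hypothetical helper: build [generic | block_0 | ... | block_{K-1}].

    Only the block at index `domain` carries `specific`; every other
    domain block is filled with zeros (an assumption of this sketch).
    """
    if not 0 <= domain < num_domains:
        raise ValueError("domain index out of range")
    expanded = list(generic)  # domain-agnostic part, shared by all domains
    for k in range(num_domains):
        if k == domain:
            expanded.extend(specific)  # active domain-specific block
        else:
            expanded.extend([0.0] * len(specific))  # inactive block
    return expanded


# Toy example: 2 generic dimensions, 1 specific dimension, 3 domains.
vec = expand_embedding([0.5, -0.1], [0.3], domain=1, num_domains=3)
print(vec)  # [0.5, -0.1, 0.0, 0.3, 0.0]
```

In this toy layout the generic dimensions are the same whatever the domain, which is what lets most of the representation be shared, while each domain keeps a small region of its own.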

Files

Name	Size
IWSLT2019_paper_10.pdf (md5:33644a5b7a68b952b82c4e9c6deddc3c)	336.1 kB

