Published July 9, 2021 | Version v1
Conference paper | Open Access

Exploring Unsupervised Pretraining Objectives for Machine Translation

  • University of Edinburgh

Description

Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT) by drastically reducing the need for large parallel data. Most approaches adapt masked language modeling (MLM) to sequence-to-sequence architectures by masking parts of the input and reconstructing them in the decoder. In this work, we systematically compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context. We pretrain models with different methods on English↔German, English↔Nepali and English↔Sinhala monolingual data, and evaluate them on NMT. In (semi-)supervised NMT, varying the pretraining objective leads to surprisingly small differences in finetuned performance, whereas unsupervised NMT is much more sensitive to it. To understand these results, we thoroughly study the pretrained models and verify that they encode and use information in different ways. We conclude that finetuning on parallel data is mostly sensitive to a few properties that are shared by most models, such as a strong decoder, in contrast to unsupervised NMT, which also requires models with strong cross-lingual abilities.
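As a rough illustration (not the authors' code or released implementation), the compared objectives can be thought of as different noising functions applied to monolingual input, with the model trained to reconstruct the original sentence. A minimal Python sketch, in which all function names and parameters (mask_spans, local_shuffle, replace_words, mask_ratio, max_distance, and the stand-in proposer) are hypothetical:

import random

MASK = "<mask>"

def mask_spans(tokens, mask_ratio=0.35):
    """MLM-style noising: replace a fraction of tokens with <mask>;
    the decoder learns to reconstruct the original sequence."""
    out = list(tokens)
    n = max(1, int(len(tokens) * mask_ratio))
    for i in random.sample(range(len(tokens)), n):
        out[i] = MASK
    return out

def local_shuffle(tokens, max_distance=3):
    """Reordering: permute tokens so that none moves more than roughly
    max_distance positions, yielding a real-looking (full) sentence."""
    order = sorted(range(len(tokens)),
                   key=lambda i: i + random.uniform(0, max_distance))
    return [tokens[i] for i in order]

def replace_words(tokens, replace_ratio=0.35, proposer=None):
    """Replacement: swap a fraction of tokens for plausible alternatives.
    `proposer` stands in for a context-aware model (e.g. a masked LM)
    that suggests in-context substitutes."""
    if proposer is None:
        # Stand-in only: pick a random other token from the same sentence;
        # a context-aware variant would query a masked LM here instead.
        proposer = lambda toks, i: random.choice(
            [t for j, t in enumerate(toks) if j != i])
    out = list(tokens)
    n = max(1, int(len(tokens) * replace_ratio))
    for i in random.sample(range(len(tokens)), n):
        out[i] = proposer(tokens, i)
    return out

if __name__ == "__main__":
    sent = "unsupervised pretraining reduces the need for parallel data".split()
    print(mask_spans(sent))
    print(local_shuffle(sent))
    print(replace_words(sent))

Unlike masking, the reordering and replacement variants keep the input looking like a complete sentence, which is the property the paper contrasts against MLM-style gaps.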

Files

2021.findings-acl.261.pdf (17.2 MB)
md5:b9d2af7915540aec6ac2d04b4e08a138

Additional details

Funding

GoURMET – Global Under-Resourced MEdia Translation (Grant 825299)
European Commission