Published February 1, 2020 | Version v1
Conference paper | Open Access

Towards making the most of BERT in neural machine translation

Creators

  • ByteDance

Description

Can we exploit extremely large monolingual corpora to improve neural machine translation without resorting to expensive back-translation? Neural machine translation models are trained on parallel bilingual corpora, and even the large ones contain only 20 to 40 million sentence pairs. Pre-trained language models such as BERT and GPT, by contrast, are typically trained on billions of monolingual sentences. Simply using BERT to initialize the Transformer encoder brings no benefit, because the BERT knowledge is catastrophically forgotten during further training on MT data. This example shows how to run CTNMT (Yang et al. 2020), the first training method to successfully integrate BERT into a Transformer MT model.
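
As a rough illustration of the kind of integration the description refers to, the sketch below blends the output of a frozen BERT encoder into a trainable NMT encoder state through a learned gate. This is a minimal PyTorch sketch under assumed names and shapes (GatedBertFusion, hidden size 768, random tensors standing in for real encoder outputs); it is not the released CTNMT implementation.

    # Hypothetical sketch: gated fusion of frozen BERT features with an NMT encoder.
    # Names, dimensions, and the exact fusion rule are illustrative assumptions.
    import torch
    import torch.nn as nn

    class GatedBertFusion(nn.Module):
        """Blend a frozen BERT representation into the NMT encoder state."""

        def __init__(self, hidden_size: int = 768):
            super().__init__()
            # The gate decides, per position and channel, how much BERT
            # knowledge to mix in, so it is not simply overwritten
            # (forgotten) during further training on MT data.
            self.gate = nn.Linear(2 * hidden_size, hidden_size)

        def forward(self, bert_states: torch.Tensor, nmt_states: torch.Tensor) -> torch.Tensor:
            # bert_states, nmt_states: [batch, seq_len, hidden_size]
            g = torch.sigmoid(self.gate(torch.cat([bert_states, nmt_states], dim=-1)))
            return g * bert_states + (1.0 - g) * nmt_states

    # Usage with random tensors standing in for real encoder outputs.
    fusion = GatedBertFusion(hidden_size=768)
    bert_out = torch.randn(2, 16, 768)  # frozen BERT output (kept fixed in practice)
    nmt_out = torch.randn(2, 16, 768)   # trainable Transformer encoder output
    fused = fusion(bert_out, nmt_out)   # [2, 16, 768]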

Files (2.8 GB)

ckpt.ctnmt.zip
2.8 GB
md5:34ba4f4ddd4de8db88ec0f30e5769a5a
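
To confirm that the archive downloaded intact, its checksum can be compared against the md5 listed above. The short Python sketch below assumes the file is in the current working directory under the name ckpt.ctnmt.zip.

    # Verify the downloaded checkpoint against the published md5 digest.
    import hashlib

    EXPECTED_MD5 = "34ba4f4ddd4de8db88ec0f30e5769a5a"

    def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
        """Compute the md5 hex digest of a file, reading it in chunks."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    assert md5sum("ckpt.ctnmt.zip") == EXPECTED_MD5, "checksum mismatch"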