Data Augmentation for End-to-End Speech Translation: FBK@IWSLT '19

Di Gangi, Mattia A.; Negri, Matteo; Nguyen, Viet Nhat; Tebbifakhr, Amirhossein; Turchi, Marco

doi:10.5281/zenodo.3525492

Published November 2, 2019 | Version v1

Conference paper Open

1. Fondazione Bruno Kessler, Trento, Italy & University of Trento, Italy
2. Fondazione Bruno Kessler, Trento, Italy
3. University of Trento, Italy

This paper describes FBK’s submission to the end-to-end speech translation (ST) task at IWSLT 2019. The task consists of the “direct” translation (i.e., without an intermediate discrete representation) of English speech data derived from TED Talks or lectures into German text. Our participation had a twofold goal: i) testing our latest models, and ii) evaluating the contribution to model training of different data augmentation techniques. On the model side, we deployed our recently proposed S-Transformer with logarithmic distance penalty, an ST-oriented adaptation of the Transformer architecture widely used in machine translation (MT). On the training side, we focused on data augmentation techniques recently proposed for ST and automatic speech recognition (ASR). In particular, we exploited augmented data in different ways and at different stages of the process. We first trained an end-to-end ASR system and used the weights of its encoder to initialize the encoder of our ST model (transfer learning). Then, we used an English-German MT system trained on large data to translate the English side of the English-French training set into German, and used this newly created data as additional training material. Finally, we trained our models using SpecAugment, an augmentation technique that randomly masks portions of the spectrograms in order to make them different at every training epoch. Our synthetic corpus and SpecAugment resulted in an improvement of 5 BLEU points over our baseline model on the test set of MuST-C En-De, reaching a score of 22.3 with a single end-to-end system.
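The masking idea behind SpecAugment can be illustrated with a minimal sketch. This is not the authors' implementation: the function name spec_augment, its parameters, and the default mask counts and widths are all assumptions chosen for illustration, and the time-warping step of the original SpecAugment recipe is left out. The sketch treats a spectrogram as a list of frames and blanks random frequency bands and random runs of frames.

```python
import random


def spec_augment(spec, rng, num_freq_masks=2, max_freq_width=13,
                 num_time_masks=2, max_time_width=20, mask_value=0.0):
    """Return a masked copy of `spec`, a list of frames (each frame is a
    list of filterbank values). The input is left untouched.
    All default values here are illustrative assumptions."""
    out = [list(frame) for frame in spec]
    n_frames = len(out)
    n_bins = len(out[0]) if out else 0

    # Frequency masking: blank the same band of bins in every frame.
    for _ in range(num_freq_masks):
        width = min(rng.randint(0, max_freq_width), n_bins)
        start = rng.randint(0, n_bins - width)
        for frame in out:
            frame[start:start + width] = [mask_value] * width

    # Time masking: blank a run of consecutive frames.
    for _ in range(num_time_masks):
        width = min(rng.randint(0, max_time_width), n_frames)
        start = rng.randint(0, n_frames - width)
        for t in range(start, start + width):
            out[t] = [mask_value] * n_bins

    return out


rng = random.Random(0)
features = [[1.0] * 40 for _ in range(100)]  # 100 frames x 40 bins, dummy data
epoch_1 = spec_augment(features, rng)
epoch_2 = spec_augment(features, rng)        # fresh random masks on each call
```

Because the masks are redrawn on every call, the same utterance is presented to the model with different regions hidden at each epoch, which is the property the abstract highlights.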

Files

Name	Size
IWSLT2019_paper_29.pdf md5:a55f57f52a1a233dd7abef7cb228f832	225.1 kB

	All versions	This version
Views	731	730
Downloads	382	382
Data volume	93.0 MB	93.0 MB
