Conference paper Open Access

On Using SpecAugment for End-to-End Speech Translation

Bahar, Parnia; Zeyer, Albert; Schlüter, Ralf; Ney, Hermann

This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation. SpecAugment is a low-cost method applied directly to the audio input features; it consists of masking blocks of frequency channels and/or time steps. We apply SpecAugment to end-to-end speech translation tasks and achieve improvements of up to +2.2% BLEU on LibriSpeech Audiobooks En→Fr and +1.2% BLEU on IWSLT TED-talks En→De by alleviating overfitting to some extent. We also examine the effectiveness of the method in a variety of data scenarios and show that it leads to significant improvements irrespective of the amount of training data.
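
The masking described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `spec_augment`, the helper `_mask_axis`, all parameter names and default values, and the choice of zero as the mask value are assumptions made for illustration. The input is assumed to be a 2-D feature matrix with time steps as rows and frequency channels as columns.

```python
import numpy as np


def _mask_axis(out, axis, max_masks, max_width, rng):
    """Zero out up to `max_masks` random blocks of consecutive indices along `axis`."""
    size = out.shape[axis]
    upper = min(max_width, size)
    if upper < 1:
        return  # nothing to mask on an empty axis or with zero width
    num_masks = int(rng.integers(0, max_masks + 1))  # upper bound is exclusive
    for _ in range(num_masks):
        width = int(rng.integers(1, upper + 1))
        start = int(rng.integers(0, size - width + 1))
        index = [slice(None)] * out.ndim
        index[axis] = slice(start, start + width)
        out[tuple(index)] = 0.0  # assumption: masked values are set to zero


def spec_augment(features, max_time_masks=2, max_time_width=20,
                 max_freq_masks=2, max_freq_width=5, rng=None):
    """Return a masked copy of `features` with shape (time steps, frequency channels).

    All defaults are illustrative placeholders, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(features, dtype=float, copy=True)  # leave the input untouched
    _mask_axis(out, 0, max_time_masks, max_time_width, rng)  # blocks of time steps
    _mask_axis(out, 1, max_freq_masks, max_freq_width, rng)  # blocks of frequency channels
    return out


if __name__ == "__main__":
    feats = np.ones((100, 40))  # hypothetical 100 frames of 40-dimensional features
    masked = spec_augment(feats, rng=np.random.default_rng(0))
    print("fraction of masked entries:", float((masked == 0).mean()))
```

Because the masks are drawn anew on every call, applying the function to each training utterance as it is loaded yields a different masked view of the same audio features in each epoch, at negligible computational cost.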

Files (769.0 kB)
Name: IWSLT2019_paper_19.pdf
Size: 769.0 kB
md5: b101a778188679cfbdd072bdb714062c