Leader: 00000nam##2200000uu#4500
Record ID: 3525010
DOI: 10.5281/zenodo.3525010
OAI identifier: oai:zenodo.org:3525010
OAI set: user-iwslt2019
Author: Zeyer, Albert (Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, 52062 Aachen, Germany & AppTek, 52062 Aachen, Germany)
Author: Schlüter, Ralf (Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, 52062 Aachen, Germany)
Author: Ney, Hermann (Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, 52062 Aachen, Germany & AppTek, 52062 Aachen, Germany)
Title: On Using SpecAugment for End-to-End Speech Translation
Author: Bahar, Parnia (Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, 52062 Aachen, Germany & AppTek, 52062 Aachen, Germany)
Access rights: info:eu-repo/semantics/openAccess
License: Creative Commons Attribution 4.0 International, https://creativecommons.org/licenses/by/4.0/legalcode (cc-by-4.0, spdx)
Abstract: This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation. SpecAugment is a low-cost method applied directly to the audio input features; it consists of masking blocks of frequency channels and/or time steps. We apply SpecAugment to end-to-end speech translation tasks and achieve up to +2.2% BLEU on LibriSpeech Audiobooks En→Fr and +1.2% on IWSLT TED-talks En→De by alleviating overfitting to some extent. We also examine the method in a variety of data scenarios and show that it leads to significant improvements irrespective of the amount of training data.
Publisher: Zenodo
Publication date: 2019-11-02
Community: user-iwslt2019
Resource type: info:eu-repo/semantics/conferencePaper
Record timestamp: 20200120164430.0
File size: 768990
File checksum: md5:b101a778188679cfbdd072bdb714062c
File URL: https://zenodo.org/records/3525010/files/IWSLT2019_paper_19.pdf
File access: open
Related identifier: 10.5281/zenodo.3525009 (isVersionOf, doi)
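
Illustrative note: the abstract describes SpecAugment as masking blocks of frequency channels and/or time steps in the audio input features. The following is a minimal sketch of that idea, not the authors' implementation. The function name `spec_augment`, the mask counts, the maximum mask widths, the choice of zero as the fill value, and the `seed` parameter are all assumptions made for illustration; the record does not state any of them.

```python
import random


def spec_augment(features, max_freq_width=5, max_time_width=20,
                 num_freq_masks=1, num_time_masks=1, seed=None):
    """Return a masked copy of `features`, a list of frames (time steps),
    each frame being a list of per-channel values.

    All default values are hypothetical; they are not taken from the record.
    """
    rng = random.Random(seed)
    masked = [list(frame) for frame in features]  # copy, input stays untouched
    num_frames = len(masked)
    num_channels = len(masked[0]) if masked else 0

    # Frequency masking: zero a contiguous block of channels in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_channels))
        start = rng.randint(0, num_channels - width)
        for frame in masked:
            frame[start:start + width] = [0.0] * width

    # Time masking: zero a contiguous block of frames across all channels.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            masked[t] = [0.0] * num_channels

    return masked


# Usage: 100 frames of 40-channel features, all ones, masked with a fixed seed.
example = [[1.0] * 40 for _ in range(100)]
augmented = spec_augment(example, seed=0)
```

Because the masks are drawn at random on each call, applying the function anew every time an utterance is seen in training would present the model with a differently masked view of the same audio, which is the sense in which such masking acts as data augmentation.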