ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper

Inaguma, Hirofumi; Kiyono, Shun; Soplin, Nelson Enrique Yalta; Suzuki, Jun; Duh, Kevin; Watanabe, Shinji

doi:10.5281/zenodo.3525560

Published November 2, 2019 | Version v1

Conference paper Restricted

1. Kyoto University
2. RIKEN AIP & Tohoku University
3. Waseda University
4. Tohoku University & RIKEN AIP
5. Johns Hopkins University

This paper describes the ESPnet submissions to the How2 Speech Translation task at IWSLT 2019. This year, we build our systems mainly on Transformer architectures for all tasks and focus on end-to-end speech translation (E2E-ST). We first compare RNN-based and Transformer models and confirm that Transformer models significantly and consistently outperform RNN models across all tasks and corpora. Next, we investigate pre-training E2E-ST models on the ASR and MT tasks. On top of pre-training, we further explore knowledge distillation from the NMT model and a deeper speech encoder, and confirm drastic improvements over the baseline model. All of our code is publicly available in ESPnet.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

	All versions	This version
Views	362	360
Downloads	19	19
Data volume	3.8 MB	3.8 MB
