Conference paper Closed Access

ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper

Inaguma, Hirofumi; Kiyono, Shun; Soplin, Nelson Enrique Yalta; Suzuki, Jun; Duh, Kevin; Watanabe, Shinji


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam##2200000uu#4500</leader>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-11-02</subfield>
  </datafield>
  <controlfield tag="005">20191108100749.0</controlfield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-iwslt2019</subfield>
  </datafield>
  <controlfield tag="001">3525560</controlfield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">user-iwslt2019</subfield>
    <subfield code="o">oai:zenodo.org:3525560</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;This paper describes the ESPnet submissions to the How2 Speech Translation task at IWSLT2019. In this year, we mainly build our systems based on Transformer architectures in all tasks and focus on the end-to-end speech translation (E2E-ST). We first compare RNN-based models and Transformer, and then confirm Transformer models significantly and consistently outperform RNN models in all tasks and corpora. Next, we investigate pre-training of E2E-ST models with the ASR and MT tasks. On top of the pre-training, we further explore knowledge distillation from the NMT model and the deeper speech encoder, and confirm drastic improvements over the baseline model. All of our codes are publicly available in ESPnet.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">RIKEN AIP &amp; Tohoku University</subfield>
    <subfield code="a">Kiyono, Shun</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Waseda University</subfield>
    <subfield code="a">Soplin, Nelson Enrique Yalta</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Tohoku University &amp; RIKEN AIP</subfield>
    <subfield code="a">Suzuki, Jun</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Johns Hopkins University</subfield>
    <subfield code="a">Duh, Kevin</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Johns Hopkins University</subfield>
    <subfield code="a">Watanabe, Shinji</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">closed</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">publication</subfield>
    <subfield code="b">conferencepaper</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Kyoto University</subfield>
    <subfield code="a">Inaguma, Hirofumi</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3525560</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3525559</subfield>
  </datafield>
</record>