Analysis of Positional Encodings for Neural Machine Translation

DOI: 10.5281/zenodo.3525024
URL: https://zenodo.org/records/3525024
OAI identifier: oai:zenodo.org:3525024
Related DOI: 10.5281/zenodo.3525023
Authors: Jan Rosendahl, Viet Anh Khoa Tran, Weiyue Wang, Hermann Ney
Affiliation (all authors): Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Germany
Publisher: Zenodo
Publication year: 2019
Dates: 2019-11-02, 2020-01-20
Community: https://zenodo.org/communities/iwslt2019
License: Creative Commons Attribution 4.0 International

Abstract

In this work we analyze and compare the behavior of the Transformer architecture when using different positional encoding methods. While absolute and relative positional encoding perform equally well overall, we show that relative positional encoding is vastly superior (4.4% to 11.9% BLEU) when translating a sentence that is longer than any observed training sentence. We further propose and analyze variations of relative positional encoding and observe that the number of trainable parameters can be reduced without a performance loss by using fixed encoding vectors or by removing some of the positional encoding vectors.
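The length-generalization claim in the abstract can be illustrated with a small sketch. It assumes a clipped-offset formulation of relative positional encoding, in which each pair of positions (i, j) is mapped to the offset j - i, clipped to a maximum distance; this record does not state the exact scheme used in the paper, and the function names and the `max_distance` parameter are hypothetical. The point is only that the set of relative indices stays the same for any sentence length, whereas absolute indices for a longer sentence include positions never seen in training.

```python
def absolute_position_ids(length):
    """Absolute scheme: every token gets its own index 0..length-1."""
    return list(range(length))


def relative_position_ids(length, max_distance):
    """Relative scheme (assumed clipped-offset variant).

    Entry [i][j] is the offset j - i, clipped to [-max_distance, max_distance]
    and shifted by max_distance so that it is a valid embedding-table index
    in the range 0..2*max_distance.
    """
    return [
        [max(-max_distance, min(max_distance, j - i)) + max_distance
         for j in range(length)]
        for i in range(length)
    ]


def distinct_ids(matrix):
    """Collect the set of indices that occur anywhere in the matrix."""
    return {idx for row in matrix for idx in row}


# Hypothetical lengths: training sentences of up to 5 tokens, test of 50.
train_len, test_len, k = 5, 50, 2

# Absolute indices that occur at test time but never occurred in training.
unseen_absolute = set(absolute_position_ids(test_len)) - set(absolute_position_ids(train_len))

# Relative indices that occur at test time but never occurred in training.
unseen_relative = (distinct_ids(relative_position_ids(test_len, k))
                   - distinct_ids(relative_position_ids(train_len, k)))

print(len(unseen_absolute), len(unseen_relative))
```

Under these assumptions the relative scheme needs only 2 * max_distance + 1 positional vectors regardless of sentence length, which is also why shrinking or fixing that small table, as the abstract describes, is a natural way to reduce trainable parameters.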