
Published December 6, 2023 | Version v1
Model | Open access

Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer

  • 1. Johns Hopkins University
  • 2. Microsoft (United States)

Description

Pretrained multilingual translation models using either pixel or subword (BPE) representations, trained on the many-to-one parallel TED-59 dataset, accompanying the EMNLP 2023 paper "Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer."

Models can be used from the command line or through a script, like other fairseq models, but they require our code extension for rendering text into pixel representations. Each model zip file contains: the fairseq model checkpoint, vocab files, a language list file, and the relevant sentencepiece model(s).
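As a rough illustration, decoding with one of these checkpoints might look like the standard fairseq workflow below. This is a hedged sketch, not verbatim from the release: the data-bin directory, checkpoint filename, extension path, and task flags are placeholders, and the exact arguments depend on our code extension (see the repository linked below).

```shell
# Unpack a model archive (checkpoint, vocab files, language list, sentencepiece model)
unzip pixel_model.zip -d pixel_model

# Interactive decoding, loading the pixel-rendering code extension via --user-dir.
# Paths and the task name are illustrative placeholders.
fairseq-interactive data-bin/ted59 \
    --path pixel_model/checkpoint_best.pt \
    --user-dir /path/to/pixel-extension \
    --task translation \
    --beam 5
```

Scripted use follows the same pattern: point fairseq at the unpacked checkpoint and load the extension with `--user-dir` so the pixel representations are available.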

We additionally package the TED-59 data here in its raw extracted format for ease of comparison (original dataset release and paper by Qi et al., 2018).

For more information, see our:

  • Paper describing the method and training data  [arXiv]
  • Code repository with scripts  [github]

Files (2.3 GB)

  • pixel_model.zip
  • md5:f1ccd393440d83a1c5d3198c0ea597af (977.1 MB)
  • md5:a444ed3dd38e57bef3d7b83094bee407 (923.1 MB)
  • md5:38f4f507258622505031cfa7fd3fb9d2 (404.8 MB)

Additional details

Dates

  • Accepted: 2023-12-06 (EMNLP)