Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer
Description
Pretrained multilingual translation models using either pixel or subword (BPE) representations, trained on the many-to-one parallel TED-59 dataset and accompanying the EMNLP'23 paper "Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer."
Models can be used from the command line or in a script like other fairseq models, but they require our code extension for rendering text into pixel representations. Each model zip file contains: the fairseq model checkpoint, vocab files, language list file, and relevant sentencepiece model(s).
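As a rough illustration of command-line use, a fairseq checkpoint is typically loaded with `fairseq-interactive`. The sketch below is not from the release: the data directory, checkpoint path, task name, and language codes are placeholders, and the exact invocation (in particular the task registered by our visrep extension) should be checked against the repository README.

```shell
# Hedged sketch only: assumes the visrep fairseq fork (multi branch) is
# installed and a model zip has been extracted to pixel_model/.
# The task name and all paths below are placeholders, not release values.
fairseq-interactive \
    pixel_model/data-bin \
    --path pixel_model/checkpoint_best.pt \
    --task translation \
    --source-lang src --target-lang eng \
    --beam 5
```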
We additionally package the TED-59 data here in raw extracted format for ease of comparison (original dataset release and paper by Qi et al., 2018).
For more information, see our preprint and code repository linked under Related works below.
Files
pixel_model.zip
Additional details
Related works
- Is supplement to
- Preprint: https://arxiv.org/abs/2305.14280
- Software: https://github.com/esalesky/visrep/tree/multi
Dates
- Accepted: 2023-12-06 (EMNLP)