
Published December 6, 2023 | Version v1
Model | Open access

Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer

  • 1. Johns Hopkins University
  • 2. Microsoft (United States)

Description

Pretrained multilingual translation models using either pixel or subword (BPE) representations, trained on the many-to-one parallel TED-59 dataset, accompanying the EMNLP 2023 paper "Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer."

Models can be used from the command line or through a script, like other fairseq models, but they require our code extension for rendering text into pixel representations. Each model zip file contains: the fairseq model checkpoint, vocab files, a language list file, and the relevant sentencepiece model(s).
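As a rough illustration, decoding with one of these checkpoints might look like the standard fairseq workflow below. This is a hedged sketch, not verbatim from the release: the data-bin directory, checkpoint filename, extension path, and task flags are placeholders, and the exact arguments depend on our code extension (see the repository linked below).

```shell
# Unpack a model archive (checkpoint, vocab files, language list, sentencepiece model)
unzip pixel_model.zip -d pixel_model

# Interactive decoding, loading the pixel-rendering code extension via --user-dir.
# Paths and the task name are illustrative placeholders.
fairseq-interactive data-bin/ted59 \
    --path pixel_model/checkpoint_best.pt \
    --user-dir /path/to/pixel-extension \
    --task translation \
    --beam 5
```

Scripted use follows the same pattern: point fairseq at the unpacked checkpoint and load the extension with `--user-dir` so the pixel representations are available.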

We additionally package the TED-59 data here in its raw extracted format for ease of comparison (original dataset release and paper by Qi et al., 2018).

For more information, see our:

  • Paper describing the method and training data  [arXiv]
  • Code repository with scripts  [github]

Files (2.3 GB)

  • pixel_model.zip
  • md5:f1ccd393440d83a1c5d3198c0ea597af (977.1 MB)
  • md5:a444ed3dd38e57bef3d7b83094bee407 (923.1 MB)
  • md5:38f4f507258622505031cfa7fd3fb9d2 (404.8 MB)

Additional details

Dates

  • Accepted: 2023-12-06 (EMNLP)