Performance of Attention-Informed Mixed-Language Training in Multilingual VQA Benchmarks
Description
Although multilingual vision-language pretrained models offer several benefits, recent benchmarks across various tasks and languages show poor cross-lingual generalisation when these models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross-lingual transfer. In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task, where models are fine-tuned on English visual-question data and evaluated on 7 typologically diverse l
Research goal: How does the performance of Attention-Informed Mixed-Language Training (MLT) compare to other zero-shot adaptation methods like cross-lingual transfer learning or multitask learning on the Multilingual Visual Question Answering (ML-VQA) benchmark when evaluated on languages with varying levels of linguistic and structural similarity to the training language?
Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 8.7/10.
Files
paper.pdf (76.6 kB, md5:f57721c2cb7ebe69b4b37ecd719e5a0d)