Low-Resource African Language Pretraining for Zero-Shot XTREME-R Natural Language Inference Accuracy
Description
Multilingual Pretrained Language Models (MPLMs) have shown strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline, which improves zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) and used as prompts. PARC improves zero-shot performance on three downstream tasks (binary sentiment classification, topic categorization, and natural language inference) with multilingual paralle
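The retrieval-augmented prompting idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cosine`, `retrieve`, `build_prompt`), the prompt template, the use of labeled HRL sentences, and the two-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a multilingual sentence encoder, and the masked prompt would be passed to an MPLM.

```python
import math
from typing import List, Sequence, Tuple

# One HRL pool entry: (sentence, label word, embedding vector).
PoolItem = Tuple[str, str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: Sequence[float], pool: List[PoolItem], k: int = 1) -> List[PoolItem]:
    """Return the k HRL pool entries most similar to the LRL query vector."""
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[2]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, retrieved: List[PoolItem]) -> str:
    """Prepend retrieved HRL sentences to the LRL query (hypothetical template)."""
    parts = [f"{sentence} It was {label}." for sentence, label, _ in retrieved]
    parts.append(f"{query} It was [MASK].")
    return " ".join(parts)


# Toy HRL (English) pool with made-up 2-d vectors standing in for
# multilingual sentence embeddings.
hrl_pool: List[PoolItem] = [
    ("The film was wonderful.", "good", [0.9, 0.1]),
    ("The food was awful.", "bad", [0.1, 0.9]),
]

# Toy LRL (Swahili) query; its vector is also made up for the example.
lrl_query = "Filamu hii ni nzuri sana."
lrl_query_vec = [0.8, 0.2]

top = retrieve(lrl_query_vec, hrl_pool, k=1)
prompt = build_prompt(lrl_query, top)
print(prompt)
```

The masked position at the end of the prompt is where a model would predict the label word for the LRL input; the retrieved HRL sentence serves only as in-context evidence.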
Research goal: How does the inclusion of low-resource African language pretraining data impact zero-shot accuracy on XTREME-R natural language inference tasks compared to high-resource language baselines?
Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 7.5/10.
Notes
Files
paper.pdf
(83.9 kB, md5:60be7073b0bd03c2e20abc7341fbec06)