Cross-lingual Pretraining Impact on Zero-shot F1 Scores for CWE-200 Vulnerability Detection in Low-Resource Languages

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20669478

Published June 12, 2026 | Version v1

Report | Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model (mPLM), finetuned on a task in one language, to make predictions for this task in other languages. Although broadly studied for natural language understanding tasks, this setting is understudied for generation. Previous works note a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as a backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB-200, considering full finetuning and parameter-efficient finetuning wi

Research goal: How does cross-lingual pretraining affect the zero-shot F1 scores of CodeT5 versus mT5 for CWE-200 vulnerability detection in low-resource languages?
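The comparison in the research goal rests on a per-language F1 score for a binary "vulnerable / not vulnerable" decision. The following is a minimal sketch of that metric, assuming each evaluation record is a (language, gold label, predicted label) triple with 1 marking a CWE-200 finding. The function names, the language identifiers `lang_a` and `lang_b`, and the toy labels are hypothetical illustrations, not data or code from the report.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_language_f1(records):
    """Group (language, gold, predicted) triples and score each language."""
    grouped = defaultdict(lambda: ([], []))
    for lang, gold, pred in records:
        grouped[lang][0].append(gold)
        grouped[lang][1].append(pred)
    return {lang: f1_score(g, p) for lang, (g, p) in grouped.items()}


# Hypothetical toy predictions, for illustration only.
records = [
    ("lang_a", 1, 1), ("lang_a", 0, 1), ("lang_a", 1, 0), ("lang_a", 0, 0),
    ("lang_b", 1, 1), ("lang_b", 1, 1), ("lang_b", 0, 0),
]
scores = per_language_f1(records)
print(scores)
```

Running the same function over the predictions of each model on the same held-out languages would give the per-language scores that the research goal asks to compare.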

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.5/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.5/10.

Files

Name	Size
paper.pdf md5:02e5490332cc32af0ace4693fb50c2c7	87.6 kB

