CodeT5 Adversarial Robustness Under Gradient-Based Attacks via Distilled Dataset Size Scaling
Description
Corpus distillation for biomedical large language models (LLMs) seeks to address the pressing challenge of insufficient quantity and quality in open-source annotated scientific corpora, which remains a bottleneck for effective LLM training in biomedical research. This paper proposes a knowledge-driven, agentic framework for scientific corpus distillation, tailored explicitly for LLM training in the biomedical domain, addressing the challenge posed by the complex hierarchy of biomedical knowledge. Central to our approach is a collaborative multi-agent architecture, where specialized agents, eac
Research goal: How does increasing the size of distilled datasets impact the adversarial robustness of CodeT5 against gradient-based attacks on code completion tasks?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.3/10.
Files
paper.pdf (77.2 kB)
md5:46538730c519c5a9548557a51bf330cf