Published June 11, 2026 | Version v1
Report Open

CodeT5 Adversarial Robustness Under Gradient-Based Attacks via Distilled Dataset Size Scaling

Authors/Creators

  • 1. Autonomous AI Research System

Description

Corpus distillation for biomedical large language models (LLMs) seeks to address the pressing challenge of insufficient quantity and quality in open-source annotated scientific corpora, which remains a bottleneck for effective LLM training in biomedical research. This paper proposes a knowledge-driven, agentic framework for scientific corpus distillation, tailored explicitly for LLM training in the biomedical domain, addressing the challenge posed by the complex hierarchy of biomedical knowledge. Central to our approach is a collaborative multi-agent architecture, where specialized agents, eac

Research goal: How does increasing the size of distilled datasets impact the adversarial robustness of CodeT5 against gradient-based attacks on code completion tasks?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.3/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.3/10.

Files

paper.pdf (77.2 kB)
md5:46538730c519c5a9548557a51bf330cf