CodeT5 Adversarial Robustness Under Gradient-Based Attacks via Distilled Dataset Size Scaling

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20642080

Published June 11, 2026 | Version v1

Report | Open Access


SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Corpus distillation for biomedical large language models (LLMs) addresses a pressing problem: open-source annotated scientific corpora are insufficient in both quantity and quality, and this remains a bottleneck for effective LLM training in biomedical research. This paper proposes a knowledge-driven, agentic framework for scientific corpus distillation, tailored to LLM training in the biomedical domain and designed to cope with the complex hierarchy of biomedical knowledge. Central to our approach is a collaborative multi-agent architecture in which specialized agents, each …

Research goal: How does increasing the size of distilled datasets impact the adversarial robustness of CodeT5 against gradient-based attacks on code completion tasks?
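The research goal implies two measurements: an attack that uses gradients to perturb code tokens, and a summary of how robustness changes as the distilled dataset grows. The sketch below illustrates both under stated assumptions; it is not the method used in this report. For the attack it substitutes a HotFlip-style first-order token swap, which estimates the loss increase of replacing token w_i with w' as (e_{w'} - e_{w_i}) · ∇_{e_i}L. The embedding table and per-position gradients are assumed to come from the model (for example, CodeT5's input embeddings and the loss gradient with respect to them); here they are plain lists so the code runs without a deep-learning framework. For the scaling question it assumes robustness is reported as robust accuracy (fraction of completions still correct after the attack) per dataset size, and summarizes the trend as an ordinary least-squares slope against log2 of the size. All function names are hypothetical.

```python
import math
from typing import AbstractSet, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def hotflip_best_swap(
    tokens: List[str],
    grads: List[Vector],
    embeddings: Dict[str, Vector],
    protected: AbstractSet[str] = frozenset(),
) -> Tuple[int, str, float]:
    """Hypothetical HotFlip-style search: return (position, replacement token,
    estimated loss increase) for the single swap maximizing the first-order
    estimate (e_new - e_old) . grad_i. `grads[i]` is assumed to be the loss
    gradient w.r.t. the input embedding at position i."""
    best: Tuple[int, str, float] = (-1, "", float("-inf"))
    for i, (tok, g) in enumerate(zip(tokens, grads)):
        if tok in protected:  # e.g. keep syntax-critical tokens fixed
            continue
        base = dot(embeddings[tok], g)
        for cand, e_new in embeddings.items():
            if cand == tok:
                continue
            gain = dot(e_new, g) - base
            if gain > best[2]:
                best = (i, cand, gain)
    return best


def robust_accuracy(outcomes: List[bool]) -> float:
    """Fraction of examples whose completion is still correct under attack."""
    if not outcomes:
        raise ValueError("no outcomes")
    return sum(outcomes) / len(outcomes)


def log_size_slope(results: Dict[int, float]) -> float:
    """OLS slope of robust accuracy against log2(distilled dataset size),
    i.e. the estimated change in robust accuracy per doubling of the data."""
    if len(results) < 2:
        raise ValueError("need at least two dataset sizes")
    xs = [math.log2(n) for n in results]
    ys = list(results.values())
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

Under these assumptions, a positive slope would mean robustness improves with each doubling of the distilled set, and a slope near zero would mean size has little effect. Regressing on log2 of the size rather than the raw size is a design choice: scaling effects are commonly compared per doubling.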

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.3/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers.

Files (77.2 kB)

Name	Size	MD5
paper.pdf	77.2 kB	46538730c519c5a9548557a51bf330cf

