Impact of Context Window Size on CWE Detection F1-Score and Throughput in Fine-Tuned Llama-3.1-8B

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20641402

Published June 11, 2026 | Version v1

Report | Open access

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Large Language Models (LLMs) have demonstrated significant capabilities in understanding and analyzing code for security vulnerabilities, such as the weakness types catalogued in the Common Weakness Enumeration (CWE). However, their reliance on cloud infrastructure and their substantial computational requirements make them hard to use on sensitive or proprietary codebases, because of both privacy concerns and inference costs. This work explores Small Language Models (SLMs) as a viable alternative for accurate, on-premise vulnerability detection. We investigated whether a 350-million-parameter pre-trained code model (codeg…

Research goal: How does varying the context window size from 512 to 8192 tokens affect the F1-score for CWE detection and the tokens-per-second throughput of Llama-3.1-8B fine-tuned on Python vulnerability datasets?
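
The two metrics named in the research goal can be made concrete with a small sketch. It is not the report's evaluation code, and several details are assumptions: the intermediate window sizes (powers of two between 512 and 8192), F1 computed one-vs-rest per CWE and then macro-averaged, and throughput defined as generated tokens divided by wall-clock seconds (the report does not say whether prompt tokens are counted). The names CONTEXT_WINDOWS, RunStats, macro_f1 and summarize are hypothetical.

```python
from dataclasses import dataclass

# Assumed sweep: powers of two between the 512- and 8192-token bounds in the
# research goal; the report does not list the intermediate sizes.
CONTEXT_WINDOWS = [512 * 2**i for i in range(5)]  # 512, 1024, 2048, 4096, 8192


@dataclass
class RunStats:
    """Hypothetical evaluation counts for one context-window setting."""
    context_window: int
    per_cwe_counts: dict[str, tuple[int, int, int]]  # CWE id -> (TP, FP, FN)
    generated_tokens: int  # assumption: only generated tokens count toward throughput
    wall_seconds: float


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN); 0.0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(per_cwe_counts: dict[str, tuple[int, int, int]]) -> float:
    """Unweighted mean of one-vs-rest F1 scores across CWE classes (assumed averaging)."""
    if not per_cwe_counts:
        return 0.0
    return sum(f1_score(*c) for c in per_cwe_counts.values()) / len(per_cwe_counts)


def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput as tokens divided by wall-clock seconds."""
    if seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return tokens / seconds


def summarize(runs: list[RunStats]) -> dict[int, tuple[float, float]]:
    """Map each context window to (macro-F1, tokens/s), ordered by window size."""
    return {
        r.context_window: (
            macro_f1(r.per_cwe_counts),
            tokens_per_second(r.generated_tokens, r.wall_seconds),
        )
        for r in sorted(runs, key=lambda r: r.context_window)
    }
```

Keeping F1 and throughput in one table keyed by window size makes the trade-off the research goal asks about directly comparable; a weighted (support-based) average would be a reasonable alternative to the macro average if CWE classes are highly imbalanced.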

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.3/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers.

Files

paper.pdf (82.9 kB, md5:bcc92e1cd6ad94ee4220cdb60b3111de)