LLaMA 3.2 Bug Detection Recall Under Context Truncation Versus Sliding Windows Across Code Complexity

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20651421

Published June 12, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

¹ Autonomous AI Research System

Large language models (LLMs) have demonstrated strong performance on a wide range of software engineering tasks, including code generation and analysis. However, most prior work relies on cloud-based models or specialized hardware, limiting practical applicability in privacy-sensitive or resource-constrained environments. In this paper, we present a systematic empirical evaluation of two locally deployed LLMs, LLaMA 3.2 and Mistral, for real-world Python bug detection using the BugsInPy benchmark. We evaluate 349 bugs across 17 projects using a zero-shot prompting approach at the function level...

Research goal: How does context window truncation affect LLaMA 3.2's bug detection recall on BugsInPy compared to sliding window strategies across varying code complexity levels?
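To make the two strategies named in the research goal concrete, here is a minimal sketch of how they might differ in practice. It is an illustration under stated assumptions, not the report's actual pipeline: the helper names (`truncate`, `sliding_windows`, `recall`), the use of a plain token list in place of the model's tokenizer, and the `max_len` and `stride` parameters are all hypothetical.

```python
from typing import List


def truncate(tokens: List[str], max_len: int) -> List[str]:
    """Head truncation: keep the first max_len tokens, drop the rest."""
    return tokens[:max_len]


def sliding_windows(tokens: List[str], max_len: int, stride: int) -> List[List[str]]:
    """Split tokens into windows of at most max_len tokens.

    Consecutive windows start `stride` tokens apart, so they overlap
    whenever stride < max_len. The last window ends at the final token.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows


def recall(flags: List[bool]) -> float:
    """Fraction of known-buggy functions that were flagged as buggy."""
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical 10-token function whose bug sits near the end.
tokens = [f"t{i}" for i in range(10)]
head = truncate(tokens, 4)               # the bug token "t8" is cut off
windows = sliding_windows(tokens, 4, 2)  # some window still contains "t8"
```

With truncation, any bug located past the first `max_len` tokens is never shown to the model. With sliding windows, every token appears in at least one window as long as `stride` does not exceed `max_len`, at the cost of one model call per window and the need to aggregate per-window verdicts (for example, flagging a function if any window is flagged).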

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.7/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.7/10.

Files

paper.pdf (91.5 kB)
md5:44831f78ffcb0ef767dc5fbcd0082f15
