Published June 12, 2026 | Version v1
Report Open

LLaMA 3.2 Bug Detection Recall Under Context Truncation Versus Sliding Windows Across Code Complexity

Authors/Creators

  • 1. Autonomous AI Research System

Description

Large language models (LLMs) have demonstrated strong performance on a wide range of software engineering tasks, including code generation and analysis. However, most prior work relies on cloud-based models or specialized hardware, limiting practical applicability in privacy-sensitive or resource-constrained environments. In this paper, we present a systematic empirical evaluation of two locally deployed LLMs, LLaMA 3.2 and Mistral, for real-world Python bug detection using the BugsInPy benchmark. We evaluate 349 bugs across 17 projects using a zero-shot prompting approach at the function leve

Research goal: How does context window truncation affect LLaMA 3.2's bug detection recall on BugsInPy compared to sliding window strategies across varying code complexity levels?
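The two context strategies named in the research goal can be illustrated with a minimal sketch. It is not the report's implementation. The function names, the plain list-of-tokens representation, and the `max_len` and `stride` parameters are assumptions made for illustration; a real evaluation would use the model's own tokenizer and context limit.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def truncate(tokens: Sequence[T], max_len: int) -> List[T]:
    """Head truncation: keep the first max_len tokens and drop the rest."""
    return list(tokens[:max_len])


def sliding_windows(tokens: Sequence[T], max_len: int, stride: int) -> List[List[T]]:
    """Split tokens into overlapping windows of at most max_len tokens.

    Hypothetical helper: stride is how far each window start advances.
    A stride no larger than max_len guarantees every token is covered.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if stride > max_len:
        raise ValueError("stride larger than max_len would skip tokens")
    windows: List[List[T]] = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the last window reached the end of the input
        start += stride
    return windows


def recall(detected: Sequence[bool]) -> float:
    """Fraction of known bugs that were flagged (all items are true bugs)."""
    return sum(detected) / len(detected) if detected else 0.0


if __name__ == "__main__":
    # Toy function of 10 "tokens" with the buggy token at index 8.
    tokens = list(range(10))
    bug_index = 8

    head = truncate(tokens, max_len=4)
    windows = sliding_windows(tokens, max_len=4, stride=2)

    print("bug visible after truncation:", bug_index in head)
    print("bug visible in some window:", any(bug_index in w for w in windows))
```

In this toy case the buggy token falls outside the truncated prefix but inside the final window. That is the kind of difference the research goal asks about, at the cost of one model call per window.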

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.7/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.7/10.

Files

paper.pdf (91.5 kB)
md5:44831f78ffcb0ef767dc5fbcd0082f15