To what extent does specializing Code Llama for Python impact its zero-shot functional correctness on non-Pyth

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20441210

Published May 29, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B, 34B and 70B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up

Research goal: To what extent does specializing Code Llama for Python impact its zero-shot functional correctness on non-Python languages within the HumanEval dataset across 7B, 34B, and 70B model sizes?
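The record does not state which metric the report uses for functional correctness. As a minimal sketch, assuming the standard HumanEval pass@k estimator and assuming per-problem results are available as (samples generated, samples passing) pairs, the effect of Python specialization on one non-Python language at one model size could be summarized as a difference in mean pass@k. The function names and the input layout below are hypothetical, chosen only for illustration.

```python
from math import comb
from typing import Sequence, Tuple

# Hypothetical layout: one (n, c) pair per problem, where n is the number of
# samples generated and c is the number that passed the unit tests.
Results = Sequence[Tuple[int, int]]


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Results, k: int = 1) -> float:
    """Average the per-problem pass@k estimates over a benchmark."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


def specialization_delta(base: Results, specialized: Results, k: int = 1) -> float:
    """Mean pass@k of the Python-specialized model minus that of the base model.

    A negative value means specialization lowered the score on this language.
    """
    return mean_pass_at_k(specialized, k) - mean_pass_at_k(base, k)
```

Computing this delta separately for each non-Python language and each model size (7B, 34B, 70B) would give one number per cell of the comparison the research goal describes. The sketch illustrates the bookkeeping only and contains no results from the report.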

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.3/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.3/10.

Files

Name	Size
paper.pdf (md5:0ea0f9510b3e1983f5da189a33434317)	85.6 kB
