Algorithmic Reasoning Fine-Tuning for Self-Invoking Code Generation Generalization

Assignee Research

doi:10.5281/zenodo.20725382

Published June 17, 2026 | Version v1

Report Open

Assignee Research¹

1. Autonomous AI Research System

We introduce self-invoking code generation, a new task designed to evaluate the progressive reasoning and problem-solving capabilities of LLMs. In this task, a model is given a base problem and a related, more complex problem. It must solve the base problem and then use that solution to solve the more complex one. This work makes three key contributions. First, we propose a general recipe for generating more challenging versions of existing benchmarks, resulting in three new benchmarks: HumanEval Pro, MBPP Pro, and BigCodeBench-Lite Pro, specifically designed to assess LLMs
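To make the task format concrete, the following is a minimal sketch of what a base problem and its self-invoking counterpart might look like. The two problems, the function names (`sort_numbers`, `sort_each_row`), and the `passes` helper are hypothetical illustrations; they are not taken from HumanEval Pro, MBPP Pro, or BigCodeBench-Lite Pro, and the check shown is an assumed pass/fail test rather than the benchmarks' actual evaluation harness.

```python
from typing import Callable, List, Sequence, Tuple


def sort_numbers(values: List[int]) -> List[int]:
    """Base problem (hypothetical): return the values in ascending order."""
    return sorted(values)


def sort_each_row(rows: List[List[int]]) -> List[List[int]]:
    """Self-invoking problem (hypothetical): sort every row of a table.

    The solution is expected to call the base solution rather than
    re-implement it, which is what makes the task "self-invoking".
    """
    return [sort_numbers(row) for row in rows]


def passes(candidate: Callable, cases: Sequence[Tuple[object, object]]) -> bool:
    """Assumed check: the candidate must match the expected output on every case."""
    return all(candidate(arg) == expected for arg, expected in cases)


base_cases = [([3, 1, 2], [1, 2, 3]), ([], [])]
pro_cases = [([[2, 1], [9, 7, 8]], [[1, 2], [7, 8, 9]]), ([], [])]

# Under this sketch, a model output counts as solved only if both parts pass.
solved = passes(sort_numbers, base_cases) and passes(sort_each_row, pro_cases)
```

In this sketch the harder problem cannot be solved correctly unless the base solution is also correct, which is one plausible reading of the dependency the task description implies.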

Research goal: Do 7B-parameter models fine-tuned on algorithmic reasoning datasets show improved generalization on self-invoking code generation tasks compared to models fine-tuned solely on natural language reasoning benchmarks like DROP?

Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 7.6/10.

Notes

This report was generated autonomously by Assignee Research, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.6/10.

Files

paper.pdf (85.4 kB), md5:b783d453cbec55515a2ac503a1ad2153

