aikenkyu001/iterative_self_healing_benchmark: v1.0.0: Scaffolding Trinity for Deterministic LLM Code Generation
Authors/Creators
Fumio Miyata
Description
Overview
This repository provides the systematic framework and official implementation of the "Scaffolding Trinity" methodology, designed to overcome the inherent "functional non-determinism" of Large Language Models (LLMs). By introducing a structured formal language, Sigma-Lisp, we establish a reliable pathway for converting complex natural language requirements into deterministic, executable code.
Through 12 experimental phases, this project demonstrates how LLMs can be transformed from probabilistic inference engines into "Deterministic Transducers" that map rigorous specifications to precise implementations with near-perfect reliability.
Key Achievements
- Achievement of Deterministic Success: In the most challenging NL$\to$Lisp$\to$Code pipeline, the gemma3:12b model achieved a 100% success rate across all 30 algorithmic tasks (30/30; Clopper–Pearson 95% CI: [88.4%, 100%]).
- Establishment of the Scaffolding Trinity:
- Specification Scaffold: Lisp-cognizant design documents that align NL requirements with functional programming paradigms.
- Grammar Scaffold: Formal language extensions via Sigma-Lisp, providing conceptual helper functions (e.g., Tries, Queues) as a built-in library.
- Execution Scaffold: A specialized two-stage temperature strategy ($T=1.0 \to T=0.0$) combined with prescriptive Python implementation guides.
- High-Difficulty Benchmark: A comprehensive set of 30 algorithmic challenges, including Sudoku Solvers, Boggle Solvers, and Optimal BST Cost calculations.
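The reported confidence interval for the 30/30 result can be reproduced with an exact (Clopper–Pearson) binomial calculation. The sketch below is not part of the release scripts; it inverts the binomial CDF by bisection using only the standard library:

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided CI for k successes out of n trials."""
    def solve(f):
        # Bisection on [0, 1]; f is monotone decreasing with a sign change.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # Lower bound solves P(X >= k | p) = alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound solves P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(30, 30)
print(f"[{lo:.1%}, {hi:.1%}]")  # -> [88.4%, 100.0%], matching the reported CI
```

For the boundary case k = n the lower bound reduces to the closed form (alpha/2)^(1/n) = 0.025^(1/30) ≈ 0.884, which is where the reported 88.4% comes from.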
Release Contents
- 01_TestDefinitions: Requirement definitions and automated test runners for all 30 tasks.
- 02_Prompts: High-precision prompt templates embodying the Trinity strategy.
- 03_Scripts: Python scripts for automated benchmarking, self-healing loops, and statistical analysis.
- 04_RawData: Complete execution logs from Phase 1 through Phase 12, including raw inference results for multiple model families.
- 05_Reports: Academic papers and final reports available in Japanese, English, and LaTeX formats.
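As a rough sketch of the self-healing loop implemented in 03_Scripts (the `generate`/`run_tests` callback interfaces and the retry budget here are illustrative assumptions, not the release's actual API): each iteration runs the candidate against the task's tests and feeds the failure message back into the next generation attempt.

```python
from typing import Callable, Optional

def self_healing_loop(
    generate: Callable[[str], str],           # LLM call: prompt -> candidate source (assumed interface)
    run_tests: Callable[[str], Optional[str]],  # None on success, else an error message
    spec: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Iteratively regenerate code, appending each test failure to the prompt."""
    prompt = spec
    for _ in range(max_attempts):
        candidate = generate(prompt)
        error = run_tests(candidate)
        if error is None:
            return candidate  # all tests passed
        # Feed the failure back so the next attempt can repair it.
        prompt = f"{spec}\n\nPrevious attempt failed with:\n{error}\nFix the code."
    return None  # retry budget exhausted

# Toy usage with stub callbacks (no real model involved):
attempts = iter(["def add(a, b): return a - b", "def add(a, b): return a + b"])
def fake_generate(prompt: str) -> str:
    return next(attempts)
def fake_run_tests(src: str) -> Optional[str]:
    ns: dict = {}
    exec(src, ns)
    return None if ns["add"](2, 3) == 5 else "add(2, 3) != 5"

print(self_healing_loop(fake_generate, fake_run_tests, "Write add(a, b)."))
```

The key design point is that the loop is stateless apart from the growing prompt: every attempt sees the original spec plus the most recent failure, which keeps context small while still steering the repair.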
Reproducibility
All experimental data included in this release were obtained under reproducible conditions using the Ollama inference server. Detailed parameters, including prompts, fixed random seeds, and temperature settings, are explicitly documented in the accompanying papers and scripts.
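A minimal sketch of how such conditions are typically pinned via Ollama's HTTP API (the model name, seed value, and prompts below are illustrative placeholders; the release's exact parameters are documented in the papers and 03_Scripts): stage one samples the Sigma-Lisp specification at T=1.0, and stage two transduces it to code at T=0.0 with a fixed seed.

```python
import json

# Two-stage temperature strategy (T=1.0 -> T=0.0), expressed as Ollama
# /api/generate request payloads. Model name and seed are placeholders.
def make_request(prompt: str, temperature: float, seed: int = 42) -> dict:
    return {
        "model": "gemma3:12b",
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": temperature,  # 1.0: exploratory spec; 0.0: greedy code gen
            "seed": seed,                # fixed seed for reproducibility
        },
    }

stage1 = make_request("Write a Sigma-Lisp spec for the task...", temperature=1.0)
stage2 = make_request("Translate this Sigma-Lisp spec to Python...", temperature=0.0)

# To actually run, POST each payload to a local Ollama server, e.g.:
#   requests.post("http://localhost:11434/api/generate", json=stage1)
print(json.dumps(stage2["options"], indent=2))
```

With `temperature` set to 0.0 and a fixed `seed`, repeated stage-two calls against the same model weights should yield identical completions, which is the property the benchmark's determinism claims rest on.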
Future Directions
This release serves as a significant milestone in the study of LLM reliability. Future work will focus on the end-to-end automation of the scaffolding generation process and the validation of this methodology across a broader range of model architectures and industrial-scale tasks.
Author: Fumio Miyata
DOI: [Pending Acquisition]
Repository: https://github.com/aikenkyu001/iterative_self_healing_benchmark
Files
| Name | Size | md5 |
|---|---|---|
| aikenkyu001/iterative_self_healing_benchmark-V1.0.0.zip | 3.5 MB | 85fbd6f2a15bf1176d68c5b389c5f0e8 |
Additional details
Related works
- Is supplement to: https://github.com/aikenkyu001/iterative_self_healing_benchmark/tree/V1.0.0 (Software)