# Dataset & Experimental Results
> **⚠️ For IEEE Access Reviewers**
>
> Here are the examples mentioned in the paper:
>
> - System prompt for all experiments: `dataset/system_prompt/system_prompt.pdf`
> - Dataset prompt comparison: `datasets/prompt_comparison/prompt_comparison.pdf`
> - RQ2 qualitative analysis: `results/rq2_qualitative_analysis/summary/summary.pdf`
> - Code Contests dataset transformation: `datasets/code_contest_transformation/code_contest_transformation.pdf`

This repository contains the experimental data and analysis for a research study demonstrating how **Test-Driven Prompting (TDP)** significantly improves AI code generation across multiple programming languages and difficulty levels. All experimental data, statistical analysis, and qualitative comparisons are included.
## What is Test-Driven Prompting?
Instead of just asking an AI to write code, Test-Driven Prompting includes example test cases in the prompt. This helps the AI understand exactly what the code should do, similar to how human programmers use test-driven development.
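As a concrete illustration, a TDP prompt can be assembled by appending example test cases to the task description. This is a minimal sketch only: the actual system prompt and template used in the study are documented in `dataset/system_prompt/system_prompt.pdf`, and the function and wording below are hypothetical.

```python
# Hypothetical sketch of Test-Driven Prompting: attach example test cases
# to a code-generation request so the model sees the expected behavior.
# The exact template used in the study differs; see the dataset files.

def build_tdp_prompt(task_description: str, test_cases: list[str]) -> str:
    """Combine a task description with example test cases into one prompt."""
    tests = "\n".join(test_cases)
    return (
        "Write a Python function for the following task:\n"
        f"{task_description}\n\n"
        "The solution must pass these test cases:\n"
        f"{tests}"
    )

prompt = build_tdp_prompt(
    "Return the sum of the even numbers in a list.",
    ["assert sum_even([1, 2, 3, 4]) == 6", "assert sum_even([]) == 0"],
)
print(prompt)
```

Compared with a plain request, the test cases pin down edge cases (such as the empty list) that a short natural-language description leaves implicit.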

## Key Findings
Our study tested 8 AI models (from the GPT-4, Claude, and Qwen families) on 3 programming benchmarks and found:
- **Universal Improvement**: TDP outperformed standard prompting in all 16 of 16 model-dataset combinations
- **Significant Gains**: an average 6.09% improvement in code-generation success rates (95% CI: [4.01, 8.18], p < 0.0001, Cohen's d = 1.08)
- **Efficiency Boost**: smaller AI models with TDP can outperform larger models using standard prompts
- **Biggest Impact**: TDP helps most on problems with unclear or implicit requirements
The findings suggest that including test cases in your prompts to AI coding assistants can significantly improve the quality of generated code, especially for complex or ambiguous programming tasks.
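For reference, the reported effect size (Cohen's d for paired differences) is the mean per-combination improvement divided by the standard deviation of those improvements. The sketch below uses hypothetical numbers, not the study's actual per-combination results:

```python
# Sketch of the paired-differences effect size (Cohen's d).
# The improvement figures here are made up for illustration only;
# the study's real per-combination results live in raw_results/.
import statistics

def cohens_d(diffs: list[float]) -> float:
    """Effect size for paired differences: mean divided by sample std dev."""
    return statistics.mean(diffs) / statistics.stdev(diffs)

# Hypothetical per-combination improvements in success rate (percentage points)
improvements = [4.2, 7.1, 5.5, 8.0, 6.3, 5.0, 7.4, 5.2]
d = cohens_d(improvements)
print(f"mean improvement = {statistics.mean(improvements):.2f} pp, d = {d:.2f}")
```

A d near 1 or above is conventionally read as a large effect, which is why the reported d = 1.08 supports the "significant gains" claim.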
## Authors
**Muhammad Rizqullah** (mrizqullah@stu.kau.edu.sa) and **Emad Albassam** (ealbassam@kau.edu.sa)
Computer Science Department, King Abdulaziz University, Jeddah, Saudi Arabia
*Corresponding author: Muhammad Rizqullah. Any enquiries about the research should be directed to him.*
## Publication Status
This research is currently under review at [IEEE Access](https://ieeeaccess.ieee.org/), a Scopus Q2 and Web of Science SCIE Q2 journal with an Impact Factor of 3.6.
## Repository Structure
- `datasets/` - Programming problems and test cases from HumanEval, MBPP, and Code Contests
- `raw_results/` - Complete experimental results for each AI model and dataset combination
- `results/` - Statistical analysis and comparison reports
- `abstract/` - Academic paper abstract with statistical findings
## Source Code Availability
The source code of the experimentation framework is available at: https://github.com/madnanrizqu/model-agnostic-empirical-evaluation-of-test-driven-prompt-engineering
## Files
- `thesis-ieee-data.zip` (26.1 MB), md5: `0dafea55d0b8d42bfa9eb9e0a0a32775`
## Additional Details
**Software**
- Repository URL: https://github.com/madnanrizqu/model-agnostic-empirical-evaluation-of-test-driven-prompt-engineering
- Programming languages: Python, JavaScript
- Development status: Active