Dataset & Experimental Results

Rizqullah, Muhammad; Albassam, Emad

doi:10.5281/zenodo.17609435

Published November 14, 2025 | Version v3

Dataset Open

1. King Abdulaziz University

> **⚠️ For IEEE Access Reviewers**
>
> Here are the examples mentioned in the paper:
>
> - System Prompt for all experiments in `dataset/system_prompt/system_prompt.pdf`
> - Dataset Prompt Comparison in `datasets/prompt_comparison/prompt_comparison.pdf`
> - RQ2 Qualitative Analysis in `results/rq2_qualitative_analysis/summary/summary.pdf`
> - Code Contest dataset transformation in `datasets/code_contest_transformation/code_contest_transformation.pdf`

This repository contains the experimental data and analysis for a research study demonstrating how **Test-Driven Prompting (TDP)** significantly improves AI code generation across multiple programming languages and difficulty levels. All experimental data, statistical analysis, and qualitative comparisons are included.

## What is Test-Driven Prompting?

Instead of just asking an AI to write code, Test-Driven Prompting includes example test cases in the prompt. This helps the AI understand exactly what the code should do, similar to how human programmers use test-driven development.

![Methodology overview](methodology_overview.png)
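
To make the idea concrete, here is a minimal sketch of how a normal prompt and a test-driven prompt might be assembled. The function names, the prompt wording, and the `sum_even` task are hypothetical and chosen only for illustration; the prompts actually used in the experiments are the ones documented in the PDF files of this repository.

```python
def build_normal_prompt(task: str) -> str:
    """Baseline prompt: the task description only."""
    return f"Write a Python function for the following task.\n\nTask: {task}\n"


def build_tdp_prompt(task: str, tests: list[str]) -> str:
    """Test-driven prompt: the same task plus example test cases."""
    test_block = "\n".join(tests)
    return (
        f"{build_normal_prompt(task)}\n"
        "Your solution must pass these test cases:\n"
        f"{test_block}\n"
    )


# Hypothetical task and tests, for illustration only.
task = "Return the sum of the even numbers in a list of integers."
tests = [
    "assert sum_even([1, 2, 3, 4]) == 6",
    "assert sum_even([]) == 0",
]

prompt = build_tdp_prompt(task, tests)
print(prompt)
```

The only difference between the two prompts is the appended test block, which is what lets the model infer details such as the expected function name and the behavior on an empty list.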

## Key Findings

Our study tested 8 different AI models (GPT-4, Claude, Qwen) on 3 programming benchmarks and found:

- 🎯 **Universal Improvement**: TDP outperformed normal prompts in all 16 model-dataset combinations (16/16)
- 📈 **Significant Gains**: Average improvement of 6.09% in code generation success rate (95% CI: [4.01, 8.18], p < 0.0001, Cohen’s d = 1.08)
- 🚀 **Efficiency Boost**: Smaller AI models with TDP can outperform larger models using normal prompts
- 💡 **Biggest Impact**: Most helpful for problems with unclear or implicit requirements
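
For readers unfamiliar with the reported statistics, the sketch below shows one common way to obtain a mean improvement, a 95% confidence interval, and Cohen's d from paired success rates (normal prompt vs. TDP on the same model-dataset combination). It is an assumption that a paired formulation applies here; the exact procedure used in the study is described in the paper and the `results/` reports, not reproduced by this code. The numbers are hypothetical and are not data from the study.

```python
import math
import statistics


def paired_summary(baseline, tdp, t_crit):
    """Mean paired difference, its confidence interval, and Cohen's d.

    t_crit is the two-sided Student-t critical value for len(baseline) - 1
    degrees of freedom, supplied by the caller to avoid a SciPy dependency.
    """
    diffs = [t - b for b, t in zip(baseline, tdp)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    half_width = t_crit * sd_diff / math.sqrt(n)
    cohens_d = mean_diff / sd_diff  # effect size for paired samples
    return mean_diff, (mean_diff - half_width, mean_diff + half_width), cohens_d


# Hypothetical success rates (%) for illustration only; not data from the study.
baseline = [70.0, 65.0, 80.0, 55.0, 90.0, 60.0, 75.0, 85.0]
tdp = [76.0, 70.0, 84.0, 66.0, 91.0, 68.0, 80.0, 88.0]

# 2.365 is the two-sided 95% critical value of Student's t for 7 degrees of freedom.
mean_diff, ci, d = paired_summary(baseline, tdp, t_crit=2.365)
print(f"mean diff = {mean_diff:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}], d = {d:.2f}")
```

A confidence interval that excludes zero, as in the reported [4.01, 8.18], indicates that the average gain is unlikely to be due to chance alone.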

The findings suggest that including test cases in your prompts to AI coding assistants can significantly improve the quality of generated code, especially for complex or ambiguous programming tasks.

## Authors

**Muhammad Rizqullah** (mrizqullah@stu.kau.edu.sa) and **Emad Albassam** (ealbassam@kau.edu.sa)
Computer Science Department, King Abdulaziz University, Jeddah, Saudi Arabia

*Corresponding author: Muhammad Rizqullah. Any enquiries about the research should be directed to him.*

## Publication Status

This research is currently under review at [IEEE Access](https://ieeeaccess.ieee.org/), a Scopus Q2 and Web of Science SCIE Q2 journal with an Impact Factor of 3.6.

## Repository Structure

- `datasets/` - Programming problems and test cases from HumanEval, MBPP, and Code Contests
- `raw_results/` - Complete experimental results for each AI model and dataset combination
- `results/` - Statistical analysis and comparison reports
- `abstract/` - Academic paper abstract with statistical findings

## Source Code Availability

Source code of the experimentation framework is available here: https://github.com/madnanrizqu/model-agnostic-empirical-evaluation-of-test-driven-prompt-engineering

## Files

- `thesis-ieee-data.zip` (26.1 MB), md5:0dafea55d0b8d42bfa9eb9e0a0a32775
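
After downloading, the archive can be checked against the MD5 checksum listed for it. The following is a minimal sketch using only the Python standard library; the local path `thesis-ieee-data.zip` is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "0dafea55d0b8d42bfa9eb9e0a0a32775"  # checksum published with the archive


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("thesis-ieee-data.zip")  # assumed download location
if archive.exists():
    status = "OK" if md5_of_file(archive) == EXPECTED_MD5 else "MISMATCH"
    print(f"checksum {status}")
else:
    print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of archive size.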

## Additional details

- Repository URL: https://github.com/madnanrizqu/model-agnostic-empirical-evaluation-of-test-driven-prompt-engineering
- Programming language: Python, JavaScript
- Development Status: Active
