LLM4DS-Benchmark: A Dataset for Assessing LLM Performance in Data Science Coding Tasks
Description
The LLM4DS-Benchmark dataset is a resource designed to evaluate the performance of Large Language Models (LLMs) in solving data science coding tasks. It was developed as part of the research presented in the paper “How Effective are LLMs for Data Science Coding? A Controlled Experiment.”
The dataset includes:
- Prompt templates for different types of problems.
- Problem IDs with associated metadata and links.
- Model-generated code solutions for successful outputs.
- Execution results compiled into a comprehensive spreadsheet.
Dataset Contents
1. Prompt Templates (prompt-templates/)
• This folder contains the prompt templates used for the three problem types: algorithm, analytical, and visualization. The templates were used to automatically convert the problems listed in the .json files into the prompt format, as sketched below.
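As a rough illustration, a template can be filled from the problem metadata as follows. The template file name, the placeholder names, and the shape of the metadata fields are assumptions for illustration and may differ from the actual files.

```python
import json
from pathlib import Path

# Hypothetical file names; substitute the actual template and metadata files.
template = Path("prompt-templates/analytical.txt").read_text()

with Path("problems-id/easy.json").open() as f:
    problems = json.load(f)

for problem in problems:
    # Assumes the template uses {link} and {topics} placeholders and that
    # "Topics" is a list of strings; adjust to the real template format.
    prompt = template.format(
        link=problem["Link"],
        topics=", ".join(problem["Topics"]),
    )
    print(prompt)
```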
2. Problem Metadata (problems-id/)
• The easy.json, medium.json, and hard.json files organize the selected problems by difficulty and contain metadata for each selected problem, including the following fields (see the loading sketch after this list):
• ID: Unique identifier for the problem.
• Link: Direct URL to the problem on the StrataScratch platform.
• Type: Problem category (algorithm, analytical, or visualization).
• Topics: Main topics associated with the problem.
• Public Problem Descriptions: While the problems are publicly available on the StrataScratch platform, we have omitted full problem descriptions from our repository. Instead, we provide the problem IDs and direct links to the StrataScratch website, ensuring compliance with their terms of service.
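A minimal loading sketch, assuming each .json file is a list of objects keyed by the fields above (the exact JSON structure is an assumption):

```python
import json
from pathlib import Path

# Load the metadata for all three difficulty levels.
problems = {}
for difficulty in ("easy", "medium", "hard"):
    with Path("problems-id", f"{difficulty}.json").open() as f:
        problems[difficulty] = json.load(f)

# Field names follow the description above (ID, Link, Type, Topics).
for difficulty, entries in problems.items():
    for entry in entries:
        print(difficulty, entry["ID"], entry["Type"], entry["Link"])
```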
3. Generated Code Solutions (generated-code/)
• This folder contains all successfully generated code solutions, organized as follows (see the traversal sketch after this list):
• Categories: Subfolders for algorithm, analytical, and visualization problems.
• Difficulty Levels: Each category contains subfolders for easy, medium, and hard problems.
• Problem IDs: Solutions for individual problems are stored in subfolders named after their problem IDs.
• File Format: Solutions are saved as .py files.
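The layout above can be traversed with standard library tools alone; this sketch assumes the .py files sit directly inside each problem-ID folder:

```python
from pathlib import Path

# Count solution files under generated-code/<category>/<difficulty>/<problem-id>/.
root = Path("generated-code")
for category in ("algorithm", "analytical", "visualization"):
    for difficulty in ("easy", "medium", "hard"):
        folder = root / category / difficulty
        if not folder.is_dir():
            continue  # skip any category/difficulty combination that is absent
        solutions = sorted(folder.glob("*/*.py"))
        print(f"{category}/{difficulty}: {len(solutions)} solution files")
```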
4. Execution Results (LLM4DS-Execution-Results.xlsx)
• This Excel file provides a detailed summary of the dataset and the evaluation results. It includes the following sheets (see the loading sketch after this list):
- Selected Problems: Metadata for the 100 selected problems, including:
• Topics: Main topics covered by each question.
• Reasoning: Why the problem was selected.
• Company: The company that originally used the problem.
- Copilot-Results, ChatGPT-Results, Perplexity-Results, and Claude-Results: Performance results for each LLM on the 100 problems.
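A minimal sketch for loading the workbook with pandas (the openpyxl engine must be installed); the sheet names are taken from the list above, but their exact spelling in the file should be verified:

```python
import pandas as pd

XLSX = "LLM4DS-Execution-Results.xlsx"

# Sheet names follow the description above; verify them against the workbook.
selected = pd.read_excel(XLSX, sheet_name="Selected Problems")
results = {
    model: pd.read_excel(XLSX, sheet_name=f"{model}-Results")
    for model in ("Copilot", "ChatGPT", "Perplexity", "Claude")
}

print(selected.shape)
for model, df in results.items():
    print(model, df.shape)
```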
For further details, refer to the linked paper.
Files

| Name | Checksum | Size |
|---|---|---|
| LLM4DS-Dataset.zip | md5:81ea43c2922b22714da210e8e028189f | 515.1 kB |
Additional details
Software
- Repository URL: https://github.com/ABSanthosh/RA-Week-3-work