LLM4DS-Benchmark: A Dataset for Assessing LLM Performance in Data Science Coding Tasks
Description
The LLM4DS-Benchmark dataset is a resource designed to evaluate the performance of Large Language Models (LLMs) in solving data science coding tasks. It was developed as part of the research presented in the paper “How Effective are LLMs for Data Science Coding? A Controlled Experiment.”
The dataset includes:
- Prompt templates for different types of problems.
- Problem IDs with associated metadata and links.
- Model-generated code solutions for successful outputs.
- Execution results compiled into a comprehensive spreadsheet.
Dataset Contents
1. Prompt Templates (prompt-templates/)
• This folder contains the prompt templates used for the three problem types: algorithm, analytical, and visualization. The templates were used to automatically convert the problems listed in the .json files into the prompt format, as sketched below.
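As a rough illustration, a template can be filled from the problem metadata as follows. The template file name, the placeholder names, and the shape of the metadata fields are assumptions for illustration and may differ from the actual files.

```python
import json
from pathlib import Path

# Hypothetical file names; substitute the actual template and metadata files.
template = Path("prompt-templates/analytical.txt").read_text()

with Path("problems-id/easy.json").open() as f:
    problems = json.load(f)

for problem in problems:
    # Assumes the template uses {link} and {topics} placeholders and that
    # "Topics" is a list of strings; adjust to the real template format.
    prompt = template.format(
        link=problem["Link"],
        topics=", ".join(problem["Topics"]),
    )
    print(prompt)
```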
2. Problem Metadata (problems-id/)
• The easy.json, medium.json, and hard.json files organize the selected problems by difficulty and contain metadata for each selected problem, including the following fields (see the loading sketch after this list):
• ID: Unique identifier for the problem.
• Link: Direct URL to the problem on the StrataScratch platform.
• Type: Problem category (algorithm, analytical, or visualization).
• Topics: Main topics associated with the problem.
• Public Problem Descriptions: While the problems are publicly available on the StrataScratch platform, we have omitted full problem descriptions from our repository. Instead, we provide the problem IDs and direct links to the StrataScratch website, ensuring compliance with their terms of service.
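A minimal loading sketch, assuming each .json file is a list of objects keyed by the fields above (the exact JSON structure is an assumption):

```python
import json
from pathlib import Path

# Load the metadata for all three difficulty levels.
problems = {}
for difficulty in ("easy", "medium", "hard"):
    with Path("problems-id", f"{difficulty}.json").open() as f:
        problems[difficulty] = json.load(f)

# Field names follow the description above (ID, Link, Type, Topics).
for difficulty, entries in problems.items():
    for entry in entries:
        print(difficulty, entry["ID"], entry["Type"], entry["Link"])
```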
3. Generated Code Solutions (generated-code/)
• This folder contains all successfully generated code solutions, organized as follows (see the traversal sketch after this list):
• Categories: Subfolders for algorithm, analytical, and visualization problems.
• Difficulty Levels: Each category contains subfolders for easy, medium, and hard problems.
• Problem IDs: Solutions for individual problems are stored in subfolders named after their problem IDs.
• File Format: Solutions are saved as .py files.
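The layout above can be traversed with standard library tools alone; this sketch assumes the .py files sit directly inside each problem-ID folder:

```python
from pathlib import Path

# Count solution files under generated-code/<category>/<difficulty>/<problem-id>/.
root = Path("generated-code")
for category in ("algorithm", "analytical", "visualization"):
    for difficulty in ("easy", "medium", "hard"):
        folder = root / category / difficulty
        if not folder.is_dir():
            continue  # skip any category/difficulty combination that is absent
        solutions = sorted(folder.glob("*/*.py"))
        print(f"{category}/{difficulty}: {len(solutions)} solution files")
```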
4. Execution Results (LLM4DS-Execution-Results.xlsx)
• This Excel file provides a detailed summary of the dataset and the evaluation results. It includes the following sheets (see the loading sketch after this list):
- Selected Problems: Metadata for the 100 selected problems, including:
• Topics: Main topics covered by each question.
• Reasoning: Why the problem was selected.
• Company: The company that originally used the problem.
- Copilot-Results, ChatGPT-Results, Perplexity-Results, and Claude-Results: Performance results for each LLM on the 100 problems.
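A minimal sketch for loading the workbook with pandas (the openpyxl engine must be installed); the sheet names are taken from the list above, but their exact spelling in the file should be verified:

```python
import pandas as pd

XLSX = "LLM4DS-Execution-Results.xlsx"

# Sheet names follow the description above; verify them against the workbook.
selected = pd.read_excel(XLSX, sheet_name="Selected Problems")
results = {
    model: pd.read_excel(XLSX, sheet_name=f"{model}-Results")
    for model in ("Copilot", "ChatGPT", "Perplexity", "Claude")
}

print(selected.shape)
for model, df in results.items():
    print(model, df.shape)
```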
For further details, refer to the linked paper.
Files

| Name | Checksum | Size |
|---|---|---|
| LLM4DS-Dataset.zip | md5:81ea43c2922b22714da210e8e028189f | 515.1 kB |
Additional details
Software
- Repository URL: https://github.com/ABSanthosh/RA-Week-3-work