Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks

Anonymous

doi:10.5281/zenodo.13758938

Published February 25, 2025 | Version v2

Software Open

Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks

Anonymous

Replication Package for the FSE'25 Paper Titled: Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks

Additional visualisations of our data can be found in the [HuggingFace Space](https://huggingface.co/spaces/an0nymous/benchmarks)

Instructions

To run the project, follow these steps:

1. Create a virtual environment using conda by running: `conda create --name myenv python=3.10.8`. (Note that Python versions newer than 3.10 will not be supported by Autosklearn)
2. Activate the virtual environment by running: `conda activate myenv`.
3. Install the required Python packages by running: `pip install -r requirements.txt`.
4. Create 2 files with the respective API keys: `openai.key` and `openrouter.key` to run a single generation with the models we selected, you will need approximately €1 in OpenAI and €30 in Openrouter credits.

The code is tested on an Ubuntu 20.04 LTS machine with 32GB of RAM and an Intel Core i9-12900HK processor.

Results

The results of the experiments can be found in the `./results` folder, each file contains the results of a single model for a single generation for the entire dataset. The trained classifier can be found in the `./classification_models` folder, we only provide the best performing model to save space.

## Replication steps
1. **Classifier Training**: The `classifier.ipynb` notebook will run the experiments with the labelled data to create the classifiers. The classifiers are saved in the `/classification_models` folder.
2. **Sample Generation**: `generation.ipynb` notebook will run generation with all the models. The results are saved in the `./results`
3. **Sample Tagging**: `tagging.ipynb` will use the classifier from step 1 ot label the samples generated in the previous steps, the results will be saved in the `./results/tagged` folder.
4. **Plotting**: `plots.ipynb` will take all the results and compile them into several figures used in the paper, each figure is saved in the `./plots` folder.

Citation

Please cite our paper if you find our work useful:

```

Files

llm_harm_bench.zip

Files (1.7 GB)

Name	Size	Download all
llm_harm_bench.zip md5:cfe00fe0a570f5c57afdf2532ac6be24	1.7 GB	Preview Download

Additional details

Programming language: Python
Development Status: Wip

	All versions	This version
Views	139	8
Downloads	39	5
Data volume	46.2 GB	8.6 GB

Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks

Authors/Creators

Description

Instructions

Results

Citation

Files

llm_harm_bench.zip

Files (1.7 GB)

Additional details

Software