Published 2024 | Version v7

Dataset Open

Can LLMs Replace Manual Annotation of Software Engineering Artifacts?

Blinded

Required Libraries

The following libraries are required to run the scripts in this repository. You can install them using `pip`:

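```bash
pip install pandas numpy openai krippendorff scikit-learn seaborn matplotlib together anthropic google-generativeai
```

`argparse`, `json`, `time`, `random`, `copy`, and `statistics` are part of the Python standard library and do not need to be installed. `sklearn` is installed under the package name `scikit-learn`.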

Make sure to also install any other dependencies required by the specific model API if you plan on using models like GPT-4 or Claude:

openai
anthropic
together

All experiments were run with Python 3.10.11.
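Before running the scripts, it can help to confirm that the third-party packages are importable. The following is a minimal sketch, not part of the repository: `missing_modules` is a hypothetical helper, and the import names (for example `sklearn` for scikit-learn and `google.generativeai` for google-generativeai) are assumptions about how the scripts import these packages.

```python
import importlib.util

# Import names assumed for the third-party packages listed above.
REQUIRED = [
    "pandas", "numpy", "openai", "krippendorff", "sklearn",
    "seaborn", "matplotlib", "together", "anthropic", "google.generativeai",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is itself absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED)
    print("Missing packages:", ", ".join(absent) if absent else "none")
```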

Each dataset has its own folder containing process.py, heatmap.py, and ira_sample.py, along with the relevant datasets and plots.

File Description:

data_result: This folder contains the dataset file and the few-shot samples. Running process.py writes all results to this folder. Note that the folder already contains all the data and model-generated results as .jsonl files, so you do not need to run process.py to reproduce them.
Plots: This folder contains the generated plots, which can be regenerated by running heatmap.py and ira_sample.py.
process.py: This script generates the annotations from the model for the given parameters. The commands for running it are listed below. Note that you need API keys from the respective providers to run the script; all model-generated results are, however, already included in the data_result folder.
heatmap.py: Running this script generates the heatmaps presented in Figures 1-5 of the paper. The generated plots are stored in the "Plots" folder.
ira_sample.py: Running this script generates the plots presented in Figures 7-10 of the paper. The generated plots are stored in the "Plots" folder.
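Because the model outputs are shipped as .jsonl files, they can be inspected without any API key. The sketch below assumes the usual JSON Lines layout of one JSON object per line. `read_jsonl` is a hypothetical helper, the file name in the usage comment is a placeholder, and the record fields are not described here, so none are assumed.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of Python objects."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Example with a placeholder file name:
# records = read_jsonl("data_result/some_results.jsonl")
# print(len(records), "records loaded")
```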

Commands for datasets (Except Code Summarization):

Generating samples for different models:

python process.py --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
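The six invocations above differ only in the `--model` value, so they can be generated in a loop. The sketch below is an illustration under stated assumptions, not the authors' tooling. It reads the API keys from environment variables whose names are chosen here for the example, and falls back to the `xxxx` placeholder when a variable is unset. Run it from inside a dataset folder so that `process.py` resolves.

```python
import os
import subprocess
import sys

MODELS = ["gpt-4", "gpt-3.5-turbo", "llama3", "mixtral", "claude", "gemini"]

# CLI flag -> environment variable. The variable names are assumptions.
KEY_FLAGS = {
    "--openai_key": "OPENAI_API_KEY",
    "--together_key": "TOGETHER_API_KEY",
    "--claude_key": "ANTHROPIC_API_KEY",
    "--google_key": "GOOGLE_API_KEY",
}


def build_command(model, env=None, fewshot="yes"):
    """Build the argument list for one process.py run."""
    env = os.environ if env is None else env
    cmd = [sys.executable, "process.py", "--model", model, "--fewshot", fewshot]
    for flag, variable in KEY_FLAGS.items():
        cmd += [flag, env.get(variable, "xxxx")]
    return cmd


def run_all(dry_run=True):
    """Run process.py once per model, or only list the runs when dry_run is set."""
    for model in MODELS:
        cmd = build_command(model)
        if dry_run:
            print("would run process.py for model:", model)  # keys are not printed
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Passing the command as a list avoids shell quoting issues, and keeping the keys in the environment keeps them out of the shell history.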

For Figures 1-5:

python heatmap.py

For Figures 7-10:

python ira_sample.py

Commands for datasets (Code Summarization):

python process.py --what accurate --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

For Figures 1-5:

python heatmap.py

For Figures 7-10:

python ira_sample.py

The --what argument takes one of the following values: "accurate", "adequate", "concise", "similarity".
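For Code Summarization, the runs are the combinations of a `--what` value and a model. The sketch below only enumerates the command strings and leaves out the API key flags for brevity. Which `--what` values a given folder's process.py accepts is not stated here. The archive names suggest that accuracy/similarity and adequacy/conciseness live in separate folders, so the full cross product is an assumption made for illustration.

```python
from itertools import product

WHAT_VALUES = ["accurate", "adequate", "concise", "similarity"]
MODELS = ["gpt-4", "gpt-3.5-turbo", "llama3", "mixtral", "claude", "gemini"]


def code_summarization_runs(what_values=WHAT_VALUES, models=MODELS):
    """List one command string per (what, model) pair; key flags are omitted."""
    return [
        f"python process.py --what {what} --model {model} --fewshot yes"
        for what, model in product(what_values, models)
    ]


if __name__ == "__main__":
    for command in code_summarization_runs():
        print(command)
```

To restrict the list to one folder, pass only the values that folder handles, for example `code_summarization_runs(what_values=["accurate", "similarity"])`.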

For Figure 6:

python scatter.py

For Figures 12 and 13, first copy majority.py and probability.py to a location outside the shared dataset folders.

For Figure 12:

python probability.py

For Figure 13:

python majority.py

Sample prompts from all datasets are provided in Prompts.pdf.

Files

Files (65.4 MB)

Name	MD5	Size
causality.zip	708ccf6c6f7cf52735e1f5abecfcb967	3.8 MB
code_summarization_accuracy_similarity.zip	96571d0b27a8da40607f81fc8282ef52	11.2 MB
code_summarization_adequacy_conciseness.zip	9c31840c63ea21c78575ed9ee18c6bf4	10.6 MB
majority.py	eb7d21bb6b279ed85e2a0c1e650a8238	53.9 kB
name_value_inconsistency.zip	f000acd27e7ce6479e84610bbd18924d	201.2 kB
probability.py	bfb6ef529c23ebb6020dd3b7be3239c8	6.1 kB
Prompts.pdf	6cd91e1176476503244075c3ac25215e	124.1 kB
SA.zip	18e66a9ae9d345ba495ee0a3a110682d	11.5 MB
scatter.py	25a06fc92817a3b71f5e13b72528d371	937 Bytes
semantic_similarity.zip	026269bc9eb8bdb1fd25be788a66b952	28.0 MB
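The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`. `md5_of_file` and `verify` are hypothetical helper names, and the usage comment assumes the file sits in the current directory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True when the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example, using the checksum listed for scatter.py:
# print(verify("scatter.py", "25a06fc92817a3b71f5e13b72528d371"))
```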

Resource type: Dataset
Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: October 10, 2024
Modified: October 10, 2024