Published 2024 | Version v7
Dataset Open

Can LLMs Replace Manual Annotation of Software Engineering Artifacts?

Creators

Description

Required Libraries

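The following libraries are required to run the scripts in this repository. You can install the third-party ones using `pip`:

```bash
pip install pandas numpy openai krippendorff scikit-learn seaborn matplotlib together anthropic google-generativeai
```

The scripts also use the standard-library modules argparse, json, time, random, copy, and statistics, which ship with Python and do not need to be installed.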

Make sure to also install any other dependencies required by the specific model API if you plan on using models like GPT-4 or Claude:

  • openai
  • anthropic
  • together
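Before running any script, it can help to confirm that the packages are importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the list uses import names, which differ from the PyPI names for scikit-learn (`sklearn`) and google-generativeai (`google.generativeai`).

```python
import importlib.util

# Import names, not PyPI names: scikit-learn imports as "sklearn",
# google-generativeai imports as "google.generativeai".
REQUIRED_MODULES = [
    "pandas", "numpy", "openai", "krippendorff", "sklearn",
    "seaborn", "matplotlib", "together", "anthropic", "google.generativeai",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (for example "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

Any name printed as missing can be installed with `pip` under its PyPI name.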

All experiments were run with Python 3.10.11.

Each dataset has its own folder containing process.py, heatmap.py, and ira_sample.py, along with the relevant datasets and plots.

File Description:

  1. data_result: This folder contains the dataset file and the few-shot samples. Running process.py writes all results to this folder. Note that the folder already contains all the data and model-generated results as .jsonl files, so you do not need to run process.py to regenerate them.
  2. Plots: This folder contains the plots produced by heatmap.py and ira_sample.py.
  3. process.py: This script generates the results/annotations from the model based on the given parameters. The commands to run it are listed below. Note that you need API keys from the respective providers to run the script. However, all model-generated results are already shared in the data_result folder.
  4. heatmap.py: Running this script generates the heatmaps presented in Figures 1-5 of the paper. The generated plots are stored in the "Plots" folder.
  5. ira_sample.py: Running this script generates the plots presented in Figures 7-10 of the paper. The generated plots are stored in the "Plots" folder.
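Because the shared results are stored as .jsonl files, they can be inspected without calling any model API. The sketch below is an illustration only: `load_jsonl` is a hypothetical helper, and the record fields inside each file are not documented here, so the code only counts records per file.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of parsed records, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Assumes the script is run from inside a dataset folder that has data_result.
result_dir = Path("data_result")
if result_dir.is_dir():
    for jsonl_path in sorted(result_dir.glob("*.jsonl")):
        print(jsonl_path.name, len(load_jsonl(jsonl_path)))
```

Run from a dataset folder, this prints each result file with its record count; run elsewhere, it prints nothing.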

Commands for datasets (Except Code Summarization):

Generating samples for different models:

python process.py --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx 

python process.py --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
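The six commands above differ only in the `--model` value, so they can be generated in a loop. This is a sketch under stated assumptions: the flag names and model names are taken from the commands above, while the environment variable names (`OPENAI_API_KEY` and so on) and the helper `build_command` are hypothetical choices made for this example.

```python
import os
import shlex

MODELS = ["gpt-4", "gpt-3.5-turbo", "llama3", "mixtral", "claude", "gemini"]

# Flag names come from the commands above; the environment variable
# names on the right are assumptions made for this sketch.
KEY_FLAGS = {
    "--openai_key": "OPENAI_API_KEY",
    "--together_key": "TOGETHER_API_KEY",
    "--claude_key": "ANTHROPIC_API_KEY",
    "--google_key": "GOOGLE_API_KEY",
}


def build_command(model, fewshot="yes", env=None):
    """Build the process.py argument list for one model."""
    env = os.environ if env is None else env
    cmd = ["python", "process.py", "--model", model, "--fewshot", fewshot]
    for flag, variable in KEY_FLAGS.items():
        # Fall back to the "xxxx" placeholder when the variable is unset.
        cmd += [flag, env.get(variable, "xxxx")]
    return cmd


# Dry run: print the commands with placeholder keys instead of executing them.
for model in MODELS:
    print(shlex.join(build_command(model, env={})))
```

To execute a command rather than print it, the list returned by `build_command(model)` can be passed to `subprocess.run(..., check=True)` from inside the dataset folder.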

For Figure (1-5):

python heatmap.py

For Figure (7-10):

python ira_sample.py

 

Commands for datasets (Code Summarization):

python process.py --what accurate --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx 

python process.py --what accurate --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

 

For Figure (1-5):

python heatmap.py

For Figure (7-10):

python ira_sample.py

The --what option accepts one of the following values: "accurate", "adequate", "concise", "similarity".
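The Code Summarization commands above are shown only for `accurate`. Assuming each `--what` value is run for every model in the same way (the source does not state this explicitly), the full set of invocations could be enumerated as in the sketch below. The function name `summarization_commands` is hypothetical.

```python
from itertools import product

# Values and model names as listed in this description.
WHAT_VALUES = ["accurate", "adequate", "concise", "similarity"]
MODELS = ["gpt-4", "gpt-3.5-turbo", "llama3", "mixtral", "claude", "gemini"]


def summarization_commands(key="xxxx"):
    """Return one process.py command string per (what, model) pair."""
    commands = []
    for what, model in product(WHAT_VALUES, MODELS):
        commands.append(
            f"python process.py --what {what} --model {model} --fewshot yes "
            f"--openai_key {key} --together_key {key} "
            f"--claude_key {key} --google_key {key}"
        )
    return commands


# Print the commands so they can be reviewed or pasted into a shell.
for command in summarization_commands():
    print(command)
```

The keys stay as `xxxx` placeholders here; replace them with real keys before running any of the printed commands.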

 

For Figure 6:

python scatter.py

 

For Figures 12 and 13, please copy majority.py and probability.py outside the shared folders.

For Figure 12:

python probability.py

For Figure 13:

python majority.py

 

We also provide sample prompts for all datasets in Prompts.pdf.

Files

causality.zip

Files (65.4 MB)

Checksum                                  Size
md5:708ccf6c6f7cf52735e1f5abecfcb967      3.8 MB
md5:96571d0b27a8da40607f81fc8282ef52      11.2 MB
md5:9c31840c63ea21c78575ed9ee18c6bf4      10.6 MB
md5:eb7d21bb6b279ed85e2a0c1e650a8238      53.9 kB
md5:f000acd27e7ce6479e84610bbd18924d      201.2 kB
md5:bfb6ef529c23ebb6020dd3b7be3239c8      6.1 kB
md5:6cd91e1176476503244075c3ac25215e      124.1 kB
md5:18e66a9ae9d345ba495ee0a3a110682d      11.5 MB
md5:25a06fc92817a3b71f5e13b72528d371      937 Bytes
md5:026269bc9eb8bdb1fd25be788a66b952      28.0 MB
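The `md5:` values listed above can be used to check that a download is intact. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and because the listing does not show which checksum belongs to which file, the pairing has to be taken from the record page.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:708c...'."""
    expected = listed.strip().removeprefix("md5:").lower()
    return file_md5(path) == expected
```

For example, `matches_listing("causality.zip", "md5:...")` returns `True` only when the file on disk hashes to the listed value. `str.removeprefix` requires Python 3.9 or later, which is satisfied by the Python 3.10.11 used for the experiments.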