There is a newer version of the record available.

Published March 28, 2023 | Version v7
Conference paper Open

Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization)

Authors/Creators

Description

1. There are six data folders and six result folders, one pair for each of the six CodeXGLUE languages (Java, Python, Ruby, JavaScript, Go, PHP). For example, the Java dataset is in the Java_data folder and its results are in the Java_result folder. Each result folder contains the results generated by the different models.

2. For same-project code summarization, there are three folders per project: one for data and two for results. For example, for the wildfly project, wildfly_data contains the dataset and wildfly_result contains the same-project summarization results. wildflyv2_result contains the results for the cross-project setup. To run the cross-project experiment, replace the training data in the wildfly_data folder with the complete Java training data provided in item 1.
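The replacement step for the cross-project setup can be sketched in Python as follows. This is a minimal sketch, assuming the training split is a single file named train.jsonl at the top level of each data folder (the file name comes from item 4; the flat layout and the helper name use_cross_project_training are assumptions, not part of the released scripts). It keeps a backup of the project-specific training file so the same-project setup can be restored.

```python
import shutil
from pathlib import Path


def use_cross_project_training(project_dir, full_lang_dir, train_name="train.jsonl"):
    """Overwrite the project's training file with the full-language training file.

    Assumption: both folders store the training split as `train_name` at top level.
    Returns the path where the original project training file is backed up.
    """
    project_train = Path(project_dir) / train_name
    full_train = Path(full_lang_dir) / train_name
    if not full_train.is_file():
        raise FileNotFoundError(f"missing training file: {full_train}")

    # Keep the same-project training data; never overwrite an existing backup.
    backup = project_train.with_name(train_name + ".same_project.bak")
    if project_train.exists() and not backup.exists():
        shutil.copy2(project_train, backup)

    shutil.copy2(full_train, project_train)
    return backup
```

A call such as use_cross_project_training("wildfly_data", "Java_data") would then prepare the wildfly folder for the cross-project run.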

3. The script folder contains two scripts: davinci.py for the code-davinci-002 model and turbo.py for the gpt-3.5-turbo model. Note that all the program analysis information is already available in the data folders, so running the following commands will generate the expected summaries.

Davinci:

python davinci.py --open_key <key> --data_folder Java_data --model davinci --mode BM25 --use_repo no --use_id no --use_dfg no --pause_duration 6 --language Java

Possible options:

use_repo : yes / no

use_dfg : yes / no

use_id : id3 / no 

 

Turbo:

python turbo.py --open_key <key> --data_folder Java_data --model turbo --mode BM25 --use_repo no --use_id no --use_dfg no --pause_duration 2 --language Java

Possible options:

use_repo : yes / no

use_dfg : yes / no

use_id : id3 / no 
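To run every combination of the three options listed above, the command lines can be generated rather than typed by hand. The following is a minimal sketch that only builds and prints the commands; the flag names and values are taken from the commands above, while the helper name build_commands and its defaults are illustrative assumptions.

```python
import itertools
import shlex

# Option values as listed under "Possible options".
OPTIONS = {
    "use_repo": ["yes", "no"],
    "use_id": ["id3", "no"],
    "use_dfg": ["yes", "no"],
}


def build_commands(script, model, pause_duration,
                   data_folder="Java_data", language="Java", key="<key>"):
    """Return one argument list per combination of the three options."""
    names = list(OPTIONS)
    commands = []
    for values in itertools.product(*(OPTIONS[name] for name in names)):
        argv = ["python", script, "--open_key", key,
                "--data_folder", data_folder,
                "--model", model, "--mode", "BM25"]
        for name, value in zip(names, values):
            argv += [f"--{name}", value]
        argv += ["--pause_duration", str(pause_duration), "--language", language]
        commands.append(argv)
    return commands


# Print the eight davinci.py command lines (shell-quoted) for inspection.
for argv in build_commands("davinci.py", "davinci", 6):
    print(shlex.join(argv))
```

Each returned list can be passed to subprocess.run once <key> is replaced with a real API key.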


4. The repository (repo) information is already included in the “train.jsonl” files, which can be found in every data folder.

5. The data flow graph (DFG) can be extracted by running the DFG.py script (in the script folder).

python DFG.py --data_folder <path to the folder containing train.jsonl and test.jsonl> --language <java/python>

6. Identifier (ID) extraction scripts are also provided in the script folder (java_id.py and python_id.py).
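To check which fields a train.jsonl or test.jsonl file actually carries (for example, whether repository, DFG, or identifier information is present), the file can be inspected without relying on any particular field name. This is a minimal sketch, assuming the files follow the usual JSON Lines convention of one JSON object per line; the helper names read_jsonl and field_counts are hypothetical.

```python
import json
from collections import Counter


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of the file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def field_counts(path, limit=1000):
    """Count how often each top-level key occurs in the first `limit` records.

    Assumption: every record is a JSON object (a dict after parsing).
    """
    counts = Counter()
    for index, record in enumerate(read_jsonl(path)):
        if index >= limit:
            break
        counts.update(record.keys())
    return counts
```

For example, print(field_counts("Java_data/train.jsonl")) lists the keys found in the first 1000 training records.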

Files

Go_data.zip

Files (710.1 MB)

Checksum Size
md5:122fdd3aeb13fc916f50ebdf9958a8f6 96.3 MB
md5:a21d4306306066e0c8711ff330122a71 76.4 kB
md5:404ee51b544cfa0bcb660a5f0200eae2 174.4 MB
md5:3f2e0b2b7e25a8e8351ea25ed9940d66 156.5 kB
md5:df165fa3fdeaab859892c95ff84c487d 44.7 MB
md5:24874ee50f5553b319ad764fa5c1b839 70.8 kB
md5:670bc8a2c7925604dd2f51a932116a15 157.5 MB
md5:26df3bf93fc1a5082330130896e74796 51.5 kB
md5:0cc220a665346f59e7efcff996151981 2.6 MB
md5:dceb499d25a32c892a1eea950530a3ab 216.5 MB
md5:0992ff75c8fcf808f70b34c0f88231ae 135.0 kB
md5:0543d8d4ad793f3b926d00ced1f01649 16.7 MB
md5:21509b2f9e23080db75907302899e17f 89.1 kB
md5:00e702dbdcd24531fe5ed7451e748d3e 812.2 kB
md5:0694a59cb18a54ccb934fc656c3a1f13 14.2 kB
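The MD5 checksums in the file listing can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and matches_listing are hypothetical, and pairing a checksum with the right archive is left to the reader because the listing above does not show file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected = listed_checksum.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected
```

For example, matches_listing(path_to_archive, "md5:122fdd3aeb13fc916f50ebdf9958a8f6") returns True only if the downloaded archive has that checksum.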