Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization)
Description
1. There are six data folders and six result folders containing the data and results for the six CodeXGLUE languages (Java, Python, Ruby, Js, Go, PHP). For example, the Java dataset is in the Java_data folder and its results are in the Java_result folder. Each result folder contains the results generated by the different models.
2. For same-project code summarization, there are three folders per project (one for data and two for results). For example, for the wildfly project, wildfly_data contains the dataset and wildfly_result contains the results of same-project code summarization, while wildflyv2_result contains the results for the cross-project setup. To run the cross-project experiment, replace the training data in the wildfly_data folder with the complete Java training data provided in item 1.
3. The script folder contains two scripts: davinci.py for the code-davinci-002 model and turbo.py for the gpt-3.5-turbo model. All the program analysis information is already available in the data folders, so running one of the following commands generates the expected summaries.
Davinci:
python davinci.py --open_key <key> --data_folder Java_data --model davinci --mode BM25 --use_repo no --use_id no --use_dfg no --pause_duration 6 --language Java
Possible options:
use_repo : yes / no
use_dfg : yes / no
use_id : id3 / no
Turbo:
python turbo.py --open_key <key> --data_folder Java_data --model turbo --mode BM25 --use_repo no --use_id no --use_dfg no --pause_duration 2 --language Java
Possible options:
use_repo : yes / no
use_dfg : yes / no
use_id : id3 / no
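The option lists above can be enumerated programmatically. The following is a minimal sketch that only builds the command lines and does not run them. It uses the flags and values shown in the commands above. The helper name build_commands is hypothetical, and whether the scripts accept every combination of options is an assumption not confirmed here.

```python
import itertools
import shlex


def build_commands(script, model, open_key, data_folder, language,
                   pause_duration, mode="BM25"):
    """Return one argument list per combination of the listed option values."""
    # Option values copied from the "Possible options" lists.
    options = {
        "use_repo": ["yes", "no"],
        "use_dfg": ["yes", "no"],
        "use_id": ["id3", "no"],
    }
    commands = []
    for repo, dfg, ident in itertools.product(*options.values()):
        commands.append([
            "python", script,
            "--open_key", open_key,
            "--data_folder", data_folder,
            "--model", model,
            "--mode", mode,
            "--use_repo", repo,
            "--use_id", ident,
            "--use_dfg", dfg,
            "--pause_duration", str(pause_duration),
            "--language", language,
        ])
    return commands


if __name__ == "__main__":
    # Print the eight turbo.py variants; replace <key> before running any of them.
    for cmd in build_commands("turbo.py", "turbo", "<key>", "Java_data", "Java", 2):
        print(shlex.join(cmd))
```

Each returned list can be passed to subprocess.run once a real key is supplied. Printing first makes it easy to check the commands against the examples above.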
4. The repo information is already available in the “train.jsonl” files, which can be found in every data folder.
5. The data flow graph (DFG) can be extracted by running the DFG.py script (in the script folder).
python DFG.py --data_folder <path to the folder containing train.jsonl and test.jsonl> --language <java/python>
6. ID extraction scripts are also provided in the script folder (i.e., java_id.py and python_id.py).
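Items 4 to 6 refer to train.jsonl and test.jsonl files whose field names are not listed here. The sketch below is one way to check which fields a file contains before running the scripts. It assumes only that each non-empty line is a single JSON object (the usual JSON Lines layout). The function name summarize_jsonl is hypothetical.

```python
import json
from pathlib import Path


def summarize_jsonl(path, limit=None):
    """Count records and collect the field names seen in a JSON Lines file."""
    count = 0
    fields = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                fields.update(record.keys())
            if limit is not None and count >= limit:
                break  # stop early on large files
    return count, sorted(fields)
```

For example, summarize_jsonl("Java_data/train.jsonl", limit=100) would report the fields present in the first 100 records, assuming the file sits directly inside the Java_data folder.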
Files (710.1 MB)

| MD5 | Size |
|---|---|
| 122fdd3aeb13fc916f50ebdf9958a8f6 | 96.3 MB |
| a21d4306306066e0c8711ff330122a71 | 76.4 kB |
| 404ee51b544cfa0bcb660a5f0200eae2 | 174.4 MB |
| 3f2e0b2b7e25a8e8351ea25ed9940d66 | 156.5 kB |
| df165fa3fdeaab859892c95ff84c487d | 44.7 MB |
| 24874ee50f5553b319ad764fa5c1b839 | 70.8 kB |
| 670bc8a2c7925604dd2f51a932116a15 | 157.5 MB |
| 26df3bf93fc1a5082330130896e74796 | 51.5 kB |
| 0cc220a665346f59e7efcff996151981 | 2.6 MB |
| dceb499d25a32c892a1eea950530a3ab | 216.5 MB |
| 0992ff75c8fcf808f70b34c0f88231ae | 135.0 kB |
| 0543d8d4ad793f3b926d00ced1f01649 | 16.7 MB |
| 21509b2f9e23080db75907302899e17f | 89.1 kB |
| 00e702dbdcd24531fe5ed7451e748d3e | 812.2 kB |
| 0694a59cb18a54ccb934fc656c3a1f13 | 14.2 kB |
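A downloaded archive can be checked against the MD5 checksums listed for the files. The sketch below uses Python's standard hashlib module. The function names are hypothetical, and because the listing does not pair checksums with file names, the caller has to supply the checksum that belongs to the file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's digest with a listed checksum, with or without an 'md5:' prefix."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat even for the larger archives in the listing.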