Published August 18, 2023 | Version v1
Dataset Open

The Artifact of the ESEC/FSE 2023 Paper Titled "Natural Language to Code: How Far are We?"

  • 1. National University of Defense Technology

Description

In this online repository, we release the source code of each selected technique as well as the experiment results of each technique (stored in the Results.zip file). For each technique, we also provide our scripts to fine-tune it on the CodeSearchNet-Python dataset. For example, finetune.sh and inference.sh are used to fine-tune and evaluate CodeBERT, respectively, and they are located under "CodeBERT/CodeBERT".

Our evaluation dataset, CodeSearchNet, is a well-known benchmark and can be downloaded from its official webpage.

The code for calculating the evaluation metrics is reused from CodeBLEU.
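
For context, CodeBLEU as defined by its authors is a weighted sum of four component scores: n-gram BLEU, keyword-weighted n-gram BLEU, syntactic (AST) match, and semantic (data-flow) match. The sketch below shows only that final combination step. The function name `combine_codebleu`, the equal weights of 0.25, and the component values in the usage lines are illustrative assumptions and are not taken from the scripts in this artifact.

```python
def combine_codebleu(ngram, weighted_ngram, syntax_match, dataflow_match,
                     weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of four component scores, each expected in [0, 1].

    The equal weights are an assumption for illustration; check the
    CodeBLEU scripts you actually run for the weights they use.
    """
    components = (ngram, weighted_ngram, syntax_match, dataflow_match)
    if len(weights) != len(components):
        raise ValueError("expected exactly four weights")
    return sum(w * c for w, c in zip(weights, components))


# Hypothetical component scores, for illustration only.
score = combine_codebleu(0.10, 0.12, 0.40, 0.30)
print(f"{score:.3f}")
```

Because the syntactic and data-flow components are computed separately from the n-gram components, a snippet can score low on exact token overlap and still receive a non-trivial overall score.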

Below is a piece of code generated by CodeT5. In this case, CodeT5 generates the same statement repeatedly until the output is cut off mid-statement, which leads to a syntax error. Despite that, the code itself fulfills certain functionalities, which is why it can still achieve a CodeBLEU score of 24.9%.

def makeMimiLocal(filename):
    try:
        with open(filename, 'rb') as f:
            data = f.read()
    except IOError:
        data = b''
    data = data.decode('utf-8')
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\
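
A failure of this kind can be detected mechanically. The sketch below is an illustration rather than part of the released scripts: it uses Python's standard `ast` module to test whether a generated snippet parses, and it counts the longest run of identical consecutive lines as a rough signal of degenerate repetition. The helper names `is_syntactically_valid` and `longest_repeated_run`, and the shortened sample string, are hypothetical.

```python
import ast
from itertools import groupby


def is_syntactically_valid(code: str) -> bool:
    """Return True if `code` parses as Python source, False otherwise."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def longest_repeated_run(code: str) -> int:
    """Length of the longest run of identical consecutive non-empty lines."""
    lines = [line.strip() for line in code.splitlines() if line.strip()]
    return max((len(list(group)) for _, group in groupby(lines)), default=0)


# Shortened stand-in for a truncated, repetitive generation (raw string,
# so the backslashes are kept literally).
truncated = r"""def f(data):
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\x00', b'\x00')
    data = data.replace(b'\
"""

print(is_syntactically_valid(truncated))
print(longest_repeated_run(truncated))
```

The repetition count is only a heuristic: legitimate code can also contain identical consecutive lines, so a threshold would need to be chosen with care.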

We also release the 100 randomly selected queries, together with the code generated by ChatGPT, in chatGPT.jsonl.
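
Files with the `.jsonl` extension conventionally hold one JSON object per line. Assuming chatGPT.jsonl follows that convention, it could be loaded as sketched below. The field names used in the file are not documented here, so the example only inspects whatever keys are present. The helper `read_jsonl` and the demo records are hypothetical.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Return a list with one parsed JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo on a throwaway file with made-up records. To inspect the real data,
# call read_jsonl with the path to the downloaded chatGPT.jsonl instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write('{"id": 1, "text": "first"}\n\n{"id": 2, "text": "second"}\n')
    records = read_jsonl(demo_path)

print(len(records), sorted(records[0].keys()))
```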

Files

CodeBERT.zip

Files (194.3 MB)

Checksum Size
md5:858a1cb4f6a7e298203b0d7a939d9732 36.1 kB
md5:66d10ef1e23ef3374403a12df708bffd 25.8 kB
md5:3d71bab161fa71b0e8846951865303d0 15.7 MB
md5:c62cac7d388b218b265abe751df294cd 48.3 kB
md5:4ce933cb5a6e73048b0327173a369f62 583.8 kB
md5:f2fd8c9f36759f73e71b64511e3777eb 40.4 kB
md5:321d29ed814a4c4942cd80d2a195bce4 16.7 MB
md5:a14178ef775b791abf1f99c09bd500f8 5.4 MB
md5:f72989f31459053f6032008f74dfa4e6 155.8 MB
md5:5cc05323c324f3beb0aae86c47d122a3 37.5 kB
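
The md5 values in this listing can be used to check that a download is intact. A minimal sketch using Python's standard `hashlib` module follows. The helper names `md5_of_file` and `matches_listing` and the demo file are hypothetical stand-ins, and because the listing does not show which checksum belongs to which file, that pairing must be taken from the record page itself.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:858a...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demo on a throwaway file; for a real check, pass the downloaded archive
# and the md5 string shown next to it on the record page.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello")
    listed = "md5:" + hashlib.md5(b"hello").hexdigest()
    ok = matches_listing(demo_path, listed)
    bad = matches_listing(demo_path, "md5:" + "0" * 32)

print(ok, bad)
```

MD5 is adequate for detecting accidental corruption during download, but it is not a defense against deliberate tampering.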