Published March 22, 2021 | Version 1.0.0
Conference paper | Open access

Applying CodeBERT for Automated Program Repair of Java Simple Bugs

  • 1. University of Calgary

Description

This zip file contains the dataset for the paper "Applying CodeBERT for Automated Program Repair of Java Simple Bugs". The replication instructions and the accompanying code are available in the paper's GitHub repository.

The paper abstract:
Software debugging and program repair are among the most time-consuming and labor-intensive tasks in software engineering, and both would benefit greatly from automation. In this paper, we propose a novel automated program repair approach based on CodeBERT, a transformer-based neural architecture pre-trained on a large corpus of source code. We fine-tune our model on the small and large ManySStuBs4J datasets to automatically generate fixed code. The results show that our technique accurately predicts the fixes implemented by the developers in 19–72% of cases, depending on the dataset, in less than a second per bug. We also observe that our method can generate fixes of varying length (short and long) and can fix different types of bugs, even when only a few instances of those bug types exist in the training dataset.


Files

data.zip (69.0 MB)
md5: e19639bd65ae692031290e137e18b5fd
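To check that a downloaded copy of data.zip is intact, its MD5 digest can be compared against the checksum listed above. The following is a minimal sketch using only Python's standard-library `hashlib`. The function names `md5_of_file` and `verify` are illustrative and hypothetical; they are not part of the dataset or its replication code.

```python
import hashlib

# Checksum published for data.zip on this record.
EXPECTED_MD5 = "e19639bd65ae692031290e137e18b5fd"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, `verify("data.zip")` should return `True` for an uncorrupted download. Reading in fixed-size chunks keeps memory use constant regardless of the archive's size.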