Published April 27, 2020 | Version v1

The dataset of the ASE'20 paper titled "Automated Patch Correctness Assessment: How Far are We?"

  • National University of Defense Technology

Description

This record contains the experimental results of the ASE'20 paper titled "Automated Patch Correctness Assessment: How Far are We?".

If you use our data for academic research, please cite our paper as:

@inproceedings{wang2020automated,
  title={Automated Patch Correctness Assessment: How Far are We?},
  author={Wang, Shangwen and Wen, Ming and Lin, Bo and Wu, Hongjun and Qin, Yihao and Zou, Deqing and Mao, Xiaoguang and Jin, Hai}, 
  booktitle={Proceedings of the 35th International Conference on Automated Software Engineering (ASE)}, 
  year={2020}, 
  organization={ACM}
}

The file Patches.zip includes all the patches considered in this study. Note that 269 of these patches come from "Automated Patch Assessment for Program Repair at Scale (Ye et al.), Technical report 1909.13694, arXiv, 2019".

The file Patches_for_Static includes all the class files used for the static techniques.

The file Tests-oracle includes all the test cases generated by EvoSuite and Randoop on the fixed versions of the programs.

The file Tests-buggy includes all the test cases generated by EvoSuite and Randoop on the buggy versions of the programs.

The file DiffTGen-result includes the inputs and outputs of DiffTGen.

The file Daikon-output includes the invariants inferred for each patch and for its corresponding ground-truth patch.

The file PATCH-SIM_result includes the output vector files from PATCH-SIM and E-PATCH-SIM.

The file Training_result includes the outputs of six ML algorithms, both with and without the oracle.

Chart: 1-26;
Closure: 14, 18, 31, 33, 38, 40, 57, 62, 63, 70, 73, 86, 92, 93, 115, 123, 126;
Lang: 6, 7, 10, 16, 20, 21, 22, 24, 26, 27, 33, 35, 38, 39, 41, 43, 44, 45, 50, 51, 55, 57, 58, 59, 60, 61, 63;
Math: 2, 3, 4, 5, 6, 8, 20, 22, 25, 28, 30, 31, 32, 33, 34, 35, 39, 41, 49, 50, 53, 56, 57, 58, 59, 60, 61, 63, 65, 68, 70, 71, 73, 74, 75, 79, 80, 81, 82, 85, 86, 88, 89, 90, 93, 97, 98, 99, 104;
Time: 4, 7, 11, 14, 15, 19.

Please note that for the bugs listed above, the EvoSuite tests on the fixed versions of the programs were reused from a previous study. We sincerely thank He Ye, Matias Martinez, and Martin Monperrus for sharing their data.
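To check whether a given bug (e.g., Lang-7) is one of those listed above, the entries can be expanded into (project, bug ID) pairs. The Python sketch below assumes the list is copied verbatim into a string; the names REUSED_EVOSUITE_BUGS and parse_bug_list are hypothetical helpers for illustration and are not part of the dataset.

```python
from __future__ import annotations

import re

# Hypothetical constant: the bug list above, copied verbatim. For these bugs,
# the EvoSuite tests on the fixed versions were reused from a previous study.
REUSED_EVOSUITE_BUGS = """
Chart: 1-26;
Closure: 14, 18, 31, 33, 38, 40, 57, 62, 63, 70, 73, 86, 92, 93, 115, 123, 126;
Lang: 6, 7, 10, 16, 20, 21, 22, 24, 26, 27, 33, 35, 38, 39, 41, 43, 44, 45, 50, 51, 55, 57, 58, 59, 60, 61, 63;
Math: 2, 3, 4, 5, 6, 8, 20, 22, 25, 28, 30, 31, 32, 33, 34, 35, 39, 41, 49, 50, 53, 56, 57, 58, 59, 60, 61, 63, 65, 68, 70, 71, 73, 74, 75, 79, 80, 81, 82, 85, 86, 88, 89, 90, 93, 97, 98, 99, 104;
Time: 4, 7, 11, 14, 15, 19.
"""


def parse_bug_list(text: str) -> set[tuple[str, int]]:
    """Expand 'Project: a, b, c-d;' entries into (project, bug_id) pairs."""
    bugs: set[tuple[str, int]] = set()
    # Each entry is a project name, a colon, then IDs up to ';' or the final '.'.
    for project, ids in re.findall(r"(\w+):\s*([^;.]+)", text):
        for token in ids.split(","):
            token = token.strip()
            if "-" in token:
                # Inclusive range such as "1-26".
                start, end = (int(x) for x in token.split("-"))
                bugs.update((project, i) for i in range(start, end + 1))
            elif token:
                bugs.add((project, int(token)))
    return bugs
```

For example, `("Lang", 7) in parse_bug_list(REUSED_EVOSUITE_BUGS)` evaluates to `True`, while a bug outside the list, such as `("Lang", 1)`, is absent. Using a set of tuples keeps membership checks simple when iterating over patch folders.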

 

Note: in the Patches_ICSE folder, all patches under the Ddifferent and Dsame subfolders are correct. "Different" and "Same" only indicate whether a patch is syntactically identical to the ground-truth patch.

Patches generated for the Mockito project (2 in total): Kali-A-Mockito-10; Arja-Mockito-10

Patches that do not pass the plausibility check (6 in total): Kali-Closure-133; kPAR-Chart-12; FixMiner-Chart-12; patch1-Lang-6-SketchFix-plausible; patch2-Lang-6-SketchFix-plausible; patch1-Math-2-SOFix

Patches that are mistakenly labeled (12 in total): patch2-Lang-51-Jaid; patch1-Lang-43-CapGen; patch2-Lang-43-CapGen; patch2-Math-53-CapGen; patch2-Math-53-Jaid; jKali-Lang-7; ACS-Lang-35; Arja-Math-35; SimFix-Math-72; SimFix-Closure-19; Arja-Math-50; SimFix-Lang-60

Detailed reasons for the mislabeled patches:
1. The ground-truth patch modifies multiple locations, while the generated patch modifies only one of them (2/12: SimFix-Math-72, SimFix-Lang-60).
2. The edit points of the generated patch differ from those of the ground-truth patch (8/12: patch2-Lang-51-Jaid, patch2-Math-53-Jaid, patch1-Lang-43-CapGen, patch2-Lang-43-CapGen, patch2-Math-53-CapGen, ACS-Lang-35, SimFix-Closure-19, Arja-Math-50).
3. The generated patch does not fulfill the functionality intended by the ground-truth patch (2/12: jKali-Lang-7, Arja-Math-35).

Take Arja-Math-50 as an example. This patch deletes a conditional statement in the method verifyBracketing that handles an unexpected input (null). In the oracle program, however, this conditional statement still exists. Randoop then generated a test case that calls verifyBracketing with a null argument. This test passed on the ground-truth patch but failed on the patch generated by Arja, because the exception-handling statements had been removed. Hence, this patch is actually overfitting but was mistakenly labeled as correct. We have confirmed this case with Kui Liu, the first author of the recent ICSE'20 paper "On the Efficiency of Test Suite based Program Repair", whose patches make up our patch benchmark.

 

Borderline patches (3 in total): ACS-Lang-7; kPAR-Lang-7; TBar-Lang-7. Reasons to consider them overfitting: EvoSuite generates some tests that fail on these patches (e.g., test049 in Seed 1), and the Javadoc comment above the function states that it must handle inputs that cannot be converted. Reasons to consider them correct: they synthesize the correct modification, and in the current program createBigDecimal() is not called directly anywhere in the production code except from createNumber() (otherwise it is called only from test code). In our paper, we consider these three patches correct, which is why EvoSuite has 3 false positives.

Files (1.3 GB)

MD5 checksum                        Size
778eca12cdcb7954dc13929b4d25f7dc    17.4 MB
d4e424948d7264db88c3155515e08ae7    552.9 MB
e3b68e17dd4be6a2bafadd6c6b30b52e    43.2 MB
11203b88e6ae8a657757c6b5842d5a46    1.2 MB
431acab7ecb952832818e9134554a981    23.4 MB
7a0e0962412f77ea6519ee2e7b0bde54    351.5 MB
559169b5c59cd0ba2afbc06344e3355d    276.6 MB
6eb6174a010a5838f30131a038458871    74.0 kB
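The file list above gives an MD5 checksum for each download but, in this view, not the file it belongs to. A minimal sketch for checking that a downloaded file is intact, using only Python's standard hashlib module: it hashes the file in chunks (so large archives are not loaded into memory at once) and checks whether the digest is one of the published checksums. The helper names md5_of and is_listed are hypothetical and not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 checksums as published in the file list (file names are not shown there).
EXPECTED_MD5 = {
    "778eca12cdcb7954dc13929b4d25f7dc",
    "d4e424948d7264db88c3155515e08ae7",
    "e3b68e17dd4be6a2bafadd6c6b30b52e",
    "11203b88e6ae8a657757c6b5842d5a46",
    "431acab7ecb952832818e9134554a981",
    "7a0e0962412f77ea6519ee2e7b0bde54",
    "559169b5c59cd0ba2afbc06344e3355d",
    "6eb6174a010a5838f30131a038458871",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path: Path) -> bool:
    """Return True if the file's MD5 matches one of the published checksums."""
    return md5_of(path) in EXPECTED_MD5
```

For example, `is_listed(Path("Patches.zip"))` can be called on a file saved in the working directory (the local file name is an assumption); a False result suggests the download is incomplete or corrupted.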