The dataset of the ASE'20 paper titled "Automated Patch Correctness Assessment: How Far are We?"

Wang, Shangwen; Lin, Bo

doi:10.5281/zenodo.3730599

Published April 27, 2020 | Version v1

Dataset Open


1. National University of Defense Technology

These are the experimental results of the ASE'20 paper titled "Automated Patch Correctness Assessment: How Far are We?".

If you use our data for academic research, please cite our paper as:

@inproceedings{wang2020automated,
  title={Automated Patch Correctness Assessment: How Far are We?},
  author={Wang, Shangwen and Wen, Ming and Lin, Bo and Wu, Hongjun and Qin, Yihao and Zou, Deqing and Mao, Xiaoguang and Jin, Hai}, 
  booktitle={Proceedings of the 35th International Conference on Automated Software Engineering (ASE)}, 
  year={2020}, 
  organization={ACM}
}

The file Patches.zip includes all the patches we take into consideration in this study. Note that 269 patches come from "Automated Patch Assessment for Program Repair at Scale (Ye et al.), Technical report 1909.13694, arXiv, 2019".

The file Patches_for_Static includes all the class files we used for the static methods.

The file Tests-oracle includes all the test cases generated by Evosuite and Randoop on the fixed version programs.

The file Tests-buggy includes all the test cases generated by Evosuite and Randoop on the buggy version programs.

The file DiffTGen-result includes ingredients and output information of DiffTGen.

The file Daikon-output includes inferred invariants of each patch and its corresponding ground-truth.

The file PATCH-SIM_result includes the output vector files from PATCH-SIM and E-PATCH-SIM.

The file Training_result includes the output of six ML algorithms with or without oracle.

Chart: 1-26;
Closure: 14, 18, 31, 33, 38, 40, 57, 62, 63, 70, 73, 86, 92, 93, 115, 123, 126;
Lang: 6, 7, 10, 16, 20, 21, 22, 24, 26, 27, 33, 35, 38, 39, 41, 43, 44, 45, 50, 51, 55, 57, 58, 59, 60, 61, 63;
Math: 2, 3, 4, 5, 6, 8, 20, 22, 25, 28, 30, 31, 32, 33, 34, 35, 39, 41, 49, 50, 53, 56, 57, 58, 59, 60, 61, 63, 65, 68, 70, 71, 73, 74, 75, 79, 80, 81, 82, 85, 86, 88, 89, 90, 93, 97, 98, 99, 104;
Time: 4, 7, 11, 14, 15, 19.

Please note that for the bugs listed above, the Evosuite tests on the fixed version programs are reused from a previous study. We sincerely thank He Ye, Matias Martinez, and Martin Monperrus for sharing their data.

Notice! For patches under the folder Patches_ICSE, those under Ddifferent and Dsame folders are all correct patches. Different and Same only indicate whether the patch is syntactically identical to the ground truth patch.

Patches generated for the Mockito project (2 in total): Kali-A-Mockito-10; Arja-Mockito-10

Patches that do not pass the plausibility check (6 in total): Kali-Closure-133; kPAR-Chart-12; FixMiner-Chart-12; patch1-Lang-6-SketchFix-plausible; patch2-Lang-6-SketchFix-plausible; patch1-Math-2-SOFix

Patches that are mistakenly labeled (12 in total): patch2-Lang-51-Jaid; patch1-Lang-43-CapGen; patch2-Lang-43-CapGen; patch2-Math-53-CapGen; patch2-Math-53-Jaid; jKali-Lang-7; ACS-Lang-35; Arja-Math-35; SimFix-Math-72; SimFix-Closure-19; Arja-Math-50; SimFix-Lang-60

Detailed reasons for the mislabeled patches: 1. the ground-truth patch modifies multiple locations while the generated patch only modifies one of them (2/12, SimFix-Math-72, SimFix-Lang-60); 2. the edit points in the generated patch are different from those in the ground-truth patch (8/12, patch2-Lang-51-Jaid, patch2-Math-53-Jaid, patch1-Lang-43-CapGen, patch2-Lang-43-CapGen, patch2-Math-53-CapGen, ACS-Lang-35, SimFix-Closure-19, Arja-Math-50); 3. the generated patch does not fulfill the intended function of the ground-truth patch (2/12, jKali-Lang-7, Arja-Math-35).

Take Arja-Math-50 as an example. This patch deletes a conditional statement that handles an unexpected input (null) in the method verifyBracketing, whereas the oracle program retains this statement. Randoop generated a test case that calls verifyBracketing with a null argument. This test passes on the ground-truth patch but fails on the patch generated by Arja because the exception-handling statements were removed. As a result, this patch is actually overfitting but was mistakenly labeled as correct. We have confirmed this case with Kui Liu, the first author of the recent ICSE'20 paper (Title: On the Efficiency of Test Suite based Program Repair) which makes up our patch benchmark.
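The reasoning in the Arja-Math-50 example can be sketched in a few lines of Python. This is only an illustration of the decision rule described above, not the scripts used in the study: the function name flag_overfitting, the input format, and the test names are hypothetical.

```python
def flag_overfitting(test_outcomes):
    """Apply the rule from the Arja-Math-50 example.

    test_outcomes maps a generated test name to a pair
    (passes_on_ground_truth, passes_on_patch). A test that passes on the
    ground-truth program but fails on the patched program is evidence that
    the patch is overfitting. The input format is an assumption made for
    this sketch.
    """
    witnesses = [
        name
        for name, (on_truth, on_patch) in test_outcomes.items()
        if on_truth and not on_patch
    ]
    label = "overfitting" if witnesses else "no evidence of overfitting"
    return label, witnesses


# Hypothetical outcomes mirroring the example: one generated test calls
# verifyBracketing with a null argument and fails only on the patch.
example = {
    "test_null_argument": (True, False),
    "test_regular_input": (True, True),
}
label, witnesses = flag_overfitting(example)
print(label, witnesses)
```

Note that an empty witness list only means the generated tests found no behavioral difference; it does not prove the patch is correct.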

Borderline patches (3 in total): ACS-Lang-7; kPAR-Lang-7; TBar-Lang-7. Reasons for considering them overfitting: Evosuite generates some tests that fail on those patches, e.g., test049 in Seed 1; the Java documentation above the function states that it needs to handle the situation where the input cannot be converted. Reasons for considering them correct: each patch synthesizes the correct modification; currently, createBigDecimal() is called directly only by createNumber() in the production code and by the test code. In our paper, we consider these three patches correct, which is why Evosuite has 3 false positives.

Files (1.3 GB)

Name	MD5	Size
Daikon-output.zip	778eca12cdcb7954dc13929b4d25f7dc	17.4 MB
DiffTGen-result.zip	d4e424948d7264db88c3155515e08ae7	552.9 MB
PATCH-SIM_result.zip	e3b68e17dd4be6a2bafadd6c6b30b52e	43.2 MB
Patches.zip	11203b88e6ae8a657757c6b5842d5a46	1.2 MB
Patches_for_Static.zip	431acab7ecb952832818e9134554a981	23.4 MB
Tests-buggy.zip	7a0e0962412f77ea6519ee2e7b0bde54	351.5 MB
Tests-oracle.zip	559169b5c59cd0ba2afbc06344e3355d	276.6 MB
Training_result.zip	6eb6174a010a5838f30131a038458871	74.0 kB
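Because some of the archives are several hundred megabytes, it may be worth checking a download against the MD5 values listed above. The following is a minimal sketch using only the Python standard library; the helper names md5_of and verify_downloads are made up for this example, and the script assumes the archives sit in a single local directory under their original file names.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "Daikon-output.zip": "778eca12cdcb7954dc13929b4d25f7dc",
    "DiffTGen-result.zip": "d4e424948d7264db88c3155515e08ae7",
    "PATCH-SIM_result.zip": "e3b68e17dd4be6a2bafadd6c6b30b52e",
    "Patches.zip": "11203b88e6ae8a657757c6b5842d5a46",
    "Patches_for_Static.zip": "431acab7ecb952832818e9134554a981",
    "Tests-buggy.zip": "7a0e0962412f77ea6519ee2e7b0bde54",
    "Tests-oracle.zip": "559169b5c59cd0ba2afbc06344e3355d",
    "Training_result.zip": "6eb6174a010a5838f30131a038458871",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling verify_downloads(".") in the download directory returns a dictionary that can be printed or inspected; any entry other than "ok" suggests the archive should be downloaded again.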

