Published April 17, 2020 | Version 1.0
Dataset | Open

Understanding the Differences in the Unit Tests Produced by Humans and Coverage-Directed Automated Generation

Affiliations:
  1. Chalmers and the University of Gothenburg
  2. University of South Carolina
  3. University of South Carolina Upstate

Description

Automated test generation - the use of tools to create all or part of a test case - plays a critical role in controlling the cost of testing. A particular area of focus in automated test generation research is unit testing. Unit tests are intended to verify the functionality of a small, isolated unit of code - typically a single class.
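
For illustration, a minimal human-written unit test of this kind might look like the following sketch (JUnit 4, with java.util.Stack standing in for an arbitrary class under test; the class and method names here are illustrative and not part of the study):

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import java.util.Stack;
    import org.junit.Test;

    // Illustrative human-written unit test exercising one small unit of code.
    public class StackTest {

        @Test
        public void pushThenPopReturnsLastElementAndLeavesStackEmpty() {
            Stack<Integer> stack = new Stack<>();
            stack.push(42);
            assertEquals(Integer.valueOf(42), stack.pop());
            assertTrue(stack.isEmpty());
        }
    }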

In automated test generation research, it is common to compare the effectiveness of the test cases generated by automation to those written by humans. Indeed, a common premise of automation research - whether implicit or explicit - is that effective automation can replace human effort. The underlying hypothesis is that, given sufficient advances, a tool could replace the tremendous effort a human tester expends in creating those unit tests.

This observation leads to two natural questions. Do the tests produced by humans and automation differ in the types of faults they detect? If so, in what ways do the tests produced and the faults detected differ? Understanding when and how to deploy automation requires a clearer understanding of how the tests produced by humans and automation differ, and how those differences in turn affect the ability of those test cases to detect faults. Insight into these differences could lead not only to improvements in the ability of automation to replace human effort, but also to improvements in our ability to use automation to augment human effort. The goal of this study is to explore and attempt to quantify those differences.

In this study, we make use of the EvoSuite test generation framework for Java. We generate test suites under two configurations - a traditional single-criterion configuration targeting Branch Coverage over the source code, and a more sophisticated multi-objective configuration targeting eight criteria. Controlling for coverage level, we compare the suites generated by EvoSuite to those written by humans for five mature, popular open-source systems, in terms of both their syntactic structure and their ability to detect 45 different types of faults. Our goal is not to declare a "winner", but to identify the areas where humans and automation differ in their capabilities and - in turn - to make recommendations on how human and automation effort can be combined so that each covers the gaps left by the other. We aim to identify lessons that will improve human practices, lead to the creation of more effective automation, and present natural opportunities to both augment and replace human effort.
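
As a rough sketch of how such configurations are expressed on the EvoSuite command line (the class name, classpath, and search budget below are illustrative assumptions; the eight-criteria list shown is EvoSuite's default multi-criteria set and is given here only as an example of a multi-objective configuration, which may differ from the exact set used in the study):

    # Single-criterion configuration: branch coverage over the class under test.
    java -jar evosuite.jar -class org.example.Stack -projectCP build/classes \
         -criterion BRANCH -Dsearch_budget=120

    # Multi-objective configuration: eight criteria combined (EvoSuite's default set).
    java -jar evosuite.jar -class org.example.Stack -projectCP build/classes \
         -criterion LINE:BRANCH:EXCEPTION:WEAKMUTATION:OUTPUT:METHOD:METHODNOEXCEPTION:CBRANCH \
         -Dsearch_budget=120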

Files

replication_package.zip (952.1 MB)
md5:fa9d7d8d93bc5ce15797e4c487b83dab