Flaky Test Dataset to Accompany "FlakeFlagger: Predicting Flakiness Without Rerunning Tests"

Abdulrahman Alshammari; Christopher Morris; Michael Hilton; Jonathan Bell

doi:10.5281/zenodo.4450723

Published January 19, 2021 | Version 0.1.0

Dataset Open

1. George Mason University
2. Carnegie Mellon University
3. Northeastern University

When developers make changes to their code, they typically run regression tests to detect whether their recent changes (re)introduce any bugs. However, many tests are flaky, and their outcomes can change non-deterministically, failing without apparent cause. Flaky tests are a significant nuisance in the development process, since they make it more difficult for developers to trust the outcome of their tests. The traditional approach to identifying flaky tests is to rerun them multiple times: if a test is observed both passing and failing on the same code, it is definitely flaky. We conducted a very large empirical study looking for flaky tests by rerunning the test suites of 24 projects 10,000 times each, and found that even with this many reruns, some flaky tests were still not detected. We propose FlakeFlagger, a novel approach that collects a set of features describing the behavior of each test, and then predicts tests that are likely to be flaky based on similar behavioral features. We found that FlakeFlagger correctly labeled at least as many tests as flaky as a state-of-the-art flaky test classifier, but that FlakeFlagger reported far fewer false positives (an increase in precision from just 11% to 60%). This lower false positive rate translates directly to saved time for researchers and developers who use the classification result to guide more expensive flaky test detection processes. By investigating the information gain of each feature, we conclude that test execution time, overall test coverage, coverage of recently changed lines, and usage of third-party libraries are effective predictors of test flakiness. We did not find any keywords or tokens in the source code of tests that were effective in predicting test flakiness, and did not find the presence of test smells to be effective in predicting test flakiness.
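The rerun-based definition and the precision metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' tooling: the function names and labels are hypothetical, and the counts in the usage note are made up purely to show the arithmetic.

```python
from typing import Iterable


def classify_by_reruns(outcomes: Iterable[bool]) -> str:
    """Label a test from its rerun outcomes on the same code.

    True means the run passed, False means it failed.
    The label strings are illustrative, not taken from the dataset.
    """
    seen = set(outcomes)
    if seen == {True, False}:
        # Observed both passing and failing: definitely flaky.
        return "flaky"
    if seen == {True}:
        return "always-passed"
    if seen == {False}:
        return "always-failed"
    return "no-runs"


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of tests reported as flaky that really are flaky."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else 0.0
```

For example, `classify_by_reruns([True, True, False])` yields `"flaky"`, and a classifier that reports 10 tests of which 6 are truly flaky has `precision(6, 4)` of 0.6. Note that a label of `"always-passed"` does not prove a test is stable; as the abstract states, some flaky tests went undetected even after 10,000 reruns.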

This archive contains the dataset that we collected of flaky tests, along with the features that we collected from each test.

Contents:
Project_Info.csv: List of the projects studied and their revisions.
<project-slug>.tgz: An archive of all of the Maven build logs and test reports from each of the 10,000 runs of that project's test suite.
test_results.csv: Summary of the number of passing and failing runs for each test in each project. The "Run ID" field is a key into the corresponding <project-slug>.tgz archive in this artifact; it identifies the run on which we observed the test fail.
test_features.csv: Summary of the features of each test, as collected by the feature detectors described in the paper.
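As a rough illustration of how the "Run ID" key might be used, the following Python sketch maps each test that both passed and failed to the archive and run where a failure was observed. Only "Run ID" and the `<project-slug>.tgz` naming come from the description above; every other column name, and all of the sample rows, are assumptions made for this example and should be checked against the real header of test_results.csv.

```python
import csv
import io

# Hypothetical column names; only "Run ID" is named in the description.
COL_PROJECT = "Project"
COL_TEST = "Test"
COL_PASSES = "Passes"
COL_FAILURES = "Failures"
COL_RUN_ID = "Run ID"


def failing_run_index(csv_text):
    """Map (project, test) to (archive name, run id) for tests seen both passing and failing."""
    index = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        passes = int(row[COL_PASSES])
        failures = int(row[COL_FAILURES])
        if passes > 0 and failures > 0:
            # Archives are named <project-slug>.tgz per the contents list.
            archive = row[COL_PROJECT] + ".tgz"
            index[(row[COL_PROJECT], row[COL_TEST])] = (archive, row[COL_RUN_ID])
    return index


# Made-up sample rows, not real data from the dataset.
SAMPLE = """Project,Test,Passes,Failures,Run ID
example-project,com.example.FooTest.testA,9998,2,run-0042
example-project,com.example.FooTest.testB,10000,0,
"""

if __name__ == "__main__":
    for key, value in failing_run_index(SAMPLE).items():
        print(key, "->", value)
```

The layout of files inside each archive is not described here, so locating the logs for a given run ID within the extracted archive is left to inspection.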

To be added in the final revision of this artifact:
All scripts used to generate and process these results. These scripts are currently located at https://github.com/AlshammariA/FlakeFlagger. Once they pass the ICSE 2021 artifact evaluation (and we make any changes suggested by the reviewers), we will add them to this artifact and create a final version of this permanent archive.

Files (34.5 GB)

Name	MD5	Size
activiti-activiti.tgz	ed1745e8a512b0ea24d3a4ec8aa012af	3.5 GB
Alluxio-alluxio.tgz	68b458173dde7876da096db87b10b786	158.9 MB
apache-ambari.tgz	32758ba25c3764ac332fcb6546f3b157	3.1 GB
apache-commons-exec.tgz	ba376bdfe0a5e4d74a02283b424fabfc	37.6 MB
apache-hbase.tgz	edb005ad6a020afcf690537386b9dbcf	8.1 GB
apache-httpcore.tgz	080c69159f36d149cac8d1d04cbe7858	193.5 MB
apache-incubator-dubbo.tgz	9a52deab110b7aacb043662a1765a4ea	1.3 GB
doanduyhai-Achilles.tgz	25af1d653590f1fe0379386231d2bcbd	667.6 MB
elasticjob-elastic-job-lite.tgz	5103caace6d0f5bb2b50c0714c0132a4	247.6 MB
google-jimfs.tgz	4a4d27a054cc54a02dfae56f1f8bc117	42.5 MB
hector-client-hector.tgz	24a12413b12e40aa00d59541042e131d	231.0 MB
jknack-handlebars.java.tgz	375a247f347237b033565b9c27809fe5	317.2 MB
joel-costigliola-assertj-core.tgz	ae7d79db3f9f17cdd9e7290e36de55ee	2.7 GB
kevinsawicki-http-request.tgz	233c5949a384c5cb4c9a5cf6115cc951	39.6 MB
ninjaframework-ninja.tgz	d7c4318cbc4db28fbfe3226d6b7b6c72	305.4 MB
orbit-orbit.tgz	ebf52214a3c0df4b00c359d34989ddd4	161.9 MB
Project_Info.csv	5b3392a4f7367b2a566b919a98989a97	1.7 kB
qos-ch-logback.tgz	ca0fc014a77fcc454b709f405b463728	688.6 MB
spring-projects-spring-boot.tgz	b47ca805c8216be63b262c987dfab0dd	2.8 GB
square-okhttp.tgz	b7a5550a82b997e592abef7dc70adcc8	521.7 MB
test_features.csv	63306c05fafcc6446911ab7000f85ae0	6.9 MB
test_results.csv	fcd2674ab42068de627ec6afce4f6d1a	3.5 MB
tootallnate-java-websocket.tgz	03e02ffafa3c9945b5a4ded6159ea70a	58.0 MB
undertow-io-undertow.tgz	1d647e6191e820c7255b7f3b3193e753	179.3 MB
wildfly-wildfly.tgz	90a04c8ddbb87f9354c3809a24ffcb13	8.1 GB
wro4j-wro4j.tgz	794524384fca08704e3894a81e77a741	781.4 MB
zxing-zxing.tgz	c4e6c7cf48c9af65c05aee064e841ef2	306.7 MB
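Because several of these archives are multiple gigabytes, it can be worth checking a download against the MD5 value listed for it. The following Python sketch uses only the standard library; the helper names are hypothetical, and the file is read in chunks so that large archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's MD5 with a listed value, with or without an 'md5:' prefix."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage, assuming the archive has been downloaded to the current directory:
# matches_listing("google-jimfs.tgz", "md5:4a4d27a054cc54a02dfae56f1f8bc117")
```

A mismatch usually indicates an incomplete or corrupted download rather than a problem with the dataset itself, so re-downloading the file is the first thing to try.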
