Published June 22, 2021 | Version 1.0.0
Dataset Open

Flaky Test Dataset to Accompany "FlakeFlagger: Predicting Flakiness Without Rerunning Tests"

  • 1. George Mason University
  • 2. Carnegie Mellon University
  • 3. Northeastern University

Description

When developers make changes to their code, they typically run regression tests to detect if their recent changes (re)introduce any bugs. However, many tests are flaky, and their outcomes can change non-deterministically, failing without apparent cause. Flaky tests are a significant nuisance in the development process, since they make it more difficult for developers to trust the outcome of their tests. The traditional approach to identify flaky tests is to rerun them multiple times: if a test is observed both passing and failing on the same code, it is definitely flaky. We conducted a very large empirical study looking for flaky tests by rerunning the test suites of 24 projects 10,000 times each, and found that even with this many reruns, some flaky tests were still not detected. We propose FlakeFlagger, a novel approach that collects a set of features describing the behavior of each test, and then predicts tests that are likely to be flaky based on similar behavioral features. We found that FlakeFlagger correctly labeled at least as many tests as flaky as a state-of-the-art flaky test classifier, but that FlakeFlagger reported far fewer false positives (an increase in precision from just 11% to 60%). This lower false positive rate translates directly to saved time for researchers and developers who use the classification result to guide more expensive flaky test detection processes. By investigating the information gain of each feature, we conclude that test execution time, overall test coverage, coverage of recently changed lines and usage of third party libraries are effective predictors of test flakiness. We did not find any keywords or tokens in the source code of tests that were effective in predicting test flakiness, and did not find the presence of test smells to be effective in predicting test flakiness.
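The observation that some flaky tests escape detection even after 10,000 reruns can be illustrated with a small calculation. The sketch below is not part of the study: it assumes that each run of a test fails independently with a fixed probability, which is a simplification, and the failure rate used is a hypothetical value chosen only for illustration.

```python
def miss_probability(failure_rate, reruns):
    """Chance that a flaky test passes on every one of `reruns` runs.

    Assumes each run fails independently with probability `failure_rate`
    (a simplifying assumption, not a claim about real test suites).
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if reruns < 0:
        raise ValueError("reruns must be non-negative")
    return (1.0 - failure_rate) ** reruns


# Hypothetical test that fails once in every 10,000 runs on average.
rate = 1e-4
for reruns in (100, 1_000, 10_000):
    print(reruns, round(miss_probability(rate, reruns), 3))
```

Under these assumptions, a test with this failure rate would still pass on all 10,000 reruns a little over a third of the time, so rerunning alone cannot rule out flakiness.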

This archive contains the dataset that we collected of flaky tests, along with the features that we collected from each test.

Contents:
Project_Info.csv: List of the projects studied and their revisions.
build-logs-<project-slug>.tgz: An archive of all of the Maven build logs from each of the 10,000 runs of that project's test suite.
failing-test-reports-<project-slug>.tgz: An archive of all of the Surefire XML reports for each failing test of each build of that project.
test_results.csv: Summary of the number of passing and failing runs for each test in each project. "Run ID" is a key into the <project-slug>.tgz archive, also included in this artifact, and identifies the run on which we observed the test fail.
test_features.csv: Summary of the features of each test, as computed by the feature detectors described in the paper.
flakeflagger-code.zip: All scripts used to generate and process these results. These scripts are also located at https://github.com/AlshammariA/FlakeFlagger
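
As a rough illustration of how a per-test summary of passing and failing runs can be used, the sketch below flags tests that were observed both passing and failing. The column names (`test_name`, `passed`, `failed`) are hypothetical placeholders, not the actual header of test_results.csv; check the real file and pass the matching names.

```python
import csv
import io


def observed_flaky(rows, name_col="test_name", pass_col="passed", fail_col="failed"):
    """Return the names of tests with at least one passing and one failing run.

    The default column names are illustrative assumptions only.
    """
    flaky = []
    for row in rows:
        passed = int(row[pass_col])
        failed = int(row[fail_col])
        if passed > 0 and failed > 0:
            flaky.append(row[name_col])
    return flaky


# Tiny in-memory stand-in for the real file; header and rows are made up.
sample = io.StringIO(
    "test_name,passed,failed\n"
    "FooTest.alwaysPasses,10000,0\n"
    "FooTest.sometimesFails,9997,3\n"
    "BarTest.alwaysFails,0,10000\n"
)
flaky_tests = observed_flaky(csv.DictReader(sample))
print(flaky_tests)
```

To run this against the real summary, replace the in-memory sample with `open("test_results.csv", newline="")` and adjust the three column arguments to the file's actual header.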

Notes

This updated revision includes a corrected test_features.csv that contains exactly the set of tests used in our evaluation.

Files (16.2 GB)

Checksum Size
md5:efa9afb7f3f958381c090920e6851cb5 147.2 MB
md5:8a769911879d4a4d3b2453b00bb49dc1 198.9 MB
md5:2b70250692557da1477230add66d1089 57.2 MB
md5:7f5c5cb0e34d288c5a1584f8e4fb978f 4.2 MB
md5:00ed8e862da5328a3c0feb61068c1784 27.1 MB
md5:99a89f3c5a135caad92f32112c67a6ee 11.6 MB
md5:ef71e64511141e94a5a21fc34941a38c 1.0 GB
md5:f8e57cd136ee20b19be8558368cfd731 419.3 MB
md5:9cf961618de2341f1137f93ccc92f5fa 60.4 MB
md5:5b12cdbf609b98dbcb7861e7db3d2786 2.2 MB
md5:ae680dd80d2d432138f8b6e12117f90d 133.5 MB
md5:ffb2d654eea8c11e2fa3be235a882709 138.1 MB
md5:ab36e70e3ba25123ad3b822c085c6c67 227.6 MB
md5:3728040d417f13a524009b6ea3f24a8f 4.7 MB
md5:61a863b2d0c0103ff6a6ddace9fd4ebd 251.9 MB
md5:e1e9d46584b943b055d96b3562933ce6 102.9 MB
md5:c762232dc6a279a10f0168331bfdeae0 347.6 MB
md5:92b8748ac43593d6267b7eef90e06f8c 4.2 GB
md5:0710357668e63d45801d85f5d90534f0 219.5 MB
md5:809f66200abb9cd4bfe0f39faf2bb29c 3.8 MB
md5:3f68ab5a9440e2bde29667aeb3de31f2 70.6 MB
md5:6b6d550aed0b0fa5dc1de73b54e11555 760.8 MB
md5:9bbab421a955649a34d5c0e91b28a5a8 433.3 MB
md5:4e21c103762fd962c527af898fb6ad7b 120.5 MB
md5:cbf67aafc7da3d1d1aa8059b7b77a5a5 1.1 MB
md5:66b80e35f1e702a4fdfac74d06b20e2f 3.5 MB
md5:c1ae92f105d6a2096bfa1492e0d79760 32.9 MB
md5:afda5aa0fa2b619f139b1f704e0f91b7 12.5 kB
md5:00ded9cf23ae1241d9055aacea712d35 175.2 MB
md5:935721e7d07d5017248d247b0e056cec 115.1 kB
md5:f07098ea6c0be7ce94fec9d2af94d25e 1.3 MB
md5:df997c28b933ce86ef5b88f09f0f88db 48.5 kB
md5:54086f26d2f6d9395e943786b303a169 3.1 kB
md5:065cd31578b811cc33e42c17a20d3c78 5.6 MB
md5:701614b1dc264c2575c8e94e5a30f656 56.1 MB
md5:82f7dbd790342595b811f318e6678b94 164.4 kB
md5:b85eb20ec182185510777de0dec59e25 13.1 MB
md5:8574442a7cd2a42b9614a71b1397d4f5 234.8 kB
md5:644ecb567c947b269f009551bb62e902 12.4 MB
md5:ba835bc68c96d6667437084b1fe8f94b 878.8 kB
md5:0873173a70d3e28257c10e2cabe22dd2 3.3 GB
md5:9f33ad7c2eaba2f4c907b6b4ccd5fd67 3.6 GB
md5:bed516fc922e0c029530c364148fd79b 719.5 kB
md5:3281eaeea762b2dcef704644e05af015 33.1 kB
md5:720d3b4f1889ba3ffd29dbc0377547d1 1.5 MB
md5:85b36ee4844934bc6874a753e6d74fe8 42.6 MB
md5:93cb0ae50da82167aa363591e029f1dc 128.4 kB
md5:7e1df9acbc1daf1ca1b05e2a55ad157b 19.2 MB
md5:5b3392a4f7367b2a566b919a98989a97 1.7 kB
md5:fe3957a3ab3abbe52ba743f7e60144ac 5.7 MB
md5:684cb91a0afd34cea2865e0b0fe78a03 2.9 MB
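
Since several of the archives are multiple gigabytes, it may be worth checking a download against its listed MD5 checksum before unpacking it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listing` are illustrative, and the demo hashes a small temporary file rather than a real archive from this record.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed[4:] if listed.startswith("md5:") else listed
    return md5_of_file(path) == expected.strip().lower()


# Demo on a throwaway file; b"hello" has a well-known MD5 digest.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
demo_ok = matches_listing(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")
demo_bad = matches_listing(demo_path, "md5:00000000000000000000000000000000")
os.remove(demo_path)
print(demo_ok, demo_bad)
```

For a real download, call `matches_listing` with the local file path and the checksum string copied from the corresponding entry in the file listing.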