Flaky and True Failures Logs to Accompany "230,439 Test Failures Later: An Empirical Evaluation of Flaky Failure Classifiers"

Alshammari, Abdulrahman; Ammann, Paul; Hilton, Michael; Bell, Jonathan

doi:10.5281/zenodo.10531160

Published January 19, 2024 | Version v1

Dataset Open


1. George Mason University
2. Carnegie Mellon University
3. Northeastern University

Flaky tests are tests that can non-deterministically pass or fail, even in the absence of code changes. Despite being a source of false alarms, flaky tests often remain in test suites once they are detected, as they also may be relied upon to detect true failures. Hence, a key open problem in flaky test research is: How to quickly determine if a test failed due to flakiness, or if it detected a bug? The state-of-the-practice is for developers to re-run failing tests: if a test fails and then passes, it is flaky by definition; if the test persistently fails, it is likely a true failure. However, this approach can be both ineffective and inefficient. An alternate approach that developers may already use for triaging test failures is failure de-duplication, which matches newly discovered test failures to previously witnessed flaky and true failures. However, because flaky test failure symptoms might resemble those of true failures, there is a risk of misclassifying a true test failure as a flaky failure to be ignored. Using a dataset of 498 flaky tests from 22 open-source Java projects, we collect a large dataset of 230,439 failure messages (both flaky and not), allowing us to empirically investigate the efficacy of failure de-duplication. We find that for some projects, this approach is extremely effective (with 100% specificity), while for other projects, the approach is entirely ineffective. By analyzing the characteristics of these flaky and non-flaky failures, we provide useful guidance on how developers should rely on this approach.
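
To make the idea of failure de-duplication concrete, the following is a minimal sketch, not the classifiers evaluated in the paper. It assumes that failures are compared by exact match after masking numbers and hex addresses; the function names `normalize` and `classify` and the masking rules are illustrative choices only.

```python
import re


def normalize(message: str) -> str:
    """Mask volatile tokens so near-identical failure messages compare equal.

    The masking rules here are an assumption made for illustration.
    """
    message = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)  # memory addresses
    message = re.sub(r"\d+", "<NUM>", message)              # ports, timings, ids
    return " ".join(message.split())                        # collapse whitespace


def classify(new_failure: str, known_flaky: list, known_true: list) -> str:
    """Match a new failure against previously witnessed flaky and true failures."""
    key = normalize(new_failure)
    in_flaky = key in {normalize(m) for m in known_flaky}
    in_true = key in {normalize(m) for m in known_true}
    if in_flaky and in_true:
        return "ambiguous"  # the symptom was seen for both kinds of failure
    if in_flaky:
        return "flaky"
    if in_true:
        return "true"
    return "unknown"        # never witnessed before
```

The "ambiguous" outcome corresponds to the risk described above: a flaky failure whose symptom also appears among true failures cannot be safely ignored on the basis of its message alone.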

Other

This dataset contains both flaky failure logs (gathered from the FlakeFlagger dataset) and true failure logs (collected through mutation analysis). The dataset consists of the following content:

project_name.tgz:

The project folder includes subfolders, each representing a flaky test.
Each subfolder, named after a test, contains the following files:
- pit-report.xml: Results of running the test against the mutants. A file name containing 5X indicates a report covering 5 runs (reported once); a file name containing 1X indicates a report of a single run against all mutants. In total, each test has 20 runs (20X).
- test_name.xml: Contains the sets of all flaky and true failures of a particular test.
- summary-of-test_name.xml: A simplified version of test_name.xml, in the final form used in our paper (see our repo below for more details).
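
As a quick way to inspect a project archive without unpacking it, the sketch below groups the files in a `.tgz` by test folder. It assumes the archive layout is `project_name/test_folder/file`, as the description above suggests; the function name `files_per_test` is hypothetical and not part of the authors' scripts.

```python
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def files_per_test(archive_path):
    """Map each test folder in a project archive to the file names it contains.

    Assumes members are laid out as project_name/test_folder/file.
    """
    tests = defaultdict(list)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directory entries and links
            parts = PurePosixPath(member.name).parts
            if len(parts) >= 3:
                tests[parts[1]].append(parts[-1])
    return dict(tests)
```

For example, `files_per_test("apache-commons-exec.tgz")` would be expected to return one entry per flaky test, each listing its pit reports, its `test_name.xml`, and its summary file, provided the layout assumption holds.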

FlakeFlagger-testCode-CUT-reports.tgz:

Lists of the code-under-test file names and test code file names used to collect classifier features (refer to Table 1 in the paper).

pit-indexes-per-test.tgz:

JSON file that maps each test name to the mutation indexes in the pit reports that indicate killed, survived, or flaky mutants. This file can be reproduced using the scripts available in our GitHub repo.
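
A minimal sketch for summarizing this JSON file once it has been extracted from the archive. The exact schema is not documented here, so the code assumes each test name maps to an object whose keys are outcome labels and whose values are lists of mutation indexes; both this layout and the function name `count_mutant_outcomes` are assumptions for illustration.

```python
import json


def count_mutant_outcomes(index_path):
    """Count mutation indexes per outcome for every test in the index file.

    Assumed layout (not confirmed by the dataset description):
        {"<test name>": {"<outcome label>": [<mutation index>, ...], ...}, ...}
    """
    with open(index_path, encoding="utf-8") as fh:
        index = json.load(fh)
    return {
        test_name: {outcome: len(indexes) for outcome, indexes in outcomes.items()}
        for test_name, outcomes in index.items()
    }
```

If the real file uses a different nesting, only the comprehension needs to change; the scripts in the GitHub repo remain the authoritative reference for the format.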

All scripts for replicating the experiment or analyzing the dataset are available in our GitHub repo (https://github.com/AlshammariA/FailureLogClassifiers).

Files

Files (1.6 GB)

Name	MD5	Size
activiti-activiti.tgz	b34dfc19cee359f8e34506a9605fcbb6	625.2 MB
Alluxio-alluxio.tgz	b6007a1b8aa0da7079622c622f4718d1	245.0 MB
apache-ambari.tgz	c5cf92b63306688c4931e25014c106fd	92.1 MB
apache-commons-exec.tgz	4e604a7c7672f0344091e7813b0e541f	106.5 kB
apache-hbase.tgz	b9c7af9616ccc93803e2e9704572a820	235.2 MB
apache-httpcore.tgz	4cec9292cc7256cacbc89d95ab2cd312	25.1 MB
doanduyhai-Achilles.tgz	a3d553c0898b19cfdb4e3a1a69b2eacb	2.7 MB
elasticjob-elastic-job-lite.tgz	4b105d4c415972fb5b6363ebe96be652	4.9 MB
FlakeFlagger-testCode-CUT-reports.tgz	a093410dbd9edb6d76be35d347bdf3bb	157.7 kB
hector-client-hector.tgz	128818252e19f2707f3650fb33dad75f	19.6 MB
jknack-handlebars.java.tgz	d590f95239a621afca17bfaaabcb813d	661.6 kB
joel-costigliola-assertj-core.tgz	64caf04dcdd53fdcce6ee3b77032184b	496.6 kB
kevinsawicki-http-request.tgz	c1fb7b9172eae90e1bf0a9145445d3a4	3.0 MB
ninjaframework-ninja.tgz	0cefac46650da35a3043c726006845e0	1.4 MB
orbit-orbit.tgz	7d4dedfca5376527ebf9ee45b22b5661	9.2 MB
pit-indexes-per-test.tgz	930bc14e74a630e235a4a53ac23730d2	646.7 kB
qos-ch-logback.tgz	bb5ea3d9c2b2986f125f03486ce203f8	25.0 MB
spring-projects-spring-boot.tgz	5acf44ad4b31d61c306198a15c8cef40	25.6 MB
square-okhttp.tgz	d47fa4a690ff8ba2d22b30ad651a2bfc	161.4 MB
tootallnate-java-websocket.tgz	d6196beea85ee0a24d4a6bd612f00775	16.3 MB
undertow-io-undertow.tgz	d8106e69728a9cdd6d9a74ad75958608	14.6 MB
wildfly-wildfly.tgz	b669b32b5b79a3f99a9e3c6aea56e7e4	113.6 MB
wro4j-wro4j.tgz	8020134acfeef711bd89d22d60a862eb	15.6 MB
zxing-zxing.tgz	2cfa83e8cd00be0000a0cf0bdb6dd92b	3.0 MB
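
Since several of these archives are large, it can be worth checking a download against the MD5 checksum listed above. The sketch below uses Python's standard `hashlib` and reads the file in chunks; the helper names `md5_of` and `matches_listing` are illustrative, not part of the dataset's tooling.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5):
    """Compare a local file against the checksum shown in the file listing."""
    return md5_of(path) == expected_md5.strip().lower()
```

For example, `matches_listing("zxing-zxing.tgz", "2cfa83e8cd00be0000a0cf0bdb6dd92b")` should return `True` for an intact download of that archive.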

