Published May 23, 2023 | Version 1.0
Dataset Open

Enhanced Westermo dataset - Transformed and Modified for Test case Selection and Prioritization in the context of Continuous Integration and Reinforcement Learning.

Description

Overview

This repository contains a modified version of the recently published Westermo dataset. The original dataset was gathered at Westermo Network Technologies AB in Västerås, Sweden. It comprises over 1 million verdicts from testing embedded systems, collected during more than 500 consecutive days of nightly testing. The dataset has been transformed and tailored for the research community, particularly for challenges such as regression test selection, identification of flaky tests, and visualization of test results. The original dataset can be accessed through reference [1].

The Westermo dataset provides historical information on test case executions and their results. It is a useful resource for evaluating and comparing Test case Selection and Prioritization (TSP) techniques, enabling researchers to identify test cases that are more likely to fail in subsequent executions. Each test case is characterized by attributes such as execution duration, time of the previous execution, and the results of its recent executions.
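
As an illustration of how such attributes can feed a prioritization technique, the following minimal sketch ranks test cases by a recency-weighted failure score and breaks ties by shorter duration. The `CaseRecord` class, the `failure_score` and `prioritize` functions, the decay factor, and the sample values are all hypothetical; this is a simple history-based heuristic, not a method used by the dataset authors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseRecord:
    name: str
    duration: float          # approximate runtime (unit is an assumption)
    last_results: List[int]  # most recent first; 1 = failed, 0 = passed


def failure_score(results: List[int], decay: float = 0.5) -> float:
    """Sum past failures, weighting older results geometrically less."""
    return sum(r * decay ** age for age, r in enumerate(results))


def prioritize(tests: List[CaseRecord]) -> List[CaseRecord]:
    """Order by descending failure score, then by ascending duration."""
    return sorted(tests, key=lambda t: (-failure_score(t.last_results), t.duration))


# Made-up sample data, for illustration only.
sample = [
    CaseRecord("t1", 12.0, [0, 0, 0]),
    CaseRecord("t2", 30.0, [1, 0, 1]),
    CaseRecord("t3", 5.0, [0, 1, 0]),
    CaseRecord("t4", 8.0, []),
]
ranked_names = [t.name for t in prioritize(sample)]
```

In this sample, the test with the most recent failures is ranked first, and tests without any recorded failure are ordered by duration.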

Table 1: Dataset Overview
Test Cases 1,855
CI Cycles 15,197
Verdicts 1,036,818
Failed 5.03%

However, the number and diversity of features in the dataset are not relevant to every TSP approach. We therefore converted the dataset so that Westermo exposes the same features as Paint Control and IOF/ROL, two datasets widely used in Reinforcement Learning based TSP approaches.

The conversion required combining multiple variables to generate the target ones. Generating the “LastResults” and “Cycle” values required further analysis and an in-depth understanding of how the nightly testing was conducted. We therefore investigated what a CI cycle means in the Westermo context and followed their definition of a session: “a session is when we run a suite of tests on one test system with a certain software version and testware version”. Splitting the data by the 9 test systems used yielded 9 subsets that fit the CI context.
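
The idea behind this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' conversion script: the `RawVerdict` record, its field names, the `history` limit, and the sample values are hypothetical. The sketch assumes that each distinct session on a test system counts as one CI cycle and that a test runs at most once per session.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class RawVerdict(NamedTuple):
    system: str   # hypothetical test-system label
    session: str  # hypothetical session timestamp, sortable as text
    test: str     # test case identifier
    failed: int   # 1 = failed, 0 = passed


def convert(raw: List[RawVerdict], history: int = 4) -> Dict[str, List[dict]]:
    """Build one subset per system with Cycle and LastResults values."""
    by_system: Dict[str, List[RawVerdict]] = defaultdict(list)
    for rec in raw:
        by_system[rec.system].append(rec)

    subsets: Dict[str, List[dict]] = {}
    for system, records in by_system.items():
        # Number the sessions of this system chronologically, starting at 1.
        sessions = sorted({r.session for r in records})
        cycle_of = {s: i + 1 for i, s in enumerate(sessions)}
        past: Dict[str, List[int]] = defaultdict(list)  # newest first
        rows = []
        for rec in sorted(records, key=lambda r: (r.session, r.test)):
            rows.append({
                "Name": rec.test,
                "Verdict": rec.failed,
                "Cycle": cycle_of[rec.session],
                # Earlier verdicts of this test on this system only.
                "LastResults": list(past[rec.test][:history]),
            })
            past[rec.test].insert(0, rec.failed)
        subsets[system] = rows
    return subsets


# Made-up sample data, for illustration only.
raw = [
    RawVerdict("sysA", "2020-01-01", "t1", 0),
    RawVerdict("sysA", "2020-01-02", "t1", 1),
    RawVerdict("sysA", "2020-01-03", "t1", 0),
    RawVerdict("sysB", "2020-01-02", "t1", 1),
]
subsets = convert(raw)
```

Keeping the history per system means that a verdict observed on one test system never appears in the “LastResults” of another, which matches the per-system split described above.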

File Format

The compressed .zip file contains 9 files, one for each of the 9 systems. The files are in CSV format with the semicolon (;) as the delimiter. The table below lists the columns and their descriptions.

Table 2: Parameters of the dataset
Column Name Content
Id Unique numeric identifier of the test execution 
Name Unique numeric identifier of the test case
Duration Approximated runtime of the test case
CalcPrio Priority of the test case, calculated by the prioritization algorithm (output column, initially 0)
LastRun Previous last execution of the test case as date-time-string (Format: YYYY-MM-DD HH:ii )
LastResults List of previous test results (Failed: 1, Passed: 0), ordered by ascending age. Lists are delimited by [ ].
Verdict Test verdict of this test execution (Failed: 1, Passed: 0)
Cycle The number of the CI cycle this test execution belongs to.
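
A minimal sketch of reading one of these files with the Python standard library is shown below. It assumes that each file starts with a header row using the column names from Table 2 and that “LastResults” is written as a bracketed, comma-separated list such as `[1, 0]`; the `load_subset` function and the sample rows are hypothetical and should be checked against the actual files.

```python
import csv
import io
import json
from typing import Dict, List


def load_subset(text: str) -> List[Dict[str, object]]:
    """Parse one semicolon-delimited subset into typed rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for raw in reader:
        rows.append({
            "Id": int(raw["Id"]),
            "Name": raw["Name"],
            "Duration": float(raw["Duration"]),
            "CalcPrio": float(raw["CalcPrio"]),
            "LastRun": raw["LastRun"],
            # Assumed to be a bracketed list, parsed here as JSON.
            "LastResults": json.loads(raw["LastResults"]),
            "Verdict": int(raw["Verdict"]),
            "Cycle": int(raw["Cycle"]),
        })
    return rows


# Made-up sample rows that follow the column layout of Table 2.
SAMPLE = (
    "Id;Name;Duration;CalcPrio;LastRun;LastResults;Verdict;Cycle\n"
    "1;101;12.5;0;2020-01-01 02:15;[];0;1\n"
    "2;101;12.5;0;2020-01-02 02:15;[0];1;2\n"
    "3;101;12.5;0;2020-01-03 02:15;[1, 0];0;3\n"
)
rows = load_subset(SAMPLE)
failure_rate = sum(r["Verdict"] for r in rows) / len(rows)
```

For a real file, the text of one extracted CSV can be passed to `load_subset` in the same way. The commas inside “LastResults” do not interfere with parsing because the field delimiter is the semicolon.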

This conversion allows previous works to re-assess their approaches with more data for training and testing, and it gives future researchers in this field a ready-to-use, rich dataset on which to evaluate their approaches and contribute to the TSP community. It also addresses limitations discussed in the systematic literature review [2], which states that future research on TSP techniques should focus on collecting data from more recent subjects in a CI context with varying failure rates and larger execution times, as reproducible studies with appropriate datasets are needed to develop a usable body of knowledge regarding TSP over time. We believe this conversion of the Westermo dataset contributes to narrowing that gap for RL-based approaches.

The original dataset is available through reference [1].

Files

TSP_Westermo_Dataset.zip (31.3 MB)
md5:744131ba9bee8aceaf5888c2ee39d683

Additional details

References