TRAVEL: A Dataset with Toolchains for Test Generation and Regression Testing of Self-driving Cars Software

Pouria Derakhshanfar; Annibale Panichella; Alessio Gambi; Vincenzo Riccio; Christian Birchler; Sebastiano Panichella

doi:10.5281/zenodo.5911161

Published January 27, 2022 | Version v1

1. Delft University of Technology
2. University of Passau
3. Università della Svizzera Italiana
4. Zurich University of Applied Sciences

Introduction

This repository hosts the Testing Roads for Autonomous VEhicLes (TRAVEL) dataset. TRAVEL is an extensive collection of virtual roads that have been used for testing lane assist/keeping systems (i.e., driving agents), together with the data from their execution in a state-of-the-art, physically accurate driving simulator called BeamNG.tech. Virtual roads consist of sequences of road points interpolated using cubic splines.

Along with the data, this repository contains instructions on how to install the tooling necessary to generate new data (i.e., test cases) and analyze them in the context of regression testing. We focus on test selection and test prioritization, given their importance for developing high-quality software following the DevOps paradigm.

This dataset builds on top of our previous work in this area, including work on:

test generation (e.g., AsFault, DeepJanus, and DeepHyperion) and the SBST CPS tool competition (SBST2021);
test selection: SDC-Scissor and its related tool;
test prioritization: automated test case prioritization for SDCs.

Dataset Overview

The TRAVEL dataset is available under the data folder and is organized as a set of experiments folders. Each of these folders is generated by running the test-generator (see below) and contains the configuration used for generating the data (experiment_description.csv), various statistics on generated tests (generation_stats.csv) and found faults (oob_stats.csv). Additionally, the folders contain the raw test cases generated and executed during each experiment (test.<TEST_ID>.json).
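The folder layout described above can be explored with a few lines of Python. This is a minimal sketch, assuming the archives have already been extracted and that every experiment folder is identified by its experiment_description.csv file; the function name list_experiments is hypothetical and not part of the TRAVEL tooling.

```python
from pathlib import Path


def list_experiments(data_root):
    """Map each experiment folder to the sorted list of its raw test files.

    Assumption: an experiment folder is any folder that contains an
    experiment_description.csv file, as described in the text.
    """
    experiments = {}
    for description in sorted(Path(data_root).rglob("experiment_description.csv")):
        folder = description.parent
        # Raw test cases are stored as test.<TEST_ID>.json in the same folder.
        experiments[folder] = sorted(folder.glob("test.*.json"))
    return experiments
```

Calling list_experiments("data") on an extracted copy of the dataset would return one entry per experiment folder, each with the paths of its test files.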

The following sections describe what each of those files contains.

Experiment Description

The experiment_description.csv contains the settings used to generate the data, including:

Time budget. The overall generation budget in hours. This budget includes both the time to generate and execute the tests as driving simulations.
The size of the map. The side length, in meters, of the square map that defines the boundaries inside which the virtual roads develop.
The test subject. The driving agent that implements the lane-keeping system under test. The TRAVEL dataset contains data generated testing the BeamNG.AI and the end-to-end Dave2 systems.
The test generator. The algorithm that generated the test cases. The TRAVEL dataset contains data obtained using various algorithms, ranging from naive and advanced random generators to complex evolutionary algorithms, for generating tests.
The speed limit. The maximum speed at which the driving agent under test can travel.
Out of Bound (OOB) tolerance. The test cases' oracle that defines the tolerable amount of the ego-car that can lie outside the lane boundaries. This parameter ranges between 0.0 and 1.0. In the former case, a test failure triggers as soon as any part of the ego-vehicle goes out of the lane boundary; in the latter case, a test failure triggers only if the entire body of the ego-car falls outside the lane.
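The OOB oracle described above can be sketched as a small predicate. This is only an illustration of the two boundary cases stated in the text: the function name is_test_failure is hypothetical, and the exact comparison used by the actual pipeline for intermediate tolerance values (strict or non-strict) is an assumption.

```python
def is_test_failure(oob_percentage, tolerance):
    """Return True when the OOB oracle would flag a failure.

    oob_percentage: fraction of the ego-car outside the lane (0.0 to 1.0).
    tolerance: the OOB tolerance of the experiment (0.0 to 1.0).
    Assumption: a strict comparison is used, except at tolerance 1.0,
    where the whole car must be outside the lane.
    """
    if not 0.0 <= tolerance <= 1.0:
        raise ValueError("tolerance must be between 0.0 and 1.0")
    if tolerance >= 1.0:
        # Failure only if the entire body of the ego-car is outside the lane.
        return oob_percentage >= 1.0
    # With tolerance 0.0, any part of the car outside the lane is a failure.
    return oob_percentage > tolerance
```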

Experiment Statistics

The generation_stats.csv contains statistics about the test generation, including:

Total number of generated tests. The number of tests generated during an experiment. This number is broken down into the number of valid tests and invalid tests. Valid tests contain virtual roads that do not self-intersect and contain turns that are not too sharp.
Test outcome. The test outcome contains the number of passed tests, failed tests, and tests in error. Passed and failed tests are defined by the OOB Tolerance and an additional (implicit) oracle that checks whether the ego-car is moving or standing. Tests that did not pass because of other errors (e.g., the simulator crashed) are reported in a separate category.
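One of the validity conditions mentioned above is that a road must not self-intersect. The sketch below illustrates that idea on the road centerline only; it is not the validator used by the pipeline, it ignores the road width and the sharp-turn check, and the function names are hypothetical.

```python
def _orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


def _segments_cross(p1, p2, q1, q2):
    """True when segments p1-p2 and q1-q2 properly cross each other."""
    return (_orientation(p1, p2, q1) * _orientation(p1, p2, q2) < 0
            and _orientation(q1, q2, p1) * _orientation(q1, q2, p2) < 0)


def self_intersects(points):
    """Check whether the polyline through points crosses itself."""
    segments = list(zip(points, points[1:]))
    for i in range(len(segments)):
        # Adjacent segments share an endpoint, so start two segments ahead.
        for j in range(i + 2, len(segments)):
            if _segments_cross(*segments[i], *segments[j]):
                return True
    return False
```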

The TRAVEL dataset also contains statistics about the failed tests, including the overall number of failed tests (total oob) and its breakdown into OOB that happened while driving left or right. Further statistics about the diversity (i.e., sparseness) of the failures are also reported.

Test Cases and Executions

Each test.<TEST_ID>.json contains information about a test case and, if the test case is valid, the data observed during its execution as a driving simulation.

The data about the test case definition include:

The road points. The list of points in a 2D space that identifies the center of the virtual road, and their interpolation using cubic splines (interpolated_points).
The test ID. The unique identifier of the test in the experiment.
Validity flag and explanation. A flag that indicates whether the test is valid or not, and a brief message describing why the test is not considered valid (e.g., the road contains sharp turns or the road self-intersects).
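To give an intuition of how a few road points become a smooth centerline, the sketch below interpolates them with a uniform Catmull-Rom spline, which is one kind of cubic spline that passes through every input point. This is a swapped-in technique chosen because it needs no external library; the spline formulation used to produce interpolated_points in the dataset may differ, and the function names are hypothetical.

```python
def catmull_rom_point(p0, p1, p2, p3, t):
    """Evaluate the uniform Catmull-Rom segment between p1 and p2 at t in [0, 1]."""
    def blend(a, b, c, d):
        return 0.5 * (2 * b
                      + (-a + c) * t
                      + (2 * a - 5 * b + 4 * c - d) * t ** 2
                      + (-a + 3 * b - 3 * c + d) * t ** 3)
    return (blend(p0[0], p1[0], p2[0], p3[0]),
            blend(p0[1], p1[1], p2[1], p3[1]))


def interpolate_road(road_points, samples_per_segment=10):
    """Return a denser list of 2D points that passes through all road points."""
    if len(road_points) < 2:
        raise ValueError("at least two road points are required")
    # Duplicate the end points so the first and last segments have neighbours.
    padded = [road_points[0]] + list(road_points) + [road_points[-1]]
    interpolated = []
    for i in range(len(road_points) - 1):
        p0, p1, p2, p3 = padded[i:i + 4]
        for step in range(samples_per_segment):
            t = step / samples_per_segment
            interpolated.append(catmull_rom_point(p0, p1, p2, p3, t))
    last = road_points[-1]
    interpolated.append((float(last[0]), float(last[1])))
    return interpolated
```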

The test data are organized according to the following JSON Schema and can be interpreted as RoadTest objects provided by the tests_generation.py module.

{
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "is_valid": { "type": "boolean" },
    "validation_message": { "type": "string" },
    "road_points": {
      "type": "array",
      "items": { "$ref": "schemas/pair" }
    },
    "interpolated_points": {
      "type": "array",
      "items": { "$ref": "schemas/pair" }
    },
    "test_outcome": { "type": "string" },
    "description": { "type": "string" },
    "execution_data": {
      "type": "array",
      "items": { "$ref": "schemas/simulationdata" }
    }
  },
  "required": [
    "id", "is_valid", "validation_message",
    "road_points", "interpolated_points"
  ]
}
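A test file that follows the schema above can be read with the Python standard library alone. The sketch below is a minimal loader that only checks the keys listed under "required"; it does not use the RoadTest class from tests_generation.py, and the function name load_test_case is hypothetical.

```python
import json

# Keys listed under "required" in the schema above.
REQUIRED_KEYS = ("id", "is_valid", "validation_message",
                 "road_points", "interpolated_points")


def load_test_case(path):
    """Load one test.<TEST_ID>.json file and check its required keys."""
    with open(path, encoding="utf-8") as handle:
        test_case = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in test_case]
    if missing:
        raise ValueError(f"{path}: missing required keys {missing}")
    return test_case
```

Optional keys such as test_outcome and execution_data are present only for tests that were executed, so callers should read them with test_case.get(...).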

Finally, the execution data contain a list of timestamped state information recorded by the driving simulation. State information is collected at constant frequency and includes absolute position, rotation, and velocity of the ego-car, its speed in Km/h, and control inputs from the driving agent (steering, throttle, and braking). Additionally, execution data contain OOB-related data, such as the lateral distance between the car and the lane center and the OOB percentage (i.e., how much the car is outside the lane).

The simulation data adhere to the following (simplified) JSON Schema and can be interpreted as Python objects using the simulation_data.py module.

{
    "$id": "schemas/simulationdata",
    "type": "object",
    "properties": {
        "timer" : { "type": "number" },
        "pos" : {
                  "type": "array",
                  "items":{ "$ref" : "schemas/triple" }
                },
        "vel" : {
                  "type": "array",
                  "items":{ "$ref" : "schemas/triple" }
                },
        "vel_kmh" : { "type": "number" },
        "steering" : { "type": "number" },
        "brake" : { "type": "number" },
        "throttle" : { "type": "number" },
        "is_oob" : { "type": "number" },
        "oob_percentage" : { "type": "number" }
    },
    "required": [
        "timer", "pos", "vel", "vel_kmh",
        "steering", "brake", "throttle",
        "is_oob", "oob_percentage"
    ]
}
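The execution records can be summarized without the simulation_data.py module by treating each record as a plain dictionary. The sketch below uses only scalar fields from the schema above; the function name summarize_execution and the choice of summary values are illustrative assumptions.

```python
def summarize_execution(execution_data):
    """Summarize a list of simulation records (dictionaries) of one test.

    Returns None when the test has no execution data (e.g., an invalid test).
    """
    if not execution_data:
        return None
    return {
        # Difference between the last and the first recorded timestamp.
        "duration": execution_data[-1]["timer"] - execution_data[0]["timer"],
        "max_speed_kmh": max(r["vel_kmh"] for r in execution_data),
        "max_oob_percentage": max(r["oob_percentage"] for r in execution_data),
        # Number of records in which the car was flagged as out of bounds.
        "oob_samples": sum(1 for r in execution_data if r["is_oob"]),
    }
```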

Dataset Content

The TRAVEL dataset is a living initiative, so its content is subject to change. Currently, the dataset contains the data collected during the SBST CPS tool competition, and data collected in the context of our recent work on test selection (SDC-Scissor work and tool) and test prioritization (automated test case prioritization work for SDCs).

SBST CPS Tool Competition Data

The data collected during the SBST CPS tool competition are stored inside data/competition.tar.gz. The file contains the test cases generated by Deeper, Frenetic, AdaFrenetic, and Swat, the open-source test generators submitted to the competition and executed against BeamNG.AI with an aggression factor of 0.7 (i.e., conservative driver).

Name	Map Size (m x m)	Max Speed (Km/h)	Budget (h)	OOB Tolerance (0.0-1.0)	Test Subject
DEFAULT	200 × 200	120	5 (real time)	0.95	BeamNG.AI - 0.7
SBST	200 × 200	70	2 (real time)	0.5	BeamNG.AI - 0.7

Specifically, the TRAVEL dataset contains 8 repetitions of each of the above configurations for each test generator, totaling 64 experiments.
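The total of 64 experiments follows from 4 test generators, 2 configurations, and 8 repetitions. The short sketch below enumerates that grid; the tuple layout and the numbering of repetitions from 1 are illustrative assumptions and do not reflect the folder names used in the archive.

```python
from itertools import product

GENERATORS = ("Deeper", "Frenetic", "AdaFrenetic", "Swat")
CONFIGURATIONS = ("DEFAULT", "SBST")
REPETITIONS = 8

# One entry per (generator, configuration, repetition) combination.
experiments = list(product(GENERATORS, CONFIGURATIONS, range(1, REPETITIONS + 1)))

print(len(experiments))  # 4 generators x 2 configurations x 8 repetitions
```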

SDC Scissor

With SDC-Scissor, we collected data based on the Frenetic test generator. The data is stored inside data/sdc-scissor.tar.gz. The following table summarizes the parameters used.

Name	Map Size (m x m)	Max Speed (Km/h)	Budget (h)	OOB Tolerance (0.0-1.0)	Test Subject
SDC-SCISSOR	200 × 200	120	16 (real time)	0.5	BeamNG.AI - 1.5

The dataset contains 9 experiments with the above configuration. To generate your own data with SDC-Scissor, follow the instructions in its repository.

Dataset Statistics

Here is an overview of the TRAVEL dataset: generated tests, executed tests, and faults found by all the test generators, grouped by experiment configuration. Some 25,845 test cases were generated by running 4 test generators 8 times in 2 configurations using the SBST CPS Tool Competition code pipeline (SBST in the table). We ran the test generators for 5 hours, allowing the ego-car a generous speed limit (120 Km/h) and defining a high OOB tolerance (i.e., 0.95); we also ran them with a smaller generation budget (i.e., 2 hours) and a lower speed limit (i.e., 70 Km/h) while setting the OOB tolerance to a lower value (i.e., 0.85). We also collected some 5,971 additional tests with SDC-Scissor (SDC-Scissor in the table) by running it 9 times for 16 hours using Frenetic as a test generator and defining a more realistic OOB tolerance (i.e., 0.50).

Generating new Data

Generating new data, i.e., test cases, can be done using the SBST CPS Tool Competition pipeline and the driving simulator BeamNG.tech.

Extensive instructions on how to install both tools are reported in the SBST CPS Tool Competition pipeline documentation; therefore, below we only summarize the overall installation process.

Installation

Install Python 3.7. (We tested the code using 3.7.9, so we suggest installing that version of Python.)
Clone the SBST CPS Tool Competition pipeline
Request a (free) copy of the driving simulator at https://register.beamng.tech and, after receiving the registration key file (tech.key) and the download link, download and install the latest distribution of the software (0.24).
Create a Python virtual environment (python -m venv .venv) inside the root of the SBST CPS Tool Competition pipeline and activate it. Please be sure to name the virtual environment exactly .venv and to place it in the correct position, as the scripts to collect new data assume this setup.
Install inside the virtual environment all the SBST CPS Tool Competition pipeline requirements (pip install -r requirements.txt).

If you have successfully set up the Python environment, activate the virtual environment and run

python.exe competition.py --help

and check that the command outputs the usage of the SBST CPS Tool Competition pipeline.

Data Collection

Collecting new data is as simple as starting the SBST CPS Tool Competition pipeline with suitable parameters for test generation (--time-budget, --module-name, --module-path and --class-name), test executor (--executor), and test definition (--map-size, --speed-limit and --oob-tolerance).
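The parameters listed above can be assembled into a command line programmatically. The sketch below only builds the argument list and does not start the pipeline; the flag names come from the text, while the function name, the default interpreter name, and every value shown in the usage note (including the executor name and the unit of the time budget) are hypothetical and should be checked against the pipeline documentation.

```python
def build_competition_command(time_budget, module_name, class_name, executor,
                              map_size, speed_limit, oob_tolerance,
                              module_path=None, python="python"):
    """Build the argument list for competition.py from the documented flags."""
    command = [
        python, "competition.py",
        "--time-budget", str(time_budget),
        "--executor", executor,
        "--map-size", str(map_size),
        "--speed-limit", str(speed_limit),
        "--oob-tolerance", str(oob_tolerance),
        "--module-name", module_name,
        "--class-name", class_name,
    ]
    if module_path is not None:
        # Only needed when the generator module lives outside the pipeline.
        command += ["--module-path", module_path]
    return command
```

For example, build_competition_command(18000, "my_generator", "MyGenerator", "beamng", 200, 120, 0.95) would return a list that can be passed to subprocess.run from the root of the pipeline, with all of those values being placeholders.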

To ease the collection of new data, we include the (PowerShell) scripts we used to create the current TRAVEL dataset. Those scripts can be found inside the data-collection folder and are named after the test generator and configuration they implement. For instance, Frenetic-DEFAULT.ps1 will execute the Frenetic test generator in the DEFAULT configuration, whereas Deeper-SBST.ps1 will execute the Deeper test generator in the SBST configuration. (The description of these configurations is reported in the table above.)

To run those scripts, open a PowerShell session, cd to the folder containing the script (data-collection), and invoke the script you want. Assuming you have configured the SBST CPS Tool Competition pipeline as described above, the scripts will start the code pipeline and the BeamNG.tech driving simulator.

Once the run is finished, you will find the results under the results folder inside the SBST CPS Tool Competition pipeline project. The results of each experiment are stored in a uniquely named folder that clearly indicates the test generator used to produce them (see above for the detailed description of the produced results).

Data Usage

So far, the data contained in the TRAVEL dataset have been used for benchmarking test generators as well as for optimizing regression testing in the context of self-driving car software.

We used TRAVEL to study the problem of test generation from different perspectives.

In the last edition of the SBST Tool Challenge, we answer the question "How efficient and effective are the existing test generators for SDCs?". We assess test generation efficiency by counting how many tests are produced by each test generator within a given time budget. To assess test generation effectiveness, we count how many of the generated tests are valid and invalid.

In a recent study, we propose (along with the DeepHyperion test generator) a novel approach to assess the adequacy of SDC test suites by measuring the coverage of the feature maps that they achieve. To do so, we extract structural and behavioral test case features from the test case descriptions and simulation data by looking at the geometrical properties describing the roads and the physical quantities describing the ego-car behavior (e.g., position, speed).

We also answer the question "How effective are the resulting tests in finding faults in SDCs?", by counting how many failures are triggered by the generated tests and how different those failures are (e.g., Left or Right OOBs). Specifically, we propose an approach to measure OOBs similarity by computing the edit distance of the road segments relevant to the corresponding failure.
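The edit distance mentioned above can be illustrated with the classic Levenshtein distance over two sequences. The sketch below assumes, purely for illustration, that each road is encoded as a sequence of segment labels such as "L" (left turn), "R" (right turn), and "S" (straight); the actual segment representation used in the study is not specified here.

```python
def edit_distance(first, second):
    """Levenshtein distance between two sequences of segment labels."""
    # previous[j] holds the distance between the processed prefix of
    # first and the first j items of second.
    previous = list(range(len(second) + 1))
    for i, item_a in enumerate(first, start=1):
        current = [i]
        for j, item_b in enumerate(second, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

Under this encoding, two failures whose relevant road segments are ["L", "S", "R"] and ["L", "R"] would be at distance 1, since removing the straight segment turns one into the other.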

Castellano et al. demonstrated that a dataset such as TRAVEL can be used to answer other interesting questions, such as "Which road representation is the most suitable for generating SDCs test cases?".

In the context of regression testing, the TRAVEL dataset is also used to develop tools that select test cases that are likely to fail before executing them in simulation (e.g., SDC-Scissor), and that prioritize them with single-objective and multi-objective approaches (SDC-Prioritizer).

With SDC-Scissor, we can select test cases that are more likely to fail. This allows us to answer research questions such as "To what extent is it possible to identify safe and unsafe test scenarios for SDCs before executing them?" or "Does SDC-Scissor improve the cost-effectiveness of simulation-based testing of SDCs?" (Khatiri et al.). SDC-Scissor achieves an F1-score between 47% and 90% in identifying failing tests, compared to a baseline, and reduced the time spent running uninformative tests by 107% and 170% (Birchler et al.).

Regarding test prioritization, the TRAVEL dataset allows us to answer the following questions: "To what extent is it possible to apply test prioritization strategies to prioritize the execution of safe and unsafe test scenarios for SDCs?", "What is the cost-effectiveness of SDC-Prioritizer compared to baseline approaches?", and "What is the overhead introduced by it?". To address the first question, we compared the test prioritization approaches in terms of fault detection rate (i.e., how fast faults are detected during the test execution process) using the cost-cognizant Average Percentage of Fault Detection (APFD𝑐) (Epitropakis et al., Rothermel et al.) and concluded that SDC-Prioritizer (with 82.5% APFD𝑐) significantly outperformed the baseline random and greedy test prioritization approaches (Birchler et al.). For the last question, we compared the time required by SDC-Prioritizer to sort tests against the time needed to run all of the tests and showed that, on average, SDC-Prioritizer needs less than 13 minutes to perform the test prioritization, while running the SDC test suites takes between 16 and 106 hours. As explained in Section 3.2.3, SDC-Prioritizer uses the road features shared with TRAVEL to guide the search process towards generating test orders with high diversity in the road shapes. The best features for test prioritization were selected with Principal Component Analysis (PCA) (Birchler et al.).
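The APFD𝑐 metric mentioned above rewards test orders that reveal faults early while accounting for the cost of each test. The sketch below computes it under two simplifying assumptions that are ours and not necessarily those of the cited studies: every failing test reveals one distinct fault, and all faults have the same severity. The function name apfd_c is hypothetical.

```python
def apfd_c(ordered_tests):
    """Cost-cognizant APFD for a prioritized list of (cost, failed) pairs.

    cost: execution cost of the test (e.g., simulation time).
    failed: True when the test fails; each failing test is counted as
    one distinct fault with unit severity (an assumption of this sketch).
    """
    total_cost = sum(cost for cost, _ in ordered_tests)
    faults = sum(1 for _, failed in ordered_tests if failed)
    if total_cost <= 0 or faults == 0:
        raise ValueError("need a positive total cost and at least one failure")
    remaining = total_cost  # cost of the current test plus all later tests
    numerator = 0.0
    for cost, failed in ordered_tests:
        if failed:
            # Cost from the detecting test onwards, minus half of its own cost.
            numerator += remaining - 0.5 * cost
        remaining -= cost
    return numerator / (total_cost * faults)
```

With four tests of equal cost and a single failure, running the failing test first gives 0.875 and running it last gives 0.125, which matches the intuition that earlier fault detection scores higher.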

Notes

Acknowledgments. We gratefully acknowledge the Horizon 2020 (EU Commission) support for the projects COSMOS (DevOps for Complex Cyber-physical Systems), Project No. 957254-COSMOS, PRECRIME (Self-assessment Oracles for Anticipatory Testing), ERC Grant Agreement No. 787703, and the DFG project STUNT (DFG Grant Agreement n. FR 2955/4-1).

Credit Author Statement. All authors of this paper contributed equally to all aspects of this paper: Data curation, Conceptualization, Software, Methodology, Investigation, Validation, Writing - original draft & editing.

Files

Files (1.8 GB)

Name	Size	MD5
competition.tar.gz	1.6 GB	8ca73f4d2da10f15897572649bdd7536
README.pdf	421.3 kB	06c7cdd0ab0771be80def633eea4030d
sdc-prioritizer.zip	1.6 MB	05a72c036c263c8a62a546ee6c94b704
sdc-scissor.zip	216.1 MB	ca086a2426d8c6b1dc634d8e58099a21
