Real ORNL Automotive Dynamometer (ROAD) CAN Intrusion Dataset
Creators
- Oak Ridge National Laboratory
Description
The Real ORNL Automotive Dynamometer (ROAD) CAN IDS dataset consists of over 3.5 hours of one vehicle's CAN data. ROAD contains ambient data recorded during a diverse set of activities, and attacks of increasing stealth with multiple variants and instances of real (i.e., non-simulated) fuzzing, fabrication, and unique advanced attacks, as well as simulated masquerade attacks. In addition to the "raw" CAN format, the data is also provided in a signal time series format for many of the CAN captures.
Authors: Miki E. Verma, Robert A. Bridges, Michael D. Iannacone, Samuel C. Hollifield, Pablo Moriano, Bill Kay, Steven Hespeler and Frank L. Combs
Citation: Please cite the paper with full description (preprint https://arxiv.org/abs/2012.14600, PLoS ONE publication to appear in 2024)
Technical info (English)
The ROAD dataset consists of 33 attack captures totalling about 30 minutes, and 12 ambient captures totalling about 3 hours. Details on attacks, syntactic metadata, and further descriptions are provided in the paper https://arxiv.org/abs/2012.14600 (to appear in PLoS ONE in 2024). Please cite this paper for use of the dataset.
Using the CAN-D algorithm (https://ieeexplore.ieee.org/abstract/document/9466242) to translate the raw CAN data into signals, we also provide a signal-translated version for 17 of the 33 attack captures and all of the ambient captures.
We collected CAN data using SocketCAN software on a Linux computer with a Kvaser Leaf Light V2 connecting to the OBD-II port. All data is from a single vehicle, the make/model of which our organization will not allow us to disclose, and with year of manufacture in the mid 2010s. The published data has been obfuscated in a way that maintains the anonymity of the vehicle, while preserving nearly all important aspects of the data for IDS research.
For all attacks, the vehicle was actively being driven on a dynamometer. Notably, each attack's intended alteration of vehicle functionality was physically verified and documented--an advantage of using real CAN data and a dynamometer. Ambient data was collected both on a dynamometer and on roads, while performing a variety of normal and sometimes unusual driving activities (e.g., opened door while driving). This allows training/testing with anomalies to both increase realism and allow investigation of false positives. Metadata with details on the activities in each capture and on the physical effect of each attack are provided.
The breadth of ambient and attack data in the ROAD dataset is designed for testing a variety of detectors--using different features (e.g., timing, payload bits, or signals), and modeling different characteristics (e.g., frequency, entropy, continuity, or correlation). While ROAD's fuzzing, fabrication, and suspension attacks provide timing-transparent (T.T.) attacks of increasing stealth, the Accelerator Attack--a result of a newly discovered and disclosed vulnerability--and many simulated masquerade attacks are timing-opaque (T.O.). For developing detectors that use payloads, the Correlated Signal, Max Speedometer, and Reverse Light attacks all entail discontinuities or broken correlations in CAN signals.
The attack captures are detailed here in order of expected detection difficulty.
Fuzzing Attack
We mounted the less stealthy version of the fuzzing attack, injecting frames with random IDs (cycling IDs in order from 0x000 to 0xFF) with the maximum payload (0xFFFFFFFFFFFFFFFF) every 0.005 s, as opposed to the more stealthy version, which only injects IDs seen in ambient data. This attack is designed to be easy to detect. There were many physical effects of this attack, for example: the accelerator pedal became ineffective; the dash lights and headlights were illuminated; and the seat positions moved.
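Because this version of the attack injects IDs that never occur in ambient traffic, even a very simple check should catch it. The following is a minimal sketch, assuming frames have already been reduced to (timestamp, arbitration ID) pairs; the function names and the toy values are illustrative and are not taken from ROAD.

```python
def learn_id_whitelist(ambient_ids):
    """Collect the set of arbitration IDs observed in attack-free traffic."""
    return set(ambient_ids)


def flag_unseen_ids(frames, whitelist):
    """Return the frames whose arbitration ID never appeared in ambient data.

    `frames` is an iterable of (timestamp, arbitration_id) pairs.
    """
    return [(t, can_id) for t, can_id in frames if can_id not in whitelist]


# Toy values, made up for illustration only.
ambient_ids = [0x0A1, 0x1F3, 0x0A1, 0x2B0]
capture = [(0.000, 0x0A1), (0.005, 0x000), (0.010, 0x001), (0.012, 0x1F3)]

whitelist = learn_id_whitelist(ambient_ids)
alerts = flag_unseen_ids(capture, whitelist)
```

A check like this would not help against the more stealthy fuzzing variant, which only injects IDs already seen in ambient data.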
Targeted ID Fabrication & Masquerade Attacks
We performed targeted ID fabrication attacks using the flam delivery, meaning a message is injected immediately after a legitimate message with the target ID is seen. The flam technique allows for dynamic injection; that is, the legitimate message with the target ID is read, only the bits corresponding to the target signal are modified with malicious values, and then this spoofed message is injected. When only part of the message is modified, we refer to this as targeting a signal, rather than an ID. Designing these attacks required reverse engineering of signals for this vehicle, which we completed by applying the CAN-D signal reverse engineering algorithm and manually verifying the results. The targeted ID fabrication attacks and masquerade attacks are as follows:
- Correlated Signal - The single ID communicating the four wheels' speeds (each is a two-byte signal) is injected with four false wheel speed values that are all very different. The effect, supposing the injected frames are at least as fast as the ambient frames with that ID (the case in this dataset), is that the accelerator has no effect on the vehicle. This loss of control begins immediately and persists throughout the injection period. In some instances the car required a restart to return to normal functionality.
- Max Speedometer - The one-byte speedometer signal is targeted by sending (0xFF), causing the speedometer to falsely display a maximum value.
- Max Engine Coolant Temperature - We target the engine coolant signal (one byte), modifying the signal value to be the maximum (0xFF). The physical effect is that an "engine coolant too high" warning light on the dash illuminates.
- Reverse Light - A one-bit signal communicating the state of the reverse lights (on/off) is targeted. We perform two slight variations of the attack, manipulating the value to off (on), while the car is in Reverse (Drive), respectively. The effect is that the reverse lights do not reflect the gear (Drive/Reverse).
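The Correlated Signal attack breaks the agreement that normally holds among the four wheel speeds, which suggests a payload-level consistency check. The sketch below assumes a hypothetical layout in which the 8-byte payload holds four unsigned big-endian 16-bit values; the real byte order and positions in ROAD are not stated here (and the data fields are scrambled per ID), so the layout and the `max_spread` threshold are assumptions for illustration.

```python
import struct


def wheel_speeds(payload_hex):
    """Split a 16-hex-character payload into four unsigned 16-bit values.

    Big-endian, back-to-back packing is an assumption, not the ROAD layout.
    """
    return struct.unpack(">4H", bytes.fromhex(payload_hex))


def speeds_disagree(payload_hex, max_spread):
    """Flag a frame whose four wheel-speed values differ by more than max_spread."""
    speeds = wheel_speeds(payload_hex)
    return max(speeds) - min(speeds) > max_spread


# Toy payloads, made up for illustration only.
plausible = "0100010201010103"   # four nearly equal raw values
suspicious = "0010FF0080001234"  # four very different raw values
```

In practice the threshold would need to be learned from ambient captures, since wheel speeds legitimately diverge somewhat while turning.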
For all of these targeted ID attacks, we provide two versions of the same CAN data captures: the original fabrication attack, and a version slightly modified in post-processing to simulate a masquerade attack.
The fabrication attack versions are the original unaltered captures, including both the legitimate target ID frames and the injected frames. Because these are real, physically verified attacks with the minimum number of injected frames (due to the flam delivery), they provide perhaps the best (i.e., most stealthy/most difficult to detect), current, public data for testing frequency-based IDSs. That is, most fabrication attacks in public datasets involve many injected frames between the ambient vehicle frames with the same ID, while the flam delivery used for ROAD's targeted ID attacks has a single injected frame between ambient frames of the same ID. At least one injected message occurring after each legitimate message is needed to manifest the desired physical effect; hence, the flam delivery, with only one such message, is the most stealthy possible.
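To illustrate why flam-delivered fabrication is still visible to timing-based detectors, consider the inter-arrival times of a single ID: each injected frame lands almost immediately after a legitimate one, producing a gap far shorter than the normal period. The following is a minimal sketch, assuming the timestamps of one ID are available as sorted lists of seconds; the function names, the `ratio` threshold, and the toy timestamps are hypothetical.

```python
from statistics import median


def learn_period(ambient_timestamps):
    """Estimate an ID's normal period as the median gap in attack-free data."""
    gaps = [b - a for a, b in zip(ambient_timestamps, ambient_timestamps[1:])]
    return median(gaps)


def short_gap_alerts(timestamps, period, ratio=0.5):
    """Return indices of frames arriving sooner than ratio * period after the previous frame."""
    alerts = []
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] < ratio * period:
            alerts.append(i)
    return alerts


# Toy timestamps for one ID, made up for illustration only.
ambient = [0.00, 0.01, 0.02, 0.03, 0.04]
# Legitimate frames every 10 ms, with one injected frame right after two of them.
capture = [0.000, 0.010, 0.020, 0.0201, 0.030, 0.0301, 0.040, 0.050]

period = learn_period(ambient)
alerts = short_gap_alerts(capture, period)
```

Here `alerts` holds the positions of the two injected frames in `capture`. Learning the period from a separate ambient capture avoids contaminating the estimate with the injected frames themselves.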
Using the fabrication captures, we produce simulated masquerade attacks by removing the legitimate target ID frame preceding each injected frame, providing more advanced versions of the attacks. In effect, this removes message conflicts from the data, making it appear as though only the spoofed messages are present during the injection interval. With this masquerade dataset, frequency-based approaches will almost certainly fail to provide accurate detection. It is important to note that while the masquerade aspect is simulated through post-processing, this means of alteration avoids problematic issues with synthetic data. Namely, the effect of the attack on the vehicle was physically verified; every message appearing in the data was actually seen by the car in the order it appears in the data; and no aspect of CAN protocol was violated. As discussed in the introduction of the paper, there are no publicly available, real CAN data captures with real masquerade attacks, and the hacking skill required to implement such an attack on a real vehicle seems to be preventing CAN IDS researchers from implementing one. The simulated masquerade captures therefore provide the highest fidelity alternative possible.
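The post-processing step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' actual tooling: frames are (timestamp, arbitration ID, data) tuples in capture order, and `is_injected` is a hypothetical caller-supplied predicate that recognizes injected frames (for example, by their payload).

```python
def simulate_masquerade(frames, target_id, is_injected):
    """Drop the legitimate target-ID frame that precedes each injected frame."""
    out = []
    last_legit_idx = None  # position in `out` of the latest legitimate target-ID frame
    for frame in frames:
        _, can_id, _ = frame
        if can_id == target_id:
            if is_injected(frame):
                if last_legit_idx is not None:
                    del out[last_legit_idx]  # remove the conflicting legitimate frame
                    last_legit_idx = None
            else:
                last_legit_idx = len(out)
        out.append(frame)
    return out


def payload_is_ff(frame):
    """Toy predicate: treat a frame as injected if its payload is 'FF'."""
    return frame[2] == "FF"


# Toy capture, made up for illustration only.
frames = [
    (0.0000, 0x20, "AA"),
    (0.0100, 0x10, "01"),  # legitimate, before the attack
    (0.0200, 0x10, "02"),  # legitimate, followed by an injection
    (0.0201, 0x10, "FF"),  # injected
    (0.0250, 0x20, "AB"),
    (0.0300, 0x10, "03"),  # legitimate, followed by an injection
    (0.0301, 0x10, "FF"),  # injected
]
masquerade = simulate_masquerade(frames, 0x10, payload_is_ff)
```

Frames on other IDs and target-ID frames outside the injection interval are left untouched, so every remaining message is one that was actually observed on the bus.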
Accelerator Attacks
We have responsibly disclosed this vulnerability to the OEM, and will not disclose details of how to implement this attack. We do not include the CAN data during the exploit. After the exploit, the vehicle is in a state in which the driver has less control, as follows: when put into Drive gear, the vehicle accelerates to a fixed speed and then holds this speed (regardless of accelerator pedal position or cruise control setting); when in Reverse, the vehicle accelerates to a (different) fixed speed and holds this speed (regardless of accelerator pedal position or cruise control setting); touching the brake pedal results in the acceleration ceasing and the brakes engaging normally; when the brake is released, the vehicle commences accelerating as described above.
The Accelerator Attack captures have no injected messages, but simply record the CAN data when the vehicle is in this state. Discrepancies exist between the vehicle's actions and the driver's inputs, e.g., acceleration occurs regardless of the accelerator pedal position.
Obfuscation
While other public CAN datasets provide information on the make, model, and year of the vehicles attacked, it would be irresponsible, given our previous disclosure, to release such information. Furthermore, we have taken steps to obfuscate the CAN data in such a way as to preserve the characteristics necessary for CAN IDS development, while ideally preventing users from knowing the make, model, and year of the vehicle. Below we itemize the augmentations performed on the data to preserve anonymity:
- Absolute timestamps are shifted uniformly by a scalar.
- Messages on arbitration IDs that were constant, aperiodic, or periodic with frequency under 0.1 Hz (less than one frame per ten seconds) were replaced with the "filler message" FFF#0000000000000000 (ID#Data in hex) at the same relative timestamp.
- Messages on reserved IDs (greater than 0x700: e.g., diagnostic messages) have been removed.
- IDs have been anonymized in such a way that arbitration order/priority is not preserved. There is a one-to-one mapping between the original and the anonymized IDs for a given vehicle (not including the "filler messages" under ID 0xFFF). For example, if ID 0x10 is converted to ID 0x821 in an anonymized log, the same is true for all logs.
- Data fields have been scrambled in such a way that signals have been preserved, and fields are scrambled in a consistent way for each ID; e.g., if the first byte is moved to the end of the field for ID 0x10, it will be shifted this way in all messages from ID 0x10.
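The consistency properties in the last two items (a one-to-one ID mapping shared by all logs, and a fixed rearrangement of the data field per ID) can be illustrated with a toy obfuscator. This is only a sketch of those two properties: the actual procedure applied to ROAD is not disclosed, and a plain byte permutation like the one below would not by itself guarantee that multi-byte signals are preserved. All names and values are hypothetical.

```python
import random


def build_obfuscator(ids, seed):
    """Build a one-to-one ID mapping and one fixed byte permutation per ID."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    id_map = dict(zip(ids, shuffled))  # bijection over the given IDs
    byte_perm = {can_id: rng.sample(range(8), 8) for can_id in ids}
    return id_map, byte_perm


def obfuscate(can_id, data, id_map, byte_perm):
    """Apply the ID mapping and the per-ID byte permutation to one 8-byte frame."""
    perm = byte_perm[can_id]
    return id_map[can_id], bytes(data[j] for j in perm)


# Toy IDs and payload, made up for illustration only.
ids = [0x10, 0x20, 0x30]
id_map, byte_perm = build_obfuscator(ids, seed=0)
first = obfuscate(0x10, bytes(range(8)), id_map, byte_perm)
second = obfuscate(0x10, bytes(range(8)), id_map, byte_perm)
```

Because the mapping and permutations are built once and reused, the same original frame always yields the same obfuscated frame, which is the property that lets detectors trained on one log be tested on another.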
Syntactic Information
All of the CAN data files are logged using the standard can-utils (https://github.com/linux-can/can-utils) candump format.
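For readers unfamiliar with the format, a candump log line has the shape `(timestamp) channel ID#DATA`, with the ID and data in hex. The following is a minimal parsing sketch; the function name and the sample line are illustrative, and the sample is not taken from ROAD.

```python
import re

# One candump log line: "(seconds.fraction) channel ID#DATA" with hex ID and data.
CANDUMP_LINE = re.compile(
    r"^\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]+)#([0-9A-Fa-f]*)$"
)


def parse_candump_line(line):
    """Parse one log line into (timestamp, channel, arbitration_id, data_hex)."""
    match = CANDUMP_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a candump log line: {line!r}")
    timestamp, channel, can_id, data = match.groups()
    return float(timestamp), channel, int(can_id, 16), data.upper()


# Sample line, made up for illustration only.
sample = "(1000000000.000000) can0 0A1#0011223344556677"
parsed = parse_candump_line(sample)
```

Timestamps are converted to `float` here for convenience; if exact microsecond digits matter for large timestamp values, `decimal.Decimal` would be a safer choice.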
While timestamps are reported with a precision of 1 μs, the hardware used to collect this data (a Kvaser Leaf Light V2) only guarantees an accuracy of 100 μs. Note that all data fields in these logs contain the full 8 bytes, which we padded with zeros if necessary. The channel is always "can0", so this column can be dropped. We provide metadata (in JSON format) for each capture, including a general description of driving activities, the length of the capture in seconds, and whether or not the car was on the dynamometer. For attack captures, we also include whether the capture was modified (i.e., masquerade attacks), the injection ID and data field, and the interval of injection (start, end) corresponding to the time of the first/last injected message in elapsed seconds. Importantly, we do not label individual messages as attack/normal, because the software we used to collect did not have that capability. However, with injection ID, data, and intervals, these can be labeled in post-processing fairly easily.
We use a wildcard character “X” in the injection_data_str field to indicate that the byte in the given position was not modified in the injection when only one signal in the data field is targeted. Similarly, “X” in the injection_id field indicates that no particular ID was targeted, which is only the case in the fuzzing attack. For the accelerator attack, the injection_id and injection_data_str are null, and the injection interval is just the start and end time of the capture. (Note that all of these details are included in the full documentation.)
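The post-processing labeling mentioned above can be sketched as follows: a frame is labeled as attack if it falls inside the injection interval, its ID matches the injection ID, and its data matches the injection data string wherever that string is not the wildcard. This is a minimal sketch under assumptions: frames are (elapsed seconds, arbitration ID, data hex string) tuples, the wildcard applies per hex character, and the injection ID is a hex string; the parameter names mirror the metadata fields but the exact JSON layout should be checked against the full documentation.

```python
def matches_pattern(value, pattern):
    """True if value equals pattern at every position where pattern is not 'X'."""
    if len(value) != len(pattern):
        return False
    return all(p in "Xx" or p.upper() == v.upper() for v, p in zip(value, pattern))


def label_frames(frames, injection_id, injection_data_str, interval):
    """Return one 0/1 label per frame (1 = matches the injection description)."""
    frames = list(frames)
    if injection_id is None or injection_data_str is None:
        return [0] * len(frames)  # e.g. accelerator attack: no injected messages
    start, end = interval
    any_id = set(injection_id.upper()) == {"X"}  # fuzzing: no particular ID targeted
    labels = []
    for elapsed, can_id, data in frames:
        in_window = start <= elapsed <= end
        id_ok = any_id or int(injection_id, 16) == can_id
        labels.append(int(in_window and id_ok and matches_pattern(data, injection_data_str)))
    return labels


# Toy frames and metadata values, made up for illustration only.
frames = [
    (0.5, 0x10, "0011223344556677"),
    (1.2, 0x10, "00112233FF556677"),
    (1.3, 0x20, "00112233FF556677"),
    (1.4, 0x10, "0011223344556677"),
    (3.0, 0x10, "00112233FF556677"),
]
labels = label_frames(frames, "010", "XXXXXXXXFFXXXXXX", (1.0, 2.0))
```

One limitation of this approach is that a legitimate frame inside the interval whose data happens to match the non-wildcard positions would also be labeled as attack.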
The signal-translated time series, the other format in which the dataset is provided, are stored as CSV files. Specifically, the CSV files have the following columns: Label, ID, Time, and Signal-<i>-of-ID. Labels are either 0 (benign) or 1 (attack), and all the entries in the ambient captures are labeled 0. Each of the signals within an ID is named based on the index it has when translated, i.e., i = 0, 1, ..., N_ID − 1, where N_ID is the maximum number of signals in a particular ID. We added a metadata file for each of the logs describing the details of the CSV files.
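A minimal sketch of reading such a file into per-ID time series is given here. The exact spelling of the signal column names is an assumption (the toy header uses `Signal_0_of_ID` and `Signal_1_of_ID`), so the code simply treats every column whose name starts with "Signal" as a signal and skips empty cells; the per-log metadata file should be consulted for the real names.

```python
import csv
import io
from collections import defaultdict


def load_signal_series(csv_file):
    """Group rows by ID into lists of (time, label, [signal values])."""
    series = defaultdict(list)
    for row in csv.DictReader(csv_file):
        signals = [
            float(value)
            for name, value in row.items()
            if name is not None and name.startswith("Signal") and value not in ("", None)
        ]
        series[row["ID"]].append((float(row["Time"]), int(row["Label"]), signals))
    return dict(series)


# Toy CSV content, made up for illustration only.
toy_csv = io.StringIO(
    "Label,ID,Time,Signal_0_of_ID,Signal_1_of_ID\n"
    "0,10,0.00,12.5,1\n"
    "0,20,0.01,3.0,\n"
    "1,10,0.02,250.0,1\n"
)
series = load_signal_series(toy_csv)
```

For a real file, the same function can be called on `open(path, newline="")` in place of the in-memory toy content.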
Files
Name | Size | MD5 |
---|---|---|
road.zip | 556.7 MB | cab184cfc2fe12c0834bc46188c0f330 |
Additional details
Related works
- Is published in
- Journal article: arXiv:2012.14600 (arXiv)