CESNET Idle OS Traffic
Description
This dataset consists of network traffic captured from various idle virtual machines (VMs) running in VirtualBox. It shows how the VMs communicate with external networks while they are in an idle state. Each capture contains the network traffic generated by one virtual machine over a span of one hour, including startup.
The data includes:
- raw packets,
- information about network flows,
- HTTP requests,
- and TLS sessions.
Data Structure
The data is stored in the following folder hierarchy:
- cesnet-idle-os-traffic.zip/
  - <os_family>__<os_type>__<os_version>/
    - <timestamp>__<source>__<identifier>/
      - flow.csv
      - info.json
      - traffic.pcap
    - http.csv
    - tls.csv
  - merged_http.csv
  - merged_tls.csv
  - <os_family>__<os_type>__<os_version>/
Where:
- os_family: The family of the operating system (e.g., linux, windows, macos).
- os_type: The specific operating system (e.g., ubuntu, debian, windows_10).
- os_version: The version of the operating system (e.g., 20.04, 23.10).
- timestamp: The date and time when the traffic capture was started.
- source: The origin of the VM image (e.g., vagrant, osboxes.org).
- identifier: A unique identifier based on the source (e.g., vagrant_box_name or hash).
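The naming scheme above can be split back into its components with a few lines of Python. This is a minimal sketch, assuming only that the double underscore `__` is the separator described above; the function and class names are hypothetical, and the timestamp is kept as plain text because its exact format is not stated here.

```python
from typing import NamedTuple


class OsFolder(NamedTuple):
    os_family: str
    os_type: str
    os_version: str


class CaptureFolder(NamedTuple):
    timestamp: str  # kept as text; the exact timestamp format is not assumed
    source: str
    identifier: str


def _split_three(name: str) -> list:
    # Split on the double underscore at most twice, so a single underscore
    # inside a component (e.g. "windows_10") is left untouched.
    parts = name.rstrip("/").split("__", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected folder name: {name!r}")
    return parts


def parse_os_folder(name: str) -> OsFolder:
    """Parse '<os_family>__<os_type>__<os_version>'."""
    return OsFolder(*_split_three(name))


def parse_capture_folder(name: str) -> CaptureFolder:
    """Parse '<timestamp>__<source>__<identifier>'."""
    return CaptureFolder(*_split_three(name))
```

For example, `parse_os_folder("windows__windows_10__22H2")` would return the three components as named fields (the folder name in this call is made up for illustration).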
Captured Data
The dataset was captured with a purpose-built toolset (see the ‘Capture toolset’ section) and contains several types of captured traffic data.
PCAP data (traffic.pcap)
The dataset includes PCAP files, which contain raw network traffic collected from virtual machines. These files store detailed packet-level data and serve as the primary source for extracting structured flow data, HTTP data, and TLS session details.
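As a quick sanity check on a capture, the packets in a PCAP file can be counted with the standard library alone. The sketch below assumes the files use the classic libpcap format (24-byte global header, 16-byte per-packet record header); whether traffic.pcap is classic pcap or pcapng is not stated here, so pcapng input is rejected rather than parsed. The function name is hypothetical.

```python
import struct
from typing import BinaryIO, Tuple

# Magic numbers of the classic pcap format, as they appear on disk,
# mapped to the struct byte-order prefix of the file.
_MAGIC_TO_ENDIAN = {
    b"\xd4\xc3\xb2\xa1": "<",  # microsecond timestamps, little-endian
    b"\xa1\xb2\xc3\xd4": ">",  # microsecond timestamps, big-endian
    b"\x4d\x3c\xb2\xa1": "<",  # nanosecond timestamps, little-endian
    b"\xa1\xb2\x3c\x4d": ">",  # nanosecond timestamps, big-endian
}


def count_packets(stream: BinaryIO) -> Tuple[int, int]:
    """Return (packet_count, total_captured_bytes) for a classic pcap stream."""
    header = stream.read(24)
    endian = _MAGIC_TO_ENDIAN.get(header[:4])
    if len(header) < 24 or endian is None:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")

    packets = 0
    captured = 0
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break  # end of file (or a truncated trailing record)
        _ts_sec, _ts_frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", record
        )
        stream.seek(incl_len, 1)  # skip the packet bytes themselves
        packets += 1
        captured += incl_len
    return packets, captured
```

A typical call would open the file in binary mode, e.g. `with open("traffic.pcap", "rb") as fh: print(count_packets(fh))`. For protocol-level analysis a dedicated packet library is a better fit; this only reads the container format.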
Flow Data (flow.csv)
These files contain bidirectional flow records computed from corresponding PCAP files, and include various application-layer data. They were created using the ipfixprobe exporter with several plugins enabled (see ‘Flow extraction using ipfixprobe’ section for details). The flows are stored in CSV format (directly readable by the logreplay module of the NEMEA framework). Some of the fields included in the flow data are as follows:
Traffic & Packet Counts
- BYTES, BYTES_REV (uint64) – Total bytes sent/received.
- PACKETS, PACKETS_REV (uint32) – Total packets sent/received.
Time Stamps
- TIME_FIRST, TIME_LAST (time) – Timestamp of the first and the last packet.
DNS Data
- DNS_NAME (string) – Queried domain name.
- DNS_RCODE (uint8) – DNS response code.
- DNS_QTYPE, DNS_ID, DNS_PSIZE (uint16) – Query type, ID, and packet size.
QUIC Data
- QUIC_VERSION, QUIC_CLIENT_VERSION (uint32) – QUIC versions.
- QUIC_SNI (string) – Server Name Indication.
- QUIC_MULTIPLEXED, QUIC_ZERO_RTT (uint8) – Multiplexing and 0-RTT info.
TCP Data
- TCP_MSS, TCP_MSS_REV (uint32) – Maximum segment size in each direction.
- TCP_WIN, TCP_WIN_REV (uint16) – Initial window size in each direction.
- TCP_FLAGS, TCP_FLAGS_REV (uint8) – TCP flags used in each direction.
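The TCP flag fields are small integers, so a helper that turns them into flag names makes flows easier to read. This is a minimal sketch, assuming TCP_FLAGS and TCP_FLAGS_REV use the standard bit layout of the TCP header flags byte (combined over the packets of the flow); the helper name is hypothetical.

```python
from typing import List

# Standard bit positions of the flags byte in the TCP header.
TCP_FLAG_BITS = (
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
    (0x40, "ECE"),
    (0x80, "CWR"),
)


def decode_tcp_flags(value: int) -> List[str]:
    """Return the names of all flag bits set in an 8-bit TCP flags value."""
    return [name for bit, name in TCP_FLAG_BITS if value & bit]
```

Under that assumption, a value of 18 (0x12) decodes to SYN and ACK, which is what the responder side of a handshake-only flow would show.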
TLS
- TLS_VERSION (uint16) – TLS protocol version.
- TLS_SNI, TLS_ALPN (string) – TLS server name & ALPN protocol.
HTTP Data
- HTTP_REQUEST_URL, HTTP_REQUEST_METHOD, HTTP_USERAGENT (string) – HTTP request details.
- HTTP_RESPONSE_STATUS_CODE (uint16) – HTTP response code.
Miscellaneous
- PROTOCOL (uint8) – L4 protocol number (e.g., 6 for TCP, 17 for UDP).
- DST_PORT, SRC_PORT (uint16) – Destination and source ports.
- DIR_BIT_FIELD (uint8) – Direction of the flow through the observation point (0 or 1).

For the full list of fields, see the ‘Flow extraction using ipfixprobe’ section.
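The flow records can also be read outside NEMEA with Python's csv module. The sketch below is illustrative only: it assumes a comma-separated file whose header row may carry a type in front of each field name (for example `uint64 BYTES`), and it strips that prefix if present. The function names are hypothetical, and the aggregation (bytes per TLS server name) is just one example of what the fields listed above allow.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Iterator, TextIO


def read_flows(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per flow record, keyed by bare field name."""
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:
        return
    # Assumption: a header cell is either "NAME" or "type NAME";
    # keep only the last whitespace-separated token.
    names = [col.split()[-1] if col.strip() else col for col in header]
    for row in reader:
        if row:
            yield dict(zip(names, row))


def bytes_per_sni(flows: Iterable[Dict[str, str]]) -> Counter:
    """Sum BYTES + BYTES_REV for every flow that carries a TLS_SNI value."""
    totals: Counter = Counter()
    for flow in flows:
        sni = flow.get("TLS_SNI", "")
        if sni:
            totals[sni] += int(flow["BYTES"]) + int(flow["BYTES_REV"])
    return totals
```

Typical use would be `with open("flow.csv", newline="") as fh: print(bytes_per_sni(read_flows(fh)).most_common(10))`. If the real header differs from the assumption above, only the `names` line needs adjusting.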
HTTP Traffic Data (http.csv)
This file includes extracted HTTP request data, such as:
- Operating System Information: os_family, os_type, os_version.
- HTTP Request Details: user-agent, host, uri.
Each line corresponds to one HTTP connection made by the OS.
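Because every row carries the OS columns, the HTTP data lends itself to grouping by operating system. The following sketch collects the distinct User-Agent strings seen per OS. It assumes the CSV header uses exactly the column names listed above (`os_family`, `os_type`, `os_version`, `user-agent`); the function name is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Set, TextIO, Tuple

OsKey = Tuple[str, str, str]


def user_agents_by_os(stream: TextIO) -> Dict[OsKey, Set[str]]:
    """Map (os_family, os_type, os_version) to the set of User-Agents seen."""
    agents: Dict[OsKey, Set[str]] = defaultdict(set)
    for row in csv.DictReader(stream):
        key = (row["os_family"], row["os_type"], row["os_version"])
        agent = row.get("user-agent") or ""
        if agent:  # skip requests without a User-Agent header
            agents[key].add(agent)
    return dict(agents)
```

It can be applied to a single http.csv or to merged_http.csv in the same way, e.g. `with open("merged_http.csv", newline="") as fh: result = user_agents_by_os(fh)`.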
TLS Traffic Data (tls.csv)
This file contains details about encrypted TLS sessions, such as:
- Operating System Information: os_family, os_type, os_version.
- TLS Details: TLS_VERSION, TLS_ALPN, TLS_JA3, TLS_SNI.
Each line corresponds to one TLS connection made by the OS.
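A common use of this file is comparing TLS client fingerprints across operating systems. The sketch below groups JA3 values by OS type and translates numeric TLS version codes into readable names. It assumes the column names listed above and that TLS_VERSION holds the two-byte protocol version code (for example 0x0303 for TLS 1.2); how the value is written in the CSV (decimal or hexadecimal) is not assumed, so the helper takes an integer. Both function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Set, TextIO

# Protocol version codes as defined by the SSL/TLS specifications.
TLS_VERSION_NAMES = {
    0x0300: "SSL 3.0",
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",
    0x0304: "TLS 1.3",
}


def tls_version_name(code: int) -> str:
    """Translate a two-byte TLS version code into a readable name."""
    return TLS_VERSION_NAMES.get(code, f"unknown (0x{code:04x})")


def ja3_by_os_type(stream: TextIO) -> Dict[str, Set[str]]:
    """Map each os_type to the set of distinct JA3 fingerprints it produced."""
    fingerprints: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(stream):
        ja3 = row.get("TLS_JA3") or ""
        if ja3:
            fingerprints[row["os_type"]].add(ja3)
    return dict(fingerprints)
```

Fingerprints that appear for exactly one OS type are the interesting ones for OS identification; computing that is a set difference over the returned mapping.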
Merged HTTP traffic (merged_http.csv)
This file was created by merging individual http.csv files from different virtual machines. It contains all extracted HTTP request data from all virtual machines in the dataset. The fields remain the same as in the individual files.
Merged TLS Traffic (merged_tls.csv)
This file was created by merging individual tls.csv files from different virtual machines. It contains all extracted TLS session data from all virtual machines in the dataset. The fields remain the same as in the individual files.
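The merged files can be reproduced, or rebuilt for a subset of the VMs, by concatenating the individual CSV files. The sketch below is not the toolset's own merge script, only an illustration of the idea: it searches a directory tree for files with a given name, writes the header once, and refuses to merge files whose headers differ. The function name and arguments are hypothetical.

```python
import csv
from pathlib import Path


def merge_csv_files(root: Path, filename: str, output: Path) -> int:
    """Concatenate every `filename` found under `root` into `output`.

    Returns the number of data rows written (header excluded).
    """
    rows_written = 0
    header = None
    with output.open("w", newline="") as out_fh:
        writer = csv.writer(out_fh)
        # Sorted so the output order is reproducible between runs.
        for path in sorted(root.rglob(filename)):
            with path.open(newline="") as in_fh:
                reader = csv.reader(in_fh)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to merge
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For instance, `merge_csv_files(Path("dataset"), "tls.csv", Path("my_merged_tls.csv"))` would rebuild a merged TLS file from an extracted copy of the archive; the directory and output names here are placeholders.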
Metadata about the VM image and the capture (info.json)
Each traffic capture folder contains an info.json file that stores information about the VM and its OS and the capture session. Key information includes:
- VM Name: The name of the VM.
- Source: The origin of the VM image.
- Operating System Details: The OS family, type, and version.
- Network Configuration: The VM's IPv4 address and MAC address.
- Time of capture: Date and time of start and end of the capture.
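Since every capture folder has its own info.json, the metadata of all captures can be collected in one pass over an extracted copy of the dataset. The sketch below makes no assumption about the JSON key names, which are not listed here; it simply yields each capture folder together with its parsed metadata so the keys can be inspected. The function name is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Tuple


def iter_capture_metadata(root: Path) -> Iterator[Tuple[Path, Dict[str, Any]]]:
    """Yield (capture_folder, metadata) for every info.json under `root`."""
    for info_path in sorted(root.rglob("info.json")):
        with info_path.open(encoding="utf-8") as fh:
            yield info_path.parent, json.load(fh)
```

A first exploratory step could be `for folder, meta in iter_capture_metadata(Path("dataset")): print(folder.name, sorted(meta))`, which prints the available keys per capture (the directory name is a placeholder).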
Capture toolset
A new toolset was developed to capture and process all data in this dataset. It consists of multiple scripts for automating key tasks, including starting a virtual machine and capturing its idle network traffic. The toolset also allows scheduling capture intervals, ensuring proper VM shutdown, and processing the captured raw data. One script extracts HTTP requests from the raw network data into a CSV file, while another uses ipfixprobe to convert PCAP files into flow data, from which the TLS data are subsequently extracted.
The virtual machines are run using VirtualBox with NAT networking, meaning all VMs share the same IP address, while MAC addresses may differ for each instance. These settings are configured by default in VirtualBox and Vagrant.
Additionally, the toolset includes scripts for updating information (IP, MAC address, OS family, OS type, etc.) about an existing virtual machine or creating new virtual machines from Vagrant boxes. This toolset and more detailed documentation are available on GitHub (https://github.com/CESNET/idle-os-toolset).
Flow extraction using ipfixprobe
The flow data in this dataset are generated from raw packets using the ipfixprobe exporter, with the following plugins enabled:
- Basic
- Basic plus
- HTTP
- TLS
- DNS
- QUIC
- PSTATS
Each plugin adds fields that provide a more detailed description of the network traffic. For more information about the plugins and their fields, see the ipfixprobe documentation (https://cesnet.github.io/ipfixprobe/export/).
Files
Name | Size | MD5 |
---|---|---|
cesnet-idle-os-traffic.zip | 2.4 GB | 4c0bbc96e9a7ebbc25bc309bb0206e3e |
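The published MD5 checksum can be used to verify the downloaded archive before extracting it. A minimal sketch using only the standard library follows; the file is read in chunks so the 2.4 GB archive does not have to fit in memory. The function name and the local file path are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum published with the dataset archive.
EXPECTED_MD5 = "4c0bbc96e9a7ebbc25bc309bb0206e3e"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The check itself is then `md5_of_file(Path("cesnet-idle-os-traffic.zip")) == EXPECTED_MD5`, assuming the archive was saved under its original name in the current directory.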
Additional details
Software
- Repository URL: https://github.com/CESNET/idle-os-toolset