CESNET Idle OS Traffic
Description
This dataset consists of captured network traffic generated by various operating systems in an idle state, i.e., without any user interaction. The systems were run as virtual machines (VMs) in VirtualBox. Every capture contains the network traffic generated by one virtual machine over a span of more than 24 hours, including system startup.
The data includes:
- raw packets,
- network flows (bidirectional, including application-layer fields),
- DNS requests,
- HTTP requests,
- and info about TLS sessions.
Versions:
- Ver. 1 was captured in February 2025; VMs ran for 1 hour only.
- Ver. 2 was captured in October 2025; VMs ran for over 24 hours. Extraction of DNS queries was added.
Data Structure
The data is stored in the following folder hierarchy:
- cesnet-idle-os-traffic.zip/
  - <os_family>__<os_type>__<os_version>/
    - <timestamp>__<source>__<identifier>/
      - flow.csv
      - info.json
      - traffic.pcap
      - dns.csv
      - http.csv
      - tls.csv
  - merged_dns.csv
  - merged_http.csv
  - merged_tls.csv
  - <os_family>__<os_type>__<os_version>/
Where:
- os_family: The family of the operating system (e.g., linux, windows, android).
- os_type: The specific operating system (e.g., ubuntu, debian, windows_10).
- os_version: The version of the operating system (e.g., 9, 22.04).
- timestamp: The date and time when the traffic capture was started.
- source: The origin of the VM image (e.g., vagrant, osboxes.org).
- identifier: A unique identifier based on the source (e.g., vagrant_box_name or hash).
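The naming scheme above can be split back into its components when walking the extracted archive. The following is a minimal sketch, assuming the double underscore is used only as a separator in OS folder names; the function and class names are hypothetical, and the example folder names in the usage note are made up for illustration.

```python
from typing import NamedTuple


class OsFolder(NamedTuple):
    os_family: str
    os_type: str
    os_version: str


class CaptureFolder(NamedTuple):
    timestamp: str
    source: str
    identifier: str


def parse_os_folder(name: str) -> OsFolder:
    """Split '<os_family>__<os_type>__<os_version>' on the double underscore."""
    parts = name.rstrip("/").split("__")
    if len(parts) != 3:
        raise ValueError(f"unexpected OS folder name: {name!r}")
    return OsFolder(*parts)


def parse_capture_folder(name: str) -> CaptureFolder:
    """Split '<timestamp>__<source>__<identifier>'.

    maxsplit=2 keeps any further '__' inside the identifier intact
    (an assumption; the identifier format is not specified here).
    """
    parts = name.rstrip("/").split("__", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected capture folder name: {name!r}")
    return CaptureFolder(*parts)
```

For example, `parse_os_folder("linux__ubuntu__22.04")` yields the fields `linux`, `ubuntu`, and `22.04`. Single underscores, as in `windows_10`, are left untouched.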
Captured Data
The dataset was captured with a special toolset (see section ‘Capture toolset’) and contains several types of captured traffic data.
PCAP data (traffic.pcap)
The dataset includes PCAP files, which contain raw network traffic collected from virtual machines. These files store detailed packet-level data and serve as the primary source for extracting structured flow data and HTTP data.
Flow Data (flow.csv)
These files contain bidirectional flow records computed from corresponding PCAP files, and include various application-layer data. They were created using the ipfixprobe exporter with several plugins enabled (see ‘Flow extraction using ipfixprobe’ section for details). The flows are stored in CSV format (directly readable by the logreplay module of the NEMEA framework). Some of the fields included in the flow data are as follows:
Traffic & Packet Counts
- BYTES, BYTES_REV (uint64) – Total bytes sent/received.
- PACKETS, PACKETS_REV (uint32) – Total packets sent/received.
Time Stamps
- TIME_FIRST, TIME_LAST (time) – Timestamp of the first and the last packet.
DNS Data
- DNS_NAME (string) – Queried domain name.
- DNS_RCODE (uint8) – DNS response code.
- DNS_QTYPE, DNS_ID, DNS_PSIZE (uint16) – Query type, ID, and packet size.
TCP Data
- TCP_MSS, TCP_MSS_REV (uint32) – Max segment size in each direction.
- TCP_WIN, TCP_WIN_REV (uint16) – Initial window size in each direction.
- TCP_FLAGS, TCP_FLAGS_REV (uint8) – TCP flags used in each direction.
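The TCP_FLAGS values are single bytes, so they need decoding before they are readable. The sketch below assumes the field uses the standard TCP header bit positions and that the value may be the bitwise OR of the flags seen across the flow's packets in one direction. Neither point is stated in this description, so check the ipfixprobe documentation before relying on it. The function name is hypothetical.

```python
from typing import List

# Standard TCP header flag bits, lowest bit first.
TCP_FLAG_BITS = [
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
    (0x40, "ECE"),
    (0x80, "CWR"),
]


def decode_tcp_flags(value: int) -> List[str]:
    """Return the names of the flag bits set in a uint8 TCP_FLAGS value."""
    if not 0 <= value <= 0xFF:
        raise ValueError(f"TCP_FLAGS must fit in uint8, got {value}")
    return [name for bit, name in TCP_FLAG_BITS if value & bit]
```

A value of 27 (0x1B) decodes to FIN, SYN, PSH, and ACK, which under the OR assumption would indicate a connection that was opened, carried data, and was closed.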
TLS
- TLS_VERSION (uint16) – TLS protocol version.
- TLS_SNI, TLS_ALPN (string) – TLS server name & ALPN protocol.
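Since TLS_VERSION is a uint16, it can be mapped to a readable name. This sketch assumes the field carries the wire-format version number from the TLS handshake. The mapping itself is the standard one, but whether the exporter reports the legacy version field or the negotiated version (which matters for TLS 1.3) is not stated here. The function name is hypothetical.

```python
# Wire-format protocol version numbers used by SSL/TLS.
TLS_VERSION_NAMES = {
    0x0300: "SSL 3.0",
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",
    0x0304: "TLS 1.3",
}


def tls_version_name(value: int) -> str:
    """Translate a numeric TLS_VERSION into a human-readable label."""
    return TLS_VERSION_NAMES.get(value, f"unknown (0x{value:04x})")
```

If the field is stored as a decimal string in the CSV, convert it with `int(...)` first; 771, for instance, is 0x0303.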
HTTP Data
- HTTP_REQUEST_URL, HTTP_REQUEST_METHOD, HTTP_USERAGENT (string) – HTTP request details.
- HTTP_RESPONSE_STATUS_CODE (uint16) – HTTP response code.
Miscellaneous
- PROTOCOL (uint8) – Protocol type (TCP, UDP, etc.).
- DST_PORT, SRC_PORT (uint16) – Destination and source ports.
- DIR_BIT_FIELD (uint8) – Direction of the flow through the observation point (0 or 1).
- For all fields, see the section ‘Flow extraction using ipfixprobe’.
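A flow file can be read with Python's standard csv module. The sketch below assumes a comma-separated file with one header row, and that a header cell may carry a type prefix such as `uint64 BYTES`. That prefix convention is an assumption about the logreplay-compatible format and should be checked against an actual flow.csv. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterator, TextIO


def read_flows(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per flow record, keyed by bare field name."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        return  # empty file, nothing to yield
    # Assumption: a header cell may look like "uint64 BYTES"; keep the last word.
    names = [cell.split()[-1] if cell.strip() else cell for cell in header]
    for row in reader:
        yield dict(zip(names, row))


def total_bytes(flow: Dict[str, str]) -> int:
    """Sum the bytes of both directions of a bidirectional flow."""
    return int(flow["BYTES"]) + int(flow["BYTES_REV"])
```

Usage would look like `with open("flow.csv", newline="") as fh: volume = sum(total_bytes(f) for f in read_flows(fh))`. All values arrive as strings, so numeric fields need explicit conversion.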
DNS Traffic Data (dns.csv)
This file includes extracted DNS query data (queried domain names). The following fields are included:
- Operating System Information: os_family, os_type, os_version.
- DNS Query Details: DNS_NAME
Each line corresponds to one DNS query made by the OS.
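Because every line is one query, counting lines per domain shows which names an idle OS contacts most often. The following is a minimal sketch, assuming the DNS query file has a header row with exactly the column names listed above; the function name is hypothetical.

```python
import csv
from collections import Counter, defaultdict
from typing import Dict, TextIO, Tuple

OsKey = Tuple[str, str, str]


def dns_names_per_os(handle: TextIO) -> Dict[OsKey, Counter]:
    """Count queried domain names, grouped by (os_family, os_type, os_version)."""
    counts: Dict[OsKey, Counter] = defaultdict(Counter)
    for row in csv.DictReader(handle):
        os_key = (row["os_family"], row["os_type"], row["os_version"])
        counts[os_key][row["DNS_NAME"]] += 1
    return dict(counts)
```

Calling `counts[os_key].most_common(10)` on the result lists the ten most frequently queried names for one operating system.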
HTTP Traffic Data (http.csv)
This file includes extracted HTTP request data. The following fields are included:
- Operating System Information: os_family, os_type, os_version.
- HTTP Request Details: user-agent, host, uri
Each line corresponds to one HTTP connection made by the OS.
TLS Traffic Data (tls.csv)
This file contains details about encrypted TLS sessions. The following fields are included:
- Operating System Information: os_family, os_type, os_version.
- TLS Details: TLS_VERSION, TLS_ALPN, TLS_JA3, TLS_SNI
Each line corresponds to one TLS connection made by the OS.
Merged DNS Traffic (merged_dns.csv)
This file was created by merging individual dns.csv files from different virtual machines. It contains all extracted DNS requests from all virtual machines in the dataset. The fields remain the same as in the individual files.
Merged HTTP traffic (merged_http.csv)
This file was created by merging individual http.csv files from different virtual machines. It contains all extracted HTTP request data from all virtual machines in the dataset. The fields remain the same as in the individual files.
Merged TLS Traffic (merged_tls.csv)
This file was created by merging individual tls.csv files from different virtual machines. It contains all extracted TLS session data from all virtual machines in the dataset. The fields remain the same as in the individual files.
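The merged files can in principle be regenerated from the per-capture files. The sketch below shows one way to do that by concatenating every file of a given name under a root folder and writing the header once. It is not the toolset's actual merge script, and it assumes every input has an identical header row; the function name and paths are hypothetical.

```python
import csv
from pathlib import Path


def merge_csv(root: Path, filename: str, out_path: Path) -> int:
    """Concatenate every `filename` found below `root` into `out_path`.

    Returns the number of data rows written (the header is not counted).
    """
    rows_written = 0
    header = None
    with out_path.open("w", newline="") as out:
        writer = csv.writer(out)
        # rglob matches the exact file name, so merged_*.csv files are skipped.
        for path in sorted(root.rglob(filename)):
            with path.open(newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For instance, `merge_csv(Path("cesnet-idle-os-traffic"), "dns.csv", Path("my_merged_dns.csv"))` would rebuild a merged DNS file from an extracted archive.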
Metadata about the VM image and the capture (info.json)
Each traffic capture folder contains an info.json file that stores information about the VM and its OS and the capture session. Key information includes:
- VM Name: The name of the VM.
- Source: The origin of the VM image.
- Operating System Details: The OS family, type, and version.
- Network Configuration: The VM's IPv4 address and MAC address.
- Time of capture: Date and time of start and end of the capture.
Capture toolset
A new toolset was developed to capture and process all data in this dataset. It consists of multiple scripts for automating key tasks, including starting a virtual machine and capturing its idle network traffic. The toolset also allows scheduling capture intervals, ensuring proper VM shutdown, and processing the captured raw data. One script extracts HTTP requests from the raw network data into a CSV file, while another uses ipfixprobe to convert PCAP files into flow data, from which DNS and TLS data are extracted later.
The virtual machines are run using VirtualBox with NAT networking, meaning all VMs share the same IP address, while MAC addresses may differ for each instance. These settings are configured by default in VirtualBox and Vagrant.
Additionally, the toolset includes scripts for updating information (IP, MAC address, OS family, OS type, etc.) about an existing virtual machine or creating new virtual machines from Vagrant boxes. This toolset and more detailed documentation are available on GitHub (https://github.com/CESNET/idle-os-toolset).
Flow extraction using ipfixprobe
The flow data in this dataset are generated from raw packets using the ipfixprobe exporter, with the following plugins enabled:
- Basic
- Basic plus
- HTTP
- TLS
- DNS
- PSTATS
Each plugin adds fields that describe the network traffic in more detail. For more information about the plugins and their fields, see the ipfixprobe documentation (https://ipfixprobe.cesnet.cz/en/plugins).
Files
cesnet-idle-os-traffic-v2.zip
Total size: 6.6 GB
| MD5 checksum | Size |
|---|---|
| 751946755bedfa6a74ae625725704956 | 4.2 GB |
| 4c0bbc96e9a7ebbc25bc309bb0206e3e | 2.4 GB |
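The MD5 checksums listed above can be used to verify a download. This is a minimal sketch that hashes a file in chunks so that multi-gigabyte archives do not have to fit in memory. The function name is hypothetical, and since this listing does not show which checksum belongs to which archive, the result is compared against both.

```python
import hashlib
from pathlib import Path
from typing import Union

# Checksums as published in the file listing.
PUBLISHED_MD5 = {
    "751946755bedfa6a74ae625725704956",
    "4c0bbc96e9a7ebbc25bc309bb0206e3e",
}


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Union[str, Path]) -> bool:
    """True if the file's MD5 equals one of the published checksums."""
    return md5_of_file(path) in PUBLISHED_MD5
```

Running `matches_published("cesnet-idle-os-traffic-v2.zip")` after the download completes returns `False` if the file is truncated or corrupted.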
Additional details
Software
- Repository URL: https://github.com/CESNET/idle-os-toolset