Published February 3, 2022 | Version v1
Dataset Open

DoH-Gen-F-AABBC

  • 1. FIT BUT
  • 2. FIT CTU
  • 3. CESNET z.s.p.o

Description

Please refer to the original data article for a further description of the data: Jeřábek & Hynek et al., Collection of datasets with DNS over HTTPS traffic. In: Data in Brief, DOI: 10.1016/j.dib.2022.108310

Dataset of DNS over HTTPS traffic from Firefox (AdGuard, AhaDNS, BlahDNS, BraveDNS, CloudFlare)
The dataset contains DoH and HTTPS traffic captured in a virtualized environment (Docker). The traffic was generated automatically by the Firefox browser with DoH enabled, using 5 different DoH servers (AdGuard, AhaDNS, BlahDNS, BraveDNS, CloudFlare), while loading a sample of web pages taken from the Majestic Million dataset. The data are provided as PCAP files. We also provide TLS-enriched flow data generated with the open-source [ipfixprobe](https://github.com/CESNET/ipfixprobe) flow exporter. Information other than the TLS-related fields is not relevant, since the dataset comprises only encrypted TLS traffic. The TLS-enriched flow data are provided as CSV files with the following columns:

| Column Name | Column Description |
|---|---|
| DST_IP | Destination IP address |
| SRC_IP | Source IP address |
| BYTES | Number of bytes transmitted from source to destination |
| BYTES_REV | Number of bytes transmitted from destination to source |
| TIME_FIRST | Timestamp of the first packet in the flow, in the format YYYY-MM-DDTHH-MM-SS |
| TIME_LAST | Timestamp of the last packet in the flow, in the format YYYY-MM-DDTHH-MM-SS |
| PACKETS | Number of packets transmitted from source to destination |
| PACKETS_REV | Number of packets transmitted from destination to source |
| DST_PORT | Destination port |
| SRC_PORT | Source port |
| PROTOCOL | Transport protocol number |
| TCP_FLAGS | Logical OR across all TCP flags in the packets transmitted from source to destination |
| TCP_FLAGS_REV | Logical OR across all TCP flags in the packets transmitted from destination to source |
| TLS_ALPN | Value of the Application-Layer Protocol Negotiation (ALPN) extension sent by the server |
| TLS_JA3 | The JA3 fingerprint |
| TLS_SNI | Value of the Server Name Indication (SNI) extension sent by the client |
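
As a rough illustration of how the columns above might be consumed, the following Python sketch parses one flow record, computes its duration and total byte count, and decodes the OR-ed TCP flags. It assumes comma-separated files with a header row, timestamps exactly in the stated YYYY-MM-DDTHH-MM-SS form, and `TCP_FLAGS` stored as an integer using the standard TCP header bit positions. None of these details is confirmed by this description, and the sample row is invented.

```python
import csv
import io
from datetime import datetime

# Invented sample row (not taken from the dataset); the comma delimiter and
# header row are assumptions about the CSV layout.
SAMPLE = (
    "DST_IP,SRC_IP,BYTES,BYTES_REV,TIME_FIRST,TIME_LAST,PACKETS,PACKETS_REV,"
    "DST_PORT,SRC_PORT,PROTOCOL,TCP_FLAGS,TCP_FLAGS_REV,TLS_ALPN,TLS_JA3,TLS_SNI\n"
    "192.0.2.10,198.51.100.7,1200,5400,2022-01-01T10-00-00,2022-01-01T10-00-05,"
    "12,15,443,50000,6,27,27,h2,0123456789abcdef0123456789abcdef,example.org\n"
)

# Timestamp layout as stated in the column description (assumed to be exact).
TIME_FORMAT = "%Y-%m-%dT%H-%M-%S"

# Standard TCP header flag bits; the integer encoding of TCP_FLAGS is assumed.
TCP_FLAG_BITS = [
    (0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"),
    (0x08, "PSH"), (0x10, "ACK"), (0x20, "URG"),
]


def decode_tcp_flags(value):
    """Return the names of all flag bits set in the OR-ed flag value."""
    return [name for bit, name in TCP_FLAG_BITS if value & bit]


def parse_flow(row):
    """Convert one CSV row (a dict of strings) into typed summary values."""
    first = datetime.strptime(row["TIME_FIRST"], TIME_FORMAT)
    last = datetime.strptime(row["TIME_LAST"], TIME_FORMAT)
    return {
        "sni": row["TLS_SNI"],
        "duration_s": (last - first).total_seconds(),
        "total_bytes": int(row["BYTES"]) + int(row["BYTES_REV"]),
        "flags": decode_tcp_flags(int(row["TCP_FLAGS"])),
    }


flows = [parse_flow(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
```

For real files, `io.StringIO(SAMPLE)` would be replaced by an open file handle; if the delimiter or timestamp precision differs, only the `csv.DictReader` call and `TIME_FORMAT` need adjusting.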

The DoH resolvers in the dataset can be identified by IP addresses written in doh_resolver_ip.csv file.
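
One way to apply this labelling is sketched below in Python: a flow is marked as DoH when either endpoint matches a known resolver address. The layout of doh_resolver_ip.csv is not described here, so the single `ip` column, the comma delimiter, and all sample addresses are assumptions made for illustration only.

```python
import csv
import io
import ipaddress

# Hypothetical file contents; the real column name and layout of
# doh_resolver_ip.csv may differ.
RESOLVERS_CSV = "ip\n192.0.2.53\n2001:db8::53\n"
FLOWS_CSV = (
    "DST_IP,SRC_IP,DST_PORT,TLS_SNI\n"
    "192.0.2.53,198.51.100.7,443,doh.example\n"
    "203.0.113.9,198.51.100.7,443,www.example\n"
    "2001:0db8:0000:0000:0000:0000:0000:0053,2001:db8::7,443,doh.example\n"
)


def load_resolver_ips(handle, column="ip"):
    """Read resolver addresses into a set of normalised ip_address objects."""
    return {
        ipaddress.ip_address(row[column].strip())
        for row in csv.DictReader(handle)
        if row[column].strip()
    }


def label_flows(handle, resolver_ips):
    """Yield (row, is_doh) pairs; a flow is DoH if either endpoint is a resolver."""
    for row in csv.DictReader(handle):
        endpoints = (
            ipaddress.ip_address(row["DST_IP"]),
            ipaddress.ip_address(row["SRC_IP"]),
        )
        yield row, any(ip in resolver_ips for ip in endpoints)


resolvers = load_resolver_ips(io.StringIO(RESOLVERS_CSV))
labels = [is_doh for _, is_doh in label_flows(io.StringIO(FLOWS_CSV), resolvers)]
```

Parsing addresses with `ipaddress.ip_address` rather than comparing strings means that differently written forms of the same IPv6 address still match.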

The main part of the dataset is located in DoH-Gen-F-AABBC.tar.gz and has the following structure:

 

.
└─── data                   | - Main directory with data
     └── generated          | - Directory with generated captures
         ├── pcap           | - Generated PCAPs
         │   └── firefox
         └── tls-flow-csv   | - Generated CSV flow data
             └── firefox
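
Once the archive has been extracted, the layout above could be traversed with a small Python helper such as the one below. It is a sketch that relies only on the directory names shown in the tree; the file names and extensions inside the `firefox` directories are not specified here, so every regular file is listed.

```python
from pathlib import Path


def list_capture_files(root):
    """Collect files under the pcap and tls-flow-csv branches of an extracted archive."""
    generated = Path(root) / "data" / "generated"
    result = {}
    for kind in ("pcap", "tls-flow-csv"):
        branch = generated / kind / "firefox"
        # rglob("*") also descends into subdirectories, in case there are any.
        result[kind] = sorted(p for p in branch.rglob("*") if p.is_file())
    return result
```

Calling `list_capture_files(".")` from the extraction directory returns a dictionary with one sorted list of paths per branch.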

 

Total stats of generated data:

| Name | Value |
|---|---|
| Total data size | 40.2 GB |
| Total files | 10 |
| DoH extracted TLS flows | ~57 K |
| Non-DoH extracted TLS flows | ~327 K |

DoH Server information

| Name | Provider | DoH query URL |
|---|---|---|
| AdGuard | https://adguard-dns.com | https://dns.adguard.com/dns-query |
| AhaDNS | https://ahadns.com | https://doh.it.ahadns.net/dns-query |
| BlahDNS | https://blahdns.com | https://doh-de.blahdns.com/dns-query |
| BraveDNS | https://brave.com | https://basic.bravedns.com |
| CloudFlare | https://www.cloudflare.com | https://cloudflare-dns.com/dns-query |
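
For readers unfamiliar with how such a query URL is used, the Python sketch below builds a DoH request URL in the GET form defined by RFC 8484: a DNS query in wire format, base64url-encoded without padding, passed in the `dns` parameter. It performs no network access. The helper names are hypothetical, and this description does not state whether the captured Firefox traffic used GET or POST, so the example only illustrates the protocol and not the capture setup.

```python
import base64
import struct


def build_dns_query(name, qtype=1):
    """Build a minimal DNS query in wire format (qtype 1 = A record, class IN)."""
    # Header: ID 0, flags 0x0100 (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    question = qname + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def doh_get_url(endpoint, name, qtype=1):
    """Return the RFC 8484 GET URL for a query sent to the given DoH endpoint."""
    wire = build_dns_query(name, qtype)
    # RFC 8484 requires base64url encoding with the padding removed.
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


url = doh_get_url("https://cloudflare-dns.com/dns-query", "www.example.com")
```

An actual request would additionally need to ask for the `application/dns-message` media type, as RFC 8484 specifies.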

Please cite the original article:

@article{Jerabek2022,
title = {Collection of datasets with DNS over HTTPS traffic},
journal = {Data in Brief},
volume = {42},
pages = {108310},
year = {2022},
issn = {2352-3409},
doi = {https://doi.org/10.1016/j.dib.2022.108310},
url = {https://www.sciencedirect.com/science/article/pii/S2352340922005121},
author = {Kamil Jeřábek and Karel Hynek and Tomáš Čejka and Ondřej Ryšavý}
}

Notes

This research was funded by: the Ministry of Interior of the Czech Republic, grant No. VJ02010024: Flow-Based Encrypted Traffic Analysis; the Grant Agency of the CTU in Prague, grant No. SGS20/210/OHK3/3T/18, funded by the MEYS of the Czech Republic; Brno University of Technology, Faculty of Information Technology, internal grant FIT-S-20-6293; and the Technology Agency of the Czech Republic, grant No. FW03010099: Context-based Encrypted Traffic Analysis Using Flow Data.

Files (40.2 GB)

| Name | Size | MD5 |
|---|---|---|
| DoH-Gen-F-AABBC.tar.gz | 40.2 GB | 336922cb7c05a351e2d6c436fab4ce9b |
| doh_resolver_ip.csv | 4.2 kB | 25c9ebc13fe579f4885590702d7ab545 |
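
Given the size of the archive, it may be worth verifying a download against the listed MD5 checksums. The Python sketch below hashes a file in chunks so that the 40.2 GB archive does not have to fit in memory. The pairing of checksums with file names is inferred from the listed sizes and should be treated as an assumption.

```python
import hashlib

# Checksums as listed on this page; the file-name pairing is inferred from
# the file sizes (assumption).
EXPECTED_MD5 = {
    "DoH-Gen-F-AABBC.tar.gz": "336922cb7c05a351e2d6c436fab4ce9b",
    "doh_resolver_ip.csv": "25c9ebc13fe579f4885590702d7ab545",
}


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, name):
    """Return True if the file at path matches the checksum listed for name."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify_download("DoH-Gen-F-AABBC.tar.gz", "DoH-Gen-F-AABBC.tar.gz")` returns `True` only when the local copy matches the listed checksum.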