DoH-Gen-F-AABBC
- 1. FIT BUT
- 2. FIT CTU
- 3. CESNET z.s.p.o
Description
Please refer to the original data article for a further description of the data: Jeřábek & Hynek et al., Collection of datasets with DNS over HTTPS traffic. In: Data in Brief, DOI: 10.1016/j.dib.2022.108310
Dataset of DNS over HTTPS traffic from Firefox (AdGuard, AhaDNS, BlahDNS, BraveDNS, CloudFlare)
The dataset contains DoH and HTTPS traffic that was captured in a virtualized environment (Docker). The traffic was generated automatically by the Firefox browser with DoH enabled, using 5 different DoH servers (AdGuard, AhaDNS, BlahDNS, BraveDNS, CloudFlare), while loading a sample of web pages taken from the Majestic Million dataset. The data are provided in the form of PCAP files. In addition, we provide TLS-enriched flow data generated with the open-source [ipfixprobe](https://github.com/CESNET/ipfixprobe) flow exporter. Information other than the TLS-related fields is not relevant, since the dataset comprises only encrypted TLS traffic. The TLS-enriched flow data are provided in the form of CSV files with the following columns:
Column Name | Column Description |
---|---|
DST_IP | Destination IP address |
SRC_IP | Source IP address |
BYTES | The number of transmitted bytes from Source to Destination |
BYTES_REV | The number of transmitted bytes from Destination to Source |
TIME_FIRST | Timestamp of the first packet in the flow in format YYYY-MM-DDTHH-MM-SS |
TIME_LAST | Timestamp of the last packet in the flow in format YYYY-MM-DDTHH-MM-SS |
PACKETS | The number of packets transmitted from Source to Destination |
PACKETS_REV | The number of packets transmitted from Destination to Source |
DST_PORT | Destination port |
SRC_PORT | Source port |
PROTOCOL | Transport protocol number |
TCP_FLAGS | Bitwise OR across all TCP flags in the packets transmitted from Source to Destination |
TCP_FLAGS_REV | Bitwise OR across all TCP flags in the packets transmitted from Destination to Source |
TLS_ALPN | The value of the Application-Layer Protocol Negotiation (ALPN) extension sent by the Server |
TLS_JA3 | The JA3 fingerprint |
TLS_SNI | The value of Server Name Indication Extension sent by Client |
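As a rough illustration of how the columns above might be read, the following sketch parses one flow record with the Python standard library. Several things here are assumptions rather than facts about the dataset: that the CSV files are comma-separated with a header row, that `TCP_FLAGS` is stored as a decimal integer, and the sample row itself, which is made up. The timestamp parser tries the documented `YYYY-MM-DDTHH-MM-SS` layout first and falls back to colon-separated variants in case the files differ.

```python
import csv
import io
from datetime import datetime

# Documented layout first, then common ISO-like fallbacks (assumption).
TIME_FORMATS = (
    "%Y-%m-%dT%H-%M-%S",
    "%Y-%m-%dT%H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S",
)

# Standard TCP header flag bits.
TCP_FLAG_BITS = {
    0x01: "FIN", 0x02: "SYN", 0x04: "RST", 0x08: "PSH",
    0x10: "ACK", 0x20: "URG", 0x40: "ECE", 0x80: "CWR",
}


def parse_time(value):
    """Parse a flow timestamp, trying each candidate format in turn."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {value!r}")


def decode_tcp_flags(value):
    """Expand the OR-ed flag byte into a list of flag names."""
    return [name for bit, name in TCP_FLAG_BITS.items() if value & bit]


def read_flows(handle):
    """Yield a small summary dict for every row of a flow CSV."""
    for row in csv.DictReader(handle):
        first = parse_time(row["TIME_FIRST"])
        last = parse_time(row["TIME_LAST"])
        yield {
            "src": (row["SRC_IP"], int(row["SRC_PORT"])),
            "dst": (row["DST_IP"], int(row["DST_PORT"])),
            "duration_s": (last - first).total_seconds(),
            "total_bytes": int(row["BYTES"]) + int(row["BYTES_REV"]),
            "flags": decode_tcp_flags(int(row["TCP_FLAGS"])),
            "sni": row["TLS_SNI"],
        }


# Hypothetical sample row; the values are invented for illustration.
SAMPLE = (
    "DST_IP,SRC_IP,BYTES,BYTES_REV,TIME_FIRST,TIME_LAST,PACKETS,PACKETS_REV,"
    "DST_PORT,SRC_PORT,PROTOCOL,TCP_FLAGS,TCP_FLAGS_REV,TLS_ALPN,TLS_JA3,TLS_SNI\n"
    "198.51.100.5,192.0.2.10,1200,5400,2021-01-01T10-00-00,2021-01-01T10-00-05,"
    "12,10,443,50000,6,27,27,h2,00000000000000000000000000000000,example.org\n"
)

flows = list(read_flows(io.StringIO(SAMPLE)))
print(flows[0])
```

For a real file, `read_flows(open(path, newline=""))` would be the equivalent call, provided the assumptions about the file layout hold.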
The DoH resolvers in the dataset can be identified by the IP addresses listed in the doh_resolver_ip.csv file.
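One way to use the resolver list for labeling is sketched below. The column layout of doh_resolver_ip.csv is not described here, so the loader simply collects every cell that parses as an IP address; the sample file contents and flow records are hypothetical, and treating a flow as DoH when either endpoint is a resolver address is an assumption about how the labels are meant to be derived.

```python
import csv
import io
import ipaddress


def load_resolver_ips(handle):
    """Collect every CSV cell that parses as an IPv4 or IPv6 address."""
    ips = set()
    for row in csv.reader(handle):
        for cell in row:
            try:
                ips.add(ipaddress.ip_address(cell.strip()))
            except ValueError:
                pass  # header text, resolver names, empty cells
    return ips


def is_doh(flow, resolver_ips):
    """Label a flow as DoH if either endpoint is a known resolver IP."""
    return any(
        ipaddress.ip_address(flow[key]) in resolver_ips
        for key in ("SRC_IP", "DST_IP")
    )


# Hypothetical stand-in for doh_resolver_ip.csv (layout is an assumption).
RESOLVERS_CSV = (
    "name,ip\n"
    "example-resolver,192.0.2.53\n"
    "example-resolver,2001:db8::53\n"
)

# Hypothetical flow records using the documented column names.
sample_flows = [
    {"SRC_IP": "192.0.2.10", "DST_IP": "192.0.2.53"},
    {"SRC_IP": "192.0.2.10", "DST_IP": "198.51.100.5"},
]

resolver_ips = load_resolver_ips(io.StringIO(RESOLVERS_CSV))
labels = [is_doh(flow, resolver_ips) for flow in sample_flows]
print(labels)
```

Comparing parsed `ipaddress` objects rather than raw strings avoids mismatches between different textual spellings of the same IPv6 address.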
The main part of the dataset is located in DoH-Gen-F-AABBC.tar.gz and has the following structure:
    .
    └── data                      | - Main directory with data
        └── generated             | - Directory with generated captures
            ├── pcap              | - Generated PCAPs
            │   └── firefox
            └── tls-flow-csv      | - Generated CSV flow data
                └── firefox
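The layout above can be checked without unpacking the full archive. The sketch below counts regular files under the `pcap` and `tls-flow-csv` directories using Python's `tarfile` module. The demo builds a tiny stand-in archive with invented file names so the example runs on its own; the actual file names inside DoH-Gen-F-AABBC.tar.gz are not given here. Note that listing a gzip-compressed archive still requires reading through it, which takes a while at this size.

```python
import os
import tarfile
import tempfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(tar_path):
    """Count regular files per top-level data kind (pcap / tls-flow-csv)."""
    counts = Counter()
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            parts = PurePosixPath(member.name).parts
            if "pcap" in parts:
                counts["pcap"] += 1
            elif "tls-flow-csv" in parts:
                counts["tls-flow-csv"] += 1
            else:
                counts["other"] += 1
    return counts


# Demo on a tiny stand-in archive; member names are hypothetical.
DEMO_MEMBERS = (
    "data/generated/pcap/firefox/example.pcap",
    "data/generated/tls-flow-csv/firefox/example.csv",
)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.tar.gz")
    with tarfile.open(demo_path, "w:gz") as archive:
        for name in DEMO_MEMBERS:
            info = tarfile.TarInfo(name)  # empty regular file
            info.size = 0
            archive.addfile(info)
    demo_counts = summarize_archive(demo_path)

print(dict(demo_counts))
```

For the real dataset, `summarize_archive("DoH-Gen-F-AABBC.tar.gz")` would be the corresponding call.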
Total stats of generated data:
Name | Value |
---|---|
Total Data Size | 40.2 GB |
Total files | 10 |
DoH extracted TLS flows | ~57 K |
Non-DoH extracted TLS flows | ~327 K |
DoH Server information
Name | Provider | DoH query url |
---|---|---|
AdGuard | https://adguard-dns.com | https://dns.adguard.com/dns-query |
AhaDNS | https://ahadns.com | https://doh.it.ahadns.net/dns-query |
BlahDNS | https://blahdns.com | https://doh-de.blahdns.com/dns-query |
BraveDNS | https://brave.com | https://basic.bravedns.com |
CloudFlare | https://www.cloudflare.com | https://cloudflare-dns.com/dns-query |
Please cite the original article:
@article{Jerabek2022,
title = {Collection of datasets with DNS over HTTPS traffic},
journal = {Data in Brief},
volume = {42},
pages = {108310},
year = {2022},
issn = {2352-3409},
doi = {https://doi.org/10.1016/j.dib.2022.108310},
url = {https://www.sciencedirect.com/science/article/pii/S2352340922005121},
author = {Kamil Jeřábek and Karel Hynek and Tomáš Čejka and Ondřej Ryšavý}
}
Files (40.2 GB)
Name | Size | MD5 |
---|---|---|
DoH-Gen-F-AABBC.tar.gz | 40.2 GB | 336922cb7c05a351e2d6c436fab4ce9b |
doh_resolver_ip.csv | 4.2 kB | 25c9ebc13fe579f4885590702d7ab545 |