5957466
doi
10.5281/zenodo.5957466
oai:zenodo.org:5957466
Hynek, Karel
FIT CTU
Čejka, Tomáš
CESNET z.s.p.o
Ryšavý, Ondřej
FIT BUT
DoH-Gen-C-AABBCC
Jeřábek, Kamil
FIT BUT
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
DNS over HTTPS
DNS
HTTPS
Encrypted
Network traffic
PCAP
TLS
Flows
<p>Dataset of DNS over HTTPS traffic (AdGuard, AhaDNS, BlahDNS, BraveDNS, Comcast, CZNIC)</p>
<p>The dataset contains DoH and HTTPS traffic captured in a controlled environment. The traffic was generated automatically by the Chrome browser with DoH enabled, using 6 different DoH servers (AdGuard, AhaDNS, BlahDNS, BraveDNS, Comcast, CZNIC) and loading a sample of web pages taken from the Majestic Million dataset. The data are provided as PCAP files. We also provide TLS-enriched flow data generated with the open-source <a href="https://github.com/CESNET/ipfixprobe">ipfixprobe</a> flow exporter. Information other than the TLS-related fields is not relevant, since the dataset comprises only encrypted TLS traffic. The TLS-enriched flow data are provided as CSV files with the following columns:</p>
<table>
<thead>
<tr>
<th>Column Name</th>
<th>Column Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>DST_IP</td>
<td>Destination IP address</td>
</tr>
<tr>
<td>SRC_IP</td>
<td>Source IP address</td>
</tr>
<tr>
<td>BYTES</td>
<td>The number of transmitted bytes from Source to Destination</td>
</tr>
<tr>
<td>BYTES_REV</td>
<td>The number of transmitted bytes from Destination to Source</td>
</tr>
<tr>
<td>TIME_FIRST</td>
<td>Timestamp of the first packet in the flow in format YYYY-MM-DDTHH-MM-SS</td>
</tr>
<tr>
<td>TIME_LAST</td>
<td>Timestamp of the last packet in the flow in format YYYY-MM-DDTHH-MM-SS</td>
</tr>
<tr>
<td>PACKETS</td>
<td>The number of packets transmitted from Source to Destination</td>
</tr>
<tr>
<td>PACKETS_REV</td>
<td>The number of packets transmitted from Destination to Source</td>
</tr>
<tr>
<td>DST_PORT</td>
<td>Destination port</td>
</tr>
<tr>
<td>SRC_PORT</td>
<td>Source port</td>
</tr>
<tr>
<td>PROTOCOL</td>
<td>Transport protocol number (e.g. 6 for TCP)</td>
</tr>
<tr>
<td>TCP_FLAGS</td>
<td>Bitwise OR of the TCP flags of all packets transmitted from Source to Destination</td>
</tr>
<tr>
<td>TCP_FLAGS_REV</td>
<td>Bitwise OR of the TCP flags of all packets transmitted from Destination to Source</td>
</tr>
<tr>
<td>TLS_ALPN</td>
<td>The value of the Application-Layer Protocol Negotiation (ALPN) extension sent by the server</td>
</tr>
<tr>
<td>TLS_JA3</td>
<td>The JA3 fingerprint</td>
</tr>
<tr>
<td>TLS_SNI</td>
<td>The value of the Server Name Indication (SNI) extension sent by the client</td>
</tr>
</tbody>
</table>
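<p>As a rough illustration of how the columns above can be consumed, the following sketch sums the per-direction counters and decodes the OR-ed TCP flags into flag names. It assumes a comma-separated file whose header row uses exactly the column names from the table and stores <code>TCP_FLAGS</code> as a decimal integer; the actual CSV files may differ and should be checked first. The names <code>decode_tcp_flags</code> and <code>summarize_flows</code> and the sample record are made up for this example.</p>

```python
import csv
import io

# Standard TCP flag bits, lowest bit first.
TCP_FLAG_BITS = [
    (0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"), (0x08, "PSH"),
    (0x10, "ACK"), (0x20, "URG"), (0x40, "ECE"), (0x80, "CWR"),
]


def decode_tcp_flags(value):
    """Return the names of the flags set in an OR-ed TCP flags value."""
    return [name for bit, name in TCP_FLAG_BITS if value & bit]


def summarize_flows(csv_text):
    """Parse flow records and add up the counters of both directions."""
    summaries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        summaries.append({
            "sni": row["TLS_SNI"],
            "total_bytes": int(row["BYTES"]) + int(row["BYTES_REV"]),
            "total_packets": int(row["PACKETS"]) + int(row["PACKETS_REV"]),
            "flags": decode_tcp_flags(int(row["TCP_FLAGS"])),
        })
    return summaries


# Made-up sample record (documentation IP ranges), not taken from the dataset.
SAMPLE = (
    "DST_IP,SRC_IP,BYTES,BYTES_REV,PACKETS,PACKETS_REV,TCP_FLAGS,TLS_SNI\n"
    "192.0.2.10,198.51.100.5,1200,5400,10,12,27,example.org\n"
)

result = summarize_flows(SAMPLE)
print(result)
```

<p>For a real file, the same function can be fed the file contents read from disk, provided the assumptions about the header and number format hold.</p>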
<p>The DoH resolvers in the dataset can be identified by the IP addresses listed in the <strong><em>doh_resolver_ip.csv</em></strong> file.</p>
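<p>A minimal sketch of this labeling step is given below. The layout of <code>doh_resolver_ip.csv</code> is not described here, so the sketch makes no assumption about column order: it treats every cell that parses as an IP address as a resolver address and skips everything else. The helper names <code>load_resolver_ips</code> and <code>is_doh_flow</code> and the sample values are hypothetical.</p>

```python
import csv
import io
import ipaddress


def load_resolver_ips(csv_text):
    """Collect every cell that parses as an IPv4 or IPv6 address."""
    ips = set()
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            try:
                ips.add(ipaddress.ip_address(cell.strip()))
            except ValueError:
                continue  # header text or a non-IP column
    return ips


def is_doh_flow(flow, resolver_ips):
    """Treat a flow as DoH when either endpoint is a known resolver."""
    endpoints = (flow["DST_IP"], flow["SRC_IP"])
    return any(ipaddress.ip_address(ip) in resolver_ips for ip in endpoints)


# Made-up resolver list and flows (documentation address ranges).
RESOLVERS = load_resolver_ips("name,ip\nexample-resolver,192.0.2.53\n,2001:db8::53\n")
doh_flow = {"DST_IP": "192.0.2.53", "SRC_IP": "198.51.100.5"}
web_flow = {"DST_IP": "203.0.113.80", "SRC_IP": "198.51.100.5"}

print(is_doh_flow(doh_flow, RESOLVERS), is_doh_flow(web_flow, RESOLVERS))
```

<p>Comparing parsed address objects rather than raw strings keeps differently written forms of the same IPv6 address from being missed.</p>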
<p>The main part of the dataset is located in <strong><em>DoH-Gen-C-AABBCC.tar.gz</em></strong> and has the following structure:</p>
<pre><code>.
└─── data                      | - Main directory with data
    └── generated              | - Directory with generated captures
        ├── pcap               | - Generated PCAPs
        │   └── chrome
        └── tls-flow-csv       | - Generated CSV flow data
            └── chrome</code></pre>
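<p>Once the archive has been extracted, the two leaf directories of this layout can be enumerated as sketched below. The file names and extensions inside the <code>chrome</code> directories are not listed here, so the sketch returns every regular file it finds instead of filtering by extension. The function name <code>list_capture_files</code> is hypothetical.</p>

```python
from pathlib import Path

# Sub-directories as shown in the archive layout.
PCAP_DIR = Path("data/generated/pcap/chrome")
FLOW_DIR = Path("data/generated/tls-flow-csv/chrome")


def list_capture_files(root):
    """Return sorted file paths for the PCAPs and flow CSVs under root."""
    root = Path(root)

    def files_in(sub):
        folder = root / sub
        if not folder.is_dir():
            return []  # directory missing, e.g. archive not extracted here
        return sorted(p for p in folder.iterdir() if p.is_file())

    return {"pcap": files_in(PCAP_DIR), "tls_flow_csv": files_in(FLOW_DIR)}
```

<p>For example, <code>list_capture_files(".")</code> run from the extraction directory returns a dictionary with one list of paths per data type.</p>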
<p> </p>
<p><strong>Total stats of generated data:</strong></p>
<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Total Data Size</td>
<td>39.9 GB</td>
</tr>
<tr>
<td>Total files</td>
<td>12</td>
</tr>
<tr>
<td>Extracted DoH TLS flows</td>
<td>~35 K</td>
</tr>
<tr>
<td>Extracted non-DoH TLS flows</td>
<td>~247 K</td>
</tr>
</tbody>
</table>
<p> </p>
<p><strong>DoH Server information</strong></p>
<p> </p>
<table>
<thead>
<tr>
<th>Name</th>
<th>Provider</th>
<th>DoH query URL</th>
</tr>
</thead>
<tbody>
<tr>
<td>AdGuard</td>
<td><a href="https://adguard-dns.com">https://adguard-dns.com</a></td>
<td><a href="https://dns.adguard.com/dns-query">https://dns.adguard.com/dns-query</a></td>
</tr>
<tr>
<td>AhaDNS</td>
<td><a href="https://ahadns.com">https://ahadns.com</a></td>
<td><a href="https://doh.it.ahadns.net/dns-query">https://doh.it.ahadns.net/dns-query</a></td>
</tr>
<tr>
<td>BlahDNS</td>
<td><a href="https://blahdns.com">https://blahdns.com</a></td>
<td><a href="https://doh-de.blahdns.com/dns-query">https://doh-de.blahdns.com/dns-query</a></td>
</tr>
<tr>
<td>BraveDNS</td>
<td><a href="https://brave.com">https://brave.com</a></td>
<td><a href="https://basic.bravedns.com">https://basic.bravedns.com</a></td>
</tr>
<tr>
<td>Comcast</td>
<td><a href="https://corporate.comcast.com">https://corporate.comcast.com</a></td>
<td><a href="https://doh.xfinity.com/dns-query">https://doh.xfinity.com/dns-query</a></td>
</tr>
<tr>
<td>CZNIC</td>
<td><a href="https://www.nic.cz">https://www.nic.cz</a></td>
<td><a href="https://odvr.nic.cz/doh">https://odvr.nic.cz/doh</a></td>
</tr>
</tbody>
</table>
<p> </p>
This research was funded by the Ministry of Interior of the Czech Republic, grant No. VJ02010024: Flow-Based Encrypted Traffic Analysis; by the Grant Agency of the CTU in Prague, grant No. SGS20/210/OHK3/3T/18, funded by the MEYS of the Czech Republic; by Brno University of Technology, Faculty of Information Technology, internal grant FIT-S-20-6293; and by the Technology Agency of the Czech Republic, grant No. FW03010099: Context-based Encrypted Traffic Analysis Using Flow Data.
Zenodo
2022-02-03
info:eu-repo/semantics/other
5957465
1647336803.008035
4173
md5:25c9ebc13fe579f4885590702d7ab545
https://zenodo.org/records/5957466/files/doh_resolver_ip.csv
39242654066
md5:f842ce8f613ec5518336a34222afabda
https://zenodo.org/records/5957466/files/DoH-Gen-C-AABBCC.tar.gz
public
10.5281/zenodo.5957465
isVersionOf
doi