Dataset used for fingerprinting of DNS over HTTPS responses.
Description
The dataset consists of traffic from several data sources:
- DoH-enabled Firefox on Linux OS
- DoH-enabled Firefox on Windows 10 OS
- DoH-enabled Chrome on Windows 10 OS
We captured the traffic from the DoH-enabled web browsers using tcpdump. To automate traffic generation, we installed Google Chrome and Mozilla Firefox in separate virtual machines and controlled them with the Selenium framework (detailed information about the browsers and environments used is provided separately). Selenium simulates a user's browsing according to a predefined script and a list of domain names (in our case, URLs from Alexa's top websites list). Selenium was configured to visit the pages in random order multiple times. We captured the traffic with each browser's default settings. We did not disable the browser's DNS cache, and visiting the webpages in random order ensures that the dataset contains traces influenced by DNS caching mechanisms. Each virtual machine was configured to export the TLS cryptographic keys, which were used to decrypt the traffic with the Wireshark application.
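The browsing automation described above can be sketched roughly as follows. This is a minimal illustration, not the script used to build the dataset: the function names, the key log path, the timeout, and the number of repetitions are all assumptions, and the DoH configuration of the browser is not shown. It assumes Selenium and geckodriver are installed and that tcpdump is already capturing in a separate process. The `SSLKEYLOGFILE` environment variable is the mechanism by which Firefox and Chrome export TLS session keys.

```python
import os
import random


def build_visit_schedule(urls, repetitions, seed=None):
    """Return every URL `repetitions` times, shuffled into one random order."""
    rng = random.Random(seed)
    schedule = [url for url in urls for _ in range(repetitions)]
    rng.shuffle(schedule)
    return schedule


def run_browsing_session(schedule, keylog_path="tls_keys.log", page_timeout=30):
    """Visit each URL in `schedule` with Firefox while TLS keys are logged.

    Hypothetical helper: assumes Selenium and geckodriver are installed and
    that a packet capture (for example tcpdump) is running separately.
    """
    # Imported here so the schedule helper works without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException

    # Firefox and Chrome write TLS session keys to the file named by
    # SSLKEYLOGFILE; the driver and browser inherit this environment.
    os.environ["SSLKEYLOGFILE"] = os.path.abspath(keylog_path)

    driver = webdriver.Firefox()
    driver.set_page_load_timeout(page_timeout)
    try:
        for url in schedule:
            try:
                driver.get(url)
            except WebDriverException:
                # Skip pages that time out or fail to load.
                continue
    finally:
        driver.quit()
```

Because the browser's DNS cache stays enabled, repeated visits in a shuffled order mean that some page loads trigger fresh DoH queries while others are answered partly from cache, which is the effect the random ordering is meant to capture.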
The Wireshark text output of the decrypted traffic is provided in the dataset files. Detailed information about each file is provided in the dataset README.
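For readers who want to produce a comparable text export from their own captures, one possible approach is to call tshark, Wireshark's command-line tool, with a TLS key log file. This is only a sketch under stated assumptions: the description does not say whether the Wireshark GUI or tshark was used, nor which options were chosen, so the exact format of the dataset files may differ. The function names and file paths are hypothetical, and the `tls.keylog_file` preference name applies to Wireshark 3.0 and later.

```python
import subprocess


def tshark_decrypt_command(pcap_path, keylog_path):
    """Build a tshark command that prints decrypted packet details as text."""
    return [
        "tshark",
        "-r", pcap_path,                         # read packets from a capture file
        "-o", f"tls.keylog_file:{keylog_path}",  # TLS key log used for decryption
        "-V",                                    # print full packet details as text
    ]


def export_decrypted_text(pcap_path, keylog_path, output_path):
    """Run tshark and write its text output to `output_path`."""
    with open(output_path, "w", encoding="utf-8") as out:
        subprocess.run(
            tshark_decrypt_command(pcap_path, keylog_path),
            stdout=out,
            check=True,  # raise if tshark exits with an error
        )
```

A call such as `export_decrypted_text("capture.pcap", "tls_keys.log", "capture.txt")` would write the dissected, decrypted packets to a text file, provided tshark is installed and the key log matches the captured sessions.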
Acknowledgment
This work was supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No. 833418 and also by the Grant Agency of the CTU in Prague, grant No. SGS20/210/OHK3/3T/18, funded by the MEYS of the Czech Republic, and the project Reg. No. CZ.02.1.01/0.0/0.0/16_013/0001797, co-funded by the MEYS and ERDF.