An Encrypted Network Video Stream Dataset
- 1. Czech Technical University in Prague, Faculty of Information Technology, Department of Computer Systems
- 2. University of South Bohemia, Faculty of Science, Department of Informatics
Description
Much of the video content on the Internet today is distributed through online streaming platforms. To protect user privacy, these transmissions are often encrypted using cryptographic protocols. In previous research, we experimentally validated the idea that the amount of transmitted data belonging to a particular video stream is not constant over time: it changes periodically and forms a specific fingerprint. Once the fingerprint of a given video stream is known, that stream can subsequently be identified. Over several months of intensive work, our team created a large dataset of video streams that were captured by network traffic probes while end users played them back. The video streams were deliberately chosen to fall into pre-selected thematic categories. We selected two primary streaming platforms: PeerTube and YouTube. The former was chosen because it allows any streaming parameter to be modified, the latter because it is used by a huge number of people worldwide. The dataset can be used to create and train machine learning models or heuristic algorithms that identify encrypted video streams either by content category or individually.
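To make the fingerprint idea more concrete, here is a minimal sketch. It rests on several assumptions that this description does not establish. Captured traffic is assumed to be available as (timestamp, size in bytes) pairs. Data volume is assumed to be aggregated into fixed one-second windows. Matching uses Pearson correlation against a dictionary of known fingerprints. The function names `bytes_per_interval`, `pearson` and `identify`, and all sample values, are hypothetical and chosen only for illustration. This is not the authors' identification method or the dataset's file format.

```python
import math
from collections import defaultdict

# Hypothetical sketch: fingerprint = bytes transferred per fixed time window.
def bytes_per_interval(packets, interval=1.0):
    """Aggregate (timestamp, size_in_bytes) pairs into a byte-count series.

    Each list entry is the total number of bytes seen in one window of
    `interval` seconds, counted from the first packet's timestamp.
    """
    if not packets:
        return []
    start = min(t for t, _ in packets)
    bins = defaultdict(int)
    for t, size in packets:
        bins[int((t - start) // interval)] += size
    # Windows with no traffic are reported as 0 bytes.
    return [bins[i] for i in range(max(bins) + 1)]

def pearson(a, b):
    """Pearson correlation of two series, truncated to their common length."""
    n = min(len(a), len(b))
    if n < 2:
        return 0.0
    a, b = a[:n], b[:n]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant series carries no fingerprint information
    return cov / math.sqrt(va * vb)

def identify(observed, fingerprints):
    """Return the label of the known fingerprint most correlated with `observed`."""
    return max(fingerprints, key=lambda label: pearson(observed, fingerprints[label]))

# Illustrative usage with made-up fingerprints and packets.
fingerprints = {
    "video_a": [900, 100, 900, 100, 900, 100],
    "video_b": [100, 100, 900, 900, 100, 100],
}
packets = [(0.1, 500), (0.5, 450), (1.2, 120), (2.3, 880),
           (3.7, 90), (4.4, 910), (5.9, 130)]
observed = bytes_per_interval(packets)  # [950, 120, 880, 90, 910, 130]
print(identify(observed, fingerprints))  # prints video_a
```

Correlation ignores the absolute bitrate and compares only the shape of the byte-count series. This fits the idea that the periodic variation, and not the total volume, forms the fingerprint. A practical system would also need to handle streams that are observed from an unknown starting offset, for example with a sliding-window comparison. The sketch omits this.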