Published September 26, 2019 | Version v1

Dataset Using TLS Fingerprints for OS Identification in Encrypted Traffic

  • 1. Masaryk University

Description

The dataset consists of data from three different sources: flow records collected from the university backbone network, log entries from the two university DHCP (Dynamic Host Configuration Protocol) servers, and log entries from a single RADIUS (Remote Authentication Dial In User Service) accounting server. The data was collected from 2019-07-12 00:00 to 2019-07-16 23:59, with a few hours of margin on both sides of the interval for the log entries, so that they cover long connection sessions that begin before or end after the time frame.

We measured the flow data on the university uplink to the Internet. In the dataset, we kept only flows with source IP addresses from the university wireless networks (Eduroam). The flow data was then enriched with information from the DHCP and RADIUS servers, so that each flow contains the ID of the RADIUS session and the operating system of the transmitting device as derived from the DHCP logs.
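The enrichment step can be pictured as an interval join: a flow is matched to the session that held the flow's source address at the time the flow started. The sketch below is only an illustration of that idea. The `Session` record, its field names, and the sample values are hypothetical and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Session:
    """Hypothetical session record built from RADIUS and DHCP log entries."""
    session_id: str
    ip: str
    start: datetime
    end: datetime
    os_name: str


def find_session(flow_ip: str, flow_start: datetime,
                 sessions: List[Session]) -> Optional[Session]:
    """Return the session that held flow_ip when the flow started, if any."""
    for s in sessions:
        if s.ip == flow_ip and s.start <= flow_start <= s.end:
            return s
    return None


# Two made-up sessions that reuse the same address at different times.
sessions = [
    Session("s1", "10.0.0.5", datetime(2019, 7, 12, 8, 0),
            datetime(2019, 7, 12, 12, 0), "Android"),
    Session("s2", "10.0.0.5", datetime(2019, 7, 12, 13, 0),
            datetime(2019, 7, 12, 18, 0), "Windows"),
]

match = find_session("10.0.0.5", datetime(2019, 7, 12, 14, 30), sessions)
print(match.session_id, match.os_name)
```

Matching on both address and time matters because addresses are reassigned between devices, which is also why the log entries extend a few hours beyond the flow collection window.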

The dataset is provided as a CSV file with the following information fields important for OS identification:

  • Basic flow features
    • Date flow start - timestamp of flow start
    • Date flow end - timestamp of flow end
    • Src IPv4 - source IPv4 address
    • sPort - source L4 port
    • Dst IPv4 - destination IPv4 address
    • dPort - destination L4 port
  • Extended TCP/IP parameters
    • SYN size - the size of the initial SYN packet of a TCP connection (in bytes)
    • TCP win - value of TCP Window size parameter
    • TCP SYN TTL - observed TTL value
  • HTTP parameters
    • HTTP Host - hostname from the HTTP request
    • HTTP UA OS - OS name identified from the User-Agent header
    • HTTP UA OS MAJ - OS major version identified from the User-Agent header
    • HTTP UA OS MIN - OS minor version identified from the User-Agent header
    • HTTP UA OS BLD - OS build identified from the User-Agent header
  • TLS parameters
    • TLS SNI - Server Name Indication field
    • TLS SNI length - length of SNI in bytes
    • TLS Client Version - TLS client hello Version field
    • Client Cipher Suites - list of supported cipher suites
    • TLS Extension Types - list of extension IDs
    • TLS Extension Lengths - list of extension lengths
    • TLS Elliptic Curves - list of supported curves (or supported groups in TLS1.3)
    • TLS EC Point Formats - list of EC formats
  • Log based extensions
    • Session ID - ID of the session to match flows from one device
    • Ground Truth OS - OS name derived from log data
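As a rough illustration of how the TLS fields might be used, the sketch below joins a few of them into a single fingerprint string and counts the ground-truth OS labels seen for each fingerprint. The column names follow the field list above, but the exact header spelling, the `;` delimiter, and the sample rows are assumptions. Check the header of the actual CSV file before adapting this.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows; the real header and delimiter may differ.
SAMPLE = """TLS Client Version;Client Cipher Suites;TLS Extension Types;Ground Truth OS
771;4865,4866,49195;0,23,65281,10,11;Android
771;4865,4866,49195;0,23,65281,10,11;Android
771;49196,49195,49200;0,10,11,13;Windows
"""

# Fields combined into the fingerprint (an illustrative choice).
TLS_FIELDS = ["TLS Client Version", "Client Cipher Suites", "TLS Extension Types"]


def tls_fingerprint(row):
    """Join the selected TLS fields of one flow record into one string."""
    return "|".join(row[field] for field in TLS_FIELDS)


def os_by_fingerprint(lines, delimiter=";"):
    """Count ground-truth OS labels for every observed fingerprint."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(lines, delimiter=delimiter):
        counts[tls_fingerprint(row)][row["Ground Truth OS"]] += 1
    return counts


counts = os_by_fingerprint(io.StringIO(SAMPLE))
for fingerprint, oses in counts.items():
    print(fingerprint, dict(oses))
```

For the real file, an open file handle can be passed in place of the `io.StringIO` object, so the 700.8 MB archive content is streamed row by row rather than loaded at once.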

The observed network traffic contains privacy-sensitive information. We hereby declare that the monitored data used for our research was processed in accordance with the EU General Data Protection Regulation 2016/679. The published dataset was anonymized cryptographically using the Crypto-PAn algorithm to preserve both its scientific value and user privacy.
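Crypto-PAn is a prefix-preserving scheme: two addresses that share their first k bits before anonymization also share exactly their first k bits afterwards, so subnet structure remains usable for analysis. The sketch below illustrates that property only. It is not the Crypto-PAn implementation used for this dataset: HMAC-SHA256 from the Python standard library stands in for the AES-based pseudorandom function, and the key and addresses are made up.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization in the style of Crypto-PAn.

    Each output bit is the input bit XORed with a keyed pseudorandom bit
    derived from all preceding input bits. HMAC-SHA256 is a stand-in here;
    Crypto-PAn itself derives that bit with AES.
    """
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        prefix = bits[:i].encode()  # the i bits that precede bit i
        digest = hmac.new(key, prefix, hashlib.sha256).digest()
        flip = digest[0] >> 7       # most significant bit of the digest
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


key = b"demo-key-not-secret"           # hypothetical key
a, b = "192.0.2.17", "192.0.2.200"     # documentation-range addresses
print(common_prefix_len(a, b))
print(common_prefix_len(anonymize_ipv4(a, key), anonymize_ipv4(b, key)))
```

Both printed lengths are equal, because addresses with an identical prefix receive identical flip bits up to the first position where they differ.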

When using this dataset, please cite the original work as follows:

@inproceedings{lastovicka2020using,
  title = {Using TLS Fingerprints for OS Identification in Encrypted Traffic},
  author = {La{\v{s}}tovi{\v{c}}ka, Martin and {\v{S}}pa{\v{c}}ek, Stanislav and Velan, Petr and {\v{C}}eleda, Pavel},
  booktitle = {2020 IEEE/IFIP Network Operations and Management Symposium (NOMS 2020)},
  doi = {10.1109/NOMS47738.2020.9110319},
  keywords = {OS fingerprinting;passive monitoring;IPFIX;TLS},
  isbn = {978-1-7281-4973-8},
  pages = {1--6},
  publisher = {IEEE Xplore Digital Library},
  year = {2020}
}

 

Files

flows_dataset_anonymized.zip (700.8 MB)
md5:ebb0f2c0117193e43d9089d613748c7f
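The published MD5 checksum can be used to confirm that the archive downloaded intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` are illustrative, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# Checksum published alongside flows_dataset_anonymized.zip.
EXPECTED_MD5 = "ebb0f2c0117193e43d9089d613748c7f"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at path matches the expected checksum."""
    return md5_of_file(path) == expected
```

Calling `verify_download("flows_dataset_anonymized.zip")` from the download directory returns `True` when the file matches the published checksum. Reading in chunks keeps memory use low for an archive of this size.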

Additional details

Funding

European Commission
CONCORDIA - Cyber security cOmpeteNCe fOr Research anD InnovAtion (grant 830927)