Published May 1, 2020 | Version 1.0
Dataset
Open
Host Network Traffic 2019
Description
Dataset Summary
- Timespan: 2019-01-01 : 2019-12-31
- Granularity: 1-hour disjoint time windows
- # of characteristics observed: 9
- Hosts observed: 65536
- Labels: included
- Unzipped volume: approx. 10 GB
Dataset Origins
The dataset was collected over the whole of 2019. The observation points for collecting IP flows were located at the borders of the university campus network. The campus network has a /16 IPv4 CIDR range at its disposal and contains various network segments, ranging from segments connecting dormitories, through server segments, to a segment containing workstations of university administrative staff. A host in our dataset is identified by its source IPv4 address.
Variables
The dataset contains the following variables:
- Aggregations - values aggregated per source IP over a one-hour window (sums, except flow duration, which is an average)
  - # of flows - number of flows for a given source IP
  - # of packets - number of packets for a given source IP
  - # of bytes - number of bytes for a given source IP
  - flow duration - average flow duration in seconds
- Distinct Counts - count of distinct values for each variable over a one-hour window
  - # of peers - number of distinct communication peers for a given source IP
  - # of ports - number of distinct destination ports for a given source IP
  - # of protocols - number of distinct communication protocols for a given source IP
  - # of AS numbers - number of distinct destination AS numbers for a given source IP
  - # of countries - number of distinct destination countries for a given source IP
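The variable definitions above can be illustrated with a minimal pandas sketch that derives hourly per-host values from raw flow records. The flow table, its column names (`start`, `src_ip`, `dst_ip`, `dst_port`, `protocol`, `packets`, `bytes`, `duration`) and all values are hypothetical; the actual collection pipeline is not described in this record. AS numbers and countries would presumably be counted the same way once each destination is enriched with that information.

```python
import pandas as pd

# Hypothetical flow records; column names and values are assumptions,
# not the schema used to build the published dataset.
flows = pd.DataFrame({
    "start": pd.to_datetime([
        "2019-01-01 00:05", "2019-01-01 00:40",
        "2019-01-01 01:10", "2019-01-01 00:15",
    ]),
    "src_ip": ["a", "a", "a", "b"],
    "dst_ip": ["x", "y", "x", "x"],
    "dst_port": [443, 443, 53, 80],
    "protocol": ["TCP", "TCP", "UDP", "TCP"],
    "packets": [10, 4, 2, 7],
    "bytes": [1200, 300, 150, 900],
    "duration": [2.0, 4.0, 1.0, 3.0],
})

# Assign each flow to its disjoint one-hour window.
flows["window"] = flows["start"].dt.floor("h")

# Aggregations (sums and an average) and distinct counts per window and host.
hourly = flows.groupby(["window", "src_ip"]).agg(
    n_flows=("dst_ip", "size"),
    n_packets=("packets", "sum"),
    n_bytes=("bytes", "sum"),
    avg_duration=("duration", "mean"),
    n_peers=("dst_ip", "nunique"),
    n_ports=("dst_port", "nunique"),
    n_protocols=("protocol", "nunique"),
)

# One table per variable: rows are windows, columns are hosts.
# Hosts with no flows in a window end up as N/A.
n_flows = hourly["n_flows"].unstack("src_ip")
```

In this toy input, host `b` has no flow in the second window, so its cell in `n_flows` is N/A rather than 0.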
Dataset Structure
- Dataset Files - each variable is stored in a separate comma-separated values (.csv) file
  - Row index - timestamp of the observation window (8760 rows)
  - Column index - anonymized IP addresses (65536 columns)
- Label File - contains labels for the individual IP addresses from the Dataset Files
  - Row index - anonymized IP addresses (65536 rows)
  - Column index - labels for the IP addresses
    - Subnet - ID of a subnet; hosts belonging to the same subnet have the same ID
    - Subnet_range - CIDR range of a subnet
    - Unit - ID of the administrative unit owning the network range
    - Sub-unit - ID of the administrative sub-unit owning the network range
    - Subnet_label - subnet label
      - Servers - selected subnets containing mostly servers (133.250.178.0/24, 133.250.163.0/24)
      - Workstations - selected subnets containing mostly workstations (133.250.146.0/24, 133.250.157.128/25)
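Because the label file is indexed by the same anonymized IP addresses that form the columns of each variable file, the two can be combined directly. The sketch below uses small in-memory stand-ins with made-up host names and values, and it assumes that the Subnet_label column holds the strings "Servers" and "Workstations"; check the actual label file before relying on that.

```python
import pandas as pd

# Toy stand-ins for one variable file and the label file (values are made up).
windows = pd.to_datetime(["2019-01-01 00:00", "2019-01-01 01:00"])
flows = pd.DataFrame(
    {"ip1": [5.0, 3.0], "ip2": [1.0, None], "ip3": [8.0, 2.0]},
    index=windows,
)
labels = pd.DataFrame(
    {
        "Subnet": [1, 1, 2],
        # Assumed encoding of the Subnet_label column.
        "Subnet_label": ["Servers", "Servers", "Workstations"],
    },
    index=["ip1", "ip2", "ip3"],
)

# Select the columns of hosts whose subnet is labelled as mostly servers.
server_ips = labels.index[labels["Subnet_label"] == "Servers"]
server_flows = flows[server_ips]

# Sum a variable per subnet and window by grouping host columns on subnet ID.
per_subnet = flows.T.groupby(labels["Subnet"]).sum().T
```

Summing is only meaningful for the aggregated count variables; distinct counts such as # of peers cannot be added across hosts without double counting.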
Further notes
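- N/A values
  - Variables - N/A means that the host did not communicate in the given observation window
  - Labels - N/A means that no additional information on this IP address is available
- Dataset load - each file can be read with pandas; replace <filename> with the path to a .csv file

      import pandas as pd
      df = pd.read_csv(<filename>, header=[0], index_col=[0])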
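The loading call and the meaning of N/A values can be tried out together on a small example. The sketch below reads a three-row stand-in from a string instead of a real file; the column names, the timestamp format and the `parse_dates=True` argument are assumptions, since the exact layout of the published files is not shown here.

```python
import io

import pandas as pd

# A tiny stand-in for one variable file; the real files have 8760 rows
# and 65536 columns. Names and timestamp format are assumptions.
csv_text = """timestamp,ip1,ip2
2019-01-01 00:00:00,12,
2019-01-01 01:00:00,,
2019-01-01 02:00:00,7,3
"""

# Same call as in the note above, plus parse_dates so the row index
# becomes a DatetimeIndex (assumes pandas can parse the timestamps).
df = pd.read_csv(io.StringIO(csv_text), header=[0], index_col=[0], parse_dates=True)

# N/A means the host did not communicate in that window.
active = df.notna()
active_share = active.mean()  # fraction of windows in which each host was active

# Replacing N/A with 0 is only sensible for count-like variables,
# not for the average flow duration.
filled = df.fillna(0)
```

Keeping N/A distinct from 0 preserves the difference between an inactive host and a measured value, so filling is best done per analysis rather than at load time.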
Files (1.6 GB)
Name | Size |
---|---|
md5:0775fd4e5b18da80be448a2673757bc4 | 1.6 GB |