10.5281/zenodo.3799932
https://zenodo.org/records/3799932
oai:zenodo.org:3799932
Tomas Jirsik
ICS, Masaryk University
Host Network Traffic 2019
Zenodo
2020
network traffic
netflow
host behavior
time series
labels
2020-05-01
eng
10.5281/zenodo.3799931
https://zenodo.org/communities/eu
1.0
Creative Commons Attribution 4.0 International
Dataset Summary
Timespan: 2019-01-01 to 2019-12-31
Granularity: 1-hour disjoint time windows
# of characteristics observed: 9
Hosts observed: 65536
Labels: included
Unzipped volume: approx. 10 GB
Dataset Origins
The dataset was collected over the whole of 2019. The observation points for collecting IP flows were located at the borders of the university campus network. The campus network has a /16 CIDR IPv4 range at its disposal and contains various segments, ranging from dormitory segments and server segments to a segment containing the workstations of university administrative staff. A host in the dataset is identified by its source IPv4 address.
Variables
The dataset contains the following variables:
Aggregations - values aggregated over a one-hour interval (sums, except for flow duration, which is an average):
# of flows - number of flows for a given source IP
# of packets - number of packets for a given source IP
# of bytes - number of bytes for a given source IP
flow duration - average flow duration in seconds
Distinct Counts - count of distinct values for each variable over a one-hour window
# of peers - number of distinct communication peers for a given source IP
# of ports - number of distinct destination ports for a given source IP
# of protocols - number of distinct communication protocols for a given source IP
# of AS numbers - number of distinct destination AS numbers for a given source IP
# of countries - number of distinct destination countries for a given source IP
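The variables above could be derived from raw flow records roughly as sketched below. This is an illustration only, not the pipeline used to build the dataset: the flow table layout and its column names (`ts`, `src_ip`, `dst_ip`, `dst_port`, `packets`, `bytes`, `duration`) are assumptions, the toy values are invented, and only a subset of the nine variables is shown.

```python
import pandas as pd


def hourly_host_features(flows: pd.DataFrame) -> pd.DataFrame:
    """Aggregate flow records into 1-hour disjoint windows per source IP.

    Assumes hypothetical columns: ts (datetime), src_ip, dst_ip, dst_port,
    packets, bytes, duration (seconds).
    """
    # Assign each flow to the start of its one-hour window.
    window = flows["ts"].dt.floor("h").rename("window")
    grouped = flows.groupby([window, flows["src_ip"]])
    return grouped.agg(
        flows=("bytes", "size"),         # number of flows
        packets=("packets", "sum"),      # number of packets
        bytes=("bytes", "sum"),          # number of bytes
        duration=("duration", "mean"),   # average flow duration
        peers=("dst_ip", "nunique"),     # distinct communication peers
        ports=("dst_port", "nunique"),   # distinct destination ports
    )


# Toy flow records with invented values, for illustration only.
toy = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            ["2019-01-01 00:05", "2019-01-01 00:40", "2019-01-01 01:10"]
        ),
        "src_ip": ["a", "a", "b"],
        "dst_ip": ["x", "y", "x"],
        "dst_port": [80, 443, 80],
        "packets": [1, 2, 3],
        "bytes": [10, 20, 30],
        "duration": [1.0, 3.0, 2.0],
    }
)
features = hourly_host_features(toy)

# One variable in a wide layout: rows are windows, columns are hosts.
# Hosts without any flow in a window end up as NaN.
flows_wide = features["flows"].unstack("src_ip")
```

Pivoting each variable with `unstack` yields one table per variable, with observation windows as rows and hosts as columns.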
Dataset Structure
Dataset Files - each variable is contained in one comma-separated values (.csv) file
Row index - timestamp of the observation window (8760 rows)
Columns index - anonymized IP addresses (65536 columns)
Label File - contains labels of the individual IP addresses from the Dataset Files
Row index - anonymized IP addresses (65536 rows)
Columns index - labels for the IP addresses
Subnet - ID of a subnet; hosts belonging to the same subnet have the same ID
Subnet_range - CIDR range of a subnet
Unit - an ID of administrative unit owning the network range
Sub-unit - an ID of administrative sub-unit owning the network range
Subnet_label - subnet label
Servers - selected subnets containing mostly servers (133.250.178.0/24, 133.250.163.0/24)
Workstations - selected subnets containing mostly workstations (133.250.146.0/24, 133.250.157.128/25)
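Because the columns of a Dataset File and the rows of the Label File share the same anonymized IP addresses, the labels can be used to select hosts. The sketch below assumes that the `Subnet_label` column holds the literal values `Servers` and `Workstations`; the helper name `select_hosts` and the toy frames are hypothetical.

```python
import pandas as pd


def select_hosts(
    values: pd.DataFrame, labels: pd.DataFrame, subnet_label: str
) -> pd.DataFrame:
    """Keep only the host columns whose Subnet_label equals subnet_label.

    values: one Dataset File (rows = windows, columns = anonymized IPs).
    labels: the Label File (rows = anonymized IPs, columns = labels).
    """
    hosts = labels.index[labels["Subnet_label"] == subnet_label]
    # Intersect to stay robust if a labelled host is missing from values.
    return values.loc[:, values.columns.intersection(hosts)]


# Toy frames with invented values, for illustration only.
toy_values = pd.DataFrame(
    {"ip1": [1.0, 2.0], "ip2": [3.0, None], "ip3": [5.0, 6.0]},
    index=pd.to_datetime(["2019-01-01 00:00", "2019-01-01 01:00"]),
)
toy_labels = pd.DataFrame(
    {"Subnet_label": ["Servers", "Workstations", None]},
    index=["ip1", "ip2", "ip3"],
)
servers = select_hosts(toy_values, toy_labels, "Servers")
```

Hosts with an N/A label never match the comparison, so they are left out of every selection.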
Further notes
N/A values
Variables - the host did not communicate in the given observation window
Labels - no additional information on this IP address is available
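Since an N/A value in a variable marks a window without communication, it carries information of its own. The following sketch shows two possible ways to work with it; both helper names are hypothetical, and replacing N/A with zero is an assumption that suits counts and sums but not the average flow duration.

```python
import pandas as pd


def active_hours(values: pd.DataFrame) -> pd.Series:
    """Number of observation windows in which each host communicated."""
    return values.notna().sum(axis=0)


def fill_silent_windows(values: pd.DataFrame) -> pd.DataFrame:
    """Replace N/A with 0; reasonable for count-like variables only."""
    return values.fillna(0)


# Toy frame with invented values, for illustration only.
toy = pd.DataFrame(
    {"ip1": [4.0, None, 7.0], "ip2": [None, None, 1.0]},
    index=pd.to_datetime(
        ["2019-01-01 00:00", "2019-01-01 01:00", "2019-01-01 02:00"]
    ),
)
hours = active_hours(toy)
filled = fill_silent_windows(toy)
```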
Dataset load
import pandas as pd
df = pd.read_csv("<filename>", header=0, index_col=0)  # rows: observation windows, columns: anonymized IPs
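A slightly extended loading sketch follows, which also parses the row index as timestamps and checks the table dimensions stated in this description (365 days x 24 hours = 8760 windows, and 2^16 = 65536 addresses in a /16 range). The helper names are hypothetical, and the timestamp format is assumed to be one that pandas can parse automatically.

```python
import io

import pandas as pd

EXPECTED_WINDOWS = 365 * 24  # one-hour windows in 2019
EXPECTED_HOSTS = 2 ** 16     # addresses in a /16 IPv4 range


def load_variable(source) -> pd.DataFrame:
    """Read one Dataset File; source is a path or a file-like object."""
    # parse_dates=True asks pandas to parse the index column as datetimes.
    return pd.read_csv(source, header=0, index_col=0, parse_dates=True)


def has_expected_shape(df: pd.DataFrame) -> bool:
    """Check the table against the dimensions given in the description."""
    return df.shape == (EXPECTED_WINDOWS, EXPECTED_HOSTS)


# Tiny in-memory stand-in for a Dataset File (invented values).
sample = io.StringIO(
    "timestamp,ip1,ip2\n"
    "2019-01-01 00:00:00,1,\n"
    "2019-01-01 01:00:00,3,4\n"
)
sample_df = load_variable(sample)
```

For a real file, `load_variable("<filename>")` followed by `has_expected_shape(df)` gives a quick sanity check before further processing.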
European Commission
10.13039/501100000780
833418
Sharing and Automation for Privacy Preserving Attack Neutralization