Dataset Open Access
Jirsik, Tomas; Smeriga, Juraj
<?xml version='1.0' encoding='UTF-8'?> <record xmlns="http://www.loc.gov/MARC21/slim"> <leader>00000nmm##2200000uu#4500</leader> <datafield tag="041" ind1=" " ind2=" "> <subfield code="a">eng</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">network traffic</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">time series</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">host</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">clustering</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">classification</subfield> </datafield> <controlfield tag="005">20200124192456.0</controlfield> <controlfield tag="001">2669079</controlfield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">Masaryk University</subfield> <subfield code="a">Smeriga, Juraj</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">158716697</subfield> <subfield code="z">md5:1a72f130f9bfd95c3107309419221ad2</subfield> <subfield code="u">https://zenodo.org/record/2669079/files/host-network-traffic-time-series-2019-01-annon.csv</subfield> </datafield> <datafield tag="542" ind1=" " ind2=" "> <subfield code="l">open</subfield> </datafield> <datafield tag="260" ind1=" " ind2=" "> <subfield code="c">2019-05-06</subfield> </datafield> <datafield tag="909" ind1="C" ind2="O"> <subfield code="p">openaire_data</subfield> <subfield code="o">oai:zenodo.org:2669079</subfield> </datafield> <datafield tag="100" ind1=" " ind2=" "> <subfield code="u">Masaryk University</subfield> <subfield code="a">Jirsik, Tomas</subfield> </datafield> <datafield tag="245" ind1=" " ind2=" "> <subfield code="a">Host network traffic time series 2019/01</subfield> </datafield> <datafield tag="540" ind1=" " ind2=" "> <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield> <subfield code="a">Creative Commons Attribution 4.0 
International</subfield> </datafield> <datafield tag="650" ind1="1" ind2="7"> <subfield code="a">cc-by</subfield> <subfield code="2">opendefinition.org</subfield> </datafield> <datafield tag="520" ind1=" " ind2=" "> <subfield code="a"><p><em><strong>General info</strong></em></p> <p>The dataset&nbsp;was collected over a <strong>one-month period in January 2019</strong>. The observation points for the collection of IP flows were located at the borders of the university campus network. The university campus network has a /16 CIDR IPv4 network range at its disposal and contains various network segments, from segments connecting dormitories, through server segments, to a segment containing the workstations of university administrative workers. The size of the raw IP flows used to create the dataset was over 860 GB. <strong>A host in our dataset is identified by its source IPv4 address. &nbsp;</strong><br> &nbsp;</p> <p><em><strong>Variables</strong></em></p> <p>The dataset contains the following variables:</p> <ul> <li><strong>Aggregations</strong> - created from five-minute total volumes aggregated&nbsp;over&nbsp;one-hour disjoint windows using&nbsp;mean/max/min aggregation functions <ul> <li><strong># of flows (FL) </strong>- number of flows for a given source IP&nbsp;</li> <li><strong># of packets (PKT)</strong> -&nbsp;number of packets for a given source IP</li> <li><strong># of bytes (BYT)</strong> -&nbsp;number of bytes for a given source IP</li> <li><strong>flow duration (DUR)</strong> - average flow duration in seconds</li> </ul> </li> <li><strong>Distinct Counts&nbsp;</strong>- count of distinct values for each variable in each five-minute window, aggregated&nbsp;over&nbsp;one-hour disjoint windows using&nbsp;mean/max/min aggregation functions <ul> <li><strong># of peers (PEER)</strong> - number of distinct communication peers for a given source IP</li> <li><strong># of ports (PORTS)</strong> - number of distinct destination ports&nbsp;for a given source IP</li> <li><strong># of
protocols (PROTO)</strong> - number of distinct communication protocols&nbsp;for a given source IP</li> <li><strong># of AS numbers (AS)</strong> - number of distinct destination AS numbers for a given source IP</li> <li><strong># of countries (CTRY)</strong> - number of distinct destination countries&nbsp;for a given source IP</li> </ul> </li> <li><strong>Labels</strong> <ul> <li><strong>Range (RNG)</strong> - the network range a host belongs to (anonymized)</li> <li><strong>Unit (UNT) </strong>- the administrative unit owning the network range</li> <li><strong>Sub-unit (SUB-UNT)</strong> - a sub-unit of the unit</li> </ul> </li> </ul> <p>&nbsp;</p> <p><em><strong>Dataset format</strong></em></p> <ul> <li>The dataset is in <strong>comma-separated values (CSV)</strong> format.&nbsp;</li> <li><strong>Header</strong> - multilevel, first 3 lines <ul> <li>level 1 - aggregation type {mean|min|max}</li> <li>level 2 - variable {see above}</li> <li>level 3 - hour of the day {00,01,02,03,...,22,23}</li> </ul> </li> <li><strong>Labels</strong> - last 4 columns</li> <li><strong>Dataset size&nbsp;</strong> <ul> <li>rows: 65536 host records&nbsp;+ 3 header lines</li> <li>columns: 648 variables + 4 labels</li> </ul> </li> </ul> <p>&nbsp;</p></subfield> </datafield> <datafield tag="773" ind1=" " ind2=" "> <subfield code="n">doi</subfield> <subfield code="i">isVersionOf</subfield> <subfield code="a">10.5281/zenodo.2669078</subfield> </datafield> <datafield tag="024" ind1=" " ind2=" "> <subfield code="a">10.5281/zenodo.2669079</subfield> <subfield code="2">doi</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">dataset</subfield> </datafield> </record>
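The three-level header described in the record can be illustrated with a short Python sketch using only the standard library. It is not the authors' tooling. It assumes the three header lines span every column, that the last four columns are the labels, and that the header uses the abbreviations listed in the description (FL, PKT, BYT, and so on); the actual strings and column order in the published CSV may differ. The names `expected_columns`, `read_multilevel`, and `SAMPLE` are hypothetical, and `SAMPLE` is made-up data, not an excerpt from the dataset.

```python
import csv
import io
from itertools import product

# Values taken from the dataset description; the exact strings used in the
# real CSV header are an assumption.
AGGREGATIONS = ["mean", "min", "max"]
VARIABLES = ["FL", "PKT", "BYT", "DUR", "PEER", "PORTS", "PROTO", "AS", "CTRY"]
HOURS = [f"{hour:02d}" for hour in range(24)]
N_LABELS = 4  # the description states that the last 4 columns are labels


def expected_columns():
    """Enumerate every (aggregation, variable, hour) combination."""
    return list(product(AGGREGATIONS, VARIABLES, HOURS))


def read_multilevel(stream, n_header_rows=3, n_labels=N_LABELS):
    """Read a CSV whose first rows form a multilevel header.

    Returns the feature column keys as tuples and a list of
    (features, labels) pairs, one per data row.
    """
    reader = csv.reader(stream)
    header_rows = [next(reader) for _ in range(n_header_rows)]
    # Combine the header rows column-wise: one tuple per column.
    columns = list(zip(*header_rows))
    feature_keys = columns[:-n_labels]
    records = []
    for row in reader:
        values = [float(cell) for cell in row[:-n_labels]]
        features = dict(zip(feature_keys, values))
        labels = row[-n_labels:]
        records.append((features, labels))
    return feature_keys, records


# 3 aggregations x 9 variables x 24 hours = 648 variable columns,
# which matches the column count stated in the description.
assert len(expected_columns()) == 648

# Tiny synthetic sample: two feature columns followed by four label columns.
SAMPLE = (
    "mean,max,,,,\n"
    "FL,FL,,,,\n"
    "00,00,,,,\n"
    "1.5,3,a,b,c,d\n"
)

keys, records = read_multilevel(io.StringIO(SAMPLE))
features, labels = records[0]
print(features[("mean", "FL", "00")], labels)
```

Keying each value by an `(aggregation, variable, hour)` tuple keeps the three header levels explicit, so a host's 24-hour profile for one variable can be selected by filtering on the first two elements of the key.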
| | All versions | This version |
---|---|---|
Views | 378 | 378 |
Downloads | 352 | 352 |
Data volume | 55.9 GB | 55.9 GB |
Unique views | 334 | 334 |
Unique downloads | 244 | 244 |