Dataset Open Access

Host network traffic time series 2019/01

Jirsik, Tomas; Smeriga, Juraj


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">network traffic</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">time series</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">host</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">clustering</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">classification</subfield>
  </datafield>
  <controlfield tag="005">20200124192456.0</controlfield>
  <controlfield tag="001">2669079</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Masaryk University</subfield>
    <subfield code="a">Smeriga, Juraj</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">158716697</subfield>
    <subfield code="z">md5:1a72f130f9bfd95c3107309419221ad2</subfield>
    <subfield code="u">https://zenodo.org/record/2669079/files/host-network-traffic-time-series-2019-01-annon.csv</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-05-06</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="o">oai:zenodo.org:2669079</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Masaryk University</subfield>
    <subfield code="a">Jirsik, Tomas</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Host network traffic time series 2019/01</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;&lt;em&gt;&lt;strong&gt;General info&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The dataset was collected over a &lt;strong&gt;one-month period in January 2019&lt;/strong&gt;. The observation points for the collection of IP flows were located at the borders of the university campus network. The campus network has a /16 CIDR IPv4 range at its disposal and contains various network segments, ranging from segments connecting dormitories, through server segments, to a segment containing the workstations of university administrative staff. The raw IP flows used to create the dataset exceeded 860 GB in size. &lt;strong&gt;A host in our dataset is identified by its source IPv4 address.&lt;/strong&gt;&lt;br&gt;
&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Variables&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The dataset contains the following variables:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;&lt;strong&gt;Aggregations&lt;/strong&gt; - created from five-minute total volumes aggregated&amp;nbsp;over&amp;nbsp;one-hour disjoint windows using&amp;nbsp;mean/max/min aggregation functions

	&lt;ul&gt;
		&lt;li&gt;&lt;strong&gt;# of flows (FL) &lt;/strong&gt;- number of flows for a given source IP&amp;nbsp;&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;# of packets (PKT)&lt;/strong&gt; -&amp;nbsp;number of packets for a given source IP&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;# of bytes (BYT)&lt;/strong&gt; -&amp;nbsp;number of bytes for a given source IP&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;flow duration (DUR)&lt;/strong&gt; - average flow duration in seconds&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;Distinct Counts&amp;nbsp;&lt;/strong&gt;- count of distinct values for each variable in each five-minute window, aggregated&amp;nbsp;over&amp;nbsp;one-hour disjoint windows using&amp;nbsp;mean/max/min aggregation functions
	&lt;ul&gt;
		&lt;li&gt;&lt;strong&gt;# of peers (PEER)&lt;/strong&gt; - number of distinct communication peers for a given source IP&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;# of ports (PORTS)&lt;/strong&gt; - number of distinct destination ports&amp;nbsp;for a given source IP&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;# of protocols (PROTO)&lt;/strong&gt; - number of distinct communication protocols&amp;nbsp;for a given source IP&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;# of AS numbers (AS)&lt;/strong&gt; - number of distinct destination AS numbers for a given source IP&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;# of countries (CTRY)&lt;/strong&gt; - number of distinct destination countries&amp;nbsp;for a given source IP&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;Labels&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;&lt;strong&gt;Range (RNG)&lt;/strong&gt; - a network range a host belongs to (anonymized)&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;Unit (UNT) &lt;/strong&gt;- an administrative unit owning the network range&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;Sub-unit (SUB-UNT)&lt;/strong&gt; - a sub-unit of the unit&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Dataset format&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;The dataset is in &lt;strong&gt;comma-separated values (CSV)&lt;/strong&gt; format.&amp;nbsp;&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;Header&lt;/strong&gt; - multilevel, first 3 lines
	&lt;ul&gt;
		&lt;li&gt;Level 1 - aggregation type {mean|min|max}&lt;/li&gt;
		&lt;li&gt;Level 2 - variable {see above}&lt;/li&gt;
		&lt;li&gt;Level 3 - hour of the day {00,01,02,03,...,22,23}&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;Labels&lt;/strong&gt; - last 4 columns&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;Dataset size&amp;nbsp;&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;rows: 65536 host records&amp;nbsp;+ 3 headers&lt;/li&gt;
		&lt;li&gt;columns: 648 variables + 4 labels&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.2669078</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.2669079</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
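
The description in field 520 states a three-line header (aggregation type, variable, hour of day) and 648 variable columns, which matches 3 aggregation types x 9 variables x 24 hours. The sketch below shows one way such a header might be read with the Python standard library. It is an illustration only: the helper names are hypothetical, the short codes (FL, PKT, ...) are taken from the description and may not match the exact strings used in the CSV file, and the sample data is synthetic.

```python
import csv
import io
from itertools import product

# Values taken from the dataset description; the exact spelling of these
# labels inside the CSV file is an assumption.
AGGREGATIONS = ["mean", "min", "max"]
VARIABLES = ["FL", "PKT", "BYT", "DUR", "PEER", "PORTS", "PROTO", "AS", "CTRY"]
HOURS = [f"{h:02d}" for h in range(24)]


def expected_columns():
    """Enumerate every (aggregation, variable, hour) combination."""
    return list(product(AGGREGATIONS, VARIABLES, HOURS))


def read_multilevel_header(stream, n_levels=3):
    """Read the first n_levels rows and combine them into column tuples.

    Returns the list of column tuples and the csv reader, positioned at
    the first data row.
    """
    reader = csv.reader(stream)
    levels = [next(reader) for _ in range(n_levels)]
    columns = list(zip(*levels))
    return columns, reader


# Synthetic three-column sample, not real data from the dataset.
sample = "mean,mean,max\nFL,FL,BYT\n00,01,00\n1.5,2.5,3.5\n"
columns, rows = read_multilevel_header(io.StringIO(sample))
first_row = next(rows)

print(len(expected_columns()))  # 3 * 9 * 24 combinations
print(columns)
print(first_row)
```

For the real file, the same function could be given an open file handle instead of the in-memory sample. How the four label columns appear in the three header rows is not specified in the description, so they would need to be inspected before use.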