Dataset Open Access

Host network traffic time series 2019/01

Jirsik, Tomas; Smeriga, Juraj

JSON Export

  "files": [
      "links": {
        "self": ""
      "checksum": "md5:1a72f130f9bfd95c3107309419221ad2", 
      "bucket": "3a5e150a-3fa9-47c1-befd-c1876758b27b", 
      "key": "host-network-traffic-time-series-2019-01-annon.csv", 
      "type": "csv", 
      "size": 158716697
  "owners": [
  "doi": "10.5281/zenodo.2669079", 
  "stats": {
    "version_unique_downloads": 245.0, 
    "unique_views": 335.0, 
    "views": 380.0, 
    "version_views": 380.0, 
    "unique_downloads": 245.0, 
    "version_unique_views": 335.0, 
    "volume": 56185710738.0, 
    "version_downloads": 354.0, 
    "downloads": 354.0, 
    "version_volume": 56185710738.0
  "links": {
    "doi": "", 
    "conceptdoi": "", 
    "bucket": "", 
    "conceptbadge": "", 
    "html": "", 
    "latest_html": "", 
    "badge": "", 
    "latest": ""
  "conceptdoi": "10.5281/zenodo.2669078", 
  "created": "2019-05-20T11:03:30.227222+00:00", 
  "updated": "2020-01-24T19:24:56.382838+00:00", 
  "conceptrecid": "2669078", 
  "revision": 4, 
  "id": 2669079, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.2669079", 
    "description": "<p><em><strong>General info</strong></em></p>\n\n<p>Dataset&nbsp;was collected over one <strong>month period in January 2019</strong>. The observation points for the collection of IP flows were located at the borders of the university campus network. The campus university network has /16 CIDR IPv4 network range at disposal and contains various network segments from segments connecting dormitories, over server segments, to a segment containing working stations of university administrative workers. The size of the raw IP flows used to create the dataset was over 860GB. <strong>A host in our dataset is identified by its source IPv4 address. &nbsp;</strong><br>\n&nbsp;</p>\n\n<p><em><strong>Variables</strong></em></p>\n\n<p>The dataset contains the following variables:</p>\n\n<ul>\n\t<li><strong>Aggregations</strong> - created from five-minute total volumes aggregated&nbsp;over&nbsp;one-hour disjoint windows using&nbsp;mean/max/min aggregation functions\n\n\t<ul>\n\t\t<li><strong># of flows (FL) </strong>- number of flows for a given source IP&nbsp;</li>\n\t\t<li><strong># of packets (PKT)</strong> -&nbsp;number of packets for a given source IP</li>\n\t\t<li><strong># of bytes (BYT)</strong> -&nbsp;number of packets for a given source IP</li>\n\t\t<li><strong>flow duration (DUR)</strong> - average flow duration in seconds</li>\n\t</ul>\n\t</li>\n\t<li><strong>Distinct Counts&nbsp;</strong>- count of distinct values for each variable in five-minute window aggregated&nbsp;over&nbsp;one-hour disjoint windows using&nbsp;mean/max/min aggregation functions\n\t<ul>\n\t\t<li><strong># of peers (PEER)</strong> - number of distinct communication peers for a given source IP</li>\n\t\t<li><strong># of ports (PORTS)</strong> - number of distinct destination ports&nbsp;for a given source IP</li>\n\t\t<li><strong># of protocols (PROTO)</strong> - number of distinct communication protocols&nbsp;for a given source IP</li>\n\t\t<li><strong># of AS numbers 
(AS)</strong> - number of distinct destination AS numbers for a given source IP</li>\n\t\t<li><strong># of countries (CTRY)</strong> - number of distinct destination countries&nbsp;for a given source IP</li>\n\t</ul>\n\t</li>\n\t<li><strong>Labels</strong>\n\t<ul>\n\t\t<li><strong>Range (RNG)</strong> - a network range a host belongs to (anonymized)</li>\n\t\t<li><strong>Unit (UNT) </strong>- an administrative unit owning the network range</li>\n\t\t<li><strong>Sub-unit (SUB-UNT)</strong> - a sub-unit of the unit</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p>&nbsp;</p>\n\n<p><em><strong>Dataset format</strong></em></p>\n\n<ul>\n\t<li>The dataset is in <strong>comma-separated values (CSV)</strong> format.&nbsp;</li>\n\t<li><strong>Header</strong> - multilevel, first 3 lines\n\t<ul>\n\t\t<li>1 level - aggregation type {mean|min|max}</li>\n\t\t<li>2 level - variable {see above}</li>\n\t\t<li>3 level - hour of a day {00,01,02,03,...,22,23}</li>\n\t</ul>\n\t</li>\n\t<li><strong>Lablels</strong> - last 4 columns</li>\n\t<li><strong>Dataset size&nbsp;</strong>\n\t<ul>\n\t\t<li>rows: 65536 host records&nbsp;+ 3 headers</li>\n\t\t<li>columns: 648 variables + 4 labels</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p>&nbsp;</p>", 
    "language": "eng", 
    "title": "Host network traffic time series 2019/01", 
    "license": {
      "id": "CC-BY-4.0"
    "relations": {
      "version": [
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "2669078"
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "2669079"
    "version": "1.0.0", 
    "keywords": [
      "network traffic", 
      "time series", 
    "publication_date": "2019-05-06", 
    "creators": [
        "affiliation": "Masaryk University", 
        "name": "Jirsik, Tomas"
        "affiliation": "Masaryk University", 
        "name": "Smeriga, Juraj"
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    "related_identifiers": [
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.2669078", 
        "relation": "isVersionOf"
                  All versions   This version
Views             380            380
Downloads         354            354
Data volume       56.2 GB        56.2 GB
Unique views      335            335
Unique downloads  245            245
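Because the record publishes an MD5 checksum and a byte size for the CSV file, a downloaded copy can be checked against them. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_record` are hypothetical, and the self-check at the end runs on a throwaway temporary file rather than on the dataset itself.

```python
import hashlib
import os
import tempfile

# Values copied from the record's "files" entry.
EXPECTED_MD5 = "1a72f130f9bfd95c3107309419221ad2"
EXPECTED_SIZE = 158716697  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """True if the file has the size and MD5 digest listed in the record."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first; avoids hashing an obviously wrong file
    return md5_of_file(path) == expected_md5


# Self-check on a throwaway file (not the dataset): it should not match.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
demo_digest = md5_of_file(demo_path)
demo_ok = matches_record(demo_path)
os.remove(demo_path)
print(demo_digest, demo_ok)
```

Calling `matches_record("host-network-traffic-time-series-2019-01-annon.csv")` on a local copy would then indicate whether the download is intact. MD5 is adequate here for detecting transfer corruption, not for security purposes.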

