Published June 13, 2023 | Version v1
Dataset Open

CESNET-MINER22-TS: Periodic Behavior Features of Cryptomining Communication

  • 1. Czech Technical University in Prague
  • 2. CESNET, a.l.e.

Description

The datasets were created for the paper "Enhancing DeCrypto: Finding Cryptocurrency Miners Based on Periodic Behavior" by Josef Koumar, Richard Plný, and Tomáš Čejka, published at the 19th International Conference on Network and Service Management (CNSM) 2023. If you use our datasets, please cite:
 

J. Koumar, R. Plný and T. Čejka, "Enhancing DeCrypto: Finding Cryptocurrency Miners Based on Periodic Behavior," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327904.

 

The files cesnet_miner22_design_with_FTS_proba.zip and cesnet_miner22_evaluation_with_FTS_proba.zip each contain one .csv file with IP flows. The IP flows were taken from the CESNET-MINER22 dataset [1], which was created by monitoring the national research and education network CESNET2. We added two features to the flows: ID_DEPENDENCY (string) and PERIODICITY_PROBA (double). ID_DEPENDENCY is the ID of a network dependency (see the article [2]), and PERIODICITY_PROBA is the probability predicted by the flow time series (FTS) analysis.

The files in periodicity_features.zip contain periodic behavior features for machine learning. The file names follow the format "{evaluation/design}.periodicity_features.{TIME_INTERVAL}.{SIG_SPACE}.{PER_LEVEL}.csv", and the files have the following columns:

  • id_dependency -- Identification of a network dependency observed as a Flow time series (FTS).
  • label -- The labels ("Miner" or "Other") of periodic FTS.
  • packet_value -- Value of the Clear periodic behavior of the metric packets.
  • packet_value_x -- Lower bound of the interval of the Sinusoidal periodic behavior of the metric packets.
  • packet_value_y -- Upper bound of the interval of the Sinusoidal periodic behavior of the metric packets.
  • packet_mean -- Mean of the metric packets.
  • packet_std -- Standard deviation of the metric packets.
  • packet_skewness -- Skewness of the metric packets.
  • packet_kurtosis -- Kurtosis of the metric packets.
  • bytes_value -- Value of the Clear periodic behavior of the metric bytes.
  • bytes_value_x -- Lower bound of the interval of the Sinusoidal periodic behavior of the metric bytes.
  • bytes_value_y -- Upper bound of the interval of the Sinusoidal periodic behavior of the metric bytes.
  • bytes_mean -- Mean of the metric bytes.
  • bytes_std -- Standard deviation of the metric bytes.
  • bytes_skewness -- Skewness of the metric bytes.
  • bytes_kurtosis -- Kurtosis of the metric bytes.
  • duration_value -- Value of the Clear periodic behavior of the metric duration.
  • duration_value_x -- Lower bound of the interval of the Sinusoidal periodic behavior of the metric duration.
  • duration_value_y -- Upper bound of the interval of the Sinusoidal periodic behavior of the metric duration.
  • duration_mean -- Mean of the metric duration.
  • duration_std -- Standard deviation of the metric duration.
  • duration_skewness -- Skewness of the metric duration.
  • duration_kurtosis -- Kurtosis of the metric duration.
  • difftimes_value -- Value of the Clear periodic behavior of the metric difftimes.
  • difftimes_value_x -- Lower bound of the interval of the Sinusoidal periodic behavior of the metric difftimes.
  • difftimes_value_y -- Upper bound of the interval of the Sinusoidal periodic behavior of the metric difftimes.
  • difftimes_mean -- Mean of the metric difftimes.
  • difftimes_std -- Standard deviation of the metric difftimes.
  • difftimes_skewness -- Skewness of the metric difftimes.
  • difftimes_kurtosis -- Kurtosis of the metric difftimes.
  • max_power -- Maximum power of the Lomb-Scargle (LS) periodogram.
  • max_frequency -- Frequency at which the LS periodogram reaches its maximum power.
  • min_power -- Minimum power of the LS periodogram.
  • min_frequency -- Frequency at which the LS periodogram reaches its minimum power.
  • spectral_energy -- Total energy across all frequencies of the LS periodogram.
  • spectral_entropy -- Degree of randomness or disorder of the LS periodogram.
  • spectral_kurtosis -- Indicates nonstationary or non-Gaussian behavior in the power spectrum.
  • spectral_skewness -- Measure of the asymmetry of the power spectrum.
  • spectral_rolloff -- Frequency below which 85% of the spectral power lies.
  • spectral_cetroid -- Frequency at which the energy of the spectrum is centered, i.e., the power-weighted mean frequency.
  • spectral_spread -- Difference between the highest and lowest frequencies in the power spectrum.
  • spectral_slope -- Slope of the power spectrum trend over a given frequency range.
  • spectral_crest -- Rate at which the sign of the signal changes, i.e., from negative to positive or vice versa.
  • spectral_flux -- Rate of change of the periodogram power with increasing frequency.
  • spectral_bandwidth -- Difference between the upper and lower frequencies at which the spectral power is half its maximum value.
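Several of the spectral features above are computed from the LS periodogram of an FTS. The following Python sketch shows common textbook definitions of a few of them, assuming the periodogram is already available as ascending frequencies with non-negative powers (for example, from a Lomb-Scargle implementation such as scipy.signal.lombscargle). The function name spectral_features, the use of summed power as the spectral energy, and the base-2 entropy are assumptions for illustration; the authors' exact formulas may differ.

```python
import math
from typing import Dict, List


def spectral_features(freqs: List[float], power: List[float],
                      rolloff_ratio: float = 0.85) -> Dict[str, float]:
    """Hypothetical helper: common spectral descriptors of a periodogram.

    Assumes `freqs` is sorted ascending, `power` is non-negative,
    both lists have the same length, and the total power is positive.
    """
    if not power or len(freqs) != len(power):
        raise ValueError("freqs and power must be non-empty and equally long")
    total = sum(power)
    if total <= 0:
        raise ValueError("power must contain at least one positive value")

    # Indices of the strongest and weakest periodogram bins.
    i_max = max(range(len(power)), key=lambda i: power[i])
    i_min = min(range(len(power)), key=lambda i: power[i])

    # Normalised power treated as a probability distribution over frequencies.
    p = [x / total for x in power]
    entropy = -sum(pi * math.log2(pi) for pi in p if pi > 0)

    # Power-weighted mean frequency.
    centroid = sum(f * pi for f, pi in zip(freqs, p))

    # First frequency at which the cumulative power reaches `rolloff_ratio`
    # of the total; falls back to the last bin to absorb rounding error.
    rolloff = freqs[-1]
    cumulative = 0.0
    for f, x in zip(freqs, power):
        cumulative += x
        if cumulative >= rolloff_ratio * total:
            rolloff = f
            break

    return {
        "max_power": power[i_max],
        "max_frequency": freqs[i_max],
        "min_power": power[i_min],
        "min_frequency": freqs[i_min],
        "spectral_energy": total,  # assumption: energy = sum of power
        "spectral_entropy": entropy,  # in bits
        "spectral_cetroid": centroid,  # key spelled as in the column list
        "spectral_rolloff": rolloff,
    }
```

For a four-bin periodogram with powers [1, 4, 2, 1] at frequencies [0.1, 0.2, 0.3, 0.4], the sketch reports a maximum power of 4 at frequency 0.2, a spectral entropy of 1.75 bits, and a roll-off frequency of 0.3.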

 

The files in time_series.zip contain the FTS for the time interval used. The file names follow the format "{evaluation/design}.time_series.{TIME_INTERVAL}.csv", and the files have the following columns:

  • ID_DEPENDENCY -- Identification of a network dependency observed as an FTS.
  • N_FLOWS -- Number of flows in the time series, i.e., the number of data points.
  • N_PACKETS -- Number of packets in the time series, i.e., the sum of the metric PACKETS.
  • N_BYTES -- Number of bytes in the time series, i.e., the sum of the metric BYTES.
  • PACKETS -- Array of the number of packets in each IP flow of the time series.
  • BYTES -- Array of the number of bytes in each IP flow of the time series.
  • START_TIMES -- Array of the flow start times, i.e., the time axis of the time series.
  • END_TIMES -- Array of the flow end times.
  • LABELS -- Array of the labels ("Miner" or "Other") of each data point.
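The duration and difftimes metrics used in the periodicity features, together with their mean, standard deviation, skewness, and kurtosis, can be derived from the arrays in the time series files. The following Python sketch assumes that START_TIMES and END_TIMES have already been parsed into numeric timestamps (how the arrays are serialized in the .csv files is not described here), that duration is END_TIMES minus START_TIMES for each flow, and that difftimes are the gaps between consecutive flow starts. The helper names derived_metrics and moments are hypothetical, and kurtosis is reported as excess kurtosis, which may not match the convention used for the dataset columns.

```python
import math
from typing import Dict, List


def derived_metrics(start_times: List[float],
                    end_times: List[float]) -> Dict[str, List[float]]:
    """Hypothetical helper: per-flow durations and inter-start gaps."""
    if len(start_times) != len(end_times):
        raise ValueError("START_TIMES and END_TIMES must have equal length")
    # Assumption: duration of a flow = its end time minus its start time.
    duration = [e - s for s, e in zip(start_times, end_times)]
    # Assumption: difftimes = differences between consecutive flow starts.
    difftimes = [b - a for a, b in zip(start_times, start_times[1:])]
    return {"duration": duration, "difftimes": difftimes}


def moments(values: List[float]) -> Dict[str, float]:
    """Mean, population std, skewness and excess kurtosis of a series."""
    nan = float("nan")
    n = len(values)
    if n == 0:
        return {"mean": nan, "std": nan, "skewness": nan, "kurtosis": nan}
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        # Skewness and kurtosis are undefined for a constant series.
        return {"mean": mean, "std": 0.0, "skewness": nan, "kurtosis": nan}
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess (Fisher) kurtosis
    }
```

For example, for start times [0, 10, 20, 30] and end times [1, 12, 21, 32], the durations are [1, 2, 1, 2] (mean 1.5, standard deviation 0.5), while difftimes is [10, 10, 10], so its standard deviation is 0 and its skewness and kurtosis are undefined (NaN).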

 

[1] R. Plný et al., "CESNET-MINER22: Datasets of Cryptomining Communication," Zenodo, Oct. 2022.

[2] J. Koumar and T. Čejka, "Network traffic classification based on periodic behavior detection," 2022 18th International Conference on Network and Service Management (CNSM), IEEE, 2022.

Notes

This research was funded by the Ministry of the Interior of the Czech Republic, grant No. VJ02010024: Flow-Based Encrypted Traffic Analysis, and by the Grant Agency of the CTU in Prague, grant No. SGS23/207/OHK3/3T/18, funded by the MEYS of the Czech Republic.

Files

cesnet_miner22_design_with_FTS_proba.zip

Files (1.6 GB)

  • 560.1 MB -- md5:9b0a33ffd7f5f41ff1075b228ceb69eb
  • 329.2 MB -- md5:19191e9f34fd9379b1ef7efaf5d02f07
  • 394.6 MB -- md5:8e634b1781b984c71cb21767aac827a0
  • 351.0 MB -- md5:8a85ca5198d1d0faca2c00e7222536fd
