Dataset Open Access

Exploiting Statistical and Structural Features for the Detection of Domain Generation Algorithms

Constantinos Patsakis; Fran Casino

This repository contains a dataset for the research of domain generation algorithms (DGAs) and machine learning. More precisely, it targets dictionary-based DGAs.

Constantinos Patsakis, Fran Casino: "Exploiting Statistical and Structural Features for the Detection of Domain Generation Algorithms", Journal of Information Security and Applications, 2021.

Features ordered as in the shared dataset:

  • Family: DGA that the domain belongs to
  • SLD: Second-level domain (SLD) of the Domain
  • L-LEN: The length of Domain
  • L-DIG: The number of digits in Domain
  • L-CON-MAX: The maximum number of consecutive consonants in Domain
  • R-CON-VOW: Number of consonants divided by L-LEN 
  • L-SYM: The number of special characters
  • R-SYM-LEN: L-SYM divided by L-LEN
  • R-Dom-3G: Ratio of benign grams in Dom-3G
  • R-Dom-4G: Ratio of benign grams in Dom-4G
  • R-Dom-5G: Ratio of benign grams in Dom-5G
  • L-W2: Number of words with more than 2 characters in Domain
  • L-W3: Number of words with more than 3 characters in Domain
  • R-WS-LEN: Dom-WS divided by L-LEN
  • R-WDS-LEN: Dom-WDS divided by L-LEN
  • R-W2-LEN: Dom-W2 divided by L-LEN
  • R-W3-LEN: Dom-W3 divided by L-LEN
  • M2-Dom-Ws: 2-Chain Markov English grams applied to Dom-WS
  • M2-Dom-WDS: 2-Chain Markov English grams applied to Dom-WDS
  • E-Dom-WS: Entropy of Dom-WS 
  • E-Dom-WDS: Entropy of Dom-WDS
  • E-Dom-W2: Entropy of Dom-W2
  • E-Dom-W3: Entropy of Dom-W3
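
As a rough illustration of the simpler character-level features in the list above, the following Python sketch computes L-LEN, L-DIG, L-CON-MAX, L-SYM and R-SYM-LEN for a single SLD, plus a generic Shannon entropy helper. It is not the authors' implementation. The function names are hypothetical, and it assumes that "y" counts as a consonant, that a special character is any non-alphanumeric character, and that the E-* features use character-level Shannon entropy in bits. The word-based inputs (Dom-WS, Dom-WDS, Dom-W2, Dom-W3) and the n-gram and Markov features need dictionaries and reference corpora and are not reproduced here.

```python
import math
import re
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits (assumed definition)."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def simple_features(sld: str) -> dict:
    """Compute a few character-level features for one SLD (hypothetical helper)."""
    s = sld.lower()
    length = len(s)
    # Digits are counted as ASCII 0-9 only.
    digits = sum(ch in "0123456789" for ch in s)
    # Longest run of consecutive consonants; 'y' is treated as a consonant (assumption).
    runs = re.findall(r"[bcdfghjklmnpqrstvwxyz]+", s)
    con_max = max((len(run) for run in runs), default=0)
    # Special characters: anything that is not a letter or digit (assumption).
    symbols = sum(not ch.isalnum() for ch in s)
    return {
        "L-LEN": length,
        "L-DIG": digits,
        "L-CON-MAX": con_max,
        "L-SYM": symbols,
        "R-SYM-LEN": symbols / length if length else 0.0,
    }
```

For example, `simple_features("abc123-xyz")` reports a length of 10, three digits, a longest consonant run of 3 ("xyz") and one special character. Values produced this way may differ from the columns in the shared dataset wherever the assumptions above do not match the original feature extraction.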

Files (52.2 MB)

  • dictionary_DGAs_dataset.zip (52.2 MB)
    md5:92cd328d57a2ea5126eac1c1ef19a179
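
A downloaded copy of dictionary_DGAs_dataset.zip can be checked against the MD5 checksum listed for it. The sketch below uses only Python's standard `hashlib` module; the helper names and the local file path are hypothetical and assume the archive was saved under its original name in the working directory.

```python
import hashlib

# Checksum as published alongside the archive.
EXPECTED_MD5 = "92cd328d57a2ea5126eac1c1ef19a179"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str) -> bool:
    """True if the file's digest equals the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5


# Example call (the path is an assumption about where the file was saved):
# matches_published_md5("dictionary_DGAs_dataset.zip")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.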