Dataset Open Access

Exploiting Statistical and Structural Features for the Detection of Domain Generation Algorithms

Constantinos Patsakis; Fran Casino


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">DGAs</subfield>
  </datafield>
  <controlfield tag="005">20201221201941.0</controlfield>
  <controlfield tag="001">4010620</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Piraeus</subfield>
    <subfield code="0">(orcid)0000-0003-4296-2876</subfield>
    <subfield code="a">Fran Casino</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">52219491</subfield>
    <subfield code="z">md5:92cd328d57a2ea5126eac1c1ef19a179</subfield>
    <subfield code="u">https://zenodo.org/record/4010620/files/dictionary_DGAs_dataset.zip</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2020-09-01</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="o">oai:zenodo.org:4010620</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Piraeus</subfield>
    <subfield code="0">(orcid)0000-0002-4460-9331</subfield>
    <subfield code="a">Constantinos Patsakis</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Exploiting Statistical and Structural Features for the Detection of Domain Generation Algorithms</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">832735</subfield>
    <subfield code="a">Lawful evidence collecting and continuity platform development</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">780498</subfield>
    <subfield code="a">Cybersecurity Awareness and Knowledge Systemic High-level Application</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">830929</subfield>
    <subfield code="a">Cyber Security Network of Competence Centres for Europe</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;This repository contains a dataset for research on domain generation algorithms (DGAs) and machine learning. More precisely, it targets dictionary-based DGAs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Constantinos Patsakis, Fran Casino: &amp;quot;Exploiting Statistical and Structural Features for the Detection of Domain Generation Algorithms&amp;quot;,&amp;nbsp;Journal of Information Security and Applications, 2021.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Features ordered as in the shared dataset:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Family: DGA that the domain belongs to&lt;/li&gt;
	&lt;li&gt;SLD: Second-level domain (SLD) of the Domain&lt;/li&gt;
	&lt;li&gt;L-LEN: The length of Domain&lt;/li&gt;
	&lt;li&gt;L-DIG: The number of digits in Domain&lt;/li&gt;
	&lt;li&gt;L-CON-MAX: The maximum number of consecutive consonants in Domain&lt;/li&gt;
	&lt;li&gt;R-CON-VOW: Number of consonants divided by L-LEN&lt;/li&gt;
	&lt;li&gt;L-SYM: The number of special characters&lt;/li&gt;
	&lt;li&gt;R-SYM-LEN: L-SYM divided by L-LEN&lt;/li&gt;
	&lt;li&gt;R-Dom-3G: Ratio of benign grams in Dom-3G&lt;/li&gt;
	&lt;li&gt;R-Dom-4G: Ratio of benign grams in Dom-4G&lt;/li&gt;
	&lt;li&gt;R-Dom-5G: Ratio of benign grams in Dom-5G&lt;/li&gt;
	&lt;li&gt;L-W2: Number of words with more than 2 characters in Domain&lt;/li&gt;
	&lt;li&gt;L-W3: Number of words with more than 3 characters in Domain&lt;/li&gt;
	&lt;li&gt;R-WS-LEN: Dom-WS divided by L-LEN&lt;/li&gt;
	&lt;li&gt;R-WDS-LEN: Dom-WDS divided by L-LEN&lt;/li&gt;
	&lt;li&gt;R-W2-LEN: Dom-W2 divided by L-LEN&lt;/li&gt;
	&lt;li&gt;R-W3-LEN: Dom-W3 divided by L-LEN&lt;/li&gt;
	&lt;li&gt;M2-Dom-Ws: 2-Chain Markov English grams applied to Dom-WS&lt;/li&gt;
	&lt;li&gt;M2-Dom-WDS: 2-Chain Markov English grams applied to Dom-WDS&lt;/li&gt;
	&lt;li&gt;E-Dom-WS: Entropy of Dom-WS&lt;/li&gt;
	&lt;li&gt;E-Dom-WDS: Entropy of Dom-WDS&lt;/li&gt;
	&lt;li&gt;E-Dom-W2: Entropy of Dom-W2&lt;/li&gt;
	&lt;li&gt;E-Dom-W3: Entropy of Dom-W3&lt;/li&gt;
&lt;/ul&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.4010619</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4010620</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
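
The simpler character-level features listed in the description (L-LEN, L-DIG, L-CON-MAX, R-CON-VOW, L-SYM, R-SYM-LEN) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that "Domain" means the SLD string, that vowels are `a e i o u`, that consonants are the remaining ASCII letters, and that a "special character" is anything that is neither a letter nor a digit. The function name `simple_features` is hypothetical. The word-based and n-gram features (Dom-WS, Dom-WDS, R-Dom-3G, and so on) depend on dictionaries and benign n-gram sets that the record does not define, so they are left out.

```python
# Illustrative sketch only; the definitions below are assumptions,
# not the method used to build the published dataset.
VOWELS = set("aeiou")
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")


def simple_features(sld: str) -> dict:
    """Compute a few character-level features for a second-level domain."""
    s = sld.lower()
    length = len(s)
    digits = sum(ch.isdigit() for ch in s)
    consonants = sum(ch in CONSONANTS for ch in s)

    # Longest run of consecutive consonants.
    longest = current = 0
    for ch in s:
        current = current + 1 if ch in CONSONANTS else 0
        longest = max(longest, current)

    # Assumption: a special character is anything non-alphanumeric.
    symbols = sum(not ch.isalnum() for ch in s)

    return {
        "L-LEN": length,
        "L-DIG": digits,
        "L-CON-MAX": longest,
        # Follows the record's wording: consonants divided by L-LEN.
        "R-CON-VOW": consonants / length if length else 0.0,
        "L-SYM": symbols,
        "R-SYM-LEN": symbols / length if length else 0.0,
    }


if __name__ == "__main__":
    print(simple_features("blackhorse42"))
```

Values computed this way may differ from the columns in `dictionary_DGAs_dataset.zip` if the dataset was built with different definitions of vowels or special characters.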