Android Malware Dataset with VirusTotal Labels
Description
This dataset contains labels of 2.47 million Android apk hashes extracted from VirusTotal reports.
The dataset was used in the experiments of our publication, "An Analysis of Android Malware Classification Services".
The labels extracted from the VirusTotal reports are provided as a csv in labeling_dataset.csv.gz. A cell value of -1 means that the engine returned no result for the given apk file hash. The column names are provided in cols_labeling_dataset.csv.
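Since the column names live in a separate file, a small helper can translate a column name into the field number that cut or awk expects. The following is only a sketch: the function name col_index is hypothetical, and it assumes the names in cols_labeling_dataset.csv are separated by commas and/or newlines and appear in the same order as the data columns.

```shell
# col_index NAME [COLS_FILE]
# Prints the 1-based position of NAME in the column-name file and
# returns a non-zero status if NAME is not found.
# Assumption: names are separated by commas and/or newlines, in the
# same order as the columns of the data file.
col_index() {
  tr ',' '\n' < "${2:-cols_labeling_dataset.csv}" |
    tr -d '\r' |
    awk -v name="$1" '
      $0 == name { print NR; found = 1; exit }
      END { if (!found) exit 1 }'
}
```

For example, with a hypothetical column called SomeEngine, `gzip -dc labeling_dataset.csv.gz | cut -d',' -f"$(col_index SomeEngine)"` would print only that column.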
Note
The value -1 is a string, not an integer.
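Because -1 is stored as text, it is safest to compare it as a string. As a rough sketch, the helper below counts, for each row, how many engine columns hold "-1". The function name count_missing is hypothetical, and the sketch assumes that column 1 is the apk hash, that engine results start at column 3 (adjustable through the second argument), and that no field contains a quoted comma.

```shell
# count_missing FILE [FIRST_ENGINE_COL]
# Prints "hash,count" for every row, where count is the number of
# engine columns whose value is the string "-1".
# Assumptions: column 1 is the apk hash, engine results start at
# column 3 unless overridden, and fields contain no quoted commas.
count_missing() {
  gzip -dc "$1" | awk -F, -v start="${2:-3}" '
    {
      missing = 0
      for (i = start + 0; i <= NF; i++)
        # Comparing against a string constant makes awk compare as strings.
        if ($i == "-1") missing++
      print $1 "," missing
    }'
}
```

It could be called as `count_missing labeling_dataset.csv.gz`. It uses `gzip -dc` instead of zcat only because that form behaves the same on Linux and macOS.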
If you use information from this repo, please cite our paper:
Rashed M, Suarez-Tangil G. An Analysis of Android Malware Classification Services. Sensors. 2021; 21(16):5671. https://doi.org/10.3390/s21165671
BibTeX
@Article{s21165671,
AUTHOR = {Rashed, Mohammed and Suarez-Tangil, Guillermo},
TITLE = {An Analysis of Android Malware Classification Services},
JOURNAL = {Sensors},
VOLUME = {21},
YEAR = {2021},
NUMBER = {16},
ARTICLE-NUMBER = {5671},
URL = {https://www.mdpi.com/1424-8220/21/16/5671},
ISSN = {1424-8220},
DOI = {10.3390/s21165671}
}
Required Software
gzip
- Debian-based Linux: you may install it using the following command:
  apt-get install gzip
- macOS: gzip is pre-installed
- Windows: you may download gzip from http://gnuwin32.sourceforge.net/packages/gzip.htm
How to use the file?
There are two ways to use the file:
- Extract the gzip file to obtain a csv file. This requires gzip to be installed. You may use the command:
  gunzip labeling_dataset.csv.gz
- Extract information from the compressed file directly (following the same logic as AndroZoo's csv):
  - To extract the first column and save it to a file called list_of_selected_sha256, run the following command:
    zcat labeling_dataset.csv.gz | cut -d',' -f1 > list_of_selected_sha256
  - To obtain the rows of apk hashes that were first seen on or after the 1st of May, 2016, run this command:
    zcat labeling_dataset.csv.gz | grep -v ',snaggamea' | awk -F, '{if ( $2 >= "2016-05" ) {print} }'
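The two commands above can be combined into one reusable helper that prints only the hashes first seen from a given date onward. This is a sketch under the same assumptions as those commands: column 1 is the hash, column 2 is the first-seen date in a year-first format such as 2016-05-01 so that text comparison matches date order, and no field contains a quoted comma. The function name hashes_first_seen_since is hypothetical, and the grep filter from the command above is left out.

```shell
# hashes_first_seen_since FILE DATE_PREFIX
# Prints the hash (column 1) of every row whose first-seen date
# (column 2) is greater than or equal to DATE_PREFIX, compared as text.
# Assumption: dates are written year first (e.g. 2016-05-01), so that
# lexicographic order equals chronological order.
hashes_first_seen_since() {
  gzip -dc "$1" | awk -F, -v since="$2" '
    # Appending "" forces a string comparison on both sides.
    ($2 "") >= (since "") { print $1 }'
}
```

For instance, `hashes_first_seen_since labeling_dataset.csv.gz 2016-05 > list_of_selected_sha256` would write the matching hashes to a file.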
Files
mra12/labelingDataset-Android_Malware_Labels_Dataset.zip (2.8 kB)
md5:47651b0a8964df9aec98d28e0fdf78df
Additional details
Related works
- Is supplement to: Software, https://github.com/mra12/labelingDataset/tree/Android_Malware_Labels_Dataset