This document describes experimental data generated for a malware detection project that was published in the following research paper:
M. Dimjašević, S. Atzeni, I. Ugrina, and Z. Rakamarić, "Evaluation of Android Malware Detection Based on System Calls," in Proceedings of the International Workshop on Security and Privacy Analytics (IWSPA), 2016.
See the following page for the citation details: http://dx.doi.org/10.1145/2875475.2875487
This dataset contains execution logs of more than 12,000 Android applications, along with machine learning feature matrices constructed from those logs. It consists of six data subsets, each corresponding to a different experiment. For each experiment we had two ways of encoding application behavior into features, which resulted in two feature matrices per experiment. An experiment is defined by the number of internal events used to drive application execution, as well as by whether external events were sent to the Android system. The event counts are 1, 500, 1000, 2000, and 5000. Each subset can be identified by its file name prefix: the first part of the prefix denotes the number of events, and the second part is either "with" or "wo", denoting whether the experiment was carried out with or without external events, respectively. Each subset contains the feature matrices generated by processing the logs. For an explanation of what the features represent and how they are computed, see the accompanying paper cited above.
Per subset, there are two different feature matrices, and each feature matrix is available in two different formats, so you can choose whichever format suits you better. You can use XZ Utils to extract matrices from their archive files.
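As an alternative to the xz command-line tool, the archives can be unpacked from Python's standard library alone. The sketch below is a minimal example under that assumption; the function names and file paths are illustrative, not part of the dataset.

```python
import lzma
import tarfile

def extract_xz(src_path: str, dst_path: str) -> None:
    """Decompress a standalone .xz file (e.g., a compressed feature matrix)."""
    with lzma.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read())

def extract_tar_xz(src_path: str, dst_dir: str) -> None:
    """Unpack a .tar.xz archive, such as the android-logs.tar.xz log collections."""
    with tarfile.open(src_path, mode="r:xz") as tar:
        tar.extractall(path=dst_dir)
```

For example, `extract_tar_xz("android-logs.tar.xz", "logs/")` would place the raw logs under a `logs/` directory (paths here are hypothetical).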
These matrices are in the MatrixMarket format: https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/externalFormats.html
The three numbers on the first line denote, respectively, the number of applications (i.e., vectors or rows in the matrix), the number of features per vector, and the number of stored values in the matrix. The feature at the end of each vector is its label: 0 for a benign application, and 1 for a malicious application.
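A minimal reader for this layout can be written with the standard library alone. The sketch below assumes the sparse coordinate variant of MatrixMarket with 1-based indices, and it assumes that applications with no stored entry in the last column are benign (an implicit 0, as zeros are omitted in a sparse encoding); the function name is illustrative.

```python
def read_mm_with_labels(lines):
    """Parse a MatrixMarket coordinate file given as an iterable of lines.

    Returns (entries, labels): entries maps (row, col) -> value for the
    feature columns, and labels maps row -> 0/1 class label taken from
    the last column, as described above.
    """
    # Skip comment lines (starting with '%') and blank lines.
    it = (ln for ln in lines if ln.strip() and not ln.startswith("%"))
    n_rows, n_cols, _n_values = map(int, next(it).split())
    entries, labels = {}, {}
    for ln in it:
        r, c, v = ln.split()
        r, c, v = int(r), int(c), float(v)
        if c == n_cols:              # last column holds the label
            labels[r] = int(v)
        else:
            entries[(r, c)] = v
    # Assumption: rows without a stored label entry are benign (label 0).
    for r in range(1, n_rows + 1):
        labels.setdefault(r, 0)
    return entries, labels
```

For heavier use, `scipy.io.mmread` reads the same format directly into a sparse matrix, after which the last column can be sliced off as the label vector.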
In the other format, the first line contains two numbers: the number of lines N (the number of applications) and the number of features per application, DIM. The second line contains the percentage of vectors to be used as the training set. The third line contains a flag specifying whether to randomize the data (used by our data processing scripts). The rest of the file consists of the N x DIM feature vectors, one per application. As in the MatrixMarket format, the feature at the end of each vector is the application's label: 0 for a benign application, and 1 for a malicious application.
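The header and vector layout just described can be parsed with a short stdlib-only sketch. Whitespace-separated fields and a 0/1 integer randomization flag are assumptions on my part; the function name is illustrative.

```python
def read_dense_format(lines):
    """Parse the dense format: line 1 = "N DIM", line 2 = training-set
    percentage, line 3 = randomization flag, then N rows of DIM values,
    with the label (0 benign / 1 malicious) as the final value of each row.
    """
    it = iter(lines)
    n, dim = map(int, next(it).split())
    train_pct = float(next(it))
    randomize = bool(int(next(it)))
    features, labels = [], []
    for _ in range(n):
        values = [float(x) for x in next(it).split()]
        assert len(values) == dim, "each row must contain DIM values"
        features.append(values[:-1])    # DIM - 1 feature values
        labels.append(int(values[-1]))  # final value is the label
    return n, dim, train_pct, randomize, features, labels
```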
The 1-event experimental data is not well aligned with the data from the other experiments: the 1-event experiment analyzes 13561 applications, while all the other experiments analyze 12660. The other experiments cover the same set of applications, and each particular application occupies the same row number across their matrices; this is not the case for the 1-event experiment, so its results cannot be directly compared with those of the other experiments. Therefore, when interpreting the results reported in the paper mentioned earlier in this document, or when doing your own analysis of the data, treat the 1-event experiment as a rough reference point that is not directly comparable to the others.
For each experiment we provide a collection of raw logs, available in android-logs.tar.xz files. In every experiment, for every application execution, there are two logs:
From the .log file we generated two feature vectors with our analysis tool, each written to a separate file. Feature vector files have the .freq and .graph extensions and correspond to the two feature encodings described above. Each vector contributes a row to the corresponding feature matrix.
We used our tool maline to generate and analyze the data. It is a free software framework available at: