Dataset Open Access
About the Data
Overview of Data
This data set contains execution logs of 12,000 Android applications and machine learning feature matrices constructed from them.
With Android being the most widespread mobile platform, protecting it against malicious applications is essential. Android users typically install applications from large remote repositories, which provides ample opportunities for malicious newcomers. In this paper, we evaluate a few techniques for detecting malicious Android applications on a repository level. The techniques perform automatic classification based on tracking system calls while applications are executed in a sandbox environment. We implemented the techniques in the maline tool, and performed extensive empirical evaluation on a suite of around 12,000 applications. The evaluation considers the size and type of inputs used in analyses. We show that simple and relatively small inputs result in an overall detection accuracy of 93% with a 5% benign application classification error, while results are improved to a 96% detection accuracy with up-sampling. This indicates that system-call based techniques are viable to be used in practice. Finally, we show that even simplistic feature choices are effective, suggesting that more heavyweight approaches should be thoroughly (re)evaluated.