Dataset (Open Access)


Zvonimir Rakamarić

About the Data

Overview of Data

This dataset contains execution logs of approximately 12,000 Android applications, together with the machine-learning feature matrices constructed from those logs.
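This page does not document the log format or exactly how the feature matrices were built. The sketch below shows one simple way a feature matrix can be derived from per-application system-call logs: each application becomes a row of relative system-call frequencies. It is an illustration under stated assumptions, not the dataset's actual pipeline. The strace-style line format, the regular expression, and the helper names `syscall_counts` and `frequency_matrix` are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# ASSUMPTION (hypothetical format): one strace-style call per line, e.g.
#   open("/data/app.apk", O_RDONLY) = 3
# optionally prefixed by "[pid 123]". Only the system-call name is kept.
SYSCALL_RE = re.compile(r"^\s*(?:\[pid\s+\d+\]\s*)?([A-Za-z_][A-Za-z0-9_]*)\(")


def syscall_counts(log_lines: List[str]) -> Counter:
    """Count how often each system-call name appears in one app's log."""
    counts: Counter = Counter()
    for line in log_lines:
        match = SYSCALL_RE.match(line)
        if match:  # skip signal notices, exit lines, etc.
            counts[match.group(1)] += 1
    return counts


def frequency_matrix(
    logs: Dict[str, List[str]]
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Build one row per app of relative system-call frequencies.

    Returns (app names, system-call vocabulary, matrix rows); columns
    follow the sorted vocabulary of every call seen in any log.
    """
    per_app = {app: syscall_counts(lines) for app, lines in logs.items()}
    vocab = sorted({name for counts in per_app.values() for name in counts})
    apps = sorted(per_app)
    rows = []
    for app in apps:
        counts = per_app[app]
        total = sum(counts.values())
        rows.append([counts[name] / total if total else 0.0 for name in vocab])
    return apps, vocab, rows
```

For example, an app whose log contains one `open` and two `read` calls gets the row `[1/3, 2/3, 0.0]` over the vocabulary `["open", "read", "write"]`. Normalising by the total call count keeps long and short executions comparable.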

Paper Abstract

With Android being the most widespread mobile platform, protecting it against malicious applications is essential. Android users typically install applications from large remote repositories, which provides ample opportunities for malicious newcomers. In this paper, we evaluate a few techniques for detecting malicious Android applications on a repository level. The techniques perform automatic classification based on tracking system calls while applications are executed in a sandbox environment. We implemented the techniques in the maline tool, and performed extensive empirical evaluation on a suite of around 12,000 applications. The evaluation considers the size and type of inputs used in analyses. We show that simple and relatively small inputs result in an overall detection accuracy of 93% with a 5% benign application classification error, while results are improved to a 96% detection accuracy with up-sampling. This indicates that system-call based techniques are viable to be used in practice. Finally, we show that even simplistic feature choices are effective, suggesting that more heavyweight approaches should be thoroughly (re)evaluated.
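The abstract reports an overall detection accuracy, a benign application classification error, and an improvement from up-sampling, but it does not define these terms precisely. The sketch below assumes that detection accuracy is the fraction of all applications labelled correctly and that benign classification error is the fraction of benign applications labelled malicious. It also assumes that up-sampling means random oversampling with replacement until both classes are the same size. This is one common form of up-sampling and not necessarily the one used in the paper. The label encoding and the helper names `detection_metrics` and `upsample` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# ASSUMPTION: binary labels, 1 = malicious, 0 = benign.
MALICIOUS, BENIGN = 1, 0


def detection_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (overall accuracy, benign classification error).

    Accuracy: fraction of all apps whose predicted label is correct.
    Benign error: fraction of truly benign apps predicted as malicious.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    benign_preds = [p for t, p in zip(y_true, y_pred) if t == BENIGN]
    benign_err = (
        sum(p == MALICIOUS for p in benign_preds) / len(benign_preds)
        if benign_preds
        else 0.0
    )
    return correct / len(y_true), benign_err


def upsample(
    X: List[list], y: List[int], seed: int = 0
) -> Tuple[List[list], List[int]]:
    """Randomly duplicate rows of smaller classes until all classes match
    the size of the largest class (random oversampling with replacement)."""
    rng = random.Random(seed)
    by_class = {c: [x for x, t in zip(X, y) if t == c] for c in set(y)}
    target = max(len(rows) for rows in by_class.values())
    X_out: List[list] = []
    y_out: List[int] = []
    for c, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            X_out.append(x)
            y_out.append(c)
    return X_out, y_out
```

If you use an approach like this, apply `upsample` only to the training split. Duplicated rows that end up in the test set would inflate the reported accuracy.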

Files (4.9 GB)