Journal article Open Access
Hatamian, Majid; Serna, Jetzabel; Rannenberg, Kai
Popular smartphone apps may receive several thousand user reviews containing statements about the apps’ functionality, interface, user-friendliness, etc. These reviews sometimes also contain privacy-relevant information that can be extremely helpful for app developers to better understand why users complain about certain privacy aspects of their apps. However, due to the complicated and sometimes vague nature of reviews, it is quite tough and time-consuming for developers to go through all of them to find information about the privacy aspects of apps. Furthermore, previous studies confirmed that bad privacy practices sometimes happen due to app developers’ lack of knowledge of API definition and usage. In addition, such information can be useful for mobile users, as the lack of privacy indicators in smartphone ecosystems prevents them from comparing apps in terms of privacy and from making informed privacy decisions when selecting apps. Therefore, in this paper we propose Mobile App Reviews Summarization (MARS) to overcome the aforementioned difficulties. We exploit user reviews on the Google Play Store as a relevant source in order to extract and quantify privacy-relevant claims associated with apps. Based on Machine Learning (ML), Natural Language Processing (NLP) and sentiment analysis techniques, MARS detects privacy-relevant reviews and categorizes them into a pre-identified list of privacy threats in the context of mobile apps. The combination of these concepts provides developers with specific knowledge, based on user-generated reports, about privacy threats and app behavior that are otherwise difficult to detect. Not only developers but also users can benefit from such a mechanism to compare apps in terms of privacy aspects. To this end, we complement MARS with a novel app behavior monitoring tool that further enhances the overall reliability of the results generated by MARS.
Our results demonstrate the applicability of our approach, which achieves precision, recall and F-score as high as 94.84%, 91.30% and 92.79%, respectively. We also obtained interesting findings concerning the quantity and quality of privacy-relevant information published in user reviews and their relation to the apps’ actual behavior, indicating that user reviews are an important and valuable source of information regarding the privacy behavior of mobile apps.
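To make the idea of mapping reviews onto a pre-identified list of privacy threats more concrete, the following is a minimal sketch only. It is not the MARS implementation: MARS relies on ML, NLP and sentiment analysis, whereas this sketch substitutes a plain keyword lookup. The threat category names, the keyword sets, and the function names `categorize` and `summarize` are all hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical threat categories and trigger keywords (assumptions for
# illustration; the paper's actual threat list is not reproduced here).
THREAT_KEYWORDS = {
    "tracking_and_spyware": {"track", "tracks", "tracking", "spy", "spying", "spyware"},
    "unneeded_permissions": {"permission", "permissions"},
    "data_sharing": {"sell", "sells", "selling", "share", "shares", "sharing"},
}


def tokenize(review):
    """Lowercase the review and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", review.lower()))


def categorize(review):
    """Return the sorted threat categories whose keywords occur in the review.

    An empty list means the review is treated as not privacy relevant.
    """
    tokens = tokenize(review)
    return sorted(
        name for name, keywords in THREAT_KEYWORDS.items() if tokens & keywords
    )


def summarize(reviews):
    """Count, per threat category, how many reviews mention it."""
    counts = Counter()
    for review in reviews:
        counts.update(categorize(review))
    return dict(counts)
```

Calling `summarize` on the reviews of a single app yields a per-category count that could serve as a rough privacy indicator for comparing apps. A keyword lookup of this kind cannot handle negation, sarcasm or paraphrase, which is one reason a learned classifier combined with sentiment analysis, as described in the abstract, is the more suitable approach.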