SetDroid: Detecting User-Configurable Setting Issues of Android Apps via Metamorphic Fuzzing

Android, the most popular mobile system, offers a number of app-independent, user-configurable settings (e.g., network, location and permission) for controlling the devices and the apps. However, apps may fail to properly adapt their behaviors when these settings are changed, and thus frustrate users. We call such issues setting issues: they reside in the apps and are induced by changes of settings. According to our investigation, the majority of setting issues are non-crash (logic) bugs, which cannot be detected by existing automated app testing techniques due to the lack of test oracles. To this end, we design and introduce setting-wise metamorphic fuzzing, the first automated testing technique to overcome the oracle problem in detecting setting issues. Our key insight is that, in most cases, the app behaviors should stay consistent if a given setting is changed and later properly restored. We realized this technique as an automated GUI testing tool, SetDroid, and applied it to 26 popular, open-source Android apps. SetDroid found 32 unique, previously unknown setting issues in these apps. So far, 25 have been confirmed and 17 have already been fixed. We further applied SetDroid to 4 commercial apps with billions of monthly active users and detected 15 previously unknown setting issues, all of which have been confirmed and are being fixed. The majority of all these bugs (37 out of 47) are non-crash bugs, which cannot be detected by any prior testing technique.

To understand the setting issues in Android apps, we studied 1,074 setting issues from 180 popular Android apps on GitHub and found that the majority of these issues (759/1,074 ≈ 70.7%) lead to non-crash consequences, e.g., problematic UI display, the app getting stuck, and function failure. However, due to the lack of test oracles, existing automated GUI testing tools can hardly uncover these issues [5], [6].
To fill the gap, we leveraged the idea of metamorphic testing and introduced setting-wise metamorphic fuzzing, the first automated testing approach to overcome the oracle problem in detecting setting issues for Android apps. We implemented this approach as an automated GUI testing tool, SetDroid, and applied it to 26 popular open-source apps and 4 commercial apps. In total, it revealed 47 previously unknown setting issues in these apps. So far, 40 have been confirmed and 17 of them have been fixed. Most of these bugs (37 out of 47) are non-crash bugs and cannot be detected by existing testing techniques.
II. RELATED WORK
Android app testing has received much attention [7]-[14]. However, existing generic automated testing tools are limited to crash bugs due to the lack of test oracles, and are therefore ineffective in detecting setting issues. Prior work [15], [16] explores only limited types of settings and has a different research focus from ours. Sadeghi et al. [15] propose PATDroid to detect bugs caused by changing app permissions. Lu et al. [16] propose PREFEST to detect bugs caused by changes of apps' own preferences and some system settings (i.e., WiFi, Bluetooth, mobile data, and location). PATDroid and PREFEST focus on reducing the testing cost caused by the combinations of different options, but do not consider the impact of setting changes during app usage. Moreover, they can only detect crash bugs, while our work can detect both crash and non-crash bugs.

III. APPROACH AND IMPLEMENTATION
Metamorphic testing [17] is a property-based software testing approach to addressing the test oracle problem. In our scenario, our key observation is that, in most cases, the app behaviors should keep consistent if a given setting is changed and later properly restored. Otherwise, a likely setting issue happens. For example, an app's function should not be affected if (1) the network is closed but immediately opened; or (2) a specific app permission is revoked but later granted when the app requests that permission again. We leverage this observation as one kind of metamorphic relation to overcome the oracle problem.

Our Approach. We formalize our technique as follows. Let $e$ be a GUI event (e.g., click, edit, swipe, rotate screen); let $e.w$ be the GUI widget $w$ that $e$ targets. Let $\ell$ be a GUI layout (page) of an app, which represents a GUI hierarchy tree. Let $E$ be a given seed GUI test, which is a sequence of events $E = [e_1, e_2, \ldots, e_n]$. Starting from the initial page $\ell_1$ of the app, $E$ can be executed on an app $P$ to obtain a sequence of GUI layouts $L = [\ell_1, \ell_2, \ldots, \ell_{n+1}]$. Specifically, we can view the execution of $e_i$ as a function, i.e., $\ell_{i+1} = e_i(\ell_i)$. Then, we inject a pair of events $e_c$ and $e_u$ into $E$ to obtain a mutant test $E'$, where $e_c$ changes a given setting at a random position, while $e_u$ restores the setting. Specifically, we designed two strategies to obtain $E'$ by injecting $e_c$ and $e_u$ into $E$:
• Immediate setting mutation. We inject $e_c$ followed immediately by $e_u$. For example, $e_c$ closes the network, and $e_u$ immediately opens the network.
• Lazy setting mutation. We inject $e_c$ first, and only inject $e_u$ when it is necessary (e.g., the app prompts an alert dialog). For example, $e_c$ revokes an app permission, and $e_u$ grants the permission only when the app requests that permission.
By comparing the GUI consistency between the seed test $E$ and the mutant test $E'$, we can detect setting issues.
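The two mutation strategies and the consistency comparison can be illustrated with a minimal Python sketch. It is not SetDroid's implementation: the `Event` class, the modelling of a layout as a set of widget ids, the function names, and the example widgets are all hypothetical. The lazy strategy is also simplified, because the position at which the app asks for the setting (`restore_pos`) is passed in as a known value, whereas in practice it would only be discovered while the test runs.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Event:
    """A GUI or setting event; `widget` stands for e.w (None for setting events)."""
    name: str
    widget: Optional[str] = None


# Assumption: a layout is modelled as the set of widget ids visible on the page.
Layout = FrozenSet[str]


def inject_immediate(seed: List[Event], pos: int, e_c: Event, e_u: Event) -> List[Event]:
    """Immediate setting mutation: e_c at `pos`, followed immediately by e_u."""
    return seed[:pos] + [e_c, e_u] + seed[pos:]


def inject_lazy(seed: List[Event], pos: int, restore_pos: int,
                e_c: Event, e_u: Event) -> List[Event]:
    """Lazy setting mutation: e_c at `pos`, e_u only at `restore_pos`.

    `restore_pos` is the seed index at which the app is assumed to request
    the setting again (e.g., by showing a permission dialog).
    """
    if not pos <= restore_pos <= len(seed):
        raise ValueError("restore_pos must lie between pos and len(seed)")
    return seed[:pos] + [e_c] + seed[pos:restore_pos] + [e_u] + seed[restore_pos:]


def first_inconsistency(seed: List[Event],
                        seed_layouts: List[Layout],
                        mutant_layouts: List[Layout]) -> Optional[int]:
    """Return the first index i whose target widget is present on the seed
    layout but missing on the aligned mutant layout, or None if consistent.

    Both layout lists hold the page on which the i-th seed event is fired.
    """
    for i, event in enumerate(seed):
        if event.widget is None:
            continue
        if event.widget in seed_layouts[i] and event.widget not in mutant_layouts[i]:
            return i
    return None


# Hypothetical seed test and recorded layouts.
seed = [Event("click", "menu"), Event("click", "download"), Event("click", "play")]
mutant = inject_immediate(seed, 1, Event("network_off"), Event("network_on"))
lazy = inject_lazy(seed, 0, 2, Event("revoke_permission"), Event("grant_permission"))

seed_layouts = [frozenset({"menu"}),
                frozenset({"download", "back"}),
                frozenset({"play", "back"})]
# In the mutant run the "play" button never reappears after the network toggle.
mutant_layouts = [frozenset({"menu"}),
                  frozenset({"download", "back"}),
                  frozenset({"error_banner", "back"})]

print([e.name for e in mutant])
print(first_inconsistency(seed, seed_layouts, mutant_layouts))
```

In this toy run the third seed event targets `play`, which is present in the seed layout but absent from the mutant layout, so index 2 is flagged as a likely setting issue.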
Formally, the oracle checking rule is: if there exists one GUI event $e'_i \in E'$ ($e'_i$ corresponds to $e_i \in E$), and its target widget $e'_i.w$ cannot be located on the corresponding layout $\ell'_i \in L'$ ($\ell'_i$ corresponds to $\ell_i \in L$), then a likely setting issue is found:

$$\exists e_i.\; e_i.w \in \ell_i \,\wedge\, e'_i.w \notin \ell'_i \tag{1}$$

Tool Implementation. We realized our approach as an automated GUI testing tool, SetDroid. Fig. 1 depicts the overview of SetDroid, which contains three main modules: (1) test executor, (2) setting change injector, and (3) oracle checker. We implemented this tool on the UI Automator test framework [18], which provides a set of APIs to perform interactions with apps and obtain apps' information such as GUI layouts.

(a) Test Executor. The test executor randomly generates a seed test on device A, and replays the same event sequence (but injected with setting changes) on reference device B. We generate random seed tests because such tests are expected to be much more diverse, practical, and scalable to obtain. SetDroid can also integrate with existing test input generation tools to obtain seed tests.

(b) Setting Change Injector. The corresponding event $e_u$ will be inserted after $e_c$ to restore the corresponding settings as predefined.
(c) Oracle Checker. This module checks whether the oracle checking rule is violated. If the rule is violated, a corresponding bug report is generated, which records the executed events and GUI screenshots for bug diagnosis.

IV. EVALUATION
We use the 26 apps from the prior work [16] as our evaluation subjects. We ran SetDroid and the two relevant testing tools, PATDroid and PREFEST, with the same time budget (12 hours per app). For any non-crash bug found, we manually inspected the bug report and counted the true positives (TP for short) and false positives (FP for short). We replayed each TP bug on real Android devices for validation before reporting it on GitHub. In addition, we used SetDroid to test 4 commercial apps from Tencent and ByteDance. They are WeChat [19], QQMail [20], TikTok [21] and CapCut [22], which have billions of monthly active users.
During testing, SetDroid reported 156 errors, 131 of which were TPs (131/156 ≈ 83.9%). We analyzed the FPs and found that they are caused by specific app features. For example, when the screen orientation setting is changed, AlwaysOn will pop up an animation on top of the screen to explain the app function. As shown in Table I, out of the 26 apps, SetDroid found 32 unique and previously unknown setting issues from 24 apps. So far, 25 have been confirmed and 17 have already been fixed. We also found 15 setting issues in the 4 commercial apps, all of which have been confirmed and are being fixed.
Considering that PREFEST and PATDroid only cover limited types of settings, we compare the number of bugs they detected with SetDroid only for the settings they cover. As shown in Table II, PREFEST did not detect any bug, while PATDroid detected two crash bugs related to permission. Most importantly, we can see that SetDroid detected 32 non-crash setting issues from the 26 open-source apps, none of which can be detected by PREFEST or PATDroid. Overall, these results show that SetDroid is effective and outperforms existing tools.