Leveraging Microsoft's mobile usability guidelines: Conceptualizing and developing scales for mobile application usability

This research conceptualizes mobile application usability and develops and validates an instrument to measure the same. Mobile application usability has attracted widespread attention in the field of human-computer interaction because well-designed applications can enhance user experiences. To conceptualize mobile application usability, we analyzed Microsoft's mobile usability guidelines and defined 10 constructs representing mobile application usability. Next, we conducted a pilot study followed by a quantitative assessment of the content validity of the scales. We then sequentially applied exploratory factor analysis and confirmatory factor analysis to two samples (n = 404; n = 501) consisting of German consumers using mobile social media applications on their smartphones. To evaluate the confirmatory factor model, we followed a step-by-step process assessing unidimensionality, discriminant validity and reliability. To assess the nomological validity of our instrument, we examined the impact of mobile application usability on two outcomes: continued intention to use and brand loyalty. The results confirmed that mobile application usability was a good predictor of both outcomes. The constructs and scales associated with mobile application usability validated in this paper can be used to guide future research in human-computer interaction and aid in the effective design of mobile applications.


Introduction
Over the last decade, the use of mobile devices has grown exponentially, with worldwide sales of more than 1.9 billion units in 2014 alone (Gartner Research, 2015). In many developed countries, individuals often own more than one mobile phone (Fosso Wamba and Chatfield, 2009; Fosso Wamba, 2012; Anand and Fosso Wamba, 2013). In conjunction with these developments, mobile devices have become more sophisticated and recent models enable individuals to interact with mobile applications on the go (Lal and Dwivedi, 2009; Harvey and Harvey, 2014). In particular, Internet-enabled smartphones are becoming increasingly popular and recent reports found that smartphones accounted for more than 60 percent of mobile phone sales in 2014, resulting in smartphone sales surpassing traditional mobile phone sales (Gartner Research, 2015).
In spite of high rates of smartphone diffusion, only a third of all firms selling consumer goods have established mobile strategies and two-thirds of all firms do not provide mobile applications for their customers (Forrester Research, 2011). Recent market research shows that managers recognize that they miss out on business opportunities in the mobile market and 70% of firms are currently adjusting their mobile strategies (Forrester Research, 2011). Developing well-designed mobile applications is a challenge for organizations (e.g., firms, governmental agencies, libraries) and prior research suggests that the usability of mobile applications is particularly important for effective user experiences (Venkatesh and Ramesh, 2006; Adipat et al., 2011). Establishing mobile application usability is difficult because smartphones have relatively small screens and the input mechanisms are tiny relative to traditional computer keyboards (Hong et al., 2004a, b; Dwivedi and Kuljis, 2008). In order to support organizations aiming to develop user-friendly mobile applications, operating system (OS) vendors, including Apple, Microsoft and Google, provide application development guidelines. These guidelines include general advice on how to design user-friendly and well-designed mobile applications. For instance, one of Microsoft's application development guidelines suggests that it is: "important to take full advantage of design principles to ensure that your application's functionality is quickly and clearly conveyed at every step of the user interaction" (Microsoft, 2014). Although this suggests that design principles are an important aspect for the usability of mobile applications, it does not provide information on how important design principles are and whether a given application incorporates design principles effectively.

Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/ijhcs
We believe that systematically developed research instruments could help researchers and practitioners to better address this issue. In particular, we analyze Microsoft's mobile usability guidelines to develop and validate a rich conceptualization of mobile application usability and associated scales (see footnote 4). We extend our previous work (Hoehle and Venkatesh, 2015), which was focused on Apple's guidelines, by (a) developing a conceptualization and measurement of mobile application usability based on Microsoft's mobile usability guidelines and (b) validating our instrument. We expect our work will help practitioners achieve better mobile application designs that let individuals interact with applications more effectively.
Although a considerable amount of literature has studied mobile application usability, we found three key shortcomings in the existing literature. First, in much of the literature we found, the concept of mobile application usability evolved from website usability (e.g., Venkatesh and Ramesh, 2006), much as website usability evolved from software usability. Although a useful starting point, we argue that it is best to develop research instruments that account for the unique characteristics of mobile applications, such as small screen sizes and clumsy input mechanisms (Kurniawan, 2008). Second, the majority of studies in the area of human-computer interaction (HCI) have been laboratory experiments to evaluate mobile application usability. Experimental research design is particularly useful for benchmarking competing mobile application prototypes using task-based assessment to measure user performance in terms of speed and accuracy (Lazar et al., 2010). However, such a research design is less suited for holistically evaluating the usability of mobile applications and for dissecting design aspects to improve mobile applications. In other words, experimental research designs help determine whether an application prototype allows users to perform a task quickly and accurately, but they may not capture a complete picture of interface design elements. Third, prior research has used a variety of conceptually dissimilar constructs for evaluating the usability of mobile applications, including readability, ease of learning, design aesthetics and satisfaction (Cyr et al., 2006). Associating the concepts of ease of learning and design aesthetics with mobile application usability seems problematic because such a practice could result in interpretational confounding, i.e., a situation in which the empirical meaning of a latent variable varies from the meaning assigned by a researcher (Burt, 1976; Bollen, 2007).
Against this backdrop, we argue that it is important to think from the ground up about mobile application usability and develop and validate a survey instrument for assessing the usability of mobile applications. We assess the predictive validity of the survey instrument using two theoretically relevant outcomes: intention to use and brand loyalty. A systematically developed survey instrument should help practitioners design mobile applications and study individuals' views on to-be-developed or existing mobile applications. Well-designed mobile applications should help users to interact effectively with the application and become more satisfied with it. Likewise, such an instrument will be beneficial for research in this area because it will provide theoretical clarity on the underlying factors influencing mobile application usability. Thus, our objectives are: (a) to systematically review and analyze Microsoft's mobile application usability guidelines, (b) to develop relevant constructs that represent mobile application usability, and (c) to develop and validate a survey instrument to measure the constructs by following the scale development procedure of Lewis et al. (2005). We validate our survey instrument in the context of social media applications, which are increasingly leveraged for both hedonic and professional purposes (Scheepers et al., 2014).

Mobile application usability
Mobile application usability is defined, drawing from the International Organization for Standardization's (ISO) definition of usability, as the degree to which a mobile application can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use (Venkatesh and Ramesh, 2006). Over the last decade, the concept of mobile application usability has been the focus of much research in HCI and information systems (IS) and research that falls at the intersection of these two areas, such as mobile commerce and ecommerce. In order to identify theoretically motivated studies, we searched for mobile application usability studies in leading HCI journals, namely ACM Transactions on Computer-Human Interaction, AIS Transactions on Human-Computer Interaction, Behavior and Information Technology, Human-Computer Interaction, Interacting with Computers, International Journal of Human-Computer Interaction, International Journal of Human-Computer Studies, and Journal of Usability Studies. Because IS researchers commonly investigate HCI-related phenomena (Hong et al., 2004a), we also targeted leading journals in information systems, namely Communications of the ACM, European Journal of Information Systems, IEEE Transactions on Human-Machine Systems, Information Systems Journal, Information Systems Research, Journal of AIS, Journal of Information Technology, Journal of MIS, Journal of Strategic Information Systems, and MIS Quarterly. Our search strategy included various keywords, such as usability theory, mobile application usability, mobile application usability theory and mobile interface usability. The search yielded 93 peer-reviewed articles. We studied the identified articles for how mobile application usability was conceptualized, the proposed usability evaluation methods, and associated scales used to measure mobile application usability.
Based on our assessment, we found three key shortcomings in the literature on mobile application usability.
First, we found that few, if any, field studies used research instruments that were specifically designed to evaluate mobile application usability in the field (Hoehle and Venkatesh, 2015; Kjeldskov and Stage, 2004; Nielsen et al., 2006; Avouris et al., 2008). In our recent work (Hoehle and Venkatesh, 2015), we developed a conceptualization of mobile application usability and an instrument based on Apple's user experience guidelines for mobile applications (Apple, 2011). Here, we add to this work because Microsoft's mobile usability guidelines vary from Apple's guidelines in that they emphasize different aspects of mobile application usability. For example, Microsoft's mobile application usability guidelines underline color and hierarchy, whereas Apple's guidelines include search features as part of mobile application usability. We also found conceptual overlaps between the two sets of guidelines. For example, control obviousness is highlighted in both guidelines and we therefore conceptualize it here and in our previous work. The other few field studies we found conceptualized mobile application usability in a more simplistic manner, which is understandable because research on mobile application usability is not as mature as research on website usability. For instance, Huang (2012) notes that mobile application usability is a critical success factor for mobile marketing. In that marketing context, however, mobile application usability focused only on whether the application is easy to use and whether users can effortlessly gather marketing information (Huang, 2012). Hence, most studies on mobile application usability drew on research instruments that were originally developed to evaluate the usability of traditional computers and websites (Battleson et al., 2001; Thong et al., 2002; Wu and Wang, 2005; Hsu et al., 2007; Kim et al., 2010). For example, Venkatesh and Ramesh (2006) surveyed consumers in Finland and the U.S. to better understand the differences between website and mobile application usability. In order to measure mobile application usability, the authors adapted the website usability instrument originally developed by Agarwal and Venkatesh (2002). In such a study, it is likely that important usability requirements of the mobile context were omitted. For instance, much experimental research on mobile application usability found that mobile application buttons should be large and appropriate to the size of fingertips (Kurniawan, 2008). This is necessary because users navigate smartphone menus with their fingers (Kurniawan, 2008). Given that traditional computer interfaces including websites are operated through mouse cursors (see Thong et al., 2002), website usability instruments would be unlikely to account for the concept of fingertip-sized controls.

Footnote 4: We also considered examining guidelines from Microsoft's competitors, including Apple, Google, and Blackberry. Due to the extensive detail in each set of guidelines and the length of the paper even as it stands now, we were unable to integrate all guidelines in this paper. Instead, we focused only on Microsoft's mobile application usability guidelines. Further information on the usability guideline selection criteria is provided in the literature review section.
Second, we found that the majority of the studies in HCI journals studied mobile application usability in laboratory environments using experimental research designs. This is in line with Kjeldskov and Graham (2003), who found that more than 70% of all mobile application usability evaluations take place in laboratory settings. In these studies, mobile application prototypes were typically benchmarked to examine the influence of interface design in relation to specific outcome variables, such as user performance (Ziefle and Bay, 2005; Adipat et al., 2011). For example, Adipat et al. (2011) studied the effect of interface structure of mobile applications on user performance. In this experiment, task complexity and mobile interface structure were manipulated in order to determine the most effective mobile interface structure. Although experimental research designs yield important findings for better understanding mobile application usability, they face several limitations, including the artificial nature of the setting and the limited number of variables that can be manipulated (Kjeldskov and Graham, 2003).
Third, our literature review found that researchers have defined and conceptualized mobile application usability inconsistently. For instance, Sonderegger et al. (2012) conducted a longitudinal experiment in which they conceptualized mobile application usability as a combination of design aesthetics and readability. Kim et al. (2005) used thirteen mobile application usability elements, namely predictability, learnability, consistency, memorability, familiarity, simplicity, feedback, effectiveness, efficiency, flexibility, minimal memory load, satisfaction and helpfulness. Several studies also integrated concepts commonly seen in the technology acceptance literature (e.g., ease of use) with concepts from the marketing research discipline (e.g., satisfaction), as well as HCI principles (e.g., design aesthetics) (see Cyr et al., 2006). Although some studies suggested that efficiency and effectiveness were part of mobile application usability (e.g., Kim et al., 2005), others argued that both concepts are a result, or outcome, of mobile application usability. Table 1 summarizes measurement approaches and conceptualizations that prior studies have used for evaluating mobile application usability.

Microsoft's usability guidelines
In order to support developers in designing user-friendly mobile applications, Microsoft provides usability guidelines for its mobile operating system. These guidelines are now available through Microsoft's mobile portal (Microsoft, 2014). The guidelines are comprehensive and cover six distinct aspects for designing mobile applications, including development frameworks, platform specific advice, web development, mobile design, tools and resources, and natural languages. Many sections contain technical instructions (e.g., development frameworks) and were thus not relevant to user perceptions of usability. Most relevant to our work was the mobile design section because it focuses exclusively on improving the usability of mobile applications. We found Microsoft's guidelines particularly suited for developing a usability survey instrument for several reasons. First, Microsoft, through its acquisition of Nokia, is one of the leading companies in the smartphone industry and the firm sold approximately 40 million smartphones in 2013 (Gartner Research, 2014). Therefore, we felt it is reasonable to say that Microsoft's guidelines underscore the most critical aspects for designing successful mobile applications. Second, we believe that using Microsoft's guidelines for developing a mobile application usability survey instrument would help us to produce relevant research. Rosemann and Vessey (2008) propose that relevant research should be based on practitioners' recommendations and Microsoft's guidelines provide such an opportunity to develop practitioner-based research (Rosemann and Vessey, 2008).

Instrument development
To design our mobile application usability instrument, we drew on measurement theory. Measurement theory emerged from the reference discipline of psychometrics and it aims to measure human perceptions, behaviors and attitudinal beliefs (Burt, 1976). One particular stream of measurement theory focuses on the development and validation techniques for scientific research instruments to precisely measure attitudinal beliefs (MacKenzie et al., 2011). Although there are alternative approaches to develop instruments, we followed the methodology suggested by Lewis et al. (2005). The proposed methodology consists of three major stages. The first stage includes the conceptualization of the construct domain and includes a content analysis of the constructs of interest. The second stage focuses on the scale development process and relates to a pre-test, pilot test and quantitative assessment of the content validity of the measures. The third stage aims to validate the survey instrument and includes an exploratory and confirmatory assessment of the scales (Lewis et al., 2005). Below, we discuss each stage and outline how we applied them to our research. For each stage, we summarize the recommended activities, followed by a discussion of our actions undertaken as part of the scale development process.

Stage 1: domain
The first stage of the scale development involves establishing the domain of the conceptual idea (Lewis et al., 2005). Lewis et al. (2005) recommended content analysis, which is a technique used to draw inferences from text-based material (Lewis et al., 2005). Content analysis also helps to develop the purpose and/or importance of a conceptual construct and it can be used to develop a conceptual definition of it (Lewis et al., 2005).
We initially used content analysis to examine Microsoft's usability guidelines for mobile applications. In particular, one author systematically reviewed and analyzed the guidelines. To conduct the content analysis, we applied Strauss and Corbin's (1990) open and axial coding procedures. Open coding is the "analytical process through which concepts are identified and their properties and dimensions are discovered in the data" (Strauss and Corbin, 1990, p. 101). Axial coding is the process of "relating categories to their subcategories, termed 'axial' because coding occurs around the axis of a category, linking categories at the level of properties and dimensions" (Strauss and Corbin, 1990, p. 123). We applied these coding procedures to identify conceptually similar themes discussed in Microsoft's guidelines. Initially, one author reviewed Microsoft's guidelines and coded the content using Strauss and Corbin's (1990, p. 119) line-by-line analysis. Next, the open codes were grouped and subcategories were formed to identify conceptually similar codes. Using axial coding, the open codes were examined for similarities or differences and then organized as conceptual units. For example, we identified two open codes that focused on the concept of graphics in mobile applications: (1) images and graphics must enhance and support the user experience and (2) graphics should be designed aesthetically and should not replace or overlap important textual content.
Both open codes were combined into one subcategory that was labeled as "well-designed and aesthetic graphics". Then, using axial coding, the major category was labeled as aesthetic graphics. Next, the results were organized in a matrix as outlined by Miles and Huberman (1994). Organizing codes in a data matrix is useful to compress coded information and it supports drawing conclusions (Miles and Huberman, 1994). Subsequently, a second author reviewed the usability guidelines and associated coding patterns. In a few cases, there was a disagreement between the authors. In these instances, we asked two independent judges, who were unfamiliar with the study, to facilitate a discussion in order to reach a coding consensus. Table 2 shows the final matrix derived from Microsoft's guidelines. Next, we used the axial codes, shown on the left-hand side of Table 2, as the basis for conceptualizing each construct. To further inform the construct conceptualization, we compared the axial codes with the existing literature on mobile application usability.

Table 1 (excerpt, entries in extracted order): Cognition support (predictability, learnability, structure principle, consistency, memorability, familiarity), information support (recognition, visibility, simplicity, substitutivity), interaction support (feedback, error indication, synthesizability, responsiveness), user support (recoverability, flexibility, user control, customizability), and performance support (effectiveness, efficiency, effort); Ji et al. (2006); cross-sectional usability expert survey; predictability, learnability, consistency, memorability, familiarity, simplicity, feedback, effectiveness, efficiency, flexibility, minimal memory load, satisfaction, and helpfulness; laboratory experiment; errors; Kim et al. (2005); literature analysis; context, content, community, customization, communication, connection, and commerce; Lee and Benbasat (2003).
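As an illustration only, the open and axial coding steps described above can be sketched as a small grouping routine. The sketch below is not the authors' tooling; the tuple layout, the function name `build_coding_matrix`, and the choice of sample rows are assumptions, with the open codes themselves taken from Table 2.

```python
from collections import defaultdict

# Each entry is (axial code, subcategory, open code).
# The wording comes from Table 2; the tuple layout is a hypothetical choice.
open_codes = [
    ("Aesthetic graphics", "Well-designed and aesthetic graphics",
     "Images and graphics must enhance and support the user experience."),
    ("Aesthetic graphics", "Well-designed and aesthetic graphics",
     "Graphics should be designed aesthetically and should not replace "
     "or overlap important textual content."),
    ("Control obviousness", "Consistent use of controls",
     "Controls and application features should be used consistently."),
]


def build_coding_matrix(codes):
    """Group open codes under their subcategory and axial code."""
    matrix = defaultdict(lambda: defaultdict(list))
    for axial, subcategory, open_code in codes:
        matrix[axial][subcategory].append(open_code)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {axial: dict(subs) for axial, subs in matrix.items()}


matrix = build_coding_matrix(open_codes)
for axial, subs in matrix.items():
    for subcategory, codes in subs.items():
        print(f"{axial} | {subcategory} | {len(codes)} open code(s)")
```

Keeping the axial code as the outer key mirrors the left-hand column of the matrix, so each construct can be inspected together with every open code that supports it.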
In all instances, we found literature, whether in the domain of traditional desktop usability, website usability or mobile application usability, supporting the identified axial codes shown in Table 2. Below, we discuss the outcome of this process and define the constructs we derived.

Aesthetic graphics
Based on early research on the aesthetics of desktop and web applications (Lavie and Tractinsky, 2004) and online shopping environments (Porat and Tractinsky, 2012), more recent studies started to examine the effect of aesthetic graphics on outcomes (e.g., intention to use) in the context of mobile applications (Cyr et al., 2006; Nathan-Roberts and Liu, 2015). A review of the literature shows that researchers conceptualized and measured aesthetic graphics differently. For instance, Li and Yeh (2010) defined aesthetics as "the balance, emotional appeal, or aesthetic of a website and it may be expressed through the elements of colors, shapes, language, music or animation" (p. 674). Sonderegger et al. (2012) conceptualized aesthetic qualities less comprehensively and argued that clearness, symmetry, and color settings are the most critical factors underlying the concept. To measure aesthetic graphics, Sonderegger et al. (2012) asked individuals to rate a mobile application along the identified aesthetic graphics dimensions.
In summary, the literature on mobile applications suggests that aesthetic graphics is an important concept when evaluating the overall mobile application usability. In our work, we define aesthetic graphics as "the extent to which a user perceives that the mobile application makes use of aesthetic graphics." Our conceptualization of aesthetic graphics is consistent with existing definitions and emphasizes that the mobile application user desires aesthetically pleasing designs (Hoehle and Venkatesh, 2015).

Color
Prior work found that color is another important factor to consider when studying mobile application usability (Hartmann et al., 2008; Sonderegger et al., 2012) because colorful application interfaces produce initial affective user reactions, which could ultimately impact the user's continued intention to use the mobile application (Nilsson, 2009; Leung et al., 2011; Dong and Zhong, 2012). Relevant work on other types of applications suggests that color becomes important in providing guidance to users (Brandse and Tomimatsu, 2014) or influencing trust in online shopping contexts (Pelet and Papadopoulou, 2011). Instead of conceptualizing color as an independent construct, mobile application usability studies typically manipulated color as part of experimental studies. For example, Sonderegger et al. (2012) manipulated the color of text and icons as part of a mobile application and asked individuals to rate the color schemes. Color was manipulated in the experiment and the authors produced low, moderate, and highly colorful application designs. For instance, a disharmonious combination of magenta, amber, and green was chosen for the design with moderate aesthetic appeal (Sonderegger et al., 2012). The results showed that highly colorful interface designs yielded the most favorable user reactions (Sonderegger et al., 2012).
In sum, much prior work suggests that color is important for a user's overall evaluation of mobile application usability (Nilsson, 2009; Leung et al., 2011; Dong and Zhong, 2012). Instead of examining specific color combinations and attributes, we treat color as an independent construct. Specifically, we focus on whether the use of color is appropriate from the user's perspective. We closely followed Microsoft's guidelines and did not identify specific color combinations or attributes because the main purpose of our conceptualization is to be able to examine usability across different types or contexts of mobile applications. In our work, we define color as "the degree to which a user perceives that the mobile application uses colors effectively." Thus, we extend existing work in this area because the scope of color is not limited to aesthetic perceptions.

Control obviousness
The existing literature on mobile application usability suggests that the application's controls should be immediately obvious to application users (Ji et al., 2006). The importance of control obviousness for mobile applications is consistent with research on traditional web applications, which also emphasizes ease of searching and executing shopping tasks within online stores with minimal effort (Nah and Davis, 2002; Porat and Tractinsky, 2012). However, work on HCI emphasized that mobile applications' functionalities should be obvious to users because they are displayed on small screens (Huang et al., 2006). In these studies, researchers typically exposed research participants to mobile applications, asked them to execute predefined tasks using the mobile application, and surveyed participants afterwards. For instance, Ji et al. (2006) surveyed mobile application developers and usability experts and asked them if executing controls would be consistent and clear. Examples of predefined tasks included the examination of confirmation, input, termination, cancel, and search tasks (Ji et al., 2006). Other work added that the location of soft keys is most critical for control obviousness in mobile applications. In summary, a mobile application's controls and buttons should make it easy for users to pick the desired functions (Hoehle and Venkatesh, 2015).
We define control obviousness as "the degree to which a user perceives that the mobile application deploys controls that are immediately obvious." Our measurement captures the extent to which the main function is apparent and whether the application makes use of commands and controls that are intuitive and obvious. Our conceptualization is consistent with existing studies on mobile application usability in that it emphasizes the ease of finding and executing controls (see Ji et al., 2006).

Entry point
The concept of entry point focuses on a user's ability to access a given mobile application via several alternative entry points. Our literature review suggests that there is a lack of discussion on usability issues pertaining to entry points to mobile applications. Our conceptualization of entry point follows Microsoft's guidelines closely and is different from the W3C's view of accessibility, which emphasizes making applications more accessible to people with cognitive or physical disabilities (W3C, 2015). Although we were unable to find studies that conceptualized and measured entry point in the context of mobile applications, Benbunan-Fich and Benbunan (2007) found that smartphone users become frustrated if they are unable to find an application on the mobile phone after downloading it. These findings overlap with the concept of entry point because they emphasize that it is critical that users can enter mobile applications easily (Benbunan-Fich and Benbunan, 2007).
We focus on accessing the mobile application from an interface design perspective and define entry point as "the degree to which a

Table 2. Coding matrix adapted from Miles and Huberman (1994).

Axial codes Subcategory Open codes derived from Microsoft's guidelines
Aesthetic graphics Well-designed and aesthetical graphics Images and graphics must enhance and support the user experience. Graphics should be designed aesthetically and should not replace or overlap important textual content.

Color Contrast and color
The text of applications should have a good contrast with the background. Color assists in the organization and grouping of information, helping to focus attention, convey differentiation, and establish relationships and visual hierarchies between elements.
Color can help readers scan information and quickly identify structural or functional elements, such as headers, menu items and hyperlinks.
When used incorrectly, however, color can easily distract attention from the task at hand. If a color is being used to convey a specific meaning (for example, red to warn of danger or an error), chosen colors should be universally associated with the intended meaning and potential conflicts that result from cultural misinterpretation should be avoided.

Control obviousness Consistent use of controls
User controls should be obvious and interaction should be familiar, clear, and trustworthy. The design should be consistent, logical, and coherent both within the application and within the target platform.
Controls and application features should be used consistently.

Entry point Application accessibility and application entrance points
Users should have several options to choose from if aiming to access an application. The application should be designed in a way that it is accessible via direct controls or application menus, or a combination of both.
Well-designed applications should have several points to access a menu or an application.

Fingertip-size controls Button size and control size
Interface elements should not be smaller than the smallest average finger pad, that is, no smaller than 1 cm (0.4") in diameter or a 1 cm Â 1 cm square.
The width of a finger limits the density of items on screen. If the items are too close, the user will not be able to choose a single one.
As the user is more likely to touch higher on the button by mistake than on either side, consider the height of your buttons and icons.
Essential information or features, such as a label, instructions, or sub-controls, should not be placed below an interface element that can be touched, as they may be hidden by the user's own body.

Font Font style
Font is an important consideration for designing applications because users appreciate well-chosen font styles.
Devices normally have one standard font style, which should be used as the application's default typeface.

Gestalt Gestalt principles and proximity of interface elements
Information and content should be organized in accordance with the Gestalt principles. Each part of the application is affected by what surrounds it. Users should be able to quickly make sense of the elements on-screen and understand what functionality or data they represent.
Elements that are close together are naturally perceived as being related.
Because of the small screen size, however, the use of proximity may be limited.

Hierarchy Hierarchical menu structure and application navigation
Drill-down views offer hierarchical navigation for applications that need to provide access to hierarchies of information.
The layout of the various views in the navigation chain is not restricted to lists, and should be optimized for the type of content and/or functionality.
In all cases, users navigate hierarchies in drill-down views by tapping items in a view to 'drill down' another level in the information hierarchy.
Users should also be able to move back toward the top or 'root' of the hierarchy, and back commands should also be available.
Tapping 'back' at any level takes the user up to the previous level in the hierarchy.

Subtle animation Animation use and simplicity of animated content
Animations should be kept simple. Avoid complex animations and, in particular, multiple simultaneous timeline-based motion animations.
Avoid unnecessary alpha effects or gradients and do not combine transitions with changes in transparency or other graphical effects because they are likely to slow down the animations.

Transition Transition and flow of user interface elements
Well-designed transitions help users and make the user interface more engaging. Without transitions, the interaction feels less natural. Transitions can be used to inform the users of what is going on. Transitions should be used wisely and it is useful to test how users feel about them. Transitions can easily create a WOW-factor to applications. If every user interface element is twitching and turning wildly, it could as easily exhaust the user.
user perceives that the mobile application can be accessed through alternative entry points." This view is consistent with Benbunan-Fich and Benbunan's (2007) findings in that it emphasizes making mobile applications as accessible as possible for users. Because Benbunan-Fich and Benbunan (2007) focused exclusively on the accessibility of newly downloaded applications, we extend this notion and study whether or not an application can be accessed using different icons and menu access points.
Fingertip-size controls
Due to the hardware limitations of smartphones (Romano et al., 2014), such as limited screen size and relatively small keyboards, mobile application developers should consider the size of buttons (Brewster, 2002). For instance, Kurniawan (2008) studied the effect of control size on application usability and surveyed elderly mobile application users. The study found that relatively large, i.e., fingertip-size, controls helped users to select functions and menus in mobile applications (Kurniawan, 2008).
In line with Kurniawan's (2008) work, we use the term fingertip-size controls as suggested in Microsoft's guidelines. Consistent with other studies, we define fingertip-size controls as "the degree to which a user perceives that the mobile application deploys fingertip-size controls." This definition captures the extent to which the users can tap on controls easily, which requires an appropriate size of controls.

Font
Font is another relevant design element that has been studied from different perspectives in the context of mobile application usability and traditional desktop applications (Bernard et al., 2003; Ling and Schaik, 2006). For example, several variations of font, such as style (e.g., Arial versus Times) and size, have been studied in relation to the readability of application content (see Ling and Schaik, 2006; Moshagen and Thielsch, 2010). Kim et al. (2005) suggested that font size is a critical part of mobile application usability because it influences how efficiently the information is shown, how easy it is to read the presented information, and how effectively the information is presented to users.
We define font as "the degree to which a user perceives that the mobile application uses font effectively." Our definition covers not only font size but also the extent to which font is perceived as good and appealing.

Gestalt
Gestalt theory has been utilized to study aspects of mobile application usability (see for example Paay and Kjeldskov, 2007), as its principles have proven to be highly effective in the context of traditional desktop and website applications (Moller et al., 2012). Gestalt theory includes several gestalt laws (e.g., proximity of objects) and explains how humans perceive objects in their environment and how they form such perceptions (Wertheimer and Riezler, 1944). For instance, Paay and Kjeldskov (2007) applied gestalt theory to study location-based mobile applications and identified five relevant gestalt laws in this context, namely proximity, closure, symmetry, continuity, and similarity. For each law, Paay and Kjeldskov (2007) developed 2-3 survey questions that developers could use to examine adherence to the identified gestalt laws. The results of a qualitative study showed that the identified gestalt laws helped in explaining how users perceive and make sense of mobile location-based services (Paay and Kjeldskov, 2007).
Microsoft's guidelines also emphasize gestalt principles and apply the laws of similarity and proximity to mobile application usability. Hence, our conceptualization of gestalt is consistent with these two laws based on gestalt theory. We define gestalt as "the degree to which a user perceives that the mobile application uses gestalt principles effectively."

Hierarchy
Application hierarchy is another relevant concept for the organization of application content and elements. In traditional website applications, hierarchy has been emphasized as an important design aspect that embeds structure (Agarwal and Venkatesh, 2002), which makes it easier for users to perceive the overall organization of the website. For example, Adipat et al. (2011) suggest that mobile sites should have a hierarchical structure because it informs mobile users about the inherent logic of the site. Integration of titles and sub-titles would indicate several hierarchical levels and help users in navigating the mobile application easily (Adipat et al., 2011). Further, Kim et al. (2005) emphasized the concept of hierarchy as part of mobile applications and suggested categorizing, labeling and sequencing application menus in order to help users during navigation. Most recently, Hoehle and Venkatesh (2015) noted that users should be able to perceive an effective structure in the mobile application interface.
We define hierarchy as "the degree to which a user perceives that the mobile application has a hierarchical structure." The term is consistent with existing views of mobile application literature in that it identifies application hierarchy as a critical part of mobile application usability (Kim et al., 2005).

Subtle animation
Research on traditional website applications emphasized the need to use media appropriately and to communicate the content effectively (Agarwal and Venkatesh, 2002). Animation and media use are also important design aspects of mobile application interfaces. For example, Venkatesh and Ramesh (2006) examined media use as a subcategory of application content; the concept captured the extent to which media is used appropriately and effectively to communicate content. Although media use in wireless contexts was deemed less important than media use in web contexts (Venkatesh and Ramesh, 2006), mobile users still need to perceive media use as appropriate. From a design and user perspective, the use of richer and more complex graphics and animations, for instance, does not guarantee successful communication of the content, because such use might be perceived as distracting (Mayer, 2001). Based on this assumption, we use the term subtle animation to capture the preciseness of media use and define it as "the degree to which a user perceives that the mobile application uses subtle animations effectively." This term captures the user's perception of appropriateness of animation use and content communication effectiveness.

Transition
It is important that mobile applications are designed in a way that helps users transition from one page to another (Adipat et al., 2011). The importance of simple transitioning within mobile applications is consistent with the need for easy navigation in the context of traditional website applications (Nielsen, 2000; Porat and Tractinsky, 2012). Transitioning from one page to another without problems also captures efficiency, as in traditional usability studies (Nielsen, 2000), because it reduces the time to navigate within an application. In the context of mobile applications, Lee et al. (2009) emphasized that users should be able to effectively navigate between screens because this would improve their overall perception of system quality. Likewise, Benbunan-Fich and Benbunan (2007) proposed that navigation problems could be identified by measuring how users move between pages and how users access specific information within each application page.
We define transition as "the degree to which a user perceives that the mobile application transitions from one page to another." Application transition needs to be smooth so that the user can easily determine his/her position while navigating through the application.
Next, the identified constructs were reviewed for conceptual similarities. This is an important step for identifying higher-order constructs (Lewis et al., 2005). Through iterations following discussions between the authors and our literature review described above, we converged on ten independent constructs forming mobile application usability. Table 3 provides a summary of construct definitions based on the content analysis and literature review.

Stage 2: instrument construction
The second stage of the construct development methodology focuses on the survey instrument development and involves three distinct phases (Lewis et al., 2005). In the first phase, researchers develop items for the identified constructs and pre-test the scales (Lewis et al., 2005). In the second phase, a pilot study should be conducted in order to purify the wording of the items and to obtain initial feedback on the survey instrument (Lewis et al., 2005). The third phase involves screening the items and assessing the scales for content validity (Lewis et al., 2005). Content validity is the extent to which a scale represents all facets of a given construct (Anderson and Gerbing, 1991; Hinkin and Tracey, 1999; Lewis et al., 2005; Lawshe, 1975).
We drew on the codes derived from Microsoft's usability guidelines in order to develop a pool of items. Particularly, the open codes listed in Table 2 were helpful during this stage and we also leveraged existing literature that previously measured usability. We created 4-6 items for each construct to assure a reliable measurement of the conceptual domain. This led to an initial pool of 58 items. Next, we conducted a pre-test of the survey instrument and asked six Australian University staff members to complete a paper-based survey containing the newly developed items. Three administrative staff members, two PhD students and one Masters student completed the survey. Before asking the participants to pre-test our survey, we asked them if they owned a smartphone and had experience with mobile applications. We felt this was necessary in order to avoid confusion about the questions asked in the survey. All items were randomized and we included feedback fields within the survey. We asked all respondents to flag unclear items or sections of the survey instrument that they viewed as confusing or vague. Out of the 58 items, 42 were identified as clear and none of the participants suggested altering these questions. For 4 of the 58 items, the participants proposed minor changes. We modified these items in accordance with the obtained feedback and kept 46 questions in the item pool. All items that the respondents flagged as unclear were excluded from the item pool. The next step of the survey instrument development included a pilot test of the survey instrument. Lewis et al. (2005) recommended that participants for the pilot test should come from the main population of interest. Thus, we collected 30 responses from German consumers recruited by a market research firm. The firm invited potential respondents to complete the survey online and participation was encouraged via small monetary incentives. 
The respondents were provided with instructions and the survey was available to them in German; the items were translated and back-translated by bilingual professionals to ensure cross-language equivalence in meaning. This procedure is common in cross-cultural research (see Zhang et al., 2007). The respondents' demographics are shown in Table A1. As can be noted from the descriptive statistics in Table A1, the respondents interacted with different types of social media applications. We did not find any significant differences in the constructs based on the type of social media application. As in the pre-test, respondents were given opportunities to provide feedback on the survey structure and items. The results suggested that the survey instructions were clear and we obtained positive feedback from most respondents. Out of the 46 items, 12 were flagged by these respondents as vaguely worded and were excluded from the item pool. This left 34 items, or 3-4 items for each construct identified in stage 1 of the instrument development process. In order to have 4 items per construct, we modified the wording of some of the flagged items and retained them. This led to 40 items based on the pilot study. Table 4 lists the items.
We next evaluated the content validity of the new scales, which can be done using multiple approaches. Lewis et al. (2005) recommended using Lawshe's (1975) content validity ratio, which requires subject matter experts to judge how essential each item is in relation to a given construct. We sought to pursue this approach, but after many experts declined our request due to lack of time and others repeatedly rescheduled and failed to complete the requested assignment, we turned to the literature for an alternative. Anderson and Gerbing (1991) proposed an alternative approach to assess the content validity of newly developed scales. This approach works on the assumption that each item represents only one construct (Anderson and Gerbing, 1991; Yao et al., 2007). The procedure uses a matrix in which construct definitions are listed on top of the columns and items are placed in the rows. Individuals are asked to select the most appropriate item-to-construct combination, and raters are not required to be experts in the field of study (Anderson and Gerbing, 1991). We followed the procedure outlined by Anderson and Gerbing (1991) and developed four matrices in which we listed our construct definitions on top of the columns and the items in the rows. We hired the same market research firm that was employed to conduct the pilot study. The firm invited potential respondents via email. The invited individuals were asked to complete the survey online and participation was encouraged via small monetary incentives provided by the market research firm. In total, 318 U.S. consumers who were familiar with mobile applications evaluated how well our items fit with our construct definitions. Table A1 includes the demographic information on the research participants. Next, we computed the proportion of substantive agreement (PSA) and substantive validity coefficients (CSV) as explained by Anderson and Gerbing (1991).
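The two coefficients can be illustrated with a minimal Python sketch. It assumes the usual reading of Anderson and Gerbing (1991): PSA is the share of raters who assign an item to its intended construct, and CSV subtracts the largest number of assignments to any other single construct before dividing by the number of raters. The function name, the rater data and the construct labels below are hypothetical and are not taken from the study.

```python
from collections import Counter


def substantive_validity(assignments, intended):
    """Return (PSA, CSV) for a single item.

    assignments: one construct name per rater (the construct the rater chose).
    intended:    the construct the item was written to measure.
    """
    n_raters = len(assignments)
    counts = Counter(assignments)
    # Number of raters who picked the intended construct.
    n_intended = counts.get(intended, 0)
    # Highest number of assignments to any other single construct.
    n_other = max(
        (count for name, count in counts.items() if name != intended),
        default=0,
    )
    p_sa = n_intended / n_raters
    c_sv = (n_intended - n_other) / n_raters
    return p_sa, c_sv


# Hypothetical ratings for one item from 10 raters (not data from the study).
ratings = ["Hierarchy"] * 7 + ["Transition"] * 2 + ["Gestalt"]
p_sa, c_sv = substantive_validity(ratings, "Hierarchy")
print(f"PSA = {p_sa:.2f}, CSV = {c_sv:.2f}")  # PSA = 0.70, CSV = 0.50
```

An item whose coefficients fall below the chosen cut-off would be a candidate for re-wording or removal.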
These values can range between 0 and 1, where higher values indicate a high degree of content validity and low values indicate that the item does not overlap with the intended construct definition. Yao et al. (2007) suggested 0.25 as a cut-off point for PSA and CSV values. Table 5 shows that the content validity ratios obtained were high, thus indicating that most respondents sorted the majority of items into the posited construct definitions. Out of 40 items, only HIER1 was lower than the recommended threshold of 0.25. Hence, we re-worded the item in order to align it better with the construct domain.

5 We used the equation proposed by Anderson and Gerbing (1991) to calculate the CSV, which is equal to the difference between the number of panelists judging an item to be essential and the highest number of assignments of the item to any other construct in the set divided by the total number of panelists. We also used the equation proposed by Anderson and Gerbing (1991) to calculate the PSA, which is equal to the number of respondents assigning a measure to its posited construct divided by the total number of respondents.

Table 3 Construct definitions based on the content analysis and literature review.

| Construct name | Construct definition: The degree to which a user perceives that the mobile application… |
| --- | --- |
| Aesthetic graphics | …makes use of aesthetic graphics. |
| Color | …uses colors effectively. |
| Control obviousness | …deploys controls that are immediately obvious. |
| Entry point | …can be accessed through alternative entry points. |
| Fingertip-size controls | …deploys fingertip-size controls. |
| Font | …uses font effectively. |
| Gestalt | …uses gestalt principles effectively. |
| Hierarchy | …has a hierarchical structure. |
| Subtle animation | …uses subtle animations effectively. |
| Transition | …transitions from one page to another. |

| Item | Item wording | PSA | CSV |
| --- | --- | --- | --- |
| … | The mobile application tells the user when switching from one screen to another. | 0.72 | 0.55 |
| TRAN3 | The mobile application moves from one screen to another without any problems. | 0.81 | 0.72 |
| TRAN4 | The mobile application switches from one screen to the next smoothly. | 0.81 | 0.76 |

a We listed the original item for HIER1 used for the content validity check as well as the modified item. The modified item was used during stage 3 of this study.

Stage 3: evaluation of measurement properties
The third stage of the instrument development process focuses on evaluating the measurement properties of the new scales. Lewis et al. (2005) recommended using two independent samples that are relevant to the population of interest. Exploratory factor analysis (EFA) should be used to discover the factor structure in the first sample. Then, using the second sample, confirmatory factor analysis (CFA) should be used to validate the scale properties (Lewis et al., 2005). In the confirmatory phase, researchers should also assess the nomological network of the scales by testing if the constructs of interest predict theoretically relevant dependent variables.

Exploratory study
Following Lewis et al. (2005), we initially collected a sample of German consumers using mobile applications. Germany is a large European economy where mobile smartphones are widely used. Germany also has a very high population density and mobile networks cover most areas of the country. Forrester Research suggests that around 40% of all German consumers use mobile data services and access Internet-based mobile applications on their smartphones (Savvas, 2010).
Due to this, we felt surveying German consumers regarding their perceptions toward the usability of mobile applications would be particularly interesting for companies developing and distributing mobile applications in Europe. As with the pilot study, we executed the data collection for the exploratory phase through a market research firm that recruited German consumers. We used the instructions developed for the pre-test and pilot study. All items listed in Table 4 were measured using a 7-point Likert-type scale (1 = strongly disagree … 7 = strongly agree) and we tailored the questions toward mobile social media applications, such as Facebook. Tailoring questions to the context of a particular study is a well-accepted practice in IS research (Venkatesh et al., 2003; Venkatesh and Ramesh, 2006). At the start of the survey, we provided a list of the most common social media applications, including Facebook, LinkedIn, Twitter, My Space and Google+. We programmed the survey to carry the application that a respondent selected at the beginning over into the subsequent items. This way, the items were displayed as "Facebook (mobile) uses beautiful artwork" instead of "The mobile social media application uses beautiful artwork." For the exploratory study, we collected data from 464 actual consumers. We hired a different market research firm from the one used for the pilot study to ensure that the respondents came from a different respondent pool. As with the pilot study, the instructions and survey were available to the respondents in German. We hired professionals to translate and back-translate the items to ensure cross-language equivalence in meaning. Initially, all responses were checked for the time respondents took to complete the survey. Respondents who took too little time and/or did not correctly answer reverse-coded filler items were excluded from our sample. This led to 404 usable responses.
We also tested for nonresponse bias and the data showed no significant differences in terms of demographic characteristics between the respondents and non-respondents. Further, we examined how well the profile of our respondents corresponded to the profile of the sampling frame provided by the market research firm. The results confirmed that the respondents' characteristics of our sample matched the sampling frame provided by the market research firm. We did not see a need to compare early versus late responses because all responses were collected during a single weekend and no reminders were employed (Churchill, 1979;Hair et al., 1998). Table A1 summarizes the respondent demographics.
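The screening step described above (dropping respondents who were too fast or who failed the filler items) could look roughly like the following Python sketch. The paper does not report its time threshold or the filler items, so `MIN_SECONDS`, `EXPECTED_CHECKS`, the `Response` class and all sample records are hypothetical. Treating each filler item as having a single expected answer is also an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    respondent_id: str
    seconds_taken: float
    # Filler item id -> answer given on the 7-point scale.
    attention_checks: dict = field(default_factory=dict)


# Hypothetical screening rules; the study does not report these values.
MIN_SECONDS = 240
EXPECTED_CHECKS = {"filler_1": 2, "filler_2": 6}


def is_usable(response, min_seconds=MIN_SECONDS, expected=EXPECTED_CHECKS):
    """Keep a response only if it was not rushed and passed every filler item."""
    if response.seconds_taken < min_seconds:
        return False
    return all(
        response.attention_checks.get(item) == answer
        for item, answer in expected.items()
    )


# Hypothetical responses (not data from the study).
responses = [
    Response("r1", 610, {"filler_1": 2, "filler_2": 6}),
    Response("r2", 95, {"filler_1": 2, "filler_2": 6}),   # completed too quickly
    Response("r3", 540, {"filler_1": 5, "filler_2": 6}),  # failed a filler item
]
usable = [r for r in responses if is_usable(r)]
print(f"{len(usable)} of {len(responses)} responses usable")
```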
Next, we conducted an exploratory factor analysis (EFA) with direct oblimin rotation to allow for correlated factors. It is recommended that the item-to-response ratio be in the range from 1:3 to 1:8 (Hair et al., 1998). Given that our scales included 40 newly developed items and we obtained 404 usable responses (a ratio of roughly 1:10), our ratio of items to responses seemed adequate for exploratory analysis. The results of the EFA confirmed a solution with ten factors, each with an eigenvalue greater than 1.0. As shown in Table 5, the items explained a reasonable amount of covariance in the associated constructs, ranging from 14.6% to 32.3%. All item loadings were greater than .70. Given these results, we felt that dropping items was unnecessary. We inspected the reliability of the items by computing Cronbach's α coefficients for all scales, all of which were .77 or greater and thus higher than the recommended threshold of .70 (Fornell and Larcker, 1981).
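For readers unfamiliar with the reliability coefficient used here, the following Python sketch computes Cronbach's α for one scale using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the summed score). The four-item, six-respondent data set is invented for illustration and does not come from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores: one list per item, each holding the scores of the same
    respondents in the same order.
    """
    k = len(item_scores)
    # Summed scale score for each respondent.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_variance_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 7-point responses of six respondents to a four-item scale.
scale = [
    [6, 5, 7, 3, 4, 6],
    [6, 4, 7, 3, 5, 6],
    [5, 5, 6, 2, 4, 7],
    [7, 5, 6, 3, 4, 6],
]
alpha = cronbach_alpha(scale)
print(f"Cronbach's alpha = {alpha:.2f}")
print("meets .70 threshold:", alpha >= 0.70)
```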

Confirmatory study
Following the procedure used for the exploratory study, we collected a new sample for the confirmatory phase of this research. As part of the confirmatory assessment of the survey instrument development process, Lewis et al. (2005) recommended evaluating the nomological network of the scales. To do this, theoretically related variables should be included in the survey to test the predictive validity of a given construct of interest. Therefore, based on existing information sciences and mobile application research, we included items for two dependent variables, namely continued intention to use and brand loyalty. Intention in particular is a critical indicator of success of newly implemented information technologies and associated services (Hu et al., 2010, 2005; Hu et al., 2009). Prior research suggests aesthetic and colorful graphics positively influence consumers' continued intention to use and brand loyalty toward mobile applications (Scornavacca et al., 2006). Similarly, obvious controls, multiple entry points and fingertip-size controls of mobile applications will have a positive effect on consumers' continued intention to use as well as their brand loyalty toward mobile applications (Barnes, 2002, 2003; Barnes and Huff, 2003; Kurniawan, 2008). Research also proposes that the type of font, hierarchical structure and gestalt principles (e.g., similar application components are grouped together) positively influence users' continued intention to use and brand loyalty toward mobile applications (Barnes, 2003). Likewise, research found that a user's continued intention to use and brand loyalty toward a mobile application are positively influenced if menus follow a clear hierarchy, pages flow smoothly from one to another, and the application incorporates subtle animations (Scornavacca et al., 2006). The scales used to measure continued intention to use and brand loyalty constructs were adapted from prior research (Johnson et al., 2006; Venkatesh and Goyal, 2010).
Table A2 lists the items used to measure the outcome variables. As shown in Fig. 1, mobile application usability is conceptualized using 10 unique constructs identified based on Microsoft's guidelines.
Similar to the exploratory study, we collected data from a new sample consisting of 550 German consumers using mobile social media applications. We hired the same market research firm we used for the exploratory study. Care was taken not to invite respondents who participated in the exploratory study. Following the steps undertaken for the exploratory study, we excluded problematic responses (e.g., those who spent too little time on the survey and responded incorrectly to reverse-coded items). In total, we received 501 usable responses. Table A1 shows the demographic information of the respondents. Similar to the exploratory study, the instructions and survey were professionally translated and were made available to the respondents in German. As with the exploratory study, we did not find any significant differences in terms of demographic characteristics between the respondents and non-respondents. The profile of our respondents also matched the sampling frame provided by the market research firm. As in the exploratory study, we felt that it was unnecessary to compare early versus late responses because all responses were collected during a single weekend and no reminders were employed (Churchill, 1979; Hair et al., 1998).
Fig. 1. Structural model.

As recommended by Lewis et al. (2005), we also carefully assessed our sample for the shape of the distribution and checked for skewness and kurtosis before starting the data analysis, and found no significant issues. Next, following Lewis et al. (2005), we used confirmatory factor analysis (CFA) to evaluate the psychometric properties of the scales. AMOS was used to assess all factors separately, then in pairs and then as a collective network as outlined by Lewis et al. (2005). We then examined the construct validity of the scales (see Lewis and Byrd, 2003). The results are shown in Table 6. All items loaded highly on the intended construct, with item-to-construct loadings between .71 and .87, thus supporting convergent validity.
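The distribution check mentioned at the start of this paragraph can be sketched in a few lines of Python. The sketch uses the simple moment-based definitions of skewness and excess kurtosis (both are 0 for a normal distribution). Statistical packages often apply small-sample corrections, so their output may differ slightly. The response values are invented, and the paper does not state which cut-offs were applied.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Hypothetical 7-point responses to one item (not data from the study).
item_responses = [5, 6, 4, 7, 5, 6, 3, 6, 5, 7]
skew, kurt = skewness_and_excess_kurtosis(item_responses)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```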
Following Lewis et al. (2005), we next examined if the correlations between pairs of factors were significantly different from unity. Such results would suggest discriminant validity between the pair of factors (Lewis and Byrd, 2003;Lewis et al., 2005). Table 7 shows the results for the pairwise tests among the mobile application usability factors.
The significant χ² tests confirmed discriminant validity.
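A pairwise test of this kind is commonly run as a χ² difference test: a two-factor model in which the correlation is fixed to 1 is compared with a model in which it is freely estimated, giving a difference with one degree of freedom. The Python sketch below assumes that set-up. The fit statistics are made-up numbers, not values from Table 7. For one degree of freedom, the p-value can be computed exactly with the standard library as erfc(√(Δχ²/2)).

```python
import math


def chi_square_difference_test(chi2_constrained, df_constrained, chi2_free, df_free):
    """Chi-square difference test for nested models differing by 1 df.

    Returns (delta_chi2, p_value). The constrained model fixes the factor
    correlation to 1; the free model estimates it.
    """
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    if delta_df != 1:
        raise ValueError("this sketch only handles a 1-df difference")
    if delta_chi2 < 0:
        raise ValueError("constrained model cannot fit better than the free model")
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(delta_chi2 / 2.0))
    return delta_chi2, p_value


# Hypothetical fit statistics for one pair of constructs.
delta, p = chi_square_difference_test(152.4, 20, 61.3, 19)
print(f"delta chi-square = {delta:.1f}, p = {p:.3g}")
print("factors are distinct at the .05 level:", p < 0.05)
```

A significant difference indicates that forcing the two factors to correlate perfectly worsens the fit, which is the evidence of discriminant validity referred to above.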
We next used our sample to assess the fit of the measurement model. Following Lewis et al. (2005), we initially examined the factor-centric fit indexes. This step is useful for determining the extent to which the set of items assessing a given factor defines the latent trait of the factor under investigation (Lewis et al., 2005). Overall, the goodness of fit indexes were well in line with the cutoff values recommended by Hair et al. (1998). Table 8 shows the factor-centric fit indexes.
We continued our analysis by determining the model fit indexes of the overall model. The results are shown in Table 9. Overall, the goodness of fit indexes were well in line with the cutoff values recommended by Hair et al. (1998), thus supporting the validity of our model.
To further evaluate the psychometric properties of the scales, we examined the Cronbach's αs, AVEs and inter-construct correlations. Table 9 shows that the AVEs were all above .70, which is the recommended threshold (Straub et al., 2004). The results also confirmed that the AVEs for each construct exceeded the squared correlation of the construct with other constructs (Fornell and Larcker, 1981), thus providing further evidence of discriminant validity. Table 10 also shows that the reliabilities, assessed using Cronbach's αs, for all scales were above the threshold of .70.
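The Fornell and Larcker (1981) comparison described above can be illustrated with a short Python sketch. It assumes standardized loadings, so that the AVE of a construct is the mean of its squared loadings, and checks that each AVE exceeds the squared correlation between the two constructs. The loadings and the correlation are hypothetical values chosen for illustration.

```python
def average_variance_extracted(loadings):
    """AVE from standardized item loadings: mean of the squared loadings."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


def passes_fornell_larcker(ave_a, ave_b, correlation):
    """True if both AVEs exceed the squared inter-construct correlation."""
    shared_variance = correlation ** 2
    return ave_a > shared_variance and ave_b > shared_variance


# Hypothetical standardized loadings for two four-item constructs.
ave_gestalt = average_variance_extracted([0.71, 0.80, 0.85, 0.87])
ave_hierarchy = average_variance_extracted([0.75, 0.78, 0.82, 0.84])
print(f"AVE gestalt = {ave_gestalt:.3f}, AVE hierarchy = {ave_hierarchy:.3f}")

# Hypothetical correlation between the two constructs.
print(passes_fornell_larcker(ave_gestalt, ave_hierarchy, correlation=0.50))
```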
Next, we examined the structural model results, which are shown in Table 11. The 10 usability constructs explained 21% of variance in continued intention to use. The R² was slightly higher for mobile application loyalty (25%). Seven paths between the mobile application usability constructs and continued intention to use were significant, with gestalt (.21), followed by fingertip-size controls (.17), followed by subtle animation (.16) being the strongest determinants. Six mobile application usability constructs were significant predictors of brand loyalty, with gestalt, fingertip-size controls and control obviousness being the strongest.

Discussion
Due to the widespread diffusion of mobile technologies, more and more organizations are seeking to incorporate mobile presences into their existing channel strategies. Although mobile vendors, such as Microsoft, Apple and Google, provide general guidance for developing mobile applications, we could not identify any scientific instruments that help practitioners to accurately measure the overall usability of mobile applications. Against this backdrop, we developed a rich conceptualization and psychometrically sound instrument for mobile application usability. Following the methodology for construct development suggested by Lewis et al. (2005), we initially analyzed Microsoft's development guidelines to conceptualize mobile application usability. Based on a content analysis of the guidelines, we conceptualized mobile application usability via ten unique constructs. Next, we developed, pre-tested and pilot tested the new scales. Then, we quantitatively assessed the content validity of the newly created items. To validate our instrument, we collected two independent samples consisting of German consumers who use mobile social media applications on their smartphones. We initially used exploratory factor analysis to identify the factor structure in the first sample. Subsequently, using confirmatory factor analysis, we validated the findings of the exploratory study and tested the generalizability of our mobile application usability scales. The findings confirmed that our conceptualization and associated instrument were strong predictors of consumers' continued intention to use and brand loyalty toward mobile social media applications. Our research has several implications for research and practitioners.

Research implications
First, our comprehensive conceptualization of mobile application usability contributes to the HCI literature studying the design and use of information technology at the individual level. Much HCI-related work has drawn on well-established IS acceptance theories, such as TAM (Davis et al., 1989), the IS success model (Delone and McLean, 1992) and UTAUT (Venkatesh et al., 2003), to predict why and how individuals interact with IS artifacts. Due to their parsimony, these theories do not aim to explain specifically why individuals find a particular application interface easy to use. From an HCI perspective, our research provides a more detailed view on why individuals continue to use mobile applications because we conceptualized 10 unique mobile application usability constructs. We found that 7 constructs contributed significantly to explaining continued intention to use mobile applications. We believe that our fine-grained and context-specific instrument helps HCI researchers to accurately predict mobile application usability and provides a better understanding of why individuals continue to use mobile application interfaces for obtaining information that is of value to them.
Second, much prior HCI literature has studied mobile application usability in laboratory environments using experimental research designs, which have the limitations of artificial settings and limited numbers of variables that can be manipulated (Kjeldskov and Graham, 2003). Our study complements such work by providing a validated research instrument for comprehensively assessing the usability of mobile applications using a survey method. This should be helpful for future HCI studies researching real-world mobile applications in the field. For instance, if researchers are interested in studying particular design aspects of mobile application usability, such as user interface output and input, they can draw on our instrument and leverage relevant constructs separately. If the goal is to study mobile application usability holistically (e.g., if studying the usability of existing mobile applications), they could use our entire instrument. Our instrument should also help future studies that evaluate user experiences with mobile applications in laboratory environments to understand participants' reactions to mobile application prototypes. By leveraging our scales, studies evaluating prototypes will be able to identify the most effective prototypes and discover avenues for improving them. Also, design science researchers focusing on developing effective mobile application interfaces could use our instrument to evaluate usability as part of the artifact development.
Third, our study is among the first studies that developed usability scales tailored to the interactions with mobile applications. When evaluating mobile application usability, we found that it is important to identify factors that are unique to mobile applications. For example, our findings suggest that gestalt principles are critical in a mobile context because design elements that are logically ordered and organized help users to navigate mobile applications on small screens. Although gestalt principles may also be important for websites, our study suggests that they are among the most important aspects of mobile application usability. Likewise, fingertip-size controls were found to be a significant part of the overall usability of mobile applications. This aspect of mobile application usability is less likely to be relevant in the website context because most user interfaces on stationary computers (e.g., in libraries) are operated via mouse movements. Similarly, animations should be designed to be particularly subtle in the context of mobile applications. One reasonable explanation for this is the limited screen size on which mobile applications are displayed. When animations are used extensively as part of a mobile application, users might feel overwhelmed and distracted. We also found that some usability elements were less important than expected. For instance, color significantly influenced neither individuals' continued intention to use nor their brand loyalty.
Fourth, this study contributes to measurement theory (Straub et al., 2004; MacKenzie et al., 2011). Lewis et al. (2005) suggested a comprehensive methodology for developing survey instruments. To the best of our knowledge, the current study is among the first studies that closely followed Lewis et al. (2005) to develop a survey instrument. We applied their recommendations and did not encounter major issues in doing so. In a few instances, we deviated from their recommendations for practical reasons. For example, for the content validity check, we initially followed Lawshe's (1975) recommendations and invited industry experts to judge the content validity of our scales. Owing to time constraints, few subject-matter experts indicated that they were available to participate in our study. Thus, we decided to employ Anderson and Gerbing's (1991) approach because this method does not require judges to be subject-matter experts. It is also important to note that during all stages of the instrument development procedure, we asked research participants to provide feedback and this was generally positive.
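The two content validity approaches mentioned above can be contrasted numerically. The following is a minimal sketch, assuming the standard formulas: Lawshe's (1975) content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e of N expert panelists rate an item as essential; and Anderson and Gerbing's (1991) proportion of substantive agreement, p_sa = n_c / N, and substantive validity coefficient, c_sv = (n_c - n_o) / N, where n_c of N judges assign an item to its intended construct and n_o is the largest number assigning it to any other single construct. The function names and all counts are hypothetical illustrations and are not results from this study.

```python
from collections import Counter


def lawshe_cvr(n_essential: int, n_panelists: int) -> float:
    """Lawshe's content validity ratio for a single item."""
    half = n_panelists / 2
    return (n_essential - half) / half


def substantive_validity(assignments: list[str], intended: str) -> tuple[float, float]:
    """Return (p_sa, c_sv) for one item, given each judge's construct assignment."""
    n = len(assignments)
    counts = Counter(assignments)
    n_c = counts.get(intended, 0)
    # Largest number of assignments to any single non-intended construct.
    n_o = max((c for name, c in counts.items() if name != intended), default=0)
    return n_c / n, (n_c - n_o) / n


# Hypothetical panel: 8 of 10 experts rate the item "essential".
cvr = lawshe_cvr(n_essential=8, n_panelists=10)

# Hypothetical sorting task: 20 judges assign one item to a construct.
judgments = ["gestalt"] * 16 + ["transition"] * 3 + ["color"] * 1
p_sa, c_sv = substantive_validity(judgments, intended="gestalt")

print(f"CVR = {cvr:.2f}, p_sa = {p_sa:.2f}, c_sv = {c_sv:.2f}")
```

Because the second approach only asks judges to sort items into constructs, it can be run with lay participants, which is the property that motivated the switch described above.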
Fifth, we felt that Microsoft's guidelines helped us to provide a relevant contribution to practitioners aiming to mobilize the workforce and/or business operations. Rosemann and Vessey (2008) suggest that developing relevant research is "not necessarily based in theory, [but] involves examining a practical intervention using a well-established, rigorous research approach" (p. 7). Our study followed this recommendation and employed Microsoft's user experience guidelines. These guidelines were developed by practitioners, and we rigorously developed the constructs and an associated survey instrument to represent mobile application usability.
Finally, the present work is expected to serve as a critical starting point for future scientific investigations of mobile application usability as the wireless technology revolution continues to grow. Specifically, our study could guide future research through two avenues. First, the developed constructs could be leveraged to assess usability of applications based on their purpose. For example, the relative importance of the constructs might be assessed for hedonic versus utilitarian applications, such as games versus office or productivity apps. Second, our instrument could be used to understand organizational phenomena, such as job stress and job satisfaction (Morris and Venkatesh, 2010; Sykes, 2015). With an emphasis on organizational phenomena in the context of utilitarian mobile applications (Nah et al., 2005), future research could also examine the influence of important individual differences, such as age (Morris and Venkatesh, 2000) and physical or cognitive disabilities, on mobile application usability.
Further, our work examined the applicability of the mobile application instrument in one country (i.e., Germany) but future studies should test the generalizability of our scales in different countries (see Venkatesh and Ramesh, 2006). With the extensive diffusion of smartphones in European, Asian and North American markets, theoretically motivated studies would serve an important [...] Venkatesh and Ramesh, 2006) and Apple experience guidelines (Apple, 2011), with what we have developed and validated here. As discussed earlier, in briefly examining the various guidelines, we find that each of them provides some unique recommendations. For instance, only Microsoft recommends gestalt principles. Interestingly, our findings confirmed that the gestalt construct was most influential on continued intention to use and brand loyalty. In contrast, aesthetic graphics are emphasized in Apple's and Microsoft's guidelines. Aesthetic graphics were also found to be important for continued intention to use but not for brand loyalty. Although we believe that it is beyond the scope of the current paper, future studies should analyze the guidelines offered by Microsoft's competitors in order to explore additional factors that are relevant to mobile application usability.

Practical implications
Enterprise mobility fundamentally changes the IT landscape of enterprises. Our study should help organizations to integrate mobile applications into their day-to-day business activities, including supply chain management, sales force automation and field force automation, as mobile technologies have become a central component of organizational IT infrastructures. Further, the instrument should be useful for understanding the usability of enterprise software from the perspective of platform providers (e.g., Kude et al., 2012; Tiwana et al., 2010). Specifically, our conceptualization of mobile application usability is helpful for designing mobile information systems that are no longer bounded by fixed organizational systems. Although our conceptualization and associated scales of mobile application usability are contextualized to mobile social media applications, we believe that companies could leverage our scales in an organizational context to explore the meaning and implications of organizational mobility. Specifically, our implications are of value for different application areas. These include, but are not limited to, requirement engineering for mobile applications as well as the development and maintenance of mobile applications in consumer and organizational contexts.
Requirement engineering encompasses tasks that developers perform to determine the needs or conditions for an application to be developed (Sonderegger et al., 2012). As part of this process, application developers often leverage interviews or focus groups in order to determine the desired features of a given mobile application. During this phase, our survey instrument could be used to develop interview protocols featuring open-ended questions, especially with user-centered approaches to mobile application design (Kangas and Kinnunen, 2005). An example of a non-structured question presented to the participants could be: "How important do you consider aesthetic graphics as part of mobile applications?" This question was adapted from the items we developed for aesthetic graphics, and application designers could derive similar questions for the remaining usability constructs.
During the development of mobile applications, firms typically use a variety of software application methods, such as agile methods (see Strode, 2006). One particular form of agile methods is the scrum methodology. Scrum is a methodology that emphasizes iterative software development and it provides just enough rules for teams to be able to focus on innovation (Strode, 2006). For example, at the initial phases of a scrum project, application owners determine the scope of what needs to be built in a given timeframe. Once the development team has built the software, the outcome is demonstrated to the application owner and subsequent steps can be determined (Strode, 2006). Our coding matrix could be particularly useful for such situations because it could help the involved parties in discussing the mobile application progress including the usability of the completed software components. For example, the codes derived for the subtle animation construct emphasize that the mobile applications use subtle animations effectively and avoid complex animations. The mobile application owner could test the completed software components and decide whether the animations are subtle and not overly complex.
Extreme programming (XP) is another form of agile methods and it is typically used in high-change environments using small teams (see Strode, 2006; Kude et al., 2012; Stuckenberg et al., 2014). Extreme programming empowers developers to respond to changing customer requirements, even in late application development stages. Another unique characteristic of XP is that developers constantly communicate with their customers and fellow programmers in order to get feedback regarding the developed software. To facilitate the process of obtaining customer feedback, XP teams could employ our scales to survey customers regarding their perceptions of the usability of the developed application. Based on the feedback obtained, applications could be modified. For instance, if customers dislike the color scheme of the mobile application, the XP programmers could modify the color scheme and ask customers to re-evaluate the new design.
With the demonstrated predictive validity of the instrument, organizations will also have a useful tool to maintain and monitor the performance of their newly developed or existing mobile applications. The ubiquity of mobile technology offers opportunities for organizations to reach and maintain relationships with customers. To seize such opportunities, organizations need to carefully capture customers' perceptions of mobile application usability, as mobile applications are increasingly utilized in creating and delivering products. For instance, in the healthcare industry, more mobile applications are being introduced to offer home healthcare, hospice, and personal care services. There are mobile applications that match blood donors with those who need blood (Ramya, 2013), applications that help users manage their nutrition intake (Philippine Daily Inquirer, 2013), applications that help users manage their dental claims (Delta Dental, 2013), and applications that enable users to evaluate their symptoms and manage diseases (PRWeb Newswire, 2013). For such critical applications, understanding users' perceptions of mobile application usability becomes important so that healthcare providers can offer better services and operate more efficiently. When designing these mobile applications, developers should pay particular attention to gestalt principles, fingertip-size controls, subtle animation, aesthetic graphics and transition because these constructs were most influential in predicting continued intention to use mobile applications. This illustrates that practitioners should emphasize different aspects of mobile application usability depending on the specific outcome of interest.
Organizations will also have a useful tool to disseminate information internally or monitor knowledge management processes that are conducted via mobile applications. For employees, social media communities offer group interactions through which knowledge is created and exchanged. For example, knowledge workers could access their corporate wikis through their mobile phones and collaborate on creating knowledge, which could ultimately be integrated into product development. There is also an increasing trend toward using mobile applications to execute business processes. For example, the airline industry is using in-flight mobile point-of-sale applications (Delta, 2013) and the trucking industry will be using mobile applications to manage vehicle and operator data (XRS, 2013). With such diverse and critical mobile applications, organizations need a validated usability instrument that captures the most relevant mobile usability dimensions because such applications are becoming important in creating and delivering products.

Conclusions
Internet-enabled smartphones have become increasingly accepted in recent years. As a result, consumers today expect user-friendly mobile applications from organizations in many industries. Yet, little systematic guidance is available to support mobile application designers in capturing consumers' beliefs regarding the usability of mobile applications in the field. Therefore, the current study analyzed Microsoft's usability guidelines, conceptualized mobile application usability and validated an associated survey instrument. We found strong support for the psychometric properties of our scales. We also found support for the constructs in predicting two critical outcomes, i.e., continued intention to use and brand loyalty. The findings are relevant to both academics and practitioners. In particular, researchers can leverage our conceptualization and scales to study mobile usability and practitioners can use them to evaluate existing and to-be-developed mobile applications.

Continued intention to use. Source: Bhattacherjee (2001), and Venkatesh and Goyal (2010).
I intend to continue using the mobile application.
I want to continue using the mobile application rather than discontinue.
I predict I will continue using the mobile application.
I plan to continue using the mobile application.
I do not intend to continue using the mobile application in future.
Chances are high that I will continue using the mobile application in future.

Brand loyalty. Source: Johnson et al. (2006).
I encourage friends and relatives to be the customers of the mobile application.
I say positive things about the mobile application to other people.
I will use more services offered by the mobile application in the next few years.
I would recommend the mobile application to someone who seeks my advice.
I consider the mobile application to be my first choice.
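
When the continued intention to use items above are scored, the negatively worded item ("I do not intend to continue using the mobile application in future") would typically be reverse-coded before the items are averaged. The following is a minimal sketch of that step, assuming a 7-point agreement scale; the number of scale points, the function names and the responses are hypothetical and are not taken from this section.

```python
def reverse_code(score: int, scale_min: int = 1, scale_max: int = 7) -> int:
    """Mirror a response around the scale midpoint (e.g., 2 becomes 6 on a 1-7 scale)."""
    return scale_max + scale_min - score


def scale_score(responses, reversed_items=(), scale_min=1, scale_max=7) -> float:
    """Average one respondent's item responses after reverse-coding flagged items."""
    adjusted = [
        reverse_code(r, scale_min, scale_max) if i in reversed_items else r
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) / len(adjusted)


# Hypothetical respondent; items are in the order listed above,
# so the negatively worded item sits at index 4.
responses = [6, 7, 6, 6, 2, 7]
score = scale_score(responses, reversed_items={4})

print(f"Continued intention to use: {score:.2f}")
```

Passing the scale bounds as parameters keeps the sketch usable if the instrument is administered with a different number of response options.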