Evaluating and Improving Microservice Architecture Conformance to Architectural Design Decisions

Abstract. Microservices are a commonly used architectural style targeting independent development, deployment, and release of services, as well as supporting polyglot capabilities and rapid release strategies. This depends on the presence of certain software architecture qualities. A number of architecture patterns and best practices that support the required qualities have been proposed in the literature, but usually in isolation of one another. Additionally, in real-world systems, assessing conformance to these patterns and practices and detecting possible violations is a significant challenge. For small-scale systems of a few services, a manual assessment and violation detection by an expert is probably both accurate and sufficient. However, for industrial-scale systems of several hundred or more services, manual assessment and violation detection is laborious and likely leads to inaccurate results. Furthermore, manual assessment is impractical for rapidly evolving and frequently released system architectures. In this work we examine a subset of microservice-relevant patterns, and propose a method for the semi-automatic detection and resolution of conformance violations. Our aim is to assist the software architect by providing a set of possible fix options and generating models of "fixed" architectures.


Introduction
Microservices are one of many service-based architecture decomposition approaches (see e.g. [1,2,3,4]). The chief features of microservices are that they communicate via message-based remote APIs in a loosely coupled fashion, and that they can be highly polyglot; ideally, microservices should not share their data with other services. This allows the rapid evolution of individual microservices independently of one another, and their independent deployment in lightweight containers or other virtualized environments. These features make microservices ideal for DevOps practices (see e.g. [5,6]).
While a large body of literature has examined architectural patterns and recommended "best practices" in a microservice context [3,7,8], translating these theoretical insights into usable tools to assist the architectural evolution of actual microservice-based systems has lagged behind. While the theoretical tenets proposed in the literature are easy to grasp and maintain in small-scale systems, ensuring conformance in large, complex, as well as rapidly and independently evolving systems quickly becomes a laborious affair requiring considerable manual work and resulting in extensive overhead effort. Furthermore, patterns have mutual dependencies, meaning that improvement in one area can result in deterioration in another. Real-world architectures are also impacted by a number of non-microservice-specific requirements, which also can lead to unintended violations of microservice best practices.
This work provides a set of actionable solutions to violations on different aspects of microservice architectures, as part of a larger study on the topic. Three architectural design decisions (ADDs) were selected as representing very different aspects of architecting microservices, so as to demonstrate the wide applicability of our approach. Other ADDs have already been covered in our prior work. More specifically, for covering the best practices of client-system communication we chose the External API decision; for the guaranteed delivery of messages, a critical aspect of many business-critical microservice systems, we used the Inter-Service Message Persistence decision to examine the relevant recommended practices; finally, to cover the logging and monitoring practices that ensure observability of the microservices and their complex interactions, we used the End-to-End Tracing decision. In this context, we aim to study the following research questions:
- RQ1 What are the possible architecture violations related to the above-mentioned ADDs and how can they be automatically detected?
- RQ2 What are the possible fixes for the violations found in RQ1 and how can architects be assisted in choosing the appropriate solutions and applying them?
We propose a novel architecture refactoring approach that uses empirically validated metrics proposed in our prior work [9] to evaluate the degree of architecture conformance for each of the given ADDs. For every ADD design option, we define every possible violation and propose a corresponding, automated violation detection algorithm, as well as a set of possible fixes. For each microservice-based system, the sets of ADD options, violations, and fixes lead to a search tree of possible architecture designs that partly or entirely enforce conformance to best practices, which we can continually assess using our metrics.
To evaluate our approach we utilized a set of 24 models of microservice-based systems from third-party practitioners (see Table 1). For each of these, we implemented the automated violation detection and refactoring (fix) algorithms to detect the possible violations and to generate all the possible fixes for addressing each violation, resulting in a set of models. Using our metrics, we evaluated the improvements compared with the original version, as well as any outstanding issues. This process was iteratively repeated until all violations were resolved. Each of the violations found in the 24 models can be fully resolved, leading to optimal metric values within at most 3 refactoring steps, usually with many suggested optimal models provided as options for architects to choose from.
This paper is structured as follows: In Section 2 we analyze the ADDs examined in this work, the associated patterns and practices, and the corresponding metrics. Section 3 discusses and compares our approach to existing studies in the literature. Our research methods and the tools we have applied in our study are described in Section 4, followed by a detailed explanation of our approach in Section 5. The evaluation process is given in Section 6, the results are discussed in Section 7, and the threats to validity in Section 8. Finally, in Section 9 we draw conclusions and discuss future work.

Background: Decisions and Metrics
In this section, we briefly introduce the three ADDs and the corresponding patterns and practices as decision options, based on our prior work. The decisions have been modeled based on an empirical study of existing best practices and patterns by practitioners [10], while the metrics used to assess the pattern conformance of each given system derive from [9].
External API Decision. A fundamental decision in microservice-based systems is how external clients are connected to the system services. This can affect aspects related to loose coupling, releasability, independent development and deployment, and continuous delivery. The simplest method, but with the highest negative impact, occurs when the clients can call into system services directly, resulting in high coupling that impedes releasing, developing, and deploying the clients and system services independently of each other. Another option, which solves possible problems caused by direct client-service connections, is the API Gateway [3], which provides a common entry point for the system (Facade component) through which all client requests are routed. It is a specialized variant of a Reverse Proxy, which covers only the routing aspects of an API Gateway but not further API abstractions such as authentication, rate limiting, etc. (see [7]). The Backends for Frontends pattern [3] is another variant of API Gateway that specializes in handling different types of clients (e.g., mobile and desktop clients). Alternatively, the API Composition pattern [3] describes a service that shields other services from the clients by actively gathering and composing their data. In our previous work [9], we have empirically defined two metrics that can be used to assess conformance to each of the decision options:
- The Client-side Communication via Facade utilization metric measures how many unique client links use the External API offered by one of the Facade components (i.e., offered through patterns such as API Gateway, Reverse Proxy, Backends for Frontends) compared to the total number of unique client links.
- The API Composition utilization metric measures the proportion of client-connected services which are possibly composing an External API using API Composition.
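As an illustration of the first metric, the ratio described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation from [9]: the `Link` type, the `facade_utilization` function, and the choice to return 1.0 for a model without client links are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A directed client link (hypothetical representation)."""
    source: str
    target: str


def facade_utilization(client_links, facades):
    """Share of unique client links that end at a facade component."""
    unique_links = set(client_links)  # duplicates count once
    if not unique_links:
        # Assumption: a model without client links has nothing to violate.
        return 1.0
    via_facade = {link for link in unique_links if link.target in facades}
    return len(via_facade) / len(unique_links)


# Toy model loosely based on the CI4 example: one link goes through the
# gateway, one goes directly to a service.
links = [
    Link("Client", "API Gateway"),
    Link("Client", "Cinema Catalog"),
    Link("Client", "API Gateway"),  # duplicate, ignored
]
score = facade_utilization(links, facades={"API Gateway"})
print(score)  # 0.5
```

A value of 1.00 would indicate that every unique client link is routed through a facade; lower values point to direct client-to-service connections.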
Inter-Service Message Persistence Decision. Whether inter-service messages are persisted is another decision with considerable impact on the qualities of the system. Many real-world systems use no inter-service message persistence. Options that support message persistence are the Messaging pattern [11], in which persistent message queuing is used to store a producer's messages until the consumer receives them, or alternatively Stream Processing [8] components (e.g. Apache Kafka). Another option is Interaction through a Shared Database, since it supports some level of message persistence, but not the automated support of Messaging. A technique that is more microservice-relevant and able to support a lower level of persistence than Messaging or a Shared Database is the combination of the Outbox and the Transaction Log Tailing patterns [3]. A persistence more tailored to event-driven or eventually consistent microservice architectures can be achieved following the Event Sourcing pattern [3]. For this decision, too, we have empirically defined three metrics [9] that can be used to assess conformance to each of the decision options.
End-to-End Tracing Decision. End-to-end tracing is an important aspect in microservice architectures since they are usually highly distributed and polyglot systems with complex interactions. One option, as in the other decisions, is to offer no tracing support. Alternatively, traces can be recorded on either the services themselves or facade components (or both) via Distributed Tracing [3]. A less comprehensive level of tracing can be achieved when service communication is routed through a central component, which stores some, but not all inter-service communication (e.g., Publish/Subscribe, Message Broker [11], API Gateway or Event Logging [3,8]); the exception is Event Sourcing, which temporarily stores all service events. For this decision, too, we have empirically defined three metrics [9] that can be used to assess conformance to each of the decision options.

Related Work
The fundamentals of the term "microservices" were first discussed by Fowler and Lewis [12], and fundamental tenets by Zimmermann [5]. Richardson [3] has published a collection of microservice patterns and practices, while a mapping study by Pahl and Jamshidi [1] has summarized much of the previous literature on patterns. Skowronski [8] has examined event-driven microservice architectures specifically, and microservice API patterns were studied by Zimmermann et al. [7]. A number of studies have focused on techniques for detecting design or architecture "bad smells" (violations). Taibi and Lenarduzzi [13] defined a list of microservice-specific smells, while Neri et al. [14] have presented an extensive examination of architectural smells for independent deployability, horizontal scalability, fault isolation, and decentralisation of microservices, as well as suggesting refactorings to resolve them. Most similar studies are more generic, but still useful. Le et al. [15] proposed a classification of architectural smells and their impact on different quality attributes. Catalogs of smells have been published by Garcia et al. [16,17] and Azadi et al. [18]. Detection strategies for smell categories related to our study are discussed by Brogi et al. [19], Le et al. [20], Marinescu [21], and especially Neri et al. [14], along with suggested refactorings for resolving them. Although these works study various aspects of architecture violation detection, and some investigate aspects related to the microservice domain, none covers detecting and addressing violations specifically associated with the ADDs covered in this work (external API, persistent messaging, and end-to-end tracing) in a microservice context, which our work investigates in detail.
As a result, we expect that our work produces more accurate detection of decision-specific violations and more targeted suggestions for fixes. On the other hand, our approach requires a model in which the component and connector roles in a microservice architecture have been modeled (as for instance done with stereotypes in the model introduced in Figure 2). That is, our work requires additional insight into a system's architecture, and some effort in encoding the corresponding models; however, this knowledge is at a relatively high level of abstraction and the resulting models are not impacted by changes in service implementation. We are currently working on a semi-automatic approach for architecture reconstruction and modelling that relies on reusable code abstractions and is thus suitable for complex systems with short delivery cycles.

Research and Modeling Methods
In this section, we summarize the main research methods applied in our study. These have been more extensively described in our previous work [22]. For reproducibility, all the code of the algorithms' implementation and the models produced in this study will be made available online, as an open-access dataset in a long-term archive. Figure 1 shows the structure of the research process of this study. In Section 2 we have already explained in detail the architectural decisions and the model-based metrics on which this study is based. In Section 5 we present precise definitions and algorithms a) for the detection of possible violations per decision option, and b) for the possible fixes (architecture refactorings) for each violation.

Research Method
We have tested our approach by applying the algorithms to the 24 models in our data set. First, all violations present in each model were detected, and then all possible fixes for each violation were applied in an iterative-exhaustive manner: on the refactored models resulting from each violation fix, we again ran all violation detection algorithms and applied all possible refactorings, until either no more violations were detected, or we arrived at a refactored model identical to a previous version. The latter case, which we did not encounter here, would have meant that a violation could not be entirely resolved, as its fix introduced other violations. For each of the final models (the 'leaves' of the iteration tree), we assessed pattern conformance through our metrics on microservice coupling, to judge the improvement compared to the original model.
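The iterative-exhaustive procedure described above can be pictured as a search over models. The following is a minimal sketch, assuming hashable models and two caller-supplied functions (`detect` and `fixes`, both hypothetical names); the fix labels in the toy data are placeholders and do not correspond to entries of Table 3.

```python
def explore(model, detect, fixes):
    """Apply every fix for every violation until no violations remain.

    Returns the violation-free 'leaf' models. A model that was already
    seen is not expanded again, which covers the termination condition
    of arriving at a model identical to a previous version.
    """
    seen = {model}
    frontier = [model]
    leaves = []
    while frontier:
        current = frontier.pop()
        violations = detect(current)
        if not violations:
            leaves.append(current)
            continue
        for violation in violations:
            for candidate in fixes(current, violation):
                if candidate not in seen:
                    seen.add(candidate)
                    frontier.append(candidate)
    return leaves


# Toy model: (remaining violations, fixes applied so far), both frozensets.
# The option labels below are hypothetical placeholders.
OPTIONS = {
    "D1.V1": ["D1.V1.optA", "D1.V1.optB"],
    "D3.V1": ["D3.V1.optA", "D3.V1.optB"],
}


def detect(model):
    return sorted(model[0])


def fixes(model, violation):
    remaining, applied = model
    return [(remaining - {violation}, applied | {opt})
            for opt in OPTIONS[violation]]


start = (frozenset({"D1.V1", "D3.V1"}), frozenset())
leaves = explore(start, detect, fixes)
print(len(leaves))  # 4
```

In this toy setting, two violations with two fix options each yield four distinct violation-free models, regardless of the order in which the fixes are applied.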

Architecture Refactoring Approach
From an abstract point of view, a microservice-based system is composed of components and connectors, with distinct sets of component types and connector types. This applies also to indirect or implicit relationships between components, such as indirect dependencies, which can be described as a special set of connectors. Stereotypes can be applied to components to denote their type, such as Service, API Gateway, etc. Stereotypes can likewise be applied to connectors to denote their type, such as Read_Data, RESTful HTTP, or Asynchronous. Some are specialized with tagged values (details omitted here for space reasons).
- cp_annotations : CP → {String} is a function that maps a component to its set of annotations. Annotations are used in our approach (in some of the fixes) to document aspects that need further consideration or possibly manual refactoring.
- cn_annotations : CN → {String} is a function that maps a connector to its set of annotations.
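One way to picture such a model in code is sketched below. It is an illustrative assumption rather than the authors' tooling: the class names, the field layout, and the annotation text are hypothetical, and only the two annotation functions mirror the definitions above.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    stereotypes: set = field(default_factory=set)   # e.g. {"Service"}
    annotations: set = field(default_factory=set)


@dataclass
class Connector:
    source: str
    target: str
    stereotypes: set = field(default_factory=set)   # e.g. {"RESTful HTTP"}
    annotations: set = field(default_factory=set)


@dataclass
class Model:
    components: dict = field(default_factory=dict)  # name -> Component
    connectors: list = field(default_factory=list)

    def cp_annotations(self, name):
        """Map a component (looked up by name) to its set of annotations."""
        return self.components[name].annotations

    def cn_annotations(self, connector):
        """Map a connector to its set of annotations."""
        return connector.annotations


model = Model()
model.components["Cinema Catalog"] = Component("Cinema Catalog", {"Service"})
model.components["Movie"] = Component("Movie", {"Service"})
link = Connector("Cinema Catalog", "Movie", {"RESTful HTTP"})
model.connectors.append(link)

# A fix could leave a note for the architect (hypothetical annotation text).
model.cn_annotations(link).add("review: replaced direct link")
print(model.cn_annotations(link))
print(model.cp_annotations("Movie"))  # set()
```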
Please note that we define many additional model traversal operations not detailed here for space reasons. All services, facades, and the corresponding connectors in the model are traversed, and it is checked whether services and/or facades support tracing or whether an intermediary component is present. If no intermediary component or tracing support on services/facades is found, the violation is raised and the list of all relevant connectors is returned by the detector operation.
Table 2 summarizes the possible violations we have identified for each of the decisions. The table also describes in detail how the algorithms that we use for detecting the violations in the models work. As a detailed example, Algorithm 1 detects the "Services communicate without using an intermediary component" violation of Decision D2. It returns a list of connected service pairs s_i and s_j that are not connected via an intermediary component.
Table 3 details all the fixes for each identified violation, along with a summary of the fix algorithm. Please note that many algorithms can only be applied fully automatically with their default values. Many of them require human review and a decision by the architect. For example, the architects can be presented with a choice of an intermediary component to use to replace service links.
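The idea behind Algorithm 1 can be sketched as follows. This is a simplified reading of the description above, not the published algorithm: the dictionary-based model, the function name, and the rule that any connector between two components stereotyped as Service counts as a direct link are assumptions made for illustration.

```python
SERVICE = "Service"


def detect_direct_service_links(stereotypes, connectors):
    """Return pairs (s_i, s_j) of services joined by a direct connector.

    stereotypes: dict mapping component name -> set of stereotype names
    connectors:  iterable of (source, target) name pairs
    """
    pairs = []
    for source, target in connectors:
        source_is_service = SERVICE in stereotypes.get(source, set())
        target_is_service = SERVICE in stereotypes.get(target, set())
        if source_is_service and target_is_service:
            pairs.append((source, target))
    return pairs


# Toy excerpt loosely based on the CI4 example.
stereotypes = {
    "Cinema Catalog": {"Service"},
    "Movie": {"Service"},
    "Booking": {"Service"},
    "API Gateway": {"API Gateway"},
}
connectors = [
    ("Cinema Catalog", "Movie"),
    ("Cinema Catalog", "Booking"),
    ("Booking", "API Gateway"),
]
violations = detect_direct_service_links(stereotypes, connectors)
print(violations)
# [('Cinema Catalog', 'Movie'), ('Cinema Catalog', 'Booking')]
```

The returned pairs mark the locations in the model where a fix, such as routing the link through an intermediary component, could be applied.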

Table 3 (excerpt): Violation Fix | Fix and Fix Algorithm Summary

D1: External API

[...] The architect has to select whether an existing intermediary component can be used for the fix, or a new one has to be created. Replace interconnections that don't support end-to-end tracing with interconnections via this component.

D3.V1.F3: Connect services and facades that don't support end-to-end tracing with a tracing component (e.g., Zipkin). The architect has to select whether an existing tracing component can be used for the fix, or a new one has to be created. Introduce interconnections from services and facades to the tracing component.
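A fix such as D3.V1.F3 could be sketched as a function that returns a refactored copy of the model. The dictionary layout, the function name, the default component name "Tracing Component", and the annotation text are assumptions for illustration; in the approach described here, the choice between an existing and a new tracing component is left to the architect.

```python
import copy


def fix_with_tracing_component(model, untraced, tracer="Tracing Component"):
    """Return a copy of `model` where each untraced service/facade is linked to `tracer`.

    model:    {"components": {name: set of stereotypes},
               "connectors": [(source, target), ...],
               "annotations": {name: set of strings}}
    untraced: names of services and facades without tracing support
    """
    fixed = copy.deepcopy(model)  # keep the original model untouched
    if tracer not in fixed["components"]:
        # Assumed default: create a new component and flag it for review.
        fixed["components"][tracer] = {"Tracing"}
        fixed["annotations"].setdefault(tracer, set()).add(
            "new component: needs review by the architect")
    for name in untraced:
        link = (name, tracer)
        if link not in fixed["connectors"]:
            fixed["connectors"].append(link)
    return fixed


original = {
    "components": {"Cinema Catalog": {"Service"}, "Movie": {"Service"}},
    "connectors": [("Cinema Catalog", "Movie")],
    "annotations": {},
}
refactored = fix_with_tracing_component(original, ["Cinema Catalog", "Movie"])
print(refactored["connectors"])
```

Returning a copy rather than mutating the input keeps every intermediate model available, which suits an exploration in which several fix options are applied to the same starting model.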

Example Application
In Figure 2 the model CI4 from Table 1 is shown as an illustrative example to demonstrate all three violations and possible fixes. In this model the Cinema Catalog service is connected directly with the Movie and Booking services, causing D2.V1 and D3.V1, while the Client is connected directly with the Cinema Catalog service, causing D1.V1. In contrast, the Booking, Payment, and Notification services are connected to each other and with the Client through the API Gateway, resulting in no violation. If we run our fix algorithms, some of the resulting refactoring suggestions are:

Iterative Application and Evaluation
To evaluate our work, we have fully implemented our algorithms for detecting violations and performing fixes, as well as generating the set of metrics described in Section 2 to measure the improvements and the presence of remaining violations, in our model set. In case multiple violations are present in a model, then the algorithms can be employed iteratively, until all violations have been fully resolved.
As an example, let us illustrate this exhaustive iterative refactoring for the previously mentioned CI4 Model (see Figure 2). CI4 violates all three decisions, as indicated by the corresponding decision-related measures in Table 4. The incremental refactoring process is illustrated in Figure 3. At the first iteration, there are three branches, indicating the respective violations. The first refactoring step produces 6 possible model variants, one for each fix option from Table 3. All resulting models have resolved the respective violation, but have the other two unresolved, requiring another refactoring step that produces 18 new model variants. In turn, 7 of the resulting models still violate D1.V1 and D2.V1, requiring a third step to be resolved. At the end of the third step, we have 29 suggested model variants (M1_1, M2_1, M2_3, M1_2_1-M1_2_2, M2_1_1-M2_2_2, M2_4_1-M2_4_2, M3_1, M3_2_1-M3_2_2, M4_1, M4_2_1-M4_2_2, M4_3_1-M4_3_2, M5_1-M5_2, M4_4_1-M4_4_2, M6_1_1-M6_2_2, M6_2_1-M6_2_2, M6_3_1-M6_3_2, M6_4_1-M6_4_2) which all fully resolve the violations (i.e., scoring 1.00 in our assessment scale). The architect can choose the refactoring sequence, and from among those final optimal model variants, but can also choose to not apply certain fixes, e.g. due to other constraints that are outside of the scope of our study.
For evaluation purposes, we have performed this procedure for all 24 system models in Table 1. The resulting number of intermediary models and violation instances per step, and the number of final suggested models with an optimal assessment of 1.00, are given in Table 4, along with the initial violations and architecture assessment values for each model. Please note that the metrics reported here are the ones associated with each of the decisions in Section 2. Please also note that for each violation to be fixed, it is enough that at least one of the corresponding metrics is optimal (1.00). Obviously, the number of steps required to reach optimal models depends on a) the number of violations present in the initial model and b) the possible appearance of new violations during the refactoring process, which did not occur in the present case. As can be seen in Table 4, all models are fully resolved (i.e., all assessment metrics are 1.00) after at most three steps.
Table 4: This table shows a) the architecture assessment (per decision/violation pair) of the original models used in our study, b) the number of models generated at each step of an iterative application of our algorithms, and c) the number of violation instances (generated models × violations per model) still remaining, or introduced, after each iteration, plus d) the resulting number of suggested (optimal) models at the end (cf. Figure 3 for a detailed example).
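The rule that a violation counts as fixed once at least one of the corresponding metrics is optimal (1.00) can be expressed compactly. The sketch below is an assumption-laden illustration: the function names are hypothetical and the metric values are invented for the example, not taken from Table 4.

```python
import math


def decision_resolved(metric_values):
    """A decision's violation counts as fixed if any of its metrics is 1.00."""
    return any(math.isclose(value, 1.0) for value in metric_values)


def model_is_optimal(assessment):
    """A model is a suggested (optimal) model if every decision is resolved."""
    return all(decision_resolved(values) for values in assessment.values())


# Invented metric values, keyed by decision, for illustration only.
before = {"D1": [0.5, 0.0], "D2": [0.0, 0.0, 0.0], "D3": [0.0, 0.0, 0.0]}
after = {"D1": [1.0, 0.0], "D2": [0.0, 1.0, 0.0], "D3": [0.0, 0.0, 1.0]}

print(model_is_optimal(before))  # False
print(model_is_optimal(after))   # True
```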

Discussion
To answer RQ1 we have systematically specified a number of decision-based violations related to each possible decision option, summarized in Table 2. As we have empirically shown in our prior work [9] that the metrics described in Section 2 can reliably distinguish between favored and less favored design options, the role of the violation detectors is to find the precise locations in the models where the violations occur. For each system model in our evaluation dataset it was possible to suggest fixes that bring the architecture to optimal values, meaning that the algorithms have found the right place(s) to apply the fixes.
Regarding RQ2 we defined a number of algorithms addressing every possible violation, with multiple fix options (cf. Table 3). If all options are tried out, this results in a search tree of possible architecture models, which can in turn be assessed, using our metrics, to measure improvements to the initial architecture and detect any remaining violations. We have shown (cf. Table 4) that an iterative approach results, within a few steps, in a sufficient variety of possible architecture models that remove all detected violations and ensure pattern conformance of the system architecture. The multiple optimal model variants that result from our approach give architects substantial levels of freedom in their design decisions. As detection is fully automated and human expertise is limited to the fix process, the approach is well suited to be run in a continuous delivery environment, which was one of our research goals.

Threats to Validity
The basis material of our study derives from third-party sources: the solutions we propose are gathered from the best practices recommended in the published literature, and our evaluation dataset is a fairly representative set of systems (cf. Table 1), derived from nine different sources and published with the express purpose of demonstrating microservice architecture features. One possible threat to the internal validity of our algorithms is that they depend on the particular modelling approach we have adopted. However, our approach is by design abstract and generic, based on typical component-and-connector models used widely in the literature. The author team, with considerable experience in modeling methods, performed the system modeling and repeatedly and independently cross-checked all models. As the main modelling criterion was the ability to adequately represent the context of our systems, we cannot exclude that other teams might arrive at different interpretations, but we are confident that any resulting models would be broadly similar and compatible with our results. Furthermore, the algorithms we specified could easily be adapted to a different model, as they operate on the level of basic architectural constructs.
Nevertheless, some limitations remain. In order to remove the obstacles posed by the polyglot nature of microservice-based systems, we have chosen to apply our metrics and tools at a relatively high level of abstraction. We also limited our evaluation in the present paper to the patterns, metrics, and concerns applying to the given three ADDs, which in a real-world architecture would be insufficient. This point is addressed in previously published and ongoing parts of our work, which extend the coverage to additional ADDs, and aim to extend and test our approach on a larger set of patterns, design requirements, and more granular parameters. The same concern applies to the lack of evaluation of the applicability of our approach on larger and more complex systems that are commonly found in industry, but which were not accessible to us for study. The lack of full automation is also a major obstacle to practical application, as the process still requires considerable input by the architect. At the same time, our approach cannot match the ability of an experienced architect, familiar with the system, to devise a better solution. This is a limitation of all generic architecture assistance approaches, and one we intend to improve on. We want to emphasize that the present approach is a starting point from which the question of evaluating and improving microservice architectures can be examined, facilitating and building up to more complex and nuanced methods as more systems and decisions are modelled and tested. The generated models are also not optimal in every respect, as they are not evaluated, for example, on the coding/refactoring effort required to implement them.
Nevertheless, the existence of a semi-automatic approach that detects and analyzes violations in an architecture remains of great value, since practitioners often ignore best practices, systems are often developed without a conscious effort to follow best practices, or are allowed to drift from the original architecture specifications over time.

Conclusion and Future Work
In this paper we present a set of violations for three microservice-related ADDs. Building on previous work, we have defined automatic detectors, which return the location where the violations occur, a set of possible fixes for each violation, and automatic algorithms for refactoring the system in order to fix the violations. We have evaluated our approach on a set of 24 models of various degrees of pattern violations and architecture complexity, and have shown that our approach is capable of resolving these violations in at most 3 refactoring steps. Both metric calculation and violation detection are fully automated, but the choice of fixes and refactoring sequence remains with the human architect. Thus the approach is still flexible enough to let the architect make meaningful architectural design choices.
In our future work, we aim to broaden the set of ADDs and violations included in our approach, enrich it with runtime metrics and other architecture aspects such as deployment environments, and extend our model dataset to include larger and more complex systems. In addition, we hope to experimentally validate our approach by employing it in real-world delivery pipelines as part of a feedback loop.