From OpenAPI Fragments to API Pattern Primitives and Design Smells

In the past few years, the OpenAPI Specification (OAS) has emerged as a standard description language for accurately modeling Web APIs. Today, thousands of OpenAPI descriptions can be found by mining open source repositories. In this paper, we attempt to exploit these artifacts to extract commonly occurring building blocks used in Web API structures, in order to assist Web API designers in their modelling task. Our work is based on a fragmentation mechanism, that starts from OpenAPI descriptions of Web APIs to extract their structures, then fragment these structures into smaller blocks. This approach enabled us to extract a large dataset of reoccurring fragments from a collection of 6619 API specifications. Such fragments have been found multiple times in the same or across different APIs. We have classified the most reoccurring fragments into four pattern primitives used to expose in the API access to collections of items. We distinguish for each primitive variants from design smells. This classification is based on the specific combinations of operations associated with the collection items and on an in-depth analysis of their natural language labels and descriptions. The resulting pattern primitives are intended to support designers who would like to introduce one or more collections for a specific class of items in their HTTP-based API.


INTRODUCTION
Application Programming Interfaces (APIs) open up software architectures so that the resulting software system can be integrated with external systems, developed at different times by different parties.In this paper, out of many existing kinds of APIs, we focus on Web APIs [3,14,17,18,21,24] remotely accessible through the HTTP protocol and described using the standard OpenAPI specification language [22].We do so because of the large number of API descriptions using this language which can be retrieved by crawling open source repositories (Figure 3).While the original purpose of Ope-nAPI was to generate human-readable documentation, it can also be used to generate interactive test clients, as well as client-side and server-side stubs [1,2,11].
In this paper, we statically analyze a large collection of real-world API descriptions looking for recurring structures that can play the roles of pattern primitives [25] which can be composed to obtain API design patterns [28].In particular, we are interested in the resources exposed by the HTTP API naming and in the relationship between resource paths and the corresponding HTTP methods.This information can be used by clients to invoke the corresponding operations.
As shown in Figure 1, we started by crawling open-source code repositories for API description documents that use the OpenAPI specification.These documents are parsed and fed to a model from which API structure trees can be extracted.These trees are then fragmented, and the resulting fragments are matched to detect reoccurring ones.Finally, they are clustered to obtain known uses.
Our contributions include: (1) A method to detect similar reoccurring API structures, which takes into account natural language labels associated with each path segment.This method can be also used to compare the structure of different Web APIs.
(2) A collection of widely used API fragments, with a quantitative analysis about how frequently they occur across the same or different real-world APIs.
(3) A collection of structural pattern primitives which have been used as building blocks for HTTP-based APIs.In particular we selected API structures used to provide API clients with access to resource collections of related items (e.g., user accounts, purchase orders and their items, computational jobs, blog posts and their comments, videos or audio tracks).
(4) A classification of some design smells found across Web APIs.(5) Two composition operators for assembling the pattern primitives into larger API structures and a proposal for connecting them with API design patterns and service contracts.
The remainder of this paper is structured as follows: In Section 2, we first give an overview about the data set of OpenAPI documents under study.Then we present our approach that consists of representing APIs as trees based on their textual documentation.We finally detail the fragmentation approach we followed in order to extract common frequent structures that can be found across same or different APIs.Section 3 presents our two-step clustering approach, which takes into accounts both the structure and the semantic closeness between the labels sequences attached to the nodes of a specific fragment.The collection of structural API pattern primitives is presented in Section 4. Section 5 provides two examples of how these pattern primitives can be combined to form larger API structures.It also demonstrates how the pattern primitives and fragments can be mapped to architectural patterns and interface description languages.Section 6 and Section 7 cover related work and present the threats to validity of our work.Finally in Section 8, we present our conclusions.

FRAGMENTING APIS 2.1 API Collection Overview
We analyze technical API descriptions written in OpenAPI, a commonly used Interface Description Language (IDL) to specify the functional characteristics of HTTP-based and RESTful APIs, as well as selected non-functional ones (for instance, some security policies).
By mining public repositories shared on GitHub during December 2020 and January 2021, we collected a data set of 6919 API description documents, with an average size of 22.8KB, including specifications of some well-known APIs such as Twilio, Slack, Flickr APIs, Google APIs, and Amazon APIs.All the APIs described in the descriptions under study contain at least one method and one path.Because of the lack of space, in this paper, we only include one visualization of one of the largest API in the dataset (Figure 9), and other examples of smaller APIs (Appendix A) to show how the detected primitives are used as building blocks to construct the whole API's structure.
The yearly distribution of the age of the OpenAPI documents in our dataset is depicted in Figure 3.The horizontal axis refers to the year of the last commit that updated the document.
2.1.1OAS versions distribution.The collection studied in this work contains 6619 OAS descriptions.28.9% are written in OpenAPI 3.0 and 71.1% are written in Swagger 2.0, coming from more than 600 different providers.
The two versions are slightly different from each other on the content level, but they both allow describing API structures with almost the same level of granularity.In Figure 4, we describe the main differences between the two versions.The numbers (1), (2),.., (9), show the mappings between the sections of a description written in OAS 2.0 and their corespondents in a description document written in OAS 3.0.The first difference is in the servers details section.While in OAS 2.0 it was possible to include only one endpoint for an API, in OAS 3.0 it is allowed to include multiple server objects.Other structural rearrangements have been done in OAS 3.0 in order to increase the reusability of definitions, such as the inclusion of the Components section, where securityDefinitions, schema definitions, parameters, and responses are defined.In addition, in OAS 3.0 a Component object can also contain callback descriptions, which makes this version more efficient in describing asynchronous APIs.Moreover, OAS 3.0 improved the description of the parameters and supports more security schemes and bearer formats than OAS 2.0.
In our case, these differences between the two versions do not impact the results of the structural analysis and the APIs fragments extraction, because our study focused on the paths and methods provided by APIs, which are described in both versions.
2.1.2HTTP Methods usage.In Figure 2 we show an overview of HTTP methods usage in a subset of the APIs under study.We classify APIs based on which HTTP methods they use following the RESTful maturity model [7], which distinguishes L0) APIs that use only one endpoint and one method from L1) APIs that use multiple endpoints and still one method associated to each endpoint, and L2) APIs which use multiple methods with multiple endpoints.Given the lack of support for describing hypermedia in OpenAPI documents, we are unable to distinguish the highest level of the maturity model L3, which includes the APIs that make use of the REST principle Hypermedia as the Engine of Application State (HATEOAS).Still, we can clearly see different types of APIs emerging if we simply count how many HTTP methods are associated with each path enumerated in the API description (Figure 5).We have grouped the APIs into sets according to the HTTP method combinations they use and depicted the results in the bar chart in Figure 5.The most popular group makes use of the CRUD-like primitives of GET, PUT, POST, and DELETE.The second most popular group only uses the read-only GET method.This is closely followed in terms of size, by the APIs which use only the GET or POST methods.Another group of similar size can be observed by combining CRUD APIs which do not use the PUT method (so they alias update and creation operations under the same POST method) together with APIs which instead of using the PUT method they replace it with the PATCH method.The next group includes the pure RPC APIs, which only use the POST method.The last group worth mentioning is the ones that use all five methods, which includes 442 APIs.The collection also includes about 500 APIs with different method combinations, but of rather a small size.

Domain Concepts
In this work, we focus on analyzing the structures of Web APIs with the goal of detecting APIs with similar structures.Due to the granularity of API descriptions in OpenAPI models, we could a create tree model representation for each API in the collection, which we call from now on API Tree.For lifting the level of abstraction of the tree model, we unlabel all its nodes.We refer to the unlabeled tree model as API Tree Structure.
In our analysis, we aim to detect repetitive tree fragments in the API tree models.For that, we define an object called API Fragment, a subtree of an API tree.As for an API, the fragment also has an unlabeled version, which we call henceforth Fragment Tree Structure.After matching and filtering the set of API tree structures extracted from the whole API collection under study, we obtained a list of API structure primitives and another for API Structure smells, as described in the domain concepts summary of Figure 7.

Representing the API Structure as a Tree
2.3.1 API Tree Model.We transform the textual documentation related to the resources and the methods supported by the API into a tree data model, to represent the nesting relationships between the API endpoint URIs, enumerated as paths in the OpenAPI specification.This model has two purposes: (1) It can be used to visualize the functional characteristics of the APIs graphically, to provide a quick overview supporting the understanding of the APIs structure.
(2) The second purpose of this tree data model, described in Figure 8, is to help to rapidly spot commonly used patterns by analyzing reoccurring fragments found within a large set of APIs.The elements colored in gray in Figure 8 are the ones being mapped to graphical notations for being visualized in the API Tree representation.
An example of an API Tree model visualization is in Figure 9.Each API operation, originally listed in the OpenAPI file, can be enumerated by following the path from the root until reaching a leaf of the tree.These later represent the HTTP methods, enumerated in each path in the Open API description.The nodes within the tree are labeled with the corresponding URI path segment and labeled depending on the type of the path segment (Table 1).The types of nodes are explained in Section 3.1.1.
Due to this graphical representation we can also visually detect a repetitive usage of an API Fragment in an API.This same fragment can also reoccur in other APIs, with different labels.
In Table 1, we summarize the notation used in our APIs Tree visualization.

Name
Notation Signification Root R The root of the API Tree.

Method
The HTTP methods, where each method has a specific color.

Static path segment
Path segment with no parameter Parametric path segment Path segment with single {pa-rameter} Complex path segment Path segment mixing parameters with static labels 2.3.2OpenAPI to Tree model transformation.For explaining the model transformation, responsible of producing the tree visualization of the OpenAPI descriptions, we use the description example in Listing 1, which is an excerpt extracted from the OpenAPI description of Apacta Web API whose API Tree is shown in Figure 9.
The path /cities only contains one path segment {1:cities} labeled cities.In our transformation, each segment is transformed to a PathSegment object (Figure 8).We always connect the first path segment to the Root object R , an added graphical element which helps to visualize the API model as a tree.The {1:cities} path segment has no in-path parameters, thus it is mapped to the static path segment notation ( Table 1).As a result, the obtained first portion of the tree is R cities , where we label the path segment node 'cities'.Moreover, the PathSegment object contains fields holding some original information such as the summary, description and the parameters information, for further usages.In this study, we only distinguish between paths that are having in-path parameters This path provides only one GET operation, which allows to get a city by its zip code.The HTTP methods are transformed to the Method object, which also keeps most of the original information about the method, such as the summary, description, and the response details.This Method object is mapped to the graphical ...
Listing 1: Excerpt from the OpenAPI description of the Apacta API shown in Figure 9 the notation should be GET .And as a result, the whole path visual representation is: R cities GET Once all the methods of a path are all transformed, the algorithm jumps to the next path and start extracting the path segments, and put them in a list, respecting their original order.In our example the second path is /cities/city_id.It contains tow path segments { 1: cities, 2: city_id }.The path segment { 1: cities } has already been created.Knowing that each next path segment is a child of the previous one, the path segment {2: city_id} should be then connected to { 1: cities }, which is already created and added to the tree.This new node is also mapped to the PathSegment object, and more specifically to the ParametricPathSegment object, which is associated to the notation: .As consequence, the tree becomes : Looking at the tree model visualization in Figure 9, we can notice this same portion of the tree, constructed from the example in Listing 1, appears multiple times with different labels.For computing exactly how much frequently, a specific structure of a tree fragment appears in the set of APIs in our collection, we proceed to apply the fragmentation and matching technique presented in Section 2.4.

API Fragments
An API Fragment is any sub-tree (a connected sub-graph that includes some of the leaves of the original tree) of an API tree structure.A sub-tree is also a tree, therefore a fragment can be also seen as an API itself, which can be further decomposed.For instance, the excerpt in Listing 1, is an example of an API fragment extracted from the Apacta API (Figure 9).
To achieve our goal of detecting recurrent fragments in the API structures, we present a two-step approach that uses an algorithm that first extracts significant model fragments from a dataset of APIs models, and then compares them across multiple APIs to detect recurring ones.

APIs fragmentation approach.
A tree  is a non-linear data structure, where each non-leaf node can be seen as a root of one or many sub-trees.The goal of the fragmentation function  : →   1 ,  2 ,..,   is to extract all the possible sub-trees   1 ,  2 ,..,   containing a sub-set of the ensemble of leaves of  .For collecting the nodes we walk using Depth-First Search (DFS).In this way, we can extract all the trivial sub-trees, which are the ones having as root the different nodes of .The algorithm extracts also non-trivial sub-trees, which are built by extracting all the branches of a sub-tree having as root a node  , then reconstructing the Tree Structures from all the possible combinations of the branches.Note that a branch starts from the root of the tree, and keeps all the methods attached to the deepest path segment node of the tree.Doing so, we obtain all the possible sub-trees having as a root the node  .Once a sub-tree is retrieved, it is serialized as JavaScript Object Notation (JSON) and stored in a MongoDB database.The same process is repeated over all the nodes of  until no node is left.
We analyse each API description in the collection and extract  1 , 2 ,..,  , where  is equal to the size of our OpenAPI descriptions collection.Then we apply on each tree to extract all possible labeled sub-trees, or labeled fragments  which include a subset of the leaves of the tree from where it was extracted  .While labeled fragments carry the original path segments labels, unlabeled fragments   only distinguish whether a path segment is parametric or not, and if it contains an unusual label.The leaves of both labeled and unlabeled fragments refer to the HTTP methods which can be applied to the corresponding sub-path.
We extract all fragments from all API trees in the collection and look for reoccurring ones.To speed up the process, we first match unlabeled fragments based on their topology, then we further compare the semantic similarity of labeled fragments sharing the same structure.To do so, we project the labeled fragments    into Label Sequences which enumerate the labels found during the traversal of each node of the API fragment tree.In other words, we apply the projection function  :    → (   ,  ), to obtain for each labeled fragment a Tree Structure    (also called unlabeled fragment   ) and a Labels Sequence   .All the resulted output objects are also serialized as JavaScript Object Notation (JSON) and stored in a MongoDB database.

FRAGMENTS CLUSTERING 3.1 API Fragments Clustering and Selection
Having obtained the set of all labeled API fragments, which in our collection corresponds to 277'094 entities, we proceed to remove duplicates and cluster them.
For clustering the fragments, we followed a two-step similarity checking approach, which consists of exact topologies matching and labels closeness similarity scoring: (1) first by their common structure (i.e., the unlabeled fragment), (2) then, we compute the average label semantic similarity for each cluster of fragments sharing the same structure.
The output of structural clustering consists of a set of clusters where the elements of each cluster share the same API Structure, using different labels.We give higher priority to the larger clusters (more than 40 elements), knowing that the size of the cluster reflects how common is a specific structure.These Known uses are then considered as candidate structural pattern primitives.The goal behind semantically comparing the fragment sharing the same structure is to find out if there is a common use context of a highly recurring API Structure.

Structures matching.
In our approach, we see an API fragment as a sequence of labels  placed on the nodes of a Tree Structure  .Where a node of a   can be either a path segment or a leaf representing an HTTP method.During our analysis, we decided to distinguish between three types of path segment: segments containing a parameter, noted as single word label between { }, segments that are not containing a parameter, and segments holding labels  with more complex parameter notations, such as the example in Figure 10, which occurs 222 times.In our comparison approach, we consider the type of the path segment as part of  .Thus, API Fragments in Figure 11 and Figure 10 are detected to be distinct since the first structural clustering step.In this way, we already distinguish fragments, which even if they have the same tree topology, have parameters in different positions along the tree.Following our fragments   comparison approach, we extracted, from a set of 277'094 labeled fragments, 79'728   unlabeled fragments sharing the same tree structures, considering also the type of path segment node, and the types of the HTTP method in the leaves.

Semantic closeness.
Oftentimes, path segment labels carry some semantic meaning related to the resource handled by the path.For that reason, we considered taking into account the labels of the fragments nodes.Doing so, we involve the semantic context and have a better understanding of the common usage contexts of a specific fragment.
In our two-step similarity checking approach, we first clustered the fragments by their  , then extracted all ordered sequences of node labels found for each   of the labeled fragment (Figure 12).Doing so, we obtain a collection of labels sequences for each  .The size of the sequence is equal to the number of nodes of the  , excluding the leaves.
To compute the similarities between the labels sequences, we use spaCy 1 , an open-source library for Natural Language Processing (NLP) in Python and Cython.In our case, we use a spaCy's trained model for English language [20], using the latest version of the "en_core_web_md" model package, multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl for spaCy.
We distinguish the following types of labels: (1) Single words (i.e., stream, details, etc as in the example fragment in Figure 11), (2) composed labels, which concatenate single words using camelcase, or a "-" or a "_" symbol (Figure 42), (3) unusually long, complex labels (i.e., #x-amz-target=codedeploy 20141006deleteapplication), We added a formatter to the spaCy's processing pipeline in order to cover the different labels types cases we have.We also added a filter at the end of the pipeline, which has a goal to exclude the labels that could not be matched to any semantic concept.
We define the distance between two label sequences  = { 1 ,.. Doing so, within each   cluster, we measure the semantic closeness of each sequence by calculating a similarity score between the labels attached to the same nodes of the tree.This score consists of the distance between the vectors representing the labels sequences of each fragment.Using Agglomerative Hierarchical Clustering, we obtain the semantic clusters for each set of structurally similar fragments, by setting a threshold depending on the similarity score distribution in each   cluster.

Labels Similarity Results
While we do not have space to include the complete clustering results, we summarize the results with five metrics (Table 2): (1) The average distance between each couple of sequences: the goal of this metric is to depict how much are each two labels sequences are alike or similar.A low average means that most of the labels sequences are composed of semantically close elements.
(2) The median of these distances: the median gives an idea about the distribution of the distances.A high median means that the majority of the labels sequences are not semantically close.Table 2 shows that label sequences in different collections of fragments are semantically similar.For each primitive, we will provide detailed examples of labels associated with each variant/smell in the next Section.

STRUCTURAL API PRIMITIVES
Out of the results obtained from the fragmentation and clustering process, we selected a set of most occurring fragments and classified them to four primitives (Figure 14), depending on their functionality based on their structures.The context for all pattern primitives is the same: a designer needs to use an HTTP-based API to provide access to a collection of items which are stored on the server.
All the primitives are used to expose in the API collections of items, where each collection if identified by a statically-named container resource and its items are dynamically addressed within the container resource.We distinguish each primitive based on which combination of HTTP methods are attached to the container resource.

HTTP methods
The Enumerable Collection (P1) primitive is used when clients can use the API to only discover the content of the collection by retrieving a list of their items.The Appendable Collection (P2) primitive makes it possible for clients to only append items into the collection exposed by the API.The Collection (P3) primitive combines both features of the Appendable Collection and the Enumerable Collection, so that clients may use it to both append new items and list existing items.Since this primitive is the most commonly found one, we choose to name it with the simplest and shortest name, while adding qualifiers to the names of the other Within each primitive, we have collected many variants and design smells depending on which combination of HTTP methods is attached to the collection item resource.
In Figure 13, we provide a more detailed overview showing for each primitive the corresponding variants and design smells.Each variant and design smell of the same primitive are encapsulated in a gray frame.We also show how each variant can be obtained by changing another one with the black and gray arrows.The black arrows trace the paths that allow moving from a structure primitive to another by adding an operation on the items of the collection.And the gray ones are showing which methods are added to the container resource.In the rest of this section we present overviews focused on each structural primitive.
During our analysis, we have also detected some structural design smells, which we highlight in Figure 13 with colored frames.We classified the detected smells into the following categories: Create without Delete: API structures that allow the clients to create elements from a collection, but do not provide a possibility to delete elements from it.
Delete without Create: API structures that allow the clients to delete elements from a collection, but do not provide a possibility to append elements to it.
Ambiguous POST: API structures that contain a POST operation on the items of a collection.Is this POST method used to append items to the collection?Ambiguous PUT: API structures that provide a PUT operation on the collection.Is this PUT method really used to update the whole collection?
Write-Only: API structures that have no read operation neither on the whole collection nor on its items.In Table 3, we show an overview of the selected collections of fragments, by listing their occurrences and the number of unique labels sequences used by the same structures in the same API or across different ones.We also give an example of the most distinct sequences fount among the unique labels sequences, in order to show the extreme use contexts for each structure.
The rest of this section details each of the selected primitives where we present for each primitive the different occurring variants.For each variant of each primitive we filtered the most frequent labels used by all the variants of a primitive, and sort them alphabetically to ease the readability of the heatmaps (Tables 5, 7, 10, and 12).In the Figures, we show the occurrences (counting how many times a Labels Sequence is used for the same  ) of each cluster of labels in a specific variant/smell.
We also provide a set of guidance tables, based on known uses.The listed labels are obtained by clustering the Labels Sequences by the container resource label, as explained in Appendix D.
The goal is to support designers who would like to introduce a collection for a specific class of items in their API.They can take advantage of the observations we have collected as they attempt to look up the collection label and see if there is a non-ambiguous mapping to a given primitive variant.
In order to give an idea about the yearly distribution of the variants ages and popularity, we calculate the number of APIs in which a specific variant appears (Tables 4, 8, 9, 11).

Enumerable Collection (P1)
Summary.Expose an enumerable set of items within their own container resource.
Problem.How to make the collection items discoverable by clients?
Solution.Provide a unique address for each collection item.Allow clients to read the content of each items applying the GET method to the address of the item.Group together related items under the same resource path prefix.And, allow clients to enumerate the items within the collection by applying the GET method to the container resource.In Table 4, we can clearly see the increasing usage of the Variants and Smells in the API collection over time.This increase can be both because of the yearly distribution of the API specifications gathered in our data set, and to the popularity the structural primitives gained through the years.
• Enumerable Collection Variants For the Enumerable Collection primitive we have identified 2 variants and 3 design smells (Figure 15).

GET (P1.v1
).The read-only variant is one of the most occurring structures, which allows clients only to enumerate the content of the collection and to read the corresponding items.APIs use it to publish one immutable set of related items.By setting a threshold of 5 obtained 218 Labels Sequences clusters.Which depicts the variety of usage contexts of this variant.
According to the whole labels sequences set that we extracted, we noticed that this read-only structure is widely used for different domains.In Table 5, we show some of the labels clusters used by this variant.We can notice that all the most frequent labels in the Enumerable Collection (P1) are used by this variant, except 3 ones:  and and , which are used by the variant GET/PUT/DEL (P1.v4) which allows also to update and delete the items of the container resource, and , which is only used by GET/PUT (P1.v2).
An example of API where this structure primitive is present several times, in the Apacta API showed in Figure 9.In this API we can see clearly the high occurrence of GET (P1.v1) with different labels, combined with variants of other primitives.
Size: 4 -Occurrence: 1588 -Distinct Labels: 744  ).This variant allows clients to use the GET and PUT methods on the collection items.This makes it possible to read and update the content of individual collection items.This API structure also appears in Apacta API (Figure 9). Figure 43, is a use case example of this API structure.The GET operation in the resource handled by the path /users/id/topics allows the client to get all the topics of a specific user.The get operation in the path /users/{id}/topics/{topic_id} has as goal to verify if a user is following a specific topic.The response is an object of boolean type.In this case, the PUT operation is for interpolating the FOLLOW / UNFOLLOW relationship between the user {id} and the topic {topic_id}.
Size: 5 -Occurrence:99 -Distinct Labels: 43 PUT/DEL (P1.v3).The particularity of this variant is that it allows to both update and delete items, however, it does not allow the client to create new items in the collection by using the container resource.Instead, it still allows them to do so by invoking the PUT method directly on the items to be created.In this case, clients themselves should provide the identifiers for the items to be added to the collection.
Size: 5 -Occurrence: 40 -Distinct Labels: 26 This API structure appears one in the TVmaze user API showed in Figure 42.Where it is used for reading the collection of shows, and deleting or updating each.Another API example where this structure appears in Invotra API (Figure 48).In this case, adding a new user to the users' collection is possible due to the POST operation of the path users.However, it seems that the client is not allowed to add new user memberships to a specific team.While, according to the descriptions of the operations, it is possible to remove a user's membership of the team or update information about his team membership.Then, how can a team have new members?In this case, the user object schema is having a teams property of array type.Thus, adding a new member to a team is performed by means of the PUT operation provided in the path: /users/{userId}.GET/PUT/DEL (P1.v4).The main characteristic of this structure is that it in addition to the GET and PUT methods it also exposes a DELETE method on the collection items.This way, clients can not only read and write the associated content but can also remove items from the collection.Size: 6 -Occurrence: 176 -Distinct Labels: 71 While in general, it can be useful to allow clients to remove items from a collection, it is not clear whether an API should support this for collections whose content can only be enumerated without providing the means for the service to mint identifiers for new items.Instead, new items can only be added by clients as long as they provide the new item's identifier.
While this can lead to crashes when multiple clients attempt to invoke the PUT operation on the same item, we have observed different semantics for the PUT and DELETE methods.For example, some APIs use the DELETE method for something different: task cancellation.In this case, we assume that the tasks being performed within the server can be monitored by clients and when necessary can be interrupted.
For a better understanding, we have extracted the content of the description field of the DELETE method.
Looking at the descriptions of the delete method extracted from the OpenAPI documents in Table 16 (Appendix B), it is clearly understandable that the DELETE operation is not always meant for clients to delete an item from the collection.
More in detail, in the description D-40, the DELETE method is allowing the client to delete a person from the list of followed people, but no append operation is provided.An example of an API where this fragment appears is in Figure 42.In this API this example, we look at the fragment with labels sequence  =,{_ }, in which we can notice that the following operation is done through the PUT method.In this case, when following a person, this new followed person is not appended to a collection of followed people, but instead, the followed person is updated through the PUT operation with the information about a new follower.Ambiguous POST As opposed to updating the content of individual items of the previous variants, in this variant, the API makes it possible to fetch the current state of each item with GET and invoke some arbitrary operation on each of them with POST.
Size: 5 -Occurrence:77 -Distinct Labels: 22 Coming back to the OpenAPI descriptions of the APIs where this variant of fragments appears, we extracted the content of the summary and description fields for the POST method, which we list in Table 15 (Appendix B).Based on the descriptions, we can detect two main use cases for the POST methods on the collection items: (1) Appending an item to the collection: in this case placing the POST operation over the collection items can be seen as a common mistake.
(2) Updating an attribute of an existing item: in this case the POST is mistakenly used to perform the role of the PUT method.

GET/DEL (P1.s2).
Delete without Create This smell provides access to a collection whose items can be read and deleted, without offering clients the possibility to append new items.
This smell only appears with 26 distinct labels.In Figure 5, we can see that it appears 21 times out of 56 with labels represented by the label .This same label appears only once with the GET (P1.v1) variant.Other labels found in conjunction with this smell (e.g., tasks, jobs) would indicate uses for providing access to serverside resources which can only be monitored and eventually removed by clients, which do not have any control over their lifecycle.

• Appendable Collection Variants
The common point between the variants of this primitive is that they all only allow the client to append on a collection, and to perform different operations on the items.Starting from the variant that allows all of GET/PUT/DEL operations, until the one that only allows reading the items.For this primitive, we have detected a design smell, where the client is not allowed to perform any read operation, neither of the collection nor on its items.
GET/PUT/DEL (P2.v1).This variant allows clients full control over the items they have appended to the collection, as they can read, update and delete them.
Size: 6 -Occurrence: 194 -Distinct Labels: 64 For understanding the reason for the absence of a read operation on the collection, we extracted natural language descriptions of the GET operation.We want to verify whether the designers mistakenly considered that the GET operation on the item would serve also for listing all the content of the collection.In Table 19, we list some of the summaries and descriptions associated with the GET method.We can see from the descriptions that the GET operation is indeed used to retrieve specific elements from the collection.
Figure 45 shows and an example of use of this variant, where it is combined with the GET/DEL (P2.v2) variant.In this case, the PUT operation is used to update a sign in record.
GET/DEL (P2.v2).This variant only allows to read or delete individual collection items.It occurred 145 times, however with only 43 distinct Labels Sequences.
A concrete usage example of this primitive is in Passman API (visualized in Figure 46), an open-source developers API for Passman extensions.In the case of this example, the GET/DEL (P2.v2) variant is used in order to allow uploading and attaching a file to an item by means of the POST operation in the /file path.The client is also allowed to delete or get the content of a specific file, using, respectively the DELETE and GET operations allowed in the path /file/{file_id}.Another example is in Figure 45, where it is used beside the GET/PUT/DEL (P2.v1) variant, allowing to to create a team member (user) record, to retrieve the information associated with a user's account, and finally to delete a team member's user record.
Size: 5 -Occurrence: 145 -Distinct Labels: 43 GET (P2.v3).This variant only allows the client to add elements to the collection, and then read each one, but it does not provide the ability to edit or remove items.Collections featuring this primitive contain resources which are garbage collected on the server-side, such as jobs, queries, or sessions.Another example is the appendonly shopping cart in which clients can only add items without ever removing them.
Size: 4 -Occurrence:202 -Distinct Labels: 127 Write-Only Instead of a GET operation, this variant introduces a PUT.However, it does not occur as frequently as the variants GET/DEL (P2.v2) and GET/PUT/DEL (P2.v1).
We have analyzed the 50 occurrences to attempt to determine how such a write-only API fragment would work since it appears it is only possible to append new items, update or delete them.Indeed, no occurrence supports the ability to enumerate the content of the collection, nor it allows clients to read from its items.
Size: 5 -Occurrence:50 -Distinct Labels: 31 Write-Only Same as PUT/DEL (P2.s1), this variant does not provide the client the possibility of performing GET operations.Neither on the containers nor on the items.It only allows to append new items to the collection and delete them.
Such unreadable, write-only collection can still be useful, for example, to manage asynchronous jobs, or subscriptions or messages submitted into the API which can be only canceled from the clients.Since the collection cannot be enumerated, this works only if the address of the newly created items is returned to the client who created it using POST.
Nevertheless, we tag this variant as a smell, because of the strong limitations imposed by offering a write-only collection.

Collection (P3)
Also known as.Enumerable-Appendable Collection Summary.Use the container resource to enumerate its content and add new items.
Problem.How to make the collection items discoverable by clients?How to let clients add items to the collection?Solution.Group together related items under the same prefix.Allow clients to enumerate the items within the collection by applying the GET method to the container resource.Clients can use the POST method on the same container resource to add new items.

• Collection Variants
We present different variants featuring different method combinations on the collection item, starting from the one having four methods, all the way to fragments with a single method attached to the collection item.
In this primitive, we have detected two Design Smells (Figure 28), both are related to the Create without Delete smell.Even the simplest variants with only one operation on the item to delete or update them would appear to lack the ability to directly reading individual collection items.While this is the case, as opposed to the previously discussed Appendable Collection smells, clients can still fetch the content of the entire collection using the GET operation provided by the container resource and then extract the values for individual items from the result.Although this variant includes most HTTP verbs and thus is the most expressive in terms of which operations clients can perform on collection items, it is far from being the most frequently used in practice.An example of use of this variant is in ID Vault API ( Figure 47), where it appears 6 times.
Size: 8 -Occurrence: 328 -Distinct Labels: 159 GET/PUT/DEL (P3.v2).Fragments of this variant combine both the POST and GET operations on the collection.With more than one thousand occurrences, this variant (Figure 30) is the most occurring we have mined, not only within the variants of this collection but also among all the fragments having more than 3 distinct methods in their leaves.Several instances of this primitive can be found with different labels in the Apacta API (Figure 9).Size: 7 -Occurrence: 1123 -Distinct Labels: 574

GET (P3.v4
).This is the simplest variant of this collection.The client cannot perform any operation on the collection items, except to read their content.We have found examples of account collections, whose content cannot be modified by clients.Likewise, this is a common structure for long-running operations [15], which are started with a POST request used to transfer the input of the computation, while the status of the ongoing job and its result can be retrieved from the corresponding item.
Size: 5 -Occurrence: 323 -Distinct Labels: 168 PUT/DEL (P3.v5).This variant makes the client unable to individually read each item of the collection.However, it is possible to list them all, insert new items, delete or update them.Examples of such collections with unreadable items would contain simple items whose address indicating their identity and existence is sufficient to control their lifecycle (e.g., using the PUT operation to control the video or audio track playback).Likewise, to set the quantity of individual order line items or remove them from the order altogether one does not need to be able to retrieve any information about them.Also because such information can be fetched when enumerating the content of the entire collection.
Size: 6 -Occurrence: 169 -Distinct Labels: 84 GET/DEL (P3.v6).This variant only allows to read or delete individual collection items.This is one of the most frequently found variants, with a collection storing a wide variety of items.For example, once blog posts, comments, or questions are published, they cannot be updated but just removed.Likewise, it appears there is no need to update the ingredients of a recipe.

DEL (P3.v7
).This is a simpler variant, where it is possible to the client to list the elements of the collection and insert elements into it.Once the items have been added, it is only possible to remove them.In addition to bookings, this variant has been frequently used for collections of blog post comments, product reviews, or favorite bookmarks, whose content can be shown when retrieving the entire collection, but for moderation purposes, it may be necessary to be able to remove individual items.
Size: 5 -Occurrence: 201 -Distinct Labels: 87 Create without Delete This smell provides only one operation to update individual items of the collection, but lacks the affordance for deleting individual items.This is used with collections of items whose state should be controlled by clients, for example to configure or simply switch on or off devices, gateways or services through a management API.
Size: 5 -Occurrence: 78 -Distinct Labels: 48 Create without Delete Also, this smell does not allow clients to delete an item from the collection, however it allows them to insert items, read them and update them.
It has been used to design APIs which provide access to collections of users, user accounts, customers, employees, or withdrawals.These are resources which once they are created may need to be preserved forever for legal reasons, due to data preservation or retention regulations.
Size: 6 -Occurrence: 63 -Distinct Labels: 47   The two variants with PUT do not really seem to be used to replace the content of the entire collection (only a few exceptions).Instead, the PUT method is used to edit or update individual items, addressed by a query parameter that identifies the item to be replaced.(Query parameters are not shown in the figure).For verifying the purpose of use of the DELETE method over the collection, we have extracted the descriptions associated with that method in the OpenAPI specification of the APIs where that fragment occurs (Table 17).Indeed, in this case, the DELETE operation is used to delete to the whole collection.This "DELETE all" variant could be promoted to a separate primitive named "Erasable Collection".Size: 9 -Occurrence: 74 -Distinct Labels: 52 Ambiguous PUT This structure provides both read and append operations over the collection, in addition to a PUT operation.While the only operation that the client can perform over the collection items is a delete.For verifying the real purpose behind having a PUT operation over the collection resource we extracted in Table 20 the descriptions associated with this method in the OpenAPI documents.The text shows that, in reality, in almost all cases, the PUT is used to update an item of the collection.The address of the item is provided as a request parameter, as opposed to using the resource path as with most other primitives.Only in a few cases, it is actually used as one would expect, for updating the whole collection with a single batch operation to replace its content (Example D-13).
Size: 6 -Occurrence: 48 -Distinct Labels: 18 Ambiguous PUT The fragments of this variant combine all of GET, PUT and POST operations on the collection.An example of the most occurring fragment in this variant is in Figure 41.This fragment occurs 102 times with 56 distinct Label Sequences.
Also, in this variant the PUT method on the container resource is used mostly to update the content of individual items (Table 18), thus leading to some ambiguity as it should have been associated with the collection item resource.Figure 44 show an example of API where this variant is used twice.

FROM PRIMITIVES TO LARGER STRUCTURES AND API RESPONSIBILITY PATTERNS
This section gives two examples of how the mined primitives presented in the previous sections can be used during API design and API reviews.First, we discuss primitive composition.Next, we briefly outline how the structural primitives from this paper relate to previous work on API design patterns and interface description languages.

Composing Primitives
The basic collection primitives can be composed to form larger API structures in two ways: (1) Unrelated collections can be added to the API by adding the corresponding container resource on the same level as shown in the fragments of Table 13; (2) Related collections can be nested inside one another, by adding a sub-container resource within each item of the main collection, as shown in the fragment of Table 14.
In general, we found that both side-by-side composition and nesting can be used together in the same API.The Invostra API shown in Figure 48 is an example of an API entirely composed of two primitives.

Relation to Architectural Patterns and Interface Description Languages (IDLs)
In the patterns community, technology-and platform-neutral interface representation and service design patterns have been mined and published.The Microservice API Patterns (MAP) language [29], for instance, focuses on the design of remote APIs -including but not limited to service-oriented architectures.MAP has two categories that complement the API primitives and fragments carved out in this paper, structure [30] (of request and response message representations, not HTTP resource tree structures as covered in this paper) and architectural responsibility [26].
The HTTP methods found in the resource trees in Sections 2, 3, and 4 map to the MAP language as this: • HTTP GET methods are "Retrieval Operations" [26].
• HTTP POSTs can be "State Creation Operations" but also "State Transition Operations" (partial update variant) [26].The remaining operation pattern from MAP, "Computation Function", can be mapped to HTTP GET (if its request parameters are simple) or POST (if request parameters are complex).
The collection primitives that we derived from URI structures in Section 4 correspond to the MAP endpoint pattern "Information Holder" and its specializations [27]: • "Master Data Holders" expose many GET retrievals and only a few bulky create POSTs and update PUTs.They usually are enumerable, and also appendable (at least for certain clients).• "Operational Data Holders" typically are Enumerable and Appendable Collections, often also Mutable Collections.• "Reference Data Holders" are read-only and therefore Enumerable Collections.• "Data Transfer Resources" are Mutable Collections decoupling multiple application clients.• A set of related "Link Lookup Resources" forms a Collection as well; each item in such collection is mutable and so is the entire collection.
Microservice Domain-Specific Language (MDSL) [10] is an emerging abstract service contract language that exposes the Microservice API (MAP) patterns used above as decorators.As an example, let us model the example from Section 2 in MDSL (including the data contracts specifying request parameters and response representations, as well as error reports): API description ApactaAPI version "v1" overview "See previous paper sections and APIs.guru" data type CityCollection { "data":City*, "pagination":PaginationDetails, "success":Metadata<bool>} data type City { "created":Metadata<string>, "deleted":Metadata<string>, "id":ID<string>, "modified":Metadata<string>, "name":Data<string>, "zipCode":Data<int>} data type PaginationDetails { "count":Metadata<int>, "current_page":ID<string>, "has_next_page":Metadata<bool>, "has_prev_page":Metadata<bool>, "limit": Metadata<int>, "page_count":Metadata<int>} data type ErrorNotFound { "data": { "code":Data<int>, "message":Data<string>, "url":Link}, "success":Metadata<bool>} endpoint type CityEndpoint serves as REFERENCE_DATA_HOLDER and COLLECTION_RESOURCE exposes operation getListOfCities with responsibility RETRIEVAL_OPERATION expecting payload "zipCode":Data<string>? delivering payload CityCollection reporting error searchFailed "404": ErrorNotFound operation getCityDetails with responsibility RETRIEVAL_OPERATION expecting payload "cityId":ID<string> delivering payload City reporting error noSuchCity "404": ErrorNotFound In MDSL, such abstract service contract, possibly discovered via the the stepwise fragments and primitives mining approach established in previous sections, can be translated to not only to HTTP but also gRPC, GraphQL schemas, and other IDLs.The translation to an HTTP resource API is not straightforward and therefore benefits from an explicit binding (defaults exist, but usually are not sufficient a) to establish resource trees that contain parametric path segments and b) to map endpoints that contain more operations than the unified verb interface of HTTP can bear): API provider CityEndpointProvider offers CityEndpoint at endpoint location "http://tbc.tbc.tbc:8080"via protocol HTTP binding resource CityCollection at "/cities" operation getListOfCities to GET element "zipCode" realized as QUERY parameter report searchFailed realized as 404 with "Not found" resource CityDetails at "/cities/{cityId}" operation getCityDetails to GET element "cityId" realized as PATH parameter report noSuchCity realized as 404 with "Not found" The above specification snippets merely demonstrate that it is feasible to model HTTP resource APIs in MDSL and to leverage the primitive names from previous sections as well as MAP decorators while doing so.A thorough introduction to MDSL can be found in [10], and the language reference is available at https://microserviceapi-patterns.github.io/MDSL-Specification.

RELATED WORK 6.1 Model Clustering
In our work, we analyze and mine a large data set of real-world OpenAPI specifications.We extracted selected building blocks from these specifications, focusing on the resource URI tree structures and HTTP method verbs in particular.We cluster these model blocks both structurally and semantically.Hence we have the same goals as the authors of [4], but in a different context.This paper presented an approach for clustering models using n-grams in order to incorporate the structural context of the models in this task, and also to see the impact of using n-grams on the resulted clusters.They applied the approach on a dataset of Ecore metamodels collected from AtlanMod Metamodel Zoo using different sizes of n-grams, where they found that the clustering accuracy does not increase monotonically along with increasing the size of the n-grams.In our case, we involve all the label sequences extracted from fragments models in the semantic clustering task, except the names of the method, as we consider them part of the structure rather than being part of the semantic context.

Structural Analysis of Web APIs
Many works tried to statically extract structural characteristics and functional properties of APIs from their textual descriptions (both informal or formal documentation).Similar to our approach, in [8], the authors performed a structural analysis over 286 real-world publicly available API specifications (retrieved from apis.guru).The authors defined a set of quantitative metrics related to the resources and the HTTP methods supported by the API and identified the challenge of extracting recurring API fragments.
An earlier study [12] manually collected and analysed a set of 222 non machine-readable publicly-available APIs descriptions gathered from programmableweb.com by selecting randomly APIs from each category, hence covering 18% of the APIs listed on the website at that time.
From it, Maleshkova.M et al. [12] extracted some metrics such as type of Web API, input parameters, output formats, invocation details and if the API has complementary documentation.While [8], and [12] focused more on metrics extraction and providing static results about the sample under study, the authors of [13] performed an in-depth analysis in order to detect five REST design patterns and eight anti-patterns.To do so, they defined and implemented detection heuristics.Likewise, the authors of [19] also focused on verifying the compliance of REST APIs with REST constraints.However, the analysis presented in [19] was performed over a large data set of 78GB of HTTP requests corresponding to one full day of Mobile Internet traffic, collected by Italy's biggest Mobile Internet provider.This work reached the conclusion that only a few of the analyzed APIs comply with the best practices and constraints of the REST architectural style.[16] is another work questioning the design quality of web APIs, focusing mainly on REST APIs for cloud computing.They defined a catalog of 73 best practices in the design of REST APIs, in terms of understandability and reusability, starting from a literature review.And they applied it over a set of well-known APIs in the Cloud Computing area, where they found that Google Cloud follows 66% (48/73), OpenStack follows 62% (45/73), and OCCI 1.2 follows 56% (41/73) of their best practices.
While these works focus on evaluating the quality of the design of different Web APIs collections, our approach is one of the first systematic and quantitative studies to recover common structural design decisions adopted by Web APIs creators.
More patterns for various styles of distribution have been mined previously, including Messaging Patterns [9], Remoting Patterns [23], Patters of Enterprise Application Architecture [6], and Service Design Patterns [5].Many of these works discuss APIs, interface representation, and service design in the particular field of distributed system technologies and architectures they focus on.

THREATS TO VALIDITY
Basing empirical studies on resources collected for public repositories shared publicly on version control system is prone to error, because of the fact that not all what is shared is accurate.During our analysis, we encountered 2534 (38,28%) invalid OAS documents, which contain errors related to the conformity to the OpenAPI metamodel.In this work, we transformed the exact content of the OAS models into our API tree model, which also reflects the errors that may be encountered in the OAS document.Such errors may have reduced the number of occurrences of some structures, as the ones containing mistakes can't be matched to other error-free structures.
The study performed in this work is based on a data set of OAS documents mined from GitHub.To avoid having a biased data set full of duplicates, we only inserted in the OAS DB distinct documents.However, there can be forks that did not introduce changes to the structure of the API, which could have inflated the values of the occurrence and the popularity metrics of some fragments.
While not all OAS documents we found on GitHub describe Web APIs offered in production, including partially developed artifacts in our analysis can still provide evidence of reuse of common API fragments.

CONCLUSION
In this paper, we presented a data-centric pattern mining approach.We applied it to find recurring primitive structures within Web API descriptions.
To do so, we extracted recurring fragments from a large collection of OpenAPI specifications gathered from open source repositories.Reflecting the hierarchical nature of HTTP-based APIs, these fragments are represented as trees.These trees are built out of resource identifiers; they can be traversed to obtain all paths that are present in the original OpenAPI specification.The leaves of the API trees refer to the HTTP methods that invoke the corresponding operations.
From a population of thousands of fragments, we selected those that a) frequently occur, b) have a relatively small size, and c) are centered around the notion of resource collection.As shown in Figure 14, we distinguish the following cases a) collections only offer operations on their items or on the collection level as well, b) their content can be enumerated, new items can be created, and/or both enumeration and creation are supported, and c) batch removal and update are provided.
For every primitive, we presented a selection of variants together with the corresponding label clusters and, in some cases, descriptions associated with the operations.A few variants can be also seen as design smells, for instance, if they use the HTTP method semantics incorrectly or inconsistently.
Our results are a collection of pattern primitives, which can be (and have been) composed to build larger API structures.For instance, this becomes evident when connecting the syntactical patterns that we mined here automatically with semantic architectural patterns previously mined manually by knowledge engineers [26,27].We merely gave the first examples of primitive composition in this paper, and only outlined these connections; in our future work, we plan to investigate these topics more thoroughly.This work is partially funded by the SNSF, with the API-ACE project nr.184692.

APPENDICES
We attach to this paper four appendices with additional information regarding: • Tree visualizations of some real-world known uses of the pattern primitives variants in Appendix A, using the notation described in Table 1

A API TREE VISUALIZATIONS
This appendix contains visualization of some real world APIs selected as examples for this study.They are the APIs where some variants and smells of the primitives we have described in Section 4 can be found with high occurrences.

A.1 TvMaze user API
In the TvMaze API 2 we can find six occurrences of Enumerable Collection -GET/PUT/DEL Variant (P1.v4), combined with one read operation in the path /vote/shows.

B EXTRACTED DESCRIPTIONS OF HTTP METHODS
This appendix contains tables for some selected methods description, used to verify the Design Smells discovered in our study.We detect a design smell when the natural language descriptions associated with the API feature are not consistent with the standard semantics of the chosen HTTP method.

C API FRAGMENTS OVERVIEW
To give an overview of the API Fragments, we measure their size, their number of occurrences, and the number of unique label combinations they have.
Figure 49 shows that the larger the fragment, the less likely it is to reoccur multiple times.The largest fragments occur only once.Our constraints in the search of API fragments that can be used as pattern candidates is to both avoid unpopular fragments while also ensuring to find "interesting", large-enough fragments.
While it is difficult to see in the chart, each dot represents not only a fragment of the given size and occurrences, but since there can be multiple fragments in the same coordinates we also color each dot by the number of fragments found in that position.Most of the fragments crowd into the origin area and have few repeated occurrences.While the chart shows there is a lot of variability in the structure of APIs, it also shows that -this is an extreme case -there are four small fragments that reoccur more than 3000 times.
More in detail, we are interested to observe the actual labels on the API fragment since they give an indication of the API semantics by giving names to its features.Labeled API fragments provide potential known uses for the candidate patterns.ot include fragments with a single occurrence, thus the X-axis range is also reduced accordingly.

D LABELS USAGE
The goal this Appendix is to provide an additional overview about the labels usage through a set of guidance heat maps to help in plotting the most used labels for each structural primitive.
We calculated the occurrences of each label in each structural primitive.The labels are sorted first by their total number of occurrences, then alphabetically so that it is easy to find at the top of the table which are the most used labels in each primitive.The most used label for a specific variant is found by looking at the row which has the darkest color in the variant's column.
To know in which variant a specific labels sequences is used the most, it is enough to horizontally scan the row for the label looking for the highest value, ignoring the last column in which the total number of occurrences across all variants is reported.The heat maps only include the representative labels that we obtain using our labels merging approach described in Section 3.1.2.
We merged labels sequences with common container labels such as the ones in Figure 52.These labels are used in different variants of the Collection (P3) primitive and they share the same meaning, thus we decided merging them in order to have concise guidance tables.In our labels processing approach, we ignore the case of the container resource label (e.g :  is equivalent to  ).We also remove all the special characters and the spaces (e.g: , −, _ are considered equivalent).Moreover, ignore the singularity and plurality of the labels (e.g: _ is equivalent to ). In the example of Figure 52 the label  originally appears in different formats (e.g:  ,  ,   ), our merging algorithms pick the most occurring form as a representative label of all the forms.The reason behind this clustering is to give more insight about the labels usages and merge the ones that represent similar concepts in order to avoid redundancy.

D.1 Enumerable Collection (P1)
In this primitive, the GET (P1.v1) is the one that appears with the highest number of distinct labels.We notice that most of the labels are only used by this variant, while other few ones are used also by the other variants (occurs 202 times).This primitive is less frequently used.The intersection between the labels used with the "Remove All" variant and the two smells is rather small, resulting in the two labels: secrets and tags.

Figure 1 :
Figure 1: API Analytics Pipeline: From API Specifications to Patterns Figure 2: API Collection Sample (sorted by Number of Paths)

Figure 3 :
Figure 3: Yearly distribution of the age of the OAS files crawled from GitHub

Figure 6 :Figure 7 :
Figure 6: API Method Combination Overview vs. API Size the previous path, this path also provides only one Get HTTP method.So the API Tree corresponding to the whole OpenAPI description example is: Tree Structure for this API Tree is simply obtained by removing all path labels: R GET GET

Figure 10 :Figure 11 :
Figure 10: Example of a repetitive fragment with complex parametric path segments labels

( 3 )
The maximum distance between a couple of sequences.(4) The number of clusters that sequences were grouped by.(5)The threshold defining the maximum distances between all observations of two sets.This value was defines based on the the distribution of the values in the distance matrix.

Figure 13 :
Figure 13: Overview: API Structural Primitives and their variants and design smells

Figure 15 :Table 4 :
Figure 15: Enumerable Collection -Overview of Variants and Design Smells

Figure 26 :
Figure 26: Appendable Collection -PUT/DEL Design Smell (P2.s1)DEL (P2.s2).Write-Only Same as PUT/DEL (P2.s1), this variant does not provide the client the possibility of performing GET operations.Neither on the containers nor on the items.It only allows to append new items to the collection and delete them.Such unreadable, write-only collection can still be useful, for example, to manage asynchronous jobs, or subscriptions or messages submitted into the API which can be only canceled from the clients.Since the collection cannot be enumerated, this works only if the address of the newly created items is returned to the client who created it using POST.Nevertheless, we tag this variant as a smell, because of the strong limitations imposed by offering a write-only collection.

Table 12 :
Mutable Collection (P4) -Labels found in each variant/smell GET/PUT/DEL/PATCH (P4.v1).This variant provides only a delete operation on the collection.This variant occurs 74 times.It provides besides the DELETE and GET operations on the collection items, also PUT and PATCH operations.The second label for all the labels sequences is {} except one sequence  = { , { }}

Figure 43 :
Figure 43: Tree Visualization of Columba API

Figure 45 :
Figure 45: Tree visualization of an API for the COVID-19 Tracking QR Code Signin Server

Figure 47 :Figure 48 :
Figure 47: Tree Visualization of ID Vault API

Figure 52 :
Figure 52: Label sequences with container label "users" in Collection primitive

Table 2 :
Overview of distances between all the labels sequences of each primitive and its variants/smells.The smells are colorcoded.Design Smells:Create without Delete, Delete without Create, Ambiguous POST, Ambiguous PUT, Writeonly

Table 3 :
Known Uses of Selected Fragments: Number of Distinct Label Sequences (nDLS) and their Occurrences within APIs.

Table 11 :
Yearly distribution of the API specifications where the Mutable Collection (P4) variants and smells appear • HTTP PUTs are "State Transition Operations" (full replacement variant).
• PATCHes correspond to "State Transition Operations" (partial update variant).• DELETE methods are represented as variants of State Transition Operations.

Table 13 :
API Fragments Composing the Read-only Collection and the Collection primitives side by side

Table 14 :
API Fragments Composing the Read-only Collection primitives with nesting Size: 8 -Occurrence: 32 -Distinct Labels: 13Label clusters of sample known uses:(O: Occurrence, C:Cohesion) . • API Fragments overview in Appendix C, we plot the different correlations: Fragment size vs.Occurrences, Number of unique label combinations vs. Occurrences and Fragment size vs.Number of unique label combinations.• Textual descriptions of some HTTP methods in selected variants in Appendix B, extracted from the original specification of the APIs where a specific variant or smell appears.• Most frequent labels, for each primitive variants and smells in Appendix D.

Table 15 :
Extracted description of the POST method for Enu-

Table 20 :
Extracted description of the PUT method of variant DEL (P3.v7)

Table 21 :
Enumerable Collection -Most occurring labels D.2 Appendable Collection (P2) and  are the most used labels in this primitive.While most of the rest of the labels appear less than 10 times.GET/PUT/DEL (P2.v1) and GET/DEL (P2.v2) are very recurrent variants (occurrences respectively 194 and 145), however we notice that they are not used with a high number of distinct labels such as GET (P2.v3), which is the most occurring variant in this primitive.

Table 22 :
Appendable Collection -Most occurring labelsD.3Collection(P3)In this primitive, there is a dominant label which is.It occurs 223 while the second most occurring labels,    and ℎ_ are only used 61 time.

Table 24 :
Mutable Collection -Most occurring labels