Quality-aware analysis and optimisation of virtual network functions

The softwarisation and virtualisation of network functionality is the latest milestone in the networking industry. Software-Defined Networking (SDN) and Network Function Virtualization (NFV) offer the possibility of using software to manage computer and mobile networks and to build novel Virtual Network Functions (VNFs) deployed on heterogeneous devices. To reason about the variability of network functions, and especially about the quality of a software product defined as a set of VNFs instantiated as part of a service (i.e., Service Function Chaining), a variability model along with a quality model is required. However, this domain imposes certain challenges on the quality-aware reasoning of service function chains, such as numerical features or configuration-level Quality Attributes (QAs) (e.g., energy consumption). Incorporating numerical reasoning with quality data into SPL analyses is challenging, and tool support is rare. In this work, we present three groups of operations: model report, aggregate functions to dynamically convert feature-level QAs into configuration-level ones, and quality-aware optimisation. Our objective is to test the most complete reasoning tools to exploit the extended variability with quality attributes needed for VNFs.


INTRODUCTION
The softwarisation and virtualisation of network functionality is a new trend in Industry 4.0, especially for emergent mobile technologies such as Beyond 5G (B5G) networks. The objective is to turn networks into general-purpose platforms providing smart connectivity to a plethora of devices. Software-Defined Networking (SDN) [32] and Network Function Virtualization (NFV) [26] are widely accepted paradigms to address the new structure of network architectures. SDN aims to introduce network programming capabilities, while NFV is an innovative but complementary paradigm that applies virtualisation technology to disengage network functions from dedicated hardware appliances and transform them into software components, so-called Virtual Network Functions (VNFs). User applications demanding a network service then turn into a request for running a set of VNFs at the application plane on servers. These application services (i.e., VNFs) can be tailored for certain application families (e.g., virtual reality, video delivery or distributed games) or domains like IoT, or allocated to a class of customers or certain mobile network operators.
Next-generation networks such as 6G promise to provide a large set of agile services, custom-made and providing user-defined Quality of Service (QoS), such as latency or energy consumption. Indeed, there is a growing interest in the energy-efficient orchestration of VNFs, this being the main goal of the DAEMON project that supports this work 1 . Therefore, to reason about the variability of network functions, and especially about the quality of a software product defined as a set of VNFs instantiated as part of a service (i.e., Service Function Chaining), a variability model along with a quality model is required. Unfortunately, the heterogeneity in the network complicates the relationship between the variability and quality of VNF configurations [42]. In addition, this domain imposes certain challenges on quality-aware reasoning of service function chains, such as numerical features or configuration-level Quality Attributes (QAs) such as energy consumption. Incorporating numerical reasoning with quality data into SPL analyses is challenging, and tool support is rare.
While the majority of works in the SDN area focus mainly on applying Artificial Intelligence approaches, such as deep learning, reinforcement learning or control theory, to proactively adapt VNF chains to the network workload and current resources, little work focuses on customising a set of VNFs considering different alternatives providing variable QoS [11,21]. In this work, we apply SPL technologies to the quality-aware reasoning, customisation and optimisation of VNF chains for specific user services. Typical quality attributes considered in the DAEMON project are latency and energy efficiency, so we argue that we need to incorporate feature-level and configuration-level QAs, the two main approaches for quality reasoning of variability models according to [39,47]. Feature-level QAs are modelled as attributes directly linked to single features, such as response time or any cost associated with each feature. Configuration-level QAs cannot be quantified at the feature level and must be measured and associated at the configuration level; two examples are performance and energy consumption [39,47]. However, existing works mainly focus on feature-level QAs using attributes [6,33,46,52], so they do not apply to energy-efficient SDNs, and only a few of them deal with both [46,57]. Further, these works do not fully support real-world SPLs and advanced reasoning, and neither are they easy to extend. For example, the FAMA framework [6] supports extended FMs with attributes as quality information. However, it cannot generate the optimal configuration within a range of values for a specific quality (e.g., minimise a VNF's energy consumption above 1 Watt) [51]. With the objective of modelling and reasoning flexibility, we proposed a Category Theory (CT) [3] framework for SPLs. There, we unified FMs and Quality Models (QMs) [35] in a single model and tool. It has the potential to support complex relationships and the quality-aware reasoning of numerical FMs.
We consider quality-reasoning to be any Automated Analysis of Feature Models operation with QAs in its resolution; for instance, requesting VNFs consuming 1 Watt and with a cost under $10.
In this work, we highlight three groups of operations needed for VNFs' quality-aware reasoning. The first one is the model report, which covers the type and number of features, the type and number of constraints, the size of the configuration and measured spaces, and QA metadata. The second group comprises the aggregate functions, which define how to convert feature-level QAs into configuration-level ones in the form of addition, product, mean, and approximation arithmetic equations. The last one is optimisation, such as maximum, minimum, (weighted) multi-objective, and range objectives. In Section 3, we analyse the support that current tools provide for those groups of operations. Additionally, we define and implement the necessary reasoning algorithms in our CT framework in Section 4. For the evaluation in Section 5, we selected the 4 most complete tools: CQL IDE [35], ClaferMoo [39], the AAFM Python Framework [18], and SATIBEA [23]. Alongside our new CT reasoning algorithms, we validated those tools on 5 different real-world SPLs with different QAs and a variety of reasoning operations from the three groups. Finally, we compare their capabilities and performance when generating results. Our main contribution is to provide a set of tools to perform the necessary quality-aware operations for VNF orchestration.

QUALITY-AWARE REASONING OF SDN
VNF orchestration to compose SFCs has complex requirements for low latency, energy and security, among others [32]. In this section, we analyse from an SPL perspective the quality-aware reasoning operations that could guide such orchestration processes. We summarise the operations in three groups: model analysis, aggregation of attribute values, and optimal search. For illustration purposes, we provide several examples based on a reduced SPL of our Virtual Network Orchestration System, represented by the VN S model of Figure 1. To include a single model with variability alongside all types of QAs, we draw an olog based on our CT framework; ologs are the categorical counterpart of completely extended FMs [48].
VN S is a real-world problem that we are solving in the context of the DAEMON project; it contains different virtual network managers, containers for virtualisation software (e.g., Kubernetes) and 3 common VNFs, following the proposed standard reference architecture for the Management And Network Orchestration (MANO) of VNFs [26]. Figure 1 shows a simplified version that comprises 18 boolean and 1 numerical features, 1 propositional and 1 arithmetic constraint, and 3 feature-level and 2 configuration-level QAs: hardware Dependency, Usability, Security, Energy and Time, respectively. The performance and energy metrics were obtained after several runs with specialised tools like Watts Up? Pro and multimeters, while the other 3 are a consensus from industrial partners.
We can find many reasoning operations in the SPL literature [4,25]. We made a selection based on DAEMON requirements; they can be summarised as an analysis of the virtualised network properties and an optimal orchestration of VNFs:
• Type and number of features: A reasoner counts every feature in the model per type and domain. Classic features have a boolean domain [27] indicating their existence in a configuration. Numerical features expand that domain to integer, real, etc. [36], and identify many states of a feature (e.g., Throughput). In Figure 1 the features are boxes identified by F and N F respectively.
• Type and number of model constraints: A reasoner counts every model constraint. Classic constraints are propositional logic comprising connectives (i.e., and, or), negations, implications and exclusions [27]. Our SPLs also need arithmetic constraints like inequalities with additions, subtractions, multiplications, divisions and modulo operations. Figure 1 contains one of each, summarised at the bottom. Non-linear constraints are theoretically possible as a combination of them (e.g., power) [36].
• Number of configurations: A reasoner counts every valid complete configuration of a model; in other words, it counts the configuration space that the model represents. While the fastest reasoners perform it by pure model counting (e.g., sharpSAT and boolean decision diagrams), the most common alternative is to construct and enumerate the configurations (e.g., Clafer) [36]. The configuration space represented by Figure 1 comprises 63 configurations.
To this set of operations, DAEMON adds specific ones for QAs. These are novel, as the integration of variability and quality models is mainly unexplored, as already discussed in Section 1:
• Number of quality-measured configurations: A reasoner counts how many configuration measurements exist across all QAs. As Figure 1 is completely measured for its 5 QAs, its measured configuration space is 63 * 5 = 315 measurements.
• Number, name, value and domain of feature-level and configuration-level QAs: A reasoner counts every QA present in the model and details its type, range and domain. QAs can be grouped into 8 different types, where the most common ones are performance, usability and security [9]. Configuration-level QAs in a model provide the range of measurements in the configuration space, whereas feature-level QAs provide a range of values for their respective feature space and, additionally, aggregate functions. Finally, any QA must define its domain (i.e., metric). This is the case in Figure 1, where VN S comprises 5 QAs: hardware Dependency (D), Usability (U), and Security (S) at the feature level, and Time and Energy at the configuration level.
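As a minimal illustration of counting the configuration and measured spaces by enumeration, the following sketch uses a toy model with hypothetical feature names and constraints (not the full VN S model); both constraints and domains are assumptions for the example:

```python
from itertools import product

# Toy model: three boolean features plus one numerical feature, with one
# propositional and one arithmetic constraint (all names are hypothetical).
BOOL_FEATURES = ["Kubernetes", "Monitoring", "Firewall"]
THROUGHPUT = [10, 20, 30]  # numerical feature domain

def is_valid(cfg):
    # Propositional constraint: Monitoring implies Kubernetes.
    if cfg["Monitoring"] and not cfg["Kubernetes"]:
        return False
    # Arithmetic constraint: Firewall requires Throughput >= 20.
    if cfg["Firewall"] and cfg["Throughput"] < 20:
        return False
    return True

configs = []
for bools in product([False, True], repeat=len(BOOL_FEATURES)):
    for t in THROUGHPUT:
        cfg = dict(zip(BOOL_FEATURES, bools))
        cfg["Throughput"] = t
        if is_valid(cfg):
            configs.append(cfg)

N_QAS = 5  # e.g., Dependency, Usability, Security, Time, Energy
print(len(configs))           # size of the configuration space → 15
print(len(configs) * N_QAS)   # size of the fully measured space → 75
```

The same enumerate-and-count approach, applied to the model of Figure 1, yields its 63 configurations and 315 measurements.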

Aggregate Operations
To compute a configuration-level value of a QA based on feature-level values, we need one aggregate function per QA. Hence, aggregations are functions for approximating the quality of configurations [55]. Feature-level attributes have clear advantages, such as a smaller space (feature versus configuration space) and usability in prediction functions, but they come at the cost of accuracy, manageability and maintainability (e.g., energy consumption [37]). This means that, while in theory any QA can be represented by attributes and functions, in the real world that niche is shared between the two spaces. Consequently, feature attributes with aggregate functions have their place in SPLs.
The most trivial type of aggregation is the addition (e.g., calculating the final additive cost in $ of individual components [4]), but we could define any sort of arithmetic function, including non-linear functions like Gauss approximations or predictive performance model functions [50]. For instance, the configuration value of VN S dependency is the Maximum of the individual values of each feature, while the Minimum calculates the VN S security value, and the features' usability is aggregated as the Mean. Not every QA is a numerical metric [14]. If we take a closer look at Figure 1, we can see that the dependency and security ranges are a non-numerical scale (i.e., Low, Medium and High), whereas usability is numeric (i.e., [0,10]); hence the aggregations must consider this in their definitions and implementations.
To close the discussion about VN S QAs, we clarify that time in seconds and energy in joules are configuration-level QAs, and therefore they do not need aggregation functions, as these values were obtained directly by experimentation. Indeed, time and energy are two examples of complex QAs for which the most accurate values are obtained by experimentation, because they are difficult to calculate with aggregation functions.
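The maximum, minimum and mean aggregations above can be sketched as follows; the ordinal encoding of the Low/Medium/High scale and the feature-level values are assumptions for the example, not values from the VN S model:

```python
from statistics import mean

# Assumed ordinal encoding for non-numerical QAs (Low < Medium < High).
SCALE = {"Low": 0, "Medium": 1, "High": 2}
INV = {v: k for k, v in SCALE.items()}

def aggregate_max(values):   # e.g., hardware Dependency
    return INV[max(SCALE[v] for v in values)]

def aggregate_min(values):   # e.g., Security
    return INV[min(SCALE[v] for v in values)]

def aggregate_mean(values):  # e.g., Usability in [0, 10]
    return mean(values)

# Hypothetical feature-level values for one configuration's selected features:
dependency = aggregate_max(["Low", "High", "Medium"])  # → "High"
security = aggregate_min(["Medium", "High"])           # → "Medium"
usability = aggregate_mean([6, 8, 10])                 # → 8
print(dependency, security, usability)
```

Note how the non-numerical scale is handled by mapping it to an ordered domain before aggregating, as the text requires.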

Optimal Search Operations
In SPLs, optimisation problems are those of finding, from the quality-measured space, the best configurations with certain quality values [39]. To define and guide the search, we must define objective functions. The most common objectives in the literature are maximise and minimise functions. If an objective considers more than one QA, we are dealing with multi-objective optimisation. In this type of complex optimisation, it is common that no QA can be made better off without making another one worse off. Hence, we are in the field of calculating a set of similar high-quality configurations: the Pareto frontier.
To prevent confusion: aggregation takes feature-level QAs as input and transforms them into configuration-level QAs by calculating each corresponding value with an aggregate function, while an objective function acts more like a configuration-space filter whose inputs and outputs are configuration-level QAs.
However, this is very interesting reasoning-wise, as we can apply function composition, allowing us to include feature-level QAs in optimisation problems by pre-aggregating QA values. For instance, in Figure 1 the dependency aggregation is the maximum, which provides the dependency of a VN S configuration. We can then apply the same function as an objective (i.e., maximise) to obtain the configurations with the highest dependency. That would not have been possible without aggregating before optimising.
Considering that composition, we could perform multi-objective optimisation considering both feature-level and configuration-level QAs in the same function. For example, only after aggregating can we define the following objective search for VN S: Maximise Usability and Security while Minimise Dependency, Time and Energy.
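A Pareto-dominance filter for this kind of multi-objective search can be sketched as follows. The configuration identifiers and QA values are hypothetical, and ordinal QAs are assumed to be numerically encoded as in the aggregation discussion; signs encode the optimisation direction of each objective:

```python
# Objectives from the VN S example: maximise (+1) or minimise (-1).
OBJECTIVES = {"Usability": +1, "Security": +1,
              "Dependency": -1, "Time": -1, "Energy": -1}

# Hypothetical measured (and pre-aggregated) configuration-level QA values.
MEASURED = [
    ("c1", {"Usability": 8, "Security": 2, "Dependency": 1, "Time": 3.0, "Energy": 5.0}),
    ("c2", {"Usability": 6, "Security": 2, "Dependency": 1, "Time": 3.5, "Energy": 6.0}),
    ("c3", {"Usability": 7, "Security": 1, "Dependency": 0, "Time": 2.0, "Energy": 4.0}),
]

def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    ge = [OBJECTIVES[q] * a[q] >= OBJECTIVES[q] * b[q] for q in OBJECTIVES]
    gt = [OBJECTIVES[q] * a[q] > OBJECTIVES[q] * b[q] for q in OBJECTIVES]
    return all(ge) and any(gt)

# The Pareto frontier: configurations not dominated by any other.
pareto = [cid for cid, qa in MEASURED
          if not any(dominates(qb, qa) for _, qb in MEASURED)]
print(pareto)  # → ['c1', 'c3']
```

Here c2 is dominated by c1 (worse or equal in every objective), while c1 and c3 trade Usability and Security against Dependency, Time and Energy, so both stay on the frontier.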
Additionally, we can define and use new quality domains based on the ones that already exist in the model. We elaborate on this with an example in VN S. As we know that the energy rate of a system is its energy consumption divided by its runtime (i.e., Rate = Energy / Time), we could redefine the previous multi-objective as Maximise Usability and Security while Minimise Dependency and Energy Rate, with Energy Rate = Energy / Time.
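Deriving such a quality domain from existing ones is a small computation over the measured space; the configuration identifiers and values below are hypothetical:

```python
# Derived QA: Energy Rate (W) = Energy (J) / Time (s). Values are hypothetical.
measured = {"c1": {"Energy": 6.0, "Time": 3.0},
            "c2": {"Energy": 5.0, "Time": 1.0}}

# Extend every measured configuration with the derived quality domain.
for qa in measured.values():
    qa["EnergyRate"] = qa["Energy"] / qa["Time"]

# Minimise the derived QA, exactly as with any native configuration-level QA.
best = min(measured, key=lambda c: measured[c]["EnergyRate"])
print(best)  # → c1 (2.0 W versus 5.0 W)
```

Once derived, the new domain behaves like any other configuration-level QA in objective functions.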

RELATED WORK
This section summarises our search for proper reasoning solutions for virtualised networks, including QoS, SPL tools with support for quality reasoning, and optimisation algorithms.

QoS in Software-Defined Networking
QoS is the capability of a network to provide the required services for selected network traffic. Quality assurance of network services is based on measuring QAs, where the key factors are path length, throughput, latency/performance, security, hardware dependency, capacity/usability, and energy [34]. To make things worse, those QAs negatively influence one another. For example, a high latency (e.g., due to incorrect ordering of VNFs) leads to failure of packet-handling policies, thus increasing vulnerabilities, which degrades security as it creates incidents. Similarly to SPL quality-aware reasoning, there are works in the literature that define QoS-aware approaches to solve these issues; likewise, hybrid QoS approaches are multi-objective optimisations. Most of the approaches rely on machine learning, like [43], where the authors guide the orchestration of VNFs via Deep Reinforcement Learning. Another example of deep learning from DAEMON partners is vrAIn [1]. An alternative is heuristic algorithms for near-optimal orchestration; an example is [19], where heuristic formulas are based on linear programming. Another alternative is statistical approaches like [8], where dynamic statistical multiplexing governs the network orchestrator. Finally, we find network intelligence as the new trend [2], where an example from DAEMON partners is Nuberu [20], based on Bayesian algorithms.

Tools Supporting Quality-Aware Reasoning
Existing SPL tools [25] provide at least the basic features and constraints defined in FODA (Feature-Oriented Domain Analysis) [27]. Additionally, each tool supports a different set of extensions, such as numerical features, attributes, and complex constraints [25], and a different set of reasoning operations [4,5]. We are interested in tools that allow performing some quality reasoning. Regarding the techniques to generate optimal configurations, we included a subset of them, since they are mostly based on the same algorithms (e.g., IBEA-based approaches [22,23,44]) and offer similar operations.
Quality Modelling. Typically, an SPL engineer would like to obtain the configurations with a QA below a threshold (e.g., SFC configurations that consume less than 3 Joules) or to generate the best-qualified configuration (e.g., a trade-off between energy consumption and performance). We have already discussed the differences between feature-level and configuration-level QAs; these QAs are classified in [46] as feature-wise and variant-wise respectively. Feature-level QAs are the most common in the literature, and are supported by ClaferMoo [39], FAMA [6], FeatureIDE [33], pure::variants 2 , SPL Conqueror [46] and STEAM [52]. In QAMTool [57], the authors use an alternative representation and extend the FM by incorporating QA-specific features in a sub-tree. Another alternative is to have some external storage to relate features and quality measurements, as usually done in genetic algorithms (SATIBEA [23], MILPIBEA [44], MO-DAGAME [40]). An exception is the GIA algorithm [38], defined to be applied to an attributed FM and which also uses the Z3 solver. Only a few approaches, such as QAMTool [57] and HADAS [37], support QAs at the configuration level. SPL Conqueror supports them only partially, by calculating an approximated value for the feature attributes based on the set of measured configurations during the generation of the product configuration. Our CT framework [35] supports both feature-level and configuration-level QAs.
Formalising and Solving Variability Models with Qualities. SPL tools (labelled with T: in Table 1) that only support feature-level QAs (ClaferMoo, FAMA, pure::variants) commonly use a declarative paradigm (e.g., CSP, BDD, SAT) to represent the FM and reason about its quality. In other cases, an external quality model is defined (e.g., a goal model), and the QA measurements are usually linked to the configurations through a database. The FM is still represented using a declarative paradigm, but an additional structure is used to store and reason about configuration-level QAs. This is the case with the SPL Conqueror, HADAS and QAMTool tools. SPL Conqueror creates a performance model by using sampling and aggregation techniques and uses this model to approximate a near-optimal configuration. The HADAS tool uses Clafer plus a relational database, and QAMTool uses the NFR framework [56] to externally represent QAs in a goal model. For algorithms generating optimum configurations (labelled with A: in Table 1), a genetic algorithm is usually complemented with a representation of the FM as genes and with a database of feature-level QA measurements. In some cases, a declarative solver is also used, as in the SATIBEA algorithm, which is defined as a combination of a SAT solver and the IBEA genetic algorithm, and the GIA algorithm, which uses the Z3 solver. In [35], the authors discuss the benefits and drawbacks of approaches defining an external quality model, with two important conclusions: (1) most existing solutions are not directly compatible with automated quality-reasoning, and (2) SPL reasoning lacks a "unified" model that appropriately supports quality metrics. STEAM uses abduction and deduction reasoning. Our CT framework defines a unified model with native support for quality reasoning, although the algorithms must be provided at run-time, as they are not pre-established in CT tools.
Automatic Quality Reasoning. All SPL tools (labelled with T: in Table 1) offer some level of model analysis operations. ClaferMoo, FAMA, FeatureIDE, pure::variants and STEAM provide implementations of all or a subset of the operations defined in [5] (e.g., satisfiability, type and number of features, type and number of model constraints, number of configurations). Others (e.g., SPL Conqueror, QAMTool, HADAS) use a third-party variability modelling language that provides such support. Algorithms (labelled with A: in Table 1) focus on optimisation, and these model-analysis operations are out of their scope. Regarding quality-aware operations, current approaches do not natively support the complete set of them; native support would mean that the variability model implements quality-enriched operations as primitives. Regarding the aggregation functions and the optimal search operations, the support is variable, as shown in Table 1. The operations supported by ClaferMoo are almost as complete as in our approach: it supports both addition and product aggregation functions, and all the optimisation operations under consideration in this paper. pure::variants also supports addition, product and mean aggregation functions, although approximation arithmetic equations are not supported, and thus reasoning about the combination of several quality attributes is not possible; optimal search operations are not supported either. SPL Conqueror supports addition, product and some equations, and allows optimal search operations with maximum and ranges. FeatureIDE, QAMTool and HADAS do not provide any support for optimisation. Regarding the genetic algorithms, they approximate optimal configurations using sampling strategies and considering feature-level QAs. They all support the addition aggregation function and the maximum, minimum and multi-objective search operations, but they do not support range optimisation.
Again, our CT framework has the potential to support all the quality-aware operations discussed in Table 1, but the reasoning algorithms must be implemented in CT and provided at run-time.
There are other works related to the quality of SPLs, but they fall outside the scope of this comparison.

QUALITY-AWARE ALGORITHMS FOR CATEGORY THEORY FRAMEWORKS
Our objective in this section is to provide a running alternative to the tools and algorithms of Table 1, covering as many operations as possible. Consequently, we use our CT framework for SPLs to define the algorithms necessary for the operations discussed in Section 2. Although this part of our work can be applied to any quality-enriched SPL, we highlight that it was developed in the context of the DAEMON project and of other projects that apply SPLs to networking, IoT and Edge-computing systems (see the acknowledgement section). The contributions of this work make possible the practical use of SPLs for the energy-aware orchestration of VNFs, which would have been impossible, or at least much harder, with current SPL approaches and their limited capacity for quality-aware reasoning about configuration-level quality attributes like energy footprint.

Foundations of Category Theory
Category Theory (CT) is an algebraic theory of mathematical structures [3]. It allows capturing and relating similar structures while abstracting from the individual specifics of their dissimilarities. A category C represents spaces as a collection of objects with functional relationships via arrows (i.e., morphisms). The key concepts of CT are:
• Object: a structured class A ∈ Ob(C), graphically depicted as a node •.
• Arrow: a structure-preserving function depicted • → •.
  – Identity: for every A ∈ Ob(C), we have the arrow id_A : A → A.
• Functor: a process F between categories, depicted C → D, which preserves identities and function composition.
We shall also introduce the algebraic data integration concepts of CT [7]:
• Path: a finite sequence of composed arrows.
• Element: an arrow 1 → A, where 1 is a select "unit" object.
• Instance: a set-valued functor assigning values to elements.

Unifying Variability and Quality in a Categorical Model
In [35], we detail a CT framework that unifies numerical VMs with QAs as a category where features and QAs are objects, and data types, hierarchical relationships, and quality and feature constraints are arrows. We use this model to represent SPLs as categories. The transformation is graphically represented in Figure 2, which is the basis for the algorithms for quality-aware operations that we detail in the next subsection. Concretely, our framework comprises 3 data-type objects (i.e., Boolean, Integer, and String for character sets) and 5 structured objects. Figure 2 includes a tiny example based on the model of Figure 1.
The most important one is the Schema, which defines the unified variability and quality model structure: elements, properties, hierarchical relationships and structural relationships. We can think of them as arrays of variables without set values. Naturally, Features represents any feature domain of FMs, Feature Level Qualities represents extended FM attributes, and Qualities the items present in quality models (i.e., similar to features in an FM). The rest of the elements are sets of identifiers or sets of related identifiers (Binary Relationship). Those last 3 elements are necessary to relate configuration-level QAs to the respective set of features forming a specific Configuration.
The rest are Instances of certain Domains of the elements of the schema; in other words, they populate the schema. For simplicity, the schema would be a blank FM, and the instances are the names of the features, cardinalities, etc.
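The schema/instance split can be mirrored with ordinary typed records; this is only an illustrative sketch of the idea, not our categorical encoding, and all names and fields are hypothetical:

```python
from dataclasses import dataclass

# Schema side: typed structure without set values (mirrors the Schema object).
@dataclass(frozen=True)
class Feature:
    name: str
    domain: str          # "Boolean", "Integer", ...

@dataclass(frozen=True)
class FeatureLevelQA:
    name: str
    feature: str
    aggregate: str       # "addition", "mean", "maximum", ...

# Instance side: concrete values populating the schema, like feature names
# and cardinalities populate a blank FM.
features = [Feature("Kubernetes", "Boolean"), Feature("Throughput", "Integer")]
qas = [FeatureLevelQA("Usability", "Kubernetes", "mean")]
print(len(features), qas[0].aggregate)
```

The categorical framework goes further than this sketch by making the relationships between these records first-class arrows.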

Quality Operations in Category Theory
Considering that we can use the variability and quality modelling framework within CT, and that we have analysed the quality-aware reasoning operations, we need a flexible CT reasoner that supports the implementation. Consequently, we chose the state-of-the-art CT tool: the Categorical Query Language (CQL) IDE 3 . CQL is a functorial language used for functional programming based on lambda calculus. While for low-level details we kindly point readers to [29], we now present an overview of CQL's main assets:
• Basic data types and functions are defined as global objects and arrows (e.g., B for the boolean domain).
• A structured category is a schema of objects and different types of arrows (e.g., Figure 1).
• A functor is a query over an input schema to an output schema. For composed reasoning, the input schema of an intermediate functor must be the same as the output schema of the previous functor.
• A literal instance generates variables and assigns the values.
• The reasoning is an eval instance of a schema literal.
Having all the necessary background, we can now implement categorical reasoning in the CQL IDE. We follow the same sequence as before, starting with model analysis operations.
In Algorithm 1 we merged all the self-analysis operations into a single operation called Model Report. Its inputs are certain categorical objects of the model: the features (i.e., F s), the complete configuration identifiers (i.e., CCs), the QA identifiers (i.e., QAs) and their relationships in the Quality Measured Configurations (QMC). Its outputs are calculated with a composition of 9 lambda functions, the first of which gives the number of boolean and numerical features from the instantiated elements in F s. While we defined the model report as operations performed directly on the model (i.e., on the feature space), some reasoners perform them on the configuration space, once configurations are generated. While the feature space is more complex, it is also more scalable.
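The composition of report steps in Algorithm 1 can be sketched outside CQL as a chain of small functions over the model's objects; the model contents below are hypothetical, except that the configuration and QA counts are taken from the Figure 1 example:

```python
from functools import reduce

# Minimal model objects: Fs (features with domains), constraints (with kinds),
# CCs (configuration identifiers) and QAs (identifiers). Contents are illustrative.
MODEL = {
    "Fs": {"Kubernetes": "Boolean", "Firewall": "Boolean", "Throughput": "Integer"},
    "constraints": ["propositional", "arithmetic"],
    "CCs": list(range(63)),  # 63 valid configurations, as in Figure 1
    "QAs": ["Dependency", "Usability", "Security", "Time", "Energy"],
}

def count_features(report, model):
    doms = model["Fs"].values()
    report["boolean_features"] = sum(d == "Boolean" for d in doms)
    report["numerical_features"] = sum(d != "Boolean" for d in doms)
    return report

def count_constraints(report, model):
    report["constraints"] = len(model["constraints"])
    return report

def count_spaces(report, model):
    report["configurations"] = len(model["CCs"])
    report["measurements"] = len(model["CCs"]) * len(model["QAs"])
    return report

# Compose the report steps sequentially, as Algorithm 1 composes lambdas.
report = reduce(lambda acc, step: step(acc, MODEL),
                [count_features, count_constraints, count_spaces], {})
print(report)
```

Each step only adds fields to the accumulated report, so the composition order is free to follow whatever sequence the reasoner prefers.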

Figure 2: Unification of Extended Variability Models and Quality Models into a Category
The flexibility to functionally configure a reasoner is one of the main advantages of CT tools, and that, alongside compositions, is a declared need for advanced reasoning of SPLs [52]. Next in line is Algorithm 2, where we approximate the configuration-level value of each feature-level QA based on its specific aggregation function. The method consists of going over every configuration identifier in CCs and retrieving their respective features, attributes and functions (originally located in F s). That retrieved information is the input of a lambda function, which simply runs the aggregate functions with their related attributes.
Finally, we present Algorithm 3, where we search for configurations with desired QAs. To cover all types of QAs in this algorithm, we pre-composed Algorithm 2, whose results are temporarily stored in the extended QMC (i.e., eQMC) structured object. Provided that, the algorithm goes through every configuration identifier, where the lambda function filters them based on the provided objective function. For clarity, we simplified the resulting data to only generate identifiers; if we needed feature names and final QA values, we would need to provide and access the F and QA objects within the algorithm.
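The filtering step of Algorithm 3 can be sketched as a predicate over the extended measured space; the eQMC contents and the range bounds are hypothetical, chosen to echo the range objectives discussed in Section 2:

```python
# eQMC: configuration-level QA values, including pre-aggregated feature-level
# ones (Algorithm 2's output). All values are hypothetical.
eQMC = {
    "c1": {"Energy": 0.8, "Security": 2},
    "c2": {"Energy": 1.5, "Security": 1},
    "c3": {"Energy": 2.5, "Security": 2},
}

def range_objective(qa, low, high):
    """Build a filter selecting configurations with `qa` inside [low, high]."""
    return lambda values: low <= values[qa] <= high

# e.g., configurations consuming between 1 and 2 Joules.
objective = range_objective("Energy", 1.0, 2.0)
selected = [cid for cid, values in eQMC.items() if objective(values)]
print(selected)  # → ['c2']
```

Because the objective is just a function over configuration-level values, maximum, minimum or multi-objective filters can be swapped in without changing the traversal.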

EMPIRICAL VALIDATION
In this section, we test the algorithms of Section 4, comparing their capabilities and reasoning times with those of the most complete solvers, in the context of VNF orchestration and configuration. The concrete research questions are: RQ 1: What is the level of empirical support that the state of the art currently offers to represent and reason about quality-measured VNFs?
RQ 2: Is our CT framework for SPLs a feasible alternative to analyse and optimise VNF orchestration? RQ 3: How do the alternative tools scale for the complete set of quality-aware reasoning operations present in SDN systems?

Table 2 shows the 5 SPLs we have used for validation. They are all real-world SPLs with different properties (e.g., size) as a means to reinforce the results and conclusions. As far as we know, our approach is the only work that applies SPLs to the SDN/NFV domain, so we had to use models from other domains as validation objects. Also, quality-measured virtualised network use cases are not available in the literature, so 3 of the SPLs are from different domains; however, they are third-party use cases, well known in the literature, and share certain quality-reasoning requirements like Cost in $ and Battery minimisation. We present them ordered by the size of their configuration space. The smallest is P , taken from [28] and including the QAs additive Cost in $ at the feature level and Time in seconds at the configuration level. With 6 times its configuration space, we have T , which we extracted from [45] and extended with random values of a multiplicative Size in square metres as a feature-level QA. A larger case, with a measured space 449 times bigger, is J H , which already included the QAs Usability, Battery, and memory Footprint as additive QAs at the feature level, and Compileable, a binary QA (i.e., yes or no), at the configuration level. The largest model is the complete version of the one represented in Figure 1, DAEMON's VN S SPL. For completeness, we also included the reduced version represented in Figure 1. We distinguish them as VN S and VN S respectively. VN S comprises 40 boolean and 3 numerical features, 64 logic and 4 arithmetic constraints, and a space of 2+ million configurations. Its QAs are the same as in Figure 1; consequently, its QA space is of 10+ million configuration measurements.
Considering our analysis summarised in Table 1, we selected the most complete open-source tools for each group of reasoning operations: (1) ClaferMoo [39], due to its support of feature-level QAs and certain flexibility to define functions; (2) the AAFM Python framework, for its speed at the cost of not supporting QAs; (3) SATIBEA [23], as the representative IBEA-based genetic algorithm for optimisation, due to its documentation and user support; and finally (4) CQL IDE, the state-of-the-art CT tool. The different models and datasets are available at: https://github.com/danieljmg/SPLC22

Methodology and Setup
We ran the presented SPLs and tools on a desktop computer comprising an Intel(R) Core i7-4790 CPU @ 3.60 GHz, 16 GB of RAM, and an SSD, running an up-to-date Windows 10 H22H1 x86_64 with the latest supported versions of the tools and shared libraries (e.g., Java JDK 18.0.2). Besides double-checking the internal statistics of each tool, we measured reasoning time with the Windows PowerShell Measure-Command { ... }. The initial Java virtual machine overhead was purposely removed, and it does not affect the time results.
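Discarding start-up overhead before timing can be sketched as a warm-up phase whose samples are thrown away; the following is an illustrative harness (the command and run counts are hypothetical, not the actual measurement procedure, which used Measure-Command):

```python
import statistics
import subprocess
import time

def time_tool(cmd, warmup=2, runs=5):
    """Time an external reasoning tool, discarding the first `warmup` runs
    so one-off start-up costs (e.g. JVM warm-up) do not skew the results."""
    samples = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        elapsed = time.perf_counter() - start
        if i >= warmup:  # keep only post-warm-up measurements
            samples.append(elapsed)
    return statistics.mean(samples)

# Hypothetical invocation of a Java-based reasoner:
# time_tool(["java", "-jar", "reasoner.jar", "model.cfr"])
```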

Self-analysis and optimisation operations results
In Table 3 we present the first set of results, in the form of reasoning time in seconds; as SATIBEA does not currently support these operations, it is absent from this comparison. We grouped the results into 4 operations, which should return information similar to what is presented in Table 2. While the AAFM Python framework reasons fast, it does not support QAs. Likewise, ClaferMoo does not support QAs at the configuration-level. When an operation is completely unsupported, the table reads Unsupported; if the support is partial, the entry is tagged with an asterisk.

In Table 4 we present the second set of results, which involves the aggregate reasoning. Each column is a variation of the aggregate reasoning, ordered from the simplest to the most complex. The first row is a direct aggregation, without constraints, of all the QAs of that SPL. The Constrained row implies reasoning while randomly excluding one feature. Similarly, the Range row limits the values of a (feature-level) QA. Note that we are not constraining based on the aggregated total value but on the individual feature-level QA value. In these results, an asterisk means that the reasoner did not support the specific aggregate function. As detailed in Table 1 of the related work in Section 3, ClaferMoo only supports addition and product in the Z domain, while SATIBEA supports just addition. Hence, neither supported any VN S aggregate function (i.e., maximum, mean and minimum). In those cases, we swapped the domains to Z Addition to allow time comparisons.

The last set of results is in Table 5, where we perform an optimisation operation with different types of objectives. We first compose the aggregations for the feature-level QAs. In the first row, we Min/Maximise individual QAs and average the runtime results. In the second row, we defined the following multi-objectives:
• P: Minimise Time and total Cost.
• T: Minimise total Size 1 * Size 2.
• J H: Maximise Compileable and total Usability ∧ Minimise total Battery and Footprint.
• VN Ss: Maximise total Usability and Security ∧ Minimise total Dependency, Time and Energy.
As we can see, in the case of T we duplicated its single QA Size into Size 1 and Size 2 in order to be able to perform a multi-objective test. The weighted objectives for the third row were similar but included random weights for the different QAs (e.g., Minimise VN S 0.3*Time, 0.7*Energy). Finally, for the last row we defined the following objectives:
• P: Minimise total Cost per second.
In this case, an asterisk indicates that the reasoner did not support modelling configuration-level QAs; there, we provided random values to allow some level of comparison. Nevertheless, ClaferMoo and SATIBEA do not currently support weighted and new-domain objectives. The SATIBEA sampling parameters were configured as suggested in its documentation: 20000 evolutions [23].
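A weighted objective such as "Minimise 0.3*Time, 0.7*Energy" can be read as a scalarisation of the multi-objective search; the following sketch, with illustrative configuration data and weights (not taken from the evaluated SPLs), shows the scoring involved:

```python
# Configuration-level QA totals for three hypothetical configurations.
configs = [
    {"name": "c1", "Time": 12.0, "Energy": 30.0},
    {"name": "c2", "Time": 20.0, "Energy": 18.0},
    {"name": "c3", "Time": 15.0, "Energy": 25.0},
]
weights = {"Time": 0.3, "Energy": 0.7}

def weighted_score(cfg):
    """Weighted sum of configuration-level QA values (to be minimised)."""
    return sum(w * cfg[qa] for qa, w in weights.items())

# The weighted optimum is the configuration with the lowest score.
best = min(configs, key=weighted_score)
```

In the actual tools the search is performed over the constrained configuration space rather than an explicit list, but the scoring of each candidate follows this shape.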

Discussion and Scalability Results
In this subsection we answer the RQs by considering the results of Tables 1 to 5. The goal of RQ1 is to assess whether we can successfully apply SPL tools to reason about quality-aware orchestration of VNFs, and more generally in the SDN/NFV domain [21]. Only those SPL tools that include numerical features and complex constraints, and that support reasoning and optimisation over configuration-level QAs, apply to this domain. While in the current situation we find academic tools for very specific and basic reasoning, in practice the network industry will discard them, as they are not enough by themselves. Additionally, those tools are not directly configurable or extendable, and that is where CQL IDE with our CT operations stands out. The answer to RQ1 is that the best current tools could be viable if they are extended as we are doing with CQL IDE, that is, if they provide a unified solution beyond boolean features, logic constraints, additive attributes, and max/minimise optimisation.
The answer to RQ2 is that our algorithms feasibly extend CQL IDE to all the quality-aware operations, and hence it provides support for managing QoS in VNF orchestration. Nonetheless, we should mention that CT knowledge, which is required to properly adjust such flexible tools, is uncommon in industry.
Regarding run-times, if we are only analysing SDN variability, the AAFM Python Framework is the fastest, with a worst case of 60 seconds. Similarly, for simple aggregation and optimisation functions, SATIBEA is the fastest. This was expected, as it always works with the same number of samples, and increasing them does not translate into higher accuracy, as stated in the literature [23]. ClaferMoo tends to be the slowest, sometimes due to how its reasoning algorithms work, such as counting by enumeration [36]; however, it covers more operations than the average. Nevertheless, we consider CQL IDE the proper alternative for the DAEMON project, as it is among the fastest while supporting all sorts of quality-aware reasoning. In summary, and answering RQ3, all the solutions scale linearly, but without always being the fastest, CQL IDE stands out for its broad application domain.

Threats to Validity
Internal Validity. To control randomness, we repeated the experiments 97 times and averaged the results, for a confidence level of 95% with a 10% margin of error [49]. Additionally, we are aware of the need to extend this validation with the tools' accuracy based on their specific reasoning techniques. Nevertheless, the aim of this work is to provide the means to tackle the complete range of VNF reasoning cases, not to provide a more accurate alternative to partially capable approaches. Consequently, we consider the evaluation sufficient for its purpose.

External Validity. One could argue that not all the evaluated SPLs are NFVs, but well-known models with registered complex quality measurements are rare in the SDN literature. Consequently, by choosing the real-world SPLs of Table 2 we aimed to cover a variety of properties, QAs and functions commonly found in VNF cases. Nonetheless, we are aware that they do not cover every possible case individually. While one could claim that large spaces are not enough and colossal spaces should be tested, larger spaces are very rare for VNF orchestrators: the problem in SDN systems is the complexity of the reasoning, not its size. Testing our algorithms with just one CT reasoner could be another threat; the problem is that CT tools are rare due to their intrinsic abstraction and knowledge requirements.
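The figure of 97 repetitions for a 95% confidence level and a 10% margin of error follows from the standard sample-size formula under the conservative variance assumption p(1-p) = 0.25; a quick check:

```python
import math

def repetitions(z=1.96, margin=0.10, variance=0.25):
    """Sample size n = ceil(z^2 * p(1-p) / e^2) for a given confidence
    level (z = 1.96 for 95%) and margin of error e, with the conservative
    variance assumption p(1-p) = 0.25."""
    return math.ceil(z**2 * variance / margin**2)

n = repetitions()  # ceil(1.96^2 * 0.25 / 0.01) = ceil(96.04) = 97
```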

CONCLUSION AND FUTURE WORK
The domain of SDN and NFV, Edge computing and IoT is challenging for quality-aware reasoning over configurations. AAFM provides reasoning tools and algorithms that we can apply to improve the quality of service, with energy efficiency being the most critical attribute in those domains. However, we found limitations when applying them in the context of VNF orchestration in the DAEMON project. In short, there is a lack of understanding, methods, and tools designed explicitly for advanced quality-aware analysis and optimisation that consider interactions between feature- and configuration-level QAs.
In this work, we started by uncovering the quality-based reasoning operations necessary in the DAEMON project and grouped them into: model analysis, aggregation functionality, and optimisation based on objectives. We followed by analysing the state-of-the-art of AAFM methods and tools that support any share of those operations, and summarised the outcomes in Table 1. As we found the need for a complete alternative, we defined and implemented in CQL IDE the quality-aware reasoning algorithms of those operations for our CT framework for SPLs. Next, we empirically tested the state-of-the-art against our proposal on 5 different real-world SPLs with several QAs and up to 20 quality-aware reasoning operations.
For RQ1 we conclude that current tools could be viable if they are extended as we are doing with CQL IDE, that is, if they provide a unified solution. For RQ2 we state that a CT tool like CQL IDE has the flexibility and potential to cover all the operations, but its feasibility depends on having CT knowledge in the team, as in our case with the DAEMON project. Finally, in RQ3 we highlight that the selection of the reasoning tool will depend on the set of operations that the SDN system needs; while all the tools scale linearly, some of them are faster than others for specific operations (e.g., SATIBEA for basic near-optimal search). As a final statement, if the objective is for VNF orchestrators to rely automatically on AAFM, all the tools in the current literature need to extend their support beyond boolean features, logic constraints, additive attributes and Pareto optimisation. Our CQL IDE algorithms solve that.
As an extension, we plan to analyse the trade-off between scalability and accuracy in optimisation operations. Additionally, we plan to implement sampling and learning techniques in CQL IDE, as well as to exploit other tools. Finally, to minimise the CT and variability expertise required, which is uncommon in the SDN domain, we are developing a modelling front-end for CQL IDE.