Automated optimal firewall orchestration and configuration in virtualized networks

Emerging technologies such as Software-Defined Networking and Network Functions Virtualization are making the definition and configuration of network services more dynamic, thus making automatic approaches that can replace manual and error-prone tasks more feasible. In view of these considerations, this paper proposes a novel methodology to automatically compute the optimal allocation scheme and configuration of virtual firewalls within a user-defined network service graph subject to a corresponding set of security requirements. The presented framework adopts a formal approach based on the solution of a weighted partial MaxSMT problem, which also provides good confidence about the solution correctness. A prototype implementation of the proposed approach based on the z3 solver has been used for validation, showing the feasibility of the approach for problem instances requiring tens of virtual firewalls and similar numbers of security requirements.


I. INTRODUCTION
Software-Defined Networking (SDN) [1] and Network Functions Virtualization (NFV) [2] are new technologies designed to introduce flexibility in networking. SDN enables the runtime definition of the paths that traffic flows must cross, while NFV enables virtualized network functions installed on general-purpose servers in the cloud. These features allow service designers to define the intended network services by means of Service Graphs (SGs) representing the involved service functions and their interconnection.
In a virtualized network, security automation is becoming more feasible thanks to the intrinsic agility of this environment and to the full software-based control of each network component it allows. Nevertheless, automation of security defenses is still only partially addressed in the literature [3].
A commonly time-consuming and error-prone security task that could be heavily eased by an automatic approach is the placement and configuration of the Network Security Functions (NSFs) [4] that must be introduced in order to satisfy some Network Security Requirements (NSRs), i.e., the security constraints the network behavior must respect. For example, isolation of a compromised network node could be required after an attack, and this new requirement can be fulfilled by properly placing and configuring NSFs that enforce it. As performing this task manually can lead to incorrect or non-optimal results, automation can be exploited not only to save human effort, but also to get provably correct and optimal solutions. Formal correctness of the achieved solution represents an added value, because it does not require further manual checks by the user, who can rely with a high level of confidence on a solution generated according to this approach. Optimality, on the other hand, leads to a minimization of the employed computational resources and to a maximization of performance. In view of these motivations, in this paper we propose a new security automation methodology for virtualized networks, and we provide its validation. The main aim is to automatically define the optimal allocation scheme and configuration of virtual firewall instances, by refining a SG provided by the service designer so that it fulfills a set of given NSRs. The internal definition of this problem as a partial weighted Maximum Satisfiability Modulo Theories (MaxSMT) problem provides at the same time formal assurance about the correctness of the solution and optimality. To the best of our knowledge, this is the first time an approach with all these features together (i.e., automation, optimality, and formal correctness assurance) is proposed.
The focus of the proposed methodology is on packet filters, which represent the most common firewall technology and the most frequently exploited security defense in networks.
The remainder of the paper is structured as follows. Section II explains how our methodology is designed, and Section III presents some performance tests carried out on a prototype implementation. Finally, Section IV and Section V contain the related works and the conclusions, respectively.

A. Problem statement and solution strategy
The problem addressed in this paper is to automatically compute, in a formally correct way, the optimal allocation scheme and configuration of packet filtering firewalls in a SG, so as to satisfy a set of NSRs.
According to our approach, optimality means: 1) minimizing the number of allocated virtual firewall instances in the SG, in order to minimize the resources consumed by the SG; 2) minimizing the number of rules inside each firewall configuration, in order to reduce the memory required to store the rules and, at the same time, improve the performance of the filtering operations. The second goal has lower priority than the first, since deploying a new virtual machine requires more memory and introduces more overhead than adding a filtering rule in an already deployed firewall. Here we do not address the problem of optimal allocation of the SG onto physical servers because we already addressed it in a previous work [5]. Our strategy to reach the goals of formal correctness and optimality consists of adopting a formal model of the virtual network behavior and a procedure to search for the optimal solution which guarantees correctness by construction according to this formal model. More precisely, our idea is to formulate the problem as a partial weighted Maximum Satisfiability Modulo Theories (MaxSMT) problem, which receives two different sets of clauses, hard and soft, for which partial satisfiability has to be achieved. Hard clauses must necessarily be satisfied: they represent essential commitments, which also contribute to reducing the space of all the possible solutions. Soft clauses, on the other hand, are relaxable constraints, because their satisfaction is not strictly required; in this sense, their presence makes the SMT problem partial. Moreover, in a partial weighted MaxSMT problem each soft clause is characterized by a weight, and the problem consists not only in establishing partial satisfiability but also in finding, among the assignments that satisfy all the hard clauses, one that maximizes the sum of the weights of the satisfied soft clauses; consequently, when an optimizer engine tries to solve a partial weighted MaxSMT instance, it assigns priority to the soft clauses characterized by higher weights. In the remainder of the paper, for simplicity, the word MaxSMT stands for partial weighted MaxSMT.
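The hard/soft distinction described above can be illustrated with a minimal sketch. It is not the solver used in the paper: it exhaustively enumerates Boolean assignments (so it is a MaxSAT toy rather than a full MaxSMT engine), and the function name `solve_weighted_partial_maxsat` and the variables `fw_a` and `fw_b` are hypothetical.

```python
from itertools import product

def solve_weighted_partial_maxsat(variables, hard, soft):
    """Exhaustively solve a tiny weighted partial MaxSAT instance.

    hard: list of predicates over an assignment dict; all of them must hold.
    soft: list of (predicate, weight); the satisfied weight is maximized.
    Returns (assignment, satisfied_weight), or None if the hard clauses are UNSAT.
    """
    best = None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not all(clause(assignment) for clause in hard):
            continue  # a violated hard clause rules the assignment out
        weight = sum(w for clause, w in soft if clause(assignment))
        if best is None or weight > best[1]:
            best = (assignment, weight)
    return best

# Toy instance (assumption): at least one of two firewalls must be allocated
# (hard clause), and each one should preferably stay unallocated (soft clauses).
variables = ["fw_a", "fw_b"]
hard = [lambda m: m["fw_a"] or m["fw_b"]]
soft = [(lambda m: not m["fw_a"], 5), (lambda m: not m["fw_b"], 3)]
result = solve_weighted_partial_maxsat(variables, hard, soft)
```

In this toy instance the soft clause with the higher weight wins, so only `fw_b` ends up allocated. A real instance would be handed to an optimizing SMT solver instead of being enumerated.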
Given this context, the inputs of the problem that must be formally modeled as hard clauses are: 1) a graph, describing the network functions of the virtual network and their interconnection; 2) a set of Network Security Requirements (NSRs), describing which traffic flows must be allowed or blocked in the virtual network. The positioning of firewalls in the graph and their configuration, instead, have to be encoded as soft clauses, because they are subject to optimization.
If at least one of the hard clauses cannot be satisfied, then the outcome of the MaxSMT problem is UNSAT, i.e., partial satisfiability does not hold. If instead all the hard constraints can be satisfied, then the outcome is the optimal firewall allocation scheme and the automatically computed configuration of their Filtering Policy (FP), that is, the set of the mappings between the conditions to check on the packet fields and the forwarding actions to perform. These outputs are achieved by targeting the two optimization objectives that have been stated at the beginning of this section.
The remainder of this section provides more details about the inputs and outputs of the methodology and their First-Order Logic (FOL) formulation in the MaxSMT problem. Finally, a complete example is provided to clarify how our automated approach can heavily impact the design of a network security service.

B. Service and Allocation Graph
The SG is a directed graph $G_S = (N_S, L_S)$, where $N_S$ is the set of vertices representing the network nodes, while $L_S$ is the set of edges representing the directed connections between the nodes. In particular, the vertex set is modeled as $N_S = E_S \cup S_S$. $E_S$ is the set of end points, each of which can directly correspond to a terminal, a physical server or a subnetwork in the substrate infrastructure where virtual instances of the functions will be allocated. Instead, $S_S$ is the set of service functions; the elements in $S_S$ are simple Network Functions (NFs) that do not offer any security protection from cyberattacks, but are simply exploited to create an end-to-end service.
The SG is fed to an automatic tool that generates an intermediate internal representation, called Allocation Graph (AG), which is obtained from the SG by adding new nodes called Allocation Places (APs). Each AP is a tentative position in the graph where a firewall instance could be allocated. In general, it is necessary to insert an AP for each edge of the SG in order to explore all the possible placements and hence have the assurance of eventually reaching the optimal solution. However, the service designer can introduce some placement constraints as additional inputs, relying on their own security knowledge. In particular, these constraints can be divided into two categories: (i) the user can explicitly forbid the generation of an AP, thus reducing the solution space; (ii) the user can force the allocation of a firewall on an AP, for example because the corresponding VNF is already deployed in the virtualized network. Fig. 1 shows an example of AG, automatically generated from a SG where the service designer forbids the creation of the APs between the function $f_9$ and the end points $e_5$ and $e_6$.
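The SG-to-AG step can be sketched as follows. This is only an illustration under stated assumptions: each SG edge is listed once as a `(source, destination)` pair, the function name `build_allocation_graph` and the `ap_<u>_<v>` naming are hypothetical, and the second constraint category (forcing a firewall on an AP) is left out.

```python
def build_allocation_graph(nodes, edges, forbidden=frozenset()):
    """Split every SG edge (u, v) with a fresh Allocation Place unless forbidden.

    Returns (ag_nodes, ag_edges, aps), where aps maps each AP name to the
    SG edge on which it was inserted.
    """
    ag_nodes = list(nodes)
    ag_edges = []
    aps = {}
    for u, v in edges:
        if (u, v) in forbidden:
            ag_edges.append((u, v))  # designer forbade an AP: keep the direct link
            continue
        ap = f"ap_{u}_{v}"           # hypothetical naming scheme
        aps[ap] = (u, v)
        ag_nodes.append(ap)
        ag_edges.extend([(u, ap), (ap, v)])
    return ag_nodes, ag_edges, aps

# Tiny example (node names are illustrative, not the topology of Fig. 1).
ag_nodes, ag_edges, aps = build_allocation_graph(
    ["e1", "f7", "e5"],
    [("e1", "f7"), ("f7", "e5")],
    forbidden={("f7", "e5")},
)
```

Forbidding an edge simply removes one candidate position, which shrinks the search space the solver has to explore.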
Formally, the AG is another directed graph $G_A = (N_A, L_A)$, where $N_A$ is the set of vertices representing the network nodes, while $L_A$ is the set of links interconnecting them. The main difference with respect to the SG is that the vertex set is defined as $N_A = E_A \cup S_A \cup P_A$, where $E_A = E_S$ and $S_A = S_S$, while $P_A$ is the set of the APs, which are absent in the SG. In $G_A$, each $n_k \in N_A$ is identified by a unique index $k$, so that each $l_{ij} \in L_A$, with $i \neq j$, represents the directed link from node $n_i$ to node $n_j$.
For each NF in the AG, a formal model of the NF forwarding behavior is defined. The forwarding behavior is the only aspect of NFs that is really relevant for the definition of firewall placement and configuration in the SG. Other aspects of the packet processing performed by the NFs that do not influence the forwarding behavior can be safely neglected. More precisely, in order to keep the models simple, we track only the possibility of the various forwarding actions taken by each NF for each packet, rather than representing all the details of the decision algorithm. Hence, these models are based on two predicates that represent the possibility that each packet is received or forwarded by any node. These predicates are defined for each $n_i, n_j \in N_A$ and $p_0 \in P$, where $P$ is the set of all packets: (i) recv($n_i$, $n_j$, $p_0$), which is true if node $n_j$ can receive a packet $p_0$ from node $n_i$; (ii) send($n_i$, $n_j$, $p_0$), which is true if node $n_i$ can send a packet $p_0$ to node $n_j$.
In view of this consideration, some of the NFs have an extremely simplified model, according to which each packet can be forwarded to each out-port without modifications. For example, traffic monitors belong to this class of NFs, because they forward all packets to their out-port without modifying them. Another example is a load balancer because, even though it implements a specific algorithm by means of which it distributes the traffic to different servers of a cluster, it is not possible to establish beforehand how each flow will be effectively managed. On the other hand, other functions, such as a NAT, have more complex behaviors that require specific models. This kind of modeling of NFs, expressed by means of sets of FOL clauses, has already been proposed and validated in the literature for network verification [6]-[8]. The same approach is reused in our work, by feeding the FOL formulas that represent each NF forwarding behavior to the MaxSMT solver as hard clauses. For firewalls, the model is slightly different, because each firewall can be present or not. This will be detailed in Section II-D. Finally, graph edges are also expressed as hard clauses involving the send and recv predicates.

C. Network Security Requirements
Concerning the security requirements to be enforced in the network service, our methodology focuses on connectivity requirements, i.e., the specification of which traffic flows must be allowed (or prohibited) between any pair of end points in the SG. These security constraints represent the second input of the framework and are characterized by two elements: (i) a general behavior representing the default rule applied to traffic flows for which the user does not specify any further indication; (ii) a set of specific Network Security Requirements (NSRs), each one specifying whether a traffic flow must be allowed (reachability requirement), or must be blocked by a firewall (isolation requirement).
There are three approaches a service designer can adopt for the definition of the security constraints. Two are based on the traditional whitelisting and blacklisting methods, i.e., all traffic flows must be blocked (in the former) or allowed (in the latter) with the exception of the communications for which the user explicitly defines some reachability (in the former) or isolation (in the latter) requirements. In the third available approach, called specific, the service designer must explicitly formulate only the requirements of interest, covering both isolation and reachability properties. Therefore, the optimal solution will be computed in order to satisfy exclusively this specific set of constraints, while for the unspecified cases the system will automatically decide whether to allow or forbid the flow. Within this approach, we assume that the entire set of security requirements is conflict-free, since this can be easily obtained from a general set by means of well-known conflict analysis techniques proposed in the literature [9]-[11]; in this way, the security constraints do not require a priority criterion in their formulation.
Formally, if R is the set of all the NSRs that must be fulfilled, each r ∈ R is modeled as a 6-tuple r = (type, IPSrc, IPDst, pSrc, pDst, tProto) where type is the requirement type, which can be isolation or reachability, while the other elements are the typical IP 5-tuple values (source and destination IP addresses, source and destination port numbers, transport-level protocol) that specify a packet flow.
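As an illustration, the 6-tuple can be written down as a small record together with a packet-matching check. This is a sketch under assumptions: all fields are strings, `*` stands for "any value" (per octet for IP addresses), port ranges are not modeled, and the names `NSR`, `field_matches`, `ip_matches` and `matches` as well as the addresses used are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NSR:
    type: str      # "isolation" or "reachability"
    ip_src: str    # dotted quad, "*" allowed per octet
    ip_dst: str
    p_src: str     # port number or "*"
    p_dst: str
    t_proto: str   # "TCP", "UDP" or "*"

def field_matches(pattern, value):
    """A single field matches if the pattern is a wildcard or equals the value."""
    return pattern == "*" or pattern == value

def ip_matches(pattern, address):
    """Compare two dotted-quad strings octet by octet."""
    return all(field_matches(p, a)
               for p, a in zip(pattern.split("."), address.split(".")))

def matches(r, packet):
    """packet is a 5-tuple (ip_src, ip_dst, p_src, p_dst, t_proto) of strings."""
    ip_src, ip_dst, p_src, p_dst, t_proto = packet
    return (ip_matches(r.ip_src, ip_src) and ip_matches(r.ip_dst, ip_dst)
            and field_matches(r.p_src, p_src) and field_matches(r.p_dst, p_dst)
            and field_matches(r.t_proto, t_proto))

# Illustrative requirement and packets (addresses are made up).
req = NSR("isolation", "10.0.0.*", "192.168.1.1", "*", "80", "TCP")
tcp_packet = ("10.0.0.7", "192.168.1.1", "5000", "80", "TCP")
udp_packet = ("10.0.0.7", "192.168.1.1", "5000", "80", "UDP")
```

Here `matches(req, tcp_packet)` holds, while `udp_packet` is rejected because the transport protocol differs.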
The NSRs contribute to the definition of the hard clauses of the MaxSMT problem. Before presenting their FOL formulas, however, two notations must be introduced. The first one, addr($e_k$), is the function that maps an end point $e_k \in E_A$ to its IP address if it is a single host, or to its IP address range (e.g., 10.1.*.*) if it is a collection of end points. The second notation, r.match(p), is the predicate that is true if and only if packet $p \in P$ matches requirement $r \in R$, i.e., if each requirement component positively matches or includes (depending on whether it is a single value or a range) the corresponding packet field.
Having introduced these notations, the hard clauses for enforcing the NSRs in the AG are defined as follows. On one side, if $r \in R$ is an isolation property, all the pairs of end points $e_i, e_j \in E_A$ such that addr($e_i$) ⊆ r.IPSrc ∧ addr($e_j$) ⊆ r.IPDst are identified. For each pair of nodes $e_i$ and $e_j$ thus identified, constraints (1) and (2) must be satisfied. Both clauses are needed to enforce an isolation property: (1) imposes that the source can send at least one packet matching the requirement to every neighbor; (2) imposes that all the packets that can be received and accepted by the destination do not match the requirement. On the other hand, if $r \in R$ is a reachability property, for each pair of nodes $e_i, e_j \in E_A$ such that addr($e_i$) ⊆ r.IPSrc ∧ addr($e_j$) ⊆ r.IPDst, constraints (3) and (4) must be satisfied. These two clauses introduce different commitments than those defined for an isolation property: (3) imposes that at least one packet that can be sent by the source to one of its neighbors matches the requirement; (4) imposes that at least one packet that can be received and accepted by the destination matches the requirement.
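One possible reading of clauses (1)-(4) is given below. It is reconstructed only from the verbal descriptions above and from the send and recv predicates of Section II-B, so it should be taken as an assumption rather than as the authors' exact formulas; in particular, "received and accepted" is approximated here by recv alone.

```latex
\begin{align}
&\forall n_k \in N_A \mid l_{ik} \in L_A.\ \exists p_0 \in P.\ \mathit{send}(e_i, n_k, p_0) \land r.\mathit{match}(p_0) \tag{1}\\
&\forall n_k \in N_A \mid l_{kj} \in L_A,\ \forall p_0 \in P.\ \mathit{recv}(n_k, e_j, p_0) \Rightarrow \lnot\, r.\mathit{match}(p_0) \tag{2}\\
&\exists n_k \in N_A \mid l_{ik} \in L_A,\ \exists p_0 \in P.\ \mathit{send}(e_i, n_k, p_0) \land r.\mathit{match}(p_0) \tag{3}\\
&\exists n_k \in N_A \mid l_{kj} \in L_A,\ \exists p_0 \in P.\ \mathit{recv}(n_k, e_j, p_0) \land r.\mathit{match}(p_0) \tag{4}
\end{align}
```

Under this reading, isolation lets matching traffic leave the source but forbids it from ever arriving, whereas reachability requires a matching packet both to leave the source and to arrive at the destination.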

D. Firewalls allocation and configuration
In case of success, the first outcome must be the optimal allocation scheme of the firewalls in the AG. This result is achieved by considering the possibility to allocate an instance in each available AP. Since the best solution would be to allocate the least number of firewalls, in the MaxSMT problem a soft constraint is formulated for each $p_k \in P_A$ so that the optimal value of the allocated($p_k$) predicate (which is true if a firewall is allocated in $p_k$) is false. This clause is formalized by (5), where the notation Soft($x$, $c_k$) specifies a soft constraint with formula $x$ and weight $c_k$.
The second expected outcome is the automatic configuration of the allocated firewalls; in this context, the firewall FP is characterized by a default action and a set of more specific 5-tuple-based rules.
First, the default action is chosen, according to the approach the service designer adopts for the formulation of the NSRs, so that the number of filtering rules is minimized. Then, for each firewall allocated in $p_k \in P_A$, a set of placeholder rules $\Pi_k$ must be identified, i.e., the maximum number of rules that could be needed in its FP is established with respect to the input security requirements. This step, which is critical to achieve good scalability, is performed by means of a number of pruning strategies. The two main ones are: (i) given a specific NSR, a corresponding placeholder rule is not needed in a firewall policy if the traffic flow related to this requirement cannot cross the AP on which the packet filter is tentatively allocated; (ii) given a specific NSR whose traffic flow can cross the AP, a placeholder rule is not needed anyway if the default action of the firewall that would be allocated there already enforces the requirement (e.g., a whitelisting firewall guarantees the satisfiability of an isolation property, if the specific rules are properly configured, as they are in an optimal configuration).
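The two pruning strategies can be sketched as a simple filter. This is an illustration, not the tool's algorithm: crossing an AP is over-approximated by graph reachability (source reaches the AP and the AP reaches the destination), which may keep a few unnecessary placeholders but never drops a needed one, and the names `reachable`, `can_cross`, `needs_placeholder` and the action labels `"deny"`/`"allow"` are hypothetical.

```python
from collections import deque

def reachable(edges, start):
    """Set of nodes reachable from start over directed edges (BFS)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def can_cross(edges, src, dst, ap):
    """Over-approximation: src can reach the AP and the AP can reach dst."""
    return ap in reachable(edges, src) and dst in reachable(edges, ap)

def needs_placeholder(edges, ap, src, dst, req_type, default_action):
    if not can_cross(edges, src, dst, ap):
        return False  # strategy (i): the flow never crosses this AP
    enforced_by_default = (
        (req_type == "isolation" and default_action == "deny")
        or (req_type == "reachability" and default_action == "allow")
    )
    return not enforced_by_default  # strategy (ii)

# Illustrative topology: e1 -> p1 -> f -> p2 -> e5, and e3 -> p3 -> f.
edges = [("e1", "p1"), ("p1", "f"), ("f", "p2"), ("p2", "e5"),
         ("e3", "p3"), ("p3", "f")]
```

With this topology, a flow from `e1` to `e5` never needs a rule in a firewall at `p3`, and an isolation requirement needs no explicit rule in a whitelisting (default deny) firewall at `p2`.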
Moreover, the wildcards feature has been introduced to further reduce the cardinality of the maximum set of placeholder rules. This feature allows us to represent both an IP address and the netmask in a joint expression: for instance, the 10.0.0.* statement refers to the network 10.0.0.0/24. Besides, it can also be applied to transport-level ports and protocols. In view of this consideration, if some NSRs that would effectively require corresponding filtering rules in the same firewall can be merged into a single one by means of wildcards, then it is possible to assign a single placeholder rule for all these requirements, as long as this decision does not have any impact on the satisfiability of the other ones.
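A minimal sketch of wildcard merging for one rule field follows. It assumes dotted-quad strings with `*` per octet and checks only one field against a list of addresses that must not be covered; the helper names `merge_ips`, `covers` and `try_merge` are hypothetical and the real pre-processing is likely more refined.

```python
def merge_ips(a, b):
    """Merge two dotted-quad patterns octet by octet, using '*' where they differ."""
    return ".".join(x if x == y else "*"
                    for x, y in zip(a.split("."), b.split(".")))

def covers(pattern, address):
    """True if every octet of the pattern is '*' or equals the address octet."""
    return all(p == "*" or p == a
               for p, a in zip(pattern.split("."), address.split(".")))

def try_merge(sources, conflicting):
    """Return one wildcard pattern covering all sources, or None if that
    pattern would also cover an address listed in conflicting."""
    merged = sources[0]
    for s in sources[1:]:
        merged = merge_ips(merged, s)
    if any(covers(merged, c) for c in conflicting):
        return None  # merging would affect another requirement
    return merged
```

For example, `try_merge(["10.0.0.1", "10.0.0.2"], ["10.0.1.5"])` yields `10.0.0.*`, while the same merge is rejected when `10.0.0.9` belongs to a requirement that needs a different action.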
After identifying the maximum number of placeholder rules by means of the aforementioned algorithms, two different classes of soft clauses are defined for policy configuration. First, in order to minimize the total number of configured rules, for each placeholder rule $\pi_i \in \Pi_k$ of a firewall that can be allocated in $p_k \in P_A$, a soft constraint is defined so that the optimal value of the function configured($\pi_i$), which returns true if $\pi_i$ is configured in the policy, is false; this is formalized by (6). In order to enforce the desired priority between the two minimization objectives, the weights of these soft clauses are chosen so as to satisfy constraint (7).
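A plausible form of the soft clauses (5) and (6) and of the weight constraint (7) is sketched below. It follows the Soft($x$, $c_k$) notation of the text, but the rule weight symbol $c_{ki}$ and the exact shape of (7) are assumptions made for illustration.

```latex
\begin{align}
&\forall p_k \in P_A.\ \mathit{Soft}\big(\mathit{allocated}(p_k) = \mathit{false},\ c_k\big) \tag{5}\\
&\forall \pi_i \in \Pi_k.\ \mathit{Soft}\big(\mathit{configured}(\pi_i) = \mathit{false},\ c_{ki}\big) \tag{6}\\
&\sum_{\pi_i \in \Pi_k} c_{ki} < c_k \tag{7}
\end{align}
```

With weights chosen this way, leaving one firewall unallocated always gains more weight than is lost by configuring every placeholder rule of another firewall, which is what gives firewall minimization priority over rule minimization.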
A second class of soft clauses is, instead, introduced to specify that using wildcards has to be preferred for each single component of each filtering rule; in fact, wildcards are useful not only to reduce the number of placeholder rules, which is done in the pre-processing phase, but also to reduce the number of rules in the solution of the MaxSMT problem. Clauses (8) and (9) define the soft clauses related to wildcard usage in each one of the four components of IP addresses in dotted-quad notation. Similar soft clauses are defined also for the transport-level ports and protocol.
In our approach, the use of wildcards for each rule component has lower priority than the absence of the rule itself. Consequently, constraint (10) must be respected.
The set of clauses so built is analyzed by the MaxSMT solver. If partial satisfiability holds, the optimal allocation and configuration of firewalls is returned. Instead, if partial satisfiability does not hold, a non-enforceability report is returned. A possible reason for this condition is that the APs are not sufficient because of additional constraints introduced by the user to prohibit their creation. This report can then be exploited for a next run of the tool, after the inputs have been properly updated.

E. Clarifying example
The most relevant features of our approach can be clarified by means of a sample scenario, where a manual configuration would be prone to human errors. For this purpose, let us consider Fig. 1 as the AG generated from an input SG. It is worth mentioning that the end points $e_3$ and $e_4$ are not single hosts, but subnetworks. Table I illustrates: (i) how each SG node is mapped to an equivalent single IP address or address range and the function type of the node; (ii) the NSRs to satisfy, defined through the specific approach.
First of all, let us focus only on the first two constraints of the NSRs list in Table I, which are the isolation requirements for the end points $e_1$ and $e_2$, shadowed by the NAT $f_7$. Since the service designer requires that they are isolated from the two services $e_5$ and $e_6$, at least one firewall is needed. Considering for the moment only the APs $p_{10}$, $p_{11}$ and $p_{12}$ and supposing that no other requirements are specified, our methodology would place a single whitelisting firewall on $p_{12}$, since it would be able to filter all the packets coming from the NAT, thus reducing the number of firewalls. The non-optimal alternative, which a service designer might instead adopt, would be to place two firewalls, one in $p_{10}$ and the other in $p_{11}$.
Then let us consider also the other NSRs, except the last one of the list in Table I. On one side, $e_3$ must be able to reach the HTTP web server $e_5$ at the TCP destination port 80, and $e_4$ must be able to reach the POP3 mail server $e_6$ at the TCP destination port 110; all the other traffic between these pairs (i.e., TCP with a different destination port, or UDP) must be blocked. Since the paths from $e_3$ and $e_4$ towards the servers intersect in $p_{15}$ with the paths from $e_1$ and $e_2$, the optimal solution would be to allocate a firewall in that position.
Fig. 2: Final Service Graph with allocated firewalls
Finally, according to the last NSR of Table I, $e_4$ must not be able to contact $e_3$. However, the path between them does not cross $p_{15}$, where the previous discussion led to the decision to place a firewall, nor is it possible to identify a single other intersection among all the paths crossed by the traffic flows that must be considered. Consequently, the only solution is to allocate an additional firewall, either in $p_{13}$ or in $p_{14}$, which would block the packets from $e_4$.
In this process, if each decision is taken manually by the designer, several mistakes can be made while defining firewall allocation and configuration. For example, since different constraints are defined for different traffic flows between the same pairs of end points, a manual approach could likely introduce shadowing or correlation anomalies [9], which would lead to an incorrect security service. Moreover, even if the designer manages to reach a correct solution, it could be a non-optimal one. Instead, using the framework we developed according to the methodology we proposed, it is possible to reach the optimal and formally correct solution. For the sake of completeness, Fig. 2 shows the final logical topology of the SG computed by our approach, whereas Table II describes the FPs of the two introduced firewall instances. In this table, the letter D is used to identify the firewall default action.

III. IMPLEMENTATION AND VALIDATION
We implemented our approach by means of a Java framework, which exploits the APIs offered by the z3 theorem prover [12] to formulate and solve the MaxSMT problem. The framework offers REST APIs, so that it can be easily integrated as a component of more complex architectures.
The validation of the developed framework has been performed by means of scalability tests, which have been run on a machine with an Intel i7-6700 CPU running at 3.40 GHz and 32 GB of RAM. The parameters that have been considered are the ones that mostly affect the complexity of the problem (i.e., the number of clauses): (i) the number of APs where the firewall instances can be allocated; (ii) the number of NSRs. We cannot compare our approach with alternative existing approaches because, as explained in Section IV, no other approach solving the same problem exists.
The charts in Fig. 3a and 3c present the results of the tests performed to evaluate execution time versus number of APs and number of NSRs. The security requirements considered for the tests are defined according to the specific approach, and only functions that do not modify packets are considered, so that the validation is focused on the two metrics of interest. Besides, for each test case with a given number of APs and NSRs, we compute the median computation time over 30 runs, where the service and the requirements are the same and only the IP addresses are different. This is motivated by the experimental observation that computation time can vary if the IP addresses are changed, which is due to how z3 internally manages the integer theory. For this reason, we also show the experimental results by means of the whisker plots in Fig. 3b and 3d. The number of NSRs in Fig. 3b and the number of APs in Fig. 3d are fixed to 30.
The most evident result from this validation is that, even though the MaxSMT problem belongs to the NP-complete class in terms of worst-case computational complexity, the framework can scale to SGs with tens of APs and a number of NSRs that is expected in a service of this dimension. It is also possible to notice that an increment of the number of NSRs produces a computation time comparable to the one produced by the same increment of the number of APs. Furthermore, the two whisker plots show how most of the values are gathered around the expected median value. Moreover, memory consumption is not an issue because, in the worst case which has been considered, it is only 10.3 MB. All the positive results that have been shown in this section are mainly due to the correct tuning of the optimization parameters and the pruning strategies we adopted, which reduce the solution space. In fact, even though some possible solutions are not evaluated by the MaxSMT solver because of these strategies, they would not be optimal anyway.

IV. RELATED WORK

A. Automatic configuration and verification of firewall policies
In the literature, the automatic configuration of firewalls and the formal verification of their policies represent a central research area in the network security field. A milestone is represented by Firmato [13], a firewall management toolkit that performs a refinement of high-level filtering requirements. Other similar works are [14] and [15], which can automatically generate rule sets also in distributed firewall architectures. Despite the relevance of these works, they are mainly targeted to traditional networks, rather than to NFV environments. Moreover, they do not provide formal correctness assurance.
Formal methods have been exploited in more recent works, such as [16]-[19]. However, they have a number of limitations. [16] lacks optimality but also generality, since it is bound to IPChains and Cisco PIX. [17], [18] and [19] can only fix firewall misconfigurations rather than allowing the creation of rules from scratch. Furthermore, [17] and [18] do not focus on virtualized networks, while [19] works at an abstraction level higher than the actual policy rules.
Finally, in all the works mentioned so far, the decision about where to allocate the packet filters in the logical topology is not made by the tool, but it is assumed as input.

B. Automatic synthesis and refinement of a Service Graph
Other works address the automatic creation and refinement of a SG, according to a set of constraints, before its deployment. This topic is becoming central because of the growing interest in operational resilience based on NFV and intent-based networking [20], [21]. Among the works regarding automatic synthesis of SGs, [22]-[25] define methodologies for intent-based generation of network services in virtualized environments. However, not only do they lack formal correctness assurance of the achieved solutions, but the approaches described in [22] and [25] do not even target optimality.
The most relevant works that provide optimal or sub-optimal automatic placement of firewalls in a SG are [26] and [27]. However, neither of them can also optimally synthesize the rules of each placed firewall. [26] approximately minimizes the maximum number of rules for each firewall by means of a heuristic algorithm, without providing formal correctness assurance, while [27] computes the optimal placement using a formal model, also taking other aspects into account, but using an iterative approach where the constraints are tuned after each failed attempt.

V. CONCLUSION AND FUTURE WORKS
This paper presents a new methodology for automated firewall allocation and configuration that can be used to exploit the flexibility provided by virtualized networks. The proposed approach supports the work of a service designer by replacing manual tasks, and its formal basis contributes to achieving a correct security configuration while also finding the optimal solution among all the possible ones. To the best of our knowledge, this is the first time an approach with these features is proposed. From the validation of the framework developed according to the described methodology, the approach has been shown to be feasible for problem instances requiring tens of virtual firewalls and similar numbers of security requirements.
In the near future, we plan to further refine the methodology, addressing the automatic allocation and configuration of other security functions, such as web application firewalls, anti-spam filters and VPN gateways. Besides, we are planning to improve the performance by pursuing a trade-off between optimality of configurations and required computational complexity. Finally, we are planning experiments with real SGs.

Fig. 3: Results of scalability tests on APs and NSRs

TABLE II: Policy rules