SCHE2MA: Scalable, Energy-Aware, Multidomain Orchestration for Beyond-5G URLLC Services

The evolution of Software-Defined Networking (SDN) and Network Function Virtualization (NFV) in the telecommunications industry has intensified the challenges of network management at large scales. Dynamic service orchestration and adaptive resource allocation have become a necessity for network operators to manage the rapid growth of users and data-intensive applications. However, the impact of network automation on energy consumption and overall operating costs is often overlooked. Guaranteeing the strict performance constraints of Ultra-Reliable Low Latency Communication (URLLC) services while enhancing energy efficiency is one of the major problems of future communication networks, given the urgency to reduce carbon emissions and energy consumption. In this work, we study the problem of zero-touch Service Function Chain (SFC) orchestration for multi-domain networks, targeting the latency reduction of URLLC services while improving energy efficiency for beyond-5G networks. Specifically, we propose SCHE2MA, a Service CHain Energy-Efficient MAnagement framework based on distributed Reinforcement Learning (RL), that can intelligently deploy SFCs with shared Virtual Network Functions (VNFs) into a multi-domain network. Finally, we evaluate SCHE2MA through model validation and simulation, demonstrating its ability to jointly reduce average service latency by 103.4% and energy consumption by 17.1% compared to a centralized RL solution.

demand of the new high-performing communication networks. The networks grow at a rate that creates an urgent pressure for Infrastructure Providers (InPs) to adopt technologies for intelligent automated network management and orchestration. The lack of intelligent automation will cause additional costs and make it unfeasible for operators to guarantee the performance required by new 5G and Beyond-5G (B5G) network services.
The introduction of technologies such as Software-Defined Networking (SDN) and Network Function Virtualization (NFV) enabled the deployment of multiple virtual networks over a physical network infrastructure, with distinct performance and service agreement constraints. Moreover, this architecture enables the decoupling of service providers from the physical InPs. The virtualization of the network also introduced more flexibility and more possibilities to optimize energy efficiency. Mobile Edge Computing (MEC) enabled near real-time low latency but introduced additional computational resources in multiple computing domains, spread across diverse geographic locations. The deployment of MEC servers has allowed for partial or complete migration of the network services at the edge of the network [1]. Having a considerable number of MEC servers introduces additional issues of increased costs related to installation, operation, maintenance and energy consumption. Flexible service provisioning has the potential to significantly reduce not only the Capital Expenditures (CAPEX) and the Operational Expenditures (OPEX), but also the total energy consumption of the network.
The presence of virtual networks in B5G, enabled by SDN and NFV, has sparked the introduction of a new variety of services, making it possible to deploy multiple network services within a virtual network. In this context, a typical network service is built using a series of interconnected Virtual Machines (VMs) or Containers that perform a specific function, called Virtual Network Functions (VNFs). The VNFs are usually chained together, forming complex structures called Service Function Chains (SFCs) with data flows among them. The physical network of the InPs is spread in a large geographical area comprised of multiple domains across the physical network. This forces the SFCs to be deployed over massive distances with their VNFs located in distant data centers.
The deployment of SFCs is rather complex, as VNFs must be placed in an optimal setting to ensure maximum performance and minimum energy consumption while performing the service requests. However, the underlying physical network can have multiple geographically distributed domains, which affects services that have a spatial distribution that extends large distances. It is safe to conclude that the placement of service VNFs in the network holds a significant role in the performance, the quality of the offered service, the energy consumption and finally, the cost of operation and maintenance.
As the size of cellular networks expands and the demand for computing resources increases, automated service management and orchestration becomes essential for InPs to offer superior services to the users. The autonomous placement of SFCs is a fundamental aspect of the Zero-touch network and Service Management (ZSM) in B5G networking. The European Telecommunications Standards Institute (ETSI) has generated guidelines regarding ZSM and self-management on large-scale networks [2]. Researchers from both academia and industry are proposing and developing state-of-the-art solutions following the guidelines by ETSI [3].
Performance optimization with ZSM orchestration is a well-studied topic in the modern literature. However, performance scaling is often overlooked, especially for complex trade-offs such as latency and energy minimization. Single-agent centralized solutions not only exhibit a single point of failure but also bottleneck automation scaling. Large 5G networks are divided into geographically distributed domains to serve various remote regions. This natural separation can be utilized to create a superior distributed and decentralized SFC management and orchestration system that enables scalability. Dividing the problem into multiple cooperating local regions avoids degrading the wide variety of Key Performance Indicators (KPIs) as various aspects of the network grow over time.
In a previous work [4], we studied the problem of VNF orchestration in single-domain networks and proposed a Deep Deterministic Policy Gradient Reinforcement Learning (RL) solution for End-to-End (E2E) service latency minimization. Afterwards, we proposed a distributed RL-based orchestration framework capable of orchestrating multiple SFCs in multi-domain networks and optimizing the placement for Ultra-Reliable Low Latency Communications (URLLC) services [5].
To this end, in this work, we study the problem of energy-aware latency minimization for multi-domain networks via dynamic SFC placement for URLLC services with strict performance requirements. We propose and develop SCHE2MA, short for Service CHain Energy-Efficient MAnagement, an innovative, multi-agent, energy-aware, distributed RL-based service orchestration framework aiming to minimize service latency in multi-domain B5G networks. The proposed framework is decentralized, eliminating any central point of failure and enabling scalability. The contribution of our work is threefold:
• Jointly minimizing the average service latency and network-wide energy consumption in multiple traffic scenarios by optimizing the placement of service VNFs and minimizing the number of transmissions between servers.
• Demonstrating scalability, parallelization, and decentralization in the dynamic decision-making process of SFC placement, by dividing the VNF orchestration among the local technological domains and thereby avoiding costly network-wide VNF configurations.
• Evaluating SCHE2MA through model validation and simulation to demonstrate its ability to jointly reduce average service latency by 103.4% and energy consumption by 17.1% compared to existing solutions from the literature.
The remainder of this paper is organized as follows. Section II provides an extensive discussion of the literature and related works. Section III offers an overview of the System Model architecture and introduces the formulation of all network components. Section IV is an in-depth analysis of the problem statement and definition. Section V presents in detail the distributed RL-based modules and operation. Section VI showcases the experimental setup and evaluates the performance of SCHE2MA against other solutions from the literature. Finally, Section VII provides a brief conclusion of this work and our future intentions regarding it.

II. LITERATURE OVERVIEW
Studying and developing energy-aware optimization requires an understanding of SFC management and orchestration, as it continuously reshapes the behavior of the network.

A. Centralized SFC Orchestration
Although a considerable amount of the literature examines the automated SFC placement, the majority of these works consider exclusively centralized solutions. In [6], the authors attempt to maximize the revenue by optimizing the placement of the VNFs, by modeling and solving the problem with Integer Linear Programming (ILP). Similarly, authors in [7] develop a heuristic algorithm to optimize the network bandwidth consumption by taking advantage of the SFC placement.
On the other hand, more recent works study the SFC placement problem using Deep RL (DRL) techniques. These techniques proved to be an important breakthrough and have lately attracted a lot of attention as automated decision-making tools. They are able to outperform classical optimization algorithms, and even humans in many industries [8], thanks to lower computational complexity. In [9], the authors use DRL to perform the placement of Virtual Network Function Forwarding Graphs considering the constraints of the underlying infrastructure. Similarly, Pandey et al. in [10] propose EdgeDQN for efficient SFC placement in the edge-cloud to tackle the resource scarcity issue while maintaining low end-to-end delay. They use a hierarchical network model to avoid the expansion of the action space as the network grows over time. In addition, their reward model incorporates energy consumption and the complexity of the SFC. They evaluate the proposed algorithm using a simulator and a testbed. In [11] and [12], Leoni Santos et al. propose an energy-aware SFC placement algorithm based on Proximal Policy Optimization DRL named Cand-RL. They compare Cand-RL with an Advantage Actor-Critic-based algorithm using simulations based on Brazil's National Teaching and Research Network backbone. Although these works provide satisfactory solutions, they do not consider the complex, multi-domain nature and scale of modern infrastructure. They utilize centralized algorithms that oversee and manage the entire network, which limits scalability and decreases performance as the network grows over time, whether in the number of users, the number of services or the size of the network itself.

B. Multi-Domain SFC Orchestration
Centralized, single-agent algorithms compute the placement of the SFCs in the entire network. This is a costly and non-scalable solution for large networks, as it requires enormous problem and action spaces. In contrast, multi-domain orchestration takes place between the multiple independent administrative domains across which the SFC is divided. In [13], the authors propose an ILP-based solution that addresses the VNF deployment problem in multi-domain networks by reducing the overall energy consumption while maximizing the number of VNFs. Similarly, the authors in [14] study the problem of optimal network service deployment across multiple SDN domains with the target of saving energy while achieving load balancing in multi-domain networks. They demonstrate through simulation that the proposed heuristic service deployment algorithm is efficient and outperforms the comparison algorithms in terms of energy consumption and load balancing degree.
Recent works in the literature study multi-domain SFC management and orchestration with partially observable DRL techniques. The algorithmic state space can be split into multiple partially observable states, assigned to agents that solve the problem by cooperating. The authors in [15] and [16] propose a model that tackles multi-domain SFC placement policy generation by splitting the network into independent divisions with limited visibility of the local infrastructure. They leverage a Deep Deterministic Policy Gradient algorithm that operates on partially observable states and uses rewards that express the quality of the obtained placements using Linear Physical Programming. Through experiments, they demonstrate rejection rates of under 2%, with cost and latency close to optimal.

C. Unexploited Gap in the Literature
We can safely conclude that there is a gap in the literature regarding the energy-aware SFC placement for service latency minimization awaiting to be covered. To the best of our knowledge, we are among the first to propose a distributed RL-based approach for multi-domain SFC orchestration for latency and energy efficiency optimization in this regard.

III. SYSTEM MODEL
For our solution, we consider a 5G network consisting of multiple clouds that are geographically distributed over a wide area, as can be seen in Fig. 1. The geographically distributed clouds are called Domains, denoted by n, and consist of interconnected computing servers with computational resources and specific energy consumption, following the standard of modern NFV virtualization software such as OpenStack and Kubernetes. Domains located at the edge of the network provide low latency access to the end-users who generate service requests. Services are composed of a chain of VNFs called an SFC. The need for distributed orchestration is apparent in this setting. The terms service and SFC will be used interchangeably for the rest of this work.
As is apparent from Fig. 1, our proposed algorithm SCHE2MA is designed to operate in a completely decentralized manner. It utilizes the distributed notion of network domains to operate locally. This enables parallelism and avoids unnecessary VNF migrations between the domains or costly re-orchestration of the entire network. An entity disconnected from the decision function, called the Auction Mechanism, is introduced to enable inter-domain VNF migration. SCHE2MA is a distributed decision engine with multiple agents, which eliminates a centralized point of failure, as the Auction Mechanism can be instantiated anywhere in the network.

A. Network Graph
The physical network infrastructure is represented as a graph G whose sets of nodes and links are denoted by V and E, respectively. To accommodate the functions of an SDN-NFV enabled network, V has two types of nodes. The first is dedicated to network switching and is responsible for forwarding the service traffic. The second can not only forward traffic but also instantiate, terminate or migrate VNFs, which are represented by the set M, up to its physical capacity. Both types of nodes handle the routing of the SFC traffic. Finally, U denotes the set of users. The terms node, server and host will be used interchangeably.
The parameters u, v ∈ V represent two nodes and uv ∈ E represents the physical link that connects nodes u and v.

B. Network Resources
The network has a finite number of resources that can be measured and tracked through metrics. The following metrics were used to represent the different network resources:
• C^bw_uv represents the capacity of a network link uv, and w^bw_uv its utilization ratio.
• C^cpu_u represents the total number of CPU cores of a server u, and w^cpu_u its utilization ratio.
• C^ram_u represents the total amount of Random Access Memory of a server u, and w^ram_u its utilization ratio.
• C^hdd_u represents the total amount of storage space of a server u, and w^hdd_u its utilization ratio.
The instantiation and run-time of a service VNF on a network server u uses a portion of the aforementioned computational and network resources. These metrics are vital for the decision-making process, as presented in Section V-B.

C. Service Function Chains
We represent the SFCs as graphs G_s = (V_s, E_s, U_s). The edges of this graph run from the source VNF to the destination VNF of the function chain, thus creating a service s. The flow of data between the VNFs has a predefined order. The notation V_s denotes the source, the destination and the intermediate servers that the service s traverses through the instantiated VNFs, and u_s, v_s ∈ V_s represent two nodes in G_s. E_s denotes the links u_s v_s ∈ E_s that connect adjacent VNFs, served by the server nodes u_s and v_s in G_s. Finally, we define U_s as the set of users that utilize the service s.
The dataflow of service s can also be represented as a list source → m_1 . . . m_j → destination, in which source and destination are the source and the destination nodes of the service. The service traffic needs to traverse between them through the intermediate nodes m_1 . . . m_j.

D. Network Resource Constraints
We define services s and categorize them into three distinct classes:
• S denotes the instantiated and running services.
• T denotes the terminated services.
• R denotes the redirected services.
Services can be redirected due to an inappropriate VNF placement that violates the user SLA κ^delay_s, or due to a rejected placement caused by insufficient resources to host the service s.
Due to the hardware limitations of the network elements, at any time interval the total allocation of bandwidth and computational resources cannot exceed the available resources in links ∀uv ∈ E, nodes ∀u ∈ V and VNFs ∀m ∈ M.
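The capacity constraints can be illustrated with a minimal Python sketch. It assumes that each resource is tracked as a capacity and utilization ratio pair, as in Section III-B; the names Resource, can_host and allocate are hypothetical and are not taken from the SCHE2MA implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Resource:
    """Capacity C and utilization ratio w of one resource (CPU, RAM, ...)."""
    capacity: float
    utilization: float  # ratio in [0, 1]

    def free(self) -> float:
        """Amount of the resource that is still unallocated."""
        return self.capacity * (1.0 - self.utilization)


def can_host(resources: Dict[str, Resource], demand: Dict[str, float]) -> bool:
    """True if every demanded amount fits into the remaining capacity."""
    return all(amount <= resources[name].free()
               for name, amount in demand.items())


def allocate(resources: Dict[str, Resource], demand: Dict[str, float]) -> bool:
    """Reserve the demand if it fits; leave the resources untouched otherwise."""
    if not can_host(resources, demand):
        return False
    for name, amount in demand.items():
        res = resources[name]
        res.utilization += amount / res.capacity
    return True
```

A placement request whose demand does not fit is rejected without changing the utilization ratios, which corresponds to the rejected placements described above.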

E. Maximum Tolerated Delay
The E2E delay of each instantiated and running service s ∈ S is highly dependent on its VNF placement. It is calculated as the sum of the delays of all links and other network elements traversed by the service chain graph G_s described in Section III-C. We denote by d_uv the delay of link uv, by d_u the delay caused by the server hardware and by d_m the delay caused by the VNF. If the E2E delay exceeds the maximum tolerated service delay κ^delay_s, an SLA violation is registered; that is, the SLA rule requires the E2E delay of s to be at most κ^delay_s. Failing to satisfy this inequality moves the service s from the operating set S to the redirected set R before the end of the current iteration. Additionally, it triggers a cost penalty used in the reward function to adjust the future actions of the local RL agents, as discussed in Section V-B.
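As an illustration of the SLA rule, the following Python sketch sums the delay components of a service and moves violating services from the operating set to the redirected set. The function names and the dictionary-based bookkeeping are assumptions made for the example only.

```python
def e2e_delay(link_delays, node_delays, vnf_delays):
    """E2E delay of one service: link, server and VNF delays summed."""
    return sum(link_delays) + sum(node_delays) + sum(vnf_delays)


def enforce_sla(operating, redirected, delays, max_delay):
    """Move every service that violates its delay bound from S to R.

    operating, redirected: sets of service identifiers (modified in place)
    delays, max_delay: dicts mapping a service identifier to its E2E delay
    and to its maximum tolerated delay, respectively
    Returns the set of violating services, which would incur the penalty.
    """
    violators = {s for s in operating if delays[s] > max_delay[s]}
    operating -= violators
    redirected |= violators
    return violators
```

A service whose delay equals its bound exactly is treated as compliant here, since the rule is violated only when the delay exceeds the bound.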

F. Service Deployment & VNF Orchestration
Fully deployed and operating services s ∈ S traverse the network along a fixed path, which cannot be split or reversed. Traffic flows from the source to the destination. If a physical link uv is traversed by the service s, then the nodes u and v that it connects must be traversed too. In addition, the aforementioned links uv and nodes u must be traversed by the traffic of service s as dictated by the placement of the VNF m. Finally, constraint (10) ensures that each VNF m can be placed in only one node u capable of hosting VNFs.

IV. PROBLEM STATEMENT & DEFINITION
In this section, the SFC orchestration will be examined and formulated as an energy-aware, low latency VNF placement problem.

A. Problem Description
Modern networks consist of multiple computing domains spread over a wide area. These local computing domains, denoted by n, connect through an infrastructure that we model as a set of inter-domain links uv and servers u. The users U_s of service s connect to the local domains through a wireless connection and request one or more URLLC services from the network. The offered services are instantiated as SFCs consisting of multiple VNFs, hosted in the various network domains through which the traffic flows. As a result, the network resources are also scattered over large distances, which causes local shortages of computing resources and increases the number of hops between servers that the data needs to travel to complete the service traversal. If there are available resources in the network, it accepts the incoming service requests.

B. Delay Model
The E2E service delay D_s that the service s can offer due to its placement in the network during the current interval is the sum of the link, server and VNF delays defined in Section III-E. The total service delay D of the current iteration is expressed as the average E2E service delay D_s of all services s ∈ S.
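Under the notation of Section III, the delay model described here can be written as follows. This is a reconstruction from the textual definitions rather than the original typeset equations, and the symbol M_s, denoting the VNFs of service s, is an assumption introduced for the sketch.

```latex
D_s = \sum_{uv \in E_s} d_{uv} + \sum_{u \in V_s} d_u + \sum_{m \in M_s} d_m ,
\qquad
D = \frac{1}{|S|} \sum_{s \in S} D_s
```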

C. Bandwidth Allocation Model
The total bandwidth B_s that service s occupies from the network link resources due to its placement is calculated over the links that it traverses. The average allocated bandwidth B of the current iteration is expressed as the average allocated bandwidth B_s of all services s ∈ S.

D. Energy Consumption Model
The total energy consumed by the network during the operation of service s branches into two distinct segments.
First, the energy E_m consumed by the utilization of computational resources while hosting the VNF m on node u is calculated based on the work of Mao et al. [17]. In this model, F indicates the computational capacity of the node computing unit measured in CPU cycles per second, C the CPU cycles required for computing one data sample at each CPU core, φ^cpu_s the number of CPU cores utilized by the VNF and D_s the amount of processed data expressed in bits. The constant μ expresses the effective switched capacitance of the CPU architecture.
Second, the link energy consumption E_uv is calculated as the data sent between the VNFs hosted in servers u and v, divided by the data rate and multiplied by the transmitted optical power ρ_uv of the link. Here, r_u expresses the transmission data rate of server u in Gigabits per second, ρ_uv the transmitted optical power of the link uv in dBm and t_u the amount of data transmitted by a server expressed in bits. The total energy consumption E_s for the operation of a service s ∈ S in the current iteration is composed of these two segments. The total energy consumption E of the current iteration is expressed as the sum of the energy consumption E_s of all operating services s ∈ S.
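The two energy segments can be sketched in Python as follows. The multiplicative form of the computation energy (μ · φ · C · D · F²) follows the common dynamic CPU energy model and is an assumption, as is the conversion of the optical power from dBm to watts before multiplying by the transmission time; neither is confirmed as the exact form used in this work, and all function names are hypothetical.

```python
def computation_energy(mu, cores, cycles_per_sample, data_bits, freq_hz):
    """E_m in joules, assuming E = mu * cores * C * D * F^2."""
    return mu * cores * cycles_per_sample * data_bits * freq_hz ** 2


def dbm_to_watts(power_dbm):
    """Convert a power level in dBm to watts (0 dBm = 1 mW)."""
    return 10 ** ((power_dbm - 30.0) / 10.0)


def link_energy(data_bits, rate_gbps, power_dbm):
    """E_uv in joules: transmission time multiplied by transmitted power."""
    seconds = data_bits / (rate_gbps * 1e9)
    return seconds * dbm_to_watts(power_dbm)


def service_energy(vnf_energies, link_energies):
    """E_s: computation plus transmission energy of one service."""
    return sum(vnf_energies) + sum(link_energies)


def total_energy(service_energies):
    """E: sum of E_s over all operating services."""
    return sum(service_energies)
```

Consolidating two adjacent VNFs on the same server removes the corresponding link_energy term from E_s, which is the effect that the placement aims to exploit.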

E. Problem Objective
In this respect, we formulate the problem of energy-aware low latency SFC management and orchestration as a local long-term constrained optimization task. This problem translates into solving a local multi-objective constrained task, in which we jointly seek to minimize latency while reducing energy consumption and guaranteeing sufficient allocated bandwidth. We formalize this as a cost function C, in which w_B, w_D and w_E are the weights used to adjust the importance of the variables B, D and E, respectively. These weights enable fine-tuning to achieve the desired trade-off between allocated bandwidth, latency and energy efficiency.
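One plausible form of such a cost function is a weighted sum of the three variables. The expression below is a sketch under that assumption; the sign and normalization of each term in the original formulation are not recoverable from the text.

```latex
C = w_B \, B + w_D \, D + w_E \, E
```

With this form, setting w_E = 0 would reduce the task to a latency and bandwidth trade-off, whereas increasing w_E relative to w_D would favor energy savings over latency.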

V. DESIGNING A DISTRIBUTED REINFORCEMENT LEARNING SOLUTION
This section presents the ZSM energy-aware orchestration problem and the proposed solution. It provides the definition of the distributed RL algorithm and the structure of the proposed Auction Mechanism.

A. Solution Overview
Internet Service Providers (ISPs) are required not only to maintain but also to improve the service performance and reduce the operating costs by optimizing the placement of VNFs in the network. The division of the network infrastructure into multiple domains, as modeled in the previous sections, remains the main obstacle in resource management optimization due to the high complexity that must be addressed.
In this work, we introduce SCHE2MA, an intelligent framework that tackles the energy-aware SFC orchestration problem in multi-domain networks. It employs multiple RL agents, one instantiated in each domain n, that perform VNF orchestration locally, and it provides a system for inter-domain migration, offering a trade-off between local and global SFC orchestration benefits. We build SCHE2MA on the network infrastructure system model presented in Section III.

B. Markov Decision Process Environment
The intra-domain VNF placement and orchestration problem is formalized as a Markov Decision Process (MDP), which consists of a State Space, an Action Space and a Reward Function. The problem space of a domain n is defined as follows:
1) State Space: Consists of the computational resources of the nodes u of domain n that host the VNFs m of the services s ∈ S. A local domain state S_n is defined as a set whose variables represent the resource utilization ratios of domain n, the user SLAs and the requirements of the VNF currently in auction, as given in (20).
2) Action Space: The Confidence Vector A_n, a vector that contains a bid ∈ [−1, 1] for each domain server u_n to receive and host the VNF currently in auction by the Auction Mechanism, as given in (21). The maximum of this local action indicates an internal VNF migration.
3) Reward: Consists of the cost function, as expressed in (22). The objective of all domains n is to maximize the sum of the function R, which is shared in every iteration, thus converging to a common state-action that enables cooperation among the domains. In addition, we define a penalty function that is applied when the SLA of the service s, described in (6), is violated. Including the penalty helps the algorithm converge faster by accumulating large constraint costs during the exploration phase of the distributed RL agents.
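The three MDP components can be illustrated with a short Python sketch. The flat state layout, the use of the negated cost as the reward and the fixed penalty value are assumptions made for the example; the function names are hypothetical.

```python
def build_state(utilization_ratios, sla_delay, vnf_requirements):
    """Flatten the local observations of a domain into one state vector."""
    return list(utilization_ratios) + [sla_delay] + list(vnf_requirements)


def clip_bids(raw_outputs):
    """Confidence Vector: one bid per local server, clipped to [-1, 1]."""
    return [max(-1.0, min(1.0, b)) for b in raw_outputs]


def shared_reward(cost_value, sla_violated, penalty=10.0):
    """Reward shared by all domains: lower cost yields a higher reward.

    Assumes the reward is the negated cost, with a fixed penalty
    subtracted whenever the SLA of the service is violated.
    """
    reward = -cost_value
    if sla_violated:
        reward -= penalty
    return reward
```

Because the same reward value is shared by every domain in each iteration, an agent cannot improve its own return by accepting a placement that raises the network-wide cost.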

C. Distributed Reinforcement Learning Agents Structure
The problem of this work includes multiple MDP environments with local RL agents that are located in all domains n. The environments consist of the local computational and network resources of the domain n as defined in the previous section.
To build the RL agents of each domain n, we leverage Deep Q-Network (DQN) agent learning as defined by Mnih et al. in [18]. The goal of the agents is to select, at every state (20), the action defined in (21), a VNF placement in this case, that maximizes the accumulated reward described in (22). We use a Deep Neural Network (DNN) to approximate the optimal action-value function, also known as the Q-value function, which is defined as the maximum sum of all rewards r_t, discounted by the parameter γ at each time-step t. This maximum is achieved by a behavioral policy π = P(a|s), after making an Observation s and taking an Action a.
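In the standard DQN formulation of Mnih et al., the optimal action-value function described above takes the following form, reproduced here for reference:

```latex
Q^{*}(s, a) = \max_{\pi} \, \mathbb{E}\left[ r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \dots \;\middle|\; s_t = s,\ a_t = a,\ \pi \right]
```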
To avoid instability during the training of the agents, we employ the Experience Replay technique, which randomizes the Observations and removes the correlation between them during the early training phase to force the agent to embrace exploration. We store the experiences e_t = (s_t, a_t, r_t, s_{t+1}) of the agent at each time-step t in the set D_t = {e_1, . . . , e_t}, from which we later retrieve them. To train the agent, we apply Q-learning value updates on mini-batches of experience (s, a, r, s') ∼ U(D), drawn uniformly at random from the set D_t of stored experiences. The Q-learning update during iteration i utilizes the loss function L_i(θ_i) given in (25), where γ denotes the discount factor that determines the agent's horizon, θ_i represents the parameters of the Q-network during iteration i and θ^−_i are the network parameters used to compute the target value at iteration i.
The Confidence metric, which represents the bid of the domain agent during the auction procedure of the Auction Mechanism, is obtained by applying the arguments of the maxima, also known as the argmax function, to the DNN output D. Here, f(x) is the set of inputs x from the DNN output D that achieve the highest function value. The Confidence metric bid is extracted from the set D as max(D), whereas f(x) denotes the intra-domain placement, pointing at the server with the highest Confidence.
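A minimal, framework-free Python sketch of Experience Replay and of the squared temporal-difference loss of standard DQN is given below. The class and function names are hypothetical, and the Q-networks are stood in for by plain callables that map a state to a list of action values; the agents in this work use a DNN instead.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of experiences (s, a, r, s_next)."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)  # oldest entries are dropped
        self._rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        self._buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Draw a mini-batch uniformly at random, without replacement."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


def td_loss(batch, q_online, q_target, gamma):
    """Mean of (r + gamma * max_a' Q_target(s', a') - Q_online(s, a))^2."""
    total = 0.0
    for s, a, r, s_next in batch:
        target = r + gamma * max(q_target(s_next))
        total += (target - q_online(s)[a]) ** 2
    return total / len(batch)
```

Keeping a separate q_target callable mirrors the role of the parameters θ^−_i, which are held fixed while the online parameters θ_i are updated.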

D. Auction Mechanism Architecture
In this work, we also introduce the Auction Mechanism, a system that enables inter-domain VNF migration in a distributed multi-domain network. As shown in Fig. 3, the Auction Mechanism enables scalability and parallel operation.
The operation of the Auction Mechanism can be described in the following steps: 1) Auction Initiation: The Auction Mechanism chooses the next service s VNF m and showcases to the distributed domains n the requirements of the placement.
2) Distributed Operation: The distributed RL agents of the domains n generate their local action A_n, or Confidence Vector, to propose a local placement for the showcased VNF. Only the Confidence Metric of each domain, i.e., the maximum of its Confidence Vector, is sent to the Auction Mechanism, ensuring minimal data transfer.
3) Global Operation: The Auction Mechanism receives the Confidence Metric of each domain and selects the highest bidder, i.e., the domain with the maximum Confidence Metric, as the candidate to receive the VNF currently in auction. The Auction Mechanism notifies the candidate domain with an acknowledgment response.
4) Orchestration: If the candidate domain is different from the domain that currently hosts the VNF in auction, the inter-domain migration is initiated. Otherwise, the domain agent performs an intra-domain migration to the node with the highest Confidence Metric of the local Confidence Vector, at a much lower cost in terms of energy and time. If the VNF is already instantiated in that node, no migration is performed.

5) Iteration: The procedure is repeated indefinitely.
It is apparent that the Auction Mechanism acts as the auctioneer and is only responsible for the inter-domain communication, making it non-essential for the local domain orchestration. The Auction Mechanism can be deployed quickly at any node of the network, eliminating the single point of failure in the system.
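The auction steps above can be sketched in Python for a single VNF. The data layout (a dictionary from domain name to Confidence Vector), the function names and the tie-breaking rule (the first maximum wins) are assumptions made for illustration.

```python
def local_bid(confidence_vector):
    """Return (bid, server): the max and argmax of a Confidence Vector."""
    server = max(range(len(confidence_vector)),
                 key=confidence_vector.__getitem__)
    return confidence_vector[server], server


def run_auction(confidence_vectors, current_domain, current_server):
    """One auction round for the VNF currently in auction.

    confidence_vectors: dict mapping a domain to its list of bids in [-1, 1]
    Returns (winning domain, target server, migration type).
    """
    # Distributed operation: each domain reports only its best bid.
    bids = {d: local_bid(v) for d, v in confidence_vectors.items()}
    # Global operation: the auctioneer selects the highest bidder.
    winner = max(bids, key=lambda d: bids[d][0])
    server = bids[winner][1]
    # Orchestration: decide which kind of migration, if any, is needed.
    if winner != current_domain:
        kind = "inter-domain"
    elif server != current_server:
        kind = "intra-domain"
    else:
        kind = "none"
    return winner, server, kind
```

Only one scalar per domain crosses the domain boundary in this sketch, which reflects the minimal data transfer of the distributed operation step.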

VI. SIMULATION RESULTS & EVALUATION
In this section, we conduct a simulation study with a diverse set of scenarios on a realistic multi-domain network to demonstrate the performance of the proposed scheme against the baselines.

A. Simulation Environment
The equations and models presented in the previous sections were simulated with Python, and a custom-made OpenAI Gym environment [19] was used to make them accessible to the RL agents. The RL agents were developed using TensorFlow [20] and the high-level, open-source Keras API [21]. The network environment is a fork of Containernet [22], an advanced branch of the Mininet [23] network emulator, which is used for evaluation by many works in the related literature. It simulates a realistic virtual network, VM hosting, switching, and application code for developing and experimenting with SDN-NFV networks. The simulated network topology is a variation of the 2005 Nordu European network from The Internet Topology Zoo [24], adjusted to accommodate multiple computational domains and fit the requirements of the study. Each node of the topology corresponds to a domain, and more nodes are introduced according to the scale of the experiment. The initial and non-variable simulation parameters are presented in Table I. The discrete sets signify multiple value options to accommodate the experiments, whereas ranges indicate a value selected randomly from the given range.

B. Baseline Scenarios
The performance of the proposed SCHE2MA solution is compared with two reference scenarios from the literature:
1) Centralized RL: An RL-based orchestration algorithm placed in a central location, a common type of baseline approach in the related research literature, such as in [6], [7], and [9]. As opposed to our proposed distributed orchestration scheme, the central orchestration algorithm oversees the entire network, places the VNFs serially and migrates each VNF to the node with the highest action value.
2) Static Placement: A typical VNF placement strategy, which is adopted by many providers even today, as the default baseline [25]. In this strategy the VNF placement is static and the VNFs remain hosted in the initial node throughout the experiment.

C. Results Analysis
The performance of the baseline scenarios is normalized to the SCHE2MA performance, and the plots show the relative gain or loss for each metric. The analysis covers the performance of SCHE2MA in terms of both average energy consumption and average service latency. The energy consumption curves of all figures are normalized based on the SCHE2MA performance to improve legibility, and the absolute values, expressed in millijoules (mJ), are given under the SCHE2MA curve.
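The normalization can be made concrete with a small Python sketch. The numeric values used in the usage note are hypothetical and are not results from this study; the function names are likewise illustrative.

```python
def normalize_to_reference(values, reference):
    """Express each value as a percentage of the reference (= 100%)."""
    return {name: 100.0 * v / reference for name, v in values.items()}


def gap_relative_to_reference(baseline, reference):
    """How much higher the baseline is, in percent of the reference."""
    return 100.0 * (baseline - reference) / reference


def reduction_relative_to_baseline(baseline, reference):
    """How much lower the reference is, in percent of the baseline."""
    return 100.0 * (baseline - reference) / baseline
```

For a hypothetical baseline value of 20.0 and a SCHE2MA value of 10.0, the normalized baseline is 200%, the gap relative to SCHE2MA is 100% and the reduction relative to the baseline is 50%. Under normalization to SCHE2MA, gains above 100% are therefore possible, which would not be the case if the percentages were computed relative to the baseline.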
In Fig. 4a, we depict the average energy consumption of the examined network over 500 simulations for a varying number of users, normalized to the SCHE2MA performance (%, mJ). We observe that the energy consumption increases almost linearly with the number of users: introducing more users to the network generates additional requests, each of which consumes energy during transmission, so the overall energy consumption of the network is higher. The proposed solution maintains lower energy consumption in all scenarios, reaching a reduction of almost 17.1% in the case of 100 users. The reason for this behavior is the ability of SCHE2MA to cluster VNFs into the servers, minimizing the costly communication between servers.
Fig. 4b presents the most critical metric for URLLC services, the average service latency. We observe that the average service latency increases as the number of active users grows, due to insufficient computing resources in the servers within the domains. However, SCHE2MA outperforms both baselines, offering a 103.4% reduction in latency for the case of 100 users while also maintaining lower energy consumption than both baselines, which is a considerable performance improvement. This is possible due to VNF clustering in servers, which minimizes the number of transmissions over physical media. SCHE2MA thus shows a clear ability to devise VNF placements that satisfy the latency and energy consumption trade-off.
Fig. 5a presents how the energy consumption fluctuates during the operation of each algorithm, specifically for the scenario of 3 domains and 500 users within a simulation cycle. We observe that the maximum difference in energy consumption is 15.91%, between the Static solution and SCHE2MA. The reason is that, as can be seen in Fig. 8a, SCHE2MA tends to consolidate multiple SFC VNFs in hosts, i.e., hosts 2 and 5, to minimize both energy consumption and latency by turning physical link connections into virtual ones that yield minimal losses.
In Fig. 5b, we plot the average energy consumption per domain in order to evaluate the scalability of the given solutions. We observe that the energy consumption steadily increases as we introduce more domains into the network, which increases the amount of data that needs to be considered when planning a VNF placement. It can be seen that the Centralized RL fails to converge due to the larger state space. SCHE2MA is able to reduce the energy consumption by 14.85% compared to the baseline solutions, i.e., in the scenario of the 9-domain network with 500 users. This behavior is due to the flexibility and scalability of SCHE2MA's distributed architecture, where the decision-making takes place locally in multi-domain agents that communicate through the Auction Mechanism, thereby dividing and sharing the immense problem space.
Fig. 6a outlines the average energy consumption per SFC deployed in the network. We observe that in the case of 25 SFCs, the average energy consumption per SFC of SCHE2MA is reduced by 6.36% compared to the Static solution. The reason is that, compared to the baseline scenarios, SCHE2MA is capable of operating with less energy, as previously discussed in the analysis of Fig. 4.
Fig. 6b illustrates the average number of rejected services in a 3-domain scenario with a varying number of users. When the number of users increases in a network with finite resources, the number of rejected services increases. Given that SCHE2MA can re-organize the VNFs, a number of resources can be released. We can conclude that the improvements can be attributed to the VNF consolidation abilities of SCHE2MA.
Finally, Fig. 7a illustrates how the service latency oscillates during the operation of each algorithm for the scenarios with 500 and 1000 users. We observe that SCHE2MA was able to achieve 73.52% less service latency than the baseline scenarios in the case of 5 domains, depicted in Fig. 7b. This is achieved by devising VNF placements that minimize the number of transmissions through local intra-domain orchestration. The Centralized RL is strongly affected by the number of users, as the deviation in the figure suggests. Fig. 8a shows the average number of hosted VNFs divided by the total number of service VNFs, indicating the occupancy of the hosts of the first domain. We can see that SCHE2MA gravitated towards consolidating the SFC VNFs to reduce the number of hops to the end-user. Additionally, Fig. 8b illustrates the total number of migrations of the local agent depicted in Fig. 8a, which applied an identical placement for a sustained period to avoid inter-domain SFC re-configurations and the additional data transmissions that lead to higher energy consumption and latency.
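The host occupancy metric of Fig. 8a, the average number of hosted VNFs divided by the total number of service VNFs, can be computed along the following lines. This is a minimal sketch: the function name, the representation of a placement as a VNF-to-host mapping, and the sample snapshots are assumptions for illustration and are not data from the experiments.

```python
from collections import Counter


def host_occupancy(snapshots, hosts):
    """Average, over placement snapshots, of the share of VNFs on each host.

    Each snapshot maps a VNF name to the id of the host that runs it.
    """
    totals = {host: 0.0 for host in hosts}
    for placement in snapshots:
        counts = Counter(placement.values())
        n_vnfs = len(placement)
        for host in hosts:
            totals[host] += counts.get(host, 0) / n_vnfs
    return {host: totals[host] / len(snapshots) for host in hosts}


# Two made-up snapshots of a four-VNF chain spread over hosts 1, 2 and 5.
snapshots = [
    {"fw": 2, "nat": 2, "dpi": 5, "lb": 5},
    {"fw": 2, "nat": 2, "dpi": 2, "lb": 5},
]
print(host_occupancy(snapshots, hosts=[1, 2, 5]))
# {1: 0.0, 2: 0.625, 5: 0.375}
```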

VII. CONCLUSION & FUTURE WORK
In this work, we have studied the problem of energy-aware latency minimization for multi-domain networks via dynamic SFC placement for URLLC services. We have proposed SCHE2MA, an innovative, multi-agent, distributed RL-based service orchestration framework. We have introduced the Auction Mechanism, which the local domains use to exchange VNFs with one another. The results confirm superior performance in multiple scenarios, maintaining high levels of efficiency in multi-domain scenarios compared to a Centralized RL agent solution.
As future work, we intend to implement the Auction Mechanism in a totally decentralized way by employing Blockchain technology. The domain agents will place bids in the new auction process through Smart Contracts and a distributed and immutable public ledger.