In-operation Network Planning



INTRODUCTION
Network capacity planning requires the placement of network resources to satisfy expected traffic demands and network failure scenarios. Today, the network capacity planning process is typically an offline activity based on very long planning cycles (yearly or quarterly). Generally, this is due to the static and inflexible nature of current networks. This holds both for the transport layer (optical and Ethernet) and for the IP/MPLS layer, which should be inherently more dynamic than the underlying transport infrastructure. The IP/MPLS layer might use automated Traffic Engineering (TE) techniques to place IP/MPLS traffic where the network resources are.
Increasing transport capacity (bandwidth) to meet predicted IP/MPLS traffic changes and failures does provide limited network flexibility, but due to the fixed and rigid nature of provisioning in the transport layer, network planning and TE still require significant human intervention, which entails high Operational Expenditures (OPEX). In addition, to ensure that the network can support the forecast traffic and all the failure modes that need to be protected against, operators add spare capacity (over-provision) in different parts of the network to address likely future scenarios. This entails inefficient use of network resources and significantly increases network Capital Expenditures (CAPEX).
Notwithstanding, optical transport platforms are designed to facilitate the setting up and tearing down of optical connections (lightpaths) within minutes or even seconds [1]. Combining remotely configurable optical cross-connects (OXCs) with a control plane enables automated lightpath set-up for regular provisioning and real-time reaction to failures, and can thus reduce OPEX. However, to exploit existing capacity, increase dynamicity, and provide automation in future networks, current management architectures, based on Network Management Systems (NMS), need to be radically transformed.
In a scenario where lightpath provisioning can be automated, network resources can be made available by reconfiguring and/or re-optimising the network on demand and in real time. We call this in-operation network planning. We propose to take advantage of novel network reconfiguration capabilities and new network management architectures to perform in-operation planning, aiming at reducing network CAPEX by minimising the over-provisioning required in today's static network environments.
In this article, we highlight current standardisation work and propose a control and management architecture based on the Application-Based Network Operations (ABNO) model [2], which is capable of performing in-operation planning. We illustrate this novel network planning technique by studying two use cases: 1) virtual topology reconfiguration as a consequence of a network failure or a catastrophic event (disaster), and 2) re-optimisation to improve network resource efficiency and utilisation.

2.1 Static network operation
Operation of the currently deployed carriers' transport networks is very complex; multiple manual configuration actions are needed for provisioning purposes (e.g. hundreds of thousands of node configurations per year in a mid-size network). In fact, transport networks are currently configured with big static fat pipes based on capacity over-provisioning, since they are needed for guaranteeing traffic demand and Quality of Service (QoS). Furthermore, network solutions from different vendors typically include a centralised service provisioning platform, using vendor-specific NMS implementations along with an operator-tailored umbrella provisioning system, which may include a technology-specific Operations Support System (OSS). Such complicated architectures (Fig. 1a) generate complex and long workflows for network provisioning: up to two weeks for customer service provisioning and more than six weeks for core router connectivity services over the optical core.
Fig. 1b illustrates the fact that such static networks are designed to cope with the requirements of several failure scenarios and with predicted short-term increases in bandwidth usage, thus requiring capacity over-provisioning and significantly increasing CAPEX. It shows a simple network consisting of three routers connected to a central one through a set of lightpaths established on an optical network. Two different scenarios are considered, although the same amount of IP traffic is conveyed in each of them. In scenario A, router R3 needs three lightpaths to be established to transport its IP traffic towards R4, whereas R1 and R2 need only one lightpath each. In contrast, in scenario B, R1 and R2 need two lightpaths each whilst R3 needs only one. In static networks, where lightpaths in the optical network are statically established, each pair of routers has to be equipped with the number of interfaces for the worst case, resulting in 14 interfaces in total.
However, if the optical network can be dynamically reconfigured by setting up and tearing down lightpaths on demand, each router can be dimensioned separately for the worst case, regardless of the peering routers. As a result, only 10 interfaces are needed, thus saving 28.5% of interfaces.

2.2 Migration towards in-operation network planning
The classical network planning life-cycle typically consists of several steps that are performed sequentially. The initial step receives inputs from the service layer and from the state of the resources in the already deployed network, and configures the network to be capable of dealing with the forecast traffic for a period of time. That period is not fixed, and its actual length usually depends on many factors, which are operator and traffic type specific. Once the planning phase produces recommendations, the next step is to design, verify and implement changes in the network. While in operation, the network capacity is continuously monitored and that data is used as input for the next planning cycle. However, in case of unexpected increases in demand or network changes, the planning process may be restarted.
As technologies are developed to allow the network to become more agile, it may be possible to respond to traffic changes by reconfiguring the network in near real-time. In fact, some operators have deployed Generalized Multi-Protocol Label Switching (GMPLS) control planes, mainly for service set-up automation and recovery purposes. However, those control planes cover only parts of the network and do not support holistic network reconfiguration. This functionality will require an in-operation planning tool that interacts directly with the data and control planes and with operator policies via OSS platforms, including the NMS.
Assuming the benefits of operating the network in a dynamic way are proven, the classical network life-cycle has to be augmented to include a new step focused on reconfiguring and re-optimising the network, as represented in Fig. 2a. We call that step in-operation planning and, in contrast to traditional network planning, its results and recommendations can be immediately implemented on the network. To support dynamicity, however, the current network architecture depicted in Fig. 1a will need to evolve to include a functional block between the service layer and the network elements to support multi-service provisioning in multi-vendor and multi-technology scenarios; two standard interfaces are required. The first is the north bound interface which, among other tasks, gives an abstracted view of the network, enabling a common entry point to provision multiple services and to provision the planned configuration for the network. Moreover, this interface allows coordinating the network and service layers according to service requirements. The second is the south bound interface, which covers provisioning, monitoring, and information retrieval.
Finally, operators may require some human-machine interaction, so that new configurations are reviewed and acknowledged before being implemented in the network.

2.3 ABNO architecture and required functionalities for in-operation planning
Standardisation bodies, especially the IETF, have been working to address all the above requirements and, as a result, the ABNO architecture [2] is now being proposed as a solution. The ABNO architecture consists of a number of standard components and interfaces which, when combined, provide a method for controlling and operating the network. A simplified view of the ABNO architecture is represented in Fig. 2b. It includes:
• The ABNO controller, the entrance point to the network for the NMS/OSS and the service layer for provisioning and advanced network coordination. It acts as a system orchestrator, invoking its inner components according to a specific workflow.
• The Path Computation Element (PCE) [3], defined as an entity that serves path computation requests.
• The Virtual Network Topology Manager (VNTM) [5], which coordinates Virtual Network Topology (VNT) configuration by setting up or tearing down lower-layer LSPs and advertising the changes to higher-layer network entities.
• The Provisioning Manager, responsible for the establishment of LSPs. This can be done by interfacing with the control plane or by directly programming the data path on individual network nodes using the Network Configuration Protocol (NetConf) or acting as an OpenFlow controller [6].
• The Operations, Administration, and Maintenance (OAM) handler, responsible for detecting faults and taking actions to react to problems in the network. It interacts with the nodes to initiate OAM actions such as monitoring and testing new links and services.
Directly connected to the ABNO architecture, the in-operation planning tool can be deployed as a dedicated back-end PCE for performance improvements and optimisations. The back-end PCE is accessible via the PCEP interface, so the ABNO components can forward requests to the planning tool.
Furthermore, in-operation network planning is achievable only if planning tools are synchronised with the state of network resources, so that new configurations can be computed with updated information and easily deployed in the network. In the proposed architecture, the back-end PCE gathers the network topology and the current state of network resources, via the ABNO components, using protocols designed to convey link-state and traffic engineering information, such as BGP-LS [7].
There are several architectures utilising IETF components which are suitable for providing in-operation network planning. These are described in Table 1 with their corresponding strengths and weaknesses.

Table 1 (Architecture, Strengths, Weaknesses)

Architecture: Stateless PCE
Strengths:
• Path computation can be off-loaded onto a dedicated entity capable of complex computations with bespoke algorithms and functions.
• Has a standard and mature interface and protocol.
• Supports simple optimisation, such as bulk path computation [8].
Weaknesses:
• Is unaware of existing LSPs and has no view of the current network resource utilisation and key choke points.
• Cannot configure by itself any LSP in the network.
• Delays need to be introduced to sequence LSP set-up [9].

Architecture: Stateful PCE
Strengths:
• Maintains a database of LSPs that are active in the network, i.e., so that new requests can be more efficiently placed optimising network resources.
• Supports optimization involving already established LSPs.
Weaknesses:
• More complex than a stateless PCE, requires additional database and synchronization.
• No existing LSPs can be modified, e.g. for network re-configuration purposes.

Architecture: Active stateful PCE
Strengths:
• Capable of responding to changes in network resource availability and predicted demands and reroute existing LSPs for increased network resource efficiency [10].
• Supports complex reconfiguration and reoptimization, even in multilayer networks.
Weaknesses:
• No new LSPs can be created, e.g. for VNT re-optimisation purposes.
• Requires protocol extensions to modify and/or instantiate (if the capability is available) LSPs.

Architecture: ABNO
Strengths:
• Provides a network control system for coordinating OSS and NMS requests to compute paths, enforce policies, and manage network resources for the benefit of the applications that use the network.
• New LSPs can be created for in-operation planning. VNTM in charge of VNT reconfiguration.
• Supports deployment of solutions in multi-technology scenarios (NetConf, OpenFlow, control plane, etc.)
Weaknesses:
• Requires implementation of a number of key components in addition to the PCE function.
• Some interfaces still need to be defined and standardized.

USE CASE I: VIRTUAL TOPOLOGY RECONFIGURATION
In our multilayer network scenario, the virtual network topology is supported by a set of optical connections. Thus, the IP/MPLS layer can operate independently from the optical one. However, in some cases the virtual topology needs to be reconfigured based on changing network conditions and demands. Many network operators prefer that such reconfiguration is supervised and approved by a person in charge before it is implemented in the network. This human intervention allows the application of additional policies and considerations, as well as integration with existing business policy and operation support systems.
In this section we analyse virtual topology reconfiguration under two such scenarios: i) after a failure and ii) disaster recovery.

3.1 Virtual topology reconfiguration after a failure
Fig. 3a represents a multilayer network consisting of four OXCs in the optical layer and three routers in the IP/MPLS layer. IP/MPLS routers are connected through 10 Gb/s lightpaths, which create a VNT. Three bidirectional IP/MPLS LSPs have been set up on the virtual topology. After a failure occurs, either in the optical or in the IP/MPLS layer, Fast Reroute (FRR) [11] can be used to recover part of the affected IP/MPLS LSPs immediately after the failure. In addition, the state of the network after the failure can be updated in the control plane, also within seconds. However, some IP/MPLS LSPs might see their capacity reduced, or even remain disconnected, as a consequence of high congestion in some links of the virtual topology; a virtual topology reconfiguration then needs to be performed to groom traffic and distribute resource utilisation away from overcrowded areas. An example is presented in Fig. 3b, where the LSP R2-R3 gets its capacity reduced from 8 Gb/s to only 2 Gb/s. To cope with virtual topology reconfigurations, our approach relies on the ABNO architecture presented in Fig. 3d, where the process is also represented.
When the network operator wishes to perform a network-wide virtual topology reconfiguration, a request is sent from the NMS/OSS to the ABNO controller (1), which then forwards it to the PCE via the VNTM (2). For performance reasons, a back-end PCE containing an active solver is responsible for computing the new virtual topology layout, taking into account the current state of the network. Therefore, the PCE sends a request to the in-operation planning tool running in that back-end PCE to compute the new virtual topology (3). The tool considers all the surviving resources, which may include router interfaces and transponders connected to failed optical connections, spare interfaces that typically exist in the network for normal growth, and possibly some spare routers that have been installed ahead of time; all those resources can be stored in an inventory database. The tool must consider how to implement the desired IP/MPLS connectivity over the optical layer. To this end, it needs to know which optical links and nodes are up and which connections are optically feasible, considering optical impairments. When a result is obtained (4), the set of lightpaths is returned in a Path Computation Reply (PCRep) message (5) to the originating PCE. If an operator needs to approve the new (virtual) layout, the PCE forwards it to the ABNO controller (6). The computed layout is then presented to the operator for final approval (7). When the operator acknowledges the new optimised layout (8), it is passed to the VNTM (9), which computes the sequence of operations to carry out on existing and new LSPs so as to minimise disruption.
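The computation, approval and sequencing stages described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names `Plan`, `plan_virtual_topology`, `sequence_operations` and `reconfigure` are hypothetical, the target layout is supplied by the caller rather than computed by a solver, and the "set up before tear down" ordering is one assumed way of limiting disruption, not a rule taken from the ABNO documents.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical representation: a lightpath is a pair of router names.
Lightpath = Tuple[str, str]
Action = Tuple[str, Lightpath]


@dataclass
class Plan:
    add: Set[Lightpath]      # lightpaths to be set up
    remove: Set[Lightpath]   # lightpaths to be torn down


def plan_virtual_topology(current: Set[Lightpath],
                          target: Set[Lightpath]) -> Plan:
    """Stand-in for the back-end PCE: here the plan is just the difference
    between the current and the target layout (no real solver involved)."""
    return Plan(add=target - current, remove=current - target)


def sequence_operations(plan: Plan) -> List[Action]:
    """Stand-in for the VNTM (step 9). Assumption: setting up new lightpaths
    before tearing down old ones limits disruption."""
    setups = [("setup", lp) for lp in sorted(plan.add)]
    teardowns = [("teardown", lp) for lp in sorted(plan.remove)]
    return setups + teardowns


def reconfigure(current: Set[Lightpath],
                target: Set[Lightpath],
                approve: Callable[[Plan], bool]) -> Optional[List[Action]]:
    """Compute a plan, ask the operator (steps 6-8), then sequence it."""
    plan = plan_virtual_topology(current, target)
    if not approve(plan):
        return None          # nothing is changed without acknowledgement
    return sequence_operations(plan)


# Hypothetical layouts: the only change is a new lightpath between R1 and R2.
current = {("R1", "R3"), ("R2", "R3")}
target = current | {("R1", "R2")}
actions = reconfigure(current, target, approve=lambda plan: True)
print(actions)
```

Passing the approval step as a callable keeps the human-in-the-loop decision separate from the computation, which mirrors the split between the operator and the planning tool in the text.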
The sorted sequence of actions on existing LSPs is passed to the provisioning manager (10), which is able to interact with each head-end node. The provisioning interface, through which the provisioning manager is able to suggest re-routing of LSPs, is based on the PCEP protocol and interface, using PCUpdate messages [12]. The newly allocated resources are reported back to the provisioning manager, and ultimately to the VNTM, using PCReport messages. Note that after a successful re-optimisation the LSP-DB is updated accordingly. In our example in Fig. 3c, a new lightpath is created between R1 and R2 and, as a result, the IP/MPLS LSP R2-R3 can be rerouted and its initial capacity restored.

3.2 Disaster Recovery
While most networks are typically designed to survive a single failure without affecting SLAs, they are not designed to survive large scale disasters, such as earthquakes, floods, wars or terrorist acts, simply because of the low probability of such events and the high cost of over-provisioning to address them in today's networks. Since many systems might be affected, large network reconfigurations are necessary during large scale disaster recovery.
The disaster recovery process is similar to that of the virtual topology reconfiguration after a failure. However, multiple optical systems, IP links, and possibly routers and OXCs (assuming central offices are affected) may be taken offline during the disaster. Several additional planning and operation requirements in response to large scale disasters are highlighted below.
• Consideration of potential IP layer traffic distribution changes, either using MPLS-TE tunnels or by modifying IP routing metrics, and evaluation of the benefits based on the candidate topology.
• It may be impossible to reach the desired network end state with a one-step optimisation. Therefore, optimisations of two or more steps may be necessary, for example, rerouting some other optical connections to make room for some of the new connections.
• The system must verify that the intermediate configuration after each such step is robust and can support the current traffic and possibly withstand additional outages.
• Based on pre-emption and traffic priorities, it might be desirable to disconnect some virtual links, so as to reuse the resources for post-disaster priority connections and traffic.
We have described the creation of one disaster recovery plan, but in a real network there may be several possible plans, each with its pros and cons. The tool must present all these plans to the operator so that the operator can select the best plan, possibly modify it, and understand how it will behave.
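The requirement that every intermediate configuration of a multi-step plan still supports the current traffic can be illustrated with a deliberately simplified Python sketch. It assumes that the traffic carried on each virtual link is fixed and known, and that each step only adds or removes capacity on named virtual links; the function name `first_unsafe_step`, the link names and the figures in Gb/s are all hypothetical. A real tool would also re-route traffic and test robustness against further outages, which this sketch does not attempt.

```python
from typing import Dict, List, Optional

# Hypothetical model: capacities and loads are in Gb/s per virtual link.
Capacity = Dict[str, float]


def first_unsafe_step(capacity: Capacity,
                      load: Dict[str, float],
                      steps: List[Capacity]) -> Optional[int]:
    """Apply the capacity changes step by step and return the index of the
    first step after which some link carries more traffic than its capacity.
    Return None if every intermediate configuration supports the load."""
    state = dict(capacity)
    for index, delta in enumerate(steps):
        for link, change in delta.items():
            state[link] = state.get(link, 0.0) + change
        overloaded = any(traffic > state.get(link, 0.0)
                         for link, traffic in load.items())
        if overloaded:
            return index
    return None


capacity = {"R1-R2": 10.0, "R2-R3": 10.0}
load = {"R1-R2": 8.0, "R2-R3": 4.0}

# Same end state, two different orderings of the same two operations.
add_then_remove = [{"R1-R2": +10.0}, {"R1-R2": -10.0}]
remove_then_add = [{"R1-R2": -10.0}, {"R1-R2": +10.0}]

safe_result = first_unsafe_step(capacity, load, add_then_remove)
unsafe_result = first_unsafe_step(capacity, load, remove_then_add)
print(safe_result, unsafe_result)
```

The two plans reach the same end state, yet only the first keeps the loaded link usable throughout, which is why the ordering of steps has to be checked and not just the final layout.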
To summarise, the above process consists of several steps: 1. Immediate action by the network to recover some of the traffic.

USE CASE II: RE-OPTIMISATION
Algorithms in the control or management planes compute routes and find feasible spectrum allocations for connection requests taking into account the state of network resources at the time each connection is requested. Nonetheless, as a consequence of network dynamics, resources may later be released so that better routes could be computed; re-optimisation could then be applied to improve network efficiency. For example, imagine an optical connection that, due to network congestion, has to circumvent the optimal nodes and links, so that the end-to-end connection requires intermediate regenerators; at some point additional paths become available and the service could be rerouted to use the shorter route and eliminate regeneration. Additionally, other existing services could be rerouted to remove bottlenecks and avoid network congestion, or even to allow some connections to increase their capacity when needed.
In this use case we study a specific problem that arises in flexi-grid networks [13], where re-optimisation could bring clear benefits. In such networks, lightpaths can be allocated using variable-sized frequency slots, whose width (usually a multiple of a basic width such as 12.5 GHz) is a function of the requested bit rate, FEC and modulation format. Such frequency slots must be contiguous in the spectrum and the same along all the links in the route. Because spectrum converters are not available, spectrum fragmentation appears, increasing the blocking probability of connection requests and thus worsening the network grade of service.
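To make the contiguity and continuity constraints concrete, the following minimal Python sketch models a link as a list of 12.5 GHz slices and searches for a free run of slices with a simple first-fit rule. The list representation, the function names and the occupancy pattern are assumptions made for illustration; first-fit is a common textbook policy and is not claimed to be the allocation algorithm used in the cited works.

```python
import math
from typing import List, Optional

SLICE_GHZ = 12.5  # basic slot width quoted in the text

# Hypothetical representation: one entry per slice, None means free,
# a string names the lightpath occupying the slice.
Link = List[Optional[str]]


def slices_needed(width_ghz: float) -> int:
    """Number of basic slices covering the requested slot width."""
    return math.ceil(width_ghz / SLICE_GHZ)


def first_fit(link: Link, n: int) -> Optional[int]:
    """Start index of the first run of n contiguous free slices, else None."""
    if n < 1:
        raise ValueError("n must be at least 1")
    run = 0
    for i, owner in enumerate(link):
        run = run + 1 if owner is None else 0
        if run == n:
            return i - n + 1
    return None


def first_fit_on_route(links: List[Link], n: int) -> Optional[int]:
    """Continuity: the same slices must be free on every link of the route."""
    common: Link = [
        None if all(link[i] is None for link in links) else "busy"
        for i in range(len(links[0]))
    ]
    return first_fit(common, n)


# Hypothetical occupancy: three lightpaths (A, B, C) of different widths.
link: Link = ["A", None, None, "B", "B", None, "C", None]
n = slices_needed(37.5)          # a 37.5 GHz request
free = link.count(None)          # total free slices on the link
start = first_fit(link, n)       # None means the request is blocked
print(n, free, start)
```

In this made-up example the link has more free slices than the request needs, yet no run of contiguous free slices is long enough, which is exactly the fragmentation effect described above.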
An example is shown in Fig. 4a, where the optical spectrum of a link is represented. Three already established lightpaths share that link; each lightpath uses a different frequency slot width. If a new lightpath needing 37.5 GHz is requested, it would be blocked as a consequence of the lack of spectrum contiguity. In such a scenario, re-optimisation could be applied to the network before a connection request is blocked, by re-allocating already established lightpaths in the spectrum (Fig. 4b) to make enough room for the triggering connection request (Fig. 4c). The authors of [14] describe the SPRESSO algorithm to efficiently compute the set of connections to be reallocated. In [15] the SPRESSO algorithm was integrated into an active stateful PCE and reallocations were performed in a hitless manner by using the Push-Pull technique.
Fig. 4. Example of re-optimisation (a-c) and process (d).
Fig. 4d illustrates the proposed control plane architecture to support flexi-grid network re-optimisation, which also facilitates human verification and acknowledgement of network changes. When router A needs a new connection to router B, it sends a request to the control plane of the optical network (1). After checking admission policies, a Path Computation Request (PCReq) message is sent to the PCE (2), which invokes its local provisioning algorithm (3). In the event of insufficient resources being available due to spectrum fragmentation, the active PCE recommends the defragmentation of relevant nodes and connections, utilising a suitable algorithm to provide such re-optimisations. As in the previous use case, let us assume that the back-end PCE providing such an algorithm will perform the computation (4) upon receipt of a request.
When a result is obtained (5), it is sent back to the front-end PCE (6). If an operator needs to approve implementing the computed solution in the network, a request is sent to the NMS/OSS via the ABNO controller (7,8). When the solution has been verified and acknowledged by the operator, the NMS/OSS informs the PCE via the ABNO controller (9,10), which forwards the solution to the provisioning manager. Existing connection reallocations are requested using PCUpdate messages (11). Once the dependent connections have been set up, the responsible PCE invokes the local provisioning algorithm for the original connection request between routers A and B and sends a PCRep message to the originating control plane node (12).
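The overall idea of this use case (try to allocate, re-allocate established lightpaths if the request is blocked, apply the change only after acknowledgement, then serve the triggering request) can be sketched in Python as follows. This is a toy single-link illustration under stated assumptions: `compact` simply shifts every lightpath towards the low end of the spectrum and is not the SPRESSO algorithm of [14], the hitless Push-Pull procedure is not modelled, and all names and the occupancy pattern are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: one entry per spectrum slice, None means free.
Link = List[Optional[str]]


def first_fit(link: Link, n: int) -> Optional[int]:
    """Start index of the first run of n contiguous free slices, else None."""
    run = 0
    for i, owner in enumerate(link):
        run = run + 1 if owner is None else 0
        if run == n:
            return i - n + 1
    return None


def compact(link: Link) -> Link:
    """Toy defragmentation: shift lightpaths to the lowest slices, keeping
    their order and widths. Not SPRESSO; for illustration only."""
    occupied = [owner for owner in link if owner is not None]
    return occupied + [None] * (len(link) - len(occupied))


def serve(link: Link, name: str, n: int,
          approve: Callable[[Link, Link], bool]) -> Tuple[Link, bool]:
    """Allocate n slices for a new lightpath, re-allocating existing ones
    first if needed and only if the operator approves the new layout."""
    start = first_fit(link, n)
    if start is None:
        candidate = compact(link)
        start = first_fit(candidate, n)
        if start is None or not approve(link, candidate):
            return link, False       # request stays blocked, nothing changes
        link = candidate
    new_link = list(link)
    new_link[start:start + n] = [name] * n
    return new_link, True


link: Link = ["A", None, None, "B", "B", None, "C", None]
result, served = serve(link, "D", 3, approve=lambda old, new: True)
print(result, served)
```

Keeping the approval callback between the computation of the candidate layout and its application reflects the operator acknowledgement described in the text; rejecting the candidate leaves the link untouched.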

SUMMARY
A control and management architecture for transport networks has been proposed to support in-operation planning. The architecture is based on ABNO and allows carriers to operate the network dynamically and to reconfigure and re-optimise it in near real-time in response to changes such as traffic variations or failures. The network life cycle is extended, achieving better resource utilization and thus reducing network CAPEX. Moreover, process automation reduces manual interventions and, consequently, OPEX.