Dataplane-Based Fast Failover in SDN-Enabled Wide Area Measurement System of Smart Grid

In the wide area measurement system (WAMS) of smart grid, the real-time monitoring and protection applications have stringent requirements for the end-to-end transmission delays between phasor measurement units and phasor data concentrator, and fast failover (FF) is required to ensure the communication performance after link failures. In this work, the software-defined network (SDN) technology is utilized to enable datapath failover upon a link failure with a global view of the communication network. Then, a novel dataplane-based fast failover (DFF) mechanism is proposed to directly reroute the data packet in dataplane without interacting with the SDN controller. Based on the mathematical analysis over the WAMS topology features, the proposed DFF optimizes two procedures of failover: backup path construction and backup path installation. Using the proposed backup path construction algorithms, the 3-approximate and (1+2ε)-approximate (0 < ε < 1) backup paths can be constructed, theoretically guaranteeing the data transmission delays both during and after failover. Using the proposed LinkID-based FF group table installation method, the conflict of forwarding rules between original and backup paths can be eliminated, while the storage cost is also optimized. The simulation results on six IEEE benchmark test power systems show that the proposed DFF mechanism could achieve lower data transmission delays during and after failover compared with the existing control plane-based and dataplane-based failover mechanisms.


I. INTRODUCTION
WITH the increasing involvement of information and communication technology (ICT) in electrical infrastructure, the traditional power system is evolving into an ICT-enabled grid, which is the basic concept of "smart grid" (SG). As one of the representative ICT-enabled scenarios in SG, the wide area measurement system (WAMS) [1] plays a crucial role in the monitoring, operation, and protection of a transmission-level power grid [2]. In WAMS, the phasor measurement units (PMUs) installed at important nodes of a power grid report the measured phasor data to the control center at a prespecified sampling rate, and these measurement data must first be gathered at the phasor data concentrator (PDC). Typically, the real-time monitoring and protection applications running in the energy control center (or directly on PDCs) have stringent requirements for the end-to-end transmission delays between PMUs and PDC, which poses stern challenges to the communication network responsible for the data transmission between PMU, PDC, and control center.
Specifically, resilience is one of the key indicators of the communication network performance and has attracted focused attention in recent WAMS-related research [3], [4], [5], [6]: once a link breaks down due to unpredictable physical events, the large data retransmission delays may cause PMU/PDC data incompleteness and further damage to the physical layer [7], [8], [9], and should be minimized to ensure the quality of service (QoS) for WAMS applications. The most commonly used method to handle this kind of link failure is based on the fast reroute (FRR) methodology [10]: retransmit the data packet locally while ensuring reachability. For example, the loop-free alternative (LFA) [11] and remote LFA (rLFA) [12] methods guide the entrance router of the failed link to redirect the data packet to a neighbor or remote node that meets specific requirements; in [13] and [14], additional elaborate tags are required to be added into the packet header to guide the reroute path. The major drawback of the FRR-based methods is that they can only ensure the reachability of the new route but cannot guarantee the performance, which is not preferred in the WAMS scenario with stringent QoS requirements. Other techniques, such as deep learning [15] and blockchain [16], have also been adopted to enhance cyber resilience, but these methods mainly target the data center network and edge computing, which differ from the special PMU-PDC communication patterns in electric power systems. The other promising technique to deal with link failures is to leverage the software-defined network (SDN) [17] architecture to find the new route in a global view. In SDN, the control plane is logically and physically separated from the dataplane, where the SDN controller in control plane can install forwarding rules (i.e., flow entries) into the lookup tables (flow tables) of switches in dataplane, so that the forwarding behaviors can be organized in a centralized manner.
Therefore, the SDN-enabled SG communication network design has drawn a lot of focus and can be used for resilience enhancement of WAMS [18], [19]. The related research work can be categorized into the following two main methods.
1) The control plane-based failover method uses the SDN controller to compute and install a new route after the link failure is detected. The new route computation is usually based on solving an optimization problem. For example, the optimization objective in [3] is to find redundant communication paths nondisjointly overlapping at links with the minimum number of links, while meeting a multiplicative constraint and a concave constraint (end-to-end delay). In [4], a 0-1 integer linear programming problem is formulated, in which the number of valid measurements within a timer period, delays, and flow conservation constraints are considered. The other representative works [5], [6], [20], [21], [22] follow a similar logic of solving optimization problems; in summary, different control plane-based failover methods mainly differ in the optimization objectives, constraints, considered meters, and applied solution algorithms. The main drawback is that the control plane-based failover needs to query the controller to compute new or backup paths, where the path computation and flow entry updating process consume a considerable delay that can adversely impact time-sensitive WAMS applications, not to mention the delays of packet-In and packet-Out message generation in the openflow protocol.
2) The dataplane-based fast failover (DFF) method directly forwards the packet along a preinstalled backup route without consulting the control plane, which can quickly recover the data transmission. The DFF methods in SDN need to leverage the built-in fast failover (FF) group table defined in the openflow 1.1 specification or higher versions. The FF group table can monitor the liveness of the switch ports, and once the liveness of a working port transitions to "down," the backup ports can be selected to forward the data packet. Using this dataplane-based failover method, the failure location is not required to be identified, since the link failure only needs to be detected by the single switch located at the entrance node of the failed link, and then the output port of the switch is switched to another one according to the preinstalled FF group table entries. Several works have utilized the FF group table to perform DFF [23], [24], [25]. In [24], the loop-free group table installation methods are proposed for SDN-enabled substation communication network, whereas in [25], the FF is based on finding and installing a new path between the two end nodes of the failed link with minimum amount of flow entries. The main drawback of existing works is that they only use the FF group table, but do not optimize the solution, i.e., the FF group table-based backup path is not well constructed to guarantee the end-to-end transmission performance.
Based on the abovementioned observations, in this work, a novel DFF mechanism is proposed for the performance guarantee of WAMS data transmission even after a link failure: it leverages the FF group table to redirect the packet transmission in dataplane without querying the control plane; at the same time, the new backup path can guarantee a good end-to-end transmission performance.
Compared with the traditional FRR-based failover methods, the proposed DFF can find the fast rerouting paths with lower end-to-end transmission delays in a global view; compared with the control plane-based failover methods in SDN, the proposed DFF can recover the data transmission within a much smaller delay; compared with the existing dataplane-based failover methods in SDN, the proposed DFF algorithms can give a performance guarantee of the backup path. The main contributions of the proposed DFF can be summarized as follows.
1) Novel backup path construction algorithms are proposed based on mathematical analysis over the special features of WAMS communication network topologies, which can find the 3-approximate and (1+2ε)-approximate (0 < ε < 1) backup paths, guaranteeing the data transmission performance both during and after failover.
2) The linkID-based backup path installation method is proposed, which avoids conflicts with the original flow entries while optimizing the storage costs by using the linkID as the identifier and reusing the flow entries in the original paths.
To the best of our knowledge, this is the first work that finds the performance guarantee of dataplane-based backup paths for WAMS communication networks. The evaluation was conducted on six IEEE benchmark test power systems, and the results of the forwarding rule storage cost, path failover delay, and the end-to-end transmission delay show that the proposed DFF mechanism can reduce both the failover delays and end-to-end transmission delays after failover compared with the existing control plane-based and dataplane-based failover mechanisms.

II. SDN-ENABLED WAMS DATA TRANSMISSION FEATURES IN AN SG
In the transmission-level power grid, a PMU is installed at a bus node (substation) to periodically measure the key electrical parameters, such as voltage, current, and power; the phasor value is then computed and encapsulated into a synchrophasor data packet to be sent to the PDC. The PDC needs to gather the measurements from all PMUs in its monitored area, and then sends the group measurements to the higher level PDC or the super PDC (SPDC) where the control and protection applications are deployed. The communication network in WAMS is composed of forwarding devices, gateways, and links, and is responsible for the data transmission between PMUs, PDCs, SPDC, and controllable electrical equipment, such as breakers and switches. With the SDN architecture involved, the forwarding devices become completely programmable openflow switches (called "SW" in this work), which can be controlled by the SDN controller to perform customized forwarding strategies, as shown in Fig. 1.
In WAMS, the PMU phasor measurement data is time-synchronized, i.e., each phasor measurement has a timestamp, and the PDC should buffer all the data that have the same timestamp from different PMUs before forwarding them to higher level PDCs or synchrophasor-based applications. Therefore, one distinctive feature of WAMS is that the largest end-to-end data transmission delay between PMUs and PDC (denoted as T_max) is the "bottleneck" of the overall communication performance. Once link failures happen due to malicious cyber-attacks or unpredictable physical events, even if they only influence the transmission path from one single PMU to PDC, the effect is not negligible. In addition, some WAMS applications have stringent requirements for end-to-end delays, which imposes further demands and challenges on the FF mechanisms upon link failures. Another important feature of the WAMS data transmission that can be leveraged for the special path construction algorithm design is that all the measurement data flows from PMUs to the PDC (in the view of one PDC monitored area). That means, if the shortest path algorithm (Dijkstra's algorithm [26]) is used to generate the paths, there is only one destination node (i.e., the PDC-connected SW), and only one shortest path tree (SPT) is generated. As shown in Fig. 2, the SPT is rooted at the PDC-connected SW, and the PMU-connected SWs are distributed in the SPT. Only a link failure happening on the critical links (denoted as bold lines in Fig. 2) of the SPT may impact the PMU-PDC communication. Based on this feature, the special mathematical analysis over the SPT can be performed to guide the backup path construction in each PDC area.
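The single-destination structure described above can be illustrated with a minimal sketch. It assumes an undirected weighted graph stored as an adjacency dict, builds the SPT rooted at the PDC-connected SW with Dijkstra's algorithm, and marks as critical every tree link that lies on the tree path from some PMU-connected SW to the root. The function names `build_spt` and `critical_links` and the toy topology are illustrative assumptions and do not come from the paper.

```python
import heapq

def build_spt(graph, root):
    """Dijkstra from the root; returns (dist, parent) describing the SPT.

    graph: {node: {neighbor: weight}} for an undirected graph (assumed format).
    parent[v] is the next hop of v toward the root on the shortest path tree.
    """
    dist = {root: 0.0}
    parent = {root: None}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

def critical_links(parent, pmu_nodes):
    """Tree links (child, parent) lying on some PMU-to-root tree path."""
    links = set()
    for s in pmu_nodes:
        node = s
        while parent[node] is not None:
            links.add((node, parent[node]))
            node = parent[node]
    return links

# Toy topology (hypothetical): "t" is the PDC-connected SW, "s" hosts a PMU.
GRAPH = {
    "t": {"q": 1, "c": 4},
    "q": {"t": 1, "p": 1, "c": 2},
    "p": {"q": 1, "s": 1, "c": 1},
    "s": {"p": 1},
    "c": {"t": 4, "q": 2, "p": 1},
}
DIST, PARENT = build_spt(GRAPH, "t")
CRITICAL = critical_links(PARENT, pmu_nodes=["s"])
```

In this toy case the link (c, q) belongs to the SPT but is not critical, because no PMU-connected SW sends data through it.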

III. PROPOSED DFF MECHANISM
An openflow group is an abstraction that facilitates more complex and specialized packet operations that cannot easily be performed through a flow table entry. Among the group table types, FF group table is specifically designed to detect and overcome port failures or port down status caused by link failures. As shown in Fig. 3, the FF group table has a list of buckets in addition to the list of actions. Each bucket has a watch-port as a special parameter, and the watch-port will monitor the "liveness" or up/down status of the indicated port/group. If the liveness is deemed to be down, then the bucket will not be used, and the FF group will quickly select the next bucket in the bucket list with a watch-port that is up.
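The bucket selection rule described above can be mimicked in a few lines. The sketch below is a plain-Python model of the behavior, not switch code: the `Bucket` type, the `select_bucket` function, the symbolic action strings, and the port numbers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Bucket:
    watch_port: int      # port whose liveness gates this bucket
    actions: List[str]   # symbolic actions, e.g. ["output:1"]

def select_bucket(buckets: List[Bucket], port_up: Dict[int, bool]) -> Optional[Bucket]:
    """Return the first bucket whose watch port is live, or None if all are down."""
    for bucket in buckets:
        if port_up.get(bucket.watch_port, False):
            return bucket
    return None

# Hypothetical FF group at an entrance SW: port 1 is the original output,
# port 2 leads to the backup path.
FF_GROUP = [
    Bucket(watch_port=1, actions=["output:1"]),
    Bucket(watch_port=2, actions=["push_tag:linkID", "output:2"]),
]

before = select_bucket(FF_GROUP, {1: True, 2: True})    # original port is up
after = select_bucket(FF_GROUP, {1: False, 2: True})    # original port went down
```

Because the decision depends only on local port state, no message to the controller is involved in switching from `before` to `after`.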
Therefore, to respond fast to the failure of a specific link in dataplane, a group table is required to be installed at the entrance SW of the link. Once the failure happens on the link, the "up" status of the SW output port will transit into "down," which triggers the output port change. The new path (or backup path) to load the packet flow not only impacts the packet transmission delays during failover, but also determines the end-to-end transmission performance after failover. Therefore, how to construct the backup path to redirect the packet flow needs to be first investigated.

A. 3-Approximation Backup Path Construction Algorithm
As described previously, since the backup path can only start from the entrance node of the failed link, the backup path is usually not the shortest path from PMU to PDC; thus, the control plane-based fast path recovery algorithm [6] aiming to find the shortest rerouting path starting from the PMU-connected SW is not suitable for this scenario. To guarantee the performance after failover, the backup path construction algorithms are proposed in this section. The notations used in this section are given in Table I.
Upon a link failure happening at a critical link e_f = (p, q), the backup path can be constructed between the two ends of the failed link with minimum flow entries [25]. However, this method cannot guarantee the performance of the backup path. In this work, it is assumed that the total weight (sum of link weights) of a path represents the "performance" of the path, which is also commonly used by routing algorithms in the current IP-based networks. To obtain minimum end-to-end delays, the link weight can be a metric such as delay or distance. We first prove that the simple idea of using the shortest path from p to q in G − e_f as the backup path can guarantee a 3-approximation of the optimal path. (Note that an α-approximate path means that, for any topology of the failover problem, the constructed path from source to destination has a cost of at most α times that of the optimal path.)
Lemma 1: If there is a link failure on the edge e_f = (p, q) ∈ T, the 3-approximate path from any s to t in G − e_f can be obtained by using Path_{G−e_f}(p, q) as the backup path.
Proof: As shown in Fig. 4(a), assume there are two PMU-connected SWs (s_1, s_2) in T_up, and their destination is t. The nodes a and b are some nodes in Path_{G−e_f}(p, q). Then, the new path from ∀s ∈ {s_1, s_2} [...] Fig. 4(b). The weight of the constructed path Q is [...] (1) can be written as [...]
If Dijkstra's algorithm [26] is used to find the shortest path from p to q in G − e_f, the computational complexity of the 3-approximate backup path construction algorithm (denoted as 3-BPCA) is O[(n − 1)^2].
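One way to see where the factor of 3 comes from is sketched below. This is a reconstruction under stated assumptions (undirected links, nonnegative weights, s in T_up, and a constructed path Q made of the tree path from s to p, the detour Path_{G−e_f}(p, q), and the tree path from q to t); it is not necessarily the exact chain of inequalities used in the original proof. Here OPT is shorthand, introduced for this sketch, for Dist_{G−e_f}(s, t).

```latex
\begin{align*}
wt(Q) &= \mathrm{Dist}_G(s,p) + \mathrm{Dist}_{G-e_f}(p,q) + \mathrm{Dist}_G(q,t), \\
\mathrm{Dist}_{G-e_f}(p,q) &\le \mathrm{Dist}_G(p,s) + \mathrm{OPT} + \mathrm{Dist}_G(t,q)
  && \text{(the walk } p \to s \to t \to q \text{ avoids } e_f\text{)}, \\
\mathrm{Dist}_G(s,p) + \mathrm{Dist}_G(q,t) &= \mathrm{Dist}_G(s,t) - wt(p,q) \le \mathrm{OPT}
  && \text{(removing a link cannot shorten distances)}, \\
wt(Q) &\le 2\bigl[\mathrm{Dist}_G(s,p) + \mathrm{Dist}_G(q,t)\bigr] + \mathrm{OPT} \le 3\,\mathrm{OPT}.
\end{align*}
```

The second line relies on the tree paths inside T_up and T_down being unaffected by the removal of e_f, which holds because neither of them traverses the link (p, q).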

B. (1 + 2ε)-Approximate Backup Path Construction Algorithm
Although the natural idea of using Path_{G−e_f}(p, q) as the backup path can result in a performance guarantee of 3-approximation, both the computational complexity and the path performance can be further optimized. To achieve this goal, the (1 + 2ε)-approximate (0 < ε < 1) backup path construction algorithm [denoted as (1+2ε)-BPCA] is proposed based on the following mathematical analysis.
Lemma 2: If there is a link failure on the edge e_f = (p, q) ∈ T, the (1 + 2ε)-approximate path from any s to t in G − e_f can be obtained by using Path_{G−e_f}(p, t) as the backup path.
Proof: As shown in Fig. 4(c), assume that the nodes a ∈ T_up and b ∈ T_down are some nodes in [...] Fig. 4(c). The weight of the constructed path Q is [...]
In fact, the constructed (1 + 2ε)-approximate path is also the optimal solution that can be applied in dataplane-based failover to minimize the end-to-end transmission delay between PMUs and PDC, because the dataplane-based failover is based on the FF group table, and thus, the backup path can only start from the entrance node (p) of the failed link. The backup path constructed by (1 + 2ε)-BPCA is exactly the shortest path from node p to destination node t, which minimizes the transmission delays during failover.
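A possible reading of the (1 + 2ε) bound is sketched below. The assumptions are the same as for a standard shortest-path argument (undirected links, positive weights, s in T_up with s ≠ p), and the constructed path Q is taken to be the tree path from s to p followed by Path_{G−e_f}(p, t). The definition of ε used here is an assumption made for this sketch and may differ from the definition in the original proof; OPT again stands for Dist_{G−e_f}(s, t).

```latex
\begin{align*}
wt(Q) &= \mathrm{Dist}_G(s,p) + \mathrm{Dist}_{G-e_f}(p,t), \\
\mathrm{Dist}_{G-e_f}(p,t) &\le \mathrm{Dist}_G(p,s) + \mathrm{OPT}
  && \text{(the walk } p \to s \to t \text{ avoids } e_f\text{)}, \\
wt(Q) &\le \mathrm{OPT} + 2\,\mathrm{Dist}_G(s,p) = (1 + 2\varepsilon)\,\mathrm{OPT},
  && \varepsilon := \frac{\mathrm{Dist}_G(s,p)}{\mathrm{OPT}}, \\
0 < \varepsilon &< 1
  && \text{since } 0 < \mathrm{Dist}_G(s,p) < \mathrm{Dist}_G(s,t) \le \mathrm{OPT}.
\end{align*}
```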
When using Dijkstra's algorithm to find Path_{G−e_f}(p, t), the computational complexity is still O[(n − 1)^2]. However, in this case, we find that the original SPT construction can be utilized to reduce the path construction complexity. It can be seen that [...] Since T_up ⊂ T is rooted at p and is a subtree of T, then for any a ∈ T_up, Path_{G−e_f}(p, a) is just the tree path on the original SPT for G. For the same reason, Path_{G−e_f}(b, t) is also the tree path on the original SPT T. Therefore, Path_{G−e_f}(p, t) is the path that has the minimum weight of Dist_G(p, a) + wt(a, b) + Dist_G(b, t). Since Dist_G(p, a) and Dist_G(b, t) are known from the original SPT construction process, the only required computation is to traverse all the links (a, b) that connect nodes a ∈ T_up and b ∈ T_down, as shown in Fig. 4(d). Assume the set of such links is E_(a,b), and |E_(a,b)| = m_(a,b). The (1 + 2ε)-approximate backup path construction algorithm (Algorithm 1) has the computational complexity of O[m_(a,b)].
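The crossing-link scan described above can be sketched as follows. This is an illustrative reimplementation, not the paper's Algorithm 1: the names `spt`, `subtree`, and `best_crossing_link`, the adjacency-dict graph format, and the toy topology are assumptions. The tree distance Dist_G(p, a) is obtained as dist[a] − dist[p], which holds because the tree path from a to t passes through p. For clarity the sketch collects T_up by walking parent pointers, so it does not itself attain the O[m_(a,b)] bound.

```python
import heapq

def spt(graph, root, banned=None):
    """Dijkstra from root; `banned` is an optional undirected link to skip."""
    dist, parent = {root: 0.0}, {root: None}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if banned is not None and {u, v} == set(banned):
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, parent

def subtree(parent, p):
    """Nodes of T_up: p plus every node whose tree path to the root passes p."""
    up = set()
    for v in parent:
        node = v
        while node is not None and node != p:
            node = parent[node]
        if node == p:
            up.add(v)
    return up

def best_crossing_link(graph, dist, parent, p):
    """Scan links (a, b) with a in T_up and b in T_down, skipping e_f = (p, q)."""
    q = parent[p]
    t_up = subtree(parent, p)
    best = None
    for a in t_up:
        for b, w in graph[a].items():
            if b in t_up or (a == p and b == q):
                continue
            cost = (dist[a] - dist[p]) + w + dist[b]
            if best is None or cost < best[0]:
                best = (cost, a, b)
    return best

# Toy topology (hypothetical): "t" is the PDC-connected SW, e_f = (p, q).
GRAPH = {
    "t": {"q": 1, "c": 4},
    "q": {"t": 1, "p": 1, "c": 2},
    "p": {"q": 1, "s": 1, "c": 5},
    "s": {"p": 1, "c": 1},
    "c": {"t": 4, "q": 2, "p": 5, "s": 1},
}
DIST, PARENT = spt(GRAPH, "t")
BEST = best_crossing_link(GRAPH, DIST, PARENT, "p")
# Cross-check against a full Dijkstra run from p on G - e_f.
CHECK = spt(GRAPH, "p", banned=("p", "q"))[0]["t"]
```

In this toy case the cheapest crossing link is (s, c) rather than the direct link (p, c), so the backup path first descends into T_up before crossing to T_down.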

C. LinkID-Based Backup Path Installation
Although the FF group table could be utilized to perform FF in dataplane, the forwarding rules of backup paths may conflict with those of the original paths. For example, if the backup path for (p, q) in Fig. 5 passes through SW c, i.e., p → c belongs to the backup path, then it will conflict with the logic of the original PMU1-PDC path c → p. To solve this kind of problem, the VLAN ID is used in [24], and the identifier (not specified) is used in [25] to identify the redirected data flows. For example, after the link (p, q) in Fig. 5 breaks down, the data flows from p to q will be redirected to another SW, and a new tag will be added to the data packet header by the FF group table action in SW p. However, the existing works did not describe how to add the tag and how to optimize the storage cost.
Observation 1: For each failed link in SPT, one single backup path can recover all the data flows to PDC. As analyzed in Section II, in WAMS, all the measurement data flows from different SWs in the SPT to the root of SPT; therefore, the destination of all the measurement data reaching at p is the PDC-connected SW (denoted as t). Thus, one single backup path is enough to redirect the data flows upon the link failure of (p, q), no matter how many different PMUs are sending measurement data to p. Based on Observation 1, in this work, the linkID-based backup path installation method is proposed: to protect a link (p, q), the linkID can be used to uniquely identify the corresponding backup path. Specifically, the backup path installation methods for the proposed 3-BPCA and (1+2ε)-BPCA are designed, respectively.
Backup path installation for 3-BPCA: To install a backup path for link (p, q), assume that the backup path is (p, a) (a, b), [...] the flow entries use both the dstIP and tag as the match fields to avoid conflicts with the original path. Here, the original path uses "dstIP" as the match field since PMU-PDC communication is IP based.
3) At the end SW q of the backup path, the actions also include removing the tags from the packet header. One example of this procedure is shown in Fig. 5(a).
Backup path installation for (1+2ε)-BPCA: Since the destination node in this backup path construction method is t but not q, the storage cost can be optimized by utilizing the flow entries in the original path. Assume the backup path is Path_{G−e_f}(p, t), and b is the first PMU-connected SW on the backup path that is also in T_down. Since the path from b to t is already installed in the original path, the linkID-based tag can be deleted in b to reduce the storage cost. Therefore, the linkID tag-based match field only needs to be installed on the path Path_{G−e_f}(a, b). One example of this procedure is shown in Fig. 5(b).
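The installation idea for (1+2ε)-BPCA can be sketched symbolically as follows. This is one possible interpretation, not the paper's implementation: rules are plain dicts rather than openflow messages, neighbor names stand in for port numbers, and the function name `install_backup_rules`, the tag string, and the IP address are hypothetical. The sketch assumes that SW b matches the tag once, removes it, and then forwards toward its original next hop.

```python
def install_backup_rules(path_p_to_b, link_id, dst_ip, original_next_hop):
    """Return symbolic rules for the tagged part of a backup path p -> ... -> b.

    path_p_to_b: nodes from the entrance SW p to b, the first SW on the
                 backup path whose original flow entry can be reused.
    original_next_hop: {sw: next hop toward the PDC on the original SPT}.
    """
    p, b = path_p_to_b[0], path_p_to_b[-1]
    rules = {}
    # Entrance SW: one FF group with two buckets (original port, backup port).
    rules[p] = {
        "type": "ff_group",
        "buckets": [
            {"watch": original_next_hop[p],
             "actions": [f"output:{original_next_hop[p]}"]},
            {"watch": path_p_to_b[1],
             "actions": [f"push_tag:{link_id}", f"output:{path_p_to_b[1]}"]},
        ],
    }
    # Intermediate SWs: match dstIP AND tag so the original dstIP-only
    # entries (which may point back toward p) are not disturbed.
    for i in range(1, len(path_p_to_b) - 1):
        rules[path_p_to_b[i]] = {
            "type": "flow",
            "match": {"dstIP": dst_ip, "tag": link_id},
            "actions": [f"output:{path_p_to_b[i + 1]}"],
        }
    # SW b: strip the tag and hand the packet to the original path.
    rules[b] = {
        "type": "flow",
        "match": {"dstIP": dst_ip, "tag": link_id},
        "actions": ["pop_tag", f"output:{original_next_hop[b]}"],
    }
    return rules

# Hypothetical example: link (p, q) fails, backup path leaves p via s and c.
RULES = install_backup_rules(
    ["p", "s", "c"], link_id="p-q", dst_ip="10.0.0.1",
    original_next_hop={"p": "q", "s": "p", "c": "q"},
)
FLOW_ENTRIES = [r for r in RULES.values() if r["type"] == "flow"]
```

In the example, SW s normally forwards PDC-bound traffic to p; the tagged entry lets it forward redirected packets to c instead without touching that original entry.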
In order to provide complete protection against a link failure and ensure the PMU-PDC communication performance, the backup paths corresponding to all of the critical links should be installed. It can be observed that to achieve this goal, each SW on the N_cl critical links (excluding the root PDC-connected SW) has exactly one FF group table with two buckets installed. This is because, in a specific SW p, all the output ports of flow entries are the same (denoted as port "A"); to protect the link (p, q) in SPT, the FF group table in p only has one original watch-port "A." According to Observation 1, the SW p also only needs to monitor another port "B" that is used to redirect the flow once the liveness of "A" is regarded to be down. Therefore, the total number of buckets in FF group tables required to be installed (denoted as N_bucket) can be calculated as N_bucket = 2N_cl, since there are in total N_cl SWs on the N_cl links of SPT (excluding the PDC-connected SW). The number of flow entries required to be installed can also be analyzed. To install the original paths, each SW on the N_cl links of SPT has one flow entry installed, that is, the number of flow entries installed in flow tables for the original paths is N_fe^original = N_cl + 1.
As for the backup path of link (p, q) in 3-BPCA, which starts from p and ends at q, let the number of nodes on the ith such backup path excluding p be N_i^{3-BPCA} among the N_cl backup paths. Then, the total number of flow entries installed for the backup paths is Σ_{i=1}^{N_cl} N_i^{3-BPCA}. For (1+2ε)-BPCA, the backup path starts from p and ends at t; however, only the backup path from p to b is required to be installed. Let the number of nodes on the ith backup path from p to b excluding p be N_i^{(1+2ε)-BPCA} among the N_cl backup paths; then the number of flow entries installed for the backup paths is Σ_{i=1}^{N_cl} N_i^{(1+2ε)-BPCA}.

IV. IMPLEMENTATION AND EVALUATION
In this section, the proposed DFF mechanism was evaluated on the topologies of six IEEE benchmark test power systems [27], and the results are compared with existing control plane and dataplane-based failover methods.

A. Testbed Implementation
The testbed was implemented on the Mininet+Ryu SDN environment, where Mininet was used to generate a real-time virtual network (dataplane) composed of virtual SWs, hosts, and links, and the Ryu SDN controller [28] was responsible for monitoring the dataplane through the Openflow 1.3 protocol channel, as shown in Fig. 6. The experiment was run on the Intel Core i7-10710U 1.61 GHz CPU with 16 GB RAM.
Topology Configuration in Mininet: Typically, the communication network topology is not the same as the power grid topology. However, since each bus (substation) needs to connect with the remote control center via a forwarding device, a commonly used simplification to study the communication network feature is to regard the communication network topology and power grid topology as the same [29] (including the connection relations and distances between nodes). Based on this simplification, the corresponding communication network topology configuration data were generated for the six test power systems, which can be found in [30]. For example, to emulate the WAMS of the IEEE 24-Bus power grid shown in Fig. 6, each bus is assumed to connect to an SW, and seven PMUs are deployed [...] Table II according to [31] and [32]. Note that larger systems can further demonstrate the efficacy of the proposed DFF method; however, in this work, only the communication network connecting PMUs and PDC is of concern, because in this network the link failures easily induce the measurement incompleteness that may significantly impact the system-level operation. When the system scale increases, typically multiple PDCs are required to be deployed in different areas of the system to reduce the transmission delays between PMUs and PDCs. Within one PDC area, the system scale therefore remains limited.
SPT and backup path construction in Ryu: The SPT and backup paths for a specific topology were constructed in the Ryu controller [28]. As shown in Fig. 6, the basic Ryu functions include the OFPSwitchFeatures that installs a default flow entry to each SW, the SwitchEnter (defined in ryu.topology class) that obtains the topology-related information, such as SW ID, link ID, and SW port number, the OFPPacketIn that receives packets from SWs that do not match the flow entries, and OFPFlowMod that is used for flow table and FF group table configuration. Based on these functions, the SPT for each test power system topology can be computed using the discovered SW topology. For example, the constructed SPT for the IEEE 24-Bus topology is shown in Fig. 6, which is rooted at SW-11 that connects with the PDC. According to the constructed SPT for each topology, the number of critical links for PMU-PDC communication (denoted as N_cl) in SPTs of different test power system topologies is counted and given in Table II. For example, there are 14 such critical links in the 24-Bus topology, denoted as bold lines in Fig. 6.
Integrating the above implementation details, for the communication network of a specific test power system, the processing flow of the proposed (1+2ε)-BPCA works as follows.
1) In Mininet, configure the SW and host connections, i.e., the SW graph topology.
2) In Ryu, obtain the SW datapath IDs and connection relations between SWs from the "event.EventSwitchEnter" function defined in ryu.topology class.
3) In Ryu, construct the SPT rooted at the PDC-connected SW based on the discovered SW topology, and generate and push the corresponding flow entries into SWs as the original paths.
4) In Ryu, for each edge e_f = (p, q) in the SPT, compute Path_{G−e_f}(p, t) according to Algorithm 1, and then install the backup path Path_{G−e_f}(p, t) according to the linkID-based backup path installation method proposed in Section III-C.
After installing the flow entries according to the four steps, no more actions are required in the control plane for ensuring single link failure tolerance; when a link failure happens, the data flows can be redirected to the backup path automatically according to the group table functionalities.

B. Forwarding Rule Storage Cost
First, the total number of forwarding rules (N_fr), which is the sum of the number of flow entries in flow tables (N_fe) and the number of buckets in FF group tables (N_bucket), was evaluated. The comparison was performed with the backup path computation principle that aims to minimize the number of flow entries installed along the backup path from p to q [25] (denoted as minimum flow entry-based backup path construction algorithm, MFE-BPCA), where p and q denote the entrance and exit SWs of the failed link. As shown in Fig. 7, MFE-BPCA always has smaller storage costs compared with 3-BPCA, because MFE-BPCA uses the backup path from p to q with the minimum number of flow entries, whereas 3-BPCA uses the backup path from p to q with the minimum path cost. Since (1+2ε)-BPCA reuses the flow entries of the original paths as proposed in Section III-C, it also does not cost much flow table space compared with MFE-BPCA. In the 24-Bus, 30-Bus, and 39-Bus topologies, (1+2ε)-BPCA can even save more storage costs than MFE-BPCA. It is observed that the average number of installed forwarding rules in each SW is no more than five, which means that, on average, no more than four extra forwarding rules are required for FF upon any single link failure (except for the 57-Bus case with no more than 6.5 forwarding rules and 5.5 extra rules). To guarantee the PMU-PDC end-to-end transmission and failover performance, the extra storage costs are completely acceptable.

C. Path Failover Delay
Second, the key indicator of failover performance, the path failover delay (T_failover), was measured, which is the time interval from the link failure to the renewed reception of synchrophasor measurements by the PDC, i.e., the port transition time of p plus the transmission delay from p to t along the backup path. Note that since the link failure location also impacts T_failover, the result shown in Fig. 8 is the average T_failover. For example, there are N_cl = 6 links in the IEEE 14-Bus topology that may fail, and each failed link yields one measured T_failover in the experiment; Fig. 8 shows the average of these values. The control plane-based failover mechanism (CFM) was also used for comparison: to minimize the path failover delay, all the backup paths were computed in advance and stored in a lookup table. Once the packet-In message is received from the dataplane, the backup path can be directly obtained by searching the lookup table and then be installed. In Fig. 8, T_failover/3 of CFM is plotted since CFM has a much higher failover time than the dataplane-based failover mechanisms. It can be seen that the reaction of CFM is always slower than the three dataplane-based failover methods, because although the installation delay of the new path is minimized as much as possible, the delays of packet-In and packet-Out message generation and flow entry update are still considerable. It can also be observed that compared with MFE-BPCA and 3-BPCA, (1+2ε)-BPCA can achieve the lowest failover delays in the six test power system topologies. It is noticed that the average T_failover in the IEEE 57-Bus test power system topology is much larger than in the other topologies, because there are three links with much larger lengths than any other links in the six test power systems: (SW-30, SW-31) with 1183 km, (SW-31, SW-32) with 1797 km, and (SW-30, SW-25) with 480 km, which may be used in the backup path and, thus, significantly increase the average failover delay.
To compare with the traditional distributed control-based failover methods, the commonly used LFA and rLFA methods were evaluated on the six test topologies. After computing the LFA/rLFA node for the entrance node of the N_cl links against the link failure, it was found that the related backup node does not always exist; thus, the path failover delays could not be included in Fig. 8 to make a fair comparison. As given in Table III, for the six test topologies, there are 2, 5, 6, 13, 17, and 20 links where the failure cannot be solved by LFA, and 0, 1, 1, 4, 16, and 1 links where the failure cannot be solved by rLFA. One major reason is that a forwarding device is unable to install forwarding rules to other devices, whereas a complicated backup path usually needs elaborate collaboration between several forwarding devices with distinctly different types of forwarding rules, which is difficult to implement with global level consistency in a distributed control fashion.
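Why an LFA node may not exist for a given link can be illustrated with a minimal sketch. It uses the basic loop-free condition commonly stated for LFA, Dist(n, t) < Dist(n, p) + Dist(p, t) for a neighbor n of the entrance node p, and it is not the evaluation code behind Table III. The function names and the five-node ring topology are hypothetical.

```python
import heapq

def shortest_dists(graph, src):
    """Plain Dijkstra on an undirected adjacency dict {node: {nbr: weight}}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def lfa_neighbors(graph, p, q, t):
    """Neighbors n of p (other than the primary next hop q) that satisfy
    Dist(n, t) < Dist(n, p) + Dist(p, t), i.e. n will not route back via p."""
    d_p = shortest_dists(graph, p)
    result = []
    for n in graph[p]:
        if n == q:
            continue
        d_n = shortest_dists(graph, n)
        if d_n[t] < d_n[p] + d_p[t]:
            result.append(n)
    return sorted(result)

# Hypothetical ring: t is the PDC-connected SW; the long link (c, t) makes
# s prefer the route through p, so s cannot act as an LFA for link (p, q).
RING = {
    "t": {"q": 1, "c": 5},
    "q": {"t": 1, "p": 1},
    "p": {"q": 1, "s": 1},
    "s": {"p": 1, "c": 1},
    "c": {"s": 1, "t": 5},
}
NO_LFA = lfa_neighbors(RING, "p", "q", "t")
HAS_LFA = lfa_neighbors(RING, "c", "s", "t")
```

For link (p, q) the only other neighbor of p is s, whose own shortest path to t goes back through p, so no LFA exists even though the detour p, s, c, t is physically available; a preinstalled backup path does not have this limitation.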

D. End-to-End Transmission Delay After Failover
Finally, the end-to-end transmission delays (T_trans) from the PMUs to the PDC after failover were measured to further show the advantages of the proposed DFF mechanism. Here, T_trans is also an average value. As analyzed in Section II, the largest end-to-end transmission delay (T_max) from all the PMUs to the PDC is the key indicator of performance; so, after failover upon a specific failed link, the corresponding T_max of the impacted PMUs was measured. Since N_cl links may fail, T_trans is the average of the N_cl measured values of T_max. From the results in Fig. 9, it can be seen that the proposed (1+2ε)-BPCA finds paths with lower end-to-end transmission delays than those of MFE-BPCA and 3-BPCA. The average T_trans of (1+2ε)-BPCA in the 14, 24, 30, 39, 57, and 118-Bus test power system topologies is about 33%, 31%, 21%, 25%, 8%, and 29% lower, respectively, than that of 3-BPCA. Compared with the optimal paths (OPT) from the PMUs to the PDC, (1+2ε)-BPCA even achieves similar end-to-end transmission performance in the 14, 24, 30, 39, and 118-Bus test power system topologies.
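The definition of T_trans (the average, over the failed links, of the largest delay among the impacted PMUs) and the relative reductions quoted above can be written as a short Python sketch. The link names, PMU names, and delay values are hypothetical placeholders, not experimental data.

```python
from statistics import mean


def average_t_trans(delays_by_failed_link: dict) -> float:
    """T_trans: mean over failed links of T_max, where T_max is the largest
    PMU-to-PDC delay among the PMUs impacted by that link failure."""
    return mean(max(pmu_delays.values())
                for pmu_delays in delays_by_failed_link.values())


def relative_reduction_percent(baseline: float, proposed: float) -> float:
    """How much lower `proposed` is than `baseline`, in percent."""
    return (baseline - proposed) / baseline * 100.0


# Hypothetical delays in ms after failover, keyed by failed link, then by PMU.
delays_baseline = {
    ("SW1", "SW2"): {"PMU1": 4.0, "PMU2": 6.0},
    ("SW2", "SW3"): {"PMU1": 10.0},
}
delays_proposed = {
    ("SW1", "SW2"): {"PMU1": 3.0, "PMU2": 4.0},
    ("SW2", "SW3"): {"PMU1": 8.0},
}
t_base = average_t_trans(delays_baseline)
t_prop = average_t_trans(delays_proposed)
print(f"T_trans: {t_base:.1f} ms vs {t_prop:.1f} ms, "
      f"reduction {relative_reduction_percent(t_base, t_prop):.0f}%")
```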
Since the proposed DFF is a general method that is not limited to small networks, a larger power grid model can further demonstrate its advantages and generality. For this purpose, six synthetic topologies were generated by connecting 10/20/30/40/50/60 IEEE 118-bus networks together, with an average of ten random external links coming out of each 118-bus network [30]. To stay consistent with the real-world WAMS setup, in which a single PDC area is not large, only the lengths of the constructed backup paths were numerically computed, as shown in Fig. 10. It can be observed that the differences between OPT and (1+2ε)-BPCA are always small, whereas the backup paths generated by 3-BPCA and MFE-BPCA become much longer than OPT when the topology scale increases to a 5900-node level, which shows the efficiency of the proposed failover method in large-scale communication network topologies.

E. Impacts on the Power Grid
To further highlight the value of the proposed method, a PSCAD/EMTDC-based case study was performed to simulate the impact of transmission delays on the IEEE 39-bus power system [33]. In the test power system shown in Fig. 11, 12 PMUs measure the phasor data of 12 buses (as given in Table II) and report the measurement data to the PDC located at Bus-16. The path failover delays and end-to-end transmission delays evaluated in the preceding sections were used to simulate the behavior of the grid.
At simulation time 2 s, a line fault occurred on the power line connecting Bus-29 and Bus-28, which triggered a line trip. For the same reason, the communication link between Bus-29 and Bus-28 also failed. After the PDC connected to SW16 detects the overpower warning (the warning threshold is set 30% above the rated power, i.e., at 130 MW) on the line between Bus-29 and Bus-26 from the phasor data of the PMU connected to SW29, it needs to send control commands to the controllable load (CL) located at Bus-29 as the remedial action scheme (RAS) to prevent the power line from tripping (the trip threshold is set at 200 MW). The original path from SW29 to SW16 is shown in Fig. 12 as the solid line. After the communication link failure, the backup paths installed by 3-BPCA and MFE-BPCA are the same: SW29→ SW26→ SW28→ SW26→ SW27→ SW17→ SW16, whereas the backup path installed by (1+2ε)-BPCA is SW29→ SW26→ SW27→ SW17→ SW16. The end-to-end transmission delay of 3-BPCA and MFE-BPCA was measured as 6.1 ms, and that of (1+2ε)-BPCA was measured as 3.5 ms. Assume that the RAS control command generation delay in the PDC is 20 ms and the delay of the CL action is 10 ms, as shown in Fig. 13(a). For CFM, the failover time (29 ms) is much larger, so the data transmission had not yet been restored when the overpower warning condition arose at 2.015 s. Therefore, the warning threshold time in Fig. 13(a) is 2.029 s for CFM and 2.015 s for 3/MFE/(1+2ε)-BPCA.
From the results shown in Fig. 13(b), it can be seen that the peak value of the line power using CFM is 204.4 MW, which is larger than the 201.8 MW using 3/MFE-BPCA and the 196.3 MW using (1+2ε)-BPCA. Although the total delay difference between 3/MFE-BPCA and (1+2ε)-BPCA is only 5.2 ms, in a real-time protection scenario where the electrical parameters may change rapidly, the final impacts on the power grid may differ substantially: with 3-BPCA or MFE-BPCA, the trip of the line between Bus-29 and Bus-26 is triggered, which may induce subsequent impacts on the other lines and loads, whereas with (1+2ε)-BPCA the line is protected. From the results in Fig. 10, it can also be expected that the benefits of the proposed method would be greater in a larger power grid. It should be noted that the delay values obtained from the software-based (Mininet+Ryu) simulation may not precisely reflect the transmission delays in a real-world SDN system, but the delay impacts on the IEEE 39-Bus system demonstrate the advantages of the proposed (1+2ε)-BPCA to a certain degree. In future work, the proposed DFF method is expected to be applied in a real-world SDN-enabled WAMS to further show its practical effectiveness.
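The 5.2 ms figure is consistent with the measured one-way delays if the end-to-end delay is incurred twice in the control loop, once for the measurement from the PMU to the PDC and once for the command from the PDC to the CL: 2 × (6.1 − 3.5) = 5.2 ms. The Python sketch below works through this reading. The composition of the timeline and the function name are assumptions made for illustration and are not stated explicitly in the case study; only the delay values are taken from the text.

```python
def cl_action_time_s(warning_time_s: float, one_way_delay_ms: float,
                     ras_delay_ms: float = 20.0, cl_delay_ms: float = 10.0) -> float:
    """Assumed timeline: measurement PMU -> PDC, RAS command generation,
    command PDC -> CL (same one-way delay assumed), then the CL action."""
    total_ms = one_way_delay_ms + ras_delay_ms + one_way_delay_ms + cl_delay_ms
    return warning_time_s + total_ms / 1000.0


# Values from the case study: warning threshold time 2.015 s, one-way delays
# of 6.1 ms (3-BPCA / MFE-BPCA) and 3.5 ms ((1+2*eps)-BPCA).
t_3bpca = cl_action_time_s(2.015, 6.1)
t_eps = cl_action_time_s(2.015, 3.5)
gap_ms = (t_3bpca - t_eps) * 1000.0
print(f"3/MFE-BPCA acts at {t_3bpca:.4f} s, (1+2eps)-BPCA at {t_eps:.4f} s, "
      f"gap {gap_ms:.1f} ms")
```

CFM is left out of this sketch because its end-to-end transmission delay after failover is not reported in the case study.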

V. CONCLUSION
To achieve FF in the case of a link failure in SDN-enabled WAMS, a novel DFF mechanism is proposed in this work to directly recover data transmission in the dataplane. DFF is based on a rigorous mathematical analysis of backup path construction, which finds the 3-approximate and (1+2ε)-approximate (0 < ε < 1) backup paths that guarantee the data transmission performance both during and after failover. To install the constructed backup paths, the LinkID-based FF group table installation method is proposed to avoid forwarding rule conflicts and reduce storage costs. The simulation was conducted on six IEEE benchmark test power systems, and the results for the forwarding rule storage cost, path failover delay, and end-to-end transmission delay show that the proposed DFF mechanism reduces both the failover delays and the end-to-end transmission delays after failover compared with the existing failover methods. Owing to its simplicity and efficiency, DFF has the potential to be practically applied in SDN-enabled WAMS to achieve resilient data transmission. Corresponding DFF mechanisms for multiple link failures will be investigated in future work.