Software Defined Network Dynamics via Diffusions

Software-Defined Networks (SDN) dynamically modify the paths of Internet flows in response to quality of service or security needs, and hence frequently modify traffic levels at network routers. Thus network routers often operate in the transient regime, rather than at steady-state, with significant impact on packet loss probabilities and delay. We therefore investigate the time-dependent performance of a small network of routers, modelled as G/G/1/N queueing stations. A diffusion approximation is developed to predict the quality of service of the routers in the transient regime. Numerical examples show that the results in the transient regime can differ very significantly from the steady-state results, and therefore that the transient analysis must be taken into account in evaluating the performance of routers in an SDN network.


Introduction
Since their beginnings, the performance of computer networks has been investigated [1,2] via networks of queues that contain packets, representing interconnected routers that forward packets from a source to a destination over several hops. These models are used to compute the steady-state network delays and loss probabilities in routers, and to predict or optimise the overall transmission quality of service. As computer networks have evolved into new architectures, methods and models have been adapted and new parameters introduced.
The increased use of the Internet to carry voice traffic [3], as well as the Internet of Things, Cloud, Fog and Edge computing [4] brings new challenges by increasing the variety of network architectures and the complex stochastic nature of the transmitted flows. Also, the increased use of SDN controllers inside networks [5][6][7] creates frequent changes in traffic patterns and paths, and hence dynamic changes also in the traffic intensity of different paths and of the traffic carried by routers in the network.
it provides numerical results which are difficult to obtain with other techniques [31]. While the approach for steady-state distributions was introduced decades ago [32,33] and applied to numerous problems, including admission control in industrial telecommunication systems [34] and active queue management for non-integer PID controllers in IP routers [35], the transient analysis is more challenging and requires carefully crafted analytical techniques, which we use in this paper.
The features which are in favour of the method are that the diffusion model of a single server allows general interarrival and service time distributions for realistic network data, going beyond conventional discrete Markov models. Results are obtained as queue length and waiting time distributions, simplifying the analysis of QoS parameters such as delay, jitter and loss probability. Also, networks may be hierarchical with any topology and number of nodes, fitting well the diffusion approximation, which is scalable and decomposes network analysis into individual nodes.
The numerical examples that we present show that results in the transient regime can differ very significantly from the steady-state results. Since SDN controllers frequently change the state of the network paths, including the resulting traffic intensities and the load of each router, our numerical results show that the transient analysis will be indispensable in evaluating the performance of routers in an SDN network.

Contributions of this paper
In this paper, we develop the transient analysis of the diffusion approximation approach for small multi-node networks in which SDN controllers' decisions cause frequent changes of the paths of flows and hence of the traffic carried by different forwarders or routers. We compute the dynamics of queue length distributions, queueing delays, and loss probabilities as a function of changes in traffic intensity, based on solving a system of partial differential equations for the queue length distributions and the delays of several interconnected nodes as a function of time. The analysis we develop allows us to compute a network's transient behaviour, and the time it takes for a network to reach its new steady-state after the input traffic rates change. We can also compute the transient and steady-state packet loss probabilities in cases when they may be small, and hence very difficult to obtain via discrete event simulations.
We compute the diffusion transient state step-by-step in short time intervals with parameters which are specific to each of these intervals. Thus SDN routing traffic decisions can easily be reflected in successive changes to time-dependent and state-dependent diffusion parameters. Transient path delay averages can also be computed from transient node delay averages.
The use of these results is illustrated with two applications:
- In all networks, and in particular those carrying IoT traffic, short and intermittent packet sequences carrying measurement data need to be conveyed rapidly towards a destination. At the same time, when SDN controllers are used, long traffic sequences at higher traffic rates may be re-allocated between paths for purposes of traffic balancing and may disrupt the QoS of the short sequences. The question then is to determine when the short sequences should be forwarded, e.g. just after a major long connection ends, or just when it begins. Intuitively speaking, one may wish to wait for the end of the longer and higher traffic sequence in order to obtain a better QoS for the short packet sequence. The transient analysis instead recommends that the short sequence be superposed on the longer, higher rate traffic just as the latter begins, which is counterintuitive, as discussed in Section 4.1.
- The second example, given in Section 4.2, shows how Service Level Agreements (SLA) are strongly affected by the use of precise transient analysis rather than steady-state analysis.

Transient Analysis
The diffusion approximation replaces the number of packets in a queueing system by the real-valued diffusion process $X(t) \in [0, N]$, where $N$ is the maximum size of the queue. Following the approach in [8], which leads to equations (1), two absorbing barriers are placed at the extremities $x = 0$ and $x = N$ of the interval, so that when $X(t)$ reaches a barrier it stays there for a random time and then jumps from $x = 0$ to $x = 1$ with intensity $\lambda$, or from $x = N$ to $x = N - 1$ with intensity $\mu$. In the resulting diffusion equation (1), $\delta(x)$ is the Dirac delta function, $f(x, t; x_0)$ is the probability density function of $X(t)$, defined by $f(x, t; x_0)\,dx = P[x \le X(t) < x + dx \mid X(0) = x_0]$, and $p_0(t)$ and $p_N(t)$ denote the probabilities that at time $t$ the process is at the barriers $x = 0$ and $x = N$, respectively. The incremental changes of $X(t)$, $dX(t) = X(t + dt) - X(t)$, are normally distributed with mean $\beta\,dt$ and variance $\alpha\,dt$, where $\beta$ and $\alpha$ are the coefficients of the diffusion equation: $\beta = \lambda - \mu$ and $\alpha = \sigma_A^2 \lambda^3 + \sigma_B^2 \mu^3$. Here $1/\lambda$ and $1/\mu$ are the mean interarrival and service times, and $\sigma_A^2$, $\sigma_B^2$ are the variances of the interarrival and service times, respectively.
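Equation (1) itself is not reproduced in the text above. As a hedged reconstruction, assuming the usual formulation of a diffusion process with barriers and jump returns (to be checked against [8]), the density and the two barrier probabilities would satisfy a system of the following form:

```latex
\begin{align*}
\frac{\partial f(x,t;x_0)}{\partial t} &=
  \frac{\alpha}{2}\,\frac{\partial^2 f(x,t;x_0)}{\partial x^2}
  - \beta\,\frac{\partial f(x,t;x_0)}{\partial x}
  + \lambda\,p_0(t)\,\delta(x-1)
  + \mu\,p_N(t)\,\delta(x-N+1), \\
\frac{d p_0(t)}{dt} &=
  \lim_{x\to 0}\left[\frac{\alpha}{2}\,\frac{\partial f(x,t;x_0)}{\partial x} - \beta f(x,t;x_0)\right]
  - \lambda\,p_0(t), \\
\frac{d p_N(t)}{dt} &=
  \lim_{x\to N}\left[-\frac{\alpha}{2}\,\frac{\partial f(x,t;x_0)}{\partial x} + \beta f(x,t;x_0)\right]
  - \mu\,p_N(t).
\end{align*}
```

In this form the two Dirac terms inject probability mass at $x = 1$ and $x = N - 1$ at the rates of the jumps out of the barriers. The last two equations are probability balances: the probability flux arriving at a barrier, minus the flow leaving it by a jump.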
To determine the solution of (1) we use the following approach from [36]. First we consider a diffusion process with two absorbing barriers at $x = 0$ and $x = N$, started at $t = 0$ from $x = x_0$. Its probability density function $\phi(x, t; x_0)$ is known in closed form [37]; this is equation (2). If the initial condition is defined by a function $\psi(x)$, $x \in (0, N)$, with $\lim_{x \to 0} \psi(x) = \lim_{x \to N} \psi(x) = 0$, then the probability density function (pdf) of the process is

$$\phi(x, t; \psi) = \int_0^N \phi(x, t; \xi)\,\psi(\xi)\,d\xi.$$

The probability density function $f(x, t; \psi)$ of the diffusion process with elementary returns is composed of the function $\phi(x, t; \psi)$, referring to the diffusion process before it reaches any barrier, and of a spectrum of functions $\phi(x, t - \tau; 1)$, $\phi(x, t - \tau; N - 1)$. The latter functions represent diffusion processes with absorbing barriers at $x = 0$ and $x = N$, started with densities $g_1(\tau)$ and $g_{N-1}(\tau)$ at time $\tau < t$ at the points $x = 1$ and $x = N - 1$ due to jumps from the barriers:

$$f(x, t; \psi) = \phi(x, t; \psi) + \int_0^t g_1(\tau)\,\phi(x, t - \tau; 1)\,d\tau + \int_0^t g_{N-1}(\tau)\,\phi(x, t - \tau; N - 1)\,d\tau, \tag{3}$$

where the densities $g_1(\tau)$, $g_{N-1}(\tau)$, as well as $p_0(t)$ and $p_N(t)$, are obtained from the probability balance equations at the barriers. The delay through the queue, including waiting and service time, is then obtained as a first passage time from an initial point, taken with probability given by $f(x, t; \psi)$, to the absorbing barrier placed at $x = 0$.
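The density of a diffusion between two absorbing barriers can be evaluated numerically. The following is a minimal sketch, assuming that the closed form cited from [37] is the classical method-of-images series for a diffusion with drift $\beta$ and variance coefficient $\alpha$ on $[0, N]$; the function names `phi` and `survival_probability`, the truncation parameter `terms`, and the integration grid are illustrative choices, not taken from the paper.

```python
import math


def phi(x, t, x0, N, alpha, beta, terms=20):
    """Density at x, time t > 0, of a diffusion started at x0 with absorbing
    barriers at 0 and N, using a truncated method-of-images series.
    Assumed form: images at x'_n = 2nN and x''_n = -2*x0 - x'_n."""
    if t <= 0:
        raise ValueError("t must be positive (at t = 0 the density is a Dirac delta at x0)")
    total = 0.0
    for n in range(-terms, terms + 1):
        xp = 2.0 * n * N       # x'_n: positive image sources
        xpp = -2.0 * x0 - xp   # x''_n: negative image sources
        total += math.exp(beta * xp / alpha
                          - (x - x0 - xp - beta * t) ** 2 / (2.0 * alpha * t))
        total -= math.exp(beta * xpp / alpha
                          - (x - x0 - xpp - beta * t) ** 2 / (2.0 * alpha * t))
    return total / math.sqrt(2.0 * math.pi * alpha * t)


def survival_probability(t, x0, N, alpha, beta, steps=2000):
    """Trapezoidal integral of phi over [0, N]: the probability that
    neither barrier has been reached by time t."""
    h = N / steps
    values = [phi(i * h, t, x0, N, alpha, beta) for i in range(steps + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))
```

The series vanishes at both barriers by construction, and the integral of `phi` over the interval decreases with time as probability mass is absorbed. That lost mass is what the densities $g_1(\tau)$ and $g_{N-1}(\tau)$ re-inject in the process with returns.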

Transient Analysis of a Network
Consider a network of M stations with general service time distributions and routing probabilities r ij . We first decompose the network by determining the input flows at each station and then apply the single server model to each station separately.
In the transient state the input flow $\lambda_{i-\mathrm{in}}(t)$ of any station $i$ and its output flow $\lambda_{i-\mathrm{out}}(t) = (1 - p_0(t))\mu_i$, where $p_0(t)$ is taken for station $i$, are different. The traffic equations balancing the flows of stations are

$$\lambda_{i-\mathrm{in}}(t) = \lambda_{0i} + \sum_{j=1}^{M} \lambda_{j-\mathrm{out}}(t)\, r_{ji}, \quad i = 1, \dots, M, \tag{4}$$

where the first term $\lambda_{0i}$ represents the traffic flow coming directly to station $i$ from outside the network. Denote by $f_{Aj}(x, t)$ and $f_{Bj}(x, t)$ the density functions of the interarrival and service times at station $j$; the pdf $f_{Dj}(x, t)$ of the interdeparture times from this node at time $t$ is

$$f_{Dj}(x, t) = \varrho_j(t)\, f_{Bj}(x, t) + (1 - \varrho_j(t))\, f_{Aj}(x, t) * f_{Bj}(x, t), \quad j = 1, \dots, M, \tag{5}$$

where $*$ denotes the convolution and $\varrho_j(t)$ is the utilisation of station $j$ at time $t$. The first term of (5) represents the interdeparture times of packets when the node is busy, and the second term gives the interdeparture times when it is idle. The formula (5), known as Burke's theorem, is exact for Poisson input and approximate in other cases. From (5) we obtain the squared coefficient of variation $C^2_{Dj}(t)$ of the interdeparture times, equation (6). Packets leaving node $j$ according to the distribution $f_{Dj}(x, t)$ choose any node $i$ with probability $r_{ji}$, and the times between packets routed from node $j$ to node $i$ have the pdf $f_{ji}(x, t)$ given by (7). The variance of $f_{ji}(x, t)$ allows us to determine the variance of the number of customers going from station $j$ to station $i$, and after summing over all stations sending packets to station $i$ we obtain equation (8) for $C^2_{Ai}(t)$, where the parameters $\lambda_{0i}$ and $C^2_{0i}$ refer to the flow coming to $i$ from outside the network. Equations (6) and (8) form a system of linear equations yielding $C^2_{Ai}(t)$ and, in consequence, the diffusion parameters $\beta_i(t)$, $\alpha_i(t)$ for every node $i$.
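As a worked step, the squared coefficient of variation of the interdeparture times can be derived from the mixture described for (5): with probability $\varrho$ an interdeparture time is a service time, and otherwise it is an interarrival time plus a service time. The sketch below drops the node index and writes $A$, $B$, $D$ for the interarrival, service and interdeparture times. It assumes that $A$ and $B$ are independent and that $\varrho = \lambda/\mu$, so in the transient regime, where the utilisation is $1 - p_0(t)$, it should be read as an approximation and not necessarily as the paper's exact equation (6).

```latex
\begin{align*}
E[D] &= \varrho\,E[B] + (1-\varrho)\,\bigl(E[A] + E[B]\bigr)
      = \frac{1}{\mu} + \frac{1-\varrho}{\lambda} = \frac{1}{\lambda}, \\
E[D^2] &= \varrho\,E[B^2] + (1-\varrho)\,\bigl(E[A^2] + 2E[A]E[B] + E[B^2]\bigr)
        = E[B^2] + (1-\varrho)\left(E[A^2] + \frac{2}{\lambda\mu}\right), \\
C_D^2 &= \lambda^2 E[D^2] - 1
       = \varrho^2 C_B^2 + (1-\varrho)\,C_A^2 + \varrho\,(1-\varrho),
\end{align*}
```

using $E[A^2] = (C_A^2 + 1)/\lambda^2$ and $E[B^2] = (C_B^2 + 1)/\mu^2$. As a check, Poisson arrivals and exponential service ($C_A^2 = C_B^2 = 1$) give $C_D^2 = 1$, consistent with a Poisson departure stream.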
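If $f_{Ri}(x, t)$ is the response time pdf at node $i$, then the response time pdf for the path $1, \dots, n$ is the convolution of the node response time pdfs:

$$f_R(x, t) = f_{R1}(x, t) * f_{R2}(x, t) * \cdots * f_{Rn}(x, t).$$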
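The flow decomposition described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the interdeparture formula is the closed form $\varrho^2 C_B^2 + (1-\varrho) C_A^2 + \varrho(1-\varrho)$, the superposition formula for $C^2_{Ai}$ is the one commonly used in diffusion-based network decomposition, and the linear system is solved by simple fixed-point sweeps instead of a direct solver. The function name `decompose_flows` and all argument names are hypothetical.

```python
def decompose_flows(lam0, c2_0, r, mu, c2_b, p0, sweeps=100):
    """Return (input rates, C^2 of interarrival times, C^2 of interdeparture
    times) for M stations, given external flows, routing matrix r[j][i],
    service parameters and the current idle probabilities p0[j]."""
    M = len(mu)
    rho = [1.0 - p0[j] for j in range(M)]        # utilisation of each station
    lam_out = [rho[j] * mu[j] for j in range(M)]  # output flow (1 - p0) * mu
    # Traffic equations: external flow plus flows routed from other stations.
    lam_in = [lam0[i] + sum(lam_out[j] * r[j][i] for j in range(M))
              for i in range(M)]

    def departures(c2_a):
        # Assumed closed form for the interdeparture variability.
        return [rho[j] ** 2 * c2_b[j] + (1.0 - rho[j]) * c2_a[j]
                + rho[j] * (1.0 - rho[j]) for j in range(M)]

    c2_a = [1.0] * M  # initial guess
    for _ in range(sweeps):
        c2_d = departures(c2_a)
        for i in range(M):
            if lam_in[i] > 0.0:
                # Assumed superposition of routed and external flows.
                routed = sum(r[j][i] * lam_out[j]
                             * ((c2_d[j] - 1.0) * r[j][i] + 1.0)
                             for j in range(M))
                c2_a[i] = (c2_0[i] * lam0[i] + routed) / lam_in[i]
    return lam_in, c2_a, departures(c2_a)
```

In a step-by-step transient computation, a routine of this kind would be called at the end of each time interval with the current values of $p_0(t)$, and its outputs would set $\beta_i$ and $\alpha_i$ for the next interval.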

Packet Service Time for a SDN Data Plane Router
In SDN, routes are selected by one or more SDN controllers, while the SDN data plane routers are forwarding devices that follow the rules given by the controller. The centralisation of network intelligence and management in the controller enables a global network view, network programmability and deployment of innovative approaches such as smart cognitive routing [27].
Since the input and output hardware of an SDN forwarder is fast, the actual forwarding time between nodes can be neglected. The main component of service time that needs to be considered is the time during which the node's hardware identifies, for each successive packet, the flow to which the packet belongs and the output port in which the packet must be placed:
- Assume that the identification of the flow is conducted as a linear search in a flow table with $K$ entries (i.e. the number of flows), and that $T$ is the time to check one entry. If $p$ is the probability that the router's flow table does not contain the flow rule for a given packet, this will be discovered after going through all $K$ positions, i.e. after time $KT$, which is a constant service time with zero variance.
- Otherwise, with probability $(1 - p)$, the time to find the existing entry is uniformly distributed in $[T, KT]$ for a simple linear search, since the packet is equally likely to belong to any of the flows, with mean $\frac{(K+1)T}{2}$ and variance $\frac{(K-1)^2 T^2}{12}$.
- As a result, the service time $S$ has the following mean and variance:

$$E[S] = pKT + (1 - p)\,\frac{(K+1)T}{2}, \qquad \mathrm{Var}[S] = pK^2T^2 + (1 - p)\left[\frac{(K-1)^2T^2}{12} + \frac{(K+1)^2T^2}{4}\right] - E[S]^2.$$
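The service time model above is a two-component mixture, so its moments can be checked with a few lines of code. The sketch below assumes a continuous uniform distribution on $[T, KT]$ for a table hit (a discrete uniform over the $K$ positions would give a slightly different variance); the function name `service_time_moments` is illustrative.

```python
def service_time_moments(K, T, p):
    """Mean and variance of the flow-table lookup time.
    With probability p the flow is absent and the search costs K*T exactly;
    otherwise the cost is assumed continuous-uniform on [T, K*T]."""
    miss_time = K * T
    hit_mean = (K + 1) * T / 2.0
    hit_var = ((K - 1) * T) ** 2 / 12.0
    mean = p * miss_time + (1.0 - p) * hit_mean
    # Second moment of the mixture, then variance = E[S^2] - E[S]^2.
    second_moment = p * miss_time ** 2 + (1.0 - p) * (hit_var + hit_mean ** 2)
    return mean, second_moment - mean ** 2


# Parameters used later in the numerical examples: K = 950, T = 8e-7 s, p = 0.
mean, var = service_time_moments(950, 8e-7, 0.0)
c2_b = var / mean ** 2   # squared coefficient of variation of service time
print(mean, 1.0 / mean, c2_b)
```

With these parameters the mean is about 0.38 msec and the squared coefficient of variation is about 0.33. The corresponding service rate is close to, but not exactly, the value of 2631.5 packets/sec quoted in the numerical examples, which corresponds to a mean of $KT/2$.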

Numerical Examples
We consider a network composed of 4 forwarders. Host 1 is sending packets to Host 2 through forwarders S1-S2-S4 or S1-S3-S4, with routing probabilities $r_{12} = 0.15$, $r_{13} = 0.85$. The packet traffic rate from Host 1, denoted $\lambda(t)$, is in the range of 500 to 2500 packets/sec, and the pattern of changes in the traffic rate is displayed in blue in all the figures. It is the total flow sent by S1; the flows of S2 and S3 are defined by the routing probabilities. The duration of the time interval being considered is 1 second. Traffic data from the CAIDA traces [38] concerning IPv4 packet interarrival times from the Equinix Chicago link, collected during one hour on 18 February 2016, gave $C^2_{A1} = 1.02$ with over 22 million packets belonging to over 1.17 million IPv4 flows. The computations were also carried out for values of $C^2_{A1}$ four and eight times larger, to see how these variations influence the network's performance. We assume that the switches store $K = 950$ flows and that the time to examine one flow in the flow table is $T = 8 \cdot 10^{-7}$ sec with $p = 0$. The resulting mean service time is $S = 1/\mu = 0.38$ msec, or $\mu = 2631.5$ packets/sec, and $C^2_B = 0.33$. The buffer capacity per flow is $N = 100$. These values are compatible with existing equipment but may vary with the type of router.
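To make the link between these parameters and the diffusion model concrete, the sketch below evaluates the diffusion coefficients of station S1 over the stated range of $\lambda$. It uses $\beta = \lambda - \mu$ and the identity $\sigma_A^2 \lambda^3 + \sigma_B^2 \mu^3 = C_A^2 \lambda + C_B^2 \mu$, which follows from $C^2 = \sigma^2 / \text{mean}^2$. The function name and the printed layout are illustrative.

```python
def diffusion_parameters(lam, mu, c2_a, c2_b):
    """Drift and variance coefficient of the diffusion for one station.
    alpha uses sigma_A^2 * lam^3 = C_A^2 * lam and sigma_B^2 * mu^3 = C_B^2 * mu."""
    beta = lam - mu
    alpha = c2_a * lam + c2_b * mu
    return beta, alpha


MU = 2631.5    # packets/sec, from the numerical example
C2_A1 = 1.02   # measured value from the CAIDA trace
C2_B = 0.33

for lam in (500, 1000, 1500, 2000, 2500):
    beta, alpha = diffusion_parameters(lam, MU, C2_A1, C2_B)
    print(f"lambda={lam:5d}  rho={lam / MU:.3f}  beta={beta:9.1f}  alpha={alpha:9.3f}")
```

The drift stays negative over the whole range, since $\lambda < \mu$, but approaches zero at $\lambda = 2500$. At that load the queue is close to saturation and transients are expected to be long.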
The transient solution is obtained numerically over 100 successive sub-intervals of length 10 msec, with fixed diffusion parameters in each sub-interval. At the end of each sub-interval, (4) and (8) are solved to determine the new parameters of the flows for the single station models in the next sub-interval. The density function $f_i(x)$ obtained for any station $i$ at the end of a sub-interval gives the initial condition for the diffusion equation in the next sub-interval. Figures 1 and 2 present $f(x, t; \psi)$ given by (3) for station S1. In the first case, the buffer is relatively empty, i.e. the probability of the queue size being close to $N$ is of the order of $10^{-30}$ or $10^{-70}$. Note that the probability scale is logarithmic, and that the method has no difficulty computing such small values, which would be impossible in a simulation model. We also see the impact of $C^2_{A1}$, which affects $\alpha$ in the diffusion equation; a greater $C^2_{A1}$ increases the probability of larger queue lengths. Figures 3 and 4 show the changes in the utilisation $\varrho_i(t)$ of the stations and in the squared coefficient of variation $C^2_{D1}$ of the interdeparture times from (6), as $\lambda(t)$ is varied, showing that $\varrho(t)$ and $C^2_{D1}$ follow the variations in $\lambda(t)$ closely. Figures 5 and 6 refer to station S1 and show the variation of the mean queue length for various values of $C^2_{A1}$. The queue length increases considerably with the variance of interarrival times, and the duration of the transient period also increases. Higher utilisation rates due to larger $\lambda$ lead to a longer transient period. For large $\lambda$, the steady-state is not attained before the next change of the input flow rate. The right-hand figure illustrates the significant influence of $C^2_{A1}$ on the loss probabilities, which are presented in logarithmic scale.

Fig. 10. Packet loss probability with $C^2_{A1} = 1.02$ for paths S1-S2-S4 and S1-S3-S4 (curves: $P_N$ for S1->S2->S4, $P_N$ for S1->S3->S4, and $\lambda$).
Figures 7 and 8 present the dynamics of the average queue length and delay at stations S1, S2, S4, and the total average delay along the path S1-S2-S4. Again, we see the influence of the traffic rate on the queue length and the duration of the transient period. For higher traffic rates, the transient time is, in general, longer than the periods between changes in $\lambda$, so that steady-state is never reached. Figures 9 and 10 display the loss probabilities for each station separately, and the total loss probability for the two paths S1-S2-S4 and S1-S3-S4, which are practically the same, so that the curves are superimposed. These figures also illustrate the method's ability to compute very small probabilities, of the order of $10^{-200}$.

Scheduling Short Sequences of Packets
Consider a short sequence of a few packets (of very low traffic rate) containing measurement traffic, emanating for instance from some IoT sensor, which is very sensitive to packet loss. Suppose the SDN controller establishes a connection, to be able to forward these packets at time 0.4 seconds along path S1->S2->S4. At the same time, the SDN controller also establishes a connection for another flow totalling 1000 packets/sec. Suppose also that when this connection was established this same path was already carrying 1500 packets/sec.
The question then is whether the source of the low data rate measurement traffic should wait for the high data rate traffic to end before forwarding its traffic. Of course, waiting has the disadvantage of incurring the waiting delay, which may be quite long. However, if the data traffic is very sensitive to losses, it may be better to wait for the high traffic rate flow to end.
The analysis illustrated in Figure 10 tells us that the low data rate IoT source should not wait at all. It should send its traffic as soon as the path S1->S2->S4 is established for the IoT traffic at time 0.4 seconds, because the transient packet loss probability is less than $10^{-13}$ in the period just after time 0.4 seconds. Moreover, waiting would not help: at the instant when the high traffic rate ends, at time 0.7 seconds, and during the following 0.1 seconds, the packet loss probability is much higher, of the order of $10^{-7}$, due to the transient effect in packet loss. In addition, the transient analysis of Figure 8 tells us that the mean delay incurred by the packets is of the order of 0.004 seconds, rather than the 0.3 seconds that would be wasted waiting for the high traffic rate to subside.
Obviously, this type of very useful insight about the transients in packet loss and delay can only be obtained via the transient analysis tools developed in this paper.

The Effect of Transients on Service Level Agreements
A high priority customer of the network needs to send a flow of packets every 30 minutes from node 1 to node 4 during a short time window of $\Delta = 200$ milliseconds. The customer indicates that the traffic rate $\lambda_1$ may take a fixed value between 2500 and 3200 packets/second, and that the inter-arrival time distribution for packets will have a squared coefficient of variation which does not exceed $C = 2$.
- The customer has stringent QoS constraints so that the network operator must abide by a Service Level Agreement (SLA): the total average packet delay through the path must not exceed $W_m = 0.01$ during the connection, and the total average packet loss over the path must not exceed $L_m = 150$.
- Due to the existing network topology, to reach node 4, the traffic must travel through the three identical routers S1->S2->S4. The network operator will program an SDN controller to set up a private 3-node path S1->S2->S4 for this flow at the beginning of each successive 30 minute interval, and reserve it (empty of other traffic) for 200 milliseconds. After 200 milliseconds, this flow stops sending packets, and the SDN controller can re-allocate the path to other flows.
- The operator's current routers have a 2000 packet/second forwarding capacity. The operator would like to know if this is sufficient, or whether she/he should upgrade the hardware to higher available speeds of 2300, 2600 or 2900 packets/second.
Let $t = 0$ be the beginning of an interval of length $\Delta$, and let $N_i(t)$ be the average number of packets from the flow in router $i$ at $t$ seconds after the beginning of the flow. Let $W_{max}$ and $L_{max}$ be the worst case values of the packet delay and loss for the flow rate over $t \in [0, \Delta]$. With $N$ denoting the router buffer size, we obtain $W_{max}$ and $L_{max}$ as a function of $\lambda_1$ for the worst case value $C^2_{A1} = 2$ and for different router speeds, to see which speed is needed.
We see that the transient analysis allows us to operate safely with routers having a capacity of $\mu = 2000$ packets/sec, while the steady-state analysis over-estimates the average delay and loss probability by over 100%, and recommends upgrading the routers to a packet processing capacity of at least 2600 packets/sec.

Conclusions
This paper considers a small network of routers or forwarders controlled by an SDN controller which makes changes to paths through the network, and regularly modifies traffic rates at different routers. Our purpose is to evaluate transient effects, and in particular their importance relative to the usual steady-state analyses. We therefore develop a computationally efficient diffusion approximation method, present its analytical solution, and implement the numerical techniques needed for the transient computations. Using realistic parameters, including relatively low-frequency changes of paths by SDN controllers every 100 ms, we show that the system may seldom reach steady-state when the network is moderately to heavily loaded, which is the region of interest for performance modelling and optimisation studies. Though at light loads transients are short, at moderate to heavy loads transients are significant for individual node and path delays, and packet loss probabilities. The method we have developed is operational and gives quantitative results for models with realistic parameters. Numerical examples for single and multiple node models provide the dynamics of queue lengths, delays and packet loss in response to changes in the flow intensity and the variance of interarrival times. Through numerical examples, we also show how transient analysis provides insights that are more accurate and cover cases where the steady-state analysis would provide wrong or incomplete results. Thus we conclude that transient analysis can play a major role in the performance evaluation of SDN networks and should be incorporated into SDN controls, and that diffusion approximations can be useful and computationally efficient for this purpose.