Optimal direct load control of renewable powered small cells: A shortest path approach

In this letter, we propose an optimal direct load control of renewable-powered small base stations (SBSs) in a two-tier mobile network based on dynamic programming (DP). We represent the DP optimization using Graph Theory and state the problem as a Shortest Path search. We use the Label Correcting Method to explore the graph and find the optimal ON/OFF policy for the SBSs. Simulation results demonstrate that the proposed algorithm adapts to the varying conditions of the environment, namely renewable energy arrivals and traffic demands. The key benefit of our study is that it sheds light on the behavior and performance bounds of the system and gives guidance for approximate policy search methods.


INTRODUCTION
The fifth-generation (5G) mobile network is expected to support 1000 times more data volume per unit area, 100 times higher user data rates, 1000 times more connected devices, one tenth of the energy consumption, one fifth of the end-to-end latency, one fifth of the network management cost, 10 times longer device battery life, and one thousandth of the service deployment time of fourth-generation (4G) networks. A new architecture and new network deployments are thus necessary to satisfy such requirements. One of the most promising approaches is to densify the radio access network by deploying small base stations (SBSs), which can, in turn, enhance the capacity and coverage of the macrocells. However, this approach implies the use of a large number of devices, which may drain a significant amount of energy from the power grid, in contrast with the energy consumption requirement of 5G networks. On the other hand, the low power consumption of these devices encourages the use of renewable energy sources (RESs) as distributed power suppliers [1]. This approach reduces (1) the energy drained from the power grid, (2) the carbon footprint, and (3) the energy bills [2]. The introduction of RESs, however, entails an intermittent and erratic energy budget for the communication operations of the SBSs. Therefore, Demand Response is needed to properly manage energy inflow and spending based on the traffic demand. In particular, SBSs may host self-organizing agents that enable intelligent energy management policies, such as Direct Load Control [3]. In our previous work [4], we introduced a two-tier architecture with hybrid power suppliers: macro base stations (BSs) reside in the first tier to provide baseline coverage and capacity and are powered by the electrical grid, whereas SBSs operate in the second tier to provide capacity enhancement and are supplied by solar panels plus batteries. The data traffic offloaded by the SBSs has higher spectral efficiency and allows a reduction of the energy drained from the grid.
In Reference 4, we also introduced a distributed Q-learning algorithm to directly control the load of the renewable-powered SBSs. However, that work provides no proof of optimality. A similar resource allocation problem has been solved in Reference 5 by using a two-stage dynamic programming (DP) algorithm. Although the authors propose an optimal solution, the problem is stated for a single-tier architecture.
This letter fills these gaps in the literature with the following contributions: (1) We formulate the problem of optimal direct load control of a two-tier mobile network based on DP. The key property of DP is that it applies optimal control as a trade-off between the present cost and the expected future costs. This feature is fundamental in our scenario to prevent SBS blackouts during periods with low renewable energy arrivals and high traffic demands. (2) We represent the problem as a graph, which allows us to model it with Graph Theory and to use Shortest Path methods to find the optimal ON/OFF policy for the SBSs. (3) We provide numerical results and find the optimal policy for 2 different traffic profiles. Finally, we compare our solution with a greedy approach.

Network model
We consider the radio access network as a set of clusters. Each cluster is composed of 1 macro BS and C SBSs. The macro BS is connected to the electrical grid, and each SBS is powered by a solar panel plus a battery. Each SBS implements an intelligent energy management policy that automatically decides its operating state: it can either serve the users in its coverage area (ON state) or be in an energy saving mode in which its users are handed over to the macro BS (OFF state). We define $\mathbf{S}_t = [S_t^{(1)}, \ldots, S_t^{(C)}]$ as the vector representing the state of the C SBSs at time t. Each element $S_t^{(i)}$, with i = 1, ..., C, is defined as follows:

$$S_t^{(i)} = \begin{cases} 1 & \text{if SBS } i \text{ is ON at time } t, \\ 0 & \text{if SBS } i \text{ is OFF at time } t. \end{cases} \tag{1}$$

The energy harvested by the SBSs at time t is indicated by the vector $\mathbf{E}_t = [E_t^{(1)}, \ldots, E_t^{(C)}]$, while the amount of energy stored in the SBS batteries at time t is indicated by the vector $\mathbf{B}_t = [B_t^{(1)}, \ldots, B_t^{(C)}]$. The BS power consumption is approximated by the linear function $P = P_0 + \beta\rho$, where $P_0$ is the baseline power consumption, $\beta$ is the slope of the load-dependent power consumption, and $\rho \in [0, 1]$ is the normalized traffic load. Typical values are $P_0 = 750$ W, $\beta = 600$ W for the macro BS and $P_0 = 105.6$ W, $\beta = 39$ W for the SBSs. This model is supported by real measurements and closely matches the real power profile of BSs [6]. The traffic load vector $\boldsymbol{\rho}_t = [\rho_t^{(1)}, \ldots, \rho_t^{(C)}]$ indicates the traffic level of the SBSs at time t. In particular, we consider a long-term evolution (LTE) radio access network with a transmission bandwidth BW divided into R resource blocks (RBs) of 1 ms each [7]. Each SBS i has a set $U_i$ of associated users. If SBS i is OFF at time t, we assume $\rho_t^{(i)} = 0$ and that its users are managed by the macro BS. However, the macro BS may have reached its capacity limit at that time instant (i.e., it cannot allocate any RB to the handed-over users) and may drop part of them. We define this situation as system outage.
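To make the linear power model concrete, the Python sketch below evaluates $P = P_0 + \beta\rho$ with the typical macro BS and SBS parameters quoted above. The dictionary and function names are hypothetical and serve only as an illustration; the load is assumed to be already normalized to [0, 1].

```python
# Typical parameters of the linear power model P = P0 + beta * rho (in W),
# as quoted in the text. The names MACRO, SBS and bs_power are illustrative.
MACRO = {"P0": 750.0, "beta": 600.0}
SBS = {"P0": 105.6, "beta": 39.0}


def bs_power(params: dict, rho: float) -> float:
    """Power (W) drawn by a BS with normalized traffic load rho in [0, 1]."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be a normalized load in [0, 1]")
    return params["P0"] + params["beta"] * rho


if __name__ == "__main__":
    print(bs_power(MACRO, 0.5))  # 1050.0
    print(bs_power(SBS, 1.0))    # SBS at full load, about 144.6 W
```

Because the baseline term $P_0$ dominates for the macro BS, offloading traffic reduces its consumption only through the $\beta\rho$ term, which is why the objective can be expressed directly in terms of the macro BS load.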

Optimization problem
The system evolves in cycles, based on the variation of the traffic demand and the energy arrivals over time. At each cycle t, a controller decides the optimal configuration of the cluster in terms of ON/OFF states of the SBSs. We model this sequential decision-making process as a DP optimization problem, whose objective is to minimize the energy consumed by the macro BS and the traffic drop rate of the system. Given the linear relation between the energy consumption and the BS load, the energy objective is converted into the minimization of the macrocell load over a given time horizon, achieved by offloading traffic to the renewable-powered SBSs. The controller must also prevent damage to the storage devices and SBS blackouts by keeping the battery levels above a given threshold.
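The optimization problem is formulated as follows:

$$\min_{\mathbf{S}_1, \ldots, \mathbf{S}_K} \sum_{t=1}^{K} f(\mathbf{S}_t, t) \quad \text{s.t.} \quad B_t^{(i)} \geq B_{th}, \quad i = 1, \ldots, C, \; t = 1, \ldots, K, \tag{2}$$

where K is the time horizon, that is, the number of times the control is applied, $B_{th}$ is the battery threshold, and $f(\mathbf{S}_t, t)$ is defined as follows:

$$f(\mathbf{S}_t, t) = w_1\, \rho_M(\mathbf{S}_t, t) + w_2\, \delta(\mathbf{S}_t, t), \tag{3}$$

where $\rho_M(\mathbf{S}_t, t)$ is the load of the macro BS given the SBS states and the time instant t. Its values are normalized.
$\delta(\mathbf{S}_t, t)$ is the traffic drop rate of the system, given the state of the SBSs and the time instant t. Its value ranges from 0 (when all the traffic is served by the system) to 1 (when all the traffic is dropped by the system).

[Figure 1: Graph showing the possible ON/OFF sequences for a cluster with 2 SBSs. Green nodes represent ON states and red nodes OFF states; the two dashed nodes are the artificial nodes.]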
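Finally, the two weights must always sum to 1, that is, $w_1 + w_2 = 1$. At each decision instant t, the battery levels of the SBSs are updated according to the following formula:

$$B_{t+1}^{(i)} = \min\left\{ B_t^{(i)} + E_t^{(i)} - P_t^{(i)}\,\Delta t,\; B_{cap} \right\}, \tag{4}$$

where $B_{cap}$ is the maximum battery capacity, $P_t^{(i)}$ is the power consumed by SBS i during cycle t, and $\Delta t$ is the cycle duration. This means that the energy exceeding the battery capacity cannot be stored and is wasted.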
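A minimal Python sketch of the per-cycle arc cost (a weighted sum of the normalized macro BS load and the traffic drop rate, with $w_2 = 1 - w_1$) and of the capped battery update (4) for a single SBS. Both the macro load and the drop rate are taken as given inputs here, whereas in the letter they result from the traffic model; the energy consumed in the cycle is also passed in directly. All function and parameter names are hypothetical.

```python
def arc_cost(macro_load: float, drop_rate: float, w1: float) -> float:
    """Weighted arc cost w1 * macro_load + w2 * drop_rate, with w2 = 1 - w1."""
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1] so that w1 + w2 = 1")
    w2 = 1.0 - w1
    return w1 * macro_load + w2 * drop_rate


def battery_update(level: float, harvested: float, consumed: float,
                   capacity: float) -> float:
    """Next battery level of one SBS: energy above the capacity is wasted."""
    return min(level + harvested - consumed, capacity)


if __name__ == "__main__":
    # Harvesting 300 units while spending 100 would exceed a 500-unit battery,
    # so the surplus is discarded.
    print(battery_update(400.0, 300.0, 100.0, 500.0))  # 500.0
    print(arc_cost(0.4, 0.0, 0.5))
```

Keeping the weights tied by $w_1 + w_2 = 1$ means a single parameter sets the trade-off between grid energy (through the macro load) and dropped traffic.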

Graphical representation
We represent the DP optimization problem as a graph. A node i at time t in the graph, denoted $N_t^i$, represents a possible combination of the states of the SBSs in the cluster. Each combination leads to different battery levels of the SBSs.
In Figure 1, a cluster of 2 SBSs is represented. In the first time step (t = 1), the SBSs can be in one of the four combinations of ON (green) / OFF (red) states. At each cycle t, the energy harvesting and traffic processes evolve according to $\mathbf{E}_t$ and $\boldsymbol{\rho}_t$. Each node $N_t^i$ generates 4 child nodes $N_{t+1}^j$ (in general, $2^C$), one for each possible combination at cycle t + 1. The battery levels of the child nodes $N_{t+1}^j$ are calculated based on (4), and each arc connecting 2 nodes has a cost given by (3). The number of nodes therefore grows over time, reaching its maximum of $(2^C)^K$ at time instant K. Two artificial nodes have been added at time steps t = 0 and t = K + 1 to have a single initial node and a single terminal node. The cost associated with the arcs connecting the artificial nodes is set to zero.
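The branching step can be sketched in Python as follows: each node at cycle t has one child per ON/OFF combination of the C SBSs, so the tree has $(2^C)^t$ nodes at cycle t. Encoding a combination as a tuple of 0/1 values (1 = ON, matching (1)) is an assumption of this sketch, and the function names are hypothetical.

```python
from itertools import product


def child_combinations(num_sbs: int) -> list[tuple[int, ...]]:
    """All ON (1) / OFF (0) combinations the SBSs can take at the next cycle."""
    return list(product((0, 1), repeat=num_sbs))


def nodes_at_cycle(num_sbs: int, t: int) -> int:
    """Number of non-artificial nodes at cycle t >= 1 of the tree."""
    return (2 ** num_sbs) ** t


if __name__ == "__main__":
    print(child_combinations(2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(nodes_at_cycle(2, 1), nodes_at_cycle(2, 2))  # 4 16
```

For the 2-SBS cluster of Figure 1, this reproduces the four children per node and the 16 nodes at t = 2.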
The cost associated with each arc, $f(\mathbf{S}_t, t)$, may be interpreted as the length of the corresponding arc. In this case, minimizing the total cost is equivalent to finding the path with the minimum length from the initial to the terminal node.

SHORTEST PATH METHOD
The problem of finding the shortest path between the initial and the terminal node involves a very large number of nodes. However, most of these nodes are unlikely candidates for inclusion in the shortest path. Therefore, since we deal with a single initial and terminal node and each arc has a nonnegative cost, we use the Label Correcting Algorithm described in Reference 8 to explore the graph efficiently. We define 3 variables: $d_i$, called the label of node i, as the length of the shortest path to node i found so far; OPEN, as the list of nodes to be explored; and UPPER, as the length of the shortest path from the initial to the terminal node found so far.
The graph is explored in a depth-first fashion. The idea is to progressively discover shorter paths from the initial node to the internal nodes i until reaching the terminal node, and to maintain the length of the shortest path found so far in the variable $d_i$. Each time $d_i$ is reduced following the discovery of a new shorter path to i, the algorithm checks whether the labels $d_j$ of the children j of i can be corrected, that is, whether they can be reduced by setting them to $d_i + a_{ij}$, where $a_{ij}$ is the length of arc (i, j).
The list OPEN contains only the nodes that are candidates for further examination and possible inclusion in the shortest path. More specifically, we exclude from the list all the nodes that violate the battery constraint or whose label would lead to a path longer than UPPER. This exploration policy avoids exploring the whole graph, that is, $\frac{2^C\,(2^{C K} - 1)}{2^C - 1} + 2$ nodes, and requires relatively little memory, as described in Reference 8, especially for graphs with a tree-like structure, as in our case.
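To give a sense of scale, the full-graph node count $2^C(2^{CK} - 1)/(2^C - 1) + 2$ can be checked against the level-by-level sum $\sum_{t=1}^{K} 2^{Ct}$ plus the two artificial nodes. The Python sketch below computes both with exact integer arithmetic; the function names are illustrative.

```python
def total_nodes_closed_form(C: int, K: int) -> int:
    """2^C (2^(C*K) - 1) / (2^C - 1) + 2: all tree levels plus 2 artificial nodes."""
    branching = 2 ** C
    # The geometric sum is always an integer, so integer division is exact.
    return branching * (branching ** K - 1) // (branching - 1) + 2


def total_nodes_by_levels(C: int, K: int) -> int:
    """Same count obtained by summing the (2^C)^t nodes of each cycle t = 1..K."""
    return sum((2 ** C) ** t for t in range(1, K + 1)) + 2


if __name__ == "__main__":
    print(total_nodes_closed_form(2, 2))   # 22
    print(total_nodes_closed_form(1, 21))  # 4194304
```

Even for a single SBS and K = 21, exhaustive exploration would visit about 4.2 million nodes, which illustrates why pruning through OPEN and UPPER matters.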
The algorithm steps are detailed in Algorithm 1.
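The following Python sketch illustrates a label correcting search of this kind; it is not a reproduction of Algorithm 1. It assumes that OPEN is a LIFO stack (giving depth-first exploration), that each node carries its own battery levels and ON/OFF path, and that the arc cost and battery transition are supplied by the caller as nonnegative-cost callbacks. Because the graph is a tree, each node is reached through a single parent, so its label is set once when it enters OPEN; the pruning against UPPER and the battery threshold is what keeps the search small. The toy cost and battery models in the usage block are hypothetical and do not correspond to the letter's traffic or energy model.

```python
from itertools import product
from typing import Callable, Optional, Sequence

State = tuple[int, ...]


def label_correcting(
    num_sbs: int,
    horizon: int,
    cost: Callable[[int, State], float],
    next_battery: Callable[[int, State, tuple[float, ...]], tuple[float, ...]],
    initial_battery: Sequence[float],
    min_battery: float,
) -> tuple[float, Optional[list[State]]]:
    """Depth-first label correcting search on the ON/OFF tree.

    cost(t, s) must be nonnegative, so a partial path whose label already
    reaches UPPER can be discarded.
    """
    combinations = list(product((0, 1), repeat=num_sbs))
    upper = float("inf")  # UPPER: shortest initial-to-terminal length so far
    best: Optional[list[State]] = None
    # OPEN entries: (cycle t, battery levels, label d, ON/OFF path so far)
    open_list = [(0, tuple(initial_battery), 0.0, [])]
    while open_list:
        t, battery, label, path = open_list.pop()  # LIFO pop -> depth-first
        if t == horizon:
            # The arc to the artificial terminal node has zero cost.
            if label < upper:
                upper, best = label, path
            continue
        for s in combinations:
            child_label = label + cost(t + 1, s)
            if child_label >= upper:
                continue  # cannot lead to a path shorter than UPPER
            child_battery = next_battery(t + 1, s, battery)
            if any(b < min_battery for b in child_battery):
                continue  # violates the battery threshold constraint
            open_list.append((t + 1, child_battery, child_label, path + [s]))
    return upper, best


if __name__ == "__main__":
    # Hypothetical toy model: an OFF SBS pushes its traffic to the macro BS
    # (cost 1 per SBS), an ON SBS spends 2 energy units, every SBS harvests
    # 1 unit per cycle, and the battery capacity is 5 units.
    def toy_cost(t, s):
        return sum(1 - x for x in s) / len(s)

    def toy_battery(t, s, battery):
        return tuple(min(b + 1 - 2 * x, 5.0) for b, x in zip(battery, s))

    print(label_correcting(1, 3, toy_cost, toy_battery, [3.0], 1.0))
    # (1.0, [(1,), (1,), (0,)])
```

In the toy run, keeping the SBS ON for all three cycles would drain the battery below the threshold, so the search settles on a path with a single OFF cycle and prunes the remaining branches as soon as their labels reach UPPER.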

Simulation scenario
We consider a square area with a side of 1 km. The macro BS is located at the center of the area, and the SBSs are randomly positioned. The coverage areas of the SBSs do not overlap. Aggregated downlink traffic has been generated based on the profiles defined in Reference 9. In particular, we use the Resident and Transport profiles in our simulations. User traffic is based on the classification proposed in Reference 10. Realistic energy harvesting traces are obtained using the SolarStat tool [11] for the city of Los Angeles. All the simulations have been performed over a typical week of March, and each cycle of the algorithm lasts 1 h. Further simulation parameters are listed in Table 1.

Optimal time horizon
Here, we empirically analyze the duration of the time horizon K that achieves the minimum cost. This parameter gives insight into the temporal correlation among the control actions. Figure 2 shows the amount of energy drained from the grid by the macro BS in 1 week for different values of the horizon K (Figure 2A) and the algorithm complexity, in terms of number of iterations, as a function of K (Figure 2B), for a single SBS within the coverage area of the macro BS.
The time horizon K = 21 represents a turning point for both grid energy and algorithm complexity: the energy drained from the grid approaches an asymptote, while the number of iterations grows almost exponentially. Simulations performed in scenarios with multiple SBSs show the same behavior in terms of energy drained, number of iterations, and K. Therefore, we conclude that a time horizon of about 21 h represents a good trade-off between network performance and algorithm complexity.

Optimal policies
In this subsection, we elaborate on the optimal ON/OFF policies for the SBSs and, in particular, on their adaptation to the variables of the environment, that is, traffic demands and renewable energy arrivals. We also describe the intelligent battery management of the proposed solution.
In Figure 3, the temporal behavior of the ON/OFF policies of 5 SBSs is represented for the Resident (Figure 3A) and Transport (Figure 3B) profiles, respectively. In particular, the traffic offloaded by the 5 SBSs is shown in green, and the traffic served by the macro BS in yellow. The dashed red curves are the renewable energy arrivals.
As a general rule, valid for both traffic profiles under study, the optimal policy attempts to offload traffic to the SBSs during peak periods and to switch the SBSs off during deep night hours (2:00 AM-6:00 AM and 1:00 AM-5:00 AM for the Resident and Transport profile, respectively). In fact, during traffic peak periods, the macro BS cannot serve the whole demand and the system may be in outage. On the other hand, when energy arrivals and traffic demands are low (deep night), the macro BS can serve the small amount of traffic without system outage, and the SBSs can save energy in their batteries.
When renewable energy arrivals are scarce, the behavior differs depending on the traffic profile. The first day (P1 in Figure 3) experiences the lowest energy arrivals of the week. In the Resident case, the traffic offloaded in the morning (7:00 AM-1:00 PM) is lower than on the other days. This is because the SBSs need to be OFF and store in their batteries the energy necessary to serve the traffic in the evening peak period and avoid system outage. In the case of the Transport profile, on the other hand, traffic is completely offloaded during the 2 peak periods, and then the SBSs gradually switch off starting from 9:00 PM. This allows them to save energy and use it during the morning peak of the second day. On the third and the fifth day (P2 and P3 in Figure 3, respectively), the renewable energy arrivals are scarce, although not at their minimum. In the case of Resident traffic, some SBSs need to be turned off (one on the third day and two on the fifth day) to save energy in the afternoon and use it to avoid system outage during the evening peak. In the case of the Transport profile, the SBSs do not follow the same behavior because of the lower total traffic demand.
Finally, we compare our optimal solution with a greedy algorithm that switches an SBS off when its battery level is below a threshold $B_{th}$ and switches it on when its battery level is above $B_{th}$. While our proposal adapts to the varying traffic and energy conditions without producing system outage, the greedy approach is not able to serve the whole traffic demand at every hour of the day: in our simulations, it drops up to 67% and 59% of the hourly traffic for the Resident and Transport profiles, respectively.
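For comparison, the greedy baseline reduces to a per-SBS threshold rule, sketched below in Python. Treating a battery level exactly equal to $B_{th}$ as ON is an assumption, since the text only specifies the strictly below and strictly above cases; the function name is hypothetical.

```python
def greedy_policy(battery_levels: list[float], threshold: float) -> list[int]:
    """ON (1) if an SBS battery is at or above the threshold, OFF (0) otherwise.

    The rule looks only at the current battery level: it ignores the traffic
    demand and the future energy arrivals that the shortest path search uses.
    """
    return [1 if level >= threshold else 0 for level in battery_levels]


if __name__ == "__main__":
    print(greedy_policy([120.0, 480.0], 200.0))  # [0, 1]
```

Because the rule has no look-ahead, an SBS with a full battery stays ON through low-traffic hours and may be OFF exactly when a peak arrives, which is consistent with the traffic drops reported above.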

CONCLUSIONS
In this letter, we have proposed an optimal direct load control of renewable-powered small base stations in a two-tier mobile network based on DP. In particular, we have represented the DP optimization using Graph Theory and stated the problem as a Shortest Path search. We have then implemented the Label Correcting Method to explore the graph and find the optimal ON/OFF policies for the SBSs. Numerical results demonstrate that the proposed algorithm finds the optimal policies for the SBSs under different traffic profiles and renewable energy arrivals. The study presented here lays the basis for understanding the system behavior and its performance bounds, and it also gives guidance for online optimization and approximate policy search methods.