Wake-Up Radio based Access in 5G under Delay Constraints: Modeling and Optimization

Recently, the concept of wake-up radio based access has been considered as an effective power saving mechanism for 5G mobile devices. In this article, the average power consumption of a wake-up radio enabled mobile device is analyzed and modeled by using a semi-Markov process. Building on this, a delay-constrained optimization problem is formulated to maximize the device energy efficiency under given latency requirements, and the optimal parameters of the wake-up scheme are obtained in closed form. The provided numerical results show that, for a given delay requirement, the proposed solution is able to reduce the power consumption by up to 40% compared with an optimized discontinuous reception (DRX) based reference scheme.


I. INTRODUCTION
In order for the emerging fifth generation (5G) mobile networks to satisfy the ever-growing needs for higher data rates and network capacities, while simultaneously facilitating other quality of service (QoS) improvements, computationally intensive physical layer techniques and high-bandwidth communication are essential [2], [3]. At the same time, however, the device power consumption tends to increase, which, in turn, can deplete the mobile device's battery very quickly. Moreover, feature phones and smartphones are estimated to consume 2 kWh/year and 7 kWh/year, respectively, based on charging every 60th hour equal to 40% of battery capacity every day and a standby scenario of 50% of the remaining time [4]. Also, the carbon footprints of producing feature phones and smartphones are estimated at 18 kg and 30 kg CO2e per device, respectively, and device production remains the major contributor to the CO2 emissions of mobile communication systems [4].
In general, battery lifetime is one of the main issues that mobile device consumers consider important from a device usability point of view [5]. However, since battery technologies tend to evolve slowly [6], the energy efficiency of the mobile device's main functionalities, such as the cellular subsystem, needs to be improved [5], [7], [8]. Furthermore, since data traffic has been largely downlink-dominated [9], power saving mechanisms for cellular subsystems in receive mode are of great importance.
The 3rd Generation Partnership Project (3GPP) has specified discontinuous reception (DRX) as one of the de facto energy saving mechanisms for long-term evolution (LTE), LTE-Advanced and 5G New Radio (NR) networks [10]-[13]. DRX allows the mobile device to reduce its energy consumption by switching off some radio modules for long periods of time and activating them only for short intervals. Accordingly, the modeling and optimization of DRX mechanisms have attracted a large amount of research interest in recent years. The authors in [14] proposed an adaptive approach to configure DRX parameters according to users' activities, aiming to balance power saving and packet delivery latency. Koc et al. formulated the DRX mechanism as a multi-objective optimization problem in [15], satisfying the latency requirements of active traffic flows and the corresponding preferences for power saving. In [16], DRX is modeled as a semi-Markov process with three states (active, light-sleep, and deep-sleep), and the average power consumption as well as the average delay are calculated and optimized. Additionally, the authors in [17] utilized exhaustive search over a large parameter set to configure all DRX parameters. Such a method may not be attractive from a computational complexity perspective for real-time or practical applications; however, it provides the optimal DRX-based power consumption and is thus used in this article as a benchmark.
To improve the device's energy efficiency beyond the capabilities of ordinary DRX, the concept of wake-up radio based access has been discussed, e.g., in [6], [18], [19]. In the cellular communications context specifically, wake-up radio based approaches have recently been discussed and described, e.g., in [20] and [21]. In such a concept, the mobile device monitors only a narrowband control channel signaling, referred to as wake-up signaling, at specific time instants and subcarriers of an OFDMA-based radio access system such as LTE or NR. Based on this signaling, it decides whether to process the upcoming physical downlink control channel (PDCCH) or discard it. Compared to DRX-based systems, this directly reduces the buffering requirements and the processing of empty subframes, as well as the corresponding power consumption. Furthermore, in [20], the concept of a low-complexity wake-up receiver (WRx) was developed to decode the corresponding wake-up signaling and to acquire the necessary time and frequency synchronization. A wake-up scheme that reduces the power consumption of machine type communications (MTC) devices was introduced in 3GPP LTE Release-15 [22]. It is based on a narrowband signal transmitted over the available symbols of configured subframes. It is also considered the starting point of the NR power saving study item in 3GPP NR Release-17 [23].
In general, the existing wake-up concepts and algorithms, such as those described in [6], [18]-[21], [24]-[32], build on static operational parameters that are determined by the radio access network at the start of the user's session and kept invariant, even if traffic patterns change. Accordingly, methods to optimize the parameters that characterize the employed wake-up scheme are needed to further reduce energy consumption according to the traffic conditions; this is one of the main objectives of this paper. Specifically, the main contributions of this paper are as follows.

Firstly, the wake-up radio based access scheme is modeled by means of a semi-Markov process. In the model, we consider realistic WRx operation by introducing:
- start-up and power-down periods of the baseband unit (BBU),
- false alarm and misdetection probabilities of the wake-up signaling, and
- the packet service time.
With such a model, the average power consumption and buffering delay can be accurately quantified for a given set of wake-up related parameters.

Secondly, by utilizing this mathematical model, the minimization of the terminal's power consumption under a Poisson traffic model is addressed for a given delay constraint, and a closed-form optimal solution for the operational parameters is obtained. Furthermore, the range of packet arrival rates for which the wake-up scheme is suitable and energy-efficient is determined.

Finally, simulation-based numerical results are provided to validate the proposed model and methods. They are also used to compare the power consumption of the proposed solution with that of the optimized DRX-based reference mechanism proposed in [17]. The approach in [17] is selected as the benchmark since it provides the optimal power consumption among DRX-based reference systems.
Furthermore, to the best of our knowledge, virtually all of the DRX-based literature ignores the start-up and power-down energy consumption, and [17] also neglects the packet service time. Therefore, we have slightly modified the approach in [17] to take such additional energy consumption into account in the optimization.
The rest of this paper is organized as follows. Section II presents a brief review of the considered wake-up scheme and its corresponding parameters, and defines the basic system assumptions. In Section III, we model the wake-up based access scheme by means of a semi-Markov process and derive the power consumption as well as the buffering delay. Then, building on these mathematical models, the optimization problem is formulated in Section IV, and the optimal solution for minimum power consumption is found in closed form. Numerical results, remarks, and conclusions follow in Sections V, VI, and VII, respectively. Some details of the analysis related to the modeling of power consumption are reported in the Appendix. For the readers' convenience, the most relevant variables used throughout this paper are listed in Table I. Terminology-wise, we use gNB to refer to the base station unit and UE to denote the mobile device, according to the 3GPP NR specifications [10].

TABLE I (excerpt)
$t_w^b$: minimum feasible wake-up cycle over the boundary of the delay constraint
$t_w^*$: optimal value of the wake-up cycle
$t_i^*$: optimal value of the inactivity timer
$\lambda_t$: turnoff packet arrival rate
$\eta$: relative power saving factor
$\varphi$: power consumption ratio of the UE at $S_2$ and $S_3$

II. BASIC WAKE-UP RADIO CONCEPT AND ASSUMPTIONS
In the considered wake-up radio based scheme, or wake-up scheme (WuS) for short, as presented in [20], the mobile device is configured to monitor a narrowband wake-up signaling channel in order to extend its battery lifetime. Specifically, in every wake-up cycle (denoted by $t_w$), the WRx monitors the so-called physical downlink wake-up channel (PDWCH) for a specific on-duration time ($t_{on}$) in order to determine whether data has been scheduled or not. When the WRx issues an interrupt signal, the BBU switches on, decodes both the PDCCH and the physical downlink shared channel (PDSCH), and performs normal connected-mode procedures. The WuS can be adopted in both the connected and idle states of radio resource control [20]. It can be configured based on the maximum tolerable paging delay that idle users may experience or, alternatively, based on the delay requirements of a specific traffic type in the connected state.
The wake-up signaling for each WRx contains single-bit control information, referred to as the wake-up indicator (WI). A WI of 1 instructs the WRx to wake up the BBU because one or multiple packets are to be received, while a WI of 0 signals the opposite. Each WI is code-multiplexed with a user-specific signature onto selected time-frequency resources, as described in [20]. When a WI of 1 is sent to the WRx, the network expects the target mobile device to decode the PDCCH with a time offset equal to the start-up time ($t_{su}$).

Fig. 1 (a) and (b) depict, at a conceptual level, the basic operation and representative power consumption behavior of the conventional DRX-enabled cellular module and of the cellular module with WRx, respectively. As illustrated, the WuS eliminates the energy unnecessarily wasted in the first and second DRX cycles, while also reducing the buffering delay compared to DRX. Owing to the specifically designed narrowband signal structure of the WuS, the WRx power consumption ($PW_1$) is much lower than that of the active BBU, whether due to packet decoding ($PW_2$) or while the inactivity timer is running ($PW_3$) [20]. The lowest power consumption is obtained in the sleep state ($PW_4$). We assume that during the BBU active states, the power consumption due to packet decoding is larger than or equal to that of running the inactivity timer (i.e., $PW_2 \ge PW_3$), and that both are fixed. For presentation purposes, we denote the ratio of these active-state power consumptions by $\varphi = PW_2/PW_3$, where $\varphi \ge 1$.

In general, because NR supports wide bandwidth operation, packets can be served in a very short time. In addition, when the user packets are small, NR uses packet concatenation, so that all packets buffered during a relatively short wake-up cycle can be served in a single transmission time interval (TTI).
Accordingly, we assume that the radio link control entity (located at the gNB) concatenates all packets arriving during the sleep state. As soon as the BBU is triggered on, the device (UE) receives and decodes the concatenated packets during a service time $t_s$ equal to a single TTI. If a new packet arrives during the service time, the BBU starts serving it at the end of $t_s$. If no packet arrives by the end of $t_s$, the UE starts its inactivity timer with a duration of $t_i$. If a new PDCCH message is received before the inactivity timer expires, the BBU enters the active-decoding state and serves the packet. Otherwise, when the inactivity timer expires without any PDCCH message, a sleep period starts: the WRx-enabled cellular module switches to the sleep state, and the WRx operates according to its wake-up cycle [20]. For reference, in the case of DRX, the BBU sleeps according to its short and long DRX patterns [15], [33].
The introduction of a PDWCH has two fundamental consequences, namely misdetections and false alarms [20]. In the latter case, the WRx wakes up at a predefined time instant and erroneously decodes a WI of 0 as 1, leading to unnecessary BBU power consumption. The former, in turn, corresponds to the case where a WI of 1 is sent but the WRx incorrectly decodes it as 0. Such a misdetection adds extra delay and wastes radio resources. We denote the probabilities of misdetection and false alarm by $P_{md}$ and $P_{fa}$, respectively. The requirements on the misdetection probability of the PDWCH are generally stricter than those on the false alarm probability [20].
One of the new features that 5G NR networks use to reach their aggressive requirements is a latency-optimized frame structure with flexible numerology. It provides subcarrier spacings ranging from 15 kHz up to 240 kHz, with inversely proportional changes in cyclic prefix length, symbol length, and slot duration [10]. Regardless of the numerology used, the length of one radio frame is fixed to 10 ms and the length of one subframe to 1 ms [2], as in 4G LTE/LTE-Advanced. However, in NR, the number of slots per subframe varies according to the configured numerology. Additionally, to support further reduced latencies, the concept of mini-slot transmission is introduced in NR. Hence, the TTI varies with the service type, ranging from one symbol to one slot or multiple slots [2].

In this work, to provide consistent and exact timing definitions, the time intervals of the wake-up related procedures are defined as integer multiples of a TTI. Additionally, according to 3GPP, the packet service time ($t_s$) is one TTI, in which multiple packets can be concatenated. Furthermore, for the sake of clarity, a TTI duration of 1 ms is taken as the baseline system assumption for the WuS configurations. This also facilitates applying the proposed concepts to future evolutions of LTE-based systems.
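As a quick illustration of the numerology scaling described above, the sketch below tabulates the basic NR timing for numerology index $\mu = 0, \ldots, 4$. It assumes the relations of 3GPP TS 38.211:
- subcarrier spacing of $15 \cdot 2^{\mu}$ kHz,
- $2^{\mu}$ slots per 1 ms subframe, and
- 14 OFDM symbols per slot with the normal cyclic prefix.

The names `Numerology` and `nr_numerology` are illustrative only.

```python
from dataclasses import dataclass

SUBFRAME_MS = 1.0       # subframe length: fixed to 1 ms for every numerology
SYMBOLS_PER_SLOT = 14   # OFDM symbols per slot with normal cyclic prefix


@dataclass(frozen=True)
class Numerology:
    mu: int
    scs_khz: int              # subcarrier spacing
    slots_per_subframe: int
    slot_ms: float
    avg_symbol_us: float      # slot duration / 14, cyclic prefix included


def nr_numerology(mu: int) -> Numerology:
    """Basic NR timing for numerology index mu = 0..4 (15 to 240 kHz)."""
    if not 0 <= mu <= 4:
        raise ValueError("mu must be in 0..4")
    slots = 2 ** mu
    slot_ms = SUBFRAME_MS / slots
    return Numerology(mu, 15 * slots, slots, slot_ms, 1000.0 * slot_ms / SYMBOLS_PER_SLOT)


for mu in range(5):
    n = nr_numerology(mu)
    print(f"mu={n.mu}: {n.scs_khz:3d} kHz, {n.slots_per_subframe:2d} slots/subframe, "
          f"slot {n.slot_ms:.4f} ms, symbol ~{n.avg_symbol_us:.2f} us")
```

With the 1 ms TTI baseline used in this work, all WuS timers are integer numbers of milliseconds. With a shorter slot-based TTI, the same integer-TTI convention would simply apply to the shorter unit.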
Finally, it is important to note that, from a system-level point of view, the configurable parameters of the WuS are the wake-up cycle ($t_w$) and the inactivity timer ($t_i$), whose values we optimize in Section IV. The remaining parameters ($t_{on}$, $t_{pd}$, $t_{su}$, $t_s$) depend on physical constraints and signal design; accordingly, we assume them to be fixed, i.e., the optimization is carried out for given values of $t_{on}$, $t_{pd}$, $t_{su}$ and $t_s$.

III. STATE MACHINE BASED WAKE-UP SYSTEM MODEL
For mathematical convenience, the performance of the wake-up based system is studied and analyzed for a Poisson arrival process with a packet arrival rate of $\lambda$ packets per TTI. In the Poisson traffic model, each packet service session consists of a sequence of packets with exponentially distributed inter-packet arrival times ($t_p$) [33].
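To make the Poisson assumption concrete, the following sketch draws exponentially distributed inter-packet times with rate $\lambda$ (packets per TTI). It then estimates how often a wake-up cycle of $t_w$ TTIs contains at least one arrival, and compares the estimate with $1 - e^{-\lambda t_w}$. That quantity governs whether the WRx should wake the BBU when $P_{fa} = P_{md} = 0$. The parameter values and the function name `busy_cycle_fraction` are illustrative assumptions.

```python
import math
import random


def busy_cycle_fraction(lam: float, t_w: float, n_cycles: int, seed: int = 1) -> float:
    """Fraction of back-to-back wake-up cycles of t_w TTIs that contain at least
    one arrival of a Poisson process with rate lam (packets per TTI)."""
    rng = random.Random(seed)
    horizon = n_cycles * t_w
    busy = [False] * n_cycles
    t = rng.expovariate(lam)                            # first arrival instant
    while t < horizon:
        busy[min(int(t // t_w), n_cycles - 1)] = True   # cycle containing this packet
        t += rng.expovariate(lam)                       # exponential inter-packet time t_p
    return sum(busy) / n_cycles


lam, t_w = 0.02, 40.0        # assumed: 0.02 packets/TTI, 40-TTI wake-up cycle
est = busy_cycle_fraction(lam, t_w, n_cycles=200_000)
exact = 1.0 - math.exp(-lam * t_w)
print(f"simulated {est:.4f} vs. 1 - exp(-lambda * t_w) = {exact:.4f}")
```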
The power states of the WuS are modeled as a semi-Markov process with four states, as shown in Fig. 2:
- WRx-ON (state $S_1$),
- active-decoding (state $S_2$),
- active-inactivity timer (state $S_3$), and
- sleep (state $S_4$).

The transitions between these states are as follows:
- At $S_1$, the WRx monitors the PDWCH. If it decodes a WI of 0, the UE transfers to $S_4$; otherwise (WI = 1), it transfers to $S_2$.
- At $S_2$, the UE decodes the packets for a fixed duration of $t_s$. If the device is scheduled before the end of $t_s$, it starts decoding the new packet and remains at $S_2$; otherwise, it transfers to $S_3$.
- At $S_3$, $t_i$ is running. If the device is scheduled before $t_i$ expires, it enters $S_2$; otherwise, it transfers to $S_4$.
- At $S_4$, the device is in the sleep state and cannot receive any signal, as opposed to being fully functional at $S_2$ and $S_3$. At the end of a wake-up cycle in the sleep period, the UE moves to $S_1$.

As noted in Section II, each state is associated with a different power consumption level, $PW_k$, $k \in \{1, 2, 3, 4\}$.
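Transition probabilities: The transition probability from UE state $S_k$ to $S_l$, denoted by $P_{kl}$, is defined as
$$P_{kl} = \Pr\{S(\tau_{n+1}) = S_l \mid S(\tau_n) = S_k\},$$
where $S(\tau_n)$ is the UE state at the $n$th jump time $\tau_n$.¹

When the UE is at $S_1$, it moves to $S_2$ either because of a false alarm or a correct detection; otherwise, it moves to $S_4$. Since at least one packet arrives during a wake-up cycle with probability $1 - e^{-\lambda t_w}$, $P_{12}$ and $P_{14}$ can be expressed as
$$P_{12} = \left(1 - e^{-\lambda t_w}\right)(1 - P_{md}) + e^{-\lambda t_w} P_{fa}$$
and
$$P_{14} = 1 - P_{12} = e^{-\lambda t_w}(1 - P_{fa}) + \left(1 - e^{-\lambda t_w}\right) P_{md}.$$

When the UE is at $S_2$, it decodes the packet for a duration of $t_s$. If the next packet arrives before the end of the current service time, the UE starts decoding the new packet at the end of the service time; otherwise, it moves to $S_3$. Therefore, $P_{22}$ and $P_{23}$ can be obtained as
$$P_{22} = 1 - e^{-\lambda t_s}$$
and
$$P_{23} = e^{-\lambda t_s}.$$

When the UE is at $S_3$, it moves to $S_2$ if the next packet arrives before $t_i$ expires; otherwise, it moves to $S_4$. Therefore, $P_{32}$ and $P_{34}$ can be expressed as
$$P_{32} = 1 - e^{-\lambda t_i}$$
and
$$P_{34} = e^{-\lambda t_i}.$$

Finally, at the end of every sleep cycle, the UE decodes the PDWCH, and therefore $P_{41} = 1$.

Steady state probabilities: The steady state probability that the UE is at state $S_k$, denoted by $P_k$, is defined as
$$P_k = \lim_{n \to \infty} \Pr\{S(\tau_n) = S_k\}.$$
By utilizing the set of balance equations, $P_k = \sum_{l=1}^{4} P_l P_{lk}$, and the normalization $\sum_{k=1}^{4} P_k = 1$, the $P_k$'s can be obtained as follows:
$$P_1 = P_4 = \frac{P_{23} P_{34}}{\Delta}, \qquad P_2 = \frac{P_{12}}{\Delta}, \qquad P_3 = \frac{P_{12} P_{23}}{\Delta},$$
where $\Delta = 2 P_{23} P_{34} + P_{12} P_{23} + P_{12}$.

Holding times: The holding time of state $S_k$ is denoted by $\omega_k$, $k \in \{1, 2, 3, 4\}$. The holding times $\omega_1$, $\omega_2$ and $\omega_4$ are constant: $\omega_1 = t_{on}$, $\omega_2 = t_s$, and $\omega_4$ is set by the sleep duration of the wake-up cycle. However, $\omega_3$ depends on the inter-packet arrival time $t_p$. If a packet arrives before $t_i$ expires, $\omega_3$ equals the inter-packet arrival time; otherwise, it equals $t_i$. Therefore,
$$\omega_3 = \min(t_p, t_i).$$
Hence, $E[\omega_3]$ can be expressed as
$$E[\omega_3] = \int_0^{t_i} t\, f_p(t)\, dt + t_i \int_{t_i}^{\infty} f_p(t)\, dt = \frac{1 - e^{-\lambda t_i}}{\lambda},$$
where $f_p(t) = \lambda e^{-\lambda t}$ is the probability density function of the exponentially distributed inter-packet arrival time.

¹ In a semi-Markov process, $S(t)$ is a stochastic process with a finite set of states ($S_1, \ldots, S_4$ in our case). Its trajectories are step-wise, with jumps at times $0 < \tau_1 < \tau_2 < \ldots$, and its values at the jump times, $S(\tau_n)$, form a Markov chain.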
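The transition and steady-state probabilities of the embedded chain can be checked numerically. The sketch below builds the transition matrix from the Poisson model and the memoryless property:
$$P_{12} = (1 - e^{-\lambda t_w})(1 - P_{md}) + e^{-\lambda t_w} P_{fa}, \quad P_{22} = 1 - e^{-\lambda t_s}, \quad P_{32} = 1 - e^{-\lambda t_i}, \quad P_{41} = 1.$$
It solves the balance equations together with the normalization condition by Gauss-Jordan elimination. It then compares the result with the closed form $P_1 = P_4 = P_{23}P_{34}/\Delta$, $P_2 = P_{12}/\Delta$, $P_3 = P_{12}P_{23}/\Delta$, where $\Delta = 2P_{23}P_{34} + P_{12}P_{23} + P_{12}$. It also evaluates $E[\omega_3] = (1 - e^{-\lambda t_i})/\lambda$. All function names and parameter values are hypothetical.

```python
import math


def transition_matrix(lam, t_w, t_s, t_i, p_md=0.0, p_fa=0.0):
    """Embedded-chain transition probabilities P[k][l]; states S1..S4 map to 0..3."""
    busy = 1.0 - math.exp(-lam * t_w)        # at least one arrival during a wake-up cycle
    p12 = busy * (1.0 - p_md) + (1.0 - busy) * p_fa
    p22 = 1.0 - math.exp(-lam * t_s)         # next packet arrives within the service time
    p32 = 1.0 - math.exp(-lam * t_i)         # next packet arrives before t_i expires
    return [
        [0.0, p12, 0.0, 1.0 - p12],          # S1: WRx-ON
        [0.0, p22, 1.0 - p22, 0.0],          # S2: active-decoding
        [0.0, p32, 0.0, 1.0 - p32],          # S3: active-inactivity timer
        [1.0, 0.0, 0.0, 0.0],                # S4: sleep, always followed by S1
    ]


def stationary(P):
    """Solve pi = pi P, sum(pi) = 1 with Gauss-Jordan elimination (small dense system)."""
    n = len(P)
    # One balance equation is redundant, so the last one is replaced by normalization.
    A = [[P[j][k] - (1.0 if j == k else 0.0) for j in range(n)] for k in range(n - 1)]
    A.append([1.0] * n)
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))   # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[k] / A[k][k] for k in range(n)]


def closed_form(P):
    """P1 = P4 = P23 P34 / den, P2 = P12 / den, P3 = P12 P23 / den."""
    p12, p23, p34 = P[0][1], P[1][2], P[2][3]
    den = 2.0 * p23 * p34 + p12 * p23 + p12
    return [p23 * p34 / den, p12 / den, p12 * p23 / den, p23 * p34 / den]


def mean_omega3(lam, t_i):
    """E[omega_3] = E[min(t_p, t_i)] for exponential t_p with rate lam."""
    return (1.0 - math.exp(-lam * t_i)) / lam


P = transition_matrix(lam=0.05, t_w=20, t_s=1, t_i=8, p_md=0.01, p_fa=0.01)
print([round(p, 6) for p in stationary(P)])
print([round(p, 6) for p in closed_form(P)])
print(round(mean_omega3(0.05, 8), 6))
```

Note that these are probabilities of the embedded jump chain. The fraction of time spent in each state additionally weights $P_k$ by the holding time $E[\omega_k]$.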

A. Average Power Consumption
The average power consumption of the UE, denoted by $P_c$, can be calculated as the ratio of the average energy consumption to the corresponding overall observation period. It is given by Eq. (15), where $t_{su}$ and $t_{pd}$ are the lengths of the start-up and power-down stages (transition times), respectively. The average energy consumed in these transitions is calculated as the area under the power profiles of the start-up and power-down stages (see Fig. 1). Each area is weighted by the respective probability of occurrence, $P_1 P_{12}$ and $P_3 P_{34}$, which yields $0.5 P_1 P_{12} t_{su} (PW_2 - PW_4)$ and $0.5 P_3 P_{34} t_{pd} (PW_3 - PW_4)$, respectively. For modeling simplicity, we assume that $t_{on} \approx 0$, $PW_4 \approx 0$, $P_{fa} \approx 0$, and $P_{md} \approx 0$. Eq. (15) can then be expanded as a multivariate function of $t_w$ and $t_i$, denoted by $P_c(t_w, t_i)$, as given in Eq. (16).

To provide more insight into $P_c(t_w, t_i)$, its rate of change with respect to $t_w$ and $t_i$ is calculated next. Treating both as continuous variables, the partial derivatives of $P_c(t_w, t_i)$ with respect to $t_i$ and $t_w$ are given by Eq. (17) and Eq. (18), respectively. It can be seen from Eq. (17) that $\partial P_c(t_w, t_i)/\partial t_i > 0$ for all feasible values of $t_w$ and $t_i$. From Eq. (18), we can conclude that $\partial P_c(t_w, t_i)/\partial t_w < 0$, since $(1 + \lambda t_w)e^{-\lambda t_w} < 1$ for $t_w > 0$. Therefore, the average power consumption $P_c(t_w, t_i)$ is strictly increasing in $t_i$ for $t_i \ge 0$ and strictly decreasing in $t_w$ for $t_w > 0$. As expected, increasing the wake-up cycle $t_w$ for a fixed $t_i$ reduces the power consumption, whereas increasing $t_i$ for a fixed $t_w$ increases it.
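The sketch below gives a minimal numerical version of the simplified model ($t_{on} \approx 0$, $PW_4 \approx 0$, $P_{fa} \approx P_{md} \approx 0$). Eqs. (15)-(16) are not reproduced in this text, so the sketch assumes one plausible renewal-reward form:
- the average power is the expected energy divided by the expected time per step of the embedded chain;
- the sleep holding time is $\omega_4 = t_w$;
- the start-up and power-down stages contribute the triangular energies $0.5 P_1 P_{12} t_{su} PW_2$ and $0.5 P_3 P_{34} t_{pd} PW_3$, and their durations $t_{su}$ and $t_{pd}$ to the time budget.

The function name `avg_power` and all parameter values are hypothetical. The sketch only illustrates the monotonic trends stated above and is not the authors' exact expression.

```python
import math


def avg_power(t_w, t_i, lam, t_s=1.0, t_su=10.0, t_pd=10.0, pw2=1.0, pw3=0.8):
    """Assumed renewal-reward form of P_c(t_w, t_i) with t_on = PW_4 = P_fa = P_md = 0.

    Times are in TTIs; powers are relative to PW_2 (here phi = pw2 / pw3 = 1.25)."""
    p12 = 1.0 - math.exp(-lam * t_w)
    p23 = math.exp(-lam * t_s)
    p34 = math.exp(-lam * t_i)
    den = 2.0 * p23 * p34 + p12 * p23 + p12
    p1 = p4 = p23 * p34 / den
    p2, p3 = p12 / den, p12 * p23 / den
    e_w3 = (1.0 - p34) / lam                       # E[omega_3] = E[min(t_p, t_i)]
    # Expected energy per embedded-chain step (S1 and S4 contribute nothing here).
    energy = (p2 * t_s * pw2 + p3 * e_w3 * pw3
              + 0.5 * p1 * p12 * t_su * pw2        # start-up ramp S1 -> S2 (triangle)
              + 0.5 * p3 * p34 * t_pd * pw3)       # power-down ramp S3 -> S4 (triangle)
    # Expected time per step: holding times plus transition stages (omega_4 = t_w assumed).
    time = (p2 * t_s + p3 * e_w3 + p4 * t_w
            + p1 * p12 * t_su + p3 * p34 * t_pd)
    return energy / time


lam = 0.01                                         # assumed 0.01 packets per TTI
for t_w in (20, 50, 100):
    print(t_w, [round(avg_power(t_w, t_i, lam), 4) for t_i in (1, 10, 50)])
```

In this assumed form, the energy per embedded step does not depend on $t_w$ once common factors cancel, while the time per step grows with $t_w/(1 - e^{-\lambda t_w})$. The decrease with $t_w$ therefore follows directly from $(1 + \lambda t_w)e^{-\lambda t_w} < 1$.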

B. Average Buffering Delay
We next assume that packets arriving during $S_4$ are buffered at the gNB until the UE enters $S_2$, thus causing buffering delay. We further assume that the radio access network operates under unsaturated traffic conditions, so that all arriving packets are served without any further scheduling delay. To simplify the delay modeling, we also omit the buffering delay of packets arriving during $S_1$ or during the start-up stage of the modem. The additional buffering delay of such arrivals is very small (of the order of $t_{on} + t_{su}$). Moreover, since the WuS seeks to reduce unnecessary start-ups, such events occur rarely.

Owing to the slot-based frame structure of NR, in which the PDCCH is sent at the beginning of the TTI, all packet arrivals inherently experience a small buffering delay, regardless of whether the WuS is utilized. Conditioned on their number, Poisson arrivals within any interval are uniformly distributed over it. We therefore assume that the arrival instant of a packet is uniformly distributed within the TTI, which introduces an average extra delay of $t_s/2$.

As already mentioned in Section II, misdetections can in general increase the buffering delay. To illustrate this, Fig. 3 shows the buffering delay experienced by the UE with no misdetection, with a single misdetection, and with two consecutive misdetections. Let $i$ denote the number of consecutive misdetections and $d_i$ the corresponding buffering delay. The dependence of $d_i$ on the packet arrival instant $t$ within the wake-up cycle can then be written as $d_i(t) = (i + 1)t_w + t_{su} + t_{on} - t$, for $i \in \{0, 1, \ldots\}$. The average buffering delay, denoted by $D$, can then be expressed as in Eq. (19). Because the misdetection probability is small ($P_{md} \approx 0$), the contribution of multiple consecutive misdetections to the average buffering delay is small.
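The following check assumes, for illustration, that misdetections occur independently from one wake-up occasion to the next with probability $P_{md}$. Under that assumption, the number $i$ of consecutive misdetections is geometrically distributed, $\Pr\{i\} = P_{md}^i (1 - P_{md})$. Averaging $d_i(t)$ over $i$ then gives $t_w/(1 - P_{md}) + t_{su} + t_{on} - t$, which tends to $d_0(t)$ as $P_{md} \to 0$. The sketch compares this closed form with the truncated series for a fixed arrival instant $t$; it does not reproduce Eq. (19), and the function names and parameter values are hypothetical.

```python
def delay_after_misdetections(i: int, t: float, t_w: float, t_su: float, t_on: float) -> float:
    """d_i(t) = (i + 1) t_w + t_su + t_on - t for a packet arriving at instant t of a cycle."""
    return (i + 1) * t_w + t_su + t_on - t


def mean_delay_series(t: float, t_w: float, t_su: float, t_on: float,
                      p_md: float, n_terms: int = 200) -> float:
    """Average of d_i(t) over i with P(i) = p_md**i * (1 - p_md), truncated after n_terms."""
    return sum(p_md ** i * (1.0 - p_md) * delay_after_misdetections(i, t, t_w, t_su, t_on)
               for i in range(n_terms))


def mean_delay_closed(t: float, t_w: float, t_su: float, t_on: float, p_md: float) -> float:
    """Closed form of the same average, using E[i + 1] = 1 / (1 - p_md)."""
    return t_w / (1.0 - p_md) + t_su + t_on - t


# Assumed example: 40-TTI wake-up cycle, 10-TTI start-up, 1-TTI on-duration, arrival at t = 5.
for p_md in (0.0, 0.01, 0.05):
    print(p_md, round(mean_delay_series(5, 40, 10, 1, p_md), 4),
          round(mean_delay_closed(5, 40, 10, 1, p_md), 4))
```

For $P_{md} = 0.01$, the increase over $d_0(t)$ is $t_w P_{md}/(1 - P_{md})$, about 0.4 TTI in this example. This is consistent with neglecting multiple consecutive misdetections.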
Thus, for $P_{md} \approx 0$, the average buffering delay can be expanded and solved as a multivariate function of $t_w$ and $t_i$, denoted by $D(t_w, t_i)$ and given in Eq. (20). We note that $D(t_w, t_i)$ in Eq. (20) is, strictly speaking, a lower bound of the delay expression in (19). Similarly to $P_c(t_w, t_i)$, the partial derivatives of $D(t_w, t_i)$ with respect to the continuous variables $t_w$ and $t_i$ are given by Eq. (21) and Eq. (22), respectively.
From Eq. (21), it can easily be concluded that $\partial D(t_w, t_i)/\partial t_w > 0$, since $1 - (1 + \lambda t_w)e^{-\lambda t_w} > 0$ for $t_w > 0$. Moreover, it can be shown that $\partial D(t_w, t_i)/\partial t_i < 0$. Therefore, the average buffering delay $D(t_w, t_i)$ is strictly increasing in $t_w$ for $t_w > 0$ and strictly decreasing in $t_i$ for $t_i \ge 0$. As expected, and contrary to the behavior of $P_c(t_w, t_i)$, increasing the wake-up cycle $t_w$ for a fixed $t_i$ increases the buffering delay, whereas increasing $t_i$ for a fixed $t_w$ reduces it.
These findings on the impact of $t_w$ and $t_i$ on the average delay and power consumption are intuitive, and the presented expressions confirm and quantify them rigorously.

IV. OPTIMIZATION PROBLEM FORMULATION AND SOLUTION
In this section, a constrained optimization problem in the two parameters $t_w$ and $t_i$ is formulated with the objective of minimizing the UE power consumption under a buffering delay constraint. Specifically, the average buffering delay is constrained to be less than or equal to a predefined maximum tolerable delay, or delay bound, denoted by $D_{max}$, whose value is set based on the service type. Building on the modeling results of the previous section, the optimization problem is formulated as follows:
$$\min_{t_w,\, t_i}\; P_c(t_w, t_i) \tag{24}$$
subject to
$$D(t_w, t_i) \le D_{max}, \tag{25}$$
$$t_w,\, t_i \in \{1, 2, 3, \ldots\}, \tag{26}$$
where $P_c(t_w, t_i)$ and $D(t_w, t_i)$ are defined in Eq. (16) and Eq. (20), respectively, and both parameters are expressed in integer multiples of a TTI.

The resulting optimization problem in (24)-(26) belongs to the class of mixed-integer non-linear programming (MINLP) problems, which are in general intractable [34]. In this work, the MINLP is addressed through a non-linear programming problem with continuous variables. This problem is obtained by relaxing the second constraint (26) into a continuous one, in which both parameters are real numbers larger than or equal to one (i.e., the minimum TTI unit). The relaxed optimization problem can be expressed as
$$\min_{t_w,\, t_i}\; P_c(t_w, t_i) \tag{27}$$
subject to
$$D(t_w, t_i) \le D_{max}, \tag{28}$$
$$t_w \ge 1, \quad t_i \ge 1. \tag{29}$$

In general, the optimization problem in (27)-(29) is not jointly convex in $t_w$ and $t_i$, and finding the global optimum is therefore challenging. However, in the next subsections, we exploit the increasing/decreasing properties of the power consumption and delay expressions derived in Section III. These yield additional properties of the problem that allow us to find the optimal solution in closed form.

A. Unbounded Feasible Region
In this subsection, a graphical approach is used to illustrate the feasible region of the relaxed optimization problem in (27)-(29). The feasible region is then narrowed down to the boundary of the delay constraint, whose points are shown to remain candidate solutions, while all other feasible points are excluded.

Fig. 4 (a) and (b) show the increasing trend of the power consumption and the decreasing behavior of the delay as functions of $t_i$, while $t_w$ is fixed at $t_{w0}$, i.e., $\partial P_c(t_w, t_i)/\partial t_i > 0$ and $\partial D(t_w, t_i)/\partial t_i < 0$ (as proved in Section III). Let us consider an arbitrary point A in the interior of the feasible region ($t_{iA} > t_{im}$, where $D(t_{w0}, t_{im}) = D_{max}$). As can be seen from Fig. 4 (a) and (b), there is always a point on the boundary of the delay constraint, referred to as B ($t_{iB} = t_{im}$), whose power consumption $P_{cB}$ is lower than that of A ($P_{cB} < P_{cA}$). Hence, for any fixed $t_w$ under a given delay constraint, there is a point on the boundary that attains the lowest power consumption.

Similarly, Fig. 4 (c) and (d) show the decreasing trend of the power consumption and the increasing behavior of the delay as functions of $t_w$, while $t_i$ is fixed at $t_{i0}$, i.e., $\partial P_c(t_w, t_i)/\partial t_w < 0$ and $\partial D(t_w, t_i)/\partial t_w > 0$ (as proved in Section III). Consider an arbitrary point C in the interior of the feasible region ($t_{wC} < t_{wm}$, where $D(t_{wm}, t_{i0}) = D_{max}$). As can be seen from Fig. 4 (c) and (d), there is always a point on the boundary of the delay constraint, referred to as D ($t_{wD} = t_{wm}$), whose power consumption $P_{cD}$ is lower than that of C ($P_{cD} < P_{cC}$). Hence, for any fixed $t_i$ under a given delay constraint, there is a point on the boundary that attains the lowest power consumption.

Since in both scenarios (fixed $t_w$ and fixed $t_i$) the lowest power consumption occurs on the boundary of the delay constraint, the optimal point cannot be located in the interior of the feasible region but lies on the boundary. That is, a point $(t_w, t_i)$ in the feasible region of the relaxed optimization problem in (27)-(29) cannot be optimal unless it lies on the boundary (rather than in the interior) of the delay constraint, i.e., $D(t_w, t_i) = D_{max}$.

B. Power Consumption over Boundary
Next, the equation of the boundary curve (expressing $t_i$ as a function of $t_w$) is derived, and the power consumption of all points on the boundary is then formulated as a function of $t_w$ only. In particular, the boundary curve consists of all points for which the inequality constraint in (28) holds with equality while the constraint in (29) is met, i.e., $D(t_w, t_i) = D_{max}$ with $t_w \ge 1$ and $t_i \ge 1$. By utilizing Eq. (20), we can isolate $t_i$, and the boundary curve can be formulated as in Eq. (31), where $t_w^b$ (see Eq. (36)) is the minimum feasible value of $t_w$ over the boundary.

Using Eq. (31), one can show that $t_i(t_w)$ is a nondecreasing function of $t_w$ at any feasible point on the boundary of the delay constraint (i.e., $dt_i(t_w)/dt_w \ge 0$), as follows. Eq. (31) expresses $t_i$ through the logarithm of an argument, denoted by Arg, such that $e^{\lambda t_i} = \mathrm{Arg}$. Applying the chain rule to (31) therefore gives $\frac{dt_i(t_w)}{dt_w} = \frac{1}{\lambda\, \mathrm{Arg}} \frac{d\,\mathrm{Arg}}{dt_w}$.
Since the logarithm is monotonically increasing and $\mathrm{Arg} = e^{\lambda t_i} > 0$, the sign of $dt_i(t_w)/dt_w$ is determined by that of $d\,\mathrm{Arg}/dt_w$, which is nonnegative over the feasible boundary points. Therefore, we can conclude that $dt_i(t_w)/dt_w \ge 0$.

Additionally, one can prove that the point $t_i = 1$, $t_w = t_w^b$ lies on the boundary, where $t_w^b$ always exists and is larger than or equal to one. Substituting $t_i = 1$ into Eq. (31), the boundary condition can be written in terms of the Lambert W function $W(x)$ [35] and the quantities
$$F = \left(e^{\lambda}(1 + e^{\lambda t_s}) + 2\right)\left(D_{max} - \frac{t_s}{2}\right) - t_{su}\lambda + 1,$$
$$H = \lambda t_{su} - e^{\lambda}\left(D_{max} - \frac{t_s}{2}\right)(1 + e^{\lambda t_s})\lambda - 1.$$
For typical $D_{max}$ and $t_{su}$ values, the required condition relating $1/F$ and $\lambda$ holds and $H < 0$. Therefore, the principal branch $W_0$ of the Lambert W function, whose values are greater than $-1$, can be taken as the solution of (35). Then, $t_w^b$ follows as given in Eq. (36), and, as a result, all feasible points on the boundary curve can be specified and constrained by $t_i \ge 1$ and $t_w \ge t_w^b$.

The value $t_w^b$ in (36) is the smallest feasible $t_w$ over the boundary curve. If there were a $t_w$ smaller than $t_w^b$, then, based on Eq. (32) and (34), its corresponding $t_i$ would be smaller than one, which lies in the infeasible region. Hence, by contradiction, $(t_i = 1, t_w = t_w^b)$ is the corner point of the boundary, and $t_i \ge 1$ and $t_w \ge t_w^b$ are equivalent constraints for the boundary of the delay constraint. In other words, the point $(t_i = 1, t_w = t_w^b)$ is an extreme point located on the boundary curve of the delay constraint, with $t_w^b \ge 1$.
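The closed form for $t_w^b$ relies on the principal branch $W_0$ of the Lambert W function, i.e., the solution $w \ge -1$ of $w e^{w} = x$ for real $x \ge -1/e$. SciPy exposes this function as `scipy.special.lambertw(z, k=0)`, which returns a complex value. The dependency-free sketch below instead applies Halley's iteration, a standard method for this equation. The function name `lambert_w0` is hypothetical, and the actual argument used in (35) is not reproduced here.

```python
import math


def lambert_w0(x: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Principal branch W_0(x) >= -1 of w * exp(w) = x, for real x >= -1/e."""
    branch_pt = -1.0 / math.e
    if x < branch_pt:
        raise ValueError("W_0 is real only for x >= -1/e")
    if x == branch_pt:
        return -1.0
    if x < -0.25:                                   # near the branch point: series in p
        p = math.sqrt(max(0.0, 2.0 * (math.e * x + 1.0)))
        if p < 1e-8:
            return -1.0 + p
        w = -1.0 + p - p * p / 3.0 + 11.0 / 72.0 * p ** 3
    elif x < 3.0:
        w = math.log1p(x)                           # good start for moderate x
    else:
        lx = math.log(x)
        w = lx - math.log(lx)                       # asymptotic start for large x
    for _ in range(max_iter):
        ew = math.exp(w)
        f = w * ew - x
        wp1 = w + 1.0
        # Halley step for f(w) = w e^w - x
        step = f / (ew * wp1 - (w + 2.0) * f / (2.0 * wp1))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


for x in (-0.35, 0.0, 1.0, math.e, 100.0):
    w = lambert_w0(x)
    print(f"W0({x:.4f}) = {w:.10f}, check w*exp(w) = {w * math.exp(w):.10f}")
```

Because $W_0 \ge -1$ on its whole domain, a real solution of the boundary equation at $t_i = 1$ exists only if the argument passed to $W_0$ is at least $-1/e$. This matches the role of the sign conditions on $H$ discussed above.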
Finally, by substituting the value of $e^{\lambda t_i}$ over the boundary (the argument of the logarithm in (31)) into (16), the average power consumption of all points on the boundary, referred to as $P_b(t_w)$, can be obtained as in Eq. (37). In the Appendix, we analyze Eq. (37) in further detail.

C. Optimal Parameter Values
The power consumption over the boundary curve in Eq. (37) depends on the packet arrival rate $\lambda$, and, as shown in the Appendix, $P_b(t_w)$ behaves differently for different ranges of $\lambda$. To characterize this behavior, $dP_b(t_w)/dt_w$ is calculated (a detailed analysis is provided in the Appendix). Briefly, within the feasible range of the wake-up cycle ($t_w \ge t_w^b$), $dP_b(t_w)/dt_w$ is positive for $0 < \lambda \le \lambda_t$ and negative for $\lambda_t < \lambda < 1$. Here, $\lambda_t$ is referred to as the turnoff packet arrival rate. It can be calculated with any standard root-finding algorithm as the root of $F_1 = 0$ (see the Appendix for details).

Theorem 1. $t_w^* = t_w^b$ and $t_i^* = 1$ are the optimal parameter values of the optimization problem in (27)-(29) for the range $0 < \lambda \le \lambda_t$.
Proof. As can be seen in (44), for all $0 < \lambda \le \lambda_t$, the power consumption increases with $t_w$ over the boundary, so that the minimum power consumption is achieved at the minimum feasible $t_w$, i.e., $t_w = t_w^b$. Correspondingly, the optimal value of $t_i$ is obtained by substituting $t_w^b$ into Eq. (31), which yields $t_i = 1$. Therefore, for all $0 < \lambda \le \lambda_t$, $t_w^* = t_w^b$ and $t_i^* = 1$ is the optimal solution of the optimization problem in (27)-(29).
Theorem 2. $t_w^* = +\infty$ and $t_i^* = +\infty$ are the optimal parameter values of the optimization problem in (27)-(29) for the range $\lambda_t < \lambda < 1$.

Proof. As can be seen in (44), for all $\lambda_t < \lambda < 1$, the power consumption decreases with $t_w$ over the boundary, so that the minimum power consumption is approached as $t_w \to +\infty$. Correspondingly, substituting $t_w$ into Eq. (31) shows that $t_i \to +\infty$ as well. Therefore, for all $\lambda_t < \lambda < 1$, $t_w^* = +\infty$ and $t_i^* = +\infty$ is the optimal solution of the optimization problem in (27)-(29).

Corollary 1. For the range $\lambda_t < \lambda < 1$, the optimal solution is equivalent to not utilizing the WuS: the system is always in the active-decoding and active-inactivity timer states ($P_1 + P_4 \approx 0$). Hence, when the energy and delay overheads of switching the BBU on and off are taken into account, the WuS is no longer effective for high $\lambda$ values. Instead, other power saving mechanisms, such as DRX, microsleep, or pre-grant messages, could be used in this regime.

Fig. 5 (a) and (b) illustrate how $\lambda_t$ changes with $D_{max}$ and $t_{su} + t_{pd}$, respectively. As can be observed in Fig. 5 (a), the turnoff packet arrival rate is independent of, and insensitive to, the value of $D_{max}$, whereas it decreases as $t_{su} + t_{pd}$ becomes larger (see Fig. 5 (b)). Therefore, to decide whether to enable the WuS, regardless of the QoS requirement of the considered traffic, the network needs to compare the estimated packet arrival rate with a precalculated, fixed $\lambda_t$.
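Theorems 1 and 2 and Corollary 1 suggest a simple network-side policy: precompute $\lambda_t$ once, estimate $\lambda$ from traffic, and either configure the WuS with $(t_w^b, 1)$ or leave it disabled. The sketch below makes several assumptions:
- $\lambda_t$ is precomputed by bisection, one standard root finder that needs only a sign-changing bracket. Since $F_1$ is given in the Appendix and not reproduced here, a toy function stands in for it.
- $\lambda$ is estimated by the maximum-likelihood estimator for exponential inter-packet times, i.e., the reciprocal of the sample mean, in packets per TTI.
- $t_w^b$ is rounded down to an integer number of TTIs. This is a heuristic that keeps $D(t_w, 1) \le D_{max}$ because $D$ increases with $t_w$; it is not claimed to be optimal for the integer problem (24)-(26).

All names are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, Optional, Sequence


def bisect_root(f: Callable[[float], float], lo: float, hi: float,
                tol: float = 1e-10, max_iter: int = 200) -> float:
    """Root of f on [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if (f_lo > 0.0) == (f_hi > 0.0):
        raise ValueError("f must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid    # sign change lies in [mid, hi]
        else:
            hi = mid                 # sign change lies in [lo, mid]
    return 0.5 * (lo + hi)


@dataclass(frozen=True)
class WusConfig:
    enabled: bool
    t_w: Optional[int] = None        # wake-up cycle in TTIs
    t_i: Optional[int] = None        # inactivity timer in TTIs


def estimate_rate(inter_arrivals_tti: Sequence[float]) -> float:
    """Maximum-likelihood Poisson rate (packets/TTI) from exponential inter-packet times."""
    return 1.0 / fmean(inter_arrivals_tti)


def choose_wus_config(lam_hat: float, lam_t: float, t_w_b: float) -> WusConfig:
    """Enable the WuS with (t_w^b, 1) if lam_hat <= lam_t, otherwise keep it disabled."""
    if lam_hat <= lam_t:
        # Rounding down keeps D(t_w, 1) <= D_max, since D increases with t_w (Section III).
        return WusConfig(True, t_w=max(1, math.floor(t_w_b)), t_i=1)
    return WusConfig(False)           # high load: leave the BBU on (e.g., rely on DRX)


def toy_f1(lam: float) -> float:
    """Hypothetical stand-in for F_1(lambda); the real F_1 is given in the Appendix."""
    return lam * lam - 0.04


lam_t = bisect_root(toy_f1, 1e-6, 1.0 - 1e-6)             # 0.2 for this stand-in
lam_hat = estimate_rate([12.0, 30.0, 7.5, 22.0, 18.5])     # 1/18 packets per TTI
print(round(lam_t, 6), choose_wus_config(lam_hat, lam_t, t_w_b=37.4))
```

Bisection is chosen here only because it needs nothing beyond a sign change of $F_1$ on $(0, 1)$. Any bracketing root finder would serve the same purpose.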
Interestingly, for $\lambda_t < \lambda < 1$, the power consumption decreases as $t_w$ increases towards infinity (with $t_i$ increasing correspondingly), while the delay constraint remains satisfied. This can be interpreted as follows: for packet arrival rates higher than $\lambda_t$, the WuS is no longer effective and only adds energy overhead, which implies that it is better not to switch off the BBU and to utilize short DRX cycles instead. As shown in Fig. 5 (b), turnoff packet arrival rates become smaller as $t_{su} + t_{pd}$ becomes larger, which is consistent with the fact that, at higher packet arrival rates, the energy consumed by frequent start-up and power-down becomes larger. The main reason why the WuS is not beneficial in this regime is that, for large wake-up cycles, $P_{12}$ approaches one, which is equivalent to reducing the number of potentially scheduled PDCCHs, and hence the wake-up scheme no longer offers any gain over DRX. Furthermore, for higher packet arrival rates, the optimal policy keeps the BBU in either $S_2$ or $S_3$ most of the time ($P_1 = P_4 \approx 0$), both to avoid the energy wasted during start-up and power-down and to satisfy the delay constraint (illustrated in Fig. 6 (a)). As can be seen in Fig. 6 (a), for small packet arrival rates there is considerable energy consumption for transitions between states; however, once the packet arrival rate exceeds $\lambda_t$, this trend changes: the UE resides mainly in $S_2$ or $S_3$ and no longer wastes energy on frequent start-up/power-down of the BBU. This change in the behaviour of the wake-up scheme follows from the objective of the system, which is to reduce the overall power consumption, as shown in Fig. 6 (b). Moreover, as can be seen in Fig. 6 (a), for packet arrival rates higher than the turnoff packet arrival rate, most of the energy is consumed for decoding the packets, and this energy consumption increases linearly with the packet arrival rate.
Finally, it can be shown that for $\lambda$ less than the turnoff packet arrival rate ($0 < \lambda \le \lambda_t$), the optimal parameter values ($t^*_w$ and $t^*_i$) of the original MINLP (24)-(26) can be written in terms of the optimal values of the equivalent relaxed problem (27)-(29) as

$$t^*_w = \lfloor t_{w_b} \rfloor, \qquad t^*_i = 1, \qquad (48)$$

where $\lfloor \cdot \rfloor$ refers to the floor function.
Theorem 3. $t^*_w = \lfloor t_{w_b} \rfloor$ and $t^*_i = 1$ are the optimal parameter values of the MINLP in (24)-(26) for the range $0 < \lambda \le \lambda_t$.
Proof. If $f$ is a function of continuous variables $x$ and $y$, it can easily be shown that the relation in (49) holds. Therefore, by taking $f$ to represent either $P_c(t_w, t_i)$ or $D(t_w, t_i)$, and $x$ to represent either $t_i$ or $t_w$, one can prove, similarly to the case with continuous variables (proved in Section IV-A), that the optimal parameters of the MINLP (24)-(26) lie on the boundary. Therefore, the boundary of (24)-(26) consists of all combinations of $(\lfloor t_w \rfloor, t_i)$ for which $t_i \in \{1, 2, \ldots\}$ and $D(t_w, t_i) = D_{max}$ (as formulated in (31)). Similarly, based on (49) and (32), as well as on the properties of the floor function, we can state that increasing $t_i$ over the boundary may either increase $\lfloor t_w \rfloor$ ($\Delta \lfloor t_w \rfloor / \Delta t_i > 0$) or leave it at its previous value ($\Delta \lfloor t_w \rfloor / \Delta t_i = 0$). Furthermore, based on (44), for $0 < \lambda < \lambda_t$, we can conclude that $\Delta P_b(\lfloor t_w \rfloor, t_i) / \Delta \lfloor t_w \rfloor > 0$. Therefore, $t^*_w$ is the smallest feasible value of the wake-up cycle over the boundary, i.e., $t^*_w = \lfloor t_{w_b} \rfloor$. However, $\lfloor t_{w_b} \rfloor$ may correspond to either $t_i = 1$ or larger values at the same time. Since, for a fixed $t_w$, the power consumption takes its lowest value at the lowest $t_i$, we can conclude that $t^*_i = 1$. Hence, $t^*_w = \lfloor t_{w_b} \rfloor$ and $t^*_i = 1$ is the optimal solution of the MINLP (24)-(26), and our relaxation approach yields an equivalent reformulation.
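In practice, (48) and Theorem 3 reduce the integer problem to rounding the relaxed boundary value down. The following is a minimal sketch, assuming that $t_{w_b}$ (the relaxed boundary wake-up cycle with $t_i = 1$, in TTIs) has already been computed; the function name `minlp_optimum` and the guard requiring at least one TTI are illustrative choices, not part of the paper.

```python
import math
from typing import Tuple


def minlp_optimum(t_wb: float, lam: float, lam_t: float) -> Tuple[int, int]:
    """Integer (t_w, t_i) in TTIs per (48), valid only for 0 < lam <= lam_t.

    Hypothetical helper; t_wb is the relaxed optimum on the delay boundary.
    """
    if not 0 < lam <= lam_t:
        raise ValueError("(48) applies only for 0 < lambda <= lambda_t")
    if t_wb < 1:
        # Assumption of this sketch: a wake-up cycle shorter than one TTI is not meaningful.
        raise ValueError("t_wb must be at least one TTI")
    # Rounding down (rather than to the nearest integer) keeps the wake-up cycle
    # at or below the relaxed boundary value; assuming the delay grows with t_w,
    # this keeps the analytical delay at or below D_max.
    return math.floor(t_wb), 1


print(minlp_optimum(t_wb=37.8, lam=0.05, lam_t=0.3))  # (37, 1)
```

Rounding down is also why the analytical delay in the numerical results ends up slightly below the maximum tolerable delay.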

V. NUMERICAL RESULTS
In this section, a set of numerical results is provided in order to validate our concept and the analytical results, as well as to compare the average power consumption of the optimized WuS with that of DRX for packet arrival rates below the turnoff packet arrival rate. The power consumption of the mobile device in different operating states is highly dependent on the implementation and on its operational configuration. Therefore, for the numerical results, the power consumption model used in [20], [33], [36], [37] is employed. Its parameters for DRX and WuS are shown in Table II and Table III, respectively. LTE-based power consumption values, shown in Table II, are considered as a practical example, since those of emerging NR modems are not yet publicly available. For the simulations, $\varphi = 1.1$ is used as an example numerical value; methodologically, numerical results can be generated for any other value as well.
Two different sets of performance results, in terms of power consumption and delay, are presented based on the optimal configuration of the wake-up parameters (48): a) with the simplified assumptions of zero false alarm and misdetection rates and $t_{on} \approx 0$ ms, equivalent to the analytical results (ana.), and b) with the realistic assumptions of $P_{fa} = 10\%$, $P_{md} = 1\%$, and $t_{on} = 1/14$ ms, obtained by simulations and [20], referred to as simulation results (sim.). Table IV shows the resulting optimal values of $t^*_w$ in (48) for different values of $\lambda$ and $D_{max}$. As can be observed, for tight delay requirements ($D_{max} = 30$ ms), $t^*_w$ tends to be small, enabling the UE to reduce the duration of packet buffering. Interestingly, for the mid range of packet arrival rates ($\lambda = 0.1$ p/ms), the optimal wake-up cycle for a given delay bound is shorter than for both lower and higher packet arrival rates. The justification is as follows. For higher packet arrival rates, $t^*_w$ becomes larger because the inactivity timer is ON most of the time; short wake-up cycles are therefore less needed, and they would only induce a higher energy overhead. For lower packet arrival rates, in turn, $t^*_w$ is larger owing to the infrequent packet arrivals, which result in a smaller delay. Fig. 7 illustrates the power consumption of the proposed WuS under ideal and realistic assumptions as a function of the packet arrival rate ($\lambda < \lambda_t$), for different maximum tolerable delays. As can be observed, for both analytical and simulation results, and for all delay bounds, the average power consumption initially increases and then remains almost constant as $\lambda$ increases (especially for large delay bounds), due to the configuration of shorter wake-up cycles for mid packet arrival rates (see Table IV). Moreover, the UE consumes more power to satisfy tighter delay requirements, which, as shown in Table IV, translate into shorter $t^*_w$. Furthermore, Fig. 7 shows that the simulation results closely follow the analytical results, with the non-zero gap being due to the non-zero false alarm and misdetection rates. The relative gap between simulation-based and analytical results is somewhat larger for shorter delay bounds, which stems from the correspondingly higher number of wake-up instances.
Moreover, Fig. 8 depicts the average packet delay experienced by the WRx-enabled UE under ideal and realistic assumptions for varying packet arrival rates. As can be observed, the analytical delay based on the optimal parameter configuration (48) is slightly shorter than the maximum tolerable delay. This is because the greatest integer less than or equal to the optimal wake-up cycle of the relaxed optimization problem is selected. However, the actual average delay is slightly higher than the analytical average delay, especially for high delay bounds. The main reason for this small excess delay is the unavoidable misdetections, whose impact is more pronounced for the large wake-up cycles corresponding to high delay bounds. In practice, to compensate for this small excess delay, the delay bound can be set slightly smaller than the actual average delay requirement.
Finally, for comparison purposes, we use the relative power saving of WuS over DRX, which represents the amount of power that can be saved with the WuS compared to the DRX-based reference system, assuming the same delay constraint in both methods. The relative power saving ranges from 0 to 100%, and a large value indicates that the WuS conserves energy better than DRX. Formally, we express the relative power saving ($\eta$) as

$$\eta = \frac{P_{DRX} - P_c}{P_{DRX}} \times 100\%,$$

where $P_{DRX}$ refers to the average power consumption of DRX and $P_c$ to that of the optimized WuS. Furthermore, for a fair comparison, we use an exhaustive search over a large set of DRX parameter configurations, developed by the authors in [17]. In order to take the start-up and power-down power consumption into account, the solution in [17] is slightly modified to include these transitory states. Fig. 9 shows the power saving results. It is observed that the proposed WuS, under realistic assumptions, outperforms DRX within the range $\lambda < \lambda_t$, especially for low packet arrival rates with tight delay requirements. The main reason is that, in such scenarios, the DRX-based device needs to decode the control channel very often, residing mainly in short DRX cycles, which causes extra power consumption. The WRx, in turn, also needs to decode the wake-up signaling frequently, but with a lower power overhead. Additionally, as expected, regardless of the delay requirements, DRX yields power consumption relatively similar to that of the WuS for higher packet arrival rates. The reason is that, in such cases, the DRX parameters can be configured so that there are few unscheduled DRX cycles, either by utilizing short DRX cycles for very tight delay bounds or by employing long DRX cycles for large delay requirements. Overall, the results in Fig. 9 clearly demonstrate that the WuS can provide substantial energy-efficiency improvements compared to DRX, with the maximum energy savings on the order of 40%.
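As a quick illustration of the comparison metric, the relative power saving can be computed directly from the two average powers, assuming the standard form $\eta = (P_{DRX} - P_{WuS})/P_{DRX} \times 100\%$, which matches the 0 to 100% range described above. This is a minimal sketch; the function name and the input values are placeholders, not results read from Fig. 9.

```python
def relative_power_saving(p_wus: float, p_drx: float) -> float:
    """Relative power saving eta (in %) of WuS over DRX under the same delay bound.

    Both inputs must be in the same (arbitrary) power unit. A negative result
    would mean the WuS consumes more than DRX, which the comparison above avoids
    by restricting itself to lambda < lambda_t.
    """
    if p_drx <= 0:
        raise ValueError("P_DRX must be positive")
    return 100.0 * (p_drx - p_wus) / p_drx


print(relative_power_saving(p_wus=30.0, p_drx=50.0))  # 40.0
```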

VI. DISCUSSIONS AND FINAL REMARKS
In this section, three interesting remarks are drawn and discussed.

Remark 1:
The proposed WuS is fully independent of DRX, which means that both methods can co-exist, interact, and be used together to reduce the energy consumption of the UE even further. Based on the numerical results provided in this paper, our view regarding power saving mechanisms for moderate and generic mobile users is that there is no 'One-Size-Fits-All Solution', unless the UE is well defined and narrowed down to a specific application and QoS requirement. For a broad range of applications and QoS requirements, different power saving mechanisms need to be combined, selecting the method that best fits the particular circumstances. For example, the WuS can be utilized for packet arrival rates lower than the turnoff packet arrival rate; for higher packet arrival rates or shorter delay bounds (e.g., smaller than 30 ms), DRX may be the preferable choice; and for ultra-low latency requirements, other power saving mechanisms may be needed, building on, e.g., microsleep [6] or pre-grant message [38], [39] concepts. Further, the WuS is agnostic to the RRC state, i.e., to whether an RRC context is established or not, and can therefore be adopted in idle (where the delay bound is in the range of some hundreds of milliseconds), inactive, and connected modes.
Remark 2:
As mentioned in Section III, a latency-optimized frame structure with flexible numerology is adopted in 5G NR, in which the slot length scales down as the numerology increases [2]. In this work, the different time intervals within the WuS are defined as multiples of a time unit, namely a TTI of 1 ms duration. As shown before, the minimum power consumption over the boundary is limited by the minimum feasible value of $t_i$. Therefore, if the TTI can be made even smaller, i.e., with finer granularity, the optimal power consumption can be further reduced. Along these lines, Table V presents the power consumption for different TTI sizes (corresponding to NR numerologies 0, 1, 2, and 3 [10]), delay bounds, and packet arrival rates.
Interestingly, Table V shows how the 5G NR numerologies facilitate the use of the WuS and improve its applicability and energy saving potential compared to longer TTI sizes. Besides allowing smaller $t_i$ values, smaller TTIs also make the corresponding optimal wake-up cycles more fine-grained. With TTI sizes down to 125 µs, the proposed WuS can provide up to 12% additional energy savings compared to the baseline 1 ms TTI. Therefore, the benefit of the flexible NR frame structure is not limited to low-latency communication; it can also offer energy savings, depending on the traffic arrival rates and delay constraints.
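The TTI sizes behind Table V follow the NR rule that, for numerology $\mu$, the subcarrier spacing is $15 \cdot 2^{\mu}$ kHz and the slot lasts $1/2^{\mu}$ ms. The sketch below, assuming one TTI equals one slot, converts intervals counted in TTIs (such as $t_i$ or $t_w$) into milliseconds; the helper names are hypothetical, and the restriction to numerologies 0-3 simply mirrors Table V.

```python
def slot_duration_ms(mu: int) -> float:
    """Slot (TTI) duration in ms for NR numerology mu: 1 ms / 2**mu."""
    if not 0 <= mu <= 3:
        raise ValueError("this sketch only covers numerologies 0-3, as in Table V")
    return 1.0 / (1 << mu)


def ttis_to_ms(n_ttis: int, mu: int) -> float:
    """Convert an interval counted in TTIs (e.g., t_i or t_w) into milliseconds."""
    return n_ttis * slot_duration_ms(mu)


for mu in range(4):
    scs_khz = 15 * (1 << mu)  # subcarrier spacing 15 * 2**mu kHz
    print(f"mu={mu}: SCS={scs_khz} kHz, TTI={slot_duration_ms(mu) * 1000:.0f} us, "
          f"minimum t_i={ttis_to_ms(1, mu):.3f} ms")
```

With $\mu = 3$, the smallest inactivity timer drops from 1 ms to 0.125 ms, which is the finer granularity that the remark credits for the additional savings.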

Remark 3:
The traffic model assumed in this article is, in essence, well suited to the periodic nature of DRX. For instance, voice calls and video streaming exhibit such periodic behaviour. However, in other application areas such as machine-type communications (MTC), where sensors can be polled aperiodically by either a user or a machine, the traffic exhibits more non-periodic patterns. In such cases, DRX may not fit well, while the WuS has more suitable characteristics, being more robust and agnostic to the traffic type.

VII. CONCLUSIONS AND FUTURE WORK
In this article, wake-up based downlink access under delay constraints was studied in the context of 5G NR networks, with particular focus on energy-efficiency optimization. It was shown that the performance of the wake-up scheme is governed by two parameters that interact with each other in an intricate manner. To find the optimal configuration of the wake-up parameters, and thus to take full advantage of the power saving capabilities of the wake-up scheme, a constrained optimization problem was formulated, together with the corresponding closed-form solution. Analytical and simulation results showed that the proposed scheme is an efficient approach to reduce the device energy consumption, while ensuring a predictable and consistent latency. The numerical results also showed that the optimized wake-up system outperforms the corresponding optimized DRX-based reference system in power efficiency. Furthermore, the range of packet arrival rates within which the WuS works efficiently was established, while outside that range other power saving mechanisms, such as DRX or microsleep, can be used.
Future work includes extending the proposed framework to bidirectional communication scenarios with the corresponding downlink and uplink traffic patterns and the associated QoS requirements, as well as considering other realistic assumptions that impact the energy-delay trade-offs, such as the communication rate and scheduling delays. Additionally, an interesting aspect is to investigate how to configure the wake-up scheme parameters for application-specific traffic scenarios, such as virtual and augmented reality, when both uplink and downlink traffic are considered. Finally, the wake-up scheme parameters could also be optimized based on the proposed framework by utilizing not only traffic statistics but also short-term traffic pattern prediction by means of modern machine learning methods.

APPENDIX: ANALYSIS OF $P_b(t_w)$
In order to find the optimal value of $t_w$ and, correspondingly, $t_i$, the derivative of $P_b(t_w)$ with respect to $t_w$ is expressed in terms of an auxiliary function $Y(t_w)$, whose sign determines the sign of $dP_b(t_w)/dt_w$, and the coefficients $F_1$, $F_2$, and $F_3$ given in (54)-(56). For typical values $1 \le \varphi < 2$, the condition $0 \le (2 - \varphi) t_{su} + t_{pd}$ is met, and hence, based on (54)-(56), we can conclude that $F_1 + F_2 > 0$, $F_2 + F_3 > 0$, and $F_1 > F_3$. Furthermore, it can be shown that $F_1$ is a decreasing function of $\lambda$ with a single root. We refer to this root as $\lambda_t$: for all $\lambda < \lambda_t$, $F_1 > 0$, while for all $\lambda > \lambda_t$, $F_1 < 0$. Root-finding algorithms can be utilized to find $\lambda_t$ as the value of $\lambda$ that satisfies $F_1 = 0$.
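Because $F_1$ is decreasing in $\lambda$ with a single sign change, plain bisection is a natural and robust way to locate $\lambda_t$. The sketch below assumes $F_1$ is supplied as a Python callable and that an interval with $F_1(\mathrm{lo}) \ge 0 \ge F_1(\mathrm{hi})$ is known. Since the expressions in (54)-(56) are not reproduced here, `f1_placeholder` is a hypothetical stand-in with a known root, not the paper's $F_1$.

```python
from typing import Callable


def turnoff_rate(f1: Callable[[float], float], lo: float, hi: float,
                 tol: float = 1e-9, max_iter: int = 200) -> float:
    """Bisection for the single root lambda_t of a decreasing F1 on [lo, hi]."""
    f_lo, f_hi = f1(lo), f1(hi)
    if f_lo < 0 or f_hi > 0:
        raise ValueError("expected F1(lo) >= 0 >= F1(hi) for a decreasing F1")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f1(mid) > 0:
            lo = mid  # F1 > 0 means mid < lambda_t, so the root lies to the right
        else:
            hi = mid  # F1 <= 0 means mid >= lambda_t, so the root lies to the left
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def f1_placeholder(lam: float) -> float:
    """Hypothetical stand-in for F1 (NOT the paper's expression): decreasing, root at 0.3."""
    return 0.3 - lam


lam_t = turnoff_rate(f1_placeholder, lo=1e-6, hi=1.0)
print(f"lambda_t ~ {lam_t:.6f} packets/ms")
```

A bracketing method such as Brent's would converge faster, but bisection only relies on the sign property stated above, which is exactly what the Appendix establishes for $F_1$.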
To determine whether $dP_b(t_w)/dt_w$ is positive or negative, $Y(t_w)$ needs to be analyzed. Differentiating $Y(t_w)$ with respect to $t_w$ yields the expression in (57). Additionally, depending on whether $F_3$ and $F_1$ are positive or negative, $P_b(t_w)$ behaves differently. In order to characterize the behaviour of $P_b(t_w)$, we define three mutually exclusive cases: Case A ($F_3 > 0$), Case B ($F_3 < 0$ and $F_1 > 0$), and Case C ($F_3 < 0$ and $F_1 < 0$). Since $F_1 > F_3$, $F_1$ is always positive in Case A.
Case A ($F_3 > 0$): Based on (57), if $F_3$ is positive, $Y(t_w)$ is a decreasing function for $t_w < \frac{F_2 + F_3}{\lambda F_3}$ and an increasing function for $t_w > \frac{F_2 + F_3}{\lambda F_3}$. Therefore, $Y(t_w)$ has a minimum at $t_w = \frac{F_2 + F_3}{\lambda F_3}$, where its value is positive, i.e., $Y\left(\frac{F_2 + F_3}{\lambda F_3}\right) = F_1 - F_3 e^{-\lambda t_w} > 0$, since $0 < F_3 < F_1$ and $e^{-\lambda t_w} < 1$. As a result, $Y(t_w)$ is always positive, and hence $P_b(t_w)$ is a monotonically increasing function.
Based on (57), if $F_3$ is negative (Case B or Case C), $Y(t_w)$ is a monotonically decreasing function, from $F_1 + F_2 > 0$ to $F_1$.
Case B ($F_3 < 0$ and $F_1 > 0$): In this case, $Y(t_w)$ is positive for all values of the wake-up cycle, and hence $P_b(t_w)$ is a monotonically increasing function.
Case C ($F_3 < 0$ and $F_1 < 0$): In this case, $Y(t_w)$ is positive for $t_w < t_{w_s}$ and negative for $t_w > t_{w_s}$, where $t_{w_s}$ is a stationary point, i.e., $Y(t_{w_s}) = 0$ or, equivalently, $\left.\frac{\partial P_b(t_w)}{\partial t_w}\right|_{t_w = t_{w_s}} = 0$. As a result, $P_b(t_w)$ is an increasing function for $t_w < t_{w_s}$ and a decreasing function for $t_w > t_{w_s}$. For the typical range of parameters, we have consistently observed through simulations that $t_{w_s} < t_{w_b}$, so we can conclude that $P_b(t_w)$ is a decreasing function over the feasible range of the wake-up cycle (i.e., $t_w > t_{w_b}$).
To sum up, for $F_1 < 0$, or equivalently $\lambda_t < \lambda$ (Case C), $P_b(t_w)$ is a monotonically decreasing function over the feasible range of the wake-up cycle, while for $\lambda < \lambda_t$ (Case A or B), $P_b(t_w)$ is a monotonically increasing function. Owing to the relevance of $\lambda_t$, we refer to it as the turnoff packet arrival rate.
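The case analysis above can be encoded as a small classifier that, given numeric values of $F_1$, $F_2$, $F_3$, and $\lambda$, reports which case applies, how $P_b(t_w)$ behaves over the feasible range, and, in Case A, where $Y(t_w)$ attains its minimum. This is a minimal sketch: the coefficients from (54)-(56) are not reproduced here and must be supplied by the caller, the function name is hypothetical, and Case C relies on the observation $t_{w_s} < t_{w_b}$ reported above rather than on a proof.

```python
from typing import Optional, Tuple


def classify_pb(f1: float, f2: float, f3: float,
                lam: float) -> Tuple[str, str, Optional[float]]:
    """Return (case, behaviour of P_b over the feasible t_w range, minimiser of Y or None)."""
    if not f1 > f3:
        raise ValueError("the Appendix establishes F1 > F3")
    if f1 + f2 <= 0 or f2 + f3 <= 0:
        raise ValueError("the Appendix establishes F1 + F2 > 0 and F2 + F3 > 0")
    if f3 > 0:
        # Case A: Y first decreases, then increases; its minimum value is positive.
        return "A", "increasing", (f2 + f3) / (lam * f3)
    if f3 < 0 and f1 > 0:
        # Case B: Y decreases monotonically from F1 + F2 to F1 > 0, so it stays positive.
        return "B", "increasing", None
    if f3 < 0 and f1 < 0:
        # Case C: Y changes sign once at t_ws; assuming t_ws < t_wb, as observed
        # in the Appendix, P_b decreases over the feasible range.
        return "C", "decreasing", None
    raise ValueError("F3 = 0 or F1 = 0 is not covered by Cases A-C")


print(classify_pb(f1=2.0, f2=1.0, f3=0.5, lam=0.5))  # ('A', 'increasing', 6.0)
```

Cases A and B together correspond to $\lambda < \lambda_t$ and Case C to $\lambda > \lambda_t$, so the returned behaviour matches the sign pattern used in Theorems 1 and 2.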