Wake-Up Scheduling for Energy-Efficient Mobile Devices

Recently, discontinuous reception (DRX) mechanisms and wake-up schemes (WuS) have been proposed to enhance the energy efficiency of 5G mobile devices and prolong their battery lifetime. Existing DRX and WuS commonly use pre-configured parameters that cannot be adjusted dynamically. In this paper, a novel wake-up scheduling (WuSched) concept is introduced to further improve the energy efficiency of WuS-enabled mobile devices while controlling the buffering delay in a dynamic manner. The main idea of WuSched is to use a fixed configuration of the wake-up scheme and adjust the scheduling of the wake-up signals dynamically based on actual traffic arrivals. For this purpose, two different optimization approaches of the wake-up scheduling concept are proposed, analyzed, and compared, namely offline and online wake-up schedulers (WuSched-Offline and WuSched-Online). First, the WuSched-Offline is analyzed analytically for Poisson traffic arrivals and optimized (offline) to balance the average delay and power consumption. Second, the WuSched-Online is proposed to take online decisions based on traffic prediction, which allows it to handle general and more complex traffic models. To this end, we develop a framework for the prediction of packet arrivals based on recurrent neural networks. Numerical results show that both wake-up schedulers outperform the ordinary WuS-based system, in which no wake-up scheduler is deployed. In particular, for predefined delay requirements of video streaming, audio streaming, and mixed traffic flow, the WuSched-Online reduces the power consumption of the baseline WuS by up to 36%, 28% and 9%, respectively. Results also show that the WuSched-Offline has slightly better energy efficiency than the WuSched-Online in the case of Poisson packet arrivals, for which it is optimized, while its power consumption is slightly higher than that of the WuSched-Online scheduler for realistic traffic scenarios.


I. INTRODUCTION
THE emerging fifth generation mobile networks (5G) promise to offer super-fast and ultra-low latency connectivity to end users, and are expected to enable a wide range of futuristic mobile applications and services such as augmented/virtual reality, cloud gaming, and ultra-high-definition video streaming [2]. Such improvements are vital to accommodate the ever-growing needs for increased data rates and enhanced quality-of-service (QoS). In particular, they are realized in New Radio (NR) based 5G systems by adopting larger transmission bandwidths, higher modulation orders, advanced coding techniques, and sophisticated multi-antenna schemes [3]. However, the use of such computationally intensive techniques commonly comes at the cost of higher energy consumption, which can deplete the mobile devices' battery rather quickly and is in itself one of the major causes of user dissatisfaction [4].
In general, the cellular modem is one of the primary energy-consuming elements of mobile devices, while the other units contribute only when they are used intensively [5], [6]. Furthermore, in current and future traffic trends, the data traffic of mobile users is mainly downlink-dominated [7]. Therefore, the development of power-saving mechanisms for cellular modems in receive mode is of paramount importance in order to extend the mobile devices' functionalities in 5G networks and beyond. To this end, the 3rd Generation Partnership Project (3GPP) has specified discontinuous reception (DRX) as the de facto power-saving mechanism for long-term evolution (LTE) based fourth-generation (4G) systems [8], [9] and NR based 5G systems [3], [10]. DRX enables the mobile device to reduce energy consumption by switching off the radio-frequency (RF) circuitry and other modules for long periods, activating them only for short intervals [11]. However, it has been shown in [12] that the time during which a mobile device monitors the physical downlink control channel (PDCCH) without any data allocation still has a major impact on battery consumption. Thus, further power-saving mechanisms are of great importance.

A. Wake-Up Based Access and State-of-the-Art
In the context of non-cellular networks, different power-saving mechanisms have been extensively studied and implemented, with specific focus on low-power wide-area networks (LPWAN) and wireless sensor networks (WSN) [13]. In this context, duty cycling has been the major mechanism for energy conservation in LPWANs/WSNs [14], [15]. In duty cycling, which resembles cellular DRX, nodes wake up and sleep periodically, which leads to idle listening and potential overhearing. To reduce idle listening, the concept of wake-up radio based access has recently been studied, e.g., in [16], [17]. Demirkol et al. [18] provided a comprehensive overview of and insight into the wake-up receiver (WRx), and investigated the benefits achieved with WRx along with the challenges observed in WSNs. In addition, they presented an overview of state-of-the-art hardware and networking protocol proposals as well as a classification of WRx schemes. Moreover, the authors in [19] introduced the concept of a wireless-powered wake-up receiver, which reduces the energy consumption of the wireless node considerably. The proposed receiver scavenges RF energy from the received signal to power its sensor, communication and processing blocks. The scheme can be used for a wide range of energy-constrained wireless applications such as wireless sensor actuator networks and machine-to-machine communications. Due to the large energy-saving potential of such wake-up radio based methods, similar concepts are also attracting increasing interest in cellular networks, primarily 5G NR [20], on which this paper focuses.
In order to reduce the energy consumption of unscheduled cycles in DRX, cellular wake-up schemes (WuS) have recently been proposed, e.g., in [5], [21]. In cellular WuS, or WuS for short, the mobile device periodically (every wake-up cycle) monitors a narrow-band wake-up signaling at specific time instants and subcarriers, which indicates to the device whether to process the upcoming PDCCH or remain in sleep mode. As soon as a packet arrives at the transmission buffer of the base station, the wake-up indicator is assumed to be sent at the next wake-up instant. Furthermore, a low-complexity WRx is required to decode the corresponding wake-up signaling and to acquire the necessary time and frequency synchronization [5], [22]. In [22], synchronization is one of our main considerations in the design of the wake-up signaling and the WRx. To this end, we used a built-in self-synchronizing signal structure and assumed a high-power high-precision oscillator to remove the need for a separate synchronization stage in the WRx. Our extensive simulation results [5], [22] verify that the proposed scheme can achieve very low misdetection (less than 1%) and false alarm rates for signal-to-noise ratios (SNRs) even below 0 dB. Furthermore, very high-quality synchronization can be obtained down to SNRs of −4 dB [22]. We also showed that such small error rates have very little impact on power consumption and buffering delay. Furthermore, in our previous work [23], [24], we introduced an offline method to optimize the WuS configuration (i.e., the wake-up cycle period) based on a delay bound under the assumption of Poisson traffic. When traffic dynamics vary over time, the WuS optimization method in [23], [24] requires reconfiguration of the WuS parameters, which need to be communicated to the mobile device, and thus increases the control signaling overhead as well as the associated energy consumption.

B. Contributions and Novelty
In this paper, we introduce a novel concept called wake-up scheduling (WuSched) to further improve the energy efficiency of mobile devices in cellular networks. The main idea is to start with a fixed WuS configuration and then adjust the scheduling of the wake-up signals dynamically by determining whether or not to wake up the mobile device. More precisely, in wake-up scheduling, the network does not send the wake-up indicator to the mobile device as soon as there is one (or more) packet arrival(s), but may instead wait to send it while taking different QoS and other requirements into account, specifically the latency constraint and the mobile device power consumption. The proposed concept concerns not only the physical layer (PHY): it uses WuS as a mechanism to reduce energy consumption at PHY and then relies on adequately scheduled wake-up signals from the medium access control (MAC) layer. In particular, offline and online optimizations of the wake-up scheduler parameters are proposed in this paper, namely WuSched-Offline and WuSched-Online. The offline optimization (WuSched-Offline) is based on the assumption that traffic arrivals follow a Poisson distribution, and it is analyzed analytically. The objective is to reduce the power consumption of the mobile device while satisfying delay requirements. The optimal solution for the tunable operational parameter of the WuSched-Offline, which is referred to as the buffer size threshold and which only concerns the network side (so that it can be easily reconfigured based on traffic dynamics), is obtained in closed form. Then, for general and thus very likely more complex traffic models, an online optimization is proposed through the WuSched-Online. It uses a proactive scheduler that takes decisions every wake-up cycle based on traffic predictions over a forecast horizon. A multi-step Long Short-Term Memory (LSTM) neural network is trained with data from real user applications and tailored for traffic prediction purposes. To the best of our knowledge, this is the first attempt to introduce online wake-up scheduling decisions with traffic prediction capabilities into the wake-up scheme. Unlike previous works [5], [23], [24], the WuSched-Online is not tied to any specific traffic model and operates dynamically.
The rest of this paper is organized as follows. Section II summarizes the WuS principle of operation and introduces the proposed wake-up scheduling concept. Section III mathematically models and optimizes offline the parameters of the wake-up scheduler (WuSched-Offline) for Poisson traffic. Then, the online optimization of the wake-up scheduler (WuSched-Online), which is valid for any traffic distribution, is presented and described in Section IV. These are followed by simulation results and conclusions in Sections V and VI, respectively. Finally, some proofs related to the WuSched-Offline are reported in the Appendices. For readers' convenience, the most relevant variables and mathematical operations used throughout this paper are listed in Table I. Terminology-wise, we use gNB to refer to the base-station unit and UE to denote the mobile device, according to NR specifications [3].
TABLE I: MOST IMPORTANT VARIABLES AND MATHEMATICAL OPERATIONS USED THROUGHOUT THE ARTICLE

A. WuS Overview
In WuS, the cellular modem is equipped with a WRx, a low-complexity, single-purpose companion receiver that decodes the wake-up signaling [5]. WuS allows the terminal to reduce its energy consumption by switching off the modem for long periods of time, activating the modem (ON mode) only for short intervals to decode data and control plane signals.
At every wake-up cycle (w-cycle), of length $t_w$, the WRx monitors the wake-up signaling for a specific on-duration time ($t_{on}$) to determine whether any data is scheduled (see Fig. 1). Occasionally, upon an interrupt signal from the WRx, the modem switches ON, decodes both the PDCCH and the physical downlink shared channel (PDSCH), and performs connected-mode procedures. The wake-up signaling in each w-cycle is a single bit, referred to as the wake-up indicator (WI), where 0 instructs the WRx not to wake up the modem (which remains in OFF mode) and 1 triggers the WRx to wake up the modem (which moves to ON mode) because there is a packet to receive [5]. When WI = 1 is sent to the WRx, the gNB expects the target mobile device to decode the PDCCH with a time offset equal to the start-up time ($t_{su}$). After successfully decoding the PDCCH/PDSCH, the UE starts its inactivity timer with a duration of $t_i$. If a new PDCCH message is received before the inactivity timer expires, the UE restarts the timer. Otherwise, a sleep period starts (the modem goes through a transitional power-down period of duration $t_{pd}$).
In WuS, if one or more packets arrive during the sleep state, the gNB sends WI = 1 to the target UE at the next wake-up instant (as shown in Fig. 1). However, if the WuS configuration (namely, $t_w$ and $t_i$) is not properly optimized for the upcoming traffic, waking up the UE immediately can either increase its energy consumption unnecessarily, reducing the benefit of using WuS (meaning that the UE could tolerate longer w-cycles), or, in the worst case, prevent the UE from satisfying its delay requirements (implying the need for shorter w-cycles) [24].

B. Wake-Up Scheduling
In our proposal, both the w-cycle ($t_w$) and the inactivity timer ($t_i$) are configured semi-statically, and the desired trade-off between power and delay is achieved by adjusting the wake-up instant. More precisely, the wake-up scheduler does not send WI = 1 as soon as there is a packet in the w-cycle, but waits until some condition is met; for instance, until the number of packets buffered at the gNB for a given UE reaches a predefined buffer size threshold ($\gamma$), or until the estimated average buffering delay exceeds a predefined threshold ($D_{\max}$). The former condition is the core of the WuSched-Offline and is illustrated in Fig. 2, where the gNB does not send WI = 1 until the number of buffered packets reaches $\gamma = 3$, which takes four w-cycles. This way, instead of being switched ON three times, the UE is switched ON only once, after the fourth w-cycle. Note that the buffer size threshold $\gamma$ influences the packet delays and thus establishes a trade-off between the energy consumption and the experienced delays. The latter condition, on the other hand, is used in the WuSched-Online in order to allow the network to meet the maximum tolerable delays of the target applications.
The main motivation for not sending WI = 1 as soon as a packet arrives at the gNB, but instead waiting and sending the packets consecutively, is that state-of-the-art modems suffer from long start-up and power-down stages [5]. Therefore, for energy efficiency, it is desirable that once the modem is in ON mode, it receives multiple packets rather than a single one. Waiting longer to buffer packets, however, increases the buffering delay. This extra buffering delay is not problematic as long as the average delay stays within a maximum bound. It is worth mentioning that WuS is a special case of the WuSched-Offline with $\gamma = 1$.
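The buffer-threshold rule can be illustrated with a minimal sketch. The function name `schedule_wake_ups` and the per-cycle arrival list are hypothetical, and the sketch assumes, as a simplification, that all buffered packets are delivered right after a wake-up so that the buffer is empty at the next w-cycle.

```python
from typing import List


def schedule_wake_ups(arrivals_per_cycle: List[int], gamma: int) -> List[int]:
    """Return the wake-up indicator (0 or 1) sent at the end of each w-cycle.

    Simplification (assumption): every buffered packet is delivered right
    after a wake-up, so the buffer is empty again at the next w-cycle.
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    indicators = []
    buffered = 0
    for arrivals in arrivals_per_cycle:
        buffered += arrivals
        if buffered >= gamma:
            indicators.append(1)  # threshold reached: wake the UE up
            buffered = 0          # buffer drained during the active period
        else:
            indicators.append(0)  # keep buffering, UE stays in OFF mode
    return indicators


arrivals = [1, 0, 1, 1]  # hypothetical number of packets per w-cycle
print(schedule_wake_ups(arrivals, gamma=3))  # [0, 0, 0, 1]
print(schedule_wake_ups(arrivals, gamma=1))  # [1, 0, 1, 1]
```

With this hypothetical arrival pattern, a threshold of 3 yields a single wake-up after the fourth w-cycle, whereas a threshold of 1 (plain WuS) wakes the UE up three times.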
Under wake-up scheduling, the ON and OFF periods of the UE vary with its traffic dynamics. For this purpose, we define the scheduling cycle as a full cycle of empty, dormant and active periods. The scheduling cycle starts at the expiry of the inactivity timer of the previous scheduling cycle and ends at the expiry of the current cycle's inactivity timer. The length of the scheduling cycle ($L$) is a random variable that depends on the buffer size threshold and the packet arrivals. During each scheduling cycle, WI = 1 is sent to the target UE only once. We denote by $N$ the (random) number of packets served in a scheduling cycle, which equals the overall number of packet arrivals in that scheduling cycle.
To help the reader, the different periods of the scheduling cycle are illustrated in Fig. 2 and defined as follows:
• empty period: It starts at the beginning of the scheduling cycle and lasts until the first packet of that scheduling cycle arrives at the gNB. During the empty period, the number of buffered packets is zero. The length of the empty period is a random variable denoted by $L_e$.
• dormant period: It starts as soon as the first packet arrives and lasts until the end of the start-up stage. During the dormant period, packets are buffered at the gNB until the number of buffered packets reaches $\gamma$. As a result, by the end of the corresponding w-cycle, the modem is switched ON and, after the start-up stage, the UE is ready to receive the packets. The length of the dormant period is a random variable denoted by $L_d$, and the number of packets buffered during the dormant period is denoted by $N_d$, which is greater than or equal to $\gamma$.
• active period: It starts after the end of the start-up period and lasts until the end of the scheduling cycle. During the active period, the modem is in ON mode and consumes the high power $PW_{on}$; it is either processing packets or its inactivity timer is running. The length of the active period is a random variable denoted by $L_a$, and the number of packets that arrive at the gNB during the active period is denoted by $N_a$. During the active period, the UE serves $N = N_d + N_a$ packets before it enters the next scheduling cycle.
The lengths of the different periods of each scheduling cycle are related by $L = L_e + L_d + L_a$. While the modem is in OFF mode, packets are buffered and the modem consumes the low power $PW_{off}$. In general, the UE power consumption in the different operating states is highly implementation dependent and also depends on the operational configuration. Owing to the specifically designed narrow-band WuS signal structure, the WRx power consumption ($PW_{wrx}$) is generally much lower than that of the modem in ON mode ($PW_{on}$). Following [5], [22], [25], $PW_{wrx} = 57$ mW, $PW_{on} = 850$ mW, and $PW_{off} = 16$ mW can be considered representative values, while the start-up and power-down periods are $t_{su} = 15$ ms and $t_{pd} = 10$ ms. Regarding the WuS parameters, we consider $t_{on} = 3/14$ ms and $t_i = 1$ ms [5]. Furthermore, since the on-duration of the WuS signaling is very short (only three OFDM symbols [22]), the WRx contributes very little to the device energy consumption. Therefore, in our system model, the WRx power consumption is ignored, i.e., we consider $PW_{wrx} \approx 0$. In the later numerical results, however, a non-zero WRx power consumption is considered.
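As a rough check of why the WRx contribution is small, the following sketch computes the WRx power averaged over one w-cycle. It assumes that the WRx draws $PW_{wrx}$ only during the on-duration and nothing otherwise, and it uses a w-cycle of 15 ms as an example value; the function name is hypothetical.

```python
PW_WRX_MW = 57.0      # WRx power while monitoring (mW), value quoted in the text
T_ON_MS = 3.0 / 14.0  # WRx on-duration per w-cycle (ms), value quoted in the text


def wrx_average_power_mw(t_w_ms: float) -> float:
    """Average WRx power over one w-cycle, assuming the WRx is active
    only during the on-duration (duty-cycle approximation)."""
    if t_w_ms < T_ON_MS:
        raise ValueError("the w-cycle cannot be shorter than the on-duration")
    return PW_WRX_MW * T_ON_MS / t_w_ms


t_w = 15.0  # example w-cycle length in ms (assumption for this sketch)
print(f"{wrx_average_power_mw(t_w):.3f} mW")  # 0.814 mW
```

Under these assumptions the averaged WRx contribution stays below 1 mW, which is small compared with the 850 mW drawn in ON mode.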
The wake-up scheduler can be located at the network side (e.g., in the MAC layer of the gNB), so that all the computationally intensive processing is performed by the network. Without loss of generality, we assume that the UE can process a single packet (regardless of its size) per transmission time interval (TTI) and that the packet arrival rate ($\lambda$) is at most one packet per TTI. A TTI of 1 ms is assumed. In general, because NR supports wide-bandwidth operation, packets can be served in a very short time. In addition, when user packets are small, packet concatenation within a TTI is used in NR, so that all packets arriving within a relatively short time window can be served in a single TTI. Accordingly, we assume that the radio-link control entity (located at the gNB) concatenates all packets arriving during a slot and that, as soon as the baseband unit (BBU) is switched on, the device can receive and decode the concatenated packets within a single TTI. If a new packet arrives during that slot, the BBU starts serving it at the end of the current slot. We also assume that packets are served individually in first-in first-out (FIFO) order. One of the key components of the 5G NR design is a flexible self-contained slot-based framework that allows significantly lower latency than LTE. This slot structure allows uplink and downlink scheduling, data, and acknowledgement to occur in the same slot. In other words, in each time slot, UEs can send their acknowledgment to the network, and the network can decide whether or not to retransmit the packet in the next inactivity period. In our work, we assume that the self-contained slot-based framework is used.
In the case of multimedia packet-data traffic, there is no strong need to guarantee a maximum delay budget per packet. Rather, from a user perspective, the delay over the radio interface should simply be lower than a maximum average packet delay ($D_{\max}$), whose value is set based on the service type. Even for typical constant-rate services such as voice and video, occasional (short-term) excess delays are often not an issue, as long as the average delay, taken over some relatively short time interval, remains constant. Moreover, maximum delay requirements are mainly used for ultra-reliable and low-latency communications (URLLC). Since our main focus in this paper is on multimedia-type traffic, we consider the average packet delay as the QoS indicator of the services.

III. OFFLINE OPTIMIZATION OF WAKE-UP SCHEDULING FOR POISSON TRAFFIC
In this section, the average power consumption and buffering delay of the wake-up scheduler are derived as a function of the buffer size threshold ($\gamma$) and the packet arrival rate of a Poisson process ($\lambda$). Then, $\gamma$ is optimized for a given $\lambda$ and a maximum delay bound ($D_{\max}$).
The WuSched-Offline can be modeled as a stationary GI/G/1 FIFO queuing system [26]. We use the properties of such a system to analyze the average delay and power consumption of the wake-up scheduler. In this section, packet arrivals are modeled according to a Poisson process for analytic simplicity and due to its attractive theoretical properties.
Let $T_n$ denote the inter-arrival time between the $n$th and $(n+1)$th packets at the gNB, where $T$ is exponentially distributed and hence $E[T] = 1/\lambda$ and $\mathrm{Var}[T] = 1/\lambda^2$. Furthermore, we define the buffering delay of the $n$th packet caused by the wake-up scheduler as $D_n$. Based on Fig. 2, the following expression is always valid,
$$D_{n+1} = D_n + W_n - T_n, \tag{1}$$
where $W_n$ is the time between the decoding of the $n$th and $(n+1)$th packets by the UE.
Depending on the relation between $T_n$ and $D_n$, three disjoint sets of packets can be defined:
• $X_1$: If $n \in X_1$, the $(n+1)$th packet arrives before the $n$th packet has been served ($T_n \le D_n + 1$). Therefore, the UE serves the $(n+1)$th packet immediately after the $n$th packet, i.e., $W_n = 1$. All packet arrivals during the dormant period (referred to as $X_d$) are part of $X_1$ (the last packet of the dormant period may or may not belong to $X_1$).
• $X_2$: If $n \in X_2$, the $(n+1)$th packet arrives after the inactivity timer is triggered and before it expires ($D_n + 1 < T_n \le D_n + 1 + t_i$). In this case, the $(n+1)$th packet is served immediately, $D_{n+1} = 0$, and based on (1), $W_n = T_n - D_n$.
• $X_3$: If $n \in X_3$, the $(n+1)$th packet arrives after the inactivity timer has expired ($D_n + 1 + t_i < T_n$). $X_3$ contains a single packet, which is the last served packet. Therefore, the $(n+1)$th packet belongs to the next scheduling cycle. As a result, $W_n = T_n - D_n + D_{n+1}$, where $D_{n+1}$ is the length of the next scheduling cycle's dormant period or, equivalently, the delay of the first packet in the next scheduling cycle.
For compactness, in the rest of the paper the subscript $n$ is dropped from the random variables $T_n$, $D_n$ and $W_n$, unless their dependence on $n$ needs to be emphasized explicitly. The calculation of $W_n$ is summarized in the second column of Table II.
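The three cases above can be sketched as a small classifier. The function `classify_packet` and its argument names are hypothetical; times are expressed in TTIs with a service time of one TTI per packet, and the $X_3$ branch assumes that (1) can be rearranged as $W_n = T_n - D_n + D_{n+1}$, which is consistent with the $X_2$ case where $D_{n+1} = 0$.

```python
from typing import Optional, Tuple

SERVICE_TIME = 1.0  # one packet is served per TTI (times in TTIs)


def classify_packet(t_n: float, d_n: float, t_i: float,
                    d_next: Optional[float] = None) -> Tuple[str, Optional[float]]:
    """Return (set name, W_n) for packet n.

    t_n: inter-arrival time between packets n and n+1
    d_n: buffering delay of packet n
    t_i: inactivity timer length
    d_next: buffering delay of packet n+1 (only needed in the X3 case)
    """
    if t_n <= d_n + SERVICE_TIME:
        # Packet n+1 is already waiting: served back to back.
        return "X1", SERVICE_TIME
    if t_n <= d_n + SERVICE_TIME + t_i:
        # Packet n+1 arrives while the inactivity timer runs, so D_{n+1} = 0.
        return "X2", t_n - d_n
    # Inactivity timer expired: packet n+1 opens the next scheduling cycle.
    w_n = None if d_next is None else t_n - d_n + d_next
    return "X3", w_n


print(classify_packet(0.5, 2.0, 1.0))                # ('X1', 1.0)
print(classify_packet(3.5, 2.0, 1.0))                # ('X2', 1.5)
print(classify_packet(10.0, 2.0, 1.0, d_next=20.0))  # ('X3', 28.0)
```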
We note that the WuSched-Offline is analyzed for Poisson traffic arrivals and thus, strictly speaking, cannot cover the case of retransmissions. This is because retransmissions would change the statistics of the packet arrivals (which would then include both new packets and retransmissions) depending on the channel quality, error model, and retransmission timings.

A. Stationary Probabilities
The stationary probabilities that the $n$th packet belongs to each of the three sets ($X_1$, $X_2$, $X_3$) need to be calculated to derive the delay and power expressions of the wake-up scheduler analytically. For this purpose, based on the definition of $X_1$ and $X_3$, we can write, and where $f_D(t)$ is the probability density function (PDF) of $D$. Therefore, based on (2) and (3), we can model their relation as follows, Also by using (4) and the probability assignment rule Furthermore, we can model the expected value of $W$ based on all possible values of $W_n$ (second column of Table II) by using the law of total probability as follows, Appendices B, C, D and E include the derivations of , respectively. Then, by substituting (4), (5), (31), (33), (34) and (36) into (6), we can obtain, Then, $\Pr[n \in X_3]$ and $\Pr[n \in X_2]$ can be calculated based on (4) and (5), respectively. The calculation of the stationary probabilities is summarized in the fourth column of Table II.
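As an illustration of how the set definitions translate into probabilities, the following is a sketch rather than the paper's exact expressions. It assumes that $T$ is exponential with rate $\lambda$ and that, conditioned on $D = t$, the probability of each event is obtained from the exponential distribution of $T$ weighted by $f_D(t)$.

```latex
\begin{align*}
\Pr[n \in X_1] &= \Pr[T \le D + 1]
  = \int_0^\infty \left(1 - e^{-\lambda (t+1)}\right) f_D(t)\,dt, \\
\Pr[n \in X_3] &= \Pr[T > D + 1 + t_i]
  = \int_0^\infty e^{-\lambda (t+1+t_i)} f_D(t)\,dt
  = e^{-\lambda t_i}\left(1 - \Pr[n \in X_1]\right), \\
\Pr[n \in X_2] &= 1 - \Pr[n \in X_1] - \Pr[n \in X_3]
  = \left(1 - e^{-\lambda t_i}\right)\left(1 - \Pr[n \in X_1]\right).
\end{align*}
```

Under these assumptions, once $\Pr[n \in X_1]$ is known, the other two probabilities follow directly, which matches the order in which the text obtains them.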

B. Average Holding Times
In this section, the average holding times (i.e., the length) of the empty and active periods, as well as the average number of packet arrivals during the dormant and active periods, are calculated. Note that we already derived the average length of dormant period in Appendix D.
1) Empty Period: If the $n$th packet belongs to $X_3$, then the $(n+1)$th packet is the first packet of the next scheduling cycle and hence the length of the empty period equals $T - D - 1 - t_i$. As a result, based on (32), where the term 1 is due to the presence of the first packet in each scheduling cycle.
3) Active Period: During the active period, first, $N_d$ packets are served for a duration of $N_d$ TTIs, and then the packets that arrive during these $N_d$ TTIs, whose average number is $N_d \lambda$, are served. After some rounds, there will be a point at which the inactivity timer expires and no buffered packets remain in the queue. Therefore, the average number of received packets during the active period can be modeled by a geometric progression as follows,
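The round-by-round argument can be sketched as a geometric series. This is a simplified reading, not necessarily the paper's exact expression: it assumes $0 < \lambda < 1$, that serving $m$ packets takes $m$ TTIs and brings $m\lambda$ new arrivals on average, and it ignores packets arriving while the inactivity timer runs.

```latex
\begin{equation*}
E[N_a] \approx E[N_d]\left(\lambda + \lambda^2 + \lambda^3 + \cdots\right)
       = E[N_d]\,\frac{\lambda}{1 - \lambda},
\qquad
E[N_d] + E[N_a] \approx \frac{E[N_d]}{1 - \lambda}.
\end{equation*}
```

The series converges only for $\lambda < 1$ packet per TTI, in line with the assumption on the arrival rate made in the system model.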

4) Scheduling Cycle: The average number of packets served during each scheduling cycle can be obtained as follows, Furthermore, the time for which the inactivity timer runs ($\omega$) depends on the packet inter-arrival time ($t_p$). If a packet arrives before $t_i$, $\omega$ is equal to the inter-arrival time; otherwise, $\omega$ equals $t_i$. Therefore, $\omega$ can be calculated as a function of $t_p$ as, By utilizing (11) and (13), we can obtain the average length of the active period as follows, Finally, the average length of the scheduling cycle ($L$) can be calculated as follows, where $C_1$ and $C_2$ are constants given by, The calculation of the average holding times and the average number of packets is summarized in Table III.
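The piecewise definition of $\omega$ amounts to $\omega = \min(t_p, t_i)$. The sketch below implements it and, assuming exponentially distributed inter-arrival times with rate $\lambda$, evaluates the standard closed form $E[\min(T, t_i)] = (1 - e^{-\lambda t_i})/\lambda$. The function names are hypothetical, and the closed form is given for illustration rather than as the paper's own expression.

```python
import math


def inactivity_duration(t_p: float, t_i: float) -> float:
    """Time the inactivity timer actually runs: t_p if a packet arrives
    before the timer expires, otherwise the full timer length t_i."""
    return t_p if t_p < t_i else t_i


def mean_inactivity_duration(lam: float, t_i: float) -> float:
    """E[min(T, t_i)] for T ~ Exp(lam), i.e. (1 - exp(-lam * t_i)) / lam."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return -math.expm1(-lam * t_i) / lam


print(inactivity_duration(0.4, 1.0))  # 0.4
print(inactivity_duration(2.0, 1.0))  # 1.0
print(round(mean_inactivity_duration(0.5, 1.0), 4))  # 0.7869
```

For very low arrival rates the mean approaches $t_i$, since the timer then almost always runs to expiry.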

C. Average Power Consumption
The average power consumption of the UE with the wake-up scheduler, denoted by $P_c$, can be calculated as the ratio of the average energy consumption to the corresponding overall observation period, expressed as, where $e_t$ and $t_t$ are the energy consumption of the transitional states and the overall time that the UE spends in transitional periods, which respectively read as, Due to the negligible power consumption of the UE in OFF mode, we can further assume that $PW_{off} \approx 0$. Therefore, (17) can be expanded as a function of $\gamma$ as follows, From the above equation, it is clear that the average power consumption $P_c(\gamma)$ is a strictly decreasing function of $\gamma$ for $\gamma \ge 1$, i.e., $\frac{dP_c(\gamma)}{d\gamma} < 0$. As expected, increasing the buffer size threshold reduces the power consumption.

D. Average Buffering Delay
By squaring both sides of (1) and applying basic algebraic manipulations, we can obtain the following equation, where Then, by averaging both sides of (20), we get, In Appendices G and H, we present the calculations of $\mathrm{Cov}[D, T]$ and $E[H]$. Finally, the average delay can be obtained by substituting (43) and (51) into (22), as follows, For presentation purposes, we represent $E[D]$ as $D(\gamma)$. Similar to $P_c(\gamma)$, the derivative of $D(\gamma)$ with respect to the continuous variable $\gamma$ can be calculated (refer to Appendix I), from which it can be concluded that the average buffering delay $D(\gamma)$ is a strictly increasing function of $\gamma$ for $\gamma \ge 1$, i.e., $\frac{dD(\gamma)}{d\gamma} > 0$. As expected, contrary to the behavior of $P_c(\gamma)$, increasing the buffer size threshold increases the buffering delay. Therefore, a clear energy-delay trade-off appears in the selection of $\gamma$ for the wake-up scheduler.

E. Offline Optimization of Wake-Up Scheduler
From a system-level point of view, the tunable parameter of the WuSched-Offline is the buffer size threshold ($\gamma \ge 1$), assuming a fixed configuration of the w-cycle and the inactivity timer. For compactness, we do not investigate how to set these two parameters; readers can refer to our recent work in [24]. The remaining parameters of the wake-up scheduler ($t_{on}$, $t_{pd}$, $t_{su}$) depend on physical constraints and signal design, and accordingly, we assume them to be fixed as well. Based on these assumptions, we focus on optimizing the buffer size threshold ($\gamma$) in order to minimize the UE's power consumption while satisfying a specific delay requirement (i.e., the average buffering delay should be less than or equal to a maximum tolerable delay, $D_{\max}$), under the Poisson traffic model assumption, for given values of $t_w$, $t_i$, $t_{on}$, $t_{pd}$ and $t_{su}$.
By using the analytical models of the power consumption and the buffering delay, as well as their behaviour as a function of $\gamma$ (i.e., $P_c(\gamma)$ in (19) is a decreasing function and $D(\gamma)$ in (23) is an increasing function), and by following an approach similar to the one in [24], the optimal buffer size threshold ($\gamma^*$) can be easily obtained. The result is stated in Theorem 1.
Theorem 1: The optimal buffer size threshold that minimizes the UE's power consumption while satisfying a specific delay requirement is $\gamma^* = \gamma_m$, where $\gamma_m$ is the boundary point of the delay constraint, i.e., $D(\gamma_m) = D_{\max}$.
Proof: Since $\frac{dP_c(\gamma)}{d\gamma} < 0$ and $\frac{dD(\gamma)}{d\gamma} > 0$, we can show that $\gamma = \gamma_m$ is the optimal solution that minimizes the UE's power consumption subject to a specific delay requirement, as detailed next. Fig. 3 (a) and Fig. 3 (b) show the decreasing trend of the power consumption and the increasing behaviour of the delay as a function of $\gamma$, respectively, in agreement with $\frac{dP_c(\gamma)}{d\gamma} < 0$ and $\frac{dD(\gamma)}{d\gamma} > 0$. Consider an arbitrary point C in the interior of the feasible region for $\gamma$ ($\gamma_C < \gamma_m$, where $D(\gamma_m) = D_{\max}$). As can be seen from Fig. 3, there is always a point at the boundary of the delay constraint, denoted by D ($\gamma_D = \gamma_m$), whose power consumption $P_{cD}$ is lower than that of C ($P_{cD} < P_{cC}$), because the power consumption is strictly decreasing in $\gamma$. Hence, under a given delay constraint, the point $\gamma_m$ always exists and attains the lowest power consumption within the feasible region, and it is therefore the optimal solution. The value of $\gamma_m$ can be calculated by applying any standard root-finding algorithm to $D(\gamma) = D_{\max}$.
Fig. 4 shows how $\gamma^*$ changes when $\lambda$ varies for delay bounds of 30 ms, 75 ms and 500 ms, with $t_i = 1$ ms and $t_w = 15$ ms. It clearly shows that $\gamma^*$ increases with $\lambda$. A high buffer size threshold reduces energy consumption; however, if the packet arrival rate is low, a high buffer size threshold increases the buffering delay and may violate the maximum delay bound. As a result, a smaller $\gamma$ should be configured for a low $\lambda$ to satisfy the delay requirement. For high $\lambda$, $\gamma$ should be increased to reduce energy consumption. Similarly, for higher delay bounds, $\gamma$ can be configured high, owing to the much more relaxed delay requirements. Interestingly, for high packet arrival rates close to 1 p/TTI, $\gamma$ reduces to one, implying that the UE is in ON mode most of the time (because of the inactivity timer, the UE rarely enters OFF mode). This is the main reason for limiting $\lambda$ to less than 1 p/TTI. The wake-up scheduler is therefore no longer effective for packet arrival rates close to or beyond 1 p/TTI. Instead, other power-saving mechanisms, such as microsleep, could be used. Finally, as can be observed in Fig. 4, $\gamma^*$ (precisely, $\gamma_m$) follows a linear trend with respect to $\lambda$ for lower packet arrival rates, and this can be exploited to reduce the computational complexity of root-finding algorithms.
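As an illustration of the root-finding step, the following Python sketch uses bisection to find the largest integer threshold whose delay does not exceed the bound, relying only on the monotonicity of $D(\gamma)$ stated in the proof. The function `toy_delay` is a hypothetical stand-in and is not the closed-form delay derived in this paper; the names `largest_feasible_threshold` and `gamma_hi` are likewise assumptions made for this example.

```python
def largest_feasible_threshold(delay, d_max, gamma_hi=10_000):
    """Return the largest integer gamma >= 1 with delay(gamma) <= d_max.

    Assumes delay() is increasing in gamma. Returns None when even
    gamma = 1 violates the delay bound.
    """
    if delay(1) > d_max:
        return None
    lo, hi = 1, gamma_hi
    if delay(hi) <= d_max:
        return hi  # bound is never reached within the search range
    # Invariant: delay(lo) <= d_max < delay(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if delay(mid) <= d_max:
            lo = mid
        else:
            hi = mid
    return lo


def toy_delay(gamma, lam=0.2, t_w=15.0, t_on=3 / 14, t_su=15.0):
    # Hypothetical increasing delay model (in ms), for illustration only.
    return (gamma - 1) / (2 * lam) + t_w / 2 + t_on + t_su


if __name__ == "__main__":
    print(largest_feasible_threshold(toy_delay, d_max=30.0))
```

Taking the floor of the continuous root in this way keeps the resulting delay at or below the bound, at the cost of a slightly smaller threshold than $\gamma_m$.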

IV. ONLINE OPTIMIZATION OF WAKE-UP SCHEDULING BASED ON TRAFFIC PREDICTION
In this section, we present the online optimization of the wake-up scheduler, which aims at trading off power consumption against packet delay in a dynamic manner by adaptively and autonomously determining when to send the WI, according to the traffic pattern and a maximum tolerable delay ($D_{\max}$). Unlike the WuSched-Offline, which was presented and modeled analytically in Section III, the WuSched-Online does not assume any a priori knowledge about the traffic statistics, and thus it is general and can be applied to all traffic distributions as well as mixed traffic combinations.
Knowing the packet arrival times over a forecast horizon in advance allows the UE to remain in OFF mode for longer periods. In this regard, the proposed wake-up scheduler extends the sleep period of the UE as much as possible in a greedy manner by not sending WI = 1 until the average buffering delay approaches $D_{\max}$. For this purpose, the average delay is estimated over k packets in every w-cycle.
In the proposed scheme, a traffic predictor forecasts the packet arrival times of the target UE over a forecast horizon of one w-cycle, based on past packet arrival times. In other words, the traffic predictor observes the session's packet arrival times for the p previous TTIs up to the beginning of the current TTI (c) and then predicts the packet arrival times for the upcoming w-cycle, with TTI indexes in $[c, c + t_w)$. Note that, unlike the WuSched-Offline, the WuSched-Online can also cover retransmissions, by taking the packet arrival times of previous [...]
Furthermore, in every w-cycle, a delay estimator block estimates the average buffering delay (D) of k packets, assuming that the UE is switched on at the end of the upcoming w-cycle. If D is higher than $D_{\max}$, the network concludes that the only way to obtain a shorter delay is to send WI = 1 promptly. Otherwise (if D ≤ $D_{\max}$), it lets the UE remain in OFF mode for at least another w-cycle. Finally, a delay comparator block performs the comparison and makes the decision (i.e., whether to send WI = 1 or WI = 0) accordingly.
The overall block diagram of the proposed WuSched-Online is shown in Fig. 5. The different modules and variables are described below.

A. Dataset From Real Traces
In this paper, the performance of the WuSched-Online is investigated using real video and audio streaming traces. For this, we monitored one operative network in Spain for one month using the online watcher presented in [27]. We have selected only those traces gathered during the night hours (1 am to 6 am) to ensure that the selected cell is serving very few users. This allows us to assume that our traces are not affected by the packet scheduler at the base station, since an adequate number of radio resources per TTI is available to accommodate all the transmitting UEs.
Our dataset includes two columns: the identifier of the UE and the timestamp of the packet arrival (with TTI granularity). The classifier introduced in [28] is used to select the traces of the apps of interest. The collected dataset consists of 1500 sessions of different traffic types. For the sake of comparison, we also generated Poisson traffic with a mean packet arrival rate of 0.2 p/TTI and added it to the dataset.

B. Traffic Predictor
The traffic prediction can be formulated as a time series forecasting problem, where the packet arrivals at each TTI are defined as the values of the time series. The dataset with size z for a particular traffic type is represented by $x_t|_{1}^{z}$, where $x_t$ indicates the packet arrival time during the t-th TTI. In this work, we tailor a stacked LSTM neural network architecture [29] to predict the next packet arrivals over a finite horizon. We choose LSTM since it has been shown in [29]-[31] to have lower prediction errors than other time series forecasting approaches, such as auto regressive integrated moving average (ARIMA) [32].
In the proposed architecture, multiple LSTM units are concatenated to form one layer of the LSTM network. Each unit computes the operations on a single TTI and transfers the output to the next LSTM unit. The number of concatenated units indicates the number of TTIs (p) that are considered before making the prediction. The proposed architecture for the traffic predictor is depicted in Fig. 6. The LSTM unit of each layer extracts a fixed number of features, which are passed to the next layer. The depth of the network (i.e., the number of layers) serves to increase the accuracy of the prediction, which is produced by the last fully connected layer.
As shown in Figs. 5 and 6, the proposed network observes $x_t|_{c-p}^{c-1}$ and then predicts the traffic in the upcoming w-cycle, $\tilde{x}_t|_{c}^{c+t_w-1}$, by delaying the prediction for the duration of $t_w$. Finally, the output of the LSTM network ($h_t|_{c}^{c+t_w-1}$) is fed to a fully connected neural network that performs the actual prediction. The last feed-forward layer applies the softmax activation function, which is needed during the training phase to optimize the weights of the network neurons [30]. The first layer size corresponds to the p observed TTIs, while the last layer output has a length equal to the future horizon $t_w$.
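The observation and forecast windows described above can be illustrated with a small Python sketch that slices a per-TTI series into training pairs of p past values and $t_w$ future values. The helper name `make_windows` and the use of plain lists are assumptions made for this example; the preprocessing actually used with the LSTM is not specified here.

```python
def make_windows(series, p, t_w):
    """Split a per-TTI series into (past p values, next t_w values) pairs.

    For each current TTI index c, the input covers TTIs c-p .. c-1 and
    the target covers TTIs c .. c+t_w-1.
    """
    pairs = []
    for c in range(p, len(series) - t_w + 1):
        past = series[c - p:c]
        future = series[c:c + t_w]
        pairs.append((past, future))
    return pairs


if __name__ == "__main__":
    toy_series = list(range(10))  # placeholder values, not real traces
    for past, future in make_windows(toy_series, p=3, t_w=2):
        print(past, "->", future)
```

Each pair corresponds to one training sample, with the input length matching the first layer size (p) and the target length matching the output size ($t_w$).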
The traffic predictor is trained using the dataset in Section IV-A and specialized for each of the considered traffic types. In particular, we have trained the LSTM for four traffic profiles: YouTube videos, Spotify audio, mixed YouTube/Spotify, and Poisson traffic. The traffic prediction algorithm was implemented in Python, using Keras with TensorFlow as backend. The chosen hyperparameters are reported in Table IV. The number of hidden layers is fixed to 5, which gives a good trade-off between prediction accuracy and model complexity. For training, we used the Adam algorithm [33] as optimizer and the Mean Absolute Percentage Error (MAPE) as loss function. We define the MAPE as in (24), where $\tilde{x}_t$ is the predicted packet arrival time on the t-th TTI.
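For reference, a minimal Python sketch of the MAPE in its standard textbook form is given below. It is assumed, not confirmed, that this matches the exact normalization and summation range of (24); the guard `eps` against zero-valued actual samples is also an assumption of this example.

```python
def mape(actual, predicted, eps=1e-9):
    """Mean absolute percentage error between two sequences, in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for x, x_hat in zip(actual, predicted):
        # eps avoids division by zero when an actual value is zero
        total += abs(x - x_hat) / max(abs(x), eps)
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    print(mape([10.0, 20.0], [9.0, 22.0]))
```

Note that, with this guard, a zero actual value paired with a non-zero prediction produces a very large term, so such samples may need separate treatment in practice.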

C. Delay Estimator
We categorize the packet arrivals during the past observation window $[c-p, c)$ and the forecast horizon $[c, c+t_w)$ into three disjoint sets: (1) already served packets with index $1 \leq n \leq i$, (2) buffered packets with index $i+1 \leq n \leq j$, where $j \leq p$, and (3) forecast packet arrivals for the upcoming w-cycle with index $j+1 \leq n \leq k$, where $k-j \leq t_w$. The delay estimator uses the delays of the served packets ($D_n$, for $1 \leq n \leq i$) and the estimated delays of the buffered and forecast packets ($D_n$, for $i+1 \leq n \leq k$) to estimate the average buffering delay (D), as given in (25).
Finally, the decision whether or not to send WI = 1 is made by comparing D with $D_{\max}$. If the estimated delay is larger than the maximum delay bound, WI = 1 is sent to the target UE.
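The estimator and comparator can be sketched in Python as follows. The sketch assumes that the average in (25) is a plain arithmetic mean over the k packets and that the delay of a buffered or forecast packet is measured from its arrival TTI to the end of the upcoming w-cycle; the optional `startup` offset and all function names are hypothetical and introduced only for this illustration.

```python
def estimate_average_delay(served_delays, buffered_arrivals,
                           forecast_arrivals, c, t_w, startup=0.0):
    """Estimate the average buffering delay over served, buffered and
    forecast packets, assuming the UE is switched on at the end of the
    upcoming w-cycle (TTI index c + t_w, plus an optional offset)."""
    wake_time = c + t_w + startup
    pending = list(buffered_arrivals) + list(forecast_arrivals)
    estimated = [wake_time - arrival for arrival in pending]
    delays = list(served_delays) + estimated
    if not delays:
        return 0.0  # no packets: nothing is waiting
    return sum(delays) / len(delays)


def wake_up_indicator(avg_delay, d_max):
    """Return WI = 1 if the estimated delay exceeds the bound, else 0."""
    return 1 if avg_delay > d_max else 0


if __name__ == "__main__":
    d = estimate_average_delay([10, 20], [95], [105], c=100, t_w=15)
    print(d, wake_up_indicator(d, d_max=40))
```

With WI = 0 the UE stays in OFF mode for at least another w-cycle, after which the estimate is recomputed with the updated observations and forecast.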

V. NUMERICAL RESULTS
In this section, a set of numerical results is provided in order to evaluate the accuracy of the traffic predictor used for the online optimization of the wake-up scheduler (WuSched-Online, in Section V-A) and to validate the functionality of the proposed wake-up schedulers (WuSched-Offline and WuSched-Online) for different traffic patterns, including Poisson traffic (in Section V-B) and realistic traffic (in Section V-C).
As previously mentioned, four traffic types are considered: video streaming, audio streaming, mixed audio/video streaming, and Poisson traffic. One of the distinguishing features of video and audio streaming is their low playback latency. The average latency required for high-quality playback of a track is 265 ms [34]. Accordingly, for audio streaming, we assume that the maximum delay bound ($D_{\max}$) is 265 ms. Similarly, we assume that the maximum delay bounds for video streaming, mixed flow and Poisson traffic are 40 ms, 40 ms, and 30 ms, respectively. Furthermore, for the numerical results, a UE power consumption model similar to [5], [8], [22], [25] is used, for which $PW_{wrx} = 57$ mW, $PW_{on} = 850$ mW, $PW_{off} = 16$ mW, $t_{su} = 15$ ms, and $t_{pd} = 10$ ms. Regarding the WuS parameters, we assume $t_{on} = 3/14$ ms and $t_i = 1$ ms [5].
Three different sets of performance results, in terms of power consumption and delay, are presented, namely: (1) the wake-up scheme without scheduler ('WuS'), which is considered as the benchmark scheme, (2) the offline wake-up scheduler ('WuSched-Offline'), and (3) the online wake-up scheduler ('WuSched-Online'). According to Theorem 1 and (23), the WuSched-Offline needs to know the packet arrival rate a priori in order to calculate the optimal buffer size threshold. Therefore, in this work we assume that the packet arrival rate is estimated based on an exponential moving average, as proposed in [35]. The authors in [35] introduce an approach to estimate the packet arrival rate, and they show that their method converges to the actual packet arrival rate under a wide range of traffic types.
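To make the rate estimation step concrete, the Python sketch below implements a generic exponential moving average over per-TTI arrival counts. It is only an illustration of the general technique: the exact update rule and smoothing factor of [35] are not reproduced here, and `alpha` and `initial` are assumed parameters.

```python
def ema_arrival_rate(arrivals_per_tti, alpha=0.05, initial=0.0):
    """Generic exponential moving average of the packet arrival rate.

    arrivals_per_tti: iterable with the number of packets seen in each TTI.
    alpha: smoothing factor in (0, 1]; larger values react faster.
    Returns the rate estimate (packets per TTI) after the last sample.
    """
    rate = initial
    for count in arrivals_per_tti:
        rate = alpha * count + (1.0 - alpha) * rate
    return rate


if __name__ == "__main__":
    # Placeholder pattern with one packet every fifth TTI (0.2 p/TTI).
    pattern = [1, 0, 0, 0, 0] * 200
    print(ema_arrival_rate(pattern))
```

The estimate can then be supplied as $\lambda$ when computing the buffer size threshold; a smaller `alpha` gives a smoother but slower-reacting estimate.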

A. Prediction Accuracy
In this section, we seek to evaluate the accuracy of predictions of the proposed traffic predictor as a function of the number of previous observations (p), the length of the horizon (t w ), and the type of applications generating the traffic. For that, we use the MAPE in (24) to quantify the accuracy of traffic prediction.
The impact of $t_w$ and p on the prediction errors is illustrated in Fig. 7. For shorter w-cycles, the predictions follow the actual values closely, whereas for larger w-cycles, the prediction error is bigger: longer forecast horizons ($t_w$) decrease the accuracy of the predictor, as expected. Furthermore, as can be observed, the MAPE decreases with a larger number of observations (p) for all four traffic types. The accuracy also varies with the traffic type. The accuracy degradation is smaller for Poisson packet arrivals than for video and audio traffic, due to its simpler traffic pattern. For Poisson traffic, the MAPE increases by around 15% when $t_w$ increases from 10 to 30 TTIs for a given p = 20 TTIs; for the other traffic types, however, the accuracy reduction is larger and the MAPE increases by around 50% for the same change in $t_w$.
As shown in Fig. 7, from the prediction accuracy point of view, it is desirable to reduce $t_w$ and enlarge p. However, in terms of power consumption, such a reduction of the w-cycle would lead to a higher energy consumption due to more frequent checking of the wake-up signaling. Additionally, a higher number of past observations p involves a longer memory length of the LSTM network and a large amount of information that must be stored for a precise traffic prediction. As a result, the floating point operations per second (FLOPS) required by the LSTM network increase. This complexity overhead can become very high, especially if the number of users per cell increases.
Note that the parameters of the traffic predictor can be configured in such a way that they provide adequate precision for the WuSched-Online, which is measured in terms of the estimated delay over a certain number of packets k (i.e., D in (25)). In particular, the impact of traffic prediction errors on the estimated delay depends on p, k and $t_w$. To ensure efficient usage of the forecast horizon and, at the same time, limit the long-term differences in the quality-of-service to an acceptable level, k should be set larger than $t_w$ for the upcoming w-cycle. At the same time, k should be sufficiently small so that prediction errors are not strongly noticed by a user. In this work, we set k to 45 packets.
From (25), it can be inferred that the estimated delay has low sensitivity to the prediction accuracy. To illustrate this, we evaluate the impact of the prediction errors on the actual WuSched-Online performance. Fig. 8 depicts the power consumption of the WuSched-Online as a function of p and $t_w$, for each traffic type, considering the associated maximum delay bounds. It can be observed that configuring p and $t_w$ to 20 and 15 TTIs, respectively, achieves a reasonable power saving. Indeed, further reducing $t_w$ and/or further increasing p beyond these values reduces the power consumption only slightly. Accordingly, for the rest of the paper, we assume k = 45 packets, $t_w$ = 15 TTIs, and p = 20 TTIs.

B. Performance Evaluation: Poisson Packet Arrivals
In this section, we investigate the performance of the three methods (WuSched-Online, WuSched-Offline, and WuS) in terms of average buffering delay and average power consumption when the traffic follows a Poisson pattern and the packet arrival rate ($\lambda$) is increased from 0 to 1 p/TTI. For this purpose, Figs. 9 and 10 show the average delay and power consumption of the proposed mechanisms under delay bounds of 23 ms and 30 ms, respectively.
Fig. 9 (a) depicts the average packet delay experienced by the WRx-enabled UE as the packet arrival rate varies. As can be observed, the average delay for WuS is about $D_{\max} = 23$ ms for lower arrival rates. Note that, in the case of WuS, the average delay depends on the start-up period and the w-cycle. For the WuSched-Offline, the experienced delay closely follows the maximum delay bound over a wider range of packet arrival rates, and is slightly shorter than the maximum tolerable delay. This is because the buffer size threshold is set to the greatest integer less than or equal to the optimal solution of the optimization problem. For the WuSched-Online, the actual average delay is slightly higher than the maximum delay bound. The main reason for this small excess delay is the unavoidable error in the traffic predictions, whose impact depends on the w-cycle. In practice, to compensate for this excess delay, the delay bound can be set slightly smaller than the actual average delay requirement. Finally, for larger arrival rates, the delays of all three methods drop sharply. This is because of the inactivity timer, which keeps the UE in the active state most of the time at high arrival rates, so that the overall delay reduces to the packet processing delay.
Fig. 9 (b) compares the average power consumption of the three methods under Poisson arrivals. As can be seen, the simulated results of the WuSched-Offline closely follow the analytical results. Interestingly, the optimal buffer size threshold increases with $\lambda$, as shown in Fig. 4. Based on $\frac{dP_c}{d\gamma} < 0$, one would expect the average power consumption to decrease as $\lambda$ increases; however, Fig. 9 (b) shows the opposite. This is explained by the fact that, while $\gamma^*$ increases, $\lambda$ also increases, which raises the power consumption due to more frequent packet processing, and this effect contributes more to the mean power consumption than the power reduction due to the larger $\gamma$. Additionally, there are some sharp reductions in the power consumption at lower packet arrival rates, each caused by $\gamma$ increasing by one unit. Furthermore, WuS and WuSched-Offline yield similar power consumption for lower packet arrival rates; however, WuSched-Offline clearly consumes less power than WuSched-Online and WuS for larger packet arrival rates. This shows that there is a need to reconfigure and optimize WuS for different packet arrival rates. Also, the WuSched-Online outperforms WuS for higher packet arrival rates. Finally, for high packet arrival rates, all three methods approach a fully modem-ON scenario with a power consumption of 850 mW.
Similar to Fig. 9, Fig. 10 shows the buffering delay and average power consumption of the proposed methods under a delay bound of 30 ms. As can be observed in Fig. 10 (a), the average delay for WuS is much lower than $D_{\max} = 30$ ms. However, the proposed wake-up schedulers behave consistently and adapt themselves to the new delay requirement, similar to Fig. 9 (a). Furthermore, Fig. 10 (b) compares the average power consumption of the three methods. It is clear that WuSched-Offline consumes less power than WuSched-Online and WuS. Also, the WuSched-Online outperforms WuS.

C. Performance Evaluation: Realistic Traffic
In this section, the average power consumption and the buffering delay of the three methods (WuSched-Online, WuSched-Offline, and WuS) are evaluated for different realistic traffic patterns. Fig. 11 shows the empirical cumulative distribution function (CDF) of the packet delay for the four different traffic types. Generally, a video streaming session is much longer than an audio session, and its packets arrive in bursts (implying high self-similarity). As can be observed in the video results of the WuSched-Online, a large number of packets are served with near-zero delay, because consecutive packet arrivals are served while the inactivity timer is running. At the same time, a large number of packets are served with delays larger than the maximum delay budget of video (40 ms), which comes from the fact that the WuSched-Online is a greedy method and waits until the average buffering delay approaches $D_{\max}$. Compared to the WuSched-Online, the WuSched-Offline achieves a similar average buffering delay (sketched with dashed vertical lines); however, it has packets with longer delays (e.g., for video, there are packets with delays over 65 ms). Furthermore, WuS has a lower and consistent delay regardless of the traffic type. However, this comes at the cost of extra energy consumption (as will be shown in Table V).
For the mixed traffic flow (aggregation of video and audio traffic), the average delays are similar to those of video traffic rather than those of audio traffic. The reason is that the delay bound, which is the same for mixed and video traffic, plays a pivotal role in the operation of the wake-up scheme. The small difference between mixed and video traffic comes from the inaccuracy of the traffic predictor. Additionally, the WuSched-Offline satisfies the delay requirements by optimizing the buffer size threshold based on the estimated packet arrival rate and the delay bound. As shown in Fig. 11, the average delays of the WuSched-Online for the different traffic types are slightly higher than $D_{\max}$, which stems from prediction inaccuracy. Therefore, in order to satisfy the delay requirements, $D_{\max}$ for the WuSched-Online could be set slightly lower than the actual delay requirements.
To complete the study, Table V shows the average delay and the average power consumption in the third and fourth columns, respectively. It is clear that the average power consumption of WuS is higher than that of the WuSched-Online for all traffic types; however, WuS achieves a much lower buffering delay. Furthermore, the WuSched-Offline only outperforms the WuSched-Online for the case of Poisson traffic, whereas for the realistic traffic types the WuSched-Online outperforms the WuSched-Offline. To better illustrate the benefits of the wake-up schedulers, we define the wasted energy ($E_w$) as the ratio (in percentage) of the energy that the UE consumes in transitory states plus the inactivity timer to the overall energy consumption of the UE. Note that the rest of the energy is consumed for processing the packets. The wasted energy $E_w$ is shown in the fifth column of Table V. As can be observed, the gain of the WuSched-Online comes from a smaller amount of wasted energy, owing to an intelligent and greedy strategy whereby packets are served mainly in a consecutive manner without the need for frequent start-ups and power-downs. For the case of Poisson arrivals, both wake-up schedulers have a similar CDF shape, with a small difference that stems from prediction errors. Moreover, it can be observed that audio streaming requires lower power consumption than the other traffic types, due to the small number of packet arrivals per given time period. Furthermore, because packets in video streaming and the mixed traffic flow have much higher self-similarity, their wasted energy is slightly lower than that of the other traffic types.
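The wasted-energy ratio defined above can be written as a short Python helper. The energy components are passed in as plain numbers in a common unit; the function and argument names are hypothetical, and the input values in the usage line are placeholders rather than measured results.

```python
def wasted_energy_percent(e_transitions, e_inactivity, e_processing):
    """Share (in percent) of energy spent in transitory states and during
    the inactivity timer, relative to the total energy consumption."""
    total = e_transitions + e_inactivity + e_processing
    if total <= 0:
        raise ValueError("total energy must be positive")
    return 100.0 * (e_transitions + e_inactivity) / total


if __name__ == "__main__":
    # Placeholder energies in arbitrary units.
    print(wasted_energy_percent(10.0, 15.0, 75.0))
```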
The computational complexity of the WuSched-Offline can be lower than that of the WuSched-Online, since it does not use the predictive framework, which requires additional processing. However, the computational complexity per cell can most likely be kept feasible even for larger UE populations, especially in applications such as machine-type communication, where group-specific wake-up signaling could be utilized instead of UE-specific signaling, which further reduces the signaling overhead. Users that have a similar traffic type can be grouped, and the network can use the same wake-up sequences and the same predictive entities for them. Overall, the computing capabilities of base stations and other network entities are continuously growing; hence, we believe that executing the predictive entity will remain feasible as networks evolve.

VI. CONCLUSION
In this work, the concept of wake-up scheduling and two optimizations (offline and online) of its parameters are proposed. The offline optimization of the wake-up scheduler is analyzed mathematically for Poisson packet arrivals. In addition, the feasibility of the online optimization of the wake-up scheduler based on user traffic prediction has been investigated. For this purpose, a traffic predictor that leverages LSTM networks is also proposed. A detailed and extensive analysis comparing the power consumption and buffering delay of both wake-up schedulers was carried out, under different traffic types and various design parameters. Both wake-up schedulers were shown to achieve a lower energy consumption than the wake-up scheme without scheduler. Moreover, the online optimization of the wake-up scheduler outperforms the offline one for realistic traffic types. These promising results motivate jointly considering user traffic prediction and wake-up scheduling in order to reduce the energy consumption of users under different traffic conditions. Based on the numerical results provided in this paper, our view regarding wake-up scheduling is that there is no 'one-size-fits-all' solution, unless the UE is well-defined and narrowed to a specific traffic type. Further interesting research areas include extending the proposed framework to autonomously combine and utilize different wake-up schedulers and power saving mechanisms together, and selecting the method that best fits particular circumstances. While this work considered FIFO, which does not discriminate between different traffic QoS requirements, our future work will consider weighted fair queuing for wake-up scheduling in order to satisfy the diverse QoS requirements of different services.

APPENDIX A
In this section, we prove that if $T$ has an exponential distribution with mean $1/\lambda$, then $T' = T - t$ has the same distribution as $T$ for $t > 0$. Since $T$ has an exponential distribution, it has the memoryless property, i.e., $\Pr[T > s + t \mid T > t] = \Pr[T > s]$ for $s \geq 0$ (26). Furthermore, by assuming $T' = T - t$, and based on the above equation, we can write (27), where $F_{T'}(s)$ is the CDF of $T'$. Additionally, by expanding (27), we can obtain (28). Since $s$ is assumed to be non-negative, $F_{T'}(0) = 0$, and based on (28), we can conclude that $\Pr[T' > 0] = 1$. As a result, $T'$ has the same exponential distribution as $T$.
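The property can also be checked numerically. The following Python sketch draws exponential samples, keeps those exceeding $t$, and compares the mean of the residual $T - t$ with $1/\lambda$. It is a sanity check under assumed parameter values (sample size, seed, $\lambda$ and $t$ are arbitrary choices), not part of the proof.

```python
import random


def mean_residual(lam, t, n=200_000, seed=1):
    """Empirical mean of T - t given T > t, for T ~ Exp(lam)."""
    rng = random.Random(seed)
    residuals = []
    for _ in range(n):
        sample = rng.expovariate(lam)
        if sample > t:
            residuals.append(sample - t)
    return sum(residuals) / len(residuals)


if __name__ == "__main__":
    lam, t = 0.2, 5.0
    # Both printed values should be close to each other.
    print(mean_residual(lam, t), 1.0 / lam)
```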

APPENDIX B
By replacing $t$ with $D + 1$ and using the proof in Appendix A, we can state that $T - D - 1$ has an exponential distribution that is the same as that of $T$, and hence (29) holds. Furthermore, based on the law of total expectation, for any exponentially distributed random variable we can write (30). Then, $E[(T - D) \mid n \in X_2]$ follows from (29) and (30).

APPENDIX C
Similarly to the derivation of $E[(T - D) \mid n \in X_2]$ in Appendix B, by utilizing Appendix A, replacing $t = D + 1 + t_i$, and assuming $n \in X_3$ or, equivalently, $T > D + 1 + t_i$, we can write (32), from which the corresponding conditional expectation follows.

APPENDIX D
The first packet that arrives in each scheduling cycle has to wait for the arrival of the other $\gamma - 1$ packets plus the time period until the end of the w-cycle (referred to as $t_r$, as shown in Fig. 2), as well as the WRx's on time ($t_{on}$) and the start-up period ($t_{su}$). Since Poisson arrivals are independently and uniformly distributed on any interval of time, we can assume that the arrival instant of the $\gamma$-th packet is uniformly distributed along the last w-cycle, which is justified by the relatively short length of $t_w$. Hence, an average extra delay of $t_w/2$ is introduced. Consequently, the mean transmission time of the first packet is delayed as follows (which is equivalent to the average holding time of the dormant period), where $C_0$ is a constant that can be obtained as follows,

APPENDIX E
By averaging both sides of (1), and assuming a stationary system, we can obtain,

APPENDIX F
In this section, we prove that $E[T^2 \mid T > t_i] = t_i^2 + 2t_i/\lambda + 2/\lambda^2$. By using the result in Appendix A, the PDF of the conditional exponential distribution $T \mid (T > t_i)$ is the same as that of $T$ with a time shift of $t_i$, i.e., $\lambda e^{-\lambda(t - t_i)}$ for $t > t_i$. Therefore, the expected value of $T^2$ can be obtained as $E[T^2 \mid T > t_i] = \int_{t_i}^{\infty} t^2 \lambda e^{-\lambda(t - t_i)}\,dt = t_i^2 + 2t_i/\lambda + 2/\lambda^2$, where the last step follows from the substitution $u = t - t_i$ together with $E[u] = 1/\lambda$ and $E[u^2] = 2/\lambda^2$.

APPENDIX G
Due to the independence of $D_n$ and $T_n$ for $n \in X_d^C \cup \{N_d\}$, the covariance of $D$ and $T$ is zero for those packets arriving during the active period (see the second row of (41)). Similarly, if $\gamma = 1$, the covariance of $D$ and $T$ is zero for all values of $n$, as written in the second row of (41). However, $D_n$ for the dormant period (except for the last packet in the dormant period) depends on the following packet arrivals until the end of the dormant period (provided that $\gamma$ is greater than one), $D_n = T_n + T_{n+1} + \dots + T_{N_d - 1} + t_r + t_{on} + t_{su} + n - 1$ (38), for all $n \in X_d - \{N_d\}$.
In order to find $\mathrm{Cov}[D, T]$, we follow an approach similar to the one described in [26] for the GI/G/1 queuing system. According to the law of total covariance, the covariance relation between any three random variables (i.e., $n$, $D$, $T$) can be written as follows,

APPENDIX H
The expected value of $H_n$ (already expressed in (21)) can be calculated by using the law of total probability formula, as follows (summarized in the third column of Table II), for n ∈ X 2 . L d 2 − (T n − D n − 1) 2 , for n ∈ X 3 . (29), by utilizing met.
$z_3(\gamma)$ is always positive, because its numerator (denoted by $N_{z_3}(\gamma)$) is an increasing function of $\gamma$ and $N_{z_3}(1) \geq 0$ holds for all values of the parameters, so that we can conclude that $z_3(\gamma)$ is always positive for $\gamma \geq 1$.