Delay-Optimal Resource Scheduling of Energy Harvesting-Based Devices

This paper investigates resource scheduling in a wireless communication system operating with energy harvesting (EH)-based devices and perfect channel state information (CSI). The aim is to minimize the packet loss that occurs when the buffer overflows or when a queued packet is older than a pre-defined threshold. We thus consider a strict delay constraint rather than an average delay constraint. The associated optimization problem is modeled as a Markov decision process (MDP) where the actions are the numbers of packets sent on the known channel at each slot. The optimal deterministic offline policy is obtained through dynamic programming techniques, namely the value iteration (VI) algorithm. We show that the gain in the number of transmitted packets and the consumed energy is substantial compared to: 1) a naive policy which forces the system to send the maximum number of packets using the available energy in the battery; 2) two variants of the previous policy that take into account the buffer state; and 3) a policy optimized with an average delay constraint. Finally, we evaluate our optimal policy under an imperfect CSI scenario where only an estimate of the channel state is available.


I. INTRODUCTION
Energy harvesting (EH) technology has emerged recently as a promising solution to improve the energy efficiency and self-sustainability of 5G mobile and IoT networks. By relying on renewable energy sources in their surrounding environments, mobile devices can harvest energy to perform their communications and operational tasks. In this way, they can extend their battery lifetimes by reducing their dependency on conventional battery and grid power, thus decreasing their carbon emissions. However, in contrast to a conventional power supply where the available energy is fixed, harvested energy arrives randomly and sporadically due to environmental influence (weather, geolocation), which makes the available energy unpredictable. To avoid wasting excess energy and to save it for future use, capacity-limited batteries are used to store the collected energy. The stochastic energy harvesting process and the energy storage constraints, in addition to the time-varying nature of the wireless channels, bring new design challenges in EH communications and make the optimization of the transmission policies a more difficult task. Therefore, efficient resource scheduling of mobile devices needs to adapt the transmission rate and power to the dynamic levels of the available energy and the channels in order to ensure the users' quality of service (QoS) and the system sustainability.

I. Fawaz is with CEA, LIST, Communicating Systems Laboratory, PC 173, 91191 Gif-sur-Yvette, France. M. Sarkiss is with Télécom SudParis, 9 Rue Charles Fourier, 91000 Evry, France. P. Ciblat is with Télécom ParisTech, 46 Rue Barrault, 75013 Paris, France. Contact: ibrahim.fawaz@cea.fr, mireille.sarkiss@telecom-sudparis.eu, philippe.ciblat@telecom-paristech.fr. Part of this work has been published in IEEE ICC 2018 conference [1]. This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 675891.
April 17, 2019 DRAFT
During the past decade, extensive research efforts have been devoted to investigating resource scheduling with EH capabilities at the transmitters [2]-[11]. Surveys can be found in [12], [13]. In these works, several performance criteria have been optimized, such as throughput, completion time, average delay and outage probability, for various models of energy arrival rate, battery capacity, or fading channel. For instance, in [2], the amount of data transmitted during a predefined time was maximized and the transmission completion time was minimized by carefully choosing the transmit power when the channel is time-varying. The authors proposed optimal offline policies based on directional water-filling in a non-causal energy setting, which means that the amount of energy available at any time is known in advance. They also proposed online policies using continuous-time stochastic dynamic programming in a causal energy setting. The throughput maximization problem was similarly investigated in [3] but for a limited energy battery and a limited data buffer, thus allowing buffer overflow. The optimal solutions were proposed by decoupling the energy and data problems using a new variant of directional water-filling with added energy pumps, or by recursively applying the shortest path algorithm. When only causal Energy State Information (ESI) and Channel State Information (CSI) are available, the same throughput maximization problem was modeled as a Markov Decision Process (MDP) in [4] and related optimization techniques were used. In [5], an online algorithm maximizing the throughput is designed by assuming a capacity-limited EH system. It relies on a new estimation method of future energy arrivals without any prior information. Both offline and online algorithms were also provided in [6] to maximize the throughput in finite-horizon scheduling with an EH transmitter.
The offline solution is expressed in terms of water levels and the online solution successively minimizes the expected throughput losses with respect to the offline optimal decision. A finite-horizon optimization problem was also considered in [7] to minimize the outage probability in an EH system. A low-complexity fixed threshold transmission is proposed based on the offline mixed integer linear programming solution. In [8], an average delay optimal scheduling problem under an energy consumption constraint was studied where the transmitter relies on hybrid energy supplies: the data transmission is mainly powered by harvested energy and resorts to the power grid as a backup. The problem was modeled as a two-dimensional Markov chain and an optimal policy depending on a critical threshold of the queue length is proposed using linear programming formulations. In [9], optimal deterministic scheduling in an EH-powered network satisfying an average delay constraint and an average consumed energy constraint was obtained by minimizing the packet blocking probability due to non-transmission at the transmitter. The problem was formulated as an MDP and solved using the dynamic programming Value Iteration (VI) algorithm. In [10], a weighted packet loss rate under an average delay constraint is minimized, leading to a constrained MDP that is solved by using a linear value iteration approximation that locally determines the energy allocation at every EH wireless node by multilevel water-filling.
A near-optimal policy was also derived by applying online learning based on a post-decision state framework. In [11], MDP modeling and an online post-decision learning approach were derived to maximize the data arrival rate at the transmitter queue under delay and energy constraints. Two delay constraints were considered: an average delay constraint or a statistical delay constraint. The latter is a bounded delay with a maximum acceptable delay-outage probability constraint.
In this paper, we address resource scheduling for a point-to-point communication powered by energy harvesting at the transmitter side. It may correspond to an Uplink (UL) case where the transmitter is a node with energy harvesting ability and the receiver is a base station plugged into the grid. Unlike [8]-[10], the main novelty of this work is the consideration of a strict delay constraint on each queued packet in the transmitter buffer rather than an average delay constraint. We initially introduced this hard constraint on the delay in [14] to find the optimal scheduling policy minimizing the average power consumption. Now, we incorporate energy harvesting aspects within the scheduling problem. Working with a hard delay constraint is timely even if it involves a more complicated system description. It has especially led to a new way of thinking about information theory by using short-length block codes as in [15] and by applying it on some resource allocation power policy based on only 1-bit feedback was proposed in [17] for EH communications over Rayleigh fading channels. The receiver sends bit 1 if the channel realization is above a certain threshold. Then, the transmitter does not transmit if the bit is 0, or transmits with a certain predefined power otherwise. The related data rate is chosen according to the threshold and not according to the true value of the channel realization. Consequently, the selected data rate always ensures a safe transmission but with a pessimistic rate. That paper derived the optimal feedback channel threshold and the optimal policy that maximizes the throughput based on a finite-horizon constrained MDP formulation. In [18], the problem of data amount maximization within a fixed duration was studied assuming imperfect CSI at the transmitter (CSIT) in point-to-point communications with an EH transmitter.
The authors first proposed a Markov process to model the energy arrivals and the channel impulse response with strong correlations, and then derived the optimal online power scheduling policy using finite-horizon dynamic programming techniques. In addition, they studied the performance limits of EH systems under imperfect CSIT through an asymptotic analysis of the average throughput at low and high average energy recharge rates. In [19], they determined the optimal offline policy for a similar problem. These previous papers do not consider the cost of obtaining the CSI, even imperfectly, such as the energy consumed to send the training sequence and the time spent on estimation, which is no longer available for data transmission.
In this paper, we investigate both perfect and imperfect channel state information at the transmitting EH device in our scheduling problem. In a first part, for the perfect CSI scenario, we ideally assume that the channel is perfectly known at the transmitter without any cost. Taking into account sporadic energy arrivals, random data arrivals and time-varying channel states, we minimize the packet loss rate, i.e., the average number of discarded packets induced by the strict delay constraint in addition to the buffer overflow constraint. We formulate the problem as an MDP and solve it using the Relative Value Iteration algorithm. We find an optimal offline stationary policy and compare it with a naive policy that performs immediate scheduling irrespective of energy and buffer states, and with two variants of it that take the buffer state into account in the decision process.
Then, we compare our proposed system with a similar one using an average delay constraint. In this part, we mainly consider an i.i.d. EH process for the sake of simplicity and clarity, but we also compare the results when a time-correlated EH process is considered.
In a second part, for a realistic imperfect CSI scenario, we consider that acquiring channel estimates incurs some time and energy costs on the system performance. We assess the previously obtained optimal policy under imperfect CSI conditions due to channel estimation errors. We also consider the imperfect CSI assumption with an Automatic Repeat reQuest (ARQ) protocol, thus allowing packet re-transmission. In these cases, the packet loss rate is affected twofold: on one side, with respect to the imposed strict delay, because of a smaller transmission period for data packets or because packets stay longer in the buffer for re-transmission (with ARQ protocols); and on the other side, with respect to the erroneous channel estimation, which can lead to an increase in the number of discarded packets. We analyze the system taking into account these errors and show through numerical results that an appropriate trade-off is needed between the channel estimation accuracy and the transmission period in order to reduce the dropped packets depending on the available energy, energy arrivals and data arrivals.
The remainder of the paper is organized as follows. In Section II, we describe the system model.
In Section III, we formulate the optimization problem as an MDP and solve it using value iteration algorithm. In Section IV, we present the framework of the imperfect CSI scenario. We provide and analyze numerical results in Section V. Finally, we give some concluding remarks and perspectives in Section VI.

II. SYSTEM MODEL
We consider a point-to-point communication over a fading channel with an energy harvesting transmitter. The transmitter is equipped with two queues: one corresponds to a capacity-limited battery to store harvested energy from an external source and the other is a finite buffer to store data packets arriving from the upper layer. The communication is slotted into consecutive epochs of equal duration $T_s$. At the beginning of each time slot, scheduling decisions are made to define the number of packets to be transmitted during the slot depending on energy arrivals and data arrivals during the previous slot as well as the channel state at the current time.

A. Energy model
Due to the random nature of energy harvesting sources, we model the EH process as an independent identically distributed (i.i.d.) Poisson process with an average arrival rate $\lambda_e$.
We assume that the energy arrives in multiple packets of energy units (e.u.) of $E_U$ Joules (J)$^1$.
The received energy is stored in a battery of finite capacity $B_e$, and is lost when it exceeds $B_e$.
At the beginning of time slot $n$, let $e_n$ denote the harvested incoming energy (counted as a number of energy units). Its probability distribution is given by
$$p(e_n = j) = \frac{\lambda_e^{j}}{j!} \exp(-\lambda_e), \quad j = 0, 1, 2, \ldots$$
We assume that the processing energy is negligible compared to the transmission energy, thus the energy stored in the battery is only used for communication. We also consider the energy causality constraint where the system can only transmit if a sufficient amount of energy is available in the battery. Let $b_n$ denote the energy level of the battery at the beginning of time slot $n$, $b_n \in \{0, \ldots, B_e\}$, and $E_n$ the energy consumed to send packets during time slot $n$; then $E_n \le b_n$. In addition, we suppose perfect energy state information at the transmitter (ESIT).

B. Data queue model and strict delay constraint
The transmitter also receives data packets and stores them for future transmission in a data buffer of size $B_d$ packets. We model the data arrival process as an i.i.d. process following a Poisson distribution with an average arrival rate $\lambda_d$. We assume that all packets are of the same size $L$ bits. At the beginning of time slot $n$, let $q_n$ denote the queue length in the buffer, $q_n \in \{0, \ldots, B_d\}$, and $a_n$ the number of received packets, with probability distribution
$$p(a_n = j) = \frac{\lambda_d^{j}}{j!} \exp(-\lambda_d), \quad j = 0, 1, 2, \ldots$$

$^1$ There is a huge amount of literature assuming i.i.d. EH processes. We adopted this approach for the sake of clarity. Nevertheless, this work can be easily extended to time-correlated EH processes. This is done in Section V to plot Fig. 11.
A packet is discarded from the buffer
• if there is a buffer overflow, i.e., if the sum of the packets in the queue and the arriving packets exceeds the buffer size. In that case, we discard the arriving packets in overflow;
• if there is a delay violation, i.e., if it stays in the queue for more than $K_0$ slots. This can occur if the system decides not to transmit for a long period due to energy shortage or bad channel conditions.
In order to describe the delay violation, we need to introduce a new variable $k_i(n)$ counting the time spent in the buffer by the $i$-th packet at time $n$. By definition, we have $k_i(n) \in \{-1, \ldots, K_0\}$, $\forall i$, and $k_i(n) = -1$ for an empty space in the buffer (i.e., when the $i$-th packet does not exist). In Fig. 1, we provide a buffer state at time $n$. Notice that $k_j(n) \le k_i(n)$, $\forall i \le j$.
Fig. 1: Buffer of $B_d$ packets (ordered from the oldest to the newest), made of $q_n$ packets followed by an empty area.

C. Channel model
We consider a single-user flat-fading channel with signal bandwidth $W$ (Hz) and additive white Gaussian noise with zero mean and variance $N_0$. During time slot $n$, the channel remains constant with complex-valued amplitude $h_n$, and varies in an i.i.d. manner across time slots. We define the channel gain as $g_n = |h_n|^2$, where $g_n$ is a continuous random variable distributed exponentially with probability density function $p(g) = \frac{1}{\xi} e^{-g/\xi}$ with mean $\xi$. For the sake of simplicity, we assume only a quantized channel state $x_n = Q_g(g_n)$, where $Q_g(\cdot)$ represents the quantization process. Fixing a sequence of fading power quantization thresholds, the channel gain $x_n$ is then a discrete variable taking values from a finite channel state space $\mathcal{X}$.
In order to define the discrete channel states, let $M$ be the number of quantization levels, $\{t_m\}_{m=0}^{M-1}$ the set of thresholds and $\{L_m\}_{m=0}^{M-1}$ the set of quantization levels for $Q_g$. The quantization regions of the channel are then given by the intervals $I_m = [t_m, t_{m+1})$, where $t_0$ is fixed such that the transmission of 1 packet using 1 e.u. is guaranteed and $t_M = +\infty$. In our model, we consider a uniform quantizer. So, let $t_{\max} = t_{M-1}$ be the maximal threshold such that the transmitter can send $U_0$ packets using $B_e$ e.u., where $U_0$ is the maximal number of scheduled packets. By applying $E(x_n, u_n) = B_e$ in (1) and $u_n = U_0$ in (2) (equations (1) and (2) are defined in Section II-D), we obtain the corresponding value of $x_n$, which is set to $t_{\max}$. The uniform quantization thresholds are given by $t_{m+1} = t_m + \delta$ with $m = 0, \ldots, M-2$ and $\delta = \frac{t_{\max}}{M-1}$. We select the quantization levels as the lower bounds of the regions, which is the worst-case scenario. Thus, $L_m = t_m$ for $m = 0, \ldots, M-1$, and a channel is said to be in state $x_n = L_m$ if $g_n \in I_m$.
Note that the defined quantization process and parameters are used by default for the perfect CSI scenario, thus the values of $x$ correspond to the perfect discrete channel states. However, for the imperfect CSI scenario, the channel is first estimated before being quantized. Let $\hat{h}_n$ and $\hat{g}_n = |\hat{h}_n|^2$ denote the estimated channel and the estimated channel gain. Then, the estimated discrete (quantized) channel states are defined by $\hat{x}_n$ accordingly. In this case, a channel is said to be in state $\hat{x}_n \neq x_n$ if $\hat{g}_n \in I_{m'}$ while $g_n \in I_m$ with $m' \neq m$.
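As an illustration of the lower-bound quantization described in this subsection, the following minimal Python sketch maps a continuous gain to its region index and level. The function names and the choice to pass $t_0$ and $\delta$ directly as inputs are assumptions of this sketch, and the handling of gains below $t_0$ (returning `None`) is not specified in the text.

```python
from bisect import bisect_right


def uniform_thresholds(t0, delta, num_levels):
    """Thresholds t_m = t0 + m * delta for m = 0..M-1 (t_M = +inf is implicit)."""
    return [t0 + m * delta for m in range(num_levels)]


def quantize_gain(g, thresholds):
    """Return (m, L_m) where L_m = t_m is the lower bound of the region I_m containing g.

    Gains below t_0 are not covered by the text; returning None for them is
    an assumption of this sketch.
    """
    m = bisect_right(thresholds, g) - 1  # largest m such that t_m <= g
    if m < 0:
        return None
    return m, thresholds[m]


thresholds = uniform_thresholds(t0=0.5, delta=0.25, num_levels=4)
state = quantize_gain(0.8, thresholds)  # gain falls in I_1 = [0.75, 1.0)
```

Choosing the lower bound of each region means that the power computed for the quantized state is never smaller than the power needed for the true gain inside that region.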

D. Consumed energy
We denote by $u_n$ ($u_n \le q_n$) the number of packets to be transmitted during time slot $n$ of period $T_s$, through the channel of gain $x_n$ for perfect CSI and the channel of gain $\hat{x}_n$ for imperfect CSI.
In the former case, the consumed energy $E(x_n, u_n)$ to transmit these packets is expressed as an integer multiple of the energy unit. It is given by equation (1), in which the required power for this transmission is defined by equation (2). In the latter case, similar expressions are obtained by replacing $x_n$ by the estimated channel gain $\hat{x}_n$ and $T_s$ by $T_s - \tau$, where $\tau$ is the time required to perform channel estimation.
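To make the dependence of the consumed energy on the channel state and on the number of packets concrete, here is a hedged Python sketch. It assumes a Shannon-capacity power model, $P = \frac{W N_0}{x}\left(2^{uL/(W T_s)} - 1\right)$, and a ceiling to the next energy unit; both are assumptions of this sketch and are not taken from the text, whose own expressions may differ. The function names are hypothetical.

```python
import math


def required_power(x, u, L, W, Ts, N0):
    """Power needed to send u packets of L bits in Ts seconds over gain x.

    Assumption: capacity-achieving transmission, u*L/Ts = W*log2(1 + P*x/(W*N0)).
    """
    spectral_efficiency = u * L / (W * Ts)
    return (W * N0 / x) * (2.0 ** spectral_efficiency - 1.0)


def energy_units(x, u, L, W, Ts, N0, EU):
    """Consumed energy as an integer number of energy units of EU Joules."""
    return math.ceil(required_power(x, u, L, W, Ts, N0) * Ts / EU)


# Toy numbers only (not the paper's parameters): the cost grows
# exponentially with the number of packets sent in one slot.
costs = [energy_units(x=1.0, u=u, L=1, W=1.0, Ts=1.0, N0=1.0, EU=1.0) for u in range(4)]
```

For the imperfect CSI case described above, the same functions can be called with the estimated state in place of `x` and with `Ts - tau` in place of `Ts`.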

III. PROBLEM FORMULATION AND RESOLUTION IN PERFECT CSI SCENARIO
In this section, we first assume perfect CSI at the transmitter without any cost. Our main objective is to ensure reliable communication by minimizing the number of packets discarded due to the strict delay and buffer overflow constraints. This can be achieved by finding an optimal policy that specifies the number of packets $u$ to be scheduled at each time slot based on the past system states and actions. The optimization problem can be formulated as an MDP problem [30].
We characterize in this section the appropriate states, actions and reward of this MDP.

A. State Space
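The state space $\mathcal{S}$ is the set of states $s = (\mathbf{k}, b, x)$, where $\mathbf{k} = [k_1, \ldots, k_{B_d}]$ is the vector of the ages of the packets in the buffer, $b$ is the battery level and $x$ is the channel state. Notice that in the previous works [9], [10], the queue length $q$ describes the data buffer states.
In our work, $q$ is replaced with $\mathbf{k}$ due to the strict delay constraint. In fact, $q$ is unnecessary when $\mathbf{k}$ is given, since $q$ is the number of entries of $\mathbf{k}$ different from $-1$. The state space is finite, and the total number of possible states $|\mathcal{S}|$ is upper-bounded by
$$|\mathcal{S}| \le (K_0 + 2)^{B_d} (B_e + 1) |\mathcal{X}|.$$
The state space can be significantly reduced by noting that packets are queued from the oldest to the newest, i.e., $k_1(n) \ge k_2(n) \ge \cdots \ge k_{q_n}(n)$.
For instance, if we consider $B_d = 6$, $K_0 = 3$, $B_e = 4$ and $|\mathcal{X}| = 5$, the upper bound is 390625 while our system only has 5250 states after removing all the impossible combinations in $\mathbf{k}$.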
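The two numbers quoted in this example can be checked by direct enumeration. The sketch below assumes that each buffer entry takes one of $K_0 + 2$ values (ages $0, \ldots, K_0$, or $-1$ for an empty slot), that the battery has $B_e + 1$ levels, and that valid buffer states are exactly the age vectors sorted from oldest to newest; the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def ordered_buffer_states(Bd, K0):
    """All age vectors k_1 >= k_2 >= ... >= k_Bd with entries in {-1, 0, ..., K0}."""
    ages = range(K0, -2, -1)  # K0, K0-1, ..., 0, -1 (descending input order)
    # Drawing with replacement from a descending sequence yields
    # non-increasing tuples, so empty slots (-1) always sit at the tail.
    return list(combinations_with_replacement(ages, Bd))


def state_space_sizes(Bd, K0, Be, num_channel_states):
    """Return (naive upper bound, number of states with ordered buffers)."""
    other = (Be + 1) * num_channel_states
    upper_bound = (K0 + 2) ** Bd * other
    reduced = len(ordered_buffer_states(Bd, K0)) * other
    return upper_bound, reduced


upper_bound, reduced = state_space_sizes(Bd=6, K0=3, Be=4, num_channel_states=5)
```

With these parameters there are 210 ordered age vectors, which multiplied by the 25 battery and channel combinations gives the 5250 states mentioned above.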

B. Action Space
The action space $\mathcal{U}$ is the set of possible numbers of packets $u$ that the transmitter can send during a time slot. This space is finite and the number of actions is $|\mathcal{U}| = U_0 + 1$.

C. Markov Decision Process
On one hand, during time slot $n$, $w_n = \max(u_n, m_n)$ packets leave the buffer, either transmitted or discarded, where $u_n$ is the number of transmitted packets and $m_n$ is the number of packets with a delay of $K_0$ slots in the buffer. The age of the remaining packets in the buffer is incremented by 1. Moreover, $a_{n+1}$ new packets arrive in the buffer with age 0. Therefore, the vector $\mathbf{k}$ can be updated from slot $n$ to slot $n+1$ according to the following rule.
1: for $i = 1$ to $q_n - w_n$ do $k_i(n+1) = k_{w_n+i}(n) + 1$ end for
2: for $i = q_n - w_n + 1$ to $q_n - w_n + a_{n+1}$ do $k_i(n+1) = 0$ end for
On the other hand, during time slot $n$, $e_{n+1}$ e.u. are harvested and stored in the battery and $E_n$ e.u. are removed from the battery to schedule $u_n$ packets. Therefore, at time slot $n+1$, the battery state is updated according to
$$b_{n+1} = \min(b_n - E_n + e_{n+1}, B_e).$$
We thus remark that $\mathbf{k}_{n+1}$ (resp. $b_{n+1}$) only depends on the previous state $\mathbf{k}_n$ (resp. $b_n$), the action $u_n$ (resp. $E_n$) and the external perturbation $a_{n+1}$ (resp. $e_{n+1}$). Therefore, we can define $p(s'|s, u)$ as the transition probability to fall in the future state $s' = (\mathbf{k}', b', x')$ after taking action $u$ in the current state $s = (\mathbf{k}, b, x)$. Assuming that the buffer, battery and channel states are independent and that the channel states are not correlated, the transition probability satisfies
$$p(s'|s, u) = p(x')\, p(\mathbf{k}'|\mathbf{k}, b, u)\, p(b'|b, x, u),$$
where $p(x')$ is the distribution of the channel states, $p(\mathbf{k}'|\mathbf{k}, b, u)$ indicates the transition probabilities between buffer states, and $p(b'|b, x, u)$ indicates the transition probabilities between battery states. After tedious but simple derivations, we obtain the transitions between the buffer states and between the battery states as respective rules in which $Q(\cdot, \cdot)$ denotes the regularized Gamma function.
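The buffer and battery dynamics described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the age vector is stored as a list sorted from oldest to newest with $-1$ marking empty slots, arrivals beyond the free space are counted as overflow, and the number of packets dropped for delay violation is taken as $w_n - u_n$, which is this sketch's reading of "transmitted or discarded". The function names are hypothetical.

```python
def update_buffer(k, u, a_next, K0):
    """One-slot update of the age vector k (oldest first, -1 for empty slots).

    Returns (k_next, dropped_for_delay, dropped_for_overflow).
    Requires u <= number of queued packets.
    """
    Bd = len(k)
    q = sum(1 for age in k if age >= 0)    # queue length q_n
    m = sum(1 for age in k if age == K0)   # packets that reached the deadline
    w = max(u, m)                          # packets leaving the buffer
    remaining = [age + 1 for age in k[w:q]]  # step 1: shift and age by one slot
    free = Bd - len(remaining)
    accepted = min(a_next, free)             # step 2: new packets with age 0
    k_next = remaining + [0] * accepted + [-1] * (free - accepted)
    return k_next, w - u, a_next - accepted


def update_battery(b, E, e_next, Be):
    """Battery level after spending E e.u. and harvesting e_next e.u., capped at Be."""
    return min(b - E + e_next, Be)


k_next, lost_delay, lost_overflow = update_buffer([3, 1, 0, -1], u=0, a_next=3, K0=3)
```

In this usage example the oldest packet has reached $K_0 = 3$ and no transmission is scheduled, so it is dropped for delay violation, and one of the three arrivals does not fit in the buffer.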

D. Markov Decision Problem and its Resolution
In the context of an infinite-horizon MDP, we consider a time-averaged cost, where at a given time slot $n \in \{0, \cdots, N\}$, the system state is denoted by $s_n = (\mathbf{k}_n, b_n, x_n)$ and $\mu(s_n) = u_n$ is the action deciding the number of packets to be transmitted. We aim at finding the optimal policy $\mu^\star$ that minimizes the average number of dropped packets. The cost function of this infinite-horizon MDP problem is the time average of the expected per-slot cost, where the expectation $\mathbb{E}$ is taken with respect to the policy $\mu$, $\varepsilon_d(s_n, u_n)$ is the instantaneous number of discarded packets due to delay violation and $\varepsilon_o(s_n, u_n)$ is the expected number of discarded packets due to buffer overflow. According to [29], we know that a finite-state MDP without additional constraints exhibits an optimal deterministic policy. Thus, the function $\mu$ is a deterministic policy and $\mu^\star$ is the optimal deterministic policy to be found.
At a given slot $n$, when the system state is $s_n$ and the performed action is $u_n$, the number of discarded packets due to delay violation is given by (7). The buffer overflow occurs when $q_n - w_n + a_{n+1} > B_d$, and the number of discarded packets due to buffer overflow is obtained as in (8). We need to consider an expected value for the buffer overflow since at the beginning of the slot (when the decision is made), the number of incoming packets is only known statistically.
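Because the overflow term is an expectation over Poisson arrivals, it can be evaluated with a finite sum. The sketch below assumes that the overflow count is the excess $\max(q_n - w_n + a_{n+1} - B_d, 0)$ and uses the identity $E[(A - r)^+] = \lambda_d - r + \sum_{a=0}^{r-1} (r - a)\, p(a)$ for $A \sim \text{Poisson}(\lambda_d)$, where $r = B_d - (q_n - w_n)$ is the free space; the function names are hypothetical.

```python
import math


def poisson_pmf(a, lam):
    """Probability of a arrivals in one slot for a Poisson process of rate lam."""
    return math.exp(-lam) * lam ** a / math.factorial(a)


def expected_overflow(q, w, Bd, lam_d):
    """Expected number of arrivals that do not fit in the buffer.

    room is the free space once w packets have left a queue of length q.
    """
    room = Bd - (q - w)
    return lam_d - room + sum((room - a) * poisson_pmf(a, lam_d) for a in range(room))


# A nearly full buffer loses more arrivals on average than an empty one.
loss_full = expected_overflow(q=6, w=1, Bd=6, lam_d=1.5)
loss_empty = expected_overflow(q=0, w=0, Bd=6, lam_d=1.5)
```

The finite sum avoids truncating the infinite Poisson tail, which matters when the free space is small and the arrival rate is large.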
Finally, our MDP optimization problem can be stated as Problem 1: find the policy $\mu^\star$ that minimizes this cost function. We know that $\mu^\star$ exists [29] and can be found via an offline dynamic programming approach using, for instance, the so-called VI algorithm [30]. By exploiting statistical a priori knowledge of the energy arrival dynamics, the data arrival dynamics and the channel states at the EH transmitter, the offline approach can accurately model the state transition probabilities of the MDP and provide an optimal solution. The optimal offline deterministic policy $\mu^\star$ for Problem 1 can be computed through Algorithm 1.
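For readers who want to experiment, a generic relative value iteration for a finite average-cost MDP can be sketched as follows. This is a textbook-style illustration, not a reproduction of Algorithm 1: the data layout (`P[u][s][t]` for transition probabilities, `c[s][u]` for instantaneous costs), the stopping rule and the function name are assumptions of this sketch, and state-dependent action constraints (such as $u \le q$ or the energy causality constraint) are left to the caller, for instance through large costs on infeasible actions.

```python
def relative_value_iteration(P, c, s0=0, tol=1e-9, max_iter=10_000):
    """Average-cost relative value iteration.

    P[u][s][t]: probability of moving from s to t under action u.
    c[s][u]:    instantaneous cost of action u in state s.
    s0:         arbitrary reference state whose value is pinned to zero.
    Returns (average cost estimate, greedy deterministic policy).
    """
    n_states, n_actions = len(c), len(P)
    h = [0.0] * n_states  # relative value function
    for _ in range(max_iter):
        q = [[c[s][u] + sum(P[u][s][t] * h[t] for t in range(n_states))
              for u in range(n_actions)] for s in range(n_states)]
        v = [min(row) for row in q]
        gain = v[s0]                      # current estimate of the average cost
        h_new = [x - gain for x in v]     # subtract it to keep values bounded
        converged = max(abs(a - b) for a, b in zip(h_new, h)) < tol
        h = h_new
        if converged:
            break
    policy = [min(range(n_actions), key=lambda u: q[s][u]) for s in range(n_states)]
    return gain, policy


# Toy 2-state example: action 0 stays, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
c = [[1.0, 2.0], [0.0, 0.0]]
gain, policy = relative_value_iteration(P, c)
```

In the toy example, paying once to leave the costly state and then staying in the free state gives a zero long-run average cost, which is what the iteration finds.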

In Algorithm 1, $c(s, u)$ is the instantaneous cost and $s_0$ is a fixed state chosen arbitrarily.

IV. IMPERFECT CSI SCENARIO
In wireless communication systems, channel state information is not perfectly known at the transmitter and can include errors due to the channel estimation process. Indeed, in a Time Division Duplex (TDD) UL transmission between an EH device and a base station, the CSI can be obtained at the EH device by first estimating the channel at the base station via an UL training process and then feeding back a quantized version of the estimate to the transmitter.
We assume that the feedback channel is error-free and instantaneous as soon as the receiver has estimated the channel. Therefore, accounting for the channel estimation phase, the time slot is divided into two parts: a duration of $\tau$ ms to acquire the CSI at the mobile device and the remaining $(T_s - \tau)$ ms to transmit the scheduled data packets. In particular, the EH device exploits the acquired CSI to send data whenever scheduling decisions are made. In this section, we aim at evaluating the optimal policy $\mu^\star$ obtained with Algorithm 1 when the CSI is imperfect, which means that the current states used for computing the output of $\mu^\star$ are not necessarily correct.

A. Channel estimation
At $\tau$ ms after the beginning of time slot $n$, we consider that the EH mobile device has an estimated discrete channel state $\hat{x}_n$ as described in Section II-C. This estimated channel can be obtained through a training sequence of $\eta$ pilot symbols using a total training power $P_{tr}$ during the period $\tau$ of the time slot. The energy required to perform this channel estimation is determined by $P_{tr}$ and $\tau$. Due to the imperfect channel estimation, we have
$$\hat{h}_n = h_n + e_{h_n},$$
where $e_{h_n}$ is the estimation error, independent of $h_n$; it is a zero-mean i.i.d. complex-valued Gaussian process with variance $\sigma_e^2$ per complex dimension. According to [31], this error variance can be expressed in terms of the energy per pilot symbol $E_s$, the number of pilot symbols used for estimation $\eta$ and the Gaussian noise variance per complex dimension $\sigma_w^2$. Given the channel gain $g_n$, the estimated channel gain $\hat{g}_n = |h_n + e_{h_n}|^2$ is a non-central $\chi^2$ random variable with 2 degrees of freedom in which the Gaussian variables are independent with common variance $\sigma_e^2/2$, and with non-centrality parameter $g_n = |h_n|^2$. It has a probability density function (PDF) of the form
$$P_{\hat{G}|G}(\hat{g}|g) = \frac{1}{\sigma_e^2} \exp\left(-\frac{\hat{g} + g}{\sigma_e^2}\right) I_0\left(\frac{2\sqrt{\hat{g}\, g}}{\sigma_e^2}\right), \quad \hat{g} \ge 0, \qquad (15)$$
where $I_0$ is the zero-order modified Bessel function of the first kind [32].
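The estimation model of this subsection is easy to simulate. The Monte Carlo sketch below draws pairs of true and estimated gains and measures how often the estimate exceeds the true gain, which is the situation that later causes scheduled packets to be lost. It assumes Rayleigh fading, i.e., a circularly symmetric complex Gaussian channel $h_n$ with variance $\xi$ (consistent with an exponential gain of mean $\xi$), and a complex Gaussian error whose real and imaginary parts each have variance $\sigma_e^2/2$; the function names, seeds and sample sizes are arbitrary choices of this sketch.

```python
import random


def sample_gain_pair(xi, sigma_e2, rng):
    """Draw (g, g_hat) with g = |h|^2 and g_hat = |h + e|^2."""
    def complex_gaussian(variance):
        # Real and imaginary parts each carry half of the variance.
        std = (variance / 2.0) ** 0.5
        return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))

    h = complex_gaussian(xi)
    h_hat = h + complex_gaussian(sigma_e2)
    return abs(h) ** 2, abs(h_hat) ** 2


def overestimation_rate(xi, sigma_e2, n_samples, seed=0):
    """Empirical fraction of slots where the estimated gain exceeds the true gain."""
    rng = random.Random(seed)
    pairs = (sample_gain_pair(xi, sigma_e2, rng) for _ in range(n_samples))
    return sum(1 for g, g_hat in pairs if g_hat > g) / n_samples


rate = overestimation_rate(xi=1.0, sigma_e2=0.1, n_samples=5000)
```

Under these assumptions the estimated gain is biased upward on average, since $E[\hat{g}_n \mid g_n] = g_n + \sigma_e^2$, so overestimation is slightly more frequent than underestimation before quantization.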

B. Error probability and packet loss rate
In this section, we analyze the impact of channel estimation on the system performance, in particular on the packet loss rate. In fact, channel estimation can affect the number of discarded packets in three ways. First of all, the transmission period is reduced, which offers less time to transmit the same amount of data. Second, if the channel estimate is smaller than the actual channel, fewer packets can be scheduled at decision instants. Thus, more packets are queued in the data buffer with higher delays, which may lead to more delay violations and buffer overflows. Third, if the channel estimate is higher than the actual channel, the scheduled packets are all dropped. This latter condition incurs an additional loss rate besides the delay violation and buffer overflow losses given in equations (7) and (8). Therefore, we need to take such errors into account in the total error probability. This extra error probability, called the channel mismatch probability in the rest of the paper, is denoted by $P_{e,CSI}$. Using Bayes' rule and some derivations, it can be computed as in (20), where $P_{\hat{G}|G}(\hat{g}|g)$ is given in (15), $Q_1$ is the Marcum function, and $P_G(g) = \frac{1}{\xi} e^{-g/\xi}$ for $g \ge 0$ and 0 otherwise is the PDF of the channel gain.
At a given time slot $n$, when the action $u_n$ is taken by applying the optimal policy $\mu^\star$ (obtained for the perfect channel knowledge case) on an estimated channel state $\hat{x}_n > x_n$, the number of discarded packets due to CSI errors is computed as
$$\varepsilon_e(u_n, P_{e,CSI}) = u_n \times \mathbb{1}(P_{e,CSI} \neq 0), \qquad (21)$$
and the cost function of our MDP problem under policy $\mu^\star$ and imperfect CSI is built on the per-slot cost $\varepsilon_d(s_n, u_n) + \varepsilon_o(s_n, u_n) + \varepsilon_e(u_n, P_{e,CSI})$.

V. NUMERICAL RESULTS
We evaluate numerically the optimal policy obtained by resolving Problem 1. We consider a system as described in Section II with the following characteristics: the slot duration is T s = 1 ms and the maximum delay is These channel values are obtained according to Section II-C. The noise power spectral density is N 0 = −87 dBm/Hz and the allocated bandwidth is W = 5 MHz.

A. Perfect CSI
In this section, we consider that the transmitter has a perfect knowledge of the channel state without any cost.
In Fig. 2, we plot the average number of discarded packets versus the number of iterations for evaluating the optimal policy obtained by the VI algorithm for various energy arrival rates $\lambda_e$, where the data arrival rate $\lambda_d$ is fixed to 1.5. We show that the VI algorithm converges rapidly, within a few hundred iterations in most cases. We can also notice that as $\lambda_e$ increases, the average number of discarded packets decreases considerably. Indeed, when the available energy from the surrounding environment is larger, the system is able to send more packets, thus reducing the number of discarded packets.
In Fig. 3, we display the percentage of discarded packets versus the data arrival rate $\lambda_d$ for different energy arrival rates and for two policies. The first policy is the (deterministic offline) optimal one introduced in this paper and obtained after convergence of the VI algorithm. The second policy is a naive one in which we force the transmitter to send the maximum number of packets using the available energy in the battery. As we can observe, the proposed optimal policy provides significantly better performance than the naive one in terms of percentage of discarded packets.
In fact, this policy enables us to adapt the transmission rate according to the buffer, battery and channel conditions. In addition, we remark that the number of discarded packets increases when the data arrival rate $\lambda_d$ increases because buffer overflow happens more often.
On the one hand, when the energy available to scavenge is low (small $\lambda_e$), efficient energy management becomes crucial to ensure the sustainability of the system, and the gap between both policies increases. On the other hand, when a large amount of energy is available (large $\lambda_e$), the system can survive even without carefully controlling the energy consumption, which leads to similar performance for the optimal and naive policies.
Similar to Fig. 3, Fig. 4 compares the percentage of discarded packets of the optimal policy with two other variants of the naive policy. Unlike the naive policy that sends the maximum number of packets using the available energy in the battery, the introduced p-Naive policy restricts the number of packets sent by the naive one by taking the buffer state into account through an additional parameter $p$ in the following way:
• fixed p: the policy sends packet $i$ from the buffer only if $k_i \ge p$;
• variable p: the policy performs a first step similar to the previous case (fixed p). If no packet satisfies the condition, $p$ is decreased by 1, and the first step is repeated, until $p = 0$.
The naive policy corresponds to a 0-Naive policy. Here, we choose $p = 2$ for the p-Naive policy.
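The two p-Naive variants can be summarized by the following Python sketch. It assumes that the number of packets the plain naive policy could afford with the current battery and channel state, called `u_max` here, is computed elsewhere; this parameter and the function name are hypothetical and only serve the illustration.

```python
def p_naive_action(k, u_max, p, variable=False):
    """Number of packets sent by the p-Naive policy.

    k:        packet ages, oldest first, with -1 for empty buffer slots.
    u_max:    packets the naive policy could send with the available energy
              (assumed to be computed elsewhere).
    p:        age threshold; only packets with age >= p are eligible.
    variable: if True, lower p step by step until a packet is eligible or p == 0.
    """
    while True:
        eligible = sum(1 for age in k if age >= 0 and age >= p)
        if eligible > 0 or not variable or p == 0:
            return min(u_max, eligible)
        p -= 1


buffer_ages = [1, 0, -1, -1]
fixed = p_naive_action(buffer_ages, u_max=2, p=2)                  # no packet is old enough
relaxed = p_naive_action(buffer_ages, u_max=2, p=2, variable=True)  # threshold drops to 1
```

With `p=0` every queued packet is eligible, so the function falls back to the plain naive behavior of sending as many packets as the energy allows.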
As we can see, taking only the age of the packets in the buffer into account, without carefully adapting the number of packets to the battery energy level and the buffer state, decreases the number of sent packets, and therefore the naive policy remains much better.
Fig. 4: Percentage of the discarded packets versus data arrival rate with different energy arrival rates and different naive policies.
In Fig. 5, we show the percentage of discarded packets due to delay violation among all the discarded packets for the optimal policy with different data arrival rates λ d and energy arrival rates λ e . As explained before, a packet can be discarded due to either delay violation or buffer overflow.
When the data arrival rate increases, the probability of discarding a packet due to buffer overflow increases, which decreases the contribution of the delay violation to the discarded packets. When the energy arrival rate decreases, the percentage of discarded packets due to the delay violation slightly increases because, on average, a packet remains longer in the buffer since there is not enough energy to transmit it. Hence, it is flushed from the buffer to meet the latency constraint.
Fig. 5: Percentage of the discarded packets due to delay violation versus data arrival rate and energy arrival rate with the optimal policy.
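The two loss mechanisms discussed above can be illustrated with a small Python sketch of a one-slot buffer update. This is only a sketch under assumptions: the function `step_buffer`, its arguments, and in particular the order of events within a slot (transmit, age, flush, admit) are hypothetical choices and are not taken from the system model.

```python
from typing import List, Tuple


def step_buffer(ages: List[int], sent: int, arrivals: int,
                max_age: int, capacity: int) -> Tuple[List[int], int, int]:
    """Advance the buffer by one slot.

    Returns (new_ages, dropped_for_delay, dropped_for_overflow).
    Assumed order of events: transmit the oldest packets, age the rest,
    flush packets whose age exceeds max_age, then admit new arrivals.
    """
    remaining = sorted(ages, reverse=True)[sent:]  # oldest packets leave first
    aged = [k + 1 for k in remaining]
    kept = [k for k in aged if k <= max_age]       # strict delay constraint
    delay_drops = len(aged) - len(kept)
    free = capacity - len(kept)
    admitted = min(arrivals, free)
    overflow_drops = arrivals - admitted           # no room left in the buffer
    return kept + [0] * admitted, delay_drops, overflow_drops
```

For example, with `max_age=3` and `capacity=3`, an idle slot starting from ages `[3, 1]` with two arrivals flushes the oldest packet for delay violation, whereas a burst of arrivals into a nearly full buffer is lost to overflow instead.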
In Fig. 6, we plot the average consumed energy versus the data arrival rate λ d with different energy arrival rates λ e . We observe that the optimal policy consumes less energy than the naive one while sending more packets because it adapts the number of transmitted packets per slot to the channel conditions and the battery state, so that each transmission takes its energy cost into account.
In Fig. 7, we show the average battery state versus the packet arrival rate λ d with different energy arrival rates λ e . As the optimal policy offers a lower energy consumption (see Fig. 6), the battery is less used and its energy level is thus higher. This ensures a more sustainable communication with fewer discarded packets.
In Figs. 8, 9 and 10, we compare the performance of our optimal policy to the optimal policy obtained by forcing the average (instead of the strict) delay to be less than a pre-defined threshold.
Both policies are applied assuming buffer overflow and delay violation as the way to drop the where q n is the queue length. Notice that we do not consider the delay violation for this optimization since the strict delay is not taken into account in this policy, as we just force the average delay to be less than a threshold. So the policy μ Qct is designed to properly handle the average delay, not the strict delay.
Our optimal policy adapted to strict delay has been computed with K 0 = 3. In order to compare both policies in the strict delay constraint setup (i.e., a packet is dropped if its delay is strictly larger than K 0 even when the policy μ Qct is applied), we need to choose D ct properly. It makes sense to force D ct ≤ 3 in order to have a small number of dropped packets due to delay violation. As D ct = 2 and D ct = 3 have led to similar performance, we have fixed D ct = 3.
As we can see, our policy outperforms the policy considering only the average delay in terms of percentage of discarded packets, consumed energy (in most cases), and battery levels (in most cases). So, it was worth the effort to optimize the policy by including the strict delay in the state model rather than just using the optimal policy adapted to the average delay with a well-tuned threshold.
We now consider that the EH process is time-correlated. In order to cast this assumption into an MDP framework, we need to add the EH process e to the state of the system, i.e., s = (k, b, e, x) instead of (k, b, x) as done previously. Then, a new optimal policy taking into account the EH correlation is re-computed by using the same tool, i.e., the VI algorithm. Here, we assume that the transition probability of the Markov Chain satisfies the following equation where p(b′, e′ | b, e, x, u) is obtained according to the following rules:  p(b′, e′ | b, e, x, u) = 0.
In addition, the transition probability from an energy arrival state j at time slot n to another energy arrival state i at time slot n + 1 is given by where ρ e is the so-called correlation factor and H e is the set of potential energy units harvested during one slot.
In Fig. 11, we compare the performance of the optimal policy (adapted to time-correlated EH process) with the naive policy. We set H e = {0, 1, 2} e.u. per slot. The proposed optimal policy is still better than the naive policy. The performance of the system decreases when ρ e increases because the system will be trapped in the state e = 0 for a longer period of time, leading to more discarded packets.
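As a reminder of how the VI algorithm operates once the state has been augmented, the following is a minimal Python sketch of value iteration on a generic finite MDP. It is not the model of this paper: the discounted-cost criterion, the discount factor `gamma`, the integer state labels (which would stand for an enumeration of the tuples (k, b, e, x)), and the two-state example `toy` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# P[s][a] is a list of (probability, next_state, cost) triples.
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(P: Transitions, V: Dict[int, float], s: int, a: int,
            gamma: float) -> float:
    """Expected immediate cost plus discounted cost-to-go of action a in state s."""
    return sum(pr * (c + gamma * V[s2]) for pr, s2, c in P[s][a])


def value_iteration(P: Transitions, gamma: float = 0.9, tol: float = 1e-8):
    """Return (values, policy) minimising the expected discounted cost."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = min(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel) update
        if delta < tol:
            break
    policy = {s: min(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


# Hypothetical two-state example: in state 0, action 0 waits (cost 1) and
# action 1 sends (cost 0, moves to state 1); state 1 has a single action.
toy: Transitions = {
    0: {0: [(1.0, 0, 1.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 1.0)]},
}
values, policy = value_iteration(toy)
```

Adding e to the state only enlarges the state set and the transition table; the iteration itself is unchanged, which is why the same tool can be reused for the time-correlated case.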

B. Imperfect CSI
Fig. 11: Percentage of the discarded packets versus data arrival rate with different correlated energy arrival rates between strict and naive policies.
In this section, our goal is to evaluate the proposed optimal policy when the transmitter relies on an estimated version of the channel state. The estimation phase duration is equal to τ = 10 µs (1% of T s ), and a power of P tr = 4 mW is used. The corresponding energy consumption for the estimation phase is thus E = 40 nJ, which is negligible compared with the energy unit, and therefore we assume E ce = 0 e.u.
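The estimation energy quoted above follows directly from E = P tr · τ. The short Python check below reproduces the arithmetic; the variable names are illustrative, and the slot duration is simply the value implied by the statement that τ is 1% of T s .

```python
P_TR_W = 4e-3            # power used during estimation: 4 mW
TAU_S = 10e-6            # estimation duration: 10 µs
SLOT_S = TAU_S / 0.01    # slot duration implied by "1% of T s"

energy_nj = P_TR_W * TAU_S * 1e9   # joules converted to nanojoules
```

This gives an estimation energy of about 40 nJ and an implied slot duration of 1 ms.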
In Fig. 12, we compare the percentage of discarded packets between perfect and imperfect CSI scenarios. For low data arrival rate λ d , the gap between both scenarios is large. Indeed, in our setup, the smallest channel mismatch probability is between 10^−3 and 10^−2, which implies that the percentage of discarded packets is necessarily worse, since the packets are dropped as soon as the channel is overestimated. However, when the data arrival rate increases, buffer overflow happens more often and the channel mismatch probability has less impact, which leads both scenarios to behave similarly.
In Fig. 13, we compare the optimal and naive policies under perfect and imperfect CSI scenarios.
For small energy arrival rate λ e , the optimal policy under imperfect CSI is better than the naive policy with perfect CSI, because the latter sends packets without any adaptation to the energy and data arrivals, so energy shortage happens more often and the number of discarded packets increases. For high energy arrival rate, imperfect CSI has a stronger impact since the energy has to be controlled in a smarter way, which makes accurate channel knowledge more important.
In Fig. 14, we compare the percentage of discarded packets for different estimation times τ (expressed in % of T s ). For low data arrival rate λ d , increasing the estimation time leads to a better channel estimation, which slightly reduces the number of discarded packets since the impact of estimation error is high in this configuration (see Fig. 12). Nevertheless, beyond a certain threshold, for instance τ ≈ 5%, the number of discarded packets increases because the remaining communication time of the slot is smaller. This decreases the number of sent packets and thus increases the number of packets in the buffer, leading to more delay violations and buffer overflows. For high data arrival rate, we know that estimation accuracy matters less (see Fig. 12). Therefore, increasing the estimation time directly decreases the performance since the system has less time for data packet transmission.
In Fig. 15, we display the nature of discarded packets in percentage due to delay violation, buffer overflow and channel mismatch with different data and energy arrival rates. The number of discarded packets due to channel mismatch is significant for low data arrival rate because delay violation and buffer overflow happen less often. However, for high data arrival rate, the number of packets discarded due to channel mismatch is negligible and the policy behaves approximately in the same way for perfect and imperfect CSI.
Nevertheless, imperfect CSI degrades the whole system (in terms of both delay violation and buffer overflow) since part of the time slot is now devoted to estimation rather than transmission.

Under the imperfect CSI assumption, it is usual to allow packet re-transmission through a Hybrid Automatic Repeat reQuest (HARQ) protocol instead of discarding the packet once sent [33]. But adapting our work to HARQ requires a major modification of the MDP framework. Here, we just run our policy (the optimal one described in Section III) when ARQ and Chase Combining HARQ (CC-HARQ) protocols are carried out. The only modification is to keep the packet in the buffer until the end of the ARQ process instead of discarding it. So there is a trade-off between the higher probability for each packet to be correctly decoded at the receiver, the longer time the packet stays in the buffer while waiting for the feedback, and the higher energy consumed for re-transmitting the packet. In Fig. 16, ARQ and CC-HARQ are implemented with at most two transmissions (only one re-transmission is allowed). When λ e is low, using ARQ and CC-HARQ is not efficient because transmitting the same packet twice consumes energy that is not available in large quantities. However, when λ e is large, these two protocols significantly improve the performance by reducing the number of discarded packets due to imperfect CSI.
Fig. 16: Percentage of the discarded packets versus data arrival rate with different energy arrival rates between perfect and imperfect CSI scenarios.
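The reliability versus energy side of this trade-off can be quantified with a back-of-the-envelope Python sketch for plain ARQ. It relies on a simplifying assumption that is not part of the system model: each transmission attempt fails independently with the same probability `q`. The function name `truncated_arq` is hypothetical, and CC-HARQ, where combining lowers the failure probability of the second attempt, is not covered.

```python
from typing import Tuple


def truncated_arq(q: float, max_tx: int) -> Tuple[float, float]:
    """Residual loss probability and mean number of transmissions for plain ARQ.

    q is the per-attempt failure probability (assumed identical and
    independent across attempts); max_tx is the transmission limit.
    """
    residual_loss = q ** max_tx
    # Attempt j + 1 takes place only if the first j attempts all failed.
    mean_tx = sum(q ** j for j in range(max_tx))
    return residual_loss, mean_tx
```

Under this assumption, with `q = 0.01` and at most two transmissions, the residual loss drops from 1e-2 to 1e-4 at the price of about 1% more transmissions on average, each of which must be paid for from the battery.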

VI. CONCLUSION
We have addressed the resource scheduling problem under energy harvesting capabilities with strict delay constraint and perfect CSI. More precisely, we have solved the packet loss optimization problem using the MDP framework and dynamic programming techniques. The optimal policy adapts the number of transmitted packets according to the channel conditions, the available energy in the battery, and the buffer state such that the number of discarded packets is minimized.
We have compared our proposed strict delay based policy with different variants of a naive policy and the state-of-the-art policy relying only on the average delay, showing significant savings in packet loss and energy consumption. Finally, we have evaluated the impact of imperfect CSI without and with ARQ protocols on the optimal policy in terms of additional packet loss due to April