Randomization of Data Generation Times Improves Performance of Predictive IoT Networks

Abstract—Input traffic from Internet of Things (IoT) devices is often periodic and must be received by a given deadline. This can create congestion at instants when traffic flowing from multiple devices arrives at a shared input port or gateway, resulting in missed deadlines at the receiver. As a consequence, scheduling techniques such as "Earliest Deadline First" (EDF) and "Priority based on Average Load" (PAL) are used to schedule the flows from different devices so as to satisfy the needs of the largest number of traffic flows in a timely fashion. In this paper, we propose the Randomization of flow Generation Times (RGT) to smooth the total incoming traffic at the input port or gateway, on top of the use of EDF and PAL. We then evaluate the performance of RGT together with PAL and EDF, for traffic loads with up to 6400 IoT devices. Our simulation results show that RGT provides significantly better performance when added to EDF and PAL. Moreover, the additional computation required by RGT at each device is quite small, suggesting that RGT is a very useful addition for improving the performance of IoT networks.


I. INTRODUCTION
The number of devices on the Internet is constantly growing and is expected to reach 30 billion by the year 2023 [1]. While the IoT is the key enabler for "smart cities" and Machine-to-Machine (M2M) communications [2], it is expected that the majority of devices on the Internet will be machine-type devices [3], and the usage of the IoT in smart cities has expanded to a wide range of applications [4], [5].
Moreover, it is estimated that by the end of 2025, 52% of all IoT devices will fall in the Massive IoT segment [6], where a large number of low-cost devices exist within the coverage area of a single base station or gateway, causing access problems due to Physical Random Access Channel (PRACH) overload [3], [7], also known as the "Massive Access Problem".
Another line of research [25], [26], [27], [28], [29], [30] has suggested proactive network solutions, and other work has used the predictability of the traffic generation patterns of IoT devices to address the Massive Access Problem under the "predictive network" framework. To this end, a recent paper [31] proposed the Joint Forecasting-Scheduling (JFS) system and the Priority based on Average Load (PAL) scheduling algorithm, which allocates the available resources to IoT devices based on forecasting the traffic generation pattern of each device, for the case of a single frequency channel.
In addition, in order to improve the performance of JFS and make it applicable to longer time windows, a Multi-Scale Algorithm was proposed in [32], and in [33] optimal scheduling and a Multi-Channel version of PAL were proposed to extend JFS to multi-frequency channels. The results in [31], [32], [33] showed that predictive networks with JFS are a promising solution to the Massive Access Problem, and that scheduling heuristics are crucial when optimal scheduling policies cannot be implemented in practice due to their high computational requirements.
In this paper, we propose an additional heuristic that we call "Randomization of Generation Times" (RGT) in order to improve the performance and decrease the optimality gap of scheduling heuristics for predictive networks. To the best of our knowledge, RGT is a novel pre-processing method that can be applied to any scheduling heuristic under the JFS system, and it can be implemented at each device. In particular, we apply RGT to PAL and to the Earliest Deadline First (EDF) algorithm.
Our simulation results show that RGT significantly improves the performance of each scheduling algorithm, both when forecasting is conducted with high error and in the case of zero-error perfect forecasting. Furthermore, we estimate the computation time of RGT to be under 2.5 µs per IoT device, and we show that it increases linearly with the number of devices. Thus, we conclude that RGT is an important addition that can facilitate the practical usage of JFS.
The rest of this paper is organized as follows. Section II provides background on the scheduling of IoT traffic as well as on the PAL and EDF algorithms. Section III proposes our RGT process. Section IV presents our results. Section V summarizes our main conclusions and suggests further research avenues.
II. SCHEDULING OF IOT TRAFFIC

We now explain the scheduling problem for IoT device traffic in predictive networks, starting with the JFS system proposed in [31], which is comprised of the Forecasting and Scheduling modules. Note that we evaluate the performance of the system both for perfect forecasting and for increasing forecasting error in Section IV. Between the Forecasting and Scheduling modules, we insert a module that performs the RGT process defined below.
First, we define each burst j of any device i by the triple (r_j, d_j, a_j), where r_j is the instant when it is generated, d_j is its deadline, and a_j is the number of bits it carries. In addition, if c_j = R_i τ_MAC denotes the maximum number of bits that can be transmitted for burst j in a single MAC-layer slot, where R_i is the data rate of device i which generated burst j and τ_MAC denotes the duration of a MAC-layer slot, then the total number of slots p_j needed to transmit burst j is

    p_j = ⌈ a_j / c_j ⌉ = ⌈ a_j / (R_i τ_MAC) ⌉,    (1)

while each burst j has a strict delay constraint ∆_j = d_j − r_j. Furthermore, we let N denote the total number of devices in the coverage area of the IoT gateway and J denote the set of bursts generated by those devices.

Next, in the JFS system, based on the output of the Forecasting module, the Scheduling module allocates the available resources to IoT devices for a given upcoming time interval, namely the scheduling window. That is, the input of the Scheduling module is the future traffic generation pattern, and its output is the binary schedule matrix S, where S(j, m) = 1 if MAC-layer slot m is allocated to burst j and S(j, m) = 0 otherwise. We also let u_j be a binary variable which equals 1 if burst j is successfully transmitted, and 0 otherwise.
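The burst model above can be sketched in a few lines of Python (the paper's simulations are in MATLAB; the parameter values and helper name below are illustrative assumptions, not taken from the paper's dataset):

```python
import math

# Hypothetical parameters for one device (assumed values, not from the paper).
TAU_MAC = 0.1     # MAC-layer slot duration tau_MAC in seconds (paper uses 100 ms)
R_i = 1000.0      # data rate of device i in bits per second

def slots_needed(a_j, rate, tau_mac):
    """Number of MAC-layer slots p_j needed to transmit a burst of a_j bits,
    where c_j = rate * tau_mac bits fit into one slot."""
    c_j = rate * tau_mac
    return math.ceil(a_j / c_j)

# A burst is the triple (r_j, d_j, a_j): generation time, deadline, bits.
r_j, d_j, a_j = 0.0, 2.0, 250        # generated at t = 0 s, deadline 2 s, 250 bits
p_j = slots_needed(a_j, R_i, TAU_MAC)  # 250 bits / 100 bits-per-slot -> 3 slots
delta_j = d_j - r_j                    # strict delay constraint of the burst
```

With these assumed values, one slot carries c_j = 100 bits, so the 250-bit burst requires p_j = 3 slots and has a delay constraint of ∆_j = 2 s.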
Although a variety of scheduling algorithms can be used in the Scheduling module of JFS, in this paper we consider only PAL and EDF. Note that EDF is optimal [34] in the sense that it minimizes the number of bursts whose deadlines cannot be met. PAL aims to maximize the total number of bits in transmitted bursts, while EDF aims to maximize the total number of successfully transmitted bursts without considering the number of bits in each burst.

A. Priority based on Average Load (PAL)
The greedy scheduling algorithm PAL was proposed in [31] for resource allocation in predictive networks, and especially for the JFS system. PAL schedules the bursts by prioritizing each burst j based on its average load over the duration ∆_j. For each time slot m, we define the set of "active bursts" A_m ⊆ J, namely the bursts that have been generated by slot m, have not yet been transmitted, and can still meet their deadlines, and the average load of burst j:

    a_j / ∆_j.    (2)

Starting from m = 1, PAL works as follows:
• It computes the burst j* ∈ A_m with the largest average load, j* = arg max_{j ∈ A_m} a_j / ∆_j.
• Then, for j*, PAL allocates the upcoming p_{j*} slots starting with the current slot m, and advances m accordingly.
As its definition suggests, PAL is a non-preemptive algorithm: once the transmission of a burst has started, PAL waits for its completion before scheduling another burst.
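The PAL steps above can be sketched as follows (a minimal single-channel Python sketch under our own data layout: each burst is a dict with release slot r, deadline slot d, required slots p, and bits a; the function name and representation are ours, not the paper's):

```python
def pal_schedule(bursts, num_slots):
    """Non-preemptive PAL sketch. A burst j is active at slot m if it has been
    released (r <= m) and can still finish p slots by its deadline slot d.
    Returns the set of indices of the bursts that were scheduled."""
    scheduled = set()
    pending = set(range(len(bursts)))
    m = 0
    while m < num_slots:
        active = [j for j in pending
                  if bursts[j]["r"] <= m and m + bursts[j]["p"] <= bursts[j]["d"]]
        if not active:
            m += 1  # no schedulable burst: the slot stays idle
            continue
        # Prioritize by average load a_j / delta_j, with delta_j = d_j - r_j.
        j_star = max(active,
                     key=lambda j: bursts[j]["a"] / (bursts[j]["d"] - bursts[j]["r"]))
        # Non-preemptive: the next p_{j*} slots all go to j*, so m jumps ahead.
        m += bursts[j_star]["p"]
        scheduled.add(j_star)
        pending.discard(j_star)
    return scheduled
```

For example, given a heavy burst (200 bits, ∆ = 4 slots, load 50) and a light one (50 bits, ∆ = 2 slots, load 25), PAL serves the heavy burst first; by the time it finishes, the light burst can no longer meet its deadline and is dropped.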

B. Earliest Deadline First (EDF)
We also use the Earliest Deadline First (EDF) algorithm for IoT resource allocation. In this paper, we use a non-preemptive EDF algorithm, which is optimal for our scheduling problem in the sense that it maximizes the total number of successfully delivered bursts [35]. In short, EDF works as follows: we first sort the bursts in J by their deadlines d_j into a vector J_sorted. Then, for each j in J_sorted, starting with the first burst, if there are enough available slots to transmit j, EDF reserves the first p_j available slots between r_j and d_j.
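Following the description above, EDF can be sketched as below (same illustrative burst representation as before: r, d, p are slot indices and counts of our own choosing; "first p_j available slots" is implemented literally, as the text states):

```python
def edf_schedule(bursts, num_slots):
    """Non-preemptive EDF sketch: process bursts in order of increasing
    deadline and reserve the first p free slots in [r, d), scheduling a
    burst only if p free slots exist. Returns scheduled burst indices."""
    free = [True] * num_slots          # availability of each MAC-layer slot
    scheduled = set()
    for j in sorted(range(len(bursts)), key=lambda j: bursts[j]["d"]):
        r, d, p = bursts[j]["r"], bursts[j]["d"], bursts[j]["p"]
        slots = [m for m in range(r, min(d, num_slots)) if free[m]][:p]
        if len(slots) == p:            # enough room before the deadline
            for m in slots:
                free[m] = False
            scheduled.add(j)
    return scheduled
```

On the same two-burst example used for PAL, EDF serves the tight-deadline burst first and then still fits the heavy burst, delivering both, which illustrates why EDF maximizes the number of delivered bursts rather than the number of delivered bits.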

C. Performance Metrics
Throughout this paper, in order to measure the network performance, we define the following metrics. First, η denotes the cross-layer network throughput, defined as the fraction of offered bits that are successfully delivered:

    η = ( Σ_{j∈J} u_j a_j ) / ( Σ_{j∈J} a_j ).    (3)

Second, ζ denotes the fraction of bursts that are successfully delivered:

    ζ = ( Σ_{j∈J} u_j ) / |J|.    (4)

Third, E denotes the transmit energy consumption per successfully delivered bit. In this paper, we assume that an IoT device consumes one unit of energy for each time slot during which it transmits. Thus, E is defined as

    E = ( Σ_{j∈J} Σ_m S(j, m) ) / ( Σ_{j∈J} u_j a_j ).    (5)

III. RANDOMIZATION OF THE GENERATION TIMES

Since heuristic algorithms are fast, inexpensive with respect to computational hardware requirements, and able to achieve relatively high QoS, they are promising for the scheduling of IoT traffic in predictive networks. We therefore aim to improve the performance of heuristic scheduling algorithms. To this end, in this section, we propose Randomization of the Generation Times (RGT), a pre-processing step applied to the predicted IoT traffic before any heuristic scheduling algorithm. The RGT process relieves the system by spreading the traffic generations over the scheduling window of duration T_sch. In this process, we update the generation time r_j of each burst j via a uniformly distributed random offset:

    r_j^new = r_j + o_j,  where o_j ~ U[0, S_j]    (6)

and S_j is the randomization range of burst j.

A. Search for S_j to Achieve the Performance Upper Bound

In this method, we aim to determine the best value of S_j, and thus the upper bound of the scheduling performance under RGT. To this end, we search for the best value of the fraction γ of the range [0, ∆_j]. That is, for each j ∈ J, we define S_j = γ∆_j and search for γ in the range [0, 1] as follows:
1) Set γ = 0
2) Set S_j = γ∆_j for all j ∈ J
3) Update r_j for each j according to (6)
4) Schedule the IoT traffic for the updated r_j's
5) Compute the performance metric (e.g. η, ζ or E) and save it in a vector
6) Update γ ← γ + 0.05
7) If γ ≤ 1, return to step 2); else, continue
8) Find the value of γ, and thus the values of the S_j's, that maximizes the performance metric
Note that in this search, we use each of η, ζ and E in turn as the performance metric.
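The RGT update (6) and the grid search over γ can be sketched as follows (an illustrative Python sketch; the function names, the burst dicts with keys r and d, and the metric callback are our assumptions, and the scheduling step of the search is abstracted into that callback):

```python
import random

def rgt_update(bursts, gamma, rng):
    """Apply the RGT update r_j <- r_j + U[0, S_j] with S_j = gamma * delta_j,
    where delta_j = d_j - r_j. Returns the new release times; deadlines are
    left unchanged."""
    new_r = []
    for b in bursts:
        s_j = gamma * (b["d"] - b["r"])
        new_r.append(b["r"] + rng.uniform(0.0, s_j))
    return new_r

def grid_search_gamma(bursts, metric, step=0.05, seed=0):
    """Sweep gamma over [0, 1] in the given step and return the gamma that
    maximizes the supplied performance metric (e.g. eta, zeta or -E).
    `metric(bursts, new_r)` stands in for 'schedule, then measure'."""
    best_gamma, best_value = 0.0, float("-inf")
    gamma = 0.0
    while gamma <= 1.0 + 1e-9:
        rng = random.Random(seed)           # same offsets for each gamma
        value = metric(bursts, rgt_update(bursts, gamma, rng))
        if value > best_value:
            best_gamma, best_value = gamma, value
        gamma += step
    return best_gamma
```

Note that with γ = 0 the release times are unchanged, so the search always includes the original (non-randomized) schedule as a baseline.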

B. Estimation of S j
We now aim to estimate S_j based on the theory of the single-server waiting line, i.e., the M/M/1 queue model. First, since our scheduling essentially operates on the required processing times of the bursts, we replace a single customer of the M/M/1 queue model with a single required processing slot of a burst. Due to this replacement, in our system, the average service rate per slot, µ, equals 1.
Then, we let λ denote the average arrival rate of required processing slots, calculated as

    λ = ( Σ_{j∈J} p_j ) / ( T_sch / τ_MAC ),    (7)

where T_sch / τ_MAC is the number of MAC-layer slots in the scheduling window. We also know that the average time spent in the system by a single processing slot, denoted by W, equals 1/(µ − λ). When µ = 1,

    W = 1 / (1 − λ).    (8)

Furthermore, for each burst j, we can estimate the waiting time that might be spent by that burst. To this end, we scale the average waiting time per required processing slot, W, by the total number of processing slots p_j required by burst j. That is, we estimate the waiting time of burst j, denoted W_j^est, as

    W_j^est = p_j W = p_j / (1 − λ).    (9)

However, according to (7), if the network is highly loaded and the system resources are insufficient to transmit all bursts, λ will be greater than 1, and W_j^est in (9) then takes negative values, which is impossible in real life. For highly loaded networks this issue is expected, since λ > 1 violates the assumption µ > λ of the M/M/1 model. Since we cannot increase the system resources, we instead prevent the case W_j^est < 0 by revising (9) as

    W_j^est = p_j / |1 − λ|.    (10)

Considering that burst j is estimated to wait W_j^est slots in the system (including its p_j transmission slots), there should be at least W_j^est slots within ∆_j to transmit j successfully, as long as W_j^est < ∆_j. Thus, S_j is defined as

    S_j = ∆_j − W_j^est if W_j^est < ∆_j, and S_j = 0 otherwise.    (11)

Note that under heavy load, RGT may effectively drop some of the bursts, namely those for which d_j − r_j^new < p_j. This property of RGT further relieves the system.
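The M/M/1-based estimation above can be sketched in Python (an illustrative sketch of (7), (10) and (11) under our own representation: each burst carries its required slots p and its delay constraint delta expressed in slots; the function name is ours):

```python
def estimate_s(bursts, window_slots):
    """Estimate the randomization range S_j of each burst from the M/M/1
    model: lambda is the arrival rate of required processing slots per slot
    (service rate mu = 1), the estimated waiting time of burst j is
    p_j / |1 - lambda|, and S_j = delta_j - W_est when W_est < delta_j."""
    lam = sum(b["p"] for b in bursts) / window_slots
    s = []
    for b in bursts:
        w_est = b["p"] / abs(1.0 - lam)   # kept positive even when lam > 1
        s.append(b["delta"] - w_est if w_est < b["delta"] else 0.0)
    return s
```

For instance, two one-slot bursts in a four-slot window give λ = 0.5 and W_est = 2 slots each, so a burst with ∆ = 10 slots gets S = 8, while a burst with ∆ = 1 slot gets S = 0 and is not randomized at all.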

IV. RESULTS
In this section, we evaluate the performance of the R-PAL and R-EDF schedulers in a predictive network. To this end, we use the IoT dataset described in [33], comprised of bootstrapped traffic generation patterns of real IoT devices whose traffic belongs to one of the following classes: Fixed Bit Periodic, Fixed Bit Aperiodic, Variable Bit Periodic, and Variable Bit Aperiodic.
In addition, for each burst j of each bootstrapped device i, the deadline takes one of the following six values ∆ j ∈ {0.5, 1, 2, 180, 600, 3600 (in seconds)}.
On this dataset, we measure the performance of the scheduling techniques for N = 12 devices as well as for N in the range 400 ≤ N ≤ 6400 with increments of 400 devices. For each N, we randomly select N/4 devices from each traffic class, so that each class contributes 25% of the simulated devices.
In addition, in order to increase the load of the system and evaluate the algorithms on a highly loaded network, for each device i in this dataset, we decreased the data rate R i by 30% (i.e. R i ← 0.7R i ). Furthermore, we set τ MAC = 100 ms, and performed the simulations for scheduling windows with a duration of 900 seconds.

A. Performance Comparison of Scheduling Algorithms under Perfect Forecasting
We now present the performance of the R-PAL and R-EDF algorithms, compared with PAL and EDF and with their respective upper bounds, with respect to each of η, ζ and E.

In Figure 1, we see that the PAL-based scheduling algorithms significantly outperform the EDF-based algorithms for N > 4800 devices. The reason is that PAL aims to maximize the total number of bits in successfully delivered bursts, while EDF aims to maximize the total number of successfully delivered bursts, as described in Section II. We also see that both R-PAL and R-EDF are able to achieve their upper bounds. Furthermore, among the PAL-based algorithms in Figure 1, R-PAL outperforms PAL for N > 5400 devices, and the throughput difference between R-PAL and PAL increases with N. As explained in Section II-A, PAL is a greedy algorithm that schedules bursts with respect to their generation times; this is why the RGT process improves the performance of PAL significantly (by about 0.15 for N = 6400 devices). Among the EDF-based algorithms (Upper Bound R-EDF, R-EDF and EDF) in Figure 1, R-EDF outperforms EDF for N > 4400 devices; that is, the RGT process significantly improves the performance of EDF for N > 4400.

In Figure 2, we see that the EDF-based algorithms are able to schedule more bursts for successful transmission than the PAL-based algorithms. The reason is that EDF aims to schedule the maximum number of bursts without considering the number of bits carried by each burst, whereas PAL aims to maximize the total number of delivered bits. This shows that EDF is fairer than PAL across the devices in the network, since it values all bursts equally. In this figure, we also see that the ζ performance of R-EDF and R-PAL is comparable with that of EDF and PAL, respectively.

In Figure 3, we see that each device consumes more energy to transmit a single bit under the EDF-based algorithms than under the PAL-based algorithms. The reason is that PAL aims to schedule bursts with a higher number of bits; in other words, it maximizes the denominator of (5). Moreover, our results in Figure 3 show that the RGT process significantly decreases the energy consumption of both the PAL and EDF algorithms.

B. Sensitivity of Scheduling Algorithms to Forecasting Error
We now evaluate the performance of the scheduling algorithms under increasing forecasting error in the predictive network. To this end, we model the forecasting error as the realization of a Normally distributed random variable. That is, for each burst j, the forecast number of bits is â_j = a_j + n, where n is drawn from a Normal distribution with zero mean and standard deviation σ. In this subsection, we analyze the network performance as σ, and hence the forecasting error, increases.

In Figure 4, we present the network throughput η for increasing σ, i.e., increasing forecasting error, for N = 6400 devices. In this figure, we see that the RGT process significantly improves the network throughput of both the PAL and EDF algorithms. That is, even when the forecasting error is very high compared to the maximum number of bits (σ = 60 while max_{j∈J} a_j = 128), RGT still has a clear impact on scheduling performance. On the other hand, we see that the performance difference between R-PAL and PAL, as well as that between R-EDF and EDF, decreases as the forecasting error increases. In practice, since IoT traffic has been shown to be predictable with acceptable forecasting error [24], the throughput of the algorithms at σ = 20 can be considered their performance under an average-performing forecaster (where σ exceeds 10% of the maximum number of generated bits over all devices).
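The error model above can be sketched as follows (an illustrative Python sketch; the function name is ours, and clipping negative forecasts to zero is our added assumption, since a negative number of bits is meaningless):

```python
import random

def noisy_forecast(a_j, sigma, rng):
    """Model the forecast number of bits as a_hat = a_j + n, where n is a
    zero-mean Gaussian sample with standard deviation sigma. Negative
    forecasts are clipped to zero (our assumption, not the paper's)."""
    return max(0.0, a_j + rng.gauss(0.0, sigma))
```

With σ = 0 this reduces to perfect forecasting (â_j = a_j), while, for example, σ = 60 against a maximum burst of 128 bits corresponds to the highest error level studied in Figure 4.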

C. Computation Time
We now present the computation time of the RGT process for an increasing number of devices. To this end, we measured the computation time of RGT in MATLAB on a laptop with an Intel Core i7-10750H CPU and 16 GB of RAM. In Figure 5, we present the mean computation time, with standard deviation bars, for each N over 100 simulation runs. In this figure, we see that the computation time of RGT is under 0.015 seconds for all values of N and increases linearly with N. Thus, the computation cost that RGT adds to a scheduling algorithm is very small, while the performance improvement is relatively high.

V. CONCLUSION
In this paper, we proposed the Randomization of Generation Times (RGT) process for scheduling heuristics that allocate resources in a predictive IoT network. RGT can be implemented as a pre-processing step for any scheduling heuristic, distributing the load of the network over a time window. We evaluated the performance of the RGT process with each of the Priority based on Average Load (PAL) and Earliest Deadline First (EDF) algorithms for IoT networks with up to 6400 devices. Our results showed that the RGT process significantly improves the performance of both PAL and EDF in terms of the QoS metrics (throughput and energy consumption), while the fairness of each heuristic remains almost the same. In addition, we showed that the computation time of RGT is under 15 ms for 6400 devices and increases linearly with the number of devices. That is, RGT is a fast and effective pre-processing algorithm that significantly improves the performance of scheduling algorithms for the IoT. Since RGT enables fast heuristics to achieve much higher performance, it paves the way for including devices with much tighter delay constraints in predictive IoT networks.