Traffic-Driven Sounding Reference Signal Resource Allocation in (Beyond) 5G Networks

Abstract—Beyond 5G mobile networks have to support a wide range of performance requirements and unprecedented levels of flexibility. To this end, massive MIMO is a critical technology to improve spectral efficiency and thus scale up network capacity, by increasing the number of antenna elements. This also increases the overhead of Channel State Information (CSI) estimation, and obtaining accurate CSI is a fundamental problem in massive MIMO systems. In this paper, we focus on scheduling uplink Sounding Reference Signals (SRSs) that carry pilot symbols for CSI estimation. Under the large number of users and high load that are expected to characterize beyond 5G systems, the limited amount of resources available for SRSs makes the legacy 3GPP periodic allocation scheme largely inefficient. We design TRADER, an SRS resource allocation framework that minimizes the age of channel estimates by taking advantage of machine learning-based short-term traffic forecasts at the base station level. By anticipating traffic bursts, TRADER schedules SRS resources so as to obtain CSI for each user right before the corresponding traffic arrives. Experiments with extensive real-world mobile network traces show that our solution is efficient and robust in high-load scenarios: with respect to a round robin schedule of aperiodic SRS, TRADER provides CSI within the coherence time more often (up to 5× in given scenarios), leading to channel gains of up to 2 dB.


I. INTRODUCTION
As the fifth generation of mobile networks (5G) is now being deployed worldwide, opportunities to further improve data rates, the number of simultaneously connected devices, reliability and latency start being investigated for beyond-5G systems. The increasing complexity of mobile architectures makes self-configuration and self-optimization key traits, enabled by data-driven management and control of network components [1].
In this work, we apply a data-driven approach to the specific problem of efficiently configuring signals for channel state information (CSI) estimation in massive MIMO systems, a crucial technology to scale the capacity of beyond-5G mobile networks. Indeed, acquiring accurate CSI between each transmitter and receiver antenna pair is paramount to fully exploit the potential of massive MIMO, where at least 64 antennas per base station are used [2]. However, Channel Quality Indicator (CQI) reports are either too coarse (when single reports are used for the entire band), or they come with a prohibitively large overhead (when performed on each subband) under massive MIMO [3]. Instead, for TDD systems with channel reciprocity, uplink CSI is also valid for the downlink, and uplink SRS pilots are an effective solution to obtain accurate CSI for downlink MIMO precoding. Therefore, when and for which user to schedule SRS resources becomes a crucial problem in future massive MIMO systems.
In the high-load scenarios with hundreds of simultaneously active users that are expected to characterize beyond-5G networks, the periodic SRS scheduler defined by 3GPP [4] can be inefficient. On the one hand, the available SRS resources are insufficient to guarantee that each user equipment (UE) can be scheduled continuously, and the base station (BS) often has to rely on stale CSI. On the other hand, not all active UEs have traffic over subsequent frames, and a periodic allocation risks wasting substantial SRS resources on UEs with no traffic to be served.
To make the best use of SRS resources, one should schedule SRSs just before a burst of traffic starts, as well as during a burst as frequently as possible. This requires operating directly at the level of the BS scheduler, where the only information available is the Modulation and Coding Scheme (MCS), the Physical Resource Blocks (PRBs) and the Transport Block Size (TBS) at each Transmission Time Interval (TTI). Such traffic allocations occur irregularly over very fast timescales of tens of milliseconds, and must be accurately predicted for every user independently. However, current solutions proposed in the vast literature on mobile traffic forecasting typically target much coarser time granularities, in the order of minutes to hours [5], [6]. Even fine-grained prediction mechanisms like LinkForecast [7] and PERCEIVE [8] still aim at predictions over hundreds of milliseconds to seconds. The only model considering timescales aligned with our needs operates on traffic that is aggregated at the BS level, rather than the much more challenging case of per-user traffic [9].
We propose TRADER, a TRAffic-DrivEn Resource allocation framework that employs per-user TTI-level traffic forecasts to inform an aperiodic SRS scheduling, as an alternative to the default periodic approach specified by the standard [4]. This aligns with ITU-T recommendations on including ML in the network operational lifecycle [10]. Specifically, TRADER strives to minimize the age of channel estimates, both ensuring frequent channel measurements within bursts and anticipating the start of a burst with an SRS transmission. To this end, TRADER takes advantage of the time series prediction capabilities of Long Short-Term Memory (LSTM) neural networks, by feeding them with traffic allocation information on the burst size, duration and gap (i.e., the idle time between subsequent traffic allocations) generated by individual users.
We evaluate the performance of TRADER using real-world LTE traffic traces, which we collect from production BSs of two major European mobile network operators. Specifically, we use passive measurement tools based on software-defined radios (SDRs), FALCON [11] and OWL [12], to decode the unencrypted information of the LTE Physical Downlink Control CHannel (PDCCH) in a fully privacy-preserving manner. Our results show that TRADER largely outperforms a round robin schedule of aperiodic SRS, as it more often triggers an SRS right before the user generates traffic. For example, while a round robin heuristic provides a median delay of 10 frames, TRADER lowers this number down to 2, which is the minimum possible value. Thanks to this, TRADER provides CSI within the coherence time more often (up to 5× more in given scenarios), and achieves channel gains of up to 2 dB.

II. BACKGROUND

A. Sounding Reference Signals
To obtain CSI, mobile networks rely on the CSI-RS for the downlink and on the SRS for the uplink. In time division duplexing (TDD) systems with channel reciprocity, uplink SRS feedback can also be used to estimate the downlink channel [3]. In line with both LTE and 5G specifications, we consider 10 ms TDD frames subdivided into subframes of 1 ms, and 14 OFDM symbols per subframe for a 20 MHz channel. Uplink and downlink share the same frequency band, hence the switching delay between transmission and reception is accounted for by using a guard period. Depending on the uplink and downlink switching periodicity, up to 7 types of TDD frame structure configurations can be used (see Table 4.2-2 of [4]). The guard period is announced in a special frame, along with the Downlink Pilot Time Slot (DwPTS) and the Uplink Pilot Time Slot (UpPTS), whose main purpose is to carry pilot signals including the SRS. Specifically, SRS signals are transmitted during the UpPTS and can span 1, 2 or 4 symbols mapped to the last 6 symbols of the subframe.
The SRS is configured at the Radio Resource Control (RRC) layer and can be periodic or aperiodic [4]. A periodic SRS is scheduled with a periodicity that ranges from 2 ms to 320 ms. In contrast, an aperiodic SRS has to be actively triggered for each occurrence. In addition, other parameters can be configured for the SRS. The cyclic shift allows sending multiple orthogonal signals; e.g., a cyclic shift equal to 4 allows the BS to configure 4 UEs in the same subframe. The transmissionComb parameter is a flag defining whether SRSs are transmitted on every even or odd subcarrier; it also provides the BS with the capability of multiplexing two UEs by assigning them the same cyclic shift, frequency and time resources, but a different transmissionComb.

B. Dataset
For our study, we collect a dataset of LTE traffic allocations from multiple BSs located in different areas of Madrid, Spain. For completeness, we run both of the SDR-based LTE sniffer tools FALCON [11] and OWL [12] on a Linux laptop connected to a USRP B210, to decode the unencrypted information of the TTI-level traffic allocations that LTE BSs send to the UEs over the PDCCH. Specifically, we gather the temporary user ID (C-RNTI), the ID of the frame containing the traffic allocation for the C-RNTI, and the associated transport block size (TBS). This information is sufficient to determine the size and duration of per-user traffic bursts and the idle times between transmissions, i.e., the gaps. From the collected data, we filter out background traffic by removing RNTIs that have less than 5 active TTIs over the entire activity period; we also discard RNTIs reserved for random access (RA-RNTI, IDs 1-960), paging and system notification (P-RNTI, ID 65534), and broadcast system information (SI-RNTI, ID 65535). Fig. 1 shows our measurement data. The plots illustrate the number of UEs that are simultaneously connected, with active or inactive RRC state, to two BSs. The left plot refers to a BS monitored for 3 h with OWL, covering a typically crowded touristic area; the right plot concerns a BS monitored for over 13.5 h with FALCON, covering a quiet residential area. Note that the user count takes into consideration that the RNTI is a temporary ID: when the time elapsed from the last transmission exceeds the RNTI refresh timer interval of 10.15 s, the UE performs an RRC re-establishment to obtain a new C-RNTI, without affecting the number of users.
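For concreteness, the filtering rules above translate into a few lines of code. The sketch below assumes PDCCH records decoded as (rnti, frame, tbs) tuples, with the reserved IDs listed in the text; function and variable names are our own.

```python
from collections import defaultdict

# Reserved RNTI ranges discarded in our dataset: random access,
# paging/system notification, and broadcast system information.
RA_RNTI_RANGE = range(1, 961)   # RA-RNTI: IDs 1-960
P_RNTI = 65534                  # P-RNTI: paging and system notification
SI_RNTI = 65535                 # SI-RNTI: broadcast system information

def filter_records(records, min_active_ttis=5):
    """records: iterable of (rnti, frame, tbs) tuples decoded from the PDCCH.
    Keeps only user-plane RNTIs with at least min_active_ttis active TTIs."""
    per_rnti = defaultdict(list)
    for rnti, frame, tbs in records:
        if rnti in RA_RNTI_RANGE or rnti in (P_RNTI, SI_RNTI):
            continue  # control/broadcast identifier, not a user
        per_rnti[rnti].append((frame, tbs))
    # drop background traffic: RNTIs with fewer than 5 active TTIs
    return {r: sorted(v) for r, v in per_rnti.items()
            if len(v) >= min_active_ttis}
```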

C. Motivation
The scenarios portrayed in Fig. 1 illustrate how scheduling SRSs periodically can be highly inefficient: in both BSs, the number of UEs that can potentially be scheduled to transmit largely exceeds the capacity of the periodic SRS. What is needed is a more sophisticated mechanism that triggers SRSs at the right moment in time and for the right users, so as to maximize the usefulness of SRS resources. More precisely, if a traffic burst starts at frame t, an SRS triggered at t − 2 would be optimal, leaving time for the BS to inform the UE through control information (t − 2), receive the feedback (t − 1), and exploit the fresh CSI (t).
However, the bursty and heterogeneous nature of user traffic patterns at millisecond granularity makes achieving an ideal SRS schedule particularly difficult. Fig. 2 shows the distribution of the gaps in the traffic generated by individual users during the 3 h dataset of Fig. 1: in 37% of the cases, the gap is lower than 5 LTE frames (i.e., 50 ms), and in 50% of the cases it is less than 27 frames; in the remaining 50% of the cases, however, the gap can take any value between 27 and 1015 frames (the latter matching the RNTI refresh timer duration). Such a distribution calls for a framework capable of providing accurate forecasts at the frame level that can steer the decision mechanism of SRS resource allocation. Our solution tackles precisely such a problem, using machine learning to anticipate per-user traffic gaps, and then leveraging such information to self-configure aperiodic SRSs for the most appropriate users. Our approach, which improves scalability and avoids wasting SRS resources, is detailed next.

III. TRAFFIC PREDICTION AT FRAME GRANULARITY
We start by discussing how to forecast user-level traffic at single-frame 10-ms resolution.

A. Forecasting Model
Classical network traffic prediction aims at anticipating the volume of data transmitted in the next time slot(s). However, per-user traffic at the very short timescales we target is inherently hard to forecast: transmissions occur in rapid bursts that are irregularly spaced in time, as exemplified in Fig. 3 for five users in a sample from our real-world measurements. Moreover, the motivation set out in § II sets our forecasting problem apart from traditional traffic prediction: we aim instead at learning when the next traffic burst generated by a user will start. We thus design input features and a neural network architecture tailored to that specific objective, as follows.

Input features. We use three input features, illustrated in Fig. 3, that can be easily collected on a per-user basis by analyzing the transmission events of each registered active device (a feature-extraction sketch follows the list below).
• A gap (g, measured in ms) represents the time between two consecutive transmissions of the same user, and inherently indicates when the next transmission will take place. Since two frames are enough to trigger an SRS on the channel, gaps correspond to user silence periods above 20 ms. As users disconnect after 10.15 s of inactivity, gaps are upper-bounded by that value.
• The burst size (bs, measured in bytes) is obtained by summing all the TBSs generated by a user between two subsequent gaps. The resulting aggregate describes the total volume of traffic generated by a user, continuously over time.
• The burst duration (bd, measured in ms) is defined as the time interval elapsed between the first and the last frame with non-zero TBS within the corresponding burst, i.e., how long a burst lasts in time.
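A minimal sketch of this feature extraction, assuming one user's allocations are available as time-sorted (time_ms, tbs_bytes) pairs; the 20 ms silence threshold and the 1 ms TTI follow the definitions above, everything else is our own.

```python
def extract_features(allocations, gap_threshold_ms=20, tti_ms=1):
    """allocations: sorted list of (time_ms, tbs_bytes) for one user.
    Returns a list of <g, bs, bd> tuples: the gap following each burst,
    the burst size and the burst duration."""
    tuples, burst = [], [allocations[0]]
    prev_t = allocations[0][0]
    for t, tbs in allocations[1:]:
        if t - prev_t > gap_threshold_ms:   # silence above 20 ms: new burst
            g = t - prev_t                  # gap separating the two bursts
            bs = sum(b for _, b in burst)   # burst size: sum of the TBSs
            bd = burst[-1][0] - burst[0][0] + tti_ms  # burst duration
            tuples.append((g, bs, bd))
            burst = []
        burst.append((t, tbs))
        prev_t = t
    return tuples
```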
Neural network architecture. The input features above are fed to a deep neural network. We experiment with two configurations. In the first case, we employ hidden stacked Long Short-Term Memory (LSTM) layers with multiple memory cells [13]. The rationale for this design is that LSTM recurrent neural networks (RNNs) are known to perform well with time series [14], and our prediction problem can be seen as an instance of (per-user) time series forecasting, where samples map to gaps and the goal is anticipating the magnitude of the next gap. The leaky version of the Rectified Linear Unit (ReLU) is used as the activation function for neurons in the LSTM layers, with a negative slope coefficient of 0.01 to avoid the dying ReLU phenomenon [15]. As a second option, we use a pure regression model, i.e., a standard feed-forward Multi-Layer Perceptron (MLP). This type of neural network has similarly been widely applied to time series forecasting [16], although it lacks the memory properties of LSTM. In this case we use a standard ReLU activation function, since no dying ReLU problem was observed with the MLP.
In both the LSTM and MLP models, we experiment with different network depths, by stacking LSTM or fully connected layers on top of each other. Such layers are followed by a hidden fully connected layer and by an output layer with a single unit that produces the actual prediction. We also test various configurations of the input layer, by varying the number of hidden units from 32 to 256 and halving the number of neurons at each subsequent layer.
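For illustration, a possible Keras realization of the LSTM variant is sketched below. It follows the design just described (leaky ReLU with slope 0.01, units halved at each stacked layer, a hidden fully connected layer, a single-unit output); the framework choice and exact layer placement are our assumptions rather than the paper's reference implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lstm(n_past=20, n_features=3, units=64, n_stacked=1, drop_rate=0.3):
    """Stacked-LSTM gap predictor: input is n_past <g, bs, bd> observations."""
    leaky = lambda z: tf.nn.leaky_relu(z, alpha=0.01)   # avoids dying ReLU
    m = models.Sequential()
    m.add(layers.Input(shape=(n_past, n_features)))
    for i in range(n_stacked):
        # all but the last LSTM layer return sequences for stacking
        m.add(layers.LSTM(units // (2 ** i), activation=leaky,
                          return_sequences=(i < n_stacked - 1)))
    m.add(layers.Dropout(drop_rate))            # kept active at test time too
    m.add(layers.Dense(32, activation="relu"))  # hidden fully connected layer
    m.add(layers.Dropout(drop_rate))
    m.add(layers.Dense(1))                      # single-unit output: next gap
    m.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="mae")
    return m
```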
Dropout layers are interleaved with the last two hidden layers to avoid overfitting during training [17]. Based on extensive experiments, we set the dropout rate to 0.3. An unconventional design choice we make is to keep the dropout layers active also during testing, and perform multiple concurrent forward passes on the same test data. This returns, for each forecast instance, a distribution of predicted burst gaps instead of a single output; the mean (µ) and standard deviation (σ) of the distribution approximate those obtained via Bayesian inference in (computationally prohibitive) deep Gaussian processes, as proven by recent findings on deep neural network operation [18]. The mean and deviation provide richer information than the usual univariate prediction, and we take advantage of them for SRS scheduling. Specifically, positive errors in the prediction cause the SRS to be delayed past the start of the user transmission, so that no fresh channel estimate is available when needed, which leads to a substantial performance degradation. This is a much more severe situation than that entailed by negative errors (of comparable magnitude), which cause an anticipation of the SRS with respect to the ideal allocation, and hence a less up-to-date (but usable) CSI. To minimize the chance of positive errors, we use the uncertainty expressed by the deviation as a safety margin: the predictor returns the mean minus the standard deviation of the predicted next gap for each forecast instance.

Fig. 4 summarizes the architecture for the case of a three-stacked LSTM that employs 256 neurons in the first layer. The MLP model can be obtained by replacing the LSTM layers depicted in Fig. 4 with fully connected layers.

Model training. The deep neural network receives the N past observations of the three input features, i.e., gap, burst size and burst duration, and aims at forecasting the following gap, so as to inform TRADER about the expectation (and uncertainty) of when the next transmission of a given user will take place.
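The test-time dropout trick described above maps to a few lines of framework code. A minimal sketch, assuming the Keras model from the previous listing; passing training=True keeps dropout active at inference, and the number of passes (30 here) is an illustrative choice.

```python
import numpy as np

def predict_gap(model, x, n_passes=30):
    """Monte Carlo dropout: several stochastic forward passes on the same
    input; returns (mu - sigma, mu, sigma), so the scheduler can use the
    conservative estimate mu - sigma as the predicted next gap."""
    # training=True keeps the dropout layers active at inference time
    samples = np.stack([model(x, training=True).numpy().squeeze()
                        for _ in range(n_passes)])
    mu, sigma = samples.mean(axis=0), samples.std(axis=0)
    return mu - sigma, mu, sigma
```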
In order to ensure the highest prediction accuracy, we test and compare different loss functions. We consider the well-known MAE and MSE, but also the Pinball-Loss (PL) proposed in [19]. Formally, the three functions are defined as

$$\mathrm{MAE}(\hat{x}, x) = |\hat{x} - x|, \qquad \mathrm{MSE}(\hat{x}, x) = (\hat{x} - x)^2,$$

$$\mathrm{PL}_{\kappa}(\hat{x}, x) = \begin{cases} \kappa\,(x - \hat{x}) & \text{if } \hat{x} \le x,\\ (1-\kappa)\,(\hat{x} - x) & \text{if } \hat{x} > x, \end{cases}$$

where x̂ and x are the predicted and observed values, respectively, and κ is the conditional quantile: for instance, κ = 0.5 yields an estimator of the conditional median. While MAE and MSE are legacy loss functions used for RNN training, the rationale for experimenting with a Pinball-Loss function is that we aim at minimizing overestimation: as explained above, avoiding overestimates is key for SRS scheduling, hence we set the Pinball-Loss quantile to κ = 0.2 to favor underestimation. In all variants, the model is trained with the popular Adam optimizer, using a learning rate of 0.001 for 120 epochs.
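The Pinball-Loss is not a Keras built-in, but the textbook quantile loss of [19] is easy to express in the signature Keras expects; a sketch:

```python
import tensorflow as tf

def pinball_loss(kappa=0.2):
    """Quantile (pinball) loss: with kappa < 0.5, overestimation
    (y_pred > y_true) is penalized with weight 1 - kappa, so the
    trained model is biased towards underestimating the next gap."""
    def loss(y_true, y_pred):
        err = y_true - y_pred
        return tf.reduce_mean(tf.maximum(kappa * err, (kappa - 1.0) * err))
    return loss

# usage: model.compile(optimizer="adam", loss=pinball_loss(0.2))
```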

B. Prediction Accuracy
We assess the performance of the proposed deep learning predictor with the FALCON dataset (see § II-B). We use a standard 80:20 training-testing split, as highlighted by the different backgrounds in Fig. 1. We thus use 3,041,118 ⟨g, bs, bd⟩ tuples to train and validate the neural network architecture. Then, we test the model over 760,280 new observation samples.

Comparative evaluation. We compare the forecast accuracy in an extensive set of cases, by varying the architecture, loss function, number of stacked layers, neurons per layer and drop rates. We start by analyzing the performance of LSTM and MLP using MAE as the loss function. Fig. 5(a) summarizes the results, portrayed as the cumulative distribution of the error ε = x − x̂ incurred by each model during testing. We restrict the error interval to [−300, 300] ms, as we are particularly interested in short-term predictions and the vast majority of the recorded deviations fall in that range. A perfect predictor would yield a step function at ε = 0: the closer the curve to this ideal performance, the better the forecasting model. Also, overestimations (negative values of ε) should be avoided as much as possible because, as explained before, they are particularly problematic for SRS allocation. The legend denotes LSTM with solid lines and MLP with dashed lines, reporting in round brackets the number of neurons followed by the drop rate. Varying all these settings did not produce a clear winner, which we ascribe to the extremely noisy nature of the input data, which presents no easy-to-learn patterns. As a result, all the predictors produce forecasts that are primarily based on inference of the statistical distribution of the burst gaps, and any neural network configuration suffices to that end.
We then analyze the quality of the gap forecast under different loss functions. In order to provide a reference for the results obtained with MAE, MSE and PL, we consider two simple statistical predictors: (i) a static forecast (MED) corresponding to the median of the gap distribution observed over the training dataset, and (ii) a naive model that uses the last observed value (LV) of the gap as the prediction for the next one. Other than serving as baselines, these benchmarks help us understand whether a deep learning approach is strictly necessary, or much simpler options are also valid. Fig. 5(b) summarizes the results for an LSTM with 1 stacked layer, 64 neurons and drop rate 0.3. We observe that the predictor using MAE as the loss function outperforms all the others, including the simple ones, with 68% of the cases falling within the interval [−100, 100] ms and 91% within [−300, 300] ms. This allows operating sufficiently well in the regime of interest, i.e., with the small gaps seen in Fig. 2, whose accurate prediction can inform decisions on whether to trigger an SRS or not; for longer gaps, such a decision can be revisited later with lower urgency. MAE is more robust to outliers than MSE, which explains its better performance. PL with κ = 0.2 favors underestimation: this can be appreciated in the curve, as the incidence of negative values of ε (overestimations) is very small. However, the error curve is shifted away from ε = 0, which means that the price paid in terms of underestimation is excessive, and would not be of help for SRS allocation. The MED estimator drops to zero overestimation for all gaps larger than the observed median, which is desirable, but the accuracy of its underestimations is significantly lower than MAE's. As a result, the gain obtained by preventing overestimation comes at the cost of lower accuracy in the region where we are most interested in obtaining accurate predictions. Finally, LV performs similarly to MAE for positive values of ε, but its accuracy in overestimation is significantly worse, which justifies building a learning model for this problem.
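For reference, the two statistical baselines amount to a handful of lines each (class names are ours):

```python
import numpy as np

class MedianPredictor:
    """MED: static forecast equal to the median gap of the training set."""
    def fit(self, train_gaps):
        self.med = np.median(train_gaps)
        return self
    def predict(self, history):
        return self.med          # same value regardless of the history

class LastValuePredictor:
    """LV: naive forecast repeating the most recently observed gap."""
    def predict(self, history):
        return history[-1]
```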
Model parametrization. Having confirmed that a simple deep learning architecture trained with a MAE loss function outperforms other approaches, we investigate how its parameters affect the quality of the forecast.
We first evaluate the impact of different input lengths, feeding the model with 20 or 60 previous samples. The minimum history of 20 past observations was chosen because, after several experiments, such a length proved effective in obtaining accurate predictions; we then increase the history length to 60 to evaluate the impact of a larger number of past observations on the prediction accuracy. Fig. 5(c) shows that the accuracy of MAE is very similar with 20 or 60 samples: there are minor differences in the interquartile range, in favor of a history of 20. We also include the results for the other predictors, MSE and PL, for which the median of the error distribution is not centered at zero.
As a second test, we assess the impact of the depth of the neural network on the forecasting performance. To this end, we compare LSTM architectures with a varying number (1, 2 or 3) of stacked layers, and different numbers of neurons per layer. Fig. 5(d) does not show relevant performance differences, which makes us favor the simpler variant with 1 stacked layer and 64 neurons because of its better accuracy in overestimation.
Summary. Based on extensive experiments, the gap-burst time series under analysis do not fall into the category for which LSTM models perform exceptionally well, most likely because of the highly irregular timing of transmission bursts at the LTE frame granularity. Still, a very simple 1-stacked LSTM model with 64 input neurons learns the basic statistics of the gap distribution better than simple predictors, and is computationally less demanding than MLP architectures. In light of these results, we assist the SRS scheduling with a MAE predictor, configured as a 1-stacked LSTM architecture with 64 neurons, fed with 20 past samples.

IV. THE TRADER RESOURCE ALLOCATION STRATEGY
We now discuss the aperiodic SRS resource allocation problem. The objective is to minimize the age of the channel estimate, i.e., the time elapsed between the last SRS that was scheduled for a user and its next traffic allocation. Call U the set of RRC-connected UEs, where u denotes a UE, u ∈ U. Let t be the current time (or, equivalently, the current frame number) and f_u be the (unknown) LTE frame that carries the next traffic allocation for u (with t < f_u). f_u occurs after a gap τ_u from the last traffic allocation for u, which we denote by z_u, i.e., f_u = z_u + τ_u. Let R be the amount of SRS resources available in each TDD frame, and let r_{u,t} ∈ {0, 1} indicate whether an SRS resource is allocated to u in frame t. The time of the most recent SRS triggered for UE u is then

$$l_u = \max \{\, t' \le t : r_{u,t'} = 1 \,\}.$$

At time t, the objective is to allocate the SRS resources r_{u,t} so as to minimize the maximum age of the channel estimate,

$$\min_{r} \max_{u \in U} \, (f_u - l_u), \qquad (6)$$

provided that the future allocation is no more than 2 frames ahead in time,

$$r_{u,t} = 1 \implies f_u - t \le 2,$$

and without exceeding the R SRS signals available in each frame,

$$\sum_{u \in U} r_{u,t} \le R, \quad \forall t.$$

At the present time t, the arrival of future traffic f_u is unknown (recall that f_u > t). We thus leverage the forecasting methodology presented in § III to obtain an estimate f̂_u of such a time. The term f_u cannot be estimated directly, but our gap prediction methodology allows estimating τ_u; given that the former traffic allocation z_u is known, we can compute f̂_u = z_u + τ̂_u. We recall that our deep learning model returns both the expectation (here denoted by µ_u) and the standard deviation (σ_u) of the inter-burst gap, via multiple concurrent forward passes with active dropout layers. We leverage the standard deviation as a measure of the uncertainty of the prediction, and set τ̂_u = µ_u − σ_u: in this way, a safety margin informed by the model uncertainty anticipates the SRS, and limits the chances that an incorrect forecast triggers the SRS too late with respect to the actual traffic arrival. We then reformulate (6) as

$$\min_{r} \max_{u \in U} \, (\hat{f}_u - l_u).$$

Algorithm 1 summarizes the workflow of TRADER. At time t, the output of the algorithm is a set of UEs X ⊆ U for which an SRS is triggered; X depends on the amount of resources R available in each frame. We store in U the information about the last triggered SRS l_u, the corresponding age of estimate a_u and the last frame with traffic allocation z_u for each active UE. The latter component allows determining whether the current UE should be removed from the set of active UEs: we compute the age of the last transmission as the difference between the current frame t and z_u and, if this difference exceeds T, u is considered not eligible for allocation (line 3).
A new UE joins the system if, during frame t, there is traffic allocated for it. As no past SRS is available in this case, l_u = ∅, and we include in U a new tuple ⟨a_u, l_u, z_u⟩ with infinite age for the last estimate and last SRS; the current frame is set as the last frame with traffic allocation (lines 4-5). Then, u acquires the highest priority to be scheduled in the current round. For UEs already in the system and for which predictions are available (i.e., with at least N past samples), we compute the age of the last estimate with respect to either the future allocation f̂_u or the current frame t with traffic; the future allocation is given by the gap estimate τ̂_u computed from the last traffic allocation z_u (lines 8-13). If the future allocation is two frames ahead of t, the SRS should be scheduled at time t; otherwise, there is no urgency and the SRS can be scheduled later on (line 9). We then sort U in descending order of a_u (line 19) and schedule the first UEs in the set until the available R resource units are used. This builds the schedule X; for these users, we update the information in U by setting the age of estimate to 0 and the last scheduled SRS to the current frame t (lines 20-24).
We benchmark TRADER against a simpler Round Robin heuristic that does not make use of predictions. Round Robin's workflow is also described by Algorithm 1, with the exception of the prediction-based if condition (lines 7-11). During each iteration, Round Robin computes the age of the estimate of u with respect to the current frame t and schedules UEs accordingly. When no predictions are available for the reasons discussed above, TRADER falls back to Round Robin and blindly triggers SRSs for the UEs that currently have the highest age.
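To make the per-frame logic concrete, below is a Python sketch of Algorithm 1, including the Round Robin fallback; the per-UE state layout and all identifiers are our own, and the 2-frame horizon follows the reasoning of § II-C.

```python
import math

def trader_schedule(ues, t, R, T=1015, horizon=2):
    """One frame of Algorithm 1. ues maps each UE id to its state:
    {'a': age of estimate, 'l': last SRS frame (None if none yet),
     'z': last traffic frame, 'tau_hat': predicted gap (mu - sigma) or None}.
    T is the RNTI refresh timer in frames (10.15 s = 1015 frames of 10 ms).
    Returns X, the set of UEs whose SRS is triggered in frame t."""
    for u, s in list(ues.items()):
        if t - s['z'] >= T:                 # RNTI refresh expired:
            del ues[u]                      # u no longer eligible
            continue
        if s['l'] is None:                  # new UE: maximum priority
            s['a'] = math.inf
        elif s['tau_hat'] is not None:      # TRADER: prediction available
            f_hat = s['z'] + s['tau_hat']   # estimated next traffic frame
            # urgent only if the predicted allocation is <= 2 frames ahead
            s['a'] = f_hat - s['l'] if f_hat - t <= horizon else 0
        else:                               # fallback: Round Robin age
            s['a'] = t - s['l']
    ranked = sorted(ues, key=lambda u: ues[u]['a'], reverse=True)
    X = set(ranked[:R])                     # schedule the R oldest estimates
    for u in X:                             # SRS triggered: reset state
        ues[u]['a'], ues[u]['l'] = 0, t
    return X
```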

V. PERFORMANCE EVALUATION

A. Methodology and Metrics
We benchmark TRADER by comparing its performance against both the Round Robin heuristic and an Oracle. The latter is an omniscient predictor with perfect knowledge of future traffic allocations, including gaps and bursts.
For a fair comparison, we evaluate TRADER, Oracle and Round Robin under the same network conditions, i.e., we perform trace-driven simulations using as input the test sets of the LTE traces (see § II). For the number of SRS resources in each 10 ms frame we consider 64, 32 and 16 signals. 64 represents the number of SRS signals at disposal when the TDD system works with a MIMO configuration of 2T4R (2 transmit and 4 receive antennas), transmissionComb equal to 2 and cyclic shift equal to 4. Such a setting allows assigning 8 SRS signals in one symbol. Within a 10 ms timeframe, the number of symbols available for SRS is 8, which leads to a total of 64 SRS signals.
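As a quick sanity check, this budget follows directly from the multiplexing parameters introduced in § II-A:

```python
# SRS capacity per 10 ms TDD frame for the 2T4R setting described above
cyclic_shifts = 4        # 4 orthogonal UEs per comb (cyclic shift = 4)
comb_offsets = 2         # transmissionComb: even/odd subcarriers
signals_per_symbol = cyclic_shifts * comb_offsets  # 8 SRS signals per symbol
symbols_per_frame = 8    # symbols usable for SRS within a 10 ms frame
print(signals_per_symbol * symbols_per_frame)      # -> 64
```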

Algorithm 1 TRADER Resource Allocation Algorithm
Input: T RNTI refresh timer, R SRS resources per frame, U set of connected UEs
Output: X set of UEs with SRS triggered in t
Var: a_u age of last estimate, l_u last SRS, z_u last traffic allocation
1:  for each u ∈ U do
2:    if t − z_u ≥ T then                ▷ Check RNTI refresh expiration
3:      remove u from U and continue     ▷ u not eligible for allocation
4:    if l_u = ∅ then                    ▷ New UE with no previous SRS
5:      a_u ← ∞, l_u ← ∞, z_u ← t
6:    else                               ▷ Existing active UE
7:      if prediction available then
8:        f̂_u ← z_u + τ̂_u                 ▷ Estimated next traffic allocation
9:        if f̂_u − t > 2 then             ▷ No urgency: SRS can be scheduled later
10:         a_u ← 0
11:       else                           ▷ f̂_u requires SRS at t
12:         a_u ← f̂_u − l_u
13:       end if
14:     else
15:       a_u ← t − l_u                   ▷ No prediction: Round Robin age
16:     end if
17:   end if
18: end for
19: sort U in descending order of a_u
20: X ← ∅
21: for each r among the first R UEs of U do
22:   X ← X ∪ {r}, a_r ← 0, l_r ← t
23:   U[r] ← ⟨a_r, l_r, z_r⟩
24: end for
For evaluation, we focus on the following three metrics.
Delay: given our objective of obtaining fresh CSI as often as possible, this metric measures how close an SRS s is to the frame t that carries actual traffic for the UE. The best possible result is t − s = 2, which means that the SRS was triggered exactly on time.
Allocation within coherence time: this metric measures how often the SRS is triggered within the channel coherence time T_c of data traffic. During T_c, the channel is assumed to be constant, thus gathering a channel estimate with an SRS within T_c yields the highest utility. T_c can be approximated as

$$T_c = \frac{9}{16 \pi f_{\max}}, \qquad f_{\max} = \frac{v f_c}{c},$$

where f_max is the maximum Doppler shift, proportional to the velocity v of the UE and to the carrier frequency f_c, and c is the speed of light [20].

Impact on Received Signal Power (RSP): this metric quantifies the gain that TRADER provides over Round Robin.
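The coherence-time formula above is straightforward to evaluate numerically; for instance, at the maximum UE speed (10 m/s) and the carrier frequency (2 GHz) used in our evaluation:

```python
import math

C = 3e8  # speed of light (m/s)

def coherence_time(v, fc):
    """T_c = 9 / (16 * pi * f_max), with f_max = v * fc / c [20]."""
    f_max = v * fc / C                  # maximum Doppler shift (Hz)
    return 9.0 / (16.0 * math.pi * f_max)

print(round(coherence_time(10, 2e9) * 1e3, 2), "ms")  # -> ~2.69 ms
```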
To compute the last two metrics, we employ the well-known QuaDRiGa channel model [21] and set up three different urban scenarios. For each one, the UE moves for 100 s with the same sequence of speed variations (the maximum speed is 10 m/s) and stops in two parts of the trajectory. We characterize both Line of Sight (LoS) and Non-Line of Sight (NLoS) propagation by using the 3GPP_3D_UMa_LOS and 3GPP_3D_UMa_NLOS models, where UMa stands for Urban Macro. Scenario (3) differs from Scenario (2) in that the BS is closer to the same UE trajectory and the NLoS period corresponding to second 10 of the trace is missing; as a result, the trajectory of Scenario (3) is characterized by relatively high RX power for the first part, until second 40. We sample the channel coefficients with a granularity of 10 ms, corresponding to the frame duration.

B. Results
Delay. Fig. 7 compares Oracle, TRADER and Round Robin, showing CDF curves of the delay for the two datasets and for different amounts of resources reserved for SRS in each 10 ms frame. As expected, by lowering the amount of SRS resources, UEs obtain SRSs less frequently on average, depending on the number of simultaneously active UEs (see Fig. 1). As a consequence, the tail of the delay reaches the highest values when the amount of available resources is the lowest. With 64 SRSs per frame (see Fig. 7(a)), TRADER provides intermediate gains between Round Robin and Oracle, triggering SRSs with exactly 2 frames of delay more often (in 50% of the cases) than Round Robin. With 32 SRSs every 10 ms (see Fig. 7(b)), TRADER obtains higher gains over Round Robin than in the former case: in nearly 50% of the cases the delay is 2 frames for TRADER and 10 frames for Round Robin. However, we note that for delays of 15 frames or more, Round Robin marginally outperforms TRADER: by spending SRSs close to when the predicted allocations occur, TRADER subtracts resources from the one or more UEs that should have been scheduled according to the principle of minimizing the age of the last estimate, and these have to be re-scheduled later, thus increasing their delay. Such behaviour is amplified with 16 SRSs per frame (Fig. 7(c)): here, the tradeoff that TRADER enforces, obtaining a better delay in the short term at the expense of some degradation at longer delays, is more evident. Fig. 8(a) evaluates the delay of TRADER and Round Robin for the 13.5 h dataset. As the results are in line with those obtained for the 3 h trace, we only report the case with 64 SRS signals every 10 ms. Fig. 8(b) instead compares the results obtained with the different traces. Given the different traffic patterns (see Fig. 1), such a comparison provides insight into the gains of TRADER across scenarios. We observe that in the longer trace, TRADER guarantees the lowest delay more often, at the expense of a longer tail (see the outliers); vice versa, on the shorter trace, TRADER considerably reduces the tail of long delays.
Allocation within T_c. We set f_c = 2 GHz and determine T_c according to the velocity v shown in Fig. 9(a). Fig. 9(b) shows the duration of T_c, the data traffic and the SRS allocations of TRADER and Round Robin. TRADER triggers SRSs within the coherence time more often than Round Robin, especially during mobility, when the coherence time is shorter than when the UE stands still. We cross-check this over all the 9250 UEs of the 3 h trace: Fig. 9(c) confirms that the result holds in general.
Impact on RSP. We now assess the channel gains due to the reduced SRS age. Fig. 10 shows that TRADER outperforms Round Robin (we report means and 95% confidence intervals). Specifically, for the two schemes we use the minimum and maximum frame delays from the above results for each of the resource amounts considered (64, 32 and 16 signals per 10 ms frame); for example, in Fig. 7(a) the maximum delay is 3 frames and the minimum is 1 frame. We then compute |c_TRADER(t) − c_RoundRobin(t)| for each instant t of the channel profiles (see Figs. 6(b), 6(c) and 6(d)). Figs. 10(a), 10(b) and 10(c) show the results for the OWL dataset. We observe that the average channel gains increase as the amount of SRS signals available in each 10 ms frame diminishes. Fig. 10(a) shows that, for a simple scenario with a linear trajectory, the variability in channel gain is minimal. The difference of the means obtained with minimum and maximum frame delay spans from 0.4 dB to 0.6 dB (for 64 and 16 signals, respectively), while for Scenario (3) the differences range from 0.3 dB to 0.8 dB. The highest gains occur for Scenario (2), which is the most complex one.
VI. RELATED WORK

Relevant to our work are studies on the prediction of mobile network traffic, and on the allocation of SRS resources. We discuss hereafter the novelty of TRADER with respect to previous efforts in those two areas.

Mobile network traffic prediction. Real-time data traffic forecasts are a paramount input to emerging strategies in traffic engineering and resource management that take advantage of the increasing virtualization of mobile networks, as repeatedly demonstrated by recent studies [22], [23], [24]. In this context, traditional models relying on information theory [25], Markovian [26] or autoregressive [27] approaches have been supplanted by deep learning architectures.
Specifically, a variety of neural networks have been proposed to predict the traffic demands observed at individual BSs [5], [28] or antenna sectors [29], possibly over long time horizons [6], and separately across mobile services [30]. All these solutions aim at estimating traffic volumes over timescales in the order of minutes, whereas our focus is on much faster dynamics at the millisecond level.
The literature on network state prediction at the short timescales we target is much thinner. Previous research has mostly proposed deep learning predictors that operate on time steps of seconds or less for physical layer indicators, such as bandwidth [7], transmission inactivity [31], signal strength [32], uplink throughput [8], Physical Resource Blocks (PRBs) [33], or channel state [34]. Unlike these works, we are interested in user-generated traffic, and not in physical layer properties.
The study that is closest to ours in spirit aims at mobile traffic volume forecasting over time steps of tens of LTE Transmission Time Intervals (TTIs), i.e., tens of ms, by using LTE Physical Downlink Control CHannel (PDCCH) information [9]. However, neither this research nor any of those listed before addresses the problem of anticipating traffic burst durations or gaps, as done by our forecasting model.

SRS allocation. Standardization bodies are still debating the amount of dedicated resources and the allocation strategies for SRS. In light of such uncertainty, several recent works have investigated optimizations to the use of SRS when there is a very large number of users for which channel estimation must be performed in parallel. The approaches proposed to date have tackled the problem from different angles.
Some studies have focused on coordinating uplink SRS allocation across neighboring base stations to mitigate pilot contamination, by using fractional reuse [35].
A different line of research has considered enhancing Channel Estimation Capacity from SRS information, e.g., by using deep learning tools, hence inherently improving the utilization of SRS sequences [36]. Also, multi-user grouping has been proposed as a strategy to minimize SRS resource requirements, by bringing together users based on channel state information [37], or channel correlation [38].
However, none of the works above addresses the problem of anticipatory SRS allocation based on per-user traffic prediction. Further, all of them use periodic SRS, while both the TRADER and Round Robin mechanisms use aperiodic SRS, which makes previous studies fully orthogonal to ours.

VII. CONCLUSIONS
For future mobile networks, obtaining fresh and accurate CSI is especially important. Technologies such as massive MIMO impose a high CSI acquisition overhead, which scales with the number of antenna elements involved in the measurement. In this work, we tackle the problem of efficiently configuring the SRS pilots that are used to obtain accurate CSI for downlink MIMO precoding in TDD systems. Specifically, we design TRADER, a TRAffic-DrivEn Resource allocation framework that leverages per-user ms-level traffic forecasts for aperiodic SRS scheduling. TRADER uses an LSTM network to predict the idle times between subsequent traffic allocations of a user, which are essential to determine a good SRS schedule. We extensively evaluate the performance of the predictor and of TRADER with real-world mobile data. Trace-driven simulations show that, unlike an aperiodic round robin heuristic, TRADER triggers SRSs right before the actual traffic more often, and performs well in high-load scenarios like the ones tested. The more accurate scheduling of SRSs close to future user traffic translates into channel gains of up to 2 dB.