Wideband Spectrum Reconstruction with Multicoset Sub-Nyquist Sampling and Collision Classification

This paper proposes an improved method for reconstructing wideband sparse spectrum. We utilize a multicoset setup based on time delay. The simple multicoset setup is more suitable for practical implementation in comparison to more sophisticated sub-Nyquist systems. We first introduce the general reconstruction model that solves for a fixed number of variables. We employ a simple machine learning technique to classify the aliased sub-Nyquist bins into two categories. The classification method reduces the reconstruction time by decreasing the number of combinations and variables needed for resolving the signals. The saving in solution time is significant at low occupancy levels. Furthermore, the approach is robust against higher noise levels, because although the classification accuracy decreases as SNR decreases, the reduction in the accuracy of the classifier does not adversely affect the overall detection. We define detection performance metrics and provide simulation results to demonstrate the effectiveness of our approach.


I. INTRODUCTION
A key challenge for spectrum sharing systems is the need for spectral occupancy information over a wide band. In addition, spectrum sharing techniques require sensing results from a large geographic area. In general, it is difficult to collect instantaneous spectrum occupancy information unless multiple parallel narrowband sensors, or wideband sensors, are employed. However, it may not be cost-effective to use wideband sensors, or many conventional sensors in parallel. Moreover, high sampling rates result in high energy consumption. Various sub-sampling techniques have been proposed for wideband spectrum sensing [1]. Usually, these techniques require the spectrum to be sparse. The sparsity condition is often satisfied when the spectrum is viewed from a wideband perspective. The sparsity assumption is even more accurate in rural areas.
Numerous approaches have been developed for wideband spectrum sensing and reconstruction. The majority of the compressive sensing approaches for spectrum sharing are less practical, or require special hardware designs. Some of these techniques require random sampling and analog processing [2], [3]. Other techniques require co-prime sampling rate combinations that cannot be achieved with conventional ADCs [4]. A practical implementation of wideband compressed sensing is studied in [5]. A wideband sensing approach called BigBand is proposed in [6]. Specifically, BigBand is based on multicoset sampling and is implemented using conventional analog-to-digital converters (ADCs). While the BigBand approach is a good candidate for practical implementation due to its simplicity and fast reconstruction time, some issues remain unaddressed. For instance, BigBand relies on the phase rotation property to detect a change in magnitude and decide whether there is a collision between Nyquist bins that fall into the same sub-Nyquist bin. However, if the multiband signal is corrupted by noise, there will be a change in magnitude even when no collision occurs. This is especially true in low signal-to-noise ratio (SNR) scenarios. Furthermore, BigBand suggests setting all the frequencies that are associated with a specific sub-Nyquist bin to occupied when a solution is not realized. Therefore, higher false positive rates are expected. In general, false positives in spectrum sensing techniques are not as harmful as false negatives. However, if the false positives are spread over a wide band and separated by the sub-Nyquist rate, then large chunks of the spectrum will be rendered unusable for many spectrum sharing techniques.
This paper focuses on the development of a low-complexity spectrum sensing and reconstruction technique that is suitable for practical implementation with conventional ADCs. We adopt a multicoset uniform-sampling setup with time delays similar to the one in [6]. We develop the general least square model to reconstruct the wideband signals under the condition of sparsity. In order to reduce the total reconstruction time, we incorporate a simple classification model to classify the sub-Nyquist bin collision order. The collision classification enables us to solve for a lower number of combinations. Hence, the total solution time is reduced. The final reconstruction algorithm combines the least square resolver and the collision detector. Since most spectrum sharing techniques require only the occupancy measurements, we focus on detection performance metrics to evaluate our system. However, the accuracy of signal reconstruction is another possible metric for evaluation. For instance, the estimation error of the magnitude and the phase of the reconstructed signal can be evaluated, but this is beyond the scope of this paper and will be considered in future work. The main contributions of this paper are: a) the solution for the general signal reconstruction case is developed, and b) a classifier is utilized to reduce solution time with minimal effect on the reconstruction accuracy over a wide range of SNR values.
The remainder of this paper is organized as follows. Section II describes the model and presents the sub-sampling approach. Section III provides the steps for reconstructing the sub-sampled signal, presents the collision classifier, and defines the performance metrics. Section IV demonstrates the simulation results. Finally, Section V concludes the paper.

II. SYSTEM MODEL AND PROBLEM STATEMENT
Let $x(t)$ be a wideband signal of interest that is bandlimited to $[-f_N/2, f_N/2]$, where $f_N$ is the required Nyquist sampling rate that guarantees a full recovery of the signal. Practically, $x(t)$ is the sum of different signals and noise over a multiband channel. Let $x[n] = x(t = nT)$, $n = 0, \ldots, N-1$, be the Nyquist-sampled version of $x(t)$, where $x \in \mathbb{C}^N$ and the sampling time is $T = 1/f_N$. The frequency domain representation of $x[n]$ can be calculated by computing the discrete Fourier transform (DFT), denoted by $X$. Assuming that $x(t)$ is sparse in the frequency domain, we wish to recover the signal $x(t)$ using a sampling rate that is less than the Nyquist sampling rate $f_N$. We utilize a multicoset system with uniform samplers and time delays. Fig. 1 shows the multicoset system setup. The input signal $x(t)$ is delayed over $P$ branches, each with a time delay of $\tau_i$. To simplify the analysis, we restrict the time delay to multiples of $T$, i.e., $\tau_i = k_i T$, $k_i \in \mathbb{Z}_0^+$. The delayed signal is uniformly sub-sampled with a sampling rate $f_s = 1/T_s < f_N$, and $y_i[\ell] = x(t - \tau_i)\big|_{t = \ell T_s}$, $\ell = 0, \ldots, M-1$, is the sub-sampled version of the input signal at branch $i$. The DFT of $y_i[\ell]$ is denoted by $Y^i$. The sub-samplers in this setup should not employ anti-aliasing filters; the aliased signals are later resolved in signal reconstruction. The down-sampling ratio is defined as $L = f_N/f_s = N/M$, where $M$ is the length of $y_i[\ell]$ and is equal to the number of sub-Nyquist bins. To justify such a system in practice, $P$ must be less than $L$ (otherwise, we could divide the band into $L$ sub-bands, each of which is Nyquist-sampled); thus, $P f_s < f_N$. Furthermore, assume $N$ is an integer multiple of $M$. Since $x(t - \tau_i)$ is sampled with a sampling rate $f_s < f_N$, the sampled signal is aliased and the corresponding Nyquist bins fall into specific sub-Nyquist bins. Consider $P$ sampling branches as shown in Fig. 1. For each branch $i$, the sub-Nyquist bin $Y^i_\ell$ can be written in terms of Nyquist bins as
$$Y^i_\ell = \sum_{m=0}^{L-1} X_{(\ell+mM)}\, e^{-j 2\pi (\ell+mM) k_i / N}, \tag{1}$$
where $(\ell+mM)$ is the index of the bin frequency $f_{(\ell+mM)}$.
The phase shift term $e^{-j 2\pi (\ell+mM) k_i / N}$ is the result of the time shift and can also be represented in terms of frequency and time delay as $e^{-j 2\pi f_{(\ell+mM)} \tau_i}$. Fig. 2 demonstrates the mapping of the frequency bins from the Nyquist dimension to the sub-Nyquist dimension. Clearly, one sub-Nyquist bin $Y^i_\ell$ is equal to the summation of the values in row $\ell$. In practice, the sampled signal will be corrupted by noise. Hence,
$$x(t) = s(t) + w(t), \tag{2}$$
where $s(t)$ is the signal part, including any channel distortion, and $w(t)$ is additive white Gaussian noise (AWGN). In addition, let $\mathcal{K}$ be the set of indices of occupied Nyquist bins, or more specifically, the Nyquist bins that contain signal energy. The number of occupied Nyquist bins is given by $|\mathcal{K}| \le N$, and the occupancy level is defined as $(|\mathcal{K}|/N \cdot 100)\%$. If the Nyquist bandwidth is sparse enough, some of the sub-Nyquist bins $Y^i_\ell$ may contain only noise energy. Thus, if we know this information for a specific sub-Nyquist bin, we can declare the corresponding Nyquist bins as unoccupied. To simplify the problem further, we rewrite the sub-Nyquist bin as a combination of signal and noise components:
$$Y^i_\ell = \sum_{m=0}^{L-1} S_{(\ell+mM)}\, e^{-j 2\pi f_{(\ell+mM)} \tau_i} + \sum_{m=0}^{L-1} W_{(\ell+mM)}\, e^{-j 2\pi f_{(\ell+mM)} \tau_i}, \tag{3}$$
where $S$ and $W$ are the DFTs of the Nyquist-sampled $s(t)$ and $w(t)$, respectively.
Fig. 1. Delay-based multicoset sampling.
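The bin-folding relation in (1) can be checked numerically. The following is a minimal sketch, assuming circular delays and NumPy's unnormalized DFT convention, under which the folded sum carries an extra factor of $1/L$; the function names `subsample_branch` and `fold_bins` are illustrative and not part of the original system.

```python
import numpy as np

def subsample_branch(x, L, k):
    """Circularly delay x by k Nyquist samples, then keep every L-th sample."""
    return np.roll(x, k)[::L]

def fold_bins(X, L, k):
    """Predict a branch's sub-Nyquist DFT from the Nyquist DFT X by bin folding."""
    N = X.size
    M = N // L
    q = np.arange(M)[:, None] + M * np.arange(L)[None, :]  # q[l, m] = l + m*M
    phase = np.exp(-2j * np.pi * q * k / N)                 # delay-induced rotation
    # The 1/L factor is an artifact of NumPy's unnormalized DFT (an assumption
    # of this sketch, not a statement about the scaling used in the paper).
    return (X[q] * phase).sum(axis=1) / L

rng = np.random.default_rng(0)
N, L = 5000, 10
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X = np.fft.fft(x)

# Largest mismatch between the measured and the predicted sub-Nyquist spectra
max_err = max(
    np.abs(np.fft.fft(subsample_branch(x, L, k)) - fold_bins(X, L, k)).max()
    for k in (0, 1, 2, 3)
)
print(max_err)
```

The mismatch stays at floating-point level for every delay factor, which is consistent with each sub-Nyquist bin being the phase-rotated sum of the $L$ Nyquist bins in its row.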
Equation (3) implies that for a given $\ell$, if $S_{(\ell+mM)} = 0$, $\forall m \in \{0, 1, 2, \ldots, L-1\}$, then the corresponding bins contain only noise. The first step to reconstruct the original signal is to identify the sub-Nyquist bins that contain at least one nonzero signal bin. This can be achieved by a simple threshold-OR rule
$$\mathcal{L} = \left\{ \ell : |Y^0_\ell| > \gamma \,\lor\, |Y^1_\ell| > \gamma \,\lor\, \cdots \,\lor\, |Y^{P-1}_\ell| > \gamma \right\},$$
where $\mathcal{L}$ is the set of threshold decisions, and $\gamma$ is the threshold. For each sub-Nyquist bin that contains signal energy, we try to resolve that specific bin to its Nyquist equivalent. Nevertheless, the reconstruction task is not trivial since we do not know how many Nyquist bins are active in that specific bin. The approach in [6] exploits the phase rotation property between different branches to decide whether there is a collision between Nyquist bins in that specific sub-Nyquist bin. This is accomplished by detecting a change in magnitude among the branches. However, once the signal is corrupted with noise, there will always be a change in magnitude among the branches. Furthermore, a higher noise level leads to a larger change in magnitude.
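The threshold-OR rule can be sketched in a few lines. This is an illustrative example only: it assumes the branch spectra are stacked in an array with one row per branch, and that the test is applied to bin magnitudes; the name `threshold_or` and the toy values are hypothetical.

```python
import numpy as np

def threshold_or(Y, gamma):
    """Return indices of sub-Nyquist bins whose magnitude exceeds gamma on any branch.

    Y has shape (P, M): one row per branch, one column per sub-Nyquist bin.
    """
    return np.flatnonzero((np.abs(Y) > gamma).any(axis=0))

# Toy magnitudes for two branches and four sub-Nyquist bins
Y = np.array([[0.1, 2.0, 0.2, 0.1],
              [0.1, 1.9, 0.9, 0.2]])
active = threshold_or(Y, gamma=0.5)
print(active)  # bins 1 and 2 pass the test
```

Bin 2 is kept although it exceeds the threshold on only one branch, which is the intended effect of the OR combination.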
Since we are not interested in estimating Nyquist bins that contain only noise, we define frequency collision as the case when more than one Nyquist bin that contains signal energy falls into the same sub-Nyquist bin. In addition, we define the collision order of a sub-Nyquist bin as the number of Nyquist bins that contain signal energy and map to that specific sub-Nyquist bin. Let $c_\ell \in \mathbb{Z}_0^+$ be the sub-Nyquist bin collision order, i.e., $c_\ell = 0$ means the sub-Nyquist bin contains only noise, $c_\ell = 1$ means the signal energy in a sub-Nyquist bin originates from one specific Nyquist bin, and $c_\ell \ge 2$ means that two or more Nyquist bins that have signal energy fall into one sub-Nyquist bin. In general, a highly sparse spectrum leads to more sub-Nyquist bins of order zero and one. On the other hand, resolving one sub-Nyquist bin results in estimating $L$ Nyquist bins. Consider the case when only two components in a specific sub-Nyquist bin contain signal energy and the rest are only noise, e.g.,
$$Y^i_0 = X_{(2M)}\, e^{-j 2\pi f_{(2M)} \tau_i} + X_{(6M)}\, e^{-j 2\pi f_{(6M)} \tau_i} + \sum_{m \notin \{2, 6\}} W_{(mM)}\, e^{-j 2\pi f_{(mM)} \tau_i}, \quad i = 0, 1, 2. \tag{5}$$
We are interested in estimating the two main signal components $X_{(2M)}$ and $X_{(6M)}$. In general, there is no exact solution to (5). The system of equations in (5) is overdetermined and consists of a linear combination of the variables $X_{(2M)}$ and $X_{(6M)}$. Typically, the number of variables that exist and contain signal energy is not known. Without loss of generality, we will restrict the solution to an overdetermined system that is one dimension higher than the number of variables. Therefore, the maximum number of possible equations is equal to the number of branches $P$.
The overdetermined system of (5) can be solved using the least square method. However, in practice we do not know which and how many variables exist. To address this issue, we can solve for all possible combinations of the frequencies for a given number of equations and variables. Furthermore, while it is difficult to detect a frequency collision in the presence of noise, the noise would also make it difficult to resolve the case of no collision without solving the system of equations. Consider the case of one variable when $\tau_0 = 0$, e.g.,
$$Y^i_0 = X_{(mM)}\, e^{-j 2\pi f_{(mM)} \tau_i} + \sum_{m' \ne m} W_{(m'M)}\, e^{-j 2\pi f_{(m'M)} \tau_i}, \quad i = 0, 1.$$
Ideally, if we know that only one Nyquist bin that contains signal energy falls into $Y_0$, which is similar to the case of no collision in [6], then we can easily calculate the frequency given that $\angle Y^1_0 - \angle Y^0_0 = -2\pi f_{(\ell+mM)} \tau_1$. However, due to the effect of noise, the value of the calculated frequency will not be accurate. In addition, it will be difficult to accurately decide where the detected bin falls when the frequency resolution is increased. Hence, we will also use the least square method to solve for the case of no collision.
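For intuition, the noise-free single-tone case can be resolved directly from the phase rotation between two branches. The sketch below assumes $\tau_0 = 0$ and $\tau_1 = T$ (one Nyquist sample), a tone that falls exactly on a Nyquist bin, and no noise; `locate_single_tone` is a hypothetical helper name.

```python
import numpy as np

def locate_single_tone(Y0, Y1, N):
    """Nyquist bin index from the phase rotation between branch 0 and a
    branch delayed by one Nyquist sample (tau_1 = T)."""
    dphi = np.angle(Y1 * np.conj(Y0))          # wrapped to (-pi, pi]
    return int(np.round(-dphi * N / (2 * np.pi))) % N

N, M = 5000, 500
L = N // M
q_true = 7 + 2 * M                              # tone that aliases into sub-Nyquist bin 7
n = np.arange(N)
x = np.exp(2j * np.pi * q_true * n / N)

Y0 = np.fft.fft(x[::L])[7]                      # branch 0, no delay
Y1 = np.fft.fft(np.roll(x, 1)[::L])[7]          # branch 1, delayed by one sample
q_hat = locate_single_tone(Y0, Y1, N)
print(q_hat)
```

With noise added to `x`, the rounded index can land on a neighboring bin, which is the motivation given above for using the least square method even when no collision occurs.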

III. SIGNAL RECONSTRUCTION
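Let the number of variables for sub-Nyquist bin $\ell$ be $v_\ell \le q_\ell$, and the number of equations $q_\ell \le P$. For $q_\ell$ branches, (1) can be written in a matrix form as
$$\mathbf{Y}_\ell = \mathbf{\Phi}_\ell \mathbf{X}_\ell,$$
where $\mathbf{Y}_\ell = [Y^0_\ell, \ldots, Y^{q_\ell - 1}_\ell]^T$, $\mathbf{X}_\ell$ holds the Nyquist bins $X_{(\ell+mM)}$, and the entries of the phase shift matrix $\mathbf{\Phi}_\ell$ are $e^{-j 2\pi f_{(\ell+mM)} \tau_i}$. For each sub-Nyquist bin, $\mathbf{X}_\ell$ consists of $v_\ell$ variables out of $L$ possible variables. Hence, we want to find the best $\mathbf{X}_\ell$ fit among all possible combinations. In the following subsection, we define a generalized form for this problem.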

A. Least Square Solution
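Define $\mathcal{X}_\ell$ as the set of all combinations of $\mathbf{X}_\ell \in \mathbb{C}^{v_\ell \times 1}$ from $L$ possible variables. Specifically,
$$\mathcal{X}_\ell := \left\{ [X_{(\ell+m_1 M)}, \ldots, X_{(\ell+m_{v_\ell} M)}]^T : 0 \le m_1 < \cdots < m_{v_\ell} \le L-1 \right\}.$$
Similarly, define the set of all corresponding phase shift matrices $\mathcal{F}_\ell$:
$$\mathcal{F}_\ell := \left\{ \mathbf{\Phi}_\ell \in \mathbb{C}^{q_\ell \times v_\ell} : [\mathbf{\Phi}_\ell]_{i,n} = e^{-j 2\pi f_{(\ell+m_n M)} \tau_i},\ 0 \le m_1 < \cdots < m_{v_\ell} \le L-1 \right\}.$$
At each $\ell$, $\mathbf{Y}_\ell \in \mathbb{C}^{q_\ell \times 1}$ is the same for any combination. For each combination of variables $\mathbf{X}_\ell$, define the residual as
$$\mathbf{r} = \mathbf{Y}_\ell - \mathbf{\Phi}_\ell \mathbf{X}_\ell, \quad \mathbf{X}_\ell \in \mathcal{X}_\ell,\ \mathbf{\Phi}_\ell \in \mathcal{F}_\ell.$$
We want to minimize $\|\mathbf{r}\|_2$ for a suitable choice of $\mathbf{X}_\ell$. The general minimization problem becomes
$$\min_{\mathbf{X}_\ell \in \mathcal{X}_\ell,\ \mathbf{\Phi}_\ell \in \mathcal{F}_\ell} \|\mathbf{Y}_\ell - \mathbf{\Phi}_\ell \mathbf{X}_\ell\|_2^2. \tag{9}$$
In general, $\mathbf{r}$ cannot be made zero. In addition, (9) can be reduced to $|\mathcal{X}_\ell|$ individual problems. Furthermore, for each combination, the problem is the well-defined least square problem, which is optimal for the linear model. Therefore, the solution is to minimize $\|\mathbf{r}\|_2^2$ for each combination. The pseudoinverse solution for each combination is given by
$$\hat{\mathbf{X}}_\ell = (\mathbf{\Phi}_\ell^H \mathbf{\Phi}_\ell)^{-1} \mathbf{\Phi}_\ell^H \mathbf{Y}_\ell,$$
which can be obtained by solving the normal equations or by using the QR decomposition method [7]. The final step is to select $\hat{\mathbf{X}}_\ell$ with minimum $\|\mathbf{r}\|_2$. In principle, solving for more variables when there are enough branches produces a better estimation of the Nyquist bins. This is true even when there are fewer Nyquist bins that originate from a signal in comparison to the number of variables. However, this may increase the complexity of the problem. For instance, the ratio of the number of combinations for $v_\ell + 1$ variables to that for $v_\ell$ variables, at the same compression ratio, is equal to $(L - v_\ell)/(v_\ell + 1)$, e.g., if $L = 10$ and $v_\ell = 2$, then the number of combinations for $v_\ell$ is equal to 45, and for $v_\ell + 1$ it is equal to 120. Therefore, we can reduce the solution time by solving for fewer variables, which results in a trade-off between accuracy and time.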
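The multi-combination least square search can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes integer delay factors `k` (so that $f\tau$ reduces to $qk/N$), uses `numpy.linalg.lstsq` in place of an explicit pseudoinverse, and tests it on a noise-free toy bin; `resolve_bin` is a hypothetical name.

```python
import itertools
import numpy as np

def resolve_bin(y, ell, k, N, M, v):
    """Search all v-subsets of the L Nyquist bins that fold into sub-Nyquist bin ell.

    y: (P,) values of the bin on each branch; k: (P,) integer delay factors.
    Returns (residual, Nyquist indices, estimates) of the best-fitting subset.
    """
    L = N // M
    best = (np.inf, None, None)
    for combo in itertools.combinations(range(L), v):
        q = ell + M * np.array(combo)                    # candidate Nyquist indices
        Phi = np.exp(-2j * np.pi * np.outer(k, q) / N)   # (P, v) phase shift matrix
        x_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]   # least square estimate
        res = np.linalg.norm(y - Phi @ x_hat) ** 2       # squared residual norm
        if res < best[0]:
            best = (res, q, x_hat)
    return best

# Noise-free toy case: two tones at m = 2 and m = 6 collide in sub-Nyquist bin 3
N, M = 5000, 500
k = np.array([0, 1, 2, 3])
q_true = 3 + M * np.array([2, 6])
a_true = np.array([1.0 + 0.5j, -0.7j])
y = np.exp(-2j * np.pi * np.outer(k, q_true) / N) @ a_true

res, q_hat, a_hat = resolve_bin(y, 3, k, N, M, v=2)
print(q_hat, res)
```

With four branches and two variables, only the true pair reproduces `y` exactly in this noise-free case, so the minimum-residual rule selects it; with noise, the selected subset is only the best fit.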

B. Collision Classifier
Although we are interested in estimating the bin collision order, collision detection should be sufficient to reduce the computational complexity under the assumption of a sparse spectrum. We consider a similar concept of phase rotation as in [6], except that we no longer assume the change in magnitude to be zero in the case of no collision. We propose a collision detection scheme based on machine learning. Specifically, the algorithm aims to predict the collision occurrence by utilizing a trained model. The changes in magnitude among branches are the attributes supplied to the algorithm. As such, $\Delta^i_\ell$ is defined as the change in magnitude between branch 0 and branch $i$. Namely,
$$\Delta^i_\ell = |Y^i_\ell| - |Y^0_\ell|, \quad i = 1, \ldots, P-1.$$
We adopt the Naïve Bayes classifier (NBC) as a collision detector because of its high speed. Let $p(\boldsymbol{\Delta}_\ell \mid c_\ell)$ be the probability of the observation $\boldsymbol{\Delta}_\ell$ being in class $c_\ell$. From Bayes rule we have
$$p(c_\ell \mid \boldsymbol{\Delta}_\ell) = \frac{p(\boldsymbol{\Delta}_\ell \mid c_\ell)\, p(c_\ell)}{p(\boldsymbol{\Delta}_\ell)}, \tag{11}$$
in which $p(c_\ell)$ is the prior probability of class $c_\ell$. It is difficult to compute $p(\boldsymbol{\Delta}_\ell \mid c_\ell)$ unless conditional independence of the attributes given the class is assumed. Although this assumption is generally not satisfied, NBC still results in a classifier that often performs well [8]. Hence, the class conditional probability is given by
$$p(\boldsymbol{\Delta}_\ell \mid c_\ell) = \prod_{i=1}^{P-1} p(\Delta^i_\ell \mid c_\ell).$$
The normalization value in (11) is the same regardless of the class and can be ignored. The correct class can be computed by using a maximum a posteriori (MAP) estimator.
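A Naïve Bayes collision detector with a MAP decision can be sketched as below. The per-attribute Gaussian likelihood is an assumption of this sketch (the text does not state the likelihood model), and the training data are purely synthetic placeholders for the magnitude changes; the class name `GaussianNaiveBayes` is illustrative.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with independent Gaussian attributes (assumed likelihood)."""

    def fit(self, D, c):
        self.classes = np.unique(c)
        self.log_prior = np.log(np.array([(c == cl).mean() for cl in self.classes]))
        self.mu = np.array([D[c == cl].mean(axis=0) for cl in self.classes])
        self.var = np.array([D[c == cl].var(axis=0) for cl in self.classes]) + 1e-9
        return self

    def predict(self, D):
        # MAP rule: maximize log p(c) + sum_i log p(Delta^i | c);
        # the normalization p(Delta) is common to all classes and dropped.
        z = (D[:, None, :] - self.mu[None]) ** 2 / self.var[None]
        log_lik = -0.5 * (z + np.log(2 * np.pi * self.var[None])).sum(axis=2)
        return self.classes[np.argmax(log_lik + self.log_prior[None], axis=1)]

# Synthetic magnitude changes for two attributes (three branches):
# class 1 (no collision) has small changes, class 2 (collision) large ones.
rng = np.random.default_rng(1)
D = np.vstack([rng.normal(0.0, 0.05, size=(200, 2)),
               rng.normal(0.0, 0.60, size=(200, 2))])
c = np.repeat([1, 2], 200)

model = GaussianNaiveBayes().fit(D, c)
accuracy = (model.predict(D) == c).mean()
print(accuracy)
```

Because both classes are centered near zero in this toy data, the decision is driven by the spread of the magnitude changes, which loosely mirrors the observation that noise alone already produces nonzero changes.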

C. Reconstruction Algorithm
Although the proposed classifier can identify different collision orders with acceptable accuracies, special attention should be given to its effect on the overall detection performance. As such, carefully chosen cost functions should be incorporated with the classifier to give a much higher cost for falsely predicting lower-order collisions. In addition, since our objective is to reconstruct a highly sparse spectrum, most of the saved time comes from identifying no-collision cases, i.e., $c_\ell = 1$. Therefore, we aim to predict two states of collision order, i.e., $c_\ell = 1$ and $c_\ell \ge 2$. The signal reconstruction approach is shown in the following algorithm. Before the algorithm enters the working mode, the classifier needs to be trained. The training process can be achieved by generating different signals in an environment that is representative of the overall band. The Nyquist and sub-Nyquist versions of these signals are collected and fed to the classifier for training.
In practice, the classifier can be trained by capturing signals from the real environment of interest. The sub-Nyquist samplers can still be used to capture Nyquist signals within their limits and hop over multiple chunks of the wide band.
The training process is required only once, and subsequently the trained classifier can be used to detect collisions. For each sub-Nyquist bin that is not free, we test for collision. If no collision is predicted, we solve for $L$ cases of one variable. Otherwise, if a collision is detected, we solve for $\binom{L}{v_\ell}$ cases of $v_\ell$ variables.
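The steps above can be sketched end to end. This is a simplified illustration under stated assumptions: integer delay factors, a magnitude-based threshold test, and a collision predictor passed in as a callable (here a trivial noise-free stand-in rather than a trained classifier); `resolve` and `reconstruct` are hypothetical names.

```python
import itertools
import numpy as np

def resolve(y, ell, k, N, M, v):
    """Best least square fit of v Nyquist bins to one sub-Nyquist bin."""
    best_res, best_q, best_x = np.inf, None, None
    for combo in itertools.combinations(range(N // M), v):
        q = ell + M * np.array(combo)
        Phi = np.exp(-2j * np.pi * np.outer(k, q) / N)
        x_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]
        res = np.linalg.norm(y - Phi @ x_hat) ** 2
        if res < best_res:
            best_res, best_q, best_x = res, q, x_hat
    return best_q, best_x

def reconstruct(Y, k, N, gamma, v_max, predicts_collision):
    """Threshold each bin, classify it, then solve for 1 or v_max variables."""
    P, M = Y.shape
    X_hat = np.zeros(N, dtype=complex)
    for ell in range(M):
        y = Y[:, ell]
        if not (np.abs(y) > gamma).any():
            continue                              # free bin: nothing to resolve
        delta = np.abs(y[1:]) - np.abs(y[0])      # magnitude changes vs. branch 0
        v = v_max if predicts_collision(delta) else 1
        q, x_hat = resolve(y, ell, k, N, M, v)
        X_hat[q] = x_hat                          # remaining bins stay free
    return X_hat

# Noise-free toy spectrum: one lone tone (bin 1007) and one collision (1003, 3003)
N, M = 5000, 500
k = np.array([0, 1, 2, 3])
X = np.zeros(N, dtype=complex)
X[[1007, 1003, 3003]] = [1.0, 0.8j, -0.6]
q_all = np.arange(M)[None, :] + M * np.arange(N // M)[:, None]   # shape (L, M)
Y = np.array([(X[q_all] * np.exp(-2j * np.pi * q_all * ki / N)).sum(axis=0)
              for ki in k])

# Stand-in for the trained classifier: any magnitude change means collision
X_hat = reconstruct(Y, k, N, gamma=0.1, v_max=2,
                    predicts_collision=lambda d: np.abs(d).max() > 1e-6)
print(np.flatnonzero(np.abs(X_hat) > 1e-6))
```

In this toy case the lone tone is resolved with 10 one-variable solves, while the colliding bin requires 45 two-variable solves, which illustrates where the time saving originates.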

D. Performance Metrics
System performance depends on many factors such as sparsity, SNR, threshold, and the accuracy of the classifier. In order to evaluate the performance of the reconstruction algorithm, we generate multiple sinusoids randomly distributed over the entire band. Each sinusoid has a random amplitude and phase within some range. We combine these sinusoids and propagate them over a frequency-selective block-fading channel with AWGN. One sampling block is set to the interval in which the channel does not change. This configuration enables us to control the occupancy level over a wideband channel while measuring different performance metrics. Recall that $\mathcal{K}$ is the set of indices of occupied Nyquist bins; consequently, it represents the set of active sinusoids. Therefore, the number of active sinusoids is equal to $|\mathcal{K}|$. However, some sinusoids may be undetectable due to the combined effect of the randomization of their amplitudes and the frequency selectivity of the channel. Therefore, we evaluate the probability of detection $P_D$ and the probability of false alarm $P_{FA}$ relative to a typical Nyquist energy detection. Denote by CPD the number of correct positive decisions. The detection ratio DR is defined as
$$\mathrm{DR} = \frac{\mathrm{CPD}}{\mathrm{NCPD}},$$
where NCPD is the number of Nyquist CPD, i.e., the number of correct detection decisions in the Nyquist bandwidth without sub-sampling. We also define a false alarm ratio FR based on FPD, the number of false positive decisions. We average DR and FR over many Monte-Carlo iterations to estimate $P_D$ and $P_{FA}$. The upper limit for $P_{FA}$ for this approach is equal to $v_\ell / L$. This stems from the fact that detection decisions are made for $v_\ell$ variables out of $L$ Nyquist bins for each sub-Nyquist bin; regardless of whether they are correct decisions, we declare the remaining $L - v_\ell$ bins as free. The reason for choosing this approach is twofold. First, there is no robust method to decide whether the least square estimation was the correct one aside from the minimum of residuals approach proposed in Section III-A.
Second, even if we impose a threshold-based decision on the residuals to dismiss some decisions, declaring Nyquist bins that are spread over the whole bandwidth as occupied will render large chunks of the spectrum as not useable.
The overall algorithm can be viewed as a two stage system. The first stage is the collision detector, and the second stage is the signal resolver. Occupancy detection is evaluated at the second stage, but its performance is also affected by the first stage. Our goal is to reduce the computational complexity of the signal resolver by correctly detecting collisions. The reduction in the computation steps is crucial for practical implementation since it reduces execution time. At the same time, we want to ensure minimum effect on the accuracy of detection. Obviously, when the collision detector falsely predicts a collision, the overall detection error is mostly not affected. By contrast, when the collision detector predicts no collision while in fact there is a collision, the overall detection error is highly affected. Hence, there is a trade-off between accuracy and computational complexity.

IV. PERFORMANCE EVALUATION
This section presents system performance results. A multi-sinusoid test signal with a specific occupancy level is generated at each sampling block. Sinusoid frequencies are randomly chosen from the set of all possible frequencies in the band. In addition, tone amplitudes and phases are chosen from $(0.6, 1)$ and $(-\pi, \pi)$, respectively. The overall wideband signal is propagated over a frequency-selective Rayleigh fading channel that is constant during one sampling block. The channel tap delays and average path gains are set to (0, 1, 2, 4) ns and (0, −0.5, −1.5, −2) dB, respectively. The Rayleigh fading process is normalized such that the average value of the path gains' total power is equal to one. The multi-sinusoid signal combined with the effect of the channel represents $s$ in (2). In addition, $w \sim \mathcal{CN}(0, \sigma^2)$ in (2) is added to $s$, where $\sigma^2$ is the variance of $w$, to generate the total multiband signal $x(t)$. We define the SNR to be equal to $1/\sigma^2$. All simulations are performed under the assumption of 1 GHz total bandwidth, while each branch is capable of sampling at a 100 MHz sampling rate. The time shift factors were chosen as follows: $k_0 = 0$, $k_1 = 1$, $k_2 = 2$, and $k_3 = 3$. Furthermore, $N = 5000$, $M = 500$, and $L = 10$. An example of a subsampled signal is demonstrated in Fig. 3, and its reconstructed version along with the original signal are shown in Fig. 4. Because this is a relatively high-SNR, low-occupancy signal, a three-variable resolver is able to reconstruct it with a high detection ratio.
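A simplified version of the test signal can be generated as sketched below. Several choices here are assumptions rather than details from the setup above: amplitudes and phases are drawn uniformly from their ranges, the fading channel is omitted entirely, the noise is circularly symmetric complex Gaussian, and the SNR is passed in dB; `multitone_block` is a hypothetical name.

```python
import numpy as np

def multitone_block(N, occupancy, snr_db, rng):
    """One sampling block of on-bin tones plus AWGN (no fading channel)."""
    n_tones = int(round(occupancy * N))
    bins = rng.choice(N, size=n_tones, replace=False)    # occupied Nyquist bins
    amp = rng.uniform(0.6, 1.0, n_tones)                 # assumed uniform
    phase = rng.uniform(-np.pi, np.pi, n_tones)          # assumed uniform
    X = np.zeros(N, dtype=complex)
    X[bins] = amp * np.exp(1j * phase)
    s = np.fft.ifft(X) * N                               # sum of complex tones
    sigma2 = 10 ** (-snr_db / 10)                        # SNR = 1 / sigma^2
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal(N)
                               + 1j * rng.standard_normal(N))
    return s + w, np.sort(bins)

rng = np.random.default_rng(2)
N = 5000
x, occupied = multitone_block(N, occupancy=0.10, snr_db=60.0, rng=rng)
spectrum = np.abs(np.fft.fft(x)) / N                     # per-tone amplitudes
print(occupied.size, spectrum[occupied].min(), spectrum[occupied].max())
```

At 10% occupancy this yields 500 occupied bins out of 5000, and the normalized spectrum recovers the tone amplitudes up to the noise level.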

A. Detection Performance
Figs. 5 and 6 show the performance results of Monte-Carlo simulations with a fixed two-variable resolver for different occupancy cases. Two of the simulated cases are carefully selected to show the effect of having a resolver with a higher or equal number of variables in comparison to the collision order. Namely, in the first case listed in the legend, the tones are generated to produce no collision and each sub-Nyquist bin contains only one tone. By contrast, all the sub-Nyquist bins contain two tones in the third case. Although we solve for two variables in both cases, $P_D$ is better in the first case than in the third case. The higher $P_D$ in the first case comes at the cost of a higher $P_{FA}$. This is because we use a two-variable resolver to resolve collisions of order one. Specifically, each solution may yield a false alarm since the two-variable resolver gives a solution with two components and one of them is not occupied. This example demonstrates that the overall error probability bounds for this scheme are complicated and generally depend on delay set combinations, threshold values, and the order of the resolver relative to the average number of collision types. The tones in the second and the fourth cases are randomly placed in the entire bandwidth with 10% and 20% occupancies, respectively. The fourth case performs worse than the second case because it has a significantly larger number of collisions of third and higher order, which produce more of both false alarms and mis-detections. Fig. 7 and Fig. 8 show the detection performance of two- and three-variable resolvers with respect to occupancy level. Higher occupancy levels produce higher-order collisions. As a result, performance degrades as occupancy increases due to a higher number of unresolved signals. As expected, the three-variable resolver can maintain higher detection ratios for more values of high occupancy in comparison to the two-variable resolver.
The improvement comes at the cost of requiring more branches and spending more time to solve for more combinations. While probability of detection is of highest priority for spectrum sensing, reconstruction latency can be considered one of the biggest obstacles when implementing sub-Nyquist sampling schemes. Therefore, any improvement in computation time is useful for practical implementation.

B. Collision Detector
The first step for implementing the reconstruction algorithm in Section III-C is to train the collision classifier described in Section III-B. We use three branches to generate training data based on the parameters provided in Section IV for multiple levels of sparsity and SNRs. The training data, meant to represent a practical scenario, is then used to train the classifier. Fig. 9 In general, the false positive rate (FPR) of the classifier does not affect the overall signal detection because the resolver will use more variables to resolve the collision. As a result, it is in our interest to make the classifier more biased towards false positives. The bias can be imposed either by adding a cost function to the classifier with a higher cost given for false negative (FN) decisions, or by training the classifier at a higher SNR than average. The latter method will set the classifier to produce more false positives (FP) and fewer FN in low-SNR scenarios because it considers the high-noise cases as collisions. The classifier accuracy and error performance with respect to SNR are shown in Fig. 10.

C. Full Algorithm Performance
The main objective for implementing the reconstruction algorithm is to save time when reconstructing the signal. Hence, we evaluate time saving relative to a fixed-variable resolver. Two cases are considered for comparison. In the first case, the solution of fixed $v_\ell = 2$ variables is compared to the reconstruction algorithm in which $v_\ell$ is set to 2 when a collision is detected, and $v_\ell = 1$ otherwise. Fig. 11 shows the time saved by implementing the algorithm. Similarly, Fig. 12 shows the relative saved time by implementing the reconstruction algorithm for $v_\ell = 3$. The time saved for the second case is much higher because, for $v_\ell = 3$ and $L = 10$, 120 combinations of three variables are solved for each sub-Nyquist bin, compared to 45 combinations of two variables for $v_\ell = 2$ and $L = 10$. The saved time is the result of solving for only 10 one-variable combinations instead, whenever no collision occurs. In both cases, the saved time is inversely proportional to the occupancy level because of a lower number of collisions at low occupancy levels. Furthermore, the saved time is higher for a lower threshold because more bins are taken into consideration. While a trade-off between saved time and detection accuracy exists, Table I shows that the change in $P_D$ and $P_{FA}$ values as a result of using the collision detector is insignificant. The first and second columns show the maximum and the mean of the decrease in $P_D$ and $P_{FA}$ values for Fig. 11. The third and fourth columns show the same values for Fig. 12.
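The combination counts quoted above, and a rough proxy for the resulting saving, can be reproduced as follows. The proxy is an assumption of this sketch, not the timing measurement behind Figs. 11 and 12: it counts least square solves only, treats every solve as equally expensive, and assumes a perfect collision detector; `relative_saving` is a hypothetical name.

```python
from math import comb

L = 10
solves = {v: comb(L, v) for v in (1, 2, 3)}   # combinations per sub-Nyquist bin
print(solves)                                 # {1: 10, 2: 45, 3: 120}

def relative_saving(v, p_no_collision):
    """Fraction of least square solves avoided versus a fixed v-variable resolver.

    Assumes a perfect collision detector and equal cost per solve (both are
    simplifications made for this sketch).
    """
    fixed = comb(L, v)
    mixed = p_no_collision * comb(L, 1) + (1 - p_no_collision) * fixed
    return 1 - mixed / fixed

print(relative_saving(2, 0.9), relative_saving(3, 0.9))
```

Under these assumptions, when 90% of the active bins are collision-free, about 70% of the solves are avoided for the two-variable resolver and about 82.5% for the three-variable resolver, in line with the trend that the saving grows with the resolver order.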

V. CONCLUDING REMARKS
In this paper, we studied a sparse spectrum reconstruction method in which a delay-based multicoset system is employed to reconstruct a sub-sampled signal. While many other sub-Nyquist systems exist, this system is preferred for practical implementation due to its simplicity and relatively fast signal reconstruction time. We defined the general reconstruction model and derived the multi-combination least square approach. In general, solving for a higher order of variables, even when there are fewer active tones in a sub-Nyquist bin, produces more accurate estimates. However, higher-order solutions require more time in addition to more branches.
In order to reduce the time needed to reconstruct the signal, we proposed a classifier to classify the aliased sub-Nyquist bins. While the total accuracy of the classifier decreases as SNR decreases, the reduction in accuracy has minimal effect on the overall detection performance. However, more time is saved with more accurate classification. The saved time for the algorithm resolving a maximum of three variables is more than double that of the algorithm resolving up to two variables. This is crucial since the detection performance of the three-variable resolver is much higher than that of the two-variable resolver, provided that there are enough branches to solve for three variables.