Pipelined design of an instantaneous frequency estimation-based time-frequency optimal filter

A pipelined signal-adaptive hardware design of an optimal time-frequency (TF) filter is presented. It is based on the real-time results of TF analysis and on TF analysis-based instantaneous frequency (IF) estimation. The implemented pipelining technique allows the filter to overlap the execution of the unconditional steps performed in neighboring TF points and, therefore, to significantly improve time performance. An improvement in execution time of one clock cycle per TF point (i.e., up to 50% in some TF points) is achieved. The design is tested on multicomponent signals and compared with the other possible IF estimation-based TF filter designs.


INTRODUCTION AND BACKGROUND
Efficient processing of nonstationary signals requires time-varying approaches that can be defined by using common time-frequency distributions (TFDs). Classical TF filters, related to the Rihaczek distribution, [1], the short-time Fourier transform (STFT), [1]-[3], and the Gabor transform, [4], as well as the Wigner distribution (WD), [5]-[8], exhibit serious drawbacks (uselessness in the nonstationary signals case, low resolution, and restriction to half-band signals, respectively) that significantly limit their applicability. Extended versions of these filters, [1]-[3], suppress the noted drawbacks, but these solutions are numerically quite complex, require significant calculation time, and are thus unsuitable for real-time analysis. Hardware implementations, when possible, can overcome these problems, enabling applications of TF filters in practice.
Single-clock-cycle designs, [1], [2], [9], are quite complex and require replicating the basic calculation elements that are used more than once. Their complexity strongly depends on the estimated signal duration, so they can filter only signals of a predefined duration. Considering the noted drawbacks, here we develop a pipelined multiple-clock-cycle signal-adaptive design of a WD-based optimal TF filter suitable for multicomponent FM signals and for real-time implementation.
Following the procedure of the optimal (Wiener) stationary filter development, [10], and considering the case of FM signals f_i(n), i=1,…,q, highly concentrated in the TF plane around their instantaneous frequencies (IFs), and of additive, widely spread white noise ε(n), not correlated with the estimated FM signals, the FRS of the optimal TF filter corresponds to the combination of the IFs of signals f_i(n), [8], [11]. The optimal TF filtering of nonstationary FM signals can then be reduced to IF estimation in a noisy environment. In the TF analysis framework, the IF estimation is performed by determining the frequency points k_i, i=1,…,q, where the TFD of the noisy signal has a local maximum, [8], [11]-[13], where Q_{k_i} is the basic frequency region in the TF plane around f_i(n), the IF of which is IF_i(n). Among all quadratic TFDs, the WD produces the best IF estimation characteristics for highly nonstationary mono-component signals, [12], but also pronounced cross-terms in the multicomponent signals case. The cross-terms-free WD (CTFWD) retains the desired IF estimation characteristics of the WD in the mono-component signals case. In addition, in the non-overlapping multicomponent signals case, the IF estimation characteristics of the CTFWD, obtained for each signal component separately, remain the same as when only that particular component exists, [13]. Besides, the CTFWD is based on the same STFTs used in (1), has already been implemented in real time, [14], and can therefore be used as a basis for an optimal nonstationary TF filter development, as performed in [11]. Here, the design from [11] is additionally improved by applying the pipelining technique.
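The local-maximum search that underlies the IF estimation can be illustrated in software. The following is a minimal Python sketch, not the hardware design: it uses a plain spectrogram column (squared STFT magnitude with a rectangular window) as a stand-in TFD instead of the CTFWD, and the function names, the relative threshold, and the two-tone test signal are assumptions introduced only for illustration. A frequency point is reported when its TFD value is the largest within a window of L_Q neighboring points.

```python
import cmath
import math


def spectrogram_column(x, n, N):
    """Squared STFT magnitude at instant n for bins k = 0..N-1.

    Assumption: rectangular window of N samples centred at n; samples
    outside the signal are treated as zero. This is a stand-in TFD,
    not the CTFWD used by the design.
    """
    column = []
    for k in range(N):
        acc = 0j
        for m in range(-N // 2, N // 2):
            idx = n + m
            if 0 <= idx < len(x):
                acc += x[idx] * cmath.exp(-2j * math.pi * k * m / N)
        column.append(abs(acc) ** 2)
    return column


def local_maxima(column, L_Q, rel_threshold=0.1):
    """Bins whose value is the maximum inside a window of L_Q bins.

    rel_threshold (hypothetical parameter) discards bins that are small
    relative to the strongest bin, so numerical noise is not reported.
    """
    half = L_Q // 2
    floor = rel_threshold * max(column)
    peaks = []
    for k, value in enumerate(column):
        lo = max(0, k - half)
        hi = min(len(column), k + half + 1)
        if value >= floor and value == max(column[lo:hi]):
            peaks.append(k)
    return peaks


# Two stationary components located exactly at bins 5 and 20 (illustrative signal).
N = 32
x = [cmath.exp(2j * math.pi * 5 * n / N) + cmath.exp(2j * math.pi * 20 * n / N)
     for n in range(2 * N)]
peaks = local_maxima(spectrogram_column(x, N, N), L_Q=7)
print(peaks)  # the two components are found at bins 5 and 20
```

The window width plays the role of the basic frequency region: two components are resolved separately only if their maxima are farther apart than the window half-width.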

PIPELINED IMPLEMENTATION
The complete pipelined hardware design of an optimal TF filter, principally following (1) and based on the IF estimation (2), is given in Fig.1. It follows the signal-adaptive design principles developed in [11], but additionally includes pipelined execution through a newly developed control of the filtering execution, Fig.1.
The design performs the estimation in L(n,k)+2 steps per frequency point, where each of these steps is executed in the corresponding CLK of the filtering execution. In the first L(n,k)+1 steps, the CTFWD sample is calculated in the STFT-to-CTFWD gateway, [14]. By an STFT_Load/CTFWD_Store cycle, it is stored into the ShMemBuff (used to move through the CTFWDs and to produce the basic frequency region Q_k, eq.(2)). The IF estimation (2) is then implemented in the COMP BLCK in the (L(n,k)+1)-st (estimation) step, as described in [11]. As a main contribution of the paper, the estimation step is overlapped in execution with the 0-th step of the next frequency point k+1, since in each TF point only the 0-th (SPEC) execution step and the estimation step are unconditional (to provide the SPEC-based IF estimation). The residual steps are conditional and depend on the estimated signal shape. They are used to improve the IF estimation quality up to the CTFWD-based one and are taken only in TF points lying inside the STFT auto-terms' domains, determined by the signal-adaptive period of the STFT_AT_Reg signal. The STFT_AT_Reg signal crucially affects the signal-adaptive CTFWD calculation, [14], and makes the conditional (L(n,k)+1)-st CLK the estimation one. In this way, the STFT_AT_Reg signal allows the proposed design both to optimize the number of CLKs taken in different TF points within the execution and to produce the CTFWD-based IF estimation. It also controls the filtering completion in the observed frequency point. Following the partly pipelined development from [11], in the proposed pipelined implementation the final completion step of a signal point n is also performed after the execution in each frequency point of the observed signal point n and is overlapped in execution with the … However, as a main contribution of the paper, the design considered here additionally improves the execution time.
It allows overlapping in execution of the unconditional steps between the neighboring frequency points k and k+1, k=-N/2+1,…,N/2, Fig.2, improving the execution time by one CLK per frequency point. This can be a significant improvement over the partly pipelined design from [11], because each signal point contains N frequency points. The residual steps cannot be included in the pipelining, because they are conditional and need not exist. The signals STFT_Load/CTFWD_Store, RESET, CumADD_CLK, Gateway_CLK and CumADD_RESET control the CTFWD calculation, the summation in the CumADD in a frequency point k, k=-N/2+1,…,N/2, the filtering completion in a signal point n, as well as the pipelined execution, as shown in detail in Fig.3. Generation of these signals, shown in Fig.1, slightly increases hardware complexity, but also decreases the capacity of the look-up-table memory required by the implementation, as can be noted from Table 1.
By using the pipelining technique, the design presented here improves the throughput of the implementation by one CLK per TF point. In comparison to the design from [11], and depending on the ShMemBuff size L_Q and on the normalized signal rate, the improvement ranges from about 15% (for L_Q=7) in TF points around the IFs up to 50% in TF points outside the STFT auto-terms, as visually represented in Fig.5.
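The effect of the overlap on the cycle count can be checked with a small model. This is a simplified Python sketch under stated assumptions: each frequency point is charged L(n,k)+2 CLKs without the overlap and L(n,k)+1 CLKs with it, the dependence on the ShMemBuff size L_Q and on the signal rate is ignored, and the per-point step counts in the example are hypothetical rather than taken from the test signal.

```python
def clks_per_point(L, pipelined):
    """CLKs charged to one frequency point with L conditional (residual) steps.

    Assumed model: the 0-th step and the estimation step are unconditional,
    so a point costs L + 2 CLKs. With pipelining, the estimation step runs
    in the same CLK as the 0-th step of the next point, leaving L + 1 CLKs.
    """
    if L < 0:
        raise ValueError("the number of residual steps cannot be negative")
    return L + 1 if pipelined else L + 2


def improvement_percent(L_values):
    """Relative saving in CLKs over a sequence of frequency points."""
    base = sum(clks_per_point(L, pipelined=False) for L in L_values)
    pipe = sum(clks_per_point(L, pipelined=True) for L in L_values)
    return 100.0 * (base - pipe) / base


# Points outside the STFT auto-terms take no residual steps (L = 0):
# 2 CLKs shrink to 1 CLK, i.e. the 50% bound quoted in the text.
print(improvement_percent([0, 0, 0, 0]))

# Hypothetical profile: one point inside an auto-term with 5 residual steps.
print(improvement_percent([0, 0, 5, 0]))
```

In this model the saving is always one CLK per point, so its relative weight shrinks as more residual steps are taken, which matches the qualitative trend described above.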

TESTING AND VERIFICATION
To provide a full qualitative comparison with the design from [11], our design is verified through the estimation of the same 3-component test signal considered in [11], signal (3). For the used parameters, the proposed design has been implemented in the Stratix II family EP1S10F780C5 device (using about 12% of the total logic elements, 41% of the total memory bits, and 83% of the total I/O pins). The longest path of the considered design corresponds to the generation of the STFT_AT_Reg signal in half of a CLK, through a multiplier, an adder and a comparator. It determines the maximum CLK rate of about 25 MHz for the used parameters and the used FPGA device.
To visually represent the achieved improvement, the distribution of CLKs taken by the proposed design per frequency point in the signal (3) case is shown in Fig.5 and compared with that of the design from [11]. For the observed case, the improvement can easily be noted and computed, and amounts to 45.305%, Table 1.

COMPARISONS AND CONCLUSIONS
To derive appropriate conclusions, the pipelined design proposed here is compared with the other possible IF estimation-based TF filter designs.
The single-clock-cycle implementation (SCI) with a fixed CLK cycle, the classical multiple-clock-cycle implementation (MCI) with a fixed number of CLKs, the hybrid one, and the signal-adaptive but only partly pipelined one can also be considered as possible implementation approaches of the IF estimation-based TF filter. The comparisons are summarized in Table 1. The SCI approach (when its complexity makes it possible, Table 1) would be based on the STFT-related gateway execution in the first half of a CLK cycle, [9], [15], [16], and on the IF estimation performed in the second half of the same cycle. The classical MCI approach assumes the STFT-based gateway execution in a higher, but fixed, number of L_m+2 CLKs, [16], and the IF estimation in the next, separate estimation CLK. The hybrid implementation approach would be designed to balance the desired characteristics of the classical MCI and SCI approaches, [16]. It would be based on the SCI of the STFT-related gateway corresponding to the fixed convolution window width of L_h (1≤L_h<L_m), and on achieving the desired TF representation (corresponding to the maximum convolution window width of L_m) in η=CEIL(L_m/L_h) CLKs per TF point, where the operator CEIL(L_m/L_h) rounds the value L_m/L_h to the nearest integer towards infinity. This approach would also assume the IF estimation in a separate (estimation) CLK. In addition, it would include a LUT memory of η+1 locations, Table 1, to manage the execution in η+1 CLKs per TF point, as well as a very complex control to achieve the desired TF representation.
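The CLK count of the hybrid approach follows directly from the ceiling operation. A short Python sketch is given below; the function name is hypothetical, and L_m=7 is used only because it is the value quoted for the Table 1 results.

```python
def hybrid_clks_per_tf_point(L_m, L_h):
    """CLKs per TF point for the hybrid approach: eta + 1.

    eta = CEIL(L_m / L_h) CLKs build the TF representation and one further
    CLK is the separate estimation CLK. The same number gives the LUT size.
    """
    if not 1 <= L_h < L_m:
        raise ValueError("the hybrid approach requires 1 <= L_h < L_m")
    eta = -(-L_m // L_h)  # integer ceiling, avoids floating-point division
    return eta + 1


# CLKs per TF point for every admissible L_h when L_m = 7.
for L_h in range(1, 7):
    print(L_h, hybrid_clks_per_tf_point(7, L_h))
```

Widening the single-cycle window L_h reduces the number of CLKs only until η reaches 2 (here from L_h=4 onward), while the single-cycle gateway keeps growing in complexity.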
The proposed design retains the desirable characteristics of the classical MCI and of the partly pipelined signal-adaptive design from [11] and [14], regarding calculation and implementation complexity. In addition, by applying the pipelining technique, it improves the execution time of the partly pipelined signal-adaptive design by up to about 45%, depending on the estimated signal shape, and it can also improve the SCI design execution time (for T_s, T_comp << T_m < 12.353×T_a). On the other hand, the hybrid implementation approach would improve the SCI performance related to hardware complexity (but not the corresponding MCI performance), as well as the execution time of the classical MCI approach (but not the SCI execution time), Table 1. However, since the signal-adaptive approaches retain the hardware complexity of the corresponding classical MCI approach, Table 1, and can improve the execution time of the corresponding SCI approach, Table 1, [11], these approaches would also significantly improve on the performance of the hybrid implementation approach.
Table 1. … (1) implementations. T_cP, T_cH, T_cSF, T_cSA are the CLK cycle times of the SCI design, the hybrid one, the classical MCI one (with a fixed number of CLKs) and the signal-adaptive ones (the partly pipelined one, [11], and the proposed pipelined one), respectively. T_comp and T_s are the comparison and 1-bit shift times, respectively. Execution times per signal point of the signal-adaptive designs are given for the considered signal (3) case and for N=256, L_m=7.
Finally, it can be concluded that the pipelined signal-adaptive approach outperforms the other IF estimation-based approaches in almost all critical design performance measures. Moreover, it enables high-quality real-time TF filtering, based on the highest-quality signal-adaptive CTFWD-related IF estimation, unlike the nonadaptive designs, [1], [2], [9], [15]-[17], which cannot produce results of such high quality.