Implementation of a novel architecture for DFT-based channel estimators in OFDM systems

A new architecture for Discrete Fourier Transform (DFT) based channel estimation has been analyzed, implemented and synthesized for ASIC. The core concept of the proposed estimation algorithm is to process the channel increments rather than the channel coefficients. With strong enough time correlation, we can reduce the wordlength of processing blocks compared to standard channel estimators and hence the resulting area and power. We provide an analytical tool to predict the potential gains in bit reduction for different mobility scenarios. Our simulations show that the wordlength can be reduced from 9 to 3 bits when operating in low mobility scenarios, with 5Hz Doppler frequency, while maintaining acceptable performance. Synthesis results show up to 40% reduction in area, compared to the original DFT-based approach, in a 65nm CMOS process.


I. INTRODUCTION
Many recent high-speed wireless communication techniques are based on OFDM, such as Wireless Local Area Network (WLAN), 3GPP Long-term Evolution (LTE), Digital Audio Broadcasting (DAB) and Digital Video Broadcasting (DVB) [1] [2]. A basic OFDM system is depicted in Fig. 1, which includes the transmitter (upper part), channel, and receiver (lower part). An important block in the receiver chain is the channel estimator, where the signal path (channel) from the transmitter to the receiver is estimated. Analysis [3] shows that 35-50% of the total base-band processing complexity (including memories) in a typical LTE receiver can be attributed to channel estimation and equalization. Therefore, it is of importance to find as efficient channel estimation implementations as possible.
Channel estimation in OFDM systems is a fairly old topic, with many different algorithms proposed over the years, e.g. [4] [5]. Channel estimation is most often based on known data on certain subcarriers at certain time instants, so-called pilot signals. Channel estimation can be done directly in the frequency domain, by filtering and interpolating between the pilot positions. Another popular method is to convert the estimated channel values at pilot positions to the time domain, apply a window, and convert the estimates back to the frequency domain. This method is often referred to as DFT-based channel estimation [4] and is motivated by the possible complexity reductions. After channel estimation, the received signal is equalized (EQ) in the frequency domain and de-mapped.
The channel estimation method proposed in this paper, which is based on the work in [6], follows the principle of the DFT-based channel estimators, but further reduces complexity in slow-fading channels by exploiting the strong correlation between consecutive channel estimates. By processing the channel increment, rather than the channel itself, we obtain a signal with lower dynamic range that can be processed with a shorter wordlength without significant loss of performance. A property of the proposed method is that it contains feedback paths, which means that quantization errors will accumulate in a fixed point implementation. Regular channel estimation, for example a normal DFT based estimation, is therefore required occasionally to reset the accumulated errors. There are still several practical scenarios where this proposed algorithm can be an efficient alternative to traditional channel estimation.
The approach of processing the increment is most efficient in slow-fading scenarios, while mobile communication is often performed in a mix of scenarios. If the algorithm can be implemented with a small enough footprint, in terms of silicon area, it can serve as an extra, low-power hardware accelerator for the low Doppler scenarios. With such an accelerator, a wireless terminal can reduce the power consumption of its base-band processing when used at home, in the office, or in other low-mobility environments.
In more detail, a terminal using the proposed algorithm should be implemented with the possibility for a full channel estimation that could be multiplexed occasionally to avoid error propagation. The hardware for the full channel estimator could either be dedicated hardware, a software solution (Digital Signal Processor) or other hardware such as the main FFT intended for the normal transformation to the frequency domain, see Fig. 1. A small accelerator, like the one proposed, can therefore save power without increasing the total area significantly.
As mentioned above, the concepts described in this paper are based on the DFT class of channel estimators and two variants, a standard approach and a new solution, are compared against each other. The estimators are described in more detail in Section III and both hardware implementation and synthesis results are described in Section IV. Conclusions and acknowledgements are finally presented in sections VI and VII, respectively.

II. SYSTEM MODEL
A system model, along the lines of [2], including the processing blocks as well as the channel, is shown in Fig. 1. In the OFDM transmitter, the symbol mapper converts the input bitstream to symbols. The IFFT converts the symbols to the time domain and a Cyclic Prefix (CP) is added to avoid Inter-Carrier and Inter-Symbol Interference (ICI and ISI). The resulting signal is transmitted through the channel and the received data is expressed as
$$y_i = h_i * x_i + n_i, \qquad (1)$$
where h is the channel impulse response, x is the transmitted signal, n is the noise, i is the time index, and ' * ' denotes convolution. Throughout the paper bold lowercase, bold uppercase, and normal letters indicate vectors, matrices, and scalars, respectively. The receiver processes the received signal by first removing the CP and then converting the data to the frequency domain using the FFT. The channel estimator typically estimates h using frequency domain data. The resulting channel estimate $\hat{H}_{k,l}$ (in the frequency domain) is passed to the Equalizer (EQ), where k is the OFDM symbol time index and l the subcarrier index. The final step is to demap the data from a given constellation to a bit stream. This model gives an overview of the OFDM system, while the following sections focus on one block in the model: the channel estimator.
The research presented in this paper is focused on the 3GPP LTE standard and in the evaluations we use the 5 Hz Doppler EPA and 70 Hz Doppler EVA channel models as specified in [7].

III. TIME DIFFERENCE BASED CHANNEL ESTIMATOR
This section will introduce the concept of channel estimation in an OFDM system followed by a description of a DFT based channel estimation. Finally, the time difference based channel estimation is explained.

A. Channel estimation
Channel estimation in an OFDM system is typically performed after the FFT, based on frequency domain pilot data, as shown in Fig. 1. The receiver knows what the transmitter has sent at the pilot positions (pilot data) and is able to estimate the channel ($\hat{H}_{k,l}$) by comparing the received data $Y_{k,l}$ with the known pilot data $X_{k,l}$, where
$$Y_{k,l} = H_{k,l} X_{k,l} + N_{k,l}, \qquad (2)$$
$H_{k,l}$ is the channel frequency response, k and l are defined as before (time and subcarrier indices) and $N_{k,l}$ denotes the noise contribution in the frequency domain.
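As an illustration of comparing received pilot data with known pilot data, the following minimal sketch assumes a least-squares estimate, i.e. a per-pilot division of the received value by the known pilot. The paper does not state which estimator is used at this step, and the function name, pilot values and channel values are hypothetical.

```python
def ls_pilot_estimate(received, pilots):
    """Least-squares estimate at each pilot position: H_hat = Y / X (assumed estimator)."""
    return [y / x for y, x in zip(received, pilots)]

# Hypothetical QPSK pilots and channel values; noise is omitted for clarity.
pilots = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
channel = [0.8 + 0.1j, 0.7 - 0.2j, 0.5 + 0.3j, 0.9 + 0.0j]
received = [h * x for h, x in zip(channel, pilots)]

estimate = ls_pilot_estimate(received, pilots)
```

Without noise the estimate reproduces the channel values; with noise, each estimate carries the noise term divided by the pilot symbol.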

B. DFT based channel estimation
In a DFT-based channel estimator, the pilot data is converted from the frequency domain back to the time domain by the IFFT. There are several options for the processing in the time domain [4], marked as 'R' in Fig. 2, which can also be seen as a linear transformation. The simplest approach is to use a time window where parts of the data (the output of the IFFT) are forced to zero according to
$$\hat{\mathbf{h}}_k = \mathbf{R}\,\tilde{\mathbf{h}}_k, \qquad (3)$$
where $\tilde{\mathbf{h}}_k$ denotes the output of the IFFT and
$$\mathbf{R} = \begin{bmatrix} \mathbf{I}_M & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}$$
is a limited identity matrix that keeps the first M samples. This time windowing can also be interpreted as a low-pass filtering in the frequency domain. The length of the impulse response is in this paper set to the length of the normal cyclic prefix in LTE (M = 144) [8]. This simple windowing/filtering approach is assumed below. Other approaches could be used, but the analysis of the wordlength reduction would not be affected to a large extent. The maximum number of sub-carriers is 2048 and the number of pilots per stream in LTE is 200. These numbers are presented to give a relation to the length of the cyclic prefix.
The FFT sizes are also dependent on these numbers. The FFT after the CP removal block is 2048 points as well as the FFT in the channel estimator. The IFFT in the estimator could be implemented as a 256 point IFFT since there are only 200 pilots.
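The IFFT, window, FFT chain can be sketched as follows. This is a toy illustration under stated assumptions: it uses the same transform size for the IFFT and the FFT (the paper uses 256 and 2048 points), a direct O(N^2) DFT instead of a pipelined FFT, and hypothetical sizes N = 16 and M = 4 with made-up channel taps and noise level.

```python
import cmath
import random

def dft(x, inverse=False):
    """Direct O(N^2) DFT; adequate for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for m in range(n):
        acc = sum(x[i] * cmath.exp(sign * 2j * cmath.pi * m * i / n)
                  for i in range(n))
        out.append(acc / n if inverse else acc)
    return out

def dft_channel_estimate(h_freq_noisy, window_len):
    """IFFT -> keep the first window_len taps (the 'R' block) -> FFT."""
    h_time = dft(h_freq_noisy, inverse=True)
    windowed = [tap if i < window_len else 0j for i, tap in enumerate(h_time)]
    return dft(windowed)

def mean_sq_error(a, b):
    return sum(abs(x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical toy sizes: 16 subcarriers and a 4-tap window (not the LTE values).
N, M = 16, 4
rng = random.Random(1)
true_taps = [0.9, 0.4 + 0.2j, 0.1j, 0.05] + [0j] * (N - M)
H_true = dft(true_taps)
H_noisy = [H + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for H in H_true]
H_est = dft_channel_estimate(H_noisy, M)

mse_before = mean_sq_error(H_noisy, H_true)
mse_after = mean_sq_error(H_est, H_true)
```

Because the true impulse response lies entirely inside the window, the windowing only removes noise taps, so `mse_after` is smaller than `mse_before`. This is the low-pass filtering interpretation mentioned in the text.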
The memory that holds the $\hat{H}$ values is needed in both the original DFT method and the proposed method and is used for time interpolation of the channel estimates. Further, $\mathbf{p}_k$ denotes the pilot symbols in the frequency domain, where k is the OFDM symbol number. The two quantization points, $Q_{\hat{h}}$ and $Q_{\hat{H}}$, that are shown in Fig. 2 in addition to the different signals, are used and described in Sections III-E and IV.

C. Proposed method
In [6] we introduced the theory for an alternative method of performing the DFT-based channel estimation, which can theoretically reduce the area and power consumption of a hardware implementation. In this paper we verify, through an actual ASIC implementation, that these improvements can be observed in practice, and we also introduce an analytical tool to predict the wordlength reduction in the FFT of the channel estimator.
The proposed method makes use of the high time correlation of channel estimates. Instead of processing the pilot data directly, as in the original algorithm, the difference between two occurrences (in time) of pilot data, according to Fig. 3, is processed by the FFT according to
$$\Delta\hat{\mathbf{h}}_k = \hat{\mathbf{h}}_k - \hat{\mathbf{h}}_{k-\Delta k}, \qquad (6)$$
where $\hat{\mathbf{h}}_k$ is defined in Eq. 3 and $\Delta k$ is the distance in OFDM symbols between the pilot positions in time. In LTE, $\Delta k$ can either be 3 or 4 depending on the OFDM symbol number [8].
To provide a function for the theoretical bit reduction potential, some basic concepts and equations need to be presented. For a Jakes-fading channel, the correlation between two samples of the channel, $h_k$ and $h_{k-\Delta k}$, spaced T seconds apart, at a maximal Doppler shift of $f_d$ Hz is
$$E\{h_k h^*_{k-\Delta k}\} = \sigma_h^2 J_0(2\pi f_d T),$$
where $\sigma_h^2$ is the variance of the channel samples and $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind [9]. If we were to process the channel samples directly, we would have to deal with the full variance $\sigma_h^2$. By processing the difference signal instead, we only have to deal with a variance
$$\sigma_\Delta^2 = E\{|h_k - h_{k-\Delta k}|^2\} = 2\sigma_h^2\left(1 - J_0(2\pi f_d T)\right).$$
The ratio between the variances of the difference and the original channel samples becomes
$$\frac{\sigma_\Delta^2}{\sigma_h^2} = 2\left(1 - J_0(2\pi f_d T)\right).$$
For each 6.02 dB reduction of the variance, one bit less is required to maintain a fixed quantization-noise level. Therefore we estimate the potential reduction in wordlength, when changing to processing of the differential signal, to be around
$$\Delta B \approx -\frac{10\log_{10}\left(2\left(1 - J_0(2\pi f_d T)\right)\right)}{6.02}\ \text{bits}, \qquad (11)$$
which we show in Fig. 4 for a range of relative Doppler values. The relative Doppler $f_d T$ in (11) is defined through standard LTE parameters as
$$f_d T = f_d\,\Delta k\,T_s, \qquad (12)$$
where $T_s = 70\,\mu s$ is roughly the LTE symbol time (varies somewhat) and $\Delta k = 4$ is the longest distance between pilots, measured in LTE symbols. The no-gain region in Fig. 4 shows where we do not expect any gains by processing the differential signal. Two Doppler frequencies of special interest, 5 Hz and 70 Hz, are indicated in the figure since these are the ones used in the performance simulations in Section IV. However, it should be noted that the analysis is somewhat rough. For instance, it does not take into account the effect of taking the difference between samples with noise, which is what we do in a real situation. Realistic reductions in wordlength should therefore be somewhat smaller than predicted in Fig. 4, which is also shown in Sec. IV.
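The analytical prediction can be evaluated numerically. The sketch below assumes the Jakes-model variance ratio 2(1 - J_0(2π f_d T)) between the difference signal and the channel samples, the 6.02 dB-per-bit rule, and f_d T = f_d · Δk · T_s with T_s = 70 µs and Δk = 4, as stated in the text. The Bessel function is evaluated with its power series to avoid external dependencies; function names are hypothetical.

```python
import math

def bessel_j0(x, terms=30):
    """Zeroth-order Bessel function of the first kind via its power series."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= -(x / 2.0) ** 2 / ((m + 1) ** 2)
    return total

def bit_reduction(doppler_hz, symbol_time=70e-6, pilot_spacing=4):
    """Predicted wordlength reduction (bits) when processing increments."""
    rel_doppler = doppler_hz * pilot_spacing * symbol_time   # f_d * T
    ratio = 2.0 * (1.0 - bessel_j0(2.0 * math.pi * rel_doppler))
    return -10.0 * math.log10(ratio) / 6.02

bits_5hz = bit_reduction(5.0)    # roughly 7.3 bits
bits_70hz = bit_reduction(70.0)  # roughly 3.5 bits
```

Under these assumptions the noise-free prediction is about 7.3 bits at 5 Hz and about 3.5 bits at 70 Hz. These are upper bounds in the sense discussed above, since the analysis ignores the noise in the differenced estimates.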
The analysis shows that the wordlength of the FFT has the potential to be significantly reduced in low mobility scenarios, if the difference is processed rather than the channel samples directly. Some additional computations, compared to the original DFT solution, must be introduced, and other considerations such as wordlength selection and noise contributions must be taken into account, which are analyzed throughout this paper. If we include noise contributions in (6) we get
$$\Delta\hat{\mathbf{h}}_k = (\mathbf{h}_k + \mathbf{n}_k) - (\mathbf{h}_{k-\Delta k} + \mathbf{n}_{k-\Delta k}) = (\mathbf{h}_k - \mathbf{h}_{k-\Delta k}) + (\mathbf{n}_k - \mathbf{n}_{k-\Delta k}), \qquad (13)$$
and it can be noted that the variance of the noise in $\Delta\hat{\mathbf{h}}_k$ becomes twice as large, assuming the noise terms $\mathbf{n}_k$ and $\mathbf{n}_{k-\Delta k}$ are uncorrelated. The proposed method must therefore be able to handle this extra noise contribution without significant performance degradation. Since the difference $\Delta\hat{\mathbf{h}}_k$ is processed by the FFT, instead of $\hat{\mathbf{h}}_k$, the wordlength of the FFT can be reduced, yielding area and power reductions. The fact that lower Doppler gives higher correlation makes this method more suitable in low Doppler scenarios, as shown in Fig. 4. The performance analysis in Section IV is therefore mainly focused on an EPA 5 Hz Doppler channel as defined in the 3GPP specification [7].
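The doubling of the noise variance can be checked with a small Monte Carlo experiment. The sketch assumes real-valued, zero-mean Gaussian noise with an arbitrary standard deviation of 0.5 and a fixed seed; these values are illustrative and not taken from the paper.

```python
import random

def variance(samples):
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)

rng = random.Random(0)
sigma = 0.5                      # hypothetical noise standard deviation
num_samples = 20000

# Two uncorrelated noise sequences, standing in for n_k and n_{k - dk}.
n_now = [rng.gauss(0.0, sigma) for _ in range(num_samples)]
n_prev = [rng.gauss(0.0, sigma) for _ in range(num_samples)]
n_diff = [a - b for a, b in zip(n_now, n_prev)]

variance_ratio = variance(n_diff) / variance(n_now)   # close to 2
```

The ratio is close to 2, corresponding to a 3 dB noise penalty for the differenced signal.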
The IFFT and the 'R' blocks are left unchanged compared to the original DFT approach, as seen in Fig. 2 and Fig. 3. The FFT is still present, but its input, $\Delta\hat{\mathbf{h}}$, has a reduced number of bits compared to $\hat{\mathbf{h}}$, which is processed in the original DFT solution shown in Fig. 2. The additional hardware blocks required in the new approach are the 3 adders and the memory that stores the channel estimates for the previous time instance. The size of this memory should be the length of the impulse response of the channel, in this case 144 samples.
The feedback could be extended to also cover the IFFT block in Fig. 3, reducing the hardware complexity even further. The focus of this work has been on reducing the size of the FFT, which makes up the major part of the channel estimator: 2048 points, compared to 256 points for the IFFT.
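The principle of updating the frequency-domain estimate from increments can be sketched as below. This is one plausible reading of the feedback structure, not a reproduction of the exact adder arrangement in Fig. 3: the stored estimate is updated with the transform of the increment, and a full estimation is performed every `reset_period` updates (20 is the value used in the simulations). The sketch is in floating point, uses a direct DFT, and the channel values and function names are hypothetical.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT; adequate for a small illustration."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * m * i / n) for i in range(n))
            for m in range(n)]

def track_with_increments(h_time_seq, reset_period=20):
    """Update H_est with the DFT of the increment; reset with a full DFT periodically."""
    h_prev, H_est, outputs = None, None, []
    for k, h_now in enumerate(h_time_seq):
        if k % reset_period == 0:
            H_est = dft(h_now)                      # occasional full estimation
        else:
            delta = [a - b for a, b in zip(h_now, h_prev)]
            H_est = [H + d for H, d in zip(H_est, dft(delta))]
        h_prev = h_now                              # memory of the previous estimate
        outputs.append(H_est)
    return outputs

# Hypothetical slowly rotating 3-tap channel, zero-padded to 8 samples.
taps = [0.9, 0.4 + 0.2j, 0.1j]
h_seq = [[t * cmath.exp(1j * 0.01 * k) for t in taps] + [0j] * 5 for k in range(6)]

tracked = track_with_increments(h_seq)
direct = [dft(h) for h in h_seq]
max_dev = max(abs(a - b) for A, B in zip(tracked, direct) for a, b in zip(A, B))
largest_increment = max(abs(a - b) for a, b in zip(h_seq[1], h_seq[0]))
largest_tap = max(abs(t) for t in h_seq[1])
```

By linearity of the DFT, the tracked estimate matches the direct one up to rounding, while the increment fed to the transform is about 1% of the largest tap in this example. In a fixed-point implementation the rounding errors accumulate in the feedback path, which is why the periodic full estimation is needed.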

D. Extension of the proposed method
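It might be tempting to perform another discrete differentiation, to reduce the wordlength even more, resulting in
$$\Delta^2\hat{\mathbf{h}}_k = \Delta\hat{\mathbf{h}}_k - \Delta\hat{\mathbf{h}}_{k-\Delta k} = \hat{\mathbf{h}}_k - 2\hat{\mathbf{h}}_{k-\Delta k} + \hat{\mathbf{h}}_{k-2\Delta k}. \qquad (14)$$
However, the noise contribution in $\Delta^2\hat{\mathbf{h}}_k$ increases considerably: with uncorrelated noise terms, the coefficients 1, -2 and 1 give a noise variance that is 1 + 4 + 1 = 6 times larger than in the original solution, which makes this solution impractical to implement.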
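The noise gain of the first and second difference follows from the sum of squared coefficients, assuming uncorrelated noise terms of equal variance. A minimal sketch (the function name is hypothetical):

```python
def noise_gain(coefficients):
    """Variance gain of a linear combination of uncorrelated, equal-variance noise terms."""
    return sum(c * c for c in coefficients)

first_difference = [1, -1]        # h_k - h_{k-dk}
second_difference = [1, -2, 1]    # h_k - 2*h_{k-dk} + h_{k-2dk}

gain_first = noise_gain(first_difference)     # 2
gain_second = noise_gain(second_difference)   # 6
```

The second difference triples the noise variance relative to the first difference, which corresponds to roughly 4.8 dB of additional noise.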

E. Quantization
A fixed-point hardware implementation requires a wordlength analysis for different parts of the algorithm. Figs. 2 and 3 show the quantization points for the original and the new approach, respectively. The previous investigation in [6] showed that in the case of LTE, the input to the FFT could be reduced from 7 bits, for the original method, to 5 bits in the proposed method. This paper describes a complete fixed-point implementation of the algorithm and shows slightly different quantization levels compared to the theoretical results in [6]. The quantization levels of interest are the internal quantization of data in the FFT and the quantization of the input and output of the FFT. These quantization levels, together with architectural options, are discussed in the next section.
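To illustrate why a signal with a smaller dynamic range needs fewer bits, the sketch below uses a uniform signed quantizer with saturation. The quantizer model, the full-scale values and the sample value are assumptions for illustration; the paper does not specify its quantizer at this level of detail.

```python
def quantize(value, bits, full_scale):
    """Uniform signed quantizer with saturation; 'bits' includes the sign bit."""
    step = 2.0 * full_scale / (2 ** bits)
    level = round(value / step)
    level = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, level))
    return level * step

# Hypothetical ranges: the increment is assumed to span 1/64 of the channel range.
sample = 0.01
coarse_range = quantize(sample, bits=9, full_scale=1.0)        # step = 1/256
narrow_range = quantize(sample, bits=3, full_scale=1.0 / 64)   # step = 1/256
```

Both quantizers have the same step size, so they introduce the same quantization noise as long as the signal stays inside the narrower range. A range that is 2^6 = 64 times smaller saves 6 bits, matching the 9-to-3-bit reduction reported for the FFT input.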

IV. PERFORMANCE ANALYSIS
Throughout this work, standard approaches for the implementation of the FFT are used, with the main purpose of analyzing whether the gain due to wordlength reduction remains in a full fixed-point hardware implementation [10]. A comparison between the standard and the proposed DFT-based channel estimation is performed, with wordlengths in the different stages of the algorithm that give limited performance degradation in terms of BER. The hardware architectures in this paper have been developed in synthesizable C++ and synthesized to VHDL with CatapultC and to a netlist with Synopsys Design Compiler in a standard 65 nm process.
The FFT architecture is based on a radix-2 pipelined architecture [10] [11], in which the internal quantization of the FFT is static with increasing wordlength in each stage [12]. All simulations in this paper have been performed with standard Rayleigh fading channels (EPA and EVA) specified for LTE by 3GPP [7] and with a 64QAM constellation. Simulations for the original DFT method ($DFT_{std}$), presented in Fig. 5, show that the input ($Q_{\hat{h}}$) and output ($Q_{\hat{H}_{DFT}}$) quantization levels before and after the FFT should both be 9 bits (marked with 9/9 in the legend) to keep the performance degradation reasonable compared to floating point (solid line), in an EPA 5 Hz Doppler scenario. Perfect channel state information (CSI) simulations are also shown (dashed line) to give a baseline for the channel estimator. The proposed method, marked with $DFT_{\Delta}$, has a quantization of 3 bits for the input ($Q_{\Delta\hat{h}}$) and 4 bits for the output ($Q_{\Delta\hat{H}}$), marked as 3/4 in Fig. 5. The reduction of bits from 9 to 3/4 is consistent with the theoretical results shown in Fig. 4. The difference between the theoretical results and the actual simulations can be explained by the fact that the theoretical model does not include noise and uses perfect channel knowledge rather than estimates of the channel. The required wordlengths for acceptable performance degradation are determined in a step-wise process. Initially only the input and output are quantized while floating point precision is kept internally in the FFT; those results are shown in Fig. 5. A fully fixed-point implementation then shows additional performance degradation. However, the internal wordlengths can be chosen to limit the additional degradation. Fig. 7 illustrates this extra degradation, where the partially fixed-point simulations from Fig. 5 (but with fewer simulation points) are compared with fully fixed-point simulations.
Scenarios with higher Doppler frequency will naturally show less difference in wordlengths between the two methods, as seen in Fig. 4. Fig. 6, where a 70 Hz Doppler channel is simulated, shows that a quantization to 7 bits in both input and output is required to obtain the same performance as the standard DFT method with 9-bit quantization levels. This result also corresponds well with the theoretical results shown in Fig. 4. The gain in area, in the 70 Hz Doppler case, is too small to motivate a separate hardware accelerator. Fig. 7 shows results from a complete fixed-point simulation and a partially fixed-point simulation where only input and output are quantized. The internal quantization levels are described in detail in Section V.

A. Limitations
A requirement for the proposed fixed-point implementation is that a complete DFT-based channel estimation is done (without processing the difference) at certain intervals. This limits the effects of divergence and instability caused by the accumulation of quantization errors in the recursive structures of the proposed method. All simulations in this paper have been performed with a full channel estimation every 20th time. This interval is set to a reasonably high value; finding an optimal value requires an analysis of how the rate of divergence depends on the quantization levels, which is a subject for further studies.

The pipelined FFT structure is shown in Fig. 8. A number of butterflies are connected in a pipelined fashion and each stage has a memory or registers associated with it. The wordlength should increase after each stage in the FFT pipeline to prevent overflow in the butterfly stages. Previous analysis shows that the increase of bits can be stopped at an early stage [12], as generally described by Fig. 9, where the wordlength increase is stopped at a certain stage, e.g. K=3. The relation between the different parameters is
$$Q_{output} = Q_{input} + P + K - L, \qquad (15)$$
where $Q_{input}$ is the quantization level of the input data, P is the number of additional bits after the first stage in the FFT, K is the number of FFT stages where the wordlength is increased (by one bit in each stage) and L is the difference in bits between the quantization level of the final stage and the output quantization level, $Q_{output}$. A pipelined FFT structure with normal ordered inputs uses memories of different sizes in each stage, see Fig. 8. At a certain level the memory overhead becomes too large and registers are more advantageous in terms of area consumption. The memory required in the last stages of the pipelined FFT is therefore mapped to registers, with the threshold set to 32 words. Another property of the pipelined FFT structure is that the number of unique twiddle factors decreases in each stage. This enables true constant multiplication.

The proposed method, marked by the dotted line in Fig. 3, and a regular FFT have been synthesized to demonstrate the area reduction. The same FFT architecture is used in both designs and they have been synthesized in a standard 65 nm CMOS process from STMicroelectronics. A comparison with an existing FFT (the standard DFT approach) has also been included in Table I to benchmark the existing architecture. The P, K and L values are chosen based on the simulation results in Fig. 7. The clock frequency has been chosen to be 100 MHz to approximately correspond to the required processing time for the channel estimation operation. The value of the frequency is not the primary concern; rather, the clock frequency for the proposed method and the standard method should be kept the same to provide a fair comparison. The design in [13], used as the benchmark architecture, does not reveal the internal quantization method or the coefficients P, K and L. The reason for the difference in area might be that the benchmark architecture uses significantly higher internal quantization levels. The reduction in area is about 40% when comparing the standard method with the proposed one. The comparison is done on a subset of the channel estimator, the FFT; other parts of the estimator, such as memories, the IFFT and other subblocks, are not included. The power consumption is expected to be reduced by approximately the same amount as the area.
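The wordlength schedule through the FFT pipeline can be sketched as follows. The sketch assumes the relation Q_output = Q_input + P + K - L between the parameters defined above, that the K one-bit increases follow directly after the first stage, and an 11-stage radix-2 pipeline for 2048 points. The pairing of example values with a particular design is hypothetical.

```python
def stage_wordlengths(q_input, p, k, num_stages):
    """Wordlength after each FFT stage: +P after stage 1, then +1 for K stages."""
    lengths = []
    current = q_input
    for stage in range(1, num_stages + 1):
        if stage == 1:
            current += p
        elif stage <= 1 + k:       # assumed: growth continues for K stages, then stops
            current += 1
        lengths.append(current)
    return lengths

def output_wordlength(q_input, p, k, l):
    """Assumed relation: the final internal wordlength reduced by L bits."""
    return q_input + p + k - l

# Hypothetical example: 3-bit input with P/K/L = 1/3/3 in an 11-stage pipeline.
schedule = stage_wordlengths(q_input=3, p=1, k=3, num_stages=11)
q_out_small = output_wordlength(3, 1, 3, 3)   # 4
q_out_large = output_wordlength(9, 1, 2, 3)   # 9
```

Under these assumptions a 3-bit input grows to 7 bits internally and is truncated to a 4-bit output, while a 9-bit input with P/K/L = 1/2/3 returns to a 9-bit output.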

VI. CONCLUSION AND FUTURE WORK
This paper has described a fully fixed-point implementation of a new algorithm for channel estimation, operating on channel increments rather than directly on the channel itself. Analytical tools have been used to estimate the potential reduction of wordlength, and both simulation and synthesis results demonstrate that the new estimation method achieves considerable gains in area. Around 40% gain in area is observed in the case of a 5 Hz Doppler channel and power is expected to be reduced by the same amount. The actual wordlength reduction is consistent with the results from the analytical method. Future work includes power gain estimation and reducing the complexity further by including the IFFT in the feedback path.

[Table I, partially recovered: P/K/L = 1/3/3, 1/2/3, -. * The 2048 memory in Fig. 3 is excluded. ** Normalization is done through the factor (65/95)^2 for 90nm.]