Channel estimation and training design for hybrid multi-carrier MmWave massive MIMO systems: The beamspace ESPRIT approach

In this paper, we study the channel estimation problem for a cyclic prefix OFDM (CP-OFDM) based millimeter wave (mmWave) hybrid analog-digital MIMO system, where the analog processing is achieved using only phase shift networks. A three-dimensional (3-D) Standard ESPRIT in discrete Fourier transform (DFT) beamspace approach is developed to estimate the unknown frequency-selective channel. The required training protocol, analog precoding and decoding matrices, as well as pilot patterns are also discussed. Simulation results show that the proposed 3-D Standard ESPRIT in DFT beamspace based channel estimation algorithm provides accurate channel estimates when there is a sufficient number of snapshots.


I. INTRODUCTION
Millimeter wave (mmWave) massive MIMO techniques will not only gain from large chunks of underutilized spectrum in the mmWave band [1] but will also benefit from a significantly reduced form factor of the massive MIMO array [2]. However, one RF chain per antenna is impractical because the involved power consumption and the hardware cost are too high. To exploit the MIMO gain under a reasonable cost, one promising solution is to deploy hybrid analog-digital precoding schemes, realized using phase shifters or switches in the RF domain [3], and digital precoding schemes, implemented in the digital baseband domain as in conventional MIMO. If analog precoding is achieved using phase shifters only, the analog precoding matrix should have only constant modulus entries [4], [5], [6]. Furthermore, when a wideband multicarrier system is considered, we have to use the same phase shifts for all subcarriers [7]. These two constraints are stringent so that they lead to significant challenges not only for the precoding of the transmitted data but also for the required channel estimation tasks [4], [5], [8]. When a frequencyflat channel is considered, in [5] an adaptive compressed sensing (CS) based channel estimation algorithm is proposed to estimate the channel of a hybrid analog-digital massive MIMO system. This CS based channel estimation algorithm has been further extended in [9] by involving multiple measurement vectors (MMV) to improve the channel estimation accuracy. An adaptive multi-grid sparse recovery approach is applied in [10]. For a frequency-selective channel we have proposed multi-stage CS based channel estimation methods in [11] to reduce the involved computational complexity. Unfortunately, all the above CS based methods depend on the on-grid assumption of the channel parameters, which requires a gridoffset estimation for practical use. The grid-offset estimation itself is already a challenging task. Although in [12] a gridless CS mmWave channel estimation algorithm is developed based on the gridless CS theory [13], the proposed algorithm is not aware of the hardware constraint in a hybrid MIMO architecture and is only suitable for a one-dimensional estimation problem, e.g., single-antenna terminals. Hence, this motivates us to seek a gridless mmWave channel estimation method for hybrid mmWave massive MIMO systems.
In this paper we develop a gridless channel estimation algorithm for a hybrid point-to-point mmWave multi-carrier massive MIMO system. A cyclic prefix OFDM (CP-OFDM) based multi-carrier modulation scheme is used and training using pilot tones is considered. The resulting channel estimation problem is formulated as a three-dimensional (3-D) harmonic retrieval problem with compressed measurements. Inspired by the Unitary ESPRIT in DFT beamspace algorithm in [14], we develop a 3-D standard ESPRIT in DFT beamspace based channel estimation algorithm. When combined with our proposed training design, which involves the design of a training protocol, the analog precoding and decoding matrices, and the pilot patterns, the proposed algorithm provides highresolution estimates of the spatial frequencies and subsequently accurate channel estimates without the on-grid assumption. Only a few training symbols are used during the estimation. Simulation results also show that by using the proposed channel estimation algorithm the achievable system sum rate is only slightly affected in the low SNR regime.
Notation: Upper-case and lower-case bold-faced letters denote matrices and vectors, respectively. The expectation, transpose, conjugate, Hermitian transpose, inverse, and Moore-Penrose pseudo inverse are denoted by E{·}, {·} T , {·} * , {·} H , {·} −1 , and {·} + , respectively. The Euclidean norm of a vector and the absolute value are denoted by · and |·|, respectively. The Kronecker product and the Khatri-Rao product are denoted as ⊗ and ⋄, respectively. The m × m identity matrix and the exchange matrix are I m and Π m , respectively. The m × n matrix with all zero elements is 0 m×n .

A. System Model
We study a point-to-point mmWave massive MIMO system as in [11], where a base station (BS) transmits data to a multi-antenna user equipment (UE). The BS has M T transmit antennas and N T RF chains. The UE has M R receive antennas and N R RF chains. The number of RF chains is assumed to be much smaller than the number of antenna elements, i.e., M T ≫ N T and M R ≫ N R . A CP-OFDM based modulation scheme is applied to combat the multipath effect. The corresponding FFT size is N fft . Let s n [m] ∈ C NT represent the transmitted pilot vector on the n-th pilot subcarrier in the m-th OFDM symbol over N T RF chains (n ∈ {k 1 , · · · , k N f } ⊂ {1, · · · , N fft }, p ∈ {1, · · · , N f }, m ∈ {1, · · · , N t }). Thereby, the training procedure consists of N t OFDM symbols each with N f pilot tones. The pilot tones and the data tones are interleaved on all subcarriers and then passed through the IFFT filter. A CP of length N CP symbols is added, followed by an RF precoder F [m] ∈ C MT×NT using analog circuitry. The transmit power of the training vectors We consider a frequency selective quasi-static block fading channel. Assume that N CP has the same length as the maximum excess delay of the channel such that the inter-symbol interference is avoided. After passing through the channel, first, an RF decoder W H [m] ∈ C NR×MR is used at the UE. Afterwards, the CP is removed from the received signal. By using the FFT filter the time domain signal is transformed into the frequency domain. Let H n [m] ∈ C MR×MT denote the discrete channel transfer function (CTF) on the n-th pilot subcarrier in the m-th OFDM symbol of the UE. The received training signal on the n-th pilot subcarrier in the m-th OFDM symbol is given by [11]

B. Channel Model
We consider uniform linear arrays (ULAs) at both ends. Therefore, the array steering vector is a Vandermonde vector. We define an M -element Vandermonde vector a(µ) as Due to the lack of diffraction, a mmWave massive MIMO channel is modeled using a finite number of scatterers, e.g., L scatterers [15]. Each scatterer contributes to a single propagation path between the BS and the UE, which accounts for one time delay τ ℓ and one pair of spatial frequencies (µ T,ℓ , µ R,ℓ ) for ℓ ∈ {0, · · · , L − 1}. The frequency domain representation of the channel is given by [16] where α ℓ denote the complex gain of the ℓ-th path. The vectors a T (µ T,ℓ ) ∈ C MT and a R (µ R,ℓ ) ∈ C MR denote the array steering vectors of the BS and the UE, respectively.
Let T s = 1/(N fft · ∆ f ) represent the sampling period and ∆ f denote the subcarrier spacing. We model the sampled CTF on the n-th subcarrier in the m-th OFDM symbol as Note that equation (3) implies that the angular-delay profile remains unchanged while the complex channel gain might vary during different OFDM symbol periods.

C. Proposed Training Protocol
In our training protocol we divide the total number of N t training OFDM symbols into κ m training frames, each consisting of N T OFDM symbols. That is, N t = κ m · N T . We assume that the channel gains α ℓ [m], ∀ℓ are approximately the same in each training frame whereas they are different in different training frames, e.g., [12]. Under this assumption, we will use the notation α κ,ℓ instead of α ℓ [m] in the rest of the paper. Furthermore, the same analog precoding and decoding matrices are used in all κ m training frames. Thus, is a scaled orthogonal matrix for all n. Note that more than N T OFDM symbols in one frame can achieve the same goal, but at a cost of wasting more training resources. The same training matrix S n is used in different frames. Therefore, by pre-multiplying S H n from the right-hand side of the received signal matrix of N T consecutive OFDM symbols in each frame we obtain where κ ∈ {1, · · · , κ m } andZ n,κ = W H Z n,κ S H n denotes the effective noise, where Z n,κ ∈ C MR×NT comprises N T consecutive z n [m] in the κ-th frame. By vectorizing Y n,κ and stacking them on top of each other for n ∈ {k 1 , · · · , k N f } we obtain where Φ ∈ R N fft ×N f is a column selection matrix with ones on the (k p , p)-th elements and zeros otherwise, and The vectorz κ denotes the vectorized version of the effective noise matrix Z k1,κ · · ·Z kN f ,κ . Borrowing the array signal processing terminology [17], we see equation (4) corresponds to a single snapshot model.
We increase the number of snapshots by placing the y κ next to each other, ∀κ. Finally, we get (5) is rewritten as where B = B f ⋄ B T ⋄ B R ∈ C N f NtNR×L denotes a virtual array steering matrix and equation (6) corresponds to a κ msnapshot model.

III. STANDARD ESPRIT IN BEAMSPACE BASED CHANNEL ESTIMATION
Let µ ℓ = [µ R,ℓ , µ T,ℓ , µ f,ℓ ] T ∈ R 3 denote a tuple of 3-D frequencies. Using the reformulation in (6) the channel estimation task can be accomplished by firstly estimating the 3-D frequency vector µ ℓ and then the complex gain matrix Γ, e.g., the LS estimate of Γ is given by Γ = B +Ȳ . Without the compressed measurements in (6), the spatial frequency estimation problem yields a 3-D harmonic retrieval problem as in [17] problem and thus can be solved using the algorithm in the same paper. Nevertheless, inspired by the Unitary ESPRIT in DFT beamspace algorithm in [14], we develop a 3-D Standard ESPRIT in DFT beamspace to estimate spatial frequency vectors µ ℓ , ∀ℓ. Similarly as other ESPRIT-type algorithms, e.g., [17], the involved two major steps in our algorithm are the estimation of signal subspace and solving shift invariance equations. More precisely, we aim at designing Φ, F and W as well as selection matrices J x,i ∈ C M sel,x ×Nx for i = 1, 2 and x ∈ {f, R, T} such that the following three shift invariance equations exist J r,1 BΦ x = J r,2 B, r = 1, 2, 3, x ∈ {f, R, T}.
The spatial frequencies to be estimated are contained in the diagonal matrices Φ x = diag e jµx,0 · · · e jµx,L−1 , ∀x. Similarly as in [17], the compound selection matrices J r,i , ∀r, i, have Kronecker structures and are expressed as After the shift invariance equations are created, we compute an orthonormal basis of the column space of B, which is denoted by U s ∈ C N f NTNR×L from the measurement matrixȲ . Then we replace the unknown matrix B in (7) by U s , which yields where an eigendecomposition satisfies Ψ Consequently, the eigenvalues of Ψ x ≈ (J r,1 U s ) + J r,2 U s are estimates of e jµ x,ℓ , ∀x, ℓ. In this way the spatial frequency vectors µ ℓ , ∀ℓ can be computed using the algorithm in [17] and an automatic pairing of the frequency estimates is achieved.
In Section III-A we discuss how to build the shift invariance equations. In Section III-B we explain how to obtain the orthonormal basis U s .

A. Building Shift Invariance Equations
Thanks to the Kronecker structure in (8), we can design J x,i for all dimensions separately such that When x = f, equation (10) can be achieved by setting k p = p, i.e., Φ contains the first N f columns of I N fft . Then B f becomes a reduced-dimensional version of A f . We can use when a maximum overlap is used [17].
The same design cannot be applied when x = R, T because F and W must have constant modulus entries. Inspired by the Unitary ESPRIT in DFT beamspace in [14], we propose to construct F T and W H using N T and N R successive rows of M T × M T and M R × M R DFT matrices, respectively. By applying a similar manipulation as in [14] we have the subsequent statement. Lemma 1. When x = T, R and i = 1, 2, the shift invariance equation The proof is straightforward by following [14].
When N x < M x for x = T, R, the scope of the search, i.e., the spatial sector specified by µ x,ℓ , will be narrowed down because each row of J x,1 and J x,2 relates to two successive components of the DFT beamspace array steering vectors. Following the discussion in [18], it can be derived that the steering angles should fall in the region

B. Estimation of the signal subspace
When κ m ≥ L, by directly applying the SVD onȲ we can construct U s , i.e., the orthonormal basis of B, by using the first L left singular vectors ofȲ . However, a large κ m also implies a large number of training frames. To reduce the number of training symbols, we propose to use the spatial smoothing algorithm in [17] to extend the number of snapshots. The spatial smoothing technique can be applied over three dimensions separately. It increases the number of snapshots by reducing the dimension of effective measurements. When a DFT beamspace is considered, i.e., x = R, T, this also leads to a reduced effective aperture of the spatial domain. Therefore, we only apply the spatial smoothing along the frequency domain.
Similarly as described in [17], let us divide N f virtual sensors in the frequency domain into L f subarrays, each contains M sub,f = N f − L f + 1 elements. We define selection matrices J ℓ f ,1,1 =J Using the definitions above we express the final spatially smoothed data matrix as where Lastly, we compute the SVD ofȲ ss to obtain a new orthonormal basis U s ∈ C M sub ×L , which consists of the first L left singular vectors of J 1,1,1Ȳ . The selection matrix J f,i in (8) should be adjusted to yield a dimension of (M sub,f − 1) × M sub,f for i = 1, 2, when a maximum overlap is considered. By using the proposed 3-D Standard ESPRIT in DFT beamspace algorithm, the maximum number of resolvable scatterers is Remark 1. Note that it is also possible to extend other high-resolution line-spectra estimation tools based on our DFT beamspace principle, e.g., the MUSIC algorithm [19]. However, compared to ESPRIT-type algorithms, the MUSIC algorithm yields a 3-D search and thus is computationally much more expensive.

IV. SIMULATION RESULTS
The proposed 3-D Standard ESPRIT in DFT beamspace based channel estimation algorithm is evaluated using Monte-Carlo simulations. A total transmit power over all subcarriers in one OFDM symbol is set to unity. The SNR is defined as SNR = 1/(N fft σ 2 n ). The overall transmit power of the training vectors is defined as P pilot = N f N fft and it is equally allocated to N f pilot tones in one OFDM symbol. We set N fft = 64 and L = 2. The channel gain α ℓ , ∀ℓ, is ZMCSCG distributed with L ℓ=1 E{|α ℓ | 2 } = 1. Two training frames are used, i.e., κ m = 2. When the sum rate of the system is evaluated, we consider only N data = 2 active data subcarriers to be computationally efficient. The inter-element spacing of the ULA is equal to half of the wavelength. We set M T = 64, M R = 32, N T = 8, N R = 4, and N f = 8 in the simulations. According to the discussion in Section III-A, this setup corresponds to a search region of approximately π/4 for both µ R,ℓ and µ T,ℓ , ∀ℓ. Therefore, in our simulations we set µ x,ℓ ∈ (0, π/4) for x = R, T, ∀ℓ. We investigate the performance of the estimation algorithm for both spatial frequencies and the channel. For the former one the mean squared estimation error (MSE) is used as the criterion, i.e., Let us define h κ = (A f ⋄A T ⋄A R )·α κ . The channel estimation performance is evaluated using the normalized mean squared estimation error (NMSE), i.e., The proposed 3-D Standard ESPRIT in DFT beamspace algorithm is denoted as "B-ESPRIT". The simulation results are obtained by averaging over 500 channel realizations.
As depicted in Figures 1 and 2 both the MSE and the NMSE reduce when the number of virtual subarrays in the frequency domain increases. Moreover, when the number of virtual subarrays is fixed, the 3-D Standard ESPRIT in DFT beamspace algorithm provides more accurate estimates of the spatial frequencies than those of the channel. This advantage might be exploited when the hybrid analog-digital design is based on the spatial frequencies, e.g., [4]. Note that when L f = 1, i.e., the number of effective snapshots is equal to that of spatial frequencies, the performance curve of the proposed algorithm is not smooth. This phenomenon does not appear when the number of effective snapshots increases. This implies that to achieve a better performance we should have more snapshots than the number of spatial frequencies.
To evaluate the achievable sum rate of the system with estimated channel state information (CSI), we apply the channel matching based hybrid design as described in [11] and [6]. Since our focus is on the effects of imperfect CSI rather than the overall system spectral efficiency, the demonstrated sum rate in Fig. 3 accounts only for the data transmission phase, during which the analog as well as digital precoding and decoding schemes are modified according to the channel matching based hybrid design and the estimated CSI. The achievable sum rate increases when the number of virtual subarrays increases. But the performance difference under different simulation settings is obvious only in the low to medium SNR regime.

V. CONCLUSION
A gridless 3-D Standard ESPRIT in beamspace based channel estimation algorithm has been proposed to estimate the CSI of a hybrid mmWave massive MIMO system. The required training protocol, analog processing as well as pilot patterns have been discussed and developed. Simulation results show that the proposed 3-D Standard ESPRIT in beamspace algorithm provides accurate estimates of both the CSI and the spatial frequencies using only a few training symbols especially when combined with spatial smoothing technique. Moreover, the channel estimation error only slightly affects the achievable sum rate when the channel matching based hybrid design is applied.