Tackling Pilot Contamination in Cell-Free Massive MIMO by Joint Channel Estimation and Linear Multi-User Detection

Abstract: In this paper, we consider cell-free (CF) massive MIMO (MaMIMO) systems, which comprise a very large number of geographically distributed access points (APs) serving a much smaller number of users. We exploit channel sparsity to tackle pilot contamination, which originates from the reuse of pilot sequences. Specifically, we consider semi-blind methods for joint channel estimation and data detection. Under the challenging assumption of deterministic parameters, we determine sufficient conditions and necessary conditions for semi-blind identifiability, which guarantee the non-singularity of the Fisher Information Matrix (FIM) and the existence of the Cramer-Rao bound (CRB). We propose a message passing (MP) algorithm which determines the exact channel coefficients in the case of semi-blind identifiability. We show that the system is identifiable if the Karp-Sipser algorithm yields an empty core. Additionally, we propose a Bayesian semi-blind approach which results in an effective algorithm for joint channel estimation and multi-user detection. This algorithm alternates between channel estimation and linear multi-user detection. Numerical simulations verify the analytical derivations.


I. INTRODUCTION
Recently, cell-free (CF) massive MIMO (MaMIMO) systems have attracted extensive research interest as an effective and promising approach for next generation wireless systems thanks to their potential to reap the benefits of both MaMIMO and distributed antenna systems (DAS). CF MaMIMO systems consist of a massive number of access points (APs) which serve a much smaller number of single-antenna users and are geographically distributed over a large coverage area. All the APs are connected through a back-haul network to a central processing unit (CPU). The massive number of antennas improves spectral efficiency [1], whereas energy efficiency [2], [3] and macro-diversity gain result from the distributed topology and ultra-densification. Additionally, since each user is surrounded by a large number of serving APs, with high probability all the users enjoy good channel conditions [4]. Therefore, CF MaMIMO systems are expected to provide significant improvements in terms of spectral/energy efficiency and coverage probability. In [1], [5], the performance of CF MaMIMO and small-cell systems was compared under the assumption of maximum ratio (MR) processing. In [6]-[10], the authors advocated the use of more effective processing than MR processing in CF MaMIMO to guarantee superior performance of CF MaMIMO systems compared to small-cell systems.
The performance of CF MaMIMO systems is critically affected by the so-called pilot contamination. This impairment originates from the reuse of training sequences, or pilots, utilized in channel estimation, which prevents the acquisition of an adequate estimate of the channel state information (CSI). The detrimental effects of pilot contamination were highlighted in [11] for centralized MaMIMO systems. Specific features of centralized MaMIMO channels such as channel hardening and favorable propagation or limited angular spread could be exploited to "separate" user channels in the power domain [12], in the angular domain [13], [14], or jointly in the power and angular domains [15] and thus mitigate or annihilate pilot contamination. However, these appealing properties of channels in centralized MaMIMO systems are destroyed in a distributed setting, and pilot contamination is still an open and challenging problem in CF MaMIMO systems.
Several pilot assignment (PA) methods for mitigating pilot contamination in CF MaMIMO systems were proposed recently in [1], [16]-[18]. In [1], a greedy pilot assignment (GPA) based on knowledge of the large-scale fading channel coefficients was proposed. In [16], a location-based greedy (LBG) pilot assignment scheme utilized the location information in a GPA algorithm. The structured PA approach proposed in [17] maximized the minimum geographical distance between users sharing the same pilot sequences. An additional PA method based on graph coloring was proposed in [18]. All these techniques address the pilot contamination problem via a careful assignment of pilots and, in contrast to blind or semi-blind estimation and detection techniques, do not exploit the inherent structure of channels and data in CF MaMIMO systems.
A blind pilot decontamination approach was first proposed in [12] for centralized MaMIMO systems and utilized the asymptotic orthogonality of user channels to remove undesired interference, including pilot contamination, from the received signal. The same property was also exploited for semi-blind channel estimation, e.g., [15], in centralized MaMIMO, but it does not hold in CF MaMIMO systems [8], [9]. Blind and semi-blind channel estimation have been thoroughly investigated in general settings, see, e.g., [19]-[22] and references therein. In this context, the concept of identifiability is very relevant since it guarantees the non-singularity of the Fisher information matrix (FIM) and thus the existence of the Cramer-Rao bound (CRB). The corresponding conditions provide fundamental insights into the feasibility of reliable communications in the analyzed system. Conditions under which channel and data signals are blindly and semi-blindly identifiable have been thoroughly studied in various settings for centralized systems, see, e.g., [23], [24].
CF MaMIMO channels are inherently sparse due to the distribution of APs over a large area and the natural path loss of wireless channels. In this paper, we study semi-blind joint channel estimation and data detection that exploits the sparsity of the channel support in CF MaMIMO systems to combat pilot contamination. The potential of this approach is analyzed via system identifiability, and sets of sufficient conditions and necessary conditions under which channels and data are identifiable are provided. We define a graph that has APs and users as factor and variable nodes, respectively, and propose a message passing (MP) algorithm over this graph which computes the channel coefficients if the identifiability conditions are satisfied. As a by-product, we also show that the conditions for semi-blind identifiability are satisfied if the Karp-Sipser algorithm [25]-[27] yields an empty core. Additionally, we derive the FIM and CRB for joint channel estimation and data detection and propose a Bayesian semi-blind method that alternates between channel estimation and linear multi-user detection.
The remainder of this paper is organized as follows. We describe the system and channel model in Section II. The CRB and the identifiability conditions under the assumption of deterministic parameters are presented in Sections III and IV, respectively. In Section V, we propose a Bayesian semi-blind iterative algorithm that alternates between channel estimation and linear multi-user detection. Numerical results are illustrated in Section VI. Finally, concluding remarks are drawn in Section VII.
Notation: In the following, superscripts $T$, $*$, and $H$ stand for transpose, conjugate, and conjugate transpose, respectively. Uppercase and lowercase bold symbols denote matrices and vectors, respectively. The expectation operator is indicated by $E\{\cdot\}$ and $\mathbf{I}_P$ is the $P \times P$ identity matrix. Here, $\|\cdot\|$ and $\mathrm{diag}(\cdot)$ denote the Euclidean norm operator and the square diagonal matrix consisting of the diagonal elements of the matrix argument, respectively. The operator $\mathrm{vec}(\cdot)$ stacks the columns of its matrix argument, i.e., $\mathrm{vec}(\mathbf{A}) = [\mathbf{A}_{:,1}^T, \mathbf{A}_{:,2}^T, \ldots]^T$, where $\mathbf{A}_{:,j}$ is the $j$-th column of matrix $\mathbf{A}$, and $\mathrm{tr}\{\cdot\}$ is the trace operator. The Kronecker product is denoted by $\otimes$. Finally, $\mathcal{N}(\mu, \sigma^2)$ and $\mathcal{CN}(\mu, \sigma^2)$ denote a real and a complex Gaussian distribution with mean $\mu$ and variance $\sigma^2$, respectively.

II. SYSTEM MODEL
We consider the uplink of a CF MaMIMO system consisting of $K$ single-antenna users and $M$ single-antenna APs randomly distributed over a $D \times D$ square area. We assume that $M \geq K$. The $M$ APs are connected to a central processing unit (CPU) via a back-haul network. The channel matrix between the APs and users is given by $\mathbf{H} \in \mathbb{C}^{M \times K}$, whose $(m,k)$-element $h_{mk}$ is the channel coefficient between AP $m$ and user $k$ and is modeled as follows
$$h_{mk} = \beta_{mk}^{1/2}\, g_{mk} \qquad (1)$$
where $\beta_{mk}$ represents the large-scale fading coefficient, which accounts for path loss and shadowing effects, and $g_{mk}$ represents the small-scale fading. We assume that the coefficients $g_{mk}$ are independent and identically distributed (i.i.d.) complex normal random variables, i.e., $g_{mk} \sim \mathcal{CN}(0, 1)$. Additionally, we assume perfect knowledge of the large-scale fading coefficients $\beta_{mk}$.
Due to the path loss, the channel coefficients are assumed to be negligible at distances greater than a given threshold $\gamma$. Then, for each AP $m$, the CPU is required to estimate only the channels of the users in a disc centered around AP $m$ with radius $\gamma$, while the signals transmitted from users external to the disc are treated as additive noise. We denote by $\mathcal{K}_I(m)$ and $\mathcal{K}_0(m)$ the set of users inside the disc centered around AP $m$ and the set of remaining users, respectively. At a global level, this determines a partition of the channel coefficients into two groups: the channel coefficients that have to be estimated and those that are treated as negligible. Consistently with this partition, we decompose the channel matrix $\mathbf{H}$ into two matrices $\mathbf{H}_I$ and $\mathbf{H}_0$ such that $\mathbf{H} = \mathbf{H}_I + \mathbf{H}_0$. Then, $\mathbf{H}_I$ and $\mathbf{H}_0$, both of size $M \times K$, denote the matrices of the relevant and negligible channel coefficients, respectively. Throughout this paper, we assume that $\gamma \ll D$ and that the APs are distributed over the whole region, such that matrix $\mathbf{H}_I$ has a large number of zero elements.
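The disc-based partition described above can be illustrated with a small sketch. It assumes uniformly dropped nodes and plain Euclidean distance; the values of the side length and of $\gamma$, as well as the function name `build_supports`, are illustrative assumptions and do not come from the paper.

```python
import math
import random


def build_supports(num_aps, num_users, side, gamma, seed=0):
    """Drop APs and users uniformly in a side x side square.

    Returns, for each AP m, the set K_I(m) of users closer than gamma, and,
    for each user k, the set of APs closer than gamma (the channel support).
    """
    rng = random.Random(seed)
    aps = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(num_aps)]
    users = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(num_users)]
    users_in_disc = [set() for _ in range(num_aps)]   # K_I(m) for each AP m
    support = [set() for _ in range(num_users)]       # support of each user channel
    for m, (ax, ay) in enumerate(aps):
        for k, (ux, uy) in enumerate(users):
            if math.hypot(ax - ux, ay - uy) < gamma:
                users_in_disc[m].add(k)
                support[k].add(m)
    return users_in_disc, support


# Hypothetical geometry: side length and gamma are assumed values.
users_in_disc, support = build_supports(num_aps=100, num_users=20, side=1000.0, gamma=150.0)
sparsity = 1 - sum(len(s) for s in support) / (100 * 20)
print(f"fraction of zero entries in H_I: {sparsity:.2f}")
```

With $\gamma \ll D$ most (AP, user) pairs fall outside the disc, which is what makes $\mathbf{H}_I$ sparse.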
In the uplink transmission, each user sends one of $P$ pilot sequences known by the CPU followed by $L-P$ unknown data symbols. The pilot sequences are assumed to be ortho-normal, i.e., orthogonal with unit norm. The $L$ received symbols at the $M$ APs are given by
$$\mathbf{Y} = \sqrt{\rho}\,\mathbf{H}\mathbf{X} + \mathbf{W} \qquad (2)$$

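III. CRB FOR SEMI-BLIND JOINT CHANNEL ESTIMATION AND DATA DETECTION
To analyze the performance of the semi-blind channel estimation, we derive the CRB in a deterministic framework. In the deterministic framework, both the data signal $\mathbf{X}_d$ and the relevant channel coefficients $\mathbf{H}_I$ are modeled as unknown deterministic quantities. Thus, we have $\mathbf{y} \sim \mathcal{CN}(\mathbf{m}_y(\boldsymbol{\theta}), \mathbf{C}_{yy})$, where $\mathbf{y} = \mathrm{vec}(\mathbf{Y})$ and $\boldsymbol{\theta} = [\mathbf{h}_I^H \ \mathrm{vec}^H(\mathbf{X}_d)]^H$ is the complex unknown parameter vector to be estimated. Here, $\mathbf{h}_I$ is a vector built from the non-zero elements of the matrix $\mathbf{H}_I$, whose support is known. The mean and covariance of the received signal $\mathbf{y}$ are $\mathbf{m}_y(\boldsymbol{\theta}) = \sqrt{\rho}\,\mathrm{vec}(\mathbf{H}_I \mathbf{X})$ and $\mathbf{C}_{yy}$, respectively, with $\mathbf{C}_{YY} = \mathbf{I}_M + \rho\,\mathbf{C}_0$, where $\mathbf{C}_0$ is the covariance matrix of the contribution of the negligible channel coefficients $\mathbf{H}_0$, which is treated as additive noise.
The probability density function$^1$ (pdf) of the observations $\mathbf{Y}$ in the parameter $\boldsymbol{\theta}$ is given by
$$f(\mathbf{Y}|\boldsymbol{\theta}) = \frac{1}{\pi^{ML} \det(\mathbf{C}_{yy})} \exp\!\left(-(\mathbf{y} - \mathbf{m}_y(\boldsymbol{\theta}))^H \mathbf{C}_{yy}^{-1} (\mathbf{y} - \mathbf{m}_y(\boldsymbol{\theta}))\right).$$
The deterministic complex Fisher information matrix (FIM), denoted as $\mathbf{J}^d_{\theta,\theta}$, is computed on the basis of the data $\mathbf{Y}$ from the Jacobian of $\mathbf{m}_y(\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$. The FIM $\mathbf{J}^d_{\theta,\theta}$ is a $2 \times 2$ block matrix. The deterministic $\mathrm{CRB}^d$ is obtained as the inverse of the Fisher information matrix $\mathbf{J}^d_{\theta,\theta}$ (5). The blocks $(1,1)$ and $(2,2)$ of the $\mathrm{CRB}^d$ in (5), relative to the estimation of the channel coefficients $\mathbf{h}_I$ and data symbols $\mathrm{vec}(\mathbf{X}_d)$, respectively, are expressed in terms of $\mathbf{P}_A = \mathbf{A}(\mathbf{A}^H \mathbf{A})^{-1} \mathbf{A}^H$ and $\mathbf{P}^\perp_A = \mathbf{I} - \mathbf{P}_A$, which denote the projection matrices on the column space of matrix $\mathbf{A}$ and its orthogonal complement, respectively. In the deterministic identifiability analysis that follows, we shall ignore $\mathbf{C}_0$ ($\mathbf{C}_0 = \mathbf{0}$) and hence $\mathbf{C}_{YY} = \mathbf{I}_M$, $\mathbf{C}_{yy} = \mathbf{I}_{ML}$.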

IV. IDENTIFIABILITY
In this section, we derive sets of both sufficient and necessary conditions for the identifiability of the vector parameter $\boldsymbol{\theta}$ under the assumption that $\boldsymbol{\theta}$ is a deterministic unknown parameter. Then, we propose an MP algorithm over a graph that determines the exact channel coefficients if the sufficient identifiability conditions are satisfied. Finally, we show that the system is identifiable via semi-blind algorithms if the Karp-Sipser algorithm applied to the same graph yields an empty core, paving the way to an analysis of asymptotically large networks based on core percolation properties.
$^1$ For the sake of compactness, we adopt an identical notation $f(\mathbf{Y}|\boldsymbol{\theta})$ to indicate the pdf of random variable (r.v.) $\mathbf{Y}$ in vector parameter $\boldsymbol{\theta}$ or conditioned on r.v. $\boldsymbol{\theta}$ when $\boldsymbol{\theta}$ is assumed to be a deterministic unknown vector parameter or a r.v., respectively.
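In the framework of deterministic identifiability, we assume that the vector parameter $\boldsymbol{\theta}$ is deterministic and consider channel $\mathbf{H}_0$ negligible. Then, the observation $\mathbf{y}$ is Gaussian distributed, i.e., $\mathbf{y} \sim \mathcal{CN}(\mathbf{m}_y(\boldsymbol{\theta}), \mathbf{I}_{ML})$, with covariance matrix independent of $\boldsymbol{\theta}$. The identifiability of $\boldsymbol{\theta}$ relies only on the mean $\mathbf{m}_y(\boldsymbol{\theta})$ and, for semi-blind methods, $\mathbf{X}_d$ and $\mathbf{h}_I$ are said to be identifiable [23] if
$$\mathbf{m}_y(\boldsymbol{\theta}) = \mathbf{m}_y(\boldsymbol{\theta}') \ \Rightarrow\ \boldsymbol{\theta} = \boldsymbol{\theta}'.$$
Let $\mathbf{m}_{Y\sim0}$ be the expectation of $\mathbf{Y}$ in (2) obtained assuming $\mathbf{H}_0$ negligible. The identifiability problem reduces to analyzing the following bi-linear system of equations in the unknowns $\mathbf{H}_I$ and $\mathbf{X}_d$
$$\mathbf{m}_{Y\sim0} = \sqrt{\rho}\,\mathbf{H}_I [\mathbf{X}_p \ \mathbf{X}_d]$$
and determining under which conditions this system admits a unique solution, which is assumed to exist. These identifiability conditions are summarized in the following proposition.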
PROPOSITION 1 Sufficient Identifiability Conditions - Let $\mathcal{S}_k$ denote the support of the channel of user $k$, i.e., the set of all the indices $m$ such that $H_{I,m,k} \neq 0$, and let $|\mathcal{S}_k|$ be its cardinality. In a semi-blind joint data detection and channel estimation method, the unknown parameters $\mathbf{h}_I$ and $\mathbf{X}_d$ are identifiable if (i) the $K \times L$ matrix $\mathbf{X}$, with $L \geq K$, has full row rank $K$, (ii) the channel of each user is sparse and $|\mathcal{S}_k| \leq M - K + 1$, and (iii) for each group of users $\mathcal{G}_p$ utilizing the same ortho-normal pilot sequence $\mathbf{x}_p$, it is possible to identify a sequence $\{\mathcal{G}_{p,1}, \mathcal{G}_{p,2}, \ldots, \mathcal{G}_{p,s}\}$ satisfying the following properties: 1) $\bigcup_{j=1}^{s} \mathcal{G}_{p,j} \equiv \mathcal{G}_p$, i.e., the sequence of subsets is a partition of $\mathcal{G}_p$; 2) in the support of the channel of each user $k \in \mathcal{G}_{p,i}$ there exists at least one index $j \in \mathcal{S}_k$ that is not contained in any of the channel supports of the other users in the same group $\mathcal{G}_{p,i}$ or in the following groups of the sequence $\mathcal{G}_{p,i+1}, \ldots, \mathcal{G}_{p,s}$.
REMARK 1 Condition iii-2 implies that the signal transmitted by each user $k$ in $\mathcal{G}_{p,i}$ impinges on an AP in the disc $\mathcal{M}_k$ centered around user $k$ with radius $\gamma$, and that no signal transmitted by other users in $\mathcal{G}_{p,i}$ or in the subsequent subsets $\mathcal{G}_{p,i+1}, \mathcal{G}_{p,i+2}, \ldots, \mathcal{G}_{p,s}$ impinges on the same AP.

REMARK 2 The assumption that $\mathbf{X}$ has full row rank $K$ implies that $\mathbf{X}_d$ has rank at least $K - P$.
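Proof: Observe that, since in CF MaMIMO systems $M \gg K$ and the channel matrix $\mathbf{H}_I$ consists of independent channels, we can assume that it has full column rank equal to $K$ with probability 1. Thanks to the assumptions of Proposition 1, matrix $\mathbf{X}$ also has full row rank equal to $K$, and so does matrix $\mathbf{m}_{Y\sim0}$ have rank $K$. Then, the singular value decomposition (SVD) of the noise-free system is given by
$$\frac{1}{\sqrt{\rho}}\,\mathbf{m}_{Y\sim0} = \mathbf{H}_I \mathbf{X} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^H \qquad (9)$$
where $\mathbf{U} \in \mathbb{C}^{M \times K}$ and $\mathbf{V} \in \mathbb{C}^{L \times K}$ are the matrices of the left and right singular vectors and $\boldsymbol{\Sigma}$ is the $K \times K$ diagonal matrix of singular values. Additionally, the left and right singular vector matrices $\mathbf{U}$ and $\mathbf{V}$ span the channel subspace of $\mathbf{H}_I$ and the signal subspace of $\mathbf{X}$, respectively. Then, the problem of identifiability reduces to determining a $K \times K$ non-singular matrix $\mathbf{T}$ such that $\mathbf{H}_I = \mathbf{U}\mathbf{T}$; matrix $\mathbf{X}$ is then unequivocally given by $\mathbf{X} = \mathbf{T}^{-1} \boldsymbol{\Sigma} \mathbf{V}^H$. In order to determine matrix $\mathbf{T}$, we utilize the following properties and information:
• The support of each user channel is known and sparse, and at least $K - 1$ channel coefficients are zero.
• The contaminated channel obtained from the pilot transmission.
More specifically, let us consider the linear system of equations corresponding to the transmission of the pilot sequences, i.e., $\mathbf{m}_{Y\sim0,p} = \sqrt{\rho}\,\mathbf{H}_I \mathbf{X}_p$, where $\mathbf{m}_{Y\sim0,p}$ denotes the block of $\mathbf{m}_{Y\sim0}$ corresponding to the pilots. By post-multiplying both sides of the system by the pilot sequence $\mathbf{x}^{(p)}_p$ and exploiting the ortho-normality of the training sequences, we obtain $\mathbf{X}_p \mathbf{x}^{(p)}_p = \mathbf{1}_{\mathcal{G}_p}$, where $\mathbf{1}_{\mathcal{G}_p}$ is the $K$-dimensional vector whose elements with indices in $\mathcal{G}_p$, i.e., indices corresponding to users transmitting pilot $\mathbf{x}^{(p)}$, are equal to one and whose other elements are zero. Then, it is apparent that this system of equations enables us to determine exactly, at each AP, the sum of all the non-zero channel coefficients of the users in each group $\mathcal{G}_p$, $p = 1, \ldots, P$, i.e., $\frac{1}{\sqrt{\rho}}\,\mathbf{m}_{Y\sim0,p}\,\mathbf{x}^{(p)}_p = \mathbf{H}_I \mathbf{1}_{\mathcal{G}_p}$.
Then, let us focus on a user $k$ in $\mathcal{G}_{p,1}$. Thanks to the assumptions on the partition of $\mathcal{G}_p$, there exists at least one AP $m$ such that $\mathbf{H}_{I,m,:}\mathbf{1}_{\mathcal{G}_p} = h_{mk} \neq 0$, where $\mathbf{H}_{I,m,:}$ denotes the $m$-th row of the matrix $\mathbf{H}_I$. Furthermore, thanks to the assumption on the sparsity of the channels, we can obtain from the system of equations $\mathbf{H}_{I,:,k} = \mathbf{U}\mathbf{T}_{:,k}$ at least $K - 1$ equations where the channel of user $k$ is zero. Then, we can construct a non-homogeneous system of equations in the unknown $\mathbf{T}_{:,k}$, with the vector of constant terms consisting of zeros and at least the non-zero element $h_{mk}$. This system can be unequivocally solved to determine $\mathbf{T}_{:,k}$. Thanks to the properties of the sequence $\mathcal{G}_{p,1}, \mathcal{G}_{p,2}, \ldots, \mathcal{G}_{p,s}$, it is possible to determine sequentially the columns of matrix $\mathbf{T}$ corresponding to a certain subset, compute exactly the corresponding channels of the users in that subset, and cancel them from the contaminated channel for group $\mathcal{G}_p$, until all the columns of matrix $\mathbf{T}$ corresponding to the users in $\mathcal{G}_p$ and the corresponding channels have been computed. This approach can be repeated for all the groups up to the complete computation of matrix $\mathbf{T}$ and channel $\mathbf{H}_I$. Then, we observe that $\mathbf{T}$ has full rank $K$ since $\mathbf{H}_I$ has full column rank. The inverse of $\mathbf{T}$ exists and enables the computation of $\mathbf{X}_d$. This concludes the proof.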
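Conditions (ii) and (iii) of Proposition 1 only involve the channel supports, so they can be checked with set operations. The sketch below greedily builds a candidate sequence $\mathcal{G}_{p,1}, \ldots, \mathcal{G}_{p,s}$ by repeatedly collecting the users that own a "private" AP among the users not yet assigned. The function names and the toy supports are illustrative assumptions, not part of the paper.

```python
def sparse_enough(support, num_aps, num_users):
    """Condition (ii): every support has at most M - K + 1 entries."""
    return all(len(s) <= num_aps - num_users + 1 for s in support.values())


def peel_pilot_group(support, group):
    """Greedy attempt to build the sequence required by condition (iii).

    support: dict user -> set of AP indices (channel support of the user)
    group:   users sharing one pilot
    Returns a list of user subsets, or None if the peeling gets stuck.
    """
    remaining = set(group)
    sequence = []
    while remaining:
        stage = set()
        for k in remaining:
            # APs reached by the other users that are still unassigned.
            others = set().union(*(support[j] for j in remaining if j != k))
            if support[k] - others:      # user k reaches an AP nobody else reaches
                stage.add(k)
        if not stage:
            return None
        sequence.append(stage)
        remaining -= stage
    return sequence


# Toy supports (hypothetical): users 0, 1 and 2 share one pilot.
example_support = {0: {0, 1}, 1: {1, 2}, 2: {1}}
sequence = peel_pilot_group(example_support, [0, 1, 2])
print(sequence)
```

In this toy case users 0 and 1 each reach an AP of their own and form the first subset; user 2 becomes resolvable only after they are removed.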
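In the following, let $(\mathbf{H})_{\mathcal{G}_p}$ denote a reduced version of the matrix $\mathbf{H}$ containing only the columns corresponding to the users in $\mathcal{G}_p$.
PROPOSITION 2 Necessary Identifiability Conditions - Identification of $\mathbf{h}_I$ and $\mathbf{X}_d$ from the product $\mathbf{H}_I \mathbf{X}$ leads to the global necessary identifiability condition
$$\sum_{k=1}^{K} |\mathcal{S}_k| + K^2 \leq K(M + P) \qquad (10)$$
or the per pilot necessary identifiability condition (11).
Proof: Consider again the SVD in (9). Introducing again the unknown $K \times K$ mixing matrix $\mathbf{T}$, this leads to the equations
$$\mathbf{H}_I = \mathbf{U}\mathbf{T}, \qquad \mathbf{T}\mathbf{X}_p = \boldsymbol{\Sigma}\mathbf{V}_p^H \qquad (12)$$
which together represent $K(M + P)$ equations in the $\sum_{k=1}^{K} |\mathcal{S}_k|$ unknowns $\mathbf{h}_I$ and the $K^2$ unknowns $\mathbf{T}$. The condition for solvability of the equations (12), namely that the number of equations needs to be at least equal to the number of unknowns, then leads to (10). If we now consider the equations for the group of users $\mathcal{G}_p$, multiplying $\mathbf{T}\mathbf{X}_p = \boldsymbol{\Sigma}\mathbf{V}_p^H$ by $\mathbf{x}^{(p)}_p$ and exploiting $\mathbf{X}_p \mathbf{x}^{(p)}_p = \mathbf{1}_{\mathcal{G}_p}$, we get
$$\mathbf{T}\,\mathbf{1}_{\mathcal{G}_p} = \boldsymbol{\Sigma}\mathbf{V}_p^H \mathbf{x}^{(p)}_p$$
which represents a set of equations in the unknowns associated with the users in $\mathcal{G}_p$, including $(\mathbf{T})_{\mathcal{G}_p}$, hence leading to (11).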
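The counting argument in the proof of Proposition 2 is easy to evaluate numerically: $K(M+P)$ equations must be at least as many as the $\sum_k |\mathcal{S}_k| + K^2$ unknowns. The helper below is a minimal sketch of that check; its name and the sample numbers are assumptions made for illustration.

```python
def global_necessary_condition(support_sizes, num_aps, num_pilots):
    """Equation/unknown count from the proof of Proposition 2.

    support_sizes: list with the support cardinality of each user (length K).
    Returns True when K(M + P) >= sum of support sizes + K^2.
    """
    num_users = len(support_sizes)
    equations = num_users * (num_aps + num_pilots)
    unknowns = sum(support_sizes) + num_users ** 2
    return equations >= unknowns


# Hypothetical example with K = 20 users, M = 100 APs, P = 5 pilots.
sparse_case = global_necessary_condition([10] * 20, num_aps=100, num_pilots=5)
dense_case = global_necessary_condition([100] * 20, num_aps=100, num_pilots=5)
print(sparse_case, dense_case)
```

With fully dense supports the unknowns (2400) exceed the equations (2100), so the necessary condition fails, whereas the sparse case (600 unknowns) passes.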
It is worth noting that the proof of Proposition 1, along with the sufficient conditions for identifiability of the deterministic parameters, also provides a constructive method to determine the unknown parameters $\mathbf{H}_I$ and $\mathbf{X}_d$, provided that, for each $p = 1, \ldots, P$, the sequence $\{\mathcal{G}_{p,1}, \mathcal{G}_{p,2}, \ldots, \mathcal{G}_{p,s}\}$ partitioning the set $\mathcal{G}_p$ is known. In the following, we address this problem and provide an MP algorithm that identifies at iteration $i$ the set $\mathcal{G}_{p,i}$ and determines the channel coefficients of all users in the set.
Let us focus on the set $\mathcal{G}_p$ and associate to each user $k$ and AP $m$ the variable node $k$ and the factor node $m$, respectively. We construct a bipartite graph by connecting a variable node with a factor node if the distance between the corresponding user and AP is lower than $\gamma$. We further assume that the factor nodes are initialized with the values of the vector $\mathbf{h}^c_{I,p} = \mathbf{H}_I \mathbf{1}_{\mathcal{G}_p}$, i.e., the sum of all the channel coefficients of users in the corresponding $\gamma$-neighborhood. Each variable node knows the matrix $\mathbf{U}$ that spans the channel subspace.
The initial step of the MP algorithm starts at the factor nodes. Each factor node $m$ that is a leaf transmits its initialization value $h^c_{I,p,m}$ to its neighbor. It transmits an erasure $\Delta$ if it is not a leaf. At iteration $i$, each variable node $k$ that has received at least one message that is not an erasure solves the system of equations $\mathbf{U}\mathbf{T}_{:,k} = \mathbf{H}_{I,:,k}$ utilizing that value. The construction of a system of $K$ equations to determine $\mathbf{T}_{:,k}$ is detailed in the proof of Proposition 1 and exploits the channel sparsity. Once $\mathbf{T}_{:,k}$ is known, it is possible to determine all the non-zero channel coefficients $\mathbf{H}_{I,:,k}$. Then, variable node $k$ transmits the corresponding channel coefficients to all its neighbors. Variable node $k$ transmits the same messages in all the following iterations. If variable node $k$ receives only erasures, it transmits erasures to all its neighbors.
The second step of iteration $i$ determines the messages at the factor nodes. A factor node $m$ computes a message for the output edge $\langle m, k \rangle$ as the difference between its initialization value $h^c_{I,p,m}$ and the sum of the messages incoming on its other edges. The resulting message is not an erasure if none of these incoming messages is an erasure; otherwise, the factor node transmits an erasure. The MP algorithm ends either when all the channel coefficients have been determined, in which case the identifiability conditions are satisfied, or when no additional erasure can be resolved, in which case the system cannot be identified by this procedure. Set $\mathcal{G}_{p,i}$ includes all the users/variable nodes that compute their channel coefficients at iteration $i$.
Interestingly, this algorithm is closely related to the MP algorithm for decoding low density parity check (LDPC) codes in transmissions through binary erasure channels in [28]. It is worth noting that, even for randomly generated CF MaMIMO systems with independently generated nodes, the corresponding graphs have edges that are intrinsically correlated due to the underlying geometric constraints, and the corresponding sparse graphs do not have tree-like neighborhoods in asymptotic conditions. Then, the performance analysis of LDPC codes based on density evolution, see [28], is not directly applicable, although the graph is sparse and the message passing yields exact results thanks to the noiseless nature of the considered system and thus the absence of error propagation.
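The message schedule described above can be sketched on a toy graph. The factor nodes start from the contaminated sums and release a value only when a single neighbor is still unknown. One strong simplification is made and should be read as an assumption: the step in which a variable node solves $\mathbf{U}\mathbf{T}_{:,k} = \mathbf{H}_{I,:,k}$ is replaced by a lookup of the true column (an oracle), since the sketch is only meant to show the peeling order and the subtraction at the factor nodes. The function name and the toy coefficients are hypothetical.

```python
def message_passing(channel, group):
    """Erasure-style message passing over the AP/user graph of one pilot group.

    channel: dict (ap, user) -> non-zero complex coefficient of H_I
    group:   users sharing one pilot
    Returns (stages, recovered, success).
    """
    group = set(group)
    aps_of = {k: {m for (m, j) in channel if j == k} for k in group}
    users_of = {}
    for (m, k) in channel:
        if k in group:
            users_of.setdefault(m, set()).add(k)
    # Factor nodes are initialized with the contaminated sums.
    contaminated = {m: sum(channel[(m, k)] for k in ks) for m, ks in users_of.items()}

    recovered, resolved, stages = {}, set(), []
    while len(resolved) < len(group):
        # Factor step: a non-erasure goes out only if one neighbor is unknown.
        leaf_value = {}
        for m, ks in users_of.items():
            unknown = ks - resolved
            if len(unknown) == 1:
                (k,) = unknown
                known = sum(recovered[(m, j)] for j in ks & resolved)
                leaf_value[k] = (m, contaminated[m] - known)
        if not leaf_value:
            return stages, recovered, False      # only erasures remain
        # Variable step: one clean coefficient unlocks the whole column.
        for k, (m, value) in leaf_value.items():
            for ap in aps_of[k]:
                # ASSUMPTION: oracle lookup stands in for solving U T[:, k] = H_I[:, k].
                recovered[(ap, k)] = value if ap == m else channel[(ap, k)]
            resolved.add(k)
        stages.append(set(leaf_value))
    return stages, recovered, True


# Toy coefficients (hypothetical): AP 1 hears all three users.
toy_channel = {(0, 0): 1 + 1j, (1, 0): 0.5j, (1, 1): 2, (2, 1): -1j, (1, 2): 0.3}
stages, recovered, success = message_passing(toy_channel, [0, 1, 2])
print(stages, success)
```

Users 0 and 1 are resolved in the first iteration through their leaf APs; user 2 is resolved in the second iteration once their contributions are subtracted at AP 1.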
Additionally, let us consider the Karp-Sipser or greedy leaf removal procedure [25]-[27], which consists in sequentially removing all the leaves from a graph, and observe that sequential and simultaneous removal of leaves are equivalent in asymptotic conditions. Then, the sufficient identifiability conditions in Proposition 1 are satisfied if the greedy leaf removal procedure yields an empty core.
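A greedy leaf removal pass can be sketched as follows. The variant shown is the common textbook form, in which a degree-one vertex is removed together with its neighbor; treating this as the procedure of [25]-[27] is an assumption, and the vertex labels are illustrative.

```python
def leaf_removal_core(edges):
    """Greedy leaf removal on an undirected graph.

    Repeatedly picks a vertex of degree one and deletes it together with its
    neighbor (and all edges of that neighbor). Returns the set of vertices
    that still have at least one edge when no leaf is left (the core).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = [v for v, nb in adj.items() if len(nb) == 1]
    while leaves:
        leaf = leaves.pop()
        if leaf not in adj or len(adj[leaf]) != 1:
            continue                      # stale entry: vertex changed or vanished
        (nbr,) = adj[leaf]
        for w in adj.pop(nbr):            # delete the neighbor and its edges
            if w in adj:
                adj[w].discard(nbr)
                if len(adj[w]) == 1:
                    leaves.append(w)      # w has just become a leaf
        adj.pop(leaf, None)
    return {v for v, nb in adj.items() if nb}


# Toy AP/user graphs (hypothetical labels).
tree_edges = [(("user", 0), ("ap", 0)), (("user", 0), ("ap", 1)),
              (("user", 1), ("ap", 1)), (("user", 1), ("ap", 2)),
              (("user", 2), ("ap", 1))]
cycle_edges = [(("ap", 0), ("user", 0)), (("user", 0), ("ap", 1)),
               (("ap", 1), ("user", 1)), (("user", 1), ("ap", 0))]
print(leaf_removal_core(tree_edges), len(leaf_removal_core(cycle_edges)))
```

The tree-like example is peeled completely, whereas the four-cycle has no leaves and survives as a non-empty core.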

V. BAYESIAN SEMI-BLIND
Whereas deterministic parameter identifiability allows for consistency in SNR in the approximated model which ignores $\mathbf{C}_0$, in practice performance can be improved by furthermore exploiting prior information. Hence, exploiting the Rayleigh fading channel prior and capturing the uncorrelatedness and constant variance of the data symbols with an i.i.d. Gaussian prior, we get the overall log-likelihood $\ln f(\mathbf{Y}|\boldsymbol{\theta}) + \ln f(\mathbf{h}_I) + \ln f(\mathbf{X}_d)$, expressed up to a scalar constant $c_t$. Alternating optimization with respect to $\mathbf{h}_I$ and $\mathbf{X}_d$ leads to update equations in which $\mathbf{Q} = \mathbf{Q}(\hat{\mathbf{X}}_d)$ and $\hat{\mathbf{H}}_I$ denotes the estimate of the matrix $\mathbf{H}_I$. The relation between $\hat{\mathbf{h}}_I$ and $\hat{\mathbf{H}}_I$ is the same as the relation described for $\mathbf{h}_I$ and $\mathbf{H}_I$. This alternating procedure can be initialized with $\hat{\mathbf{X}}_d = \mathbf{0}$.

VI. NUMERICAL RESULTS
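First, we describe the path loss and shadow fading models used in numerical simulations for performance evaluation. The large-scale fading coefficient $\beta_{mk}$ in (1) models path loss and shadow fading as follows
$$\beta_{mk} = 10^{\frac{\mathrm{PL}_{mk}}{10}}\, 10^{\frac{\sigma_{sh} z_{mk}}{10}}$$
where $\mathrm{PL}_{mk}$ represents the path loss (expressed in dB), and $10^{\frac{\sigma_{sh} z_{mk}}{10}}$ represents the shadow fading with standard deviation $\sigma_{sh}$ and $z_{mk} \sim \mathcal{N}(0, 1)$. The three-slope model in [29] is adopted for the path loss. The uplink transmit power is $p = 100$ mW for all users. The performance of the Bayesian estimation is assessed by the normalized mean squared error (NMSE), $\mathrm{avg}\,\|\hat{\mathbf{h}}_I - \mathbf{h}_I\|^2 / \mathrm{avg}\,\|\mathbf{h}_I\|^2$, where avg stands for average. Fig. 1 shows the NMSE versus the disc radius $\gamma$ and compares the NMSE of the Bayesian estimation with the deterministic CRB, $\mathrm{avg}\,\mathrm{tr}\{\mathrm{CRB}^d_{h_I}\} / \mathrm{avg}\,\|\mathbf{h}_I\|^2$, with $M = 100$ and $K = 20$. The Bayesian estimation outperforms the deterministic CRB.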
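The two quantities used in this section, the large-scale fading coefficient and the NMSE, can be written as small helpers. This is only a sketch: the three-slope path loss of [29] is not reproduced, so the path loss is passed in directly in dB, and the numerical values in the example (including the shadowing standard deviation) are assumptions.

```python
def large_scale_fading(path_loss_db, sigma_sh_db, z):
    """beta = 10^(PL/10) * 10^(sigma_sh * z / 10), with PL in dB and z ~ N(0, 1)."""
    return 10 ** (path_loss_db / 10) * 10 ** (sigma_sh_db * z / 10)


def nmse(estimates, truths):
    """avg ||h_hat - h||^2 / avg ||h||^2 over a list of realizations.

    estimates, truths: lists of realizations, each a list of complex values.
    Both averages run over the same realizations, so their ratio equals the
    ratio of the corresponding sums.
    """
    err = sum(abs(e - t) ** 2 for est, tru in zip(estimates, truths)
              for e, t in zip(est, tru))
    ref = sum(abs(t) ** 2 for tru in truths for t in tru)
    return err / ref


# Hypothetical values: -100 dB path loss, 8 dB shadowing std, z = 0.
beta = large_scale_fading(-100.0, 8.0, 0.0)
toy_nmse = nmse([[1 + 0j, 0j]], [[1 + 0j, 1j]])
print(beta, toy_nmse)
```

In the toy NMSE example one of the two unit-energy coefficients is missed entirely, so half of the channel energy is in error.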

VII. CONCLUSION
In this paper, we tackled the problem of pilot contamination in CF MaMIMO systems leveraging only the channel sparsity. We considered semi-blind methods for joint channel estimation and data detection and derived the FIM and the CRB. Additionally, we determined sufficient conditions and necessary conditions for semi-blind identifiability under the assumption of deterministic parameters. An MP algorithm to verify identifiability and compute the channel coefficients was proposed, and the relation with the Karp-Sipser procedure was highlighted. Finally, we proposed a Bayesian semi-blind approach resulting in an algorithm which alternates between channel estimation and linear multi-user detection. We verified the analytical derivations via numerical simulations.
In (2), $\rho$ denotes the transmit power at each user terminal normalized by the noise variance, $\mathbf{Y} \in \mathbb{C}^{M \times L}$ is the matrix of the $L$ received symbols at the $M$ APs, and $\mathbf{X} \in \mathbb{C}^{K \times L}$ is the matrix of the transmitted symbols. Note that the $k$-th row of $\mathbf{X}$ corresponds to the signals transmitted by user $k$. The matrix $\mathbf{W} \in \mathbb{C}^{M \times L}$ is the additive white Gaussian noise (AWGN) with i.i.d. components having zero mean and unit variance. Let $\mathbf{X}_p \in \mathbb{C}^{K \times P}$ and $\mathbf{X}_d \in \mathbb{C}^{K \times (L-P)}$ denote the pilot sequences and data symbols, respectively. Then, $\mathbf{X} = [\mathbf{X}_p \ \mathbf{X}_d]$. Similarly, $\mathbf{Y} = [\mathbf{Y}_p \ \mathbf{Y}_d]$, where $\mathbf{Y}_p \in \mathbb{C}^{M \times P}$ and $\mathbf{Y}_d \in \mathbb{C}^{M \times (L-P)}$ represent the matrices of received training and data signals, respectively.
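The role of the pilot block can be made concrete with a toy, noise-free example: correlating $\mathbf{Y}_p$ with one ortho-normal pilot returns, at each AP, the sum of the channels of all users that share that pilot. The sketch assumes the noise-free form $\mathbf{Y}_p = \sqrt{\rho}\,\mathbf{H}\mathbf{X}_p$, consistent with the mean $\sqrt{\rho}\,\mathrm{vec}(\mathbf{H}_I\mathbf{X})$ used in Section III, and uses unit vectors as pilots; all numerical values and names are illustrative.

```python
def matmul(a, b):
    """Plain matrix product for small lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Toy sizes (hypothetical): M = 3 APs, K = 3 users, P = 2 pilots.
H = [[1 + 1j, 0, 0.5],
     [0, 2j, 0],
     [0.3, 0, -1]]
pilot_of_user = [0, 1, 0]          # users 0 and 2 reuse pilot 0
num_pilots = 2
rho = 4.0

# Row k of X_p is the pilot of user k; unit vectors are trivially ortho-normal.
X_p = [[1.0 if p == pilot_of_user[k] else 0.0 for p in range(num_pilots)]
       for k in range(len(pilot_of_user))]

# Noise-free received pilot block (assumption: W = 0).
Y_p = [[rho ** 0.5 * v for v in row] for row in matmul(H, X_p)]

# Correlate with pilot 0 (as a column vector) and undo the power scaling.
x0 = [[1.0], [0.0]]
correlated = [row[0] / rho ** 0.5 for row in matmul(Y_p, x0)]

# Expected result: sum of the channels of the users that share pilot 0.
contaminated = [H[m][0] + H[m][2] for m in range(3)]
print(correlated, contaminated)
```

The correlator output at AP 0 mixes the coefficients of users 0 and 2, which is exactly the contamination that the semi-blind approach sets out to resolve.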

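Pilot ortho-normality used in the proof of Proposition 1: $\mathbf{X}_p \mathbf{x}^{(p)}_p = \mathbf{1}_{\mathcal{G}_p}$, where $\mathbf{1}_{\mathcal{G}_p}$ is the $K$-dimensional vector whose elements with indices in $\mathcal{G}_p$, i.e., indices corresponding to users transmitting pilot $\mathbf{x}^{(p)}$, are equal to one and whose other elements are zero.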

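Large-scale fading model of Section VI: $\beta_{mk} = 10^{\frac{\mathrm{PL}_{mk}}{10}}\, 10^{\frac{\sigma_{sh} z_{mk}}{10}}$.
Fig. 1. NMSE versus disc radius $\gamma$: Bayesian estimation compared with the deterministic CRB, $\mathrm{avg}\,\mathrm{tr}\{\mathrm{CRB}^d_{h_I}\} / \mathrm{avg}\,\|\mathbf{h}_I\|^2$.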