On the Multiplexing Gain of Discrete-Time MIMO Phase Noise Channels

The capacity of a point-to-point discrete-time multiple-input multiple-output (MIMO) channel with phase uncertainty (the MIMO phase noise channel) is still open. In fact, even the pre-log (multiplexing gain) of the capacity in the high signal-to-noise ratio (SNR) regime is unknown in general. We make some progress in this direction for two classes of such channels. With phase noise on the individual paths of the channel (model A), we show that the multiplexing gain is $\frac{1}{2}$, which implies that the capacity does not scale with the channel dimension at high SNR. With phase noise at both the input and output of the channel (model B), the multiplexing gain is upper-bounded by $\frac{1}{2}\min\{n_t, (n_r-2)^+ + 1\}$ and lower-bounded by $\frac{1}{2}\min\{n_t, \lfloor\frac{n_r+1}{2}\rfloor\}$, where $n_t$ and $n_r$ are the numbers of transmit and receive antennas, respectively. The multiplexing gain is enhanced to $\frac{1}{2}\min\{n_t, n_r\}$ without receive phase noise, and to $\frac{1}{2}\min\{2n_t-1, n_r\}$ without transmit phase noise. In all cases of model B, the multiplexing gain scales linearly with $\min\{n_t, n_r\}$.
Our main results rely on the derivation of non-trivial upper and lower bounds on the capacity of such channels.


I. INTRODUCTION
The capacity of a point-to-point multiple-input multiple-output (MIMO) Gaussian channel is well known in the coherent case, i.e., when the channel state information is available at the receiver [1], [2]. The capacity of the noncoherent MIMO channel, however, is still open in general. Nevertheless, asymptotic results for such channels, e.g., at high signal-to-noise ratio (SNR), have been obtained in some important cases.
In the seminal paper [3], Lapidoth and Moser proposed a powerful technique, called the duality approach, that can be applied to a large class of fading channels, and derived the exact high-SNR capacity up to an $o(1)$ term. In particular, when the differential entropy of the channel matrix is finite, i.e., $h(\mathbf{H}) > -\infty$, it was shown in [3] that the pre-log (a.k.a. multiplexing gain) of the capacity is 0 and the high-SNR capacity is $\log\log \mathrm{SNR} + \chi(\mathbf{H}) + o(1)$, where $\chi(\mathbf{H})$ is the so-called fading number of the channel. In addition, capacity upper and lower bounds for the MIMO Rayleigh and Ricean channels were obtained and shown to be tight in both the low- and high-SNR regimes. In [4], Zheng and Tse showed that, for the noncoherent block-fading channel with coherence time $T$, the pre-log is $M^*(1 - M^*/T)$, where $M^* \triangleq \min\{n_t, n_r, \lfloor T/2 \rfloor\}$, with $n_t$ and $n_r$ being the number of transmit and receive antennas, respectively. In this work, we are interested in MIMO phase noise channels in which the phases of the channel coefficients are not perfectly known.

(The work of S. Shamai was supported by the Israel Science Foundation (ISF). The material in this paper was presented in part at the 2016 IEEE Information Theory Workshop.)
Applying the duality approach and the "escape-to-infinity" property of the channel input, Lapidoth characterized the high-SNR capacity of the discrete-time phase noise channel in the single-antenna case [5]. It was shown in [6] that the capacity-achieving input distribution is in fact discrete. Recently, capacity upper and lower bounds for single-antenna channels with Wiener phase noise have been extensively studied in the context of optical fiber and microwave communications (see [7], [8], [9] and the references therein). In these works, the upper bounds are derived via duality and the lower bounds are computed numerically using the auxiliary channel technique proposed in [10]. In particular, in [9], Durisi et al. investigated the MIMO phase noise channel with a common phase noise, a scenario motivated by microwave links with centralized oscillators. The SIMO and MISO channels with common and separate phase noises are considered in [11]. The $2 \times 2$ MIMO phase noise channel with independent transmit and receive phase noises at each antenna was studied in [12], where the authors showed that the multiplexing gain is $\frac{1}{2}$ for a specific class of input distributions. For general MIMO channels with separate phase noises, estimation and detection algorithms have been proposed in [13], [14]. However, for such channels, even the multiplexing gain is unknown, to the best of our knowledge.
In this work, we make some progress in this direction. We consider two classes of discrete-time stationary and ergodic MIMO phase noise channels: model A with individual phase noises on the entries of the channel matrix, and model B with individual phase noises at the input and the output of the channel instead. The phase noise processes in both models are assumed to have finite differential entropy rate. For model A, we obtain the exact multiplexing gain $\frac{1}{2}$ for any channel dimension, which implies that the capacity does not scale with the channel dimension at high SNR. For model B with both transmit and receive phase noises, we show that the multiplexing gain is upper-bounded by $\frac{1}{2}\min\{n_t, (n_r-2)^+ + 1\}$ and lower-bounded by $\frac{1}{2}\min\{n_t, \lfloor\frac{n_r+1}{2}\rfloor\}$. The upper and lower bounds coincide for $n_r \le 3$ or $n_r \ge 2n_t - 1$. Further, when receive phase noise is absent, the multiplexing gain improves and we obtain the exact value $\frac{1}{2}\min\{n_t, n_r\}$. If the transmit phase noise is absent instead, the multiplexing gain becomes $\frac{1}{2}\min\{2n_t - 1, n_r\}$.
The main technical contribution of this paper is two-fold. First, we derive a non-trivial upper bound on the capacity of the MIMO phase noise channel with separate phase noises. The novelty of the upper bound lies in finding suitable auxiliary distributions with which we apply the duality upper bound [15], [16], [3]. It is worth mentioning that the class of single-variate Gamma output distributions, the essential ingredient behind the tight capacity upper bounds on previously studied channels, is not suitable for MIMO phase noise channels in general. In this paper, we introduce a class of multivariate Gamma distributions that, combined with the duality upper bound, allows us to obtain a complete pre-log characterization for model A and a partial one for model B. The second contribution is the derivation of capacity lower bounds for model B, based on a remarkable property of the differential entropy of the output vector in this channel. Namely, we prove that, at high SNR, the said entropy scales as $\frac{n_r}{2}\log \mathrm{SNR}$ as long as $n_r \le 2n_t - 1$; its pre-log can thus exceed half the rank of the channel matrix, $\frac{1}{2}\min\{n_t, n_r\}$. The upper and lower bounds suggest that, with $n_r \ge 2n_t - 1$ receive antennas, $n_t$ transmitted real symbols can be recovered at high SNR. This result has an interesting interpretation based on dimension counting. Let us consider the example of independent and memoryless transmit and receive phase noises uniformly distributed in $[0, 2\pi)$. In this case, the phases of the input and the output do not contain any useful information; only the amplitudes matter. Note that the $n_r$ output amplitudes are (nonlinear) equations in $2n_t - 1$ unknowns, namely, the $n_t$ input amplitudes and the $n_t - 1$ relative input phases, assuming the additive noises are negligible at high SNR. It is now not too hard to believe that with $n_r = 2n_t - 1$ equations, the receiver can successfully decode the $n_t$ input amplitudes by solving the equations.
This is, however, not possible with $n_r < 2n_t - 1$, in which case there are too many unknowns compared to the number of equations. Nonetheless, we can reduce the number of active transmit antennas to $n_t' < n_t$ such that $2n_t' - 1 \le n_r$, which means that the achievable multiplexing gain is $\frac{1}{2}\lfloor\frac{n_r+1}{2}\rfloor$. A formal proof in Section VI validates this argument.
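The dimension-counting argument above can be illustrated numerically, outside of the paper's formal development. The sketch below takes $n_t = 2$, $n_r = 2n_t - 1 = 3$, an arbitrary generic matrix $H$, and noise-free output amplitudes, then recovers the $2n_t - 1 = 3$ real unknowns (two input amplitudes and one relative phase) by a brute-force grid search; the grid resolution and the specific $H$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))

# True input, parameterized by amplitudes (r1, r2) and relative phase delta;
# the global input phase carries no information and is factored out.
r1_true, r2_true, delta_true = 1.0, 0.7, 2.0
x_true = np.array([r1_true, r2_true * np.exp(1j * delta_true)])
y_amp = np.abs(H @ x_true)  # the 3 observed output amplitudes (noise-free)

# Brute-force grid over the 3 real unknowns, vectorized via broadcasting:
# amps[k, i, j, l] = |h_{k1} r_i + h_{k2} r_j e^{j d_l}|
r_grid = np.linspace(0.05, 1.5, 59)
d_grid = np.linspace(0.0, 2 * np.pi, 126, endpoint=False)
term1 = H[:, 0][:, None] * r_grid[None, :]                                    # (3, R)
term2 = H[:, 1][:, None, None] * (r_grid[:, None] * np.exp(1j * d_grid)[None, :])[None]  # (3, R, D)
amps = np.abs(term1[:, :, None, None] + term2[:, None, :, :])                 # (3, R, R, D)
err = ((amps - y_amp[:, None, None, None]) ** 2).sum(axis=0)
i1, i2, i3 = np.unravel_index(np.argmin(err), err.shape)
r1_hat, r2_hat, delta_hat = r_grid[i1], r_grid[i2], d_grid[i3]
x_hat = np.array([r1_hat, r2_hat * np.exp(1j * delta_hat)])
```

The search returns a parameter triple whose amplitude equations match the observations, consistent with the claim that 3 equations suffice for 3 unknowns.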
The remainder of the paper is organized as follows. The system model and main results are presented in Section II. Some preliminaries useful for the proof of the main results are provided in Section III. The upper bounds are derived in Section IV and Section V. We prove the lower bound for model B in Section VI. Concluding remarks are given in Section VII. Most of the proofs are presented in the main body of the paper, with some details deferred to the Appendix.

Notation
Throughout the paper, we use the following notational conventions. For random quantities, we use upper case letters, e.g., $X$, for scalars, upper case bold non-italic letters, e.g., $\mathbf{V}$, for vectors, and upper case bold sans serif letters, e.g., $\mathbf{M}$, for matrices. Deterministic quantities are denoted in a rather conventional way with italic letters, e.g., a scalar $x$. $x_{n+1}^{n+k}$ is a $k$-tuple or a column vector $(x_{n+1}, \ldots, x_{n+k})$; for brevity, $x^k$ sometimes replaces $x_1^k$. For convenience, wherever confusion is improbable, elementary scalar functions applied to a vector, e.g., $|\mathbf{x}|$ or $\cos(\boldsymbol{\theta})$, stand for a point-wise map on each element of the vector, and return a vector with the same dimension as the argument. We use $(\theta)_{2\pi}$ to denote $(\theta \bmod 2\pi)$, and $(x)^+ = \max\{x, 0\}$. $\Gamma(x)$ is the gamma function. We also use $c_0$ to represent a bounded constant whose value is irrelevant but may change at each occurrence. Similarly, $c_H$ is a constant that may depend on $\mathbf{H}$, whose value is irrelevant and bounded for almost all $\mathbf{H}$.

A. Channel model
In this paper, we are interested in a class of discrete-time MIMO phase noise channels with $n_t$ transmit antennas and $n_r$ receive antennas, defined by
$$\mathbf{Y}_t = \tilde{\mathbf{H}}_t \, \mathbf{x}_t + \mathbf{Z}_t, \qquad \tilde{\mathbf{H}}_t \triangleq \left[ h_{ik}\, e^{j\Theta_{ik,t}} \right]_{i,k},$$
where the deterministic channel matrix $\mathbf{H}$ belongs to a set $\mathcal{H} \subset \mathbb{C}^{n_r \times n_t}$ of generic matrices; $\mathbf{x}_t \in \mathbb{C}^{n_t \times 1}$ is the input vector at time $t$, with the average power constraint $\frac{1}{N}\sum_{t=1}^{N} \|\mathbf{x}_t\|^2 \le P$; the additive noise process $\{\mathbf{Z}_t\}$ is assumed to be spatially and temporally white with $\mathbf{Z}_t \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{n_r})$; $\boldsymbol{\Theta}_t$ is the matrix of phase noises on the individual entries of $\mathbf{H}$ at time $t$; the phase noise process $\{\boldsymbol{\Theta}_t\}$ is stationary and ergodic, and is independent of the additive noise process $\{\mathbf{Z}_t\}$. Both $\{\mathbf{Z}_t\}$ and $\{\boldsymbol{\Theta}_t\}$ are unknown to the transmitter and the receiver. Since the additive noise power is normalized, the transmit power $P$ is identified with the SNR throughout the paper. The end-to-end channel is captured by the random channel matrix $\tilde{\mathbf{H}}$. In this paper, we consider two types of discrete-time phase noise processes according to their spatial structures, as shown in Fig. 1:
• Model A refers to channels with phase uncertainty on the individual paths (path phase noise), such that the sequence $\{\boldsymbol{\Theta}_t\}$ has finite differential entropy rate. It corresponds to the case where the phase information of the channel cannot be obtained accurately, e.g., in optical fiber communications. This model covers the channel with spatially independent phase noises as a special case.
• Model B refers to channels with phase noises at the input and/or output, i.e., $\Theta_{ik} = \Theta_{R,i} + \Theta_{T,k}$. The vector $\boldsymbol{\Theta}_T \triangleq [\Theta_{T,k}]_{k=1}^{n_t}$ contains the $n_t$ phase noises at the transmit antennas, and $\boldsymbol{\Theta}_R \triangleq [\Theta_{R,i}]_{i=1}^{n_r}$ is the vector of the $n_r$ phase noises at the receive antennas. This model captures the phase corruption at both the transmit and receive RF chains, e.g., caused by imperfect oscillators.
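The channel model above can be simulated directly. The sketch below instantiates model A, $\mathbf{Y}_t = (\mathbf{H} \circ e^{j\boldsymbol{\Theta}_t})\mathbf{x}_t + \mathbf{Z}_t$, with i.i.d. uniform path phase noise as a simple stand-in for a stationary ergodic process; the dimensions, input distribution, and phase noise law are illustrative assumptions, not specified by the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_t, n_r, N, P = 2, 3, 1000, 10.0

# Deterministic generic channel matrix H
H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))

# i.i.d. Gaussian inputs satisfying E||x_t||^2 = P on average
X = np.sqrt(P / (2 * n_t)) * (rng.standard_normal((N, n_t)) + 1j * rng.standard_normal((N, n_t)))

# Path phase noise Theta_t on each entry of H (model A), and CN(0, I) noise
Theta = rng.uniform(0, 2 * np.pi, size=(N, n_r, n_t))
Z = (rng.standard_normal((N, n_r)) + 1j * rng.standard_normal((N, n_r))) / np.sqrt(2)

# Apply the faded channel H̃_t = H ∘ e^{jΘ_t} at each time t
Y = np.einsum('trk,tk->tr', H[None] * np.exp(1j * Theta), X) + Z
```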
We consider three cases of model B:
B1) with both transmit and receive phase noises, i.e., $\Theta_{ik} = \Theta_{R,i} + \Theta_{T,k}$;
B2) with transmit phase noise only, i.e., $\boldsymbol{\Theta}_R = \mathbf{0}$;
B3) with receive phase noise only, i.e., $\boldsymbol{\Theta}_T = \mathbf{0}$.
Note that model B1 covers the case where both the transmitter and the receiver use separate (and imperfect) oscillators for different antennas, whereas models B2 and B3 contain the case with centralized oscillators at one side and separate oscillators at the other side.
The capacity of such a stationary and ergodic channel is [3], [18]
$$C(P) = \lim_{N\to\infty} \frac{1}{N} \sup I(\mathbf{X}^N; \mathbf{Y}^N),$$
where the supremum is taken over all input distributions satisfying the average power constraint. Our work focuses on the multiplexing gain $r$ of such a channel, defined as the pre-log of the capacity $C(P)$ as $P \to \infty$:
$$r \triangleq \lim_{P\to\infty} \frac{C(P)}{\log P}.$$
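As a quick numerical illustration of the pre-log definition (using the well-known coherent MIMO capacity formula $\log\det(\mathbf{I} + \frac{P}{n_t}\mathbf{H}\mathbf{H}^H)$ from [1], [2], not one of the phase noise models studied here), $r$ can be estimated as the slope of $C(P)$ against $\log P$ at large $P$, where it approaches $\min\{n_t, n_r\}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n_t, n_r = 2, 3
H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))

def coherent_capacity(P):
    # log det(I + (P/n_t) H H^H), in nats
    G = np.eye(n_r) + (P / n_t) * H @ H.conj().T
    _, logdet = np.linalg.slogdet(G)
    return logdet

# Pre-log estimated as a finite-difference slope at two large SNR values
P1, P2 = 1e8, 1e12
prelog = (coherent_capacity(P2) - coherent_capacity(P1)) / (np.log(P2) - np.log(P1))
```

The additive $O(1)$ terms in $C(P)$ cancel in the slope, which is why the two-point estimate converges much faster than the ratio $C(P)/\log P$.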

B. Main results
The main results of this work are summarized as follows and illustrated in Fig. 2. First, the case with common phase noise follows rather directly from [9].

Proposition 1. With common phase noise, i.e., $\boldsymbol{\Theta}_t = \Theta_t \mathbf{1}_{n_r \times n_t}$ and $h(\{\Theta_t\}) > -\infty$, the multiplexing gain is $\min\{n_t, n_r\} - \frac{1}{2}$.

Proof: The proof is provided in Appendix A.
Then our new results concern channels with separate phase noises, either on the individual paths (model A) or at the input/output (model B) of the channel.

Theorem 1. The multiplexing gain of model A is $\frac{1}{2}$.

This result shows that extra transmit and receive antennas do not improve the multiplexing gain of a channel with phase uncertainty on each path of the channel. The achievability in the single-antenna case was shown in [5]. Our main contribution lies in the converse, as will be shown in Section IV.

Theorem 2. The multiplexing gain of model B is
• upper-bounded by $\frac{1}{2}\min\{n_t, (n_r-2)^+ + 1\}$ and lower-bounded by $\frac{1}{2}\min\{n_t, \lfloor\frac{n_r+1}{2}\rfloor\}$ with both transmit and receive phase noises; the upper bound is achieved when $n_r \le 3$ or $n_r \ge 2n_t - 1$;
• $\min\{\frac{n_r}{2}, \frac{n_t}{2}\}$ with only transmit phase noise;
• $\min\{\frac{n_r}{2}, n_t - \frac{1}{2}\}$ with only receive phase noise.
Interestingly, the multiplexing gain of model B depends on the numbers of transmit and receive antennas differently, which is rarely the case for previously studied point-to-point MIMO channels.
Remark II.1. As shown in Fig. 2, transmit phase noise is more detrimental than receive phase noise, and strictly so when $n_r > n_t > 1$. Intuitively, with transmit phase noise, each transmitted symbol is accompanied by a different phase noise symbol, which means that no more than half of the total spatial degrees of freedom are available for the useful signal. With receive phase noise, on the other hand, although half of the received signal dimensions are occupied by phase noises, it suffices to increase the number of receive antennas to recover almost all transmitted symbols.
Remark II.2. Obviously, the multiplexing gain of model B1 is upper-bounded by that of models B2 and B3. Such a "trivial" upper bound is given by $\min\{\frac{n_t}{2}, \frac{n_r}{2}, n_t - \frac{1}{2}\} = \min\{\frac{n_t}{2}, \frac{n_r}{2}\}$. When $n_r \le n_t$, the optimal multiplexing gain is $\frac{n_r}{2}$ with phase noise at either side of the channel, whereas no more than $\frac{(n_r-2)^+ + 1}{2}$ is achievable with phase noises at both sides. These are the cases for which model B1 is strictly "worse" than both models B2 and B3. When $n_r \ge 2n_t - 1$, with transmit phase noise, the optimal multiplexing gain is $\frac{n_t}{2}$ regardless of the presence of receive phase noise.
Remark II.3. Theorem 2 shows that, when $n_t = n_r = 2$ and $n_t = n_r = 3$, the exact multiplexing gain of model B1 is $\frac{(n_r-2)^+ + 1}{2}$, which gives $\frac{1}{2}$ and $1$, respectively. In contrast, the trivial upper bound yields $1$ and $\frac{3}{2}$, respectively. These are the two cases of model B1 for which we obtain an exact multiplexing gain that is strictly lower than that of models B2 and B3.
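The bounds of Theorem 2 and the coincidence condition can be tabulated programmatically; a small sketch, with the formulas transcribed from the theorem statement:

```python
def ub_B1(nt, nr):
    # Upper bound, both transmit and receive phase noises: (1/2) min{nt, (nr-2)^+ + 1}
    return 0.5 * min(nt, max(nr - 2, 0) + 1)

def lb_B1(nt, nr):
    # Lower bound, both sides: (1/2) min{nt, floor((nr+1)/2)}
    return 0.5 * min(nt, (nr + 1) // 2)

def gain_B2(nt, nr):
    # Only transmit phase noise: min{nr/2, nt/2}
    return min(nr / 2, nt / 2)

def gain_B3(nt, nr):
    # Only receive phase noise: min{nr/2, nt - 1/2}
    return min(nr / 2, nt - 0.5)

# The B1 bounds coincide exactly when nr <= 3 or nr >= 2*nt - 1
for nt in range(1, 6):
    for nr in range(1, 12):
        if nr <= 3 or nr >= 2 * nt - 1:
            assert ub_B1(nt, nr) == lb_B1(nt, nr)
```

For instance, $(n_t, n_r) = (3, 4)$ falls in the unresolved range: the upper bound gives $1.5$ while the lower bound gives $1$.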
The remainder of the paper is dedicated to the proof of the main results. We start with some mathematical preliminaries.
III. PRELIMINARIES

Proof: The case with $k = 2n$ is known and was proved in [3]. In the following, we provide a simple proof for general $k$, although we are only interested in the case $k = 1$ later in the paper. Let us define $f_k(\lambda)$ as follows. Then it readily follows from the definition of $f_k(\lambda)$ that, to prove that $f_k(\lambda)$ is increasing in $\lambda$, it suffices to show that the derivative of $f_k(\lambda)$ with respect to $\lambda$ is positive. Indeed, this holds, where the matrix involved is the Jacobian matrix.
Lemma 3. If each element of the $n$-vector $\mathbf{X}$ is circularly symmetric with independent phases, and the probability density function (pdf) of $\mathbf{X}$ exists with respect to the Lebesgue measure, then (4) and (5) hold.

Proof: Applying Lemma 2 twice, we readily obtain (4). To prove (5), we introduce $\boldsymbol{\Phi}$, uniformly distributed in $[0, 2\pi)^n$ and independent of $\mathbf{X}$ and $\boldsymbol{\Theta}$. Hence, (5) holds with the constant $c_0$ corresponding to $\max\{|h(\boldsymbol{\Theta}) - n|, n\log\pi\}$.

Proof: To prove (6), we introduce an auxiliary distribution and use the Kullback-Leibler divergence $D(\cdot\|\cdot)$, which yields (6). We proceed to prove (7), where we partition $[0, 2\pi)^n$ in such a way that $\cos(\boldsymbol{\Theta})$ is a bijective function of $\boldsymbol{\Theta}$ on each partition cell, indexed by $\Omega$; the first equality is from Lemma 2; the last inequality follows from the boundedness of $h(\boldsymbol{\Theta})$, the fact that $\Omega$ takes only finitely many values, and the application of (6).
Proof: This is a straightforward adaptation of the result in [3, Lemma 6.7-f] to the complex case; the real case can be proved by following the same steps. To be self-contained, we provide an alternative proof as follows. Define $V_x \triangleq |\mathbf{V}^T \mathbf{x}|^2$; one can verify from the assumptions the required properties. We introduce an auxiliary pdf $q(V_x)$ based on the Gamma distribution defined in (2) with some $\alpha \in (0, 1)$.

IV. CAPACITY UPPER BOUND FOR MODEL A
The capacity $C(P)$ in (1) of a stationary and ergodic channel is upper-bounded, up to a constant term, by the capacity of the corresponding memoryless channel. Following the footsteps of [3], [5], we work with the capacity of a memoryless phase noise channel with the same temporal marginal distribution as the original channel, the supremum being over all input distributions such that $\mathbb{E}\|\mathbf{X}\|^2 \le P$; using the fact that $r_\Theta$, the differential entropy rate of the phase noise process, is finite, we can set $c_0 = \log(2\pi) - r_\Theta$. Since we are mainly interested in the multiplexing gain, the constant $c_0$ does not matter, and it is thus without loss of optimality to consider the memoryless case in this section. The main ingredients of the proof are the genie-aided bound and the duality upper bound. In the following, we detail the five steps that lead to Theorem 1.

A. Step 1: Genie-aided bound
Let us define the auxiliary random variable $U$ as the index of the strongest input entry (when there is more than one such element, we pick one arbitrarily). Thus, we use $X_U$ to denote the element of $\mathbf{X}$ with the largest magnitude. It is obvious that $U \leftrightarrow \mathbf{X} \leftrightarrow \mathbf{Y}$ form a Markov chain, and that $U$ does not contain more than $\log n_t$ bits. Assuming that a genie provides $U$ to the receiver, we obtain the following upper bound.

B. Step 2: Canonical form

Definition 2 (Canonical channel). We define the canonical form $u$, $u = 1, \ldots, n_t$, of the channel $\mathbf{H}$ as $\mathbf{G}^{(u)}$. Note that the elements in the $u$th column of $\mathbf{G}^{(u)}$ have normalized magnitudes. Now, with the information $U$ from the genie, the receiver can convert the original channel into one of the canonical forms, namely, form $U$.
where $a \triangleq \min_{k,u} |h_{k,u}^{-1}|$; (16) is due to the fact that reducing the additive noise increases the mutual information; we define $\tilde{\mathbf{X}} \triangleq a^{-1}\mathbf{X}$ and, accordingly, the output $\mathbf{W}$. In the following, we focus on upper-bounding the mutual information $I(\mathbf{X}; \mathbf{W} \mid U)$. Note that the last equality comes from the fact that $U$ is a function of $\mathbf{X}$ and thus a function of $\tilde{\mathbf{X}}$, since $\tilde{\mathbf{X}}$ is simply a scaled version of $\mathbf{X}$. Therefore, it is enough to lower-bound $h(\mathbf{W} \mid \tilde{\mathbf{X}})$ and upper-bound $h(\mathbf{W} \mid U)$ separately.
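The canonical-form construction of Step 2 can be sketched numerically. The definition's displayed equation is elided in this extraction, so the sketch below reads "the $u$th column has normalized magnitudes" as unit-magnitude entries obtained by scaling row $k$ by $|h_{k,u}|^{-1}$; this row-scaling interpretation and the example matrix are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_t, n_r, u = 3, 4, 1   # u indexes the genie-provided strongest input entry

H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))

# Scale row k by |h_{k,u}|^{-1} so that column u has unit-magnitude entries
D = np.diag(1.0 / np.abs(H[:, u]))
G_u = D @ H   # canonical form associated with index u
```

Since the scaling is invertible and known to the receiver once $U$ is revealed, converting to the canonical form does not change the mutual information (up to the noise-scaling step accounted for in the proof).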

C. Step 3: Lower bound on $h(\mathbf{W} \mid \tilde{\mathbf{X}})$

where $\tilde{X}_U$ and $\tilde{X}_V$ have the largest and second largest magnitudes in $\tilde{\mathbf{X}}$, respectively.
Proof: See Appendix B. It is worth mentioning that the above bound depends not only on the strongest but also on the second strongest input of the channel.

D. Step 4: Upper bound on h(W W W | U )
Upper-bounding $h(\mathbf{W} \mid U)$ by a non-trivial yet tractable function of the input distribution is hard in general. A viable way for that purpose is through an auxiliary distribution, also called the duality approach. The duality upper bound was first proposed in [15] and [16] for discrete channels and then derived for arbitrary channels in [3]. Namely, for any pdf $q(\mathbf{w})$, the bound holds due to the non-negativity of the Kullback-Leibler divergence $D(p_{\mathbf{W}\mid U=u} \| q)$. Hence, the key is to choose a proper auxiliary pdf $q(\mathbf{w})$ in order to obtain a tight upper bound on the capacity of our channel. The commonly used auxiliary distributions for MIMO channels are mostly related to the class of isotropic distributions [3], [5], [9]. Unfortunately, isotropic distributions are not suitable in our case. To see this, let us assume that an isotropic output $\mathbf{W}$ were indeed close to optimal. On the one hand, the pdf of an isotropic output $\mathbf{W}$ would only depend on the norm $\|\mathbf{W}\|$, which would be dominated by the largest input entry $X_U$ at high SNR. Therefore, the value of $\mathbb{E}[-\log q(\mathbf{W})]$ would be insensitive to the number of active input entries. On the other hand, the lower bound on the conditional entropy $h(\mathbf{W} \mid \tilde{\mathbf{X}})$ is increasing in both of the largest input entries $\tilde{X}_U$ and $\tilde{X}_V$, according to (19). Therefore, with an isotropic distribution $q(\mathbf{w})$, the capacity upper bound $\mathbb{E}[-\log q(\mathbf{W})] - h(\mathbf{W} \mid \tilde{\mathbf{X}})$ would become larger as the second strongest input went to zero, i.e., when only one transmit antenna were active. But this contradicts the isotropic assumption: if only one transmit antenna were active, the output entries would be highly correlated and the output distribution would be far from isotropic. In light of the above discussion, we are led to think that a good choice of $q(\mathbf{w})$ should reflect not only the strongest input entry, but also the weaker ones.
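The duality inequality itself is easy to verify numerically on a small discrete channel (a sketch, not from the paper): for a fixed input law $p$, $I(X;Y) \le \sum_x p(x) D(P_{Y|x}\|q)$ for any output law $q$, with the gap equal to $D(P_Y \| q)$, so equality holds exactly when $q$ is the true output marginal.

```python
import numpy as np

rng = np.random.default_rng(4)
nx, ny = 3, 4
W = rng.random((nx, ny)); W /= W.sum(axis=1, keepdims=True)   # channel p(y|x)
p = rng.random(nx); p /= p.sum()                              # input distribution
py = p @ W                                                    # output marginal

def kl(a, b):
    # Kullback-Leibler divergence in nats (all entries strictly positive here)
    return float(np.sum(a * np.log(a / b)))

# Mutual information I(X;Y) = sum_x p(x) D(p(y|x) || p(y))
I = sum(p[x] * kl(W[x], py) for x in range(nx))

# Duality bound with an arbitrary auxiliary output law q
q = rng.random(ny); q /= q.sum()
bound = sum(p[x] * kl(W[x], q) for x in range(nx))
```

This identity is why the whole difficulty lies in designing $q$: a poor choice is still a valid bound, just a loose one.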
We adopt the following pdf built from the multivariate Gamma distribution in Definition 1, where $\hat{w}_1, \ldots, \hat{w}_{n_r}$ are the version of the $w_i$'s ordered by increasing magnitude. Essentially, we let each $W_i$ be circularly symmetric and let the ordered version of $(|W_1|^2, \ldots, |W_{n_r}|^2)$ follow the multivariate Gamma distribution of Definition 1. Applying (3) in Lemma 3 and order statistics (whence the factor $n_r!$) [20], we obtain the pdf of $\mathbf{W}$ as written in (21). Remarkably, the differences between $|W_i|^2$ and $|W_j|^2$, $i \ne j$, enter the upper bound, which is crucial for bringing in the impact of the individual input entries $\tilde{X}_i$ other than the strongest one, as will be shown in the following.
Lemma 10. By choosing $0 < \alpha_i < 1$, $i = 1, \ldots, n_r$, and $\mu = \min\{P^{-1}, 1\}$, we have for model A the bound (22), where $\tilde{X}_U$ and $\tilde{X}_V$ are the strongest and second strongest elements in $\tilde{\mathbf{X}}$, respectively. (Formally, we should state that the probability measure $Q$ corresponding to the density $q(\mathbf{w})$ is such that $P(\cdot \mid U = u)$ is absolutely continuous with respect to $Q$. Throughout the paper, for brevity, we implicitly make this assumption to avoid such formalities.)
Proof: The calculation is straightforward from the pdf (21); details are provided in Appendix C.

E. Step 5: Upper bound for model A
Combining (18), (19), (20), and (22) from the previous steps, we obtain the chain of inequalities, where inequality (24) comes from removing the negative term in (23); to obtain the last inequality, we apply (13) in Lemma 8 with $p = 2$ and the power constraint $\mathbb{E}|X_U|^2 \le a^2 \mathbb{E}\|\mathbf{X}\|^2 \le a^2 P$. Finally, we conclude from (14) and (17) that, for model A, the multiplexing gain is upper-bounded accordingly. Taking the infimum over $\boldsymbol{\alpha}$, we have $r_A \le \frac{1}{2}$.

V. CAPACITY UPPER BOUND FOR MODEL B
In this section, we derive upper bounds for the three cases of model B, in which the phase noises are at the transmitter and/or receiver side of the channel. As in the previous section, it is enough to consider the memoryless case for our purpose.
A. Case B1: Transmit and receive phase noises

Note that the multiplexing gain of this case is upper-bounded by that of cases B2 and B3, since we can enhance the channel by providing the transmit or receive phase noise to both the transmitter and the receiver. In other words, the upper bound $\min\{\frac{n_r}{2}, \frac{n_t}{2}, n_t - \frac{1}{2}\} = \min\{\frac{n_r}{2}, \frac{n_t}{2}\}$ remains valid for this case. In the following, we show that the upper bound $\frac{n_r}{2}$ can be tightened to $\frac{(n_r-2)^+ + 1}{2}$ via the duality upper bound with the multivariate Gamma distribution. The proof is in the same vein as that for model A. Specifically, the first four steps are exactly the same as for model A, except Step 3, in which the conditional entropy has a different lower bound, as shown below.
Lemma 11. For model B1, the bound holds, where $\tilde{X}_U$ and $\tilde{X}_V$ have the largest and second largest magnitudes in $\tilde{\mathbf{X}}$, respectively.

B. Case B2: Transmit phase noise
In this case, the received signal is $\mathbf{Y} = \mathbf{H}(e^{j\boldsymbol{\Theta}_T} \circ \mathbf{X}) + \mathbf{Z}$. The channel is characterized by the random matrix $\tilde{\mathbf{H}} = \mathbf{H}\,\mathrm{diag}\{e^{j\boldsymbol{\Theta}_T}\}$. We shall show that the upper bound is $\min\{\frac{n_t}{2}, \frac{n_r}{2}\}$. First, with at least as many receive antennas as transmit antennas, i.e., when $n_r \ge n_t$, we can invert the channel without losing information; (27) is maximized when $\mathbf{X}$ is circularly symmetric with $n_t$ independent phases. To see this, we introduce a vector $\boldsymbol{\Phi}$ of independent and identically distributed (i.i.d.) phases, uniformly distributed in $[0, 2\pi)^{n_t}$, and show that, for any $\mathbf{X}$, the mutual information does not decrease, where we use the fact that $\tilde{\mathbf{Z}}$ is circularly symmetric and $e^{-j\boldsymbol{\Phi}} \circ \tilde{\mathbf{Z}}$ has the same distribution as $\tilde{\mathbf{Z}}$. Therefore, to derive an upper bound, it is without loss of optimality to assume that $\mathbf{X}$ is circularly symmetric with $n_t$ independent phases. With this assumption, we have
$$I(\mathbf{X}; e^{j\boldsymbol{\Theta}_T} \circ \mathbf{X} + \tilde{\mathbf{Z}}) = I(|\mathbf{X}|; e^{j\boldsymbol{\Theta}_T} \circ \mathbf{X} + \tilde{\mathbf{Z}}) + I(\angle\mathbf{X}; e^{j\boldsymbol{\Theta}_T} \circ \mathbf{X} + \tilde{\mathbf{Z}} \mid |\mathbf{X}|)$$
$$\le I(|\mathbf{X}|; e^{j(\boldsymbol{\Theta}_T + \angle\mathbf{X})} \circ |\mathbf{X}| + \tilde{\mathbf{Z}}) + I(\angle\mathbf{X}; e^{j\boldsymbol{\Theta}_T} \circ \mathbf{X} \mid |\mathbf{X}|)$$
$$\le I(|\mathbf{X}|; e^{j(\boldsymbol{\Theta}_T + \angle\mathbf{X})} \circ |\mathbf{X}| + \tilde{\mathbf{Z}} \mid \boldsymbol{\Theta}_T + \angle\mathbf{X}),$$
where the second inequality is obtained by providing $\boldsymbol{\Theta}_T + \angle\mathbf{X}$ to the output and using the independence between $\boldsymbol{\Theta}_T + \angle\mathbf{X}$ and $|\mathbf{X}|$; the last inequality follows from the capacity upper bound for a real-valued Gaussian channel, and the fact that $(\boldsymbol{\Theta}_T + \angle\mathbf{X})_{2\pi}$ is uniformly distributed in $[0, 2\pi)$; we define $\tilde{\mathbf{Z}}' \triangleq e^{-j(\boldsymbol{\Theta}_T + \angle\mathbf{X})} \circ \tilde{\mathbf{Z}}$. From (28), we obtain the pre-log upper bound $\frac{n_t}{2}$. In the following, we assume $n_r \le n_t$ and follow closely the proof for model A in Section IV-A. We first apply a genie-aided bound by providing the set of indices of the $n_r$ strongest inputs to the receiver. This information, also denoted by $U$, does not take more than $\log\binom{n_t}{n_r}$ bits.
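The channel-inversion step for $n_r \ge n_t$ can be checked numerically. A minimal sketch, with illustrative dimensions: applying a left inverse of $\mathbf{H}$ turns the observation into $e^{j\boldsymbol{\Theta}_T} \circ \mathbf{X}$ plus a transformed noise term (the component of $\mathbf{Z}$ orthogonal to the range of $\mathbf{H}$ that is discarded carries no information about $\mathbf{X}$).

```python
import numpy as np

rng = np.random.default_rng(5)
n_t, n_r = 2, 3   # n_r >= n_t so that H has a left inverse (a.s. full column rank)
H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
x = rng.standard_normal(n_t) + 1j * rng.standard_normal(n_t)
theta = rng.uniform(0, 2 * np.pi, n_t)        # transmit phase noise (case B2)
z = (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)) / np.sqrt(2)

y = H @ (np.exp(1j * theta) * x) + z          # received signal
Hinv = np.linalg.pinv(H)                      # Moore-Penrose left inverse
w = Hinv @ y                                  # = e^{j theta} ∘ x + Hinv z
```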
Then we also associate with each $U$ a canonical form $\mathbf{G}^{(U)}$, where $a \triangleq (\sigma_{\max}(\mathbf{H}))^{-1}$; we define $\tilde{\mathbf{X}} \triangleq a^{-1}\mathbf{X}$ and, accordingly, the output $\mathbf{W}$. The next step is to derive a lower bound on $h(\mathbf{W} \mid \tilde{\mathbf{X}})$, where we assume $U = \{1, \ldots, n_r\}$ for notational convenience; the last inequality is from Lemma 7. Finally, we derive an upper bound on $h(\mathbf{W} \mid U)$ via duality, using the following auxiliary distribution on the output $\mathbf{W}$, where $g_{\boldsymbol{\alpha}}$ is the normalization factor, which depends only on $\boldsymbol{\alpha}$ and $n_r$. Essentially, we let the $W_i$ be independent and circularly symmetric, with the squared magnitude of each following a single-variate Gamma distribution with parameters $(\mu, \alpha_i)$, as defined in (2) of Definition 1.
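The auxiliary output law just described can be sketched concretely. Since (2) of Definition 1 is not reproduced in this extraction, the sketch assumes the standard Gamma density with shape $\alpha \in (0,1)$ and rate $\mu$; for a circularly symmetric complex $W$ whose squared magnitude has density $f$, the density on the complex plane is $q(w) = f(|w|^2)/\pi$, which the code verifies by numerical normalization.

```python
import numpy as np
from math import gamma, pi

alpha, mu = 0.6, 1.0   # illustrative parameters, alpha in (0, 1)

def f_sq(s):
    # Assumed Gamma(shape alpha, rate mu) density of the squared magnitude
    return mu**alpha * s ** (alpha - 1.0) * np.exp(-mu * s) / gamma(alpha)

def q(w):
    # Circularly symmetric density on the complex plane
    return f_sq(np.abs(w) ** 2) / pi

# Sanity check in polar coordinates: ∫ q(w) dA = ∫_0^∞ f(r^2) 2r dr = 1
r = np.linspace(1e-6, 8.0, 200001)
integrand = 2.0 * pi * r * q(r * np.exp(1j * 0.3))   # q depends only on |w|
total = float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(r)) / 2.0)
```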
Proof: The following is straightforward from (30). Applying Jensen's inequality to the expectation over $Z_i$, we obtain the intermediate bound; plugging it back into (32), we readily have (31).
Finally, putting together (29) and (31), we obtain the upper bound, where, to obtain the last inequality, we apply (13) in Lemma 8 with $p = 2$ and the power constraint $\mathbb{E}|X_i|^2 \le a^2 \mathbb{E}\|\mathbf{X}\|^2 \le a^2 P$. Therefore, the multiplexing gain is upper-bounded accordingly. Taking the infimum over $\boldsymbol{\alpha}$, we get $\frac{n_r}{2}$.

C. Case B3: Receive phase noise
First, it is not hard to show the upper bound $n_t - \frac{1}{2}$. It is enough to provide the $n_r - 1$ relative angles $\{\Theta_{R,k} - \Theta_{R,1}\}_{k=2,\ldots,n_r}$ to the receiver. The channel is then equivalent to the case with common phase noise $\Theta_{R,1}$, and we can apply Proposition 1. Next, we show the upper bound $\frac{n_r}{2}$. To that end, we write the mutual information in terms of $\mathbf{Z}' \triangleq e^{-j\boldsymbol{\Theta}_R} \circ \mathbf{Z}$, which is independent of $\boldsymbol{\Theta}_R$ since $\mathbf{Z}$ is circularly symmetric; the last equality follows since $\mathbf{Y} = e^{j\boldsymbol{\Theta}_R} \circ (\mathbf{H}\mathbf{X} + \mathbf{Z}')$, and thus $\boldsymbol{\Theta}_R$ is independent of $(|\mathbf{Y}|, \mathbf{H}\mathbf{X} + \mathbf{Z}', \mathbf{H}\mathbf{X})$. It remains to show that $I(\mathbf{H}\mathbf{X}; |\mathbf{Y}|) \le \frac{n_r}{2}\log^+ P + c_H$. To prove this, it is enough to apply $h(|\mathbf{Y}|) \le \frac{n_r}{2}\log^+ P + c_H$ and to use the fact that $h(|\mathbf{Y}| \mid \mathbf{H}\mathbf{X}) = \sum_{k=1}^{n_r} h(|Y_k| \mid \mathbf{H}\mathbf{X})$ is lower-bounded by a constant according to (9) in Lemma 7.
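The key algebraic step above, rewriting $\mathbf{Y} = e^{j\boldsymbol{\Theta}_R} \circ (\mathbf{H}\mathbf{X}) + \mathbf{Z}$ as $e^{j\boldsymbol{\Theta}_R} \circ (\mathbf{H}\mathbf{X} + \mathbf{Z}')$ so that the receive phase noise cancels in the magnitudes, is easy to verify deterministically; the dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n_t, n_r = 2, 3
H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
x = rng.standard_normal(n_t) + 1j * rng.standard_normal(n_t)
z = (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)) / np.sqrt(2)
theta_R = rng.uniform(0, 2 * np.pi, n_r)   # receive phase noise (case B3)

# Z' = e^{-jΘ_R} ∘ Z has the same distribution as Z by circular symmetry
z_prime = np.exp(-1j * theta_R) * z
y = np.exp(1j * theta_R) * (H @ x) + z     # received signal
```

Since $|\mathbf{Y}| = |\mathbf{H}\mathbf{X} + \mathbf{Z}'|$ componentwise, the magnitudes are a sufficient statistic not corrupted by $\boldsymbol{\Theta}_R$, which is exactly what the bound on $I(\mathbf{H}\mathbf{X}; |\mathbf{Y}|)$ exploits.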

VI. CAPACITY LOWER BOUND FOR MODEL B
In this section, we derive a lower bound on the capacity of model B. For simplicity, we consider the class of memoryless Gaussian input distributions. Although the optimal input distribution was proved to be discrete in [6], a simple Gaussian input provides tight lower bounds on the pre-log, which is enough for our purpose here. In the following, we only consider the memoryless phase noise channel, which can be shown to have a lower capacity than the general stationary and ergodic channel when a memoryless input is used. Thus, we focus on the single-letter mutual information $I(\mathbf{X}; \mathbf{Y})$ in the rest of this section. As in the previous section, we investigate the three cases separately.

A. Case B1: Transmit and receive phase noises
In this case, we use all inputs with equal power, i.e., $\mathbf{X} \sim \mathcal{CN}(\mathbf{0}, \frac{P}{n_t}\mathbf{I}_{n_t})$. For convenience, let us rewrite the received signal in terms of $\mathbf{X}_0 \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{n_t})$, the normalized version of $\mathbf{X}$. The mutual information of interest can then be expanded as in (33). First, the following lemma, which provides a lower bound on $h(\mathbf{Y})$ in (33), is crucial for the achievability proof.
Proof: See Appendix D.

Next, we derive upper bounds on the two negative terms in (33) as follows. The conditional differential entropy can be upper-bounded as shown, where the second inequality is due to Lemma 7 and the third inequality follows from Lemma 8 and the power constraint. Note that the above lower bound holds when we substitute any $n_t' \le n_t$ for $n_t$, i.e., when only $n_t'$ transmit antennas are activated. It is clear that when $n_r - n_t + 1 \ge n_t$, i.e., $n_r \ge 2n_t - 1$, we should let $n_t' = n_t$. Otherwise, we should decrease $n_t'$ to balance $n_r - n_t' + 1$ against $n_t'$, which gives $n_t' = \lfloor\frac{n_r+1}{2}\rfloor$. This completes the proof of the lower bound for model B1.
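The antenna-selection trade-off at the end of the proof can be checked exhaustively; a sketch, where the assumed per-choice pre-log $\frac{1}{2}\min\{n_t', n_r - n_t' + 1\}$ is read off from the balancing argument above:

```python
def lb_with_active(nt_active, nr):
    # Pre-log achieved with nt' active antennas: (1/2) min{nt', nr - nt' + 1}
    return 0.5 * min(nt_active, nr - nt_active + 1)

def best_lb(nt, nr):
    # Optimize the number of active transmit antennas
    return max(lb_with_active(k, nr) for k in range(1, nt + 1))

# The optimized value matches the closed form (1/2) min{nt, floor((nr+1)/2)}
for nt in range(1, 8):
    for nr in range(1, 15):
        assert best_lb(nt, nr) == 0.5 * min(nt, (nr + 1) // 2)
```

For example, with $n_t = 4$ and $n_r = 5$, activating $n_t' = 3$ antennas balances the two terms and yields the pre-log $1.5$.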

B. Case B2: Transmit phase noise
In this case, we use $n_t' \triangleq \min\{n_t, n_r\}$ input antennas and deactivate the remaining ones. The active inputs, denoted by $\mathbf{X}'$, are i.i.d. Gaussian, i.e., $\mathbf{X}' \sim \mathcal{CN}(\mathbf{0}, \frac{P}{n_t'}\mathbf{I}_{n_t'})$. We rewrite the output vector as $\mathbf{Y} = \mathbf{H}'(e^{j\boldsymbol{\Theta}_T'} \circ \mathbf{X}') + \mathbf{Z}$, where $\mathbf{H}' \in \mathbb{C}^{n_r \times n_t'}$ is the submatrix of $\mathbf{H}$ corresponding to the active inputs, and $\boldsymbol{\Theta}_T'$ is defined similarly. It follows that $I(\mathbf{X}'; \mathbf{Y})$ can be decomposed accordingly; the latter term is further upper-bounded by $\sum_{k=1}^{n_t'} \mathbb{E}\log^+ |X_k| + c_H \le \frac{n_t'}{2}\log^+ P + c_H$, according to (8) in Lemma 7 and (13) in Lemma 8. This shows the lower bound $\frac{1}{2}\min\{n_t, n_r\}$ on the multiplexing gain.

C. Case B3: Receive phase noise
As in case B1, we let $\mathbf{X} \sim \mathcal{CN}(\mathbf{0}, \frac{P}{n_t}\mathbf{I}_{n_t})$. First, $h(\mathbf{Y})$ is lower-bounded as in Lemma 13. Next, (39) readily follows, since we are in the same situation as in case B1 when $\tilde{\boldsymbol{\Theta}}_T$ is known. Finally, combining (34) and (39), we obtain a lower bound on the mutual information that yields the desired multiplexing gain.

VII. CONCLUSIONS AND DISCUSSIONS
In this work, we investigated the discrete-time stationary and ergodic $n_r \times n_t$ MIMO phase noise channel. We characterized the exact multiplexing gain when the phase noises are on the individual paths, as well as when they are at either side of the channel. With both transmit and receive phase noises, upper and lower bounds have been derived. In particular, the upper bounds in this paper were obtained via duality, using a newly introduced multivariate Gamma distribution.
For model B1, the upper and lower bounds derived in this paper do not match for $n_r \in [4 : 2n_t - 2]$. We conjecture that the upper bound $\frac{1}{2} \min\{n_t, n_r - 1\}$ is indeed loose. Let us recall that this upper bound is obtained by lower-bounding $h(\boldsymbol{W} \,|\, \boldsymbol{X})$ with (25), and by upper-bounding $\mathbb{E}[-\log q(\boldsymbol{W})]$ with $q(\boldsymbol{w})$ being the multi-variate Gamma distribution. We believe that both bounds are loose for model B1 in general. First, we can show the upper bound
$$h(\boldsymbol{W} \,|\, \boldsymbol{X}) \le n_r\, \mathbb{E}[\log^+ |X_U|] + \sum_{k \ne U} \mathbb{E}[\log^+ |X_k|] + c_H. \qquad (40)$$
To see this, we can first write $h(\boldsymbol{W} \,|\, \boldsymbol{X}) = h(\boldsymbol{W} \,|\, \boldsymbol{X}, \boldsymbol{\Theta}_T) + I(\boldsymbol{\Theta}_T; \boldsymbol{W} \,|\, \boldsymbol{X})$, then upper-bound the first term with $n_r\, \mathbb{E}[\log^+ |X_U|] + c_H$ by following closely the steps in (35)-(36), and upper-bound the second term with $\sum_{k \ne U} \mathbb{E}[\log^+ |X_k|] + c_H$ by following closely the steps in (37)-(38). As compared to the lower bound (25), the upper bound (40) differs only in the terms involving $X_k$, $k \notin \{U, V\}$. In the following, we argue that even if the lower bound on $h(\boldsymbol{W} \,|\, \boldsymbol{X})$ were the RHS of (40) (which is the largest lower bound one could hope for, since it is also an upper bound), we still would not be able to tighten the multiplexing gain upper bound $\frac{1}{2} \min\{n_t, n_r - 1\}$ with the same choice of auxiliary distribution $q(\boldsymbol{w})$. In other words, for the given $q(\boldsymbol{w})$, (25) is tight enough with respect to the upper bound on $\mathbb{E}[-\log q(\boldsymbol{W})] - h(\boldsymbol{W} \,|\, \boldsymbol{X})$. To prove this, it is enough to observe that $\mathbb{E}[-\log q(\boldsymbol{W})]$ does not involve any terms of $\boldsymbol{X}$ other than $X_U$ and $X_V$ in a way that changes the high-SNR behavior, whereas $h(\boldsymbol{W} \,|\, \boldsymbol{X})$ is increasing in the strength of each $X_k$. Therefore, the maximization of $\mathbb{E}[-\log q(\boldsymbol{W})] - h(\boldsymbol{W} \,|\, \boldsymbol{X})$ over $\boldsymbol{X}$ will always set all $X_k$, $k \notin \{U, V\}$, to zero, even if $h(\boldsymbol{W} \,|\, \boldsymbol{X})$ attains the highest value (40). To sum up, if the current upper bound $\frac{1}{2} \min\{n_t, n_r - 1\}$ is indeed loose, as we conjecture, one would first have to find a new auxiliary distribution $q(\boldsymbol{w})$ in order to obtain a tighter upper bound.
In particular, the new auxiliary distribution should be such that $\mathbb{E}[-\log q(\boldsymbol{W})]$ depends on $X_k$, $k \notin \{U, V\}$, at high SNR in a non-trivial way. With such a distribution, the second challenge is to find a lower bound on $h(\boldsymbol{W} \,|\, \boldsymbol{X})$ that also depends on $X_k$, $k \notin \{U, V\}$, in a non-trivial way. In fact, we conjecture that (40) holds with equality.
For models B2 and B3, the results admit the following alternative chain-rule interpretation. With transmit phase noise (model B2), the mutual information can be written as
$$I(\boldsymbol{X}; \boldsymbol{Y}) = I(\boldsymbol{X}, \boldsymbol{\Theta}_T; \boldsymbol{Y}) - I(\boldsymbol{\Theta}_T; \boldsymbol{Y} \,|\, \boldsymbol{X}),$$
where the first term scales as $\min\{n_t, n_r\} \log P$, as if the phase noise were part of the transmitted signal, whereas the second term scales as $\frac{1}{2} \min\{n_t, n_r\} \log P$, as if $\boldsymbol{\Theta}_T$ were the input with a fixed distribution and $\boldsymbol{X}$ were the "fading" known at the receiver side. With receive phase noise (model B3), the mutual information can be written differently as
$$I(\boldsymbol{X}; \boldsymbol{Y}) = I(\boldsymbol{X}; \boldsymbol{Y} \,|\, \boldsymbol{\Theta}_R) - I(\boldsymbol{X}; \boldsymbol{\Theta}_R \,|\, \boldsymbol{Y}).$$
Here the first term corresponds to the rate when the phase noise is known, while the second term can be regarded as the rate of a "reverse" channel with input $\boldsymbol{\Theta}_R$, output $\boldsymbol{X}$, and known fading $\boldsymbol{Y}$. In both cases, the original problem of characterizing $I(\boldsymbol{X}; \boldsymbol{Y})$ boils down to subproblems involving channels without phase noise (i.e., $I(\boldsymbol{X}, \boldsymbol{\Theta}_T; \boldsymbol{Y})$ and $I(\boldsymbol{X}; \boldsymbol{Y} \,|\, \boldsymbol{\Theta}_R)$) and communications with fixed phase signaling (i.e., $I(\boldsymbol{\Theta}_T; \boldsymbol{Y} \,|\, \boldsymbol{X})$ and $I(\boldsymbol{X}; \boldsymbol{\Theta}_R \,|\, \boldsymbol{Y})$).
There are a few interesting future directions. First, it is possible to extend the results to multi-user channels and study the impact of phase noise on such systems. Second, the lower bound for model B1 suggests the following dimension-counting argument: one can recover $n_t$ real information dimensions from $2n_t - 1$ real observations, since the remaining $n_t - 1$ dimensions are occupied by the $n_t - 1$ relative phase noises. How to design decoding algorithms that "solve" the $2n_t - 1$ nonlinear equations efficiently is a question of both theoretical and practical importance. Finally, a more refined analysis should lead to tighter upper and lower bounds on the capacity, beyond the pre-log characterization.
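The dimension-counting argument can be illustrated with a toy numerical experiment (ours, not from the paper): with magnitude-only observations, $n_r = 2n_t - 1$ real measurements suffice to solve for $n_t$ real amplitudes plus $n_t - 1$ relative transmit phases. The sketch below is noiseless, uses a damped Gauss-Newton iteration of our own choosing, and is initialized near the truth; a practical decoder would need a globally convergent method.

```python
import numpy as np

# Toy illustration (ours) of the dimension-counting claim for model B1:
# n_r = 2*n_t - 1 real magnitude observations vs. n_t real amplitudes
# plus n_t - 1 relative transmit phases (common phase unobservable).
rng = np.random.default_rng(0)
nt = 2
nr = 2 * nt - 1                       # 3 real magnitude observations

H = rng.normal(size=(nr, nt)) + 1j * rng.normal(size=(nr, nt))
p_true = np.array([1.3, 0.7, 0.9])    # (r_1, r_2, relative phase phi_2)

def forward(p):
    r, phi = p[:nt], np.concatenate(([0.0], p[nt:]))  # common phase set to 0
    return np.abs(H @ (r * np.exp(1j * phi)))         # nr real equations

s_obs = forward(p_true)

# Damped Gauss-Newton with a finite-difference Jacobian.
p = p_true + 0.05 * rng.normal(size=p_true.size)
for _ in range(100):
    f = forward(p)
    J = np.empty((nr, p.size))
    for k in range(p.size):
        dp = np.zeros(p.size)
        dp[k] = 1e-7
        J[:, k] = (forward(p + dp) - f) / 1e-7
    p -= np.linalg.solve(J.T @ J + 1e-12 * np.eye(p.size), J.T @ (f - s_obs))

residual = np.linalg.norm(forward(p) - s_obs)
```

For this generic channel draw, the square $2n_t - 1$ dimensional nonlinear system is locally well-posed and the iteration drives the residual to numerical zero.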

A. Proof of Proposition 1
With common phase noise, we can perform unitary precoding without losing information, and the channel is equivalent to a parallel channel with common phase noise
$$\boldsymbol{Y}_t = e^{j\Theta_t} \boldsymbol{\Sigma} \boldsymbol{x}_t + \boldsymbol{Z}_t = e^{j\Theta_t} \boldsymbol{\sigma} \circ \boldsymbol{x}_t + \boldsymbol{Z}_t,$$
where $\boldsymbol{\Sigma}$ is a diagonal matrix with the $\min\{n_t, n_r\}$ non-zero singular values of the matrix $\boldsymbol{H}$ and $\boldsymbol{\sigma}$ is the vector of these values. From [9], we know that the multiplexing gain of an $M \times M$ channel with common phase noise is upper-bounded by $M - \frac{1}{2}$. This upper bound applies here with $M = \min\{n_t, n_r\}$. The lower bound is achieved by using the Gaussian memoryless input $\boldsymbol{X}_t \sim \mathcal{CN}(0, \frac{P}{n_t} \boldsymbol{I}_{n_t})$. Applying a unitary transformation to $e^{j\Theta} \boldsymbol{\sigma} \circ \boldsymbol{X} + \boldsymbol{Z}$, we obtain
$$N\, h(e^{j\Theta} \boldsymbol{\sigma} \circ \boldsymbol{X} + \boldsymbol{Z} \,|\, \boldsymbol{X}) = N\, h(e^{j\Theta} \|\boldsymbol{\sigma} \circ \boldsymbol{X}\| + \tilde{Z}_1 \,|\, \boldsymbol{X}) + N \sum_{k=2}^{M} h(\tilde{Z}_k) \le N\, \mathbb{E}\bigl[\log^+ \|\boldsymbol{\sigma} \circ \boldsymbol{X}\|\bigr] + N c_H \le \frac{N}{2} \log^+ P + N c_H,$$
where $\tilde{\boldsymbol{Z}}$ is the rotated version of $\boldsymbol{Z}$ and remains spatially white, the first inequality is from Lemma 7, and the second one is from Lemma 8. Finally, we have $\frac{1}{N} I(\boldsymbol{X}^N; \boldsymbol{Y}^N) \ge \left(\min\{n_t, n_r\} - \frac{1}{2}\right) \log^+ P + c_H$, which completes the proof.
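The SVD-based equivalence in the first step can be checked numerically; the sketch below (ours, with a random channel draw standing in for $\boldsymbol{H}$) verifies that precoding with the right singular vectors and rotating the output by the left singular vectors reduces the channel to the diagonal form $e^{j\Theta} \boldsymbol{\Sigma} \boldsymbol{x}$, while a unitary rotation leaves white noise white:

```python
import numpy as np

# Numerical check (ours) of the unitary-precoding equivalence: with a common
# phase e^{j*theta}, U^H (e^{j*theta} H V x) = e^{j*theta} Sigma x.
rng = np.random.default_rng(1)
nt, nr = 3, 4
H = rng.normal(size=(nr, nt)) + 1j * rng.normal(size=(nr, nt))
U, s, Vh = np.linalg.svd(H, full_matrices=True)    # H = U[:, :nt] @ diag(s) @ Vh

x = rng.normal(size=nt) + 1j * rng.normal(size=nt)
theta = 0.7                                        # one common phase noise sample

y = np.exp(1j * theta) * H @ (Vh.conj().T @ x)     # precode with V, pass through channel
y_rot = U.conj().T @ y                             # rotate at the receiver

Sigma = np.zeros((nr, nt))
np.fill_diagonal(Sigma, s)
assert np.allclose(y_rot, np.exp(1j * theta) * Sigma @ x)
# U is unitary, so U^H Z keeps the same (white) covariance:
assert np.allclose(U.conj().T @ U, np.eye(nr))
```

Since the common phase commutes with the deterministic unitaries, the transformation is information-lossless, which is exactly what the proof exploits.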

B. Proofs of Lemmas 9 and 11
In the following, we derive the lower bounds (19) and (25) on the conditional differential entropy $h(\boldsymbol{W} \,|\, \boldsymbol{X})$ for model A and model B1, respectively.
First we shall show that (41) holds for both models. To that end, we analyze $h(W_i \,|\, \boldsymbol{X} = \boldsymbol{x})$ with $|x_1| > |x_2| > \cdots > |x_{n_t}| \ge 0$ without loss of generality, i.e., we assume that $U = 1$ and $V = 2$. A lower bound on $h(W_i \,|\, \boldsymbol{X} = \boldsymbol{x})$ can be obtained by considering the following cases separately.
• When $|x_1| \ge 1$ and $|x_2| \le 1$: here $g_{i1} e^{j\Theta_{i,1}}$ is from the matrix $\boldsymbol{G}^{(1)}$ defined in (15), since $U = 1$ by assumption; the second inequality is from Lemma 7 and the third inequality is from Lemma 8. • When $|x_1| \ge 1$ and $|x_2| \ge 1$: here the first inequality holds because conditioning reduces entropy; we partition $[0, 2\pi)^2$ in such a way that $e^{j\Theta_{i,1}} g_{i1} x_1 + e^{j\Theta_{i,2}} g_{i2} x_2$ is a bijective function of $(\Theta_{i,1}, \Theta_{i,2})$ in each partition indexed by $\Omega$, which takes a finite number of values; then we apply the change of variables from Lemma 2 and obtain (42) with $\phi \triangleq \angle(g_{i1} x_1) - \angle(g_{i2} x_2)$; finally, we use the fact that $|g_{i1} g_{i2}|$ is bounded for almost every $\boldsymbol{H}$ and apply Lemma 5 to get the last inequality. Note that $\log |x_k| = \log^+ |x_k|$ for $k = 1, 2$ by assumption. Combining the three cases above (the remaining case $|x_1| \le 1$ is handled similarly) and taking the expectation over $\boldsymbol{X}$, we get (41).
1) Proof of the lower bound (19) for model A: Here (43) follows from the fact that $W_i$ depends on $\boldsymbol{W}^{i-1}$ only through the input $\boldsymbol{X}$ and the phase noises $\{\Theta_{l,1}, \ldots, \Theta_{l,n_t}\}_{l<i}$; the last inequality is from a modified version of (41), obtained by introducing $\{\Theta_{l,1}, \ldots, \Theta_{l,n_t}\}_{l<i}$ into the conditioning.
2) Proof of the lower bound (25) for model B1: For model B1, we write the decomposition (44), where, according to (41), the first term is lower-bounded as in (45). In the following, we derive a lower bound on the second term. Let $B_i \triangleq \sum_{k=1}^{n_t} g_{ik} X_k e^{j\Theta_{T,k}}$, where $g_{ik}$ is the channel coefficient without phase noise from the canonical form $\boldsymbol{G}^{(U)}$ defined in (15). The first inequality holds because conditioning reduces entropy, and (46) is from Lemma 7. The conditional expectation can be lower-bounded as follows, where $\Phi_{ik} \triangleq \angle(g_{ik} X_k)$; (48) is obtained by applying Lemma 6; the equality is the application of the change of variables from Lemma 2; and the last inequality is from Lemma 5. From (47) and (49), we get (50), where the last inequality follows from the application of (12) in Lemma 8 with $p = 2$. Plugging (45) and (50) into (44), the lower bound (25) is obtained.
• The squared magnitude of each output can be upper-bounded, where $\boldsymbol{G}_i^T$ is the $i$-th row of the canonical matrix $\boldsymbol{G}^{(U)}$ defined in (15); (52) is due to the Cauchy-Schwarz inequality; and $\lambda_{\boldsymbol{H}}$ is defined accordingly. • The difference of the squared magnitudes can be upper-bounded similarly. Note that the above upper bounds do not depend on $i$ and $k$. Then, with the above bounds, we take the expectation of the terms in (51) and obtain (53) and (54), where the last inequality is from Lemma 8. Similarly, basic calculations lead to (55). Taking the expectation over $\boldsymbol{X}$ in (51), and plugging (53), (54), and (55) into it, we readily obtain (22).

D. Proof of Lemma 13
To prove Lemma 13, we deal with the cases $n_r = 2n_t - 1$ and $n_r \ne 2n_t - 1$ separately. Let us first define $\hat{\boldsymbol{Y}}$ and $\tilde{\boldsymbol{Y}}$. For notational convenience, we write $n \triangleq n_r$ and $m \triangleq n_t$ in the following proof.
1) Case $n = 2m - 1$: First we show that (34) holds in this case, where $\boldsymbol{S} \in \mathbb{R}^{n-1}$ with $S_i \triangleq |\hat{Y}_i|^2$ for $i = 1, \ldots, n-1$; the second inequality is from the chain rule and the fact that adding the phase of $\hat{Y}_n$ to the conditioning reduces entropy; the last equality is due to $\hat{Y}_n \sim \mathcal{CN}(0, m^{-1} \|\boldsymbol{h}_n\|^2)$. Next we need to show that $h(\boldsymbol{S} \,|\, \hat{Y}_n) > -\infty$. Intuitively, given $\hat{Y}_n$, $\boldsymbol{S}$ can be expressed as $n - 1 = 2(m-1)$ real functions of the $2(m-1)$ real random variables $\bigl(\mathrm{Re}\{\hat{\boldsymbol{Y}}^{m-1}\}, \mathrm{Im}\{\hat{\boldsymbol{Y}}^{m-1}\}\bigr)$. Since $h\bigl(\mathrm{Re}\{\hat{\boldsymbol{Y}}^{m-1}\}, \mathrm{Im}\{\hat{\boldsymbol{Y}}^{m-1}\}\bigr) = h(\hat{\boldsymbol{Y}}^{m-1})$ is finite for almost every $\boldsymbol{H}$, as long as the mapping is not degenerate, $h(\boldsymbol{S} \,|\, \hat{Y}_n)$ should be finite too. This argument is proved formally in the following.
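The non-degeneracy of this mapping can also be probed numerically. The sketch below (ours; a generic random matrix stands in for the relevant channel rows) builds the map from the $2(m-1)$ real coordinates of $(\hat{Y}_1, \ldots, \hat{Y}_{m-1})$ to $\boldsymbol{S}$ for $m = 3$, $n = 5$, with $\hat{Y}_n$ held fixed, and checks that its Jacobian determinant is non-zero at a random point:

```python
import numpy as np

# Numerical check (ours) that the map from the 2(m-1) real coordinates of
# (Yhat_1, ..., Yhat_{m-1}) to S = (|Yhat_1|^2, ..., |Yhat_{n-1}|^2) is
# generically non-degenerate when Yhat_n is held fixed (case n = 2m - 1).
rng = np.random.default_rng(3)
m, n = 3, 5                                   # n = 2m - 1

# Rows m+1..n are generic linear combinations of the first m rows.
B = rng.normal(size=(n - m, m)) + 1j * rng.normal(size=(n - m, m))
y_n_fixed = rng.normal() + 1j * rng.normal()  # conditioning on Yhat_n

def S_of(u):                                  # u lives in R^{2(m-1)}
    y_free = u[: m - 1] + 1j * u[m - 1 :]     # Yhat_1, ..., Yhat_{m-1}
    # Solve the last linear relation for Yhat_m given the fixed Yhat_n:
    y_m = (y_n_fixed - B[-1, : m - 1] @ y_free) / B[-1, m - 1]
    y = np.concatenate([y_free, [y_m]])
    y_extra = B[: n - m - 1] @ y              # Yhat_{m+1}, ..., Yhat_{n-1}
    return np.abs(np.concatenate([y, y_extra])) ** 2   # the n-1 squared magnitudes

u0 = rng.normal(size=2 * (m - 1))
eps = 1e-6
J = np.column_stack([(S_of(u0 + eps * e) - S_of(u0)) / eps
                     for e in np.eye(2 * (m - 1))])
det = np.linalg.det(J)
```

A non-vanishing Jacobian determinant at a generic point is consistent with the mapping being locally invertible, which is what keeps $h(\boldsymbol{S} \,|\, \hat{Y}_n)$ finite.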
Since, for any generic $\boldsymbol{H} \in \mathbb{C}^{n \times m}$, any $m$ rows of the matrix are linearly independent, the remaining $n - m$ rows can be written as linear combinations of these rows; let us take such rows. Here (56) is due to the fact that the complex gradient of a real-valued function is a unitary transformation of the real gradient (see, e.g., [21, App. A6]); to obtain the last equality, we apply the identity $\det \begin{pmatrix} \boldsymbol{C} & \boldsymbol{D} \\ \boldsymbol{E} & \boldsymbol{F} \end{pmatrix} = \det(\boldsymbol{C}) \det(\boldsymbol{F} - \boldsymbol{E} \boldsymbol{C}^{-1} \boldsymbol{D})$. Since $\hat{Y}_1, \ldots, \hat{Y}_{m-1}, \hat{Y}_n$ are jointly circularly symmetric Gaussian with finite and non-degenerate covariance for any generic $\boldsymbol{H}$, there exists a $\hat{Y}_n'$, circularly symmetric with non-zero bounded variance and independent of $\hat{\boldsymbol{Y}}^{m-1}$, such that the last inequality follows from $\mathbb{E} \log |\hat{Y}_{1,R} - P_{1,1} \hat{Y}_{1,I}| \ge \mathbb{E} \log |\hat{Y}_{1,R}| \ge c_H$, due to the independence between $\hat{Y}_{1,R}$ and $\hat{Y}_{1,I}$ and the application of Lemma 1.
Finally, recalling that $\boldsymbol{T}_I \triangleq \mathrm{Im}\{\mathrm{diag}\{\boldsymbol{b}\} \boldsymbol{B}^*\}$, we have $\log |\det(\boldsymbol{T}_I)| > -\infty$ for any generic $\boldsymbol{H}$; it then follows from (58) that $\mathbb{E} \log |\det(\boldsymbol{N}_I)|$ is bounded from below. By now, we have the desired bound, where we also used (12) in Lemma 8. This completes the proof for the case $n = 2m - 1$.
2) Case $n \ne 2m - 1$: Note that if (34) holds for $n = 2m - 1$, then it also holds for $n < 2m - 1$ and $n > 2m - 1$. To see this, in the case $n < 2m - 1$, we can add $2m - 1 - n$ receive antennas to obtain $(\boldsymbol{Y}, \boldsymbol{Y}')$, with $\boldsymbol{Y}'$ being the extra outputs. Since (34) holds for $h(\boldsymbol{Y}, \boldsymbol{Y}')$ by assumption, we have
$$h(\boldsymbol{Y}) \ge h(\boldsymbol{Y}, \boldsymbol{Y}') - h(\boldsymbol{Y}') \ge (2m - 1) \log^+ P - (2m - 1 - n) \log^+ P + c_H = n \log^+ P + c_H,$$
where the second inequality is from (34) and the fact that $h(\boldsymbol{Y}') \le (2m - 1 - n) \log^+ P + c_H$. When $n > 2m - 1$, we upper-bound the differential entropy of the extra outputs, where the second inequality is from Lemma 7, and the equality (59) follows since $\boldsymbol{h}_k^T (e^{j \boldsymbol{\Theta}_T} \circ \boldsymbol{X}_0) \sim \mathcal{CN}(0, \|\boldsymbol{h}_k\|^2)$.

ACKNOWLEDGEMENT
S. Yang would like to thank G. Durisi for helpful discussions and comments during the early stage of this work.