Information Bottleneck for an Oblivious Relay with Channel State Information: the Vector Case

This paper considers the information bottleneck (IB) problem of a Rayleigh fading multiple-input multiple-output (MIMO) channel. Due to the bottleneck constraint, it is impossible for the oblivious relay to inform the destination node of the perfect channel state information (CSI) in each channel realization. To evaluate the bottleneck rate, we provide an upper bound, obtained by assuming that the destination node can get the perfect CSI at no cost, and two achievable schemes based on simple symbol-by-symbol relay processing and compression. Numerical results show that the lower bounds obtained by the proposed achievable schemes come close to the upper bound over a wide range of relevant system parameters.


I. INTRODUCTION
For a Markov chain X → Y → Z and an assigned joint probability distribution p_{X,Y}, consider the following information bottleneck (IB) problem:

max_{p_{Z|Y}} I(X; Z), s.t. I(Y; Z) ≤ C,  (1)

where C is the bottleneck constraint parameter and the optimization is with respect to the conditional probability distribution p_{Z|Y} of Z given Y. Formulation (1) was introduced by Tishby in [1], and has been used to interpret the behavior of deep learning neural networks [2]. From a more fundamental information-theoretic viewpoint, the IB arises from the classical remote source coding problem [3], [4] under logarithmic distortion [5].
An interesting application of the IB problem in communications consists of a source node, an oblivious relay, and a destination node, which is connected to the relay via an error-free link with capacity C. The source node sends codewords over a communication channel and an observation is made at the relay. X and Y are respectively the channel input from the source node and the output at the relay. The relay is oblivious in the sense that it cannot decode the information message of the source node itself. This feature can be modeled rigorously by assuming that the source and destination nodes make use of a codebook selected at random over a library, while the relay is unaware of such random selection. For example, in a cloud radio access network (C-RAN), each remote radio head (RRH) acts as a relay and is usually constrained to implement only radio functionalities, while the baseband functionalities are migrated to the cloud central processor, particularly as the network size gets large [6].
Due to the oblivious feature, relaying strategies which require the codebooks to be known at the relay, e.g., decode-and-forward, compute-and-forward, etc. [7]–[9], cannot be applied. Instead, the relay has to perform oblivious processing, i.e., employ strategies in the form of compress-and-forward [10]–[13]. In particular, the relay must treat X as a random process, produce some useful representation Z, and convey it to the destination node subject to the link constraint C. It then makes sense to find Z such that I(X; Z) is maximized.
The IB problem for this kind of communication scenario has been studied in [14]–[17]. In [14], the IB method was applied to reduce the fronthaul data rate of a C-RAN network. References [15] and [16] respectively considered Gaussian scalar and vector channels with an IB constraint, and investigated the optimal trade-off between the compression rate and the relevant information. However, references [14]–[16] all considered block fading channels and assumed that perfect channel state information (CSI) was known at both the relay and the destination node. In [17], the IB problem of a scalar Rayleigh fading channel was studied. Due to the bottleneck constraint, it is impossible to inform the destination node of the perfect CSI in each channel realization. An upper bound and two achievable schemes were provided in [17].
In this paper, we extend the work in [17] to the multiple-input multiple-output (MIMO) channel with independent and identically distributed (i.i.d.) Rayleigh fading. To evaluate the bottleneck rate, we first obtain an upper bound by assuming that the channel matrix is also known at the destination node at no cost. Then, we provide two achievable schemes: the first scheme transmits the compressed noisy signal as well as the quantized noise levels to the destination node, while the second scheme only transmits a compressed estimate. Numerical results show that with simple symbol-by-symbol relay processing and compression, the lower bounds obtained by the proposed achievable schemes come close to the upper bound over a wide range of relevant system parameters.

II. PROBLEM FORMULATION
arXiv:2101.09790v2 [cs.IT] 7 May 2021

We consider a system with a source node, an oblivious relay, and a destination node. For convenience, we call the source-relay channel 'Channel 1' and the relay-destination channel 'Channel 2'. For Channel 1, we consider the following Gaussian MIMO channel with i.i.d. Rayleigh fading:

y = Hx + n,  (2)

where x ∈ C^{K×1} and n ∈ C^{M×1} are respectively the zero-mean circularly symmetric complex Gaussian input and noise with covariance matrices I_K and σ²I_M, i.e., x ∼ CN(0, I_K) and n ∼ CN(0, σ²I_M). H ∈ C^{M×K} is a random matrix independent of both x and n, and the elements of H are i.i.d. zero-mean unit-variance complex Gaussian random variables, i.e., H ∼ CN(0, I_K ⊗ I_M). Let ρ = 1/σ² denote the signal-to-noise ratio (SNR). Let z denote a useful representation of y produced by the relay for the destination node; x → (y, H) → z thus forms a Markov chain. We assume that the relay node has a direct observation of the channel matrix H, while the destination node does not. Then, we consider the following IB problem:

max_{p(z|y,H)} I(x; z), s.t. I(y, H; z) ≤ C,  (3)

where C is the bottleneck constraint, i.e., the link capacity of Channel 2. In this paper, we call I(x; z) the bottleneck rate and I(y, H; z) the compression rate. Obviously, for a joint probability distribution p(x, y, H) determined by (2), problem (3) is a slightly augmented version of IB problem (1). In our problem, we aim to find a conditional distribution p(z|y, H) such that bottleneck constraint (3b) is satisfied and the bottleneck rate is maximized, i.e., such that as much information about x as possible can be extracted from the representation z.
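Throughout, expectations over H must be evaluated numerically. As a quick illustrative sketch (ours, not from the paper: the function name and simulation parameters are assumptions), the following Monte-Carlo estimate of the Channel 1 ergodic capacity I(x; y, H) = E[log₂ det(I_M + ρHH^H)] gives the natural ceiling on any achievable bottleneck rate:

```python
# Illustrative sketch: Monte-Carlo estimate of the ergodic capacity of
# Channel 1, E[log2 det(I_M + rho * H H^H)], in bits per channel use.
import numpy as np

def channel_1_capacity(K, M, rho, trials=2000, rng=None):
    """Average log2 det(I + rho * H H^H) over i.i.d. Rayleigh fading draws."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(trials):
        # i.i.d. CN(0,1) entries: real and imaginary parts are N(0, 1/2)
        H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        total += np.log2(np.linalg.det(np.eye(M) + rho * H @ H.conj().T).real)
    return total / trials

print(channel_1_capacity(K=2, M=4, rho=10.0))
```

Any bottleneck rate computed later in the paper must lie below both this quantity and the link capacity C.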

III. INFORMED RECEIVER UPPER BOUND
As stated in [17], an obvious upper bound to problem (3) can be obtained by letting both the relay and the destination node know the channel matrix H. We call the bound in this case the informed receiver upper bound. The IB problem in this case takes on the following form:

max_{p(z|y,H)} I(x; z|H), s.t. I(y; z|H) ≤ C.  (4)

In [15], the IB problem for a scalar Gaussian channel with block fading has been studied. In the following theorem, we show that for the considered MIMO channel with Rayleigh fading, (4) can be decomposed into a set of parallel scalar IB problems, and the informed receiver upper bound can then be obtained based on the result in [15].
Theorem 1. For the considered MIMO channel with Rayleigh fading, the informed receiver upper bound is

R^ub = T ∫_{ν/ρ}^{∞} log((1 + ρλ)/(1 + ν)) f_λ(λ) dλ,  (5)

where T = min{K, M}, the probability density function (pdf) of λ, i.e., f_λ(λ), is given by (53), and ν is chosen such that the following bottleneck constraint is met:

T ∫_{ν/ρ}^{∞} log(ρλ/ν) f_λ(λ) dλ = C.  (6)

Proof: See Appendix A.
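To make Theorem 1 concrete, here is an illustrative numerical sketch (ours, not the paper's; the function name and bisection tolerances are assumptions): it replaces the integral against f_λ(λ) in (5) and (6) with Monte-Carlo samples of the unordered positive eigenvalues of HH^H and finds the water level ν by bisection.

```python
# Sketch of the informed-receiver upper bound: sample eigenvalues of H H^H,
# bisect on the water level nu so that T*E[(log2(rho*lam/nu))^+] = C, then
# evaluate R_ub = T*E[(log2((1+rho*lam)/(1+nu)))^+].
import numpy as np

def informed_receiver_bound(K, M, rho, C, trials=2000, rng=0):
    rng = np.random.default_rng(rng)
    T = min(K, M)
    lams = []
    for _ in range(trials):
        H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        ev = np.linalg.eigvalsh(H @ H.conj().T).real
        lams.extend(ev[ev > 1e-9])          # keep the T positive eigenvalues
    lam = np.asarray(lams)

    def used_capacity(nu):                   # T * E[(log2(rho*lam/nu))^+]
        return T * np.mean(np.maximum(np.log2(rho * lam / nu), 0.0))

    lo, hi = 1e-9, rho * lam.max()
    for _ in range(200):                     # used_capacity decreases in nu
        nu = 0.5 * (lo + hi)
        if used_capacity(nu) > C:
            lo = nu
        else:
            hi = nu
    active = rho * lam > nu
    return T * np.mean(np.where(active, np.log2((1 + rho * lam) / (1 + nu)), 0.0))

print(informed_receiver_bound(K=2, M=4, rho=10.0, C=6.0))
```

Since log((1+ρλ)/(1+ν)) ≤ log(ρλ/ν) whenever ρλ > ν, the returned value never exceeds C, as the theorem requires.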
Proof: See Appendix B.

IV. ACHIEVABLE SCHEMES
In this section, we provide two achievable schemes where each scheme satisfies the bottleneck constraint and gives a lower bound to the bottleneck rate.
A. Quantized channel inversion (QCI) scheme when K ≤ M

In our first achievable scheme, the relay first obtains an estimate of the channel input using channel inversion, and then transmits the quantized noise levels as well as the compressed noisy signal to the destination node.
In particular, we apply the pseudo-inverse of H, i.e., (H^H H)^{−1} H^H, to y, and get the zero-forcing estimate of x as follows:

x̃ = (H^H H)^{−1} H^H y = x + ñ.  (8)

For a given channel matrix H, ñ ∼ CN(0, A), where A = σ²(H^H H)^{−1}, and A_1 and A_2 respectively consist of the diagonal and off-diagonal elements of A, i.e., A_1 = A ∘ I_K and A_2 = A − A_1. If H could be perfectly transmitted to the destination node, the bottleneck rate could be obtained by following steps similar to those in Appendix A. However, since H follows a non-degenerate continuous distribution and the bottleneck constraint is finite, this is not possible. To reduce the number of bits per channel use required for informing the destination node of the channel information, we only convey a compressed version of A_1 and consider a set of independent scalar Gaussian sub-channels. Specifically, we force each diagonal entry of A_1 to belong to a finite set of quantized levels by adding artificial noise, i.e., by introducing physical degradation. We fix a finite grid of J positive quantization points B = {b_1, · · · , b_J} with b_J = +∞. Then, by adding a Gaussian noise vector ñ′ ∼ CN(0, ⌈A_1⌉_B − A_1), which is independent of everything else, to (8), a degraded version of x can be obtained as follows:

x̄ = x̃ + ñ′ = x + n̄,  (10)

where n̄ ∼ CN(0, ⌈A_1⌉_B + A_2) for a given H, and ⌈A_1⌉_B denotes the diagonal matrix obtained by rounding each diagonal entry of A_1 up to its quantized level. Obviously, due to A_2, the elements of the noise vector n̄ are correlated.
To evaluate the bottleneck rate, we consider a new variable

x̃_g = x + ñ_g,  (11)

where ñ_g ∼ CN(0, ⌈A_1⌉_B). Obviously, (11) can be seen as K parallel scalar Gaussian sub-channels with noise power ⌈a_k⌉_B for each sub-channel. Since each quantized noise level ⌈a_k⌉_B only has J possible values, it is possible for the relay to inform the destination node of the channel information via the constrained link. Note that from the definition of A in (8), it is known that a_k, ∀k ∈ K ≜ {1, · · · , K} are correlated. The quantized noise levels ⌈a_k⌉_B, ∀k ∈ K are thus also correlated. Hence, we can jointly source-encode ⌈a_k⌉_B, ∀k ∈ K to further reduce the number of bits used for CSI feedback. However, since the joint entropy of the quantization indices is difficult to obtain (even numerically, since it is a discrete joint distribution over J^K possible values), in this work we consider the (slightly) suboptimal, but far more practical, entropy coding of each sub-channel quantization index separately. The resulting optimization problem becomes

max_{p(ẑ_g|x̃_g)} I(x; ẑ_g | ⌈A_1⌉_B), s.t. I(x̃_g; ẑ_g | ⌈A_1⌉_B) + Σ_{k∈K} H_k ≤ C,  (12)

where H_k denotes the entropy of ⌈a_k⌉_B. In Appendix C, we show that a_k, ∀k ∈ K are marginally identically inverse chi-squared distributed with M − K + 1 degrees of freedom. Hence, H_k = H_0 ≜ −Σ_{j=1}^{J} P_j log P_j, where P_j = Pr{⌈a⌉_B = b_j} and a follows the same distribution as a_k. The pdf of a is given in (68), based on which the probability mass function (pmf) P_j can be calculated as in (69). In the following theorem, we give a lower bound to the bottleneck rate by solving IB problem (12).

Theorem 2. If ⌈A_1⌉_B is conveyed to the destination node for each channel realization, then by solving IB problem (12), the following lower bound to the bottleneck rate can be obtained:

R_lb1 = K Σ_{j=1}^{J−1} P_j [log((1 + ρ_j)/(1 + ν))]^+,  (13)

where ρ_j = 1/b_j, c_j = [log(ρ_j/ν)]^+, and ν is chosen such that the following bottleneck constraint is met:

K Σ_{j=1}^{J−1} P_j c_j = C − K H_0.  (14)

Proof: See Appendix C.
Since (11) can be seen as K parallel scalar Gaussian sub-channels, according to [15, (16)], the representation of x̃_g, i.e., ẑ_g, can be constructed by adding independent fading and Gaussian noise to each element of x̃_g. Denote

ẑ_g = Φ x̃_g + n_g,  (15)

where Φ is a diagonal matrix with positive and real diagonal entries, and n_g ∼ CN(0, I_K). Note that x̃_g in (11) and its representation ẑ_g in (15) are only auxiliary variables. What we are really interested in is the representation of x̄ and the corresponding bottleneck rate. Hence, we also add fading Φ and Gaussian noise n_g to x̄ in (10) and get its representation as follows:

z = Φ x̄ + n_g.  (16)

In the following lemma, we show that by transmitting the quantized noise levels ⌈a_k⌉_B, ∀k ∈ K and the representation z to the destination node, R_lb1 is an achievable lower bound to the bottleneck rate and the bottleneck constraint is satisfied.

Lemma 2. If ⌈A_1⌉_B is forwarded to the destination node for each channel realization, then with the signal vectors x̄ and x̃_g in (10) and (11), and their representations z and ẑ_g in (16) and (15), we have

I(x̄; z | ⌈A_1⌉_B) ≤ C − K H_0,  (17)
I(x; z | ⌈A_1⌉_B) ≥ R_lb1,  (18)

where (17) shows that the bottleneck constraint is satisfied and (18) shows that R_lb1 is achievable.
Proof: See Appendix D.
Lemma 3. When M → +∞ or ρ → +∞, we can always find a sequence of quantization points B such that R_lb1 → C. When C → +∞,

R_lb1 → K E[log(1 + 1/a)] ≤ I(x; y, H),

where the expectation can be calculated by using the pdf of a in (68) and I(x; y, H) is the capacity of Channel 1.
Proof: See Appendix E.
For the sake of simplicity, we may choose the quantization levels as quantiles such that we obtain the uniform pmf P_j = 1/J. The lower bound (13) can thus be simplified as

R_lb1 = (K/J) Σ_{j=1}^{J−1} [log((1 + ρ_j)/(1 + ν))]^+,

and the bottleneck constraint (14) becomes

(K/J) Σ_{j=1}^{J−1} [log(ρ_j/ν)]^+ = C − K B,

where B = log J can be seen as the number of bits required for quantizing each diagonal entry of A_1. Since ρ_1 > ρ_2 > · · · > ρ_{J−1}, from the strict convexity of the problem, we know that there must exist a unique integer l, 1 ≤ l ≤ J − 1, such that only the l strongest levels are active, i.e., ρ_j > ν for j ≤ l and ρ_j ≤ ν for j > l. Hence, ν can be obtained from the constraint and R_lb1 can then be calculated directly. We only need to test this condition for l = 1, 2, 3, · · · until the unique l is found. Note that the remaining budget C − K B has to be positive, i.e., B < C/K. Moreover, though choosing the quantization levels as quantiles makes it easier to calculate R_lb1, the results in Lemma 3 may not hold in this case, since the choice of quantization points is then fixed by the distribution of a rather than free to optimize.
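The quantile-based computation above can be sketched numerically as follows (a hypothetical implementation, not the paper's code: the function name, Monte-Carlo quantile estimation, and parameter choices are ours):

```python
# Sketch of the QCI lower bound with quantile quantization (uniform pmf
# P_j = 1/J): the noise level a = sigma^2 * [(H^H H)^{-1}]_{kk} is sampled by
# Monte Carlo, its quantiles give the finite levels b_1..b_{J-1} (b_J = +inf
# contributes rate 0 and is omitted), and the remaining budget C - K*B
# (B = log2 J bits of CSI per sub-channel) is water-filled over the levels.
import numpy as np

def qci_lower_bound(K, M, sigma2, C, J=8, trials=2000, rng=0):
    rng = np.random.default_rng(rng)
    samples = []
    for _ in range(trials):
        H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        A = sigma2 * np.linalg.inv(H.conj().T @ H)
        samples.append(A[0, 0].real)          # diagonal entries are identically distributed
    b = np.quantile(samples, np.arange(1, J) / J)   # j/J-quantiles, all positive
    rho_j = 1.0 / b                                  # per-level SNR
    B = np.log2(J)
    budget = C - K * B                               # bits left for the signal itself
    assert budget > 0, "need B < C/K"

    def used(nu):                                    # (K/J) * sum_j [log2(rho_j/nu)]^+
        return K * np.sum(np.maximum(np.log2(rho_j / nu), 0.0)) / J

    lo, hi = 1e-12, rho_j.max()
    for _ in range(200):                             # bisection: used(nu) decreases in nu
        nu = 0.5 * (lo + hi)
        if used(nu) > budget:
            lo = nu
        else:
            hi = nu
    return K * np.sum(np.maximum(np.log2((1 + rho_j) / (1 + nu)), 0.0)) / J

print(qci_lower_bound(K=2, M=4, sigma2=0.1, C=10.0))
```

The returned value can never exceed the remaining budget C − K·B, reflecting the CSI-feedback overhead of this scheme.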

B. MMSE estimate at the relay
In the second achievable scheme, we assume that the relay first produces the MMSE estimate of x given (y, H), and then source-encodes this estimate. Denote

F = (HH^H + σ² I_M)^{−1} H.  (24)

The MMSE estimate of x is thus given by

x̂ = F^H y.  (25)

Then, we consider the following modified IB problem:

max_{p(z|x̂)} I(x; z), s.t. I(x̂; z) ≤ C.  (26)

Note that since the matrix HH^H + σ² I_M in (24) is always invertible, the results obtained in this subsection hold no matter whether K ≤ M or K > M.
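As a minimal sketch (not the paper's code; the simulation parameters are ours) of the relay processing in this scheme, the MMSE estimate in (25) can be computed and sanity-checked as follows:

```python
# Sketch of the relay's MMSE estimator: x_hat = H^H (H H^H + sigma^2 I_M)^{-1} y,
# equivalently (H^H H + sigma^2 I_K)^{-1} H^H y.
import numpy as np

def mmse_estimate(H, y, sigma2):
    M = H.shape[0]
    return H.conj().T @ np.linalg.solve(H @ H.conj().T + sigma2 * np.eye(M), y)

# Quick check: the average MMSE error power stays below the prior power E||x||^2 = K.
rng = np.random.default_rng(0)
K, M, sigma2, trials = 2, 4, 0.1, 500
mse = 0.0
for _ in range(trials):
    H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
    x = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
    n = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) * np.sqrt(sigma2 / 2)
    x_hat = mmse_estimate(H, H @ x + n, sigma2)
    mse += np.sum(np.abs(x - x_hat) ** 2) / trials
print(mse)  # well below K at this SNR
```

Using `np.linalg.solve` instead of an explicit inverse is purely a numerical-stability choice; the estimator is identical to (24)-(25).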
To evaluate the bottleneck rate I(x; z), we define an auxiliary Gaussian vector x̂_g ∼ CN(0, E[x̂ x̂^H]), let ẑ_g denote its representation, and choose p(z|x̂) as well as p(ẑ_g|x̂_g) to be conditionally Gaussian, i.e., z = x̂ + q and ẑ_g = x̂_g + q, where q ∼ CN(0, D I_K) is independent of everything else. Setting I(x̂_g; ẑ_g) = C, the noise power D can be calculated from (29). Since I(x̂; z) ≤ I(x̂_g; ẑ_g) = C, the compression rate I(x̂; z) also satisfies the bottleneck constraint.
In the following, we obtain a lower bound to I(x; z) by evaluating h(z|H) and h(z|x) separately, and then using

I(x; z) ≥ h(z|H) − h(z|x),  (30)

which holds because I(x; z) = h(z) − h(z|x) and h(z) ≥ h(z|H). First, since z is conditionally Gaussian given H, h(z|H) can be obtained in closed form, as given in (31). Next, using the fact that conditioning reduces entropy and that the Gaussian distribution maximizes the entropy over all distributions with the same variance [18, Theorem 8.6.5], h(z|x) can be upper-bounded as in (32), where G is defined in (33). Combining (30), (31), and (32), we get a lower bound to I(x; z), as shown in the following theorem.
Theorem 3. With the MMSE estimate at the relay, a lower bound R_lb2 to I(x; z) can be obtained as given in (34) and (35), where the expectations can be calculated by using the pdf of λ in (53).
Proof: See Appendix F.

V. NUMERICAL RESULTS
In this section, we investigate the lower bounds obtained by the proposed achievable schemes and compare them with the upper bound derived in Section III.When performing the QCI scheme, we choose the quantization levels as quantiles for the sake of convenience.
In Fig. 1, the upper and lower bounds are depicted versus the SNR ρ. It can be seen that when ρ is small and 4 or 8 bits are applied to quantize the noise levels, the QCI scheme outperforms the MMSE scheme. As ρ grows large, R_lb2 obtained by the MMSE scheme approaches C and exceeds R_lb1. This is because when ρ is small, the bottleneck rate is mainly limited by the capacity of Channel 1, and the QCI scheme works well in this case since partial CSI, i.e., the noise level of each sub-channel, is conveyed to the destination node. When ρ is large, the MMSE scheme obtains an accurate estimate and requires no CSI feedback; the MMSE scheme thus performs better in this regime.
The effect of the bottleneck constraint C is investigated in Fig. 2. It can be seen that as C increases, all bounds grow and converge to different constants, which can be calculated based on Lemma 1, Lemma 3, and Lemma 4, respectively. Fig. 2 also shows that R_lb2 virtually achieves the upper bound when C is small, while when C is large, the QCI scheme outperforms the MMSE scheme thanks to CSI feedback.
Fig. 3 depicts the bounds versus the number of relay antennas M. As M increases, R_lb2 quickly approaches R^ub. It is also shown that the limit result of Lemma 3, i.e., that when M → +∞ suitable quantization points B = {b_1, · · · , b_J} can always be found such that R_lb1 → C, does not hold here. This is because when performing the QCI scheme we choose the quantization levels as quantiles; the choice of quantization points is thus constrained and the conditions of Lemma 3 are not met.

VI. CONCLUSIONS
This work extends the IB problem from the scalar case in [17] to MIMO Rayleigh fading channels. Due to the information bottleneck constraint, the destination node cannot get the perfect CSI from the relay. Our results show that with simple symbol-by-symbol oblivious relay processing and compression, a bottleneck rate close to the upper bound can be achieved over a wide range of relevant system parameters.

APPENDIX A PROOF OF THEOREM 1
Before proving Theorem 1, we first consider the following scalar Gaussian channel:

y = sx + n,  (37)

where x ∼ CN(0, 1), n ∼ CN(0, σ²), and s ∈ C is the deterministic channel gain. With bottleneck constraint C, the IB problem for (37) has been studied in [15], and the optimal bottleneck rate is given by

R = log(1 + ρ|s|²) − log(1 + ρ|s|² 2^{−C}),  (38)

where ρ = 1/σ². In the following, we show that (4) can be decomposed into a set of parallel scalar IB problems, and (38) can then be applied to get the upper bound R^ub in Theorem 1.
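For completeness, (38) can be rederived in a few lines (a sketch under the stated Gaussian model, using the fact from [15] that a representation of the form z = y + q with Gaussian q is optimal here):

```latex
% Scalar Gaussian IB: y = s x + n, x ~ CN(0,1), n ~ CN(0,\sigma^2),
% representation z = y + q with q ~ CN(0,d) independent of y.
% The constraint I(y;z) = C fixes the noise power d:
I(y;z) = \log\frac{|s|^2+\sigma^2+d}{d} = C
\;\Rightarrow\;
d = \frac{|s|^2+\sigma^2}{2^C-1}.
% Substituting d into I(x;z) and writing \rho = 1/\sigma^2:
I(x;z) = \log\frac{|s|^2+\sigma^2+d}{\sigma^2+d}
       = \log\!\left(1+\rho|s|^2\right)
         -\log\!\left(1+\rho|s|^2\,2^{-C}\right).
```

The two limits are the expected ones: as C → +∞ the rate tends to the channel capacity log(1 + ρ|s|²), and as C → 0 it vanishes.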
According to the definition of conditional entropy, problem (4) can be rewritten as in (39). Then, for a given channel realization H = H̄, the transformed observation ŷ is conditionally Gaussian, as shown in (41); we work with ŷ instead of y in the following.
Based on (39) and (41), it is known that the MIMO channel p(ŷ|x, H) can first be divided into a set of parallel channels for different realizations of H, and each channel p(ŷ|x, H = H̄) can be further divided into T independent scalar Gaussian channels with SNRs ρλ_t, ∀t ∈ T. Accordingly, problem (4) can be decomposed into a set of parallel IB problems. For a scalar Gaussian channel with SNR ρλ_t, let c^ub_t denote the allocation of the bottleneck constraint C and R^ub_t denote the corresponding rate. According to (38), we have

R^ub_t = log(1 + ρλ_t) − log(1 + ρλ_t 2^{−c^ub_t}).

Then, the solution of problem (4) can be obtained by solving the following problem:

max E[Σ_{t∈T} R^ub_t], s.t. E[Σ_{t∈T} c^ub_t] ≤ C.  (44)

Assume that λ_t, ∀t ∈ T are the unordered positive eigenvalues of HH^H. Then, they follow the same distribution. For convenience, define a new variable λ which follows the same distribution as λ_t; the subscript 't' in c^ub_t and R^ub_t can thus be omitted. In order to distinguish it from R^ub in (5), we use R^ub_0 to denote the bottleneck rate corresponding to c^ub, i.e.,

R^ub_0 = log(1 + ρλ) − log(1 + ρλ 2^{−c^ub}).

Then, we have E[Σ_{t∈T} R^ub_t] = T E[R^ub_0], and problem (44) thus becomes

max T E[R^ub_0], s.t. T E[c^ub] ≤ C.  (47)

This problem can be solved by the water-filling method.
Consider the Lagrangian of problem (47), with Lagrange multiplier α for the bottleneck constraint. The KKT condition for optimality yields the water-filling solution

c^ub = [log(ρλ/ν)]^+,

where ν = α/(1 − α) and it is chosen such that the following bottleneck constraint is met:

T E[(log(ρλ/ν))^+] = C.  (51)

The informed receiver upper bound is thus given by

R^ub = T E[(log((1 + ρλ)/(1 + ν)))^+].  (52)

From the definition of H in (2), it is known that when K ≤ M (resp., when K > M), H^H H (resp., HH^H) is a central complex Wishart matrix with M (resp., K) degrees of freedom and covariance matrix I_K (resp., I_M), i.e., H^H H ∼ CW_K(M, I_K) (resp., HH^H ∼ CW_M(K, I_M)) [20]. Since λ can be seen as one of the unordered positive eigenvalues of H^H H or HH^H, its pdf is given by [20, Theorem 2.17], [19] as in (53), where S = max{K, M} and the associated Laguerre polynomials are given in (54). Substituting (53) and (54) into (52) and (51), (5) and (6) can be obtained. Theorem 1 is thus proven.

APPENDIX B PROOF OF LEMMA 1
In order to prove that R^ub approaches C as M → +∞, we first look at the special case with K = 1. In this case, S = M and T = 1. From (54) and (53), we have L^{S−T}_0 = 1, and the pdf of λ reduces to

f_λ(λ) = λ^{M−1} e^{−λ} / (M − 1)!,

which shows that λ follows an Erlang distribution with shape parameter M and rate parameter 1, i.e., λ ∼ Erlang(M, 1).
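This Erlang claim is easy to check numerically (an illustrative sketch, not from the paper; the function name is ours): for K = 1 the single positive eigenvalue of HH^H is ‖h‖², a sum of M i.i.d. unit-mean exponentials.

```python
# Sanity check: for K = 1 the eigenvalue lambda = ||h||^2 with h ~ CN(0, I_M)
# is a sum of M i.i.d. Exp(1) variables, i.e. Erlang(M, 1), so both its mean
# and its variance equal M.
import numpy as np

def eigenvalue_samples(M, trials=200000, rng=0):
    rng = np.random.default_rng(rng)
    h = (rng.standard_normal((trials, M)) + 1j * rng.standard_normal((trials, M))) / np.sqrt(2)
    return np.sum(np.abs(h) ** 2, axis=1)

lam = eigenvalue_samples(M=8)
print(lam.mean(), lam.var())  # both close to 8
```

The concentration of λ/M around 1 as M grows is exactly the delta-function behavior invoked below.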
The expectation of λ is thus M. As M → +∞, f_λ(λ) becomes a delta function [21]. Hence, for a sufficiently small positive real number ε,

lim_{M→+∞} Pr{|λ/M − 1| ≤ ε} = 1.  (56)

Then, when M → +∞, the bottleneck constraint (6) becomes (57), based on which we get (58). Using (5), (56), and (58), it is known that when M → +∞, R^ub → C.

Next, we consider the general case. For any positive integer K, when M → +∞, based on the definition of H and the strong law of large numbers, we almost surely have (1/M) H^H H − I_K → 0. Since HH^H and H^H H have the same positive eigenvalues, λ/M → 1 almost surely, so (56) also holds for this general case. Then, following the same steps as in the special case, we get R^ub → C when M → +∞.

Now we prove that R^ub approaches C as ρ → +∞. From (6), it can be seen that ∫_{ν/ρ}^{∞} log(ρλ/ν) f_λ(λ) dλ decreases with ν. Therefore, when ρ → +∞, to ensure that constraint (6) holds, ν must grow large; we then have R^ub → C. In addition, when C → +∞, it can be found from (6) that ν → 0. Using (5), we can get (7), which is the capacity of Channel 1. This completes the proof.

APPENDIX C PROOF OF THEOREM 2
Since ñ_g ∼ CN(0, ⌈A_1⌉_B) and ⌈a_k⌉_B has J possible values, i.e., b_1, · · · , b_J, the channel in (11) can be divided into KJ independent scalar Gaussian sub-channels with noise power ⌈a_k⌉_B = b_j for each sub-channel. For the sub-channel with noise power ⌈a_k⌉_B = b_j, let c_{k,j} denote the allocation of the bottleneck constraint C and R_{k,j} denote the corresponding rate. According to (38), we have

R_{k,j} = log(1 + ρ_j) − log(1 + ρ_j 2^{−c_{k,j}}),

where ρ_j = 1/b_j. Since b_J = +∞, we let R_{k,J} = 0 and c_{k,J} = 0. Note that based on [15, (16)], the representation of x̃_g, i.e., ẑ_g, can be constructed by adding independent fading and Gaussian noise to each element of x̃_g in (11). Then, the optimal I(x; ẑ_g | ⌈A_1⌉_B) is equal to the objective function of problem (66). Since K ≤ M, (H^H H)^{−1} follows a complex inverse Wishart distribution, and the diagonal elements of (H^H H)^{−1} are identically inverse chi-squared distributed with M − K + 1 degrees of freedom [22]. Let η denote one of the diagonal elements of (H^H H)^{−1}; the pdf of η is given in (67). Since A = σ²(H^H H)^{−1}, the diagonal entries of A, i.e., a_k, ∀k ∈ K, are marginally identically distributed. Let a denote a new variable with the same distribution as a_k; a thus follows the same distribution as σ²η, and its pdf is given in (68). In addition, P_{k,j}, R_{k,j}, and c_{k,j} can be simplified to P_j, R_j, and c_j by dropping the subscript 'k'. Using (68), the pmf P_j can be calculated as in (69). Problem (66) thus becomes a water-filling problem over the J − 1 finite noise levels, as given in (70). Analogous to problem (47), (70) can be optimally solved by the water-filling method. The following lower bound to the bottleneck rate can thus be obtained:

R_lb1 = K Σ_{j=1}^{J−1} P_j [log((1 + ρ_j)/(1 + ν))]^+,

where c_j = [log(ρ_j/ν)]^+ and ν is chosen such that the bottleneck constraint is met. Theorem 2 is then proven.
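The inverse chi-squared claim can be sanity-checked by Monte Carlo (an illustrative sketch, not from the paper; in particular, the mean value 1/(M − K) used below follows from the inverse-Gamma form of the distribution and is our addition):

```python
# Sanity check: each diagonal entry of (H^H H)^{-1} is marginally the inverse
# of a Gamma(M-K+1, 1) variable ("inverse chi-squared with M-K+1 complex
# degrees of freedom"), so its mean is 1/(M-K) whenever M > K.
import numpy as np

def inv_wishart_diag(K, M, trials=20000, rng=0):
    rng = np.random.default_rng(rng)
    out = np.empty(trials)
    for t in range(trials):
        H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        out[t] = np.linalg.inv(H.conj().T @ H)[0, 0].real
    return out

eta = inv_wishart_diag(K=2, M=6)
print(eta.mean())  # close to 1/(6-2) = 0.25
```

Scaling these samples by σ² gives draws of the noise level a used throughout the QCI analysis.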

APPENDIX D PROOF OF LEMMA 2
Since Φ is a diagonal matrix with positive and real diagonal entries, it is invertible. Denote

z̄ = Φ^{−1} z = x̄ + Φ^{−1} n_g.

For a given ⌈A_1⌉_B, each element of n̄ is Gaussian distributed with zero mean and variance ⌈a_k⌉_B. However, n̄ is not a Gaussian vector since H is unknown. Hence, z is not a Gaussian vector either. As for ẑ_g, from (11) and (15), it is known that ẑ_g is conditionally Gaussian given ⌈A_1⌉_B. We first prove inequality (17).
The entropy terms in (17) can be upper-bounded by a chain of inequalities ending in an E[log det(·)] expression, where (a) holds since the Gaussian distribution maximizes the entropy over all distributions with the same variance, and (b) follows by using Hadamard's inequality. Then, we prove inequality (18). Using the chain rule of mutual information, I(x; z | ⌈A_1⌉_B) can be lower-bounded, where (a) holds since for a given ⌈A_1⌉_B both z̄_k and ẑ_{g,k} follow the same conditional Gaussian distribution, and (b) follows since the elements of x and ẑ_g are independent.
APPENDIX E PROOF OF LEMMA 3
As stated in Appendix B, when M → +∞, we almost surely have A − (σ²/M) I_K → 0. Choose J = 2, b_1 = σ²/M + ε, and b_2 = +∞, where ε is a sufficiently small positive real number. Since A − (σ²/M) I_K → 0, we have P_1 → 1 and H_0 → 0. Then, from (13) and (14), R_lb1 → C. When ρ → +∞, σ² → 0 and hence A → 0. By setting J = 2 and b_1 small enough, it can be proven as above that R_lb1 → C.
When C → +∞, we could choose quantization points B = {b_1, · · · , b_J} with sufficiently large J such that the diagonal entries of A_1, which are continuously valued, can be represented precisely by the discretely valued points in B, and the representation indices of all diagonal entries can be transmitted to the destination node since C is large enough. On the other hand, as shown in (15), a representation of x̃_g is ẑ_g = Φ x̃_g + n_g, where Φ is a diagonal matrix with positive and real diagonal entries, and n_g ∼ CN(0, I_K). As C → +∞, according to [15, (17) and (20)], the diagonal entries of Φ grow without bound. Since Φ is invertible, denote

Φ^{−1} ẑ_g = x̃_g + Φ^{−1} n_g.  (81)

From (80), it is known that the elements of the noise vector Φ^{−1} n_g have zero mean and vanishing power as C → +∞. Hence, (x, Φ^{−1} ẑ_g) → (x, x̃_g) in distribution. Then, based on [23], we have

lim inf_{C→+∞} I(x; ẑ_g | ⌈A_1⌉_B) ≥ I(x; x̃_g | ⌈A_1⌉_B).  (82)

In addition, since the Gaussian noise vector ñ_g (defined in (11)) is independent of x, and Φ^{−1} n_g in (81) is independent of both x and ñ_g, x → x̃_g → ẑ_g forms a Markov chain. Then, according to the data-processing inequality, we have

I(x; ẑ_g | ⌈A_1⌉_B) ≤ I(x; x̃_g | ⌈A_1⌉_B).  (83)

Combining (83) and (82), it follows that the limit lim_{C→+∞} I(x; ẑ_g | ⌈A_1⌉_B) exists and equals I(x; x̃_g | ⌈A_1⌉_B). Then, when C → +∞,

R_lb1 → I(x; x̃_g | ⌈A_1⌉_B) = K E[log(1 + 1/a)].  (85)

On the other hand, the capacity of Channel 1 is given by

I(x; y, H) = E[log det(I_K + ρ H^H H)].  (86)

To prove that (85) is upper bounded by (86), we first give and prove the following lemma.
Lemma 5. For any K-dimensional positive definite matrix N, let N_1 = N ∘ I_K, i.e., let N_1 consist of the diagonal elements of N. Then,

log det(I_K + N^{−1}) ≥ log det(I_K + N_1^{−1}).  (87)

Proof: Obviously, (87) is equivalent to

log det(N_1) − log det(N) ≥ log det(I_K + N_1) − log det(I_K + N).  (88)

To prove (88), we introduce an auxiliary function g_1(x) = log det(xI_K + N_1) − log det(xI_K + N) and show that g_1(x) decreases monotonically w.r.t. x when x ≥ 0; (88) then follows from g_1(0) ≥ g_1(1). By taking the first-order derivative of g_1(x), we have

g_1′(x) = tr((xI_K + N_1)^{−1}) − tr((xI_K + N)^{−1}).

To prove g_1′(x) ≤ 0, we show in the following that for any positive definite matrix O, we always have

tr(O_1^{−1}) ≤ tr(O^{−1}),

where O_1 consists of the diagonal elements of O, i.e., O_1 = O ∘ I_K. Let o and θ denote the vectors of diagonal entries and eigenvalues of O, respectively. Since O is a positive definite matrix, the entries of o and θ are real and positive. In addition, according to the Schur-Horn theorem, o is majorized by θ, i.e., o ≺ θ.
Define a real vector u as in (91). Using (92), we obtain (93), which shows that Σ_i 1/o_i ≤ Σ_i 1/θ_i. This gives g_1′(x) ≤ 0, and (87) can then be proven.
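Lemma 5, as reconstructed here (log det(I_K + N^{−1}) ≥ log det(I_K + N_1^{−1})), can be checked numerically on random positive definite matrices (an illustrative sketch, not from the paper; the function name is ours):

```python
# Numerical check of the Hadamard-type inequality of Lemma 5: for a positive
# definite N with diagonal part N1, log det(I + inv(N)) >= log det(I + inv(N1)).
import numpy as np

def lemma5_holds(K, trials=200, rng=0):
    rng = np.random.default_rng(rng)
    for _ in range(trials):
        B = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
        N = B @ B.conj().T + 1e-3 * np.eye(K)   # random positive definite matrix
        N1 = np.diag(np.diag(N))                # its diagonal part
        lhs = np.log(np.linalg.det(np.eye(K) + np.linalg.inv(N)).real)
        rhs = np.log(np.linalg.det(np.eye(K) + np.linalg.inv(N1)).real)
        if lhs < rhs - 1e-9:
            return False
    return True

print(lemma5_holds(4))
```

Applied with N = A (so that A^{−1} = ρ H^H H), this is exactly the step that bounds (85) by (86).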
Then, from (85), (86), and Lemma 5, it is known that when C → +∞,

R_lb1 → K E[log(1 + 1/a)] ≤ I(x; y, H),

where the expectation can be calculated by using the pdf of a in (68). Lemma 3 is thus proven.

APPENDIX F PROOF OF THEOREM 3
As stated in Appendix A, UΛU^H is the eigendecomposition of HH^H, and λ_t, ∀t ∈ T are the unordered positive eigenvalues of HH^H. To derive R_lb2, we further denote the singular value decomposition of H by ULV^H, where V ∈ C^{K×K} is a unitary matrix and L ∈ R^{M×K} is a rectangular diagonal matrix. In fact, the diagonal entries of L are the non-negative square roots of the positive eigenvalues of HH^H. Then, from (25), we have (95), where 0_{K−T} is a (K − T)-dimensional all-zero column vector.
Based on (95), we obtain an expression in which 1_{K−T} is a (K − T)-dimensional all-one column vector. Since Λ is independent of U, L is independent of U as well as V, and λ_t, ∀t ∈ T are unordered, we have (97). Then, we calculate G in (33). For this purpose, we have to calculate E[F^H H], E[F^H H H^H F], and E[F^H F]. To get these expectations, we consider two different cases, i.e., the case with K ≤ M and the case with K > M. When K ≤ M, from (95), we have (98). When K > M, denote V = (v_1, · · · , v_K); the corresponding expressions then follow from (95). Since v_m is an eigenvector of the matrix H^H H and is independent of the unordered eigenvalue λ_m, we have (100). Similarly, we also have (101). Using (98), (100), (101), and (33), G can be calculated as in (103). Substituting (97) and (103) into (31) and (32), respectively, and using (30), we can get (34).
and define the following ceiling operation:

⌈a⌉_B = min{b ∈ B : a ≤ b}.

Lemma 1. When M → +∞ or ρ → +∞, upper bound R^ub tends asymptotically to C. When C → +∞, R^ub approaches the capacity of Channel 1, i.e., (7).

Let UΛU^H denote the eigendecomposition of HH^H, where U is a unitary matrix whose columns are the eigenvectors of HH^H, and Λ is a diagonal matrix whose diagonal elements are the eigenvalues of HH^H. Since the rank of HH^H is no greater than T = min{K, M}, there are at most T positive diagonal entries in Λ. Denote them by λ_t, where t ∈ T and T ≜ {1, · · · , T}.