Converse Results for the Downlink Multicell Processing With Finite Backhaul Capacity

In this paper, we study outer bounds on the capacity region of the downlink multicell processing model with finite backhaul capacity for the simple case of two base stations and two mobile users. It is modeled as a two-user multiple access diamond channel. It consists of a first hop from the central processor to the base stations via orthogonal links of finite capacity and the second hop from the base stations to the mobile users via a Gaussian interference channel. The outer bound is derived using the converse tools of the multiple access diamond channel and that of the Gaussian MIMO broadcast channel. Through numerical results, it is shown that our outer bound improves upon the existing outer bounds greatly in the medium backhaul capacity range, and as a result, the gap between the outer bounds and the rate of the time-sharing of the known achievable schemes is significantly reduced.


Tianyu Yang, Nan Liu, Wei Kang, and Shlomo Shamai (Shitz)
Index Terms-Multi-cell processing, diamond channel, converse, MIMO broadcast channel.

I. INTRODUCTION
The multi-cell processing system, as reviewed in [1], has been used to increase throughput and to cope with inter-cell interference. The downlink multi-cell processing system, as first considered, consists of base stations linked to the central processor via backhaul links of unlimited capacity, so that the amount of cooperation among the base stations is unbounded. This network can be modeled as a MIMO broadcast channel, and its sum-rate characterization was found in [2]. Later, since backhaul links of unlimited capacity are impractical, Simeone et al. [3], Park et al. [4], Hong and Caire [5], Liu and Kang [6], and Yi and Liu [7] studied the problem of finding the capacity region of the downlink multicell processing system when the capacities of the backhaul links are finite, see Fig. 1. These papers proposed various achievable schemes to efficiently utilize the finite-capacity backhaul links. More specifically, in [3], a compressed dirty-paper coding scheme is proposed, where the base stations are treated as the antennas of the central processor and the dirty-paper coding codewords for each antenna are compressed and transmitted over the backhaul links. This scheme is improved in [4] by allowing the quantization noises at the base stations to be correlated. The scheme of reverse compute-and-forward was proposed in [5], where linear precoding is performed at the central processor and the backhaul links are used to transmit linear combinations of the messages over a finite field. Such linear precoding transforms the channel seen at each mobile user into a point-to-point channel in which integer-valued interference is eliminated by precoding and the remaining non-integer residual interference is treated as noise.
By regarding the network model as a multi-user diamond channel, an achievability scheme is proposed in [6] and [7] by combining Marton's achievability for the broadcast channel [8] with the achievability of sending correlated codewords over a multiple access diamond channel [9], [10]. The outer bound on the capacity region of this network is unknown except for the simple cut-set bound [11] on the sum rate, which is the minimum of the capacity of the first hop from the central processor to the base stations and that of the second hop from the base stations to the mobile users. Utilizing the simple cut-set bound, some constant-gap results for the multicell processing system with finite backhaul capacity have recently been obtained [12]-[14]. When the capacities of the backhaul links are relatively large, the achievable sum rate of the compressed dirty-paper coding scheme approaches the simple cut-set bound. On the other hand, when the capacities of the backhaul links are relatively small, the reverse compute-and-forward scheme reaches the simple cut-set bound [6]. In the medium capacity range, there is still a relatively large gap between the simple cut-set upper bound and the sum rate of the time-sharing of the known achievable schemes, see Fig. 2. Hence, it is unknown how good the proposed achievable schemes are and whether further effort is needed to propose better achievable schemes for the downlink multicell processing system.
In this paper, we derive a novel outer bound on the capacity region of the downlink multicell processing network consisting of two base stations and two users. Similar to [6], we regard the network as a 2-user multiple access diamond channel. As a result, the converse is derived using the converse tools of the multiple access diamond channel used in [15] and [16] and those of the Gaussian MIMO broadcast channel used in [17]. The derived outer bound is expressed in terms of the capacity region of the Gaussian MIMO broadcast channel under an input covariance constraint, which has been found in [17]-[19].
Comparing numerically the proposed outer bound, the cut-set bound, and the performance of various achievable schemes for the multicell processing system in terms of the sum rate, we see that our outer bound improves upon the cut-set bound greatly in the medium backhaul capacity range; as a result, the gap between the outer bounds and the time-sharing of the known achievable schemes is significantly reduced.
The remainder of this paper is organized as follows. In Section II, we provide the system model. In Section III, we derive a new outer bound for the capacity region as well as an upper bound on the sum capacity. The proof of the main result is provided in Section IV. Numerical results are provided in Section V, and this is followed by conclusions in Section VI.

II. SYSTEM MODEL
In this paper, we consider the downlink multicell processing system with two base stations and two users. This network model can be seen as the 2-user Gaussian multiple access diamond channel [6], see Fig. 3. The source node (central processor) can communicate with Relays (base stations) 1 and 2 via backhaul links of capacities C_1 and C_2, respectively. The channel between the two relay nodes and the two destination nodes (mobile users) is characterized by

Y_1 = X_1 + b X_2 + U_1,   (1)
Y_2 = a X_1 + X_2 + U_2,   (2)

where Y_1 and Y_2 are the received signals at Users 1 and 2, respectively, X_1 and X_2 are the input signals from Relays 1 and 2, respectively, U_1, U_2 are two independent zero-mean unit-variance Gaussian random variables that are independent of (X_1, X_2), and a, b ∈ R are the channel gains from Relay 1 to Destination 2 and from Relay 2 to Destination 1, respectively. The transmitted signals at the two relays must satisfy the average power constraints: any x_k^n that Relay k sends into the channel must satisfy

(1/n) Σ_{i=1}^n x_{k,i}^2 ≤ P_k,   k = 1, 2.

Let W_1 and W_2 be two independent messages that the source node would like to transmit to Destination nodes (mobile users) 1 and 2, respectively. Assume that W_k is uniformly distributed on {1, 2, ..., M_k}, k = 1, 2.
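As a quick sanity check of the second-hop model, the interference channel and the average power constraint can be simulated directly. This is an illustrative sketch, not part of the paper: the function names and the specific values of a, b, P_1, P_2 are our own choices, and the channel equations encode our reading of the described gains (direct gains 1, cross gains a and b).

```python
import math
import random

def second_hop(x1, x2, a, b, rng):
    # Second hop: Y1 = X1 + b*X2 + U1, Y2 = a*X1 + X2 + U2,
    # with U1, U2 i.i.d. zero-mean unit-variance Gaussian noise.
    y1 = [v1 + b * v2 + rng.gauss(0.0, 1.0) for v1, v2 in zip(x1, x2)]
    y2 = [a * v1 + v2 + rng.gauss(0.0, 1.0) for v1, v2 in zip(x1, x2)]
    return y1, y2

def satisfies_power(xn, P):
    # Average power constraint: (1/n) * sum_i x_i^2 <= P.
    return sum(v * v for v in xn) / len(xn) <= P

rng = random.Random(0)
n, P1, P2 = 1000, 10.0, 10.0
x1 = [rng.gauss(0.0, math.sqrt(P1)) for _ in range(n)]
x2 = [rng.gauss(0.0, math.sqrt(P2)) for _ in range(n)]
y1, y2 = second_hop(x1, x2, a=0.9, b=0.9, rng=rng)
```

The empirical power of each Gaussian codeword concentrates around P_k, illustrating why the constraint is stated as an average over the block.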
An (M_1, M_2, n, ε_n) code consists of an encoding function at the source node,

g^n : {1, ..., M_1} × {1, ..., M_2} → {1, ..., 2^{nC_1}} × {1, ..., 2^{nC_2}},

two encoding functions at the relay nodes,

f_k^n : {1, ..., 2^{nC_k}} → X_k^n,   k = 1, 2,

and two decoding functions at the destination nodes,

h_k^n : Y_k^n → {1, ..., M_k},   k = 1, 2.

The average probability of error is defined as ε_n = Pr(Ŵ_1 ≠ W_1 or Ŵ_2 ≠ W_2). A rate pair (R_1, R_2) is said to be achievable if there exists a sequence of (2^{nR_1}, 2^{nR_2}, n, ε_n) codes such that ε_n → 0 as n → ∞. The capacity region of the 2-user Gaussian multiple access diamond channel is the closure of the set of all achievable rate pairs.
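The three stages of the code can be summarized as function signatures. The sketch below is our own: the names SourceEncoder, RelayEncoder, Decoder and the toy instantiation are hypothetical, and index sets are written 0-based for convenience.

```python
from typing import Callable, List, Tuple

# Source encoder: maps the message pair (w1, w2) to one index per backhaul
# link, with the k-th index drawn from a set of size 2^{n*C_k}.
SourceEncoder = Callable[[int, int], Tuple[int, int]]
# Relay encoder f_k^n: maps the received backhaul index to a length-n codeword.
RelayEncoder = Callable[[int], List[float]]
# Decoder at user k: maps the length-n received sequence to a message estimate.
Decoder = Callable[[List[float]], int]

def make_trivial_code() -> Tuple[SourceEncoder, RelayEncoder, Decoder]:
    # Toy instance with M1 = M2 = 2 and n = 1, purely to exercise the interfaces.
    src: SourceEncoder = lambda w1, w2: (w1, w2)          # each message on its own link
    relay: RelayEncoder = lambda s: [1.0 if s else -1.0]  # antipodal mapping
    dec: Decoder = lambda y: 1 if y[0] > 0 else 0         # sign detector
    return src, relay, dec
```

In the noiseless toy instance the decoder recovers the message exactly; the point is only to make the composition source encoder → relay encoder → decoder concrete.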

III. AN OUTER BOUND FOR THE CAPACITY REGION OF THE 2-USER GAUSSIAN MULTIPLE ACCESS DIAMOND CHANNEL
The existing simple cut-set upper bound on the sum capacity is

R_1 + R_2 ≤ min{ C_1 + C_2, max_{ρ ∈ [−1,1]} C^sum_MIMO(ρ) },   (3)

where C^sum_MIMO(ρ) = max_{(R_1,R_2) ∈ C_MIMO(ρ)} (R_1 + R_2) denotes the sum capacity of the broadcast channel described in (1) and (2) under a given covariance constraint, where X̄ = [X_1, X_2]^T is the transmitted signal of the 2 antennas of the transmitter, and Y_1 and Y_2 are the received signals of the single-antenna Receivers 1 and 2, respectively. The input of the transmitter must satisfy the covariance constraint

E[X̄ X̄^T] ⪯ K,   (4)

where A ⪯ B means that the matrix B − A is positive semi-definite. The capacity region of the MIMO broadcast channel under this constraint, i.e., C_MIMO(ρ), has been found in [17]-[19]. Note that the existing simple cut-set bound, i.e., (3), is the minimum of the capacities of Cuts C and D of Fig. 4. To improve upon the existing simple cut-set bound, our proposed outer bound on the capacity region of the 2-user Gaussian multiple access diamond channel needs the following definitions: 1) Since we are considering the entire capacity region and not just the sum capacity, we generalize C^sum_MIMO(ρ) to the weighted-rate quantities C^ν_MIMO12(ρ) and C^ν_MIMO21(ρ); by varying ν, C^ν_MIMO12(ρ) and C^ν_MIMO21(ρ) trace the capacity region of the 2-user broadcast channel described in (1) and (2) under the given covariance constraint in (4), where X̄ = [X_1, X_2]^T is the transmitted signal of the 2 antennas of the transmitter. 2) Since we are considering the cross-cuts A and B of Fig. 4, as well as Cuts C and D in the existing simple cut-set bound, we define, for ρ ∈ [−1, 1], the quantities f_A(ρ) and f_B(ρ), which can be intuitively seen as the capacities of Cuts A and B of Fig. 4, respectively, when the transmitted signals of the two base stations are jointly Gaussian with correlation ρ.
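The covariance constraint (4) can be made concrete numerically. The sketch below assumes K has the standard form K(ρ) = [[P_1, ρ√(P_1 P_2)], [ρ√(P_1 P_2), P_2]]; this form is our reading of the definitions around (4) and (17), not a formula restated from the paper.

```python
import math

def K(rho, P1, P2):
    # Assumed covariance matrix of the constraint (4):
    # diagonal entries are the per-relay powers, off-diagonal is rho*sqrt(P1*P2).
    c = rho * math.sqrt(P1 * P2)
    return [[P1, c], [c, P2]]

def is_psd_2x2(A):
    # A 2x2 symmetric matrix is PSD iff its diagonal entries and its
    # determinant are nonnegative (small tolerance for floating point).
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return A[0][0] >= 0 and A[1][1] >= 0 and det >= -1e-12
```

For every ρ ∈ [−1, 1], K(ρ) is positive semi-definite, so the constraint set in (4) is nonempty for the whole range of correlations considered in the bound.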
Note that the cut-set bound for Cut C is C_1 + C_2 = f_C(0). We would like to improve upon this bound by proving that when the transmitted signals of the second hop become correlated with correlation coefficient ρ, the rate of the first cut decreases from f_C(0) = C_1 + C_2 to

f_C(ρ) = C_1 + C_2 − (1/2) log( 1/(1 − ρ^2) ).

The intuition is that when independent information is sent via the two orthogonal links of the first hop, the rate sent over the first hop is C_1 + C_2. To increase the rate sent over the second hop, we would like the two relays to send correlated codewords through the Gaussian channel. In order to do this, correlated information has to be sent via the two orthogonal links of the first hop. As a result, the information rate sent through the first hop cannot be as high as C_1 + C_2; we will show that instead, the rate sent over the first hop decreases from f_C(0) to f_C(ρ). We are able to prove the decrease from f_C(0) to f_C(ρ) only for small enough ρ, i.e., ρ ∈ A_x, where A_x is a region around zero determined by a threshold ρ_x whose definition involves the sign function sgn(·), for x ∈ {a, b}. With the above definitions, we are ready to state the main result of this paper.
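The penalty f_C(0) − f_C(ρ) is exactly the mutual information between two jointly Gaussian variables with correlation ρ, which is the information-theoretic cost of correlating the relay inputs. A small numerical sketch (our own; logarithms taken base 2, so rates are in bits):

```python
import math

def fC(rho, C1, C2):
    # f_C(rho) = C1 + C2 - (1/2) log(1 / (1 - rho^2)): the first-hop rate
    # after paying the correlation penalty (log base 2 assumed here).
    return C1 + C2 - 0.5 * math.log2(1.0 / (1.0 - rho ** 2))

def gaussian_mi(rho):
    # For jointly Gaussian X1, X2 with correlation rho:
    # I(X1; X2) = -(1/2) log(1 - rho^2).
    return -0.5 * math.log2(1.0 - rho ** 2)
```

As ρ grows, f_C(ρ) decreases, which is exactly the trade-off the converse has to capture: more correlation helps the second hop but taxes the first.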
Theorem 1: The rate pair (R_1, R_2) is achievable only if it satisfies (6) for μ ≥ 1 and (7) for μ ≤ 1. The proof of Theorem 1 is provided in Section IV.
As can be seen, the outer bound on the capacity region of the 2-user multiple access diamond channel in Theorem 1 is expressed in terms of the capacity region of the Gaussian MIMO broadcast channel with an input covariance constraint. For completeness, we note that C^ν_MIMO12(ρ) and C^ν_MIMO21(ρ) are achieved by dirty-paper coding, with K defined in (4).
Remarks: 1) In Theorem 1, the upper bound of T^m_A(ρ), m = 12, 21, is proved using the cut-set bound from the four cuts, i.e., Cuts A, B, C and D of Fig. 4. The more difficult part is to prove the upper bound of T^m_B(ρ), m = 12, 21, which indicates that when correlated signals of correlation ρ are sent through the first hop, the rate sent across the first hop is reduced from C_1 + C_2 to C_1 + C_2 − (1/2) log(1/(1 − ρ^2)). The converse techniques we use to prove this include 1) the bounding of the correlation between the transmitted signals of the two relays via an auxiliary random variable [15], [16], which was inspired by Ozarow's solution to the Gaussian multiple description problem [20]; and 2) the single-letterization technique from [21, pp. 314]. 2) When C_1 and C_2 go to infinity, the terms that include C_1 and/or C_2 also go to infinity; therefore, the minimum over κ for T^12 and T^21 is attained by the term that does not involve C_1 and C_2, the upper bound in (6) reduces to the corresponding MIMO broadcast channel bound, and similarly for (7). Hence, Theorem 1 coincides with the capacity region of the MIMO broadcast channel in which Relays 1 and 2 are the antennas of the MIMO transmitter, with individual antenna power constraints P_1 and P_2, respectively. From Theorem 1, we may obtain the following corollary, which characterizes an upper bound on the sum capacity of the 2-user Gaussian multiple access diamond channel.
Corollary 1: An upper bound on the sum capacity of the 2-user Gaussian multiple access diamond channel is given by (9), where T_1(ρ) and T_2(ρ) are defined therein.
Proof: In Theorem 1, take μ = 1. We obtain T_1(ρ) by taking κ = 0 in the minimum over κ in (8), and T_2(ρ) by taking κ → 1. Noting that C^1_MIMOm(ρ) = C^sum_MIMO(ρ), m = 12, 21, we have proved Corollary 1.
Comparing the result of Corollary 1 to the existing simple cut-set bound in (3), we see that Corollary 1 implies the new cut-set-type upper bound in (10). The upper bound of (10) is tighter than the existing simple cut-set bound of (3), as it further considers the capacities of Cuts A and B. Furthermore, the result of Corollary 1, i.e., (9), is strictly tighter than the cut-set bound in (10), because when ρ ∈ A_x, x = a, b, the upper bound of T_2(ρ) exists and is smaller than T_1(ρ). Thus, Corollary 1 provides a novel upper bound on the sum capacity of the channel that is tighter than the existing simple cut-set bound of (3).
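Bounds of this family share a common evaluation pattern: for each correlation ρ, take the minimum of all upper bounds that apply at that ρ, then take the worst case (maximum) over ρ, since an achievable scheme may induce any correlation. The scaffold below encodes only that max-min structure; the functions T1, T2 and the membership test A are placeholders, not the paper's exact expressions, and the max-min reading is our interpretation of Corollary 1.

```python
def corollary1_style_bound(T1, T2, A, grid):
    # Generic evaluator: T1(rho) always applies; T2(rho) applies only when
    # rho lies in the set A (tested via the callable A(rho)). The bound is
    # max over rho of the min of the applicable upper bounds.
    best = float("-inf")
    for rho in grid:
        val = T1(rho)
        if A(rho):
            val = min(val, T2(rho))
        best = max(best, val)
    return best
```

With toy placeholder bounds, the evaluator returns the saddle value where the two bounds cross, mirroring how the plots in Section V are generated over a grid of ρ.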
To conclude this section, we discuss the generalization of the upper bound of Theorem 1. By utilizing the same techniques as those used in the proof of Theorem 1, one can generalize the result of Theorem 1 to the case of 2 users and N relays, N ≥ 3. The improvement over the simple cut-set bound is still twofold: 1) improve upon the simple cut-set bound by considering all possible cuts, and 2) improve the cut-set bound of the first hop by subtracting a correlation penalty, with the mutual informations evaluated for jointly Gaussian random variables X_1, X_2, ..., X_N. Note the similarity between the N-relay case, N ≥ 3, and the 2-relay case, where the cut-set bound of the first hop is improved from C_1 + C_2 to C_1 + C_2 − I(X_1; X_2), with the mutual information evaluated for jointly Gaussian random variables X_1 and X_2 with correlation ρ, i.e., I(X_1; X_2) = (1/2) log(1/(1 − ρ^2)). We expect the improvement to scale linearly with the number of relays, as the corresponding mutual information term for the jointly Gaussian distribution scales linearly with N.
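The claimed linear scaling can be checked in a simple special case. One natural multi-variable analogue of I(X_1; X_2) is the total correlation of N jointly Gaussian variables; whether this is the paper's exact N-relay penalty is our assumption, but it illustrates the scaling. For a common pairwise correlation ρ, the correlation matrix has the closed-form determinant (1 − ρ)^(N−1) (1 + (N − 1)ρ):

```python
import math

def total_correlation_equicorrelated(N, rho):
    # Total correlation I(X1; ...; XN) = -(1/2) log2 det R for N jointly
    # Gaussian unit-variance variables with common pairwise correlation rho,
    # where det R = (1 - rho)^(N-1) * (1 + (N-1)*rho).
    det_R = (1.0 - rho) ** (N - 1) * (1.0 + (N - 1) * rho)
    return -0.5 * math.log2(det_R)
```

For N = 2 this reduces to −(1/2) log2(1 − ρ²), matching the 2-relay penalty, and for fixed ρ it grows roughly linearly in N, consistent with the scaling discussed above.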
The idea and techniques of improving the rate upper bound of the first hop can be used in other related multi-cell processing scenarios. For example, the recent proposal of the F-RAN [22] over the C-RAN aims to save fronthaul usage by enabling caching and signal processing capabilities at the base stations. The technique presented in this paper may provide a better characterization of the upper bound on the capacity of the F-RAN system. Another example is the wiretapped diamond channel problem [23], which considers a multicell processing problem such as the one depicted in Fig. 3, except that Destination node 2 is an eavesdropper that eavesdrops on the message requested by Destination node 1 and does not require any message of its own from the base stations. We believe the technique in this work will be useful in deriving an upper bound on the secrecy capacity in the wiretapped diamond channel problem.

IV. PROOF OF THEOREM 1
Without loss of generality, we may assume a > 0, since the case of a < 0 can easily be transformed into the case of a > 0 by defining X̃_2 = −X_2 and Ỹ_2 = −Y_2. This results in the equivalent channel (11) and (12), in which the cross gain −a is positive. The channel in (11) and (12) is equivalent to the channel in (1) and (2) because the transformation is a one-to-one mapping. For any sequence of (2^{nR_1}, 2^{nR_2}, n, ε_n) codes, let X_k^n denote the input of Relay k into the n uses of the channel p(y_1, y_2 | x_1, x_2), and let Y_k^n denote the corresponding output received at Receiver k, k = 1, 2. Due to the power constraint, we have (13). Define a random variable Q that is independent of everything else and uniformly distributed on {1, 2, ..., n}, and further define X_k = X_{kQ} and Y_k = Y_{kQ}, k = 1, 2, as in (14). Define the correlation coefficient between X_1 and X_2 as

ρ* = E[X_1 X_2] / √( E[X_1^2] E[X_2^2] ).   (16)
Hence, ρ* ∈ [−1, 1]. Define X̄ = [X_1, X_2]^T, and further define K as in (17). We can see that

E[X̄ X̄^T] ⪯ K.   (18)

We first derive a modified cut-set bound that considers the cross-cuts, i.e., Cuts A and B of Fig. 4, as well as the cuts of the first and second hops, i.e., Cuts C and D. We give this result in the following lemma.
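The definition of ρ* has a direct empirical counterpart, which is useful when checking converse arguments against simulated codebooks. This helper is our own sketch; it assumes zero-mean sequences, consistent with the definition in (16).

```python
import math

def correlation(x1, x2):
    # Empirical version of rho* = E[X1*X2] / sqrt(E[X1^2] * E[X2^2]),
    # with expectations replaced by time averages over the block.
    n = len(x1)
    e12 = sum(a * b for a, b in zip(x1, x2)) / n
    e11 = sum(a * a for a in x1) / n
    e22 = sum(b * b for b in x2) / n
    return e12 / math.sqrt(e11 * e22)
```

By the Cauchy-Schwarz inequality the returned value always lies in [−1, 1], which is exactly the fact used in the line above.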
Lemma 1: When ρ* satisfies ρ* ∈ [−1, 1], we have the sum-rate bound (19). The proof of Lemma 1 is provided in Appendix A, where converse techniques for the physically degraded broadcast channel [11] are used.
Next, we will use (19) to derive an upper bound on R 1 + μR 2 , for μ ≥ 1. The case of μ ≤ 1 can be derived similarly by swapping the indices 1 and 2 and the numbers b and a.
For any κ ∈ [0, 1), we have the chain of inequalities leading to (20), where the last step follows from the definition of C^κ_MIMO12(ρ*). Using the upper bound on the sum rate R_1 + R_2 in (19), we obtain (20), which is valid whenever ρ* ∈ [−1, 1]. We remark that we did not separately consider Cut D of Fig. 4 in our derivations, because this cut would give the upper bound C^μ_MIMO12(ρ*), which is obtained by setting κ = 0 in (20); hence, it is redundant to consider this cut.
We now proceed to derive another upper bound on R_1 + μR_2, which is valid when ρ* satisfies ρ* ∈ A_b. Using Fano's inequality, we have (21) and (22), where (21) follows from the Markov chain (W_1, W_2) → (X_1^n, X_2^n) → (Y_1^n, Y_2^n), and (22) follows from [15, eq. (33)]. Using Fano's inequality, we further have (23) and (24). Thus, omitting the ε_n terms, which go to zero as n → ∞, we have from (22), (23) and (24) that, for any λ ∈ [0, min(μ/2, 1)] with λ ≠ 1, the bounds (25) and (26) hold, where (25) follows from (23), (24) and the fact that λ ≤ 1 and μ ≥ 1, and (26) holds because W_2 → (X_1^n, X_2^n) → (Y_1^n, Y_2^n) forms a Markov chain. Furthermore, we have (27)-(29), where (27) follows from (22) and the fact that λ ≥ 0, (28) follows from (26), and (29) follows by introducing a sequence of auxiliary random variables Z^n and utilizing the fact that [15, eq. (34)]

I(X_1^n; X_2^n) = I(X_1^n, X_2^n; Z^n) − I(X_2^n; Z^n | X_1^n) − I(X_1^n; Z^n | X_2^n),

where Z^n is the output of the following memoryless Gaussian channel with Y_2^n as its input:

Z_i = Y_{2,i} + U_{3,i},   i = 1, ..., n,   (30)

and U_3 is a Gaussian random variable with zero mean and variance N_3. Further define Z = Z_Q. We single-letterize (29) by single-letterizing three terms using the following lemma.
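The identity invoked via [15, eq. (34)] is an instance of the general decomposition I(X_1, X_2; Z) − I(X_1; Z | X_2) − I(X_2; Z | X_1) = I(X_1; X_2) − I(X_1; X_2 | Z); in [15] the auxiliary Z is constructed so that the conditional term vanishes. Below is a numerical check of the general identity for a jointly Gaussian triple, using half-log-det entropies (the (2πe)^k constants cancel in every combination). The specific covariance is an arbitrary choice of ours.

```python
import math

def _det(M):
    # Determinant for 1x1, 2x2 or 3x3 matrices (enough for three variables).
    n = len(M)
    if n == 1:
        return M[0][0]
    if n == 2:
        return M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def half_log2_det(S, idx):
    # (1/2) log2 det of the submatrix of S on indices idx.
    sub = [[S[i][j] for j in idx] for i in idx]
    return 0.5 * math.log2(_det(sub))

# Joint covariance of (X1, X2, Z): unit-variance X1, X2 with correlation 0.5,
# and Z = 0.7*X1 + 0.3*X2 + N(0,1) as an arbitrary valid auxiliary channel.
S = [[1.0, 0.5, 0.85],
     [0.5, 1.0, 0.65],
     [0.85, 0.65, 1.79]]
L = lambda idx: half_log2_det(S, idx)

# I(X1,X2;Z) - I(X1;Z|X2) - I(X2;Z|X1), via entropy combinations:
lhs = ((L([0, 1]) + L([2]) - L([0, 1, 2]))
       - (L([0, 1]) + L([1, 2]) - L([0, 1, 2]) - L([1]))
       - (L([0, 1]) + L([0, 2]) - L([0, 1, 2]) - L([0])))
# I(X1;X2) - I(X1;X2|Z):
rhs = ((L([0]) + L([1]) - L([0, 1]))
       - (L([0, 2]) + L([1, 2]) - L([0, 1, 2]) - L([2])))
```

The two sides agree to floating-point precision, which is the bookkeeping that lets the correlation term I(X_1^n; X_2^n) be traded for terms that single-letterize.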
Lemma 2: We have the following single-letterizations: the term I(X_1^n, X_2^n; Z^n), the term I(X_1^n; Z^n | X_2^n) + I(X_2^n; Z^n | X_1^n), and the remaining term in (29) can each be bounded by n times the corresponding single-letter quantity; in particular,

λ( I(X_1^n; Z^n | X_2^n) + I(X_2^n; Z^n | X_1^n) ) ≤ nλ( I(X_1; Z | X_2) + I(X_2; Z | X_1) ).

The proof of Lemma 2 is provided in Appendix B, where techniques from [21, pp. 314, eq. (3.34)] are used. Then, from (29), Lemma 2, and the fact that λ ≤ min(μ/2, 1), we obtain the single-letterization (32), where the mutual informations are evaluated using the joint distribution (33) of the defined random variables (X_1, X_2, Y_1, Y_2, Z). Next, we derive an upper bound on (32) by using the fact that the term p(y_1, y_2 | x_1, x_2) in (33), which refers to the channel in (1) and (2), and the term p(z | y_2) in (33), which refers to the channel in (30), are Gaussian channels. To derive an upper bound on (32), we provide upper bounds on three terms. The upper bounds for the first two terms make use of standard Gaussian techniques and are given by the following lemma.
Lemma 3: We have the bounds on the first two terms as stated in Appendix C. The proof of Lemma 3 is provided in Appendix C, where the extremal inequality of [19, Corollary 4] is used. As for the third term, we have (34)-(36), where (34) follows because U, X_1 and X_2, defined in (14) and (31), satisfy the constraint of the optimization in (34) due to (18), and (35) follows from [17, Sec. III.A]. From (32), Lemma 3, and (36), we obtain (37), which holds for any N_3 ≥ 0. Take N_3 as in (38); the value of N_3 in (38) is non-negative because we consider the case where ρ* ∈ A_b. Plugging (38) into (37), we obtain (39). The above is true for any λ ∈ [0, min(μ/2, 1)] with λ ≠ 1. Thus, (39) holds when ρ* satisfies ρ* ∈ A_b. From (20) and (39), we have proved (6) of Theorem 1. The result (7) of Theorem 1 can be derived similarly by swapping the indices 1 and 2 and changing b to a.

V. NUMERICAL RESULTS
To illustrate the tightness of the upper bound derived in Theorem 1, we plot and compare the upper and lower bounds on the sum capacity. More specifically, we plot the existing simple cut-set upper bound on the sum capacity in (3), the new cut-set upper bound of (10) implied by our result in Corollary 1, the new upper bound of Corollary 1, and the achievable sum rates of existing schemes for the 2-user Gaussian multiple access diamond channel.
The results are shown in Fig. 5 for the symmetric case of a = b = 0.9, P_1 = P_2 = 10 and C_1 = C_2 = C. We only plot the region C ∈ [1, 3], since this is the interesting case where the existing simple cut-set upper bound and the existing lower bounds on the sum capacity do not meet. As can be seen, in the region C ∈ [1.2, 2.55], the new cut-set bound of (10) improves upon the existing simple cut-set bound of (3), which means that in this region it is beneficial to consider the cross-cuts in the cut-set bound, i.e., Cuts A and B of Fig. 4. In the region C ∈ [1.05, 2], the upper bound of Corollary 1 improves upon the new cut-set bound of (10), which means that in this region the derived upper bound of T_2(ρ) takes effect. The sum rates achieved by the achievable schemes of sending correlated codewords from the relays [6], the compressed dirty-paper coding scheme with correlated quantization noise [4], and the reverse compute-and-forward scheme [5] are denoted by the solid, circled, and dashed lines, respectively. Furthermore, the sum rate of the time-sharing of all the existing achievable schemes, which is the largest known lower bound on the sum capacity, is denoted by the dot-dashed line. The sum capacity of the 2-user Gaussian multiple access diamond channel for this symmetric case lies in the gap between the derived upper bound of Corollary 1, i.e., the diamond line, and the largest known lower bound, i.e., the dot-dashed line. This gap is greatly reduced by the outer bound derived in this paper; for example, at C = 1.4, the new gap using Corollary 1 is only 1/3 of the gap between the existing inner and outer bounds. Since the gap between the existing inner bound and the new outer bound is not large, we may conclude that the existing achievable schemes perform reasonably well in this scenario.
In the case of a = b = 0.3, P_1 = P_2 = 10 and C_1 = C_2 = C, the results are shown in Fig. 6. Though the new cut-set bound in (10) still improves upon the existing simple cut-set bound, the improvement of the new upper bound in Corollary 1 over the new cut-set bound is insignificant. This is because the reduction from (10) to Corollary 1 depends on the set A_x, x = a, b, and for smaller values of a, b, the set A_x is a smaller region around 0, where the improvement of f_C(ρ) over f_C(0) is small. However, note that multicell processing is more likely to be used for base stations that have strong links to all the users. Hence, larger values of a, b are of more practical interest.

[Fig. 6: Upper and lower bounds on the sum capacity for the case of a = b = 0.3, P_1 = P_2 = 10 and C_1 = C_2 = C.]
[Fig. 7: Upper bounds on R_1 + 2R_2 for the case of a = b = 0.9, P_1 = P_2 = 10 and C_1 = C_2 = C.]
Finally, we plot the new cut-set upper bound and the result of Theorem 1 for the rate R_1 + 2R_2, for the case of a = b = 0.9, P_1 = P_2 = 10 and C_1 = C_2 = C, in Fig. 7. The existing cut-set bound and the achievable schemes characterize the sum rate only and thus are not plotted; hence, Theorem 1 offers a first converse result for arbitrary linear combinations of R_1 and R_2. The triangle line depicts the upper bound on R_1 + 2R_2 based on the new cut-set bound, i.e., considering T^m_A(ρ), m = 12, 21, only, and the diamond line depicts the upper bound in Theorem 1, i.e., considering T^m_B(ρ) as well as T^m_A(ρ), m = 12, 21. The improvement of the diamond line over the triangle line illustrates the need to characterize the rate reduction of the first hop due to the correlated codewords sent by the relays.

VI. CONCLUSION
In this paper, we derive a novel outer bound on the capacity region of the 2-user Gaussian multiple access diamond channel. Through numerical results, we show that the derived outer bound greatly reduces the gap between the known inner and outer bounds when the capacities of the backhaul links are in the medium range. Future research directions include deriving converse results for the multi-user Gaussian multiple access diamond channel, the case where there is receiver side information at one or more of the receivers, and the uplink multicell processing system, possibly by exploiting uplink-downlink duality.
APPENDIX A
PROOF OF LEMMA 1

Based on the four cuts demonstrated in Fig. 4, we have the following cut-set upper bounds on the sum rate R_1 + R_2:
1) Considering Cut C, we have (40).
2) Considering Cut B, we have two cases:
a) For the case of |b| ≤ 1, we have (41)-(44), where (41) follows from the fact that, without loss of generality, we consider deterministic encoding at the source node, i.e., (X_1^n, X_2^n) is a deterministic function of (W_1, W_2), (43) follows from Fano's inequality, and (44) follows from the fact that we consider deterministic encoders and the Markov chain W_1 → (X_1^n, X_2^n, W_2) → Y_1^n. Define Ỹ_2^n via the channel

Ỹ_{2,i} = X_{1,i} + Ũ_i,   i = 1, ..., n,

where Ũ^n is an i.i.d. sequence of Gaussian random variables with zero mean and variance 1/b^2 − 1, independent of everything else. Note that given X_2^n, Ỹ_2^n is a physically degraded version of Y_1^n. Furthermore, noting the similarity between Ỹ_2^n = X_1^n + Ũ^n and the channel seen through Cut B, we obtain the corresponding mutual-information relation. Thus, from (44), we continue as in (45)-(46), where, for simplicity of presentation, we have dropped the 2nε_n term, and where (45) follows from the fact that given X_2^n, Ỹ_2^n is a physically degraded version of Y_1^n. Define the auxiliary random variables as in (49). Following from (46), we have (47)-(48), where (47) follows from the definition in (14). Note that the sum capacity of the degraded broadcast channel whose input is X_1 given X_2 = x_2 and whose outputs are Y_1 and Ỹ_2, respectively, is given by the maximization in [11]. Hence, for the particular p(v, x_1) defined by the codebook and (49), we have (50). Hence, following from (48) and (50), we have (51)-(53), where (51) follows from Jensen's inequality applied to the log(·) function, (52) follows from the fact that the mean-squared error (MSE) of the optimal Bayes least-squares (BLS) estimator is smaller than that of the linear least-squares (LLS) estimator, and (53) follows from (15) and (17).
b) Similarly, for the case of |b| > 1, following from (42), we have

n(R_1 + R_2) = H(W_1, W_2)
             = H(X_2^n) + H(W_1, W_2 | X_2^n)
             = H(X_2^n) + H(W_2 | X_2^n, W_1) + H(W_1 | X_2^n)
             ≤ nC_2 + I(X_1^n; Y_2^n | X_2^n, W_1) + I(W_1; Y_1^n | X_2^n) + 2nε_n.
APPENDIX C
PROOF OF LEMMA 3

First, we provide the upper bound on I(X_1, X_2; Y_2 | Z). We have (69)-(72), where (69) follows from the distribution in (33), (70) follows from [19, Corollary 4], (71) uses (15), and (72) follows from the definition of ρ* in (16). Then, using the above result, we have (73), which follows from (72), the fact that, for a given variance, the Gaussian distribution maximizes the differential entropy, (70), and the fact that 0 ≤ λ ≤ μ/2.
2) I(X_1; Z | X_2) + I(X_2; Z | X_1): Let us first calculate (74)-(77), where (74) follows from the fact that, for a given covariance, the Gaussian distribution maximizes the differential entropy, (75) follows from the fact that the mean-squared error (MSE) of the optimal Bayes least-squares (BLS) estimator is smaller than that of the linear least-squares (LLS) estimator, (76) follows from (15), and (77) follows from (17). Then, we have (78), which follows from (77), and finally we have (79). Similarly, we have (80). Thus, from (79) and (80), we obtain the claimed bound on I(X_1; Z | X_2) + I(X_2; Z | X_1).