Oblivious Fronthaul-Constrained Relay for a Gaussian Channel

We consider systems in which the transmitter conveys messages to the receiver through a capacity-limited relay station. The channel between the transmitter and the relay station is assumed to be a frequency-selective additive Gaussian noise channel. The transmitter is assumed to be able to shape its spectrum and adapt its coding technique so as to optimize performance. The relay operation is oblivious (nomadic transmitters), that is, the specific codebooks used are unknown to it. We find the reliable information rate that can be achieved with Gaussian signaling in this setting and, to that end, employ Gaussian bottleneck results combined with Shannon's incremental frequency approach. We also prove that, unlike classical water pouring, the allocated spectrum (power and bit rate) of the optimal solution is frequently discontinuous. These results also apply to MIMO transmission schemes. We further investigate the case of an entropy-limited relay: we show that the optimal relay function is always deterministic, present lower and upper bounds on the optimal performance (in terms of mutual information), and derive an analytical approximation.


I. INTRODUCTION
Relaying exploits intermediate nodes to achieve communication between two distant nodes. Elementary relaying can be coarsely divided into compress-and-forward (of which amplify-and-forward is viewed as a special case) and decode-and-forward, depending on whether the relays decode the transmitted message or just forward the received signal to the destination. In this paper we examine the "oblivious" relay system. The oblivious approach constructs universal relaying components serving many diverse users and operators, and does not depend on a priori knowledge of the modulation and coding. This approach can benefit systems used in 'cloud' communication and was investigated, for example, in [1].
Consider the system in Fig. 1. The information source U, with H(U) ≤ R [nats/sec], is encoded into Gaussian symbols X and transmitted over a Gaussian scalar channel; the relay compresses the received symbols Y, encodes them into a bit-stream B, and forwards it (without errors) to the final user's destination over a finite-rate link with H(B) ≤ C [nats/sec]. At the destination, a decompressor decodes the bit-stream into symbols Z, which the receiver then uses to form the estimate Û of U.
For the user, the relay operation is hidden: the user transmits the symbol X and receives the symbol Z, while the effective channel is governed by the transition probability P_{Z|X}(z|x). This setting provides the user with a memoryless communication channel that forwards symbols from the transmitter to the receiver. We choose X to be Gaussian because it is optimal when the bit-rate constraint C is large and because of its ubiquity in applications. In this setting, the user faces the familiar memoryless communication channel and can choose freely how to utilize it; e.g., the user can select a good error-correcting code and change codes after the oblivious system has already been deployed. The serving system is oblivious of the channel code used (see [2] for a more rigorous presentation of obliviousness). The relay performs lossy compression of the output of the Gaussian channel and is implemented by source coding. The trade-off between the compression rate and the mutual information between the channel input and the compressed channel output has closed-form expressions for the scalar and vector cases via the Gaussian Information Bottleneck (GIB) theorem [3], [4], [5]. This deviates from the classical remote rate-distortion approach [6], [7], [8] (rate distortion for a sub-Nyquist sampling scheme) and [9] (sampling stationary signals subject to bit-rate constraints), since the distortion is measured by the equivocation h(X|Z) instead of by the MMSE = E[(X − Z)²]. Since the distribution of X is fixed, minimizing h(X|Z) means maximizing I(X;Z) = h(X) − h(X|Z).
We further discuss the oblivious relay and focus our attention on the quantization process. We examine simpler quantizers which can be implemented with the standard Lempel-Ziv algorithm instead of full source coding. The performance of such quantizers, optimal relative to an entropy constraint, was studied for a wide class of memoryless sources (e.g., [10], [11] and [12]). Notwithstanding, it is interesting to investigate the effect of such a constraint on the relay operation.
Our Contribution: In this paper, we provide a further generalization of the GIB to the case of a frequency-selective additive Gaussian channel (some preliminary results were presented in [13]). We find the reliable information rate that can be achieved in this setting and, to that end, employ Gaussian bottleneck results [3] combined with Shannon's incremental frequency approach [14]. The incremental approach leads to a clear solution for the frequency-selective channel setting. Analysis of frequency-flat channels and MMSE optimization was reported in [15] and [16]. Furthermore, in Section V we present lower and upper bounds on the mutual information between the transmitter and the receiver when the entropy constraint is placed on the relay.

Fig. 2: A finite-rate relaying operation over a fronthaul AWGN frequency-selective channel.
The remainder of this paper is organized as follows. Section II provides the system model. Section III outlines preliminaries, in which we summarize quantization alternatives (III-A), demonstrate the advantages of stochastic quantizers (III-B), show that the optimal transmitting scheme dictates independent Z_i (III-C), provide the required background and definitions for the GIB (III-D), and review the classical water-pouring method (III-E). In Section IV, we review the main results relevant to frequency-flat channels from [4], [14] and present the new derivation for frequency-selective channels and infinite processing time. Section V is dedicated to the finite-entropy quantizer. Further derivations and proofs can be found in Section VI. Conclusions and proposals for future work are found in Section VII.
Notation: X is a random variable and x is a realization of a random variable. We use boldface letters for column vectors and sequences. The expectation operator is denoted by E[·], and we follow the notation of [17] for entropy H(·), differential entropy h(·), and mutual information I(·;·). A probability mass/distribution function is denoted by P(·) or p(·), respectively. All logarithms are natural and the unit of information is nats unless stated otherwise.

II. SYSTEM MODEL
Consider the system depicted in Fig. 2. Here, x(t) is the input signal, assumed to be Gaussian; H(f) is the frequency response of the channel linear filter, with impulse response h(t) = F^{-1}[H(f)] (F and F^{-1} denote the Fourier transform and its inverse); n(t) is additive white Gaussian noise with normalized one-sided power spectral density N_0 = 1 [Watt/Hz]; and * designates convolution. We are interested in the normalized mutual information, since standard coding theorems [18] guarantee that the associated rate can be reliably transmitted through the system. Denote by X_{-T}^{T} the input process (X(t), −T ≤ t ≤ T), and by Z the output vector (containing the compressed channel outputs Z_i), which is entropy constrained by H(Z) ≤ nC [nats/sec]; the information in (1) is also measured in [nats/sec]. Here, n denotes the number of symbols in a transmitted block and equals the dimension of Z. We seek the (one-sided) power spectral density S_x(f) of the input Gaussian process which maximizes I_n^C(X;Z) under an average power constraint in some bandwidth W.
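Written out in our notation (the normalization is taken over the transmission window (−T, T); the precise displayed form of Eq. (1) follows the text), the quantity of interest and the optimization read
$$
I_n^{C}(X;Z)\;\triangleq\;\lim_{T\to\infty}\frac{1}{2T}\,I\!\left(X_{-T}^{T};\mathbf{Z}\right)\ \ [\text{nats/sec}],
\qquad
\max_{S_x(f)}\ I_n^{C}(X;Z)\quad\text{subject to}\quad\int_{W}S_x(f)\,df\le P .
$$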

III. PRELIMINARIES

A. Quantization Alternatives
Denote by X, Y, Z the channel input, the received signal, and the quantized output, respectively. Our system tries to maximize I(X;Z), which clearly determines the maximal information rate R of the whole system if the user utilizes good error-correcting codes, while minimizing the bit-rate of the sequence B. Here we list some possible approaches to quantization:
1) Using the channel code in the serving subsystem: If the serving subsystem were not oblivious, it could decode the original information (Û) and send it as the sequence B. In this case R = min(C, radio channel capacity) would be achieved. But in this work the serving subsystem is oblivious.
2) Mutual Information Constrained Stochastic Quantizer (MUIC-SQ): A class of oblivious quantizers is stochastic, as mentioned, for example, in [19]. For each channel output Y_i, a compressed representation Z_i is obtained by a stochastic quantizer characterized by the probability mass function P_{Z|Y}(z_i|y_i), chosen to maximize I(X_i; Z_i); then Z_i is compressed and sent to the user's decoder using the bit rate C = I(Y_i; Z_i). The practical implementation is by means of source coding on sequences: the received sequence Y is encoded into the sequence of bits B and the destination recovers the sequence Z from B. The serving-system bit rate can be limited to C = I(Y_i; Z_i); a proof following the steps of the source coding theorem [17] can be constructed. The probability mass function P_{Z|Y}(z_i|y_i) sets I(X_i; Z_i) and thus enables a system communication rate of R = I(X_i; Z_i) by the classic channel coding theorem.
Letting the quantizer be stochastic improves performance, similarly to the corresponding advantage of source coding over memoryless deterministic quantization [17]. In [20], Koch treats a stochastic quantizer where the randomness is limited to dither known to the quantizer; this is a special case and may be considered a deterministic time-varying quantizer.
The optimal stochastic quantizer for Gaussian signals is the GIB, which was thoroughly analyzed in [4] and [21]. The GIB is a cornerstone of this paper and its attributes are specified in the upcoming section.
3) Entropy Constrained Stochastic Quantizer (EC-SQ): The entropy-constrained stochastic quantizer (EC-SQ) works in the same way as the MUIC-SQ, except that the entropy of the compressed channel output Z_i, H(Z_i), is bounded to be less than C. Entropy compression schemes such as Huffman or Lempel-Ziv coding are added after the quantizer, as suggested in the literature.
It is evident that, in terms of mutual information I(X_i; Z_i), the EC-SQ is inferior to the MUIC-SQ: since H(Z_i) ≥ I(Y_i; Z_i), the entropy constraint enforces a tighter constraint on P_{Z|Y}(z_i|y_i). For the Gaussian case the upper bound is I_GIB.
4) Entropy Constrained Deterministic Quantizer (EC-DQ): This quantizer assumes a deterministic mapping Z_i = f(Y_i), where f(·) is some function of the channel output Y_i. It is clear that for general channels it is inferior to the EC-SQ, as the deterministic domain is a subset of the stochastic domain. In the AWGN channel, however, there is a deterministic quantizer with identical performance. This can be proven using the following steps:
• Split the range of the channel output Y_i into small segments.
• Perform a hair-splitting operation on each segment in order to obtain a deterministic mapping that yields the desired transfer function from Y_i to Z_i. See the rigorous proof in Appendix VIII-A.

5) Memoryless deterministic quantizer:
The received signal Y is mapped to a discrete-valued variable Z by a deterministic function. The function is optimized for mutual information I(X;Z) per symbol with a constraint on the number of bits, or alphabet size of Z, required to represent the quantizer output symbol. The optimization can be done by the Lloyd algorithm. This is well covered in published papers, e.g., [22], which also show that the optimal probability distribution of the transmitted signal X is discrete in many cases.
6) Vector quantizers: Assume a vector compression scheme in which we group a few of the variables Z into short n-length vectors Z_k. Entropy coding is still possible, now over the vectors Z_k instead of the scalars Z. This possibility leads to the following observations:
• With large n we can implement the full GIB by compressing the sequence Y into the sequence Z by the MMSE criterion under the constraint of bit-rate C.
• Vector quantizers provide many intermediate performance levels, starting at the deterministic entropy-constrained quantizer (n = 1) and going up to the GIB quantizer.
The advantage of the stochastic quantizer over the entropy-constrained quantizer is the advantage of source coding over a scalar quantizer. Next, we shall present some attributes of the stochastic quantizer.

B. Demonstrating the advantages of the stochastic quantizer
The advantage of the stochastic quantizer is demonstrated by a numerical example; see Fig. 3. We examine the case of a Gaussian X over an AWGN channel with a quantization rate C = 1 [bit/symbol]. In the memoryless deterministic quantizer case, the quantizer is the sign of the received signal. By Kindler [23], the sign of the received signal is the optimal one-bit memoryless deterministic quantizer, and not necessarily the optimal entropy-constrained deterministic quantizer. The corresponding curve in Fig. 3 is the numerical evaluation of E_{X,Z}[log(P(z|x)/P(z))]. In the stochastic case we have, from [21], the GIB expression (see Section III-D). The results in Fig. 3 show the clear superiority of the stochastic quantization over the deterministic one. Modifying the distribution of X would improve the rate [22] (see the improved performance with a binary input in Fig. 3).
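To make the comparison reproducible, the following sketch (our own illustration, not the paper's code) numerically evaluates the sign-quantizer mutual information and the scalar GIB expression of Section III-D at C = 1 bit per channel use:

```python
import numpy as np
from scipy.stats import norm

def sign_quantizer_rate(snr, n_grid=20001, x_max=8.0):
    """I(X;Z) in nats for Z = sign(Y), Y = sqrt(snr)*X + N, with X, N ~ N(0,1)."""
    x = np.linspace(-x_max, x_max, n_grid)
    w = norm.pdf(x)
    w = w / w.sum()                      # discretized standard-normal weights
    p = norm.cdf(np.sqrt(snr) * x)       # P(Z = +1 | X = x)
    eps = 1e-300
    h_cond = -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))
    return np.log(2) - np.sum(w * h_cond)    # H(Z) = log 2 by symmetry

def gib_rate(snr, c_nats):
    """Scalar GIB value I(C) in nats (closed form of Sec. III-D, our reconstruction)."""
    return 0.5 * np.log((1 + snr) / (1 + snr * np.exp(-2 * c_nats)))

for snr_db in (-5, 0, 5, 10):
    snr = 10 ** (snr_db / 10)
    print(snr_db, sign_quantizer_rate(snr), gib_rate(snr, np.log(2)))
```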

C. Independent Z_i achieve the optimal performance
We might still wonder whether the stochastic quantizer, while evidently better than the deterministic one, is optimal. Note that the Z_i in our scheme are statistically independent. Could this scheme be outperformed if dependence between the Z_i were permitted? For example, the channel from X to Y could be a BSC and the relay could convey information to the destination by setting the Z_i to be parity bits obtained by XOR operations on pairs of Y_i. We show next that independent Z_i, each statistically dependent on a single Y_i only, achieve the best performance possible. To show this, we consider the scheme of Fig. 1, but instead of producing Z, the bit sequence B = Z is derived directly from the sequence Y and passed to the decoder together with the compression scheme. Thus, we want to maximize I(X;Z), that is, the mutual information of whole sequences, with a constraint on I(Y;Z); the first term is the information rate of the whole system and the second term is an achievable lower bound on the backhaul bit-rate C. We can restate this question as an equivalent bottleneck problem. Let X, Y, Z be sequences, each comprising n elements X_i, Y_i and Z_i; let the elements of X, Y be i.i.d. and the channel X − Y be memoryless. In this case, the bottleneck problem is finding P_{Z|Y}(z|y) which maximizes I(X;Z) with a constrained I(Y;Z), and the question at hand is whether the optimum is attained by a per-letter (memoryless) mapping. The answer was already proved positive by Witsenhausen and Wyner [24] for discrete alphabets of X, Y, and also for a Gaussian X over the AWGN channel by Tishby [21] (an alternative proof for continuous alphabets is available in [25]).

D. Gaussian Information Bottleneck (GIB)
1) Information rate – scalar channel: The GIB and its derivation for the discrete-time signaling case were thoroughly studied in [3], [4], [5], [21] and [26]. We now give a brief overview of the GIB; the interested reader is referred to [4], [5] for a full treatment. A complete derivation of the information rate function for the vector case, as well as the difference between the information rate function and the rate-distortion function, namely I(R) ≥ I_RD(R), is presented in [5].
Consider the system in Fig. 4. The GIB addresses the following variational problem [19]: minimize I(Y;Z) − βI(X;Z) over the compression mappings P(z|y). In the context of the information bottleneck method, X is called the relevance variable and I(X;Z) is termed the relevant information. The trade-off between compression rate and relevant information is determined by the positive parameter β. It has been shown that the optimal Z is jointly Gaussian with Y and can be written as Z = αY + ξ, where α ∈ ℝ is a scalar and ξ ∼ N(0, σ_ξ²) is independent of Y. Let us present I(C) for the channel depicted in Fig. 4. Since X and Y are real zero-mean jointly Gaussian random variables, they obey a closed-form relation for I(C) (a standard form is given below), and I(C) has several useful properties whose proof is found in [4]; these can also be proved using the I-MMSE relation [27, Chapter 5, Section 7.1.3]. Fig. 5 illustrates the effect of limited-rate processing. It is clear that the total mutual information is upper bounded by the capacity for AWGN channels derived by Shannon [14].
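For the scalar channel Y = √snr·X + N with unit-variance X and N, the information rate function of Definition 1 takes the well-known closed form (written here in our notation)
$$
I(C) \;=\; \frac{1}{2}\log\frac{1+\mathrm{snr}}{1+\mathrm{snr}\,e^{-2C}}
\;=\; C - \frac{1}{2}\log\!\left(e^{2C}\left(1-\rho^{2}\right)+\rho^{2}\right),
\qquad \rho^{2} = \frac{\mathrm{snr}}{1+\mathrm{snr}},
$$
so that I(0) = 0, I(C) ≤ min{C, I(X;Y)}, and I(C) → I(X;Y) = ½ log(1 + snr) as C → ∞.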

E. Water-pouring
We recall the classical water-pouring approach, which yields the maximum I_n^∞(X;Z) for C → ∞. The idea of splitting the channel into incremental bands appears in [14] and [17]: each incremental band of bandwidth df is treated as an ideal (independent, due to Gaussianity) band-limited channel with response H(f), and the total rate is obtained by integrating the per-band rate over frequency. Optimizing this over S_x(f) under the power constraint (using the standard Euler-Lagrange method [28]) yields the classical water-pouring power allocation of [14, Chapter 8], in which power is poured only over a frequency region B determined by the water level.

IV. WATER-POURING WITH THE OPTIMAL QUANTIZER

A. Processing under limited bit-rate C

As before, we adopt Shannon's incremental view, taking advantage of the fact that disjoint frequency bands are independent under the Gaussian law and stationarity. Let (1/2)C(f) designate the number of [nats/channel use] assigned for delivering (processing) the band (f, f + df). Since we have 2·df independent channel uses (Nyquist) per second, the total rate per second in each band is C(f)·df. Culminating this view and incorporating (4), we reach the per-band mutual information density and the following optimization problem (for simplicity we denote S_x(f) as S(f)).
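In our notation, and assuming one-sided spectra with N_0 = 1 [Watt/Hz], these read
$$
\hat I(f) \;=\; \log\frac{1+S(f)\,|H(f)|^{2}}{1+S(f)\,|H(f)|^{2}\,e^{-C(f)}}
\qquad [\text{nats/sec/Hz}],
$$
$$
\max_{S(\cdot)\ge 0,\;C(\cdot)\ge 0}\;\int_{W}\hat I(f)\,df
\quad\text{subject to}\quad
\int_{W}S(f)\,df \le P,\qquad \int_{W}C(f)\,df \le C,
$$
where the first expression is the scalar GIB of Section III-D applied to each of the 2·df channel uses per second in the band (f, f + df); letting C(f) → ∞ recovers the classical water-pouring integrand log(1 + S(f)|H(f)|²).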
Here, Î = Î(f) is the mutual information spectral density [nats/sec/Hz]; we also write Ŝ ≜ S(f) and Ĉ ≜ C(f). The solution of Eq. (5) follows standard Euler-Lagrange reasoning [28], whose notation we adopt. The Lagrangian combines the objective with the two constraints through the Lagrange multipliers {λ_c, λ_s}. Differentiating Eq. (6) with respect to Ĉ and Ŝ, and letting Q ≜ exp(−Ĉ), leads to a quadratic equation (see the complete derivation in Sec. VI-A). The quadratic equation (7) produces two curve sets. A rigorous proof is given in Sec. VIII-B, where we derive, for each frequency f, the optimal values of S(f) and C(f); B_l denotes the set of frequencies with non-zero resource allocation (bit-rate and power). In general, B_l is unique unless the channel has a flat sub-band response. The algorithm for constructing B_l can be found in Sec. VI-C.
In order to find the appropriate values of {λ_c, λ_s} we had to use a grid search, together with a proposition whose rigorous proof is given in Sec. VI-B. In stark contrast to classical water-pouring [9], [17], the optimal solution will frequently be discontinuous. As shown in Fig. 14, zero resources is a singular point inside the non-concave region; since C(f) and S(f) can never drop gradually down to zero, the transition will always have an abrupt part.
A simple example is the case where H(f) is constant over f, the SNR is sufficient, and C is rather low. In this case an attempt to use frequency-constant S(f) and C(f) over the entire bandwidth will place us in the non-concave region; a better solution uses only part of the available spectrum and utilizes the available nats better by transmitting less information about the channel noise (see similar behavior in [15]). Fig. 6 demonstrates this idea, assuming a flat channel (i.e., H(f) = 1). For a given total power P = 2 [Watt], capacity C = 0.5 [nats/sec], and allocated user bandwidth W = 100 [Hz], we calculated the mutual information rate when distributing the power and bit-rate uniformly over a used bandwidth B. It is clear that the best course is to use only part of the spectrum, namely B/W ≈ 0.3%, which is the maximum of the oblivious curve (blue).
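A minimal numerical sketch of this flat-channel example, using the per-band expression from the sketch above (our reconstruction, not the paper's code):

```python
import numpy as np

P, C, W = 2.0, 0.5, 100.0   # total power [Watt], fronthaul rate [nats/sec], user bandwidth [Hz]

def oblivious_rate(B):
    """Rate [nats/sec] when power and bit-rate are spread uniformly over a band of width B,
    using the per-band density log((1+snr)/(1+snr*exp(-C(f)))) with snr = P/B, C(f) = C/B."""
    snr = P / B
    return B * np.log((1.0 + snr) / (1.0 + snr * np.exp(-C / B)))

B = np.linspace(0.01, W, 100_000)
rates = oblivious_rate(B)
i = int(np.argmax(rates))
print(f"best B/W = {B[i] / W:.4f}, rate = {rates[i]:.3f} nats/sec; "
      f"full-band rate = {rates[-1]:.3f} nats/sec")
# With these parameters the optimum lies near B/W ~ 0.003 (about 0.3%), far below full-band use.
```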

B. Numerical Analysis
The proposed method was applied to different types of channels. The first, denoted "Channel A", has a Gaussian-shaped frequency response H_A(f); the parameters are P = 100 [Watt] and C = 9 [nats/sec], while W = 10 [Hz] is the allocated user bandwidth. We also tested the "reciprocal" channel, denoted "Channel B". In each scenario, we compared the overall information rate using the following methods:
• The proposed method;
• Uniform allocation of rate and power;
• Classical water-pouring, as presented in [14], for the case C → ∞;
• "Limited-Rate Water-Pouring", which is: 1) calculate S(f) using the classical water-pouring approach; 2) derive the allocated rate C(f) from this S(f).
The results are summarized in Table I. Figs. 7 and 8 contain curves of S(f), C(f), and H(f), normalized to a unity average, and also a (normalized) classical water-pouring power allocation curve, S_{Water-Pouring}(f), for comparison with the proposed approach.
It should be noted that the curves S(f), C(f) of Fig. 8 are not unique (they are algorithm-dependent), since the channel has a flat response; however, the total mutual information is maximized nonetheless.
It is clear from the results that:
• The proposed approach for allocating the power S(f) and rate C(f) is indeed optimal and superior to the other methods presented; evidently, the rate is upper bounded by the classical water-pouring result (C → ∞).
• The price of obliviousness is demonstrated: for a cognitive relay the reliable rate is min(I_n^∞(X;Z), C), achieved by a relay that decodes the signal and then transmits the decoded information at the maximum allowable rate (C).

V. FINITE OUTPUT ENTROPY H(Z)
In this section we analyze the performance of finite-output-entropy quantizers, whose compression can be implemented by a standard Lempel-Ziv algorithm at a small cost in terms of performance. Analytic solutions for optimal information bottleneck quantizers are rarely available; here we resort to numerical optimization, noting that most practical algorithms cannot guarantee reaching a global optimum [29].

A. Quantizer Model and Preliminaries
Reviewing the scalar bottleneck problem, we assume Y = √snr·X + N, where X and N are unit-variance independent Gaussian signals; hence Y is the output of a scalar Gaussian channel with a Gaussian input. The Finite-Entropy Bottleneck Problem reads: find the maximum of I(X;Z) under the Markov condition X − Y − Z, where H(Z) ≤ C. As mentioned in the preliminaries, the deterministic solution is optimal. In order to make the computation feasible, the search was carried out over K-bin (or K-level) deterministic quantizers defined by their thresholds. First we list a few definitions: the channel output variance is σ_Y² = 1 + snr, and the probabilities P_{Z|X}(z|x) and P_Z(z), and hence H(Z|X), H(Z) and I(X;Z), follow directly from the quantizer thresholds. Since both the information source and the noise are symmetric, we limit ourselves to the class of symmetric quantizers as in Eq. (12). The optimal quantizer problem can thus be stated as
$$
\max_{\{q_i\}:\; H(Z)\le C}\; I(X;Z).
$$
The maximization is performed over the quantizer thresholds {q_i}. In the following subsections we present numerical results for this problem under various conditions, gain insight into the nature of the optimal quantizer, and develop bounds and an analytical approximation.
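To make the computation concrete, here is a small sketch (our own illustration; the threshold values are hypothetical) of how H(Z) and I(X;Z) can be evaluated for a symmetric K-level deterministic quantizer acting on Y = √snr·X + N:

```python
import numpy as np
from scipy.stats import norm

def quantizer_stats(thresholds, snr, n_hermite=200):
    """H(Z), H(Z|X) and I(X;Z) (nats) for a deterministic quantizer acting on
    Y = sqrt(snr)*X + N with X, N ~ N(0,1). `thresholds` are the finite bin
    edges; the bins are (-inf, q_1], (q_1, q_2], ..., (q_{K-1}, +inf)."""
    edges = np.concatenate(([-np.inf], np.sort(thresholds), [np.inf]))
    sigma_y = np.sqrt(1.0 + snr)

    # Output pmf and entropy: Y ~ N(0, 1 + snr)
    p_z = np.diff(norm.cdf(edges / sigma_y))
    H_Z = -np.sum(p_z * np.log(p_z + 1e-300))

    # H(Z|X) via Gauss-Hermite quadrature over X ~ N(0,1); Y | X=x ~ N(sqrt(snr)*x, 1)
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_hermite)
    weights = weights / weights.sum()
    p_zx = np.diff(norm.cdf(edges[None, :] - np.sqrt(snr) * nodes[:, None]), axis=1)
    h_zx_per_x = -np.sum(p_zx * np.log(p_zx + 1e-300), axis=1)
    H_ZX = np.sum(weights * h_zx_per_x)

    return H_Z, H_ZX, H_Z - H_ZX

# Example: a symmetric 4-level quantizer; the thresholds below are purely illustrative.
H_Z, H_ZX, I_XZ = quantizer_stats([-1.0, 0.0, 1.0], snr=1.0)
print(f"H(Z) = {H_Z:.3f} nats, I(X;Z) = {I_XZ:.3f} nats")
```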

B. Numerical Analysis
Our numerical optimization yields a 3-bit symmetric quantizer; the performance with six and eight levels is nearly identical.
The thresholds were optimized to maximize the mutual information I(X;Z) for various values of SNR and C. In Fig. 11 we show the resulting mutual information, as well as upper and lower bounds. From the results we see that:
• the mutual information I(X;Z) increases with SNR and C;
• the mutual information is bounded by the GIB.

C. The Effect of an Entropy Constraint on the Deterministic Quantizer Operation
We examine the case of an entropy-constrained deterministic quantizer (C ≤ log₂|χ|, where Z ∈ χ). From Fig. 9 it is evident that increasing the number of levels of the quantizer beyond the entropy constraint has almost no effect on the mutual information; thus the number of bins used was sufficient. The mutual information is bounded, as expected, by log₂|χ|. One can see that even in a low-SNR scenario the difference between the quantizers is negligible. To complete the discussion we add the case of a memoryless deterministic quantizer (i.e., no entropy constraint, C ≥ log₂|χ|), as illustrated in Fig. 10. Here, unlike the previous cases, there is a clear gain from using a quantizer with more bits.

D. Lower and Upper Bounds on the Optimal Performance
We now bound the mutual information with one upper bound and two lower bounds. As before, the GIB serves as the upper bound. For the lower bounds (which are also interesting achievability schemes), we tested two schemes:
1) Lower bound – setting output entropy H(Z) = C: We chose a quantization scheme which leads to an output entropy H(Z) = C. In order to assure the required entropy, we changed the cardinality of the output |Z| and the induced probability mass function P_Z(z) using the method described in Sec. VIII-C. Once the output probability mass function P_Z(z) is set, the (symmetric) quantizer thresholds {q_i} are found from the auxiliary variables ν_i (the tail probabilities of P_Z(z)) via q_i = σ_Y Q^{-1}(ν_i), where Q^{-1}(x) denotes the inverse of Q(x). Fig. 11 demonstrates these results.
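A small sketch of this construction (the target pmf below is a hypothetical example, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm

def thresholds_from_pmf(p_z, snr):
    """Thresholds of a symmetric quantizer on Y ~ N(0, 1+snr) whose output pmf
    equals p_z (ordered from the most negative bin to the most positive one)."""
    sigma_y = np.sqrt(1.0 + snr)
    nu = np.cumsum(p_z[::-1])[::-1][1:]   # nu_i = P(Z falls in bin i or above)
    return sigma_y * norm.isf(nu)         # q_i = sigma_Y * Q^{-1}(nu_i)

# Hypothetical symmetric target pmf whose entropy plays the role of C
p_z = np.array([0.1, 0.4, 0.4, 0.1])
print("H(Z) =", -(p_z * np.log(p_z)).sum(), "nats")
print("thresholds:", thresholds_from_pmf(p_z, snr=1.0))
```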
2) Lower bound – uniform quantizer: We tested a uniform quantizer, in which the quantizer step q was increased until the resulting probability mass function P_Z(z) of the quantizer output had output entropy H(P_Z) = C. The output of a uniform quantizer has infinite cardinality since its input is unbounded; to that end, we discarded values that are higher (in absolute value) than N·σ_Y, ensuring an output cardinality of |Z| ≈ 2Nσ_Y/q for some large N.

Fig. 13: Analytical approximation: for each bit-rate constraint C, we present the numerical result for the optimal quantizer, the GIB upper bound, and the analytical approximation.
Fig. 12 presents the results. For each bit-rate constraint C, we plot the numerically optimized quantizer, the GIB upper bound, and the lower bound resulting from uniform quantization. As one can see, the lower bound is fairly near the curve of the numerically optimized quantizer; this method produced a tighter bound than the previous one.
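A sketch of the step-size search (the truncation level N and the bisection settings are our own choices):

```python
import numpy as np
from scipy.stats import norm

def uniform_quantizer_entropy(step, snr, N=6):
    """H(Z) in nats for a uniform quantizer of the given step applied to
    Y ~ N(0, 1+snr), truncated at +/- N*sigma_Y."""
    sigma_y = np.sqrt(1.0 + snr)
    edges = np.arange(-N * sigma_y, N * sigma_y + step, step)
    p = np.diff(norm.cdf(edges / sigma_y))
    p = p[p > 0]
    p = p / p.sum()                        # renormalize the tiny truncated mass
    return -(p * np.log(p)).sum()

def step_for_entropy(C, snr, lo=1e-3, hi=50.0, iters=60):
    """Bisect for the step size at which H(Z) drops to the target C [nats]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if uniform_quantizer_entropy(mid, snr) > C:
            lo = mid                       # entropy still too large -> coarser step
        else:
            hi = mid
    return 0.5 * (lo + hi)

q = step_for_entropy(C=np.log(2), snr=1.0)   # target H(Z) = 1 bit
print("step =", q, "  H(Z) =", uniform_quantizer_entropy(q, snr=1.0), "nats")
```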

E. Analytic approximation of optimal performance
Let Z − Y − X be the inverse of the Markov chain defined in Sec. V and Eq. (11). The inverse channel from Y to X is again Gaussian: X can be written as αY plus an independent Gaussian error term, with α = √snr/(1+snr). Then I(X;Y|Z) is the mutual information of a standard Gaussian channel from Y to X, with Y conditioned on Z; since Gaussian inputs are optimal under the variance constraint, incorporating Jensen's inequality yields a bound on I(X;Y|Z) and, through I(X;Z) = I(X;Y) − I(X;Y|Z), an analytic approximation of the optimal performance. At this point we can utilize the results of Gish [10]; Massey [30] proved that in an AWGN channel at low SNR and with a zero-mean input, the capacity is the same function of the mean power regardless of the input's probability distribution. It is also evident that zeroing the added component of 0.354 [nats] makes the GIB and Gish's bounds coincide: incorporating (17) in (14) leads exactly to the GIB bound, which is achieved when the inverse channel input Y|Z is Gaussian, as the GIB dictates. Fig. 13 demonstrates these results. Thus, the difference in performance at low SNR and high C between the stochastic mutual-information-constrained quantizer and the deterministic entropy-constrained quantizer is exactly the 0.255 bits per symbol in the relay bit-rate C.
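For reference, the classical Gish–Pierce high-resolution gap between the output entropy of a uniform scalar quantizer and the rate-distortion function is
$$
\frac{1}{2}\log\frac{2\pi e}{12}\;\approx\;0.177\ \text{nats}\;\approx\;0.254\ \text{bits per symbol},
\qquad
\log\frac{2\pi e}{12}\;\approx\;0.353\ \text{nats},
$$
which agrees, up to rounding, with the constants quoted above.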

VI. FURTHER DERIVATIONS AND PROOFS
A. Complete Derivation of the Solution of Eq. (6)

Differentiating Eq. (6) with respect to Ĉ and Ŝ leads to the stationarity conditions (19a)–(19b). In order to simplify the notation we use a few auxiliary definitions. From (19b) it is clear that Q can be expressed in terms of Ŝ and λ_c; substituting (20) in (19a) leads to Eq. (7). Solving the quadratic gives two sets of solutions for {Ŝ, Q}. At this point, we continue in accordance with Proposition 1 and discard the {Ŝ₂, Q₂} curve, since it is a non-concave solution.

C. Constructing the Set of Operating Frequencies B_l
We perform a bounded grid search (see Proposition 2) on {λ_s, λ_c} that yields the maximum mutual information, and discard the frequencies (i.e., set S(f) = 0, Q(f) = 1) that contribute least to the total mutual information, until the constraints are met. The set of frequencies that were not discarded is B_l.
If the solution lies in the non-concave region, the df band can be split into two sub-bands with the resource assignment perturbed in each, keeping the sum of the resources in df unchanged while increasing the performance I in df by exploiting the non-concavity. This is to be expected, since our optimization equations are necessary but not sufficient conditions for global optimality [28].
Since we are dealing with a single frequency, H(f) is constant and its influence is only a scaling of S(f). Let us rewrite the Lagrangian of (6) for this single band; equation (22) then becomes (remembering that Q ≜ e^{−Ĉ}) a relation between S and Q. We can also write λ_s and λ_c as functions of (S, Q), as in (27). We would like to choose only the concave solution, that is, to choose (S, C) in the concave region. We then prove that the regions with Ψ_i = 1 and the concavity regions are identical. Proof: Each (S, C) pair corresponds by (27) to a unique (λ_s, λ_c); thus, the same (S, C) cannot be the outcome of two distinct (λ_s, λ_c) with different ψ.
The next step is to show that the lines S(λ_s, λ_c) = f(C; λ_s, λ_c) separating the regions of concavity coincide with those separating the regions of the sign of ψ. Let us derive the dividing line between the ± regions in (26). At the dividing line, (1 + λ_c + λ_s)² − 4λ_cλ_s must be zero by the proof of Lemma 1 and the fact that the functions are continuous, so at any point the dividing line must be the result of (26) regardless of the value of ψ. In particular, two points infinitesimally near and each on a different side of the dividing line have the same (S, C) in the limit and, on the other hand, also the same (λ_s, λ_c) by (27), so in the limit the value of ψ does not matter. Substituting (30) back into (26) yields two candidate roots; the allocated power S_i must be non-negative, hence, by elimination, we discard η₁, and S_± denotes S on the dividing line defined by the sign of ψ_i. Remembering that Q = (1/S)·λ_c/(1 − λ_c) and C = −log(Q), we have the curve (S(λ_c), C(λ_c)). We now examine the concavity regions of (28), using the second derivatives of I. It is easy to see that ∂²I/∂S² < 0 and ∂²I/∂C² < 0, but we need to examine the sign regions of the remaining (cross) term. Once more, we arrive at two candidate roots; we can discard η₂(−) in order to ensure a non-negative solution for S_i, leading to exactly the dividing line as in (33). Thus, the regions of the sign of ψ and the concavity/convexity regions of (S, C) are identical (since λ_c determines the same unique S in both cases, and (λ_c, S) determine a unique C).
This phenomenon can easily be demonstrated numerically. We choose a square domain of (S, C), calculate (λ_s, λ_c) by (27), and choose the correct sign function Ψ_i in order to recover (S, C). Once the sign is set, we test for concavity.
Fig. 14 shows that the regions of concavity and sign are indeed identical. In this case we select the plus sign in order to obtain the concave solution. To conclude, the lower limit on C can be investigated using (33).

Fig. 1: The oblivious relay (blue) serving a user communicating via a Gaussian scalar channel (green).

Fig. 3: Mutual information and system information rate R with a Gaussian signal and two quantizers over an AWGN channel, with a quantization rate of C = 1 [bit/channel use], as a function of SNR. Binary input is also presented for comparison.

Definition 1. Let X − Y − Z be a Markov chain. The information rate function I : ℝ⁺ → [0, I(X;Y)] is defined by [4]
$$
I(C) \;\triangleq\; \max_{P(z|y)} I(X;Z) \quad \text{subject to} \quad I(Y;Z) \le C.
$$
I(C) quantifies the maximum amount of relevant information that can be preserved when the compression rate is at most C.

Fig. 6: Information rate as a function of allocated bandwidth.

Fig. 7: Allocated (normalized) power and bit-rate versus an arbitrary channel, compared to the power allocation resulting from the water-pouring method; user bandwidth W = 10 [Hz].

Fig. 8: An example of the abrupt nature of the optimal spectral allocation of power and bit-rate for a flat channel, compared to the power allocation resulting from the water-pouring method; user bandwidth W = 10 [Hz].

Fig. 11: Lower bound obtained by setting the probability mass function P_Z(z) such that H(Z) ≤ C. For each bit-rate constraint C, we present the numerical result for the optimal quantizer, the GIB upper bound, and the lower bound resulting from setting H(P_Z) ≤ C.

Fig. 12: Lower bound obtained by uniform quantization. For each bit-rate constraint C, we present the numerical result for the optimal quantizer, the GIB upper bound, and the lower bound resulting from uniform quantization of the channel output.
E[X|Y] = αY (where α = √snr/(1+snr)), and, due to the fact that the error term X − E[X|Y] is independent of the measurement Y, M is a normalized Gaussian variable independent of Y. Having done so, note that
$$
I(X; Y, Z) = I(X;Z) + I(X;Y|Z) = I(X;Y) + I(X;Z|Y) = I(X;Y) + 0,
$$
where I(X;Z|Y) = 0 due to Markovity, leading to I(X;Z) = I(X;Y) − I(X;Y|Z).
By the Markovity of X − Y − Z we have
$$
P_{Z|X}(z|x) = \int_{\mathbb{R}} P_{Z|Y}(z|y)\, p_{Y|X}(y|x)\, dy,
$$
where p_{Y|X}(y|x) is the transition probability distribution function of the Gaussian channel and P_{Z|Y}(z|y) describes the compression mapping Q. The capacity of the Gaussian channel p_{Y|X}(y|x) with average power constraint P and no channel compression equals ½ log(1 + P) [17] (units are [nats/channel use]).