Oblivious Processing in a Fronthaul Constrained Gaussian Channel

We consider systems in which the transmitter conveys messages to the receiver through a capacity-limited relay station. The channel between the transmitter and the relay station is assumed to be frequency-selective additive Gaussian. The transmitter can shape the spectrum and adapt the coding technique so as to optimize performance. The relay operation is oblivious: the specific codebooks used are unknown to the relay, while the spectral shape of the transmitted signal is available. We find the reliable information rate that can be achieved in this setting, and to that end employ Gaussian bottleneck results combined with Shannon's incremental frequency approach. We also prove that, unlike classical water-pouring, the allocated spectrum (power and bit-rate) of the optimal solution can frequently be discontinuous.


I. INTRODUCTION
Relaying makes use of intermediate nodes to help the communication between two distant nodes. Elementary relaying can be coarsely divided into compress-and-forward (amplify-and-forward is viewed as a special case) and decode-and-forward, depending on whether the relays decode the transmitted message or just forward the received signal to the destination. In this paper we examine the "oblivious" relay system. The oblivious approach is intended to construct universal relaying components that serve many diverse users and operators and do not depend on knowledge of the modulation method and coding. Such an approach might benefit systems used in 'cloud' communication and was investigated, for example, in [4].
Consider the system in Figure 1. The relay compresses the received signal y and forwards it to the final user's destination at a finite rate C, thereby providing the user with a memoryless communication channel that carries symbols from the transmitter to the receiver. The user transmits the symbols x and receives the symbols z, and the effective channel is governed by the transition probability P(z|x). We choose x to be Gaussian because of its optimality at large C and because of its ubiquitous applications. In such a setting the user faces the familiar memoryless communication channel and can freely choose how to utilize it, e.g., by selecting a good error-correcting code and changing the code after the oblivious system has already been implemented. The serving system is oblivious of the channel code used; see [1] for a more rigorous presentation of obliviousness.
The relay performs lossy compression of the output of the Gaussian channel. The trade-off between the compression rate and the mutual information between the channel input and the compressed channel output has closed-form expressions for the scalar and vector cases, given by the Gaussian Information Bottleneck (GIB) theorem [2], [5], [6].
This deviates from the classical remote rate-distortion approach [7], [8], [9] (rate distortion for a sub-Nyquist sampling scheme) and [10] (sampling stationary signals subject to bit-rate constraints), since the distortion is measured by the equivocation $H(X|Z)$ and not by $\mathrm{MMSE} = E[(X - Z)^2]$. If the distribution of $X$ is fixed, then minimizing $H(X|Z)$ is equivalent to maximizing $I(X; Z) = H(X) - H(X|Z)$. In this paper, we provide a further generalization of the GIB to the frequency-selective additive Gaussian channel. We find the reliable information rate that can be achieved in this setting, and to that end employ Gaussian bottleneck results [2] combined with Shannon's incremental frequency approach [3].
The remainder of this paper is organized as follows. Section II provides the required background and definitions for the GIB. In Section III, we review the main results relevant to the frequency-flat channel from [5], [3] and present the new derivation for the frequency-selective channel and infinite processing time. Numerical results are provided in Section IV, and conclusions and future directions in Section V.
Notation: We use boldface letters for column vectors and sequences. The expectation operator is denoted by $E$, and we follow the notation of [11] for entropy $H(\cdot)$, differential entropy $h(\cdot)$, and mutual information $I(\cdot\,;\cdot)$. Furthermore, $[x]^+ \triangleq \max\{x, 0\}$. All logarithms are natural.

A. Information rate - vector channel
The Gaussian Information Bottleneck and its derivation for the discrete-time signaling case were thoroughly studied in [5], [12], [2], [6] and [13]. We now give a brief overview of the GIB (the interested reader is referred to [5], [6] for a full treatment).
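A complete derivation of the information-rate function for the vector case, as well as the difference between the information-rate function and the rate-distortion function, namely $I(R) \ge I_{RD}(R)$, is presented in [6]. Consider the system in Figure 2. Let $\mathbf{x}$ and $\mathbf{y}$ be jointly Gaussian zero-mean random vectors (of length $n$) with full-rank covariance matrices, and assume that $\mathbf{x} \sim \mathcal{N}(0, R_{xx})$. The channel output equals $\mathbf{y} = H\mathbf{x} + \mathbf{n}$, where $H \in \mathbb{R}^{n \times m}$ and the additive noise $\mathbf{n} \sim \mathcal{N}(0, \sigma^2 I)$ is independent of $\mathbf{x}$, which results in
$$\mathbf{y} \sim \mathcal{N}(0,\ H R_{xx} H^T + \sigma^2 I).$$
Let $\mathbf{z}$ be a compressed representation of $\mathbf{y}$, described by the conditional distribution $P(\mathbf{z}|\mathbf{y})$. It follows that $\mathbf{x} - \mathbf{y} - \mathbf{z}$ forms a Markov chain, and hence by Markovity
$$P(\mathbf{x}, \mathbf{y}, \mathbf{z}) = P(\mathbf{x})\, P(\mathbf{y}|\mathbf{x})\, P(\mathbf{z}|\mathbf{y}).$$
The compression rate equals $I(\mathbf{z}; \mathbf{y})$. The GIB addresses the following variational problem [14]:
$$\min_{P(\mathbf{z}|\mathbf{y})} \; I(\mathbf{y}; \mathbf{z}) - \beta\, I(\mathbf{x}; \mathbf{z}).$$
In the context of the information bottleneck method, $\mathbf{x}$ is called the relevance variable and $I(\mathbf{x}; \mathbf{z})$ is termed the relevant information. The trade-off between compression rate and relevant information is determined by the positive parameter $\beta$. It has been shown that the optimal $\mathbf{z}$ is jointly Gaussian with $\mathbf{y}$ and can be written as
$$\mathbf{z} = A\mathbf{y} + \boldsymbol{\xi},$$
where $A$ is an $n \times n$ matrix and $\boldsymbol{\xi} \sim \mathcal{N}(0, C_{\xi})$ is independent of $\mathbf{y}$.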
I(C) quantifies the maximum of the relevant information that can be preserved when the compression rate is at most C.

B. Information rate - scalar channel
We now present $I(C)$ for the channel depicted in Figure 2 in the scalar case. Since $x$ and $y$ are real zero-mean jointly Gaussian random variables, they obey
$$y = \sqrt{h}\, x + n,$$
where $h \in \mathbb{R}^+$ and $n \sim \mathcal{N}(0, \sigma^2)$ is independent of $x$. Setting $x \sim \mathcal{N}(0, P)$ yields $y \sim \mathcal{N}(0, hP + \sigma^2)$. The compressed representation of $y$ is denoted $z = Q(y)$. By Markovity of $x - y - z$ we have
$$P(x, y, z) = P(x)\, P(y|x)\, P(z|y),$$
where $P(y|x)$ is the transition pdf of the Gaussian channel and $P(z|y)$ describes the compression mapping $Q$. The capacity of the Gaussian channel $P(y|x)$ with average power constraint $P$ and no channel compression equals [11] (in [nats/channel use])
$$\tfrac{1}{2} \log(1 + \rho),$$
with $\rho$ the signal-to-noise ratio (SNR),
$$\rho = \frac{hP}{\sigma^2}.$$
The following corollary states a closed-form expression for the information-rate function and its properties [5, Theorem 2].

Corollary 1. The information-rate function of a Gaussian channel with SNR $\rho$ is given by
$$I(C) = \frac{1}{2} \log \frac{1 + \rho}{1 + \rho\, e^{-2C}}.$$
I(C) has the following properties: The proof is given in [5]. It should be noted that it can also be proved using the I-MMSE relation [ Figure 3 illustrates the effect of limited-rate processing. It is clear that the total mutual information is upper bounded by the capacity of the AWGN channel (which can be achieved by the water-pouring approach) presented by Shannon [3].
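The behaviour of the information-rate function of Corollary 1 can be checked numerically. The following is a minimal sketch, assuming the standard natural-logarithm form $I(C) = \frac{1}{2}\log\frac{1+\rho}{1+\rho e^{-2C}}$ from [5]; the function names `awgn_capacity` and `information_rate` and the SNR value are illustrative and do not come from the paper.

```python
import math


def awgn_capacity(rho: float) -> float:
    """Capacity 0.5*log(1+rho) of the scalar Gaussian channel [nats/channel use]."""
    return 0.5 * math.log1p(rho)


def information_rate(c: float, rho: float) -> float:
    """Assumed closed form I(C) = 0.5*log((1+rho)/(1+rho*exp(-2C))) [nats/channel use]."""
    return 0.5 * (math.log1p(rho) - math.log1p(rho * math.exp(-2.0 * c)))


if __name__ == "__main__":
    rho = 10.0  # illustrative SNR, not a value taken from the paper
    for c in (0.1, 0.5, 1.0, 2.0, 5.0):
        # I(C) stays below both the compression rate C and the channel capacity.
        print(c, information_rate(c, rho), min(c, awgn_capacity(rho)))
```

For small C the relay rate is the bottleneck and I(C) grows almost linearly in C; for large C it saturates at the capacity of the uncompressed channel.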
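The received signal is $y(t) = x(t) * h(t) + n(t)$, where $h(t)$ is the channel impulse response with frequency response $H(f)$, $n(t)$ is a normalized additive white Gaussian noise with one-sided power spectral density $N_0 = 1$ [Watt/Hz], and $*$ designates convolution.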
We are interested in the normalized mutual-information when standard coding theorems [16] guarantee that the associated rate can be reliably transmitted through the system.
where $x_a^b \triangleq (x(t),\ a \le t \le b)$ and where $Z$ is the output binary vector, whose rate is constrained to $C$ [nats/sec]. The information in (13) is also measured in [nats/sec]. Here $n$ denotes the $n$-symbol block transmitted and is also the dimension of $Z$.
Again, we would like to find the (one-sided) power spectral density $S_x(f)$ of the input Gaussian process that maximizes $I_n^C(x; z)$ under an average power constraint within some bandwidth $W$:
$$\int_0^W S_x(f)\, df \le P.$$

A. Water pouring
First we recall the classical water-pouring approach, which yields the maximum $I_n^{\infty}(x; z)$ for $C \to \infty$.
The idea of splitting the channel into incremental bands appears in [3] and [11], where each incremental band of bandwidth $df$ is treated as an ideal band-limited channel (independent of the others due to Gaussianity) with response $H(f)\,df$, and the result yields:
Optimizing this over $S_x(f)$ under the power constraint, using the standard Euler-Lagrange approach [17], yields the result (see [3, chapter 8])
and where the frequency region B is given by
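The classical water-pouring allocation recalled above can be sketched on a discretized frequency grid. This is an illustration under stated assumptions and not the paper's derivation: the array `gain` is assumed to hold the SNR per unit of transmit power spectral density in each band (for example $|H(f)|^2$ with unit noise density), the helper names are hypothetical, and the two-bump channel in the demo is made up for the example.

```python
import numpy as np


def water_pouring(gain, power, df, iters=200):
    """Return PSD samples S[k] = max(b - 1/gain[k], 0) with sum(S)*df = power.

    gain[k] is the assumed SNR per unit transmit PSD in band k (must be > 0).
    The water level b is found by bisection.
    """
    inv = 1.0 / np.asarray(gain, dtype=float)  # "bowl" height of each band
    lo = inv.min()                # at this level no power is used
    hi = inv.min() + power / df   # at this level the best band alone uses all power
    for _ in range(iters):
        b = 0.5 * (lo + hi)
        if np.maximum(b - inv, 0.0).sum() * df > power:
            hi = b
        else:
            lo = b
    return np.maximum(lo - inv, 0.0)


def unlimited_rate(gain, psd, df):
    """Rate [nats/sec] for C -> infinity: sum over bands of log(1 + gain*S) * df."""
    return float(np.sum(np.log1p(np.asarray(gain) * np.asarray(psd))) * df)


if __name__ == "__main__":
    f = np.linspace(0.0, 10.0, 501)
    df = f[1] - f[0]
    # Hypothetical two-bump channel; the small floor keeps every gain positive.
    gain = 0.05 + np.exp(-0.5 * (f - 3.0) ** 2) + 0.5 * np.exp(-0.5 * (f - 7.0) ** 2)
    psd = water_pouring(gain, power=100.0, df=df)
    print("used power:", psd.sum() * df)
    print("rate [nats/sec]:", unlimited_rate(gain, psd, df))
```

Bisection is used here only because it is simple and robust; a sorted closed-form search over the band gains would give the same water level.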

B. Processing under limited bit-rate C
As before, we adopt Shannon's incremental view, taking advantage of the fact that disjoint frequency bands are independent under the Gaussian law and stationarity. Let $\frac{1}{2}C(f)$ designate the number of [nats/channel use] assigned for delivering (processing) the band $(f, f + df)$. Since we have $2\,df$ independent channel uses (Nyquist) per second, the total rate per second in each band is $C(f)\,df$. Culminating this view and incorporating (11), we reach the equation: (20)
This leads to the following maximization problem:
The result follows the standard Euler-Lagrange reasoning [17]. To that end, we follow the notation presented in [17]. Assuming , is the mutual information spectral density [nats/sec/Hz]. Also, $\hat{S}_x \triangleq S_x(f)$, $\hat{C} \triangleq C(f)$. The Lagrangian is:
Differentiating $L[f, \hat{S}_x, \hat{C}]$ with respect to $\hat{C}$ and $\hat{S}_x$ leads to:
and solving the quadratic equation that follows from (23), we have two sets of solutions for $\{\hat{S}_x, Q\}$ (see [18] for the full derivation). Define:
Although equation (26) produces two sets of curves, we discard the $\{S_{x,2}(f), Q_2(f)\}$ curve, since for each frequency (regardless of $H(f)$), meaning that one could improve the "optimal" $I[f, \hat{S}_x, Q]$ simply by splitting the incremental frequency band. A rigorous proof can be found in [18].
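The per-band bookkeeping described above can be written out explicitly. The block below is a sketch and not a reproduction of the paper's numbered equations: it assumes that the SNR in the band $(f, f+df)$ is $\rho(f) = S_x(f)\,|H(f)|^2$ (noise density normalized to one), and it applies the standard closed form of Corollary 1 with $\frac{1}{2}C(f)$ nats per channel use and $2\,df$ channel uses per second.

```latex
% Assumption (not taken from the source): rho(f) = S_x(f) |H(f)|^2, with N_0 = 1.
\begin{align}
  \text{rate of band } (f, f+df):\quad
    & 2\,df \cdot \tfrac{1}{2}\log\frac{1+\rho(f)}{1+\rho(f)\,e^{-2\cdot\frac{1}{2}C(f)}}
      = \log\frac{1+\rho(f)}{1+\rho(f)\,e^{-C(f)}}\;df , \\
  \text{total rate [nats/sec]}:\quad
    & \int_0^W \log\frac{1+\rho(f)}{1+\rho(f)\,e^{-C(f)}}\;df , \\
  \text{subject to}:\quad
    & \int_0^W S_x(f)\,df \le P , \qquad \int_0^W C(f)\,df \le C .
\end{align}
```

As $C(f) \to \infty$ the integrand tends to $\log(1+\rho(f))$, which is the integrand of the classical water-pouring problem.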
In stark contrast to classical water-pouring [11] and [10], the optimal solution is frequently discontinuous. A notable example is $H(f)$ constant over $f$, with sufficient SNR and rather low $C$. In this case, an attempt to use frequency-constant $S_x(f)$ and $C(f)$ places us in the non-concave region, and a better-performing solution uses only part of the available spectrum, so that the available bits are better utilized by conveying less information about the channel noise. Since $C(f)$ and $S_x(f)$ never fall gradually to zero, the transition always has an abrupt part.
It is clear that the best course would not be to spread the power and bit-rate all over the spectrum.
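The flat-channel example can be illustrated numerically. The sketch below assumes a flat channel with unit noise density, a hypothetical power gain `gain`, and power and relay rate spread uniformly over an occupied band of `width` Hz, so that each Hz carries $\log\frac{1+\rho}{1+\rho e^{-c}}$ nats/sec with $\rho$ = `gain*power/width` and $c$ = `rate/width`. The numbers (W = 10, P = 100, C = 1) are illustrative only, and restricting the search to a single uniformly loaded sub-band is a simplification, not the paper's optimal solution.

```python
import math


def band_rate(width, power, rate, gain=1.0):
    """Rate [nats/sec] when power and relay rate are spread uniformly over
    `width` Hz of a flat channel (assumed unit noise density)."""
    snr = gain * power / width   # SNR inside the occupied band
    c = rate / width             # relay rate density [nats/sec/Hz]
    return width * (math.log1p(snr) - math.log1p(snr * math.exp(-c)))


def best_width(total_width, power, rate, gain=1.0, steps=2000):
    """Grid search for the occupied bandwidth that maximizes band_rate."""
    widths = [total_width * k / steps for k in range(1, steps + 1)]
    return max(widths, key=lambda w: band_rate(w, power, rate, gain))


if __name__ == "__main__":
    W, P, C = 10.0, 100.0, 1.0   # illustrative values, not from the paper
    w = best_width(W, P, C)
    print("full band rate   :", band_rate(W, P, C))
    print("best partial band:", w, band_rate(w, P, C))
```

With these values the search selects a band much narrower than W, in line with the observation that spreading power and bit-rate over the whole spectrum is not the best course at low C.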

IV. NUMERICAL ANALYSIS
The algorithm has been tested for different types of channels (designated "Channel A") of the form $H_A(f) \equiv \alpha_1 N(f_1, 1) + \alpha_2 N(f_2, 1)$ (where $N(\mu, \sigma^2)$ is the Gaussian curve) with P = 100, C = 9 [nats/sec]. We also tested the "reciprocal" channel, designated "Channel B" (i.e, H A (f ) ). In each scenario, we compared the overall information rate of the following algorithms:
• The proposed algorithm
• Uniform allocation of rate and power
• Classical water pouring as presented in [3], for the case of C → ∞
• "Limited Rate Water Pouring", which is: A) Calculate $S_x(f)$ using the classical water pouring approach. B) The allocated rate is:
The results are summarized in the following figures and table. Each figure contains: (a) normalized curves of $S_x(f)$, $C(f)$ of the proposed algorithm with respect to $H(f)$; (b) a comparison between the allocated power using the proposed approach and the classical water pouring.
It is clear from the results that:
• The proposed approach for allocating the power $S_x(f)$ and rate $C(f)$ is indeed optimal and superior to the other methods presented. Evidently, the rate is upper bounded by the classical water-pouring result (C → ∞).
• The price of obliviousness is demonstrated, since for a cognitive relay the reliable rate is $\min(I_n^{\infty}(x; z), C)$, achieved by a relay that decodes the signal and then transmits the decoded information at the maximum allowable rate (C).

V. CONCLUSIONS AND FUTURE DIRECTIONS
We presented and analyzed the rate- and power-limited oblivious relay over the frequency-selective AWGN channel and derived the optimal transmit power spectral density and the optimal allocation of the relay bit-rate for Gaussian signaling. Our results relate directly to the classical water-pouring as well as to the Gaussian bottleneck frameworks. The advantage of this approach over other methods was demonstrated. Our results also apply directly to the frequency-dependent vector (MIMO) channels presented in Figure 1 and in Section II above. Such channels can be transformed into a set of parallel independent channels [19]. Thus equation (26) and the optimization algorithms can be applied to them without modification, by considering those independent channels as occupying independent frequency bands. A modern implementation of such a MIMO system might use the OFDM framework, in which the MIMO channel diagonalization is convenient to implement (see, for example, [20]).
An area under current investigation is the Constrained H(z) for general memoryless channels. Let us replace the constraint It should be emphasized that the entropy constraint is per symbol. In this case we can apply insights from the results of [21]. This will result in a universal compression scheme, where standard algorithms such as Lempel-Ziv, arithmetic coding and others can be applied [11]. This will also force z to take values in a discrete alphabet, since otherwise H(z) is infinite. Open questions:
A) Is the curve of the maximal I(x; z) as a function of the maximal permitted H(z) concave? It is of course concave if time sharing between two different quantizers is allowed.
B) In which cases does the optimal mapping y → z become deterministic? In some cases it is stochastic; for example, if y is binary there are only two possible deterministic quantizers, one of which is z = y and the other z = const. So H(z) has only two possible values.
In such cases the quantizer will be stochastic, or time sharing between different deterministic quantizers will be used. With a deterministic quantizer the scheme becomes similar to a Lloyd quantizer concatenated with entropy compression, but may be superior to it because the quantizer is optimized jointly for I(x; z) and H(z). On the other hand, there is a trivial way to convert any stochastic quantizer into a deterministic one if the x → y channel is an additive noise channel.