On the Capacity of Cloud Radio Access Networks with Oblivious Relaying

We study transmission over a network in which users send information to a remote destination through relay nodes that are connected to the destination via finite-capacity error-free links, i.e., a cloud radio access network. The relays are constrained to operate without knowledge of the users' codebooks, i.e., they perform oblivious processing. The destination, or central processor, however, is informed about the users' codebooks. We establish a single-letter characterization of the capacity region of this model for a class of discrete memoryless channels in which the outputs at the relay nodes are independent given the users' inputs. We show that both relaying à-la Cover-El Gamal, i.e., compress-and-forward with joint decompression and decoding, and "noisy network coding" are optimal. The proof of the converse part establishes, and utilizes, connections with the Chief Executive Officer (CEO) source coding problem under logarithmic loss distortion measure. Extensions to general discrete memoryless channels are also investigated. In this case, we establish inner and outer bounds on the capacity region. For memoryless Gaussian channels within the studied class of channels, we characterize the capacity region when the users are constrained to time-share among Gaussian codebooks. We also discuss the suboptimality of separate decompression-decoding and the role of time-sharing. Furthermore, we study the related distributed information bottleneck problem and characterize optimal tradeoffs between rates (i.e., complexity) and information (i.e., accuracy) in the vector Gaussian case.


I. INTRODUCTION
Cloud radio access networks (CRAN) provide a new architecture for next-generation wireless cellular systems in which base stations (BSs) are connected to a cloud-computing central processor (CP) via error-free finite-rate fronthaul links. This architecture is generally seen as an efficient means to increase spectral efficiency in cellular networks by enabling joint processing of the signals received by multiple BSs at the CP, thus alleviating the effect of interference. Other advantages include low cost deployment and flexible network utilization [1].
In a CRAN network, each BS acts essentially as a relay node, and so can in principle implement any relaying strategy, e.g., decode-and-forward [2, Theorem 1], compress-and-forward [2, Theorem 6] or combinations of them. Relaying strategies in CRANs can be divided roughly into two classes: i) strategies that require the relay nodes to know the users' codebooks (i.e., modulation, coding), such as decode-and-forward, compute-and-forward [3], [4] or variants of it, and ii) strategies in which the relay nodes operate without knowledge of the users' codebooks, often referred to as oblivious relay processing (or nomadic transmitter) [5]-[7]. This second class is composed essentially of strategies in which the relays implement forms of compress-and-forward [2], such as successive Wyner-Ziv compression [8]-[10] and noisy-network coding [11]. Schemes combining the two approaches have been shown in [12] to possibly outperform the best of the two, especially in scenarios in which there are more users than relay nodes.
The work of G. Caire is supported by an Alexander von Humboldt Professorship. The work of S. Shamai has been supported by the European Union's Horizon 2020 Research And Innovation Programme, grant agreement no. 694630.
In spirit, however, a CRAN architecture is usually envisioned as one in which BSs operate as simple radio units (RUs) that are constrained to implement only the radio functionalities, such as analog-to-digital conversion and filtering, while the baseband functionalities are migrated to the CP. For this reason, while relaying schemes that involve partial or full decoding of the users' codewords can sometimes offer rate gains, they do not seem to be suitable in practice. In fact, such schemes assume that all or a subset of the relay nodes are fully aware (at all times) of the codebooks and encoding used by the users; and the signaling required to convey such information is generally prohibitive, particularly as networks become large. Instead, schemes in which relays perform oblivious processing are preferred. Oblivious processing was first introduced in [5]. The basic idea is that of using randomized encoding to model lack of information about codebooks. For related works, the reader may refer to [6], [13]-[15]. In particular, [15] extends the original definition of oblivious processing of [5], which rules out time-sharing, to include settings in which encoders are allowed to switch among different codebooks and oblivious nodes are unaware of the codebooks but are given, or can acquire, time- or frequency-schedule information, which is generally less difficult to obtain. The framework is termed therein "oblivious processing with enabled time-sharing".
In this work, we consider transmission over a CRAN in which the relay nodes are constrained to operate without knowledge of the users' codebooks, i.e., are oblivious, and only know time- or frequency-sharing information. Focusing on a class of discrete memoryless channels in which the relay outputs are independent conditionally on the users' inputs, we establish a single-letter characterization of the capacity region of this class of channels. We show that both relaying à-la Cover-El Gamal, i.e., compress-and-forward with joint decompression and decoding, and noisy network coding are optimal. For the proof of the converse part, we utilize useful connections with the Chief Executive Officer (CEO) source coding problem under logarithmic loss distortion measure [16]. For memoryless Gaussian channels within this class, we characterize the capacity under Gaussian channel inputs. Extensions to general discrete memoryless channels are also investigated. In this case, we establish inner and outer bounds on the capacity region.
Notation: Throughout, we use the following notation. Lower case letters denote scalars, e.g., $x$; upper case letters denote random variables, e.g., $X$; boldface lower case letters denote vectors, e.g., $\mathbf{x}$; and boldface upper case letters denote matrices, e.g., $\mathbf{X}$. Calligraphic letters denote sets, e.g., $\mathcal{X}$; and the cardinality of set $\mathcal{X}$ is denoted by $|\mathcal{X}|$. For a set of integers $\mathcal{K}$, the notation $X_{\mathcal{K}}$ denotes the set of random variables $\{X_k\}$ with indices $k$ in the set $\mathcal{K}$, i.e., $X_{\mathcal{K}} = \{X_k\}_{k \in \mathcal{K}}$.

II. SYSTEM MODEL
Consider the discrete memoryless CRAN model shown in Figure 1. In this model, a set of users communicate with a central processor (CP) through a set of relay nodes that are connected to the CP via error-free finite-rate fronthaul links. Let $\mathcal{L} = \{1, \ldots, L\}$ denote the set of users, and $\mathcal{K} = \{1, \ldots, K\}$ denote the set of relays, and let $C_k$ be the capacity of the link connecting relay node $k$ to the CP, $k \in \mathcal{K}$. Similar to [6], the relay nodes are constrained to operate without knowledge of the users' codebooks and only know time-sharing information, i.e., oblivious relay processing with enabled time-sharing. The obliviousness of the relay nodes to the actual codebooks is modeled by the transmitters picking their codebooks at random and the relays not being aware of the actual codebook indices. Specifically, the codeword $X_l^n(F_l, M_l, Q^n)$ transmitted by encoder $l$, $l \in \mathcal{L}$, depends not only on the message $M_l \in [1, 2^{nR_l}]$, but also on the index $F_l$, which runs over all possible codebooks of the given rate $R_l$, i.e., $F_l \in [1, |\mathcal{X}_l|^{n2^{nR_l}}]$, and on the time-sharing sequence $Q^n$. Formally, the model is defined as follows.
1) The index $F_l$ is picked at random and shared with the CP, but not with the relays.
2) Time-sharing sequence: All terminals, including the relay nodes, are aware of a time-sharing sequence $Q^n$, distributed as $p_{Q^n}(q^n) = \prod_{i=1}^{n} p_Q(q_i)$.
3) Encoding functions: The encoding function at user $l$, $l \in \mathcal{L}$, is defined by a pair $(p_{X_l}, \phi_l)$, where $p_{X_l}$ is a single-letter pmf and $\phi_l$ is a mapping $\phi_l : [1, |\mathcal{X}_l|^{n2^{nR_l}}] \times [1, 2^{nR_l}] \times \mathcal{Q}^n \to \mathcal{X}_l^n$ that maps the given codebook index $F_l$, message $m_l$ and time-sharing sequence $q^n$ to a channel input $x_l^n = \phi_l(F_l, m_l, q^n)$.
4) Relaying functions: Relay node $k$, $k \in \mathcal{K}$, maps the received sequence $y_k^n \in \mathcal{Y}_k^n$ into an index $J_k = \phi_k^r(y_k^n, q^n) \in [1, 2^{nC_k}]$, which it then sends to the CP over the error-free link of capacity $C_k$.
5) Decoding function: Upon receiving the indices $J_{\mathcal{K}} = (J_1, \ldots, J_K)$, the CP estimates the users' messages as $(\hat{m}_1, \ldots, \hat{m}_L) = g(F_{\mathcal{L}}, J_{\mathcal{K}}, q^n)$, where $g$ is the decoder mapping.
Definition 2. A rate tuple $(R_1, \ldots, R_L)$ is said to be achievable if, for any $\epsilon > 0$, there exists a sequence of codes such that, for sufficiently large blocklength $n$, each user's message can be decoded by the CP at rate at least $R_l$ with vanishing probability of error, i.e., the probability that the CP's estimates differ from the transmitted messages is at most $\epsilon$. For given $C_{\mathcal{K}}$, the capacity region $\mathcal{R}(C_{\mathcal{K}})$ is the closure of all achievable rate tuples $(R_1, \ldots, R_L)$.
Due to space limitations, some of the results of this paper are only outlined or given without proofs. The detailed proofs can be found in [17].

A. Class of Discrete Memoryless Channels
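In this work, we establish the capacity region of the following class of discrete memoryless CRAN channels with oblivious relay processing and enabled time-sharing. In this class, the channel outputs at the relay nodes are independent conditionally on the users' inputs. That is,
$$p(y_{\mathcal{K}} \mid x_{\mathcal{L}}) = \prod_{k=1}^{K} p(y_k \mid x_{\mathcal{L}}),$$
or, equivalently, the following Markov chain holds,
$$Y_k - X_{\mathcal{L}} - Y_{\mathcal{K} \setminus k}, \quad \text{for all } k \in \mathcal{K}. \qquad (5)$$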
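As a small illustration of the conditional-independence condition defining this class, the following sketch checks numerically whether a two-relay channel pmf factorizes as $p(y_1, y_2 | x) = p(y_1 | x)\, p(y_2 | x)$. The function names and the two toy channels (independent binary symmetric links versus a common noise flip) are illustrative assumptions, not part of the paper.

```python
import numpy as np

def outputs_conditionally_independent(p_y_given_x, tol=1e-12):
    """p_y_given_x[x, y1, y2] is the conditional pmf of two relay outputs given x.

    Returns True if p(y1, y2 | x) = p(y1 | x) * p(y2 | x) for every input x.
    """
    p = np.asarray(p_y_given_x, dtype=float)
    p1 = p.sum(axis=2)  # p(y1 | x)
    p2 = p.sum(axis=1)  # p(y2 | x)
    product = p1[:, :, None] * p2[:, None, :]
    return bool(np.allclose(p, product, atol=tol))

def two_independent_bscs(eps1, eps2):
    """Hypothetical channel: each relay sees x through its own independent BSC."""
    bsc = lambda e: np.array([[1 - e, e], [e, 1 - e]])
    return bsc(eps1)[:, :, None] * bsc(eps2)[:, None, :]

def common_noise_bsc(eps):
    """Hypothetical channel: both relays see x flipped by the SAME noise bit."""
    p = np.zeros((2, 2, 2))
    for x in (0, 1):
        p[x, x, x] = 1 - eps
        p[x, 1 - x, 1 - x] = eps
    return p
```

The first toy channel belongs to the studied class, while the second (fully correlated noise) does not, which is the situation covered only by the inner and outer bounds for general channels.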

B. Oblivious Relaying with Enabled Time-Sharing
Similar to [6], the above constraint of oblivious relay processing with enabled time-sharing means that, in the absence of information regarding the indices $F_{\mathcal{L}}$ and the messages $M_{\mathcal{L}}$, a codeword $x_l^n(F_l, m_l | q^n)$ taken from an $(n, R_l)$ codebook has independent but non-identically distributed entries.
Lemma 1. Without knowledge of the selected codebook indices $(F_1, \ldots, F_L)$, the distribution of the transmitted codewords conditioned on the time-sharing sequence is given by
$$p_{X_l^n | Q^n}(x_l^n | q^n) = \prod_{i=1}^{n} p_{X_l | Q}(x_{l,i} | q_i), \quad l \in \mathcal{L}.$$
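The statement of Lemma 1 can be checked by brute force on a tiny example. The sketch below enumerates all $|\mathcal{X}|^{n 2^{nR}} = 16$ binary codebooks for $n = 2$ and two messages, and assumes (this is an assumption of the sketch, since the selection rule is not reproduced here) that a codebook is drawn with probability equal to the product of the per-symbol probabilities $p(x_i | q_i)$ of all its entries. The pmf values and the sequence `q_seq` are hypothetical.

```python
import itertools
from math import prod

# Hypothetical single-letter pmfs p(x | q) for a binary alphabet and binary q.
p_x_given_q = {0: [0.8, 0.2], 1: [0.3, 0.7]}
q_seq = (0, 1)            # time-sharing sequence q^n, known to every terminal
n = len(q_seq)
num_messages = 2          # 2^{nR} with n = 2, R = 1/2

def codeword_prob(x):
    """Product form prod_i p(x_i | q_i) for a length-n word x."""
    return prod(p_x_given_q[q][xi] for xi, q in zip(x, q_seq))

words = list(itertools.product((0, 1), repeat=n))
# A codebook assigns one length-n codeword to each message.
codebooks = list(itertools.product(words, repeat=num_messages))

def codebook_prob(f):
    """Assumed selection rule: codewords drawn independently, entry by entry."""
    return prod(codeword_prob(x) for x in f)

def codeword_marginal(message):
    """Distribution of the codeword of `message` when the codebook index is unknown."""
    out = {x: 0.0 for x in words}
    for f in codebooks:
        out[f[message]] += codebook_prob(f)
    return out
```

Averaging over the unknown codebook index leaves a codeword whose entries are independent but, because $q_1 \neq q_2$ here, not identically distributed.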

A. Capacity Region of Studied Class of CRAN Channels
The main result of this paper is a single-letter characterization of the capacity region of the class of channels with oblivious relaying and enabled time-sharing that satisfy (5). The following theorem states the result.
Theorem 1. For the class of discrete memoryless channels given by (5) with oblivious relay processing and enabled time-sharing, a rate tuple $(R_1, \ldots, R_L)$ is achievable if and only if for all $\mathcal{T} \subseteq \mathcal{L}$ and for all $\mathcal{S} \subseteq \mathcal{K}$, we have
$$\sum_{t \in \mathcal{T}} R_t \leq \sum_{s \in \mathcal{S}} \left[ C_s - I(Y_s; U_s | X_{\mathcal{L}}, Q) \right] + I(X_{\mathcal{T}}; U_{\mathcal{S}^c} | X_{\mathcal{T}^c}, Q),$$
for some joint measure of the form
$$p(q) \prod_{l=1}^{L} p(x_l | q) \prod_{k=1}^{K} p(y_k | x_{\mathcal{L}}) \prod_{k=1}^{K} p(u_k | y_k, q).$$
Proof: The proof of the converse part of Theorem 1 is relegated to Section V. The proof of the direct part can be obtained by applying the noisy network coding (NNC) scheme of [11, Theorem 1]. Alternatively, the rate region of Theorem 1 can also be achieved by a scheme that generalizes that of [7, Theorem 1], which is established in the case of a single transmit node, to the case of multiple users and accommodates time-sharing. In contrast to the NNC scheme, the latter scheme is based on compress-and-forward à la Cover-El Gamal with joint decoding and decompression at the CP (CoF-JD).
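To make the structure of the region concrete, the following sketch evaluates a bound of the form $\sum_{t \in \mathcal{T}} R_t \leq \sum_{s \in \mathcal{S}} [C_s - I(Y_s; U_s | X_{\mathcal{L}}, Q)] + I(X_{\mathcal{T}}; U_{\mathcal{S}^c} | X_{\mathcal{T}^c}, Q)$ for one fixed test channel. The form of the bound is taken here as an assumption consistent with the converse proof in Section V. The setup is hypothetical: one user with a uniform binary input, two relays observing it through independent BSCs with crossover `deltas`, quantizers modeled as further BSCs with crossover `betas`, and no time-sharing. The region itself requires a union over all admissible test channels, which this sketch does not attempt.

```python
import itertools
import numpy as np

def bsc(e):
    return np.array([[1 - e, e], [e, 1 - e]])

def joint_pmf(deltas, betas):
    """p[x, y1, y2, u1, u2]: X ~ Bern(1/2), Y_k = BSC(delta_k)(X), U_k = BSC(beta_k)(Y_k)."""
    p = np.zeros((2,) * 5)
    for x, y1, y2, u1, u2 in itertools.product((0, 1), repeat=5):
        p[x, y1, y2, u1, u2] = (0.5 * bsc(deltas[0])[x, y1] * bsc(deltas[1])[x, y2]
                                * bsc(betas[0])[y1, u1] * bsc(betas[1])[y2, u2])
    return p

def entropy(p, axes):
    """Entropy in bits of the marginal of p on the given set of axes."""
    if not axes:
        return 0.0
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    m = p.sum(axis=drop).ravel()
    m = m[m > 0]
    return float(-(m * np.log2(m)).sum())

def cond_mi(p, a, b, c=()):
    """Conditional mutual information I(A; B | C) in bits."""
    a, b, c = set(a), set(b), set(c)
    return entropy(p, a | c) + entropy(p, b | c) - entropy(p, a | b | c) - entropy(p, c)

X_AX, Y_AX, U_AX = 0, (1, 2), (3, 4)  # axis positions in the joint pmf

def max_rate(capacities, deltas, betas):
    """Largest R allowed by the assumed bound for this test channel, or None if
    the T = {} constraints (C_k >= I(Y_k; U_k | X) for every relay) fail."""
    p = joint_pmf(deltas, betas)
    cost = [cond_mi(p, {Y_AX[k]}, {U_AX[k]}, {X_AX}) for k in (0, 1)]
    if any(capacities[k] - cost[k] < -1e-12 for k in (0, 1)):
        return None
    bounds = []
    for r in range(3):
        for S in itertools.combinations((0, 1), r):
            Sc = [k for k in (0, 1) if k not in S]
            fronthaul = sum(capacities[k] - cost[k] for k in S)
            bounds.append(fronthaul + cond_mi(p, {X_AX}, {U_AX[k] for k in Sc}))
    return min(bounds)
```

For instance, with noiseless relays and lossless quantizers (`deltas = betas = (0, 0)`) and fronthaul capacities `(0.3, 0.4)`, the binding constraint is $\mathcal{S} = \mathcal{K}$ and the rate is limited by the total fronthaul.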
Remark 1. A key element of the proof of Theorem 1 is the connection with the chief executive officer (CEO) problem. For the case of $m$ encoders, $m \geq 3$, while a characterization of the optimal rate-distortion region of this problem for general distortion measures has eluded information theory, a characterization of the optimal region in the case of the logarithmic loss distortion measure has been provided recently in [16].
Remark 2. The sum-rate of Theorem 1 can also be achieved by a scheme in which the CP first decodes the compression indices explicitly, and then decodes the users' transmitted messages, i.e., decompression and decoding are not performed jointly. A similar observation is found in [18, Theorem 2].

B. Memoryless Gaussian Model
In this section, we consider a memoryless Gaussian MIMO model of the studied CRAN with oblivious relay processing and enabled time-sharing. The channel output at relay node $k$, equipped with $M_k$ receive antennas, is given by
$$Y_k = \sum_{l=1}^{L} H_{k,l} X_l + N_k, \qquad (7)$$
where $H_{k,l}$ is the channel matrix connecting user $l$ to relay node $k$, and $N_k \in \mathbb{C}^{M_k}$ is the noise vector at relay node $k$, assumed to be Gaussian with $N_k \sim \mathcal{CN}(0, \Sigma_k)$. The transmission is subject to the power constraint $\mathrm{Tr}(K_l) \leq P_l$, where $K_l = \mathrm{E}[X_l X_l^H]$ is the covariance matrix of $X_l$. The noises at the relay nodes are assumed to be independent, and so the studied Gaussian model satisfies the Markov chain (5).
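The Gaussian model can be sketched numerically as follows, assuming the linear form $Y_k = \sum_l H_{k,l} X_l + N_k$ with independent circularly symmetric Gaussian inputs and noises. All function names are illustrative; `H[k][l]` holds the matrix from user `l` to relay `k`, and the covariance matrices are assumed positive definite so that a Cholesky factor exists.

```python
import numpy as np

def sample_cn(cov, num, rng):
    """Draw `num` samples (as columns) of a circularly symmetric CN(0, cov) vector."""
    cov = np.asarray(cov, dtype=complex)
    chol = np.linalg.cholesky(cov)  # cov = chol @ chol^H
    d = cov.shape[0]
    z = (rng.standard_normal((d, num)) + 1j * rng.standard_normal((d, num))) / np.sqrt(2)
    return chol @ z

def simulate_relay_outputs(H, K, Sigma, num, rng):
    """Return user inputs X[l] and relay outputs Y[k] for `num` channel uses."""
    X = [sample_cn(K_l, num, rng) for K_l in K]
    Y = []
    for H_k, Sigma_k in zip(H, Sigma):
        signal = sum(H_kl @ X_l for H_kl, X_l in zip(H_k, X))
        # Noise is drawn separately for each relay, hence independent across relays.
        Y.append(signal + sample_cn(Sigma_k, num, rng))
    return X, Y

def output_cov(H_k, K, Sigma_k):
    """Analytical covariance of Y_k for independent zero-mean inputs."""
    cov = np.asarray(Sigma_k, dtype=complex).copy()
    for H_kl, K_l in zip(H_k, K):
        cov = cov + H_kl @ K_l @ H_kl.conj().T
    return cov
```

Because the noise vectors are generated independently per relay, the outputs are independent given the inputs, which is the property that places this model inside the studied class.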
The result of Theorem 1 can be extended to continuous channels using standard techniques, and so it characterizes the capacity region of the model (7). The computation of this region, however, is not easy as it requires finding the optimal choices of the involved auxiliary random variables $U_1, \ldots, U_K$. The following theorem characterizes the capacity region more explicitly when the users are constrained to employ Gaussian signaling, i.e., for $Q = q$, $X_{l,q} \sim \mathcal{CN}(0, K_{l,q})$, for all $l \in \mathcal{L}$.
Theorem 2. If the input vectors are Gaussian, the capacity region of the memoryless Gaussian MIMO model (7) is given by the set of all rate tuples $(R_1, \ldots, R_L)$ satisfying, for all $\mathcal{T} \subseteq \mathcal{L}$ and all $\mathcal{S} \subseteq \mathcal{K}$, a bound on $\sum_{t \in \mathcal{T}} R_t$ that holds for some $0 \preceq B_k \preceq \Sigma_k^{-1}$, where $H_{k,\mathcal{T}}$ denotes the channel matrix connecting the input $X_{\mathcal{T}}$ to the output $Y_k$, formed by concatenating the matrices $H_{k,l}$, $l \in \mathcal{T}$, horizontally.
Remark 3. Theorem 2 extends the result of [5, Theorem 5] to the case of $L$ users and enabled time-sharing. In addition to showing that, under the constraint of Gaussian input signaling, the quantization codewords can be chosen optimally to be Gaussian, the result of Theorem 2 also means that time-sharing is not needed in the memoryless Gaussian case. Recall that, as shown through an example in [5], if the relays are aware of the users' codebooks, restricting to Gaussian input signaling can be a severe constraint and is generally suboptimal.
Remark 4. In [18], the authors study the questions of optimal fronthaul compression and decoding strategies for uplink CRAN networks without oblivious processing constraints. It is shown that NNC with Gaussian input and Gaussian quantization achieves rates within a constant gap of the capacity region of the Gaussian MIMO uplink CRAN. In this paper, we show that if only oblivious relay processing is allowed, NNC and CoF-JD are in fact optimal from a capacity viewpoint.

IV. GENERAL DISCRETE MEMORYLESS MODEL
In this section, we focus on general discrete memoryless CRAN channels with oblivious relay processing and time-sharing, i.e., the channel outputs at the relays are arbitrarily correlated and the Markov chain (5) does not necessarily hold. We establish bounds on the capacity region of the model. The results extend those of [5], which only consider a single transmitter and no time-sharing, to the case of multiple transmitters and allowed time-sharing.
The following theorem provides an inner bound on the capacity region of the general DM CRAN model with oblivious relay processing and time-sharing.
Theorem 3. For general DM CRAN channels with oblivious relay processing and enabled time-sharing, the set of rates $(R_1, \ldots, R_L)$ such that for all $\mathcal{T} \subseteq \mathcal{L}$ and all $\mathcal{S} \subseteq \mathcal{K}$, for some joint measure $p(q)$
We now provide an outer bound on the capacity region of the general DM CRAN model with oblivious relay processing and time-sharing. The following theorem states the result.
Theorem 4. For general DM CRAN channels with oblivious relay processing and enabled time-sharing, if a rate tuple $(R_1, \ldots, R_L)$ is achievable then for all $\mathcal{T} \subseteq \mathcal{L}$ and all $\mathcal{S} \subseteq \mathcal{K}$, for some random variable $W$ and some deterministic functions $\{f_k\}$, $k \in \mathcal{K}$.
Remark 5. The inner bound of Theorem 3 and the outer bound of Theorem 4 do not coincide in general. This is due to the fact that in Theorem 3, $U_1, \ldots, U_K$ satisfy a Markov chain that does not necessarily hold for the auxiliary random variables of the outer bound.
Remark 6. As we already mentioned, the class (5) of DM CRAN channels connects with the CEO problem under logarithmic loss distortion measure. The rate-distortion region of this problem is characterized in [16] for an arbitrary number of (source) encoders (see Theorem 3 therein). For general DM CRAN channels, i.e., without the Markov chain (5), the model connects with the distributed source coding problem under logarithmic loss distortion measure. While a solution of the latter problem for the case of two encoders has been found in [16, Theorem 6], generalizing the result to the case of an arbitrary number of encoders poses a significant challenge. In fact, as also mentioned in [16], the Berger-Tung inner bound is known to be generally suboptimal (e.g., see the Körner-Marton lossless modulo-sum problem [19]). Characterizing the capacity region of the general DM CRAN model under the constraint of oblivious relay processing and enabled time-sharing poses a similar challenge, except perhaps for the case of two relay nodes, results on which will be reported elsewhere.

V. PROOF OF CONVERSE PART OF THEOREM 1
Assume the rate tuple $(R_1, \ldots, R_L)$ is achievable. Let $\mathcal{T}$ be a subset of $\mathcal{L}$, $\mathcal{S}$ be a non-empty subset of $\mathcal{K}$, and $J_k = \phi_k^r(Y_k^n, q^n)$ be the message sent by relay $k \in \mathcal{K}$, and let $Q = q^n$ be the time-sharing variable. For simplicity, we define $Q_i \triangleq (X_{\mathcal{L}}^{i-1}, X_{\mathcal{L},i+1}^{n}, Q)$. From Fano's inequality, we have, with $\epsilon_n \to 0$ as $n \to \infty$ (for vanishing probability of error), for all $\mathcal{T} \subseteq \mathcal{L}$,
We show the following inequality, used below in the proof.
Inequality (16) can be shown as follows.
where (17) follows since the messages $m_{\mathcal{T}}$ are independent; (19) follows since $m_{\mathcal{T}}$ is independent of $Q$ and $F_{\mathcal{T}^c}$; (20) follows from (14); (24) follows since $m_{\mathcal{T}}$ is independent of $F_{\mathcal{L}}$; (26) follows from the data processing inequality; (28) follows since $X_{\mathcal{T}^c}^n, F_{\mathcal{T}^c}$ are independent of $X_{\mathcal{T}}^n$ and since conditioning reduces entropy; and (29) follows due to the Markov chain
Then, from (29) we have (16) as follows:
where (33) is due to Lemma 1. Continuing from (29), we have
where (36) follows due to Lemma 1; and (37) follows since conditioning reduces entropy.
On the other hand, we have the following equality
where (39) follows due to the Markov chain
which follows since the channel is memoryless.
Then, from the relay side we have,
where (47) follows since $J_{\mathcal{S}}$ is a function of $Y_{\mathcal{S}}^n$; (49) follows from (16); (51) follows since conditioning reduces entropy; and (52) follows from (16) and (42).
Note that, in general, $Q_i$ is not independent of $X_{\mathcal{L},i}, Y_{\mathcal{S},i}$, and that, due to Lemma 1, conditioned on $Q_i$, we have the Markov chain
Finally, a standard time-sharing argument completes the proof of Theorem 1.
VI. CONCLUDING REMARKS
In this paper, we study transmission over a cloud radio access network under the framework of oblivious processing at the relay nodes, i.e., the relays are not allowed to know, or cannot acquire, the users' codebooks. Our results shed light on (and sometimes determine exactly) what operations the relay nodes should perform optimally in this case. In particular, perhaps unsurprisingly, it is shown that compress-and-forward, or variants of it, generally perform well in this case, and are optimal when the outputs at the relay nodes are conditionally independent given the users' inputs. Furthermore, in addition to its relevance from a practical viewpoint, restricting the relays not to know or utilize the users' codebooks causes only a bounded rate loss in comparison with the non-oblivious setting (e.g., compress-and-forward and noisy network coding perform to within a constant gap from the cut-set bound in the Gaussian case).
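Steps (49)-(52) of the relay-side chain in Section V:
\begin{align}
&\geq H(X_{\mathcal{T}}^n | X_{\mathcal{T}^c}^n, J_{\mathcal{S}^c}, Q) - n\Gamma_{\mathcal{T}} + I(Y_{\mathcal{S}}^n; J_{\mathcal{S}} | X_{\mathcal{L}}^n, J_{\mathcal{S}^c}, Q) \tag{49} \\
&= \sum_{i=1}^{n} H(X_{\mathcal{T},i} | X_{\mathcal{T}^c}^n, J_{\mathcal{S}^c}, X_{\mathcal{T}}^{i-1}, Q) - n\Gamma_{\mathcal{T}} + I(Y_{\mathcal{S}}^n; J_{\mathcal{S}} | X_{\mathcal{L}}^n, J_{\mathcal{S}^c}, Q) \tag{50} \\
&\geq \sum_{i=1}^{n} H(X_{\mathcal{T},i} | X_{\mathcal{T}^c,i}, U_{\mathcal{S}^c,i}, Q_i) - n\Gamma_{\mathcal{T}} + I(Y_{\mathcal{S}}^n; J_{\mathcal{S}} | X_{\mathcal{L}}^n, J_{\mathcal{S}^c}, Q) \tag{51} \\
&= nR_{\mathcal{T}} - \sum_{i=1}^{n} I(X_{\mathcal{T},i}; U_{\mathcal{S}^c,i} | X_{\mathcal{T}^c,i}, Q_i) + \sum_{k \in \mathcal{S}} \sum_{i=1}^{n} I(Y_{k,i}; U_{k,i} | X_{\mathcal{L},i}, Q_i), \tag{52}
\end{align}
(43) and since $J_k$ is a function of $Y_k^n$; and (41) follows due to the Markov chain Y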