On the Capacity of Cloud Radio Access Networks With Oblivious Relaying

We study the transmission over a network in which users send information to a remote destination through relay nodes that are connected to the destination via finite-capacity error-free links, i.e., a cloud radio access network. The relays are constrained to operate without knowledge of the users' codebooks, i.e., they perform oblivious processing. The destination, or central processor, however, is informed about the users' codebooks. We establish a single-letter characterization of the capacity region of this model for a class of discrete memoryless channels in which the outputs at the relay nodes are independent given the users' inputs. We show that both relaying à-la Cover-El Gamal, i.e., compress-and-forward with joint decompression and decoding, and "noisy network coding" are optimal. The proof of the converse part establishes, and utilizes, connections with the Chief Executive Officer source coding problem under the logarithmic loss distortion measure. Extensions to general discrete memoryless channels are also investigated; in this case, we establish inner and outer bounds on the capacity region. For memoryless Gaussian channels within the studied class of channels, we characterize the capacity region when the users are constrained to time-share among Gaussian codebooks. Furthermore, we discuss the suboptimality of separate decompression and decoding and the role of time-sharing.


I. INTRODUCTION
Cloud radio access networks (CRANs) provide a new architecture for next-generation wireless cellular systems, in which base stations (BSs) are connected to a cloud-computing central processor (CP) via error-free finite-rate fronthaul links. This architecture is generally seen as an efficient means to increase the spectral efficiency of cellular networks by enabling joint processing of the signals received by multiple BSs at the CP, thereby alleviating the effect of interference. Other advantages include low-cost deployment and flexible network utilization [2]. In a CRAN, each BS acts essentially as a relay node, and so it can in principle implement any relaying strategy, e.g., decode-and-forward [3, Th. 1], compress-and-forward [3, Th. 6], or combinations of them. Relaying strategies in CRANs can be divided roughly into two classes: i) strategies that require the relay nodes to know the users' codebooks (i.e., modulation and coding), such as decode-and-forward, compute-and-forward [4]-[6], or variants thereof; and ii) strategies in which the relay nodes operate without knowledge of the users' codebooks, often referred to as oblivious relay processing (or nomadic transmission) [7]-[9]. The second class consists essentially of strategies in which the relays implement forms of compress-and-forward [3], such as successive Wyner-Ziv compression [10]-[12], quantize-map-and-forward [13], and noisy network coding [14]. Schemes that combine the two approaches have been shown to possibly outperform the best of the two [15], especially in scenarios with more users than relay nodes.
In essence, however, a CRAN architecture is usually envisioned as one in which the BSs operate as simple radio units (RUs) that are constrained to implement only radio functionalities, such as analog-to-digital conversion and filtering, while the baseband functionalities are migrated to the CP. For this reason, while relaying schemes that involve partial or full decoding of the users' codewords can sometimes offer rate gains, they do not seem suitable in practice: such schemes assume that all, or a subset of, the relay nodes are fully aware (at all times!) of the codebooks and encoding operations used by the users, and the signaling required to enable such awareness is generally prohibitive, particularly as the network size gets large. Instead, schemes in which the relay nodes perform oblivious processing are preferred in practice. Oblivious processing was first introduced in [7]; the basic idea is to use randomized encoding to model the lack of information about codebooks. For related works, the reader may refer to [8], [16], and [17]. In particular, [8] extends the original definition of oblivious processing of [7], which rules out time-sharing, to settings in which the transmitters are allowed to switch among different codebooks and the relay nodes, while unaware of the codebooks, are given, or can acquire, time- or frequency-schedule information.^1 The framework is referred to therein as "oblivious processing with enabled time-sharing".
In this work, we consider transmission over a CRAN in which the relay nodes are constrained to operate without knowledge of the users' codebooks, i.e., are oblivious, and only know time- or frequency-sharing information. The model is shown in Figure 1. Focusing on a class of discrete memoryless channels in which the relay outputs are independent conditionally on the users' inputs, we establish a single-letter characterization of the capacity region of this class of channels. We show that both relaying à-la Cover-El Gamal, i.e., compress-and-forward with joint decompression and decoding [7], [18], and noisy network coding [14] are optimal. For the proof of the converse part, we utilize useful connections with the Chief Executive Officer (CEO) source coding problem under the logarithmic loss distortion measure [19]. Extensions to general discrete memoryless channels are also investigated; in this case, we establish inner and outer bounds on the capacity region. For memoryless Gaussian channels within the studied class, we provide a full characterization of the capacity region under Gaussian signaling, i.e., when the users' channel inputs are restricted to be Gaussian. In doing so, we also investigate the role of time-sharing.

Outline and Notation
The rest of this paper is organized as follows. Section II provides a formal description of the model, as well as some definitions related to it. Section III contains the main result of this paper, which is a single-letter characterization of the capacity region of a class of discrete memoryless CRANs with oblivious processing at the relays and enabled time-sharing, in which the channel outputs at the relay nodes are independent conditionally on the users' channel inputs. This section also provides inner and outer bounds on the capacity region of general discrete memoryless CRANs with constrained relays, as well as some discussion of the suboptimality of successive decompression and decoding and of the role of time-sharing. Finally, in Section IV, we study a memoryless vector Gaussian CRAN model with oblivious processing at the relays and enabled time-sharing, for which we characterize the capacity region under Gaussian signaling.

^1 Typically, this information is small, e.g., 1 bit that captures on/off activity; so, obtaining it is generally much less demanding than obtaining full information about the users' codebooks.
Throughout this paper, we use the following notation. Upper case letters denote random variables, e.g., X; lower case letters denote their realizations, e.g., x; and calligraphic letters denote sets, e.g., X. The cardinality of a set X is denoted by |X|. The length-n sequence (X_1, ..., X_n) is denoted as X^n; and, for integers j and k such that 1 ≤ k ≤ j ≤ n, the sub-sequence (X_k, X_{k+1}, ..., X_j) is denoted as X_k^j. Probability mass functions (pmfs) are denoted by p_X(x) = Pr{X = x} or, for short, as p(x) = Pr{X = x}. Boldface upper case letters denote vectors or matrices, e.g., X, where context should make the distinction clear. For an integer L ≥ 1, we denote the set of integers smaller than or equal to L as L := {l ∈ N : 1 ≤ l ≤ L}; sometimes, this set will also be denoted as [1 : L]. For a set of integers K ⊆ L, the notation X_K designates the set of random variables {X_k} with indices k in the set K, i.e., X_K = {X_k}_{k∈K}. We denote the covariance matrix of a zero-mean vector X by Σ_x := E[XX^H]; Σ_{x,y} := E[XY^H] is the cross-correlation matrix of X and Y; and the conditional covariance matrix of X given Y is Σ_{x|y} := Σ_x − Σ_{x,y} Σ_y^{-1} Σ_{y,x}.
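As a quick illustration of this conditional-covariance notation, the Schur-complement formula for Σ_{x|y} can be checked numerically against the empirical covariance of the MMSE residual of a jointly Gaussian pair. The joint covariance below is an arbitrary illustrative choice (real-valued, for simplicity), not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative positive-definite joint covariance of (X, Y), X in R^2, Y in R^3.
A = rng.standard_normal((5, 5))
S = A @ A.T + 5 * np.eye(5)
Sx, Sy = S[:2, :2], S[2:, 2:]
Sxy = S[:2, 2:]                      # cross-covariance Sigma_{x,y}

# Conditional covariance via the Schur complement: Sigma_{x|y}.
Sx_given_y = Sx - Sxy @ np.linalg.inv(Sy) @ Sxy.T

# It equals the covariance of the MMSE residual X - Sigma_{x,y} Sigma_y^{-1} Y.
n = 200000
Z = rng.multivariate_normal(np.zeros(5), S, size=n)
X, Y = Z[:, :2], Z[:, 2:]
resid = X - Y @ (np.linalg.inv(Sy) @ Sxy.T)
emp = resid.T @ resid / n            # empirical covariance of the residual
print(np.max(np.abs(emp - Sx_given_y)))   # small sampling error
```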

II. SYSTEM MODEL
Consider the discrete memoryless (DM) CRAN model shown in Figure 1. In this model, L users communicate with a common destination, or central processor (CP), through K relay nodes, where L ≥ 1 and K ≥ 1. Relay node k, 1 ≤ k ≤ K, is connected to the CP via an error-free finite-rate fronthaul link of capacity C_k. In what follows, we let L := [1 : L] and K := [1 : K] denote the sets of users and relays, respectively. Similar to [8], the relay nodes are constrained to operate without knowledge of the users' codebooks and only know a time-sharing sequence Q^n, i.e., a set of time instants at which the users switch among different codebooks. The obliviousness of the relay nodes to the actual codebooks of the users is modeled via the notion of randomized encoding [7] (see also [20] for an earlier introduction of this notion in the context of coding for channels with unknown state). That is, the users select their codebooks at random; the relay nodes are not informed of the currently selected codebooks, while the CP is given this information. Specifically, in this setup, user l, l ∈ L, sends codewords X_l^n(F_l, M_l, Q^n) that depend not only on the message M_l ∈ [1 : 2^{nR_l}] of rate R_l to be transmitted to the CP and on the time-sharing sequence Q^n, but also on the index F_l of the codebook selected by this user. The codebook index F_l runs over all possible codebooks of the given rate R_l, i.e., F_l ∈ [1 : |X_l|^{n2^{nR_l}}], and is unknown to the relay nodes. The CP, however, knows the indices of all the currently selected codebooks. Also, it is assumed that all terminals know the time-sharing sequence.

A. Formal Definitions
The discrete memoryless CRAN model with oblivious relay processing and enabled time-sharing that we study in this paper is defined as follows.
1) Messages and Codebooks: Transmitter l, l ∈ L, sends a message M_l ∈ [1 : 2^{nR_l}] to the CP using a codebook from a set of codebooks {C_l(F_l)} indexed by F_l ∈ [1 : |X_l|^{n2^{nR_l}}]. The index F_l is picked at random and shared with the CP, but not with the relays.

2) Time-sharing sequence: All terminals, including the relay nodes, are aware of a time-sharing sequence Q^n, distributed as p_{Q^n}(q^n) = ∏_{i=1}^n p_Q(q_i) for some pmf p_Q(q).

3) Encoding functions:
The encoding function at user l, l ∈ L, is defined by a pair (p_{X_l}, φ_l), where p_{X_l} is a single-letter pmf and φ_l is a mapping

φ_l : [1 : |X_l|^{n2^{nR_l}}] × [1 : 2^{nR_l}] × Q^n → X_l^n

that maps the codebook index F_l, the message M_l and the time-sharing sequence Q^n to a channel input X_l^n = φ_l(F_l, M_l, Q^n). Conditioned on a time-sharing sequence Q^n = q^n, the probability of selecting the codebook indexed by F_l = f_l is given by

Pr{F_l = f_l | Q^n = q^n} = ∏_{m_l=1}^{2^{nR_l}} p_{X_l^n|Q^n}(φ_l(f_l, m_l, q^n) | q^n),   (1)

where p_{X_l^n|Q^n}(x_l^n | q^n) = ∏_{i=1}^n p_{X_l|Q}(x_{l,i} | q_i) for some given conditional pmf p_{X_l|Q}(x_l | q).

4) Relaying functions: The relay nodes observe the outputs of a memoryless interference channel, i.e.,

p_{Y_K^n|X_L^n}(y_K^n | x_L^n) = ∏_{i=1}^n p_{Y_K|X_L}(y_{K,i} | x_{L,i}).   (2)

Relay node k, k ∈ K, is unaware of the codebook indices F_L = (F_1, ..., F_L), and maps its received channel output Y_k^n ∈ Y_k^n into an index J_k ∈ [1 : 2^{nC_k}] as J_k = φ_k^r(Y_k^n, Q^n). The index J_k is then sent to the CP over the error-free link of capacity C_k.

5) Decoding function: Upon receiving the indices J_K := (J_1, ..., J_K), the CP estimates the users' messages as

(M̂_1, ..., M̂_L) = g(J_K, F_L, Q^n),

where g denotes the decoding function at the CP.

Definition 1: An (n, R_1, ..., R_L) code for the studied DM CRAN model with oblivious relay processing and enabled time-sharing consists of L encoding functions φ_l, l ∈ L, K relaying functions φ_k^r, k ∈ K, and a decoding function g, as defined above. A rate tuple (R_1, ..., R_L) is achievable if, for any ε > 0 and n large enough, there exists an (n, R_1, ..., R_L) code such that Pr{(M̂_1, ..., M̂_L) ≠ (M_1, ..., M_L)} ≤ ε, where the probability is taken with respect to a uniform distribution of the messages M_l ∈ [1 : 2^{nR_l}], l = 1, ..., L, and with respect to independent indices F_l, l = 1, ..., L, whose joint distribution, conditioned on the time-sharing sequence, is given by the product of (1). For given individual fronthaul constraints C_K := (C_1, ..., C_K), the capacity region C(C_K) is the closure of the set of all achievable rate tuples (R_1, ..., R_L).
In this work, we are interested in characterizing the capacity region C(C_K).

B. Some Useful Implications
As shown in [8], the above constraint of oblivious relay processing with enabled time-sharing means that, in the absence of information regarding the indices F_L and the messages M_L, a codeword x_l^n(f_l, m_l, q^n) taken from an (n, R_l) codebook has independent but non-identically distributed entries, as formalized next.

Lemma 1: Without knowledge of the selected codebook indices (F_1, ..., F_L), the distribution of the transmitted codewords conditioned on the time-sharing sequence is given by

p(x_l^n | q^n) = ∏_{i=1}^n p_{X_l|Q}(x_{l,i} | q_i),  for l ∈ L.   (6)

Thus, the channel output Y_k^n at relay k ∈ K is distributed as

p(y_k^n | q^n) = ∏_{i=1}^n ∑_{x_{L,i}} p_{Y_k|X_L}(y_{k,i} | x_{L,i}) ∏_{l∈L} p_{X_l|Q}(x_{l,i} | q_i).   (7)

Proof: The proof of this lemma, whose result was also used in [8], follows along the lines of that of [7, Lemma 1] and is therefore omitted for brevity.

Remark 1: Equation (7) states that, when averaged over the selection of the codebooks F_L and over the uniform message sets, but conditioned on the time-sharing sequence Q^n, the transmitted codeword X_l^n has a product pmf p_{X_l|Q} with independent but non-identically distributed entries. That is, in the absence of codebook information, the codewords lack structure. When a node is informed of the codebook index F_l = f_l, the codebook structure is provided by the selected codebook.
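Lemma 1 can be illustrated by a small simulation: drawing each codebook with i.i.d. entries (which is equivalent to the codebook-index distribution in (1)) and averaging over the codebook and message choices makes the transmitted symbols look i.i.d. to an observer without the codebook index. All parameters below (Bernoulli input, blocklength, codebook size) are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

p = 0.3          # p_X: Bernoulli(0.3) channel-input distribution (illustrative)
n = 8            # blocklength
M = 4            # number of messages per codebook

def draw_symbol():
    # Draw a fresh codebook with i.i.d. p_X entries, a uniform message,
    # and return one entry of the transmitted codeword.
    codebook = rng.random((M, n)) < p        # M codewords of length n
    m = rng.integers(M)                      # uniform message
    i = rng.integers(n)                      # inspect an arbitrary position
    return codebook[m, i]

N = 100000
freq = np.mean([draw_symbol() for _ in range(N)])
print(freq)   # close to p = 0.3: without the codebook index, symbols carry no structure
```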

III. MAIN RESULTS

A. Capacity Region of a Class of CRANs
In this section, we establish a single-letter characterization of the capacity region of the class of discrete memoryless CRANs with oblivious relay processing and enabled time-sharing in which the channel outputs at the relay nodes are independent conditionally on the users' inputs. Specifically, consider the class of DM CRANs in which (2) factorizes as

p_{Y_K|X_L}(y_K | x_L) = ∏_{k∈K} p_{Y_k|X_L}(y_k | x_L).   (8)

Equation (8) is equivalent to requiring that, for all k ∈ K and all i ∈ [1 : n],

Y_{k,i} -- X_{L,i} -- Y_{K\k,i}   (9)

forms a Markov chain. The following theorem provides the capacity region of this class of channels.

Theorem 1: For the class of DM CRANs with oblivious relay processing and enabled time-sharing for which (9) holds, the capacity region C(C_K) is given by the union of all rate tuples (R_1, ..., R_L) satisfying

∑_{t∈T} R_t ≤ ∑_{k∈S} [C_k − I(Y_k; U_k | X_L, Q)] + I(X_T; U_{S^c} | X_{T^c}, Q)

for all non-empty subsets T ⊆ L and all S ⊆ K, for some joint measure of the form

p(q) ∏_{l=1}^L p(x_l | q) ∏_{k=1}^K p(y_k | x_L) ∏_{k=1}^K p(u_k | y_k, q).

Proof: The proof of Theorem 1 appears in Appendix A.
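To make the structure of these constraints concrete, the following sketch enumerates all 2^K choices of S ⊆ K in the bound of Theorem 1 for a hypothetical single-user scalar Gaussian instance with Gaussian test channels U_k = Y_k + Z_k. The channel and the closed-form mutual-information expressions are standard Gaussian formulas assumed for illustration, not taken from the paper:

```python
import itertools
import numpy as np

def theorem1_rate(P, C_list, s2):
    # Max rate for L = 1: R <= min over S of
    #   sum_{k in S} [C_k - I(Y_k; U_k | X)] + I(X; U_{S^c}),
    # with Y_k = X + N_k, N_k ~ CN(0, 1), and U_k = Y_k + Z_k, Z_k ~ CN(0, s2).
    K = len(C_list)
    i_quant = np.log2(1.0 + 1.0 / s2)          # I(Y_k; U_k | X), same for all k
    def i_x_u(num):                            # I(X; U_{S^c}) with |S^c| = num
        return np.log2(1.0 + num * P / (1.0 + s2))
    bounds = []
    for r in range(K + 1):
        for S in itertools.combinations(range(K), r):
            Sc = K - len(S)                    # relays whose descriptions carry X
            bounds.append(sum(C_list[k] - i_quant for k in S) + i_x_u(Sc))
    return min(bounds)

# Two relays with C_1 = C_2 = 1 bit, P = 1; optimize the test-channel noise s2.
best = max(theorem1_rate(1.0, [1.0, 1.0], s2) for s2 in np.logspace(-4, 4, 1000))
print(best)
```

With very large fronthaul capacities the bound approaches the ideal joint-reception rate log2(1 + K·P), as expected.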

Remark 2:
Our main contribution in Theorem 1 is the proof of the converse part. As mentioned in Appendix A, the direct part of Theorem 1 can be obtained by a coding scheme in which each relay node compresses its channel output using Wyner-Ziv binning [21] and the CP performs joint decompression and decoding, as in [7], [12], and [18].^2

Remark 3: A key element of the proof of the converse part of Theorem 1 is the connection with the Chief Executive Officer (CEO) source coding problem.^3 For the case of K ≥ 2 encoders, while the characterization of the optimal rate-distortion region of this problem for general distortion measures has eluded information theorists for more than four decades, a characterization of the optimal region in the case of the logarithmic loss distortion measure has been provided recently in [19]. A key step in [19] is that the log-loss distortion measure admits a lower bound in the form of the entropy of the source conditioned on the decoder's input. Leveraging this result, in our converse proof of Theorem 1 we derive a single-letter upper bound on the entropy of the channel inputs conditioned on the indices J_K that are sent by the relays, in the absence of knowledge of the codebook indices F_L (cf. step (65) in Appendix A).

^2 The rate region achievable by this scheme for a general DM CRAN, i.e., without the Markov chain (9), is given by Theorem 2.

^3 Because the relay nodes are connected to the CP through error-free finite-rate links, the scenario, as seen by the relay nodes, is similar to one in which a remote vector source (X_1^n, ..., X_L^n) needs to be compressed distributively and conveyed to a single decoder. There are important differences, however, as the vector source is not i.i.d. here but given by a codebook that is subject to design.
Remark 4: In the special case in which K = L and the memoryless channel (8) is such that Y_k = X_k for k ∈ K, the source coding counterpart of the problem treated in this section reduces to a distributed source coding setting with independent sources (recall that the users' input symbols are independent here) under the logarithmic loss distortion measure. Note that, for K > 2 and general, i.e., arbitrarily correlated, sources, the problem appears to be of remarkable complexity and is still unsolved. In fact, the Berger-Tung coding scheme [22] can be suboptimal in this case, as it is known to be for Korner-Marton's modulo-two adder problem [23].

B. Inner and Outer Bounds for the General DM CRAN Model
In this section, we study the general DM CRAN model (2); that is, the Markov chains in (9) are not necessarily assumed to hold. In this case, we establish inner and outer bounds on the capacity region that do not coincide in general. The bounds extend those of [7], which were established therein for a setup with a single transmitter and no time-sharing, to the case of multiple transmitters and enabled time-sharing. The following theorem provides an inner bound on the capacity region of the general DM CRAN model (2) with oblivious relay processing and enabled time-sharing.

Theorem 2: For the general DM CRAN model (2) with oblivious relay processing and enabled time-sharing, the rate region R_CF-JD achievable by the scheme CF-JD is given by the union of all rate tuples (R_1, ..., R_L) that satisfy, for all non-empty subsets T ⊆ L and all S ⊆ K,

∑_{t∈T} R_t ≤ ∑_{k∈S} C_k − I(Y_S; U_S | X_L, U_{S^c}, Q) + I(X_T; U_{S^c} | X_{T^c}, Q),

for some joint measure of the form

p(q) ∏_{l=1}^L p(x_l | q) p(y_K | x_L) ∏_{k=1}^K p(u_k | y_k, q).

Proof: The proof of Theorem 2 appears in Appendix B.

The following theorem provides an outer bound on the capacity region of the general DM CRAN model (2) with oblivious relay processing and enabled time-sharing.

Theorem 3: For the general DM CRAN model (2) with oblivious relay processing and enabled time-sharing, if a rate tuple (R_1, ..., R_L) is achievable, then for all non-empty subsets T ⊆ L and all S ⊆ K,

∑_{t∈T} R_t ≤ ∑_{k∈S} [C_k − I(Y_k; U_k | X_L, W, Q)] + I(X_T; U_{S^c} | X_{T^c}, W, Q),

where U_k = f_k(W, Y_k, Q) for k ∈ K, for some random variable W and deterministic functions {f_k}, k ∈ K.

Proof: The proof of Theorem 3 appears in Appendix C.

Remark 6: The inner bound of Theorem 2 and the outer bound of Theorem 3 do not coincide in general. This is because, in Theorem 2, the auxiliary random variables satisfy the Markov chains U_k -- (Y_k, Q) -- (X_L, Y_{K\k}), k ∈ K, which do not necessarily hold for the auxiliary random variables of the outer bound.

Remark 7: As we already mentioned, the class of DM CRAN models satisfying (9) connects with the CEO problem under the logarithmic loss distortion measure. The rate-distortion region of this problem is characterized in the excellent contribution [19] for an arbitrary number of (source) encoders (see [19, Th. 3] therein). For general DM CRAN channels, i.e., without the Markov chain (9), the model connects with the distributed source coding problem under logarithmic loss. While a solution of the latter problem for the case of two encoders has been found in [19, Th. 6], generalizing the result to an arbitrary number of encoders poses a significant challenge. In fact, as also mentioned in [19], the Berger-Tung inner bound is known to be generally suboptimal (e.g., see the Korner-Marton lossless modulo-sum problem [23]). Characterizing the capacity region of the general DM CRAN model under the constraint of oblivious relay processing and enabled time-sharing poses a similar challenge, even for the case of two relays. Finally, we mention that, in the context of multi-terminal distributed source coding with a general distortion measure, an outer bound has been derived in [24] and shown to be tight in certain cases. The proof technique therein is based on introducing a random source X such that the observations at the encoders are conditionally independent given X, i.e., a Markov chain similar to that in (9) holds. Note, however, that the connection between the outer bound that we develop here for the uplink CRAN model with oblivious relay processing and that of [24] is only high-level, as the proof techniques are different.

C. On the Suboptimality of Separate Decompression-Decoding and Role of Time-Sharing
For the general DM CRAN model (2), the scheme CF-JD of Theorem 2 is based on joint decoding of the compression indices and the users' messages; that is, the CP decodes the quantization codewords and the users' messages simultaneously. A more practical strategy, considered also in [7] and [12], consists in having the CP first decode the quantization codewords (jointly), and then decode the users' messages (jointly), i.e., compress-and-forward with separate decompression and decoding operations. In what follows, we refer to such a scheme as CF-SD. The following proposition provides the rate region achieved by this scheme for the DM CRAN model (2).
It is clear that the rate region R_CF-SD of Proposition 1 is contained in the region R_CF-JD of Theorem 2.
As a special instance of the scheme CF-SD, we consider compress-and-forward with successive separate decompression-decoding, which performs sequential decoding of the quantization codewords first, followed by sequential decoding of the users' messages. More specifically, let π_r : K → K and π_u : L → L be two permutations defined on the set of quantization codewords and the set of user message codewords, respectively. An outline of this scheme, which we denote as CF-SSD, is as follows. The relays compress their outputs sequentially, starting with relay node π_r(1). In doing so, they utilize Wyner-Ziv binning [21], i.e., relay node π_r(k), k ∈ K, quantizes its channel output Y_{π_r(k)}^n into a description U_{π_r(k)}^n taking into account (U_{π_r(1)}^n, ..., U_{π_r(k−1)}^n) as decoder side information. The CP first recovers the quantization codewords in the same order, and then decodes the users' messages sequentially, in the order indicated by π_u, starting with user π_u(1). That is, the codeword of user l, l ∈ L, is estimated using all compression codewords (U_{π_r(1)}^n, ..., U_{π_r(K)}^n) as well as the previously decoded user codewords (X_{π_u(1)}^n, ..., X_{π_u(l−1)}^n). The rate region obtained with a given decoding order (π_r, π_u), as well as that of the scheme CF-SSD obtained by considering all possible permutations, are given in the following proposition.

Proposition 2: For the general DM CRAN model (2) with oblivious relay processing and enabled time-sharing, the achievable rate region R_CF-SSD(π_r, π_u) of the scheme CF-SSD with decoding order (π_r, π_u) is the union of all rate tuples (R_1, ..., R_L) that satisfy, for all l ∈ L and all k ∈ K,

C_{π_r(k)} ≥ I(Y_{π_r(k)}; U_{π_r(k)} | U_{π_r(1)}, ..., U_{π_r(k−1)}, Q),
R_{π_u(l)} ≤ I(X_{π_u(l)}; U_K | X_{π_u(1)}, ..., X_{π_u(l−1)}, Q).
The rate region R_CF-SSD achievable by the scheme CF-SSD is defined as the union of the regions R_CF-SSD(π_r, π_u) over all possible permutations π_r and π_u. While successive separate decompression and decoding results in a rate region that is generally strictly smaller than that of joint decoding, i.e., of CF-JD, in what follows we show that the maximum sum-rate achievable by this separate decompression-decoding scheme is the same as that achieved by joint decoding. That is, the schemes CF-SSD and CF-JD achieve the same sum-rate (and, hence, so does the scheme CF-SD). Specifically, let the maximum sum-rate achieved by the scheme CF-JD be defined as

R_sum^CF-JD := max { ∑_{l∈L} R_l : (R_1, ..., R_L) ∈ R_CF-JD },   (19)

and let the maximum sum-rates R_sum^CF-SD and R_sum^CF-SSD of the schemes CF-SD and CF-SSD be defined analogously, with R_CF-SD and R_CF-SSD, respectively, in place of R_CF-JD.

Theorem 4: For the general DM CRAN model (2) with oblivious relay processing and enabled time-sharing,

R_sum^CF-JD = R_sum^CF-SD = R_sum^CF-SSD.   (21)

Proof: The proof of Theorem 4 appears in Appendix D.

Remark 8: The proof of Theorem 4 uses properties of submodular optimization and is similar to that of [12, Th. 2], which shows that CF-JD and CF-SD achieve the same sum-rate for the class of CRANs satisfying (9). Thus, in a sense, Theorem 4 can be thought of as a generalization of [12, Th. 2] to the case of general channels (2). A generalized successive decompression-decoding scheme (CF-GSD), which allows arbitrary interleaved decoding orders between quantization codewords and users' messages, is proposed in [12]; under the sum-rate constraint, it is also optimal. In general, CF-GSD achieves a larger rate region than CF-SD, and it achieves the same rate region as CF-JD under a sum-fronthaul constraint [12, Th. 2].

Remark 9: Theorem 4 shows that the three schemes CF-JD, CF-SD and CF-SSD achieve the same sum-rate and that, in general, the use of time-sharing is required for the three schemes to achieve the maximum sum-rate. Note that the uplink CRAN is a multiple-source, multiple-relay, single-destination network.
If all fronthaul capacities were infinite, the model would reduce to a standard multiple access channel (MAC), and it follows from standard results that time-sharing is not needed to achieve the optimal sum-rate in this case [25]. The reader may wonder whether the same holds in the case of finite-rate fronthaul links, i.e., whether one can optimally set Q = ∅ in the region C(C_K) for sum-rate maximization. The answer is negative for finite fronthaul capacities {C_k}, as shown in Section IV. This is reminiscent of the fact that time-sharing generally increases rates in relay channels, e.g., [26], [27]. In addition, when the three schemes CF-JD, CF-SD and CF-SSD are restricted to operate without time-sharing, i.e., Q = ∅, CF-SSD might perform strictly worse than CF-JD and CF-SD. To see this, the reader may find it useful to observe that, while time-sharing is not required for sum-rate maximization in a regular MAC, since successive decoding (in any order) is sum-rate optimal in that case, it is beneficial when the sum-rate maximization is subject to constraints on the users' message rates, e.g., when the users' rates need to be symmetric [28], i.e., when the operating point is not a corner point of the MAC region. Similarly, standard successive Wyner-Ziv compression (in any order, without time-sharing) is known to achieve any corner point of the Berger-Tung region [29], [30], but time-sharing (or rate-splitting à-la [29]) is beneficial if the compression rates are subject to constraints, e.g., when the compression rates are symmetric. An example illustrating these aspects for the memoryless Gaussian CRAN is provided in Section IV.

IV. MEMORYLESS MIMO GAUSSIAN CRAN
In this section, we consider a memoryless Gaussian MIMO CRAN with oblivious relay processing and enabled time-sharing. Relay node k, k ∈ K, is equipped with M_k receive antennas and has channel output

Y_k = H_{k,L} X + N_k,   (22)

where X := [X_1^T, ..., X_L^T]^T, X_l ∈ C^{N_l} is the channel input vector of user l ∈ L, N_l is the number of antennas at user l, H_{k,L} := [H_{k,1}, ..., H_{k,L}] is the matrix obtained by concatenating the matrices H_{k,l}, l ∈ L, horizontally, with H_{k,l} ∈ C^{M_k×N_l} being the channel matrix connecting user l to relay node k, and N_k ∈ C^{M_k} is the noise vector at relay k, assumed to be memoryless Gaussian, N_k ∼ CN(0, Σ_k), and independent of the other noises and of the channel inputs {X_l}. The transmission from user l ∈ L is subject to the covariance constraint

(1/n) ∑_{i=1}^n E[X_{l,i} X_{l,i}^H] ⪯ K_l,   (23)

where K_l is a given N_l × N_l positive semi-definite matrix and the notation A ⪯ B indicates that B − A is positive semi-definite.

A. Capacity Region Under Time-Sharing of Gaussian Inputs
The memoryless MIMO Gaussian model with oblivious relay processing described by (22) and (23) clearly falls into the class of CRANs studied in Section III-A, since Y_k -- X -- Y_{K\k} forms a Markov chain for all k ∈ K. Thus, Theorem 1, which can be extended to continuous channels using standard techniques, characterizes the capacity region of this model. Computing the region of Theorem 1, i.e., C(C_K), for the model described by (22) and (23) is not easy, however, as it requires finding the optimal choices of the channel inputs (X_1, ..., X_L) and of the involved auxiliary random variables (U_1, ..., U_K). In this section, we find an explicit characterization of the capacity region of the model described by (22) and (23) in the case in which the users are constrained to time-share only among Gaussian codebooks. That is, for all q ∈ Q and all l ∈ L, the distribution of the input X_l conditionally on Q = q is Gaussian (with a covariance matrix that can be optimized so as to satisfy (23)). We denote this region by C_G(C_K). Although Gaussian inputs may be suboptimal for the uplink CRAN [7], i.e., in general C_G(C_K) ⊂ C(C_K), restricting the input to be Gaussian for every Q = q is appealing because it leads to rate regions that are easier to evaluate. In doing so, we also show that time-sharing of Gaussian compression at the relay nodes is optimal if the users' channel inputs are restricted to be Gaussian for all q ∈ Q.
Let, for all l ∈ L, the input X_l be restricted to be Gaussian conditioned on Q = q, i.e.,

X_l | {Q = q} ∼ CN(0, K_{l,q}),   (24)

where the matrices {K_{l,q}}_{q=1}^{|Q|} are chosen to satisfy

E_Q[K_{l,Q}] ⪯ K_l.   (25)

The following theorem characterizes the capacity region of the model with oblivious relay processing described by (22) and (23) under the constraint of time-shared Gaussian inputs and given fronthaul capacities C_K.

Theorem 5: The capacity region C_G(C_K) of the memoryless Gaussian MIMO model with oblivious relay processing described by (22) and (23) under time-sharing of Gaussian inputs is given by the set of all rate tuples (R_1, ..., R_L) that satisfy

∑_{t∈T} R_t ≤ E_Q[ ∑_{k∈S} ( C_k + log det(I − B_{k,Q} Σ_k) ) + log det( I + ∑_{k∈S^c} K_{T,Q}^{1/2} H_{k,T}^H B_{k,Q} H_{k,T} K_{T,Q}^{1/2} ) ]

for all non-empty T ⊆ L and all S ⊆ K, for some pmf p_Q(q) and matrices K_{l,q} and B_{k,q} such that E_Q[K_{l,Q}] ⪯ K_l and 0 ⪯ B_{k,q} ⪯ Σ_k^{-1}; and where, for q ∈ Q and T ⊆ L, H_{k,T} := [H_{k,t}]_{t∈T} and K_{T,q} := diag({K_{t,q}}_{t∈T}).

Proof: The proof of Theorem 5 appears in Appendix E.

Remark 10: Theorem 5 extends the result with oblivious relay processing of [7, Th. 5] to the MIMO setup with L users and enabled time-sharing, and shows that, under the constraint of Gaussian signaling, the quantization codewords can optimally be chosen Gaussian. Recall that, as shown through an example in [7], restricting to Gaussian input signaling can be a severe constraint and is generally suboptimal.

B. On the Role of Time-Sharing
In Remark 9 in Section III-C we commented on the utility of time-sharing for sum-rate maximization in the uplink of a DM CRAN with oblivious relay processing. In this section, we investigate further the role of time-sharing. Specifically, we first provide an example in which time-sharing increases capacity, and then discuss some scenarios in which time-sharing does not enlarge the capacity region of the memoryless MIMO Gaussian CRAN model with oblivious relay processing described by (22) and (23).
For convenience, let us denote by C_G^no-ts(C_K) the rate region obtained by setting Q = ∅, i.e., without enabled time-sharing, in the region of Theorem 5. That is, C_G^no-ts(C_K) is given by the set of all rate tuples (R_1, ..., R_L) that satisfy, for all non-empty T ⊆ L and all S ⊆ K, the inequalities of Theorem 5 evaluated with a single time-sharing state, for some matrices 0 ⪯ B_k ⪯ Σ_k^{-1}, k ∈ K. The following example shows that C_G^no-ts(C_K) may be strictly contained in C_G(C_K).

Example 1: Consider an instance of the memoryless MIMO Gaussian CRAN described by (22) and (23) in which L = 1, K = 2, M_1 = M_2 = N_1 = 1 (all devices are equipped with a single antenna), the relay nodes have equal fronthaul capacities C_1 = C_2 = C, and the channel outputs are

Y_k = a X + N_k,  for k = 1, 2,

where E[|X|^2] ≤ P and N_k ∼ CN(0, 1) for k = 1, 2.
The capacity C_G(C) of this one-user Gaussian CRAN example can be obtained from Theorem 5 as the solution of an optimization problem over the time-sharing pmf {α_q}, the per-state powers {P_q} and the per-state quantization parameters {b_q}, where the maximization is over 0 ≤ b_q ≤ 1, 0 ≤ α_q ≤ 1 and P_q ≥ 0 such that ∑_{q=1}^{|Q|} α_q = 1 and ∑_{q=1}^{|Q|} α_q P_q ≤ P. Due to Theorem 4, C_G(C) is achievable with CF-JD, CF-SD and CF-SSD by using time-sharing. Without time-sharing, i.e., Q = ∅, the capacity C_G^no-ts(C) of this one-user Gaussian CRAN example is achievable with the CF-JD scheme and follows easily from the region of Theorem 5 with a single time-sharing state. With time-sharing with, say, Q = {1, 2}, the user can communicate at larger rates with CF-JD, as follows. The transmission time is divided into two periods, or phases, of durations αn and (1 − α)n, respectively, where 0 < α < 1. The user transmits symbols only during the first phase, with power P/α, and remains silent during the second phase. The two relay nodes operate as follows. During the first phase, relay node k, k = 1, 2, compresses its output at rate C/α, thereby spending its entire fronthaul budget on the first phase; it remains silent during the second phase.
Observe that with such a transmission scheme the input constraint (25) is satisfied, and the resulting rate R two-ph G,CF-JD (C) in (37) is a lower bound on C G (C). While restricting to CF-JD with two phases might be suboptimal, R two-ph G,CF-JD (C) is very close to C G (C). As can be seen from the figure, the utility of time-sharing (to increase rate) is visible mainly at small average transmit power. The intuition for this gain is that, for small P, the observations at the relay nodes are too noisy and the relays mostly forward noise. It is therefore more advantageous to increase the power to P/α for a fraction α of the transmission time. Accordingly, the effective compression rate is increased to C/α, which reduces the compression noise. This observation is reminiscent of similar ones in [26] in the context of relay channels with orthogonal components and in [27] in the context of primitive relay channels.
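The low-power benefit of bursty transmission can be checked numerically. The sketch below (Python) uses, as a hedged stand-in for the exact expressions (35)-(37), the simple single-relay rate log2(1 + P/(1 + σ²)) with compression noise σ² = (1 + P)/(2^C − 1); the function names and this surrogate formula are illustrative assumptions, not the paper's expressions.

```python
import math

def r_oblivious(P, C):
    # Surrogate single-relay oblivious rate (illustrative assumption):
    # the relay observes X plus unit noise and quantizes with a Gaussian
    # test channel of variance sigma2 chosen so that I(Y; U) = C.
    sigma2 = (1.0 + P) / (2.0 ** C - 1.0)
    return math.log2(1.0 + P / (1.0 + sigma2))

def r_two_phase(P, C, alpha):
    # Transmit with power P/alpha and effective fronthaul C/alpha for a
    # fraction alpha of the time; stay silent for the rest.
    return alpha * r_oblivious(P / alpha, C / alpha)

P, C = 0.1, 1.0
no_ts = r_oblivious(P, C)
best_ts = max(r_two_phase(P, C, a / 100.0) for a in range(1, 101))
print(no_ts < best_ts)  # True: bursty transmission wins at this low power
```

Sweeping α over (0, 1] exposes the bursty-transmission gain described above: concentrating power and fronthaul into a short active phase reduces the effective compression noise.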
When the three schemes CF-JD, CF-SD and CF-SSD are restricted to operate without time-sharing, i.e., Q = ∅, and with Gaussian signaling, CF-SD and CF-SSD might perform strictly worse than CF-JD. The rate achievable by the CF-SD scheme without time-sharing follows from Proposition 1, and it is easy to show that it coincides with C no-ts G (C) in (35); i.e., in this example, CF-JD and CF-SD achieve the capacity C no-ts G (C) without time-sharing. The rate achievable by CF-SSD without time-sharing and Gaussian test channels U k ∼ CN (Y k , σ 2 k ), k ∈ K, can be obtained from Proposition 2, where σ 2 1 = (a 2 P + 1)/(2 C − 1) and σ 2 2 = (a 2 P + 1 − a 4 P 2 (a 2 P + 1 + σ 2 1 ) −1 )/(2 C − 1). Figure 3 shows the capacities C G (C) and C no-ts G (C) and the achievable rates R two-ph G,CF-JD (C) and R no-ts G,CF-SSD (C) for a = 1 and C = 6, as functions of the transmit power P. Note that CF-SSD, when restricted not to use time-sharing, performs strictly worse than CF-JD and CF-SD without time-sharing, i.e., worse than C no-ts G (C). Observe that in this scenario the gains due to time-sharing are limited. This observation is in line with the fact that for large fronthaul capacities the CRAN model reduces to a MAC, for which time-sharing is not required to achieve the optimal sum-rate.
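The two quantization-noise variances quoted above can be computed directly; the sketch below (Python, function name ours) implements exactly the two formulas from the text, for the values a = 1 and C = 6 used in Figure 3.

```python
def cf_ssd_noises(a, P, C):
    # sigma_1^2 and sigma_2^2 for CF-SSD without time-sharing, as in the text
    s1 = (a * a * P + 1.0) / (2.0 ** C - 1.0)
    s2 = (a * a * P + 1.0
          - a ** 4 * P * P / (a * a * P + 1.0 + s1)) / (2.0 ** C - 1.0)
    return s1, s2

s1, s2 = cf_ssd_noises(a=1.0, P=1.0, C=6.0)
print(s2 < s1)  # True: sigma_2^2 < sigma_1^2, reflecting the decoding order
```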
The above shows that, in general, time-sharing increases rates for the memoryless MIMO Gaussian CRAN model described by (22) and (23), i.e., C no-ts G (C K ) may be a strict subset of C G (C K ). In what follows, we discuss two scenarios in which time-sharing does not enlarge the capacity region of the model given by (22) and (23), i.e., C no-ts G (C K ) = C G (C K ). 1) Case of Fixed Gaussian Codebook at User Side: Consider the scenario in which the users are not allowed to time-share among several Gaussian codebooks, but are each constrained to use a single, possibly different, Gaussian codebook. This may be relevant, e.g., in contexts in which reducing the signaling overhead among the users and relays is of prime interest. Conceptually, this corresponds to equalizing all the covariance matrices {K l,q } for given l and all q = 1, . . . , |Q|; let K l := K l,1 = · · · = K l,|Q| .
The reader may wonder whether allowing the relay nodes to time-share among compression codebooks can be beneficial in this case. The answer to this question is not clear a priori, because time-sharing in general enlarges the Berger-Tung rate region when constraints on the rates are imposed (see Remark 9). The following proposition shows that, for the model described by (22) and (23), this is not the case under the constraint (40). Proposition 3: For the model with oblivious relay processing described by (22) and (23), if (40) holds for all l ∈ L, then C no-ts G (C K ) = C G (C K ). Proof: The proof of Proposition 3 appears in Appendix F.
2) High SNR Regime: Consider again the model described by (22) and (23). Assume that, for all k ∈ K, the vector Gaussian noise at relay node k has a covariance matrix of the form given by (41) for some ε ≥ 0 and Σ̃ k ≻ 0 that is independent of ε. The following proposition shows that, in this case, the benefit of time-sharing in terms of increasing rates vanishes for arbitrarily small ε. Proposition 4: For the model with oblivious relay processing described by (22) and (23), if for all k ∈ K the vector Gaussian noise at relay node k has a covariance matrix that can be put in the form given by (41) for some ε ≥ 0 and Σ̃ k ≻ 0 that is independent of ε, then the following holds. The proof of Proposition 4 appears in Appendix G.

C. Price of Non-Awareness: Bounded Rate Loss
In this section, we show that for the memoryless MIMO Gaussian model given by (22) and (23), allowing the relay nodes to be fully aware of the users' codebooks (i.e., the non-constrained or non-oblivious setting) increases rates by at most a bounded constant. In other words, restricting the relay nodes not to know or utilize the users' codebooks causes only a bounded rate loss in comparison with the maximum rate that would be achievable in the non-oblivious setting. The constant depends on the network size, but is independent of the channel gain matrix, the powers and the noise levels. The result is an easy combination of a recent improved constant-gap result of Ganguly and Kim [31] (which further tightens that of Zhou et al. [12]; see Remark 11 below) with our Theorem 5.
For simplicity, we focus on the case in which N l = N for all l ∈ L and M k = M for all k ∈ K. For the unconstrained case (i.e., with neither the obliviousness nor the Gaussian-signaling constraint imposed), the capacity region of the model described by (22) and (23), which we denote hereafter by C uncons (C K ), is still unknown in general; an easy outer bound on it is given by the max-flow min-cut bound, i.e., the set R up (C K ) of all rate tuples (R 1 , . . . , R L ) satisfying the cut-set inequalities for all T ⊆ L and S ⊆ K. The following theorem shows that the rate region of Theorem 5 is within a constant gap of R up (C K ), and hence of the capacity region C uncons (C K ) of the unconstrained setting.

Remark 11: In the unconstrained case with no time-sharing, Zhou et al. show in [12] (see Theorem 3 therein) that the rate region C no-ts G (C K ) achievable with the scheme CF-JD with Gaussian input and Gaussian quantization is within a constant gap η = (K M + N) of the capacity region C uncons (C K ).
Specifically, for any rate tuple (R 1 , . . . , R L ) ∈ R up (C K ), the tuple (R 1 − η, . . . , R L − η) ∈ C no-ts G (C K ). As already mentioned, our Theorem 5 shows that, under the constraints of Gaussian signaling and oblivious relay processing, CF-JD is in fact optimal from a capacity viewpoint. Also, our Theorem 6 improves the gap to the cut-set bound of [12, Th. 3], which in our context can be interpreted as tightening the rate loss caused by restricting the relay nodes not to know or utilize the users' codebooks.

D. Numerical Results: Circular Symmetric Wyner Model for CRAN
In this section, we evaluate and compare the performance of some oblivious and non-oblivious schemes for a simple Gaussian CRAN example, the circular symmetric Wyner model shown in Figure 5. There are K cells, each containing a single-antenna user and a single-antenna RU. Inter-cell interference takes place only between adjacent cells; the intra-cell and inter-cell channel gains are 1 and γ ∈ [0, 1], respectively. All RUs have a fronthaul capacity of C. In this model, the channel output at RU or relay node k ∈ K is the sum of the intended user's signal, the signals of the two adjacent users scaled by γ, and unit-variance Gaussian noise. Although seemingly simple, the capacity region of this model is still unknown in the case in which the relay nodes are unconstrained, i.e., are allowed to perform non-oblivious processing. In what follows, we restrict attention to the maximum per-cell sum-rates offered by various schemes, some of which use only oblivious relay processing and others not. A straightforward upper bound on those per-cell rates is given by the cut-set bound. This model is clearly an instance of the memoryless MIMO Gaussian CRAN described by (22) and (23). Thus, its performance in terms of per-cell capacity C G (C) under oblivious relay processing with time-sharing of Gaussian inputs can be obtained easily using Theorem 5, where H S c is the submatrix of H composed of only those rows of H that are in the subset S c , and the maximization is over 0 ≤ b q ≤ 1, 0 ≤ α q ≤ 1 and P q ≥ 0 subject to the power and time-sharing constraints. For non-oblivious schemes, we consider mainly the following two schemes:

1) Decode-and-Forward (DF): This scheme, proposed in [9], is based on the fact that the output at each relay node can be seen as that of a three-user Gaussian multiple-access channel. Relay k decodes the message of user k either by treating the interference from users [k − 1] K and [k + 1] K as noise, or by jointly decoding all three messages. It then forwards message k to the CP. This scheme yields the per-cell rate [9] R DF (C) := min{max{R tin , R joint }, C}. (52a) 2) Compute-and-Forward (CoF): This scheme, proposed in [4], is based on nested lattice codes. The users transmit using the same lattice code. Each relay node then decodes one equation (with integer-valued coefficients) relating the users' symbols and forwards that equation to the CP. If the collected K equations are linearly independent, the CP can invert the system and obtain the transmitted symbols. For the studied example, this yields [6] a per-cell rate R CoF (C) given by the minimum of C and a computation-rate term involving the set B defined therein. For comparison, we also consider the following oblivious schemes: 1) CF-JD with |Q| = 2: It is easy to see that the per-cell sum-rate achievable using the CF-JD scheme with time-sharing between two phases, in which users and relays are active during the first phase and remain silent in the second as in Example 1, follows accordingly. 2) CF-SD without time-sharing: The per-cell rate achievable by CF-SD without time-sharing and Gaussian test channels is obtained with σ 2 * the unique solution of the equation K C = log det(I + (1/σ 2 * )(P H H H + I)).

3) CF-SSD without time-sharing:
The per-cell rate achievable by CF-SSD without time-sharing and Gaussian test channels U k ∼ CN (Y k , σ 2 k ), k ∈ K, follows from Proposition 2.

4) CF-PtP without time-sharing:
A simplified version of CF-SSD, which we refer to as "Compress-and-Forward with Point-to-Point compression" (CF-PtP), is one in which each relay node compresses its channel output using standard compression, i.e., without binning. The per-cell rate R no-ts CF-PtP (C) allowed by this scheme is given as in (57), with D = ((2 C − 1)/(2 C + P(1 + 2γ 2 )))I.
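The scalar entry of D quoted above can be evaluated directly; the snippet below (Python, helper name ours) implements the stated formula and shows that the entry lies in (0, 1) and increases toward 1 as the fronthaul capacity C grows.

```python
def cf_ptp_D(P, gamma, C):
    # scalar entry of D = ((2^C - 1) / (2^C + P (1 + 2 gamma^2))) I
    return (2.0 ** C - 1.0) / (2.0 ** C + P * (1.0 + 2.0 * gamma ** 2))

# values matching the numerical example in this section
d_lo = cf_ptp_D(P=10.0, gamma=2.0 ** -0.5, C=3.5)
d_hi = cf_ptp_D(P=10.0, gamma=2.0 ** -0.5, C=10.0)
print(0.0 < d_lo < d_hi < 1.0)  # True
```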
Figure 5 depicts the evolution of the per-cell rates obtained using the above oblivious and non-oblivious schemes, as well as the cut-set bound, for the numerical values K = 3, γ = 1/√2 and C = 3.5, as a function of the user transmit power P. As can be seen from the figure, for this example the loss in performance, in terms of per-cell rate, caused by constraining the relay nodes to implement only oblivious operations is less than 1.7743 bits. Also, time-sharing is generally beneficial, in the sense that the discussed oblivious schemes generally suffer some (small) rate loss when constrained not to employ time-sharing. Figure 6 shows how the rates offered by the aforementioned oblivious and non-oblivious schemes scale with the signal-to-noise ratio when the available per-link fronthaul capacity scales logarithmically with the available user transmit power as C = 5 log 10 (P). As the figure illustrates, in contrast to non-oblivious schemes such as decode-and-forward and compute-and-forward, oblivious processing also has the advantage of causing no loss in terms of degrees of freedom.
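The per-cell quantities in this section can be explored numerically. The sketch below (Python) builds the circulant channel matrix of the circular Wyner model and evaluates the per-cell MAC cut (1/K) log2 det(I + P H Hᵀ), which is only one ingredient of the full cut-set bound; the helper names are ours, and real channel gains are assumed (so H^H = Hᵀ).

```python
import math

def wyner_H(K, gamma):
    # circulant K x K matrix: intra-cell gain 1, adjacent-cell gain gamma
    H = [[0.0] * K for _ in range(K)]
    for k in range(K):
        H[k][k] = 1.0
        H[k][(k - 1) % K] += gamma
        H[k][(k + 1) % K] += gamma
    return H

def det(M):
    # determinant via Gaussian elimination with partial pivoting
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n):
                M[r][j] -= f * M[c][j]
    return d

def percell_mac_cut(K, gamma, P):
    H = wyner_H(K, gamma)
    G = [[(1.0 if i == j else 0.0)
          + P * sum(H[i][m] * H[j][m] for m in range(K))
          for j in range(K)] for i in range(K)]
    return math.log2(det(G)) / K

print(percell_mac_cut(3, 0.0, 1.0))  # 1.0: gamma = 0 decouples the cells
```

With γ = 0 the model reduces to K parallel point-to-point links, so the per-cell MAC cut collapses to log2(1 + P), which is a quick consistency check on the matrix construction.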

V. CONCLUDING REMARKS
We close this paper with some concluding remarks. Our results shed light on (and sometimes determine exactly) what operations the relay nodes should optimally perform when transmission over a cloud radio access network is constrained to oblivious processing at the relays, i.e., the relays are not allowed to know, or cannot acquire, the users' codebooks. In particular, and perhaps unsurprisingly, it is shown that compress-and-forward, or variants of it, generally perform well in this case, and are optimal when the outputs at the relay nodes are conditionally independent given the users' inputs. Furthermore, in addition to its relevance from a practical viewpoint, restricting the relays not to know or utilize the users' codebooks causes only a bounded rate loss in comparison with the non-oblivious setting (e.g., compress-and-forward and noisy network coding perform to within a constant gap of the cut-set bound in the Gaussian case).
Finally, leveraging the now-known connection of the information bottleneck (IB) method [32] (see [33], [34] for earlier equivalent formulations of the IB problem in the contexts of source coding and investment theory, respectively) with the CEO source coding problem under logarithmic loss, and the connection that can be established with the CRAN channel coding problem with oblivious relay processing, we note that the results of this paper, and the proof techniques, translate easily into analogous ones for the problem of distributed information bottleneck. In this problem, multiple sensors separately compress their observations so that, collectively, the compressed signals provide as much information as possible about a remote (or hidden) source. On this aspect, the reader may refer to [35] and [36], where a full characterization of the optimal tradeoff between the minimum description lengths at which the features are described (i.e., complexity) and the information that the latent variables collectively preserve about the target variable (i.e., accuracy, or relevant information) is established for both DM and Gaussian models, together with Blahut-Arimoto-type algorithms and neural-network-based representation learning algorithms that allow computing the optimal tradeoffs. The results of [35] and [36] generalize those for the single-user DM IB problem [32] and the single-user scalar [34] and vector [37] Gaussian IB problems to the distributed scenario. Since the single-encoder IB method has found application in various contexts of learning and prediction [38], such as word clustering for text classification [39], community detection [40], neural code analysis [41] and speech recognition [42], distributed IB methods clearly find use in extending those applications to the distributed case.
Among the interesting problems left unaddressed in this paper is that of characterizing the optimal input distributions under rate-constrained compression at the relays; e.g., discrete signaling is already known to sometimes outperform Gaussian signaling for the single-user Gaussian CRAN [7]. Alternatively, one may seek the worst-case noise under given input distributions, e.g., Gaussian, and rate-constrained compression at the relays. Also, although it is still not clear whether the known multiaccess/broadcast (MAC/BC) duality extends to one between uplink and downlink CRAN models in general [43], we expect the approach of this paper to be instrumental in characterizing the effect of the relay nodes being oblivious to the actual codebooks used by the users in the downlink setting, especially when the connection between the CP and the relay nodes is not wired.

A. Proof of Direct Part of Theorem 1
We derive the rate region achievable by the CF-JD scheme for the class of DM CRAN models satisfying (9) using the inner bound derived in Theorem 2 for the general DM CRAN model. It follows from Theorem 2 that the rate region in Theorem 1 is achievable by noting that, for the class of DM CRAN models satisfying (9), we have the chain of equalities leading to (61), where (61) follows due to the Markov chains (given Q) that hold for this class. This concludes the proof.

B. Proof of Converse Part of Theorem 1
Assume the rate tuple (R 1 , . . . , R L ) is achievable. Let T ⊆ L and S ⊆ K, with T , S ≠ ∅, let J k := φ r k (Y n k , Q n ) be the message sent by relay k, k ∈ K, let F L be the codebook indices, and let Q̃ := Q n be the time-sharing variable. For simplicity, let X n L := (X n 1 , . . . , X n L ), R T := Σ t∈T R t and C S := Σ k∈S C k . From Fano's inequality, we have, with ε n → 0 as n → ∞, a bound for all T ⊆ L. We start by showing the following inequality, which will be instrumental in the rest of this proof.
Inequality (65) can be shown as follows.
where (66) follows since the m T are independent; (68) follows since m T is independent of Q̃ and F T c ; (69) follows from (63); (72) follows since m T is independent of F L ; (74) follows from the data processing inequality; (76) follows since (X n T c , F T c ) is independent of X n T and since conditioning reduces entropy; and (77) follows due to the Markov chain stated above. Then, from (77) we obtain (65), where (81) is due to Lemma 1. We pause to mention that, for a subset T ⊆ L, inequality (65) provides a lower bound, in the form of a conditional entropy term, on the quantity on its left-hand side; as such, it is reminiscent of the result of [19, Lemma 1], which states that, for the CEO problem with logarithmic loss fidelity measure, the expected distortion admits a lower bound in the form of a conditional entropy term, namely the entropy of the remote source conditioned on the CEO's inputs. Continuing from (77), we have (85), where (84) follows due to Lemma 1 and (85) follows since conditioning reduces entropy.
On the other hand, we have the following equality, where (87) follows due to the Markov chain that holds for k ∈ K and since J k is a function of Y n k , and (89) follows due to the Markov chain Y k,i − X n L − Y i−1 k , which holds since the channel is memoryless.
Then, from the relay side we have, for S ≠ ∅, a chain of inequalities in which (94) follows since J S is a function of Y n S ; (96) follows from (65); (97) follows since conditioning reduces entropy; and (98) follows from (65) and (90).
Note that, in general, Q̃ i is not independent of (X L,i , Y S,i ), and that, due to Lemma 1, conditioned on Q̃ i the stated Markov chain holds. Similarly, from (98), we obtain the corresponding bound for S ≠ ∅. This completes the proof of Theorem 1.

APPENDIX B PROOF OF THE INNER BOUND IN THEOREM 2
The scheme CF-JD employed in Theorem 2 for the general DM CRAN model generalizes [7, Th. 3] to the case of multiple users and enabled time-sharing. An outline of the scheme is as follows. User l, l ∈ L, sends X n l (m l , f l , q n ), where m l ∈ [1 : 2 n R l ] is the user's message, f l ∈ [1 : |X l | 2 n R l ] is the codebook index and q n ∈ Q n is the time-sharing sequence. Relay node k, k ∈ K, compresses its channel output Y n k into a description U n k of compression rate R̂ k , indexed by i k ∈ [1 : 2 n R̂ k ]. The descriptions are randomly binned into 2 n C k bins, indexed by a Wyner-Ziv bin index j k ∈ [1 : 2 n C k ]. Relay node k forwards the bin index j k of the bin containing the description U n k to the CP over the error-free link. The CP receives ( j 1 , . . . , j K ) and jointly decodes the compression indices and the transmitted messages, i.e., it jointly recovers the indices (m 1 , . . . , m L , i 1 , . . . , i K ). The detailed proof is as follows.
Fix δ > 0, non-negative rates R 1 , . . . , R L and a joint pmf that factorizes as stated. Codebook Generation: Randomly generate a time-sharing sequence q n according to Π n i=1 p Q (q i ). For user l, l ∈ L, and every codebook index F l , randomly generate a codebook C l (F l ) consisting of a collection of 2 n R l independent codewords {x n l (m l , f l , q n )} indexed by m l ∈ [1 : 2 n R l ], where x n l (m l , f l , q n ) has its elements generated i.i.d. Fix non-negative rates R̂ 1 , . . . , R̂ K . For relay k, k ∈ K, generate a codebook C r k consisting of a collection of 2 n R̂ k independent codewords {u n k (i k )} indexed by i k ∈ [1 : 2 n R̂ k ], where codeword u n k (i k ) has its elements generated i.i.d. according to Π n i=1 p(u i |q i ). Randomly and independently assign these codewords to 2 n C k bins {B j k }, indexed by j k ∈ [1 : 2 n C k ] and each containing 2 n( R̂ k −C k ) codewords. Encoding at User l: Let (m 1 , . . . , m L ) be the messages to be sent and ( f 1 , . . . , f L ) the selected codebook indices. User l ∈ L transmits the codeword x n l (m l , f l , q n ) in codebook C l ( f l ). Oblivious Processing at Relay k: Relay k finds an index i k such that u n k (i k ) ∈ C r k is strongly ε-jointly typical with y n k . Using standard arguments, this can be accomplished with vanishing probability of error as long as n is large and R̂ k ≥ I (U k ; Y k |Q) + δ(ε). Let j k ∈ [1 : 2 n C k ] be the index such that u n k (i k ) ∈ B j k . Relay k then forwards the bin index j k to the CP over the error-free link.
Decoding at the CP: The CP collects all the bin indices j K = ( j 1 , . . . , j K ) from the error-free links and finds the set of indices î K = (î 1 , . . . , î K ) of the compressed vectors u n K and the transmitted messages m̂ L = (m̂ 1 , . . . , m̂ L ) such that x n l (m̂ l , f l , q n ) ∈ C l ( f l ) for l ∈ L.
An error event is declared in decoding if m̂ L ≠ m L or if there is more than one such m̂ L . The decoding can be accomplished with vanishing probability of error for sufficiently large n, as shown next. Assume that, for some T ⊆ L and S ⊆ K, we have m̂ T ≠ m T ; the corresponding tuple of sequences then belongs, with high probability, to a typical set with the stated distribution. The probability that such a tuple is strongly ε-jointly typical is, according to [7, Lemma 3], upper bounded accordingly. Overall, there are 2 n(Σ j∈T R j +Σ s∈S [ R̂ s −C s ]) − 1 such sequences in the set B j 1 × · · · × B j K . This means that the CP is able to reliably decode m L and i K , i.e., that the decoding has vanishing probability of error for sufficiently large n, as long as (R 1 , . . . , R L ) satisfies the stated inequalities for all T ⊆ L and all S ⊆ K, where (110) follows from (103) and the independence of X t from X l , l ≠ t; (112) is due to the independence of X T c and X T ; and (115) follows due to the Markov chains (given Q). This completes the proof of Theorem 2.
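The Wyner-Ziv binning bookkeeping used in this proof can be sketched with toy numbers; everything below (Python) is illustrative, with a tiny blocklength and integers standing in for the description codewords u^n(i_k).

```python
import random

n = 4          # blocklength (toy value)
R_hat = 2.0    # compression rate R̂_k in bits per symbol
C_k = 1.0      # fronthaul capacity C_k in bits per symbol

num_codewords = 2 ** int(n * R_hat)  # descriptions u^n(i), i in [1 : 2^{nR̂}]
num_bins = 2 ** int(n * C_k)         # bin indices j in [1 : 2^{nC}]

random.seed(0)
bins = [[] for _ in range(num_bins)]
for i in range(num_codewords):
    # random independent binning of descriptions into fronthaul bins
    bins[random.randrange(num_bins)].append(i)

# The relay sends only the bin index j; on average each bin holds
# 2^{n(R̂ - C)} candidate descriptions, which the CP must disambiguate
# jointly with the messages.
avg_bin = num_codewords / num_bins
print(avg_bin)  # 16.0 = 2^{4 * (2 - 1)}
```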

APPENDIX C PROOF OF THE OUTER BOUND IN THEOREM 3
The proof of this theorem follows along the lines of that of Theorem 1; in the following, we outline the similar steps and highlight the differences. Suppose the tuple (R 1 , . . . , R L ) is achievable. Let T be a subset of L and S a non-empty subset of K, let J k := φ r k (Y n k , q n ) be the message sent by relay k ∈ K, and let Q̃ := Q n be the time-sharing variable. Define the auxiliary variables for k ∈ K and i ∈ [1 : n]. From Fano's inequality, we have, with ε n → 0 as n → ∞, a bound for all T ⊆ L. Similarly to (65), we have the following inequality. Then, we have a chain in which (122) follows as in (66)-(77); (125) follows due to Lemma 1; and (126) follows since conditioning reduces entropy.
On the other hand, we have the following inequality. Then, from the relay nodes' side we have a chain in which (138) follows since J S is a function of Y n S ; (142) follows from (120); (144) follows since conditioning reduces entropy; and (145) follows from (120) and (134).
We define the standard time-sharing variable Q' uniformly distributed over {1, . . . , n}, set X L := X L,Q' , Y k := Y k,Q' , U k := U k,Q' and Q := [ Q̃ Q' , Q' ], and obtain the desired bounds from (129) and (147). Define W as before, and note that, due to Lemma 1, X L,Q' and Y K,Q' are independent of W := W Q' when not conditioned on F L . Note that, in general, Q̃ Q' is not independent of (X L,Q' , Y K,Q' ). Then, conditioned on Q, the auxiliary variables U k,Q' satisfy the required relations; therefore, conditioned on Q̃ i , for k ∈ K the following Markov chains hold. This completes the proof of Theorem 3.

APPENDIX D PROOF OF THEOREM 4
Since R sum, CF-SSD ≤ R sum, CF-SD ≤ R sum, CF-JD , to prove that CF-SD and CF-SSD achieve the same sum-rate as CF-JD, it suffices to show R sum, CF-SSD ≥ R sum, CF-JD . To that end, let us define the following regions, representing the sum-rates achievable by CF-JD and CF-SSD. Definition 3: Let R sum, CF-JD be the union of tuples (R, C 1 , . . . , C K ) that satisfy, for all S ⊆ K, the corresponding inequalities for some joint measure of the form p(q) Π L l=1 p(x l |q) p(y K |x L ) Π K k=1 p(u k |y k , q). Definition 4: The region R sum, CF-SSD is defined as the union of the regions R sum, CF-SSD (π r ) over all possible permutations π r , i.e., R sum, CF-SSD = ∪ π r R sum, CF-SSD (π r ), where R sum, CF-SSD (π r ), with decoding order π r , is the union of tuples (R, C 1 , . . . , C K ) that satisfy, for all S ⊆ K, the corresponding inequalities for some pmf p(q) Π L l=1 p(x l |q) p(y K |x L ) Π K k=1 p(u k |y k , q). We prove R sum, CF-SSD ⊇ R sum, CF-JD using properties of submodular optimization. To this end, assume (R sum , C 1 , . . . , C K ) ∈ R sum, CF-JD for a joint pmf p(q) Π L l=1 p(x l |q) Π K k=1 p(u k |y k , q). For such a pmf, let P R ⊆ R K + be the polytope formed by the set of tuples (C' 1 , . . . , C' K ) that satisfy, for all S ⊆ K, the stated inequalities. It suffices to find a point ( R̄ sum , C̄ 1 , . . . , C̄ K ) ∈ R sum, CF-SSD for which C̄ k ≤ C k for k ∈ K and R̄ sum ≥ R sum .
To show ( R̄ sum , C̄ 1 , . . . , C̄ K ) ∈ R sum, CF-SSD , it suffices to show that each extreme point of P R is dominated by a point in R sum, CF-SSD that achieves a sum-rate R̄ sum satisfying R̄ sum ≥ R sum .
Next, we characterize the extreme points of P R . Let us define the set function g : 2 K → R by (157). It can be verified that the function g + (S) := max{g(S), 0} is supermodular (see [19, Appendix C, Proof of Lemma 6]).
We can rewrite (157) as follows. For each S ⊆ K, we have (159), where (159) follows due to the Markov chain stated above. Then, by construction, P R is equal to the set of (C 1 , . . . , C K ) satisfying, for all S ⊆ K, the inequalities in (161). Following results in submodular optimization [12, Appendix B, Proposition 6], for a linear ordering i 1 ≺ i 2 ≺ · · · ≺ i K on the set K, an extreme point of P R can be computed via (163), for k = 1, . . . , K. All K ! extreme points of P R can be enumerated by going over all linear orderings i 1 ≺ i 2 ≺ · · · ≺ i K of K. Each ordering of K is analyzed in the same manner; therefore, for notational simplicity, the only ordering we consider is the natural ordering i k = k. By construction, let j be the first index for which C̄ j > 0, i.e., the first j for which g({1, . . . , j }) > 0. Then, it follows from (163) that (166) holds, where (165) follows from (158) and (166) follows due to the Markov chain stated therein. Moreover, since we must have g({1, . . . , j' }) ≤ 0 for j' < j , C̄ j can be expressed as in (170), where α ∈ (0, 1] is defined in (171) in terms of −g({1, . . . , j − 1}). Therefore, for the natural ordering, the extreme point ( C̄ 1 , . . . , C̄ K ) is given by ( C̄ 1 , . . . , C̄ K ) = ( 0, . . . , 0, (1−α) I (Y j ; U j |U K j+1 , Q), I (Y j+1 ; U j+1 |U K j+2 , Q), . . . , I (Y K−1 ; U K−1 |U K , Q), I (Y K ; U K |Q) ). (172)
We consider an instance of the CF-SSD scheme in which, for a fraction α of the time, the CP decodes U n j+1 , . . . , U n K while relays k = 1, . . . , j are inactive. For the remaining fraction (1 − α) of the time, the CP decodes U n j , . . . , U n K and relays k = 1, . . . , j − 1 are inactive. Then, the CP decodes X L .
Formally, we consider the pmf p(q' ) Π L l=1 p(x l |q' ) Π K k=1 p(u k |y k , q' ) for CF-SSD as follows. Let B denote a Bernoulli random variable with parameter α ∈ (0, 1], i.e., B = 1 with probability α and B = 0 with probability (1 − α), with α as in (171). We consider the reverse ordering π r such that π r (1) = K , π r (2) = K − 1, . . . , π r (K ) = 1, i.e., compression is done from relay K down to relay 1. Then, we let Q' = (B, Q) and the tuple of random variables be distributed as stated. Then, for k = 1, . . . , j − 1, we have (178), which follows since U k = ∅ for k < j independently of B. For k = j + 1, . . . , K , we have (182), which follows since U k is unchanged for k > j independently of B. For k = j , we have (186), which follows since U j = ∅ for B = 1 and U j is unchanged for B = 0.
Therefore, from (178), (182), (186) and (192), it follows that the extreme point ( C̄ 1 , . . . , C̄ K ) ∈ P R is dominated by a point ( R̄ sum , C' 1 , . . . , C' K ) ∈ R sum, CF-SSD satisfying R̄ sum ≥ R sum . Similarly, considering all possible orderings, each extreme point of P R can be shown to be dominated by a point ( R̄ sum , C' 1 , . . . , C' K ) in R sum, CF-SSD (associated with a permutation π r ). This completes the proof of Theorem 4.
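The greedy enumeration of extreme points used in this proof can be sketched on a toy set function; the g below is an arbitrary stand-in for (157), and the update C̄_{i_k} = g⁺({i_1, …, i_k}) − g⁺({i_1, …, i_{k−1}}) is our reading of the rule computed via (163).

```python
from itertools import permutations

def g(S):
    # toy stand-in for the set function of (157), on K = {0, 1, 2}
    vals = {(): -1.0, (0,): -0.5, (1,): -0.2, (2,): 0.3,
            (0, 1): 0.4, (0, 2): 0.8, (1, 2): 1.0, (0, 1, 2): 2.0}
    return vals[tuple(sorted(S))]

def g_plus(S):
    return max(g(S), 0.0)

def extreme_point(order):
    # greedy rule: coordinate i_k gets g+({i_1..i_k}) - g+({i_1..i_{k-1}})
    point, prefix = {}, []
    for k in order:
        prev = g_plus(prefix)
        prefix.append(k)
        point[k] = g_plus(prefix) - prev
    return point

for order in permutations(range(3)):
    pt = extreme_point(list(order))
    print(order, [round(pt[k], 2) for k in range(3)])
```

Each ordering yields one extreme point; the coordinates telescope, so every point sums to g⁺(K), mirroring how the proof handles the K! orderings one at a time.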

APPENDIX E PROOF OF THEOREM 5
The proof follows along the lines of the proofs of [12, Th. 4] and [44, Th. 8]; it uses the relations between the MMSE and the Fisher information matrix developed in [44] and a reparametrization of the MMSE matrix from [12], but differs from them in accounting for the time-sharing variable Q. We will use the following lemmas. Lemma 2 [44], [45]: Let (X, U) be a pair of random vectors with pmf p(x, u). We have log |(πe) J −1 (X|U)| ≤ h(X|U) ≤ log |(πe) mmse(X|U)|. (196) First, we derive an outer bound on the capacity region of the memoryless Gaussian MIMO model described by (22) and (23) under time-sharing of Gaussian inputs, by deriving an outer bound on the rate region given in Theorem 1 under the input constraints (24) and (25). Then, we show that this outer bound is achievable by time-sharing of Gaussian inputs.
For a fixed Q = q, let us define Y k,q := H k,L X L,q + N k and X L,q := [X 1 , . . . , X L |Q = q] T . For a fixed Gaussian distribution X L,q ∼ CN (0, K L,q ) and conditional distribution Π K k=1 p(ŷ k |y k , q), let us choose B k,q satisfying 0 ⪯ B k,q ⪯ Σ k −1 such that, for k ∈ K, mmse(Y k,q |X L,q , U k,q ) = Σ k − Σ k B k,q Σ k .
(197) Such a B k,q always exists since 0 ⪯ mmse(Y k,q |X L,q , U k,q ) ⪯ Σ k for all q ∈ Q and k ∈ K. Next, we derive the following equality: for q ∈ Q, and for all T ⊆ L and S ⊆ K, we have the stated identity for inputs satisfying (40). We also use the following lemma, which can be readily proven by an application of Weyl's inequality [46]. Lemma 4: Let A and B be two m × m positive-definite matrices satisfying B ⪰ A. Then, for any m × m positive-definite matrix C, we have |I + BC| ≥ |I + AC|.
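Lemma 4 can be sanity-checked numerically; reading the (garbled) hypothesis as B ⪰ A, which matches the direction of the determinant inequality, diagonal matrices make the check immediate, since the determinants reduce to products.

```python
# Lemma 4 check on diagonal matrices: if B ⪰ A ≻ 0 (Loewner order), then
# |I + BC| >= |I + AC| for positive-definite C. The chosen values are
# arbitrary; diagonal A, B, C keep the arithmetic transparent.
A = [1.0, 1.0]   # A = I
B = [2.0, 3.0]   # B = diag(2, 3), so B - A is positive semidefinite
C = [1.0, 2.0]   # C = diag(1, 2)

def det_I_plus(diag):
    # determinant of I + D for a diagonal matrix D with the given entries
    out = 1.0
    for v in diag:
        out *= 1.0 + v
    return out

lhs = det_I_plus([b * c for b, c in zip(B, C)])  # |I + BC| = 3 * 7 = 21
rhs = det_I_plus([a * c for a, c in zip(A, C)])  # |I + AC| = 2 * 3 = 6
print(lhs >= rhs)  # True
```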
In the following, we show C G (C K ) ⊆ C no-ts G (C K ). Let us define B̄ k := Σ q∈Q p(q) B k,q , with 0 ⪯ B k,q as above. Next, we show that in the high-SNR regime, i.