On the Compound Broadcast Channel: Multiple Description Coding and Interference Decoding

This work investigates the general two-user Compound Broadcast Channel (BC) where an encoder wishes to transmit common and private messages to two receivers while being oblivious to two possible channel realizations controlling the communication. The focus is on the characterization of the largest achievable rate region by resorting to more evolved encoding and decoding techniques than the conventional coding for the standard BC. The role of the decoder is first explored, and an achievable rate region is derived based on the principle of "Interference Decoding" (ID), where each receiver decodes its intended message and chooses whether or not to (non-uniquely) decode the interfering message. This inner bound is shown to be capacity-achieving for a class of non-trivial compound BEC/BSC broadcast channels, while the worst case of Marton's inner bound, based on "Non Interference Decoding" (NID), fails to achieve the capacity region. The role of the encoder is then studied, and an achievable rate region is derived based on "Multiple Description" (MD) coding, where the encoder transmits a common description as well as multiple dedicated private descriptions to the many instances of the users' channels. It turns out that MD coding outperforms the single description scheme, Common Description (CD) coding, for a class of compound Multiple Input Single Output Broadcast Channels (MISO BC).

transmit both a common message $W_0$ and two private messages $W_1$ (resp. $W_2$), each dedicated to a user observing the channel output $Y$ (resp. $Z$). Following this seminal work, intensive research was undertaken to characterize the capacity region of this setting, which implies the design of efficient interference mitigation techniques.
In this work, we study the two-user compound BC wherein an encoder wishes to communicate two private messages $W_1$ (resp. $W_2$) to two users, each of which can observe the output of one of many possible channel statistics $(Y_1, \dots, Y_J)$ (resp. $(Z_1, \dots, Z_K)$). The actual channel statistic controlling the communication is unknown to the transmitter; however, it is assumed to remain constant during the communication, and the aim is to ensure reliable communication whatever the channel realization. The compound channel model is relevant whenever the transmitter fails to acquire a perfect estimate of the channel but knows only a subset, or an interval, to which it belongs. Finite-rate feedback from the receiver to the transmitter, which relies on a quantization step, might be the most realistic scenario in which compound channels are encountered. It is also well understood that, when interested in the maximum error probability, the compound BC is equivalent to a BC with multiple users and common messages. Thus, the channel uncertainty in the compound BC is equivalent to multicasting in a multi-user scenario.
Let us first briefly discuss the optimal coding schemes for the two-user BC, also partly reported in [2]. Although the capacity region of the BC remains an open problem to date, Marton established in [3] an achievable rate region for the general two-user BC based on random binning and superposition coding, with common and private messages, which is commonly referred to as Marton's inner bound. This inner bound remains the best known in the literature, while the best outer bound on the capacity region of the BC is due to Nair & El Gamal [4]. These two bounds were shown to coincide for several classes of ordered channels, e.g., degraded, less-noisy, and more-capable BCs (see [5] and references therein), and more recently, in [6], for essentially less-noisy and essentially more-capable BCs. The optimal coding scheme for such ordered BCs relies on superposition coding at the encoder and on allowing the user with the best channel observation to decode the interfering message, i.e., that of the opposite user, in addition to its intended message. Marton's inner bound also proved to be capacity-achieving for some non-ordered channels: the deterministic and semi-deterministic BCs in [3] and [7], the MIMO BC in [8], as well as the product and sum of two unmatched channels in [9]. For the latter channel models, it is rather random binning that proves to be
0018-9448 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
In the works listed above, the channel statistic is perfectly known to the transmitter, and thus the encoder can exploit this knowledge to allow for an efficient interference mitigation scheme. Indeed, in all settings where Marton's inner bound is tight, the construction of the optimal auxiliary code depends on prior knowledge of either the channel output statistic (e.g., deterministic and semi-deterministic BCs) or a function of these statistics (e.g., the users' ordering in ordered single-antenna BCs). When the transmitter has no Channel State Information (no-CSIT), very few capacity results are known for the compound BC, among which are the two results in [10] and [11]. The coupling between interference and channel uncertainty calls for more involved coding schemes than Marton's coding scheme, which will be the focus of our contribution.

A. Related Works
We investigate in this work more involved encoding and decoding techniques than the usual coding schemes that proved to be capacity-achieving for some classes of BCs, namely, Interference Decoding (ID) and Multiple Description (MD) coding.
The idea behind ID was first introduced in [12] for Gaussian settings and formalized later in [13]. It consists in a combination of non-unique decoding with the possibility, at each receiver, to decode or not the interfering messages intended for the other users. The benefits of this decoding scheme result not only from non-unique decoding [14] but, essentially, from allowing each receiver to decode or not the interference. Note here that the straightforward extension of the results of [13] to the BC is not general enough, for it encompasses only superposition coding but fails to include random binning. Nevertheless, it provides an interesting insight into how to recover a superposition-coding-like inner bound while keeping a symmetric encoding, which will be crucial in the construction of our Interference Decoding scheme.
While ID allows each receiver to choose between decoding or not decoding the interference, the authors of [15] suggested a coding scheme which relies on decoding part of the interference, and which includes decoding the interference or not decoding it as special cases. Though the setting investigated therein is fundamentally different from the compound BC we are investigating here, their results are strongly related to ours and will be thoroughly compared to them later on. The idea behind the coding scheme in [15] is to introduce an additional auxiliary codeword in the encoding step, which is meant to carry part of the interfering message and to be decoded at only one user. This renders the extension of the inner bound to multiple users very hard, since it involves extra superposition and binning operations. Thus, we will not pursue this direction in our work, but will restrict ourselves to Marton's encoding scheme whilst improving on Marton's inner bound through ID.
The authors of [16] derived an inner bound based on coset codes for the three-user BC. The idea behind using coset codes is to allow the users to decode a compressive function of the interfering messages of other users, for instance $w_2 \oplus w_3$, and thus to cancel the interference completely with less impediment to the information rates than fully decoding the interfering messages, i.e., decoding $w_2$ and $w_3$. A class of three-user BCs is proposed where two links are interference free and for which the straightforward extension of Marton's coding scheme is strictly suboptimal as compared to coset coding. Such a coding technique based on coset codes proves to be useful when there are many interfering messages; however, it does not enlarge Marton's inner bound in the two-user case. Yet this work presents the first class of three-user BCs for which Marton's inner bound, with many layers of common messages, is strictly sub-optimal.
When the channels are not ordered, e.g., the MISO BC, designing optimal coding schemes with no-CSIT calls for more involved encoding strategies. The intuition follows from the analysis of the effect of channel uncertainty on the Degrees of Freedom (DoF), which are very insightful for understanding how interference should be managed in multiple-antenna settings. For finite-state compound settings, Weingarten et al. first derived both inner and outer bounds on the DoF region and on the sum-DoF of the compound MISO BC [17], with some cases of optimality. The outer bound derived therein was conjectured to be loose, but later Gou et al. [18] and Maddah-Ali [19] proved that the optimal DoF region of the generic compound MISO BC, both in the complex and in the real settings, matches this outer bound. The achievability of the optimal DoF relies on either a linear or a non-linear coding scheme combined with block expansion (coding over many time slots) in [17], while the proof in [19] resorts to number theory tools and consists in Interference Alignment (IA) over rational dimensions of the real numbers (see also [20]). When the states span an infinite set, i.e., in the ergodic setting, the DoF become limited. Indeed, in [21], it is shown that with Rayleigh fading channels, the sum-DoF collapses to the number of transmit antennas: time-sharing is optimal. A few more works deal with various models of the amount and accuracy of CSI available at the transmitter, e.g., [22]. It turns out that richer encoding strategies, like interference alignment along with block expansion, are crucial in dealing with interference, and thus any optimal scheme for the finite power limited MISO BC should encompass such DoF-optimal schemes.

B. Our Contribution
In this work, we explore the role that two main interference mitigation techniques can play in the compound BC setup, and show that, by carefully optimizing either the encoding or the decoding side, we can alleviate the effect of uncertainty when coupled with interference in two different ways.
We first derive a rate region that takes advantage of the combination of Marton's random binning and superposition coding with the choice of decoding or not the interference, which we denote by ID. We show that for the compound BC, unlike the standard two-user BC, ID can strictly outperform Marton's standard coding scheme based only on random binning and superposition coding, denoted later by No-Interference Decoding (NID). This is because ID allows a symmetric encoding and deals with the source's uncertainty by relegating the asymmetric decoding to the receive terminals. To illustrate clearly the gain of ID over NID, we investigate a class of discrete ordered compound Binary Erasure/Binary Symmetric (BEC/BSC) BCs for which we derive the capacity region resorting to ID and show that NID yields a strictly sub-optimal rate region.
As for the class of non-ordered compound BCs, more involved encoding schemes need to be investigated, since we need to precode against interference rather than to decode it. Since each channel statistic involves a distinct interference signal, precoding against interference with only one common auxiliary code, i.e., Common Description (CD) coding, might be inefficient. For this reason, we look at the role that Multiple Description (MD) coding can play in the non-ordered compound BC, where the encoder precodes against interference differently for the many channel statistics of each user through private descriptions, each tailored to one channel statistic. We follow an approach similar to that in [23], where MD coding has already proved to be useful over compound state-dependent channels. We prove that MD coding is beneficial as compared with CD coding [3] and we illustrate this for a class of compound Gaussian MISO BCs under a specific Dirty-Paper Coding (DPC) scheme [24].
Finally, we discuss the relative behaviour of ID and MD coding techniques and present a brief example to support their exclusive inclusion.
The remainder of this paper is organized as follows. Section II presents the system model and provides basic definitions as well as a simple outer bound on the capacity region of a general compound BC. In Section III, we study the utility of ID for the compound BC. We start by deriving the ID inner bound in Section III-A and show in Section III-B that ID is capacity achieving for a class of discrete compound BCs while NID is strictly sub-optimal. Next, in Section IV, we introduce MD coding and specialize it to the compound Gaussian MISO BC in Section V. The performances of these two inner bounds are then compared to the outer bound presented in Section V-G. Last, we compare the relative behavior of both the ID and MD inner bounds in Section VI-A and end with summary and discussion in Section VI-B.

Notations
The term p.m.f. refers to probability mass function. Random variables (resp. their realizations) are denoted by upper (resp. lower) case letters. Vectors are denoted by bold font characters. RV stands for random variable, ARV for Auxiliary Random Variable, and FME for Fourier-Motzkin Elimination. For any sequence $(x_i)_{i \in \mathbb{N}_+}$, the notation $x_k^n$ stands for the collection $(x_k, x_{k+1}, \dots, x_n)$; $x_1^n$ is simply denoted by $x^n$. Entropy is denoted by $H(\cdot)$ and mutual information by $I(\cdot;\cdot)$, while differential entropy is denoted by $h(\cdot)$. $\mathbb{E}$ (resp. $\mathbb{P}$) denotes the expectation (resp. the generic probability measure), while the notation $P_X$ is specific to the probability distribution of a RV $X$. $|\mathcal{X}|$ stands for the cardinality of the set $\mathcal{X}$. We denote typical and conditional typical sets by $T_\delta^n(X)$ and $T_\delta^n(Y|x^n)$, respectively (see Appendix A for details). Let $X$, $Y$ and $Z$ be three RVs on some alphabets with joint probability distribution $P_{XYZ}$. If $P_{X|YZ}(x|y,z) = P_{X|Y}(x|y)$ for all $x, y, z$, then they form a Markov chain, which is denoted by $X - Y - Z$. The binary entropy function $H_2$ is defined for all $x \in [0:1]$ by $H_2(x) = -x \log_2(x) - (1-x)\log_2(1-x)$, and the binary convolution operator $(\star)$ as $x \star y = x(1-y) + y(1-x)$. For two channels with outputs $Y_1$ and $Y_2$, $Y_2 \preceq Y_1$ means that $Y_1$ is less noisy than $Y_2$. $\mathbf{h}^t$ is to be understood as the transpose of the real-valued vector $\mathbf{h}$. Let $\mathbf{B}_u$ be a unit-norm $2 \times 1$ column vector. We denote the scalar product between vectors $\mathbf{B}_u$ and $\mathbf{h}_j$ by $h_{j,u} = \mathbf{h}_j^t \mathbf{B}_u$.
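As a small numerical companion to the notation above, the binary entropy function and the binary convolution operator can be sketched as follows. The function names `binary_entropy` and `binary_convolution` are illustrative choices (they do not come from the paper), and the entropy is assumed to be measured in bits.

```python
import math


def binary_entropy(x: float) -> float:
    """Binary entropy H_2(x) in bits (base 2 assumed), with 0*log(0) := 0."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x == 0.0 or x == 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def binary_convolution(a: float, b: float) -> float:
    """Binary convolution of a and b, i.e. a(1-b) + b(1-a).

    This is the crossover probability of two cascaded BSCs with
    crossover probabilities a and b.
    """
    return a * (1.0 - b) + b * (1.0 - a)


if __name__ == "__main__":
    print(binary_entropy(0.5))
    print(binary_convolution(0.1, 0.13))
```

Both helpers are pure functions of their arguments, so they can be reused wherever $H_2$ or the binary convolution appears in the evaluation of a rate region.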

II. PROBLEM DEFINITION
Consider the compound BC problem, which consists of one source terminal and two distinct receivers, each observing one of many possible channel outputs, and where the source wishes to communicate two private messages $W_1$ and $W_2$, one to each receiver. This setup is equivalent to a setting where each user is represented by multiple users that are interested in the same message, $W_1$ or $W_2$.

A. Definition of the Compound BC
• Consider a collection of $n$-th extensions of discrete memoryless BCs (each defined by a p.m.f. and the input and output alphabets) $\{\mathcal{W}^n_{j,k}\}_{j \in \mathcal{J}, k \in \mathcal{K}} = \{P_{Y_j^n Z_k^n | X^n} : \mathcal{X}^n \to \mathcal{Y}^n \times \mathcal{Z}^n\}$, defined by the conditional p.m.f.s: $P_{Y_j^n Z_k^n | X^n}(y^n, z^n | x^n) = \prod_{i=1}^{n} P_{Y_j Z_k | X}(y_i, z_i | x_i)$.
• The users' pair of indices $(j,k)$ takes values in the finite set of indices $\mathcal{J} \times \mathcal{K} \triangleq [1:J] \times [1:K]$.
• An $(M_{1n}, M_{2n}, n)$-code for this channel consists of: two sets of messages $\mathcal{M}_1$ and $\mathcal{M}_2$, an encoding function that assigns an $n$-sequence $x^n(w_1, w_2)$ to each pair of messages $(w_1, w_2) \in \mathcal{M}_1 \times \mathcal{M}_2$, and decoding functions, one at each receiver, that assign to the received signal an estimate of its intended message or an error. The probability of error is given by:
The capacity region is the convex hull of the set of all achievable rate pairs $(R_1, R_2)$ and is denoted as $\mathcal{C}_{J \times K}$.
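The defining feature of the compound model above, namely that the state $(j,k)$ is held fixed over the whole block while remaining unknown to the encoder, can be illustrated with a toy simulation. This is only a sketch under assumptions: the binary component channels (a BSC or a BEC for user 1, a BSC for user 2), the parameter values, and the names `bsc`, `bec`, `compound_bc_block` and `ERASURE` are hypothetical and chosen purely for illustration.

```python
import random

ERASURE = "e"  # hypothetical marker for an erased symbol


def bsc(bits, p, rng):
    """Pass bits through a memoryless BSC with crossover probability p."""
    return [b ^ int(rng.random() < p) for b in bits]


def bec(bits, e, rng):
    """Pass bits through a memoryless BEC with erasure probability e."""
    return [ERASURE if rng.random() < e else b for b in bits]


def compound_bc_block(x, j, rng, p=0.1, p1=0.13, e2=0.46):
    """Send one block x^n over a toy compound BC with J = 2 and K = 1.

    The state j in {1, 2} is held fixed for all n channel uses; the
    encoder that produced x never gets to see it.
    Noise is drawn independently for each user (an assumption of this
    sketch, not a statement about the paper's model).
    """
    if j == 1:
        y = bsc(x, p1, rng)   # user 1 observes Y_1 (assumed to be a BSC)
    elif j == 2:
        y = bec(x, e2, rng)   # user 1 observes Y_2 (assumed to be a BEC)
    else:
        raise ValueError("state j must be 1 or 2")
    z = bsc(x, p, rng)        # user 2 observes Z (assumed to be a BSC)
    return y, z


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(16)]
    for j in (1, 2):
        y, z = compound_bc_block(x, j, rng)
        print(j, y, z)
```

The same input block `x` is used for both values of `j`, which mirrors the requirement that a single code must be reliable whatever the channel realization.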

B. Outer Bound on the Capacity Region of the Compound BC
We derive in this section an intuitive outer bound on the capacity region of the compound BC. This outer bound results from a straightforward extension to the compound setting of the best-known outer bound on the capacity region of the BC. It will be useful in the examples we investigate later.
Let the rate region $\mathcal{R}^{(j,k)}_{\mathrm{NEG}}$ denote the outer bound derived in [4] applied to each pair of users with index $(j,k)$. For the private message setup, the rate region is given by for a specific joint p.m.f. $P_{QUVX}$. A simple outer bound on the capacity region of the compound BC is stated in the following theorem.
Theorem 1 (Outer bound). The capacity region of the two-user compound BC $\mathcal{C}_{J \times K}$ satisfies:

Remark 2.
It is worth mentioning that even when the compound BC consists of only one BC, the outer bound of [4] has not been proven to be tight in general. For ordered compound setups, optimizing the common auxiliary RV $Q$ for each channel with index $(j,k)$ prevents this outer bound from being tight, since the encoder is oblivious to the actual channel realization. In particular, it cannot optimize the code for each of the possible channel instances. However, this bound can still be tight in some cases of interest, as will be clarified later on.
Proof: We need to recall that the proof in [4] of the outer bound for the users' pair $(j,k)$ uses the specific choice of auxiliary RVs: Here, we notice that the auxiliary RVs $(U_i, V_i)$ do not depend on the users' pair index. Thus, we can show that for all channel indices $(j,k)$ with the specific choice: Thus, we could possibly factor the resulting joint p.m.f. on $(U_i, V_i)$ over all compound channel indices, and let only the common variable choice vary from one channel to another. Moreover, we can show in the same fashion as in [4, Lemma 3.2] that the maximizing distribution of the input $P_{X|QUV}$ is a deterministic mapping.

III. ID FOR THE COMPOUND BROADCAST CHANNEL
In this section, we derive an inner bound on the capacity region of the compound BC which relies on Marton's encoding scheme with common and private codewords, generated and mapped via superposition coding and random binning, but resorts to ID at the decoders.

A. ID Inner Bound
The inner bound we derive here shares common ideas with the following works [25]. First, the idea used in [12] where, roughly speaking, each receiver is required to decode its intended message and is also allowed to decode or not the interfering message. Second, the fact that decoding the interfering message non-uniquely removes an extra constraint on the information rates, yielding the same result as if the decoder had to successively decode the interfering and the intended messages, which is related to [15].

Theorem 3 (ID inner bound). An inner bound on the capacity region of the compound BC consists of the set of all rates where $\mathcal{P}$ is the set of all input p.m.f.s $P_{QUVX}$ such that The set $\mathcal{T}$ and the rate regions $\mathcal{T}^{(j,k)}_{[1:4]}$ are, respectively, defined as follows:
Proof. The proof is relegated to Appendix B. Hereafter we summarize its main steps. The messages $(W_1, W_2)$ are first split into two parts: common messages $(W_{0,1}, W_{0,2})$ transmitted through the common codeword $Q$, i.e., decoded by both users, and private messages $(W_{p,1}, W_{p,2})$ transmitted only through the private codewords $U$ and $V$ and intended for their respective decoders. Encoding is performed in the same fashion as Marton's encoding scheme, through binning and superposition coding.
Each user introduces the union of two sets of constraints at the decoding step, each set corresponding to decoding or not decoding the interference. In terms of achievable rates, this results in the union of four rate regions: $\mathcal{T}^{(j,k)}_1$ is the same rate region as obtained with Marton's inner bound, which does not involve decoding interfering messages; $\mathcal{T}^{(j,k)}_4$ is the region in which both decoders decode their intended and the interfering messages; and $\mathcal{T}^{(j,k)}_2$ and $\mathcal{T}^{(j,k)}_3$ correspond to each destination decoding the interfering message in turn. A similar rate region was also derived in [13], but it does not take advantage of Marton's encoding technique, i.e., random binning with common and private codewords, and thus fails to achieve even Marton's inner bound in the compound setting.

Remark 4.
Consider the standard two-user BC, i.e., where $J = K = 1$. Observe that the rate region $\mathcal{R}_{\mathrm{ID}}$ contains Marton's rate region [3], which we will denote in the following as $\mathcal{R}_{\mathrm{NID}}$. These regions are given by It is clear that $\mathcal{R}_{\mathrm{NID}} \subseteq \mathcal{R}_{\mathrm{ID}}$, but the question is whether this inclusion is strict or not. To check this, we need to evaluate both regions, and thus we resort to FME to eliminate the binning rates $(T_1, T_2)$ and to bit recombination between the private rates $(R_1, R_2)$ (for the interested reader, a similar simplification through FME and bit recombination is presented in Appendix D). Since the unions commute, we can write that: where the regions $\mathcal{R}_k$, $k \in \{2, 3\}$, are respectively defined by the following sets of inequalities and while $\mathcal{R}_{\mathrm{NID}}$ is equal to $\mathcal{R}_1$ and is defined by From the above rate regions, we observe that by taking
For the compound BC setting, we observe similarly that the rate region $\mathcal{R}_{\mathrm{ID}}$ contains Marton's rate region [3], which we will denote in the following as $\mathcal{R}_{\mathrm{NID}}$ and which is given by: However, no evidence on the strict inclusion of $\mathcal{R}_{\mathrm{NID}}$ in $\mathcal{R}_{\mathrm{ID}}$ can be readily stated. To this end, we investigate in the following a compound BC for which we show that the rate region $\mathcal{R}_{\mathrm{ID}}$ from Theorem 3 is tight, i.e., achieves capacity, while the region $\mathcal{R}_{\mathrm{NID}}$ is strictly suboptimal.

B. ID is Optimal for a Class of Compound BCs
In this section, we investigate a compound BC model for which Marton's inner bound, obtained through NID, is strictly sub-optimal compared to the ID inner bound, in which users are allowed to decode or not the interference. For simplicity, we restrict our analysis to the case in which $J = 2$ and $K = 1$. This setting is complex enough to highlight the challenges of coding for compound settings. We first discuss a criterion for the construction of such a compound BC model and later prove the strict optimality of ID.
1) Irrelevant Compound BC Models: Characterizing the optimal coding for a compound BC can be either rather challenging or trivial, depending on the class of BCs over which the compound setting is defined. We call irrelevant those models of ordered BCs for which ID cannot strictly outperform NID.
Consider a compound BC with two possible BCs, $X \to (Y_1, Z)$ and $X \to (Y_2, Z)$, where $Y_1$ is less noisy than $Y_2$. Then, it follows that, whatever the auxiliary RVs $(Q, U, V) \sim P_{QUV}$: This rate region corresponds to the one obtained in the presence of only one BC, $X \to (Y_2, Z)$, i.e., the non-compound setting. Since ID does not outperform NID for the two-user non-compound BC setting, this class of BCs becomes irrelevant. Thus, if the possible channel outputs of user 1, $Y_1$ and $Y_2$, are ordered, at least in the known sense of less noisiness, the resulting compound model does not benefit from ID.
2) Compound Binary Erasure and Binary Symmetric BC: In this section, we construct the simplest yet relevant compound BC setting for which ID can be beneficial, i.e., for which the inclusion $\mathcal{R}_{\mathrm{NID}} \subset \mathcal{R}_{\mathrm{ID}}$ is strict.
In Section III-B1, we showed that $Y_1$ and $Y_2$ must not be ordered in the strong sense of less-noisiness. In addition, we need to provide for some inverse orderings among the BCs of the compound setting, so as to impose a tradeoff between two orders of superposition coding schemes. To this end, we consider a compound BC setting in which, for the
To illustrate such a model, let us consider the Binary Erasure Channel (BEC) with erasure probability $e$ and the Binary Symmetric Channel (BSC) with crossover probability $p$. These two channels allow for a variety of orderings between their outputs [6], depending on the pair $(e, p)$, as summarized in Table I. Define the compound BC with components: In order to build a relevant compound BC setting, we choose $Y_2$ to be more capable than $Y_1$, which requires: $4 p_1 (1 - p_1) < e_2 \le H_2(p_1)$. Then we let $Y_1$ be a physically degraded version of $Z$, i.e., $p < p_1 < 0.5$, and $Y_2$ more capable than $Z$, i.e.,
(1) writes as: where:
Proof. Let us denote by $\bar{\mathcal{C}}_1$ the closure of $\mathcal{C}_1$. We have that: In the following, we show that $\bar{\mathcal{C}}_1 \subseteq \mathcal{C}_2$. To this end, let $(R_1, R_2) \in \bar{\mathcal{C}}_1$, and let $\alpha \in [0:1]$ be such that Let us show that $(R_1, R_2) \in \mathcal{C}_2$. We have that where (a) results from $p < p_1 < 0.5$ and (b) stems from $e_2 \le H_2(p)$, see (3). Next, since, then there exists $\beta \in [0:0.5]$ such that, which implies that With this definition of $\beta$, we have that where (a) is a result of (7) and (b) stems from Mrs. Gerber's lemma, which yields while (c) is a consequence of (6) and $e_2 \le H_2(p)$.
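The orderings between a BEC and a BSC mentioned above can be sketched as a small classifier. The thresholds used here ($2p$, $4p(1-p)$ and $H_2(p)$) are the standard ones for the pair BEC($e$)/BSC($p$) with $0 < p < 1/2$; they are assumed to match Table I, which is not reproduced in this text, and the names `h2` and `bec_bsc_ordering` are illustrative.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits, with 0*log(0) := 0."""
    if x == 0.0 or x == 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def bec_bsc_ordering(e: float, p: float) -> str:
    """Classify the ordering between BEC(e) and BSC(p), for 0 < p < 1/2.

    Assumed thresholds (standard for this channel pair):
      e <= 2p               : the BSC is a degraded version of the BEC
      2p < e <= 4p(1-p)     : the BEC is less noisy, but not degraded
      4p(1-p) < e <= H_2(p) : the BEC is more capable, but not less noisy
      e > H_2(p)            : the BEC is not more capable than the BSC
    """
    if not (0.0 < p < 0.5 and 0.0 <= e <= 1.0):
        raise ValueError("expected 0 < p < 1/2 and 0 <= e <= 1")
    if e <= 2.0 * p:
        return "BSC is a degraded version of BEC"
    if e <= 4.0 * p * (1.0 - p):
        return "BEC is less noisy than BSC (not degraded)"
    if e <= h2(p):
        return "BEC is more capable than BSC (not less noisy)"
    return "BEC is not more capable than BSC"


if __name__ == "__main__":
    for e, p in [(0.1, 0.1), (0.3, 0.1), (0.46, 0.1), (0.9, 0.1)]:
        print(e, p, "->", bec_bsc_ordering(e, p))
```

The last regime follows from comparing capacities: when $e > H_2(p)$, the BEC capacity $1 - e$ is below the BSC capacity $1 - H_2(p)$, so the BEC cannot be more capable.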

3) Evaluation of ID Inner Bound in Theorem 3:
In this section, we evaluate the inner bound $\mathcal{R}_{\mathrm{ID}}$ of Theorem 3, and show that it is tight for the class of compound BEC/BSC BCs constructed following (3).

Theorem 5. The rate region $\mathcal{R}_{\mathrm{ID}}$ of Theorem 3 is tight for the compound BEC/BSC BC under investigation, i.e., $\mathcal{R}_{\mathrm{ID}} = \mathcal{C}_1$.
Proof. The rate region $\mathcal{R}_{\mathrm{ID}}$ of Theorem 3 contains the rate region given by where the intersection $\mathcal{T}$ corresponds to decoding method (3) for the BC $(Y_1, Z)$ and decoding method (4) for the BC $(Y_2, Z)$, i.e.,
• For the BC $(Y_1, Z)$, user 1, which observes $Y_1$, decodes only its intended messages, while user 2, which observes $Z$, decodes its intended messages and the interference as well.
• For the BC $(Y_2, Z)$, user 1 (resp. user 2), observing $Y_2$ (resp. $Z$), decodes both its intended messages and the interfering ones.
In Appendix D, it is shown that, after FME on the binning rates $(T_1, T_2)$ and the rate splitting, the rate region $\mathcal{R}_{3,4}$ reduces to the set of rates satisfying: Then, by letting $V = X$, $\tilde{Q} = (Q, U)$, and using the fact that $Y_2$ is more capable than $Z$, $\mathcal{R}_{3,4}$ becomes Let $\tilde{Q} \to X \equiv \mathrm{BSC}(\alpha)$, and $X \sim \mathrm{Bern}(1/2)$. The rate region $\mathcal{R}_{3,4}$ then writes as In the following, we show that First, let us note that $\mathcal{R}_{3,4}$ is the union of two rate regions, The corner points $C$ and $D$ of $\mathcal{R}_E$ belong to $\mathcal{C}_1$. To show that the point $E$ lies in the region $\mathcal{C}_1$, note first that Since $Y_1$ is physically degraded with respect to $Z$, i.e., $p \le p_1$, and since $\alpha$, $p \star \alpha$ and $p_1 \star \alpha$ are all included in the interval , which is already achievable in $\mathcal{C}_1$. The line between $C$ and $E$ is then achievable by time sharing and convexity of the rate region $\mathcal{C}_1$. Thus, since the proof of the equality $\mathcal{R}_{\mathrm{ID}} = \mathcal{C}_1$ is complete.

Remark 6.
The authors of [15, Proposition 7] derived the capacity region of a specific class of three-receiver broadcast channels where a common message is to be delivered to three users, each observing a channel output, $Y_1$, $Y_2$ and $Z$, and a private message is to be delivered only to the user with observation $Z$. When the channel $Z$ is less noisy than $Y_1$, Nair & El Gamal show that the capacity region is given by
Though this setting is fundamentally different from the compound setting we are investigating, due to the degraded message set assumption, it turns out that when we assume that $Z$ is less noisy than $Y_1$, the optimal scheme of [15, Proposition 7] yields an achievable rate region in our case, since whatever $Y_1$ can decode can be decoded by $Z$ as well. Moreover, we can prove that the resulting rate region is tight through a converse argument, outlined in the following, , where the first inequality results from Nair's inequality for less-noisy channels [6], and where we define . Thus, by letting $V = X$, we recover the capacity result in Theorem 5 through an alternative proof.

4) NID Inner Bound is Strictly Sub-Optimal:
In this section, we show that the NID inner bound defined in (2) is strictly suboptimal for the class of compound BEC/BSC BCs under investigation. To this end, we first derive an outer bound on the rate region $\mathcal{R}_{\mathrm{NID}}$ and then show that this outer bound is strictly included in $\mathcal{R}_{\mathrm{ID}}$.
where $\mathcal{R}_{\mathrm{OuterNID}}$ is the closure of the set of rate pairs satisfying for some joint p.m.f. $P_{\tilde{Q}X}$ where $\|\tilde{\mathcal{Q}}\| \le 4$ and $X \sim \mathrm{Bern}(1/2)$.
Proof. Let us note first that the NID rate region $\mathcal{R}_{\mathrm{NID}}$ defined in (2) is included in the following rate region where we have used the fact that $I(Q; Y_1) \le I(Q; Z)$ since $Y_1$ is physically degraded with respect to $Z$. Next, this rate region is contained in the set of rate pairs satisfying by dropping some constraints and using the fact that, for all Defining thus $\tilde{Q} \triangleq (Q, U)$ yields the rate region $\mathcal{R}_{\mathrm{OuterNID}}$. In Appendix E, we show that it suffices to evaluate $\mathcal{R}_{\mathrm{OuterNID}}$ for all auxiliary RVs $\tilde{Q}$ that satisfy $\|\tilde{\mathcal{Q}}\| \le 4$ and for $X \sim \mathrm{Bern}(1/2)$.
In the following, we show that $\mathcal{R}_{\mathrm{OuterNID}}$ defined in (11) is strictly included in $\mathcal{C}_1$ by comparing the closures of both regions. To this end, let us define the function $t(\cdot)$ on the interval [0 : where the class $\mathcal{C}(x)$ is given by The function $t(\cdot)$ characterizes the convex closure of the region $\mathcal{R}_{\mathrm{OuterNID}}$, i.e., Similarly, let us characterize the convex closure of $\mathcal{R}_{\mathrm{ID}}$ by the function where $\mathcal{C}(x)$ is given in (12).

Lemma 3. The function t (·) can be upper bounded as follows
where Further, $t_a(\cdot)$ can be written as follows
Proof. The full proof is given in Appendix F.
We can now compare the two regions NID and ID, through their closures, for all values of the rate $R_2$.
Theorem 7. The functions $t(\cdot)$, $t_1(\cdot)$ and $t_a(\cdot)$ satisfy the following where in 1) and 2) $\alpha_0$ is the unique solution to
Proof. 1) To prove part 1 of Theorem 7, let us note first that, from Theorem 5, $t_1(\cdot)$ writes as: and consider the specific choice of $(Q_x, X)$ and $\alpha_x \in [0:1]$ such that, It is clear that $P_{X Q_x} \in \mathcal{C}(x)$ and that Thus, Let then $\alpha_0$ be the unique solution to the equality Combining (16) and (17) allows us to write that, for all This, along with (13), concludes the proof of
2) A closed-form expression of the function $t_a(\cdot)$ for an arbitrary $a$ brings about significant computational complexity; we thus chose only to plot it using stochastic optimization methods, i.e., Monte Carlo simulations. To this end, let $e_2 = 0.46$, $p = 0.1$ and $p_1 = 0.13$. It can be readily shown that these parameters satisfy (3).
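The claim that the chosen parameters satisfy the construction constraints can be checked numerically. The sketch below tests $p < p_1 < 0.5$ and $4p_1(1-p_1) < e_2 \le H_2(p_1)$, as stated earlier in the construction, together with $4p(1-p) < e_2 \le H_2(p)$; the lower bound in this last condition is an assumption (the standard "more capable" threshold), since only $e_2 \le H_2(p)$ appears explicitly in the text. The names `h2` and `check_constraints` are illustrative.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits, with 0*log(0) := 0."""
    if x == 0.0 or x == 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def check_constraints(e2: float, p: float, p1: float) -> dict:
    """Evaluate each ordering constraint of the BEC/BSC construction.

    Y_1 is taken as BSC(p1), Y_2 as BEC(e2) and Z as BSC(p); this
    assignment is inferred from the surrounding text.
    """
    return {
        # Y_1 physically degraded with respect to Z
        "p < p1 < 0.5": p < p1 < 0.5,
        # Y_2 more capable than Y_1
        "4 p1 (1 - p1) < e2 <= H2(p1)": 4 * p1 * (1 - p1) < e2 <= h2(p1),
        # Y_2 more capable than Z (lower bound is an assumption)
        "4 p (1 - p) < e2 <= H2(p)": 4 * p * (1 - p) < e2 <= h2(p),
    }


if __name__ == "__main__":
    for name, ok in check_constraints(e2=0.46, p=0.1, p1=0.13).items():
        print(f"{name}: {ok}")
```

Note that the margin in $e_2 \le H_2(p)$ is small for these values ($H_2(0.1) \approx 0.469$), so the check is sensitive to the second decimal of $e_2$.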
In Fig. 2, we choose $a = 0.92$ and plot the normalized difference function: , over the interval of interest: The function $d_a$ being strictly positive, the claim of strict inclusion is thus shown.
We have investigated so far the role that evolved decoding techniques, namely Interference Decoding, play in the compound BC where the actual channels controlling the communication are ordered but the order is unknown to the transmitter. The decoding technique takes advantage of many possible decoding methods to alleviate the constraint of a fixed superposition order at the source, which allows the latter to apply a symmetric encoding rule regardless of which channels control the communication. In the sequel, we analyse a class of non-ordered compound BCs to infer novel strategies when there is no specific order between the users' channels. In this case, we will not seek to optimize the decoding strategies but rather the encoding strategy.

IV. MULTIPLE DESCRIPTION CODING IN THE COMPOUND BROADCAST CHANNEL
In this section, we investigate a coding technique, referred to as Multiple Description (MD) coding, that can enhance the achievable rates in the compound BC. This coding scheme is particularly beneficial when the many possible channels of the two users cannot be ordered. The main idea behind MD coding is to convey the message intended for one user through a common description as well as a set of dedicated private descriptions. The common description is decoded whatever the actual channel realization, while the private descriptions are each decoded when a specific channel statistic is encountered. The common description, to be decoded in all cases, will be rate limited, since its rate needs to be low enough to meet the decoding constraints of all possible channels, while the private descriptions encounter no such constraint.
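The worst-case constraint on a description that must be decoded under every channel state can be illustrated on a toy example. This sketch is not the MD region of this section: it only assumes a uniform binary input and a user whose channel is either a BSC($p_1$) or a BEC($e_2$), with hypothetical helper names, and it contrasts the rate that a single description could carry (limited by the minimum over the states) with the rate each state could support on its own.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits, with 0*log(0) := 0."""
    if x == 0.0 or x == 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def per_state_rates(p1: float, e2: float) -> dict:
    """I(X; Y_j) in bits for a uniform binary input X (toy assumption)."""
    return {
        "Y1 = BSC(p1)": 1.0 - h2(p1),  # mutual information over a BSC
        "Y2 = BEC(e2)": 1.0 - e2,      # mutual information over a BEC
    }


def common_description_rate(rates: dict) -> float:
    """A description decoded under every state is limited by the worst one."""
    return min(rates.values())


if __name__ == "__main__":
    rates = per_state_rates(p1=0.13, e2=0.46)
    print("per-state rates:", rates)
    print("worst-case (common) rate:", common_description_rate(rates))
```

The gap between the per-state rates and their minimum is the room that state-specific private descriptions are meant to exploit, although the actual trade-off in the compound BC also involves the interference from the other user's codeword.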
As in the first part of the work, we consider an elementary yet complex compound setting in which only one user has two possible channels, namely $Y_1$ or $Y_2$, whilst the other user, observing $Z$, suffers from no such uncertainty. We first derive two inner bounds on the capacity region to be compared: the Common Description (CD) inner bound, which is equivalent to Marton's inner bound for the compound BC, and the MD inner bound. We then show, for a class of compound MISO BCs, that MD coding outperforms the standard CD coding strategy. Finally, we analyse the behaviour of the obtained rate regions compared to the outer bound of Theorem 1.

A. Multiple Description (MD) Inner Bound
In the following, we derive an inner bound based on MD coding that combines a common description U 0 and two private descriptions U 1 and U 2 .

Theorem 8 (MD inner bound). An inner bound on the capacity region of the 2 × 1 compound BC is given by the set of rate pairs (R 1 , R 2 ) satisfying: for some set of arbitrarily correlated RVs with
Proof. The full proof is given in Appendix H. A brief outline is as follows. The private descriptions U 1 and U 2 are superimposed over the common description U 0 , and all three are binned against the interfering codeword V . The common variable Q is introduced mainly to allow for time sharing.

Remark 9.
In this part of the work, we do not resort to rate splitting, i.e., the common codeword is used mainly for time sharing, since the resulting rate region would become rapidly intractable and thus, out of the scope of this part where we need to obtain closed form expressions of the MD coding inner bound.
Also, the authors of [15, Proposition 5] resort to a codebook construction that appears to be similar to the one used in the MD inner bound, and which yields the rate region Though the two inner bounds seem similar at first sight, they differ in several respects. The two private descriptions U 1 and U 2 play different roles in the two regions. In the region of [15], they carry part of the interference and therefore allow the users Y 1 and Y 2 to each decode a distinct part of the interference, as opposed to our MD inner bound, in which each private description precodes against interference and is optimized for its dedicated channel, Y 1 or Y 2 . As such, it is hard to state that either region contains the other.

B. Common Description (CD) Inner Bound
Inspired by Marton's inner bound, we can derive what we call the "common description" (CD) coding inner bound, i.e., the worst case of Marton's inner bound for the compound BC, which consists of all rate pairs (R 1 , R 2 ) verifying: where U , V and Q are arbitrarily correlated auxiliary RVs. It is easy to check that the MD inner bound in (18) recovers the CD inner bound in (20) by setting both private descriptions to U 1 ≡ ∅ and U 2 ≡ ∅. In the following, we investigate whether MD coding can strictly outperform CD coding.

C. MD Coding Over the BC and the Compound Point-to-Point Channel
In this section, we show that for both the single user (non-broadcast) compound channel and the standard two-user (non-compound) BC, MD coding does not outperform CD coding.
As for the single user compound channel, let us assume that we have a compound model with two possible channel outputs denoted by Y 1 and Y 2 . In the following, we show that, for all joint p.m.f's P QU 0 U 1 U 2 X , the rate achieved by MD coding, R MD , is no greater than the rate achieved by CD coding, R CD , where R MD and R CD write as It is clear that R CD ≤ R MD . To prove the reverse inequality, we have that: and thus, As for the standard two-user (non-compound) BC, i.e., Y 1 ≡ Y 2 , let us denote by R MD the MD inner bound in (18) and by R CD the CD inner bound in (20). The inclusion R CD ⊆ R MD is trivial. To show the converse, let us fix a joint p.m.f P QU 0 U 1 U 2 V X and let us assume, for instance, that It is easy to see that the choice U = (U 0 , U 2 ) allows us to obtain: which is equal to R CD .
Thus, MD coding does not outperform CD coding for settings in which channel uncertainty and interference are not coupled.
In the following, we show that for a class of compound Gaussian MISO BC, MD coding can be strictly beneficial as compared to CD coding.

V. THE REAL VALUED COMPOUND MISO BC AND MD BASED DPC
Consider the compound Gaussian MISO BC which consists of a source equipped with 2 antennas and 2 single antenna receivers. Receiver 1 can observe one of two possible channel outputs, namely, Y 1 and Y 2 , and Z denotes the channel output of receiver 2, where, for i ∈ [1 : n], the different outputs are given by for j ∈ {1, 2}, where: h j and g are 2 × 1 generic real-valued channel vectors that are assumed to be constant throughout the transmission. Moreover, it is assumed that any two of these channel vectors are linearly independent; x is the 2 × 1 power limited channel input vector so that E[x t x] ≤ P and last, the noise sequences {n j,i } i∈ [1:n] and {w i } i∈ [1:n] are assumed to be i.i.d. draws according to a Gaussian distribution N (0, N). The optimal transmit strategy for the non-ordered Gaussian MISO BC is to apply Dirty-Paper Coding (DPC) [24], [26], which is a non-linear coding technique that allows the encoder to precode against interference and suppress it at the decoder without having the decoder explicitly decode it.
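The channel model above can be illustrated with a short simulation. This is only a minimal sketch: it assumes the usual linear Gaussian form Y j = h j t x + n j and Z = g t x + w (the displayed equation is not reproduced in this section, so this form is an assumption), and the channel vectors, the input and the helper names `dot`, `det2`, `pairwise_independent` and `channel_outputs` are hypothetical choices made for illustration.

```python
import random

def dot(a, b):
    """Inner product of two real vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def det2(a, b):
    """Determinant of the 2 x 2 matrix whose rows are a and b."""
    return a[0] * b[1] - a[1] * b[0]

def pairwise_independent(vectors, tol=1e-12):
    """True if every pair of 2-dimensional vectors is linearly independent."""
    return all(
        abs(det2(vectors[i], vectors[j])) > tol
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )

def channel_outputs(x, h1, h2, g, noise_var, rng):
    """One channel use of the assumed model: returns (y1, y2, z)."""
    sigma = noise_var ** 0.5
    y1 = dot(h1, x) + rng.gauss(0.0, sigma)
    y2 = dot(h2, x) + rng.gauss(0.0, sigma)
    z = dot(g, x) + rng.gauss(0.0, sigma)
    return y1, y2, z

# Hypothetical channel vectors: h1 and h2 orthogonal, g not aligned with either.
h1, h2 = (1.0, 0.0), (0.0, 1.0)
g = (2 ** -0.5, -(2 ** -0.5))
x = (0.6, -0.8)  # one input vector with unit energy

rng = random.Random(0)
print(pairwise_independent([h1, h2, g]))
print(channel_outputs(x, h1, h2, g, noise_var=1.0, rng=rng))
```

Only one of y1 and y2 is actually observed by receiver 1 in the compound setting; the sketch returns both so that the two possible realizations can be compared for the same input.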
In the following, in order to compare the two strategies of MD coding and CD coding, we will combine them with a DPC construction, thus yielding two schemes that we will denote MD-DPC and CD-DPC. Besides, we will compare CD-DPC to two variations of the MD-DPC scheme, depending on the correlation we assume between the private description RVs. In the first MD-DPC scheme, the private descriptions are time-orthogonal, in the sense that the encoder communicates part of the time a private description U 1 to cancel interference at user Y 1 , and a private description U 2 during the remaining part of the time to help user Y 2 , which annihilates the correlation cost, i.e., the mutual information I (U 1 ; U 2 |QU 0 ) becomes zero. In the second scheme, we consider an MD-DPC strategy where both private descriptions are transmitted across the whole time slot; however, their correlation cost might not be null, i.e., the mutual information term I (U 1 ; U 2 |QU 0 ) may be strictly positive.

A. Preliminaries and Definitions
In the sequel, we resort to DPC [24] in its vector formulation, thus some basic definitions and analytic formulas will be introduced herein to lighten the notation afterwards.
Let us consider the following coding scheme: where X u ∼ N (0, P u ) and X v ∼ N (0, P v ) are independent RVs such that P u + P v ≤ P. It is then easy to check that: where: In the sequel, besides the common description U 0 which precodes against the interfering signal X v , we are interested in introducing a private description U j that will be required to precode against interference as well. Suppose that we now choose to transmit an additive private description X p ∼ N (0, x) while keeping the total useful power equal to P u , i.e., 0 ≤ x ≤ P u . Then, with the following coding scheme: we can optimize the value of the private DPC parameter α j to state the following result.
and where, for j ∈ {1, 2}, we have: Proof. The key point of the proof is that the private description, when optimized, yields an interference free link: The rest of the proof is relegated to Appendix I.
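The interference free link mentioned in the proof can be illustrated in the simplest scalar setting. The sketch below is not the vector formulation used in this section: it assumes a unit-gain scalar channel Y = X u + X v + noise with the auxiliary variable U = X u + α X v , and evaluates Costa's classical expression for I(U; Y) − I(U; X v ). The function names and the numerical values of P u , P v and N are hypothetical.

```python
import math

def dpc_rate(alpha, p_u, p_v, noise):
    """I(U;Y) - I(U;X_v) in bits for U = X_u + alpha * X_v (scalar Costa setting)."""
    num = p_u * (p_u + p_v + noise)
    den = p_u * p_v * (1.0 - alpha) ** 2 + noise * (p_u + alpha ** 2 * p_v)
    return 0.5 * math.log2(num / den)

def interference_free_rate(p_u, noise):
    """Capacity in bits of the same link when X_v is absent."""
    return 0.5 * math.log2(1.0 + p_u / noise)

p_u, p_v, noise = 10.0, 5.0, 1.0     # hypothetical powers and noise variance
alpha_opt = p_u / (p_u + noise)      # Costa's optimal DPC parameter

# Grid search over alpha to check where the DPC rate peaks.
grid = [k / 1000 for k in range(1001)]
alpha_best = max(grid, key=lambda a: dpc_rate(a, p_u, p_v, noise))

print(alpha_opt, alpha_best)
print(dpc_rate(alpha_opt, p_u, p_v, noise), interference_free_rate(p_u, noise))
```

With the optimal parameter the DPC rate coincides with the rate of the link without interference, whatever the interference power P v , which is the scalar counterpart of the statement that an optimized description sees an interference free link.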

B. Common Description DPC (CD-DPC)
In the following, we evaluate the CD inner bound under a CD based DPC scheme for the channel model defined by (21). To this end, let us define the two following rate regions resulting from two DPC schemes: where β j and I j are given by (24) and (25). The second rate region is given by the set of rate pairs satisfying:

Proposition 1 (CD inner bound). An inner bound on the capacity region of the compound MISO BC defined in (21) is given by the set of rates satisfying:
Proof. First, note that the rate regions R 1 and R 2 are nothing but the two corner points of the CD rate region given in (20). The rate region R 1 is obtained by evaluating the corner point using the following coding scheme: As for the second rate region R 2 , it results from the evaluation of the second corner point of CD with the following DPC scheme in which the codeword V dirty-paper codes against the interference U ; the analysis follows in a similar manner.

C. MD-DPC With Orthogonal Private Descriptions
In the following, we evaluate the MD inner bound given in Theorem 8. To this end, we explore two different constructions of an MD-DPC scheme, depending on the existing correlation between the private descriptions. We will restrict our analysis to the specific corner point given by The MD inner bound we derive in this section is based on the evaluation of (30) via a time-sharing argument [17], where each private description, U 1 (resp. U 2 ), is transmitted only part of the time. Let then Q be a binary valued time-sharing RV such that: For Q = 1, we let U 2 = ∅ and for Q = 2, we let U 1 = ∅, which annihilates the correlation cost, i.e., Let us define the following rate region R u as: where β x j and I x j are given by (27) and (28).

Proposition 2 (MD-DPC inner bound with orthogonal private descriptions). An inner bound on the capacity region of the compound MISO BC defined in (21) is given by:
Proof. For Q = 1, we let: Conversely, for Q = 2, let: In this case, I (U 1 ; U 2 |QU 0 ) = 0 since U 1 and U 2 are never transmitted in the same time slot. Hence, (30) reduces to The key point is then to note that, for j ∈ {1, 2}, where (a) follows similarly to the proof of Lemma 4 to maximize the private DPC parameters α 1 and α 2 .

D. MD-DPC With Correlated Private Descriptions
In this section, we allow the private descriptions U 1 and U 2 in (30) to be arbitrarily correlated and no longer time-orthogonal. Let the rate region R c be defined by: where: and β x j and I x j are given by (27) and (28).

Proposition 3 (MD inner bound with correlated private descriptions). An inner bound on the capacity region of the compound MISO BC is given by:
Proof. To prove our claim, we resort to the MD coding inner bound letting, for the discrete memoryless case, the two auxiliary RVs U 1 and U 2 be arbitrarily correlated given Q, U 0 , and V . Let us use the following coding scheme: where X p 1 and X p 2 are jointly Gaussian random variables, independent from all other variables, with covariance matrix: such that x = 2σ 2 (1 + ρ). Then, the correlation cost of the two private descriptions is given by Following similar calculations as in the proof of Lemma 4, the proof follows.
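The relation x = 2σ 2 (1 + ρ) can be checked numerically. The sketch below assumes that the covariance matrix of (X p 1 , X p 2 ) has equal variances σ 2 on the diagonal and ρσ 2 off the diagonal, which is consistent with the stated relation but is an assumption since the matrix is not reproduced here. It also evaluates the generic mutual information between two jointly Gaussian variables with correlation coefficient ρ; this is a textbook identity given for intuition, not necessarily the expression of the correlation cost I (U 1 ; U 2 |QU 0 ) obtained in the proof. All function names are hypothetical.

```python
import math

def covariance(sigma2, rho):
    """Assumed covariance matrix of (X_p1, X_p2): equal variances, correlation rho."""
    return [[sigma2, rho * sigma2], [rho * sigma2, sigma2]]

def sum_variance(cov):
    """Variance of X_p1 + X_p2 for a given 2 x 2 covariance matrix."""
    return cov[0][0] + cov[1][1] + 2.0 * cov[0][1]

def gaussian_pair_mi_bits(rho):
    """I(X_p1; X_p2) in bits for jointly Gaussian variables with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log2(1.0 - rho ** 2)

sigma2 = 2.0  # hypothetical per-description variance
for rho in (-0.5, 0.0, 0.5, 0.9):
    x = sum_variance(covariance(sigma2, rho))
    print(rho, x, 2.0 * sigma2 * (1.0 + rho), gaussian_pair_mi_bits(rho))
```

The loop shows the trade-off at play: a positive ρ increases the total private power x for a fixed σ 2 , while any nonzero ρ makes the mutual information between the two private signals strictly positive.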

E. MD-DPC Strictly Outperforms CD-DPC
Let us now consider the compound Gaussian MISO BC model where the possible channels of user 1, h 1 and h 2 , are unit-norm orthogonal channels. Assume also that the second user's channel is quite accommodating such that g is orthogonal to the mean channel of user 1, In order to show that MD-DPC strictly outperforms CD-DPC for this setting, we need to evaluate CD-DPC inner bound based on the corresponding channel models. Then, we show that the MD-DPC inner bound strictly outperforms it.

1) CD-DPC Inner Bound:
We start by characterizing CD-DPC inner bound in a closed form.

Proposition 4 (CD-DPC inner bound). The CD-DPC inner bound writes as the set of rate pairs satisfying:
Proof. The proof is relegated to Appendix J.

Remark 11.
In order to derive the optimal value of η for the overall rate region, we look at the resulting weighted sum-rate. If we let μ ∈ ℝ + , then the optimization of R 1 + μR 2 over η depends on the value of μ. For μ = 0, the optimal choice is η = 1, that is, we have to transmit in a direction that is collinear with the mean channel h 1,2 , whereas for μ → ∞, the optimal choice is η = −1, which means transmitting the information for the second user in a direction that is collinear with its channel. For intermediate values of μ, the weighted sum-rate is not necessarily maximized by either choice of η.
We evaluate the two MD-DPC inner bounds as a function of x, the power dedicated to private descriptions, and compare them to the case x = 0, i.e., the CD-DPC inner bound. We let B u = h 1,2 and thus, by transmitting information to user 1 orthogonal to the channel of user 2.

2) MD-DPC With Uncorrelated Private Descriptions Outperforms CD-DPC:
As for the MD-DPC inner bound with uncorrelated private descriptions, the constraint on the rate R 1 writes as: where the function g(·) is defined by and where we have considered a time-sharing t = t̄ = 0.5. The function g(·) is not necessarily strictly decreasing in x for all values of η. However, it is clear that: Thus, P(η) > P u 2 suffices to have the function g strictly decreasing in x, and thus, the claim of strict optimality would be proved. Note that if, for instance, P ≥ 4N, then for values of η close to −1, i.e., R 2 close to the second user's capacity, the gain is strictly positive and more significant.

3) MD-DPC With Correlated Private Descriptions Outperforms CD-DPC:
To evaluate the gain of the MD-DPC inner bound with arbitrarily correlated private descriptions, note that if at least ρ = 0, then the bound on R 1 can be written as follows: Let us define, we show in the following that, for some x ∈ [0 : We have that thus, it suffices that the two functions η → P(η) and x → x , have values in intersecting intervals, which will be plotted later for the example investigated.

F. Block Expansion and Plots
Last, the bounds we have studied so far did not allow for different encoding parameters across time slots. The reason is that the question we were investigating is one of the utility of private descriptions in the compound MISO BC. Now, if we combine CD inner bound and MD inner bound with correlated private descriptions both with a time-sharing argument where in each time slot a new coding scheme is used (in terms of beam directions, power allocations and DPC parameters), then one could expect that the gain of multiple descriptions over one common description is still captured by the obtained bounds.
In Fig. 3, we plot the corresponding rate regions for SNR = 10 dB, and the assumptions made on the channels' structure. Fig. 3 shows four inner bounds on the capacity region of the compound Gaussian MISO BC. It can be seen that CD-DPC is strictly included in all MD-DPC inner bounds, i.e., MD-DPC with time-orthogonal descriptions and MD-DPC with arbitrarily correlated private descriptions under both assumptions ρ = 0 (uncorrelated) and ρ variable (optimized). Already at ρ = 0, MD-DPC is strictly better than CD-DPC. It can be noticed as well that in this case, both MD-DPC bounds are equal, though this is not necessarily the case for more general settings.

G. Outer Bound on the Capacity of the Compound MISO BC
In this section, we present an outer bound on the capacity region of the compound MISO BC which consists in the intersection of four rate regions.
Let us introduce the following channel matrices: We then define the corresponding channel outputs to the channel g 1,2 , that has the same marginal p.d.f as the output formed by the concatenation of [Z Y 1 Y 2 ], as Z 1,2 , and we define similarly the two outputs Y 1,z and Y 2,z . The following theorem gives the resulting outer bound.

Theorem 12 (Outer bound on the capacity of the compound MISO BC). An outer bound on the capacity region of the compound MISO BC is given by the set of rate pairs:
where C j is the capacity region of the BC with outputs (Y j , Z ), for j ∈ {1, 2}, and finally, C z is the capacity region of the compound BC with outputs (Y 1,z , Z ) and (Y 2,z , Z ),
Proof. The proof follows from two observations. First, the capacity region of the considered compound model is always included in the intersection of the capacity regions of the BCs C 1 and C 2 . Second, this setting is a degraded version of the setups where there is at least one user with an extra receive antenna, whose capacity regions are given in [27], [10].

Remark 13. The outer bound stated in Theorem 12 is tight in the high SNR regime and thus is DoF optimal.
To check this, notice that the bounds C 1 , C 2 and C z attain each the points (d 1 ≤ 1, d 2 ≤ 1) by letting K u = g ⊥ × (g ⊥ ) t . As for the bound C 1,2 , it achieves all the points (2 d 1 + d 2 ≤ 2), thus the intersection of these two regions leads to the optimal DoF.
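The intersection described in the remark can be explored numerically. The following minimal sketch takes the DoF constraints exactly as stated above (d 1 ≤ 1, d 2 ≤ 1 and 2 d 1 + d 2 ≤ 2) and searches a grid for the largest sum-DoF; the function names and the grid resolution are hypothetical.

```python
def in_dof_outer_region(d1, d2, tol=1e-9):
    """Membership test for {d1 <= 1, d2 <= 1, 2*d1 + d2 <= 2, d1 >= 0, d2 >= 0}."""
    return (
        -tol <= d1 <= 1.0 + tol
        and -tol <= d2 <= 1.0 + tol
        and 2.0 * d1 + d2 <= 2.0 + tol
    )

def max_sum_dof(steps=200):
    """Grid search for the point of the region maximizing d1 + d2."""
    best_sum, best_point = 0.0, (0.0, 0.0)
    for i in range(steps + 1):
        for j in range(steps + 1):
            d1, d2 = i / steps, j / steps
            if in_dof_outer_region(d1, d2) and d1 + d2 > best_sum:
                best_sum, best_point = d1 + d2, (d1, d2)
    return best_sum, best_point

print(in_dof_outer_region(1.0, 0.0))   # corner where only user 1 is served
print(in_dof_outer_region(1.0, 1.0))   # excluded by 2*d1 + d2 <= 2
print(max_sum_dof())
```

Under these constraints the sum-DoF is maximized at the corner where 2 d 1 + d 2 = 2 meets d 2 = 1, which exceeds the sum-DoF of 1 that, as discussed in Section VI, a scheme forcing interference decoding cannot surpass.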
In Fig. 4, we plot the inner and outer bound for intermediate SNR values. Although the gap to the outer bound suggests that the inner and outer regions do not meet, it is our belief that the inner bound might be tighter than the outer bound, and that better outer bounds could be derived for the compound Gaussian MISO BC.

VI. DISCUSSION
We start our conclusions with the analysis of the relative behavior of the MD and the ID inner bounds, to understand if there is any mutual inclusion between the two rate regions. The question we want to answer is whether introducing multiple descriptions, one for each instance in the compound setting, allows to recover the ID inner bound. We also would like to understand to what extent decoding interference is crucial for Marton's worst case inner bound.

A. Can Multiple Descriptions or Interference Decoding Techniques Recover Each Other?
We evaluate the MD inner bound in the case of the discrete example studied in Section III-B and try to identify a set of auxiliary RVs yielding the capacity region. For the discrete compound BC we studied earlier, we assumed that user 1 could observe one of two possible channel instances, namely, Y 1 and Y 2 , such that Y 2 is more capable than both Y 1 and Z , and Y 1 is a degraded version of Z . The maximizing choice of auxiliary RVs led to Z and Y 2 decoding all the signal and Y 1 decoding only its intended information.
The capacity region writes as We next discuss a formulation of the MD inner bound that captures the intuition of the capacity achieving choice of auxiliary RVs for the ID inner bound. Indeed, the encoder does not transmit a common description to the two users interested in the same message, but communicates only private descriptions to them. However, in the present case the common auxiliary RV Q is no longer a time-sharing variable as was the case in Section IV; it can carry common information to all receivers as well. With this, we can achieve the set of rate pairs R 3-ARV defined by (31) where the rate region M is defined by and the set T is given by Proof. The proof is relegated to Appendix K.
We know that an optimal transmission scheme to achieve the capacity region of the considered BEC/BSC requires both users Z and Y 2 to decode all messages while restricting the weaker user Y 1 to decode only the common message. Hence, we rely on this argument to build the straightforward extension of Marton's coding scheme, i.e., V = U 2 = X and U 1 = Q, which along with rate splitting leads to the following achievable rate region: In the general case, there is strong evidence that the above rate region induced by MD is strictly included in the capacity region given by: that is achieved by using ID, which yields: where Y 1 is degraded with respect to Z and Y 2 is more capable than Z . The inclusion results from the fact that there exist P X |Q for which Thus, MD does not seem to be enough to achieve the capacity region of the compound model investigated in Section III-B. This is due to the fact that the cost engendered by precoding against interference prevents from decoding it which results in a loss proportional to its entropy. Therefore, it appears that ID outperforms MD in some cases.
On the other hand, in the MISO case, forcing users to decode interference is sub-optimal, at least from a DoF perspective, since ID introduces sum-rate constraints of the form: and thus, prevents the sum-DoF from reaching values greater than 1, which we already know is sub-optimal. Therefore, it is crucial to precode against interference. Summarizing, since neither MD coding nor ID seems to generalize all the results obtained herein, one can benefit from the combination of both techniques and thus, from the optimization of both encoding and decoding schemes.

B. Conclusion
In this work, we explored a decoding and an encoding technique for the two-user memoryless compound Broadcast Channel (BC). We first studied the role of ID, for which we derived an achievable rate region by using superposition coding and random binning. At the decoders, the constraint of decoding only the intended message is alleviated to allow each of the users to decode or not the other user's (interference) message. Unlike the standard two-user BC, the compound BC benefits from ID since channel uncertainty prevents the encoder from coding optimally for each possible BC formed by all pairs of channels in the set. A simple outer bound is also derived based on the best outer bound hitherto known on the capacity region of the two-user BC, i.e., the Nair & El Gamal outer bound. This outer bound is limited by the difficulty of writing Csiszár & Körner's sum-identity for more than 2 users. Surprisingly enough, ID not only outperforms the NID technique, i.e., Marton's worst-case rate region, but also achieves the capacity of a class of non-trivial BCs while NID is strictly suboptimal. Thus, though the coding scheme is simple (in terms of the number of auxiliary variables involved and of the complexity of the encoding operation), the decoders' optimization alleviates the uncertainty at the source.
Later, we studied an encoding technique with a more involved coding strategy, namely MD coding. The source transmits, for each possible channel output of the same user, common and private descriptions. For the specific case of the compound MISO BC, resorting to MD is essential since a common description, i.e., applying DPC with a single description, cannot accommodate the interference seen by each instance of the users' channels in the set, unless it is combined with a time-sharing argument. The key point in the MISO BC setting is that using a fraction of power to transmit the private descriptions is useful for all SNR ranges and turns out to be DoF optimal. Indeed, each private description creates an interference free link and thus each user can recover a part of its rate interference free.
Finally, we addressed the question of whether MD or ID may include each other. It appears that none of these schemes can perform well for ordered and non-ordered class of compound BCs at once, mainly because the two strategies strongly rely on two different interference mitigation techniques: precoding against interference and decoding interference. The former results in a rate loss tantamount to a correlation cost while the latter results in an extra sum-rate constraint.
As a conclusion, it would be worth mentioning the benefits of combining these two schemes to yield a larger inner bound, and thus, full advantage would be taken from the joint optimization of the encoding technique (MD coding) and the decoding technique (ID).

APPENDIX A USEFUL NOTIONS AND AUXILIARY RESULTS
In this appendix we provide basic notions on some concepts used in this paper.
Following [28], we use in this paper strongly typical sets and the so-called Delta-Convention. Some useful facts are recalled here. Let X and Y be RVs on some finite sets X and Y, respectively. We denote by p XY (resp. p Y |X , and p X ) the joint p.m.f of (X, Y ) (resp. conditional distribution of Y given X, and marginal distribution of X).

Definition 14.
For any sequence x n ∈ X n and any symbol a ∈ X , N(a|x n ) denotes the number of occurrences of a in x n .

Definition 15.
A sequence x n ∈ X n is called (strongly) δ-typical w.r.t. X (or simply typical if the context is clear) if and N(a|x n ) = 0 for each a ∈ X such that P X (a) = 0. The set of all such sequences is denoted by T n δ (X).
and, N(a, b|x n , y n ) = 0 for each a ∈ X , b ∈ Y such that P Y |X (b|a) = 0. The set of all such sequences is denoted by T n δ (Y |x n ).
Delta-Convention [28]: For any sets X , Y, there exists a sequence {δ n } n∈N * such that the lemmas below hold. 2 From now on, typical sequences are understood with δ = δ n . Typical sets are still denoted by T n δ (·).
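The notions of occurrence count and strong typicality recalled above can be sketched in a few lines. The inequality of Definition 15 is not reproduced in this section, so the sketch assumes the usual form |N(a|x n )/n − P X (a)| ≤ δ for every symbol a, together with the stated condition N(a|x n ) = 0 whenever P X (a) = 0; the function names and the example sequences are hypothetical.

```python
from collections import Counter

def occurrences(symbol, seq):
    """N(a | x^n): number of occurrences of `symbol` in `seq`."""
    return Counter(seq)[symbol]

def is_strongly_typical(seq, pmf, delta):
    """Assumed test: |N(a|x^n)/n - P(a)| <= delta for all a, and N(a|x^n) = 0 if P(a) = 0."""
    n = len(seq)
    if n == 0:
        raise ValueError("the sequence must be non-empty")
    counts = Counter(seq)
    # Symbols outside the alphabet of the p.m.f. have probability zero.
    if any(symbol not in pmf for symbol in counts):
        return False
    for symbol, prob in pmf.items():
        if prob == 0 and counts[symbol] > 0:
            return False
        if abs(counts[symbol] / n - prob) > delta:
            return False
    return True

pmf = {"0": 0.5, "1": 0.5, "e": 0.0}   # hypothetical p.m.f. with a zero-probability symbol
print(occurrences("0", "0010"))
print(is_strongly_typical("00110101", pmf, delta=0.1))
print(is_strongly_typical("00010000", pmf, delta=0.1))
print(is_strongly_typical("0011e101", pmf, delta=0.5))
```

The last call returns False even with a generous δ, because a zero-probability symbol appears in the sequence; this is precisely what distinguishes strong typicality from a purely frequency-based test.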

APPENDIX B SKETCH OF THE PROOF OF THEOREM 3
Let ( j, k) ∈ J × K be the index of an arbitrary pair of users in the compound set. We first show the achievability of the union of the four regions for this channel, $\bigcup_{i \in [1:4]} T_i$. For notational convenience, we drop the index ( j, k).

A. Outline of Proof
The coding scheme we use is as follows:
• We consider three messages: a common message w 0 , and two private messages w 1 and w 2 ,
• We use three auxiliary RVs Q (resp. U and V ) to code for the common message (resp. private messages),
• We perform random binning on the two auxiliary RVs U and V which are superposed on Q,
• The decoders resort to list decoding, which allows us to combine many decoding techniques, by intersecting different lists,
• The error probability is directly related to the list size, and thus, bounding the list size yields a bound on the average probability of error.

B. Detailed Proof
Codebook generation: The encoding is similar to that of Marton's coding with a common message.
Fix the p.m.f's P Q , P U |Q , P V |Q and P X |QU V , and let T 1 ≥ R 1 and T 2 ≥ R 2 be four positive real numbers. Generate 2 n R 0 sequences q n (w 0 ), w 0 ∈ M 0 with probability distribution: and denote the set of all such codewords as C 0 . For each q n (w 0 ), generate 2 nT 1 sequences u n (l 1 , w 0 ), where l 1 ∈ [1 : 2 nT 1 ], following and set all these sequences randomly in 2 n R 1 bins, each indexed with w 1 ∈ [1 : 2 n R 1 ] : C(w 1 , w 0 ).
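The superposition and binning steps of the codebook generation can be sketched as follows. This is a toy illustration under loud assumptions: the alphabets are binary, the conditional p.m.f. P U |Q is given as a hypothetical table, the codebook sizes are tiny integers rather than 2 nT 1 and 2 n R 1 , and each codeword index is thrown uniformly at random into a bin, which is one possible reading of "set randomly".

```python
import random

def generate_binned_codebook(q_seq, num_codewords, num_bins, p_one_given_q, rng):
    """Draw codewords u^n symbol by symbol from P(U=1 | Q=q_i), then bin their indices.

    Returns (codewords, bins) where bins[w] lists the indices l of the
    codewords u^n(l) assigned to bin w.
    """
    codewords = [
        tuple(1 if rng.random() < p_one_given_q[q] else 0 for q in q_seq)
        for _ in range(num_codewords)
    ]
    bins = {w: [] for w in range(num_bins)}
    for index in range(num_codewords):
        bins[rng.randrange(num_bins)].append(index)
    return codewords, bins

rng = random.Random(0)
q_seq = (0, 1, 1, 0, 1, 0, 0, 1)          # hypothetical cloud centre q^n(w0)
p_one_given_q = {0: 0.2, 1: 0.7}          # hypothetical P(U = 1 | Q = q)
codewords, bins = generate_binned_codebook(
    q_seq, num_codewords=16, num_bins=4, p_one_given_q=p_one_given_q, rng=rng
)
for w, members in bins.items():
    print(w, members)
```

Having more codewords than bins (T 1 ≥ R 1 ) is what gives the encoder the freedom to pick, inside the bin of the message, a codeword that is jointly typical with the codeword chosen for the other user.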

Decoding:
First, assume that no encoding error has occurred and let us denote by 0 the event of no error. Let (L 1 , L 2 ) be the chosen indices. For conciseness, we consider only Decoder 1.
Given a received sequence y n , define the two lists: These two lists correspond to two different decoding strategies: non-unique decoding of the other user's message (interference), and no-interference decoding. Denote the intersection of these two lists by $\mathcal{L}^{(n)} \triangleq \mathcal{L}_1(y^n) \cap \mathcal{L}_2(y^n)$. (32)

Analysis of the probability of error:
To analyze the probability of error at user 1, we need to control the expected cardinality of the intersection of the above lists. The next lemma states this result.

Lemma 9.
For every ε 1 > 0, there exists an integer N 1 , such that, for all n ≥ N 1 , the average probability of error is linked to the list size as follows: Now, bounding the probability of error will mainly consist in bounding the decoding list size.
Bounding the list size: On the one hand, the list size being an integer valued RV, we can write: On the other hand: The next lemma provides a bound on the expected list size from the RHS of (34).
Hence, from (33), (34) and (35) we can write that: Then Lemma 1 and (36), imply that for n large enough: the probability of error at user 1, knowing that no encoding error occurred, will tend to 0 as n → ∞.
Following the proof of the covering lemma [29], the probability of encoding error can be upper bounded as n grows large enough as follows: The condition for no such error does not depend on the users pair index, and thus, it intersects the union of all regions, which concludes the proof.

APPENDIX C THE PROBABILITY OF ERROR IS LINKED TO LIST SIZE
1) Proof of Lemma 9: Let us start by recalling: Let (Ŵ 0 , Ŵ 1 ) be the estimated messages at decoder 1, where Then, following standard arguments, by the LLN and independence of codebooks, we can easily show that, for all ε 1 > 0, ∃ N 1 such that for n ≥ N 1 , we have (1 − δ) ≤ ε 1 . This ends the proof of the statement:
2) Proof of Lemma 10: Let (Ŵ 0 , Ŵ 1 ) = (W 0 , W 1 ) be the supposedly decoded pair of messages. We have, recalling (32), that: For the first list, we have, following similar arguments to those of Lemma 8, that: and similarly, if moreover Ŵ 0 = W 0 , Now, for the second list, i.e., decoding method, we know that: 1) If Ŵ 0 = W 0 , Ŵ 1 = W 1 and L̂ 2 = L 2 , which implies Ŵ 2 = W 2 : where we used the fact that, since Ŵ 1 = W 1 , then U n (L 1 , W 0 ) and V n (L 2 , W 0 ) are independent conditionally on Q n (W 0 ).

APPENDIX D PROOF OF ACHIEVABILITY OF THE CAPACITY
From Theorem 3, we can see that the region R SNID verifies: In this section, we evaluate the region given by is the subset of $\mathbb{R}^2_+$ defined by the inequalities: Recalling that Y 1 is physically degraded with respect to Z , we can first rewrite the decoding constraints as follows: Then, we can run Fourier-Motzkin elimination (FME) over the binning rate pair (T 1 , T 2 ) to get the following region: Next, we choose to apply bit recombination on the admissible rates (R 0 , R 1 , R 2 ) as follows: It is straightforward that this bit recombination fits the decoding logic of the terminals, i.e., part of the private messages is mapped into the common message, enabling each terminal to still recover the totality of its intended message. The region thus writes as: Performing FME again over the splitting rate pair (R 01 , R 02 ), we get the following region: We clearly notice that the constraints (37) and (38) are implied by (39); thus, the resulting region R ID is defined by the following constraints: Thus, letting R 0 = 0, and noting the rate pairs as (R 1 , R 2 ), one gets the desired rate region.
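The elimination of binning rates by FME used in this appendix follows a mechanical procedure that can be sketched in code. The routine below is a generic, minimal implementation of one elimination step for systems of the form "linear combination ≤ bound"; it does not remove redundant inequalities, and the small system used as an example is a hypothetical toy with one binning rate, not the system of constraints of this proof.

```python
def eliminate(inequalities, k):
    """One Fourier-Motzkin step: project out variable k.

    Each inequality is a pair (coeffs, bound) meaning
    sum_i coeffs[i] * x[i] <= bound.
    """
    positive, negative, untouched = [], [], []
    for coeffs, bound in inequalities:
        if coeffs[k] > 0:
            positive.append((coeffs, bound))
        elif coeffs[k] < 0:
            negative.append((coeffs, bound))
        else:
            untouched.append((coeffs, bound))
    projected = list(untouched)
    for cp, bp in positive:
        for cn, bn in negative:
            a, b = cp[k], -cn[k]          # both strictly positive
            # Scale so that the coefficients of x[k] cancel, then add.
            coeffs = tuple(b * p + a * q for p, q in zip(cp, cn))
            projected.append((coeffs, b * bp + a * bn))
    return projected

# Hypothetical toy system in the variables (R1, R2, T1):
#   R1 - T1 <= 0      (the message rate cannot exceed the binning rate)
#   T1 <= 3           (a decoding constraint)
#   R2 + T1 <= 4      (a joint constraint)
system = [((1, 0, -1), 0), ((0, 0, 1), 3), ((0, 1, 1), 4)]
for coeffs, bound in eliminate(system, 2):
    print(coeffs, "<=", bound)
```

Eliminating T 1 from this toy system leaves R 1 ≤ 3 and R 1 + R 2 ≤ 4, which mirrors how the binning and splitting rates disappear from the regions above and leave constraints on the message rates only.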

APPENDIX E CARDINALITY BOUNDS
Consider a pair of RVs (Q, X) following the joint p.m.f P Q X . Since the input is binary, let the four continuous functions on P X |Q : By the usual consequence of the Fenchel-Eggleston-Carathéodory theorem [28], we can construct an auxiliary RV Q′ such that: $\sum_{q} P_Q(q) P_{X|Q}(0|q) = \sum_{q'} P_{Q'}(q') P_{X|Q'}(0|q') = P_X(0)$, Thus, we conclude that with this new auxiliary RV Q′, the region is unchanged:

Optimality of uniform input
In [6] the c-symmetric BC is defined as the BC formed by 2 c-symmetric channels. Following this same idea, and considering equivalently the compound BC or the compound channel, we can say that the BC resulting from the simultaneity of two c-symmetric BC is c-symmetric.
Since it is shown in [6, Lemma 2] that the uniform input distribution is optimal for such channels, X ∼ Bern(1/2) is optimal for the compound BC as well.
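For intuition only, the optimality of the uniform input can be checked numerically on the two point-to-point building blocks of the example, a BSC and a BEC. The sketch below is not the argument of [6, Lemma 2] and does not evaluate a broadcast rate region: it uses the textbook expressions of I(X; Y) for an input X ∼ Bern(q), and the crossover and erasure probabilities are hypothetical values.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_mutual_information(q, crossover):
    """I(X;Y) for X ~ Bern(q) over a BSC with the given crossover probability."""
    return h2(q * (1.0 - crossover) + (1.0 - q) * crossover) - h2(crossover)

def bec_mutual_information(q, erasure):
    """I(X;Y) for X ~ Bern(q) over a BEC with the given erasure probability."""
    return (1.0 - erasure) * h2(q)

def worst_case_rate(q, crossover, erasure):
    """Rate guaranteed whichever of the two channels is in effect."""
    return min(bsc_mutual_information(q, crossover), bec_mutual_information(q, erasure))

crossover, erasure = 0.1, 0.4              # hypothetical channel parameters
grid = [k / 100 for k in range(101)]
q_bsc = max(grid, key=lambda q: bsc_mutual_information(q, crossover))
q_bec = max(grid, key=lambda q: bec_mutual_information(q, erasure))
q_compound = max(grid, key=lambda q: worst_case_rate(q, crossover, erasure))
print(q_bsc, q_bec, q_compound)
```

Since both mutual informations peak at q = 1/2, so does their minimum, which is the point-to-point analogue of the statement that a single input distribution is simultaneously optimal for all channels in the compound set.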

APPENDIX F PROOF OF PROPOSITION 3
We follow the method in [30] to write: Notice that: • The case a = 1 was already studied in [30] and it was shown that: • The case a = 0 can be studied in a very similar fashion as in [30] by finding out that: where: Now, to upper bound t a , we could have written that: where (42) follows from what we have proved in Section III-B2, i.e., t 0 dominates t 1 over the interval [0 : . Thus, we cannot restrict ourselves to the upper bound in (40) on t a since it is rather loose, and we will hence bound more tightly the function t a .
Proposition 5. The function t a satisfies the following properties: • (ii) t a is concave in x, • (iii) t a can be described identically by its supporting lines, • (iv) t a is decreasing in x.
Proof. The proof is relegated to Appendix G.
The next result allows to transform the optimization of a rate region into optimizing one function denoted as F a (λ).
The following conclusions can be drawn: (a) The constraint in (12) can be transformed into: (b) We have that: where (a) follows from the non-increasing property of t a and (b) follows from the concavity of the function t a since a concave function can be described by its supporting lines [31].

APPENDIX G PROOF OF PROPOSITION 5
Recall that: We want to show that: 2) t a is concave in x; 3) t a can be described identically by its supporting lines; 4) t a is decreasing in x.
Proof. 1) We have that: Since, we have proved that the optimizing probabilities have a finite cardinality, the conditional mutual information being continuous, C(x) is thus compact. As the probability space P(X × Q) has a finite dimension, the set C(x) is thus closed. Thus, the supremum is achieved.
2) Concavity: Let Let for i ∈ {1, 2}, Define moreover: T ∼ Bern(t) independent of all other RVs. Define and by letting Q = (Q T , T ), we have: is a valid Markov chain.
• And the following equalities hold: We thus have that: p X Q ∈ C(x). Thus, which concludes the proof of concavity.
3) This property follows from the concavity of t a . 4) Monotonicity: Since t a is concave, we have that: Since, for all x ∈ [0 : 1 − H 2 ( p)], we have that:

APPENDIX H PROOF OF ACHIEVABILITY OF MULTIPLE DESCRIPTION INNER BOUND
In this section, we establish the achievability of the MD inner bound in (8). Let W 1 be the message decoded by user 1, and let W 2 be the message decoded by user 2, and let R 1 and R 2 denote their respective rates. Let T 1 and T 2 denote the corresponding binning rates. We construct the following code.

Encoding:
To send a message pair $(W_1, W_2)$, the encoder finds a pair of sequences $u_0^n(l_1)$ and $v^n(l_2)$ in the product bin $C_0(W_1) \times C_v(W_2)$ and a pair of indices $(s_1, s_2)$ such that . It then transmits a sequence $x^n\big(u_0^n(l_1), u_1^n(s_1, l_1), u_2^n(s_2, l_1), v^n(l_2)\big)$, which is generated via a stochastic mapping.
Using the well-known second-moment method, one can make the probability of the encoding error event arbitrarily small if:

Decoding:
The second user, upon receiving the sequence $z^n$, looks for the unique index $w_2$ such that, for some $v^n(l_2) \in C_v(w_2)$, the following holds: $(v^n(l_2), z^n) \in T_\delta^n(VZ)$. The probability of error of this decoding rule is arbitrarily small provided that:
Concerning the two instances $Y_1$ and $Y_2$ of the first user, let us identify each of them with a decoder. Decoder $j$ finds the unique index $l_1$ such that, for some $s_j$, the following joint typicality holds: . The probability that the decoded $l_1$ does not fall into the bin specified by $w_1$ is made arbitrarily small provided that: Then the overall decoding error events occur with arbitrarily small probability provided that:
After running Fourier-Motzkin elimination (FME) on the system of inequalities, bearing in mind the natural encoding constraints: the region given in (8) follows immediately.
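The elimination step can be sketched in a few lines. The following is a minimal illustration of one Fourier-Motzkin elimination step, not the actual computation behind (8): the toy system (a rate $R$, a binning rate $T$, and the numerical bound 2) is hypothetical and was chosen only to show how an auxiliary binning rate is projected out.

```python
from fractions import Fraction

def eliminate(rows, k):
    """One Fourier-Motzkin step.

    Each row is (coeffs, bound) and encodes sum(coeffs[i] * x[i]) <= bound.
    Returns an equivalent system in which variable x[k] no longer appears.
    """
    pos = [r for r in rows if r[0][k] > 0]   # upper bounds on x[k]
    neg = [r for r in rows if r[0][k] < 0]   # lower bounds on x[k]
    out = [r for r in rows if r[0][k] == 0]  # rows that do not involve x[k]
    for cp, bp in pos:
        for cn, bn in neg:
            # Scale so the coefficients of x[k] become +1 and -1, then add.
            sp = Fraction(1) / cp[k]
            sn = Fraction(1) / (-cn[k])
            coeffs = [sp * a + sn * b for a, b in zip(cp, cn)]
            out.append((coeffs, sp * bp + sn * bn))
    return out

# Hypothetical toy system in the variables (R, T):
#   R - T <= 0   (the rate cannot exceed the binning rate)
#   T     <= 2   (a decoding constraint on the binning rate)
#  -R     <= 0   (non-negativity of the rate)
system = [([1, -1], 0), ([0, 1], 2), ([-1, 0], 0)]

projected = eliminate(system, 1)  # project out T
print(projected)
```

The projected system contains only constraints on $R$, here $0 \le R \le 2$. Repeating the step for each auxiliary rate gives the projection onto the message rates, at the cost of possibly redundant inequalities that have to be pruned.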

APPENDIX I PROOF OF LEMMA 4
We derive the optimal rate obtained when the following coding scheme is used: where $X_p \sim N(0, x)$, $X_u \sim N(0, P_u - x)$ and $X_v \sim N(0, P_v)$ are pairwise independent RVs such that $P_u \le P - P_v$. In other words, we transmit two descriptions intended for user 1 that "jointly" compensate for the interference; hence, we are interested in computing the rate $R_{0,1} = I(U_0 U_1; Y) - I(U_0 U_1; V)$. Some algebraic manipulations yield where the quadratic polynomial $P(\alpha, \alpha_1)$ is given by: An interesting insight brought by this expression is that, to achieve the optimal DoF, we only need $\alpha_1 + \alpha = \beta_1^o + \alpha^o$ rather than the pairwise equalities $\alpha_1 = \beta_1^o$ and $\alpha = \alpha^o$. This reflects the joint interference management performed by the two decoded descriptions $U_0$ and $U_1$; the optimal interference-free rate is trivially recovered when each description cancels the interference fully on its own, i.e., when $\alpha_1 = \beta_1^o$ and $\alpha = \alpha^o$. Upon optimizing the polynomial $P(\alpha, \alpha_1)$ over $\alpha_1$, the resulting rate is given by the rather simple expression: where I x j is given by (28). It can be readily checked that this expression corresponds to the following and where $I(U_0; Y) - I(U_0; V)$ corresponds to the case where $X_u$ dirty-paper codes $X_v$ under the noise component variance: This means that the optimal choice of the variable $U_1$ is the one that maximizes the DPC term $I(U_1; Y | U_0) - I(U_1; V | U_0)$.
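As a sanity check on the dirty-paper-coding terms of the form $I(U; Y) - I(U; V)$, the sketch below evaluates this difference in the textbook scalar Gaussian setting. It is an assumption-laden simplification and not the two-description expression of this appendix: a single auxiliary $U = X + \alpha V$, a channel $Y = X + V + Z$ with independent $X \sim N(0, P)$, $V \sim N(0, P_v)$, $Z \sim N(0, N)$, and hypothetical numerical values.

```python
import math

def dpc_rate(alpha, P, Pv, N):
    """I(U;Y) - I(U;V) in bits for U = X + alpha*V and Y = X + V + Z."""
    var_u = P + alpha ** 2 * Pv          # Var(U)
    var_y = P + Pv + N                   # Var(Y)
    cov_uy = P + alpha * Pv              # Cov(U, Y)
    det = var_u * var_y - cov_uy ** 2    # determinant of Cov(U, Y)
    # I(U;Y) = 0.5*log2(var_u*var_y/det) and I(U;V) = 0.5*log2(var_u/P),
    # so the difference simplifies to 0.5*log2(P*var_y/det).
    return 0.5 * math.log2(P * var_y / det)

# Hypothetical powers, chosen only for illustration.
P, Pv, N = 1.0, 2.0, 1.0

# Grid search over the precoding parameter alpha.
alphas = [i / 1000 for i in range(-1000, 2001)]
best_alpha = max(alphas, key=lambda a: dpc_rate(a, P, Pv, N))
best_rate = dpc_rate(best_alpha, P, Pv, N)

# Interference-free benchmark 0.5*log2(1 + P/N).
interference_free = 0.5 * math.log2(1.0 + P / N)
print(best_alpha, best_rate, interference_free)
```

In this scalar setting the maximizer found by the grid search coincides with $P/(P+N)$ and the resulting rate matches the interference-free benchmark, which is the behavior that the optimal DPC parameters are meant to reproduce in the appendix.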

APPENDIX J OPTIMIZATION OF COMMON DESCRIPTION INNER BOUND
Let us first optimize the second corner point of the CD inner bound. We have that Since $h_1$ and $h_2$ are orthogonal and of unit norm, we can write that $h_{1,u}^2 = 1 - h_{2,u}^2$ and $h_{1,v}^2 = 1 - h_{2,v}^2$. The rate $R_2$ does not depend on the beam $B_u$; thus, we start by optimizing the rate $R_1$ over it. The two operands of the min are both monotonic: one is increasing in $h_{1,u}^2$, while the other is decreasing in $h_{1,u}^2$. Thus, the max-min point corresponds to the point where the two operands are equal, which by simple algebraic calculations leads to the condition: and then yields a rate (independent of the beam $B_v$) equal to: Note that the maximizing beam direction is $B_v = g$; one can easily check that it satisfies $h_{1,v} = -1/\sqrt{2}$ and thus, from (44), that $|h_{1,u}| = 1/\sqrt{2}$. Thus, transmitting the first user's signal in the mean channel direction is an admissible optimal solution. Later in the proof, we show that this second corner point $R_2$ is dominated by the first corner point of the CD inner bound $R_1$, which is investigated below. In the sequel, we will perform the optimization under the choice of $h_{1,u} = 1/\sqrt{2}$ and $g_u = 0$, i.e., we transmit the signal intended for user 1 in the mean channel direction, which makes it orthogonal to the second user's channel; the proof of optimality of this choice is very involved and is not of central importance.
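The max-min argument above (one operand increasing, the other decreasing, so the optimum sits at their crossing) can be checked numerically. The sketch below uses two hypothetical rate functions of $t = h_{1,u}^2$; the coefficients 3 and 5 are arbitrary and are not the SNR terms of the CD inner bound.

```python
import math

def rate_increasing(t):
    # Hypothetical operand that increases with t (t plays the role of h_{1,u}^2).
    return math.log2(1.0 + 3.0 * t)

def rate_decreasing(t):
    # Hypothetical operand that decreases with t (it depends on 1 - t).
    return math.log2(1.0 + 5.0 * (1.0 - t))

def crossing_point(lo=0.0, hi=1.0, iterations=60):
    # Bisection on the sign of rate_increasing - rate_decreasing.
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if rate_increasing(mid) < rate_decreasing(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Equality point of the two operands.
t_star = crossing_point()

# Brute-force maximization of the minimum over a grid, for comparison.
grid = [i / 1000 for i in range(1001)]
t_grid = max(grid, key=lambda t: min(rate_increasing(t), rate_decreasing(t)))
print(t_star, t_grid)
```

Both procedures return the same point, which is why the optimization over the beam reduces to solving a single equality between the two operands.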
We can rewrite the first corner point of the CD inner bound as follows: where $\alpha_j = \frac{\sqrt{2} P_u}{P_u + 2N} h_{j,v}$. Since $\|h_j\| = \|B_v\| = 1$ and $h_1$ and $h_2$ are orthogonal, we can let $h_{1,v} = \cos(\theta_v)$ and $h_{2,v} = \sin(\theta_v)$.
The key point in the optimization is to solve the equation*: The optimization of the rate of the first user $R_1$ yields the following:
(i) If $\cos^2(\theta_v) = 1/2$ and $\cos(\theta_v) = -\sin(\theta_v)$, then the optimal rate is given by: where $\alpha_1 = -\alpha_2 = \frac{P_u}{P_u + 2N}$. It then turns out that the optimization over the DPC parameter $\alpha$ yields $\alpha = 0$, meaning that the two interference signals to be precoded are orthogonal, and thus that precoding against both of them is impossible. Importantly, this yields exactly the first corner point of the region.
(ii) If $\cos^2(\theta_v) = 1/2$ and $\cos(\theta_v) = \sin(\theta_v)$, then the optimal rate is given by: which corresponds to the point where $h_{1,v} = h_{2,v}$, i.e., $\alpha_1 = \alpha_2$. Thus, $h_1 - h_2$ is orthogonal to $B_v$; since $h_1 - h_2$ is collinear with the second user's channel, no information is transmitted to that user with the beam $B_v$. The power optimization of this point corresponds to the corner point $(C_1, 0)$.
The root that yields the greater rate is $\alpha_1$. Then, with the transformation $y = \sin(2\theta_v)$, we can rewrite: Note that the value $y = -1$, i.e., $\theta_v = -\pi/4$, is included in this expression. Thus, we drop the case distinction between $\cos^2(\theta_v) = 1/2$ and $\cos^2(\theta_v) \neq 1/2$.
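The special angles discussed above can be checked with a few lines of arithmetic. The sketch assumes that the precoding coefficients read $\alpha_j = \frac{\sqrt{2} P_u}{P_u + 2N} h_{j,v}$ with $h_{1,v} = \cos(\theta_v)$ and $h_{2,v} = \sin(\theta_v)$, which is how the expressions of this appendix are interpreted here; the values of $P_u$ and $N$ are hypothetical.

```python
import math

def precoding_coefficients(theta_v, P_u, N):
    # Assumed form: alpha_j = sqrt(2) * P_u / (P_u + 2N) * h_{j,v},
    # with h_{1,v} = cos(theta_v) and h_{2,v} = sin(theta_v).
    scale = math.sqrt(2.0) * P_u / (P_u + 2.0 * N)
    return scale * math.cos(theta_v), scale * math.sin(theta_v)

# Hypothetical power and noise values, for illustration only.
P_u, N = 3.0, 1.0

# Case (i): cos(theta_v) = -sin(theta_v), e.g. theta_v = -pi/4.
a1_i, a2_i = precoding_coefficients(-math.pi / 4, P_u, N)
y_i = math.sin(2.0 * (-math.pi / 4))   # transformation y = sin(2 * theta_v)

# Case (ii): cos(theta_v) = sin(theta_v), e.g. theta_v = pi/4.
a1_ii, a2_ii = precoding_coefficients(math.pi / 4, P_u, N)

print(a1_i, a2_i, y_i)
print(a1_ii, a2_ii)
```

At $\theta_v = -\pi/4$ the two coefficients are opposite with magnitude $P_u / (P_u + 2N)$ and $y = -1$, while at $\theta_v = \pi/4$ they coincide, in line with cases (i) and (ii).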
Then send a sequence $x^n(w_0, l_{1,1}, l_{1,2}, l_2)$ generated via a random mapping. The encoding is error-free if all inequalities in T are satisfied.

Decoding:
Each receiver decodes its intended messages $(w_0, w_j)$ by decoding the index $l_j$ and, non-uniquely, the common message, yielding the constraints of M.