Cooperative binning for semi-deterministic channels with non-causal state information

The capacity of two semi-deterministic channels in the presence of non-causal channel state information (CSI) is characterized. The first channel is a state-dependent semi-deterministic relay channel, where the CSI is available only at the transmitter and receiver, but not at the relay. The second channel is a state-dependent multiple access channel (MAC) with partial cribbing and CSI only at one transmitter and the receiver. In the semi-deterministic relay channel without states, the capacity can be achieved using a partial-decode-forward scheme. The transmission is split into blocks; in each block, the relay decodes a part of the message and cooperation is established using those bits. When the channel depends on a state, the decoding procedure at the relay reduces the transmission rate. Recently, a cooperative-bin-forward scheme has been proposed which establishes cooperation without requiring the relay to decode a part of the message. In this scheme, the relay maps its received sequence, which is a deterministic function of the transmitted sequence, into bins. The transmitter coordinates its transmission with the bin index that is chosen by the relay. This scheme achieves the capacity when the CSI is available causally. In this work, we present a variation of the cooperative-bin-forward scheme that achieves capacity for non-causal CSI. The bin index corresponding to the deterministic output of the relay is selected by the transmitter in such a way that the relay's transmission is coordinated with the states. This coding scheme also applies to the MAC with partial cribbing and non-causal CSI at one transmitter and the receiver, whose capacity region is likewise achieved by the new variation of cooperative bin-forward. Moreover, we show an example in which the capacity with non-causal CSI is strictly greater than with causal CSI.


I. INTRODUCTION
Semi-deterministic models describe a variety of communication problems in which there exists a deterministic link between a transmitter and a receiver. This work focuses on the semi-deterministic relay channel (SD-RC) and the multiple access channel (MAC) with partial cribbing encoders and non-causal channel state information (CSI) only at the encoder and decoder. The state of a channel may be governed by physical phenomena or by an interfering transmission over the channel, and the deterministic link may also be a function of this state.
The work of Ido B. Gattegno, Haim H. Permuter and Shlomo Shamai was supported by the Heron consortium via the Minister of Economy and Science, and by the ERC (European Research Council). The work of A. Ozgur was supported in part by NSF grant #1514538.

A. Notation
We denote by A^{(n)}_ǫ(p_X) the ǫ-strongly typical set with respect to the PMF p_X. Jointly typical sets satisfy the same definition with respect to (w.r.t.) the joint distribution and are denoted by A^{(n)}_ǫ(p_{X,Y}). Conditional typical sets are defined as
A^{(n)}_ǫ(p_{X,Y}|y^n) = { x^n : (x^n, y^n) ∈ A^{(n)}_ǫ(p_{X,Y}) }. (2)
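As a concrete illustration, the ǫ-strong (robust) typicality test used throughout can be sketched as follows. This is a minimal sketch assuming the robust-typicality convention of [23]; the function name and representation of the PMF as a dictionary are ours:

```python
from collections import Counter

def is_strongly_typical(xn, p, eps):
    """Check whether the sequence xn lies in the eps-strongly typical set
    w.r.t. the PMF p (a dict: symbol -> probability). The empirical frequency
    of every symbol x must satisfy |nu(x|xn)/n - p(x)| <= eps * p(x)."""
    n = len(xn)
    counts = Counter(xn)
    # symbols outside the support of p must not appear at all
    if any(x not in p or p[x] == 0 for x in counts):
        return False
    return all(abs(counts.get(x, 0) / n - px) <= eps * px
               for x, px in p.items())
```

Jointly typical sets apply the same test to the pair sequence ((x_i, y_i)) w.r.t. the joint PMF, matching the conditional definition in (2).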

B. Semi-Deterministic Relay Channel
We begin with a state-dependent SD-RC, depicted in Fig. 2. This channel depends on a state S_i ∈ S, which is known non-causally to the encoder and decoder, but not to the relay. An encoder sends a message M to the decoder through a channel with two outputs. The relay observes an output Z^n of the channel, which at time i is a deterministic function of the channel inputs, X_i and X_{r,i}, and the state (i.e., Z_i = z(X_i, X_{r,i}, S_i)). Based on past observations Z^{i−1}, the relay transmits X_{r,i} in order to assist the encoder. The decoder uses the state information and the channel output Y^n in order to estimate M̂. The channel is memoryless and characterized by the joint PMF p_{Y,Z|X,X_r,S} = 1_{Z|X,X_r,S} p_{Y|Z,X,X_r,S}, where 1_{Z|X,X_r,S} is the indicator that Z = z(X, X_r, S).
Definition 1 (Code for SD-RC) An (R, n) code C_n for the SD-RC is defined by an encoding function x^n : [1 : 2^{nR}] × S^n → X^n, relay functions x_{r,i} : Z^{i−1} → X_r for 1 ≤ i ≤ n, and a decoding function m̂ : Y^n × S^n → [1 : 2^{nR}].
Definition 2 (Achievable rate) A rate R is said to be achievable if, for any ǫ > 0 and some sufficiently large n, there exists an (R, n) code C_n such that P_e(C_n) ≜ P_{C_n}[m̂(Y^n, S^n) ≠ M] ≤ ǫ.
The capacity is defined to be the supremum of all achievable rates.

Theorem 1 The capacity of the state-dependent SD-RC with non-causal CSI at the encoder and decoder is
C = max min{ I(X; Y|Z, X_r, S, U) + H(Z|X_r, S, U) − I(U; S), I(X, X_r; Y|S) }, (4)
where the maximization is over all PMFs that factorize as p_{U|S} p_{X_r|U} p_{X|X_r,U,S} and satisfy I(U; S) ≤ H(Z|X_r, S, U), and |U| ≤ min{|S||X||X_r|, |S||Y| + 1}.
The proof for the theorem is given in Section V. Let us first investigate the capacity and the role of the auxiliary random variable U. Here, the random variable U is used to create empirical coordination between the encoder, the relay and the states, i.e., with high probability (S^n, U^n, X_r^n, X^n) are jointly typical w.r.t. p_{S,U,X_r,X}. Note that the PMF factorizes as p_{U|S} p_{X_r|U} p_{X|X_r,U,S}; the random variable X_r, which represents the relay, depends on S through the random variable U. This dependency represents the state knowledge at the relay, using an auxiliary codeword U^n.

C. Multiple Access Channel with Partial Cribbing
Consider a MAC with partial cribbing and non-causal state information, as depicted in Fig. 3. This channel depends on a state sequence (S_1, S_2) that is known to the decoder, and each encoder w ∈ {1, 2} has non-causal access to one state component S_w ∈ S_w. Each encoder w sends a message M_w over the channel. Encoder 2 cribs from Encoder 1; the cribbing is strictly causal, partial and controlled by S_1. Namely, the cribbed signal at time i, denoted by Z_i, is a deterministic function of X_{1,i} and S_{1,i}, i.e., Z_i = z(X_{1,i}, S_{1,i}). The cribbed information is used by Encoder 2 to assist Encoder 1.

Fig. 3: State-dependent MAC with two state components and one-sided cribbing. The cribbing is strictly causal.

Definition 3 (Code for MAC) A (R_1, R_2, n) code C_n for the state-dependent MAC with strictly causal partial cribbing and two state components is defined by an encoding function x_1^n : [1 : 2^{nR_1}] × S_1^n → X_1^n, encoding functions x_{2,i} : [1 : 2^{nR_2}] × S_2^n × Z^{i−1} → X_2 for 1 ≤ i ≤ n, and decoding functions (m̂_1, m̂_2) : Y^n × S_1^n × S_2^n → [1 : 2^{nR_1}] × [1 : 2^{nR_2}].
Definition 4 (Achievable rate-pair) A rate-pair (R_1, R_2) is achievable if, for any ǫ > 0 and some sufficiently large n, there exists a code C_n such that P_e(C_n) ≜ P_{C_n}[(m̂_1(Y^n, S_1^n, S_2^n), m̂_2(Y^n, S_1^n, S_2^n)) ≠ (M_1, M_2)] ≤ ǫ.
The capacity region of this channel is defined to be the closure of the set of all achievable rate-pairs. We note here that a setup with causal cribbing, depicted in Fig. 4, satisfies a similar definition with x_{2,i} : [1 : 2^{nR_2}] × S_2^n × Z^i → X_2.

Theorem 2 The capacity region for the discrete memoryless MAC with non-causal CSI and strictly causal partial cribbing in Fig. 3 is given by the set of rate pairs (R_1, R_2) that satisfy
R_1 ≤ I(X_1; Y|X_2, Z, S_1, S_2, U) + H(Z|S_1, U) − I(U; S_1|S_2) (5a)

Theorem 3 The capacity region for the discrete memoryless MAC with non-causal CSI and causal cribbing in Fig. 4 is given by the region of Theorem 2, with the PMF p_{X_2|U,S_2} replaced by p_{X_2|Z,U,S_2}.
We note here that when S_2 is degenerate, i.e., there is only one state component, the capacity region in both theorems is obtained by degenerating S_2. Note that the difference between Theorems 2 and 3 is the conditioning on Z in the PMF p_{X_2|Z,U,S_2}. Here, the auxiliary random variable U plays a double role. The first role is similar to its role in the SD-RC; it creates dependency between X_2 and S_1. This is done using a cooperation codeword U^n: Encoder 1 selects a codeword that is coordinated with the states, and Encoder 2 uses this codeword in order to cooperate. Since the codeword depends on the state, so does X_2^n. When there are two state components, the second component is used by Encoder 2 to select the cooperation codeword from a collection. The second role is to generate a common message between the encoders.
In Section VI we provide a proof of Theorem 2 for the case of a single state component. The proof for the general case, given in Section VII, builds on the single-component case. The proof of Theorem 3 is given in Section VIII. In the following section we examine the results in special cases that emphasize the role of U.

A. Cases of State-Dependent SD-RC
Case 1: SD-RC without states: When the channel has no state, i.e., the channel is fixed throughout the transmission, the capacity of the SD-RC is given by Cover and El Gamal [3] as
C = max_{p_{X,X_r}} min{ I(X, X_r; Y), H(Z|X_r) + I(X; Y|X_r, Z) }.
This case is captured by degenerating S. Then, S can be omitted from the information terms in Theorem 1 and the joint PMF is p_U p_{X_r|U} p_{X|U} 1_{Z|X,X_r} p_{Y|X,X_r}. Choosing U = X_r recovers the capacity. Therefore, we see that here U plays the role of a common message between X_r and X.
where Z = z(X, X_r, S). Let us compare this capacity to the one with non-causal states. In the causal case, X and X_r are dependent, but X_r and S are not. In the non-causal case (eq. (4)), X_r and S are dependent.
The random variable U generates empirical coordination w.r.t. p_{U|S}, which is then used as common side information at the encoder, relay and decoder. When the state is known causally, such dependency cannot be achieved, since the states are drawn i.i.d. and the relay observes only past outputs of the channel. The capacity of the causal case is directly achievable from Theorem 1 by substituting U = X_r and X_r ⊥⊥ S.

B. Cases of State-Dependent MAC with Partial Cribbing
Let us investigate the role of the auxiliary random variable U in the MAC configuration via special cases of Theorem 2. We consider here the simple case of one state component, i.e., S_2 is degenerate. We denote S ≜ S_1 to emphasize this. Proofs for these cases are given in Appendix B.
Case A: Multiple access channel with states (without cribbing): Consider a multiple access channel with CSI at Encoder 1 and the decoder, depicted in Fig. 5. It is a special case without cribbing (i.e., z = constant).
The capacity region, characterized by Jafar [21], is given by all (R_1, R_2) pairs that satisfy
R_1 ≤ I(X_1; Y|X_2, S),
R_2 ≤ I(X_2; Y|X_1, S),
R_1 + R_2 ≤ I(X_1, X_2; Y|S),
with PMFs that factorize as p_{X_1|S} p_{X_2}.

Case B: MAC with a cooperation link: Consider the setting depicted in Fig. 6. In this case, the channel depends only on part of x_1, which we denote by x_{1c}. The other part of x_1, denoted by x_{1p}, is known in a strictly causal manner to Encoder 2.
This setting is different from previous works, which considered rate-limited cooperation. Here we use a sequence with noiseless communication and a fixed alphabet X_{1p}. It turns out that the capacity region of the channel is the same for both a strictly causal and a non-causal cooperation link.

Case C: Point-to-point with non-causal CSI: Consider a configuration of a PTP channel with non-causal CSI, depicted in Fig. 7 (Case C - PTP with non-causal CSI). This is a special case of the MAC when R_2 = 0 and p_{Y_2|X_1,X_2,S} = p_{Y_2|X_1,S}. The capacity of this channel was given by Wolfowitz [22, Theorem 4.6.1] as C = max_{p_{X|S}} I(X; Y|S). We first discuss the inclusion of the non-causal case in the MAC setting.
To apply the MAC with partial cribbing to this case, consider the following situation with only one state component. Encoder 1 has no access to the channel, i.e., p_{Y|X_1,X_2,S} = p_{Y|X_2,S}, and no message to send (R_1 = 0).
Its only job is to assist Encoder 2 by compressing the CSI and sending it via a private link. The private link is the partial cribbing with z(x_1, s) = x_1. When the link between the encoders is non-causal, i.e., when x_{2,i} = f(M_2, X_1^n), the capacity follows from the characterization of Rosenzweig [19] with a rate limit of R_s = log|X_1|. When there is a causality constraint, the transmission at time i can only depend on the strictly causal output of the state encoder, i.e., x_{2,i} = f(M_2, X_1^{i−1}); nonetheless, the capacity remains the same. Briefly explained, the capacity is achieved as follows. The transmission is divided into blocks (block-Markov coding). In each block, Encoder 1, which serves as the state encoder, sends a compressed version of the states of the next block. After each transmission block, Encoder 2 has a compressed version of the state of the current transmission block and uses it for coherent transmission.
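The one-block-lookahead timing of this block-Markov scheme can be sketched as follows. This is a toy sketch under our own naming: `compress` stands in for whatever state description Encoder 1 produces, and the list indices play the role of block indices.

```python
def simulate_pipeline(states, compress):
    """states[b] is the state sequence of block b (0-indexed).
    During block b, Encoder 1 describes the state of block b+1; hence from
    block b+1 onward, Encoder 2 holds a description of its CURRENT state."""
    B = len(states)
    received = [None] * B  # description available to Encoder 2 in block b
    for b in range(B - 1):
        received[b + 1] = compress(states[b + 1])
    return received
```

Note that only block 0 lacks a description, which is why block-Markov schemes of this kind use a prefix block.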

B. An Example -Non-causal CSI Increases Capacity
The non-causal CSI in the MAC configuration does increase the capacity region in the general case. The following example proves this claim. Consider a model where the channel states are coded, as depicted in Fig. 8. Case (a) is a non-causal case, and (b) is causal. As we previously discussed, the channel in Fig. 8a is a special case of the non-causal state dependent MAC with partial cribbing. Similarly, Fig. 8b is a special case of causal state dependent MAC with partial cribbing [4].
Since this is a point-to-point configuration, it is somewhat surprising that non-causal CSI increases capacity; when the states are perfectly provided to the encoder, the capacities with causal and with non-causal CSI coincide.
As we will next show, in the causal case, the size of X 1 can enforce lossy quantization on the state, while in the non-causal case, the states can be losslessly compressed.
where C_nc and C_c are the capacities of the non-causal and causal CSI configurations, respectively. Assume a state distribution parameterized by p. For each state there is a different channel; these channels are depicted in Fig. 9: a Z-channel for s = 0 and an S-channel for s = 1, both sharing the same parameter α, and a noiseless channel for s = 2.
The idea is that when the CSI is known non-causally we can compress S n while in a causal case we cannot.
Assume that X_1 is binary and that p is small enough, for instance p = 0.2, such that H(S) ≤ 1. Therefore, taking U = S satisfies I(U; S) = H(S) ≤ 1 and yields the non-causal capacity. On the other hand, the capacity for causal CSI can be achieved by one of several deterministic functions x_{1,i}(S_i). Each such function maps S = 2 together with one of the states S = 0 or S = 1 to one input letter, and the remaining state to the other letter. Note that this operation causes a lossy quantization of the CSI. For comparison, we also provide the capacity when there is no CSI at the encoder. The capacities of the three configurations (non-causal, causal, no CSI) for p = 0.2 are summarized in Table I. There are two points where the three configurations result in the same capacity. The first is α = 0; in this case, the channel is noiseless for s = 0, 1, 2 and the capacity is 1. There is no need for CSI at the encoder and, therefore, the capacity is the same in all three cases. The second point is α = 1; the channel is stuck at 0 for s = 0, stuck at 1 for s = 1, and noiseless for s = 2. In this case we can set P_{X_1}(1) = 0.5 for every s and achieve the capacity, so the encoder does not use the CSI. However, for every α ∈ (0, 1), the capacity of the non-causal case is strictly larger than that of the others, which confirms that non-causal CSI indeed increases the capacity region.
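For intuition, the per-state channel capacities underlying Table I can be computed numerically. This is a sketch under our own naming (not the authors' code), using a simple grid search over binary input distributions:

```python
import numpy as np

def mutual_info(px, W):
    """I(X;Y) in bits for input PMF px and row-stochastic channel matrix W."""
    py = px @ W
    I = 0.0
    for x in range(len(px)):
        for y in range(W.shape[1]):
            if px[x] > 0 and W[x, y] > 0:
                I += px[x] * W[x, y] * np.log2(W[x, y] / py[y])
    return I

def capacity(W, grid=4001):
    """Capacity of a binary-input channel by grid search over P_X(1)."""
    return max(mutual_info(np.array([1 - q, q]), W)
               for q in np.linspace(0.0, 1.0, grid))

alpha = 0.3
W_Z = np.array([[1.0, 0.0], [alpha, 1.0 - alpha]])   # Z-channel  (s = 0)
W_S = np.array([[1.0 - alpha, alpha], [0.0, 1.0]])   # S-channel  (s = 1)
W_N = np.eye(2)                                      # noiseless  (s = 2)
```

With CSI at both encoder and decoder, the capacity is the state-average of these per-state capacities; by symmetry the Z- and S-channels have equal capacity for the same α.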
V. PROOF FOR THEOREM 1

A. Direct
Before proving the achievability part, let us investigate important properties of the cooperative-bin-forward scheme. This scheme was derived by Kolte et al. [4] and is based on mapping the discrete finite space Z^n to a range of indexes L = [1 : 2^{nR_B}]. We refer to this function as cooperative binning for two reasons: 1) it randomly maps Z^n into 2^{nR_B} bins, and 2) the random binning is independent of all other random variables, which makes it 'suitable' for cooperation. For instance, a sequence z^n ∈ Z^n can be drawn given v^n, but its bin index is drawn uniformly, i.e., bin(z^n) ∼ Unif[1 : 2^{nR_B}], and is not a function of v^n. Thus, if we observe z^n we can find bin(z^n) without knowing v^n. This index is used to create cooperation between the encoder and a relay.
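A toy sketch of the random binning map (our own minimal illustration, not the authors' construction): each sequence receives a bin index drawn uniformly and independently of everything else, so observing z^n suffices to compute bin(z^n).

```python
import itertools
import random

def make_binning(alphabet, n, num_bins, seed=0):
    """Map every z^n in alphabet^n to a uniform, independent bin index."""
    rng = random.Random(seed)
    return {zn: rng.randrange(num_bins)
            for zn in itertools.product(alphabet, repeat=n)}
```

The map is shared (as part of the codebook) by encoder, relay and decoder, modeled here by the common seed, so all parties agree on bin(z^n).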
Lemma 1 Let Z^n(k), k ∈ [1 : 2^{nR}], be drawn according to p_{Z|V}(·|v^n), and let each z^n be assigned a bin index Bin(z^n) ∼ Unif[1 : 2^{nR_B}]. If R < H(Z|V) − δ_1 and R_B > R + δ_2, then
lim_{n→∞} P[ |{l : ∃k s.t. Bin(Z^n(k)) = l}| < 2^{n(R−δ_n)} | V^n = v^n ] = 0,
where δ_n → 0 as n → ∞.
The proof for this lemma is given in Appendix A. Lemma 1 states that by choosing R < H(Z|V) − δ_1 and R_B > R + δ_2, we can guarantee (with high probability) that we will see approximately 2^{n(R−Δ_n)} different bin indexes. Having these indexes allows us to assign each one of them a sequence, or to treat them as bins (i.e., use the index to create a list). For instance, if we assign each index l ∈ [1 : 2^{nR_B}] a sequence u^n(l) ∼ ∏_{i=1}^n p_U(u_i(l)), we can perform covering [23, Lemma 3.3] in order to create coordination with another sequence s^n.
The coding scheme works as follows. Divide the transmission into B blocks and choose a distribution p_{X,U|S} p_{X_r|U}. Draw a codebook for each block b, which consists of the following: a cooperative-binning function (a map from Z^n to [1 : 2^{nR_B}], drawn uniformly) and a collection of 2^{n(I(U;S)+δ)} z-codewords. To send a message m^{(b)}, recall that the link from the encoder to the relay is deterministic; therefore, the encoder can dictate which sequence the relay will observe during the block. Thus, it looks at the collection of z^{n(b)} sequences and searches for k such that z^{n(b)}(m′^{(b)}, k) points toward a cooperation codeword u^n that is coordinated (typical) with s^{n(b+1)} of the next block. This lookup is illustrated in Fig. 10, and we refer to it as indirect covering¹. Lemma 1 guarantees that if we take R_B > R̃ > I(U; S) and R̃ < H(Z|X_r, U, S), then with high probability we will see at least one coordinated sequence u^{n(b+1)}. Afterwards, the transmission codeword x^{n(b)} is chosen according to m′′^{(b)}. In the next block, the relay codeword x_r^{n(b+1)} is chosen given u^{n(b+1)}. Note that x_r^{n(b+1)} is coordinated with s^n through u^{n(b+1)}.
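The indirect-covering lookup described above can be sketched as follows. This is a hedged toy version under our own naming: `is_coordinated` stands in for the joint-typicality test with the next block's state sequence.

```python
def indirect_covering(z_candidates, binning, u_codebook, s_next, is_coordinated):
    """Search over k for a z-codeword whose bin index points to a cooperation
    codeword u^n(bin(z^n)) that is coordinated (typical) with the next block's
    state sequence s_next. Returns (k, l) on success, or None if covering fails."""
    for k, zn in enumerate(z_candidates):
        l = binning[zn]
        if is_coordinated(u_codebook[l], s_next):
            return k, l
    return None
```

Lemma 1 is what guarantees that, with R_B > R̃ > I(U; S), this search succeeds with high probability.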
The decoding procedure is done forward using a sliding-window technique, derived by Carleial [24]. At each block b, the decoder imitates the encoder's procedure for every possible m′^{(b)} ∈ [1 : 2^{nR′}] and finds k̂^{(b)}(m′^{(b)}) and l̂^{(b)}(m′^{(b)}). Then, the decoder looks for (m̂′^{(b)}, m̂′′^{(b)}) such that: 1) all sequences at the current block are coordinated, and 2) (s^{n(b+1)}, u^n(l̂^{(b)}(m̂′^{(b)})), x_r^n(u^n(l̂^{(b)})), y^{n(b+1)}) are coordinated. Setting R′′ < I(X; Y|Z, X_r, U, S) and R < I(X, X_r; Y|S) ensures reliability of the decoding procedure.
We will now give a formal proof for the achievability part. Fix a PMF p_{U|S} p_{X_r|U} p_{X|X_r,U,S} and let p_{Z|X_r,U,S} be such that p_{Z|X_r,U,S} p_{X|Z,X_r,U,S} = p_{X|X_r,U,S} 1_{Z|X_r,X,S}. We use block-Markov coding as follows. Divide the transmission into B blocks, each of length n. At each communication block b, we transmit a message M^{(b)} at rate R, split into two parts M′^{(b)} and M′′^{(b)} with corresponding rates R′ and R′′, respectively.
Codebook: The codebook for block b is generated as follows:
-Binning: Partition the set Z^n into 2^{nR_B} bins, by choosing uniformly and independently an index bin^{(b)}(z^n) ∼ Unif[1 : 2^{nR_B}].
-Cooperation codewords: Generate 2^{nR_B} u-codewords u^n(l) ∼ ∏_{i=1}^n p_U(u_i(l)).
-Relay codewords: For each u^n ∈ U^n, generate an x_r-codeword x_r^n(u^n) ∼ ∏_{i=1}^n p_{X_r|U}(x_{r,i}|u_i).
-z-codewords: For each u^n ∈ U^n, x_r^n ∈ X_r^n and s^n ∈ S^n, generate 2^{n(R′+R̃)} z-codewords.
-Transmission codewords: For each z^n ∈ Z^n, u^n ∈ U^n, x_r^n ∈ X_r^n and s^n ∈ S^n, draw 2^{nR′′} x-codewords.
The block-codebook consists of all the sequences generated for this block. Note that by this construction, all block-codebooks are independent of each other.
Encoder: A block prefix is used in order to begin the transmission with a coordinated cooperation sequence, which is not yet known at the relay. Assume that l^{(b−1)} is known due to former operations at the encoder. First, the encoder finds k^{(b)} such that the corresponding cooperation codeword is coordinated with the states of the next block, and sets l^{(b)} accordingly. Then, it sends the transmission codeword x^{n(b)}.
Relay: At block b, the relay transmits the codeword x_r^n(u^n(l^{(b−1)})); denote this sequence by x_r^n(l^{(b−1)}). After the relay observes z^{n(b)}, it determines l^{(b)} = bin(z^{n(b)}).
Decoder: We perform decoding using a sliding window; this is a decoding procedure that decodes from block 1 to B − 1, and therefore reduces the delay of recovering message bits at the decoder². We start at block 2, since the first cooperation sequence is not necessarily typical with the states of that block. Moreover, since the first message is fixed, the decoder can imitate the encoder's operation and find l^{(1)}.
Assume l^{(b−1)} is known due to previous decoding operations. At block b, the decoder performs:
1) For every possible m′^{(b)}, find (k̂^{(b)}(m′^{(b)}), l̂^{(b)}(m′^{(b)}), s^{n(b+1)}) the same way that the encoder does. We denote these indexes by k̂^{(b)} and l̂^{(b)}.
2) Look for a unique (m̂′, m̂′′) such that the typicality conditions in (26) are satisfied.
Analysis of error probability: The code C_n is defined by the block-codebooks and the encoder, relay and decoder functions. We bound the average probability of an error at block b, conditioned on successful decoding in former blocks. The average probability of an error is then upper bounded using the union bound and conditioning³. We will now investigate the probability of each event.
- By Lemma 1, the probability of seeing less than 2^{n(R̃−Δ_n)} different bins (indexed by l) goes to 0 if R̃ < H(Z|S, U) − δ_1 and R_B > R̃ + δ_2. Denote A = {there are less than 2^{n(R̃−Δ_n)} different bin indexes} (29a), where (a) follows by the union bound. Therefore, this probability goes to zero.
- By the conditional typicality lemma [23, Chapter 2.5], the probability of this event goes to zero as n goes to infinity.
- We need to distinguish between the events in blocks b and b + 1. Note that conditioning on E_2^c ensures that for m′^{(b)} = 1 we have l̂^{(b)}(m′^{(b)}) = L^{(b)} at block b + 1 with high probability. Therefore, we consider two cases. The statistical relations between the chosen sequences (via m̂′^{(b)} and m̂′′^{(b)}) are summarized in Table II. A standard application of the packing lemma [23, Lemma 3.1] yields the following bounds, using the Markov chain and the fact that Z is a function of (X, X_r, S).
Following this derivation, the probability of an error goes to zero under the stated rate constraints. Performing Fourier-Motzkin elimination (which can be done using [25]) on the rates in (39) yields the rate expression of Theorem 1. Cardinality bounds on the auxiliary random variable U are obtained by the convex cover method [23, Appendix C].

B. Converse
Assume that the rate R is achievable. By Fano's inequality, we have H(M|Y^n, S^n) ≤ nǫ_n, where ǫ_n → 0 when ǫ → 0. Consider the following chain of inequalities, where: (e) follows since S^n is i.i.d., (g) follows since X_r^i is a function of Z^{i−1}, (h) follows by the definition of Q as a time-sharing random variable, and (i) follows since Q ⊥⊥ S_Q. The second bound is obtained by upper bounding the second term. We need to show that the following conditions hold:
• p_{Y_Q|X_Q,X_{r,Q},Z_Q,S_Q,U_Q,Q}(y|x, x_r, z, s, u, q) = p_{Y|X,X_r,Z,S}(y|x, x_r, z, s).
The first condition holds due to the i.i.d. distribution of the state sequence S^n. Consider the distribution of the random variables, p(m, s^n, x^n, x_r^n, z^n, y^n). The Markov chains in the second condition can be readily seen from this distribution; moreover, for each i, the third condition also holds. By defining U = (U_Q, Q), X = X_Q, X_r = X_{r,Q}, S = S_Q and Z = Z_Q, we obtain the bound of Theorem 1, with a PMF that factorizes as required and satisfies I(U; S) ≤ H(Z|X_r, S, U). This completes the proof for the converse part.

A. Direct
We first discuss the achievability scheme for the case where S 2 is degenerate. To ease the notation we use S = S 1 .
The coding scheme for this case gives the key steps for the general case (in Theorem 2). Note that Encoder 2 plays here a double role. First, it helps Encoder 1 to deliver its message M_1 by cribbing Z^{i−1} at each time i; this is done using the cooperative-bin-forward scheme, as in the SD-RC in Section V. Second, it delivers its own message M_2 to the decoder using the same transmission sequence X_2^n. To do so, a superposition code is built on the shared common information, which is represented by the sequence U^n. This common information is also coordinated with the state sequence S^n. The decoding procedure, however, is done backwards (backward decoding).
We will now give a detailed proof for the achievable rates. The message M_1 is split into two parts, M′_1 and M′′_1, with rates R′_1 and R′′_1 accordingly.

Codebook:
The codebook C_n is defined to be a collection of block-codebooks; each block-codebook is generated as follows:
-Binning: Partition the set Z^n into 2^{nR_u} bins, by drawing uniformly and independently an index bin(z^n) ∼ Unif[1 : 2^{nR_u}].
-Cooperation codewords: Generate 2^{nR_B} u-codewords.
-Cribbed codewords: For each u^n ∈ U^n and s^n ∈ S^n, generate 2^{n(R′_1+R̃_1)} z-codewords.
-Transmission codewords at Encoder 1: For each z^n ∈ Z^n, u^n ∈ U^n and s^n ∈ S^n, generate 2^{nR′′_1} x_1-codewords.
-Transmission codewords at Encoder 2: For each u^n ∈ U^n, draw 2^{nR_2} x_2-codewords.
The block-codebook consists of all the sequences generated for this block. Note that by this construction, all block-codebooks are independent of each other.
Prefix and suffix blocks: Let the messages of blocks 1 and B, as well as k^{(B)} and l^{(1)}, be equal to one. Namely, at blocks 1 and B the encoders do not send any message, and hence these blocks are a prefix and a suffix for the transmission. Here, in addition to the suffix block that is used in block-Markov coding schemes, the prefix block is used for the encoders to agree on the second cooperation codeword, which is typical with s^{n(2)}. However, for l^{(0)}, the corresponding cooperation sequence u^n(l^{(0)}) is not necessarily typical with the states in the first block. Due to the prefix and suffix blocks, the average rates are R̄_1 = ((B−2)/B) R_1 and R̄_2 = ((B−2)/B) R_2. By choosing B sufficiently large, the average rates can be made as close to R_1 and R_2 as desired⁴.
Encoder 1: At block b, the encoder looks for k^{(b)} such that the corresponding cribbed codeword from the codebook points toward a cooperation codeword that is typical with the states of the next block. If such k^{(b)} cannot be found, choose k^{(b)} uniformly; if more than one is found, choose the first. Set l^{(b)} accordingly and transmit the corresponding codeword, denoted by x_1^{n(b)}.
Encoder 2: Assuming that l^{(b−1)} is known from previous encoding operations, at block b Encoder 2 transmits x_2^n(m_2^{(b)}|u^n(l^{(b−1)})). At the end of the block, this encoder observes z^{n(b)}.
⁴ One can show that for every fixed B, we can take n to be large enough to make the probability of an error as small as desired.
Decoding: The decoding procedure is done backwards, following Laneman and Kramer⁵. We start decoding from block B down to block 2. Assuming l^{(b)} is known from the decoding of former blocks, the decoder performs the following. Denote the channel's output at block b by y^{n(b)}. Find a unique tuple (l̂^{(b−1)}, m̂_1^{(b)}, m̂_2^{(b)}) satisfying the typicality conditions; if such a tuple cannot be found, choose each uniformly.
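The backward pass can be sketched as follows. This is a minimal sketch under our own naming: `decode_block` abstracts the joint-typicality search performed within one block.

```python
def backward_decode(y_blocks, decode_block, l_B):
    """y_blocks[b] is the channel output of block b (1-indexed; index 0 unused).
    decode_block(y, l) returns (l_prev, m1, m2): the cooperation index of the
    previous block together with the messages decoded from the current block."""
    B = len(y_blocks) - 1
    l = l_B
    messages = {}
    for b in range(B, 1, -1):  # decode blocks B, B-1, ..., 2
        l, m1, m2 = decode_block(y_blocks[b], l)
        messages[b] = (m1, m2)
    return messages
```

Each block's decoding supplies the cooperation index needed to decode the block before it, which is why decoding must be deferred to the end of the transmission.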
Recall that the messages at the prefix and suffix blocks are fixed. Define the error events as follows; the average probability of an error is upper bounded by the union of these events over all blocks. We need the probability of seeing a U^n(l^{(b)}) that is jointly typical with S^{n(b+1)} to go to 1 as n goes to infinity; according to Lemma 1, we can ensure this. Therefore, this probability goes to zero as n → ∞ if R′_1 + R̃_1 < H(Z|U, S) − δ_3(ǫ) and the cooperation codeword U^n(L^{(b−1)}) and S^{n(b)} are jointly typical. Recall that the codebooks are generated i.i.d., each block-codebook is independent of the others, and the channel is memoryless. Thus, by the conditional typicality lemma, the probability of this event goes to 0 as n → ∞.
-Event E_4(i)|E_5^c(b + 1), Ẽ_1^c(B): There are several cases in which this error can occur. The probability of each case is bounded by a standard application of the packing lemma, as follows. The statistical relations between the codewords are summarized in Table III. Denote by Ẽ_{i,j,k}(b) the event that (59) is satisfied for the tuple (i, j, k). By the union bound, case 1 is upper-bounded as follows, where: (a) follows since for each i ≠ L^{(b−1)}, U^n(i) is independent of S^{n(b)}; (b) follows since for each i ≠ L^{(b−1)}, the codewords corresponding to (i, j, k) are independent of Y^{n(b)}. Similarly, case 2 is upper bounded by a sum over j > 1, k > 1, since for j ≠ 1, k ≠ 1, the codewords corresponding to (j, k) are independent of Y^{n(b)} given S^{n(b)} and U^n(L^{(b−1)}).

Case 3 is bounded similarly, and case 4 follows since X_1^n(1|1, K^{(b)}, L^{(b−1)}) is independent of Y^{n(b)} given (S^{n(b)}, U^n(L^{(b−1)}), X_2^n(k|L^{(b−1)})). Thus, the above probabilities go to zero as n → ∞ under the stated conditions. Henceforth, we have derived the following bounds, together with the identity R_1 = R′_1 + R′′_1 and non-negativity of all rates. Applying Fourier-Motzkin elimination (which can be done using [25]) to eliminate R′_1, R′′_1, R̃_1 and R_B yields I(U; S) < H(Z|U, S) (70a). This closes the proof for the direct part.

B. Converse
Assume that the rate pair (R_1, R_2) is achievable. By Fano's inequality, we have the corresponding bound, where H_b(·) is the binary entropy function and ǫ_n → 0 when ǫ → 0. To show that the region in (53) is an outer bound, we first identify the auxiliary random variable, where: (a) follows since S^n is i.i.d. Therefore, we have shown the claimed identification. An upper bound on R_1 is established as in (75), with time-sharing variable Q.
Applying similar arguments, we get an upper bound for R_2. The first upper bound for the sum-rate follows similarly, and the second upper bound is obtained from a chain of inequalities whose last step is due to the Markov chain Q ↔ (X_{1,Q}, X_{2,Q}, S_Q) ↔ Y_Q. We note that the following conditions must hold:
• S_Q is independent of Q, and p_{S_Q}(s) = p_S(s),
together with the Markov chains listed above. The first condition holds since S^n is i.i.d. The fourth condition holds since, for each i ∈ [1 : n], Z_i = z(X_{1,i}, S_i).
To prove the second condition, consider the joint distribution of the random variables, which proves that for each i ∈ [1 : n], the Markov chain (X_{1,i}, S_i) ↔ (Z^{i−1}, S^{i−1}) ↔ X_{2,i} holds; therefore the Markov chain in the second condition holds. The third condition is due to the memoryless property of the channel and the fact that, for the random time Q, the channel's inputs are (X_{1,Q}, X_{2,Q}, S_Q). To see this, consider the PMF of the random variables. It is easy to verify that the Markov chains in (80) also hold under this distribution.
Note that I(U_Q; S_Q|Q) = I(U_Q, Q; S_Q) due to the first condition. Let U = (U_Q, Q), S = S_Q, X_1 = X_{1,Q}, X_2 = X_{2,Q}, Z = Z_Q and Y = Y_Q. Thus, the rate bounds become those of Theorem 2, with a PMF that factorizes as required. This completes the proof for the converse part.

A. Direct
The achievability part of Theorem 2 is based on the previous section, with an additional operation at Encoder 2. To avoid unnecessary repetition, we only provide the differences in the achievability part relative to that of the previous section.
As before, the average rates can be made close to R_1 and R_2 by taking B sufficiently large. Next, we describe the transmission at block b. Assume that, from previous operations, l′^{(b−1)} and l′′^{(b−1)} are known at both encoders.
This procedure is illustrated in Fig. 11. The cooperative bin index is a superbin that contains several u^n sequences. The selected superbin contains a sequence u^n that is coordinated with the states.
Encoder 2: At the end of each block b − 1, the superbin index l′^{(b−1)} is known from the cribbed sequence. Encoder 2 looks within the superbin for a cooperation codeword that is jointly typical with its state w.r.t. A^{(n)}_ǫ(P_{S_2,U}); then, the encoder sends its codeword.
Decoder: The decoding is done backwards. Assume that l^{(b)} = (l′^{(b)}, l′′^{(b)}) is known from previous decoding operations.
the same way that Encoder 2 does. Then, find m̂. If there are multiple functions that satisfy the above, choose one uniformly. Note that there is a total of 2^{nR′_B} tuples of functions, since we choose exactly one tuple for each l′^{(b−1)} ∈ [1 : 2^{nR′_u}].
Note that at block B, the decoder knows the messages and therefore it only needs to find l^{(B−1)} according to the first operation.
Error analysis: Without loss of generality, assume that all messages m_2^{(b)} are equal to 1 for all b ∈ [1 : B]. We begin with the event of an encoding error. Recall that, according to Lemma 1, we can ensure that we will see approximately 2^{n(R′′_B+R̃)} different indexes by taking R̃ ≤ H(Z|U, S_1) and R̃ < R′_B. Thus, the existence of a sequence U^{n(b+1)} that is coordinated with S_1^{n(b+1)} is ensured by taking I(U; S_1) < R′′_B + R̃. Moreover, it follows from the Markov lemma [23, Lemma 12.1] that this holds with high probability (going to 1 as n goes to infinity). Denote the selected superbin of the next block by L′^{(b)} and the selected index within the bin by L′′^{(b)}. At Encoder 2, we ensure that there exists only one l′′^{(b)} such that the sequence U^n(L′^{(b)}, l′′^{(b)}) is jointly typical with S_2^{n(b+1)}; this is done by taking R′′_B < I(U; S_2). At the decoder, an error occurs if (85) is satisfied for a tuple other than (L′^{(b−1)}, 1, 1, 1). This event is bounded by the union of the following events. Following similar steps as in Section VI, a standard application of the packing lemma yields the stated bounds together with the encoding constraints. Performing FME on (87) yields the region of Theorem 2 for all PMFs that factorize as P_{X_1,U|S_1} P_{X_2|U,S_2} with Z = z(X_1, S_1). Note that I(U; S_1) − I(U; S_2) = I(U; S_1|S_2), since S_2 ↔ S_1 ↔ U form a Markov chain.

B. Converse
where (a) follows since $S_1^n$ and $S_2^n$ are drawn i.i.d. in pairs, (b) follows by our definition of $U_i$ and (c) is derived by setting $Q \sim \mathrm{Unif}[1:n]$ to be a time-sharing random variable. Note that the following Markov chains hold:
Fig. 12: Proof of the Markov chains $S_{2,i} \leftrightarrow S_{1,i} \leftrightarrow U_i$ and $S_{2,i} \leftrightarrow (S_{1,i}, U_i) \leftrightarrow Z_i$ using an undirected graphical technique [27]. The undirected graph corresponds to the PMF $P(s_1^n, s_2^n, z^n) = P(s_1^{i-1}, s_2^{i-1}) P(s_{1,i}, s_{2,i}) P(s_{1,i+1}^n, s_{2,i+1}^n) P(z^n | s_1^n)$. The Markov chains follow since all paths from $S_{2,i}$ to all other nodes pass through $S_{1,i}$.
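The graphical argument behind Fig. 12 can also be checked mechanically: build the undirected graph whose cliques are the factors of the PMF, and verify that deleting $S_{1,i}$ disconnects $S_{2,i}$ from the other variables. A minimal Python sketch (ours; the node names are hypothetical labels mirroring the factorization above):

```python
from collections import deque

def separated(edges, src, dst, removed):
    """BFS check of graph separation: True iff every path from src to dst
    passes through a vertex in `removed` (i.e., dst becomes unreachable
    once the removed vertices are deleted)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w == dst:
                return False
            if w not in seen and w not in removed:
                seen.add(w)
                queue.append(w)
    return True

# One node per variable group; one clique per factor of
# P(s_1^{i-1}, s_2^{i-1}) P(s_{1,i}, s_{2,i}) P(s_{1,i+1}^n, s_{2,i+1}^n) P(z^n | s_1^n).
edges = [
    ("S1_past", "S2_past"),                            # P(s_1^{i-1}, s_2^{i-1})
    ("S1_i", "S2_i"),                                  # P(s_{1,i}, s_{2,i})
    ("S1_fut", "S2_fut"),                              # P(s_{1,i+1}^n, s_{2,i+1}^n)
    ("Z", "S1_past"), ("Z", "S1_i"), ("Z", "S1_fut"),  # clique of P(z^n | s_1^n)
    ("S1_past", "S1_i"), ("S1_past", "S1_fut"), ("S1_i", "S1_fut"),
]

# All paths from S_{2,i} leave through S_{1,i}: deleting it separates S_{2,i}
# from Z, giving the Markov chain S_{2,i} -- (S_{1,i}, U_i) -- Z_i.
print(separated(edges, "S2_i", "Z", {"S1_i"}))  # True
print(separated(edges, "S2_i", "Z", set()))     # False
```

In an undirected graphical model, such vertex separation implies the corresponding conditional independence, which is exactly how the figure proves the Markov chains.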
Recall that the PMF on $(m_1, m_2, s_1^n, s_2^n, x_{1,i}, z^n, x_{2,i})$ is Note that $Z^n$ is a deterministic function of $(M_1, S_1^n)$ since $X_1^n$ is. Therefore, the Markov chain $(S_{1,i}, X_{1,i}) \leftrightarrow (S_{2,i}, U_i) \leftrightarrow X_{2,i}$ is readily proven from the PMF. As for the other Markov chains in (90), we use the undirected graphical technique in Figure 12. It is also straightforward to show that $S_{2,Q} \leftrightarrow (S_{1,Q}, U_Q, Q) \leftrightarrow Z_Q$ holds. Therefore, Note that due to this identity, $I(U_Q; S_{1,Q}|S_{2,Q}, Q) \le H(Z_Q|S_{1,Q}, U_Q, Q)$. We proceed to bound $R_1$ and $R_2$. Note that by Fano's inequality, where $\epsilon_n \to 0$ as $n \to \infty$. Bounding $R_1$ yields where (c) follows since $M_2 \perp\!\!\!\perp (M_1, Z^n, S_1^n, S_2^n)$. It follows that where (d) follows since $X_{1,i}$ is a function of $(M_1, S_1^n)$ and (e) follows by moving $(M_2, Y^{i-1}, Z_{i+1}^n, S_{1,i+1}^n, S_2^{i-1})$ from the conditioning to the left-hand side of the mutual information; this is valid since the channel is memoryless and without feedback. Following similar steps, we have Following this, the sum-rate $R_1 + R_2$ is upper bounded by and therefore it follows from the identity in (92) and the above that
$$n(R_1 + R_2) \le n\big[ I(X_{1,Q}, X_{2,Q}; Y_Q | Z_Q, S_{1,Q}, S_{2,Q}, U_Q, Q) + H(Z_Q | S_{1,Q}, U_Q, Q) - I(U_Q; S_{1,Q} | S_{2,Q}, Q) \big] + \epsilon_n \quad (101a)$$
and the second upper bound follows where the last inequality is due to the Markov chain $Q \leftrightarrow (X_{1,Q}, X_{2,Q}, S_Q) \leftrightarrow Y_Q$. Thus, we obtained the above region for PMFs of the form $p(q)\, p_{S_1,S_2}(s_{1,q}, s_{2,q})\, p(u_q, x_{1,q}|s_{1,q}, q)\, p(x_{2,q}|u_q, s_{2,q})\, p_{Y|X_1,X_2,S_1,S_2}(y_q|x_{1,q}, x_{2,q}, s_{1,q}, s_{2,q})$.
Note that the PMF in (104) regarding $S_1$, $S_2$ and $Y$ follows since the states are i.i.d. and the channel is memoryless and fixed (per state). The rest of the proof (the removal of the time-sharing random variable $Q$) is straightforward, following the same steps as in the case of one state component in Appendix VI-B. Therefore, by letting $U = (U_Q, Q)$, $X_1 = X_{1,Q}$, $X_2 = X_{2,Q}$, $Y = Y_Q$, $Z = Z_Q$, $S_1 = S_{1,Q}$ and $S_2 = S_{2,Q}$ we obtain the capacity region in Theorem 2.

VIII. PROOF FOR THEOREM 3
The proof of this theorem relies heavily on the proofs from the previous sections. The achievability part builds on the cooperative-bin-forward scheme from Section VII by combining it with instantaneous relaying (also known as Shannon strategies). To avoid unnecessary repetition, we provide only the differences in the achievability part and the proofs of the Markov chains in the converse.
Achievability: The codebook generation is done as in Section VII-A, with additional conditioning on $Z$. Namely, the codebooks constructed for Encoder 2 are as follows. For each block $b$, $s_2^n \in \mathcal{S}_2^n$, $z \in \mathcal{Z}$ and $(l'^{(b-1)}, l''^{(b-1)})$, draw $2^{nR_2}$ codewords. In each transmission block, Encoder 1 performs the same operations as before. Encoder 2 also performs the same operations, but at each time $i$ it transmits $x_{2,i}$ as a function of the current $z_i$ as well. The decoder performs backward decoding as before with respect to the new codebook. All other operations are preserved and the same error analysis holds.
The derivation results in the same achievable rate region, under the new PMF factorization $P_{U,X_1|S_1}\, P_{X_2|Z,U,S_2}$.
Converse: The only difference in the converse compared to that of the previous section is that we need to show the PMF factorization and prove the new Markov chains. The rate bounds on $R_1$ and $R_2$ are the same and are obtained using exactly the same arguments. Continuing the derivation from this point, we need to show that the following Markov chains hold. Note that now the PMF of the random variables is such that $x_{2,i}$ is also a function of $z_i$, and not only of $z^{i-1}$. Therefore, the first two Markov chains hold by the same arguments as in the previous section. As for the last Markov chain, consider the joint PMF; summing over $(m_1, m_2, z_{i+1}^n, s_2^{i-1}, s_{1,i+1}^n)$ results in a PMF in which $(S_{1,i}, X_{1,i}) \leftrightarrow (S_{2,i}, S_1^{i-1}, Z^{i-1}, S_{2,i+1}^n, Z_i) \leftrightarrow X_{2,i}$ is a Markov chain. All other arguments regarding the memoryless property of the channel and the time-sharing random variable $Q$ carry over. This concludes the proof of Theorem 3.

IX. CONCLUSIONS AND FINAL REMARKS
Using a variation of the cooperative-bin-forward scheme, we have found the capacity of the SD-RC and of the MAC with partial cribbing when non-causal CSI is given to the decoder and to one of the transmitters. Remarkably, in both setups only one auxiliary random variable is needed to obtain the capacity region. The same cooperation codeword plays both the role of a common message and that of a compression of the state sequence. In the special case of the MAC, it is evident that non-causal access to the state enables compression of the states and, consequently, enlarges the capacity region.
Cooperative bin-forward relies heavily on the fact that the cooperation link, i.e., the link from the encoder to the relay (or the cribbed signal in the MAC), is deterministic. Since the transmitter can predict and dictate the output observed by the relay, the two can coordinate based on the same bin index. However, it is not known how the cooperative-bin-forward scheme can be generalized to the case where the link between the encoder and the relay is a general noisy link.

APPENDIX A PROOF FOR INDIRECT COVERING LEMMA
In Section V we presented an indirect covering lemma. Although we do not perform covering in the traditional manner, we do ask how many bin indexes are seen. Namely, we want to bound the following probability,
$$P\left( \left|\{l : \exists k \text{ s.t. } \mathrm{Bin}(Z^n(k)) = l\}\right| < 2^{n(R-\delta_n)} \right) \le \Delta_n,$$
and ensure that both $\delta_n$ and $\Delta_n$ go to zero as $n$ goes to infinity.
Assume $v^n \in \mathcal{A}_\epsilon^{(n)}(p_V)$ and recall that, according to the random experiment, we have the event $\{\mathrm{Bin}(Z^n(k)) \ne \mathrm{Bin}(Z^n(j)) \ \forall j \ne k, \ k, j \in D_2\}$ (112c) and the events defined above. By the definition of $E_3$ and the law of total probability, the error probability decomposes accordingly. We bound each probability separately.
2) We now deal with $E_2$. First, note that given $E_1^c$, with probability one we have $|D_1| > 2^{n(R - \delta_n^{(1)})}$.
3) We follow similar arguments as in the previous bound. Define the indicator $\mathbb{1}\left[\exists j \ne k : \mathrm{Bin}(Z^n(j)) = \mathrm{Bin}(Z^n(k)), \ j \in D_2\right]$ (120) and recall that the probability of each bin index is independent of the realization of $\{Z^n(k)\}_k$. It follows, by Markov's inequality, that for any $\gamma_2' > 0$ the corresponding probability vanishes as $n \to \infty$. Finally, for any $\gamma_1, \gamma_2 > 0$ and $n$ sufficiently large, the claim follows.
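The quantity controlled by the lemma, the number of distinct bin indexes seen, is easy to probe numerically. A small Monte Carlo sketch (ours, not from the paper), assuming only that bin indexes are drawn independently and uniformly, as in the random binning construction:

```python
import random

def distinct_bins(num_seqs, num_bins, rng):
    """Draw an i.i.d. uniform bin index for each sequence and count how many
    distinct indexes are seen."""
    return len({rng.randrange(num_bins) for _ in range(num_seqs)})

rng = random.Random(0)
# With many more sequences than bins (R' > R), nearly all bins are hit:
# E[#distinct] = B * (1 - (1 - 1/B)^K), roughly B * (1 - e^{-K/B}).
many = distinct_bins(4096, 1024, rng)  # K/B = 4, so expect about 0.98 * 1024
few = distinct_bins(64, 1024, rng)     # K << B, so at most K distinct bins
print(many, few)
```

This matches the role of the lemma in the error analysis: as long as the number of sequences exceeds the number of bins exponentially, essentially all $2^{nR}$ bin indexes are seen with high probability.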

APPENDIX B PROOFS FOR SPECIAL CASES OF MAC
The special cases in Section III are captured by Theorem 2. We restate the region here as a reference for the following derivations. To simplify the derivations, we consider the region with only one state component $S$, which is available only at Encoder 1. The capacity region of the discrete memoryless MAC with non-causal CSI in Fig. 3 is given by the set of rate pairs $(R_1, R_2)$ that satisfy (125). Case A: Multiple access channel with states (without cribbing): This case is captured by Theorem 2 by setting $z(x_1, s) = 0$ for all $x_1 \in \mathcal{X}_1, s \in \mathcal{S}$, since in this configuration there is no cribbing between the encoders. The inequality in (125e) results in $I(U; S) \le 0$, which forces $U$ to be independent of $S$. Thus, the region in (125) becomes (126), with PMFs of the form $p_U\, p_{X_1|U,S}\, p_{X_2|U}$. Note that $U \leftrightarrow (X_1, X_2, S) \leftrightarrow Y$ forms a Markov chain; therefore, the last inequality is redundant. It also implies that the capacity region in (126) is outer bounded by (8); a degenerate $U$ achieves that outer bound.
Case B: State-dependent MAC with cooperation: We investigate the capacity region for the case of an orthogonal cooperation link and channel transmission, as depicted in Fig. 6. The cooperation link here is strictly causal due to the cribbing, i.e., $X_{2,i} = f(M_2, X_{1p}^{i-1})$. First, note that the region in (9) is an outer bound, since it is the capacity region with non-causal cooperation, i.e., when $X_{2,i} = f(M_2, X_{1p}^n)$. The strictly causal configuration is captured by the cribbing setup by setting $X_1 = (X_{1c}, X_{1p})$, $Z = X_{1p}$ and the channel transition PMF to $p_{Y|X_{1c},X_2,S}$. Then, the region in (125) becomes (127), for PMFs of the form $p_{U|S}\, p_{X_1|U,S}\, p_{X_2|U}$. Note that $I(X_{1c}, X_{1p}, X_2; Y|S) = I(X_{1c}, X_2; Y|S)$ because $X_{1p} \leftrightarrow (X_{1c}, X_2, S) \leftrightarrow Y$ is a Markov chain. We identify the rate $H(X_{1p}|U, S)$ as the cooperation rate $R_{12}$. Let $p_{X_1|U,S} = p_{X_{1p}|U,S}\, p_{X_{1c}|U,S}$, and let $p_{X_{1p}|U=u,S=s}$ be the uniform distribution for every $(u, s) \in \mathcal{U} \times \mathcal{S}$. By doing so, $H(X_{1p}|U, S) = \log_2 |\mathcal{X}_{1p}|$ and $I(X_1; Y|X_2, U, S, X_{1p}) = I(X_{1c}; Y|X_2, U, S)$; the latter holds since $X_{1p} \leftrightarrow (X_{1c}, X_2, S) \leftrightarrow Y$ is a Markov chain and $X_{1c}$ is independent of $X_{1p}$. By denoting $R_{12} = \log_2 |\mathcal{X}_{1p}|$, the regions in (9) and (127) coincide.
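The identification $R_{12} = H(X_{1p} \mid U, S)$ in Case B can be spelled out in one line: with the uniform choice of $p_{X_{1p}|U=u,S=s}$ above,

```latex
\begin{align*}
H(X_{1p}\mid U,S)
  &= \sum_{u,s} p(u,s)\, H\bigl(X_{1p}\mid U=u, S=s\bigr) \\
  &= \sum_{u,s} p(u,s)\, \log_2 |\mathcal{X}_{1p}|
   = \log_2 |\mathcal{X}_{1p}| = R_{12},
\end{align*}
```

so the deterministic cribbed link carries exactly $\log_2 |\mathcal{X}_{1p}|$ cooperation bits per channel use.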
Case C: Point-to-point channel with non-causal CSI: First, note that the channel depends only on $X_1$ and $S$. Encoder 1 sends a message over the channel, and the states are revealed to it non-causally at the beginning of the transmission.
Encoder 2, however, has no message to send; in fact, it cannot send anything over the channel, since the channel output is not affected by $X_2$ at all. Therefore, the rate $R_2$ is $0$. This configuration is captured by the MAC under the setting in (128). Inserting (128), the resulting region is smaller than or equal to (10); if we drop the first and last inequalities, we obtain the expression for the capacity.⁶
On the other hand, to show that the capacity in (10) is achievable, note that $Z = f(X_1, S)$, thus $I(X_1; Y|S) = I(X_1, Z; Y|S)$. Therefore, the first inequality becomes $R_1 \le I(X_1; Y|S) + H(Z|S, Y)$, which is also redundant due to the second.
⁶ The expressions for the capacity after dropping the constraints are not exactly the same, since the PMF domains are different. However, the capacities coincide, due to the objective and the maximization.