On the Capacity of the Writing onto Fast Fading Dirt Channel

Costa's classic "writing on dirty paper" capacity result establishes that full state pre-cancellation can be attained in the Gel'fand-Pinsker channel with additive state and additive Gaussian noise. This result holds under the assumption that perfect channel knowledge is available at both the transmitter and the receiver: such an assumption is not valid in wireless scenarios affected by fading and with limited feedback rates. For this reason, we investigate the capacity of the "writing on fast fading dirt" channel, a variation of the "writing on dirty paper" channel in which the state sequence is multiplied by an ergodic fading process unknown at the transmitter. We consider two scenarios: one in which the fading process is known at neither the transmitter nor the receiver, and one in which it is known only at the receiver. For both cases we derive novel assessments of capacity which clearly indicate the limits of state pre-cancellation in the presence of fast fading.


INTRODUCTION
Costa [1] showed that the capacity of the "writing on dirty paper" channel, the Gaussian version of the Gel'fand-Pinsker [2] channel, is equal to the capacity of the Gaussian point-to-point channel without state, regardless of the power or the distribution of the state sequence. Albeit very promising, this result assumes that perfect channel knowledge is available at the users, and the attainable rates quickly degrade with any uncertainty in the channel knowledge. The assumption of perfect channel knowledge does not hold in communication scenarios where channel conditions vary over time and only limited feedback is available between the receiver and the transmitter. This paper investigates the effects of partial channel knowledge on the performance of state pre-cancellation and the topic of robust pre-coding. More specifically, we study the "dirty paper channel with fast fading dirt", a variation of Costa's "writing on dirty paper" channel in which the state sequence is multiplied by an ergodic fading process. We consider both the case in which neither terminal knows the fading realization and the case in which only the receiver has knowledge of the fading realization. For both models, we derive the approximate capacity for some classes of fading distributions, including classic distributions such as Rayleigh and Gaussian fading. These results are the first closed-form characterizations of capacity for models combining state pre-cancellation and ergodic fading.
Related Results: The Gel'fand-Pinsker (GP) channel is the point-to-point channel in which the channel output is obtained as a random function of the input and a state sequence which is provided non-causally at the transmitter. The capacity of this model is determined in [2] and is expressed as the maximization of a non-convex function for which the optimal solution is not easily determined, either explicitly or through numerical optimization. For this reason, very few closed-form expressions of the capacity result in [2] are available in the literature. In [1], Costa determines an explicit expression of capacity for the Gaussian version of the GP channel, termed the "Writing on Dirty Paper" (WDP) channel. Perhaps surprisingly, the capacity of the WDP channel coincides with the capacity of the Gaussian point-to-point channel, thus implying that it is possible for the transmitter to fully pre-code its transmission against the channel state sequence.
In the literature, few authors have investigated extensions of the WDP channel to include fading and partial side information.
The first channel model to include state pre-cancellation and slow fading is the "carbon copying onto dirty paper" (CCDP) channel of [3]: the CCDP channel is the M -user compound channel in which the output at each receiver is obtained as a linear combination of the input, Gaussian noise and one of M possible state sequences, all non-causally known at the transmitter.
When the M state sequences are scaled versions of the same sequence, the CCDP channel reduces to the WDP channel with slow fading, in which each receiver corresponds to one of M possible fading realizations.
The authors of [4] consider the case in which both the input and the state sequence are multiplied by the same fading value, which is known at the decoder but not at the encoder. They consider both the fast and the slow fading case and evaluate the rates achievable with Costa pre-coding in these scenarios. It is shown, in particular, that the rate loss from full state pre-cancellation vanishes in both the fast and the slow fading case as the transmit power grows. This result holds because the fading affects the channel input and the state identically: for this reason, Costa pre-coding remains as effective as in the WDP channel.
The WDP channel with fast fading in which fading only affects the state sequence is first studied in [5] which focuses, in particular, on the case of phase fading. In [6], the same authors study this model from an outage analysis perspective and derive inner and outer bounds for the outage probability.
Achievable rates under Gaussian signaling are derived in [7] for the WDP channel in which the state sequence is multiplied by an iid Gaussian fast fading process. These attainable rates are also compared to lattice coding strategies, and some numerical observations are provided on the performance of the various coding choices.
In [8] we derive the first approximate capacity result for a class of WDP channels with slow fading in which a Gaussian state sequence is multiplied by one of M possible fading realizations. In this class of channels, which we term the "strong fading" regime, the fading coefficients are increasingly spaced apart, which makes it impossible for the transmitter to simultaneously code for multiple fading realizations. Consequently, the best transmission strategy is to pre-code against the fading-times-state realization at each receiver for a portion 1/M of the time. Accordingly, capacity scales as 1/M times the capacity of the Gaussian point-to-point channel: this shows that the presence of fading can drastically affect the ability of the transmitter to exploit channel state knowledge.
Contributions: In this correspondence, we investigate the capacity of the "Writing onto Fast Fading Dirt" (WFFD) channel, a variation of the WDP channel in which the channel state is multiplied by an ergodic fading sequence. We focus on the case in which the state sequence is a zero-mean iid Gaussian process, so that the channel is parameterized by the transmit power, the state variance and the fading distribution. We study (i) the case of "No Channel Side Information" (WFFD-NCSI), in which the fading sequence is unknown at both the transmitter and the receiver, and (ii) the case of "Receiver Channel Side Information" (WFFD-RCSI), in which the fading sequence is known only at the receiver. In both scenarios, we derive the first approximate characterizations of capacity available for this class of models, which include both channel state and fast fading.
A summary of the contributions for these two models is as follows: • Sec. III, Writing onto Fast Fading Dirt with No Channel Side Information (WFFD-NCSI) channel: We show that, when the fading entropy is sufficiently high, capacity can be approached by pre-coding the input against the state times the mean of the fading. Although conceptually simple, this result clearly shows that channel state knowledge can have very little value in the presence of even a small amount of channel uncertainty.

• Sec. IV, Writing onto Fast Fading Dirt Channel with Receiver Channel Side Information (WFFD-RCSI) channel:
The availability of receiver channel side information makes it possible for the transmitter and the receiver to develop distributed pre-coding strategies which take advantage of the partial knowledge available at each terminal. The analysis is divided into the case of discrete and continuous fading distributions: • Sec. IV-A: Discrete fading distributions: We begin by determining capacity to within a constant gap for the case of uniform, antipodal fading: for this simple fading distribution, capacity can be approached by transmitting the superposition of two codewords: one codeword treats the fading-times-state as noise, while the other codeword is pre-coded against one of the fading realizations times the channel state. This result is extended to two classes of fading distributions: the class of distributions with a mode larger than one half and the class of uniform distributions with exponentially spaced points in the support.
• Sec. IV-B: Continuous fading distributions: We investigate the case of continuous fading by first discretizing the continuous fading values and then applying the approximate capacity results of Sec. IV-A. We numerically evaluate the gap between inner and outer bounds for some typical fading distributions such as Gaussian and Rayleigh fading. Unfortunately, the case of continuous fading is fundamentally harder to characterize, and we are able to derive only partial results for this scenario.
The main theoretical contribution of the paper consists in the development of new outer bounding techniques to characterize the capacity of a model comprising both channel states and fading. In particular, we adapt the outer bounding techniques of [3] for the CCDP channel to the case of fast fading, and further improve on the results in [3] by showing a tightening of the outer bounds for both the 2-user and the M-user CCDP channel. On the other hand, the inner bounds used throughout the paper are rather straightforward combinations of time-sharing, Costa pre-coding and superposition coding.
Paper Organization: The remainder of the paper is organized as follows: in Sec. I we introduce the channel models of the WFFD-NCSI and WFFD-RCSI channels. Sec. II presents related results available in the literature. In Sec. III we derive the approximate capacity results for the WFFD-NCSI channel while, in Sec. IV, we derive the approximate capacity results for the WFFD-RCSI channel. Finally, Sec. V concludes the paper.

I. CHANNEL MODEL
In the GP channel [2], the output of a point-to-point channel is obtained as a random function of the channel input and a state sequence. The encoder has non-causal knowledge of the state sequence and can thus design the channel input so as to minimize the adverse effects of the state on the channel output. The WDP channel [1] is the GP channel in which the output is obtained as a linear combination of the channel input, the state sequence and a Gaussian noise sequence. We consider a variation of the WDP channel in which the state sequence is multiplied by an iid fading sequence. The state is assumed to be Gaussian distributed, while the fading can have any distribution, discrete or continuous, P_A, with Var[A_i] = 1. The channel output is therefore obtained as

Y_i = X_i + c A_i S_i + Z_i,   i ∈ [1 . . . N],   (1)

for E[|X^N|^2] ≤ NP, with Z_i, S_i ∼ N(0, 1) i.i.d., A_i ∼ P_A i.i.d., c ∈ R^+ and Var[A_i] = 1.
In the following we consider two distinct channel models: the channel in which the fading sequence is known at neither the transmitter nor the receiver, and the channel in which it is available only at the receiver: • the "Writing onto Fast Fading Dirt with No Channel Side Information" (WFFD-NCSI) channel, for Y^N in (1).
• the "Writing onto Fast Fading Dirt with Receiver Channel Side Information" (WFFD-RCSI) channel, for Y^N and A^N in (1).
A graphical representation of the WFFD-NCSI channel and the WFFD-RCSI channel is depicted in Fig. 1: the switch in the figure indicates whether the fading sequence is known at the receiver (WFFD-RCSI) or not (WFFD-NCSI).
In the study of the WFFD channel, standard definitions of code, achievable rate and capacity are employed.
Definition 1. Code. A (2^{NR}, N) code is composed of an encoding and a decoding function, respectively defined as: where K^N is the support of the Random Variable (RV) K^N.
The rate R is achievable if there exists a sequence of codes of rate R such that the probability of decoding error goes to zero as the block-length N goes to infinity.
Definition 2. Achievable rate. Let the probability of decoding error for a (2^{NR}, N) code be defined as in (5); then, a rate R ∈ R^+ is said to be achievable if there exists a sequence of codes such that lim_{N→∞} P_e(2^{NR}, N) = 0.
Note that the error probability in (5) is averaged over all state sequences and messages. Inner and outer bounds whose difference is at most some constant ∆ ∈ R^+ are said to characterize the capacity to within an additive gap of ∆ bits-per-channel-use (bpcu).
At first sight, the model in (1) might seem quite restrictive, but it actually encompasses a large class of channel models. To observe this, consider a more general formulation than that of (1), in which the channel output is obtained for some Z'^N, S'^N iid Gaussian distributed, while E[|X'^N|^2] ≤ NP'. The noise variance and input gain in (7) can be reduced to one by scaling the channel output and the channel input and accordingly adjusting the input power constraint. The fading-times-state term c'A'^N S'^N can be rewritten (dropping the superscript N for convenience) as

c'A'S' = c' μ_A μ_S + c' μ_S (A' − μ_A) + c' A'(S' − μ_S).   (8)

Note now that c'μ_A μ_S in (8) is a constant term that can be subtracted from the channel output. The term c'μ_S(A' − μ_A) can be subtracted from the output in the WFFD-RCSI channel, while it is equivalent to additive noise in the WFFD-NCSI channel. For these reasons, it is possible to assume μ_S = 0 with only a small loss of generality.
Note also that

c'A'^N S'^N = (c' σ_{A'} σ_{S'}) · (A'^N / σ_{A'}) · (S'^N / σ_{S'}),

which shows that the coefficient c in (1) can be used to normalize the variances of both fading and state to one. Finally, we can assume c > 0, since the state distribution is symmetric. Since we are interested in determining the capacity to within a small gap, we shall actually assume c > 1: the capacity for c ≤ 1 can be attained to within 1 bpcu by treating the fading-times-state term as noise.
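The channel law in (1) and the normalizations above can be checked with a short simulation. This is only an illustrative sketch: it uses Gaussian fading as an example distribution and the function name `wffd_outputs` is ours, not from the paper.

```python
import random
import statistics

def wffd_outputs(n, P, c, seed=0):
    """Simulate n uses of the WFFD channel Y = X + c*A*S + Z of (1).

    X is iid Gaussian with power P (no pre-coding), S and Z are iid
    standard Gaussian, and A is iid zero-mean Gaussian fading with
    Var[A] = 1, matching the normalization in the text."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        x = rng.gauss(0.0, P ** 0.5)
        s = rng.gauss(0.0, 1.0)
        a = rng.gauss(0.0, 1.0)   # zero-mean fading, Var[A] = 1
        z = rng.gauss(0.0, 1.0)
        ys.append(x + c * a * s + z)
    return ys

# With X, A, S, Z mutually independent, the output power is
# E[Y^2] = P + c^2 E[A^2] E[S^2] + 1 = P + c^2 + 1.
ys = wffd_outputs(200_000, P=4.0, c=2.0)
print(statistics.fmean(y * y for y in ys))  # close to 4 + 4 + 1 = 9
```

The empirical output power confirms that the fading-times-state term contributes variance c^2 when both fading and state are normalized to unit variance.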

II. RELATED RESULTS
In this section, we briefly introduce some classic results available in the literature which are relevant in the study of the WFFD-NCSI channel and the WFFD-RCSI channel.
• The Gel'fand-Pinsker (GP) channel. The capacity for the GP channel is a classic information theoretic result.
The expression in (10) is convex in P_{X|S,U} for a fixed P_{U|S}, which implies that X can be chosen to be a deterministic function of U and S. On the other hand, (10) is neither convex nor concave in P_{U|S} for a fixed P_{X|S,U}. Given that (10) contains an auxiliary RV U, and given its convexity properties, determining a closed-form solution for the maximization in (10) is extremely challenging. Additionally, the lack of tight bounds on the cardinality of U makes it very hard to obtain numerical approximations of the optimal value.
• The "Writing on Dirty Paper" (WDP) channel. One of the few channel models for which the maximization in (10) is known in closed form is the WDP channel, for which the assignment U = X + P/(P+1) S, with X ∼ N(0, P) independent of S, yields C = 1/2 log(1 + P), regardless of the distribution of S^N.
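Costa's result can be verified numerically through the closed-form Gaussian mutual-information expressions. The sketch below evaluates R(α) = I(U;Y) − I(U;S) for the jointly Gaussian assignment U = X + αS and checks that the choice α = P/(P+1) attains 1/2 log(1+P) for every state power Q; the function name `costa_rate` is ours.

```python
import math

def costa_rate(P, Q, alpha):
    """Achievable rate I(U;Y) - I(U;S), in bits, for the jointly Gaussian
    Costa assignment U = X + alpha*S with X ~ N(0,P) independent of
    S ~ N(0,Q), over the channel Y = X + S + Z with Z ~ N(0,1)."""
    num = P * (P + Q + 1)
    den = P + alpha ** 2 * Q + P * Q * (1 - alpha) ** 2
    return 0.5 * math.log2(num / den)

P = 10.0
awgn = 0.5 * math.log2(1 + P)          # state-free point-to-point capacity
for Q in (0.1, 1.0, 100.0, 1e6):       # any state power
    r = costa_rate(P, Q, alpha=P / (P + 1))
    assert abs(r - awgn) < 1e-9        # full state pre-cancellation
```

The denominator reduces to P(P + 1 + Q)/(P + 1) at α = P/(P+1), so the dependence on Q cancels exactly, which is precisely the "dirty paper" phenomenon.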
• The "Carbon Copying onto Dirty Paper" (CCDP) channel. In the CCDP channel [3], the channel outputs in a M -user compound channel are obtained as a linear combination of the input, Gaussian noise and one of M possible state sequences.
The transmitter has knowledge of all of the M state sequences and must guarantee correct decoding at all the receivers. More specifically, the channel output at the m-th receiver is obtained as a linear combination of the input, the noise and the m-th state sequence; the capacity is upper bounded as in (13) and lower bounded as in (14). The outer bound in (13) is derived through a clever bounding of the entropy terms obtained from Fano's inequality, which we later improved in [9]. The inner bound in (14) is obtained by sending the superposition of two codewords: the base codeword treats the states as additional noise, while the top codeword is pre-coded against a combination of the state sequences. In [10] we show that the inner bound in (14) is to within a small gap from capacity.
• Writing onto Fast Fading Dirt with Receiver Channel Side Information (WFFD-RCSI) channel. In [11], the RHS of (10) is optimized for the WFFD-RCSI channel in which S, U, A and X are restricted to be jointly Gaussian RVs.
for K ⊂ [−1, 1]^3; then, the following rate is an inner bound to the capacity of this model. The result in Th. II.3 attempts to generalize the result of [1] to the fast fading case although, in all likelihood, one needs to consider a wider class of joint distributions P_{UX|S} than the jointly Gaussian one, even when all the variables in (10) are Gaussian distributed.
III. THE WRITING ONTO FAST FADING DIRT WITH NO CHANNEL SIDE INFORMATION (WFFD-NCSI) CHANNEL
The WFFD-NCSI channel is the WDP channel in which the state sequence S^N is multiplied by a fast fading realization A^N, unknown at both the transmitter and the receiver. In this section we show that, when A has large entropy, capacity can be approached by ignoring the randomness in the fading and instead pre-coding the channel input against the sequence cμ_A S^N. Intuitively, when the entropy of the fading process is sufficiently high, the value of the fading-times-state is too unpredictable and the best coding option is to pre-code against the average fading value times the state sequence. Although conceptually simple, this result makes it possible to obtain a tight characterization of capacity for a number of canonical fading distributions, such as Gaussian, log-normal and Rayleigh fading. Note that the capacity of the WFFD-NCSI channel is implied by Th. II.1, but no closed-form expression of the maximization in (10) is known.
Theorem III.1. Outer bound and approximate capacity for the WFFD-NCSI with a continuous fading distribution.
Consider the WFFD-NCSI channel and let A have a continuous distribution with entropy h(A) = 1/2 log(2πeα) for some α ∈ (0, 1]; then, the capacity C is upper bounded as in (18), and the exact capacity C is to within a gap of max{G_NCSI, 1} bpcu from R_OUT in (18b).
Proof: The proof can be found in App. A; only an outline is provided here. The outer bound in (18) is obtained by providing the state to the receiver as genie-aided side information. Since neither the transmitter nor the receiver has knowledge of the fading value, this genie-aided model is equivalent to a fading point-to-point channel in which the fading-times-state term acts as additive noise with random but known variance. In the inner bound, the transmitter pre-codes against the sequence cμ_A S^N while treating the term c(A^N − μ_A)S^N as additional noise. Note that, since in Costa pre-coding the input is independent of the state, this additive noise is also independent of the input.
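The inner bound outlined in the proof admits a simple numeric sketch. Under the normalizations Var[A] = Var[S] = 1, the residual term c(A − μ_A)S has variance c^2, so pre-coding against cμ_A S^N leaves an effective noise of variance 1 + c^2. The exact expressions (18)-(19) are not reproduced here; the function names below are ours.

```python
import math

def ncsi_inner_rate(P, c):
    """Sketch of the WFFD-NCSI inner bound: Costa pre-coding against
    c*mu_A*S^N removes that term exactly, while the residual
    c*(A - mu_A)*S^N acts as independent additive noise of variance c^2
    (using Var[A] = Var[S] = 1), on top of the unit-variance noise Z."""
    return 0.5 * math.log2(1 + P / (1 + c ** 2))

def p2p_capacity(P):
    """Trivial state-free outer bound 1/2 log(1 + P)."""
    return 0.5 * math.log2(1 + P)

P = 100.0
for c in (1.0, 2.0, 5.0):
    r = ncsi_inner_rate(P, c)
    print(f"c = {c}: inner {r:.2f} vs point-to-point {p2p_capacity(P):.2f} bpcu")
```

The growing gap to the point-to-point capacity as c increases quantifies how much the unpredictable part of the fading-times-state costs the transmitter.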
to obtain a unitary variance; then, the capacity of the WFFD-NCSI channel is to within a gap, for μ_Z in (20), from the outer bound in (18b).
Note that G^NCSI_log is not finite for all values of σ_Z^2 and is actually increasing with σ_Z^2. In particular, we have that α in (18b) for this distribution is such that α_log → 0 as σ_Z^2 → ∞.
The result in Th. III.1 is essentially a negative one, since it shows that, for a number of typical fading distributions, the best strategy is to Costa pre-code against the mean value of the fading-times-state and treat the remaining randomness as additional noise. Note, additionally, that when the fading has zero mean, state knowledge at the transmitter is useless.
The gap between inner and outer bounds in Lem. IV.5 is evaluated using the outer bound in (18b): a tighter characterization of capacity can be obtained by numerically evaluating the outer bound in (18a) for the different distributions in Lem. IV.5. Similarly, larger inner bounds can be obtained by explicitly evaluating the attainable region for each output distribution. This tighter characterization of capacity is presented in Fig. 2 for P = 100 and c ∈ [0, 10], where: • R^OUT_P2P is the point-to-point capacity 1/2 log(1 + P), a trivial outer bound to the capacity of any WFFD-NCSI channel, • R^OUT_GA is the genie-aided outer bound in (18a),

IV. THE WRITING ONTO FAST FADING DIRT WITH RECEIVER CHANNEL SIDE INFORMATION (WFFD-RCSI) CHANNEL
The WFFD-RCSI channel is obtained from the WFFD-NCSI channel by additionally providing the receiver with the fast fading sequence A N . As for the WFFD-NCSI channel, the capacity of the WFFD-RCSI channel is a special case of the GP capacity in Th. II.1 but obtaining a closed-form expression is very challenging.
In actuality, the study of the WFFD-RCSI channel is more arduous than that of the WFFD-NCSI channel because of the distributed way in which the transmitter and the receiver can cooperate in managing the fading-times-state term. As an illustrative example, consider the WFFD-RCSI channel with no additive noise and in which the state and the input are restricted to take values in a discrete set. Given the cardinality of the input, the capacity of this channel is at most 1 bpcu, and this rate can be attained regardless of the distribution of S: U can be recovered from the squared channel output, so that the receiver can reconstruct U regardless of the distribution of A. This simple example shows that the maximization in (10) for the WFFD-RCSI channel might yield some unexpected results and that jointly Gaussian assignments are, in general, far from optimal.
Another interesting observation regards the effectiveness of DPC in the WFFD-RCSI channel: since the receiver has knowledge of the fading, the transmitter can choose a fading value k ∈ A and pre-code the transmitted codeword against the fading-times-state realization ckS^N. The receiver then attempts to decode the transmitted codeword from those channel outputs in which the fading realization is sufficiently close to the value k. This transmission strategy attains the rate in (25). From the expression in (25) we observe that fading values a ∈ [k − 1/c, k + 1/c] can be decoded without incurring much rate loss from the point-to-point capacity. It then follows that the value k should be chosen so as to maximize this probability, and thus the best choice of k might not correspond to the mean or the mode of the distribution P_A.
Let I denote the interval of length proportional to 1/c which contains most of the probability mass of P_A: the expression in (25) can then be further bounded as in (26), where κ is a constant proportional to the length of I. Note that the negative logarithmic term in the RHS of (26) can be considered small for all practical purposes (as the decoder can disregard the channel outputs corresponding to problematic fading realizations), so the overall attainable rate is P_A(I) times the point-to-point capacity. This implies that, from a high-level perspective, DPC as in the WDP channel only attains a fraction of the point-to-point capacity corresponding to the probability that the fading value falls in a chosen interval of size proportional to 1/c. As the transmit power grows to infinity, a positive transmit rate can be attained for all fading distributions, although this strategy might be far from optimal.
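The choice of the pre-coding value k can be illustrated numerically: below we grid-search the centre of an interval of length 2/c that captures the most probability under standard normal fading, taken purely as an example distribution, and form the approximate rate P_A(I) times the point-to-point capacity. The helper names are ours and the interval width is the heuristic from (25), not an exact expression.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal, via math.erf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def interval_prob(k, c):
    """P(A in [k - 1/c, k + 1/c]) for standard normal fading A."""
    return normal_cdf(k + 1 / c) - normal_cdf(k - 1 / c)

def best_k(c, grid=None):
    """Grid search for the centre k maximizing the captured probability."""
    grid = grid or [i / 100 for i in range(-300, 301)]
    return max(grid, key=lambda k: interval_prob(k, c))

c, P = 4.0, 100.0
k = best_k(c)                                  # symmetric unimodal fading: k at the mode
frac = interval_prob(k, c)                     # P_A(I)
approx_rate = frac * 0.5 * math.log2(1 + P)    # P_A(I) times point-to-point
```

For a symmetric unimodal distribution the best centre is the mode, but for a skewed P_A the maximizer can differ from both the mean and the mode, as noted above.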
In the next subsection we investigate the case of discrete fading distributions and, in the following subsection, extend these results to the case of continuous fading distributions. The approximate capacity results for discrete fading distributions are obtained by adapting and improving on the outer bounding techniques for the CCDP in [3]. The outer bound in [3] holds for a compound channel and is adapted to the WFFD-RCSI channel by letting each fading sequence in the typical set of P A be a compound user. Since the distribution of the typical sequences is roughly uniform over the typical set, it is possible to adapt the compound channel approach to the fast fading scenario, although one needs to carefully account for the fact that the number of "virtual" compound users grows with the block-length N .
Another novel ingredient which allows for a crucial tightening of the outer bounds in [3] is obtained from the following observation.
Lemma IV.1. The capacity of the WFFD-RCSI channel is decreasing in the parameter c, the gain of the fading-times-state term.
Proof: The proof is presented in App. B.
Although seemingly straightforward, Lem. IV.1 is key in tightening the outer bounds presented in the remainder of the section.

A. WFFD-RCSI channel with a discrete fading distribution
In this subsection, we first derive the approximate capacity for the case in which A is uniformly distributed over {−1, +1}, so as to introduce the main inner and outer bounding techniques. This result is then extended to the case in which A has a mode larger than one half and to the case in which A is uniformly distributed over a set of increasingly spaced points.
1) WFFD-RCSI channel with antipodal, uniform fading: We start by showing that, when A is a uniform antipodal sequence, one can adapt the result in (13) for the 2-user CCDP channel to obtain an outer bound for the WFFD-RCSI channel. This is despite the fundamental difference between the two models: the CCDP channel is a slow fading model while the WFFD-RCSI channel is a fast fading model. In the former, a fading value is randomly chosen and kept fixed throughout the transmission block-length while, in the latter, the fading value is randomly drawn at each channel use.

Theorem IV.2. Outer bound for the WFFD-RCSI channel with antipodal uniform fading. Consider the WFFD-RCSI channel in which A is uniformly distributed over the set {−1, +1}; then, the capacity C is upper bounded by considering a channel with parameter c' = min{√P + 1, c} ≤ c. By Lem. IV.1, such a channel has a larger capacity than the original channel, and thus provides a valid, tighter outer bound to the capacity of the original channel.

Theorem IV.3. Outer bound and approximate capacity for the WFFD-RCSI channel with antipodal uniform fading.
Consider the WFFD-RCSI channel in which A is uniformly distributed over the set {−1, +1}; then, the capacity C is upper bounded as in (28), and the exact capacity C is to within a gap of G^RCSI bpcu from this outer bound.
Proof: The proof is a continuation of the proof of Th. IV.2 in App. C and can be found in App. D.
Capacity in Th. IV.2 can be approached by the inner bound represented in Fig. 4 using the notation in [12], in which the channel input is the superposition of two codewords: let α ∈ [0, 1] and ᾱ = 1 − α, then • The bottom codeword X^N_PAS, at power αP, conveys the message W_PAS at rate R_PAS by treating the fading-times-state sequence cA^N S^N as additional noise.
• The top codeword X^N_SAN, at power ᾱP, conveys the message W_SAN at rate R_SAN and is pre-coded against the state sequence cS^N as in the WDP channel.
Since the codeword X^N_PAS treats the fading-times-state as noise, it attains the rate in (29).
Fig. 4. The inner bound approaching capacity in Th. IV.2, represented using the notation in [12]: nodes represent codewords, solid arrows superposition coding and dotted arrows binning.
On the other hand, since A^N is equal to one for half of the time on average, the codeword X^N_SAN is correctly pre-coded against cA^N S^N for half of the time as well. Accordingly, we have (30), as argued in (26), and the total transmission rate is given in (31). The expression in (31) can be optimized over the power allocation ratio α: this optimization yields an expression identical to that of (28) minus 1/2 bpcu.
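The optimization over α can be sketched numerically. Since the exact expressions (29)-(31) are not reproduced here, the formulas below are our reconstruction under stated assumptions: the bottom layer sees the top layer plus the fading-times-state (variance c^2, since A^2 = 1 for antipodal fading) as noise, and the top layer earns the factor 1/2 from being correctly pre-coded half of the time.

```python
import math

def superposition_rate(alpha, P, c):
    """Sketch of the two-layer rate for antipodal uniform fading: the
    bottom codeword (power alpha*P) treats the top layer and the
    fading-times-state (variance c^2) as noise; the top codeword
    (power (1-alpha)*P) is pre-coded against c*S^N and is decodable,
    on average, for half of the channel uses."""
    ab = 1 - alpha
    r_pas = 0.5 * math.log2(1 + alpha * P / (1 + ab * P + c ** 2))
    r_san = 0.5 * 0.5 * math.log2(1 + ab * P)
    return r_pas + r_san

def optimize(P, c, steps=1000):
    """Grid search over the power allocation ratio alpha."""
    return max(superposition_rate(i / steps, P, c) for i in range(steps + 1))

P, c = 100.0, 5.0
best = optimize(P, c)
```

The optimizing α trades power between state-as-noise transmission and half-time pre-cancellation, mirroring the structure of the bound in (28).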
As for the result in Th. III.1, the gap in Th. IV.3 can be tightened by numerically evaluating inner and outer bounds for the given fading distribution, as shown in Fig. 5. In Fig. 5 we also include the performance of different Gaussian coding strategies inspired by the result in [1] which illustrate the decay in performance of jointly Gaussian strategies in the presence of fading.
As in Th. II.3, one can consider the class of strategies in which P UX|S in (10) is restricted to jointly Gaussian distributions.
This class of strategies attains the rate in (32) for any ρ_US, ρ_UX and ρ_XS in (15). More precisely, Fig. 5 depicts: • the outer bound R^OUT_CC (CC as in "Carbon Copy", in reference to [3]) in (28), • the inner bound R^IN_SAN (SAN as in "State As Noise") in which the fading-times-state is treated as noise, • the inner bound R^IN_GA (GA as in "Gaussian Assignment") for jointly Gaussian [X U S] in (32), and finally • the inner bound R^IN_Costa, corresponding to the optimal assignment in (32) when ρ_XS = 0, that is, when the input is independent of the state as in [1].
The result in Th. IV.3 can be extended beyond the case of binary support and beyond the case of uniform distribution: both extensions are considered in the remainder of this subsection. It must be noted that the outer bounding technique utilized in Th. IV.3 relies on the fact that P_A is a discrete distribution, for which one can define typical sets using strong typicality. As we shall see, extensions to the case of continuous fading distributions are generally harder to tackle.
2) WFFD-RCSI channel with a discrete fading distribution with mode larger than one half: Although valid only for one discrete fading distribution, the result in Th. IV.3 is the first characterization of capacity for the WFFD-RCSI channel. Building upon this result, in this section we derive the approximate capacity for the WFFD-RCSI channel in which P_A is a discrete distribution with mode larger than one half. From a high-level perspective, for this class of fading distributions the transmission strategy in Fig. 4 is still as effective as in Th. IV.3, since full state cancellation can be attained for more than half of the time by having the transmitter pre-code against the mode of A times the state.
Theorem IV.4. Outer bound for the WFFD-RCSI channel with a discrete fading distribution with mode larger than one half. Consider a WFFD-RCSI channel in which A has a discrete distribution with mode larger than one half; then, the capacity C is upper bounded as in (34), and the exact capacity C lies to within a bounded gap from this outer bound.
Proof: The proof can be found in App. E.
Note that the gap between inner and outer bound in Th. IV.4 depends only on the fading distribution and is otherwise independent of the other channel parameters P and c.
The inner bound in Th. IV.4 is a variation of the attainable scheme in Fig. 4 in which the top codeword is pre-coded against the mode of the fading times the state, cmS^N.
It is interesting to notice that the result in Th. IV.4 depends on the mean of the fading, μ_A: one might expect that a better rate could be attained by pre-coding the base codeword against the sequence cμ_A S^N, so as to remove the dependency of (35) on μ_A. Unfortunately this is, in general, not the case, as argued at the beginning of the section when discussing the performance of DPC in the WFFD-RCSI channel.
The result of Th. IV.4 can be readily evaluated for some simple discrete distributions.
Lemma IV.5. Gap from capacity for some discrete fading distributions.
• Geometric distribution. Consider the case in which A is distributed according to a geometric distribution.
• Binomial distribution. Consider now the case in which A has a binomial distribution. For this distribution, we can approximate G^RCSI_B using the Gaussian approximation of the binomial distribution, while G'^RCSI_B can be easily evaluated from the fact that the distribution's mode is at zero.
Lemma IV.6. Outer bound and approximate capacity for the WFFD-RCSI channel with a discrete fading distribution with mode larger than one half. Consider a WFFD-RCSI channel in which A has a discrete distribution with mode larger than one half as in Th. IV.4 and assume, moreover, that the condition in (41) holds; then, the capacity C is upper bounded as in (40) and the exact capacity is to within a gap of 2 bpcu from the outer bound in (40).
Proof: The proof is provided in App. F.
We refer to the condition in (41) as the "strong fading condition". In this regime, we show that the encoder cannot simultaneously pre-code the input against multiple fading realizations, and thus it can effectively cancel the channel state only by pre-coding the input against the state sequence ckS^N for some k ∈ A. This strategy, as shown in (26), can only attain a fraction 1/M of the point-to-point capacity.
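The 1/M scaling can be made concrete by comparing the two extreme strategies: pre-coding against one of the M equiprobable realizations versus treating the fading-times-state entirely as noise. The formulas below are our hedged sketch (second moment of the fading normalized to m2 = 1 for simplicity), not the exact bounds of the theorem.

```python
import math

def precode_one_realization(P, M):
    """Pre-code against c*a_k*S^N for one of M equiprobable fading values:
    full state cancellation for roughly a fraction 1/M of the channel
    uses, as in (26)."""
    return (1 / M) * 0.5 * math.log2(1 + P)

def treat_state_as_noise(P, c, m2=1.0):
    """Treat c*A*S as extra noise of variance c^2 * E[A^2]; the second
    moment m2 = E[A^2] is assumed normalized to 1 here."""
    return 0.5 * math.log2(1 + P / (1 + c ** 2 * m2))

P = 1000.0
for M, c in ((2, 10.0), (8, 10.0), (2, 2.0)):
    r1 = precode_one_realization(P, M)
    r2 = treat_state_as_noise(P, c)
    print(M, c, round(r1, 2), round(r2, 2))
```

For small M and large c, pre-coding against one realization wins; as M grows, the 1/M factor makes treating the state as noise preferable, which is the tension the strong-fading outer bound resolves.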
At first glance, the condition in (41) does not lend itself to an intuitive or operational interpretation. However, the deterministic binary linear approximation of Gaussian networks in [13] can be used to glean some important intuitions on the nature of this result. The deterministic approximation technique of [13] focuses on the interaction between the different components of the channel output through a powerful visualization: we briefly introduce the model here solely for illustrative purposes; more details can be found in [13] and in the related literature. Consider the channel in which S is a binary matrix with S ij = δ i−1,j for (i, j) ∈ [1 . . . m] 2 . Through the approximation in Fig. 8 we provide an intuitive explanation of the "strong fading" condition in (41). This result is similar in spirit to our result in [10] for the WFFD channel with slow fading, that is, for the channel in which the fading coefficient is constant throughout the transmission block-length.
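The role of the shift matrix S can be made concrete with a minimal sketch (the vector length m and the input bits below are arbitrary illustrative choices): multiplying by S shifts the binary expansion of the input down by one position, which is how the deterministic model of [13] captures the interaction of signals at different power scales.

```python
def shift_matrix(m):
    # S_ij = delta_{i-1,j}: the down-shift matrix of the deterministic model
    return [[1 if i - 1 == j else 0 for j in range(m)] for i in range(m)]

def apply_mod2(S, x):
    # matrix-vector product over GF(2)
    return [sum(s * xj for s, xj in zip(row, x)) % 2 for row in S]

S = shift_matrix(4)
x = [1, 0, 1, 1]          # most significant bit first
y = apply_mod2(S, x)      # bits shifted down by one, lowest bit lost
```

Applying S repeatedly shifts further down, mimicking larger attenuation in the Gaussian channel.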
Theorem IV.7. Outer bound and approximate capacity for the WFFD-RCSI channel in the "strong fading" regime.
Consider a WFFD-RCSI channel in which A is uniformly distributed over a discrete set A with 1/(c − 1) < a 1 < a 2 < . . . < a M and a i+1 − a i > 1/c for each i; then, if a i+1 ≥ ca i , the capacity C is upper bounded as
and the exact capacity lies to within a gap of max{G RCSI
Proof: The proof is provided in App. G.
The achievability proof in Th. IV.7 again relies on the superposition of two codewords: one treating the fading-times-state term as noise and one pre-coded against one realization of the fading times the state sequence, as in the proofs of Th. IV.3 and Th. IV.4. The pre-coding of the top codeword allows for perfect state pre-cancellation only for a fraction 1/M of the time, and no better pre-coding is possible under the condition in (41).
Remark IV.8. The result in Th. IV.7 can be generalized to the case in which a i+1 ≥ κca i for some κ ∈ R + . In this case the gap between inner and outer bound becomes a function of κ, and this gap increases as κ decreases. We choose to focus on the case κ = 1 for the sake of clarity and ease of exposition.
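A hedged numerical sketch of a support satisfying the hypotheses of Th. IV.7 (the values of c, M and a 1 below are arbitrary choices): a geometrically spaced set with a i+1 = c·a i and a 1 > 1/(c − 1) meets both the growth condition a i+1 ≥ ca i and the spacing condition a i+1 − a i > 1/c.

```python
c, M = 2.0, 4
a1 = 1.5 / (c - 1)                      # strictly larger than 1/(c-1)
A = [a1 * c**i for i in range(M)]       # geometric spacing: a_{i+1} = c * a_i
growth = all(A[i + 1] >= c * A[i] for i in range(M - 1))
spacing = all(A[i + 1] - A[i] > 1 / c for i in range(M - 1))
```

With c = 2 this yields the support {1.5, 3, 6, 12}, for which both conditions hold.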
Lemma IV.9. Gap from capacity for a fading distribution. As an example of the result in Th. IV.7, consider the case in which A is uniformly distributed over the set

B. WFFD-RCSI channel with a continuous fading distribution
The results derived so far are limited to the case of discrete fading distributions: although relevant from a theoretical standpoint, this case is not particularly meaningful in practical applications. In this subsection we investigate how the bounding techniques developed so far can be applied to the case of continuous fading distributions.
The results obtained so far all rely on a rather intuitive attainable strategy in which part of the transmitter power is used to send a codeword that treats the fading-times-state term as additional noise, while the remaining power is used to pre-code against one realization of the fading times the state. We conjecture that this strategy also approaches capacity in the case of a continuous distribution and we thus focus on the derivation of outer bounds to match this inner bound expression.
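The two-codeword strategy can be sketched numerically. The model below is a rough caricature, not the paper's exact rate expressions: it assumes unit state power, assumes the base codeword sees the fading-times-state term (variance c²) plus the top codeword's power αP as Gaussian noise, and assumes the pre-coded top codeword enjoys a clean dirty-paper rate only for a fraction 1/M of the time, as in the discrete case.

```python
import math

def sketch_rate(P, c2, alpha, M):
    # base codeword: treats alpha*P and the fading-times-state term as noise
    r_san = 0.5 * math.log2(1 + (1 - alpha) * P / (1 + alpha * P + c2))
    # top codeword: clean DPC rate, assumed useful only ~1/M of the time
    r_pas = (1.0 / M) * 0.5 * math.log2(1 + alpha * P)
    return r_san + r_pas

# coarse sweep of the power split alpha for arbitrary parameters
best = max(sketch_rate(10.0, 20.0, a / 100, 4) for a in range(101))
```

Even in this caricature, an interior power split can beat both extremes (all power to either codeword), which is the behavior the bounds in this section try to capture.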
We first derive an outer bound along the same lines as the outer bound in Th. IV.3 and argue that this outer bound is tight when the loss from binning is relatively small.
A more complex set of outer bounds is then derived which relies on upper bounding the capacity of a WFFD-RCSI channel with continuous fading by the capacity of a WFFD-RCSI channel with discrete fading. The gap between the capacities of the two models can be made small by letting the discrete distribution be an appropriately quantized version of the continuous distribution. More specifically, the capacity of the original channel is enhanced by providing genie-aided side information at the receiver of the form in (47), where the quantized version of the fading sequence A N is employed, so that E N has the form of a quantization error.
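The genie-aided decomposition can be illustrated with a minimal sketch (the grid step, seed and sample count are arbitrary choices): quantizing each fading sample to a uniform grid splits it into a discrete part plus an error term bounded by half the step, which plays the role of E N above.

```python
import random

random.seed(0)
step = 0.25

def quantize(a):
    # nearest point on a uniform grid of spacing `step`
    return round(a / step) * step

A = [random.gauss(0.0, 1.0) for _ in range(1000)]
A_hat = [quantize(a) for a in A]
E = [a - ah for a, ah in zip(A, A_hat)]   # quantization error: |E_i| <= step/2
```

Shrinking the step tightens the error bound, which is how the gap between the continuous and discretized channels is controlled.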
and the exact capacity lies to within a gap of max{G RCSI
Proof: The proof is provided in App. H.
Note that the expression in (49a) is equal to infinity for all discrete distributions and might, similarly, be infinite or undefined for some continuous distributions. Additionally, the term in (49b) is small only when most of the probability is concentrated in a small interval of size 1/c. For this reason we wish to develop alternative approaches to the derivation of the approximate capacity for the case of continuous fading distributions.
Using the genie-aided bounding technique in (47),
Proof: The proof is provided in App. I.
• Continuous uniform fading. We begin by considering the case in which A is uniformly distributed over a continuous interval, as depicted in Fig. 12(a). According to Th. IV.11, the outer bound in (28) is to within a gap G ∆ U of the capacity of the original channel. Note that the above derivation can also be repeated for the case in which the distribution has mean µ A by using the result in Th. IV.4.
• Gaussian distributed fading. Consider the case in which A is Gaussian distributed with mean zero and variance one: in this case we can consider the following quantized version of A, also depicted in Fig. 12(c), where the parameters b i , the edges of the quantization intervals, and the parameters a i , the centroids of the quantization regions, are each obtained through a recursion. The values b i are chosen so that each value a i has probability 1/M . The term ǫ controls the spread of the centroids and is chosen so as to obtain a unit-variance variable.
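The construction implemented by these recursions can be sketched directly: edges b i placed so that every cell has probability 1/M, centroids a i equal to the conditional means of the cells. M = 4 is an arbitrary choice, and the final rescaling by ǫ to restore unit variance is omitted in this sketch.

```python
import math
from statistics import NormalDist

def equiprob_quantizer(M):
    nd = NormalDist()
    # edges b_i chosen so that each quantization cell has probability 1/M
    b = [nd.inv_cdf(i / M) for i in range(1, M)]
    edges = [-math.inf] + b + [math.inf]

    def phi(x):  # standard normal density, vanishing at +/- infinity
        return 0.0 if math.isinf(x) else math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

    # centroid of each cell: E[A | A in cell] = (phi(lo) - phi(hi)) / (1/M)
    a = [(phi(edges[i]) - phi(edges[i + 1])) * M for i in range(M)]
    return b, a

b, a = equiprob_quantizer(4)
```

By symmetry the quantizer has zero mean, so only a scalar rescaling is needed to match the unit variance of A.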
Since the quantization intervals each capture a fraction 1/M of the probability, and since the cdf of the Rayleigh distribution approaches one exponentially fast, the spacing between the centroids increases exponentially. Let P A (I) denote the probability of the interval I and P A (I) = 1 − P A (I); then the capacity C is upper bounded as in (56) for some constant k which does not depend on the channel parameters but only on P A . Moreover, the exact capacity lies to within a finite gap G conj (which again depends only on P A ) from the outer bound in (56).
The achievability of the outer bound proposed in Conj. IV.1 is simple: as in the proofs of all the theorems so far, the transmitter sends the superposition of two codewords: a base codeword treating the interference as noise and a top codeword which is Costa pre-coded against ca ′ S N . In this case, the chosen fading realization a ′ is the one around which most of the probability is concentrated in an interval of size proportional to 1/c. Unfortunately, we are currently unable to provide a matching outer bound for this attainable rate.

V. CONCLUSIONS
In this paper we study the capacity of the Writing onto Fast Fading Dirt (WFFD) channel, a variation of the classic "writing on dirty paper" channel in which the state sequence is multiplied by an ergodic fading sequence. We focus on the case in which the state is an iid Gaussian sequence while the fading is an iid sequence with either a discrete or a continuous distribution.
We study both the case in which the fading sequence is unknown at both the transmitter and the receiver (the "No Channel Side Information" WFFD-NCSI channel) and the case in which the fading sequence is known only at the receiver (the "Receiver Channel Side Information" WFFD-RCSI channel).

APPENDIX A

The outer bound in (18) is obtained by providing the sequence S N to the receiver as side information: since the sequence A N is unknown at both the transmitter and the receiver, providing S N as side information does not drastically increase the capacity of the genie-aided channel.
In the inner bound, the transmitter simply pre-codes its transmission against the sequence cµ A S N as in the WDP channel.
• Capacity outer bound. Using Fano's inequality we write: where (57a) follows from the GME property given that A ⊥ X and Var[A] = 1 by definition and from the fact that

Equation (57b) follows from Jensen's inequality. Note that the mean of A does not influence the bound in (57). For the term
where (58a) follows from the Markov chain
When P A is continuous, the term −H(S j A j |S j ) can be rewritten as
where γ ≈ 0.577 is the Euler–Mascheroni constant. Combining the bounds in (57) and (58) we obtain the expression in (18b).
• Capacity inner bound. Consider Costa's dirty paper coding strategy which pre-cancels the sequence cµ A S N while disregarding the remaining randomness in the fading. This can be attained with the assignment in (10), which attains the rate R IN (k) for which
where (59) is obtained by upper bounding H(U |Y ) in (10) using the GME property. The optimal choice of k in (59) is
which achieves R IN , as expected.
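A rough numerical sanity check of why pre-coding against cµ A S N helps. This sketch is not the paper's rate expressions: it assumes unit state power, Var[A] = 1, and treats the residual c(A − µ A )S term as Gaussian noise of variance c².

```python
import math

def rate_treat_as_noise(P, c2, mu2):
    # whole fading-times-state term as noise: variance c^2 * E[A^2] = c^2 * (1 + mu^2)
    return 0.5 * math.log2(1 + P / (1 + c2 * (1 + mu2)))

def rate_precode_mean(P, c2, mu2):
    # DPC against c*mu_A*S^N: residual c(A - mu_A)S has variance c^2 * Var[A] = c^2
    return 0.5 * math.log2(1 + P / (1 + c2))

P, c2, mu2 = 10.0, 4.0, 1.0
```

Pre-coding against the mean removes the µ A² contribution from the effective noise, so its advantage vanishes exactly when the fading has zero mean.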
• Gap between inner and outer bound. First note that, if c 2 ≤ 3, a trivial outer bound to capacity is
so that the gap between this outer bound and the inner bound in (61) is
When c 2 > 3, by comparing the outer bound expression in (18b) and the inner bound expression in (61), we have that the difference between the two expressions is
where (63a) follows from the assumption that c 2 > 3.

APPENDIX B
PROOF OF TH. IV.1

Consider two sequences S N 1 and S N 2 obtained as iid sequences from the distributions N (0, Q m ), m ∈ {1, 2}, with Q 1 + Q 2 = 1 and independent of each other. In the WFFD-RCSI channel, the interference sequence S N can be equivalently written as:
The outer bound in (27) is obtained by adapting the derivation of the outer bound in (13) to the fast fading scenario. This is attained by defining a "conjugate" fading sequence a N (a N ) = −a N and combining the negative entropy terms, containing the channel input given the transmitted message, obtained from Fano's inequality for A N = a N and A N = a N .
• Capacity outer bound. Using Fano's inequality we write
For the term H(Y j |A j ), regardless of j, we have
where (67a) follows from the Gaussian Maximizes Entropy (GME) property, by letting X Gj and S Gj be jointly Gaussian RVs with correlation ρ XS and Var[X] = P . The expression obtained with this bounding can be optimized over the correlation ρ XS between X Gj and S Gj to obtain:
where the maximum is attained for ρ XS = 0.
We next wish to bound the term H(Y N |A N , W ) in (66): to do so, let d(a N ) be the decimal representation of the antipodal sequence a N obtained by letting −1 → 0 and 1 → 1, i.e.
With the notation in (68) we write:
where a N = −a N . The equality in (69a) follows from the fact that, if d(a N )
where (70a) follows from a transformation with Jacobian equal to one. Equation (70b) follows from the fact that W ⊥ S N ; (70c) from the fact that X N is a function of W and S N and from the Markov chain W − (X N , S N ) − Y N . Next, we observe that the 2 N −1 terms in the summation in the RHS of (70d) are all identical and equal to 1/2 log(2πe4c 2 ) + 1/2 log(2πe), so that
from which it follows that the outer bound in (66) can be rewritten as
The outer bound in (28) is obtained by employing Lem. IV.1 to argue that the outer bound in Th. IV.2 can be tightened by setting the fading-times-state gain to a value c * < c.
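The map d(·) and the conjugate sequence can be sketched directly (the length-3 sequence below is an arbitrary example). Note that d(a N ) + d(−a N ) = 2^N − 1, which is the property that pairs each fading sequence with its conjugate.

```python
def d(a):
    # decimal representation of an antipodal sequence: -1 -> 0, +1 -> 1
    bits = [(x + 1) // 2 for x in a]
    return sum(bit << i for i, bit in enumerate(reversed(bits)))

a = [1, -1, 1]              # bits 101 -> 5
a_bar = [-x for x in a]     # conjugate sequence, bits 010 -> 2
```

Since negating the sequence flips every bit, the two decimal indices always sum to the all-ones value 2^N − 1.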
The inner bound approaching the expression in (28) is derived by considering the transmission strategy in Fig. 4 and optimizing the power allocated to each codeword in (31).
therefore, if c 2 ≥ P + 1, then the minimum of (28) is attained for c * = c. On the other hand, if 0 ≤ c 2 < P + 1, the optimal value is c * = √ P + 1. For the case of c 2 < 1, instead of the outer bound in Th. IV.2, we consider the capacity of the point-to-point channel as a trivial outer bound. In the regime where the gain of the fading-times-state term is small, capacity can be approached by simply treating the fading-times-state term as additional noise.
• Capacity inner bound. The transmission strategy in Fig. 4 achieves the rate region while the overall transmission rate is obtained as R = R SAN + R PAS .
For the scheme in (74), consider the assignment
for any α ∈ [0, 1], with α = 1 − α. The assignment in (75) attains
Note that the term I(Y ; U PAS |A = −1) − I(U PAS ; S) can take negative values: this corresponds to the case in which the rate necessary to create redundant codewords to perform binning is larger than the information rate. Fortunately, we can bound this term as follows:
which is always verified.
From (77) and (78) we obtain that, for a given α, the attainable rate of the scheme in Fig. 4 satisfies
The derivative of the lower bound in the RHS of (79) with respect to α is
therefore, if c 2 > P + 1, then α = 1 is optimal; if 1 ≤ c 2 < P + 1, the optimal α is (c 2 − 1)/P while, if c 2 < 1, the optimal value is α = 0. We then conclude the optimization of the parameter α, i.e. the power allocated to each codeword in the scheme in Fig. 4.

APPENDIX E
PROOF OF TH. IV.4

The outer bound in (35) is obtained by generalizing the outer bound derivations of Th. IV.2 and Th. IV.3. This requires two distinct bounding techniques: first, we expand the bounding in (69a) by carefully defining a "pairing" of the fading sequences a N and a N so as to make it possible to repeat this derivation for a more general fading distribution. After this first step, the proof proceeds with a further and rather interesting generalization of the bounding in (70). Typicality of the fading sequences is used to create a set of "virtual" compound users as in the CCDP channel. The number of typical sequences grows with N but, perhaps surprisingly, this does not hamper the generalization of the outer bound.
Finally, Lem. IV.1 is again used to tighten the outer bound expression over the gain of the fading-times-state term.
The inner bound derivation is rather simple, as we employ a modification of the scheme in Th. IV.3 in which the base codeword X N PAS is also pre-coded against the sequence cmS N .
• Capacity outer bound. Using Fano's inequality we write
where (82a) follows from the GME and (82b) follows from Jensen's inequality. Next we derive a bound on H(Y N |W, A N ) based on the typical realizations of the fading process, that is, for those sequences a N such that
where N (k|a N ) is the number of symbols k ∈ A in the sequence a N , that is
Accordingly, the ǫ-typical set T N ǫ (P A ) is defined as the set of a N ∈ A N which satisfy (83), i.e.:
Using letter-typicality (also referred to as strong typicality) as in (83), we write:
Let now the block-length N be sufficiently large such that ǫ ≤
With this definition of a N , we next introduce the channel output Y N given A N = a N as
where Z N has the same marginal distribution as Z N but an arbitrary joint distribution with it.
If a N is in the typical set T N ǫ (P A ) as defined in (85), so is a N , since a N is a permutation of a N and the definition in (85) does not depend on the ordering of the elements. Since A N is an iid sequence, we also have that P (a N ) = P (a N ) and, from its definition, the mapping a N (a N ) is a one-to-one mapping of the typical set onto itself, and thus
With these observations, we can now consider the output Y N |A N = a N together with the output Y N |A N = a N and bound the negative entropy term in (86) as in (69):
where (89b) follows from the fact that S N and the additive noises are independent of W . The passage in (89c) follows as detailed next. To bound the term in (89c) we can make use of the following properties of typical sets:
for δ ǫ = 2|A|e −N ǫ 2 min k P A (k) .
We can use the properties of the typical sets in (90) to bound the probability and cardinality of the typical sequences in the summation in (89c): more specifically, using (90a), we write:
where (91a) follows from the bound in (90a) while (91b) follows from the fact that S N , Z N and Z N are iid sequences.
Since we have set Z i ⊥ Z i in (89c), we have where (93b) follows from the cardinality bound in (90b) and (93c) from the definition in (85). When N is sufficiently large and ǫ sufficiently small, we then have that for some ǫ all that goes to zero as N → ∞.
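The letter-typicality condition in (83) can be checked mechanically. In the sketch below the alphabet, pmf, tolerance and test sequences are arbitrary choices: a sequence is declared typical when every symbol's empirical frequency is within a relative ǫ of its probability.

```python
from collections import Counter

def is_letter_typical(seq, pmf, eps):
    # |N(k | a^N)/N - P_A(k)| <= eps * P_A(k) for every symbol k
    N = len(seq)
    counts = Counter(seq)
    return all(abs(counts.get(k, 0) / N - p) <= eps * p for k, p in pmf.items())

pmf = {0: 0.5, 1: 0.5}
balanced = [0, 1] * 50          # empirical type exactly matches the pmf
skewed = [0] * 90 + [1] * 10    # empirical type far from the pmf
```

For large N, iid draws from the pmf land in the typical set with probability approaching one, which is what makes the restriction to typical fading sequences essentially lossless.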
Using the bound in (94) in (82b), for some ǫ all sufficiently small as N → ∞, we obtain
We next optimize the above expression over the parameter c * 2 in the set [0, c 2 ] using the result in Th. IV.1. The optimal choice of c * 2 in (95) is
When P m c 2 (1 + µ 2 A ) ≥ P m (1 + P ), the optimal choice of c * in (96) yields the outer bound
where h 2 (x) in (97a) indicates the binary entropy function. With the bound in (97), we obtain the outer bound in (35).
• Capacity inner bound. The inner bound that approaches the outer bound in (35) is the same as in Th. IV.3, but where the top codeword is pre-coded against the sequence cmS N . With the same assignment as in (75), but for U SAN in (75a) defined as
we obtain
where the latter term is bounded as
With the bounding in (99) we obtain the achievable rate
where (100b) follows from the fact that the bound in (99) holds for any P , and thus also for αP . The optimal value of αP in (100b) is
so that, when P m ≤ P m c 2 (1 + µ 2 A ) ≤ P m (P + 1), we have
This shows the achievability of the outer bound.
• Gap between inner and outer bound. A gap between inner and outer bound of 3 bits in the interval P m c 2 (1+µ 2 A ) > P m can be obtained by comparing the two expressions in (35) and (102) in the cases i) P m ≤ P m c 2 (1+µ 2 A ), ii) P m ≤ P m c 2 (1+µ 2 A ) ≤ P m (P + 1) and iii) P m c 2 (1 + µ 2 A ) > P m (P + 1).

APPENDIX F
PROOF OF LEM. IV.6

The condition in (39) guarantees that the gap G ′RCSI G is at most one bit. Under this condition, the outer bound expression also takes a simpler form, since the optimization over c is no longer necessary.
• Capacity outer bound. If P < 1 or c 2 < 1, then the capacity is at most 1 bpcu: for this reason we assume that P ≥ 1 and c 2 ≥ 1 in the following. From (94a) we have the outer bound expression
which is obtained by letting Z ⊥ Z in (92a).
• Capacity inner bound and gap from capacity. The inner bound in (99) can be rewritten as
log( (a − m)^2 c^2 P/(P + 1) + (P + c^2 a^2 + 1)/(P + 1) ).
When P + 1 > a^2 c^2 for all a ∈ A, and since P ≥ 1, the term
log( (a − m)^2 c^2 P/(P + 1) + (P + c^2 a^2 + 1)/(P + 1) )
can be bounded by a constant, which shows the desired gap between inner and outer bounds.
The definition in (104) is graphically presented in Fig. 14 for the case A = {α 1 , α 2 , α 3 } with α 1 ≤ α 2 ≤ α 3 :
• The portion of a N equal to α 1 is equal to α 2 in a N (1) and equal to α 3 in a N (2) .
• The portion of a N equal to α 2 is equal to α 3 in a N (1) and equal to α 1 in a N (2) .
• The portion of a N equal to α 3 is equal to α 1 in a N (1) and equal to α 2 in a N (2) .
Note that, as for the definition of a N (a N ) in Fig. 13, we have that P (A N = a N ) = P (a N (k) (a N )) and a N ∈ T N ǫ =⇒ a N (a N ) ∈ T N ǫ , since all the symbols in A are equiprobable.
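The key property used above — that the cyclic relabeling maps the typical set onto itself when the symbols are equiprobable — can be verified on a toy sequence (the symbols and length below are arbitrary):

```python
from collections import Counter

A = ['a1', 'a2', 'a3']
rot = {A[i]: A[(i + 1) % len(A)] for i in range(len(A))}   # a_i -> a_{i+1}, cyclically

seq = ['a1'] * 4 + ['a2'] * 4 + ['a3'] * 4   # equiprobable empirical type
seq_rot = [rot[s] for s in seq]              # relabeled sequence a^N_(1)
```

Since every symbol has the same probability, the relabeled sequence has exactly the same empirical type, hence the same probability and the same typicality status.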
Accordingly, we conclude that
as in (87), for Z N (k) being an iid Gaussian sequence with zero mean and unitary variance, as in (106).
A first bounding of capacity follows the same steps as in (82) and results in
where (107b) follows from Jensen's inequality. In the following we focus only on the bounding of the negative entropy term −H(Y N |W, A N ): this bounding involves a recursion which we illustrate first for the case M = 3 and then for general M .
Again for A = {α 1 , α 2 , α 3 } and, as in (92a), we have that the term in (107b) can be rewritten as
Using the definition of a N (k) (a N ) in (104), we write
Consider the set
from the definition in (104), this set consists of a permutation of the sequence
where Z N 12 is obtained as a combination of the additive noise terms. Continuing the series of inequalities in (108), we have
where (112b) follows from the observation in (110) and (112c) follows again from the fact that this transformation has unitary Jacobian. As in (113), we have that
coincides with the vector c(a 3 − a 1 )S N + Z N 13 , for Z N 13 defined similarly as in (111),
where (116a) follows from the fact that a 1 > 1/(c − 1). Similarly,
−N H( [c^2 (a 3 − a 1 )(a 2 − a 1 )(1 − c(a 2 − a 1 ))/(c^2 (a 2 − a 1 )^2 + 1)] S + Z 13 − [c^2 (a 3 − a 1 )(a 2 − a 1 )/(c^2 (a 2 − a 1 )^2 + 1)] Z 12 )
≤ −(N/2) log( 2πe (1 + c^2 (a 3 − a 1 )^2)/(1 + c^2 (a 2 − a 1 )^2) )
where, in (117a), we have used the fact that a 3 > ca 2 implies (a 3 − a 1 ) > c(a 2 − a 1 ), and the fact that c^2 (a 2 − a 1 )^2 > 1 by assumption.
The term in (115a) only contains independent noise terms, so that
We can now write
−3H(Y N |A N = a N ) ≤ −2 (N/2) log( 2πe (c^2 + 1) ).
• The passage in (115) is repeated M − 2 more times to successively remove the terms ∆ i S N + Z i(i−1) from the negative entropy expression in (121b) so that where Z i,i+1 is defined analogously to Z 12 in (111).
Each term H(∆ i S + Z i |∆ 1 S + Z 1 , . . . , ∆ i−1 S + Z i−1 ) in (123) can be evaluated as
The condition in (41) can be used to bound the term in (124) as
a j > ca j−1 =⇒ a j − a 1 > c(a j−1 − a 1 ).

APPENDIX H

This proof is an adaptation of the proof of Th. IV.4 in App. E to the case of a continuous fading distribution. Instead of carefully defining a "pairing" of sequences as in Fig. 13, we simply consider a randomly chosen sequence to bound the negative entropy terms. This implicitly weakens the outer bound but, on the other hand, provides an outer bound which is easier to compute.
• Capacity outer bound. Using Fano's inequality we write
where Y ′N in (128a) is an equivalent channel output with [Y ′ A ′ ] ∼ [Y A]. We continue the bounding of the entropy terms, where (129a) holds only for continuous distributions. By combining (129b) with (128a) we obtain the outer bound in (48), after optimizing over c * ∈ [0, c]. Note that (129a) requires a continuous fading distribution.
• Capacity inner bound. In the inner bound derivation we adapt the achievable scheme in Th. IV.4 and the bounding in (99) to the case of a continuous fading distribution to obtain the attainable rate
By optimizing over α we obtain the desired result, as for the previous theorems.
APPENDIX I
PROOF OF LEM. IV.11
Let A ∆ be any discrete RV and define
for Z i ∼ N (0, 1) iid. Consider now the case in which E N in (131) is provided to the receiver as side information; then
where, in (132), we define Y N = X N + cA N S N + Z N .
Now, for the term I(E N ; W |A N , Y N ), we have
Note that the quantized sequence is a deterministic function of A N , so that we can bound the remaining entropy as
which concludes the proof.