A generalized Ozarow-Wyner capacity bound with applications

In this paper, a generalized Ozarow-Wyner capacity bound is presented that holds for arbitrary noise channels. The bound is then used to approximate the capacity of a large class of additive noise channels that are subject to a p-th moment input constraint, where p is some positive real number, as well as to the Cauchy noise channel with a logarithmic moment constraint. For both channel models the gap to the capacity is precisely specified.


I. INTRODUCTION
One of the crown jewels of Shannon's seminal paper [1] is the characterization of the fundamental limit of reliable communication over an arbitrary memoryless communication channel, known as its capacity. The capacity is given by the mutual information between the channel input and the output, maximized over a feasible set of channel input distributions. Up to the writing of this paper, however, a closed-form solution to this maximization problem is known only for a handful of channel models, such as the additive white Gaussian noise (AWGN) channel. While finding the optimal solution might even be impossible for certain channels, there is hope of finding a solution that is at least approximately optimal. The objective of this work is therefore to provide a simple and yet powerful tool that may lead to "good" capacity approximations. At the heart of our technique is a generalization of the Ozarow-Wyner capacity bound given in [2].

A. Related Work
The original bound of [2] was established by Ozarow and Wyner in order to demonstrate that pulse-amplitude modulation (PAM) signaling can achieve the capacity of an AWGN channel that is subject to an average transmit power constraint to within one bit. This result analytically confirmed an observation made by Ungerboeck in [3]. In [4], the bound of [2] was sharpened, and it was then extended to AWGN channels with more general input constraints in [5]. The bound has also proven to be useful for other problems such as two-user Gaussian interference channels [4], [6], communication with a disturbance constraint [7], energy harvesting problems [8], [9], and information-theoretic security [10]. For the AWGN channel there exist a number of other bounds that use discrete inputs as well (see [11]-[14] and references therein). The advantage of Ozarow-Wyner-type bounds, however, lies in their simplicity, as they only depend on the number of signal constellation points and the minimum distance of the constellation.

B. Contribution and Paper Outline
In Section II, we present an Ozarow-Wyner bound that holds for arbitrary channels (not necessarily additive ones). In Section III, we use this bound to approximate the capacity of additive noise channels that are subject to a p-th moment input constraint, where p is any positive real number. In particular, we provide an approximation that is based on discrete uniformly distributed inputs and precisely determine the gap to the capacity. In Section IV, we show that our technique can also be applied to more exotic additive noise channels such as the Cauchy channel with a logarithmic moment constraint (recall that Cauchy noise has infinite variance). Finally, Section V concludes the paper.

C. Notational Remarks
Deterministic scalar/vector quantities are denoted by lowercase normal/bold letters, random variables by uppercase letters, and random vectors by bold uppercase letters. The trace of a matrix is denoted as Tr(·). The support of a random vector X is written as supp(X). For x ∈ R, sec(x) denotes the secant and Γ(x) the gamma function. The entropy of any discrete X is denoted as H(X), whereas h(X) is used whenever X is continuous. All logarithms are taken to the base e and entropies and mutual informations are measured in nats.

II. A GENERALIZED OZAROW-WYNER BOUND
Let U ∈ R^n be some random vector. Then, for any p > 0, let

‖U‖_p := ( E[‖U‖^p] )^{1/p}.   (1)

For n = 1, we simply have ‖U‖_p^p = E[|U|^p], and so we refer to ‖U‖_p^p as the p-th moment of U in what follows. The following bound on the conditional differential entropy will be useful throughout the rest of the paper.

Theorem 1. Let p ∈ (0, ∞) be arbitrary and g : R^n → R^n a measurable function. Then, for any pair of random vectors U ∈ R^n and V ∈ R^n such that h(U) < ∞ and ‖U‖_p < ∞, we have

h(U | V) ≤ n log( k_{n,p} ‖U − g(V)‖_p ),   (2)

where k_{n,p} > 0 is a constant that depends only on n and p; for n = 1 it is given by k_{1,p} = 2 (pe)^{1/p} Γ(1 + 1/p).

Proof: The proof follows from the proof of [5, Th. 1].

Now, consider any time-discrete channel determined by the conditional probability distribution P_{Y|X}, where X ∈ R^n denotes the length-n channel input sequence and Y ∈ R^n the corresponding sequence of channel outputs.¹ We make use of Theorem 1 in order to obtain a novel bound on the capacity of P_{Y|X} if the inputs are restricted to be discrete, which is a generalization of the Ozarow-Wyner bound given in [2].
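As an aside (a numerical illustration, not part of the original development), the scalar maximum-entropy constant appearing in Theorem 1 can be sanity-checked: with k_{1,p} = 2 (pe)^{1/p} Γ(1 + 1/p), the inequality h(U) ≤ log(k_{1,p} ‖U‖_p) must hold for every scalar U, with equality at p = 2 when U is Gaussian. The closed-form Gaussian moments used below are standard facts.

```python
import math

# Numerical sanity check of the scalar maximum-entropy constant in Theorem 1:
# h(U) <= log(k_{1,p} * ||U||_p) for every U, with equality at p = 2 when
# U is Gaussian (the Gaussian maximizes entropy under a second-moment budget).
def k1p(p):
    # k_{1,p} = 2 * (p*e)^(1/p) * Gamma(1 + 1/p)
    return 2.0 * (p * math.e) ** (1.0 / p) * math.gamma(1.0 + 1.0 / p)

def gaussian_entropy(sigma):
    # differential entropy of N(0, sigma^2) in nats
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)

def gaussian_abs_moment(p, sigma):
    # E[|U|^p] = sigma^p * 2^(p/2) * Gamma((p+1)/2) / sqrt(pi) for U ~ N(0, sigma^2)
    return sigma**p * 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)

sigma = 1.7
for p in (0.5, 1.0, 2.0, 4.0):
    lhs = gaussian_entropy(sigma)
    rhs = math.log(k1p(p) * gaussian_abs_moment(p, sigma) ** (1.0 / p))
    assert lhs <= rhs + 1e-12   # equality holds at p = 2
print("moment-entropy inequality verified")
```

Note that k_{1,2} = √(2πe), which recovers the familiar Gaussian maximum-entropy expression.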
Theorem 2. Let X_D ∈ R^n be a discrete random vector of finite entropy, g : R^n → R^n a measurable function, and p > 0. Furthermore, let K_p be a set of continuous random vectors, independent of X_D, such that for every U ∈ K_p, h(U), ‖U‖_p < ∞, and

h(X_D + U) = H(X_D) + h(U).   (3)

Then,

I(X_D; Y) ≥ H(X_D) − inf ( G_{1,p} + G_{2,p} ),   (4)

where the infimum is over all U ∈ K_p, measurable functions g, and p > 0, and

G_{1,p} := n log( ‖X_D + U − g(Y)‖_p / ‖U‖_p ),   (5)

G_{2,p} := n log( k_{n,p} ‖U‖_p ) − h(U).   (6)

Proof: Let U ∈ K_p, and recall that U and X_D are statistically independent. Then, the mutual information I(X_D; Y) can be lower bounded as

I(X_D; Y) ≥(a) I(X_D + U; Y) = h(X_D + U) − h(X_D + U | Y) =(b) H(X_D) + h(U) − h(X_D + U | Y).   (7)

Here, a) follows from the data processing inequality as X_D + U → X_D → Y forms a Markov chain in that order, and b) from the assumption in (3).
By using Theorem 1, the last term in (7) can be bounded from above as

h(X_D + U | Y) ≤ n log( k_{n,p} ‖X_D + U − g(Y)‖_p ).

¹ For Theorem 2 to be valid we do not need Y to be from R^n. In fact, Y can be an element of any space.
Combining this expression with (7) results in

I(X_D; Y) ≥ H(X_D) + h(U) − n log( k_{n,p} ‖X_D + U − g(Y)‖_p ) = H(X_D) − ( G_{1,p} + G_{2,p} ),

with G_{1,p} and G_{2,p} as defined in (5) and (6), respectively. Maximizing the right-hand side over all U ∈ K_p, measurable functions g : R^n → R^n, and p > 0 provides the bound.
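To make the two gap terms concrete, the following illustration (ours, not the paper's) evaluates them in closed form for the scalar AWGN special case with p = 2, a uniform dither U on [−Δ, Δ) matched to a PAM input of half-spacing Δ, and the suboptimal choice g(y) = y; the closed forms are worked out by hand under these assumptions.

```python
import math

# Scalar AWGN illustration of the two gap terms of the generalized
# Ozarow-Wyner bound (p = 2, U ~ Unif[-Delta, Delta), g(y) = y):
#   X_D + U - g(Y) = U - Z, so ||U - Z||_2 = sqrt(Delta^2/3 + sigma^2).
sigma, Delta, N = 1.0, 2.0, 8            # noise std, PAM half-spacing, # points

H_XD = math.log(N)                        # H(X_D) for a uniform PAM input
h_U = math.log(2 * Delta)                 # h(U) for U ~ Unif[-Delta, Delta)
norm_U = Delta / math.sqrt(3)             # ||U||_2 = sqrt(E[U^2])
norm_err = math.sqrt(Delta**2 / 3 + sigma**2)   # ||X_D + U - g(Y)||_2

G1 = math.log(norm_err / norm_U)          # estimation-error term, cf. G_{1,p}
G2 = math.log(math.sqrt(2 * math.pi * math.e) * norm_U) - h_U  # shaping term, cf. G_{2,p}
lower_bound = H_XD - G1 - G2              # I(X_D; Y) >= H(X_D) - (G1 + G2)

# here G2 equals 0.5*log(pi*e/6), the classical one-dimensional shaping loss
print(f"I(X_D;Y) >= {lower_bound:.3f} nats (G1 = {G1:.3f}, G2 = {G2:.3f})")
```

The dither width is tied to the PAM spacing so that the translates of supp(U) around the constellation points do not overlap, which is what makes the entropy decomposition in the proof valid.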

Remark 1.
Note that the bound in Theorem 2 is very general in the sense that no assumptions have been made about P Y|X .

Remark 2.
In the context of the AWGN channel, the term G_{2,p} corresponds to the shaping loss, which vanishes as n tends to infinity (i.e., G_{2,p} ∈ O(log(n)/n)).

III. ADDITIVE NOISE CHANNELS WITH p-TH MOMENT INPUT CONSTRAINT
In this section, we demonstrate the usefulness of the generalized Ozarow-Wyner bound of Theorem 2 by applying it to additive noise channels whose inputs are constrained in their p-th moment. More precisely, let p > 0 and A ≥ 0 be real numbers and consider the channel model

Y = X + Z, E[|X|^p] ≤ A,   (8)

for some noise variable Z, independent of X, such that |h(Z)| < ∞ and E[|Z|^s] < ∞ for some s > 0.² Based on Theorems 1 and 2, in the following two subsections we provide an upper bound on the capacity of (8) as well as a bound on the gap to the capacity.

A. Capacity Upper Bound
Theorem 3. Let F be a convex and compact set of channel input distributions that fulfill the p-th moment constraint E[|X|^p] ≤ A, and let C denote the capacity of the additive noise channel given in (8). Then,

C ≤ (1/p) log( k_{1,p}^p e^{−p h(Z)} E[|X* + Z|^p] ),   (9)

for some X* ∈ F (dependent on p) and k_{1,p} as defined in (2).
Proof: By means of Theorem 1 (applied with n = 1 and g ≡ 0), we can upper bound the mutual information between the channel input and output as

I(X; Y) = h(X + Z) − h(Z) ≤ log( k_{1,p} ‖X + Z‖_p ) − h(Z) = (1/p) log( k_{1,p}^p e^{−p h(Z)} E[|X + Z|^p] ).

This leads immediately to an upper bound on the capacity:

C ≤ max_{X ∈ F} inf_{p > 0} (1/p) log( k_{1,p}^p e^{−p h(Z)} E[|X + Z|^p] ) ≤(a) inf_{p > 0} max_{X ∈ F} (1/p) log( k_{1,p}^p e^{−p h(Z)} E[|X + Z|^p] ).

Note that a) follows from the fact that max inf ≤ inf max and, b), since F is convex and compact and log( k_{1,p}^p e^{−p h(Z)} E[|X + Z|^p] ) is concave in the distribution of X, we have that for a given p the maximizer X* exists.
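As a quick consistency check (ours, not the paper's): for Gaussian noise and p = 2 the bound of Theorem 3 collapses to the AWGN capacity, since k_{1,2}² e^{−2h(Z)} = 1/σ² and E[|X + Z|²] = P + σ² under a power constraint E[X²] ≤ P.

```python
import math

# For Z ~ N(0, sigma^2) and p = 2, the p-th moment capacity upper bound
# equals the AWGN capacity exactly: k_{1,2}^2 * e^{-2 h(Z)} = 1/sigma^2.
def theorem3_bound_awgn(P, sigma2):
    k = math.sqrt(2 * math.pi * math.e)                 # k_{1,2}
    hZ = 0.5 * math.log(2 * math.pi * math.e * sigma2)  # h(Z)
    return 0.5 * math.log(k**2 * math.exp(-2 * hZ) * (P + sigma2))

P, sigma2 = 10.0, 1.0
bound = theorem3_bound_awgn(P, sigma2)
capacity = 0.5 * math.log(1 + P / sigma2)
assert abs(bound - capacity) < 1e-12
print("p = 2 bound matches the AWGN capacity")
```

This tightness at p = 2 is what one expects, since the Gaussian is simultaneously the worst-case noise and the entropy maximizer under a second-moment constraint.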

B. Gap to Capacity
As evaluating the capacity bound of Theorem 3 is notoriously difficult, in the following we determine the gap to the capacity under some simplifying assumptions. In doing so, let

q := sup{ s > 0 : E[|Z|^s] < ∞ }.   (10)

An important ingredient to specify the gap will be PAM.
In what follows, PAM(N, P) := { Δ(2i − N − 1) : i = 1, ..., N }, with Δ := √( 3P / (N² − 1) ), refers to a PAM constellation with N equally spaced points and average power P. Moreover, we use X_D ∼ PAM(N, P) to denote that the discrete random variable X_D is uniformly distributed over PAM(N, P).
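The PAM parametrization can be checked numerically; the sketch below assumes the standard equally spaced, zero-mean construction with half-spacing Δ = √(3P/(N² − 1)), an assumption consistent with the average-power normalization used here.

```python
import math

# Assumed PAM construction: N equally spaced, zero-mean points
# Delta*(2i - N - 1), i = 1..N, with Delta chosen so that the uniform
# distribution over the points has average power P.
def pam(N, P):
    Delta = math.sqrt(3 * P / (N**2 - 1))
    return [Delta * (2 * i - N - 1) for i in range(1, N + 1)], Delta

points, Delta = pam(8, 5.0)
power = sum(x**2 for x in points) / len(points)
assert abs(power - 5.0) < 1e-9                            # E[X_D^2] = P
gaps = {round(points[i + 1] - points[i], 12) for i in range(len(points) - 1)}
assert gaps == {round(2 * Delta, 12)}                     # adjacent distance = 2*Delta
print("PAM(8, 5.0):", [round(x, 3) for x in points])
```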

Remark 3.
Notice that the Euclidean distance between adjacent points in PAM(N, P) is 2Δ. It can be easily verified that E[X_D²] = P if X_D ∼ PAM(N, P).

Theorem 4. Let the noise Z be distributed such that the bound in (9) exists, let X_D ∼ PAM(N, P) with N as given in (11), and let P be chosen such that E[|X_D|^r] = E[|X*|^r], where X* is as defined in Theorem 3. Then, for A ≥ 1 and any p > 0,

C − I(X_D; Y) ≤ inf_{0 < r ≤ min(p,q), s > 0} Gap(r, s),

with Gap as defined in (12) at the top of the next page.
Proof: Fix r such that 0 < r ≤ min(p, q), choose X* according to Theorem 3, and note that by Jensen's inequality we have

E[|X*|^r] ≤ ( E[|X*|^p] )^{r/p} ≤ A^{r/p}.

Let X_D ∼ PAM(N, P) with N as in (11) and P chosen in the way that E[|X_D|^r] = E[|X*|^r]. Observe (see Remark 3) that the squared Euclidean distance between any given pair of adjacent points in PAM(N, P) can be bounded from below as in (13), where the inequalities follow from: a) dropping the floor function and the minus one; b) using the moment bound obtained above via Jensen's inequality; and c) using that A ≥ 1. Now, let ξ_r := max(2^{r−1}, 1) and observe the chain of inequalities in (14). Here, a) follows from the fact that ⌊x⌋ ≥ x/2 for every x ≥ 1, b) from E[|X_D|^r] = E[|X*|^r], c) by using the bound E[|X* + Z|^r] ≤ ξ_r ( E[|X*|^r] + E[|Z|^r] ), and d) by means of Theorem 3.
We proceed by letting U be a continuous random variable that is uniformly distributed over the interval [−Δ, Δ) and independent of X_D, and by choosing g(y) ≡ y. Then, based on Theorem 2, we upper bound the gap term gap(U, g, s) := G_{1,s} + G_{2,s} as in (15),³ where the (in)equalities follow from: a) using E[|U + V|^s] ≤ ξ_s ( E[|U|^s] + E[|V|^s] ), which holds for any U and V; b) using the facts that E[|U|^s] = (2Δ)^s / ( 2^s (s + 1) ) and h(U) = log(2Δ); and c) using the bound in (13).
Combining (14) with (15) and taking the infimum over r and s proves the theorem.
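To get a feel for the size of such gaps, the following illustration (not from the paper) evaluates I(X_D; Y) by brute-force numerical integration for an AWGN instance, choosing the constellation size as roughly e^C rather than via (11).

```python
import math

# Illustration (assumed AWGN instance): I(X_D; Y) = h(Y) - h(Z) for uniform
# PAM in Gaussian noise, with h(Y) obtained by numerically integrating the
# entropy of a uniform mixture of N Gaussians via the midpoint rule.
def pam_points(N, P):
    # equally spaced, zero-mean points with average power P (half-spacing d)
    d = math.sqrt(3 * P / (N**2 - 1))
    return [d * (2 * i - N - 1) for i in range(1, N + 1)]

def mixture_entropy(points, sigma, lo, hi, steps=200000):
    h, dx = 0.0, (hi - lo) / steps
    for k in range(steps):
        y = lo + (k + 0.5) * dx
        f = sum(math.exp(-(y - x)**2 / (2 * sigma**2)) for x in points)
        f /= len(points) * math.sqrt(2 * math.pi) * sigma
        if f > 0.0:
            h -= f * math.log(f) * dx
    return h

P, sigma = 20.0, 1.0
C = 0.5 * math.log(1 + P / sigma**2)           # AWGN capacity in nats
N = math.ceil(math.sqrt(1 + P / sigma**2))     # roughly e^C constellation points
pts = pam_points(N, P)
span = max(pts) + 8 * sigma                    # truncate the negligible tails
hY = mixture_entropy(pts, sigma, -span, span)
I = hY - 0.5 * math.log(2 * math.pi * math.e * sigma**2)
print(f"C = {C:.3f} nats, I(X_D;Y) = {I:.3f} nats, gap = {C - I:.3f} nats")
```

The observed gap stays well below one bit at this SNR, in line with the Ozarow-Wyner-type guarantees discussed above.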

C. Discussion
In the special case of Z ∼ N(0, σ²) and p = 2 (i.e., an average power constraint E[|X|²] ≤ P), the number of constellation points (i.e., codewords) has to be chosen as N ≈ e^{C(P,σ²)}, where C(P, σ²) refers to the AWGN channel capacity. For arbitrary Z and p, however, the number of constellation points has to be chosen as roughly e raised to the capacity upper bound of Theorem 3, evaluated for some 0 < r ≤ min(p, q), which is the reasoning behind (11). The quantity k_{1,r}^{−r} e^{r h(Z)} can be interpreted as the r-th entropy power of Z. In [15] it has been shown that it is related to E[|Z|^r] in the sense that

k_{1,r}^{−r} e^{r h(Z)} ≤ E[|Z|^r].   (16)

If a one-dimensional signal constellation is used for the AWGN channel, the asymptotic gap to the capacity is

lim_{P→∞} ( C(P, σ²) − I(X_D; Y) ) = (1/2) log(πe/3).

This high power loss can be decomposed into the shaping loss (1/2) log(πe/6) and the coding loss (1/2) log(2). Similarly, we may define the asymptotic loss of an arbitrary additive noise channel with a p-th moment constraint. Moreover, for channels where we approximately have equality in (16) and where 2 ≤ min(p, q), it follows that this asymptotic loss does not exceed that of the AWGN channel, showing that the AWGN channel with a second moment constraint has the worst asymptotic loss.
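The numerical values of this decomposition are easily tabulated; the block below is merely arithmetic on the quoted constants.

```python
import math

# Arithmetic on the quoted constants: the asymptotic one-dimensional AWGN
# loss splits into a shaping part and a coding part (nats and bits).
shaping = 0.5 * math.log(math.pi * math.e / 6)   # ~0.177 nats (~0.255 bits)
coding = 0.5 * math.log(2)                       # ~0.347 nats (0.5 bits)
total = shaping + coding
assert abs(total - 0.5 * math.log(math.pi * math.e / 3)) < 1e-12
print(f"shaping = {shaping:.4f} nats, coding = {coding:.4f} nats, "
      f"total = {total:.4f} nats = {total / math.log(2):.4f} bits")
```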
IV. CAUCHY-NOISE CHANNEL WITH A LOGARITHMIC MOMENT CONSTRAINT
Now, we apply the generalized Ozarow-Wyner bound of Theorem 2 to an additive noise channel with a different type of constraint: the Cauchy-noise channel with a logarithmic moment constraint.⁴ The output of this channel is given by Y = X + Z, with Z being a Cauchy random variable of density

p_Z(z) = (1/(πγ)) · γ² / (z² + γ²), z ∈ R, γ > 0,

whereas the channel input is restricted to the space

F := { X : E[ log(1 + X²/γ²) ] ≤ 2 log(1 + A/γ) }.   (17)

In [16], it has been shown that for A ≥ γ, the capacity of the Cauchy-noise channel with logarithmic moment constraint is

C = log(1 + A/γ),   (18)

which is achieved by a Cauchy-distributed input. Under a mild assumption on the ratio A/γ, however, using X_D ∼ PAM(N, P) as the channel input results in a gap to the capacity of at most 2.5 nats.
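Two classical Cauchy facts underlying this section, namely h(Z) = log(4πγ) for Z ∼ Cauchy(γ) and the logarithmic-moment identity E[log(1 + X²/γ²)] = 2 log(1 + λ/γ) for X ∼ Cauchy(λ), can be verified numerically; the tangent substitution below is our choice of quadrature, not the paper's.

```python
import math

# Numerical check of two standard Cauchy identities:
# (i)  h(Z) = log(4*pi*gam) for Z ~ Cauchy(gam);
# (ii) E[log(1 + X^2/gam^2)] = 2*log(1 + lam/gam) for X ~ Cauchy(lam).
# Expectations w.r.t. Cauchy(lam) use x = lam*tan(theta), under which
# E[f(X)] = (1/pi) * integral of f(lam*tan(theta)) over (-pi/2, pi/2).
def cauchy_expect(f, lam, steps=500001):
    s, dt = 0.0, math.pi / steps
    for k in range(steps):
        t = -math.pi / 2 + (k + 0.5) * dt
        s += f(lam * math.tan(t)) * dt
    return s / math.pi

gam, lam = 1.5, 4.0
# (i): h(Z) = -E[log p_Z(Z)] with p_Z(z) = gam / (pi*(z^2 + gam^2))
h = cauchy_expect(lambda z: -math.log(gam / (math.pi * (z**2 + gam**2))), gam)
assert abs(h - math.log(4 * math.pi * gam)) < 1e-3
# (ii): logarithmic-moment identity
m = cauchy_expect(lambda x: math.log(1 + x**2 / gam**2), lam)
assert abs(m - 2 * math.log(1 + lam / gam)) < 1e-3
print("Cauchy identities verified")
```

Identity (ii) also explains why a Cauchy input of scale A meets the logarithmic moment constraint with equality, while the stability of the Cauchy family under addition gives Y ∼ Cauchy(γ + A).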
Theorem 5. Let X_D ∼ PAM(N, P) with N = A/γ and P = A(A + 2γ). Then, X_D ∈ F and, for A/γ ≥ 1,

C − I(X_D; Y) ≤ 2.5 nats.

Proof: The fact that X_D ∈ F follows immediately by using Jensen's inequality:

E[ log(1 + X_D²/γ²) ] ≤ log( 1 + E[X_D²]/γ² ) = log( 1 + P/γ² ) = 2 log( 1 + A/γ ).