On Additive Channels with Generalized Gaussian Noise

Abstract—This paper considers the problem of communication over an additive noise channel where the noise is distributed according to a Generalized Gaussian (GG) distribution. In the first part of the paper, a number of properties of the family of GG distributions are derived which are of independent interest. For example, considerable attention is given to the properties of the characteristic function of the GG distribution. In the second part of the paper, the capacity of an additive noise channel with GG noise is considered under p-th absolute moment constraints. It is shown that, even though Shannon's upper bound is achievable in some instances, in general such achievability is not possible. Moreover, it is shown that discrete inputs can achieve the capacity within a constant gap or to full degree of freedom for any p-th absolute moment constraint. Following the seminal work of Smith, the paper also gives a condition under which discrete inputs are exactly optimal.


I. INTRODUCTION
It is well known that the traditional assumption of Gaussian noise does not capture scenarios arising in modern applications. The goal of this work is to study a large family of probability distributions, termed generalized Gaussian (GG), that has received attention in many communication settings. We write X ∼ N_p(μ, α^p) for a random variable X with the GG probability density function (pdf). Another commonly used name for this type of distribution, especially in economics, is the Generalized Error distribution. The flexible parametric form of the pdf of GG distributions allows for tails that are either heavier than Gaussian (p < 2) or lighter than Gaussian (p > 2), which makes it an excellent choice for many modeling scenarios. The origin of the GG family can be traced to the seminal work of Levy [1]. The best-known members of this family include: the Laplace distribution for p = 1; the Gaussian distribution for p = 2; and the uniform distribution on [−β, β] for p = ∞, with α chosen as an appropriate function of β in the limit p → ∞.

Past Work. In communication theory, the GG distribution finds many modeling applications in impulsive noise channels, which occur when the noise pdf has a longer tail than the Gaussian pdf. For example, in [2] it is shown that in ultrawideband systems with time-hopping (TH-UWB) the interference should be modeled with probability distributions that are longer-tailed than Gaussian. Moreover, it has been shown that for moderate and high signal-to-noise ratio (SNR) the TH-UWB interference is well modeled by the GG distribution with parameter p ≤ 1. In [3] and [4] many atmospheric noises were shown to be impulsive, and GG distributions with parameter values of 0.1 < p < 0.6 were shown to provide good approximations to their distributions. In [5] the GG distribution has been recognized as a model for the underwater acoustic channel, where values of p = 2.2 and p = 1.6 have been found to model ship transit noise and sea surface agitation noise, respectively.
In [6] the GG distribution has been shown to model RF signals in medical applications that are involved in recovering echocardiogram images.
Other application of the GG distribution include modeling of Gabor coefficients in face-recognition problems [7]; load demand in power systems [8]; pixels forming fine-resolution synthetic aperture radar (SAR) images [9]; and texture retrieval problems [10].
As the pdf of the GG family has a very simple form, many quantities such as moments, entropy and Rényi entropy can be easily computed; see for example [10] and [11].
From the information theoretic perspective the GG distribution is interesting because it maximizes the entropy and Rényi entropy under a p-th absolute moment constraint [12], [13]. The maximum entropy property can serve as an important intermediate step in a number of proofs. For example, in [14] it has been used to generalize the Ozarow-Wyner bound [15] on the mutual information of discrete inputs over arbitrary channels.
While the number of applications of the GG distribution is large, many of its properties have been drawn from numerical studies, and few analytical properties of the GG family are known beyond the cases p = 1, 2 and p = ∞. For instance, very little is known about the characteristic function of the GG distribution, and only expressions in terms of hypergeometric functions are available [16].

Paper Outline and Main Contributions. Our contributions are as follows. In Section II, Theorem 1 connects the pdf of GG distributions to positive definite functions: we show that for p ≤ 2 the pdf of the GG distribution is a positive definite function, while for p > 2 it is not. Moreover, it is shown that for p ≤ 2 the pdf of the GG distribution can be expressed as an integral of a Gaussian pdf with respect to some non-negative finite Borel measure. In Section III, Proposition 1 gives inequalities and properties of moments and absolute moments of the GG distribution. In Section IV we study properties of the characteristic function of the GG distribution: in Proposition 2 we give conditions under which the characteristic function of the GG distribution is a real analytic function; in Theorem 3 we study the distribution of zeros of the characteristic function of the GG distribution. In particular, it is shown that for p ≤ 2 the characteristic function of the GG distribution has no zeros and is always positive, while for p > 2 the characteristic function has at least one positive-to-negative zero crossing. In Theorem 4 we study the conditions under which a GG distribution of order p can be additively transformed into another GG distribution of order q.
In Section V we study properties of information measures of the GG distribution: in Theorem 5 and Theorem 6 we review properties of the differential entropy of the GG distribution; in Proposition 5 we derive the derivative with respect to the SNR of the mutual information of an additive channel with GG distributed noise; and in Proposition 6 some properties of the relative entropy of the GG distribution are considered. In Section VI we study properties of an additive channel with GG distributed noise of order q under a p-th absolute moment constraint: in Proposition 8 we show that Shannon's upper bound, besides being achievable for (p, q) = (2, 2) (i.e., Gaussian noise and a second moment constraint), is also achievable for (p, q) = (1, 1) (i.e., Laplace noise and a first absolute moment constraint); in Proposition 9, using the deconvolution results of Theorem 4, it is shown that in general Shannon's upper bound is not achievable; in Proposition 10 it is shown that for any (p, q) Shannon's upper bound is achievable within a constant gap by an equally spaced, uniformly distributed discrete input (PAM), implying that discrete inputs achieve the full degree of freedom; in Proposition 11 we show that a GG distributed input is also optimal within an additive gap; in Proposition 13 we show that when p and q are even integers the optimal input distribution is discrete with finitely many points; and in Proposition 14 we study the case p = ∞ (amplitude constraint) and give a condition under which the optimal input distribution is binary.
Due to space limitations, the proofs are omitted and can be found in an extended version of this paper [17].
II. RELATION TO POSITIVE DEFINITE FUNCTIONS
As will be observed throughout this paper, the GG family exhibits different properties depending on whether p ≤ 2 or p > 2. At the heart of this behavior is the concept of positive definite functions.
A continuous function f : R → C is said to be positive definite if for every n and every choice of points t_1, …, t_n ∈ R the n × n matrix with entries f(t_i − t_j) is positive semi-definite. Our first result relates the pdf of the GG distribution to the class of positive definite functions.
Theorem 1. For 0 < p ≤ 2 the pdf of X ∼ N_p(0, α^p) is a positive definite function; moreover, it can be written as a Gaussian mixture
f_X(x) = ∫_{R+} (1/√(2πt)) e^{−x²/(2t)} dμ(t),
where μ is a finite non-negative Borel measure on R+. For p > 2 the pdf of the GG distribution is not a positive definite function.
The concept of positive definite functions will also play an important role in examining properties of the characteristic function of the GG distribution.
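Theorem 1 can be probed numerically via the matrix criterion for positive definiteness: for grid points t_1, …, t_n, the matrix [f(t_i − t_j)] must be positive semi-definite. The following is a minimal sketch, assuming a density proportional to exp(−|x|^p/p) (an illustrative convention; rescaling only dilates the axis and does not affect positive definiteness).

```python
import numpy as np
from scipy.integrate import quad

def gg_pdf(p):
    """GG density proportional to exp(-|x|^p / p), normalized numerically.
    (Parametrization assumed for illustration only.)"""
    Z, _ = quad(lambda x: np.exp(-abs(x) ** p / p), -20, 20)
    return lambda x: np.exp(-np.abs(x) ** p / p) / Z

def min_eig(p, span=8.0, n=65):
    """Smallest eigenvalue of the matrix [f(t_i - t_j)] on a uniform grid."""
    f = gg_pdf(p)
    t = np.linspace(-span, span, n)
    M = f(t[:, None] - t[None, :])  # symmetric matrix of pairwise differences
    return np.linalg.eigvalsh(M).min()

print(min_eig(1.0))  # Laplace (p <= 2): matrix is PSD up to numerical error
print(min_eig(8.0))  # p > 2: a clearly negative eigenvalue appears
```

A negative eigenvalue for p = 8 certifies that the pdf is not positive definite, in agreement with the p > 2 part of Theorem 1.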

III. MOMENTS
The moments of the GG distribution are given as follows:
Proposition 1. (Moments [11].) For any p > 0 and k > −1 the moments of X ∼ N_p(0, α^p) satisfy E[X^k] = 0 for odd integer k and E[X^k] = E[|X|^k] for even integer k. Moreover, for any p > 0 and k > −1 the absolute moments are given by E[|X|^k] = α^k E[|X_0|^k], where X_0 ∼ N_p(0, 1) and E[|X_0|^k] is expressible in terms of the Gamma function.
The following corollary, which relates the k-th moments of two GG distributions of different orders, is useful in many proofs.
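Under the parametrization f(x) ∝ exp(−|x|^p/(p α^p)) — an assumption here, as normalization conventions for N_p(μ, α^p) vary — the absolute moments evaluate to E[|X|^k] = α^k p^{k/p} Γ((k+1)/p)/Γ(1/p). The sketch below verifies this closed form against direct numerical integration.

```python
import math
from scipy.integrate import quad

def gg_abs_moment(p, alpha, k):
    """E[|X|^k] computed by numerical integration, assuming the density
    is proportional to exp(-|x|^p / (p * alpha^p))."""
    f = lambda x: math.exp(-abs(x) ** p / (p * alpha ** p))
    Z, _ = quad(f, -50, 50)
    m, _ = quad(lambda x: abs(x) ** k * f(x), -50, 50)
    return m / Z

def gg_abs_moment_formula(p, alpha, k):
    """Closed form alpha^k * p^(k/p) * Gamma((k+1)/p) / Gamma(1/p)
    (same assumed parametrization)."""
    return alpha ** k * p ** (k / p) * math.gamma((k + 1) / p) / math.gamma(1 / p)

for p, alpha, k in [(1.0, 1.0, 2), (1.5, 2.0, 3), (4.0, 0.5, 2)]:
    print(p, alpha, k, gg_abs_moment(p, alpha, k), gg_abs_moment_formula(p, alpha, k))
```

Note that only the Gamma-function constant depends on the normalization convention; the α^k scaling law is convention-independent.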

IV. THE CHARACTERISTIC FUNCTION
The focus of this section is the characteristic function ψ_{X,p}(t) = E[e^{jtX}] of the GG distribution. This analysis will play an important role in analyzing the output distribution of additive channels with GG distributed noise. Theorem 2 gives an explicit expression for the characteristic function of X ∼ N_p(0, α^p) for any p > 0. Examples of the characteristic function of X ∼ N_p(0, 1) for several values of p are given in Fig. 1.
a) Analyticity Properties of the Characteristic Function: An important question, in particular for numerical methods, is whether the characteristic function of a random variable can be represented by a power series of the form
ψ_{X,p}(t) = Σ_{k≥0} (jt)^k E[X^k] / k!,
i.e., whether it is real analytic on some domain. The above expression is especially useful since the moments of the GG distribution are known for every k; see Proposition 1.
Proposition 2. The power series of the characteristic function of X ∼ N_p(0, 1) converges:
• for all t ∈ R for p > 1;
• for |t| < 1/2 for p = 1; and
• only at t = 0 for p < 1.
The above results also imply that for p > 1 the moment generating function of X ∼ N p (0, 1) exists for all t.
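As a sanity check of the series representation, for p = 2 (standard Gaussian, where the even moments are E[X^{2m}] = (2m − 1)!! and the odd moments vanish) the moment series sums back to e^{−t²/2}:

```python
import math

def cf_series(t, terms=40):
    """Moment power series sum_m (-1)^m E[X^(2m)] t^(2m) / (2m)! for a
    standard normal X, using E[X^(2m)] = (2m-1)!!; odd moments vanish."""
    total = 0.0
    double_fact = 1.0  # (2m-1)!! with the convention (-1)!! = 1
    for m in range(terms):
        total += (-1) ** m * double_fact * t ** (2 * m) / math.factorial(2 * m)
        double_fact *= 2 * m + 1
    return total

t = 1.3
print(cf_series(t), math.exp(-t * t / 2))  # the two values agree
```

Each term simplifies to (−t²/2)^m/m!, so the truncated series converges rapidly to the Gaussian characteristic function, illustrating real analyticity for p = 2.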
b) On the Distribution of Zeros of the Characteristic Function: As can be seen from Fig. 1, the characteristic function of the GG distribution can have zeros. The following theorem gives a somewhat surprising result on the distribution of zeros of ψ_{X,p}(t). Theorem 3. (Distribution of Zeros.) The characteristic function ψ_{X,p}(t) has the following properties:
• for p > 2, ψ_p(t) has at least one positive-to-negative zero crossing; and
• for 0 < p ≤ 2, ψ_p(t) is non-negative and can be represented as ψ_p(t) = ∫_{R+} e^{−t²s/2} dμ(s) for some non-negative finite Borel measure μ.
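Theorem 3 can be checked numerically by evaluating ψ_p(t) = E[cos(tX)] directly. The sketch below assumes a density proportional to exp(−|x|^p/p) (an illustrative convention; rescaling only dilates the t-axis): for p = 1.5 the computed characteristic function stays strictly positive, while for p = 8 it exhibits a negative dip, i.e., a positive-to-negative zero crossing.

```python
import numpy as np
from scipy.integrate import quad

def gg_cf(p, t):
    """Characteristic function E[cos(tX)] of a symmetric GG variable,
    assuming density proportional to exp(-|x|^p / p)."""
    f = lambda x: np.exp(-abs(x) ** p / p)
    Z, _ = quad(f, -20, 20)
    val, _ = quad(lambda x: np.cos(t * x) * f(x), -20, 20, limit=200)
    return val / Z

ts = np.arange(0.0, 6.0, 0.05)
psi_15 = np.array([gg_cf(1.5, t) for t in ts])
psi_8 = np.array([gg_cf(8.0, t) for t in ts])
print(psi_15.min())  # strictly positive: no zeros for p <= 2
print(psi_8.min())   # negative: zero crossing for p > 2
```

The p = 8 density is close to a uniform density, whose sinc-shaped characteristic function is a useful guide for where the negative dip occurs.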
The following result is immediate from Theorem 3. Corollary 2. For 0 < p ≤ 2, ψ_p(t) is a strictly decreasing function for t > 0.
c) Deconvolution Results: Next we seek to understand when
ψ_{(p,q,α)}(t) = ψ_{Y,q}(αt) / ψ_{Z,p}(t)
is again a characteristic function for all α ≥ 1. Such a question is important since often we are interested in the existence of a random variable X independent of Z such that X + Z has the distribution of the target αY. Here we are interested in answering whether it is possible to transform Z ∼ N_p(0, 1) into αY where Y ∼ N_q(0, 1). We first focus on the cases when ψ_{(p,q,α)}(t) is not a characteristic function.
We would like to point out that for 2 < q ≤ p there are cases when ψ_{(p,q,α)}(t) is a characteristic function. The most trivial case is p = q and α = 1, in which ψ_{(p,q,α)}(t) = 1, a trivial characteristic function. A less trivial example is p = q = ∞, in which case ψ_{X,∞}(t) = sinc(t) and ψ_{(∞,∞,α)}(t) = sinc(αt)/sinc(t).
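For instance, with the unnormalized convention sinc(t) = sin(t)/t (an assumption; with the normalized sinc the same computation yields cos(πt)), the case α = 2 can be worked out explicitly:

```latex
\psi_{(\infty,\infty,2)}(t)
 = \frac{\operatorname{sinc}(2t)}{\operatorname{sinc}(t)}
 = \frac{\sin(2t)/(2t)}{\sin(t)/t}
 = \frac{2\sin t \cos t}{2 \sin t}
 = \cos t
 = \mathbb{E}\!\left[e^{jtX}\right],
 \qquad X \in \{-1, +1\} \text{ equiprobable,}
```

so ψ_{(∞,∞,2)}(t) is indeed the characteristic function of an equiprobable binary random variable.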
In the above example, since the zeros of ψ_p(t) occur periodically, we can select α such that in ψ_{(p,q,α)}(t) the zeros of the denominator are cancelled by zeros of the numerator. However, we conjecture that such examples are possible only for p = ∞: for 2 < p < ∞ the zeros of ψ_p(t) do not appear periodically, and the second statement of Theorem 4 can be improved to the following. Conjecture 1. For 2 < q ≤ p < ∞, ψ_{(p,q,α)}(t) is not a characteristic function for any α > 1.
It is not difficult to check, by using the property that the convolution of any distribution with one having an analytic pdf again has an analytic pdf, that Conjecture 1 is true if p is an even integer and q is any non-even real number.
One of the consequences of Theorem 4 is that the additive transformation of one GG distribution into another GG distribution is not always possible. Proposition 3. Let Z ∼ N_p(0, 1). Then we have
• for (p, q) ∈ S and any α ≥ 1 there exists no random variable X independent of Z such that X + Z has the same distribution as Y, where Y ∼ N_q(0, α^q); and
• for any {(p, q) : 2 < q ≤ p} there exists some α ≥ 1 such that there is no random variable X independent of Z with X + Z distributed as N_q(0, α^q).
In view of Theorem 4 it remains to answer what happens to ψ_{(p,q,α)}(t) when 0 < q = p < 2.

Proposition 4. For 0 < q = p < 2 and α > 1, if ψ_{(p,q,α)}(t) is monotonically decreasing for t > 0, then it is a characteristic function.
For example, in the case of p = 1 (Laplace) we have
ψ_{(1,1,α)}(t) = (1 + t²) / (1 + α²t²),
which is a monotonically decreasing function of |t| for α > 1 and corresponds to a random variable X with distribution given by a mixture of an atom at zero of weight 1/α² and a Laplace component of scale α and weight 1 − 1/α². At this point we are unable to show that ψ_{(p,q,α)}(t) is monotonically decreasing for {(p, q) : 0 < q = p < 2}, even though numerical simulations seem to suggest so.
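The Laplace example can be cross-checked in closed form. Assuming the unit-scale Laplace pdf e^{−|x|}/2, whose characteristic function is 1/(1 + t²), the ratio (1 + t²)/(1 + α²t²) coincides with the characteristic function of a mixture of an atom at zero (weight 1/α²) and a Laplace of scale α (weight 1 − 1/α²); the mixture decomposition below is an illustrative reconstruction under that convention.

```python
import numpy as np

def psi_ratio(t, a):
    """psi_{(1,1,a)}(t) = psi_1(a t) / psi_1(t) with psi_1(t) = 1/(1+t^2)
    (unit-scale Laplace convention assumed)."""
    return (1 + t ** 2) / (1 + a ** 2 * t ** 2)

def psi_mixture(t, a):
    """CF of the mixture: atom at 0 with weight 1/a^2, plus a Laplace of
    scale a (CF 1/(1 + a^2 t^2)) with weight 1 - 1/a^2."""
    return 1 / a ** 2 + (1 - 1 / a ** 2) / (1 + a ** 2 * t ** 2)

t = np.linspace(-10, 10, 2001)
for a in (1.5, 3.0):
    print(a, np.max(np.abs(psi_ratio(t, a) - psi_mixture(t, a))))  # ~0
```

The agreement is exact: clearing denominators shows both expressions equal (1 + t²)/(1 + α²t²) identically.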

V. INFORMATION MEASURES
a) Entropy and Conditional Entropy:
From an information theoretic point of view the class of GG distributions is interesting since it maximizes the entropy subject to a p-th absolute moment constraint. Theorem 5. (Maximum Entropy Distribution [12].) Let X be such that E[|X|^p] ≤ α^p for some p ∈ (0, ∞). Then
h(X) ≤ h(N_p(0, α^p)). (8)
The equality in (8) is attained iff X ∼ N_p(0, α^p).
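Theorem 5 can be illustrated with textbook closed forms for p = 1: among distributions with E[|X|] = 1, the Laplace (the GG of order 1) has larger differential entropy than, e.g., the Gaussian matched to the same absolute moment. The entropy formulas used below are standard; this is an illustration, not a proof.

```python
import math

# Differential entropies in nats (standard closed forms):
#   Laplace with scale b (pdf e^{-|x|/b}/(2b)):  h = 1 + ln(2b),  E|X| = b
#   Gaussian with std sigma:                     h = 0.5*ln(2*pi*e*sigma^2),
#                                                E|X| = sigma*sqrt(2/pi)
b = 1.0                           # Laplace with E|X| = 1
sigma = math.sqrt(math.pi / 2)    # Gaussian matched to the same E|X| = 1
h_lap = 1 + math.log(2 * b)
h_gauss = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
print(h_lap, h_gauss)  # the Laplace entropy is the larger of the two
```

The gap (about 0.05 nats here) is exactly what the maximum entropy property predicts: any non-GG distribution with the same first absolute moment has strictly smaller entropy.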
The conditional version of Theorem 5 was shown in [14] and is given below. Theorem 6. For any U such that h(U) < ∞ and E[|U|^p] < ∞ for some p ∈ (0, ∞), and for any V, we have
h(U | V) ≤ h(N_p(0, E[|U − f(V)|^p])),
where f(·) is an arbitrary measurable function.
The inequality in Theorem 6 leads to a sharper version of the continuous analogue of Fano's inequality [12]. Theorem 6 also plays a key role in the improvement of the Ozarow-Wyner bound, which lower bounds the mutual information of discrete inputs over arbitrary channels [14].

b) Mutual Information and Conditional Mean Estimation:
The I-MMSE relationship [18] has found many applications in the context of Gaussian noise channels. Next, we give a similar expression for channels with GG distributed noise. Proposition 5 gives, for any p ≥ 1 and any X ∈ L^p independent of Z ∼ N_p(0, 1) with Y = √snr X + Z, an expression for the derivative of the mutual information I(X; Y) with respect to snr. Note that for p = 2, by using the orthogonality principle, this expression recovers the I-MMSE result.
c) Relative Entropy: Another question that often arises in information theoretic applications is the relative entropy between Y = x + Z and Y = X + Z, where x is a realization of some random variable X. Proposition 6 gives, for any p > 0, an expression for D(x) = D(N_p(x, 1) ∥ f_Y) in terms of F(·), the cumulative distribution function (CDF) of N_p(0, 1), and of Ŷ ∼ N_p(x, 1). In particular, Proposition 6 captures the local behavior of the relative entropy and will be useful later when we demonstrate under which conditions binary communication is optimal in channels with an amplitude constraint.
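For the classical p = 2 case, the I-MMSE relation can be checked in closed form with a unit-power Gaussian input, for which I(snr) = (1/2) log(1 + snr) and mmse(snr) = 1/(1 + snr) are standard; a finite-difference sketch:

```python
import math

def I(snr):
    """Mutual information (nats) of a unit-power Gaussian input over AWGN."""
    return 0.5 * math.log(1 + snr)

def mmse(snr):
    """MMSE of estimating the unit-power Gaussian input from Y = sqrt(snr) X + Z."""
    return 1 / (1 + snr)

snr, h = 2.0, 1e-6
dI = (I(snr + h) - I(snr - h)) / (2 * h)  # numerical derivative of I(snr)
print(dI, mmse(snr) / 2)  # I-MMSE: dI/dsnr = mmse/2
```

Both values equal 1/6 at snr = 2, matching dI/dsnr = mmse(snr)/2.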
VI. AN ADDITIVE CHANNEL WITH GG NOISE
We now consider the capacity
C(α; p, q) = sup_{X ∈ F_p(α)} I(X; X + Z), (9a)
where Z ∼ N_q(0, 1) and F_p(α) = {X : E[|X|^p] ≤ α^p}. The set F_p(α) is convex and compact, and the supremum in (9a) is achievable for all p > 0. Moreover,
C(α; p, q) ≤ (1/p) log(1 + a_{p,q} α^p), (9b)
where a_{p,q} = (p/2) (c_q/c_p)^p e^{(q−p)/q}, and the inequality in (9b) holds with equality iff Y = X + Z is GG distributed of order p.
The upper bound in (9b) is often referred to as Shannon's upper bound. There are several instances when the upper bound in (9b) is tight, the most famous of which is (p, q) = (2, 2), i.e., Gaussian noise with a second moment constraint. Next, we look at the case of (p, q) = (1, 1), that is, Laplace noise and an absolute moment constraint. Proposition 8. For (p, q) = (1, 1) Shannon's upper bound in (9b) is achievable. Moreover, the pdf of the optimal input distribution can be given in closed form in terms of a parameter β ∈ (α, α + 1). The Laplace distribution has many connections to the exponential distribution. The achievability of Shannon's upper bound for channels with exponential noise and a first moment constraint was shown in [19].
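The tightness for (p, q) = (2, 2) can be reproduced numerically: with X ∼ N(0, α²) over unit-variance Gaussian noise, computing h(Y) − h(Z) by numerical integration recovers the familiar (1/2) log(1 + α²). This is a standard AWGN check, independent of the paper's GG notation.

```python
import math
from scipy.integrate import quad

def diff_entropy(pdf, lim):
    """Differential entropy -integral of f ln f (nats) by numerical integration."""
    val, _ = quad(lambda y: -pdf(y) * math.log(pdf(y)), -lim, lim)
    return val

def gauss_pdf(var):
    return lambda y: math.exp(-y * y / (2 * var)) / math.sqrt(2 * math.pi * var)

alpha = 1.5
var_Y = 1 + alpha ** 2                                  # Y = X + Z, X ~ N(0, alpha^2)
hY = diff_entropy(gauss_pdf(var_Y), 10 * math.sqrt(var_Y))
hZ = diff_entropy(gauss_pdf(1.0), 10.0)                 # Z ~ N(0, 1)
print(hY - hZ, 0.5 * math.log(1 + alpha ** 2))          # the two values agree
```

The Gaussian input makes the output exactly the maximum entropy distribution for its power, which is why the bound is met with equality in this case.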
However, in general Shannon's bound cannot be achieved and the inequality in (9b) is actually strict. Proposition 9.
• The inequality in (9b) is strict for all α > 0 and (p, q) ∈ S.
• For every {(p, q) : 2 < q ≤ p} there exists an α > 0 such that the inequality in (9b) is strict.
Even though Shannon's bound is not achievable in general, we can show that it is asymptotically tight. Moreover, we show the remarkable fact that an equally spaced, uniformly distributed input (PAM) achieves the full degree of freedom for any p, q; this is the content of Proposition 10.
b) Achievability with GG Input: It is also interesting to note that discrete inputs are not the only inputs that can get close to the capacity; as shown next, this can also be done with inputs from the GG family. Proposition 11. (GG input is not too bad.) For any p, q > 0 the input X ∼ N_p(0, α^p) achieves C(α; p, q) to within an additive gap.
c) On Capacity Achieving Distributions: Next we examine the structure of the optimal input distribution. Proposition 12. (On the Optimal Input Distribution [20].) For C(α; p, q) the support of the optimal input distribution is
• unbounded for p < q; and
• compact for p > q.
Proof. The proof follows by verifying the conditions of [20, Theorem 7].
An important question is: when does the optimal input distribution, in addition to being compactly supported, have finite support? Using the method of Smith [21] we have the following. Proposition 13. For any even integers p and q such that p > q (including the case of p = ∞) the optimal input distribution for C(α; p, q) is discrete with finitely many points.
d) Optimal Input Distribution in a Small Amplitude Regime: Next we investigate the case of C(α; ∞, q), i.e., an amplitude constraint. Specifically, we are interested in identifying when binary communication is optimal; in other words, for what values of α the equally likely input X ∈ {±α} is optimal. The next result gives necessary and sufficient conditions for such optimality. Proposition 14. For q even, the binary input is optimal in C(α; ∞, q) if and only if α ≤ ᾱ_q, where ᾱ_q is defined through Ȳ ∼ f_Ȳ(y) = (F(y) − F(y − α))/α and F(·) is the CDF of N_q(0, 1).
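The small-amplitude phenomenon behind Proposition 14 can be illustrated for Gaussian noise (q = 2): computing I(X; Y) = h(Y) − h(Z) by numerical integration shows that at amplitude α = 1 the equiprobable binary input outperforms, e.g., an equiprobable ternary input, consistent with the optimality of binary signaling for small amplitudes established by Smith [21]. The grid limits and competing inputs below are illustrative choices.

```python
import math
from scipy.integrate import quad

SQRT2PI = math.sqrt(2 * math.pi)

def phi(y):
    """Standard normal pdf."""
    return math.exp(-y * y / 2) / SQRT2PI

def mutual_info(points, probs):
    """I(X;Y) = h(Y) - h(Z) in nats for a discrete input over unit-variance
    Gaussian noise, with h(Y) computed by numerical integration."""
    fY = lambda y: sum(p * phi(y - x) for x, p in zip(points, probs))
    hY, _ = quad(lambda y: -fY(y) * math.log(fY(y)), -12, 12, limit=200)
    hZ = 0.5 * math.log(2 * math.pi * math.e)
    return hY - hZ

a = 1.0
I_bin = mutual_info([-a, a], [0.5, 0.5])
I_tern = mutual_info([-a, 0.0, a], [1 / 3, 1 / 3, 1 / 3])
print(I_bin, I_tern)  # the binary input achieves the larger rate
```

Since the binary input is exactly optimal in this regime, any other admissible input, including the ternary one above, must yield strictly smaller mutual information.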