An Upper Bound on the Number of Mass Points in the Capacity Achieving Distribution for the Amplitude Constrained Additive Gaussian Channel

This paper studies an $n$-dimensional additive Gaussian noise channel with a peak-power-constrained input. It is well known that, in this case, the capacity-achieving input distribution is supported on finitely many concentric shells. However, due to the previous proof technique, neither the exact number of shells of the optimal input distribution nor a bound on it was available. This paper provides an alternative proof of the finiteness of the number of shells of the capacity-achieving input distribution and produces the first firm upper bound on the number of shells, paving an alternative way for approaching many such problems. In particular, for every dimension $n$, it is shown that the number of shells is $O(A^2)$, where $A$ is the constraint on the input amplitude. Moreover, this paper also provides bounds on the number of points for the case of $n = 1$ with an additional power constraint.


I. INTRODUCTION
We consider an additive noise channel where the input-output relationship is given by
$$\mathbf{Y} = \mathbf{X} + \mathbf{Z}, \tag{1}$$
where the input $\mathbf{X} \in \mathbb{R}^n$ is independent of the standard Gaussian noise $\mathbf{Z} \in \mathbb{R}^n$. We are interested in finding the capacity of the channel in (1) subject to the constraint that $\mathbf{X} \in \mathcal{B}_0(A)$, where $\mathcal{B}_0(A)$ is an $n$-ball centered at zero with radius $A$ (i.e., peak-power constrained input), that is
$$C(A) = \max_{P_{\mathbf{X}} :\, \mathbf{X} \in \mathcal{B}_0(A)} I(\mathbf{X}; \mathbf{Y}). \tag{2}$$
In his seminal paper [1], for the case of $n = 1$, Smith has shown that an optimizing distribution in (2) is unique, symmetric, and, perhaps surprisingly, discrete with finitely many mass points. Using tools such as the Identity Theorem from complex analysis, Smith has proven that the cardinality of the support set of the optimal input distribution cannot be infinite, and, thus, must be finite. Employing this proof by contradiction, Shamai and Bar-David [2] have extended the method of Smith to $n = 2$, and showed that, in this setting, the maximizing input random variable is given by $\mathbf{X} = R \cdot \mathbf{U}$, where the magnitude $R$ is discrete with finitely many points and the random unit vector $\mathbf{U}$, which is independent of $R$, has a uniform phase on $[0, 2\pi)$. In other words, the support of $\mathbf{X}$ consists of finitely many concentric shells. As a matter of fact, this phenomenon that the optimal input distribution lies on finitely many concentric spheres remains true for any $n \ge 2$, cf. [3], [4] and [5].
Regrettably, the method of proof by contradiction does not lead to a characterization of the number of spheres (number of mass points when n = 1) in the capacity-achieving input distribution. In fact, as of the writing of this paper, very little is known about the structure of that distribution, and a very simple question remains open about 50 years after Smith's contribution: When n = 1, what is the cardinality of the support of the optimal input distribution as a function of A?
In this work, we provide the first firm upper bound on the number of points for $n = 1$ and the number of shells for every $n \ge 2$, partially answering the above question. Furthermore, for the case of $n = 1$, using similar methods, we also provide an upper bound on the cardinality of the support of the distribution achieving the capacity under both an amplitude and an average-power constraint.
a) Prior Work: The history of the problem begins with [1], where Smith proves the discreteness of the capacity-achieving input distribution and also shows the optimality of the equiprobable binary input on $\{\pm A\}$ so long as $A \le 0.1$. Sharma and Shamai [6] extend the result of Smith, and argue that an equiprobable input on $\{\pm A\}$ is optimal if and only if $A \le \bar{A} \approx 1.665$. The proof of the result in [6], which generalizes to vector channels, is shown in [7].
Progress on the algorithmic aspect of computing the optimal input distribution is made in [8], which proposes an iterative procedure, based on the cutting-plane method, that converges to a capacity-achieving distribution. The bound on the number of mass points found in this work is relevant for numerical methods as it reduces the optimization space.
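As a rough illustration of why a finite support bound shrinks the search space, the sketch below evaluates the mutual information $I(X; Y) = h(Y) - h(Z)$ of the scalar channel for a candidate discrete input by numerical integration. It is not the cutting-plane procedure of [8]; the function names, the integration grid, and the padding are assumptions made only for this example.

```python
import numpy as np

def output_pdf(y, points, probs):
    """f_Y(y) = sum_i p_i * phi(y - x_i) for a discrete input on `points`."""
    diff = np.asarray(y, dtype=float)[:, None] - points[None, :]
    phi = np.exp(-0.5 * diff ** 2) / np.sqrt(2.0 * np.pi)
    return phi @ probs

def mutual_information(points, probs, pad=10.0, n_grid=20001):
    """I(X;Y) = h(Y) - h(Z) in nats, with h(Y) from a trapezoidal rule."""
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Assumed integration window: the support widened by `pad` noise std devs.
    y = np.linspace(points.min() - pad, points.max() + pad, n_grid)
    f = output_pdf(y, points, probs)
    integrand = -f * np.log(f)
    h_y = np.sum(0.5 * (integrand[1:] + integrand[:-1])) * (y[1] - y[0])
    h_z = 0.5 * np.log(2.0 * np.pi * np.e)
    return h_y - h_z

# Equiprobable binary inputs on {-A, +A} for two amplitudes.
mi_a1 = mutual_information([-1.0, 1.0], [0.5, 0.5])
mi_a2 = mutual_information([-2.0, 2.0], [0.5, 0.5])
print(mi_a1, mi_a2)
```

With a support-size bound in hand, a numerical search only needs to vary finitely many locations and weights passed to a routine of this kind.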
A number of papers have also focused on upper and lower bounds on the capacity in (2). Broadly speaking, there are three types of capacity upper bounding approaches. The first approach uses the maximum entropy principle [9, Chapter 12] and upper bounds the output differential entropy, $h(Y)$, subject to some moment constraint [10]. The second approach uses a dual capacity characterization where the maximization of the mutual information over the input distribution is replaced by minimization of the relative entropy over the output distribution. A suboptimal choice of an output distribution in the dual capacity expression results in an upper bound on the capacity [11], [12], [13]. The third approach uses a characterization of the mutual information as an integral of the minimum mean square error (MMSE) [14], and leads to an upper bound by replacing the optimal estimator in the MMSE term by a suboptimal one [7].
There is also a substantial literature that extends the proof recipe of Smith to other channels. For example, the approach of Smith for showing discreteness of an optimal input distribution has been extended to complex Gaussian channels [2], Rayleigh fading channels [15], and Poisson channels [16]. For an overview of the literature on various optimization methods that show discreteness of a capacity-achieving distribution, the interested reader is referred to [17].
b) Contributions and Paper Outline: In what follows: 1) Section II presents our main results; 2) Section III provides an upper bound on the number of extreme points, for the scalar case, of an arbitrary output probability density function (pdf) of the Gaussian channel described in (1). The proof of this result exploits the analyticity of the Gaussian density together with Tijdeman's Number of Zeros Lemma [18, Lemma 1]; 3) Section IV provides the proof of our main result for the case of $n = 1$. The idea behind this proof is to show that the maximum number of extreme points of the output pdf provides an upper bound on the number of mass points of the optimal input distribution. The main element of the proof relies on Karlin's Oscillation Theorem and the bounds on the number of extreme points developed in Section III. The proof for the vector case ($n \ge 2$) follows along the same lines as the proof for the scalar case ($n = 1$), albeit with more involved algebra, and is therefore omitted to abide by the space constraints; and 4) Section V concludes the paper with some final remarks.
c) Notation: Throughout the paper, deterministic scalar quantities are denoted by lowercase letters, deterministic vectors by bold lowercase letters, random variables by uppercase letters, and random vectors by bold uppercase letters (e.g., $x$, $\mathbf{x}$, $X$, $\mathbf{X}$). We denote the distribution of a random vector $\mathbf{X}$ by $P_{\mathbf{X}}$. Moreover, we say that a point $\mathbf{x}$ is in the support of the distribution $P_{\mathbf{X}}$, denoted by $\mathrm{supp}(P_{\mathbf{X}})$, if for every open set $\mathcal{O} \ni \mathbf{x}$ we have that $P_{\mathbf{X}}(\mathcal{O}) > 0$. The number of zeros of a function $f : \mathbb{R} \to \mathbb{R}$ on the interval $I$ is denoted by $N(I, f)$. Similarly, if $f : \mathbb{C} \to \mathbb{C}$ is a function on the complex domain, $N(D, f)$ denotes the number of its zeros in the region $D$.

II. MAIN RESULT
Theorem 1, stated below, gives the first firm upper bound on the support size of the capacity-achieving input of the additive Gaussian channel with an amplitude constraint.
Theorem 1. Consider the amplitude constrained scalar additive Gaussian channel $Y = X + Z$ where the input $X$, satisfying $|X| \le A$, is assumed to be independent from the noise $Z \sim \mathcal{N}(0, 1)$. Assuming $A \ge 1$, let $P_X$ be the optimizing input distribution for this channel. Then, $P_X$ is a symmetric discrete distribution with
$$\sqrt{1 + \frac{2A^2}{\pi e}} \le |\mathrm{supp}(P_X)| \le a_2 A^2 + a_1 A + a_0, \tag{4}$$
where $a_2 = 9e + 6\sqrt{e} + 5$, $a_0 = e + 2\log(4\sqrt{e} + 2) + 1$, and $a_1$ is likewise an absolute constant.
Shown in Section IV, the proof of Theorem 1 uses an upper bound on the number of extreme points of a Gaussian convolution that can be found in Section III.
Remark 1. Observe that the upper and lower bounds in (4) are not of equal order. We conjecture that the order of the lower bound is tight. See the extended version of this paper in [19] for a more detailed discussion.
Theorem 2. Consider the amplitude constrained vector additive Gaussian channel $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$ where the input $\mathbf{X}$, satisfying $\|\mathbf{X}\| \le A$, is assumed to be independent from the white Gaussian noise $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$. Let $\mathbf{X} \sim P_{\mathbf{X}}$ be the optimizing input for this channel. Then, $P_{\mathbf{X}}$ is unique, radially symmetric, and the distribution of its amplitude, namely $P_{\|\mathbf{X}\|}$, is a discrete distribution whose number of mass points is $O(A^2)$ for every fixed dimension $n$.
The proof of Theorem 2, omitted because of the space constraint, benefits from the same technique that is used in the proof of Theorem 1. For details, see [19].
Remark 2. Note that when the vector channel is of dimension 2, Theorem 2 gives an upper bound on the number of shells of the optimal input distribution for the additive complex Gaussian channel with an amplitude constraint.
For the sake of demonstrating the versatility of our novel method, shown next is an upper bound on the support size of the optimal input distribution for the scalar additive Gaussian channel with both peak- and average-power constraints.
Theorem 3. Consider the amplitude and power constrained scalar additive Gaussian channel $Y = X + Z$ where the input $X$, satisfying $|X| \le A$ and $\mathbb{E}[|X|^2] \le P$, is assumed to be independent from the noise $Z \sim \mathcal{N}(0, 1)$. Assuming $A \ge 1$, let $P_X$ be the optimizing input distribution for this channel. Then, $P_X$ is a symmetric discrete distribution with where a P0 = (1 + 2λ P )e + 2 log 2 + 4 √ e(1 + 2λ P ) 1 − 2λ P + 1,
The proof of Theorem 3 is omitted because of space constraints. See [19] for the details.
Remark 3. In the case of $P \ge A^2$, the power constraint is inactive and Theorem 3 recovers the result of Theorem 1.

III. BOUNDS ON THE NUMBER OF EXTREME POINTS OF A GAUSSIAN CONVOLUTION
This section presents some of the main tools required in our analysis. Specifically, given an unknown constant $0 \le \kappa_1 \le \max_b f_Y(b)$, our aim is to find a worst-case upper bound on the number of zeros of the shifted output pdf $f_Y - \kappa_1$. Here $f_Y$ denotes the pdf of the random variable $Y = X + Z$, where $X$ is an arbitrary zero-mean random variable at the input of the channel satisfying the amplitude constraint $|X| \le A$; $Z$ is the standard Gaussian random variable independent from $X$; and $Y$ is the random variable induced by the input $X$ at the output of this additive Gaussian channel.
As a starting point, before chasing after the number of zeros of $f_Y - \kappa_1$, the following lemma shows that the zeros of $f_Y - \kappa_1$ are always contained in an interval that is only "slightly" larger than $[-A, A]$.
Remark 4. Since the capacity-achieving input distribution for the channel in question is unknown, throughout this paper, the bounds are uniform over all possible inputs $X$ satisfying $|X| \le A$. Equivalently, the bounds involving the output pdf $f_Y$ are uniform over all possible output distributions.

Lemma 1 (On the Location and Finiteness of Zeros). For a fixed κ
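One elementary way to see why such a localization is plausible is the tail estimate sketched below. It assumes $\kappa_1 \in (0, 1/\sqrt{2\pi}]$; the threshold it produces is derived here only for illustration and need not coincide with the constant that appears in Lemma 1.

```latex
% For |y| > A and |X| <= A we have |y - X| >= |y| - A > 0, hence
f_Y(y) = \mathbb{E}\left[ \frac{1}{\sqrt{2\pi}} e^{-\frac{(y - X)^2}{2}} \right]
       \le \frac{1}{\sqrt{2\pi}} e^{-\frac{(|y| - A)^2}{2}} .
% The right-hand side is strictly smaller than \kappa_1 as soon as
|y| > A + \sqrt{ 2 \log \frac{1}{\sqrt{2\pi}\, \kappa_1} }
    = A + \sqrt{ \log \frac{1}{2\pi \kappa_1^2} } ,
% so every zero of f_Y - \kappa_1 must satisfy
|y| \le A + \sqrt{ \log \frac{1}{2\pi \kappa_1^2} } .
```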
Since the exact value of the constant $\kappa_1$ is unknown, in counting the number of zeros of $f_Y - \kappa_1$, a worst-case approach needs to be taken. To that end, the following elementary result from calculus provides a bound on the number of zeros of a function in terms of the number of its extreme points. As simple as it is, Lemma 2 is one of the key steps in this paper. It states that, to find a bound on the number of zeros of $f_Y - \kappa_1$, it suffices to find a bound on the number of zeros of the derivative $f_Y'$, eliminating the dependence on the nuisance constant $\kappa_1$.

Lemma 2. Suppose that f is continuous on
where $f'$ denotes the derivative of $f$.
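To make the counting argument concrete, the following sketch compares, for one hypothetical three-point input, the number of sign changes of $f_Y - \kappa_1$ with the number of sign changes of $f_Y'$ on a fine grid. The input, the level $\kappa_1 = 0.1$, the grid, and the helper names are assumptions chosen for illustration, and a grid count of this kind only detects zeros at which the function changes sign.

```python
import numpy as np

def pdf_and_derivative(y, points, probs):
    """Return f_Y and f_Y' on the grid y for a discrete input."""
    d = y[:, None] - points[None, :]
    phi = np.exp(-0.5 * d ** 2) / np.sqrt(2.0 * np.pi)
    # d/dy phi(y - x) = -(y - x) * phi(y - x)
    return phi @ probs, (-d * phi) @ probs

def count_sign_changes(values):
    """Number of sign changes along a sequence, ignoring exact zeros."""
    signs = np.sign(values)
    signs = signs[signs != 0]
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

# Hypothetical input: three equiprobable mass points with A = 3.
points = np.array([-3.0, 0.0, 3.0])
probs = np.full(3, 1.0 / 3.0)
y = np.linspace(-8.0, 8.0, 4001)

f_y, df_y = pdf_and_derivative(y, points, probs)
kappa_1 = 0.1  # assumed level, between the local minima and maxima of f_Y
zeros_shifted = count_sign_changes(f_y - kappa_1)
zeros_derivative = count_sign_changes(df_y)
print(zeros_shifted, zeros_derivative)
```

In this example the count for the shifted pdf exceeds the count for the derivative by exactly one, which is the extremal case of a Rolle-type bound.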
Thanks to Lemma 2, to upper bound the number of zeros of $f_Y - \kappa_1$, all that is needed is to find an upper bound on the number of zeros of the derivative of $f_Y$, namely
$$f_Y'(y) = \frac{1}{\sqrt{2\pi}}\, \mathbb{E}\left[(X - y)\, e^{-\frac{(y - X)^2}{2}}\right].$$
At this point, there are several routes that one could follow to produce an upper bound on the number of zeros; see the full version of this paper [19] for a detailed discussion. The method used in this paper is based on Tijdeman's Number of Zeros Lemma, which is presented next.

Lemma 3 (Tijdeman's Number of Zeros Lemma).
Let $R$, $s$, $t$ be positive numbers such that $s > 1$. For a complex-valued function $f \ne 0$ that is analytic on $|z| < (st + s + t)R$, its number of zeros $N(D_R, f)$ within the disk $D_R = \{z : |z| \le R\}$ satisfies
$$N(D_R, f) \le \frac{1}{\log s}\left(\log \max_{|z| \le (st + s + t)R} |f(z)| - \log \max_{|z| \le tR} |f(z)|\right).$$
The following two lemmas, whose proofs (see [19]) are omitted because of space constraints, find upper and lower bounds on the absolute value of the complex analytic extension of $f_Y$ over a disk of finite radius centered at the origin.
Lemma 4. Let $f_Y : \mathbb{R} \to \mathbb{R}$ be as in (11) and let f Y : C → C denote its complex extension. Then,
Lemma 5. Let $f_Y : \mathbb{R} \to \mathbb{R}$ be as in (11) and let f Y :
By assembling the results of Lemmas 3, 4 and 5, Theorem 4 below provides an upper bound on the number of oscillations of a Gaussian convolution.
Proof. Let $D_R \subset \mathbb{C}$ be a disk of radius $R$ centered at $z_0 = 0$, and note that where (16) follows because zeros of $f_Y'$ are also zeros of its complex extension; (17) is a consequence of Lemma 3; (18) follows from Lemma 4 with $B \leftarrow (st + s + t)R$ and Lemma 5 with $B \leftarrow tR$; and in (19) we use the fact that $t = A/R$ is the minimizer.
Finally, combining the results of Lemmas 1 and 2, and Theorem 4, the following corollary presents the desired result of this section.
Corollary 1. Given an arbitrary constant $\kappa_1 \in \left(0, \frac{1}{\sqrt{2\pi}}\right]$, suppose R > A + log

IV. PROOF OF THE MAIN RESULT
This section proves the result presented in Theorem 1. The first ingredient of the proof is the following characterization of the optimal input distribution shown in [1, Corollary 1].
Lemma 6. Consider the amplitude constrained scalar additive Gaussian channel $Y = X + Z$ where the input $X$, satisfying $|X| \le A$, is independent from the noise $Z \sim \mathcal{N}(0, 1)$. Then, $P_X$ is the capacity-achieving input distribution if and only if
$$i(x; P_X) \le C(A) \ \text{ for all } x \in [-A, A], \qquad i(x; P_X) = C(A) \ \text{ for all } x \in \mathrm{supp}(P_X),$$
where
$$i(x; P_X) = \int_{\mathbb{R}} \frac{e^{-\frac{(y - x)^2}{2}}}{\sqrt{2\pi}} \log\frac{1}{f_Y(y)}\, dy - h(Z),$$
with $h(Z) = \log\sqrt{2\pi e}$ denoting the differential entropy of the standard Gaussian distribution, and $f_Y(y)$ denoting the output pdf induced by the input $P_X$, that is, for $X \sim P_X$,
$$f_Y(y) = \frac{1}{\sqrt{2\pi}}\, \mathbb{E}\left[e^{-\frac{(y - X)^2}{2}}\right].$$
Remark 5. An immediate consequence of Lemma 6 is the fact that $x \in \mathrm{supp}(P_X) \implies i(x; P_X) - C(A) = 0$. In other words,
$$\mathrm{supp}(P_X) \subseteq \{x : \Xi(x; P_X) = 0\},$$
where the function $\Xi(\cdot\,; P_X) : \mathbb{R} \to \mathbb{R}$ is defined as
$$\Xi(x; P_X) = i(x; P_X) - C(A). \tag{28}$$
a) Connecting the Number of Oscillations of $f_Y$ to the Number of Masses in $P_X$: This section gives an alternative proof that $P_X$ is discrete by relating the cardinality of $\mathrm{supp}(P_X)$ to the number of zeros of the shifted output pdf $f_Y - e^{-C(A) - h(Z)}$. The following definition sets the stage.
Definition 1 (Sign Change of a Function). The number of sign changes of a function $\xi$ is given by
$$\sup_{m \in \mathbb{N}}\ \sup_{y_1 < \cdots < y_m} N\{\xi(y_i)\}_{i=1}^m,$$
where $N\{\xi(y_i)\}_{i=1}^m$ is the number of changes of sign of the sequence $\{\xi(y_i)\}_{i=1}^m$.
Proven in [21], the following theorem is the main tool in connecting the number of zeros of an output pdf $f_Y$ to the number of mass points of a capacity-achieving input distribution $P_X$.
Note that Theorem 5 is applicable in our setting as the Gaussian distribution is a member of the set of Pólya type-∞ functions [21]. The following result shows the connection between the support size of $P_X$ and the number of zeros of the optimal output pdf $f_Y$.
Lemma 7. The support set of the capacity-achieving input distribution $P_X$ satisfies
$$|\mathrm{supp}(P_X)| \le N([-R, R], f_Y - \kappa_1),$$
where $\kappa_1 = e^{-C(A) - h(Z)}$ and R > A + log
Proof. First, observe that $\Xi_A(x; P_X)$, defined in (28), can be written as follows:
$$\Xi_A(x; P_X) = \int_{\mathbb{R}} \xi_A(y)\, \frac{e^{-\frac{(y - x)^2}{2}}}{\sqrt{2\pi}}\, dy,$$
where $\xi_A(y) = \log\frac{1}{f_Y(y)} - C(A) - h(Z)$. Next, using the fact that the Gaussian distribution is a member of Pólya type-∞ functions, where (34)
b) Proof of the Upper Bound: We begin by simplifying the previously provided upper bound on $B_{\kappa_1}$. Note that an amplitude constraint $|X| \le A$ induces a second moment constraint $\mathbb{E}[X^2] \le A^2$, and therefore
Footnote 4: A function $f : I_1 \times I_2 \to \mathbb{R}$ is said to be strictly Pólya type-$n$ if $\det[f(x_i, y_j)]_{i,j=1}^m > 0$ for all $1 \le m \le n$, and for all $x_1 < \cdots < x_m \in I_1$ and $y_1 < \cdots < y_m \in I_2$. If $f$ is Pólya type-$n$ for all $n \in \mathbb{N}$, then $f$ is Pólya type-∞.
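The oscillation argument can be illustrated numerically. The sketch below smooths a test function with the standard Gaussian kernel, that is, it evaluates $x \mapsto \mathbb{E}[\xi(x + Z)]$, and checks that the number of sign changes does not increase, which is the variation-diminishing behaviour that oscillation theorems for totally positive kernels formalize. The cubic test function $\xi(y) = y^3 - y$, the grid, the quadrature degree, and the helper names are assumptions for this example and do not appear in the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def count_sign_changes(values):
    """Number of sign changes along a sequence, ignoring exact zeros."""
    signs = np.sign(values)
    signs = signs[signs != 0]
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

def gaussian_smooth(func, x, degree=40):
    """E[func(x + Z)] for Z ~ N(0, 1) via Gauss-Hermite quadrature."""
    # hermegauss uses the weight exp(-z^2 / 2); its weights sum to sqrt(2*pi).
    nodes, weights = hermegauss(degree)
    return (func(x[:, None] + nodes[None, :]) @ weights) / np.sqrt(2.0 * np.pi)

def xi(y):
    """Hypothetical test function with three sign changes (at -1, 0, 1)."""
    return y ** 3 - y

grid = np.linspace(-3.0, 3.0, 601)
smoothed = gaussian_smooth(xi, grid)  # closed form: x^3 + 2x

changes_before = count_sign_changes(xi(grid))
changes_after = count_sign_changes(smoothed)
print(changes_before, changes_after)
```

Here the smoothed function has the closed form $x^3 + 2x$, so three sign changes before smoothing collapse to a single one afterwards.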
Remark 6. A more careful optimization of (45) over the parameter $s$ would lead to better absolute constants $a_0$, $a_1$ and $a_2$. However, the order $A^2$ in (45) would not change.
c) Proof of the Lower Bound: Using the fact that the optimizing input distribution is discrete with finitely many points and denoting by $H(P_X)$ the entropy of the optimizing input distribution $P_X$, it follows that
$$\frac{1}{2}\log\left(1 + \frac{2A^2}{\pi e}\right) \le C(A) \tag{46}$$
$$\le H(P_X) \tag{47}$$
$$\le \log\left(|\mathrm{supp}(P_X)|\right),$$
where (46) is thanks to Shannon [22, Section 25].
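For a feel of the magnitudes involved, the sketch below evaluates the support-size lower bound obtained by exponentiating the chain above. It assumes that the Shannon-type capacity lower bound takes the form $C(A) \ge \frac{1}{2}\log\left(1 + \frac{2A^2}{\pi e}\right)$, which is the entropy-power bound for an input uniform on $[-A, A]$; the function name is hypothetical.

```python
import math

def min_support_size(amplitude):
    """Smallest integer k with log(k) >= 0.5 * log(1 + 2 A^2 / (pi e))."""
    bound = math.sqrt(1.0 + 2.0 * amplitude ** 2 / (math.pi * math.e))
    return math.ceil(bound)

# Lower bounds on the number of mass points for a few amplitudes.
lower_bounds = {a: min_support_size(a) for a in (1.0, 2.0, 5.0, 10.0)}
print(lower_bounds)
```

The bound grows linearly in $A$, in contrast with the quadratic growth of the upper bound.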

V. CONCLUDING REMARKS
This paper has introduced several new tools to study the capacity of amplitude constrained additive Gaussian channels. Not only are the introduced tools strong enough to show that the optimal input distribution is discrete with finite support, but they are also able to provide concrete upper bounds on the number of elements in that support. Moreover, the method has been demonstrated to be easily generalizable to other settings such as a scalar additive Gaussian channel with both peak and average power constraints. In addition to the scalar cases, the method is shown to work for a vector Gaussian channel with an amplitude constraint $A$. In particular, for an optimal input $\mathbf{X}$ it has been shown that its magnitude $\|\mathbf{X}\|$ is a discrete random variable with at most $O(A^2)$ mass points for any fixed dimension $n$. Finally, it is highly likely that the presented approach generalizes to other (possibly non-additive) channels where the channel transition probability is given by a Pólya type-∞ function, e.g., the Poisson channel.