On Information Rates Over a Binary-Input Filtered Gaussian Channel

We study communication systems over band-limited Additive White Gaussian Noise (AWGN) channels in which the transmitter's output is constrained to be symmetric binary (bipolar). We improve the available Ozarow-Wyner-Ziv (OWZ) lower bound on capacity, which is based on peak-power constrained pulse-amplitude modulation, by introducing new achievable schemes with two advantages over the studied OWZ schemes. Our schemes achieve a moderately improved information rate, and they do so with far fewer sign transitions of the binary signal. The gap between the known upper bound, which is based on spectral constraints of bipolar signals, and our new achievable lower bound is reduced to 0.93 bits per Nyquist interval at high SNR.


I. INTRODUCTION AND PROBLEM DEFINITION
We study communication systems over band-limited Additive White Gaussian Noise (AWGN) channels in which the transmitter's output is constrained to be bipolar, as presented in Figure 1. Such systems arise when the power efficiency must be high or when the transmitter needs to be of very low complexity. Those systems are usually implemented by some form of Pulse Width Modulation (PWM), Pulse Position Modulation (PPM), or similar schemes, operating over Gaussian noise channels [1], [2], [3]. Communication systems with binary transmitted signals are of recent practical interest in millimeter-wave wide-band applications, e.g., [4], [5].
In this work, we examine theoretical limits on communication with binary transmission, not limited to PWM. We are interested in the reliable information rate supported by this system, focusing mainly on the region of asymptotically high SNR. This theoretical problem was addressed by Ozarow, Wyner and Ziv (OWZ) [6] using the Pulse Amplitude Modulation (PAM) method. OWZ [6] showed that the performance, measured by mutual information, achievable with a signal peak-limited to $\pm\sqrt{P}$ can also be achieved with a binary-valued $\pm\sqrt{P}$ signal with a very high Sign Transition Rate (STR). They applied this finding to design a PAM scheme with symbols uniformly distributed in $[-\sqrt{P}, +\sqrt{P}]$, which provides an achievable lower bound on the capacity of the system. As implied by [6], peak-limited continuous-time signals such as filtered PAM can in principle also be band-limited [7] and hence represented by sampling at an appropriate rate, while the equivalent (in the sense of [6]) bipolar processes cannot be strictly band-limited [8]. A lower bound that exceeds that of [6] at low SNR was presented in [9], based on improved bounds for intersymbol-interference Gaussian channels. Additional results on the capacity of systems with binary inputs, some with additional constraints on average transition rate, minimum inter-transition time and out-of-band power, are presented in [10] and [11]. Systems with limited minimal transition times were investigated in [12], including systems with mild filtering, that is, not strictly band-limited as in [6].
The binary channel input carries information in its transition times. Sampling the binary input at a Nyquist rate corresponding to the channel bandwidth would degrade the performance severely; thus the system in Figure 1 falls in general into the category of Faster Than Nyquist (FTN) signaling, which is of wide current theoretical and practical interest. In recent years, FTN signaling approaches, as well as forms and extensions of classical pulse-amplitude modulation strategies, have emerged. See overviews of these relevant domains in [13], [14], [15] and references therein. See also recent examples of advanced theory and techniques in [16], [17], [18], [19]. FTN can provide significant advantages in terms of capacity with prescribed modulation techniques and signaling strategies, though the resultant channel may suffer significant inter-symbol interference, which demands higher-complexity detection procedures. Yet, no peak-power restrictions are imposed there on the resultant continuous-time process, whereas such a restriction is a central part of our scheme. This restriction is motivated by practical constraints, as was also the case in [6], reflecting the constraints of magnetic storage media.
In this work, we present new schemes with two advantages over [6]. They achieve a moderately improved information rate and do so with far fewer sign transitions of the binary signal. The new schemes require an STR of only up to twice the Nyquist rate of the channel, while [6] uses an STR many times higher than the Nyquist rate; if implemented fully, the STR in [6] is infinite. A low STR is easier to implement in systems which are already wide-band and in which each sign transition must pass a power amplifier, such as [4], [5]. We extend the technique of analysis in [6] to the new schemes, in which the transmitted signal is a non-linear function of the information sequence.
The studied communication system is presented in Figure 1. It comprises an encoder producing a binary-valued $\pm\sqrt{P}$ input $x(t)$, where $P$ is the transmit power, an AWGN channel with noise Power Spectral Density (PSD) of $\frac{1}{2}N_0$ watt/Hz (double-sided), and a receiver. The channel has a frequency response $H(f)$, in our case a unity frequency response at frequencies from 0 to $B$ and zero otherwise. The channel output is

$$y(t) = z(t) + n(t),$$

where $z(t)$ is the filtered desired signal and $n(t)$ is the Gaussian noise. We denote by $B$ the bandwidth of the low-pass brick-wall filter in Hz, $T = 1/(2B)$ is the Nyquist sampling period associated with $B$, $\rho = \frac{P}{N_0 B}$ is the signal-to-noise ratio (SNR), log denotes the natural logarithm, and bold lower-case letters denote vectors and sequences.

II. KNOWN PERFORMANCE BOUNDS
Shamai and Bar-David [20] derived an upper bound on the system capacity, based on the fact that the Power Spectral Density (PSD) of a binary-valued signal is limited by certain constraints presented in [21] and [8]. They analyzed limits on spectral densities of binary signals and then upper-bounded the capacity of the system by the Mutual Information (MUI) obtained when the channel input has the capacity-achieving Gaussian distribution with the same PSD as the binary-valued signal. For high SNR they proved that, relative to the capacity-achieving frequency-flat Gaussian input, there is a power loss of at least a factor of $\gamma = 0.9337$; see the definition of $\gamma$ below. The same paper considers the Random Telegraph Signal (RTS) as an interesting example rather than a bound; the power factor there is around $\gamma = 0.63$, which is an upper bound on the performance of RTS. The capacity $C_G$, in bits/second, of the channel with PSD given as $S(f)$ used in [20] is the well-known expression

$$C_G = \int_0^B \log_2\left(1 + \frac{S(f)}{N_0}\right) df, \qquad (1)$$

where $C_G$ is achieved with a Gaussian input. In the limit of asymptotically high $S(f)/N_0$ the capacity $C_G$ becomes

$$C_G^h = \int_0^B \log_2\left(\frac{S(f)}{N_0}\right) df. \qquad (2)$$

For a frequency-flat Gaussian signal of bandwidth $B$, for which $S(f) = P/B$, this yields $C_G^h = B\log_2\rho$. Multiplying $S(f)$ in (2) by a factor $\gamma$ increases $C_G^h$ by $B\log_2\gamma$ information bits per second, which is $\frac{1}{2}\log_2\gamma$ bits per Nyquist sampling interval. Consequently, the equivalent SNR gain is defined, for a scheme with bandwidth $B$, as a function of the difference in information per Nyquist interval between the scheme and the AWGN channel with the same bandwidth and transmit power, as

$$\gamma = 2^{2(I - I_{AWGN})},$$

where $I$ and $I_{AWGN}$ denote the information, in bits per Nyquist interval, of the scheme and of the AWGN channel respectively. OWZ [6] derived the following achievable lower bound using the modulation method described in the introduction.
$$I_{OWZ} \ge \frac{1}{2}\log\left(1 + \frac{2e}{\pi^3}\,\rho\right),$$

where $I_{OWZ}$ stands for mutual information per Nyquist interval. This corresponds to $\gamma_{OWZ} = 2e/\pi^3 = 0.1753$. The bipolar signal that achieves the performance of the PAM modulation technique in [6] involves a high transition rate of the binary signal. An improved lower bound in the low SNR regime is reported in [9].

III. NEW ACHIEVABLE SCHEMES
The main results of this work are the improved lower bounds on the capacity of the bipolar-input bandlimited AWGN channel; see Proposition 1. The proposition is proved by introducing and analyzing new communication schemes.

FIGURE 3. Binary signals x(t) of type A and B. The dashed line is the impulse response of the channel filter.
We discuss four schemes, denoted by A, B, C and D. In all of them the time axis is partitioned into successive intervals of duration $T$ equal to the Nyquist interval corresponding to $B$. In scheme A, the binary signal in each interval $n$, with time $t$ spanning $(n - 0.5)T \le t < (n + 0.5)T$, is given by (4), where the $a_n$ are the information-carrying variables, uniformly, independently, and identically distributed (u.i.i.d.) over $[-0.5, +0.5]$. Thus, information is conveyed by the time of sign reversal of the signal; see Figure 2 for an illustration.
We denote the sequence of all $a_n$ by $\mathbf{a}$, the binary transmitted signal in interval $n$ by $x_n(t)$, and the overall transmission by $x(t)$ or $\mathbf{x}$. Scheme A changes sign twice in each Nyquist interval $T$, thus its STR is $4B$.
Scheme B is derived from scheme A by inverting the signal in alternate intervals of length $T$ to eliminate half of the sign transitions of the binary signal, reducing its STR to $2B$. See Figure 3.
Scheme C is derived from scheme A by inverting the signal in successive intervals at random, where the signs $s_n$, valued $\pm 1$, are used as additional information inputs. The signs $s_n$ are equiprobable and independent. The signaling in scheme C comprises $a_n$ and $s_n$, thus the signaling rate is twice the Nyquist rate. The STR of scheme C is $3B$, since the random sign reversal related to $s_n$ occurs in half of the transmission intervals. Denote by $\mathbf{s}$ the sequence of the sign inversions $s_n$ in schemes A, B and C, so that $s_n = -1$ for the inverted symbols and $s_n = 1$ otherwise.
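The transition rates quoted for schemes A, B and C can be checked with a short simulation. The sketch below is illustrative only: the polarity convention (each base pulse is +1 before its transition and −1 after it), the sampling grid and the names `make_signal` and `transitions_per_interval` are assumptions made here, not taken from the paper.

```python
import numpy as np

def make_signal(a, s, m=64):
    """Unit-amplitude bipolar waveform with m samples per interval T.

    Assumed convention (not taken from the paper): in interval n the base
    pulse is +1 before the transition at (n + a_n) T and -1 after it, and
    the whole interval is multiplied by the sign s_n.
    """
    a = np.asarray(a, dtype=float)
    s = np.asarray(s, dtype=float)
    # Transition sample index, kept strictly inside the interval so that
    # both polarities appear on the grid.
    k = np.clip(np.round((a + 0.5) * m).astype(int), 1, m - 1)
    base = np.where(np.arange(m)[None, :] < k[:, None], 1.0, -1.0)
    return (s[:, None] * base).ravel()

def transitions_per_interval(x, n_intervals):
    """Average number of sign changes per interval of duration T."""
    return np.count_nonzero(np.diff(x)) / n_intervals

rng = np.random.default_rng(0)
N = 20000
a = rng.uniform(-0.5, 0.5, N)
signs = {
    "A": np.ones(N),                        # no inversions
    "B": (-1.0) ** np.arange(N),            # alternate inversions
    "C": rng.choice([-1.0, 1.0], size=N),   # random inversions
}
rates = {name: transitions_per_interval(make_signal(a, s), N)
         for name, s in signs.items()}
for name, r in rates.items():
    # STR = (transitions per T) / T = 2B * (transitions per T)
    print(f"scheme {name}: {r:.3f} transitions per T, STR = {2 * r:.2f} B")
```

Under these assumptions the counts come out close to 2, 1 and 1.5 transitions per interval, that is, STRs of about 4B, 2B and 3B.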
Scheme D is derived from scheme B. It eliminates pairs of consecutive sign transitions with exceptionally short inter-transition time, which we found detrimental to our lower bound. Scheme D improves upon scheme B by introducing a minimal inter-transition interval $T_g = 0.2T$ and by extending the range in which each sign-transition time can occur. In particular, as in scheme B, each transmission interval of duration $T$ is associated with one sign transition. However, the transition time specified in (4) as uniformly distributed over the $n$th transmission interval spanning $(n - 0.5)T \le t < (n + 0.5)T$ in scheme B is, in the new scheme D, distributed uniformly over a window $W_s(n)$, which starts $T_g$ after the previous sign transition and ends, as in scheme B, at the end of the current interval. This is illustrated in Figure 4. Note that the sign transition associated with the $n$th transmission interval may occur in the $n$th interval or in one of the few intervals preceding it. The STR of scheme D is $2B$, identical to scheme B.
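The window rule of scheme D can be sketched as a simple recursion on the transition times. The code below is a minimal illustration under stated assumptions: the start-up condition (the first window spans the whole first interval) and the name `scheme_d_transitions` are choices made here. Because each transition is uniform over its window given the past, the average of the logarithm of the window length estimates the per-symbol entropy of the transition times in nats (for T = 1), which can be compared with the increase over scheme B reported later in this section.

```python
import numpy as np

def scheme_d_transitions(n_symbols, t_gap=0.2, T=1.0, seed=0):
    """Draw scheme-D transition times and the window lengths they came from.

    Transition n is uniform over a window that starts t_gap * T after the
    previous transition and ends at the end of interval n, (n + 0.5) T.
    Start-up is an assumption: the first window is the whole first interval.
    """
    rng = np.random.default_rng(seed)
    gap = t_gap * T
    times = np.empty(n_symbols)
    windows = np.empty(n_symbols)
    prev = -0.5 * T - gap
    for n in range(n_symbols):
        start = prev + gap
        end = (n + 0.5) * T
        windows[n] = end - start
        times[n] = rng.uniform(start, end)
        prev = times[n]
    return times, windows

times, windows = scheme_d_transitions(50000)
# Conditional entropy of a uniform variable is the log of its support length.
entropy_estimate = float(np.mean(np.log(windows)))
print(f"minimum inter-transition time: {np.diff(times).min():.4f} T")
print(f"mean window length: {windows.mean():.3f} T")
print(f"per-symbol entropy estimate: {entropy_estimate:.4f} nats")
```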
Computing the exact capacity of the four schemes, that is, the MUI between the binary input $x(t)$ and the channel output $y(t)$, seems intractable. We therefore resorted to deriving upper and lower bounds. The lower bound on the performance of each scheme applies to the scheme itself, and it is also a lower bound on the channel capacity. Each upper bound applies only to the specific scheme, while the upper bound on the capacity of the system is the upper bound on binary schemes in Table 1.
To compute upper bounds on the communication rates of schemes A, B and C, we first evaluate the PSD of the signals. We assume that the signal is randomly shifted as a whole by a delay distributed uniformly over $(0, T)$ to render it stationary. The autocorrelations of the signals in the three schemes are derived in the Appendix and summarized in (5).
The PSD was obtained by numerical Fourier transform of the autocorrelations, see Figure 5, and verified by simulation. The PSD is also obtainable analytically from the autocorrelations. For example, for scheme C with $T = 1$ and unit power we have the one-sided PSD

$$S_C(f) = \frac{3 + \cos(2\pi f)}{\pi^2 f^2} - \frac{2\sin(2\pi f)}{\pi^3 f^3}.$$

The AWGN line in Figure 5 is the PSD of the standard bandlimited capacity-achieving signal without the binary constraint. As is well known from the water-pouring theory, it spreads the available power uniformly over the available bandwidth $B$. The PSDs of our three schemes suffer the disadvantage of wasting some of the transmitted power outside the channel bandwidth and of not spreading the remaining power uniformly. Scheme C is evidently better than schemes A and B. Indeed, the schemes A, B and C are constrained to bipolar transmitted signals and therefore cannot possess a strictly bandlimited spectrum, as we know from [8]. The spectra of schemes A and B are identical except for the discrete frequency components (tones), which do not influence the outcome of (2). There are no discrete tones in scheme C, since it decorrelates the pulses by random sign inversions, limiting the support of the autocorrelation to $[-T, T]$.
Based on the PSD, we compute the upper bounds on performance at high SNR of the three schemes using (2) and compare them to the optimal input, which is a Gaussian signal with power $P$ and a flat PSD from 0 to $B$. The results are presented in Table 1.
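The path from autocorrelation to PSD to high-SNR power factor can be sketched numerically for scheme C. The sketch assumes unit power and takes the scheme C autocorrelation as the product described in the Appendix, (1 − 2|τ|/T)(1 − |τ|/T) on |τ| ≤ T; the function names and the quadrature settings are illustrative choices, and the printed factor is a value to compare against Table 1 rather than a quoted result.

```python
import numpy as np

def autocorr_c(tau, T=1.0):
    """Scheme C autocorrelation for unit power (as described in the Appendix)."""
    d = np.abs(tau) / T
    return np.where(d <= 1.0, (1.0 - 2.0 * d) * (1.0 - d), 0.0)

def one_sided_psd(f, T=1.0, n=2001):
    """S(f) = 4 * integral_0^T R(tau) cos(2 pi f tau) d tau, trapezoidal rule."""
    f = np.atleast_1d(np.asarray(f, dtype=float))
    tau = np.linspace(0.0, T, n)
    g = autocorr_c(tau, T)[None, :] * np.cos(2.0 * np.pi * np.outer(f, tau))
    dt = tau[1] - tau[0]
    return 4.0 * dt * (g.sum(axis=1) - 0.5 * (g[:, 0] + g[:, -1]))

def high_snr_power_factor(T=1.0, m=400):
    """Geometric mean of S(f) / S_flat over the band (0, B), following (2)."""
    B = 1.0 / (2.0 * T)
    f = (np.arange(m) + 0.5) * B / m      # midpoints of the band
    flat = 1.0 / B                        # one-sided PSD of the flat reference
    return float(np.exp(np.mean(np.log(one_sided_psd(f, T) / flat))))

gamma_upper_c = high_snr_power_factor()
print(f"S(0)   = {one_sided_psd(0.0)[0]:.4f}")
print(f"S(B)   = {one_sided_psd(0.5)[0]:.4f}")
print(f"scheme C high-SNR power factor (upper bound): {gamma_upper_c:.3f}")
```

The geometric mean appears because (2) integrates the logarithm of the PSD, so a frequency-dependent PSD is equivalent to a flat one scaled by that mean.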
We proceed to derive lower bounds on the communication rates of the new schemes. As shown in Figure 1, $x(t)$ passes through the channel filter and is then contaminated by AWGN. The receiver filters the signal by the same low-pass filter, which is clearly an information-lossless operation. We sample the filtered channel output at the Nyquist rate $1/T$, producing an infinite sequence $\mathbf{y}$ of samples $y_n$. We denote the signal without the noise component by a sequence $\mathbf{z}$ of samples $z_n$; see Figure 1.
We lower-bound the capacity $I(\mathbf{x}; \mathbf{y}) = H(\mathbf{y}) - H(\mathbf{y}|\mathbf{x})$ by adapting the approach presented in OWZ [6]. Since $H(\mathbf{y}|\mathbf{x})$ is the known entropy of the noise, the main term to evaluate is $H(\mathbf{y})$. OWZ lower-bounded $H(\mathbf{y})$ as a function of the entropies of its components $H(\mathbf{z})$ and $H(\mathbf{n})$ using the Entropy-Power Inequality (EPI) presented in [22]. OWZ evaluated $H(\mathbf{z})$ using the fact that the channel was an Inter Symbol Interference (ISI) channel representable by a Toeplitz matrix, the determinant of which is computable using the Szegö theorem [24].
We begin by determining the entropy of $\mathbf{z}$; the required quantity is its differential entropy. In schemes A and B, each $a_n$ determines one symbol $x_n$, and those symbols are linearly filtered to produce $\mathbf{z}$. The sequence $\mathbf{a}$, treated as a vector in (6), comprises u.i.i.d. components, and its differential entropy is given by (6). The noiseless sampled output $\mathbf{z}$ is a function of $\mathbf{a}$, which we denote by $\mathbf{z} = m(\mathbf{a})$. To derive $h(\mathbf{z})$ using the Jacobian formula (7), similarly to [6], we need our transformation $\mathbf{z} = m(\mathbf{a})$ to be a bijection, and $\mathbf{a}$ and $\mathbf{z}$ must have identical dimensions.
Lemma 1: For every $\varepsilon > 0$, if the channel's bandwidth is $B = \frac{1}{2T} + \varepsilon$, where $T$ is the signaling period, then the transformation $\mathbf{z} = m(\mathbf{a})$ in schemes A and B is a bijection.
Proof: The modulation scheme in Figure 1 is deterministic, therefore each sequence $\mathbf{a}$ can produce only a single sequence $\mathbf{z}$. It remains to prove that no two distinct sequences $\mathbf{a}$ produce the same $\mathbf{z}$. If this happened, there would exist a pair of distinct transmitted signals $x_1$ and $x_2$ whose noiseless outputs coincide, so that the difference $d$ of the outputs is identically zero. Since the low-pass filter is linear, such a $d$ would be the low-pass filtered signal $x_1 - x_2$. By the construction of $x$, for schemes A and B (but not C), the difference $x_1 - x_2$ would be a sequence of pulses as depicted in Figure 6, in which each pulse is assigned a symbol interval $T$ during which it has a zero value except for some contiguous duration in which it is $\pm 2$.
So it is sufficient to prove that such a nonzero signal $x_1 - x_2$ cannot have zero spectrum in $0 \le f \le B + \varepsilon$. This follows directly from [23, Th. 1], which proved that signals with zero spectrum in $0 \le f \le B + \varepsilon$, which are denoted in [23] as high-pass signals or signals with a zero gap, change sign at average rates higher than $1/T = 2B$, which is the highest possible rate of sign changes of the function in Figure 6. Thus, such a nonzero $x_1 - x_2$ cannot exist and $\mathbf{z} = m(\mathbf{a})$ is a bijection. The asymptotically small change in $B$ is immaterial in this work by the problem definition.
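Since $\mathbf{z} = m(\mathbf{a})$ is a bijection, the entropy $h_z$ is

$$h_z = h_a + \int p(\mathbf{a}) \log\left|\frac{\partial z_i}{\partial a_j}\right| d\mathbf{a} = h_a + E_{\mathbf{a}}\left[\log\left|\frac{\partial z_i}{\partial a_j}\right|\right], \qquad (7)$$

where $\left|\frac{\partial z_i}{\partial a_j}\right|$ denotes the determinant of the Jacobian matrix of $\mathbf{z} = m(\mathbf{a})$, $p(\mathbf{a})$ is the probability density function of $\mathbf{a}$, and $E_{\mathbf{a}}$ denotes expectation with respect to $\mathbf{a}$. The Jacobian matrix is denoted $\left(\frac{\partial z_i}{\partial a_j}\right) = J$. Unlike OWZ [6], in our scheme A the Jacobian matrix is not Toeplitz, since here $\frac{\partial z_i}{\partial a_j}$ depends on each $a_j$. Therefore, we could not follow OWZ in using the Szegö theorem [24]. Instead, we evaluated the expectation in (7) numerically by generating the signals $\mathbf{z}$ with random sequences $\mathbf{a}$, computing $J$ for each $\mathbf{z}$ and averaging $h_z$. The Jacobian matrix $J$ is evaluated by

$$\frac{\partial z_i}{\partial a_j} = \pm 2\,\mathrm{sinc}\left(\frac{t_{ij}}{T}\right), \qquad (8)$$

where $t_{ij}$ is the time elapsed from the transition governed by $a_j$ to the sample $z_i$. The sign is positive for transitions from 1 to −1 and negative otherwise. The computation was executed on cyclic sequences 500 and 1000 symbols long, with identical results in both cases. Denote by $h_d$ the resulting per-symbol value of the expectation in (7). The numerical evaluation for scheme A gives $h_d = 0.5197$ nats for $T = 1$ and $P = 1$, and it is invariant with $T$; see (4), (6) and (8). The entropy (7) is identical in scheme A and in scheme B with its alternate sign inversions. This is because $\frac{\partial z_i}{\partial a_j}$ changes sign when $s_j = -1$, so for scheme B we can create a new auxiliary vector

$$\hat{\mathbf{a}} = (a_1 s_1, \ldots, a_i s_i, \ldots) \qquad (9)$$

in which $\left(\frac{\partial z_i}{\partial \hat{a}_j}\right)$ is identical to $\left(\frac{\partial z_i}{\partial a_j}\right)$ in scheme A and $h(\hat{\mathbf{a}}) = h(\mathbf{a})$, yielding the same $h_z$ in schemes A and B.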
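The Monte Carlo evaluation of the Jacobian term can be sketched as follows. This is not the computation used in the paper: it assumes unit amplitude, T = 1, Jacobian entries of the form 2 sinc(t_ij / T) with t_ij measured from the transition at (j + a_j)T to the sample at iT, and a plain truncated matrix instead of the cyclic sequences described above, so edge effects are present and no particular numerical value is claimed. The function names are illustrative.

```python
import numpy as np

def log_abs_det_per_symbol(a):
    """(1/n) log|det J| for one draw of the transition offsets a (T = 1).

    Assumption: J[i, j] = 2 * sinc(i - (j + a[j])), i.e. a jump of size 2 at
    time (j + a_j) T seen through the unit-gain brick-wall filter at time i T.
    A common sign for all columns would not change |det J|.
    """
    a = np.asarray(a, dtype=float)
    n = a.size
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    jac = 2.0 * np.sinc(i - (j + a[None, :]))   # np.sinc(x) = sin(pi x)/(pi x)
    _, logabsdet = np.linalg.slogdet(jac)       # stable log of |det|
    return logabsdet / n

def estimate_h_d(n=200, trials=20, seed=0):
    """Average the per-symbol log-determinant over random u.i.i.d. sequences."""
    rng = np.random.default_rng(seed)
    values = [log_abs_det_per_symbol(rng.uniform(-0.5, 0.5, n))
              for _ in range(trials)]
    return float(np.mean(values))

h_d_estimate = estimate_h_d()
print(f"truncated-matrix estimate of the Jacobian term: {h_d_estimate:.4f} nats")
```

Two sanity checks follow from the structure: with all offsets equal to zero the matrix is twice the identity, giving log 2 per symbol, and by Hadamard's inequality no draw can exceed log 2, since each column has Euclidean norm at most 2.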
The true entropy of $\mathbf{z}$ is larger by $0.5\log(P)$ due to the multiplication by $\sqrt{P}$, and $\mathbf{a}$ has a unity support, so $h_a = 0$; see (6). Thus, $h_z = h_d + 0.5\log(P)$. The entropy of the sampled noise at the filter output is $h_n = 0.5\log(2\pi e B N_0)$.
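The step from these entropies to a rate bound can be sketched with the Entropy-Power Inequality. Assuming the per-symbol form exp(2h_y) ≥ exp(2h_z) + exp(2h_n), the rate satisfies I ≥ 0.5 log(1 + exp(2(h_z − h_n))) = 0.5 log(1 + γρ) with γ = exp(2h_d)/(2πe). This reading of the bound, and the function names below, are assumptions of the sketch; the numerical inputs are the values quoted in the text.

```python
import math

def gamma_from_hd(h_d):
    """Power factor implied by the Jacobian term h_d (nats), assuming the EPI form."""
    return math.exp(2.0 * h_d) / (2.0 * math.pi * math.e)

def epi_lower_bound_bits(h_d, rho):
    """Lower bound in bits per Nyquist interval at SNR rho (linear scale)."""
    return 0.5 * math.log2(1.0 + gamma_from_hd(h_d) * rho)

gamma_ab = gamma_from_hd(0.5197)           # h_d reported for schemes A and B
print(f"gamma for schemes A and B: {gamma_ab:.4f}")
print(f"with the 1.207 gain quoted for scheme C: {gamma_ab * 1.207:.3f}")
print(f"bound at 30 dB SNR: {epi_lower_bound_bits(0.5197, 1000.0):.3f} bits")
```

Multiplying the A/B factor by the 1.207 power gain quoted for scheme C gives about 0.20, which agrees with the value stated for scheme C.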
The same analysis technique used here for the brick-wall channel response is applicable to a general channel frequency response $H(f)$. To extend the technique to a more general $H(f)$, the sinc pulse used in (8) to compute the Jacobian matrix and shown in Figure 3 would be replaced by the new channel impulse response. Furthermore, $H(f)$ would need to be non-zero over $0 < f < B$ to fulfill the conditions of Lemma 1.
Next we show an improved performance in scheme C. To increase $h(\mathbf{z})$, the polarity of each pulse is inverted at random. As seen in Figure 5, this also removes the wasted discrete tones from the signal spectrum. The analysis above cannot be applied directly, since now $x(t) \to z(t)$ is not a bijection, as demonstrated by the construction of pairs of signals $x(t)$ whose difference has a period of $T$ and a zero mean, and thus zero PSD in the 0 to $B$ frequency band. For scheme C, the system mutual information between the modulator inputs $\mathbf{a}$, $\mathbf{s}$ and the channel output is $I(\mathbf{a}, \mathbf{s}; \mathbf{y}) = I(\mathbf{s}; \mathbf{y}) + I(\mathbf{a}; \mathbf{y}|\mathbf{s})$.
The second term on the r.h.s. is equal to that of schemes A and B; the signs $\mathbf{s}$ on which this term is conditioned are handled by the auxiliary vector $\hat{\mathbf{a}}$ as defined in (9) for scheme B.
The first term on the r.h.s. is the improvement achieved by scheme C relative to schemes A and B. We lower-bound it as follows. Denote the sequence of derivatives of $y(t)$ at times $nT$ by $\dot{\mathbf{y}} = \{\dot{y}_n\}$. Now

$$\begin{aligned} I(\mathbf{s}; \mathbf{y}) &\ge I(\mathbf{s}; \dot{\mathbf{y}}) \\ &= \sum_n I(s_n; \dot{\mathbf{y}} \mid s_1^{n-1}) \\ &\ge \sum_n I(s_n; \dot{\mathbf{y}}) \\ &\ge \sum_n I(s_n; \dot{y}_n). \end{aligned}$$

The first line holds since $\dot{\mathbf{y}}$ is a function of $\mathbf{y}$. The second line is by the standard mutual information decomposition [22]. The third line holds since $s_n$ is independent of $s_1^{n-1}$, and the fourth line holds since $\dot{y}_n$ is a subset of the sequence $\dot{\mathbf{y}}$. The last term was evaluated by simulation of scheme C while estimating the symbol-wise probability densities $P(\dot{y}_n|s_n = 1)$, $P(\dot{y}_n|s_n = -1)$ and $P(\dot{y}_n)$, as plotted in Figure 7. It adds 0.136 bits per symbol at asymptotically high SNR, which is equivalent to a power gain of $\gamma = 1.207$. Scheme C achieves $\gamma = 0.20$, moderately better than OWZ. We expect that better detectors would improve upon this lower bound.

Next we evaluate the lower bound on the performance of scheme D. Lemma 1 holds for scheme D, in which the difference signal is as in Figure 6 with the same average number of pulses, except for not confining each pulse to its own $T$-interval. That is, in both schemes B and D, the total number of all the negative and positive pulses in the difference signal is half of the total number of sign transitions in $x_1$ and $x_2$.
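The symbol-wise estimate of the mutual information between $s_n$ and $\dot{y}_n$ for scheme C can be sketched with a histogram (plug-in) estimator. Everything below is an illustrative assumption rather than the simulation used for Figure 7: noise is ignored (asymptotically high SNR), amplitude and T are set to 1, each pulse is taken as $s_n$ before its transition and $-s_n$ after it, the sequence is finite with edge samples discarded, and the names `simulate_scheme_c` and `mutual_information_bits` are made up here. The plug-in estimator is biased slightly upward for finite data, so no specific value is claimed.

```python
import numpy as np

def simulate_scheme_c(n=1500, seed=0):
    """Noise-free derivative samples of the filtered scheme C signal (T = 1)."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(-0.5, 0.5, n)
    s = rng.choice([-1.0, 1.0], size=n)
    t = np.arange(n, dtype=float)[:, None]     # sampling instants n T
    inner_t = np.arange(n) + a                 # transition inside interval m
    inner_jump = -2.0 * s                      # s_m -> -s_m (assumed polarity)
    edge_t = np.arange(n - 1) + 0.5            # boundary between m and m + 1
    edge_jump = s[1:] + s[:-1]                 # -s_m -> s_{m+1}; 0 if no change
    # The derivative of the filtered signal is one sinc pulse per jump.
    dz = (np.sinc(t - inner_t[None, :]) @ inner_jump
          + np.sinc(t - edge_t[None, :]) @ edge_jump)
    return s, dz

def mutual_information_bits(s, v, bins=40):
    """Plug-in estimate of I(s; v) in bits for binary s and real-valued v."""
    edges = np.histogram_bin_edges(v, bins=bins)
    joint = np.stack([np.histogram(v[s == sign], bins=edges)[0]
                      for sign in (-1.0, 1.0)]).astype(float)
    joint /= joint.sum()
    p_s = joint.sum(axis=1, keepdims=True)
    p_v = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (p_s @ p_v)[mask])))

s, dz = simulate_scheme_c()
keep = slice(100, -100)                        # drop samples near the edges
mi = mutual_information_bits(s[keep], dz[keep])
print(f"plug-in estimate of I(s_n; dy_n): {mi:.3f} bits per symbol")
```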
The entropy of $\mathbf{a}$ in (6) is now calculated numerically. It is larger by 0.4095 nats than that of scheme B, contributing to the performance. Scheme D achieved the best performance among the four schemes; see Table 2. With scheme D, the achievable lower bound has an advantage of a power factor of 1.47 at all SNRs and of 0.28 bits per Nyquist interval $T$ at high SNR over the scheme reported in [6]. Comparing to Table 1, the gap between the upper and the lower bounds specific to the schemes is 0.43 and 0.438 bits per Nyquist interval for schemes A and C respectively. The gap between the upper bound in [20], entry 6 in Table 2, and the best achievable lower bound, scheme D in the table, is 0.93 bits per Nyquist interval.
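The headline figures can be cross-checked with the relation between a power factor γ and a rate difference of 0.5 log2 γ bits per Nyquist interval. The short script below uses only values quoted in the text (0.9337 from [20], 2e/π³ for OWZ, and 0.2586 for scheme D as stated in Proposition 1); the variable names are illustrative.

```python
import math

GAMMA_UPPER = 0.9337                        # upper-bound power factor from [20]
GAMMA_OWZ = 2.0 * math.e / math.pi ** 3     # OWZ lower-bound power factor
GAMMA_D = 0.2586                            # scheme D power factor (Proposition 1)

def bits(gamma):
    """Rate change in bits per Nyquist interval for a power factor gamma."""
    return 0.5 * math.log2(gamma)

power_factor = GAMMA_D / GAMMA_OWZ          # advantage of scheme D over OWZ
gain_bits = bits(power_factor)              # same advantage in bits at high SNR
gap_bits = bits(GAMMA_UPPER / GAMMA_D)      # upper bound versus scheme D
loss_bits = -bits(GAMMA_D)                  # loss relative to unconstrained AWGN

print(f"gamma_OWZ              = {GAMMA_OWZ:.4f}")
print(f"power factor D vs OWZ  = {power_factor:.2f}")
print(f"gain over OWZ          = {gain_bits:.2f} bits")
print(f"gap to the upper bound = {gap_bits:.2f} bits")
print(f"loss vs AWGN           = {loss_bits:.3f} bits")
```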
The four schemes have distinct attributes. Schemes A and B serve to build up the theoretical base, and they provide a lower bound on capacity valid for all SNRs. Scheme C is an extension providing an improved lower bound at high SNR and an improved spectrum. Scheme D provides the best lower bound at all SNRs.
Proposition 1: The capacity loss incurred by imposing a constraint of a symmetric binary (bipolar) input on the AWGN frequency-flat low-pass channel with a given average input power is limited to 0.976 bits per Nyquist interval at high SNR. The equivalent power loss ratio is no less than 0.2586 at all SNRs.

IV. CONCLUSION AND OUTLOOK
We studied communication systems over the band-limited AWGN channel in which the transmitter output is constrained to be binary bipolar. We presented new schemes which provide an improved lower bound on the capacity of this channel. The gap between the known upper bound and our new achievable lower bound is reduced to 0.93 bits per Nyquist interval at high SNR. Furthermore, the schemes operate at a much lower rate of sign transitions than the bipolar signaling that achieves the PAM-based bounds in [6].
There is room for future work attempting to improve the achievable lower bound. For this purpose, signals with spectra more concentrated in the lower frequency regions than our scheme C, see Figure 5, should be investigated. Interestingly, the maximal power factor $\gamma$ of the Random Telegraph Signal (RTS) is achieved with an average transition rate of about 0.67 per Nyquist interval, less than the 1.5 average transition rate of our scheme C, leading to a narrower PSD. Thus, a future analysis of the performance of RTS signaling might reduce the gap between the upper and lower bounds further.
The lower bound on performance presented here might be improved in future work based on techniques that consider PWM and also RTS in terms of lower-bounding the filtered minimum mean square error, and incorporating the Information-Estimation relations [27]. Further useful techniques developed for ISI channels [25], [26] should also be considered.
In this paper the signals are designed for good performance in the high SNR regime, while the results for schemes A, B and D hold for all SNRs; see (11). Future work may address the non-asymptotic low and intermediate SNR region based on new schemes adapted to the SNR and on the advanced FTN techniques listed in the introduction, for which the Shamai-Ozarow-Wyner [9] bound is of direct relevance.

APPENDIX-AUTOCORRELATIONS
Denote the autocorrelation of $x(t)$ as

$$R(\tau) = E_{x,t}\left[x(t)\,x(t+\tau)\right],$$

where $E_{x,t}$ denotes expectation over $x$ and over $-T < t < T$.
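For scheme C we have

$$R_C(\tau) = \left(1 - \frac{2|\tau|}{T}\right)\left(1 - \frac{|\tau|}{T}\right), \quad |\tau| \le T, \qquad (15)$$

and $R_C(\tau) = 0$ otherwise. The first parenthesis is the correlation given that $t$ and $\tau + t$ are in the same symbol interval; the second parenthesis is the probability of this occurrence. The expression for cases A and B is a little more involved. For $|\tau|/T > 1$, $x(t)$ and $x(t+\tau)$ are independent, thus

$$R(\tau) = E_t\left\{E[x(t)]\,E[x(t+\tau)]\right\}. \qquad (16)$$

For scheme A, $E[x(t)] = 1 - 2t'/T$, where $t'$ is the offset of $t$ from the start of its symbol interval. It follows by a straightforward integration for scheme A that

$$R_A(\tau) = \frac{1}{3} - 2\delta(1 - \delta), \quad \delta = \frac{|\tau|}{T} - \left\lfloor\frac{|\tau|}{T}\right\rfloor,$$

where $\lfloor\tau\rfloor$ is the largest integer smaller than $\tau$. For $|\tau|/T < 1$, the expectation over $t$ is the sum of the contribution of the events in which $t$ and $\tau + t$ are in the same symbol interval, which yields (15), and of a term contributed by the events in which $t$ and $\tau + t$ fall into successive symbol intervals, where (16) applies. In the resulting sum, the sign of the second term is positive for A and negative for B. Collecting the equations above yields (5) and Figure 8.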

FIGURE 1. Communication system with binary-valued transmitted signal.

FIGURE 2. Binary signals x(t) of type A. The single symbol with a variable transition time.

FIGURE 4. Time diagram of scheme D, the solid blue line is the transmitted signal, the dotted blue line is another possible transmitted signal.

FIGURE 5. Spectrum of the three schemes and of the frequency-flat AWGN. The continuous parts of the scheme A and scheme B curves overlap.

FIGURE 7. Probability density functions of signal derivatives conditioned on signs $s_n$.

FIGURE 8. Autocorrelation functions for schemes A, B and C.

TABLE 2. Comparison of different approaches.
Proof (Proposition 1): Compare scheme D in Table 2 to the first entry in the table.