Fast Orthogonalization Networks



I. INTRODUCTION

THE DIRECT adaptive filtering of multiple input channels by Gram-Schmidt (GS) orthogonalization has been the subject of intense research during the past decade [1]-[5]. The Gram-Schmidt technique (sometimes called the adaptive lattice filter) has been shown to yield superior performance simultaneously in arithmetic efficiency, stability, and convergence time over other adaptive algorithms. Arithmetic efficiency has been especially demonstrated in the filtering of stationary covariance sequences [4], [5]. The stability of the algorithm is enhanced because it does not require the calculation of an inverse covariance matrix, as does the sample matrix inversion (SMI) algorithm of Reed, Mallett, and Brennan [6]. An overview of adaptive lattice filters and a large bibliography on this subject are contained in [5].
Most of the research on adaptive lattice filters has concentrated on the processing of stationary covariance sequences, especially in the area of discrete-time linear prediction systems [4], [5]. Stationarity implies that if r_ij is the (i, j) element of the channel-to-channel covariance matrix, then r_ij = r_(i-j); hence, this covariance matrix has the Toeplitz form. In this paper, we consider the efficient processing of channels that are not necessarily stationary with respect to one another; however, the input data on a given channel are assumed stationary with respect to the other data in that channel. The algorithm developed below is a multichannel adaptive lattice filter that is structured for arithmetic efficiency while retaining the good stability and fast convergence properties of orthogonalization networks. The structuring for the fast orthogonalization network (FON) presented below is given in [3] and [4] for a small number of input channels. In this paper, we generalize the structuring technique to an arbitrary number of input channels.
Further applications of the FON technique are discussed in [7], where it is shown how to transform a multiple-steering-vector adaptive filtering problem, in which the adaptive weights are solved for by using the SMI algorithm, into a multiple-channel adaptive lattice filter that uses a FON. This technique is further extended to handle the multiple-channel adaptive radar Doppler processing problem.

II. GRAM-SCHMIDT DECOMPOSITION
The Gram-Schmidt decomposition technique is well documented in the literature [1]-[5], so we give only a brief review. GS decomposition is illustrated in Fig. 1. The DP functional block stands for a two-input decorrelation processor building block [3]. The output from a given block is statistically decorrelated from the right-side input to that block by properly weighting this input and subtracting the weighted input from the left-side input. The inputs X_1, X_2, ..., X_N represent N input channels of statistical data that may be correlated from input channel to input channel. GS decomposition decorrelates the inputs one at a time from the other inputs by using the basic two-input DP. For example, as seen in Fig. 1, in the first level of decomposition, X_N is decorrelated from X_1, X_2, ..., X_(N-1). Next, the output channel that results from decorrelating X_(N-1) with X_N is decorrelated from the other outputs of the first level of DP's. The decomposition proceeds as seen in Fig. 1 until a final output channel is generated. This output channel is totally decorrelated with the inputs X_2, X_3, ..., X_N. Note that the GS decomposition is not unique; i.e., the order in which X_2, X_3, ..., X_N are decorrelated from X_1 is arbitrary.
For N channels, the total number of DP's needed for GS decomposition is 0.5N(N-1); hence, this number of decorrelation weights must be computed. For a digital implementation, these weights are determined sequentially: the first-level weights are estimated, and then their output data are calculated. These output data are used as inputs to the second level, from which the second-level weights can be calculated. The output data of the second level are generated by using these weights and the second-level input data. The process continues until the (N-1)th-level weights and outputs are calculated.
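As an illustration, the two-input DP and the level-by-level GS cascade described above can be sketched in Python. This is a hypothetical sketch using sample-average weight estimates; the names dp and gs_output are illustrative, not the paper's notation:

```python
def dp(left, right):
    """Two-input decorrelation processor: weight the right-side input and
    subtract it from the left-side input, so that the output's sample
    correlation with the right-side input is zero."""
    w = (sum(l * r.conjugate() for l, r in zip(left, right))
         / sum(abs(r) ** 2 for r in right))
    return [l - w * r for l, r in zip(left, right)]

def gs_output(channels):
    """One GS decomposition with structure [X_1, ..., X_N]: at each level the
    current rightmost channel is decorrelated from all the others and then
    dropped, until a single output channel remains."""
    chans = [[complex(v) for v in c] for c in channels]
    while len(chans) > 1:
        right = chans[-1]
        chans = [dp(c, right) for c in chans[:-1]]
    return chans[0]
```

The returned channel has zero sample correlation with every input other than X_1, which is the decorrelation property stated above.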
Let us consider the problem of decorrelating each input channel with all the other input channels; i.e., there will be N GS decompositions. For a complete N-channel GS decomposition (or multiple-channel lattice filter), if there were no logic behind the structuring of the decomposition, the number of weights would be equal to 0.5N^2(N-1). In the following section, we develop an algorithm that requires approximately 1.5N(N-1) weights for the same decorrelation process.
For notational purposes, we define the channel input appearing on the right side of the basic two-input DP (Fig. 1) as the input from which the output of the building-block circuit is decorrelated; the output is associated with the channel appearing on the left side. For the multiple-channel case, all inputs appearing to the right of the far-left input are decorrelated from the output associated with this input.

III. EFFICIENT ORTHOGONALIZATION STRUCTURES
In this section, we present a methodology for configuring the two-input DP's to synthesize the N-channel orthogonalization network so that numerical efficiency is achieved. We therefore introduce the following notation. A single-output-channel decorrelator is represented as

Ch_1 = [X_1, X_2, X_3, ..., X_N]

where X_2, X_3, ..., X_N are decorrelated from X_1, and X_N is decorrelated first, X_(N-1) is decorrelated second, and so on. Fig. 1 shows the structure of this decorrelator. The channel variable Ch_1 references this structure to channel 1, or the X_1 channel. The X_n, n = 1, 2, ..., N, are called the elements of the structure.
Numerical efficiency of the algorithm to be presented is achieved by taking advantage of redundancies that can occur between two different decorrelator structures. For example, let there be eight channels. Channels 1 and 4 can be generated as follows:

Ch_1 = [X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8],
Ch_4 = [X_4, X_3, X_2, X_1, X_5, X_6, X_7, X_8].
Note that Ch_1 and Ch_4 have the same four input channels at the far right. In the actual implementation, the substructure associated with these four rightmost channels can be shared by Ch_1 and Ch_4, as illustrated in Fig. 2. In fact, any time two channels have exactly the same far-right elements, as indicated by the decorrelator structure, the substructure associated with these far-right elements can be shared in the implementation. Let N = 2^m. In general, we can configure 2^(m-1) output channels to have a common substructure of 2^(m-1) input channel elements, 2^(m-2) output channels to have a common substructure of 2^(m-2) input channel elements, and so on.
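The sharing condition, identical far-right elements, is easy to check mechanically. A small hypothetical sketch (list entries stand in for the elements X_n):

```python
def shared_suffix(s1, s2):
    """Number of trailing elements two decorrelator structures have in
    common; the substructure over these elements can be shared."""
    n = 0
    while n < min(len(s1), len(s2)) and s1[-1 - n] == s2[-1 - n]:
        n += 1
    return n

ch1 = [1, 2, 3, 4, 5, 6, 7, 8]   # Ch_1 = [X_1, X_2, ..., X_8]
ch4 = [4, 3, 2, 1, 5, 6, 7, 8]   # Ch_4 = [X_4, X_3, X_2, X_1, X_5, ..., X_8]
# shared_suffix(ch1, ch4) is 4: the substructure on X_5, ..., X_8 is common.
```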
The following algorithm sequentially generates structures that can be implemented in a numerically efficient manner.
1) Begin with the root structure [X_1, X_2, ..., X_N].
2) Generate a structure that is the inverted order of the root structure: [X_N, X_(N-1), ..., X_1].
3) Generate two structures from the preceding two structures that have the first 2^(m-1) elements of the preceding structures in inverted order. All other elements remain the same.
4) Generate four structures from the preceding four structures that have the first 2^(m-2) elements of the preceding structures in inverted order. All other elements remain the same.
...
m+1) Generate 2^(m-1) structures from the preceding 2^(m-1) structures that have the first two elements of the preceding structures in inverted order. All other elements remain the same.
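The inversion procedure above can be sketched directly (a hypothetical fon_structures helper; integers stand in for the elements X_n):

```python
def fon_structures(m):
    """Sequentially generate the 2**m decorrelator structures for N = 2**m
    channels: the root structure, its full inversion, and then repeated
    copies with the first 2**k elements inverted, k = m-1, ..., 1."""
    root = list(range(1, 2 ** m + 1))
    structs = [root, root[::-1]]
    for k in range(m - 1, 0, -1):
        half = 2 ** k
        # each existing structure spawns one new structure with its
        # first `half` elements in inverted order
        structs += [s[:half][::-1] + s[half:] for s in structs]
    return structs
```

For m = 3 this yields eight structures whose leading elements are 1, 8, 4, 5, 2, 7, 3, 6, i.e., one structure per output channel.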
For example, if N = 2^3 (where m = 3), the following structures would be generated sequentially by using the above procedure:

Ch_1 = [X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8]
Ch_8 = [X_8, X_7, X_6, X_5, X_4, X_3, X_2, X_1]
Ch_4 = [X_4, X_3, X_2, X_1, X_5, X_6, X_7, X_8]
Ch_5 = [X_5, X_6, X_7, X_8, X_4, X_3, X_2, X_1]
Ch_2 = [X_2, X_1, X_3, X_4, X_5, X_6, X_7, X_8]
Ch_7 = [X_7, X_8, X_6, X_5, X_4, X_3, X_2, X_1]
Ch_3 = [X_3, X_4, X_2, X_1, X_5, X_6, X_7, X_8]
Ch_6 = [X_6, X_5, X_7, X_8, X_4, X_3, X_2, X_1]

The total number of operations associated with a FON depends on whether the algorithm is implemented 1) to update the weights recursively as new data samples arrive or 2) to calculate the weights as a function of a block of N_s data samples in each of the N channels. A comparison of two implementations, the FON technique versus sample matrix inversion, is made in [7]; it showed that the two techniques are approximately equal in numerical efficiency. Again, however, the FON technique has numerical stability advantages over the SMI technique.
If 2^(m-1) < N < 2^m, then the structuring for multiple-channel orthogonalization can be derived by first generating all the structures for 2^m inputs and then deleting the 2^m - N elements X_n, n > N, from these structures. Hence, a FON can be constructed for an arbitrary number of inputs.
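A sketch of this deletion step for arbitrary N, assuming duplicate structures produced by the deletion are simply dropped (the helper names are illustrative, not the paper's):

```python
def fon_structures(m):
    """Structures for N = 2**m channels: root, full inversion, then
    repeated partial inversions of the first 2**k elements."""
    root = list(range(1, 2 ** m + 1))
    structs = [root, root[::-1]]
    for k in range(m - 1, 0, -1):
        half = 2 ** k
        structs += [s[:half][::-1] + s[half:] for s in structs]
    return structs

def fon_structures_any(n):
    """FON structures for arbitrary n: generate for the next power of two,
    delete the elements X_j with j > n, and drop duplicate structures."""
    m = max(1, (n - 1).bit_length())
    seen, out = set(), []
    for s in fon_structures(m):
        t = tuple(e for e in s if e <= n)
        if t not in seen:
            seen.add(t)
            out.append(list(t))
    return out
```

For n = 6 this produces one structure per output channel, each a permutation of the six retained elements.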

IV. SOFTWARE ALGORITHM
A software algorithm, called the FON algorithm, has been devised that generates the 2^m decorrelator output channels from the 2^m input channels by using the common substructures of the various channels as described in Section III (note that N = 2^m). Let each of the 2^m input channels have N_s sample points. Thus, in the nth channel, X(1, n), X(2, n), ..., X(N_s, n) are observed. Note that the algorithm to be presented uses block processing of the input data.

For the N = 8 example of Section III, note that channels 1, 2, 3, and 4 have the common substructure associated with X_5, X_6, X_7, X_8 and that channels 5, 6, 7, and 8 have the common substructure associated with X_4, X_3, X_2, X_1. Also note that Ch_1 and Ch_2 have the same six-element substructure, as do the channel pairs (Ch_4, Ch_3), (Ch_8, Ch_7), and (Ch_5, Ch_6). Fig. 3 illustrates the complete realization of the eight output channels. Each two-input DP of the FON as depicted in Fig. 3 has a weight associated with it. The number of DP's (or weights) associated with a FON can be found by considering the number of DP's at each level of the network. From Fig. 3, we see that the number of levels equals N - 1. The total number of DP's, N_DP, needed for a FON is derived by adding the number of DP's at each level; it can be shown [7] that N_DP is approximately 1.5N(N - 1), as noted in Section II.

At various points, the algorithm must reduce 2^k input channels to 2^(k-1) output channels through a partial orthogonalization (the kth-order partial orthogonalization algorithm). The final outputs of this step are contained in V^(2^(k-1))(I, J). (In these expressions, the asterisk indicates the complex conjugate.)
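To make the weight count concrete, the following sketch counts the DP's of the N = 8 network under the assumption that any two DP's with identical inputs (same carried element, same already-removed suffix) are shared. For N = 8 it finds 72 shared DP's versus 224 for eight independent decompositions; this is the same order as the approximate 1.5N(N - 1) = 84 figure, while the exact expression for N_DP is derived in [7].

```python
def fon_structures(m):
    """Structures for N = 2**m channels (root, full inversion, then
    repeated partial inversions of the first 2**k elements)."""
    root = list(range(1, 2 ** m + 1))
    structs = [root, root[::-1]]
    for k in range(m - 1, 0, -1):
        half = 2 ** k
        structs += [s[:half][::-1] + s[half:] for s in structs]
    return structs

def dp_nodes(structs):
    """Label each two-input DP by the element it carries and the suffix of
    elements already decorrelated from it; identical labels mean the DP
    can be shared between structures."""
    nodes = set()
    for s in structs:
        for j in range(len(s) - 1, 0, -1):
            suffix = tuple(s[j:])
            for e in s[:j]:
                nodes.add((e, suffix))
    return nodes

structs = fon_structures(3)
naive = len(structs) * (8 * 7 // 2)   # 8 independent GS decompositions
shared = len(dp_nodes(structs))       # with substructure sharing
```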
The FON algorithm also requires that the input channels X_1, X_2, ..., X_N be commutated so that the final output channels of the FON are properly aligned. (This problem also occurs with the fast Fourier transform (FFT) algorithm.) A commutation algorithm matches the proper output channel with the input channel. By properly aligned, we mean that the kth output channel of the FON algorithm is decorrelated with input channels 1, 2, ..., k - 1, k + 1, ..., N. The following algorithm, called the commutated indexing algorithm, computes the commutated indices and stores them in an N-element array called INDEX. After defining these preliminary algorithms, we now give the complete FON algorithm.
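As a sketch of how such an INDEX array might be computed, assuming the output channel of each generated structure is named by its leading element (an illustration, not the paper's stated algorithm):

```python
def fon_structures(m):
    """Structures for N = 2**m channels, in generation order."""
    root = list(range(1, 2 ** m + 1))
    structs = [root, root[::-1]]
    for k in range(m - 1, 0, -1):
        half = 2 ** k
        structs += [s[:half][::-1] + s[half:] for s in structs]
    return structs

def commutated_index(m):
    """Hypothetical INDEX array: INDEX[k] is the channel whose output
    appears on the kth generated structure of the FON."""
    return [s[0] for s in fon_structures(m)]
```

For m = 3 this gives INDEX = [1, 8, 4, 5, 2, 7, 3, 6], matching the order in which the structures are generated in Section III.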

1) Calculate the commutated indices by using the commutated indexing algorithm.

2) For k = m, m - 1, ..., 1, calculate V^(2^(k-1))(I, J) by using the kth-order partial orthogonalization algorithm.

Fig. 3 shows that there are exactly N - 1 levels of two-input DP's associated with a FON, and [7] shows that the maximum number of two-input DP's per level is 2N - 2 and the minimum number is N. Let L_k equal the number of DP's on the kth level. For the block processing algorithm, we see that as the data are processed through the kth level, the input data may be discarded and the output data (N_s L_k data points) become the new input data set. Hence, 2N - 2 parallel two-input DP's could be configured as seen in Fig. 4.
We allow these 2N - 2 DP's to perform simultaneously all of the two-input DP operations at a given level, as shown in Fig. 3. However, this requires a routing algorithm whose function is to ensure that the L_k input data channels are routed to the proper DP's. We do not describe the routing algorithm in this paper. From Fig. 4, we see that the output data are stored back into the input data memory bank. After sequencing N - 1 times through the DP's, the algorithm is finished.
It is shown in [7] that the processing time through a DP is proportional to the number of data points per input channel, N_s. Since the bank of 2N - 2 DP's is used N - 1 times, the total processing time T, using a parallel arithmetic architecture (Fig. 4), is approximately proportional to the product of the number of input channels and the number of data points per channel; that is, T is proportional to N N_s. Hence, parallel processing can significantly decrease the processing time. This reduction is possible because of the inherent structure of the FON. In addition, the number of parallel two-input DP's required is only 2N - 2, which reduces the hardware requirements of a fully implemented FON.
Manuscript received February 11, 1985; revised August 15, 1985.

The author is with the Naval Research Laboratory, Washington, DC 20375.

IEEE Log Number 8406820.