Time Delay Estimation for Passive Sonar Signal Processing

An overview of applied research in passive sonar signal processing estimation techniques for naval systems is presented. The naval problem that motivates time deiay estimation is the source state estimation problem. A discussion of this problem in terms of estimating the position and velocity of amoving acoustic source ispresented. Optimum bearing and range estimators are presented for the planar problem and related to the optimum time delay vector estimator. Suboptimum realizations are considered together with the effects of source motion and receiver positional uncertainty.

I. INTRODUCTION HE purpose of this paper is to provide a tutorial review of time delay estimation in the passivesonar signal processing field.
In the passive sonar problem of interest here, signals received at two or more receiving sensors or hydrophones are used to estimate the position and velocity of a detected acoustic source. Passive systems, unlike radar or active sonar systems, cannot control the amount of transmitted energy to be reflected off the source; however, the covertness of passive systems can be advantageous, both in military and biomedical applications. A discussion of radar and active sonar can be found in Altes [3]. In practice, the number of receiving sensors, the observation time and the ratio of the background noise to the source signal strength after propagation loss, when balanced against total system cost, dictate the feasibility of passive systems.
In the ocean, sound usually arrives at each individual omnidirectional receiver through more than one path. For example, sound may arrive through the direct and surface-reflected*paths, as shown in Fig. 1. A more complete discussion of recent research in underwater acoustics is available in the texts edited by Oppenheim [44] and Bj$rn$ [7]. To make the problem mathematically tractable from a signal processing point of view, it is convenient to decouple the problem into multipath and planar problems. For multipath signals, a simplistic model of the received signal is that each receiver sees a signal plus an attenuated and delayed signal corrupted by additive uncorrelated Manuscript received September 26, 1980;revised November 20,1980. This work summarizes research completed at the University of Connecticut, portions of which were presented at the 1976 North AtlanticTreaty Organization (NATO) Advanced Study Institute (ASI) on Underwater Acoustics and Signal Processing (UASP) in Portovenere, Italy. It also summarizes work completed since that time by the author and others working in this field. Portions of this work were also presented at IEEE EASCON '79, Washington, DC, at the 13th Annual Asilomar Conference on Circuits, Systems, and Computers, Monterey, CA, and at the 1980 NATO AS1 on UASP, Kollekolle, DenmTric.
The author is with the Naval Underwater Systems Center, New London, noise. For the planar problem (i.e., when all the receivers and the source are in the same plane), the time delay to be estimated is the travel time of the acoustic wavefront between pairs of receivers, so that the source position and velocity can be estimated. The type of signal processing to be used and the bounds on performance for such processing are important subjects of this paper. 11. THE MATHEMATICAL MODEL For the purpose of this paper, only the planar problem will be considered. In particular, it will be assumed that acoustic energy arrives at each receiver through only one propagation path in the same plane with all the receivers and source.
Pictorially, we are interested in the delay of the signal from one receiving sensor to the next, as shown in Fig. 2. bor zero relative time delay, the source bearing is broadside to the sensor pair; and for maximum time delay, the source bearing is endfire. A certain amount of caution must be exercised in applying the results directly to "real world" problems, since the ocean medium is more complex than this simple model. More sophisticated propagation modeling and associated signal processing is considered by Owsley [45]. However, considerable insight 'can be gained in dealing with the passive bearing estimation problem in a simplified decoupled fashion. A simple mathematical model for the received signals in the planar case is where the source signal s(t) and the noises nl (t) and n2 (t) are Gaussian, stationary, and mutually uncorrelated. After the signal has been detected, an important passive sonar problem is to estimate the time delay D between two sensors separated by length L . This time delay estimate then is used to estimate the bearing angle shown in Fig. 2. The bearing estimate is given by the approximate rule s = cos-1 (&L) (2) U.S. Government work not protected by U.S. copyright where in waFr; 1) C (without a frequency argument) is the speed of sound 2) B, is the bearing estimate; 3) D is the time delay estimate. It can be shown that B is the angle that the hyperbolic "line of position" makes with the axis of the receivers; hence the approximation equation (2) is increasingly accurate as the range to the acoustic source increases.

SYSTEM CONFIGURATIONS FOR TIME DELAY
ESTIMATION The critical part of the passive bearing estimation problem is the accurate estimation of time delay. Two conceptual configurations come to mind based on either 1) an intuitive approach and a familiarity with detection theory, or 2) a rigorous application of the maximum likelihood (ML) approach for white signals in white noise. In both conceptual configurations we attempt to "advance" the delayed received signal by a hypothesized amount in order to align it with the other received signal. Then we either sum, square, and average, as shown in Fig. 3, or we multiply and average as shown in Fig. 4. In both cases the hypothesized delays are adjusted in order to maximize the configuration output. As shown in the figures, both configuration outputs consist of "signal-cross-signa1" terms. Further, both configurations are ML estimators for time delay under the assumptions that the signal and noises are white and mutually uncorrelated. A further discussion of the cross correlator configuration for time delay estimation is given by Bendat and Piersol [6]. When the signal and noise spectral characteristics are nonwhite, the received waveforms must be prefiltered with particular equiphase filters (i.e., the prefilters must have the same phase characteristics) to accentuate the frequency bands with good signal-to-noise ratio (SNR). It is of interest to note that a signal detector can be realized by comparing either configuration output to a threshold. Moreover, in terms of detection (but not estimation) in the presence of noise, the system in Fig. 3 outperforms the system in Fig. 4; however, the system in Fig. 3 requires prior knowledge of power levels in order to set the proper threshold. The system in Fig. 4 has a zero mean output in the signal absent, noise present case.
The conceptual systems in Figs. 3 and 4 can be achieved in a number of different ways. The system in Fig. 3 can be instrumented as shown (for three fxed hypothesized delays) in Fig. 5. Fig. 5 also is called a time delay beam former. It is presumed that sit) is a sampled version of the original broadband time signal and that the sampling rate is large in comparison with the required Nyquist rate. In particular, the sampling   Briefly, it consists of taking the inverse fast Fourier transform (FFT) of the product of a weighting function and the estimated complex cross-power spectrum.

IV. ATTAINING THE CRAM~R-RAO LOWER BOUND BY A GCC FUNCTION
The paper on generalized cross correlators by Knapp and Carter [33] puts in perspective several suggested methods of filtering and cross correlation for time delay estimation including the often useful smoothed coherence transform (SCOT) or "rehocence" method that prewhitens and then cross correlates. (See, for example, the paper by KostiC [32] and references contained therein.) Recent extensions to that work, including simulation experiments, have been reported by Hassab and Boucher [27], and detection performance of the SCOT by Kuhn [35]. In related work Chan, Hattin, and Plant [13], using the results of Hamon and Hannan [25], give the minimum variance of time delay error in terms of the number of time samples, the number of frequency points to average, 2' and a dimensionless SNR-type term. That work is consistent with the results of Carter [8], that the Crambr-Rao lower bound (CRLB) for the variance of time delay errors for the physical problem modeled by (1) is given by where 1) OH is the highest source frequency after propagation to 2) Tis the observation time, and 3) C(w) is the magnitude-squared coherence (MSC). Equation (Sa) can be expressed in terms of SNR by observing that the MSC is defined by the receiver, Equation (4) agrees with the equation between (17) and (18) of Hannan and Thomson [26] ; following proper manipulation, (4) agrees with (17) of Hahn [23]. See Carter [8] for the specific filter characteristics for the case of more than two fdters. (In the interest of completeness, we note that there is a typographical error in the filter transfer function given by (5) in Hahn [23] .) In order to estimate the GCC function the estimated (complex) cross-power spectrum between the two received signals is multiplied by a weighting function W(f), and then the inverse discrete Fourier transform (DFT) is computed via digital signal processing techniques such as the FFT. (As noted by Knapp and Carter [33] ,since the variance of the phase error is inversely proportional to MSC over one minus MSC, the GCC function with the ML weighting of (4) attaches most weight to the estimatedphase when the variance of the estimated phase error is lowest. The work by Cleveland and Parzen [15] is one of the early references to using phase to determine time delay.) Another reference to using phase is [ 131 . A fourth item that must be considered in practice when using (3) is that often W ( f ) is a function of unknown spectral quantities that must be estimated; in this case, the process of time delay estimation involves estimating the complex cross-power spectrum and the coherence function. The derivation of (3) presumes the spectral quantities needed for W ( f ) are known. Finally (3) presumes Gaussian received signals. When the source radiates a sinusoidal signal, a tighter Barankin bound (poorer performance) occurs at low SNR. A more complete discussion is given in this issue by Chow and Schultheiss [ 141 .
(3b) V. STANDARD DEVIATION OF BEARING ERROR The trigonometric relationship between time delay and bear-The MSC in (3b) for the model in (1) gives ing yields the result (see, e.g., Tacconi [60] ) that the standard G,2,(w) deviation of the bearing error in radians is given by (3c) where where G(w) is the auto-or cross-power spectrum at frequency w of the subscripted random process(es). Equation (3a) agrees with estimated variances for selected cases of a digital simulation by Scarbrough, Ahmed, and Carter [50] . Utilizing (3c), (3a) is plotted as a function of center frequency and SNR in decibels (10 log,, of power) by Quazi [49] for the case of SNR independent of fr,equency in the signal band.
Several clarifying facts should be pointed out about (3). First, it is dimensionally correct. Second, in the underwater acoustics problem it is an implicit function of range; this is because signal power at the receiver is a function of propagation loss (conceptually owing to effects like spreading loss), which itself is a (or can be an extremely complicated) function of range. Third, the CRLB given by (3a) presumes that the signal and noise power spectral densities are known and that prior to cross correlation the received waveforms r1 (t) and r2 (t) in (1) are fdtered by filters with transfer functionsHl(j') and H z ( f ) , respectively; the filters both have the same phase and the characteristic that the product of Hl (0 with the complex conjugation of Hz (0 forms the real weighting function given by 1) B is the bearing in radians; 2) L is the distance between the two sensors (or subarrays); and 3) C (without a frequency argument) is the speed of sound in water.
This result does not take into account uncertainty in sensor position which increases the variance by an additive amount and will be discussed later. (Note that (5) is dimensionally correct.) Substituting (3) into (5) gives the two-hydrophone result of MacDonald and Schultheiss [37]. Equations (3) and (5) show the effects of array length, source bearing, observation time, and coherence on bearing accuracy. It is noteworthy that when the pairwise sensor coherence is low we can still theoretically attain good performance if the source bandwidth is large or if we cluster a number of sensors together and beam form to increase the observed (or output) SNR. Extensions to (5) for endfire approximations are given by Hinich [29].
It should be noted that these results [ (3) and (5) (4) of one member function of a nominally broad-band Gaussian random process observed in a nondispersive medium with sensors at known positions and with known SNR. Thus, in practical applications, these results will serve as a bound on performance and not necessarily an absolute indicator of performance. Work with nonGaussian signals is discussed by Hilliard and Pinkos [28]. When the noises are correlated, the results are more complicated; this problem has been treated by Howell [30] and Bangs [4] . For dispersive medium effects, see Kirlin [3 11 .
Correlator configurations that both detect and estimate bearing for narrow-band signals are considered by Miller [40] . The problem of processes with random relative phase is treated by Wax [63].
The problem of estimating time delay between two received signals in noise is closely related to the work by Hahn [20] and Hahn and Tretter [22]. In that work, the problem is to estimate the M -1 components of the relative time delay parameter vector (D2 -D l , D3 -D l , . . . , DM -D l ) from the M received waveforms, Other work related to the reduction in painvise time delay error due to additional sensors has been reported by Schultheiss [54] and Bj$rn$ [ 7 ] . For passive localization in a plane, three unique sensors (or subarray beam former outputs) are required. In general, we define the (relative) time delay as Dij=Di-Dj, i , j = 1 , 2 , 3 . 2) Le is the effective half array length (i.e., the sine of the 3) C is the speed of sound in water; and 4) u2 ( ) is the variance of the error in time delay difference measurements.
When comparing (9) with other work, it is noteworthy that by rewriting that equation in terms of the total array length, it apparently becomes 16 times larger. Using Carter [9] we readily can show that for the optimum range and bearing sonar signal processor the variance of the range error is given by bearing times the interarray separation); where the bearing errors are in radians. Using ( 5 ) to relate time delay errors to bearing errors and observing that for an equispaced three-sensor array we have twice the array length and 1.5 times as many sensors as used to obtain (5), we postulate that the variance of bearing errors can be approximated by where u2 ( ) is the variance of the error when estimating the time delay from sensor 3 to 1. We note that (1 1) depends on the effective array length but not on the range to the source R . Substituting (1 1) into (10) yields The variance of the error in estimating the difference of time delays (used in the denominator of (8), the ranging equation) is approximately twice as great as the variance of the error in estimating D 3 ] . (With regard to the approximation see Hahn [23] for an evaluation of the covariance of the errors in estimating DZl and the errors in estimating D3] .) Therefore, we conclude that (12) and (9) agree. Comparing (12) with (11) note that the range error variance and bearing error variance both depend linearly on time delay error variance; however, unlike bearing errors ranging errors are high& dependent on the ratio of the true source range to the length of the effective baseline. Indeed, in passive ranging, it is the fourth power of the range relative to the effective array length that is important to the range error variance. This mathematical result helps develop physical insight into the difficulties of estimating the position of an acoustic source. Specifically, we see that inherent in the physics of the passive ranging problem is the need to have long baselines, extremely accurate time delay estimates, or small (true) range from the receiving array to the acoustic source, or all three. In addition, we point out here that owing to propagation loss the SNR or coherence at the receivers is a function of range from the source to the receivers. Hence, for certain underwater acoustic applications, (12) is a stronger function of range than might otherwise be explicitly apparent.
An empirical method for estimating the variance necessary in (12) was found by Schneider [52] of ENSCO, Inc., who .observed from (7) that That is, in the absence of estimation noise independent of source and receiver geometries, the time it takes an acoustic wave to travel from sensor 3 to 2 plus the travel time from sensor 2 to 1 should equal the travel time from sensor 3 to 1. Hence, a useful internal consistency check for the estimated time delay parameters is Dzl + D32 -DS1. The mean-square value of this three-term sum represents a practical guide to performance bounds without consideration. for source and receiver motion. Schneider [52] and Beam and Carter [5] have found this guide useful for actual sea data. An extension to ageneralized broadband Doppler or relative time compression (RTC) internal consistency check for moving sources has also been done by VII. PERFORMANCE BOUNDS FOUND BY HYPOTHETICAL ARRAYS In the work of MacDonald and Schultheiss [37] the optimum processor for bearing estimation was derived assuming planewaves, Gaussian signals and uncorrelated Gaussian noises of equal strength. The performance of such a system was found together with a bound obtained by considering a hypothetical system in which half the elements are placed at each end of the available aperture. (In practice, though, we postulate that we would require half-wavelength spacing at the design frequency for each subarray; see, e.g., Fig. 8.) An extension (in concert with Pryor [48] ) to unequal SNR was reported by Carter [ 101 .
These results showed that at low unequal SNR the available hydrophones should be divided into two equal groups. Thus we see that passive sonar systems designed to be optimum bearing estimators for equal SNR are also optimum for unequal SNR in the important case when the SNR is low.
When the acoustic source is sufficiently close to the center of the receiving array relative to the length of the array, then, unlike Fig. 8, the received wavefronts appear curved as shown . i n Fig. 9. A bound on the best ranging performance can be calculated by distributing a quarter of the receiving elements at each end and the remaining half of the elements at the middle of the array aperture (see Carter [9]). We emphasize that the bound is good even though the array configuration is theoretical in nature. These results hold for known sensor positions, and extensions are underway to the moving sensor. In particular, Schultheiss and Ianniello [56] consider an ensemble of statically deformed arrays and correctly infer that an equispaced line array might in some situations be preferable to three clustered subarrays. Work on a moving array is in progress. When the source uncertainty area is to be minimized and sensor positions are known, a bound on performance is calculated by a sensor distribution that places one third of the available sensing hydrophones at each end and the middle of the available aperture [9]. Of course, if sensor positions are not known exactly, performance is also expected to be worse than the performance bound found when sensor position is known.

VIII. CONFIGURATION FOR OPTIMUM POSITION
ESTIMATION Configurations for the optimum processor for range and bearing estimation are discussed by Hahn [23] and Carter [9]. One configuration presented by Carter [9] is shown in Fig. 10. We refer to this configuration as a focused beam former because it constrains the hypothesized time delays to focus the acoustic energy from a hypothesized position. It effectively presumes that the receiving sensor positions are known. Unlike the focused beam former, the ML estimate for the time delay vector maximizes the output power J of this network and does not constrain the power to be focused at one hypothesized range R and bearing B. A second realization is to form all possible GCC pairs and combine the delay estimates from each pair according to the rules of Hahn [23]. The input signals are filtered with particular signal-enhancing filters, all with the same phase (e.g., symmetric digital finite impulse response (FIR) filters, all with the same bulk or group time delay); and the delayed outputs are summed, squared, and averaged. The particular filter transfer functions needed to yield the best performance are given by Carter [8]. The ML range and bearing estimates are achieved by adjusting hypothesized range and bearing parameters that, in turn, give rise (with knowledge of the receiving hydrophone positions) to the appropriate time delays to be inserted after filtering. When the total system output power J is maximized, the selected hypothesized range and bearing estimates are the ML source position estimates. In the limit of large observation time these ML estimators are the minimum variance source-position estimators. For systems with the available elements grouped into three subarrays, a suboptimum realization is shown in Fig. 1 1 ; it can be achieved at lower cost and is postulated to have performance similar to that of the optimum system in Fig. 10. In the Fig.  11 suboptimum system we first beam form each of the three subarrays. Subsequently, we perform a GCC on the forward and middle and the middle and aft beam former outputs and read off the correlator abscissa showing the peak. Many commercially available correlators perform similar functions. We also can form a GCC on the forward and after beam former output and form the internal consistency check. For purposes of analysis, we model the GCC as prefilters, a variable delay, and T-second integrator. Then the delay estimates found by the Fig. 11 realization are inserted in the ranging equation (8) to obtain a range estimate.

IX. OTHER LIMITATIONS TO PERFORMANCE
In practice, both the optimum and suboptimum realizations are degraded by realistic factors. A most significant factor can be that signal processing implementations derived from treating the planar problem may not suit even the simple multipath model shown earlier in Fig. 1.
These and other significant environmental factors, such as the sound speed profile, the bottom depth, slope, and layering together with bottom loss, often need to be considered. Limitations due to source motion have been studied and reported by Gerlach [ 181 . These types of studies provide information on the fundamental bounds on performance.
Two other potentially significant effects on passive sonar localization are uncertainty in sensor position and failure to compensate for broad-band Doppler or RTC. The first cause of error to be discussed is depicted in Fig. 12 and has been studied and reported by Carter [l I ] . As mentioned earlier, additional work has been conducted and is continuing by Ianniello and Schultheiss. In bearing estimation two receivers of known position are needed. Consider the two left receivers indicated by squares for the actual positions. If the receiver were perturbed by an amount P (small in comparison with the sensor separation L ) from the position of the square in Fig. 12 to the solid dot position, then the bearing estimate would be in error by approximately an amount of P/L radians. Of course, if P were a random perturbation, the variance of the bearing error could be reduced by averaging a series of independent bearing measurements over a very large observation time. The total bearing error then consists of the sum of two terms, one having to do with SNR and the other having to do with array perturbation. Fundamentally, the relative ranging errors due to unknown random sensor perturbation P about some known mean P are approximately given by where 1) u(P -P) is the standard deviation of the middle sensor 2) R is the range to the source; 3) Le is the effective half array length; and 4) L is half the total array length. Thus, the ratio of the perturbations relative to the array length and the range relative to the effective array length are the dominant contributors to this type of relative ranging error. We note that array perturbations can be viewed as time delay errors with a simple trigonometric mapping for small perturbations. Not unexpectedly then, the form of (14) is consistent with (9). perturbation from a line connecting the two end sensors; X. VELOCITY ESTIMATION Another important type of degradation is caused by failure to compensate for the "deterministic" part of source motion. A simple but useful model of received signals is discussed by Abraham and Carter [l] . A more complete discussion of the model and the ML estimator for the fundamental motion parameter, tantamount to time delay rate or bearing rate, is given by Knapp and Carter [34]. Theoretically though, a suboptimum realization is depicted in Fig. 13. Work in this area has been done at the Naval Ocean Systems Center (NOSC) by Stradling [58] ,McCarthy [38] ,Mohnkern [41], andTrueblood [61]. Other models include one discussed by Adams, Kuhn, and Whyland [2] , Schultheiss [53] , Schultheiss and Weinstein [55] , and Stein [57]. They treat the moving source problem by considering a truncated Taylor series expansion. Such models physically include time delay, time delay rate, and time delay acceleration. Related work includes an extended MSC estimator, similar to an algorithm by Trueblood [61], that was reported by Patzewitsch, Srinath, and Black [46] . Other related work on tracking of moving time delays for velocity estimation has been reported by Meyr [39] and Lindsey and Meyr [36]. Work in this area, including sensing systems for determining train speed, is being continued by Meyr and his students. The work by Moura [43] and Moura and Baggeroer [42] also is concerned with passive velocity estimation. For many practical A BEAMFORMER Fig. 13. Theoretical procedure for range, bearing, course and speed estimation.
geometries when the source range is long compared with the receiver baseline and the SNR is large so that T can be kept small, the estimation of RTC or generalized Doppler is not necessary in order to estimate time delay. For other geometries, RTC or time delay rate estimation is necessary. Even when it is not necessary, RTC estimation can provide rapid early indications of source velocity.
It is analytically useful in this estimation procedure to decouple velocity into two orthogonal components: bearing rate or cross line-ofsight speed (CLSS) and range rate or radial lineofsight speed (RLSS). In particular, the CLSS is defined as the projection of the velocity vector on the perpendicular to the bearing line, and the RLSS is the projection on the bearing line. The problem is to determine the accuracy with which source velocity can be determined. Theoretical considerations indicate that CLSS,is much easier to estimate than RLSS. The RTC can be estimated using Knapp and Carter [34] ; see also Stein [57]. Bearing can be estimated using MacDonald and Schultheiss [37], and range can be estimated by means of the triangulation methods of Hall and Hayford [24] and Hilliard and Pinkos [28].
From these results Carter 1121 has shown the variance in CLSS errors is dominated by range relative to the baseline projection and the variance of the RTC estimation error.
Further, the standard deviation of the CLSS errors is predominantly a function only of the range relative to the effective baseline steered at the source and the standard deviation of the RTC errors. In most practical ways, it is not a function of the course or speed of the source or errors in time delay. However, when the coherence or SNR is low and the observation time T is short, errors in both time delay and RTC or CLSS will be large. Theriault and Berkman [59] have derived an approximate expression for the standard deviation of the errors in the radial component of velocity.
RLSS errors, like CLSS errors, are not predominantly a function of the course or speed of the source. However, unlike CLSS errors, RLSS errors depend on the square of the range relative to the effective baseline.
It was corroborated by computer simulation that the CLSS is much easier to estimate than the RLSS. Also, doubling the range-to-effective-baseline ratio doubled the CLSS errors and quadrupled the RLSS errors. Further, the errors in CLSS and RLSS did not appear to depend on either time delay errors explicitly or source velocity.
XI. SUMMARY Time delay estimation is an important part of passive sonar signal processing. An overview of applied research in passive 469 sonar signal processing estimation techniques has been presented. One problem that motivates time delay estimation is the estimation of the position and velocity of a moving acoustic source. A discussion of this problem in terms of estimating time delay has been presented. In order to develop an understanding of the signal processing required, an approach of decoupling the problem into multipath and planar components was followed. Optimum estimators for acoustic source position were presented for the planar problem and related to the optimum time delay vector estimator with a focusing constraint. In particular, the focused beam former with appropriate prefilters is, in some sense, the best range and bearing position estimator.