An optimal modification of a Kalman filter for time scales

The Jones-Tryon Kalman filter, which was implemented in the time scale algorithm TA(NIST), produces time scales with poor short-term stability. A simple reduction of the error covariance matrix allows the filter to produce time scales with good stability at all averaging times, as verified by simulations of clock ensembles.


I. INTRODUCTION
The purpose of a time scale is to form a virtual clock from an ensemble of physical clocks that are compared with one another at a sequence of dates, where a date is the reading of a clock as determined by counting its oscillations. The virtual clock is defined as an offset from one of the physical clocks, the offset being computed from the comparison measurements by some algorithm. The usual goal of the algorithm design is to produce a virtual clock that is more stable than any of the physical clocks in both the short term and the long term, as measured by some frequency stability measure such as Allan deviation.
A straightforward Kalman filter approach to this problem has been tried at least twice [1][2][3]. The noise of each clock is modeled as a sum of independent white FM and random walk FM noises with known noise levels. The entire ensemble is modeled by a linear stochastic difference equation, whose state vector is estimated by a Kalman filter from the clock comparisons, which are assumed to be noiseless. Under this assumption, if each clock's tick is offset by its Kalman phase estimate, we arrive at a single point on the time axis, called the common corrected clocks. It makes sense to regard this value as the estimated center of the ensemble, and to use the sequence of these values as a time scale, called the raw Kalman scale. This time scale, realized as TA(NIST), was reported to follow the clock with the best long-term stability, regardless of its short-term stability [4,5]. The goals of the present study have been to reproduce this result by simulation, to understand it, and to find a better way to use this Kalman filter in a time scale algorithm.
We shall exhibit two improved algorithms. The first, called the Kalman-plus-weights (KPW) scale, ignores the Kalman phase estimates and uses the frequency state estimates in a weighted-average time scale. The second algorithm, called the reduced Kalman scale, uses the corrected clocks, but with a reduced version of the Kalman error covariance matrix. This scale turns out to be optimal in a sense that is explained in Section VII. When applied to ensembles of dissimilar clocks, both improved scales are observed to combine the best short-term behavior with the best long-term behavior [6][7][8]. These results also hold for clocks with random run FM noise (random walk of drift) in addition to the other noises [7,8].
II. THE KALMAN FILTER
The measurement dates may be unequally spaced, i.e., τ₀ may depend on t. The Kalman filter is a set of equations that propagate an unbiased minimal-variance state estimate from one measurement date to the next [9][10][11]. We want to obtain the estimate X̂(t) and its error covariance matrix P(t). First we perform the temporal update (prediction) equations:

X̂⁻(t) = Φ X̂(t − τ₀),   P⁻(t) = Φ P(t − τ₀) Φᵀ + Q,   (4)

where Φ and Q are the state transition matrix and the process noise covariance matrix for the interval τ₀. After making the noiseless measurement z(t) = H X(t) we perform the measurement update equations:

K(t) = P⁻(t) Hᵀ [H P⁻(t) Hᵀ]⁻¹,
X̂(t) = X̂⁻(t) + K(t) [z(t) − H X̂⁻(t)],   (5)
P(t) = [I − K(t) H] P⁻(t),

thus completing the iteration. The matrix K(t) is the Kalman gain. The Kalman filter must be started with some initial X̂ and P; we shall say more about this in Section V.
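As a concrete illustration, the two updates can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's code: it assumes a two-state (phase, frequency) model per clock, the state ordering (x₁, y₁, x₂, y₂, …), white FM and random walk FM process noise, and measurements zᵢ = x₁ − xᵢ; the function names are ours.

```python
import numpy as np

def clock_phi_q(tau, wfm, rwfm):
    """Transition matrix and process noise covariance for one clock
    with state (phase, frequency), white FM level wfm and random walk
    FM level rwfm, over an interval tau."""
    phi = np.array([[1.0, tau], [0.0, 1.0]])
    q = np.array([[wfm * tau + rwfm * tau**3 / 3, rwfm * tau**2 / 2],
                  [rwfm * tau**2 / 2,             rwfm * tau]])
    return phi, q

def kalman_step(xhat, P, z, tau, wfm, rwfm):
    """One temporal update (prediction) followed by one measurement
    update for an ensemble of n two-state clocks, with noiseless
    clock-difference measurements z[i-2] = x_1 - x_i, i = 2..n."""
    n = len(wfm)
    Phi = np.zeros((2 * n, 2 * n))
    Q = np.zeros((2 * n, 2 * n))
    for i in range(n):
        Phi[2*i:2*i+2, 2*i:2*i+2], Q[2*i:2*i+2, 2*i:2*i+2] = \
            clock_phi_q(tau, wfm[i], rwfm[i])
    # temporal update (prediction)
    xpred = Phi @ xhat
    Ppred = Phi @ P @ Phi.T + Q
    # measurement matrix for z_i = x_1 - x_i
    H = np.zeros((n - 1, 2 * n))
    H[:, 0] = 1.0
    for i in range(1, n):
        H[i - 1, 2 * i] = -1.0
    # measurement update; no measurement noise term in the innovation
    S = H @ Ppred @ H.T
    K = Ppred @ H.T @ np.linalg.inv(S)
    xnew = xpred + K @ (z - H @ xpred)
    Pnew = (np.eye(2 * n) - K @ H) @ Ppred
    return xnew, Pnew
```

Because the measurements are noiseless, the innovation covariance H P⁻ Hᵀ contains no measurement noise term; it is still invertible as long as P⁻ is positive definite.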
For the sake of numerical stability over many iterations, it is better to propagate a Cholesky factor of the covariance matrix; this is an upper triangular matrix C(t) such that C(t)ᵀ C(t) = P(t). There are several "square root filter" algorithms, equivalent to (4) and (5) in infinite-precision arithmetic, that propagate X̂(t) and C(t) [10,11]. The author has used one of these methods for all simulations with no trouble.

III. RAW KALMAN SCALE
From (5) we find that H X̂(t) = z(t) = H X(t). Thus, for noiseless measurements, the estimated state satisfies the measurement equations. For the clock ensemble, x̂ᵢ(t) − x̂ⱼ(t) = xᵢ(t) − xⱼ(t) for all i and j. It follows that the quantities xᵢ(t) − x̂ᵢ(t) are the same for every clock i: after correction by the Kalman phase estimates, all the clocks read the common value

TA(t) = xᵢ(t) − x̂ᵢ(t),  any i.   (8)

These are the common corrected clocks; for convenience we shall use this term for the quantities TA(t) themselves, whose sequence is called the raw Kalman scale. The squares in Fig. 1 show the Allan deviation of the raw Kalman scale for the simulated clocks. After a short burn-in, the scale ignores the masers and becomes the mean of the two ion trap phases. Although its long-term stability is as good as it can be, it does not take advantage of the quieter clocks to improve its short-term stability.
IV. KALMAN-PLUS-WEIGHTS SCALE
The KPW scale TA(t) is a weighted-average scale defined by the basic time scale equation (BTSE)

TA(t) = TA(t − τ₀) + Σᵢ wᵢ [ xᵢ(t) − xᵢ(t − τ₀) − τ₀ ŷᵢ(t − τ₀) ],   (9)

which gives the ensemble offsets xᵢ(t) − TA(t) sequentially from the measurements xᵢ(t) − xⱼ(t). The presence of Kalman frequency state estimates in (9) is unconventional; the frequency estimates for weighted-average scales are usually obtained by estimating the frequency departure of each clock from the scale itself as it evolves.
Here, we feed no information from the time scale back to the Kalman filter, whose only job at this point is to deliver frequency state estimates from the clock models and measurements.
To select the weights we observe that the ith bracketed term of (9) is the τ₀-increment of a detrended version of the phase of clock i. Because all these detrended phases have good long-term stability, we assign the weights to optimize their combined short-term stability. By (2) and (9),

TA(t) − TA(t − τ₀) = Σᵢ wᵢ [ τ₀ (yᵢ(t − τ₀) − ŷᵢ(t − τ₀)) + ηᵢ(t) ],   (10)

where ηᵢ(t) is the process noise accumulated by the phase of clock i during the interval. For now, assuming that the frequency estimation errors are negligible, we choose the weights to minimize the variance of the noise terms in (10), subject to the constraint Σᵢ wᵢ = 1. The open diamonds in Fig. 1 show the measured Allan deviation of the KPW scale for our four-clock simulation. In the short term it has the stability of the average of the two masers; in the long term it has the stability of one ion trap. But for a lucky discovery, the story would have ended here.
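One iteration of the BTSE with given weights and Kalman frequency estimates might look like the following sketch. In practice only clock differences are observable and the scale is carried as offsets from the clocks; the variable names are ours, not the paper's.

```python
import numpy as np

def btse_step(TA_prev, x_prev, x_now, yhat_prev, w, tau):
    """One iteration of a basic time scale equation: the scale
    advances by the weighted average of the detrended tau-increments
    of the clock phases, the trend being given by the Kalman
    frequency estimates yhat_prev."""
    incr = (x_now - x_prev) - tau * yhat_prev   # detrended increments
    return TA_prev + np.dot(w, incr)
```

With perfect frequency estimates and no clock noise, the detrended increments vanish and the scale does not move.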

V. COVARIANCE X-REDUCTION
By an x-row or x-column of the Kalman covariance matrix P(t) we mean a row or column that corresponds to a phase component of the state vector. According to the third equation in (6), all the x-rows (and hence all the x-columns) of P(t) are the same vector after a measurement update; let us call it the x-vector. This is another way of saying that we have perfect knowledge of the clocks with respect to each other; that is, the estimated phases are determined by the measurements up to an unknown rigid translation of the whole set. All the phase error variances (the x-components of the x-vector) are the same. When we run the Kalman filter, we observe that this common error variance diverges fast; the filter seems to know that it is doing a poor job on the phases. On the other hand, the submatrix P_y of the frequency error covariances is empirically well behaved, though a slower divergence is not ruled out.
Suppose that at date t we declare arbitrarily that the x̂ᵢ(t) are the correct phases. Then their error variances become zero, and hence the x-vector becomes zero. Covariance x-reduction consists in setting all x-rows and x-columns of P(t) to zero. For two clocks, with the state ordered (x₁, y₁, x₂, y₂), the operation looks like this:

[ p11 p12 p13 p14 ]      [ 0   0   0   0  ]
[ p12 p22 p23 p24 ]  →   [ 0  p22  0  p24 ]
[ p13 p23 p33 p34 ]      [ 0   0   0   0  ]
[ p14 p24 p34 p44 ]      [ 0  p24  0  p44 ]

THEOREM 1. X-reduction is transparent to the frequency state estimates. In other words, we may perform the x-reduction at any date without affecting any future frequency state estimates.
A proof [8] is carried out by mapping the original X model to an auxiliary Y model. By following the Kalman filter equations (4) and (5), one shows that a temporal or measurement update of the X model, with and without x-reduction, maps to the same update of the Y model, whose error covariance is the x-reduced P.
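Under the state ordering assumed above (phases at even positions), x-reduction is a two-line matrix operation; a minimal sketch, with a helper name of our own choosing:

```python
import numpy as np

def x_reduce(P):
    """Covariance x-reduction: zero every row and column of P that
    corresponds to a phase component, leaving the frequency block
    P_y intact.  Assumes the state ordering (x1, y1, x2, y2, ...)."""
    R = P.copy()
    R[::2, :] = 0.0      # zero the x-rows
    R[:, ::2] = 0.0      # zero the x-columns
    return R
```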
The author first used this result for initializing the Kalman filter as follows: we run the covariance iteration without any measurement data until the P_y submatrix settles down, then use the x-reduced P as the initial error covariance; it is reasonable to treat the initial clock phases as known. For the simulations, initial frequency state errors ŷᵢ − yᵢ were generated as Gaussian random variables with covariance P_y.
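A sketch of this initialization under the same assumptions as before; the tolerance, the iteration cap, and the names are ours, and since the frequency block may keep drifting slowly, the loop also stops after a fixed number of passes:

```python
import numpy as np

def initial_covariance(Phi, Q, H, P0, tol=1e-9, maxiter=2000):
    """Run the covariance half of the Kalman iteration (no measurement
    data are needed) until the frequency block changes little, then
    x-reduce the result to obtain an initial error covariance."""
    P = P0.copy()
    for _ in range(maxiter):
        Ppred = Phi @ P @ Phi.T + Q
        S = H @ Ppred @ H.T
        K = Ppred @ H.T @ np.linalg.inv(S)
        Pnew = (np.eye(len(P)) - K @ H) @ Ppred
        done = np.max(np.abs(Pnew[1::2, 1::2] - P[1::2, 1::2])) < tol
        P = Pnew
        if done:
            break
    P[::2, :] = 0.0          # x-reduce the settled covariance
    P[:, ::2] = 0.0
    return P
```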
Another use of x-reduction is to control the growth of ( ) P t , if only to improve the numerical stability of the Kalman algorithm. Brown [3] proposed a more sophisticated method, called transparent variations of covariance, for reducing ( ) P t in a way that preserves future estimates of the entire state vector. When running the KPW scale, one can use the simpler method of x-reduction because only the frequency state estimates matter.

VI. REDUCED KALMAN SCALE
X-reduction is not transparent to the phase estimates. The author tried it in a simulation, performing the x-reduction at each date. To his surprise, the time scale defined by the common corrected clocks (8) no longer resembled the raw Kalman scale; it became at least as stable as the KPW scale. To repeat: one carries out the Kalman filter as before, but performs covariance x-reduction after each measurement update. Then the common corrected clocks define a new time scale, called the reduced Kalman scale. The circles in Fig. 1 show the measured Allan deviation of the reduced scale for our simulation; it is a slight improvement over KPW over the whole range of averaging times. Other simulations gave similar results. The next section tries to explain this empirical finding.
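Putting the pieces together, the reduced Kalman scale is the ordinary filter iteration with an x-reduction spliced in after each measurement update. A self-contained sketch under the earlier assumptions (two-state clocks, state ordering (x₁, y₁, x₂, y₂, …), noiseless difference measurements; names ours):

```python
import numpy as np

def reduced_kalman_scale(meas, Phi, Q, H, xhat, P):
    """Kalman iteration as before, but with covariance x-reduction
    after every measurement update.  meas is a sequence of noiseless
    clock-difference measurement vectors; returns the state estimates,
    from which the corrected clocks x_i - xhat_i define the scale."""
    estimates = []
    for z in meas:
        # temporal update (prediction)
        xhat = Phi @ xhat
        P = Phi @ P @ Phi.T + Q
        # measurement update
        S = H @ P @ H.T
        K = P @ H.T @ np.linalg.inv(S)
        xhat = xhat + K @ (z - H @ xhat)
        P = (np.eye(len(xhat)) - K @ H) @ P
        # x-reduction: declare the current phase estimates exact
        P[::2, :] = 0.0
        P[:, ::2] = 0.0
        estimates.append(xhat.copy())
    return estimates
```

Because the measurements remain noiseless, the corrected clocks xᵢ − x̂ᵢ still agree across the ensemble at every date, so the scale is well defined.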

VII. THE IMPLICIT WEIGHTS
Weiss, Allan, and Peppler [4] observed that the scale defined by the common corrected clocks is actually a weighted-average scale, with weights that come directly from the Kalman gain matrix. Let us see how this works in the present setting. The first (x₁) component of the equation for X̂(t) in (5) can be written as

x̂₁(t) = x̂₁⁻(t) + Σ_{i≥2} K_{1i} [zᵢ(t) − ẑᵢ⁻(t)],   (12)

connecting the phase of clock 1 to the n − 1 measurements zᵢ(t) = x₁(t) − xᵢ(t), whose predictions are ẑᵢ⁻(t) = x̂₁⁻(t) − x̂ᵢ⁻(t). The implicit weights are defined by

w₁ = 1 − Σ_{i≥2} K_{1i},   wᵢ = K_{1i}  (2 ≤ i ≤ n),

so that the common corrected clock TA(t) = x₁(t) − x̂₁(t) of (8) becomes

TA(t) = Σᵢ wᵢ [xᵢ(t) − x̂ᵢ⁻(t)].   (13)

These weights add to 1, but it is possible for some of them to be negative. The same weights can be obtained from any of the x-rows of the gain matrix K(t). From (4) and (8),

x̂ᵢ⁻(t) = xᵢ(t − τ₀) + τ₀ ŷᵢ(t − τ₀) − TA(t − τ₀).   (14)

When this is substituted into (13), the BTSE (9) appears. We state this finding as a theorem. THEOREM 2. No matter how we obtain the state estimate X̂ and P at the previous measurement date, the common corrected clocks at date t satisfy the BTSE (9) with the implicit weights obtained from the gain K(t). In our simulation, the BTSE from the raw scale ignores the H masers entirely.
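Reading the implicit weights off the gain matrix is then a one-liner; a sketch assuming the state ordering (x₁, y₁, x₂, y₂, …) and measurements zᵢ = x₁ − xᵢ, with a helper name of our own:

```python
import numpy as np

def implicit_weights(K):
    """Implicit weights of the corrected-clock scale, read from the
    x1 row of the Kalman gain K (shape 2n-by-(n-1), measurements
    z_i = x_1 - x_i):  w_1 = 1 - sum of the row, w_i = K[0, i-2]."""
    k1 = K[0]                    # gain row of the x1 state component
    return np.concatenate(([1.0 - k1.sum()], k1))
```

By construction the weights always sum to 1, though individual weights may be negative, as noted above.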
The weights for the reduced scale are a little more balanced than the KPW weights, while the stability of the reduced scale is hardly better than the KPW stability.
These observations suggest that the KPW and reduced scales are close to an optimal situation of some kind. By Theorems 1 and 2, both scales are weighted-average scales based on the same Kalman frequency state estimates. Our main result says that the reduced scale itself is optimal in the following sense. THEOREM 3. Among all weighted-average time scales that use the Kalman frequency state estimates, the reduced Kalman scale has the implicit weights that minimize the variance of the scale increment between successive measurement dates.
This result is reasonable: by x-reducing the previous covariance matrix, we declare that the previous phase estimates are perfect; thus, when the Kalman filter optimizes the current phase estimates, it also optimizes the estimates of the phase increments.
A proof based on the Kalman equations has been given [8]; we try to motivate it here. The optimal weights can be obtained by minimizing the variance of the expression (10), subject to the constraint that the weights add to 1. The gain K(t) is computed from the previous error covariance matrix, which consists of P_y (and zeros) if it is x-reduced. One shows that the vector of implicit weights obtained from K(t) satisfies the optimality equation that comes from (10). Thus, there is no need to solve explicitly for the weights; they come free with the reduced Kalman scale algorithm.

VIII. FINAL REMARKS
As we saw, the raw Kalman scale tends to adhere to an average of the clocks that are best in the long term, regardless of their short-term behavior. According to Weiss and Weissert [5], this happens because the Kalman filter is an accuracy algorithm; among other things, it is trying to minimize the accumulated phase estimation error over time. Their insight appears to be correct. To change this behavior, we reach inside the Kalman filter at each step to modify the error covariance matrix by the operation of x-reduction. This operation converts the accuracy algorithm to a stability algorithm, which gives the reduced Kalman scale. The KPW scale may be regarded as an approximation of the reduced Kalman scale with explicit suboptimal weights instead of the automatically generated optimal weights. This study has been carried out in a simulation playpen in which all clocks behave according to their assumed models. There is no provision for outliers, jumps in phase and frequency, changes in the ensemble, or estimation of the process noise parameters.
Perhaps the simplicity and excellent stability of the reduced Kalman scale, under ideal conditions to be sure, will motivate an effort to supply these functions.