Performance Characteristics of Low-Level Motion Estimators in Spatiotemporal Images

The performance characteristics of gradient-based, tensor-based, and phase-based low-level motion estimators are studied analytically. All three techniques yield exact results under ideal conditions (constant motion (cid:12)eld and perfect discretization). With appropriate derivative (cid:12)lters, accuracy well below a hundredth pixel/frame can be achieved. All techniques are quite insensitive to inhomogeneous and accelerated motion. The di(cid:11)erential method, however, is biased by noise while the tensor method is not biased by noise. Under certain conditions, the phase method is insensitive against global illumination changes.


Introduction
This paper presents an analytical, numerical, and experimental study of the performance of low-level motion estimators in spatiotemporal images.Motivation for this work arose from scientific applications of image sequence processing within the frame of an interdisciplinary research unit where the study of transport, exchange, and growth processes with various imaging techniques requires highly accurate velocity estimates [Jähne et al., 1996].These high accuracy demands triggered a revisit of the basic theory of motion estimation in spatiotemporal images.
In this paper only low-level motion estimators are discussed.This is, of course, only a part of the picture.However, systematic errors in low-level estimators will propagate and thus cause also systematic errors in subsequent processing.Only a few systematic studies of the performance characteristics of low-level motion estimators are available in the literature.Kearney et al. [1987] performed an error analysis of optical flow estimation with gradient-based methods, while Barron et al. [1994] used a set of computer-generated and real image sequence to compare various approaches to compute optical flow.Haglund and Fleet [1994] used a natural image and warped it to study the performance of phase-based and energy-based techniques.
None of the above cited papers includes an analytical study of the performance of different techniques.This is the main new point in this paper.The paper starts with a brief outline of differential, tensorial, and phase-based techniques formulated in a unified framework as nonlinear filter operations in continuous spatiotemporal images in Sect. 2. This section forms the basis for the discussion of their performance characteristics.Section 3 discusses the exactness of the techniques under ideal conditions with some remarks on systematic errors introduced by the discretization of the filter operations.Sect. 4 deals with the bias introduced by noise.In the remaining sections a systematic study of the influence of deviations from a constant motion field on the accuracy of the motion estimate is performed: Unsteady motion (Sect.5), motion discontinuity (Sect.6), and illumination changes (Sect.7).

Motion Estimator as Nonlinear Filters in Spatiotemporal Images
The basic term for a unified approach to low-level motion estimation in spatiotemporal images g(x, t) contains the product of derivatives into the different coordinate directions averaged over a certain window w(x, t) [Jähne, 1993]: ( In the lower part of the equation, the convolution integral is abbreviated with angle brackets.The operations in (1) can be performed as a cascade of linear convolution and (nonlinear) point operations: where B and D p are a smoothing filter and a derivative filter into the direction p.
The standard least squares approach with the differential method results in the linear equation system for the optical flux f = (f x , f y ): The tensor method additionally uses the component J tt to form the symmetric tensor An eigenvalue analysis of this tensor yields the optical flow and a characterization of the spatial structure [Bigün and Granlund, 1987;Jähne, 1990 and1993].In case of a distributed spatial structure and a constant motion, only one eigenvalue of J is zero.The optical flow can then be computed from the corresponding eigenvector e 3 = (e 3x , e 3y , e 3t ) as [Karlholm, 1996] For a spatially oriented structure (e. g. an edge), only the velocity component perpendicular to the oriented structure can be computed from the eigenvector to the largest eigenvalue e 1 = (e 1x , e 1y , e 1t ) as The phase method, introduced by Fleet and Jepson [1990], deviates in as much from the previous methods that only normal velocities are determined from images filtered by a set of directional quadrature filters (q + and q − ).Then the component of the optical into the direction of the directional quadrature filter can be computed by The analysis performed in this paper is very general without making any specific assumptions about the types of gray value structures.Most importantly, it does not use Taylor expansions of the gray scale structure which give only approximate results.The following two classes of general continuous gray value structures are applied: planar wave, moving "edge" g(kx − ωt) constant 2-D motion, moving "corner" g(x − u 0 t). ( These elementary classes will then be modified systematically by adding various types of noise and by including unsteady motion, motion discontinuities, and illumination changes.
It is important to note that the analytical study of continuous signals does not include errors due to discretization.Partitioning of the error analysis into an analytic and a discrete part has significant advantages.For example, it is possible to distinguish principal flaws in an approach from errors in the implementation due to an inadequate numerical method.Often these two types of errors are mixed up in the literature.

Exactness
In this section, we will analyze possible systematic errors of the various motion estimators discussed in Sect. 2. We will start with what appears to be trivial -but is not -and verify that the techniques indeed deliver exact results under ideal conditions.The computations also give further insight into the properties of the different techniques.[Scharr et al., 1997].The error bars give the 1-sigma standard deviation when the velocity estimate is averaged over the whole image.

Differential Method
We take an arbitrary spatial gray value structure moving with a constant velocity, g (x, t) = g(x − ut).The first-order partial derivatives are then given by ∇g x = ∇g x and ∂ t g = −u∇g and we obtain provided that the inverse of ∇g∇g T exists.These computations are significant, since they do not depend at all on the specific form of the gray value structure.With g(x − ut), we only made the assumption that an arbitrary spatial structure is moving with a constant speed u.
It is a widespread misconception, which has also found its way into monographs on image sequence processing, that differential methods do not deliver accurate results if the spatial gray value structure cannot be adequately approximated by a firstorder Taylor series (see for example [Singh, 1991]).Kearney et al. [1987] provided an error analysis of the gradient approach and concluded that it gives erroneous results as soon as second-order spatial derivatives become significant.
As shown above, correct results are obtained as soon as the linear equation system can be solved at all.Errors in the estimation of optical flow are only introduced by an inadequate discretization of the partial derivative operators to the extent that the discrete differential operators deviate from the transfer function of an ideal derivative operator.With the standard symmetric difference filter 1/ 2 [1 0 -1], deviations up than 0.1 pixels/frame occur Figure 1a.Interestingly, the errors cancel out at certain velocities (0 and 1 pixels/frame).This is due to the effect that in the solution of (3) only quotients of temporal and spatial derivatives occur.Thus if one of them is zero, e. g., at zero velocity, or temporal and spatial derivatives are equal, e. g. at 1 pixel/frame velocity, the errors cancel out.The error is maximal at a velocity of about 0.5 pixels/frame.The error of the estimate depends on the local wave number and frequency of the local neighborhood since the deviation of the derivative filters from the ideal transfer function of a derivative filter depends on the wave number.
Actually, it is not required that the transfer function of an optimal derivative filter for motion estimation approaches those of the ideal derivative filter, ik q .As mentioned already above, the derivative filters occur only in quotients.This is the case for all low-level motion estimators based on derivative filters.Therefore the derivative filters in the different coordinate directions may contain an arbitrary common function: For the function B(k, ω) typically an isotropic smoothing function is used in order to take advantage of the fact that the derivative filters can be regularized in this way as well.The optimization of this general type of derivative filters is discussed in detail by Scharr et al. [1997].It turns out that an optimized 3 × 3 Sobel-type filter is already sufficient to suppress the error of the velocity estimates well below 0.005 pixels/frame (Figure 1b).For a dilatational motion the error is about the same for the simple symmetric derivative filter (Fig. 2a), while it is slightly higher for the optimized filter (Fig. 2b).
The slight increase in the error in this case becomes clear with the discussion in Sect.6.

Tensor Method
To prove the results for the tensor method, we use a constant motion with and without a spatially oriented pattern and perform the eigenvalue analysis symbolically.For an oriented pattern (planar wave) g (x, t) = g(kx − ωt) moving along with the normal velocity kω/(k 2  x + k 2 y ), where ∂ r g = ∂g/∂r with r = kx − ωt.With (10), the six independent components of the structure tensor are ( Leaving out the constant mean square gradient g 2 r , the structure tensor has the form It is too tedious to compute the eigenvalues and eigenvectors of the above symmetric tensor manually.But it can be performed easily with software for symbolic mathematical computations such as Mathematica from Wolfram Research.The result is that two eigenvalues are zero and that the eigenvector of the nonzero eigenvalue is of the form With this equation we can verify (6) .With a constant moving spatially distributed pattern g (x, t) = g(x − ut) the derivatives are With ( 1) and ( 14), the six independent components of the structure tensor are ( Evaluating the eigensystem of this tensor with Mathematica shows that one eigenvalue is zero and that the eigenvector of the zero eigenvalue has the form verifying (5).

Phase Method
To verify the phase method, we apply an arbitrary 1-D spatial gray value structure moving with constant velocity.We use Then (7) results in This result is surprising because we do not make any use of the fact that g − is the Hilbert transform of g + .Consequently, the phase method gives accurate results with any pair of g + and g − if only the requirement g This result is also important considering the fact that it is difficult to design an effective Hilbert filter [Jähne, 1997a].A rough approximation will already give good results.

Bias by Noise
Noise may introduce a systematic error into the flow estimate.We investigate this by adding isotropic zero-mean noise We assume that the partial derivatives of the noise function are not correlated with themselves or with the partial derivatives of the image patterns.Therefore we use the conditions: ∂ p n∂ q n = 0, p ≠ q, and ∂ p g∂ q n = 0 (17 and the partial derivatives are ∇g = ∇g + ∇n and ∂ t g = −u∇g + ∂ t n. (18)

Differential Method
A noisy image with the conditions as set up above gives the optical flow estimate The key to understanding this matrix equation is to observe that the noise matrix (∇n∇n T ) is diagonal in any coordinate system, because of the conditions set by (17).Therefore, we can transform the equation into the principal-axes coordinate system in which ∇g∇g T is diagonal.Then we obtain The inverse of the first matrix always exists when the variance of the noise is not zero.Therefore, we obtain This equation shows that in noisy images two unfortunate things happen.First, we always obtain a solution and can thus no longer distinguish constant neighborhoods from edges and corners.Second, the obtained estimate is erroneous.It is biased towards lower values.This is also verified by experiments as shown in Fig. 3a  and b.If the variance of the noise is about the squared magnitude of the gradient, the estimated values are only about half of the true values.Thus the differential method is an example of a non-robust technique since it deteriorates in noisy image sequences.

Tensor Method
The key to an easy solution is the transformation of the structure tensor into the local principal-axes coordinate system.A spatiotemporal image consisting only of zero-mean normal-distributed and isotropic noise with the variance σ n results in a diagonal inertia tensor in any coordinate system, σ 2 n I , where I is the unit diagonal tensor.The variance σ 2 n of the noise depends on the degree of smoothing applied when calculating the elements of the structure tensor according to (1).With white noise, σ 2 n is inversely proportional to the volume over which the elements are averaged, i. e., σ 3 of the smoothing filter.
The structure tensor of a noisy spatiotemporal image is therefore given by  From this relation, we can immediately conclude that the eigenvectors for an ideal oriented structure with isotropic noise are the same as for the noise-free case.Therefore, an unbiased estimate of orientation is possible.This is no longer the case, however, if the noise is directional.
If {λ 1 , λ 2 , λ 3 } is the set of eigenvalues of J , the eigenvalues of J are This important relation allows us to check the influence of noise on the various coherency measures.The spatial coherency measure c s [Jähne, 1997b] becomes Note that the variance of the noise is contained only in the divisor.Consequently, noise reduces the coherency measure.If the noise and signal levels are about the same, the maximal coherency is reduced to about half the value of the noise-free case.The effect is similar for the total coherency [Jähne, 1997b]: Because the noise terms cancel in the difference of the eigenvalues, the noise term is again contained only in the divisor, reducing the coherency.

Unsteady Motion
In this section we check the sensitivity of the optical flow estimation to changes in the motion field.By way of example, the temporal changes are discussed.Since motion results in oriented structures in space-time images, changes in the motion lead to curved structures.In order to model an unsteady motion, we introduce an acceleration For an arbitrary gray scale structure we can then write It is obvious that this modified model will influence only the temporal derivatives.
The temporal derivative now becomes where we use a = a x , a y T .
In all methods, weighted products of the derivatives in the form of (1) appear.For example, and x + 2(u x a y + u y a x ) tg x g y + 2u y a y b tg 2 y + a 2 x t 2 g 2 x + 2a x a y t 2 g x g y + a 2 y t 2 g 2 y .
(28) Although these terms look discouragingly complex, they are easy to interpret.We can distinguish terms with linear and quadratic appearances of time t.First, we have terms in which the time appears linearly in the averaging together with the corresponding derivatives.In the ideal case, when the spatial derivatives are evenly distributed over the averaging area, they all cancel out.The expression t is zero since the center of the coordinate system has been placed at the center pixel in (26).Now consider the converse case that the whole neighborhood over which we average contains only a singular steep spatial derivative at time t 0 .Then, for instance, (27) reduces to This equation means that the velocity obtained is not the velocity at time 0 -as expected -but the velocity at time t 0 .Thus the worst that could happen is that the optical flow estimate is biased to a value which is at the edge of the averaging neighborhood.If the spatial gray values are more equally distributed, the shift is of course less.
The discussion so far is valid for all methods, since the effect of the accelerated motion is simply that the correct velocity u has to be replaced by a biased velocity u + at 0 .The upper limit for t 0 is the temporal extension of the averaging mask.It is obvious that the same train of thought is valid for spatial variations of the velocity field.The velocity in the neighborhood is biased at most to a velocity value at the edge of the averaging neighborhood.Generally, we can conclude that the estimate of the optical flow is pulled to the values of regions with the steepest gradients.
One term remains to be discussed which appears only in the tensor method.It is J tt = (∂ t g) 2 (28).If we neglect the terms linear in t, terms that are quadratic in t still remain and (28) reduces to The terms quadratic in t appear as soon as the motion is accelerated.They contain a product of the squared acceleration and a measure of the temporal variance weighted with the steepness of the gray value changes.Since the mathematics becomes quite complex for the 3-D tensor, we illustrate this effect of an accelerated motion with the 2-D case.Then and the 2-D structure tensor is (31) In the limit a 2 x (t ∂ x g) 2  (1 + u 2 x ) (∂ x g) 2 , the eigenvalues are In this approximation the eigenvector e 2 of the smallest eigenvalue is no longer simply [u x , 1] T but Thus there is a small bias in the estimate of the optical flow which goes with the square of the acceleration.As an example we take u x = 1, a x = 0.1, and This means that the motion is changing by ±20 % within the temporal width of the averaging window with a radius of 2.Even with this large relative velocity variation, the relative bias in the estimate of the optical flow is only 2 %.Likewise the decrease in the coherency measure is only marginal: (34) For the above example the coherency decrease is just 8 %.We can draw two important conclusions.First, the coherency is not sensitive to gradual changes in the velocity.There are two other sources that decrease the coherency.We have already seen in Sect. 4 that the coherency also decreases with increasing noise level.This decrease is generally homogeneous in space and time.We will see in the next section that the coherency decreases significantly also at motion discontinuities in space and time.
Secondly, we can average the estimates in time even with accelerated motion.Thus much more reliable motion estimates can be determined than from only two consecutive images.

Motion Discontinuity
Next, we turn to the analysis of motion discontinuities.We discuss only the case of a one-dimensional flow estimate for the tensor and phase methods.The neighborhood is partitioned into two subareas with different velocities.Without loss of generality, we choose g (x, t) = g(x + ut)Π(−x) and g (x, t) = g(x − ut)Π(x), where Π(x) is the step function (we can recover all other cases by rotation of the coordinate system).

Tensor Method
The tensor method gives the following estimate for the optical flow: If the mean square gradient is equal in both regions, the estimated optical flow is zero, as expected.The coherency is where α is the angle between the spatiotemporal orientation in the two regions (α is related by tan(α/2) = u to the velocity in the two regions) and γ is a measure that compares the mean square gradient in the two regions: The factor γ is one if both regions have the same mean square gradient and zero if the mean square gradient in one region is significantly larger than in the other.
Let us first assume that the mean square gradients are equal, i. e., γ = 1.Then the expression for the coherency becomes even simpler: c = cos α.This tells us that the coherency is zero if the orientations in the two regions are orthogonal.If γ < 1, the coherency reaches a minimal value of 1 − γ for regions with orthogonal orientation.In this way, the coherency is a direct measure for the change of the orientation (and thus velocity).Because of the squared sine function in (36), it is quite insensitive for small differences in the orientation angle.

Phase Method
This paragraph on the response of the phase method to motion discontinuities will illuminate a general problem of how to weight estimates of the optical flow properly.We use the same approach as for the tensor method and obtain This result is very similar to the tensor method; only the weighting factors are different.With the phase method, we can avoid weighting with the magnitude or squared magnitude of the gradients.Then we must set some kind of empirical threshold, however, because the phase information obtained from low gray value gradients will not be reliable.More generally, we lose a confidence measure for the phase estimate, since we treat all estimates as if they had the same error.
In any case, a motion discontinuity at a straight line between two regions with random variations in the mean square gradient will not appear as a straight line in the optical flow image.It will rather become a wiggly line because of the bias caused by weighting.The deviation from the true discontinuity can be as much as half the mask size of the averaging filters.We seem to have reached a point in the analysis of motion that cannot be solved on the base of local information alone.At this point, global information must be considered, such as the smoothness of boundaries.

Illumination Change
Finally, we discuss the influence of illumination changes using g (x, t) = g(x −ut, t).Then ∂ x g = ∂ x g and ∂ t g = −u ∂ x g+∂ t g, where ∂ t g is the explicit temporal derivative.

Tensor Method
The estimate of the optical flow for the tensor method gives Even if ∂ x g ∂ t g = 0, the velocity estimate is biased towards higher velocities, as can be seen for u 1: This result is not surprising, since illumination changes occur as additional patterns with an orientation corresponding to infinite velocity.

Phase Method
From a phase-based method we would expect that global illumination changes are entirely suppressed since only the phase and not the amplitude of the signal is used for the determination of displacements.
After some algebra we obtain where ∂ t g again means the explicit temporal derivative.The additional term vanishes when the explicit temporal derivative is directly proportional to the gray value.This is indeed the case for global illumination changes; they are directly proportional to the gray value.Thus the phase method suppresses global illumination changes under very general assumptions.

Conclusions
In this paper analytical techniques have been discussed to study the performance characteristics of low-level motion estimators in continuous space-time images.This technique proved to be powerful because it allows to separate principal characteristics of a technique from errors introduced by the numerical implementation and because it gives conclusive results under very general conditions.Still, a lot remains to be done.Interesting topics of further research are the study of statistical error propagation, a comparative study of gradient-based techniques with quadrature-filter set techniques and correlation-based techniques, and the inclusion of regularization techniques.

Figure 1 :
Figure 1: Systematic error in the velocity estimate as a function of the interframe displacement measured with a moving random pattern.Derivatives computed with a a the symmetric difference filter 1/ 2 [1 0 -1] b an optimized Sobel filter[Scharr et al., 1997].The error bars give the 1-sigma standard deviation when the velocity estimate is averaged over the whole image.

Figure 2 :
Figure 2: Systematic error in the velocity estimate as a function for a pure dilatational motion of a random pattern.The velocity is zero in the middle and ±1 at the edges.Shown is the error in the velocity component in x direction.Derivatives computed with a the symmetric difference filter 1/ 2 [1 0 -1] b an optimized 3 × 3 Sobel-type filter[Scharr et al., 1997].

Figure 3 :
Figure 3: Systematic error in the velocity estimate with a moving random pattern using the least squares gradient method (a and b and the tensor method (c and d for velocities in x direction from 0 to 1.4 pixels/frame.The signal-to-noise ratio is 10 (a and c) and 1, (b and d) respectively.