Design of Linear Cellular Neural Networks for Motion Sensitive Filtering

Recently, several researchers have proposed using spatio-temporal filters for image motion analysis. For example, the optical flow field can be calculated from the output of a set of spatio-temporal filters. Some of the most popular spatio-temporal filters are the space-time Gabor filters, obtained by convolving a time varying image with a space-time Gabor function. Based on the cellular neural network paradigm, we propose a new architecture for spatio-temporal filtering called a CNN filter array and demonstrate the design of CNN filter arrays for motion sensitive filtering. One advantage of this approach to motion sensitive filtering is that a global convolution in space and time can be performed by using only spatially local interconnections and exploiting the continuous time dynamics of the CNN filter array. No storage of any past image frames is required,



I. INTRODUCTION
Recently, several researchers have proposed using spatio-temporal filters for image motion analysis. This interest has been motivated by several different factors. On the physiological side, it has been discovered that motion sensitive cells in the primary visual cortex are sensitive to certain directions and spatio-temporal frequencies [1]-[3]. On the computational side, several researchers have pointed out that image motion at a specified velocity can be characterized as an orientation in the space-time domain [4]-[6]. This orientation manifests itself in the spatio-temporal frequency domain as a skewing of the image spectrum, so that all of the energy of a pattern translating at a velocity $v$ is concentrated in a specific region of the spatio-temporal frequency space. The spatio-temporal filtering approach to image motion analysis uses the outputs of appropriately designed filters tuned to these regions as input to later processing stages.
For example, the problem which has attracted the most attention in this area is the derivation of the optical flow. The optical flow is defined to be the two-dimensional velocity field in the image plane resulting from relative motion between the observer and objects in the environment. Because this flow field contains information about the structure of the environment, there exist many algorithms that attempt to recover information about the environment given the optical flow. Several researchers have proposed different methods for computing the optical flow by comparing the outputs of a set of spatio-temporal filters tuned to different areas of the spatio-temporal frequency domain [7]-[10].

Manuscript received July 7, 1992; revised November 3, 1992. This work was supported by the Air Force Office of Scientific Research (AFOSR/JSEP) under Contract F49620-90-C-0029. This paper was recommended by Associate Editor J. A. Nossek.
One of the spatio-temporal filters for image motion analysis which has attracted the most attention is the space-time Gabor filter, which we discuss in greater detail in Section II. One disadvantage of using Gabor filters for image motion analysis is that the velocity at which a sine wave grating must translate in order for the Gabor filter to have its maximal response increases as the spatial frequency of the grating decreases [9, Fig. 2]. Typically, space-time Gabor filters have been implemented by convolving an image sequence with a space-time Gabor function. Digital computation of the convolution requires that many image frames be digitized and stored for later recall. Since each frame is already a two-dimensional array of pixel intensity values, the memory storage requirements are quite high. In order to operate at speeds comparable to those of objects moving in the environment, a system implementing the Gabor space-time filters also requires a wide bandwidth to handle the large amount of data which must be manipulated.
In this paper, we propose an alternative approach to implementing these types of filters. We assume that the image that we wish to filter is sampled in space by a rectangular $n_0$ by $n_1$ array of photodetectors, but not sampled in time. Our approach is to design an $n_0$ by $n_1$ array of continuous time temporal filters, each of which is associated with one pixel in the image. These filters are spatially locally interconnected in such a way that when the filter array is driven by the set of analog waveforms from the photodetector array, only the desired spatio-temporal frequencies in the input will appear at the output of the array. Since our architecture is based upon the cellular neural network (CNN) paradigm [11], [12] as introduced by Chua and Yang, we will refer to it as a CNN filter array.
This approach attempts to overcome the previously mentioned disadvantages of Gabor filters. First, these filters can be designed so that each filter has maximal response at the same velocity for translating sine wave gratings of most spatial frequencies. Second, this method does not require any explicit storage of any frames of the input image. Instead, we use the natural dynamics of the filter array to perform the desired convolution in both time and space. Finally, the image does not need to be digitized as the filter operates directly on the analog signal coming from the sensor array.
Fleet and Jepson [13] have described work in a similar vein to the results presented here. They define a hierarchical computational structure based upon layers of linear processing units, which they show can be used for orientation and velocity sensitive filtering. The first layer of their hierarchical computation structure is very similar to a CNN filter array in which the temporal filters operate in discrete time. In fact, it is slightly more general. Since their biologically motivated first layer displays no orientation or velocity selectivity, they must add additional processing layers which operate on the output of this first layer. Our work demonstrates that with a structure simpler than the first layer of Fleet and Jepson's hierarchical architecture, we can achieve an orientation and velocity selective filter.
Motion detection algorithms have also been proposed previously for the original CNN paradigm and for the CNN with delay type templates [14]. These algorithms have several drawbacks which this work attempts to address. The major drawback of all of these algorithms is that they are based upon the assumption that the images are black and white. Although this assumption simplifies the task of motion detection, it is not a very realistic assumption for real world images. CNN arrays for motion detection which operate on a continuously varying input require the implementation of delay in the processing. This may be difficult because the delay must be on the order of the time scale of the motion in the image, which is an order of magnitude slower than the dynamics of the processing. Finally, these methods are based on local pixel differences rather than on the global properties of translating images. Therefore, we would expect that any implementation of these types of algorithms for gray level images will be extremely sensitive to noise.
The paper is organized as follows. After reviewing some preliminaries of spatio-temporal filtering and some of the notation which we will use in this paper in Section II, we introduce the CNN filter array architecture in Section III and show that there exist a spatio-temporal frequency response and a spatio-temporal transfer function describing the mapping from the input to the output of the array. In Section IV, we begin the design of a one-dimensional motion sensitive CNN filter array. However, we cannot completely specify the interconnections based on the spatio-temporal frequency response alone. One of the most important considerations which must also be taken into account is stability. As with any analog filter which incorporates feedback, there is a possibility that the filter may be unstable. We address this issue in Section V, where we give necessary and sufficient stability criteria for these types of CNN's. These stability criteria take into account the massive redundancy resulting from the assumption of spatial invariance, resulting in simple graphical Nyquist type stability tests. Armed with these results, we complete the design of the one-dimensional CNN motion sensitive filter array in Section VI and give examples of simulation results. Finally, we illustrate how to extend these results to two-dimensional images in Section VII before summarizing our results in Section VIII.

II. PRELIMINARIES AND NOTATION
In this section, we briefly discuss the theory underlying spatio-temporal filtering for image motion analysis and introduce one popular filter, the space-time Gabor filter, which has been used by many researchers. We then introduce the notation we use in the following sections to describe multidimensional arrays. The advantage of this notation is that the notation for one-dimensional and higher dimensional arrays is identical.
Consider the special case of a one-dimensional sine wave grating translating at speed $v$. In this case, the time varying image intensity at point $x$ and time $t$ is given by
$$i(x, t) = \sin(\Omega_x (x - vt)).$$
See Fig. 1. The image intensity oscillates both in space and time with frequencies $\Omega_x$ and $\Omega_t$, i.e., $i(x, t) = \sin(\Omega_x x + \Omega_t t)$. Equating the arguments in the two expressions yields
$$\Omega_t = -v\,\Omega_x. \tag{1}$$
This relationship between the spatial and temporal frequencies of an image undergoing uniform translation also holds for more general images. In general, we will use the hat notation, $\hat{x}$, to denote transforms in time, the upside-down hat notation, $\check{x}$, to denote transforms in space, and the tilde notation, $\tilde{x}$, to denote transforms in both space and time. Equation (1) indicates that for uniform translation in a one-dimensional image plane, all of the energy in the signal lies along the line $\omega_t = -v\,\omega_x$ in the spatio-temporal frequency plane. In the two-dimensional case, all of the energy in the signal lies along a plane in the spatio-temporal frequency domain given by $\omega_t = -(\vec{\omega}_x \cdot \vec{v})$.
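As an illustration of (1), the following sketch computes a brute-force two-dimensional discrete Fourier transform of a translating grating and lists the bins that carry its energy. It is only a numerical check under stated assumptions: the grating is sampled in time as well as in space (which the architecture proposed in this paper does not require), and the grid size `n`, the spatial bin `kx0`, and the integer velocity `v` (pixels per time sample) are hypothetical values chosen so that the grating is periodic on the grid.

```python
import cmath
import math

def spectrum_peaks(n=16, kx0=3, v=2):
    """Return the (kx, kt) DFT bins holding the energy of a translating grating.

    The grating is i(x, t) = sin(Omega_x * (x - v * t)) with
    Omega_x = 2*pi*kx0/n, sampled on an n-by-n space-time grid.
    """
    omega_x = 2.0 * math.pi * kx0 / n
    image = [[math.sin(omega_x * (x - v * t)) for t in range(n)]
             for x in range(n)]
    peaks = []
    for kx in range(n):
        for kt in range(n):
            acc = 0j
            for x in range(n):
                for t in range(n):
                    phase = -2j * math.pi * (kx * x + kt * t) / n
                    acc += image[x][t] * cmath.exp(phase)
            # Keep only bins that are clearly above numerical noise.
            if abs(acc) > 1e-6 * n * n:
                peaks.append((kx, kt))
    return peaks

n, v = 16, 2
peaks = spectrum_peaks(n=n, kx0=3, v=v)
# Every occupied bin satisfies kt = -v * kx (mod n), i.e. omega_t = -v * omega_x.
on_line = all((kt + v * kx) % n == 0 for kx, kt in peaks)
```

Only two bins are occupied (the grating frequency and its negative), and both lie on the line $\omega_t = -v\,\omega_x$ once frequencies are read modulo $2\pi$.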
The essential idea behind image motion analysis using spatio-temporal filters is to design filters which are sensitive to spatio-temporal frequencies corresponding to different image velocities. One popular such filter is the even-phase Gabor filter, obtained by convolving the image in space and time with a spatio-temporal even-phase Gabor function. See Fig. 2(a). This filter has a spatio-temporal frequency response which is equal to the spatio-temporal Fourier transform of the Gabor function. Therefore, given an image consisting of a translating sine wave grating with spatial frequency $\Omega_x$, this filter has its greatest response when the grating is oscillating in time with frequency $\Omega_t$, corresponding to a velocity equal to $-\Omega_t/\Omega_x$. Thus, changing the ratio of $\Omega_t$ and $\Omega_x$ changes the velocity sensitivity of the filter. However, as we mentioned in the introduction, the maximal output of the Gabor filter for translating sine wave gratings of different spatial frequencies occurs at different velocities, depending upon the spatial frequency. This effect is due to the fact that if $\sigma_x \omega_x > 1$, the temporal frequency at which the amplitude of the frequency response is maximal is approximately equal to the constant $\Omega_t$ for all spatial frequencies $\omega_x$. Therefore, the velocity at which the response is maximal is approximately equal to $-\Omega_t/\omega_x$.
These linear filters result in an oscillating output which may be difficult to interpret for later processing steps. One way to handle this is to sum the squares of the outputs of two filters which have nearly the same magnitude spatio-temporal frequency response, but which are 90° out of phase with each other. Adelson and Bergen [4], [5] propose to sum the squares of the outputs of the even-phase Gabor filter and the odd-phase Gabor filter, the latter obtained by convolving the input with the odd-phase Gabor function. This sum of the squared outputs is called the motion energy.
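A minimal sketch of why the quadrature sum removes the oscillation is given below. It assumes an idealized pair of filters whose responses to a grating have exactly the same amplitude and are exactly 90° apart; the amplitude, temporal frequency, and phase used are hypothetical values, not taken from the paper.

```python
import math

def motion_energy(even_out, odd_out):
    """Sum of squared outputs of a quadrature pair of filters."""
    return even_out ** 2 + odd_out ** 2

# Hypothetical steady-state responses of an even/odd filter pair to a grating.
amplitude, omega_t, phase = 0.7, 2.0, 0.3
times = [0.1 * k for k in range(50)]
even = [amplitude * math.cos(omega_t * t + phase) for t in times]
odd = [amplitude * math.sin(omega_t * t + phase) for t in times]

# Each individual output oscillates in time, but the energy does not.
energy = [motion_energy(e, o) for e, o in zip(even, odd)]
```

Under these assumptions the energy equals the squared amplitude at every instant, so later stages see a steady value instead of an oscillation.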
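In the following, we adopt the convention that all vectors and arrays will be indexed starting from zero. For a $d$-dimensional array, define the size of the array to be the $d$-dimensional vector $\vec{n} = (n_0, \ldots, n_{d-1})$, where $n_i$ is the size of the array in the $i$th dimension. Define $\mathcal{I}_{\vec{n}}$ to be the set of multi-indexes which index this array, i.e.,
$$\mathcal{I}_{\vec{n}} = \{(k_0, \ldots, k_{d-1}) \in \mathbb{Z}^d : 0 \le k_i \le n_i - 1\}.$$
For example, an $n_0 \times n_1$ two-dimensional array has size $\vec{n} = (n_0, n_1)$ and is indexed by
$$\mathcal{I}_{\vec{n}} = \{(k_0, k_1) : 0 \le k_0 \le n_0 - 1,\ 0 \le k_1 \le n_1 - 1\}.$$
For the purposes of defining multiplication and division between two elements of $\mathcal{I}_{\vec{n}}$, we shall consider a multi-index as a $d$-dimensional vector in $\mathbb{R}^d$ and define multiplication between two elements of $\mathbb{R}^d$ using the normal scalar product in $\mathbb{R}^d$. We define division as the following mapping from $\mathbb{R}^d \times \mathbb{R}^d$ to $\mathbb{R}^d$:
$$\vec{a} / \vec{b} = (a_0/b_0, \ldots, a_{d-1}/b_{d-1}).$$
However, we define addition and subtraction between two elements of $\mathcal{I}_{\vec{n}}$, or between an element of $\mathcal{I}_{\vec{n}}$ and a $d$-dimensional vector in $\mathbb{Z}^d$, somewhat differently to ensure that the sum is an element of $\mathcal{I}_{\vec{n}}$. For all $i, n \in \mathbb{Z}$, define $i \bmod n = m$, where $m \in \{0, \ldots, n-1\}$ is such that there exists an $l \in \mathbb{Z}$ with $i = ln + m$. Define addition between two elements of $\mathcal{I}_{\vec{n}}$, $\vec{j}$ and $\vec{k}$, by
$$\vec{j} + \vec{k} = ((j_0 + k_0) \bmod n_0, \ldots, (j_{d-1} + k_{d-1}) \bmod n_{d-1}).$$
Define subtraction similarly. The same definitions hold if $\vec{j} \in \mathcal{I}_{\vec{n}}$ and $\vec{k} \in \mathbb{Z}^d$.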
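The index arithmetic above can be sketched in a few lines. The function names are hypothetical, multi-indexes are represented as tuples, and the element-wise form of the division is an assumption consistent with a mapping from $\mathbb{R}^d \times \mathbb{R}^d$ to $\mathbb{R}^d$.

```python
def add_index(j, k, size):
    """Wrap-around addition: each component is reduced modulo the array size."""
    return tuple((ji + ki) % ni for ji, ki, ni in zip(j, k, size))

def sub_index(j, k, size):
    """Wrap-around subtraction, defined in the same way as addition."""
    return tuple((ji - ki) % ni for ji, ki, ni in zip(j, k, size))

def divide(a, b):
    """Element-wise division of two d-dimensional vectors (assumed form)."""
    return tuple(ai / bi for ai, bi in zip(a, b))

size = (4, 5)                                  # a 4 x 5 array
wrapped_sum = add_index((3, 4), (2, 1), size)
wrapped_diff = sub_index((0, 0), (1, 1), size)
with_offset = add_index((0, 0), (-1, 7), size)  # second operand taken from Z^d
```

Because Python's `%` returns a result in $\{0, \ldots, n-1\}$ for a positive modulus, the results always remain valid multi-indexes, which is what gives the array periodic boundary conditions.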

III. CNN FILTER ARRAYS
In this paper we will be considering a special type of CNN, which we will refer to as a CNN filter array. This architecture is similar to the generalized CNN architecture [19], except that it is purely linear. CNN's are a general class of analog recurrent neural networks in which the "cells" or "neurons" are arranged in a regular array and are only interconnected with their nearest neighbors [11], [12], [20]. In the case of CNN filter arrays, each of these cells is simply a linear continuous time temporal filter whose input/output behavior is described by the transfer function $g(s)$. A CNN filter array designed to process an $n_0 \times n_1$ pixel time varying image will consist of an $n_0 \times n_1$ array of such filters, where the $\vec{k}$th filter is associated with the $\vec{k}$th pixel in the image.
The input to each filter is equal to the sum of a feedback term and a feedforward term. The feedback term is a weighted sum of the outputs of the filters in a small neighborhood of the filter. The feedforward term is a weighted sum of the intensities of the pixels in a small neighborhood of the associated pixel in the input. We will assume that the weighting factors are spatially invariant.
More specifically, denote the input and the output of the $\vec{k}$th filter at time $t$ by $v(\vec{k}, t)$ and $y(\vec{k}, t)$, respectively, and the intensity of the associated pixel by $u(\vec{k}, t)$. For a given neighborhood radius $r$, the input to the $\vec{k}$th filter at time $t$ is given by
$$v(\vec{k}, t) = \sum_{\vec{l} \in N_r} \left[ a(\vec{l})\, y(\vec{k} + \vec{l}, t) + b(\vec{l})\, u(\vec{k} + \vec{l}, t) \right],$$
where $N_r = \{\vec{l} \in \mathbb{Z}^d : |l_i| \le r\}$. We will refer to the A and B coefficients, $a(\vec{l})$ and $b(\vec{l})$, as the CNN cloning templates. In the one- and two-dimensional cases, we will specify these as $(2r+1)$-vectors or $(2r+1) \times (2r+1)$ matrices, where the center elements correspond to the $\vec{0}$th element of the cloning template. This type of array is related to the original definition of CNN's in [10] and [11] in that if we set $g(s) = R/(1 + sRC)$, where $R$ is the parallel resistance in each cell and $C$ is the value of the capacitor in each cell, the differential equation governing the outputs of the CNN filter array is the same as the equation governing the evolution of the state of the original CNN, provided the output nonlinearity in the original system is equal to the identity. This is equivalent to assuming that the input to the CNN is sufficiently small and that the dynamics evolve in such a way that the outputs of the CNN never enter the saturation region of the piecewise linear output nonlinearity.
Due to the assumption of spatial invariance, there is a simple intuitive way to think about these types of arrays, which is mathematically justifiable. Consider a one-dimensional CNN filter array, as shown in Fig. 3, and assume that the array is infinite. Because of spatial invariance, the computation of the weighted feedforward and feedback sums is simply a discrete cross-correlation of the B and A cloning templates with the input and output, respectively. These operations have spatial frequency responses given by the complex conjugates of the discrete space Fourier transforms of the waveforms corresponding to the B and A coefficients. Thus, we can think of the system as the block diagram system shown in Fig. 4. The effect of the cross-correlation of the output with the A template is characterized by the spatial frequency transfer function
$$\check{a}(\omega_x) = \sum_{l=-r}^{r} a(l)\, e^{j\omega_x l}, \tag{2}$$
where $\omega_x$ is the spatial frequency in radians per pixel. Similarly, the effect of the cross-correlation of the input with the B template is characterized by the spatial frequency transfer function $\check{b}(\omega_x) = \sum_{l=-r}^{r} b(l)\, e^{j\omega_x l}$. By naively writing the loop equations, we can derive what appears to be a spatio-temporal transfer function,
$$\tilde{h}(\omega_x, s) = \frac{\check{b}(\omega_x)\, g(s)}{1 - \check{a}(\omega_x)\, g(s)}. \tag{3}$$
Similarly, substituting $s = j\omega_t$ into the above, we obtain what appears to be a spatio-temporal frequency response,
$$\tilde{h}(\omega_x, j\omega_t) = \frac{\check{b}(\omega_x)\, g(j\omega_t)}{1 - \check{a}(\omega_x)\, g(j\omega_t)}. \tag{4}$$
Assuming our intuition is correct, if the input of the CNN filter array is a moving sine wave grating, $u(\vec{k}, t) = \sin(\omega_t t + \vec{\omega}_x \cdot \vec{k})$, the output will be another sine wave grating of the same spatial and temporal frequencies, but scaled in amplitude and shifted in phase,
$$y(\vec{k}, t) = |\tilde{h}(\vec{\omega}_x, j\omega_t)|\, \sin\!\left(\omega_t t + \vec{\omega}_x \cdot \vec{k} + \angle \tilde{h}(\vec{\omega}_x, j\omega_t)\right).$$

Fig. 3. In a one-dimensional CNN filter array with $r = 1$, the input to each filter is a weighted sum of the outputs and associated inputs of itself and its neighbors to the left and right.
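The loop-equation argument can be evaluated numerically with a short sketch. It assumes the one-dimensional case, that a template stored as the list $[a(-r), \ldots, a(r)]$ has spatial response $\sum_l a(l) e^{j\omega_x l}$ (the conjugate of its discrete space Fourier transform), and that the loop has the form $\check{b} g / (1 - \check{a} g)$. The function names and the example templates are hypothetical.

```python
import cmath

def template_response(template, omega_x):
    """Spatial response of a 1-D cloning template [c(-r), ..., c(r)]."""
    r = len(template) // 2
    return sum(c * cmath.exp(1j * omega_x * (i - r))
               for i, c in enumerate(template))

def frequency_response(a_template, b_template, g, omega_x, omega_t):
    """Closed-loop response b*g / (1 - a*g) evaluated at s = j*omega_t."""
    a_val = template_response(a_template, omega_x)
    b_val = template_response(b_template, omega_x)
    g_val = g(1j * omega_t)
    return b_val * g_val / (1.0 - a_val * g_val)

# Hypothetical example: integrator cells and nearest-neighbor feedback.
integrator = lambda s: 1.0 / s
a_template = [0.25, -1.0, 0.25]
b_template = [0.0, 1.0, 0.0]
h = frequency_response(a_template, b_template, integrator,
                       omega_x=0.0, omega_t=1.0)
gain, phase_shift = abs(h), cmath.phase(h)
```

For integrator cells the expression reduces to $\check{b}(\omega_x)/(j\omega_t - \check{a}(\omega_x))$; `gain` and `phase_shift` are then the amplitude scaling and phase shift applied to a grating at that frequency pair. Note that an integrator cannot be evaluated at `omega_t = 0` in this form.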
This intuition can be made more rigorous in a way that also takes into account the fact that in practice we will never have an infinite array. In the finite array case, the CNN linear filter array can be analyzed as a MIMO system by listing the inputs and outputs of the array as a vector. In the two-dimensional case, using the correct ordering, the matrices representing the cross-correlations with the A and B template coefficients are block circulant with circulant blocks [21]. Using the special properties of these types of matrices, it can be shown that the spatio-temporal transfer function and frequency response of the finite array of size $\vec{n}$ with periodic boundary conditions can be obtained by sampling (3) and (4) at $\vec{\omega}_x = 2\pi \vec{k} / \vec{n}$, where $\vec{k} \in \mathcal{I}_{\vec{n}}$. Therefore, when discussing the spatio-temporal transfer function or frequency response, we will generally mean (3) and (4).
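The sampling property can be checked numerically in the one-dimensional case. The sketch below assumes a ring of `n` cells (periodic boundary conditions) and verifies that each complex exponential at a frequency $2\pi m/n$ is reproduced by the wrap-around cross-correlation, scaled by the template response at that frequency. The template values and array size are hypothetical.

```python
import cmath
import math

def template_response(template, omega_x):
    """Spatial response sum_l c(l) * exp(j*omega_x*l) of a 1-D template."""
    r = len(template) // 2
    return sum(c * cmath.exp(1j * omega_x * (i - r))
               for i, c in enumerate(template))

def ring_correlate(template, signal):
    """Cross-correlate a template with a signal on a ring of len(signal) cells."""
    n, r = len(signal), len(template) // 2
    return [sum(template[l + r] * signal[(k + l) % n] for l in range(-r, r + 1))
            for k in range(n)]

n = 8
template = [0.5, -1.0, -0.3]      # hypothetical [a(-1), a(0), a(1)]
max_error = 0.0
for m in range(n):
    omega = 2.0 * math.pi * m / n
    mode = [cmath.exp(1j * omega * k) for k in range(n)]
    out = ring_correlate(template, mode)
    eigenvalue = template_response(template, omega)
    # Each spatial mode should come back scaled by the sampled response.
    err = max(abs(o - eigenvalue * x) for o, x in zip(out, mode))
    max_error = max(max_error, err)
```

A negligible `max_error` means the sampled values of the template response are exactly the eigenvalues of the circulant operator, which is why the finite array inherits its transfer function from the infinite one.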
The existence of the spatio-temporal transfer function suggests a convenient way to think about the operation of the CNN filter array. The CNN filter array is essentially a set of decoupled temporal filters, each of which processes a single spatial frequency component of the input. If we take the spatial Fourier transform of the image waveform at each time $t$, we obtain a time-varying set of Fourier coefficients. The temporal evolution of the Fourier component of the image waveform at frequency $\vec{\omega}_x$ is specified by a complex scalar valued function of time. The CNN filter array filters this scalar function with a temporal filter that has a transfer function given by (3). Therefore, (3) can be thought of as a spatial frequency-dependent SISO temporal transfer function.
Example 1: The Silicon Retina: Mahowald and Mead have pointed out that the resistive grid of their silicon retina [22] computes a temporally and spatially weighted average of the photoreceptor outputs. A schematic diagram of a one-dimensional silicon retina is shown in Fig. 5. Roughly speaking, the temporal averaging is due to the capacitance associated with each node of the resistive grid, and the spatial averaging is due to the effect of the resistors linking adjacent nodes in the grid. The space constant of the array, $\alpha$, determines the amount of spatial smoothing. The interaction between the spatial and temporal averaging can be easily explained using the concept of the spatio-temporal frequency response.
We can define a CNN filter array which is input/output equivalent to the resistive grid by setting the filter transfer function $g(s)$ equal to the impedance of the nodal capacitance, $1/sC$, and choosing the A and B templates to model the resistive connections at each node. Using this equivalent CNN filter array, the spatio-temporal frequency responses of the resistive grid and of the silicon retina follow directly. From (4), we obtain the spatio-temporal frequency response of the resistive grid. Since the output of the silicon retina is equal to the difference between the output of the photodetectors and the nodal voltages of the resistive grid, the spatio-temporal frequency response of the one-dimensional silicon retina is one minus that of the resistive grid.
The spatio-temporal frequency response of the silicon retina offers a rigorous explanation of the effects observed by Mahowald and Mead. For example, they have noted that the time response of the silicon retina varies with the space constant, $\alpha$, of the resistive grid. For low $\alpha$, a test flash of any limited size will produce a sustained output, but as $\alpha \to \infty$, a test flash will have no sustained output. The sustained output is essentially the steady-state response, i.e., the response at $\omega_t = 0$. Fig. 6(a), which plots the cross section of the spatio-temporal frequency response at $\omega_t = 0$ for varying values of $\alpha$, shows that the amplitude of the spatio-temporal transfer function decreases with increasing $\alpha$.
In addition, Mahowald and Mead have noted that for a fixed space constant, the peak value of the output for a small test flash is larger than that resulting from a larger test flash. Fig. 6(b) plots cross sections of the magnitude of the spatio-temporal frequency response at different spatial frequencies.
The magnitude of the spatio-temporal frequency response increases as the spatial frequency increases. Since a small test flash has more high spatial frequency content than a larger flash, the peak of the transient response of the smaller flash should be greater than the peak of the transient response of the larger flash. Interestingly, for higher temporal frequencies, the amplitude of the frequency response decreases slightly with increasing spatial frequency.

Fig. 6. (a) Cross sections of the amplitude of the spatio-temporal frequency response of the silicon retina at $\omega_t = 0$ for varying values of $\alpha$ show that the amplitude of the spatio-temporal frequency response decreases with increasing $\alpha$. This indicates that the sustained output to a test flash will be smaller, the larger the space constant of the resistive grid. (b) These plots of the cross section of the amplitude of the spatio-temporal frequency response of the silicon retina at varying values of $\omega_x$ for $\alpha = 1.5$ show that the amplitude increases with increasing spatial frequency. Since a small test flash has more high-frequency content than a larger flash, the peak output of the silicon retina is larger for the small flash than for the larger flash.

IV. DESIGN OF MOTION SENSITIVE FILTERS I
In this section and in Section VI, we discuss the design of a spatio-temporal filter which is tuned for objects moving with velocity +1 pixel/s in a one-dimensional image plane. The filter we obtain can be easily extended to two dimensions, as we will see in Section VII.
Consider an $n$-cell one-dimensional CNN with $g(s) = s^{-1}$. If this CNN is stable, then its frequency response is obtained by sampling the following function in $\omega_x$:
$$\tilde{h}(\omega_x, j\omega_t) = \frac{\sum_{l=-r}^{r} b(l)\, e^{j\omega_x l}}{j\omega_t - \sum_{l=-r}^{r} a(l)\, e^{j\omega_x l}}.$$
Separating the real and imaginary parts results in
$$\tilde{h}(\omega_x, j\omega_t) = \frac{b(0) + \sum_{l=1}^{r} [b(l) + b(-l)] \cos(l\omega_x) + j \sum_{l=1}^{r} [b(l) - b(-l)] \sin(l\omega_x)}{-a(0) - \sum_{l=1}^{r} [a(l) + a(-l)] \cos(l\omega_x) + j \left( \omega_t - \sum_{l=1}^{r} [a(l) - a(-l)] \sin(l\omega_x) \right)}.$$
Our design strategy will be to design the real and imaginary parts of the numerator and denominator of the transform separately. The real part of the numerator (denominator) is determined by the sums of pairs of coefficients located symmetrically with respect to the center element in the B (A) cloning template. The imaginary part of the numerator (denominator) is determined by the differences between those pairs. Since specifying the sum and the difference of two numbers uniquely determines the values of those numbers, this design strategy completely specifies the desired A and B cloning templates.
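The last step of this strategy, recovering the template from the designed sums and differences, can be sketched as follows. The function name and argument layout are hypothetical; the difference value $-1.90$ is the one quoted later in this section, while the center element and the sum are placeholder values chosen only for illustration.

```python
def template_from_sums_and_differences(center, sums, diffs):
    """Rebuild [c(-r), ..., c(0), ..., c(r)] from pairwise sums and differences.

    sums[i-1]  = c(i) + c(-i)   (sets the real part of the response)
    diffs[i-1] = c(i) - c(-i)   (sets the imaginary part of the response)
    """
    positive = [(s + d) / 2.0 for s, d in zip(sums, diffs)]   # c(1), ..., c(r)
    negative = [(s - d) / 2.0 for s, d in zip(sums, diffs)]   # c(-1), ..., c(-r)
    return negative[::-1] + [center] + positive

# r = 1 illustration: center and sum are hypothetical, the difference is -1.90.
a_template = template_from_sums_and_differences(center=-1.0,
                                                sums=[0.5],
                                                diffs=[-1.90])
```

Here `a_template` comes out as approximately `[1.2, -1.0, -0.7]`, i.e. $a(-1) = 1.2$, $a(0) = -1.0$, $a(1) = -0.7$, and the original sum and difference are recovered from $a(1) \pm a(-1)$.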
In order to determine the desired form of the real and imaginary parts of the spatio-temporal frequency response, we will consider only sine wave grating input. There are three main reasons for this. First, as a necessary condition for proper operation, a velocity sensitive filter should work properly on a subset of the possible inputs. Second, any more complicated input can be expressed as a weighted sum of sine wave gratings. Third, this technique is well suited for the design of CNN filter arrays since, as pointed out in Section III, a CNN filter array is essentially a set of decoupled filters, each operating on one spatial frequency component of the input.
The only interaction between the spatial and temporal frequencies in the frequency response occurs in the imaginary part of the denominator. Since the spatio-temporal spectrum of an image translating at velocity $v$ is concentrated along the line $\omega_t = -v\,\omega_x$, for each spatial frequency $\omega_x$, the spatio-temporal frequency response of a CNN filter array tuned to velocity $v$ should achieve its maximum with respect to $\omega_t$ at $\omega_t = -v\,\omega_x$. This ensures that, at least for sine wave gratings, the filter has its greatest response for stimuli translating at velocity $v$.
To satisfy this requirement, we choose the differences $a(i) - a(-i)$ such that
$$\sum_{i=1}^{r} [a(i) - a(-i)] \sin(i\omega_x) \approx -v\,\omega_x$$
for $\omega_x \in [-T, T)$. The region where we fit the two curves is limited to $\omega_x \in [-T, T)$ because of aliasing due to sampling the image in space. If $T = \pi$, this can be done exactly by setting the differences equal to the Fourier sine coefficients of the sawtooth waveform. However, since we are interested in keeping the connection radius small for ease of implementation, we are limited to only a few Fourier coefficients. Simply truncating the Fourier coefficients of the sawtooth waveform leads to an excessive amount of "ringing." Therefore, the Fourier coefficients must be appropriately windowed to reduce these effects. Assuming $T = 3$, the following coefficients were chosen in an attempt to maximize the spatial frequency range within which the two curves match, while minimizing the amount of ringing in that range:
$$a(1) - a(-1) = -1.90$$
See Fig. 7. To completely specify the A template coefficients, we must also choose their sums. However, considerations based upon the form of the spatio-temporal frequency response are not sufficient. As with any system which incorporates feedback, we must be concerned about stability. Therefore, we digress in the next section to discuss stability criteria for CNN filter arrays, before returning to the design of CNN motion sensitive filter arrays in Section VI.
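To make the truncation and windowing step concrete, the sketch below computes the Fourier sine coefficients of the sawtooth $-v\,\omega_x$ on $[-\pi, \pi)$, which are $2v(-1)^i/i$, and tapers them with a window. The window used here (Lanczos sigma factors) and the radius $r = 3$ are assumptions made purely for illustration; the paper does not state which window was used, so this is not expected to reproduce the quoted value of $-1.90$.

```python
import math

def sawtooth_differences(r, v=1.0):
    """First r Fourier sine coefficients of -v*omega on [-pi, pi)."""
    return [2.0 * v * (-1) ** i / i for i in range(1, r + 1)]

def lanczos_window(r):
    """Hypothetical taper (Lanczos sigma factors) used to reduce ringing."""
    return [math.sin(math.pi * i / (r + 1)) / (math.pi * i / (r + 1))
            for i in range(1, r + 1)]

def imaginary_part(diffs, omega_x):
    """sum_i [a(i) - a(-i)] * sin(i * omega_x) for the given differences."""
    return sum(d * math.sin((i + 1) * omega_x) for i, d in enumerate(diffs))

r = 3
raw = sawtooth_differences(r)
windowed = [d * w for d, w in zip(raw, lanczos_window(r))]

# Compare both approximations with the target -omega_x at one test frequency.
omega_test = 1.0
raw_value = imaginary_part(raw, omega_test)
windowed_value = imaginary_part(windowed, omega_test)
```

Plotting `imaginary_part` over $[-\pi, \pi)$ for both coefficient sets shows the trade-off described above: the truncated series oscillates around the target line, while the tapered series is smoother but departs from the line sooner near $\pm\pi$.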

V. STABILITY
In this section, we state stability criteria for CNN filter arrays. Due to the assumption of spatial invariance, these criteria reduce to simple graphical tests. In addition, these tests can be interpreted as checking the "poles" of the spatio-temporal transfer function, a concept familiar to engineers from SISO filter design. The results stated here apply only to linear space-invariant CNN filter arrays.
We define stability for CNN filter arrays based upon the stability of MIMO feedback systems [25].
Definition 2: A linear space-invariant CNN is exponentially stable if the associated MIMO feedback system is exponentially stable, i.e., 1) it is proper; 2) all of its poles have negative real parts.
Exponential stability implies bounded input-bounded output stability. Thus, if a CNN filter array is exponentially stable, then a bounded input will result in a bounded output. It should be clear that stability is independent of the form of the B template.
From Section III, the CNN filter array can be expressed as a set of decoupled single input-single output systems, each of which determines the evolution of one spatial frequency component of the output. The SISO system governing the evolution of the spatial frequency component at frequency $\vec{\omega}_x$ is characterized by the transfer function $\check{b}(\vec{\omega}_x)\, g(s) / (1 - \check{a}(\vec{\omega}_x)\, g(s))$.
Example 2: Consider a set of CNN filter arrays sharing the same A template and filter transfer function $g(s) = 1/s$. By the above proposition, to check the stability of such an array, we must verify that the roots of $s - \check{a}(\omega_x)$ have negative real parts for all $\omega_x \in [0, 2\pi)$. This is equivalent to checking that the real part of $\check{a}(\omega_x)$ is negative for all $\omega_x \in [0, 2\pi)$. Fig. 8 shows the locus of points $\check{a}(\omega_x)$ in the complex plane as $\omega_x$ ranges from 0 to $2\pi$. Since this locus is contained entirely within the left half plane, all CNN filter arrays in this set with size $n > 7$ are stable.
Checking the above condition is easy in the case of Example 2, where we need only verify that the real part of $\check{a}(\vec{\omega}_x)$ is less than zero for all $\vec{\omega}_x \in [0, 2\pi)^d$. However, in the case of higher order $g(s)$, direct verification of (5) requires finding the roots of an infinite number of polynomials. Fortunately, we can avoid this problem by using the following graphical test, based on the Nyquist criterion.
Example 3: In an actual implementation of the CNN filter arrays discussed in Example 2, the filters $g(s) = 1/s$ might be implemented as capacitors, the voltages across which would be the filter outputs. The input would be applied as a current injected into the capacitor. The feedback coefficients would be implemented as voltage controlled current sources. The model presented in Example 2 assumes that these transconductance amplifiers have infinite bandwidth, a condition unobtainable in practice. Therefore, we should be concerned about the effect of the limited bandwidth of the transconductance amplifiers on the stability of the array. A more realistic model of the transconductance amplifier is a single pole system with a cutoff frequency of $p$ radians per second, i.e., $a(\cdot)/((s/p) + 1)$. For the sake of simplicity, we have assumed that the cutoff frequency of all of the transconductance amplifiers is the same. Due to the linearity of the system, we can take this modification into account by modifying the filter $g(s)$ to be $g(s) = 1/(s((s/p) + 1))$. Using the above corollary, instead of checking the roots of $(s^2/p) + s - \check{a}(\omega_x)$ for all $\omega_x$, we need only plot the locus of $\check{a}(\omega_x)$ for $\omega_x \in [0, 2\pi)$ and check that the Nyquist plot of $1/g(s) = (s^2/p) + s$ does not intersect or encircle any of the points of this locus. See Fig. 9.
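For a finite array, the condition in Example 3 can also be checked by brute force, which is a useful cross-check on the graphical test. The sketch below is not the Nyquist test itself: it assumes a one-dimensional ring of `n` cells, samples the template response at $\omega_x = 2\pi m/n$, and solves $s^2/p + s - \check{a}(\omega_x) = 0$ directly for each sample. The template and the two cutoff frequencies are hypothetical values.

```python
import cmath
import math

def template_response(template, omega_x):
    """Spatial response sum_l a(l) * exp(j*omega_x*l) of a 1-D template."""
    r = len(template) // 2
    return sum(c * cmath.exp(1j * omega_x * (i - r))
               for i, c in enumerate(template))

def is_stable(template, p, n):
    """Check the poles of every spatial mode for g(s) = 1 / (s * (s/p + 1)).

    Each mode has poles at the roots of s**2 + p*s - p*a_check = 0.
    """
    for m in range(n):
        a_val = template_response(template, 2.0 * math.pi * m / n)
        disc = cmath.sqrt(p * p + 4.0 * p * a_val)
        roots = ((-p + disc) / 2.0, (-p - disc) / 2.0)
        if any(root.real >= 0.0 for root in roots):
            return False
    return True

# Hypothetical A template [a(-1), a(0), a(1)]; its response has negative real
# part everywhere, so the array is stable with ideal (infinite bandwidth) amplifiers.
example_template = [1.2, -1.0, -0.7]
stable_wide = is_stable(example_template, p=10.0, n=16)    # fast amplifiers
stable_narrow = is_stable(example_template, p=2.0, n=16)   # slow amplifiers
```

With this template the array remains stable for the larger cutoff frequency but loses stability for the smaller one, illustrating how limited amplifier bandwidth can destabilize an array that is stable in the ideal model.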

VI. DESIGN OF MOTION SENSITIVE FILTERS II
In this section, we complete the design of our one-dimensional motion sensitive filter array. We also present simulation results of the filter, demonstrating that it works as expected.
Recall from Section IV that our design strategy consisted of independently designing the real and imaginary parts of the denominator and numerator of the spatio-temporal frequency response. In that section, we discussed how the requirement of velocity selectivity constrained the choice of the imaginary part of the denominator of the spatio-temporal frequency response.
On the other hand, considerably more freedom exists in choosing the real part. Several different factors must be taken into account and weighed against each other. We will discuss four factors in the choice of the even part of the feedback coefficients: 1) stability; 2) velocity tuning; 3) localization in space/time; 4) robustness to parameter variation.
Fig. 10. The dotted line shows the ideal real part of the spatio-temporal frequency response specified by (7), which results in a filter with constant velocity bandwidth. However, several other considerations must be weighed against the desire for constant velocity bandwidth, resulting in the real part plotted by the solid line.
Of course, the most important of these considerations is stability. Since $g(s) = 1/s$, the results of Section V show that the stability of the CNN filter array depends upon the real part of $\tilde{a}(\omega_x)$ being less than zero. Therefore, the sums of the feedback coefficients must be chosen so that the resulting real part of the denominator of the spatio-temporal transfer function is negative.
Velocity tuning refers to how sharply tuned the motion sensitive filter is to a particular velocity. Consider a sine wave grating of spatial frequency $\Omega_x$. For a velocity filter tuned to velocities in the range $v - \Delta v$ to $v + \Delta v$, the passband of the $\Omega_x$ spatial frequency filter should run from $\omega_t = (v - \Delta v)\Omega_x$ to $\omega_t = (v + \Delta v)\Omega_x$. The lower the spatial frequency, the sharper the required temporal tuning must be for the same velocity tuning. Intuitively, the lines in the spatio-temporal frequency domain corresponding to the nonzero parts of the spectra of images undergoing uniform translation all pass through the origin. Therefore, for constant velocity bandwidth, the passband of a spatio-temporal filter must be narrower near the origin.
Define the bandwidth of each spatial frequency temporal filter to be the difference between the frequencies at which the magnitude of the temporal frequency response is 3 dB below the peak value. This occurs when the magnitude of the real part of the denominator equals the magnitude of the imaginary part. This implies that the bandwidth is equal to two times the magnitude of the real part. For a constant velocity bandwidth across all spatial frequencies of $2\Delta v$, we must set
$$\mathrm{Re}\,\tilde{a}(\omega_x) = -\Delta v\,|\omega_x| \qquad (7)$$
for all $\omega_x \in [-\pi, \pi)$. Recall that for stability reasons, the real part must be negative.
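The 3 dB argument can be written out as a short worked derivation. The sketch below assumes that, for $g(s) = 1/s$, each spatial frequency sees a first-order temporal filter with denominator $j\omega_t - \tilde{a}(\omega_x)$. The symbols $\sigma$, $\beta$ and $\tilde{b}$ are introduced here for illustration only.

```latex
% Assumed notation: \tilde{a}(\omega_x) = -\sigma + j\beta with \sigma > 0,
% and \tilde{b}(\omega_x) the frequency response of the feedforward template.
\[
  |H(\omega_x, \omega_t)|^2
    = \frac{|\tilde{b}(\omega_x)|^2}{\sigma^2 + (\omega_t - \beta)^2},
  \qquad
  |H|^2_{\max} = \frac{|\tilde{b}(\omega_x)|^2}{\sigma^2}
  \quad \text{at } \omega_t = \beta .
\]
% Half power (3 dB down) occurs where the imaginary part of the denominator
% has the same magnitude as its real part:
\[
  (\omega_t - \beta)^2 = \sigma^2
  \;\Longleftrightarrow\;
  \omega_t = \beta \pm \sigma ,
  \qquad
  \text{bandwidth} = 2\sigma .
\]
% Matching this to the passband width 2\,\Delta v\,|\omega_x| of a constant
% velocity bandwidth filter gives
\[
  2\sigma = 2\,\Delta v\,|\omega_x|
  \;\Longrightarrow\;
  \sigma = \Delta v\,|\omega_x| .
\]
```

Under these assumptions the required damping $\sigma$ shrinks linearly toward zero at low spatial frequencies, which is the source of the long transients discussed next.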
Velocity tuning must be weighed against localization of the filter output in time and space. For a given spatial frequency temporal filter, the sharper the temporal tuning, the longer the transient response of that filter. If we choose the real part according to (7), the temporal transients of the low spatial frequency filters will be very long. This has two important ramifications. First, any initial conditions in the array with low spatial frequency components will take a long time to die out. Second, the response of the filter will not be very well localized in time. In other words, the filter will not respond quickly to changes in the input velocity. It turns out that localization in time implies localization in space for more general inputs than sine wave gratings.
[Figure caption fragment: ... as a contour diagram. The filter with the other feedforward cloning template has approximately the same magnitude, but is shifted in phase by 90°.]
To avoid these problems, we make the magnitude of the real part very large for low spatial frequencies, while using (7) to choose the real part for high spatial frequencies. Not only does this ensure that the transients die away quickly, it also has the added benefit of decreasing the gain of the low spatial frequency filters, making the filters relatively insensitive to changes in the overall illumination level. How negative we can make the real part is limited by the last consideration: robustness in the presence of parameter variation. In our design procedure, we specify the sum and the difference of pairs of feedback coefficients, rather than the parameters themselves. If the sum of two parameters is very large, but the difference is small, a relatively small change in one of the parameters may result in a large change in the difference. A similar effect can occur if the difference is large, but the sum is small. If the magnitudes of the sum and the difference are approximately equal, this effect is minimized.
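The robustness argument can be illustrated with a small numerical sketch. Two coefficients are recovered from a specified sum and difference, one of them is perturbed by a fixed relative error, and the resulting relative change in the difference is reported. The function name and the numbers are hypothetical and chosen only to show the effect.

```python
def relative_change_in_difference(total, diff, rel_error=0.01):
    """Relative change in (a1 - a2) when a1 alone is off by `rel_error`.

    a1 and a2 are the two coefficients implied by their specified
    sum (`total`) and difference (`diff`).
    """
    a1 = 0.5 * (total + diff)
    a2 = 0.5 * (total - diff)
    perturbed_diff = a1 * (1.0 + rel_error) - a2
    return abs(perturbed_diff - diff) / abs(diff)

if __name__ == "__main__":
    # Large sum, small difference: a 1% error in a1 is strongly amplified.
    print(relative_change_in_difference(total=10.0, diff=0.1))
    # Sum and difference of equal magnitude: the error is not amplified.
    print(relative_change_in_difference(total=1.0, diff=1.0))
```

In the first case a 1% error in one coefficient changes the difference by roughly 50%, while in the balanced case the change stays at 1%. When the sum and difference are exactly equal, the second coefficient is zero, so it cannot contribute any error at all.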
Taking into account all of these factors and setting the desired velocity bandwidth to be 40%, we have designed the following cosine coefficients: This template is the same as that discussed in Examples 2 and 3. Based on our consideration of robustness in the presence of parameter variation, we chose the sum and difference of two of the coefficient pairs to be exactly equal. This results in two of the template elements being equal to zero, which is advantageous if one intends to implement these filters using capacitors and transconductance amplifiers, since the corresponding transconductance amplifiers can simply be omitted.
Finally, we must choose the B cloning template coefficients. We have two goals in choosing these coefficients. One is to increase the spatial frequency tuning. In particular, we wish to eliminate any dc response, as the output of the filter should be insensitive to changes in the overall illumination. We also wish to decrease the gain of the filters for very high spatial frequencies where the imaginary part departs from the line $-\omega_x$, as the filter is tuned to the incorrect velocity at these frequencies. The second goal is to create two filters that are 90° out of phase with each other, so that the outputs of the two filters can be squared and summed to obtain the motion energy. To create these two filters, we use the same A template and design two B templates with nearly the same magnitude spatio-temporal frequency response, but which are 90° out of phase. The resulting B templates are given by and their spatial frequency responses are plotted in Fig. 11. Fig. 12 displays the spatio-temporal frequency response of the final CNN motion sensitive filter array. As designed, the passband is oriented along the line $\omega_t = -\omega_x$, resulting in velocity sensitivity for patterns translating at 1 pixel/s. These same feedback and feedforward coefficients can be used in a filter tuned to a velocity $v$. We simply choose $g(s) = |v|/s$. For negative velocities, the order of the A template coefficients must also be reversed. This results in a filter tuned to the velocity $v$, with the same relative velocity bandwidth, 40%.
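The squaring and summing step can be sketched as follows. The example assumes an idealized quadrature pair whose steady-state outputs for a drifting grating are a cosine and a sine of the same amplitude. This is a model chosen for illustration, not the response of the templates designed above, and the function names are hypothetical.

```python
import numpy as np

def motion_energy(y_a, y_b):
    """Pointwise motion energy: sum of squared outputs of a quadrature filter pair."""
    y_a = np.asarray(y_a, dtype=float)
    y_b = np.asarray(y_b, dtype=float)
    return y_a ** 2 + y_b ** 2

def quadrature_responses(gain, phase):
    """Idealized outputs of two filters with equal gain and a 90 degree phase offset."""
    return gain * np.cos(phase), gain * np.sin(phase)

if __name__ == "__main__":
    phase = np.linspace(0.0, 4.0 * np.pi, 200)   # phase of the drifting grating
    y_a, y_b = quadrature_responses(0.7, phase)
    energy = motion_energy(y_a, y_b)
    print(energy.min(), energy.max())            # both close to gain**2
```

Each output oscillates with the phase of the stimulus, but the energy stays at the squared gain. This is why the two B templates are designed to have nearly the same magnitude response: any mismatch in gain leaves a residual ripple in the energy.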
The CNN filter array designed above was simulated using a fourth order Runge-Kutta algorithm modified from [26]. Fig. 13 shows the responses of the two filter arrays to inputs consisting of bars 10 pixels wide translating at different velocities. The filter arrays were 50 cells long. Fig. 14 shows the corresponding motion energies. As intended, the filters show velocity bandpass behavior with a peak response for stimuli moving at 1 pixel/s. Notice also that these filters are direction sensitive.
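A simulation of this kind can be sketched with a classical fourth order Runge-Kutta integrator applied to $dx/dt = a * x + b * u(t)$, the state equation implied by $g(s) = 1/s$. This is a minimal sketch, not the algorithm of [26]. The template values, the circular boundary condition, the step size and all function names are assumptions made for illustration.

```python
import numpy as np

def circ_conv(template, x):
    """(a * x)_i = sum_k a_k x_{i-k}, with wrap-around (circular) boundaries."""
    return sum(a_k * np.roll(x, k) for k, a_k in template.items())

def simulate(a_tmpl, b_tmpl, u_of_t, x0, t_end, h):
    """Integrate dx/dt = a * x + b * u(t) with classical fourth order Runge-Kutta."""
    def f(t, x):
        return circ_conv(a_tmpl, x) + circ_conv(b_tmpl, u_of_t(t))

    x = np.array(x0, dtype=float)
    for n in range(int(round(t_end / h))):
        t = n * h
        k1 = f(t, x)
        k2 = f(t + 0.5 * h, x + 0.5 * h * k1)
        k3 = f(t + 0.5 * h, x + 0.5 * h * k2)
        k4 = f(t + h, x + h * k3)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

def moving_bar(n_cells, width, velocity):
    """Return u(t): a unit-height bar `width` pixels wide moving at `velocity` pixels/s."""
    cells = np.arange(n_cells)
    def u(t):
        return (((cells - velocity * t) % n_cells) < width).astype(float)
    return u

# Hypothetical templates (assumptions, not the paper's coefficients).
A_TEMPLATE = {-1: -0.5, 0: -1.5, 1: 0.5}   # a~(w) = -1.5 - j sin(w)
B_TEMPLATE = {-1: -0.5, 1: 0.5}            # zero dc response

if __name__ == "__main__":
    n = 50
    for v in (-1.0, 1.0):
        x = simulate(A_TEMPLATE, B_TEMPLATE, moving_bar(n, 10, v),
                     np.zeros(n), t_end=20.0, h=0.01)
        print(f"v = {v:+.1f} pixel/s: peak |x| = {np.abs(x).max():.4f}")
```

No past input frames are stored: the integrator only needs the current state and the input sampled within the current step. A stiff or higher order $g(s)$ would call for a smaller step or a different integrator.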

VII. EXTENSION TO TWO DIMENSIONS
These filters can be used to analyze motion in two dimensions. Assume that we wish to detect motion at 1 pixel/s in the horizontal direction on the image plane. Other directions can be selected by rotation of the filter array. The corresponding filter array consists of rows of one-dimensional filters aligned in the horizontal direction. If these one-dimensional filter arrays are not interconnected, the two-dimensional CNN filter array is sensitive to any motion in the image plane which has a horizontal component of motion as long as the gradient of the image intensity has a component in the horizontal direction. Given information only within a local neighborhood, the only component of the velocity vector of the optical flow which can be recovered is the component perpendicular to an edge. This is the well known aperture problem [27]. We can construct filters tuned to motion perpendicular to local edges by adding feedback connections between cells which are vertically aligned. The resulting A template is The value of $g$ controls the width of the convolution kernel in the direction perpendicular to the tuned velocity. For higher values of $g$, the filter is more sharply tuned to the motion of edges that are perpendicular to the tuned direction. See Fig. 15. In an implementation of this array with capacitors and transconductance amplifiers, the feedback interconnections correspond to resistors with conductance $g$ linking neighboring capacitors in adjacent one-dimensional filter arrays.
Fig. 15. A contour plot of the cross-section of the spatio-temporal frequency response of the two-dimensional filter with feedback coefficients linking adjacent rows in the plane $\omega_t = -\omega_x$ shows that the filter is tuned to vertically oriented spatial edges.
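The effect of the resistive coupling can be sketched numerically. The construction below assumes that a resistor of conductance $g$ between vertically adjacent cells contributes $+g$ at the two vertical neighbors and $-2g$ at the center of a 3x3 feedback template. This form is inferred from the resistor description and is not necessarily the template given in the paper. The row coefficients and function names are hypothetical.

```python
import numpy as np

def template_2d(row_template, g):
    """3x3 feedback template: a 1-D row template plus resistive coupling between rows.

    Resistors of conductance g to the cells above and below inject
    g*(x_up - x) + g*(x_down - x), i.e. +g on each vertical neighbor
    and -2g on the center element.
    """
    a_left, a_center, a_right = row_template
    return np.array([[0.0,    g,                  0.0],
                     [a_left, a_center - 2.0 * g, a_right],
                     [0.0,    g,                  0.0]])

def freq_response_2d(tmpl, wx, wy):
    """a~(wx, wy) = sum_{m,n} a[m, n] exp(-j (wx n + wy m)), offsets in {-1, 0, 1}."""
    total = 0.0 + 0.0j
    for mi, m in enumerate((-1, 0, 1)):        # vertical offset
        for ni, n in enumerate((-1, 0, 1)):    # horizontal offset
            total += tmpl[mi, ni] * np.exp(-1j * (wx * n + wy * m))
    return total

if __name__ == "__main__":
    row = (-0.5, -1.5, 0.5)                    # hypothetical (a_-1, a_0, a_1)
    for g in (0.0, 0.5, 2.0):
        t2d = template_2d(row, g)
        re_parts = [freq_response_2d(t2d, 0.3, wy).real for wy in (0.0, np.pi / 2, np.pi)]
        print(f"g = {g}: Re a~ at wy = 0, pi/2, pi ->", np.round(re_parts, 3))
```

Under this assumption the coupling adds $-2g(1 - \cos\omega_y)$ to the one-dimensional response. The response at $\omega_y = 0$ (vertically oriented edges) is unchanged, while components with vertical variation are damped more strongly as $g$ grows, which matches the sharper tuning described above.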

VIII. SUMMARY AND CONCLUSION
Based upon the CNN paradigm, we have introduced a new architecture for spatio-temporal image filtering called the CNN filter array. Because they are linear, these filter arrays have well-defined spatio-temporal transfer functions and frequency responses that characterize the mapping from their input to their output. Simple graphical Nyquist-type stability criteria exist for these arrays, based upon the notion of determining the locations of the poles of the spatio-temporal transfer function. Using the fairly general results here, we have demonstrated the systematic design of CNN filter arrays for motion sensitive filtering. This approach has significant advantages over previous approaches to motion sensitive filtering, such as the Gabor filter, as well as other CNN approaches to image motion analysis.