The neural processing of 3‐D visual information: evidence from eye movements

Primates have several reflexes that generate eye movements to compensate for bodily movements that would otherwise disturb their gaze and undermine their ability to process visual information. Two vestibulo‐ocular reflexes compensate selectively for rotational and translational disturbances of the head, and each has visual backups that operate as negative feedback tracking mechanisms to deal with any residual disturbances of gaze. Of particular interest here are three recently discovered visual tracking mechanisms that specifically address translational disturbances and operate in machine‐like fashion with ultra‐short latencies (< 60 ms in monkeys, < 85 ms in humans). These visual reflexes deal with motions in all three dimensions and operate as automatic servos, using preattentive parallel processing to provide signals that initiate eye movements before the observer is even aware that there has been a disturbance. This processing is accomplished by visual filters each tuned to a different feature of the binocular images located in the immediate vicinity of the plane of fixation. Two of the reflexes use binocular stereo cues and the third is tuned to particular patterns of optic flow associated with the observer's forward motion. Some stereoanomalous subjects show tracking deficits that can be attributed to a lack of just one subtype of cortical cell encoding motion in one particular direction in a narrow depth plane centred on fixation. Despite their rapid, reflex nature, all three mechanisms rely on cortical processing and evidence from monkeys supports the hypothesis that all are mediated by the medial superior temporal (MST) area of cortex. Remarkably, MST seems to represent the first stage in cortical motion processing at which the visual error signals driving each of the three reflexes are fully elaborated at the level of individual cells.


Introduction
New ideas about the way that the brain processes visual information have often resulted from introspection sparked by our vivid perceptions of the visual world. However, perception is only one outcome of visual processing. Another, which this paper concentrates on, is visually guided behaviour. Vision is arguably our premier navigational aid, allowing us to map out and actively explore our surroundings. However, we view the world from a constantly shifting platform and some visual mechanisms function optimally only if the images on the retina are reasonably steady. As we go about our everyday activities, visual and vestibular mechanisms help to stabilize our gaze on particular objects of interest by generating eye movements to offset our head movements. The traditional approach emphasized mechanisms that deal with rotational disturbances of the observer and only recently have translational disturbances been considered. These new stimuli have uncovered a number of new visual and vestibular reflexes with ultra-short latencies and the general picture that has emerged is of two vestibulo-ocular reflexes, the RVOR and TVOR, that compensate selectively for rotational and translational [...] of the visual disturbances created by the observer's own motion and then the vestibular mechanisms that also operate: the fact is that the visual and vestibular mechanisms have evolved in parallel and operate in close synergy so one must always have both in mind whenever functional considerations arise.
FIG. 1. [...] Here, the observer rotates about this vantage point and the pattern of flow resembles the lines of latitude on a globe. In reality things are never as simple as this, voluntary head turns occurring about an axis some distance behind the eyes so that the latter always undergo some slight translation. Such second-order effects are ignored here (from Miles et al., 1991). (B) A cartoon showing the observer's limited field of view and the kind of motion experienced during rotation about a vertical axis as the observer looks straight out to the side. The speed of optic flow is greatest at the centre ('equator') and decrements as the cosine of the angle of latitude. However, both the pattern and the speed of the optic flow at all points are determined entirely by the observer's motion: the 3-D structure of the scene is irrelevant (from Miles, 1997).

Two kinds of optic flow
The optic flow associated with rotational and translational disturbances of the observer has distinctly different patterns. A passive observer who undergoes rotation experiences en masse motion of his/her entire visual world, the direction and the speed of the optic flow at all points being dictated solely by the observer's motion. The overall pattern of optic flow resembles the lines of latitude on a globe (see Fig. 1A) but, of course, the observer's restricted field of view means that only a portion will be visible at any given time (e.g. Fig. 1B). In principle, appropriate compensatory eye movements could completely offset the visual effects due to rotational disturbances so that the entire scene would be stabilized on the retina. (I am ignoring the second-order translational effects due to the eccentricity of the eyes with respect to the usual axis of head rotation. These are of consequence only for very close viewing.) If compensation is less than adequate, as is often the case, the speed of flow is reduced and the overall pattern of flow is largely preserved (provided the compensatory eye movements are in the correct direction).
If the passive observer undergoes pure translation, the optic flow consists of streams of images emerging from a focus of expansion straight ahead and disappearing into a focus of contraction behind, the overall pattern resembling the lines of longitude on a globe: see Figure 2(A). As with rotational disturbances, the direction of flow at any given point depends solely on the motion of the observer but, in contrast, the speed of the flow at any given point also depends on the viewing distance at that location: nearby objects move across the field of view much more rapidly than more distant ones (motion parallax; Gibson, 1950, 1966). Again, given the observer's restricted field of view, the pattern of motion actually experienced depends very much on where the observer chooses to look. If the observer looks straight ahead, as when driving a car, for example, he/she sees an expanding world (as in the cartoon in Fig. 2B), whereas off to one side, as when looking out from a fast moving train, the sensation is of the visual world pivoting around the far distance (as in the cartoon in Fig. 2C).
European Journal of Neuroscience, 10, 811-822
Appropriate compensatory eye movements can effectively eliminate the visual consequences of head rotations, but this is not the case with translations if the scene has 3-D structure because of the dependence on viewing distance. During translation, eye movements can stabilize only the images in one particular depth plane and we shall see that the problem confronting the system here is how to make that 'plane of stabilization' coincide with the plane of fixation. In the case of the observer looking out from the train and making no attempt to compensate for the motion (Fig. 2C), only the images of the most distant mountains are stable. If the observer transfers fixation to the tree in the middleground, then it is reasonable to assume that priority should now go to stabilizing the image of the tree, which requires that the observer now compensate for the motion of the train. If the observer succeeds in this then his/her visual world will now pivot about the tree (as in the cartoon in Fig. 2D). The optic flow here is a combination of translational motion (due to the motion of the train in our example) and rotational motion (due to the subject's compensatory eye movements). Of course, many other combinations of translational and rotational disturbances are possible but the situations used in the laboratory have, of necessity, been somewhat limited to date. Even so, it is already apparent that there are ultra-rapid visual compensatory mechanisms that are much more resourceful, in terms of the patterns of optic flow that they can process, than previously suspected.
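The contrast between the two kinds of optic flow can be made concrete with a minimal numerical sketch. It assumes an idealized observer whose eye sits at the centre of rotation and, for translation, objects lying directly abeam of the direction of heading; the function names and the numerical values used in the test cases are hypothetical and are not taken from any of the studies cited.

```python
import math


def rotational_flow_speed(omega_deg_s, latitude_deg):
    """Flow speed (deg/s) during pure rotation at omega_deg_s.

    Falls off as the cosine of the angle of latitude and does not
    depend on viewing distance.
    """
    return omega_deg_s * math.cos(math.radians(latitude_deg))


def translational_flow_speed(v_m_s, distance_m):
    """Flow speed (deg/s) of an object directly abeam at distance_m
    while the observer translates at v_m_s and the eyes are still.

    Inversely proportional to viewing distance (motion parallax).
    """
    return math.degrees(v_m_s / distance_m)


def retinal_slip(v_m_s, distance_m, fixation_distance_m):
    """Residual image speed (deg/s) of an object at distance_m when the
    eye perfectly tracks an object at fixation_distance_m.

    Zero in the plane of fixation, positive for nearer objects and
    negative (reversed apparent motion) for more distant ones.
    """
    eye_speed = translational_flow_speed(v_m_s, fixation_distance_m)
    return translational_flow_speed(v_m_s, distance_m) - eye_speed
```

With a hypothetical train speed of 30 m/s and fixation on a tree at 50 m, `retinal_slip(30, 50, 50)` is zero, a fence post at 10 m gives a positive value and mountains at 5000 m give a negative one, which is the pivoting of the scene about the tree described for Figure 2(D).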

Two vestibular mechanisms
The vestibular system senses motions of the head through two kinds of end organ that are embedded in the base of the skull, the semicircular canals and the otoliths, which are selectively sensitive to angular and linear accelerations, respectively (Goldberg & Fernandez, 1975). These two kinds of end organs support two vestibulo-ocular reflexes: the canals provide the information to compensate for rotations (RVOR) and the otoliths provide the information to compensate for translations (TVOR). In the case of the RVOR, ignoring the eccentricity of the eyes with respect to the axis of head rotation, perfect compensation would require simply that the output (eye rotation) match the input (head rotation), in which case the gain (given by the ratio, output/input) would be unity. However, for the TVOR to be optimally effective, its gain should accord with the proximity of the object of interest, nearby objects necessitating much greater compensatory eye movements than distant ones in order for their retinal images to be stabilized during translation. In fact, to stabilize an image off to one side (as in Fig. 2C,D), the gain of the TVOR should be inversely proportional to the viewing distance, and this has been shown to be the case for both monkeys (Paige, 1989; Schwarz et al., 1989; Bush & Miles, 1996) and humans (Gianna et al., 1997). Moreover, the compensatory eye movements generated by the TVOR depend on the direction of gaze with respect to the direction of heading, consistent with the idea that the system attempts to stabilize the (foveal?) images in the plane of fixation (Paige & Tomko, 1991a,b). Thus, when gaze is in the direction of heading, so that the object of interest is directly ahead and getting closer, the TVOR converges the two eyes to keep both foveas aligned on the object. Of course, if gaze is eccentric with respect to the direction of heading during the forward motion then the responses operate to increase the eccentricity of gaze exactly in accordance with the local pattern of optic flow: if the observer's gaze is directed downwards during the forward motion then his/her compensatory eye movements have a downward component, while if gaze is directed to the right of the direction of heading then the compensatory eye movements have a rightward component, and so forth. Thus, the oculomotor consequences of vestibular stimulation are here contingent upon the gaze position.
FIG. 2. [...] Miles et al., 1991). (B) A cartoon showing the centrifugal pattern of optic flow experienced by the observer who looks in the direction of heading, marked by the black dot at the foot of the mountain (from Busettini et al., 1997). (C) The optic flow experienced by the moving observer who looks off to the right but makes no compensatory eye movements so that the visual scene appears to pivot about the distant mountains (effective infinity). The speed of image motion is inversely proportional to the viewing distance. (D) Again, the observer looks off to one side but here attempts to stabilize the retinal image of a particular object in the middle ground (tree), necessitating that he/she track to compensate for his/her own motion, thereby reversing the apparent motion of the more distant objects and creating a swirling pattern of optic flow. The scene now appears to pivot about the tree. (C, D after Miles et al., 1992b.)
Neither of the vestibular reflexes is perfect, hence motion of the observer must often be associated with some residual retinal image motion which is dealt with by the visual stabilization mechanisms. The vestibular system's decomposition of head movements into rotational and translational components results directly from the physical properties of the end organs in the labyrinth. However, there is no such decomposition of the optic flow by the visual end organ: the retina sees all visual disturbances and if any decomposition is to be done it must be by signal processing in the CNS. The traditional approach to visual stabilization of the eyes ignored translational problems completely and placed the observer inside a rotating drum to simulate the visual events associated with a failure of the RVOR during head turns. This elicits a pattern of tracking eye movements, often termed optokinetic nystagmus, that has two distinct components: an early component (OKNe) with brisk dynamics and a delayed component (OKNd) with sluggish dynamics (Cohen et al., 1977). Recent studies of OKNe have often employed large moving patterns backprojected on to a translucent tangent screen facing the observer (because it offers much better control of the stimulus parameters) and the responses evoked in this situation have been termed 'ocular following'. My colleagues and I have suggested that the visual system actually does attempt to separate out the rotational and translational components of optic flow, and that OKNd and ocular following/OKNe are manifestations of this, normally operating as visual backups to the RVOR and the TVOR, respectively (Schwarz et al., 1989; Busettini et al., 1991; Miles et al., 1991, 1992a, 1992b; Miles & Busettini, 1992; Miles, 1993, 1995, 1997).
FIG. 3. [...] The open-loop RVOR and the closed-loop OKNd generate eye movements, $\dot{E}_R$, that compensate for rotational disturbances of the head, $\dot{H}_R$. These reflexes share (a) a velocity storage element, which is responsible for the slow build-up in OKN and the gradual decay in RVOR with sustained rotational stimuli, and (b) a variable gain element, $G$, which mediates long-term regulation of RVOR gain. SCC, semicircular canals. The element, $f(s)$, indicates that the visual input is sensitive to low slip speeds only (from Miles et al., 1992a). (B) The open-loop TVOR and the closed-loop OKNe generate eye movements that compensate for translational disturbances of the head, $\dot{H}_T$, which affect gaze in inverse proportion to the viewing distance, $d$. These reflexes share (a) a variable gain element, $k_1/d$, which gives them their dependence on proximity, and (b) a fixed gain element, $k_2$, which generates a small response irrespective of proximity. OTO, otolith organs (from Schwarz et al., 1989). Dashed lines represent physical links: $\dot{H}_T$, head velocity in linear coordinates; $\dot{H}_R$, $\dot{E}_R$, $\dot{G}_R$ and $\dot{W}_R$, velocity of head, eyes (in head), gaze and visual surroundings, respectively, in angular coordinates.
Initial support for this idea rested largely on two observations: firstly, changes in the gain of the RVOR, which can be induced by exposure to magnifying or minifying spectacles (Miles & Fuller, 1974; Miles & Eighmy, 1980), were associated with proportional changes in the gain of OKNd but not of OKNe (Lisberger et al., 1981). Secondly, changes in the gain of the TVOR, which can be induced by simply changing the viewing distance, were associated with proportional changes in the gain of ocular following (Schwarz et al., 1989; Busettini et al., 1991). (For technical reasons, the effect of viewing distance on OKNd has yet to be examined.)
Such changes in the gains of the visually driven responses were attributed to changes in central pathways that are shared with the vestibular reflexes, presumably reflecting functional synergies between the RVOR and OKNd on the one hand, and the TVOR and OKNe on the other. The block diagrams in Figure 3 illustrate the two hypothesized visuo-vestibular mechanisms dealing independently with rotational and translational disturbances.
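The proximity dependence attributed to the translational mechanism can be illustrated with a short sketch. It assumes, purely for illustration, that the variable gain element (k1/d) and the fixed gain element (k2) act in parallel and sum; this wiring, the function names and the default values of `k1` and `k2` are hypothetical, and the sketch is not a reconstruction of the block diagram itself.

```python
def ideal_translational_gain(distance_m):
    """Gain (rad of eye rotation per metre of head translation) that
    would fully stabilize an object abeam at distance_m: 1/d.
    """
    return 1.0 / distance_m


def modelled_translational_gain(distance_m, k1=0.9, k2=0.05):
    """Hypothetical gain with a proximity-dependent term (k1/d) and a
    small fixed term (k2), assumed here to sum.
    """
    return k1 / distance_m + k2


def compensatory_eye_velocity(head_velocity_m_s, distance_m, k1=0.9, k2=0.05):
    """Eye velocity (rad/s) produced for a given head translation."""
    gain = modelled_translational_gain(distance_m, k1, k2)
    return gain * head_velocity_m_s
```

Under these assumptions the response grows steeply as the viewing distance shrinks, and at very large distances it settles on the small residual value set by `k2`.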

...but multiple visual decoding mechanisms
The new idea here, that visual mechanisms have evolved to deal with translational as well as rotational disturbances of the observer, led to the introduction of novel visual challenges that more nearly resemble those accompanying translation of the observer in everyday life. This has revealed that, indeed, ocular following/OKNe does have the ability to process patterns of optic flow that are peculiar to translation. However, it is now apparent that ocular following is only one of a family of visual stabilization mechanisms (to date, a total of three have been uncovered) that address different aspects of the various problems posed by translational disturbances. All of these mechanisms operate with ultra-short latencies (< 60 ms in monkeys, < 85 ms in humans) and machine-like consistency, and I shall present evidence in support of the hypothesis that all are mediated by cortical area MST and function independently of perception. These three visual stabilization mechanisms, which will be the focus of the remainder of this review, provide new insights into the low-level cortical processing of visual signals that encode various aspects of the 3-D structure of the world.

Ocular following
Recent experiments indicate that ocular following has special built-in features for dealing with the visual problems posed when the moving observer looks off to one side, as in C and D of Figure 2. The visual task confronting the visual stabilization mechanisms here is to single out the motion of particular elements in the scene, such as the mountain in Figure 2(C) and the tree in Figure 2(D), and ignore all of the competing motion elsewhere. One way to achieve this would be to use attentional focusing mechanisms to spotlight the target of interest. Such mechanisms exist and are used by the so-called pursuit system but have the limitation that they require high-level executive decisions to select the image to be tracked and this of necessity is very time consuming (Keller & Khan, 1986; Kimmig et al., 1992). The ocular following system solves this problem more expeditiously using low-level stereomechanisms that perform rapid parallel processing of binocular images, effectively sorting them on the basis of the depth plane that they occupy. This stereo algorithm, which utilizes the fact that we have two eyes with slightly differing viewpoints, is illustrated in Figure 4, which is a 'binocular' version of the cartoons in Figure 2(C,D). The object on which the two eyes are aligned (the mountain in Fig. 4A or the tree in Fig. 4B) resides in the plane of fixation and is imaged at corresponding positions on the two retinas; the object is therefore perceived as a single, fused image. In contrast, objects that are nearer or farther than the plane of fixation have images that occupy non-corresponding positions on the two retinas (they are said to have 'binocular disparity') and are seen as double (the tree in Fig. 4A and the mountain in Fig. 4B). Clearly, a highly reliable algorithm for stabilizing gaze on objects of particular interest would be to track only those objects whose images occupy corresponding positions on the two retinas: objects in the plane of fixation.
Early support for this idea was the finding that optokinetic responses are best for images with zero binocular disparity (Howard & Gonzalez, 1987; Howard & Simpson, 1989), but high-level processing, perhaps involving selective attention, may have contributed to these studies, which examined the closed-loop, steady-state responses. However, recent experiments indicate that the very earliest ocular following responses, which are generated before there has been time for such processing to influence eye movements, show a similar preference for binocular images that lack disparity (Busettini et al., 1996a). Figure 5(A) shows this effect of disparity on the ocular following responses of a monkey to sudden movements of large-field images presented on a tangent screen facing the animal. A dichoptic viewing arrangement was used to allow the (identical) images seen by the two eyes to be positioned and moved independently. The very earliest responses have the usual ultra-short latency (about 55 ms) and are clearly at their most vigorous when the binocular images are in register on the screen, which is the plane of fixation (trace labelled '0' in Fig. 5A). Responses decrement progressively as the images are presented with more and more disparity, which in effect positions them farther and farther from the plane of fixation. The disparity tuning curve for these data, based on measures of the very earliest responses, is plotted in Figure 5(B) and has a bell-shaped profile centred on zero disparity. Human ocular following shows almost identical dependence on disparity.
Low-level, parallel processing? The above discussion indicates that the ocular following system helps to stabilize gaze on objects of interest not by selecting a particular one but by stabilizing the image of any object that happens to lie close to the plane of fixation, an implicit assumption therefore being that this plane contains the objects likely to be of most interest. Note that the time-consuming process of selecting the object of interest therefore rests with the oculomotor subsystems that bring images into the plane of fixation, that is, the saccadic system working in concert with the vergence system. These latter systems redirect gaze to objects using higher-level criteria whereas ocular following relies on low-level rapid parallel filters. Thus, the general concept is of low-level reflex systems stabilizing whatever images the high-level systems happen to bring into the plane of fixation.
FIG. 4. [...] Figure 2(C), except that, with binocular viewing, the mountain in the plane of fixation is seen as single and the nearer tree is seen double (disparate). A plan view of the observer and the two objects is shown to the right. (B) As in Figure 2(D), except that, with binocular viewing, the tree is placed in the plane of fixation and so is seen as single whereas the distant mountain is now seen as double (disparate). Again, the plan view is shown to the right. Note that the dimensions of the eyes and their separations have been exaggerated to illustrate the disparity more clearly. In fact, disparity is much more evident with near viewing, which is also associated with the most vigorous optic flow and requires the most vigorous tracking from the observer to compensate. All of the laboratory experiments used near viewing (from Busettini et al., 1996a).
FIG. 5. Dependence of ocular following on the horizontal disparity of the moving images. (A) Mean version velocity responses of a monkey in response to downward motion (40°/s in all cases) when the images seen by the two eyes had crossed disparities whose magnitude (in degrees) is indicated by the numbers at the ends of the traces. Note that version is the average velocity of both eyes. (B) Disparity tuning curve for the ocular following responses of the same monkey. Measures based on the change in version over the time period 60-77 ms after the onset of the stimulus ramp, which was always 40°/s (includes the data in A). Monoc, mean response to same ramps with monocular viewing. Error bars, ± 1 SD. (C) Disparity tuning curve for a monkey whose ocular following showed a stereoanomaly for one direction of motion (rightward). The data shown are for the responses to rightward motion (80°/s in all cases). For leftward, upward, and downward motion, the curves were like that in (B) (from Busettini et al., 1996a).
In line with this idea is reliance on early visual processing, and ocular following responses have properties generally attributed to low-level motion detectors (Borst & Egelhaaf, 1993). For example, when sinewave grating patterns are used, provided the patterns are within the spatial frequency bandwidth of the system (< 0.5 cycles/deg), the latency is solely a function of contrast and temporal frequency. These and other data led to the development of a model consisting of a drive mechanism that integrates the motion errors over time and a separate trigger mechanism that responds solely to changes in luminance and acts as a gate with the power of veto over the output of the drive mechanism. The trigger mechanism has a high threshold and functions to improve the signal-to-noise ratio (without impeding the integration of the motion error signals) so that the system is less likely to chase spurious internal noise, a potential problem with an automatic servo mechanism with such a short latency. There is also a built-in safety mechanism that prevents the system from tracking the visual disturbances created by the subject's own saccadic eye movements as they sweep the image of the world across the retina (Miles & Kawano, 1987). This mechanism senses the rapid motion in the peripheral visual field and transiently suppresses any ocular following. In fact, rapid motion in the peripheral field of one eye can prevent the tracking of visual motion presented simultaneously to the central visual field of the other eye (interocular transfer), indicating that this saccadic suppression must take place within the CNS, beyond the point at which visual inputs from the two eyes converge.
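The division of labour between the drive and trigger mechanisms described above can be caricatured in a few lines. This is a toy sketch only: the discrete-time integration, the latching behaviour of the gate once it has opened, the threshold value and all names are assumptions made for illustration, and none of them is taken from the published model.

```python
def gated_drive_output(motion_error, luminance, dt=0.001, trigger_threshold=0.5):
    """Toy drive-plus-trigger scheme.

    motion_error: sequence of motion error samples fed to the drive.
    luminance:    sequence of luminance samples fed to the trigger.
    Returns the gated output sample by sample.
    """
    drive = 0.0            # running integral of the motion error
    gate_open = False      # trigger state; assumed to latch once opened
    previous_luminance = luminance[0]
    output = []
    for error, lum in zip(motion_error, luminance):
        # The drive integrates regardless of the state of the gate.
        drive += error * dt
        # The trigger responds only to changes in luminance.
        if abs(lum - previous_luminance) > trigger_threshold:
            gate_open = True
        previous_luminance = lum
        # A closed gate vetoes the output of the drive.
        output.append(drive if gate_open else 0.0)
    return output
```

In this sketch a motion error that arrives without any luminance change produces no output at all, whereas once a sufficiently large luminance change has opened the gate the output reflects everything the drive has integrated up to that point, because integration was never interrupted.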
The ultra-short latencies mean that ocular following gets under way before the subject is even aware of the stimulus that drives it. Such rapid operation is presumably one reason why ocular following is subject to detailed long-term adaptive gain control, which helps to ensure that these ultra-rapid responses are appropriately calibrated in terms of both amplitude and direction. All of these properties are characteristic of a mechanism that operates as an automatic reflex, independently of perception.

Neural mediation
It has been known for some time that there are neurons in visual cortex as early as V1 that are selectively sensitive to images moving in a particular depth plane, their activation requiring the images to have a specific direction of motion and binocular disparity: for review see Bishop & Pettigrew (1986) and Poggio (1995). Some of these neurons respond to motion within only a narrow range of depths that can lie exactly in ('tuned zero' cells), or close to ('tuned near', 'tuned far' cells), the plane of fixation, whereas others respond to motion over a wider range of depths either inside ('near' cells), or beyond ('far' cells), the plane of fixation. Clearly, the 'tuned zero' cells would seem to be good candidates for mediating ocular following because they are selectively sensitive to images moving in the plane of fixation. However, the 'tuned zero' cells in the literature all have tuning curves with half-widths much less than a degree whereas ocular following has a half-width of a degree or two. It could be that there are 'tuned zero' cells with much broader tuning curves that have yet to be recorded (all recordings to date have been limited to parafoveal regions) but I think it also likely that other types of 'tuned' cells make a contribution. The lack of response to large disparities is also interesting because monocular stimuli were effective in generating ocular following (see the horizontal line labelled 'monoc' in Fig. 5B). This raises the question of how the responses to binocular stimuli with large disparities come to be weaker than those to monocular stimuli. One possibility might be that there is active suppression from the 'near' and 'far' cells, which are the only disparity selective neurons that have been described to date that respond to larger disparities.
[Unfortunately, none of the 'near' and 'far' neurons in the literature have been examined with disparities as large as those under consideration here. Also, with a few notable exceptions, such as Poggio et al. (1988), Roy et al. (1992) and Cumming & Parker (1997), stimuli were small bars or spots rather than large textured patterns.] Two subjects showed extremely interesting stereoanomalies. These subjects had normal disparity tuning curves for three of the four cardinal directions of motion but, for the fourth direction, their curves exhibited a pronounced dip centred on zero disparity: see Figure 5(C). This extraordinarily specific stereoanomaly is exactly the sort of deficit that one would expect if the subject lacked only 'tuned zero' cells with a preference for motion in one particular direction. Such a seemingly cell-specific anomaly tempts one to think in terms of a naturally occurring gene knockout. Regardless of the aetiology of the deficit, its specificity lends strong support to the idea that these ocular following responses are mediated by neurons that signal motions in particular directions and depth planes.

Although the stereoanomalies point to dependence on low-level disparity mechanisms, perhaps as early as striate cortex, there is strong evidence that ocular following derives at least some of its input from much later stages in the dorsal stream of cortex (Ungerleider & Mishkin, 1982) where motion is processed: chemical lesions in MST result in impairments of even the earliest components, and single unit recordings in this region indicate the presence of many directionally selective neurons that discharge in close relation to the large-field, high-speed motion stimuli that are optimal for eliciting ocular following (Kawano et al., 1994). Also, many of the neurons discharge early enough to have a causative role. There are data indicating selectivity for binocular disparity as well as motion in neurons of MT (Maunsell & Van Essen, 1983b) and MST (Roy & Wurtz, 1990; Roy et al., 1992) but stimuli optimal for ocular following were not tried in these studies. The suggestion has been made that MT, which provides a major part of the visual input to MST, receives its earliest visual inputs directly from subcortical areas rather than through striate cortex (Beckers & Zeki, 1995; ffytche et al., 1995). However, the significance of this direct subcortical input has been disputed (Barton & Sharpe, 1997).
It has been known since the classic study of Hubel & Wiesel (1965) that early strabismus results in a loss of binocularity in striate cortex neurons, and other manifestations include naso-temporal asymmetries in monocular OKN and smooth pursuit eye movements: for recent review, see Schor (1993) and Norcia (1996). It would be interesting to know if early disruptions of binocular vision affect initial ocular following.

Radial flow vergence
The visual challenge considered in the previous section on ocular following was that confronting the moving observer who looks off to one side. I now consider the gaze stability problems of the moving observer who looks in the direction of heading and so experiences the radial pattern of optic flow featured in Figure 2(A,B). What is required of the oculomotor system here? Insofar as the radial pattern of flow is associated with a change in viewing distance, the observer must converge his/her eyes if the object of interest in the scene ahead is to stay imaged on both foveas. Of course, the amount of convergence required to maintain binocular alignment is inversely related to the viewing distance, hence the greatest challenge comes with near viewing. Recent experiments on humans have indicated that radial optic flow elicits vergence eye movements at latencies that are closely comparable with the ultra-short values mentioned above for human ocular following (~80 ms). Centrifugal flow, which signals a forward approach and hence a decrease in the viewing distance, resulted in increased convergence, and centripetal flow, which signals the converse, resulted in decreased convergence. (A sample vergence response profile elicited by centrifugal flow can be seen in Fig. 6.) The clear suggestion here is that the brain is able to sense the radial pattern of flow and to infer from this that there has been a change in viewing distance. However, a characteristic of the ocular responses to these radial flow patterns is that each eye always moves in the direction of the net motion vector in the nasal hemifield, and this allows an alternative and less interesting explanation for the responses: the vergence might result from monocular tracking, in which each eye tracks only the motion that it sees and with a preference for motion in the nasal hemifields.
For example, with centrifugal flow the net motion vector in the nasal hemifields is towards the nose and each eye moves in that direction, hence the increased convergence. That this was not the explanation was apparent from the observation that binocular vergence responses persisted, albeit weaker, when various parts of the radial flow patterns seen by the two eyes were masked off, including the whole of one eye (see traces labelled 'monoc full field' in Fig. 6), or both nasal hemifields ('binoc hemifields' in Fig. 6), or one whole eye plus the remaining nasal hemifield so that the only parts of the patterns now visible were those seen by the right temporal hemifield ('monoc hemifield' in Fig. 6). Note that in the last two cases each eye actually moves in the opposite direction to any net motion vector that it sees. Thus, it was concluded that the vergence responses result from a true parsing of the radial pattern of flow.
European Journal of Neuroscience, 10, 811-822
FIG. 6. The initial vergence eye movements elicited by radial optic flow: effect of masking off various parts of the binocular visual field (human subject). The inset cartoons indicate the extent of the masks: no mask ('binoc full field'), left eye masked ('monoc full field'), both nasal hemifields masked ('binoc hemifields'), and all but one temporal hemifield masked ('monoc hemifield'). In addition to the vergence velocity profiles (the difference in the velocity of the two eyes), the velocity profiles for each of the two eyes and the version velocity (the average velocity of the two eyes) are also shown. Images were random dot patterns back-projected onto a large tangent screen facing the subject. Stimuli were looming steps simulating a sudden 4% reduction in viewing distance, which was achieved by switching between two projected images, the switch occurring at time zero. Calibration bar, 2°/s (from Busettini et al., 1997).
These data also imply something about the neural decoding of planar optic flow (such as that in Fig. 1B and, to a lesser degree, Fig. 2C): when the observer's view was limited to a single temporal hemifield there was a strong net motion vector (to the right in Fig. 6) yet it is clear that the system still correctly attributed the flow to forward rather than to leftward motion (or rotation) of the observer because it responded with convergence rather than rightward (conjugate) ocular following. [Conjugate oculomotor responses, such as ocular following, are more readily appreciated from plots of version, which is the average movement of the two eyes and is therefore insensitive to changes in vergence, than from the movements of the individual eyes. In fact, version and vergence provide an alternative, equally complete, representation of eye movements and might be more indicative of the way that eye movements are encoded in the cortical regions under consideration here.] An important point here is that the system not only produces the appropriate vergence responses but avoids making inappropriate version responses despite the net motion vector to the right: the presence of vertical motion (even though there is no net vertical vector) is clearly sufficient to block the version responses.
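The two quantitative points made in this section, that the convergence needed to hold binocular alignment grows as viewing distance shrinks and that version and vergence form an equally complete description of the two eyes' movements, can be illustrated with a minimal sketch. The interocular separation, the symmetric-fixation geometry, and the sign convention (rightward positive, vergence taken as left eye minus right eye so that convergence is positive) are assumptions made here for illustration and are not taken from the experiments described above.

```python
import math

# Hypothetical interocular separation in metres (an assumption, not from the text).
IPD_M = 0.065


def vergence_angle_deg(distance_m, ipd_m=IPD_M):
    """Vergence angle (deg) needed to fixate a midline target at distance_m."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))


def vergence_demand_deg(distance_m, fractional_reduction, ipd_m=IPD_M):
    """Extra convergence (deg) required when the viewing distance shrinks
    by the given fraction (e.g. 0.04 for a 4% looming step)."""
    nearer_m = distance_m * (1.0 - fractional_reduction)
    return vergence_angle_deg(nearer_m, ipd_m) - vergence_angle_deg(distance_m, ipd_m)


def to_version_vergence(left_deg, right_deg):
    """Assumed convention: rightward is positive for each eye.
    Version is the mean of the two eyes; vergence is left minus right."""
    return (left_deg + right_deg) / 2.0, left_deg - right_deg


def to_left_right(version_deg, vergence_deg):
    """Inverse of to_version_vergence: recover the individual eye positions."""
    return version_deg + vergence_deg / 2.0, version_deg - vergence_deg / 2.0


if __name__ == "__main__":
    for d in (0.25, 0.5, 1.0, 2.0):
        demand = vergence_demand_deg(d, 0.04)
        print(f"viewing distance {d:4.2f} m: demand for a 4% approach = {demand:.3f} deg")
```

Because the mapping between (left, right) and (version, vergence) is invertible, either pair carries the same information; a pure convergence movement appears as zero version and non-zero vergence, which is why conjugate responses are easier to read from version plots.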

Low-level, parallel processing?
The above discussion indicates that there are neurons or networks that act like templates or tuned filters to detect specific patterns of optic flow and generate appropriate oculomotor responses to serve the needs of visual stabilization. Once more, latencies are extremely short so that the system must depend on parallel processing to arrive at an appropriate response based on the pattern of optic flow. Thus, again the system helps to stabilize gaze on objects of interest not by selecting a particular image but by sensing the global pattern of flow. Once again the general concept is of low-level reflex systems responding appropriately to whatever region of the optic flow field is brought into view by the high-level saccadic system.

Neural mediation
There is extensive evidence that area MST in the monkey's cortex contains neurons that are selectively sensitive to radial optic flow patterns such as those now known to evoke vergence eye movements at ultra-short latencies (Saito et al., 1986; Duffy & Wurtz, 1991a,b; Lagae et al., 1994; Duffy & Wurtz, 1995; Lappe et al., 1996; Pekel et al., 1996). In fact, MST is the first stage in this dorsal pathway at which global flow is encoded at the level of single cells: at earlier stages, such as MT, individual cells have much smaller receptive fields and encode only local motion (Van Essen et al., 1981; Maunsell & Van Essen, 1983a; Albright & Desimone, 1987; Komatsu & Wurtz, 1988; Albright, 1989; Lagae et al., 1994). Mention has already been made of the evidence indicating that ocular following (version) responses to planar flow in the frontoparallel plane of fixation are mediated at least in part by MST. The new observations with radial flow patterns indicate that the mechanism mediating these version responses is blocked by orthogonal motion, hence it is no surprise that putative ocular following neurons in MST are suppressed by non-preferred motion, i.e. motion in the opposite or orthogonal direction (Duffy & Wurtz, 1991a).

Disparity vergence
When the moving observer looks in the direction of heading, radial optic flow is only one of several cues which indicate the forward rate of progress. The possibility therefore exists that these additional cues might also elicit vergence eye movements at ultra-short latencies. One such cue is the apparent change in size of the objects as the observer approaches them, but it is known that this elicits convergence only at pursuit latencies, generally estimated to be in excess of 200 ms (Cohen & Lisberger, 1996; Busettini et al., 1997). Another cue, however, is very potent at generating vergence at ultra-short latencies: binocular disparity. If the observer were to move forward without converging his/her eyes adequately then the object of regard would be overtaken and repositioned inside the plane of fixation where it would be imaged at non-corresponding positions on the two retinas (so-called crossed disparity). Recent experiments have demonstrated that when random-dot patterns are viewed dichoptically and small binocular misalignments are suddenly imposed (disparity steps), corrective vergence eye movements are elicited at latencies of < 60 ms in monkeys (Busettini et al., 1996b) and < 85 ms in humans, values closely comparable with those for ocular following and radial flow vergence. Crossed disparity steps elicited increased convergence and uncrossed steps decreased convergence, exactly as expected of a depth-tracking servo mechanism driven by disparity. However, once more it is necessary to prove that this is truly a response to a binocularly processed visual signal and not the result of monocular tracking in which each eye merely tracks the apparent motion that it sees (towards or away from the nose). That these responses could not be the result of monocular tracking is evident from experiments in which the disparity step was confined to one eye (Busettini et al., 1996b).
For example, when the crossed disparity step was confined to the right eye (which saw a leftward step), the result was (binocular) convergence in which the left eye moved rightward even though that eye had seen only a stationary pattern. The (rightward) movement of the left eye here is in the direction expected of a stereoscopic mechanism that responds to a binocular misalignment but is in the opposite direction to the only available motion cue, the leftward motion at the right eye.
The range of disparities over which the system behaves like a servo mechanism, that is, the range over which increases in the disparity vergence error result in roughly linear increases in the vergence response, is < 2 degrees. Thus, this vergence mechanism can correct only small misalignments of the two eyes, commensurate with a mechanism that performs only local stereo matches and merely attempts to bring the nearest salient images into the plane of fixation. During forward locomotion this mechanism will help to prevent images from leaving the plane of fixation. [This disparity vergence mechanism is in a somewhat different category from ocular following and radial flow vergence insofar as its primary function is to eliminate small vergence errors, evidenced by the fact that it also operates in the vertical axis using vertical disparity, which is unrelated to depth and translation per se (C. Busettini, G. S. Masson and F. A. Miles, unpublished observations). While the specific involvement with vergence errors resulting from locomotion is clear, this is only a secondary function.] Once more, we have a mechanism that functions as a low-level automatic servo and is not involved in high-level operations like the transfer of fixation to new images in new depth planes, which requires time-consuming target selections, and (often) the decoding of large disparity errors (> 10 degrees) that necessitate solution of the correspondence problem.
FIG. 7. The vergence eye movements elicited by disparity step stimuli applied to random-dot patterns using a dichoptic viewing arrangement to allow separate stimulation of each eye. (A) Mean vergence velocity responses of a monkey to crossed disparity stimuli applied at time zero to correlated (continuous line) and anticorrelated (dotted line) patterns, with stimulus magnitudes (in deg) indicated at the ends of the traces. The cartoons indicate only the general form of the patterns seen by the left (LE) and right (RE) eyes; those actually used had higher dot density (50%), each dot being 2 degrees in diameter, and the whole image extended over 80 × 80 degrees. (B) Plot of the mean (± SD) changes in vergence position (over time period 60-93 ms from stimulus onset) against the disparity stimulus, with correlated (filled circles) and anticorrelated (open circles) patterns. The normal disparity tuning curves have an s-shape, the linear (servo) region being restricted to disparities < ±2 degrees. The curves are the best fitting Gabor functions and the cosine terms for the correlated and anticorrelated data differ in phase by about 180 degrees (from Masson et al., 1997).
Recent experiments have shown that vergence responses can also be elicited at ultra-short latencies by disparity stimuli applied to dense (50%) anticorrelated binocular patterns, in which each black dot in one eye is matched to a white dot in the other eye. Figure 7(A) shows sample mean vergence velocity profiles in response to crossed disparity stimuli applied to correlated and anticorrelated patterns. Note that the vergence responses to the anticorrelated stimuli are in the reverse direction of those to the correlated stimuli. The disparity tuning curves for these data are shown in Figure 7(B), the curve obtained with the normal correlated patterns having a characteristic s-shape that is well fitted by a Gabor function. The curve for the anticorrelated data is almost a mirror image, and the cosine term for the best-fit Gabor function is phase shifted almost exactly 180 degrees. In two-alternative forced-choice tests, subjects could readily discriminate between crossed and uncrossed disparities when applied to the correlated patterns but not when applied to the anticorrelated patterns.
This is consistent with the idea that these short-latency vergence responses derive their visual input from an early stage of cortical processing prior to the level at which depth percepts are elaborated. Actually, the large-field stimuli used in all of these disparity vergence studies contain only absolute disparity cues, which are known to be poorly perceived in depth (Erkelens & Collewijn, 1985a; Erkelens & Collewijn, 1985b; Regan et al., 1986).
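The relationship between the two tuning curves in Figure 7(B) can be made concrete with a minimal sketch of a Gabor function, that is, a Gaussian envelope multiplied by a cosine carrier. All parameter values below (amplitude, envelope width, carrier frequency, phase) are hypothetical and chosen only to give an s-shaped curve with a roughly linear central region; they are not the fitted values from the study, and the sign convention (crossed disparity and convergence both positive) is likewise an assumption.

```python
import math


def gabor(disparity_deg, amplitude, sigma_deg, freq_cpd, phase_rad):
    """Gaussian envelope times cosine carrier, evaluated at one disparity."""
    envelope = math.exp(-(disparity_deg ** 2) / (2.0 * sigma_deg ** 2))
    carrier = math.cos(2.0 * math.pi * freq_cpd * disparity_deg + phase_rad)
    return amplitude * envelope * carrier


# Hypothetical parameters (illustrative only, not fitted to any data).
AMPLITUDE = 1.0          # arbitrary response units
SIGMA_DEG = 2.0          # envelope width in degrees
FREQ_CPD = 0.12          # carrier frequency in cycles per degree
PHASE_CORR = -math.pi / 2.0  # odd-symmetric carrier gives an s-shaped curve


def correlated_response(disparity_deg):
    """Tuning curve for correlated patterns under the assumed parameters."""
    return gabor(disparity_deg, AMPLITUDE, SIGMA_DEG, FREQ_CPD, PHASE_CORR)


def anticorrelated_response(disparity_deg):
    """Same Gabor with the cosine term shifted by 180 degrees."""
    return gabor(disparity_deg, AMPLITUDE, SIGMA_DEG, FREQ_CPD, PHASE_CORR + math.pi)


if __name__ == "__main__":
    for d in (-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0):
        print(f"{d:+.1f} deg: correlated {correlated_response(d):+.3f}, "
              f"anticorrelated {anticorrelated_response(d):+.3f}")
```

Shifting the cosine term by 180 degrees negates the carrier at every disparity while leaving the envelope untouched, so the second curve is the mirror image of the first, which is the pattern described for the anticorrelated data.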

Neural mediation
As mentioned earlier, neurons sensitive to binocular disparity have been described in various regions of the visual cortex, and these have often been invoked as the source of the error signals driving disparity vergence. However, in discussing the sensitivity of ocular following to disparity we were concerned with neurons that were selective for motion as well as disparity and had a preference for the plane of fixation (zero disparity). Now, we are presumably concerned with disparity selective neurons that have no particular motion preferences (except perhaps for motion in depth) and that discharge to nonzero disparities, thereby effectively encoding vergence error. Many such neurons have been described in visual cortex as early as V1 (see Poggio, 1995 for a recent review), and in the dorsal stream, including MT (Maunsell & Van Essen, 1983b) and MST. At recording levels up to MT, these neurons have been classified as 'near', 'far', 'tuned near' and 'tuned far', depending on the range of disparities over which they are active. However, in MST some of the neurons that discharge in close relation to the large-field binocular stimuli used to elicit short-latency disparity vergence have disparity tuning curves that do not readily conform to any of these categories but exactly match the broad s-shaped curves characteristic of the vergence responses (such as those seen in Fig. 7B). Once more, it seems that we have a short-latency oculomotor response that relies on signals that first occur in their entirety at the level of single cells in MST. Apropos the reversed vergence responses to anticorrelated patterns, many disparity-selective neurons in the monkey's striate cortex also respond to these patterns, despite the fact that monkeys (like humans) fail to perceive depth in them, and many of these neurons show inverted disparity tuning curves (Cumming & Parker, 1997).
This response inversion is in accord with a local filter model of disparity selective complex cells (Ohzawa et al., 1990).
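Why a local filter model predicts the inversion can be seen in a minimal one-dimensional sketch of a binocular energy unit, which is how the model of Ohzawa et al. (1990) is commonly summarized: each simple cell sums the outputs of a left-eye and a right-eye linear filter, and a complex cell sums the squared outputs of a quadrature pair of such simple cells. Everything specific below is an assumption made for illustration: the image size, the filter parameters, the position-shift encoding of preferred disparity, the circular shift used to impose stimulus disparity, and the subtraction of the monocular energies to isolate the interocular term.

```python
import math
import random

# All values below are hypothetical, chosen only for illustration.
N_PIXELS = 64            # length of the 1-D random-dot image
SIGMA = 4.0              # receptive-field envelope width (pixels)
FREQ = 0.1               # carrier frequency (cycles per pixel)
PREFERRED_DISPARITY = 2  # offset of the right-eye field (pixels)
PHASES = (0.0, math.pi / 2.0)  # quadrature pair of simple cells


def gabor_field(center, phase):
    """1-D Gabor receptive field sampled at every pixel."""
    return [math.exp(-((x - center) ** 2) / (2.0 * SIGMA ** 2))
            * math.cos(2.0 * math.pi * FREQ * (x - center) + phase)
            for x in range(N_PIXELS)]


LEFT_FIELDS = [gabor_field(N_PIXELS // 2, p) for p in PHASES]
RIGHT_FIELDS = [gabor_field(N_PIXELS // 2 + PREFERRED_DISPARITY, p) for p in PHASES]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shifted(image, disparity):
    """Circularly shift a 1-D image by `disparity` pixels."""
    n = len(image)
    return [image[(i - disparity) % n] for i in range(n)]


def interaction(left_img, right_img):
    """Binocular energy minus the summed monocular energies for one stimulus."""
    binocular = 0.0
    monocular = 0.0
    for lf, rf in zip(LEFT_FIELDS, RIGHT_FIELDS):
        l = dot(left_img, lf)
        r = dot(right_img, rf)
        binocular += (l + r) ** 2   # squared output of one binocular simple cell
        monocular += l * l + r * r
    return binocular - monocular


def tuning_curves(disparities, n_trials=50, seed=0):
    """Mean interaction term versus stimulus disparity for correlated and
    anticorrelated (right-eye contrast inverted) random-dot stimuli."""
    rng = random.Random(seed)
    corr = {d: 0.0 for d in disparities}
    anti = {d: 0.0 for d in disparities}
    for _ in range(n_trials):
        left = [rng.choice((-1.0, 1.0)) for _ in range(N_PIXELS)]
        for d in disparities:
            right = shifted(left, d)
            corr[d] += interaction(left, right) / n_trials
            anti[d] += interaction(left, [-v for v in right]) / n_trials
    return corr, anti


if __name__ == "__main__":
    corr, anti = tuning_curves(list(range(-6, 7)))
    for d in sorted(corr):
        print(f"{d:+d} px: correlated {corr[d]:+8.3f}, anticorrelated {anti[d]:+8.3f}")
```

Inverting the contrast in one eye changes the sign of that eye's filter output, so the cross-term between the eyes changes sign while the monocular terms do not; the disparity-dependent part of the response is therefore inverted, without any stage that computes depth.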

Closing remarks
The three visual stabilization mechanisms share a number of features (notably, ultra-short latencies and a special involvement with translational disturbances) and this has led to the suggestion that they constitute a family of reflexes: see Table 1 for a listing of their fundamental similarities and differences. All three mechanisms also show a property that has been termed postsaccadic enhancement, whereby stimuli applied in the immediate wake of a saccadic eye movement are much more effective than the same stimuli applied a few hundred milliseconds later (Busettini et al., 1997). In addition, a similar enhancement occurs in the wake of a saccade-like shift of the visual scene, indicating that, in all cases, the enhancement is at least partly visual in origin and due to the visual reafference created by the saccade sweeping the image of the world across the retina. This transient priming of the three reflexes on completion of the saccade is very timely, coming when the need to re-establish retinal image stability, by eliminating any residual retinal slip or binocular misalignment, is paramount. By boosting performance only transiently (with a time constant approximating that of the oculomotor plant) these control systems avoid the instability problems generally associated with excessively high gain.
TABLE 1 (legend). The three visual tracking mechanisms have been grouped according to the type of translation that generates the visual challenge to which they respond optimally: with X/Y-translation the observer moves from side-to-side or up-down, tending to produce motion in the frontoparallel plane (the plane of fixation); with Z-translation the observer moves from front-to-back or vice versa, tending to produce radial optic flow and horizontal disparity. * C. Busettini, G. S. Masson and F. A. Miles (unpublished observations).
A few remarks are in order concerning the 'gain' of the various reflexes.
All of the studies rely on performance measures that assess only the initial open-loop responses, which are those generated by the input stimulus (planar motion, radial flow or disparity) before that stimulus has been affected by eye movement feedback. As we have seen, under these special conditions we can use the stimulus-response relationships to infer the sensory-motor processing. However, because the latencies are so short, the measurement period must be correspondingly short and the eyes do not have time to reach asymptote, so that a steady-state 'gain' estimate, in the strict engineering sense, is not feasible. Further, the measured movements are so small that, in themselves, they can have little functional significance: they are of interest primarily for what they can tell us about the underlying visual processing, and can only hint at the system's likely functional potential.
To the extent that the three mechanisms are members of a family, one might hope to generalize from one to another. To date, only one has been shown to operate independently of perception (disparity vergence) but I would expect the same of the other two. Likewise, only ocular following has been shown to be subject to adaptive gain control, though I assume that all could benefit and probably do. Ocular following and disparity vergence both show dependence on (the inverse of) the viewing distance. If such dependence on proximity is also characteristic of the vergence resulting from radial flow then it might help to explain how one avoids converging one's eyes when passing through a doorway: the image of interest and the plane of fixation lie some distance beyond the doorway, resulting in a low gain that effectively vetoes a vergence response to the centrifugal flow generated by the doorframe. I suspect that the dependence on distance reflects a major involvement with near viewing because it is the retinal images of the nearest objects that are most sensitive to the observer's movements and therefore offer the greatest challenge to ocular alignment and stabilization. In other respects, the tracking systems might be different but still complementary. An example is the transient vergence elicited by steps of radial flow (a velocity servo) and the tonic vergence elicited by steps of disparity (a position servo).
Hard evidence that MST is critical for these mechanisms is currently available only for ocular following. However, it is surely not fortuitous that MST contains neurons that discharge in association with the stimuli that selectively activate each of the three mechanisms and, intriguingly, represents the first stage in the cortical processing pathway at which the adequate stimulus for each system is fully encoded at the level of individual neurons. The stereoanomalies that are selective for both disparity and direction of motion are especially intriguing, but perhaps not surprising in a system that depends critically on the filtering properties of particular cell types in cortex: any disease or genetic factor that singles out particular cell types might be expected to produce such curious 'scotomas'.