Vestibular System: the Many Facets of a Multimodal Sense

Elegant sensory structures in the inner ear have evolved to measure head motion. These vestibular receptors consist of highly conserved semicircular canals and otolith organs. Unlike other senses, vestibular information in the central nervous system becomes immediately multisensory and multimodal. There is no overt, readily recognizable conscious sensation from these organs, yet vestibular signals contribute to a surprising range of brain functions, from the most automatic reflexes to spatial perception and motor coordination. Critical to these diverse, multimodal functions are multiple computationally intriguing levels of processing. For example, the need for multisensory integration necessitates vestibular representations in multiple reference frames. Proprioceptive-vestibular interactions, coupled with corollary discharge of a motor plan, allow the brain to distinguish actively generated from passive head movements. Finally, nonlinear interactions between otolith and canal signals allow the vestibular system to function as an inertial sensor and contribute critically to both navigation and spatial orientation.


INTRODUCTION
Known as the balance organs of the inner ear, the vestibular system constitutes our sixth sense. Three roughly orthogonal semicircular canals sense rotational movements, and two otolith organs (the utricle and the saccule) sense linear accelerations. Vestibular afferents are continuously active even at rest and are strikingly sensitive in signaling the accelerations of the head as it translates and rotates in space. Even when we remain motionless, the otolith organs sense the pull of gravity (a form of linear acceleration). The signals from the semicircular canals and the otolith organs are complementary; their combined activation is necessary to explore and comprehend the enormous range of physical motions experienced in everyday life.
The vestibular system differs from other senses in many respects. Most notably, central vestibular processing is highly convergent and strongly multimodal. For example, canal/otolith interactions take place in the brain stem and cerebellum immediately at the first synapse. Also, visual/vestibular and proprioceptive/vestibular interactions occur throughout the central vestibular pathways and are vital for gaze and postural control. Signals from muscles, joints, skin, and eyes are continuously integrated with vestibular inflow. Because of the strong and extensive multimodal convergence with other sensory and motor signals, vestibular stimulation does not give rise to a separate and distinct conscious sensation. Yet the vestibular system plays an important role in everyday life because it contributes to a surprising range of functions, from reflexes to the highest levels of perception and consciousness.
Experimental approaches in the vestibular system were traditionally framed by the perspective of the sensorimotor transformations required for reflex generation (for recent reviews see Angelaki 2004, Angelaki & Hess 2005, Boyle 2001, Cullen & Roy 2004, Raphan & Cohen 2002, Wilson & Schor 1999). Techniques based on control systems theory have been used to establish the sensorimotor transformations by which vestibular information is converted into a motor output. This approach followed logically from the influential theory of a reflex chain made popular more than a century ago by Sherrington (1906). By recording from individual neurons at each successive stage of a reflex pathway (reviewed in Goldberg 2000), quantitative descriptions of sensorimotor processing were established. Using this approach, studies in reduced or in vitro preparations have provided important insights into the functional circuitry, intrinsic electrophysiology, and signal processing of vestibularly driven reflexes. The relative simplicity of the neural circuits that mediate vestibular reflexes has also proven to be well suited for linking systems and cellular levels of analysis.
A unique feature of the vestibular system is that many second-order sensory neurons in the brain stem are also premotor neurons; the same neurons that receive afferent inputs send direct projections to motoneurons. An advantage of this streamlined circuitry is that vestibular sensorimotor responses have extraordinarily short latencies. For example, the latency of the vestibulo-ocular reflex (VOR) is as short as 5-6 ms. Simple pathways also mediate the vestibulo-spinal reflexes that are important for maintaining posture and balance. Recent studies, however, have emphasized the importance of extravestibular signals in shaping even these simple sensorimotor transformations. Moreover, multisensory and multimodal interactions play an essential role in higher-level functions such as self-motion perception and spatial orientation. Largely owing to their inherent complexity and strongly multimodal nature, these very intriguing vestibular-related functions have just begun to be explored.
The vestibular system represents a great forum in which to address several fundamental questions in neuroscience: multisensory integration, changes of coordinate systems, separation of active from passive head movements, and the role of corollary discharge. In this review we discuss some of these issues. We first summarize recent work addressing how semicircular canal and otolith signals interact to compute inertial motion (i.e., motion of the head relative to the world) and then explore how vestibular information converges with proprioceptive and other extravestibular signals to distinguish self-generated from passive head movements. Finally, we present a few examples showing that vestibular signals in the brain are expressed in multiple reference frames, a signature of a truly multimodal and multifunctional sense. To date, these processes have been characterized most extensively in the brain stem vestibular nuclei and the vestibulo-cerebellum, the two areas on which this review focuses.

COMPUTATION OF INERTIAL MOTION
The vestibular system constitutes an inertial sensor, i.e., it encodes the motion of the head relative to the outside world. However, this is not precisely true when considering signals from the semicircular canals and otolith organs separately. There exist two problems in interpreting information from the peripheral vestibular sensors. First is the rotation problem, which arises because the vestibular sensors are physically fixed in the head. During rotation, semicircular canal afferents detect endolymph fluid motion relative to the skull-fixed bony ducts (Goldberg & Fernandez 1975), coding angular velocity (the integral of angular acceleration) in a head-centered reference frame but providing no information about how the head moves relative to the world. For example, a horizontal (yaw) rotation in an upright orientation activates canal afferents similarly to a yaw rotation in a supine orientation (Figure 1a). Yet these two movements differ in inertial (i.e., world-centered) space. The second problem, referred to as the linear acceleration problem, is due to a sensory ambiguity that arises from physical law, namely Einstein's equivalence principle: Otolith afferents detect net linear acceleration but cannot distinguish translational from gravitational components (Figure 1b) (Fernandez & Goldberg 1976). That is, whether we actually walk forward or tilt our head backward is indistinguishable to primary otolith afferents.
Whereas each of the two sets of vestibular sensors alone is ambiguous, the brain can derive a reliable estimate of both attitude (i.e., orientation) and motion relative to the world by appropriately combining semicircular canal and otolith signals (Merfeld 1995; Glasauer & Merfeld 1997; Angelaki et al. 1999; Mergner & Glasauer 1999; Merfeld & Zupan 2002; Zupan et al. 2002; Green & Angelaki 2003, 2004; Green et al. 2005). The mathematical solution (for details see Green & Angelaki 2004, Green et al. 2005), schematized in Figure 1c, consists of two interdependent steps (Yakusheva et al. 2007). First, rotational signals from the semicircular canals (ω, coded relative to the head) must interact with a gravity signal (g) to construct an estimate of angular velocity relative to the world. Angular velocity can then be decomposed into two perpendicular components: an earth-vertical (i.e., parallel to gravity) component, ω_EV, and an earth-horizontal (perpendicular to gravity) component, ω_EH (Figure 1c, left). The former (ω_EV) signals only those rotations that do not change orientation relative to gravity (e.g., yaw from upright). The latter (ω_EH) signals rotations that change head orientation relative to gravity (e.g., pitch/roll from upright). Temporal integration of ω_EH (∫ω_EH) can yield an estimate of spatial attitude or tilt.

Figure 1c: Angular velocity, ω, from the semicircular canals must be combined with gravitational information to be decomposed into two components, one parallel to gravity (ω_EV) and another perpendicular to gravity (ω_EH). In parallel, net linear acceleration, α, from the otolith organs must be combined with the temporal integral of ω_EH (∫ω_EH), such that it is separated into translational acceleration, t, and gravitational acceleration, g. Replotted with permission from Yakusheva et al. (2007).
In a second computational step this tilt signal, ∫ω_EH, can be combined with net linear acceleration from the otolith organs, α, to extract the linear acceleration component that is due to translation, t (Figure 1c, right). The logic behind these computations is simple: Using signals from the semicircular canals, the brain can generate an internal estimate of the linear accelerations that should be detected by the otolith system during head tilts relative to gravity. This estimate can then be subtracted from the net activation of primary otolith afferents. Whatever is left is then interpreted as translational motion.
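The two-step computation described above can be sketched numerically. The following is a minimal illustration of the algebra, not a model of the underlying neural circuitry; all function and variable names are our own, and we adopt the sign convention of Figure 1c (net otolith signal α = t + g).

```python
import numpy as np

G = 9.81   # magnitude of gravity (m/s^2)
DT = 1e-3  # integration time step (s)

def step_gravity(g, omega):
    """One Euler step of the internal gravity estimate. A world-fixed
    vector expressed in head coordinates obeys dg/dt = -omega x g when
    the head rotates with angular velocity omega (rad/s, head
    coordinates). Renormalize so |g| stays fixed at G."""
    g = g - np.cross(omega, g) * DT
    return G * g / np.linalg.norm(g)

def decompose(omega, g):
    """Split canal-derived omega into an earth-vertical component
    (parallel to gravity, omega_EV) and an earth-horizontal component
    (perpendicular to gravity, omega_EH)."""
    g_hat = g / np.linalg.norm(g)
    omega_ev = np.dot(omega, g_hat) * g_hat
    return omega_ev, omega - omega_ev

def estimate_translation(alpha, g):
    """Otoliths report net acceleration alpha = t + g; subtracting the
    internal gravity estimate recovers the translational component t."""
    return alpha - g

# Demo: a pure roll tilt activates the otoliths although there is no
# translation; the canal-updated gravity estimate removes that activation.
omega = np.array([0.5, 0.0, 0.0])         # roll at 0.5 rad/s
g_true = np.array([0.0, 0.0, G])          # gravity in head coordinates
g_est = g_true.copy()                     # brain's internal estimate
for _ in range(1000):                     # 1 s of rotation
    g_true = step_gravity(g_true, omega)  # physics (same integrator)
    g_est = step_gravity(g_est, omega)    # internal model

alpha = g_true                            # no translation: alpha = g
t_est = estimate_translation(alpha, g_est)
print(np.linalg.norm(t_est))              # 0.0: correctly reports no translation

# A yaw rotation from upright is entirely earth-vertical:
ev, eh = decompose(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, G]))
print(np.linalg.norm(eh))                 # 0.0: yaw does not change tilt
```

Note that the sketch deliberately uses the same integrator for the "physics" and the internal model; in reality the brain's estimate is imperfect, which is exactly why low-frequency tilt and translation are confused (see below).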

Evidence for Allocentric Coding of Angular Velocity
Strong evidence for a world-centered representation of rotational signals comes from reflexive eye movement studies and, in particular, a process known as the velocity storage mechanism, which dominates the rotational VOR at low frequencies (Raphan et al. 1979). In particular, eye velocity during low-frequency rotation is driven by semicircular canal signals that have been spatially transformed to align with gravity (i.e., they represent an ω_EV signal; Merfeld et al. 1993, 1999; Angelaki & Hess 1994; Angelaki et al. 1995; Wearne et al. 1998; Zupan et al. 2000). More recently, Fitzpatrick et al. (2006) demonstrated that a world-referenced angular velocity signal is also available for perception and balance. Using galvanic (electrical) stimulation of vestibular receptors in the inner ear, Fitzpatrick, Day, and colleagues evoked a virtual rotation as subjects walked in the dark. Depending on head orientation, the authors could either steer walking or produce balance disturbances, concluding that the brain resolves the canal signal according to head posture into world-referenced orthogonal components. Each of these components could have a potentially different function: Rotations in vertical planes (i.e., an ω_EH signal) can be used to control balance, whereas rotations in the horizontal plane (i.e., an ω_EV signal) can be used primarily for navigation. In this particular experiment, such a computation could be performed either entirely on the basis of vestibular signals or through contributions from both vestibular and nonvestibular estimates of head orientation (e.g., derived from somatosensory and motor information).
In line with such decomposition, whereby ω_EH contributes to orientation and balance and ω_EV contributes to navigation, is also a role of vestibular signals in the generation of head direction cell properties in the limbic system (for a recent review, see Taube 2007). Although details about the neural implementations are still missing, vestibular-driven angular velocity appears essential for generating the head direction signal (Stackman & Taube 1997, Muir et al. 2004). However, head direction cell firing depends only on the earth-vertical component of angular velocity (Stackman et al. 2000). These results (see also Calton & Taube 2005) suggest that the head direction signal is generated by temporal integration of an ω_EV (rather than ω) signal.

(Velocity storage mechanism: the prolongation of the vestibular time constant during rotation compared with that in the vestibular eighth nerve. Navigation: the ability to move appropriately and purposefully through the environment.)

Evidence for Segregation of Head Attitude and Translational Motion
That the brain correctly interprets linear acceleration is obvious from everyday activities. As we swing in the playground, for example, a motion that includes changes in both head attitude and translation, we properly perceive our motion. This is true even when our eyes are closed (thus excluding important visual cues). Quantitative evidence that a solution to the linear acceleration problem can exist using otolith/canal convergence comes from monkey and human studies. Angelaki and colleagues (Angelaki et al. 1999, Green & Angelaki 2003) showed that an extraotolith signal does contribute to the compensatory eye movements during mid/high-frequency translation (>0.1 Hz). In line with the schematic of Figure 1c, these signals arise from temporal integration of angular velocity from the semicircular canals (Green & Angelaki 2003). Parallel studies by Merfeld and colleagues focused on low-frequency motion stimuli; they showed the contribution of canal cues in generating a neural estimate of translation by exploring erroneous behavioral responses (eye movements and perception) that are typically attributed to the velocity storage mechanism (Merfeld et al. 1999, 2001; Zupan et al. 2000). Tilt-translation ambiguities are not properly resolved at these lower frequencies because the semicircular canals do not provide a veridical estimate of angular velocity at low frequencies (<0.1 Hz) or when the head is statically tilted.
Similarly, the tilt-translation ambiguity is not always correctly resolved at the perceptual level; low-frequency linear accelerations in the absence of other, extravestibular cues are incorrectly interpreted as tilt even when generated by translational motion. Thus the ability to discriminate between tilt and translation based solely on vestibular cues (e.g., during passive motion in darkness) deteriorates at low frequencies (Glasauer 1995; Seidman et al. 1998; Merfeld et al. 2005a,b; Kaptein & Van Gisbergen 2006). In fact, it is typically at these low frequencies that perceptual illusions occur (e.g., the "somatogravic" and "oculogravic" illusions, often elicited during airplane landing and take-off; Graybiel 1952; Clark & Graybiel 1963, 1966). Under these circumstances extravestibular information (e.g., visual signals) is necessary to avoid illusions. For example, visual cues can significantly influence our percept of head orientation relative to gravity (Dichgans et al. 1972, Howard & Hu 2001). In addition, visual rotational cues contribute to the estimation of inertial motion because they can substitute for canal-driven angular velocity information (Zupan & Merfeld 2003, MacNeilage et al. 2007).

Neural Substrates for Inertial Motion Detection
To characterize whether and how neurons use canal and otolith information to separate ω_EV and ω_EH, and to distinguish translational from gravitational accelerations, otolith afferents and central neurons have been studied during combinations of tilt and translation stimuli, as shown in Figure 2. Stimulus conditions included translation only (e.g., left/right motion), tilt only (e.g., sinusoidal tilt toward right/left ear down without linear displacement), and combinations of the two (tilt − translation and tilt + translation, illustrated by cartoon drawings, Figure 2, top). In the most important of these stimulus combinations (tilt − translation motion), roll tilt and translation stimuli were carefully matched to ensure that the gravitational and translational components of acceleration along the interaural axis canceled each other out. In this case, the body translated in space, but there was no net lateral linear acceleration stimulus to the otolith receptors. As expected (Fernandez & Goldberg 1976), primary otolith afferents encoded net linear acceleration, modulating similarly during translation and tilt (thereby emphasizing the linear acceleration ambiguity; Figure 1b). Note that during the tilt − translation stimulus condition, primary otolith afferents transmit no information about the subject's translation (i.e., there is no sinusoidal modulation of firing rate) because net linear acceleration along the axis of motion is zero.
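The tilt − translation matching can be made concrete with a few lines of arithmetic. The sketch below uses illustrative parameter values (0.5 Hz, 11.3° roll), not necessarily those of the original experiments, and the function name is our own: a sinusoidal roll tilt of amplitude θ projects gravity onto the interaural axis as G·sin θ, so a translation with the opposite acceleration profile nulls the net otolith stimulus.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)

def matched_stimuli(f_hz, tilt_deg, duration=2.0, n=1000):
    """Interaural linear acceleration produced by a sinusoidal roll tilt
    of amplitude tilt_deg at frequency f_hz, and the matching translation
    whose inertial acceleration cancels it (the tilt - translation
    condition). Parameter values are illustrative only."""
    t = np.linspace(0.0, duration, n)
    theta = np.radians(tilt_deg) * np.sin(2 * np.pi * f_hz * t)
    a_grav = G * np.sin(theta)  # gravity projected onto the interaural axis
    a_trans = -a_grav           # matched translational acceleration
    return t, a_grav, a_trans

t, a_grav, a_trans = matched_stimuli(0.5, 11.3)
net = a_grav + a_trans          # net otolith drive in tilt - translation
print(np.max(np.abs(net)))      # 0.0: otolith afferents receive no net stimulus

# Peak translational acceleration needed to cancel an 11.3 deg roll tilt:
print(np.max(np.abs(a_trans)))  # ~1.92 m/s^2 (about 0.2 g)
```

The point of the exercise is the one made in the text: although the body translates in space during this condition, the stimulus delivered to the otolith receptors along the axis of motion is identically zero.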

In contrast with primary otolith afferents, many central neurons selectively encode translational motion and remain relatively insensitive to changes in head orientation relative to gravity. For example, the vestibular nucleus neuron illustrated in Figure 2 (bottom) modulated its firing rate little during the tilt-only condition, whereas combined motion protocols resulted in cell activation similar to that during pure translation. Neurons that selectively encode translation rather than net acceleration were found not only in the vestibular nuclei (VN) but also in the rostral fastigial nuclei (FN) of the cerebellum (Angelaki et al. 2004), as well as in the nodulus and uvula (NU) of the cerebellar cortex (vermal lobules X and IX; Yakusheva et al. 2007). Results from all three areas are summarized and compared with primary otolith afferents in Figure 3, which illustrates partial correlation coefficients describing the degree to which responses to these stimuli corresponded to neural coding of translation (ordinate) or net acceleration (abscissa). Data points falling in the upper left quadrant represent neurons that were significantly more translation-coding than afferent-like (p = 0.01; dashed lines). In contrast, cells in the lower right quadrant were significantly more afferent-like than translation-coding. VN and FN neurons tended to span the whole range, in contrast with NU Purkinje cells, all of which fell in the upper left quadrant (Angelaki et al. 2004, Yakusheva et al. 2007). Inactivation of the canals completely eliminated the presence of translation-coding cells: All neurons in the VN/FN/NU became afferent-like and encoded net linear acceleration after canal inactivation (Shaikh et al. 2005, Yakusheva et al. 2007). This occurs because, in addition to an otolith input, these neurons also receive a semicircular canal-driven signal. Yakusheva et al.
(2007) showed that, in line with Figure 1c, this canal-driven signal in the nodulus/uvula has been processed relative to canal afferents in two important respects: (a) It represents an ω_EH (rather than ω) signal.
Accordingly, Purkinje cells modulate only during canal activation involving rotations that change orientation relative to gravity, e.g., during pitch and roll from an upright orientation, but not during pitch/roll in ear-down/supine orientations (Yakusheva et al. 2007). (b) This canal-driven, spatially transformed signal has been temporally integrated, thus coding head position relative to gravity (∫ω_EH, rather than rotational velocity). Such an earth-centered estimate of head attitude is then subtracted from net linear acceleration provided by the otoliths and used to estimate inertial motion during navigation. Next we show that neurons in these same areas (VN and FN) seem to perform another important function: distinguishing between rotations that are self-generated and those that are externally applied.

Figure 4: A simplified schematic of Von Holst & Mittelstaedt's reafference principle applied to the vestibular system. A motor command is sent to the effector muscle and, in turn, sensory activation resulting from the effector's action on the vestibular sensors is returned. This reafference is then compared with an efference copy of the original motor command. Here, reafference is arbitrarily marked (+) and the efference copy is marked (−). When the reafference and efference copy signals are of equal magnitude, they cancel, and no sensory information is transmitted to the next processing levels. In contrast, a difference between reafference and efference copy indicates an externally generated event (i.e., exafference) that is considered behaviorally relevant and is thus processed further.

DISTINGUISHING PASSIVE FROM ACTIVE HEAD MOVEMENTS
The ability to navigate and orient through the environment requires knowledge not only of inertial motion but also of which components of vestibular activation result from active (i.e., self-generated) and which from passive (i.e., externally applied) movements. How does the brain differentiate between sensory inputs that arise from changes in the world and those that result from our own voluntary actions? This question concerned many eminent scientists of the past century, including Helmholtz, Hering, Mach, and Sherrington. For example, Von Helmholtz (1925) made the salient and easily replicated observation that although targets rapidly jump across the retina as we move our eyes to make saccades, we never see the world move over our retina. Yet tapping on the canthus of the eye to displace the retinal image (as during a saccadic eye movement) results in an illusory shift of the visual world. More than 50 years ago, Von Holst & Mittelstaedt (1950) proposed the principle of reafference (Figure 4), in which an internal copy of the expected sensory consequences of a motor command (the efference copy) is subtracted from the actual sensory signal: The sensory input produced by our own movements (termed reafference) is thereby cancelled, leaving the input that arises from the outside world (termed exafference). Thus, the nervous system can distinguish sensory inputs that arise from external sources from those that result from self-generated movements. More recent behavioral investigations have generalized this original proposal by suggesting that an internal prediction of the sensory consequences of our actions, derived from motor efference copy, is compared with actual sensory input (Wolpert et al. 1995, Decety 1996, Farrer et al. 2003). In line with this proposal, work in several model systems, including the electrosensory systems of mormyrid fish (Bell 1981, Mohr et al. 2003) and elasmobranchs (i.e., sharks, skates, and rays; Hjelmstad et al. 1996), the mechanosensory system of the crayfish (Krasne & Bryan 1973, Edwards et al.
1999), and the auditory system of the cricket (Poulet & Hedwig 2003, 2006), has demonstrated that sensory information arising from self-generated behaviors can be selectively suppressed at the level of afferent fibers or the central neurons to which they project.

Differential Processing of Active Versus Passive Head Movement
Until recently, the vestibular system had been studied exclusively in head-restrained animals by moving the head and body together (reviewed in Cullen & Roy 2004). Thus, because neuronal responses were driven by an externally applied stimulus, our understanding of vestibular processing was limited to the neuronal encoding of vestibular exafference. More recently, investigators in the field have overcome the technical difficulties associated with recording single-unit responses during self-generated head movements.

Figure 5: In the vestibular system, second-order neurons distinguish sensory inputs that result from our own actions from those that arise externally. Representation of the activity of a horizontal canal afferent (left panel) and VN neuron (right panel) during (a) passive head movements, (b) active head movements, and (c) combined active and passive head movements. Afferents reliably encode head motion in all conditions. In contrast, VO neurons show significantly attenuated responses to the active component of head motion but remain responsive to the passive component during combined stimulation.

As shown in Figure 5, whereas vestibular afferents reliably encode active movements (Cullen & Minor 2002, Sadeghi et al. 2007), the responses of a specific class of neurons in the vestibular nuclei are dramatically attenuated during active movements (McCrea et al.
1999, Roy & Cullen 2001). What is even more striking is that these same second-order vestibular neurons continue to respond selectively to passively applied head motion when a monkey generates active head-on-body movements (Figure 5c). Thus, consistent with Von Holst & Mittelstaedt's original proposal, vestibular information arising from self-generated movements is selectively suppressed early in sensory processing to create a neural representation of the outside world (i.e., vestibular exafference). This suppression of vestibular reafference is specific to a class of second-order neurons, which had been classically termed vestibular-only (VO) neurons on the basis of their lack of eye movement-related responses in head-restrained animals (e.g., Fuchs & Kimm 1975, Keller & Daniels 1975, Lisberger & Miles 1980, Chubb et al. 1984, Tomlinson & Robinson 1984, Scudder & Fuchs 1992, Cullen & McCrea 1993). However, given that they reliably encode only passively applied head velocity (i.e., vestibular exafference), this nomenclature is clearly deceptive. This is the same group of neurons that, as summarized earlier, is involved in the computation of inertial motion (Angelaki et al. 2004).

Neural Mechanisms Underlying the Differential Processing of Actively Generated Versus Passive Head Movement
How does the brain distinguish between active and passive head movements at the first stage of central processing in the vestibular system? Theoretically, the extensive multimodal convergence of other sensory and motor signals with vestibular information in the VN provides several possible solutions. To frame this question better, it is important to note that neuronal responses were compared during self-generated head movements produced by activation of the neck musculature (i.e., voluntary head-on-body movements) and passive movements generated by whole-body rotations (i.e., the traditional stimulus for quantifying vestibular responses).
Consequently, recent studies in alert rhesus monkeys have focused on the implications of the differences between the extravestibular cues present in these two conditions. First, the two conditions differ in the net sensory information available to the brain. Notably, during active head-on-body movements, neck proprioceptors as well as vestibular receptors are stimulated, and this additional information could alter neuronal responses. Indeed, neck-related inputs are conveyed to the vestibular nuclei via a disynaptic pathway (Sato et al. 1997). In addition, activation of neck muscle spindle afferents has long been known to influence VN neuron activity in decerebrate animals (Boyle & Pompeiano 1981, Anastasopoulos & Mergner 1982, Wilson et al. 1990). However, passive activation of neck proprioceptors alone does not significantly alter neuronal sensitivities to head rotation in alert rhesus monkeys (Roy & Cullen 2004). Second, during active head-on-body movements, the brain produces a command signal to activate the neck musculature. To quantify the influence of this additional cue, recordings were made in head-restrained monkeys that attempted to move their heads. The generation of neck torque, even at levels comparable to those produced during large active head movements, had no effect on neuronal responses (Roy & Cullen 2004).
Taken together, these results show that neither neck motor efference copy nor proprioceptive cues alone are sufficient to account for the elimination of neuronal sensitivity to active head rotation. However, a common feature of both these experiments was that neck motor efference copy and proprioceptive signals were not matched as they typically are during normal active head movements. By experimentally controlling the correspondence between intended and actual head movement, Roy & Cullen (2004) showed that a cancellation signal is generated only when the activation of neck proprioceptors matches the motor-generated expectation (Figure 6a-c). This interaction among vestibular, proprioceptive, and motor efference copy signals occurs as early as the second-order vestibular neurons. In agreement with Von Holst & Mittelstaedt's original hypothesis, an internal model of the sensory consequences of active head motion (Figure 6d) is used to selectively suppress reafference at the level of the vestibular nuclei.

Figure 6: An internal model of the sensory consequences of active head motion is used to suppress reafference selectively at the level of the vestibular nuclei. (a) Activity of an example VN neuron (gray filled trace) during passive whole-body rotation. In this condition, only vestibular inputs are available to the central nervous system; there is no motor efference copy signal because the monkey does not actively move its head. (b) Activity of the same neuron during active head-on-body movements. In this condition, the monkey commands an active head movement, so an efference copy signal is theoretically available. In addition, the head movement activates both vestibular and proprioceptive afferents. A prediction of the neuron's activity based on its response to passive head motion is superimposed (blue trace). (c) The neuron is then recorded as the monkey actively moves its head; however, the head velocity generated by the monkey (red arrow in schema) is experimentally cancelled by simultaneously rotating the monkey in the opposite direction (blue arrow in schema). Consequently, the head moves relative to the body but not to space. As a result, an efference copy signal is present and the neck proprioceptors are activated, but vestibular afferent input is greatly reduced. The neuron's response shows a marked inhibition, in excellent correspondence to that predicted from the difference in response during passive (a) vs. active (b) head movements (black superimposed trace; modified with permission from Roy & Cullen 2004). (d) Schematic to explain the selective elimination of vestibular sensitivity to active head-on-body rotations. Vestibular signals that arise from self-generated head movements are inhibited by a mechanism that compares the brain's internal prediction of the sensory consequences with the actual sensory feedback. Accordingly, during active movements of the head on body, a cancellation signal is gated into the vestibular nuclei only in conditions where the activation of neck proprioceptors matches that expected on the basis of the neck motor command.
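The gating scheme of Figure 6d can be caricatured in a few lines of code. This is a conceptual sketch, not a model of the real neural computation: quantities stand for scalar head velocities, the match criterion is a hard threshold, and all names are invented.

```python
def vn_output(vestibular, efference_copy, proprioception, tol=0.1):
    """Sketch of Figure 6d: the expected sensory consequence of the neck
    motor command (efference_copy) is subtracted from the vestibular
    signal only when actual proprioceptive feedback matches that
    expectation. All quantities are scalar head velocities."""
    match = abs(proprioception - efference_copy) <= tol
    cancellation = efference_copy if match else 0.0
    return vestibular - cancellation

# Passive whole-body rotation: no neck command -> exafference transmitted.
print(vn_output(vestibular=1.0, efference_copy=0.0, proprioception=0.0))  # 1.0

# Active head-on-body movement: command, proprioception, and vestibular
# input agree -> reafference suppressed (Figure 6b).
print(vn_output(vestibular=1.0, efference_copy=1.0, proprioception=1.0))  # 0.0

# Combined active + passive motion (Figure 5c): only the active component
# is cancelled; the passive component is transmitted.
print(vn_output(vestibular=2.0, efference_copy=1.0, proprioception=1.0))  # 1.0

# Active movement experimentally nulled by counter-rotation (Figure 6c):
# the head moves on the body (proprioceptive match) but is stationary in
# space -> the cancellation signal alone drives the cell, which is
# inhibited, as observed.
print(vn_output(vestibular=0.0, efference_copy=1.0, proprioception=1.0))  # -1.0
```

Note how the last case reproduces the qualitative result of the counter-rotation experiment: with the vestibular input nulled, the gated cancellation signal is unmasked as a frank inhibition.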
The finding that vestibular reafference is suppressed early in sensory processing has clear analogies with other sensory systems, most notably the electrosensory system of the weakly electric fish (Bell 1981, Bastian 1999, Mohr et al. 2003). This is not unexpected, because both systems presumably evolved from the lateral line (Romer & Parsons 1977). Considerable evidence from work in electric fish demonstrates that cerebellum-like electrosensory lobes play a key role in the attenuation of sensory responses to self-generated stimulation (Bell et al. 1999, Mohr et al. 2003, Sawtell et al. 2007). Consistent with this idea, fMRI studies have suggested that the cerebellum plays a similar role in the suppression of tactile stimulation during self-produced tickle (Blakemore et al. 1998, 1999a,b). Identifying the neural representation of the cancellation signal for vestibular reafference promises to be an interesting area of investigation, and the cerebellum is a likely site. However, perhaps an even more interesting question is, how does the brain perform the temporal/spatial comparison between proprioceptive inputs and motor commands that is required to cancel reafference? This is a critical point, given that the ability to attenuate incoming vestibular afferent signals depends on this comparison. Yet not only does peripheral feedback from the movement lag the descending motor command, but it also reflects the spatial complexity of the neck motor system.
The ability to distinguish actively generated and passive stimuli is not a general feature of all early vestibular processing. Position-vestibular-pause (PVP) neurons constitute the middle leg of the three-neuron arc that generates the VOR and thus are both sensory and premotor neurons. Unlike VO cells, PVP neurons code head velocity in a manner that depends exclusively on the subject's current gaze strategy. Specifically, vestibular inputs arising from active and passive head movements are similarly encoded, as long as the goal is to stabilize gaze (Roy & Cullen 1998, 2002, 2003). In contrast, when the goal is to redirect gaze (e.g., during orienting gaze shifts), neuronal responses to active head movements are suppressed. This finding is logically consistent with the role of these neurons in stabilizing gaze: because eye movements compensatory to head movement would be counterproductive during gaze shifts, the VOR is significantly suppressed (see discussion by Cullen et al. 2004). Also consistent with the proposal that these neurons process vestibular inputs in a manner that depends on the current gaze strategy is the finding that their rotational head movement sensitivity depends on viewing distance (Chen-Huang & McCrea 1999). This is because larger rotational VOR gains are necessary to stabilize near vs. far targets (as a result of the differences in the translations of the target relative to the eye).
In summary, whereas behaviorally dependent processing of vestibular inputs is a general feature of early vestibular areas, the ability to distinguish actively generated and passive head movements is specific to a distinct population of neurons. The functional significance of this ability to selectively suppress vestibular inputs that arise from self-generated movements is considered next.

Functional Implications: Consequences for Motor Control and Spatial Orientation
The differential processing of vestibular information during active vs. passive head movements is essential for ensuring accurate motor control. This point can be easily appreciated by considering that many of the same neurons that distinguish actively generated from passive head movements control the vestibulocollic reflex (VCR) via their projections to the cervical segments of the spinal cord (Boyle 1993, Boyle et al. 1996, Gdowski & McCrea 1999). The function of the VCR is to assist stabilization of the head in space via activation of the neck musculature during head motion. In situations where it is helpful to stabilize head position in space, the compensatory head movements produced by this reflex are obviously beneficial. Yet, when the behavioral goal is to make an active head movement, the vestibular drive to the reflex pathway would command an inappropriate head movement in the direction opposite to the intended goal. Thus, it is important that the VN neurons that control the VCR are less responsive during active head movements. Furthermore, because these neurons continue to reliably encode passive head-on-body rotations that occur during the execution of voluntary movements (McCrea et al. 1999, Roy & Cullen 2001), they can selectively adjust postural tone in response to any head movements that the brain does not expect. This selectivity is fundamental: recovering from tripping over an obstacle while walking or running requires a selective but robust postural response to the unexpected vestibular stimulation.
The same VN neurons that distinguish actively generated from passive head movements are also reciprocally interconnected with the fastigial nucleus and nodulus/uvula of the cerebellum. As detailed in the previous section, this same network (VN, FN, NU) makes significant contributions to the computation of inertial motion (Angelaki & Hess 1995, Wearne et al. 1998). Results from a preliminary report show that neurons in the rostral fastigial nucleus also distinguish actively generated from passive head rotations (Brooks & Cullen 2007). This finding suggests that FN neurons do not compute an estimate of self-motion during active movements but rather use multimodal information to compute an exafference signal (i.e., motion applied by the outside world). Because the rostral FN is generally thought to be involved in vestibulo-spinal control, this processing is most likely essential for the regulation of gait and posture.
During self-motion, the ability to distinguish between actively generated and passively applied head movements is not only important for shaping motor commands but also critical for ensuring perceptual stability (reviewed in Cullen 2004). Notably, Roy & Cullen (2001) asked whether the head movement-related responses of VN neurons might be attenuated not only during active head movements, but also during a more cognitively demanding, less natural, self-motion task. Single-unit recordings were made in the VN of monkeys while they controlled a steering wheel to actively rotate their heads and bodies together through space (Figure 7). In contrast to what was observed during active head-on-body movements, all second-order vestibular neurons continued to respond robustly to angular head velocity during these self-generated rotations. Although this result further emphasizes the important role that movement commands and proprioceptive signals play in shaping the responses of secondary vestibular neurons (i.e., during

Figure 7
Example of a VN neuron's response to voluntary combined head-body motion. Head-restrained monkeys manually controlled a steering wheel to rotate the vestibular turntable relative to space. Their goal was to align a turntable-fixed laser target (T_table) with a computer-controlled target (T_goal). The example neuron is typical in that its modulation was well predicted by its response to passive head movement. Modified with permission from Roy & Cullen (2001).
natural orienting movements), further training to control movement by steering might ultimately have resulted in the suppression of vestibular responses. For example, after extensive flight training with a particular aircraft, it is common for pilots to make comments such as "the aircraft began to feel like an extension of my limbs." Perhaps this sensation occurs once the brain has built an accurate internal model of the vehicle being driven and in turn is capable of canceling the sensory consequences of motion that result from the manual steering of the aircraft (or vestibular chair). Future studies of motor learning during self-motion tasks will be required to address this proposal.
These results might also be relevant to the generation of the properties of head direction cells. As summarized in the previous section, the spatial tuning of these cells is currently thought to be created through online integration of the animal's angular head velocity, generally assumed to arise from the vestibular nuclei (reviewed in Brown et al. 2002, Taube 2007). Results from VN studies comparing the coding of active vs. passive head movements, however, remain to be incorporated into models of how heading direction is computed. Indeed, one apparent contradiction between these two lines of research is the finding that head direction neurons actually respond far more robustly to active than to passive head rotations (Zugaro et al. 2002, Stackman et al. 2003, Bassett et al. 2005). Accordingly, the construction of an accurate internal representation of head direction for these neurons appears to require the integration of multimodal signals (proprioceptive, motor efference copy, and optic field flow) with vestibular inputs.
In summary, the multimodal interactions outlined so far serve to mediate particular functions: (a) computation of inertial motion and (b) isolation of a vestibular exafference signal. However, multisensory interactions involving vestibular information are much more extensive and abundant throughout the brain. Although explicit coverage of this topic is beyond the scope of this review, in the next section we touch on a fundamental concept that is relevant to these multisensory interactions: the concept of reference frames.

REFERENCE FRAMES FOR CODING VESTIBULAR SIGNALS
A reference frame can be defined as the particular perspective from which an observation of a spatial variable (e.g., position, velocity) is made. All sensorimotor systems that require the encoding or decoding of spatial information must face the issue of reference frames (for reviews, see Cohen & Andersen 2002, Pouget & Snyder 2000). Because the otolith organs and semicircular canals are fixed inside the head, both linear acceleration and angular velocity are initially encoded in a head-centered reference frame by the primary receptors (similar to auditory information but unlike visual information, which is encoded in an eye-centered frame). This is fine for controlling eye movements through the VOR because the eyes are also locked in the head. However, because the head can adopt almost any position relative to the body or the world, there is a need to transform vestibular signals into reference frames relevant to the behavior being controlled.
Furthermore, vestibular signals in the brain become strongly multisensory. Investigators have traditionally thought that sensory information from disparate sources (e.g., visual and auditory or vestibular) needs to be brought into a common frame of reference before it can be combined in a useful way (Cohen & Andersen 2002), although this assumption has recently been challenged (Deneve et al. 2001, Avillac et al. 2005). Next we discuss which reference frames are used to code vestibular signals, i.e., whether vestibular information remains invariant when expressed in eye-, head-, or body-fixed reference frames. To date, this question has been addressed mainly in two ways: (a) head- vs. body-centered reference frames have been examined in the brainstem and vestibulo-cerebellum, and (b) head- vs. eye-centered reference frames have been studied in extrastriate visual cortex.

Head- vs. Body-Centered Reference Frames: Vestibular/Neck Proprioceptive Interactions
Although the vestibular system alone may be sufficient to compute the position and motion of the head, several daily functions, including the maintenance of posture and balance and the perception of self-motion, require knowledge of body position, orientation, and movement. By combining vestibular signals, which encode motion in a head-centered frame, with neck proprioceptive information that signals the static position of the head relative to the body, a coordinate transformation could take place to convert motion signals into a body-centered reference frame.
To test this, one must measure the spatial tuning of central neurons while the motion of the head and body are dissociated, e.g., during motion along different directions (defined relative to the body) while the head is fixed at different static positions relative to the trunk (Figure 8a). A body-centered reference frame predicts that spatial tuning should be independent of the change in head position, such that the three tuning curves superimpose. In contrast, if a cell detects motion in a head-centered reference frame, its preferred movement direction should systematically shift to the left or to the right to reflect the shifted direction of motion in a head reference frame. Figure 8b and c show tuning curves from two representative neurons (Shaikh et al. 2004). For the cell in Figure 8b, the directions of maximum and minimum response gain (0° and 90° motion directions, respectively) were the same for all three head-on-body positions (blue, orange, and red lines superimpose), indicating a body-fixed reference frame. For the other cell (Figure 8c), the directions of maximum and minimum response gain shifted for the three head-on-body positions, such that they remained fixed relative to the head.

Figure 8
Head- vs. body-centered reference frames. (a) Schematic of the experimental manipulation. Head and body reference frames were dissociated by systematically varying both the direction of translation in the horizontal plane (0°, 30°, 60°, 90°, 120°, 150°, and 180°, defined relative to the body) and the static orientation of the head relative to the trunk (three different positions were used: straight ahead and 30° rotated to the left/right). (b, c) Examples of cell response gain plotted as a function of motion direction. For a neuron coding motion in a body-centered reference frame, the spatial tuning curves for the three head-in-trunk positions superimpose (b). For a neuron coding motion in a head-centered frame, the three tuning curves are shifted accordingly (c). Blue, head 30° left; orange, head straight ahead; red, head 30° right. Data from Shaikh et al. (2004).
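The logic of this dissociation test can be sketched in a few lines of code. The cosine tuning, sampling grid, and function name below are illustrative assumptions rather than the analysis of Shaikh et al. (2004); the point is only that a head-centered cell's peak tracks the head offset while a body-centered cell's does not.

```python
import numpy as np

def tuning(direction_body_deg, head_offset_deg, frame, preferred_deg=90.0):
    """Cosine-tuned response of a model neuron to translation.

    direction_body_deg: motion direction defined relative to the body (deg)
    head_offset_deg: static head-on-trunk position (deg)
    frame: 'body' or 'head', the hypothetical reference frame of the cell
    """
    direction = np.asarray(direction_body_deg, dtype=float)
    if frame == 'head':
        # a head-centered cell sees the direction re-expressed in head coordinates
        direction = direction - head_offset_deg
    return np.cos(np.radians(direction - preferred_deg))

directions = np.arange(0, 181, 30)          # motion directions re: body (deg)
for offset in (-30, 0, 30):                 # head 30° left, straight, 30° right
    body_peak = directions[np.argmax(tuning(directions, offset, 'body'))]
    head_peak = directions[np.argmax(tuning(directions, offset, 'head'))]
    print(offset, body_peak, head_peak)
# The body-centered peak stays at 90° for all offsets, whereas the
# head-centered peak shifts to 60°, 90°, and 120°, tracking the head.
```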
Most neurons in the rostral VN were consistent with the spatial shift expected from a head-centered reference frame, but this was not the case for the rostral FN (Shaikh et al. 2004). In fact, many cells showed intermediate properties: Their tuning curves shifted through an angle that was in between, suggesting an intermediate frame or a mixture of reference frames. Kleine et al. (2004) reported similar findings regarding a mixture of head- and body-centered reference frames in the rostral FN using different head-on-trunk positions during rotation, as illustrated with an example cell that codes motion of the body in Figure 9. A body-centered reference frame in the FN might be beneficial because the rostral FN represents a main output of the anterior vermis (Voogd 1989) and nodulus/uvula (Voogd et al. 1996), both of which have been implicated in vestibular/proprioceptive interactions for limb and postural control.
To date, reference frame questions in the vestibular system have been studied using passive movements. Do the same coordinate transformations also characterize responses during active movements? Recent findings that FN cell responses are greatly attenuated during active head rotation (Brooks & Cullen 2007) suggest that the same computations may not be required during active movements. A similar logic might apply to higher levels of processing; active and passive information might be processed in ways appropriate to their functional roles. Supporting this idea, recent studies have shown that cortical neurons differentially process active and passive movements (e.g., Klam & Graf 2006, Fukushima et al. 2007).

Head- vs. Eye-Centered Reference Frames: Vestibular/Visual Interactions
Another example of multisensory vestibular function that faces the problem of reference frames is that of visual/vestibular interactions. The dorsal subdivision of the medial superior temporal area (MSTd) is one of the likely candidates to mediate the integration of visual and vestibular signals for heading (i.e., translational motion) perception (Duffy 1998; Bremmer et al. 1999; Page & Duffy 2003; Gu et al. 2006a, 2007; Takahashi et al. 2007). It is commonly thought that multisensory neural populations should represent different sensory signals in a common reference frame (Stein & Meredith 1993, Cohen & Andersen 2002). Thus, it might be expected that both visual and vestibular signals in MSTd should code heading in a head-centered reference frame. This would enable neurons to encode a particular motion direction regardless of the sensory modality or eye position.
This hypothesis was recently tested and refuted (Fetsch et al. 2007). Head- vs. eye-centered reference frames were dissociated by manipulating static eye position while quantifying spatial tuning curves, constructed separately for translational inertial motion in the absence of visual motion (vestibular condition) and for optic flow simulating translational motion (visual condition). As shown with an example cell in Figure 10, the reference frame for vestibular signals was close to head-centered but, at the population level, shifted slightly toward eye-centered (Fetsch et al. 2007). In contrast, visual signals continued to be represented in a retinal reference frame. These results contradict the conventional wisdom in two respects. First, reference frames for visual and vestibular heading signals in MSTd remain distinct, although clear evidence suggests that these neurons mediate multisensory cue integration (Gu et al. 2006b). Thus, sensory signals need not be expressed in a common frame of reference for integration to occur. Second, rather than the visual signals shifting toward a head-centered representation, there was a modest shift of vestibular tuning toward an eye-centered representation. Similar to the results in the cerebellum, several MSTd neurons showed partial shifts and could thus be considered to represent motion direction in an intermediate frame of reference (Fetsch et al. 2007). In summary, these results are not consistent with the hypothesis that multisensory areas use a common reference frame to encode visual and vestibular signals. Similar conclusions have also been reached in other cortical and subcortical areas. For example, unlike visual receptive fields, tactile receptive fields in the ventral intraparietal (VIP) area are purely head-centered (Avillac et al. 2005). In addition, visual and auditory receptive fields in VIP, as well as in the lateral and medial intraparietal areas (LIP and MIP), exhibit a continuum of reference
frames from head-centered to eye-centered (Mullette-Gillman et al. 2005, Schlack et al. 2005). Investigators traditionally thought that intermediate frames represent a middle stage in the process of transforming signals between different reference frames (Jay & Sparks 1987, Stricanne et al. 1996, Andersen et al. 1997). Alternatively, broadly distributed and/or intermediate reference frames may be computationally useful. According to this latter view, intermediate frames may arise naturally when a multimodal brain area makes recurrent connections with unimodal areas that encode space in their native reference frames (Pouget et al. 2002). Using a recurrent neural network architecture, Pouget and colleagues have shown that a multisensory layer expressing multiple reference frames, combined with an eye position signal, can optimally mediate multisensory integration in the presence of noise (Deneve et al. 2001, Deneve & Pouget 2004). This modeling framework predicts a robust relationship between the relative strength of visual and nonvisual signals and the respective reference frames in a particular brain area (Avillac et al. 2005): the stronger a sensory signal, the more dominant its native reference frame. Accordingly, in MSTd, where visual responses are stronger than vestibular responses, an eye-centered reference frame tends to dominate (Fetsch et al. 2007). Future studies of visual/vestibular interactions in other brain areas will be useful in further testing this framework.
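One simple way to place a neuron on this head-to-eye continuum is a shift ratio: the displacement of the preferred heading divided by the change in eye position, which is 0 for a purely head-centered cell and 1 for a purely eye-centered cell. The sketch below uses made-up cosine tuning to illustrate the metric; it is not the exact analysis pipeline of any study cited here, although the vector-sum definition of preferred heading follows the convention described in the Figure 10 caption.

```python
import numpy as np

def preferred_heading(azimuths_deg, rates):
    """Preferred direction as the vector sum of responses (cf. Figure 10)."""
    az = np.radians(azimuths_deg)
    return np.degrees(np.arctan2(np.sum(rates * np.sin(az)),
                                 np.sum(rates * np.cos(az))))

def shift_ratio(pref_at_minus20, pref_at_plus20, delta_eye_deg=40.0):
    """0 -> head-centered (no shift); 1 -> eye-centered (shifts with the eye)."""
    return (pref_at_plus20 - pref_at_minus20) / delta_eye_deg

az = np.arange(0, 360, 45)                       # heading azimuths (deg)
# toy fully eye-centered cell: tuning peak moves with eye position
rates_m20 = 1 + np.cos(np.radians(az - (90 - 20)))   # eye at -20°
rates_p20 = 1 + np.cos(np.radians(az - (90 + 20)))   # eye at +20°
r = shift_ratio(preferred_heading(az, rates_m20),
                preferred_heading(az, rates_p20))
print(round(r, 2))  # ~1.0 for this fully eye-centered toy cell
```

Intermediate frames, as reported for MSTd vestibular tuning, would yield ratios between 0 and 1.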

SUMMARY POINTS
1. The vestibular system represents our sixth sense. Because of its diverse, multimodal functions, computationally intriguing transformations of vestibular information occur as early as the first central stage of processing, in the brainstem vestibular nuclei and vestibulo-cerebellum.
2. Within a network consisting of the brainstem vestibular nuclei (VN), the most medial of the deep cerebellar nuclei (fastigial, FN), and the most posterior lobules (X and IX) of the cerebellar vermis, a critical computation of inertial motion (i.e., how our head moves in space) takes place. Nonlinear interactions between signals from the semicircular canals and otolith organs give the central vestibular system the ability to function as an inertial sensor and contribute critically to both navigation and spatial orientation.
3. Neurons at the first central stage of vestibular processing (VN and FN) can also distinguish between self-generated and passive movements. During active movements, a cancellation signal is generated when the activation of proprioceptors matches the motor-generated expectation. This mechanism eliminates self-generated movements from subsequent computations of orientation and postural control.
4. The ability to distinguish actively generated and passive stimuli is not a general feature of all early central vestibular processing; central vestibular neurons process vestibular information in a manner that is consistent with their functional role. For example, central neurons controlling gaze process vestibular information in a behaviorally dependent manner according to the current gaze strategy.
5. The need for multisensory integration with both proprioceptive and visual signals necessitates that vestibular information be represented in widely different reference frames within the central nervous system. Here we summarized two such examples, in which vestibular information has been at least partially transformed from a head-fixed to a body-centered (cerebellar FN) or an eye-centered (extrastriate area MSTd) reference frame.

FUTURE ISSUES
1. Most of the vestibular signal-processing studies have concentrated on the brainstem vestibular nuclei and vestibulo-cerebellum, yet vestibular information is also heavily present in the reticular formation, spinal cord, thalamus, and cortex. What are the properties and functions of vestibular information in these diverse brain areas?
2. What are the exact relationships between neurons that discriminate translation from tilt and those that have velocity storage properties?
3. How does an ω_EV signal generate head direction cell activity and contribute to navigation? Although ω_EH signals have been isolated in single-cell responses that selectively encode translation, an ω_EV signal has yet to be identified in single-cell activity.
4. The distinction between passive and active head movements in neural activity has so far been tested only during rotation. Whether neurons respond differently during passive and active translational movements has yet to be explored.
5. Which information is encoded by cortical areas that contribute to the perception of self-motion? Can these areas distinguish actively generated from passive head movements? If so, which mechanisms underlie the computation, and what is the functional significance of the information that is ultimately encoded?
6. Prior studies describing the transformation of vestibular information from a head-centered to other reference frames (i.e., body-centered and eye-centered) considered only passive head movement stimuli. Is vestibular information encoded in the same reference frames during actively generated movements? Or, alternatively, is the necessary transformation of vestibular information behavior dependent?
7. The only study of reference frames involving convergent visual and vestibular signals has been in extrastriate visual cortex (area MSTd). However, some form of visual/vestibular convergence (studied mainly by using optokinetic stimulation at low frequencies) already occurs as early as the vestibular nuclei and vestibulo-cerebellum. Which reference frames are used in these interactions? Have vestibular signals been transformed at least partially into an eye-centered reference frame (as in MSTd)? Or, alternatively, are optokinetic signals coded in a head-centered reference frame?
Figure 1
Schematic of the two computational problems in inertial motion detection. (a) The rotation problem involves calculation of head angular velocity, ω, relative to the world (as defined by gravity, g_s). (b) The linear acceleration problem involves the discrimination of net gravitoinertial acceleration, α, into translational, t, and gravitational, g, acceleration. (c) Schematic of the computational solution. Angular velocity, ω, from the semicircular canals must be combined with gravitational information to be decomposed into two components, one parallel to gravity, ω_EV, and another perpendicular to gravity, ω_EH. In parallel, net linear acceleration, α, from the otolith organs must be combined with the temporal integral of ω_EH (∫ω_EH), such that it is separated into translational, t, and gravitational, g, acceleration. Replotted with permission from Yakusheva et al. (2007).
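The decomposition described in panel (c) can be written compactly in vector notation. This is one common formalization, stated here as an illustration; the hat denotes a unit vector and all quantities are expressed in head-fixed coordinates:

```latex
\hat{g} = \frac{g}{\lVert g \rVert}, \qquad
\omega_{EV} = (\omega \cdot \hat{g})\,\hat{g}, \qquad
\omega_{EH} = \omega - \omega_{EV}
```

so that the internal gravity estimate can be updated from the canal signal and subtracted from the net otolith signal:

```latex
\dot{g} = -\,\omega \times g = -\,\omega_{EH} \times g, \qquad
t = \alpha - g
```

Because ω_EV is parallel to g, its cross product with g vanishes; only the earth-horizontal component ω_EH changes the head's orientation relative to gravity, which is why its temporal integral appears in panel (c).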

Figure 3
Summary of how well neurons in different subcortical areas discriminate translational from gravitational acceleration. Z-transformed partial correlation coefficients for fits of each cell's responses with a translation-coding model and an afferent-like model. Data from nodulus/uvula (NU) Purkinje cells (orange circles), the rostral fastigial nucleus (FN, purple up triangles), and the vestibular nuclei (VN, green down triangles) are compared with primary otolith afferents (AFF, blue squares). Dashed lines divide the plots into an upper-left area corresponding to cell responses that were fit significantly better (p < 0.01) by the translation-coding model; a lower-right area including neurons that were fit significantly better by the afferent-like model; and an intermediate area indicating cells that were not fit significantly better by either model. Modified and replotted with permission from Angelaki et al. (2004) and Yakusheva et al. (2007).

Figure 9
Head- vs. body-centered reference frames. (a) Schematic of the experimental protocol. (b) Rotation responses with sinusoidal fit. (c) Tuning curves for neuronal gain and phase. Vertical dotted lines (and numbers) illustrate the maximum response direction [which changes from pitch (trunk left) to RALP (trunk center) to roll (trunk right)]. RALP: right anterior/left posterior canal axis orientation; LARP: left anterior/right posterior canal axis orientation. Replotted with permission from Kleine et al. (2004).
Figure 10
Three-dimensional heading tuning functions of two example MSTd neurons. (a, b) Cell 1 was tested in the vestibular condition and cell 2 in the visual condition. Tuning was measured at three different eye positions: −20° (top), 0° (middle), and +20° (bottom). Mean firing rate (color contour plots) is plotted as a function of the heading trajectory in spherical coordinates, with the azimuth and elevation of the heading vector represented on the abscissa and ordinate, respectively. For illustration purposes, small white circles are positioned at the preferred heading for each tuning function, computed as a vector sum of responses around the sphere. (c) Conventions for defining the real (vestibular) or simulated (visual) motion directions of three-dimensional heading stimuli. Replotted with permission from Fetsch et al. (2007).