Lateral habenula as a source of negative reward signals in dopamine neurons

Midbrain dopamine neurons are key components of the brain’s reward system, which is thought to guide reward-seeking behaviours. Although recent studies have shown how dopamine neurons respond to rewards and sensory stimuli predicting reward, it is unclear which parts of the brain provide dopamine neurons with signals necessary for these actions. Here we show that the primate lateral habenula, part of the structure called the epithalamus, is a major candidate for a source of negative reward-related signals in dopamine neurons. We recorded the activity of habenula neurons and dopamine neurons while rhesus monkeys were performing a visually guided saccade task with positionally biased reward outcomes. Many habenula neurons were excited by a no-reward-predicting target and inhibited by a reward-predicting target. In contrast, dopamine neurons were excited and inhibited by reward-predicting and no-reward-predicting targets, respectively. Each time the rewarded and unrewarded positions were reversed, both habenula and dopamine neurons reversed their responses as the bias in saccade latency reversed. In unrewarded trials, the excitation of habenula neurons started earlier than the inhibition of dopamine neurons. Furthermore, weak electrical stimulation of the lateral habenula elicited strong inhibitions in dopamine neurons. These results suggest that the inhibitory input from the lateral habenula plays an important role in determining the reward-related activity of dopamine neurons.

Dopamine neurons in the substantia nigra pars compacta respond to rewards or sensory stimuli that reliably predict the rewards. The response is positive (an increase in activity) or negative (a decrease in activity) if the value of the reward is higher or lower, respectively, than predicted 1,5,6,8 . However, it is unclear which brain areas provide dopamine neurons with reward-related information. Among many brain areas that project to the substantia nigra pars compacta 1 , we decided to investigate the lateral habenula, which is known to project to the substantia nigra pars compacta 9 (Supplementary Fig. 2) with inhibitory effects on dopamine neurons 10 . The lateral habenula has been implicated in anxiety 11 , stress 12,13 , pain 14 , avoidance learning 15,16 , attention 17 , human reward processing 18,19 , and psychosis 20,21 .
To examine the role of the lateral habenula in reward processing, we compared the activity of habenula neurons and dopamine neurons in two monkeys (L and E) performing a visually guided saccade task with positionally biased reward outcomes (hereafter called 'reward-biased visual saccade task') 7 (Fig. 1a). The target was presented randomly on the right or left and the monkeys had to make a saccade to it immediately. Correct saccades were signalled by tone stimuli 200 ms after the saccades. Saccades to one position were rewarded, whereas saccades to the other position were not rewarded. Thus, the target instructed the saccade direction and indicated the reward contingency (reward or no-reward). In rewarded trials, a liquid reward was delivered which started simultaneously with the tone stimulus. The position-reward contingency was fixed for 24 consecutive trials and was then reversed abruptly for the next block with no external instruction. In this task, the saccade latencies were reliably related to reward contingency. Figure 1b shows the distribution of the saccade latencies of monkey L. Both monkeys showed significantly shorter saccade latencies in rewarded trials than in unrewarded trials (Supplementary Note A).
Single-cell activity of 49 lateral habenula neurons (37 in monkey L and 12 in monkey E) was recorded. These neurons were estimated to be in the lateral habenula using MRI and their localization was confirmed histologically (see Supplementary Fig. 3 and Supplementary Note B). Figure 2a shows the activity of a single neuron recorded in the left habenula while the monkey was performing the reward-biased visual saccade task. The activity increased phasically after the appearance of the saccade target indicating the absence of upcoming reward, and decreased after the appearance of the target indicating the presence of upcoming reward. The increase and decrease depended on the reward contingency, regardless of target position.
Many of the 49 lateral habenula neurons behaved similarly to the sample neuron shown in Fig. 2a. To evaluate the effect of reward contingency and target position on the response to the saccade target (hereafter called post-target response), we performed a two-way analysis of variance (ANOVA) for each neuron. Of the 49 neurons, 43 showed a significant main effect of reward contingency, and 10 neurons showed a significant main effect of target position (P < 0.01, two-way ANOVA). As shown in Fig. 2b, the reward index (see Methods) was predominantly larger than the position index (P < 10⁻⁸, Wilcoxon signed-rank test). Even for the 10 neurons that were affected by target position (green dots in Fig. 2b), the position index was smaller than the reward index. Thus, the post-target response was mainly influenced by the reward contingency rather than target position. The post-target responses of the 43 neurons were predominantly positive (increase in activity) in unrewarded trials and negative (decrease in activity) in rewarded trials (Table 1). This is also evident in the scatter plot in Fig. 2c, which shows that the median post-target response was significantly larger than zero in unrewarded trials and smaller than zero in rewarded trials (P < 0.01, Wilcoxon signed-rank test). Control experiments showed that the post-target response was a visual response, not a saccadic response (see Supplementary Fig. 4 and Supplementary Note C).
Figure 1 | Behavioural task and monkey's performance. a, Sequence of events in the one-direction-rewarded version of the visually guided saccade task (reward-biased visual saccade task). The position-reward contingency was fixed in a block of 24 trials and was reversed in the next block. See text for details of tone and reward. b, Distribution of saccade latencies in rewarded trials (red) and in unrewarded trials (blue) (data from monkey L). Saccades in the first trials after the changes in position-reward contingency have been excluded.
These properties of habenula neuronal activity were similar to those of dopamine neurons, but in the opposite manner (Fig. 3). We recorded the activity of 62 dopamine neurons in the same monkeys (42 in monkey L and 20 in monkey E). Their activity increased and decreased phasically in response to the saccade target in rewarded and unrewarded trials, respectively, regardless of the target direction (sample neuron activity in Supplementary Fig. 5a, and the average activity in Fig. 3b and Supplementary Fig. 6b). The responses were visual, not saccade-related, confirming a previous report 6 .
Both habenula and dopamine neurons changed their activity similarly after the position-reward contingency was reversed (Fig. 3c and d). After a reward-to-no-reward transition (blue curves), the post-target response in habenula neurons (Fig. 3c, top) increased rapidly (from negative to positive values) whereas the post-target response in dopamine neurons (Fig. 3d, top) decreased rapidly (from positive to negative values). After a no-reward-to-reward transition (red curves), the post-target response in habenula neurons decreased rapidly whereas the post-target response in dopamine neurons increased rapidly. The post-target responses in both habenula and dopamine neurons were then stable for the remainder of the block. The development of the post-target responses in habenula and dopamine neurons (Fig. 3c and d, top) was paralleled by the changes in saccade latency (Fig. 3c and d, bottom), although the changes in the neuronal responses were quicker than the changes in saccade latency, especially after a reward-to-no-reward transition.
Habenula neurons also responded differentially to the delivery and omission of reward, as shown in the right half of the raster/spike density functions (SDFs) in Figs 2a and 3a. The responses (hereafter called reward on-off responses) were particularly strong in the first trials after the reversal of the position-reward contingency (dotted lines in the SDFs of Figs 2a and 3a), and appeared as a phasic increase and decrease, respectively, after the omission and delivery of reward. This is evident in the scatter plot in Fig. 2d, which shows that the median reward on-off response was significantly larger than zero in unrewarded trials and smaller than zero in rewarded trials (P < 0.01, Wilcoxon signed-rank test). The reward on-off response then declined rapidly (Fig. 3c, middle). Dopamine neurons also showed reward on-off responses in the first trials (dotted lines in the SDFs of Fig. 3b, and Supplementary Figs 5a and 6b) that declined rapidly after the transitions (Fig. 3d, middle), but in the directions opposite to habenula neurons. Thus, the responses of both habenula and dopamine neurons shifted from the reward phase to the target phase as the prediction was established. These features suggest that habenula neurons encode reward prediction error, as dopamine neurons are thought to do 1 . In support of this hypothesis, when the reward bias was smaller (that is, both targets were associated with rewards, but with different amounts), both habenula neurons and dopamine neurons responded to the targets differentially, but their responses were weaker (Supplementary Fig. 7 and Supplementary Note D). However, the responses of habenula neurons may not represent pure prediction error, as the reward on-off responses, albeit small, remained in habenula neurons (Fig. 3a and Fig. 3c, middle); the difference in activity between unrewarded and rewarded trials remained significant (P = 0.016, Wilcoxon signed-rank test).
These results raise the possibility that habenula neuronal activity and dopamine neuronal activity are causally related. We found that, in unrewarded trials, the excitatory response of habenula neurons started earlier than the inhibitory response of dopamine neurons; in rewarded trials, however, the excitatory response of dopamine neurons started earlier than the inhibitory response of habenula neurons (Supplementary Note E). Thus, the excitation of habenula neurons could inhibit dopamine neurons in unrewarded trials, but inhibiting habenula neurons could not initiate the excitation of dopamine neurons in rewarded trials.
To test the hypothesis that habenula neurons affect dopamine reward responses, we delivered electrical stimulation (single biphasic pulse with 0.2 ms per phase duration, 100 µA) to the lateral habenula during recording from 22 of the 62 dopamine neurons (15 in monkey L and 7 in monkey E). The dopamine neuron in Fig. 4a was strongly inhibited by the stimulation in the habenula on the same side. The averaged activity of the 22 dopamine neurons showed a significant inhibition during a time window from 10 to 40 ms after the onset of stimulation (P < 0.05, Wilcoxon signed-rank test) (filled circles in Fig. 4b). Of the 22 dopamine neurons, 18 (82%) were significantly inhibited during this window (P < 0.01, Wilcoxon signed-rank test). Electrical stimulation of either the ipsilateral or contralateral habenula was effective (ipsilateral, 9/10; contralateral, 9/12). Significant inhibitions were obtained with weaker stimulation, but the effects were more robust on the ipsilateral side: 75% of the dopamine neurons tested were significantly inhibited with 40 µA (6/8; ipsilateral 4/4, contralateral 2/4); 50% with 20 µA (6/12; ipsilateral 5/5, contralateral 1/7). In contrast, electrical stimulation of the surrounding thalamic area (mediodorsal thalamus, MD) was ineffective even when the stimulation sites were only 1 mm away from the lateral habenula: 8 dopamine neurons tested showed neither inhibition nor excitation (P > 0.05, Wilcoxon signed-rank test) (open squares in Fig. 4b).
The magnitude of the inhibition induced by habenula stimulation varied among the dopamine neurons. We found that dopamine neurons that were inhibited more strongly by habenula stimulation tended to show stronger inhibitions in response to the no-reward-predicting target (r = 0.56, P < 0.01) (Fig. 4c). The result is consistent with the hypothesis that the input from the lateral habenula underlies the phasic inhibition of dopamine neurons in response to the no-reward-predicting target. In contrast, the phasic excitation of dopamine neurons in response to the reward-predicting target was not correlated with the habenula-stimulation-induced inhibition (Supplementary Note F).
Using a saccade task, we have shown that lateral habenula neurons and dopamine neurons responded, with opposite signs, to visual targets that indicated the presence and absence of upcoming reward as well as to the unexpected delivery and omission of reward. The post-target responses of the habenula and dopamine neurons changed similarly after the reversal of the position-reward contingency, but in opposite directions. These results, together with the response latency analysis and the inhibition of dopamine neurons by electrical stimulation of the lateral habenula, suggest that the lateral habenula is capable of producing the negative reward response of dopamine neurons.
Recent studies from our laboratory have suggested that the reward modulation of dopamine neurons plays a key role in motivational control of saccadic eye movement 22-24 . An underlying mechanism proposed from these studies is that the efficacy of cortico-caudate synapses carrying visuo-saccadic signals is enhanced or depressed depending on the concurrent increase or decrease, respectively, in dopaminergic inputs 23 . According to this scheme, the lateral habenula would also be involved in the motivational control of saccadic eye movements by inhibiting dopamine neurons (Supplementary Fig. 1). In fact, the no-reward-dependent increase in the post-target response of habenula neurons was associated with the prolongation of saccade latency. However, testing this hypothesis would require further experiments, including artificial activation or inactivation of the lateral habenula.
One interpretation of our results may be that the lateral habenula is involved in negative reward processing while dopamine neurons are involved in positive reward processing. This view provides an interesting parallel with opponent-process theories, which postulate opponent interactions between an appetitive system and an aversive system 25-27 . This hypothesis may be supported by previous studies showing that habenula neurons are activated while dopamine neurons are inhibited by aversive stimuli in anaesthetized rats 14,28 . However, the nature of dopamine neuron responses has been studied under a wide variety of situations, including pavlovian and operant procedures 29 . It remains to be determined whether habenula neurons show response patterns opposite to dopamine neurons in these situations as well.
Finally, it is unknown how lateral habenula neurons acquire the negative reward information. It might be provided by the inputs from the limbic areas 30 (Supplementary Fig. 2). The reward information might then be elaborated through the interplay among the lateral habenula, the basal ganglia, and monoaminergic (dopaminergic and serotonergic) systems. Our data suggest that the lateral habenula may play a pivotal role in the integrative function.

METHODS SUMMARY
Two adult rhesus monkeys (Macaca mulatta) were used for the experiments. All procedures for animal care and experimentation were approved by the Institute Animal Care and Use Committee and complied with the Public Health Service Policy on the humane care and use of laboratory animals. A plastic head holder and plastic recording chambers were fixed to the skull under general anaesthesia and sterile surgical conditions. Two search coils were surgically placed under the conjunctiva of the eyes for recording of eye movements. We trained the monkeys to perform a one-direction-rewarded version of a visually guided saccade task (reward-biased visual saccade task) 7 (Fig. 1a) and a control task (reward-biased memory saccade task) 6,22 (Supplementary Fig. 4a). Details of these tasks can be found in Methods. While the monkeys were performing these tasks, we recorded single-unit activity from the lateral habenula and dopamine neurons in the substantia nigra pars compacta using conventional electrophysiological techniques (Methods). We estimated the positions of the lateral habenula and substantia nigra pars compacta by MRI. After the end of the recording session in one monkey, we confirmed the recording sites histologically (Supplementary Fig. 3 and Supplementary Note B). To examine the effects of lateral habenula neurons on dopamine neurons, we electrically stimulated the lateral habenula while recording from dopamine neurons. Details of the localization of the lateral habenula, identification of dopamine neurons, electrical stimulation, and histological procedures can be found in Methods. We analysed saccade latency and neuronal responses for trials in which the monkeys performed correct saccades. We focused on two kinds of neuronal responses: (1) post-target response that occurred after the onset of the saccade target, and (2) reward on-off response that occurred after the time when a reward was or would have been delivered. Further analysis methods can be found in Methods.
Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.

METHODS
Behavioural task. Behavioural tasks were under the control of a QNX-based real-time experimentation data acquisition system (REX, Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health (LSR/NEI/NIH), Bethesda, Maryland). The monkeys sat in a primate chair, facing a frontoparallel screen 33 cm from the monkey's eyes in a sound-attenuated and electrically shielded room. Stimuli generated by an active matrix liquid crystal display projector (PJ550, ViewSonic) were rear-projected on the screen. The monkeys were trained to perform a one-direction-rewarded version of the visually guided saccade task 7 (Fig. 1a), which we call 'reward-biased visual saccade task'. A trial started when a small fixation spot appeared on the screen. After the monkeys maintained fixation on the spot for 1,200 ms, the fixation spot disappeared and a peripheral target appeared at either right or left, 20° (for monkey L) or 15° (for monkey E) from the fixation spot. The monkeys were required to make a saccade to the target within 500 ms. Correct and incorrect saccades were signalled by tone and beep stimuli, respectively, 200 ms after the saccades. Within a block of 24 trials, saccades to one fixed direction were rewarded with 0.3 ml of apple juice while saccades to the other direction were not rewarded. The position-reward contingency was reversed in the next block with no external instruction. Even in the unrewarded trials, the monkeys had to make a correct saccade; otherwise, the same trial was repeated. In rewarded trials, delivery of the liquid reward started simultaneously with the tone stimulus.
As a control, the monkeys were also trained to perform a reward-biased memory saccade task 6,22 (Supplementary Fig. 4a). After the monkeys maintained fixation for 1,200 ms, a peripheral target was presented for 100 ms at either right or left, 20° (for monkey L) or 15° (for monkey E) from the fixation spot. The monkeys had to maintain fixation while remembering the target position. After 1,000 ms (for monkey L) or 800 ms (for monkey E) delay, the fixation spot disappeared, and the monkeys were required to make a saccade to the remembered target position. Other procedures including the position-reward contingency were the same as the reward-biased visual saccade task.
Electrophysiology. One recording chamber was placed over the midline of the parietal cortex, tilted posteriorly by 38°, and was aimed at the habenula; the other recording chamber was placed over the fronto-parietal cortex, tilted laterally by 35°, and was aimed at the substantia nigra. Single-unit recordings and electrical stimulations were performed using tungsten electrodes (Frederick Haer) that were advanced by an oil-driven micro-manipulator (MO-97A, Narishige) or an electrically driven micro-manipulator (MicroStepper, LSR/NEI/NIH). The recording and stimulation sites were determined using a grid system, which allowed recordings at every 1 mm between penetrations. The electrode was introduced into the brain through a stainless steel guide tube, which was inserted into one of the grid holes and then to the brain via the dura. For finer mapping of neurons, we also used a complementary grid, which allowed electrode penetrations between the holes of the original grid. Single neurons were isolated on-line using custom voltage-time window discrimination software (MEX, LSR/NEI/NIH).
Localization of the lateral habenula. We estimated the position of the habenula by obtaining MRIs (4.7 T, Bruker).
We then recorded from neurons in and around the estimated habenula, and found that the firing patterns and spike shapes within the estimated habenula were distinctly different from neurons in the surrounding thalamic area (mediodorsal thalamus, MD). Presumed habenula neurons fired tonically with relatively high background rates (mean ± s.d. = 27.8 ± 14.5 spikes s⁻¹, n = 49). In contrast, presumed MD neurons exhibited irregular and bursty firing with lower background rates (mean ± s.d. = 7.5 ± 5.4 spikes s⁻¹, n = 33) and their action potentials were much broader than those of habenula neurons. Furthermore, most of the presumed habenula neurons, but none of the presumed MD neurons, were sensitive to reward outcome. Presumed habenula neurons were recorded at penetrations 1.5 or 2.0 mm from the midline with 1 or 2 penetrations separated anteroposteriorly; they were recorded at 2 holes (separated by 1 mm) at most in each hemisphere for a given grid. We made only two penetrations at 1.0 mm from the midline, but the recorded neurons were not sensitive to reward outcome; they were judged to be within the medial habenula. Recordings at penetrations 1 mm away from the presumed lateral habenula laterally or anteriorly yielded presumed MD neurons. Recordings at 1 mm posteriorly were different from others in that the first neuron in the subcortical structure was considerably deeper and neurons there may respond to visual stimuli but not reward outcome, suggesting that the electrodes were in the pretectum. Importantly, the characteristics of firing and the relation to reward outcome were distinctly different between the presumed habenula and the presumed MD or the pretectum, even when they were separated only by 0.5 mm or 1.0 mm. This was also true for the effect of electrical stimulation (see Fig. 4).
In the present study we isolated stable action potentials from 74 habenula neurons. For each of these neurons, we first examined its activity using the reward-biased visual saccade task without recording. If we found any task-related response on-line, we recorded the activity of the neuron using the task. If not, we did not examine the activity further. Of the 74 habenula neurons, 49 were regarded as task-related neurons. The activity of these 49 habenula neurons was recorded and comprised the sample used for the analysis. However, we cannot be certain that all neurons in the lateral habenula can be characterized as presented in this study. In particular, there may be different types of neurons in the deeper part of the lateral habenula that we have not explored fully.
Identification of dopamine neurons. We searched for dopamine neurons in and around the substantia nigra pars compacta. Dopamine neurons were identified by their irregular and tonic firing around 5 spikes s⁻¹ with broad spike potentials. In this experiment, we focused on dopamine neurons that responded to reward-predicting stimuli with a phasic excitation, and recorded from 62 dopamine neurons. Dopamine-like neurons that were not sensitive to reward-predicting stimuli were not examined further.
Electrical stimulation. In order to examine the effect of electrical stimulation in the lateral habenula on the activity of dopamine neurons, we first recorded single- or multi-unit activity in the habenula that was modulated by reward outcome in the reward-biased visual saccade task, and then used the electrode for electrical stimulation. We then recorded from a dopamine neuron that responded to reward-predicting stimuli with a phasic excitation from the substantia nigra chamber, and delivered a single current pulse (biphasic negative-positive pulse with 0.2 ms per phase duration) through the habenula electrode. The default setting of stimulation current was 100 µA.
If the dopamine neuron was inhibited by 100 µA stimulation, we also used 20 and/or 40 µA.
Data analysis. We defined the post-target response as the discharge rate during 150-350 ms after the target onset minus the background discharge rate before the target onset (500-0 ms before target onset). The reward on-off response was defined as the discharge rate during 250-700 ms after the onset of the tone stimulus (which was synchronized with reward onset if reward was present) minus the background discharge rate. These time windows for post-target and reward on-off responses were determined on the basis of the averaged activity of habenula neurons and that of dopamine neurons. Specifically, we set the time windows such that they include major parts of the excitatory and inhibitory responses of both habenula and dopamine neurons.
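The two response measures defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, spike times are assumed to be in milliseconds relative to target onset, and windows are treated as half-open intervals; none of these details are specified in the text.

```python
def firing_rate(spike_times_ms, start_ms, end_ms):
    """Mean discharge rate (spikes/s) in the half-open window [start_ms, end_ms)."""
    count = sum(1 for t in spike_times_ms if start_ms <= t < end_ms)
    return count / ((end_ms - start_ms) / 1000.0)


def background_rate(spike_times_ms):
    """Background rate in the 500 ms before target onset (time 0)."""
    return firing_rate(spike_times_ms, -500, 0)


def post_target_response(spike_times_ms):
    """Rate 150-350 ms after target onset minus the background rate."""
    return firing_rate(spike_times_ms, 150, 350) - background_rate(spike_times_ms)


def reward_on_off_response(spike_times_ms, tone_onset_ms):
    """Rate 250-700 ms after tone onset minus the background rate.

    tone_onset_ms is the tone time relative to target onset (an assumed
    convention for this sketch).
    """
    rate = firing_rate(spike_times_ms, tone_onset_ms + 250, tone_onset_ms + 700)
    return rate - background_rate(spike_times_ms)
```

For a single trial with two spikes in the baseline window and four in the 150-350 ms window, `post_target_response` gives 20 - 4 = 16 spikes per second; in practice the rates would be averaged over trials before the subtraction.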
To evaluate the relative contribution of reward contingency (reward or no-reward) and target position to the post-target response, we performed a two-way ANOVA and calculated reward index and position index for each neuron. The reward and position indices were defined as the percentage of variance accounted for by reward contingency and by target position, respectively 31 .
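As a rough illustration of such indices, the sketch below computes the percentage of total variance attributable to each main effect as the ratio of the main-effect sum of squares to the total sum of squares. This particular formula, the assumption of equal trial counts per cell, and the function name are assumptions made for the example; the exact computation follows the cited reference and is not spelled out here.

```python
def variance_indices(cells):
    """Return (reward_index, position_index) as percentages of total variance.

    cells maps (reward_condition, target_position) -> list of post-target
    responses. Assumes a balanced design (equal trial counts per cell), so
    that main-effect sums of squares can be computed from marginal means.
    """
    all_vals = [v for vals in cells.values() for v in vals]
    grand_mean = sum(all_vals) / len(all_vals)
    ss_total = sum((v - grand_mean) ** 2 for v in all_vals)
    if ss_total == 0:
        return 0.0, 0.0

    def ss_main(axis):
        # Pool trials by the level of one factor and sum squared deviations
        # of each level mean from the grand mean, weighted by trial count.
        levels = {}
        for key, vals in cells.items():
            levels.setdefault(key[axis], []).extend(vals)
        return sum(
            len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
            for vals in levels.values()
        )

    return 100.0 * ss_main(0) / ss_total, 100.0 * ss_main(1) / ss_total
```

A neuron whose response depends strongly on reward contingency and not on target position would yield a reward index near 100 and a position index near 0.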
The latency of the averaged post-target response was calculated for each of four conditions (ipsilateral reward, ipsilateral no-reward, contralateral reward, and contralateral no-reward) using a bootstrap analysis. The data set of each neuron consisted of at least 24 trials for each condition. For each neuron, the trials were randomly resampled with replacement to form a new bootstrap data set which had the same number of trials as the original data set. The bootstrap data sets of all neurons were combined, and at each time point after target onset their averaged discharge rate was calculated during the 25 ms period before the time point (pre-period) and during the 25 ms period after the time point (post-period). Then, the average discharge rate during the pre-period was compared with that during the post-period. Such random resampling and comparison were repeated 1,000 times. If the averaged discharge rate was larger during the post-period than during the pre-period in >975 repetitions, the time point was regarded as a time of significant increase. In contrast, if the averaged discharge rate was smaller during the post-period than during the pre-period in >975 repetitions, the time point was regarded as a time of significant decrease. This procedure was repeated by shifting the time point in 1 ms steps after target onset. If, of 20 consecutive 1 ms steps, the beginning and at least 19 showed significant increases or decreases, the beginning was defined as the latency of excitatory or inhibitory responses, respectively.
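The bootstrap procedure can be sketched as follows for one condition. Several details are assumptions made for this example rather than the authors' implementation: resampled trials are pooled across neurons before averaging, windows are half-open, the run criterion is read as "the first step and at least 19 of the 20 steps are significant", and all names and default parameters other than the values quoted above are hypothetical.

```python
import random


def window_diff(trial, t, half_width=25):
    """Spike count in [t, t+half_width) minus count in [t-half_width, t)."""
    post = sum(1 for s in trial if t <= s < t + half_width)
    pre = sum(1 for s in trial if t - half_width <= s < t)
    return post - pre


def bootstrap_latency(neurons, t_max, n_reps=1000, crit=0.975,
                      run=20, min_hits=19, seed=0):
    """Return (excitatory_latency, inhibitory_latency) in ms; None if absent.

    neurons: list of neurons, each a list of trials, each trial a list of
    spike times in ms relative to target onset.
    """
    rng = random.Random(seed)
    # Precompute post-minus-pre count differences per neuron, trial, time point.
    diffs = [[[window_diff(trial, t) for t in range(t_max)] for trial in trials]
             for trials in neurons]
    inc = [0] * t_max  # repetitions with post-period > pre-period
    dec = [0] * t_max  # repetitions with post-period < pre-period
    for _ in range(n_reps):
        total = [0] * t_max
        for trials in diffs:
            # Resample this neuron's trials with replacement, same trial count.
            for row in rng.choices(trials, k=len(trials)):
                for i, d in enumerate(row):
                    total[i] += d
        for i, d in enumerate(total):
            if d > 0:
                inc[i] += 1
            elif d < 0:
                dec[i] += 1

    def first_run(counts):
        # Earliest significant step that starts a run meeting the criterion.
        sig = [c > crit * n_reps for c in counts]
        for i in range(t_max - run + 1):
            if sig[i] and sum(sig[i:i + run]) >= min_hits:
                return i
        return None

    return first_run(inc), first_run(dec)
```

Because the windows have equal width, comparing pooled spike counts is equivalent to comparing pooled average rates, which keeps the inner loop to integer additions.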
For the control task (reward-biased memory saccade task), we analysed the response just after the target onset (the discharge rate during 150-350 ms after the target onset minus the background discharge rate) and the response around the saccade onset (the discharge rate from −100 to 100 ms relative to the saccade onset minus the background discharge rate).
In the population analysis of the electrical stimulation experiment (Fig. 4b), we excluded the activity during 0-10 ms after the stimulation onset because electrical stimulation often generated brief electric noise which contaminated the electrophysiological recording.
Histology. After the end of the recording session in monkey L, we selected representative locations for electrode penetrations into the lateral habenula, substantia nigra pars compacta and MD. When typical single- or multi-unit activities were recorded for each region, we made electrolytic microlesions at