Neural systems for auditory perception of lexical tones

Previous neuroimaging research on cognitive processing of speech tone has generated dramatically different patterns of findings. Even at the basic perception level, brain mapping studies of lexical tones have yielded inconsistent results. Apart from the data inconsistency problem, experimental materials in past studies of tone perception carried little or minimal lexical semantics, an important dimension that should not be dispensed with because speech tones serve to distinguish lexical meanings. The present study sought to examine the neural correlates of the perception of speech tone using lexically meaningful experimental stimuli. A simple lexical tone perception task was devised in which native Mandarin speakers were asked to judge whether or not the two syllables of an auditorily presented Chinese bisyllabic word had the same tone. We selected bisyllabic words as experimental stimuli because Chinese monosyllables often convey little or very vague meanings due to rampant homophony. We found that the left inferior frontal gyrus, the right middle temporal gyrus and bilateral superior temporal gyri are responsible for basic perception of linguistic pitches. Our interpretation of the data sees the left superior temporal gyrus as engaged in primary acoustic analysis of the auditory stimuli, while the right middle superior temporal gyrus and the left inferior frontal region are involved in both tonal and semantic processing of the language stimuli.

In a tonal language, lexical tone is used to distinguish lexical or grammatical meanings. Since tonal languages account for around 50% of the world's languages (Maddieson, 2013), understanding the neural substrates underlying speech tone perception in native speakers has been a central question in electrophysiological and neuroimaging research on spoken language (Luo et al., 2006; Nan, Friederici, Shu, & Luo, 2009; Ren, Yang, & Li, 2009; Ren, Tang, Li, & Sui, 2013; Tsang, Jia, Huang, & Chen, 2011; Xi, Zhang, Shu, Zhang, & Li, 2010; Yang, Gates, Molenaar, & Li, 2015; Zhang, Shu, Zhou, Wang, & Li, 2010). While existing research has yielded important findings on how lexical tone is neuroanatomically represented, the experimental tasks employed in brain mapping studies tap dramatically different cognitive processes, including perception and working memory (Gandour et al., 2002; Hsieh, Gandour, Wong, & Hutchins, 2001; Klein, Zatorre, Milner, & Zhao, 2001; Li et al., 2010; Nan & Friederici, 2013; Wang, Sereno, Jongman, & Hirsch, 2003; Xu et al., 2006; Zhang et al., 2010, 2011), and the neural correlates of perception of speech tone in native listeners have not been precisely pinpointed.
Klein et al. (2001) used PET to identify the brain systems subserving auditory lexical tone perception in native Mandarin speakers. In their tone discrimination task, pairs of monosyllabic Chinese words were presented auditorily. Within each pair, the syllables were identical; half of them carried the same tone (e.g., /tou2/ and /tou2/) and the other half had different tones (e.g., /fei2/ and /fei4/). Subjects were required to make same-different lexical tone judgments of the syllable pairs. Peak activation was found in bilateral superior temporal gyri (STG), bilateral parietal areas and bilateral cerebellum in native Mandarin speakers. Similarly, Gandour et al. (2002) used fMRI to examine how the brain processes linguistically relevant pitch patterns (spectral vs. temporal cues) in tonal language speakers. The tone discrimination task they used was the same as in Klein et al.'s study, where subjects were required to make same-different judgments on the syllable pairs they heard. The authors of this study used the same stimuli for the spectral (i.e., lexical tone) and temporal (i.e., vowel length) conditions, and the syllable pairs were monosyllabic Thai pseudowords which either (1) had the same tone with different vowel duration (e.g., pʰin^R and pʰiin^R), (2) had different tones with the same vowel duration (e.g., haaj^M and haaj^H), or (3) had the same tone and the same vowel duration. They found that native Thai speakers showed greater activation in left inferior prefrontal gyrus in discriminating Thai tone relative to nonspeech pitch. Wong, Parsons, Martinez, and Diehl (2004) used PET to compare the neural correlates underlying lexical tone perception between native Mandarin speakers and native English speakers. Their task was a tone judgment on Chinese syllable pairs (e.g., /fei2/ and /wei2/). Relative to passive listening to Mandarin syllables, lexical tone judgment in native Mandarin speakers induced stronger activation in the left insular cortex, putamen, thalamus, fusiform gyrus, and medial frontal gyrus. Activation in the right hemisphere was also observed in middle frontal gyrus and postcentral gyrus. In an fMRI study, Xu et al. (2006) found that when native Mandarin speakers performed tone discrimination tasks on two different Chinese syllables (e.g., /bai2/ and /yao2/) or stimuli resynthesized by superimposing Thai tones onto Chinese syllables, stronger activity in the left planum temporale (PT) was seen in response to native compared with nonnative tones.
The tasks used by Klein et al., Gandour et al., Wong et al., and Xu et al. are highly similar and are all perception-based, but their findings are inconsistent. This inconsistency may be related to the different languages used in the studies. Indeed, in Gandour et al.'s (2002) study, native Chinese speakers exhibited stronger activation in the left anterior superior temporal cortex in identifying Chinese speech tones relative to the nonspeech baseline.
Gandour and associates conducted several functional imaging studies to elucidate the neural mechanisms dedicated to Mandarin lexical tone processing. In their experiments, participants were presented with a list of three to five monosyllables consecutively (Hsieh et al., 2001; Li et al., 2010); they were asked to make tone judgments of the two syllables located first and last in the sequence on each trial (Hsieh et al., 2001), to match the last item in the sequence with the probe, or to match the probe with the target syllable within the sequence in random positions. Tonal processing yielded greater brain activations in bilateral frontal-parietal networks, including the inferior prefrontal cortex, posterior inferior frontal gyrus (IFG), middle frontal gyrus (MFG) and the inferior parietal lobule. Nan and Friederici (2013) compared pitch processing of tonal language and music in native Mandarin speakers. Experimental stimuli involved sequences of Chinese four-syllable phrases and four-note musical phrases. Half of the Chinese phrases were semantically meaningful, and the other half were similar except that the pitch contours of the last syllables of the phrases were manipulated, making these phrases semantically and syntactically incongruent. Subjects were asked to judge whether the four-syllable phrases presented sounded congruous. Nan and Friederici found that processing of Mandarin pitch congruities involved greater cortical activations in bilateral STG and left IFG. In the tasks of Gandour et al. and Nan and Friederici, working memory is required to hold linguistic items in mind during the task, and therefore, lexical tone processing may occur at a late short-term memory stage, instead of at the perceptual phase.
In summary, previous research on speech tone processing has generated markedly different patterns of findings, partly because the tasks employed in those studies measured different levels of cognitive processing. In fact, even neuroimaging experiments on basic perception of lexical tones in native Mandarin speakers have produced highly inconsistent findings. Thus, it is worthwhile to further address the neuroanatomical representation of speech tone.
Apart from the data inconsistency problem, we have also noted that experimental stimuli in past studies of tone perception carried little or minimal lexical semantics, an important facet that should not be ignored because speech tones serve, after all, to distinguish lexical meanings. For example, when the syllables /fei2/ and /fei4/ are used in a tone decision task, the two syllables have little or very vague meanings because of the rampant homophony in Mandarin Chinese. In this case, the neural basis for tone processing may be related to general tone analysis (e.g., pure tone) but is hardly associated with natural, meaning-related lexical tone. Perception of lexical tones includes both acoustic processing and semantic processing of the pitch signal. Yet, it seems that most of the past studies have emphasized the processing of acoustic/phonological information carried by lexical tones, and little attention has been paid to the neural substrates associated with semantic processing. Nan and Friederici (2013) used Chinese word phrases as experimental materials, in which the word meaning of each lexical item was very precise. They observed that the left inferior language region (BA 45) was activated in Mandarin speakers while performing the Mandarin tone task. Previous studies have also found that the inferior frontal gyrus (BA 44, 45, 47) was responsible for semantic processing of Chinese characters (Chee et al., 2000; Chou, Chen, Wu, & Booth, 2009; Ding et al., 2003; Price et al., 2012; Tan et al., 2001). Since lexical tones are closely linked to the processing of lexical semantics, the left inferior frontal and surrounding language regions might also play a role in processing linguistic information of lexical tones.
In the present fMRI study, we used an auditory tone judgment task in which native Mandarin speakers decided whether or not the two syllables heard in a meaningful Chinese bisyllabic word on each trial had the same lexical tone. This design was simple in that subjects performed the task with minimal effort. Also, in this task, the lexical semantics of the bisyllabic word should be automatically activated (Neely, 1977; Tan & Perfetti, 1998). We used two-syllable words, instead of monosyllables, as stimuli also because most words in Mandarin are disyllabic (DeFrancis, 1984; Duanmu, 2000).

Subjects
Eleven Beijing college students with normal hearing participated in this functional magnetic resonance imaging (fMRI) study (6 males and 5 females; average age = 21.2 years, SD = 0.41 years). All subjects were healthy, native speakers of Mandarin. They were strongly right-handed as judged by the handedness inventory devised by Snyder and Harris (1993). We adopted nine unimanual tasks (which could only be done by one hand at a time) including writing, drawing a picture, throwing a ball, holding chopsticks, hammering a nail, brushing teeth, cutting with scissors, opening a door and striking a match. A 5-point Likert-type scale was used, with "1" representing exclusive left-hand use, and "5" representing exclusive right-hand use. All subjects scored at least 43 in total (out of a maximum of 45). Written consent was obtained from all participants. The study was approved by the Institutional Review Board of Beijing MRI Center for Brain Research.
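The handedness criterion described above amounts to summing nine 5-point ratings and comparing the total with the reported cutoff of 43. The following is a minimal sketch of that arithmetic only; the function names and the dictionary layout are assumptions made for illustration and are not part of the Snyder and Harris (1993) inventory or of the original analysis.

```python
# Illustrative scoring of the nine-item handedness ratings described in the text.
# Function names and data layout are hypothetical; only the task list, the 1-5
# scale, and the cutoff of 43 come from the text.
TASKS = [
    "writing", "drawing a picture", "throwing a ball", "holding chopsticks",
    "hammering a nail", "brushing teeth", "cutting with scissors",
    "opening a door", "striking a match",
]
MIN_TOTAL = 43  # lowest total score reported for the participants


def handedness_total(ratings):
    """Sum the nine ratings (1 = exclusive left hand, 5 = exclusive right hand)."""
    for task in TASKS:
        if task not in ratings:
            raise ValueError(f"missing rating for task: {task}")
        if not 1 <= ratings[task] <= 5:
            raise ValueError(f"rating out of range for task: {task}")
    return sum(ratings[task] for task in TASKS)


def meets_right_handed_criterion(ratings):
    """True if the total reaches the cutoff reported in the text."""
    return handedness_total(ratings) >= MIN_TOTAL
```

With nine items the possible totals range from 9 to 45, so a total of 43 allows at most two points of deviation from exclusive right-hand use.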

Stimuli and procedure
A list of 48 meaningful Chinese bisyllabic words was produced by a 25-year-old female native Mandarin speaker and digitally edited using Adobe Audition software 7.0 (Adobe Systems, Inc.) to achieve a constant duration (1000 ms). In 24 stimulus words, the tone of the two syllables was identical (e.g., /hu1/ /xi1/, breath). For the rest, the tones of the two syllables were different (e.g., /fang1/ /fa3/, method). For the same-tone trials, Tone 3 words were excluded to avoid any tone confusion induced by the tone sandhi effect (i.e., when two or more consecutive syllables carry the third tone, only the last syllable is pronounced as the third tone while the initial syllable(s) are obligatorily shifted to Tone 2). All four tones were included in different-tone trials. All selected words were frequently used verbs, nouns or adjectives, with a frequency of occurrence of at least 15 per million according to the Modern Chinese Frequency Dictionary (Wang, 1986). All syllables in the disyllabic words were also commonly used, with a frequency of occurrence of more than 20 per million.
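To make the reason for excluding Tone 3 from the same-tone items concrete, the sketch below labels a bisyllabic word as same-tone or different-tone and applies the sandhi rule exactly as it is stated above. It is an illustration under stated assumptions, not the authors' stimulus-selection procedure: the numbered-pinyin input format, the function names, and the extra example /ni3/ /hao3/ are assumptions, and the rule is the simplified version given in the text rather than a full account of Mandarin tone sandhi.

```python
# Illustrative labelling of bisyllabic items written as numbered pinyin,
# e.g. ("hu1", "xi1"). All names here are hypothetical.

def tone_of(syllable):
    """Return the tone number (1-4) from a numbered-pinyin syllable such as 'fa3'."""
    tone = int(syllable[-1])
    if tone not in (1, 2, 3, 4):
        raise ValueError(f"unexpected tone in syllable: {syllable}")
    return tone


def surface_tones(tones):
    """Apply the simplified third-tone sandhi rule stated in the text:
    in a run of consecutive Tone 3 syllables, every syllable except the
    last is pronounced as Tone 2."""
    out = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            out[i] = 2
    return out


def is_same_tone(word):
    """True if both syllables carry the same underlying (citation) tone."""
    first, second = (tone_of(s) for s in word)
    return first == second


def usable_as_same_tone_item(word):
    """Same underlying tone AND unchanged by sandhi, so Tone 3 pairs are excluded."""
    tones = [tone_of(s) for s in word]
    return is_same_tone(word) and surface_tones(tones) == tones
```

For the examples in the text, `("hu1", "xi1")` is a same-tone item and `("fang1", "fa3")` is a different-tone item. A Tone 3 pair such as `("ni3", "hao3")`, used here only as an illustration, has matching underlying tones but surfaces as Tone 2 followed by Tone 3, which is the ambiguity the exclusion avoids.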
Participants judged whether the two syllables of each bisyllabic word carried the same lexical tone. In each trial, a bisyllabic word was presented through MRI-compatible earphones for 1000 ms, followed by a 1500 ms blank interval. Participants responded by pressing the corresponding key with their left or right index finger. If no response was made during the response interval, the trial was scored as incorrect. The baseline condition was silence: subjects were instructed to relax and no overt response was required. Task instructions were presented for 2 s before each block. Prior to the scan, participants were given sufficient practice; stimulus words used in the practice session did not reoccur during the scan.
The experiment was conducted in a single run. It began with a 6-s fixation crosshair, followed by a 2-s task instruction then a block of 12 lexical tone discrimination trials. The experiment consisted of four blocks of tone discrimination and the baseline. Different Chinese bisyllabic words were presented in each block to avoid any practice effect, and the presentation order was randomized across subjects. Sound stimuli were presented binaurally via a pair of MRI-compatible earphones. Sound levels were adjusted to comfortable levels for all subjects before the experiment began. All participants reported that they could hear the practice stimuli in the scanner before the experiment started.
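The timing of a task block follows directly from the figures given above: each trial lasts 1000 ms plus 1500 ms, and each block contains 12 trials. The sketch below works through that arithmetic. The duration of the silent baseline blocks is not stated in the text, so it is deliberately left out; the constant and function names are assumptions made for illustration.

```python
# Trial and block timing derived from the values reported in the text.
# Baseline block duration is not reported and is therefore not modelled.
STIMULUS_MS = 1000      # word presentation
BLANK_MS = 1500         # blank interval after each word
TRIALS_PER_BLOCK = 12
N_TASK_BLOCKS = 4
INSTRUCTION_MS = 2000   # task instruction shown before each block

TRIAL_MS = STIMULUS_MS + BLANK_MS
TASK_BLOCK_MS = TRIALS_PER_BLOCK * TRIAL_MS
TOTAL_TRIALS = TRIALS_PER_BLOCK * N_TASK_BLOCKS


def trial_onsets_ms(block_start_ms):
    """Stimulus onsets (ms) for the 12 trials of one task block,
    where block_start_ms is the onset of the first word."""
    return [block_start_ms + i * TRIAL_MS for i in range(TRIALS_PER_BLOCK)]
```

Under these figures a trial lasts 2.5 s, a task block lasts 30 s (32 s including its instruction screen), and the four blocks together use all 48 stimulus words once.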

Image analysis
The fMRI data analysis was performed with MATLAB software (Version 7.10; Mathworks, Natick, MA) and SPM8 (http://www.fil.ion.ucl.ac.uk/spm/, Wellcome Department of Cognitive Neurology, University College London, London). After data conversion, each participant's data were preprocessed separately in batch mode. The first three volumes of each participant's scan were discarded, and the remaining functional images were slice-timing corrected (in ascending order, with reference slice 15) and realigned. Functional images were then co-registered with the anatomical image and segmented. The data were then spatially normalized to the Montreal Neurological Institute (MNI) stereotaxic template, resampled into 3 × 3 × 3 mm cubic voxels and spatially smoothed with an isotropic Gaussian kernel of 8 mm full-width at half maximum (FWHM). Participants' data were high-pass filtered at 128 s to remove low-frequency components. First-level analysis was conducted on an individual basis. Data from each subject were entered into a general linear model using an event-related analysis procedure. Group analysis was done by entering the resulting contrast images into a second-level random-effects model. Activation patterns were evaluated with a one-sample t-test on the contrast of lexical tone discrimination of bisyllabic words > silent baseline. Brain regions were identified using the atlas of Talairach and Tournoux (1988), after adjusting for differences between MNI and Talairach coordinates.
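At the group level, a one-sample t-test on contrast images tests, voxel by voxel, whether the mean contrast value across subjects differs from zero. The sketch below shows that computation for a single voxel using only the Python standard library. It is an illustration of the statistic, not a reproduction of the SPM8 pipeline, and the function name and input layout are assumptions.

```python
import math
import statistics


def one_sample_t(contrast_values, null_mean=0.0):
    """One-sample t statistic for one voxel.

    contrast_values: one first-level contrast estimate per subject
    (task > baseline). Returns (t, degrees_of_freedom).
    """
    n = len(contrast_values)
    if n < 2:
        raise ValueError("need at least two subjects")
    mean = statistics.fmean(contrast_values)
    sd = statistics.stdev(contrast_values)  # sample SD (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("zero variance across subjects")
    standard_error = sd / math.sqrt(n)
    return (mean - null_mean) / standard_error, n - 1
```

With the eleven participants reported above, each voxel's statistic would have 10 degrees of freedom; correction for multiple comparisons across voxels or clusters is a separate step that this sketch does not cover.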

Behavioral performance
Participants were highly accurate in the experimental condition. Mean accuracy was 88.9% (SD = 0.08) for the auditory lexical tone judgment of bisyllabic words (91% for the same-tone pairs and 88% for different-tone pairs). After excluding incorrect trials, the average reaction time was 1278 ms (SD = 322).
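The two behavioral measures follow the scoring rules stated in the procedure: a trial with no response counts as incorrect, and reaction time is averaged over correct trials only. The sketch below applies those rules to a list of trials. The `Trial` record and function names are assumptions made for illustration, and the values in the usage example are invented rather than taken from the study.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Optional


@dataclass
class Trial:
    same_tone: bool            # ground truth: do the two syllables share a tone?
    response: Optional[bool]   # participant's "same" judgment; None = no response
    rt_ms: Optional[float]     # reaction time in ms; None if no response


def is_correct(trial):
    """A missing response is scored as incorrect, as described in the procedure."""
    return trial.response is not None and trial.response == trial.same_tone


def summarize(trials):
    """Return (accuracy, mean RT over correct trials); mean RT is None if no trial is correct."""
    if not trials:
        raise ValueError("no trials to summarize")
    correct = [t for t in trials if is_correct(t)]
    accuracy = len(correct) / len(trials)
    rts = [t.rt_ms for t in correct if t.rt_ms is not None]
    mean_rt = fmean(rts) if rts else None
    return accuracy, mean_rt
```

For example, with invented data `[Trial(True, True, 1200.0), Trial(False, True, 1000.0), Trial(False, None, None), Trial(False, False, 1400.0)]`, two of four trials are correct, so accuracy is 0.5 and the mean reaction time over the correct trials is 1300 ms.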

Discussion
This study aimed to examine the brain basis of auditory perception of lexical tones by using natural, meaning-related experimental stimuli with a simple, explicit tone discrimination task. Despite the simple baseline task used in this study, only four sites of activation were observed (i.e., the left inferior frontal gyrus, the right middle and superior temporal gyri, and the left superior temporal gyrus). Both the left and the right hemispheric regions are engaged in processing tones of Chinese bisyllabic words. Recently, researchers have proposed a more comprehensive hypothesis regarding the hemispheric lateralization of lexical tones. They suggested that perception of lexical tones involves the processing of two types of pitch information carried by the tonal signal, namely, the acoustic information and semantic information. Therefore, both hemispheres participate and interact in processing lexical tones (Gandour et al., 2004; Gandour, 2006; Yu, Wang, Li, & Li, 2014). Our results are consistent with this assumption that both hemispheres work together to achieve lexical tone perception. Also, this activation pattern is similar to that observed by Nan and Friederici (2013), in which meaningful word phrases were used as stimuli and lexical semantics of the syllables was processed along with the tone discrimination task.
Table note: Stereotaxic coordinates (mm) are derived from the human brain atlas of Talairach and Tournoux (1988) and refer to the peak Z scores for each region (P < 0.05, FWE-corrected at cluster level for multiple comparisons).
The left inferior frontal region (BA 44/45) identified in this study is very close to the anterior insula. Both the inferior frontal gyrus and insular cortex have been shown to be implicated in pitch processing (Flagmeier et al., 2014; Jacquemot, Pallier, LeBihan, Dehaene, & Dupoux, 2003; Riecker, Ackermann, Wildgruber, Dogil, & Grodd, 2000) and lexical-semantic processing (Chan et al., 2004; Mummery, Shallice, & Price, 1999; Price, 2012; Rodríguez-Fornells, Cunillera, Mestres-Missé, & de Diego-Balaguer, 2009; Rossell, Bullmore, Williams, & David, 2001; Rumsey et al., 1997; Tan et al., 2001). Moreover, Nan and Friederici (2013) found significant activation in the left inferior frontal region in Mandarin speakers who performed tone discrimination of Chinese word phrases relative to music phrases. In the present study, meaning is involved in the tone judgment task. We assume that the inferior frontal gyrus was responsible for the semantic processing of both the lexical tone and the bisyllabic word stimuli.
Superior temporal gyrus activation occurred bilaterally, though activation in the right temporal regions was much stronger than in the left (cluster size of 159 voxels vs. 90 voxels). In the left hemisphere, marginally significant activation in the temporal cortex was focused on the superior temporal gyrus (BA 22) and extended medially to its neighboring region, the posterior transverse temporal area (BA 42). Previous studies have indicated that processing of simple acoustic stimuli, such as frequency-modulated tones and sound with discontinuous acoustic patterns, activates BA 42 (Binder et al., 2000; Mirz et al., 1999).
BA 42 has also been shown to be critical for auditory lexical tone processing in a recent meta-analysis (Kwok et al., 2015). The left STG has long been seen to be implicated in the basic processing of both speech and non-speech sounds in auditory research (Binder et al., 2000; Hickok & Poeppel, 2007; Price, Thierry, & Griffiths, 2005; Wong et al., 2004; Xu et al., 2006; Zatorre & Belin, 2001; Yang et al., 2015; Zhang et al., 2010, 2011). Our findings suggest that the left superior temporal region is involved in the initial processing of auditory stimuli that may not be speech-related, as reflected in its relatively weak activation. The activation cluster seen in the right temporal lobe is composed of middle and superior temporal gyrus (BA 21 and BA 22). The right STG has been repeatedly shown to be critical to pitch perception, vocal pitch error detection, and voice control in previous literature (Flagmeier et al., 2014; Johnsrude, Penhune, & Zatorre, 2000; Robin, Tranel, & Damasio, 1990; Zatorre, Belin, & Penhune, 2002; Zatorre & Belin, 2001; Zhang et al., 2010, 2011). Moreover, several lexical tone studies have demonstrated right-lateralized cortical activity in processing tones. In a phonological recognition task, direct contrasts of Mandarin tones relative to consonants and rhymes showed stronger brain activation in the right fronto-parietal network. Liu et al. (2006) found that the production of Mandarin tones elicited stronger activation in the right hemisphere than the production of vowels. Moreover, the observed activation in right MTG in this study was consistent with findings in a previous ERP study on the pre-attentive processing of Mandarin tones (Luo et al., 2006). Luo et al. found that native Mandarin speakers showed right-lateralized activity in MTG in perceiving tones, while left hemispheric dominance was found in response to the processing of consonants.
We believe that the right middle superior temporal gyrus in our study was the key region for the processing of lexical tone information. In general, our results lend support to the hypothesis that spectral processing of tones (i.e., with longer duration of approximately 150-250 ms) is lateralized to the right hemisphere (Poeppel, 2003; Zatorre, Evans, & Meyer, 1994; Zatorre & Belin, 2001). As mentioned earlier, stronger brain activation was observed in right temporal regions than in the left. One possibility is that the right temporal areas play several roles in auditory lexical tone perception. The middle temporal gyrus has been implicated in lexical semantic processing according to the dual-stream model of speech processing (Hickok & Poeppel, 2000) and in idiom comprehension studies (Mashal, Faust, Hendler, & Jung-Beeman, 2008; Zempleni, Haverkort, Renken, & Stowe, 2007; Zhao et al., 2013). Thus, apart from acoustic processing of lexical tones, we believe that the right middle superior temporal gyrus was also involved in lexical-semantic processing.
Thus, our study has contributed to the identification of the neural systems for the basic auditory perception of lexical tones. The left inferior frontal/insula region is seen as responsible for semantic processing of tonal stimuli. The left superior temporal gyrus is involved in the primary acoustic analysis of the auditory stimuli, whereas the right middle superior temporal gyrus participates in the more complex auditory analysis of lexical tone information, as well as lexical-semantic processing. Methodologically, our study introduces a new, meaning-related tone discrimination paradigm for examining the perception of lexical tones with minimal effort. Further work is needed to investigate the connectivity of cortical regions critical for speech tone perception in order to reveal the correlations between the regions and the influence that one cortical region exerts over another.