A physiologically based model for temporal envelope encoding in human primary auditory cortex

Communication sounds exhibit temporal envelope fluctuations in the low-frequency range (<70 Hz), and human speech has prominent 2-16 Hz modulations with a maximum at 3-4 Hz. Here, we propose a new phenomenological model of the human auditory pathway (from cochlea to primary auditory cortex) to simulate responses to amplitude-modulated white noise. To validate the model, its performance was estimated by quantifying temporal modulation transfer functions (TMTFs). Previous models considered either the lower stages of the auditory system (up to the inferior colliculus) or only the thalamocortical loop. The present model, divided into two stages, is based on anatomical and physiological findings and includes the entire auditory pathway. The first stage, from the outer ear to the inferior colliculus, incorporates inhibitory interneurons in the cochlear nucleus to increase performance at high stimulus levels. The second stage takes into account the anatomical connections of the thalamocortical system and includes the fast and slow excitatory and inhibitory currents. After optimizing the parameters of the model to reproduce the diversity of TMTFs obtained from human subjects, a patient-specific model was derived and the parameters were optimized to effectively reproduce both spontaneous activity and the oscillatory part of the evoked response.


INTRODUCTION
The temporal envelope is a defining characteristic of speech sounds and contributes substantially to the understanding of a vocal message [17]. Neurons of the auditory pathway encode the temporal envelope in two ways. Some neurons discharge synchronously with the envelope over a given range of modulation frequencies; others increase their discharge rate when the modulation frequency is close to a specific value [19]. The latter is a rate-place code, because information about amplitude modulation depends on both the firing rate and the position of the neuron. Here we focus on the processing of temporal information, so only synchronous coding is studied.
We present a model of the auditory pathway from the outer ear to the primary auditory cortex. The model is based on physiology and includes each stage of temporal envelope processing. It is intended to reproduce amplitude modulation processing: its response to amplitude-modulated white noise is compared with human data using the temporal modulation transfer function [13]. To our knowledge, no other existing model simulates responses to these stimuli from the outer ear to the primary auditory cortex.

AUDITORY PATHWAY MODEL
Modeling work involves simplifications, which result from a trade-off between the main processing steps to be captured and the available data. The model presented here is constrained by the following strong limitations. Firstly, only the afferent pathway is modeled. Secondly, there is no difference between the left and right pathways. Thirdly, there is no connection between the left and right ears in the pathway involved in amplitude modulation processing. Binaural stimuli are used in most experiments that address auditory temporal processing. This method is intended to avoid localization tasks, but it also hides any specificity of left and right temporal envelope processing.
Based on the previous work of Hewitt [7,8] and Jansen [9], the model is composed of two parts. The first part is a four-stage model comprising the basilar membrane, the inner hair cells, the cochlear nucleus and the inferior colliculus. It is based on single-unit responses. The second part, based on neuronal population responses, consists of three interconnected neuronal populations representing the medial geniculate body (MGB), the thalamic reticular nucleus (TRN) and the primary auditory cortex (PAC).

From outer ear to inferior colliculus
The stimulus is first filtered by the outer and middle ear. It is then analyzed by the basilar membrane, which can be considered as a filter bank. The base of the basilar membrane vibrates for high-frequency components whereas the apex vibrates for low-frequency components. This is the origin of the tonotopic organization of the auditory pathway. The membrane is modeled with the Dual Resonance Non Linear model of Lopez-Poveda [14], which takes into account the compressive property of the membrane. At several points of the basilar membrane, each defined by its center frequency (CF), oscillations close to the CF displace the inner hair cells, which perform the signal transduction.
The inner hair cell model reproduces some of the main behaviors of the cell: signal rectification and adaptation to auditory nerve fibers [18]. The auditory nerve transmits the signal to cochlear nucleus cells. The auditory nerve signal is the realization of a geometric law whose firing probability corresponds to the number of neurotransmitters in the synaptic cleft. This realization gives the number of fibers that fire.
In the cochlear nucleus, a group of neurons is connected to the same inner hair cell and consequently has the same CF. This group is divided into subgroups, each characterized by the modulation frequency for which it produces the most synchronized response. This modulation frequency is called the best modulation frequency (BMF). Neurons of the cochlear nucleus project to the inferior colliculus, where they excite neurons that detect coincidence (Guérin et al. [5] studied different connectivity patterns). In this study, a subgroup of cochlear nucleus neurons with the same CF and the same BMF innervates one inferior colliculus neuron. Neurons of the cochlear nucleus and of the inferior colliculus are modeled with a McGregor model [15]. Figure 1 illustrates the first part of the model (four stages). The inferior colliculus output signals form a matrix of signals indexed by their CFs and BMFs. This part of the model is an updated version of Hewitt's model [8], which was validated with amplitude-modulated stimuli. The updated version also reproduces responses to amplitude-modulated stimuli that are close to physiological data in terms of firing rate and synchronization with the envelope.
Figure 1: Structure of the model from outer ear to inferior colliculus. This first part reproduces single neuron response to amplitude modulated stimuli.

From medial geniculate body to primary auditory cortex
The second part of the model is fed by the inferior colliculus outputs and produces the signal recorded in the PAC. Inferior colliculus outputs are integrated over CFs and BMFs to form a neural population response. Here we sum the signals with CFs in the range [500, 5000] Hz and BMFs (at 20 dB) in the range [60, 300] Hz. This integrated signal corresponds to the average pulse density of the inferior colliculus afferents to the MGB.
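The integration step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `integrate_ic_outputs`, the array layout (CF x BMF x time) and the unweighted sum are hypothetical choices, not details taken from the model implementation.

```python
import numpy as np

def integrate_ic_outputs(ic, cfs, bmfs,
                         cf_range=(500.0, 5000.0), bmf_range=(60.0, 300.0)):
    """Sum inferior colliculus outputs over the selected CFs and BMFs.

    ic   : array (n_cf, n_bmf, n_samples), one signal per (CF, BMF) pair
           (assumed layout)
    cfs  : array (n_cf,), center frequencies in Hz
    bmfs : array (n_bmf,), best modulation frequencies in Hz
    """
    ic = np.asarray(ic, dtype=float)
    cfs = np.asarray(cfs, dtype=float)
    bmfs = np.asarray(bmfs, dtype=float)
    # Keep only the units whose CF and BMF fall inside the closed ranges.
    cf_mask = (cfs >= cf_range[0]) & (cfs <= cf_range[1])
    bmf_mask = (bmfs >= bmf_range[0]) & (bmfs <= bmf_range[1])
    selected = ic[cf_mask][:, bmf_mask]
    # Unweighted sum over both axes gives one population signal over time.
    return selected.sum(axis=(0, 1))
```

The default ranges are the ones quoted in the text; any weighting across CFs or BMFs would have to be added explicitly.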
The MGB receives excitatory afferents from the inferior colliculus and the PAC [16]. It also receives inhibitory afferents from the TRN. Bushy cells are the main neurons of the MGB; they excite PAC and TRN neurons. They are excitatory neurons with AMPA, GABAa and GABAb receptors [1].
The TRN receives excitatory afferents from the MGB and the PAC [16] and produces inhibition in the MGB via GABAa and GABAb receptors.
The auditory cortex is organized in layers and columns. Cytoarchitectonic studies show that the cortex can be divided into six layers, from layer I at the surface of the cortex to layer VI at the other end. Moreover, there are neurons whose ramifications run perpendicularly to the surface of the cortex. A group of about a thousand neurons (200 to 500 µm wide) forms a cortical column. The functional role of cortical columns is not known, but neurons of a cortical column respond similarly to the same stimulus. In our model, a single cortical column of the PAC is considered. The cortical column model is simplified and takes into account only two layers: layer IV, which receives MGB input, and layer V, which sends corticothalamic feedback (directly and via the TRN). The PAC receives excitatory afferents from the MGB. The cortical column model is composed of a principal population of pyramidal neurons and three subtypes of interneurons: excitatory interneurons, and fast and slow inhibitory interneurons (which act on the GABAa and GABAb receptors of pyramidal neurons, respectively). Figure 2 gives the structure of the second part of the model. It represents the interconnections between neuronal populations (AMPA, GABAa and GABAb boxes).
Figure 3: Comparison between human PAC background activity (circle) (extracted from [4]) and background activity given by the model (asterisk).
Based on Jansen's work [9], a population model consists of two stages. The first transforms the pulse density into a post-synaptic potential via a causal linear filter, whose impulse response is parameterized by an amplitude constant A and a time constant a. The second transforms the post-synaptic potential into a pulse density via a sigmoidal function. For this non-linearity, x(t) and y(t) are respectively the input and the output, r = 0.56 is the compression ratio and v0 = 6 is the maximum output level. Table 1 lists the values corresponding to each kind of receptor. These values have been set to reproduce excitatory and inhibitory post-synaptic potentials ([3] for AMPA and GABAa receptors, [11] for GABAb receptors).
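As an illustration of the first stage of such a population model, the sketch below assumes the impulse response commonly used in Jansen-type models, h(t) = A a t exp(-a t) for t >= 0, where a acts as a rate in 1/s. This form is an assumption here, as are the function names; the numerical values in the usage note are the classical Jansen excitatory values, not the Table 1 values of this model. The sigmoidal stage is deliberately not sketched because its exact expression is not given in the text.

```python
import numpy as np

def psp_impulse_response(t, A, a):
    """Assumed Jansen-type kernel: h(t) = A * a * t * exp(-a * t) for t >= 0, else 0."""
    tp = np.maximum(np.asarray(t, dtype=float), 0.0)  # negative times map to h = 0
    return A * a * tp * np.exp(-a * tp)

def pulse_density_to_psp(pulse_density, fs, A, a, kernel_duration=0.2):
    """Causal linear filtering of a pulse density sampled at fs (Hz)."""
    t = np.arange(0.0, kernel_duration, 1.0 / fs)
    h = psp_impulse_response(t, A, a)
    # Discrete approximation of the convolution integral (hence the 1/fs factor).
    x = np.asarray(pulse_density, dtype=float)
    return np.convolve(x, h)[: len(x)] / fs
```

With the illustrative values A = 3.25 and a = 100 s^-1, the kernel peaks at t = 1/a = 10 ms with amplitude A/e, and a constant unit input settles at A/a.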

DATABASE
To be validated, the model must reproduce different behaviors of the human response to amplitude-modulated noise. The database used to compute the validation criteria is presented here; the criteria themselves are presented in sections 4.1 and 4.3. Twenty patients suffering from drug-resistant partial epilepsy participated in this study. They were implanted with chronic stereoelectroencephalographic (SEEG) electrodes in various cortical structures (left or right hemisphere) for epileptic studies. The physiology of their auditory cortex had been studied previously and was considered normal.

The patients were informed about the research protocol during SEEG monitoring and gave their fully informed consent to participate in this study. Stimuli were 1 s amplitude-modulated white noises, shaped by 25 ms rising and falling cosine ramps to avoid an auditory response to a sudden sound onset. Sounds were presented binaurally via headphones in series of 50 to 100 stimuli of two randomly alternated modulation frequencies (4/32 Hz, 8/64 Hz, 16/128 Hz) with a 100% modulation depth.
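A stimulus of this kind can be generated as in the sketch below. The sampling rate, the Gaussian carrier, the sine-phase modulator and the function name `am_white_noise` are assumptions made for illustration; only the duration, ramp length and modulation depth come from the protocol described above.

```python
import numpy as np

def am_white_noise(fm, duration=1.0, fs=44100, depth=1.0, ramp=0.025, seed=0):
    """Amplitude-modulated white noise with raised-cosine onset/offset ramps.

    fm    : modulation frequency in Hz
    depth : modulation depth (1.0 means 100 %)
    fs, seed and the Gaussian carrier are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    carrier = rng.standard_normal(n)                      # white noise carrier
    modulator = 1.0 + depth * np.sin(2.0 * np.pi * fm * t)
    # Raised-cosine ramps of `ramp` seconds at both ends of the stimulus.
    n_ramp = int(round(ramp * fs))
    window = np.ones(n)
    rise = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    window[:n_ramp] = rise
    window[n - n_ramp:] = rise[::-1]
    return t, carrier * modulator * window
```

For example, `am_white_noise(16.0)` returns one second of noise modulated at 16 Hz; calling it with 4, 8, 32, 64 or 128 Hz covers the other modulation frequencies of the protocol.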

Background activity
The first step of this study was to tune the cortical column model by optimizing its coupling parameters to reproduce the background activity recorded in humans. More precisely, the averaged spectrum of the signals recorded in the PAC when the patient was not stimulated was computed. The background activity is modeled with a seventh-order AR model [4] (order selected according to the Bayesian and Akaike information criteria). Cortical column tuning was achieved by minimizing the mean squared error between the AR model spectrum and the spectrum of a 10 s cortical column output signal. Figure 3 illustrates the result for the parameters given in table 2. The proposed model reproduces the AR spectrum with a slight error.
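The cost used for this tuning can be illustrated with the sketch below. It fits an AR model by the Yule-Walker equations, evaluates its spectrum, and computes a mean squared error between two spectra. The Yule-Walker estimator, the spectrum normalization and all function names are assumptions chosen for illustration; the text does not state how the AR coefficients were estimated or which optimizer was used.

```python
import numpy as np

def fit_ar_yule_walker(x, order=7):
    """AR fit, convention x[n] = sum_k a_k x[n-k] + e[n] (assumed estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocovariance estimates r[0..order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    coeffs = np.linalg.solve(R, r[1:])
    noise_var = r[0] - np.dot(coeffs, r[1:])
    return coeffs, noise_var

def ar_spectrum(coeffs, noise_var, freqs, fs):
    """Two-sided AR power spectral density evaluated at freqs (Hz)."""
    k = np.arange(1, len(coeffs) + 1)
    z = np.exp(-2j * np.pi * np.outer(freqs, k) / fs)
    denom = np.abs(1.0 - z @ np.asarray(coeffs, dtype=float)) ** 2
    return noise_var / (fs * denom)

def spectrum_mse(p_model, p_target):
    """Mean squared error between two spectra sampled at the same frequencies."""
    return float(np.mean((np.asarray(p_model) - np.asarray(p_target)) ** 2))
```

In a tuning loop, `spectrum_mse` would be evaluated between the AR spectrum of the recorded background activity and the spectrum of the simulated cortical column output, and minimized over the coupling parameters.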

Temporal response
An epoch is the response to a single presentation of a stimulus, recorded via one electrode. For a given stimulus, the average of the epochs forms the auditory evoked potential (AEP). Averaging reduces additive noise and enhances responses that are phase-locked to the stimulus. Figure 4.A shows an AEP measured in the PAC in response to a 16 Hz amplitude-modulated white noise [4]. This AEP is composed of three parts: the transient response, the oscillatory response and the response to the stimulus ending. The oscillation frequency corresponds to the stimulus modulation frequency. This highlights the large number of neurons that code amplitude modulation temporally (only for low modulation frequencies). Figure 4.B shows the AEP given by the model in response to the same stimulus. The AEP produced by the model is also composed of transient and oscillatory parts. The transient response of the model rises slowly compared with that of the database. As observed in human AEPs, the oscillation frequency corresponds to the stimulus modulation frequency. The oscillatory parts of the AEPs are evaluated in section 4.3.

Temporal modulation transfer function
Responses to amplitude-modulated stimuli are characterized by temporal modulation transfer functions (TMTFs). These curves show how the degree of response synchronization (temporal coding) varies with the modulation frequency. They are defined for a given stimulus level because the auditory pathway is a non-linear system. The amplitude of the highest spectral peak in an interval around the modulation frequency is used to evaluate the synchronization of the response: the more synchronized the response is with the stimulus, the greater the amplitude of the oscillation in the AEP. Several methods exist to evaluate this amplitude. Gourévitch [4] compared the most common ones; here we use the one he advocates.
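The peak-picking idea can be sketched as follows. This is a generic amplitude-spectrum version given for illustration only; it is not the specific estimator advocated in [4], and the function name and the default search half-width are assumptions.

```python
import numpy as np

def peak_amplitude_near(signal, fs, fm, half_width=1.0):
    """Highest spectral peak within fm +/- half_width (Hz).

    Returns (peak frequency, single-sided amplitude). Generic illustration,
    not the estimator of [4].
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    # Single-sided amplitude spectrum (valid away from DC and Nyquist).
    amplitude = np.abs(np.fft.rfft(signal)) * 2.0 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= fm - half_width) & (freqs <= fm + half_width)
    if not band.any():
        raise ValueError("no frequency bin falls inside the search interval")
    idx = int(np.argmax(np.where(band, amplitude, -np.inf)))
    return freqs[idx], amplitude[idx]
```

Evaluating this amplitude for each modulation frequency at a fixed stimulus level gives one point of a TMTF per frequency.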
Let $X^{(j)}(n)$ be the epoch recorded during the $j$-th stimulation, with $n$ the current time sample. The AEP is defined by
$$X(n) = \frac{1}{J} \sum_{j=1}^{J} X^{(j)}(n),$$
with $J$ the number of recorded epochs corresponding to stimuli with the same modulation frequency. The discrete Fourier transform of $X^{(j)}(n)$ is
$$S_{X^{(j)}}(f) = \frac{1}{N} \sum_{n=1}^{N} X^{(j)}(n)\, e^{-2 i \pi \frac{f}{f_e} n},$$
with $N$ the number of samples in the oscillatory part of the response and $f_e$ the sampling frequency. The power spectral density of the oscillation is evaluated by computing cross-spectra between epochs, in which the asterisk denotes the complex conjugate operator.
All TMTFs extracted from the database were grouped into classes according to their modulation frequency selectivity [4]. The model reproduces the three main classes obtained from the database (see table 3). This is achieved by varying the coupling parameters between structures in order to minimize the mean squared error between the simulated and measured curves. Figure 5 illustrates this result. Class 1 is characterized by a weak coupling between MGB and TRN, whereas the transmission from PAC to TRN is strong. Conversely, class 3 is obtained with a strong connection between MGB and TRN and weak feedback. Like class 1, class 2 requires strong PAC feedback, particularly via GABAa inhibition, because there is no GABAb inhibition.
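The AEP and the epoch spectra defined at the start of this section can be sketched numerically as below. The cross-spectral estimator shown, the average of S_j(f) S_k*(f) over all pairs of distinct epochs, is one common form and is an assumption here; the exact expression used in the study is not reproduced in the text. Function names are hypothetical.

```python
import numpy as np

def aep(epochs):
    """Average of the epochs; epochs has shape (J, N)."""
    return np.mean(np.asarray(epochs, dtype=float), axis=0)

def epoch_spectra(epochs, fs, freqs):
    """S_j(f) = (1/N) * sum_{n=1..N} X_j(n) * exp(-2i*pi*f*n/fs), shape (J, n_freqs)."""
    epochs = np.asarray(epochs, dtype=float)
    n = np.arange(1, epochs.shape[1] + 1)
    basis = np.exp(-2j * np.pi * np.outer(freqs, n) / fs)  # (n_freqs, N)
    return epochs @ basis.T / epochs.shape[1]

def cross_spectral_power(epochs, fs, freqs):
    """Mean of S_j(f) * conj(S_k(f)) over all pairs j != k (assumed estimator).

    Requires at least two epochs.
    """
    S = epoch_spectra(epochs, fs, freqs)
    J = S.shape[0]
    total = np.abs(S.sum(axis=0)) ** 2      # all (j, k) pairs, including j == k
    diag = np.sum(np.abs(S) ** 2, axis=0)   # the j == k terms
    return (total - diag) / (J * (J - 1))
```

Because the terms with j = k are excluded, noise that is independent across epochs does not add a positive bias to the estimate, whereas a component that is phase-locked to the stimulus is preserved.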

DISCUSSION
As illustrated in figure 4, the simulated cortical response is smoother than the recorded one. This can be explained by the fact that SEEG electrodes record activity from other layers and from cortical columns of other cortical areas. This activity is not necessarily temporally correlated with the stimulus, and a residual activity may remain after averaging. The transient part of the response is poorly reproduced, but the main source of this transient wave seems to be in the secondary auditory cortex [6]. This wave is recorded in the PAC because of its electromagnetic propagation. The use of a single cortical column may appear to underestimate the complexity of the PAC. The MGB, PAC and TRN are tonotopically organized, but the amplitude-modulated white noise stimulus is not frequency selective. One can therefore suppose that the response to amplitude-modulated white noise is similar across CFs. Langner and colleagues [12] showed a specific modulation frequency selectivity perpendicular to the tonotopic axis, but this result was not confirmed [10]. The broadband spectrum of the stimuli and the lack of information about PAC organization led us to model only one column. However, the results obtained with this simplified structure help us to understand the main connections involved in amplitude modulation processing. The three classes reproduced represent 65% of the TMTFs recorded in the PAC. To obtain these classes, the coupling coefficients between the three structures were adjusted, whereas the PAC cortical column coefficients (C1, C2, C3, C4, C5 and C6) were left unchanged. In the database, some patients have several recording sites in the PAC, and each recording site has a different TMTF. It is therefore possible to find these classes in the PAC of a single patient. This supports our approach, because the connectivity pattern may differ between columns that have the same organization and produce the same background activity.
To our knowledge, there is no characterization of AEPs recorded in the TRN in response to amplitude-modulated stimuli. For the MGB, single-neuron responses to amplitude-modulated stimuli have been studied. Creutzfeldt [2] observed that a neuron receiving an MGB afferent can temporally code amplitude modulation for frequencies up to 20 Hz, whereas its afferent neuron can follow modulation frequencies up to 200 Hz. This phenomenon is thought to be a consequence of synaptic depression. The model presented here does not take such a mechanism into account, which is why this behavior is not reproduced. Including such mechanisms is a planned refinement of the model.