Multisensory representations of space and time in sensory cortices

Abstract
Clear evidence demonstrated a supramodal organization of sensory cortices with multisensory processing occurring even at early stages of information encoding. Within this context, early recruitment of sensory areas is necessary for the development of fine domain-specific (i.e., spatial or temporal) skills regardless of the sensory modality involved, with auditory areas playing a crucial role in temporal processing and visual areas in spatial processing. Given the domain-specificity and the multisensory nature of sensory areas, in this study, we hypothesized that preferential domains of representation (i.e., space and time) of visual and auditory cortices are also evident in the early processing of multisensory information. Thus, we measured the event-related potential (ERP) responses of 16 participants while performing multisensory spatial and temporal bisection tasks. Audiovisual stimuli occurred at three different spatial positions and time lags and participants had to evaluate whether the second stimulus was spatially (spatial bisection task) or temporally (temporal bisection task) farther from the first or third audiovisual stimulus. As predicted, the second audiovisual stimulus of both spatial and temporal bisection tasks elicited an early ERP response (time window 50–90 ms) in visual and auditory regions. However, this early ERP component was more substantial in the occipital areas during the spatial bisection task, and in the temporal regions during the temporal bisection task. Overall, these results confirmed the domain specificity of visual and auditory cortices and revealed that this aspect selectively modulates also the cortical activity in response to multisensory stimuli.


KEYWORDS
ERP, multisensory processing, sensory cortices, space perception, time perception

Monica Gori and Giorgia Bertonati contributed equally to this study.

1 | INTRODUCTION
Humans constantly combine information from different senses, which provide complementary representations of the surrounding environment. In this melting pot of sensory information, different senses are more accurate in processing specific environmental properties. For example, vision allows a complete representation of the surrounding space by receiving detailed spatial information directly from the retina (Alais & Burr, 2004). At the same time, hearing is the most accurate sense in representing temporal information (Barakat et al., 2015; Burr et al., 2009; Guttman et al., 2005).
The strong association of visual and auditory sensory modalities with a specific domain of representation (i.e., space and time) suggested that the recruitment of the visual and auditory cortices might be necessary for building high-resolution spatial and temporal representations, respectively. Indeed, vision is crucial for aligning neural representations of space also for other sensory modalities (King, 2009, 2014) and the visual cortex is not solely involved in processing visual input (Romei et al., 2009; Vetter et al., 2014). In this regard, a past study revealed that occipital areas supported the neural processing underlying complex spatial representations of sighted individuals in the acoustic modality (Campus et al., 2017). Similarly, auditory areas were proven not to be involved exclusively in acoustic processing (Rosenblum et al., 2017), but to support also the visual representation of a complex temporal metric (Amadeo et al., 2020a).
Studies on sensory deprivation offered further evidence in this direction. People with visual impairment were found to be impaired in some auditory spatial tasks, since the lack of vision prevented the full development of their auditory spatial maps (Gori et al., 2014; Vercillo et al., 2016; Voss et al., 2015; Zwiers et al., 2001; reviewed in Gori et al., 2020). This observation was complemented by a reduced occipital response for acoustic space perception in early blind individuals (Gori et al., 2020; Tonelli et al., 2020), where the cumulative number of years spent without vision gradually impacted this occipital activation pattern in response to sounds (Amadeo et al., 2019a; 2020b). However, blind individuals do not show deficits in all spatial skills. For example, people with visual impairment are able to localize sounds in space (Battal et al., 2020), to generate mental images of tactile spatial layouts (Cattaneo et al., 2008), and to perform spatial orientation tasks (Fortin et al., 2006). A parallel between vision and audition can be made in the case of deaf individuals, who were shown to be impaired in some temporal processing tasks, such as a visual temporal bisection task and a tactile temporal discrimination task (Amadeo et al., 2022; Bolognini et al., 2012; Gori et al., 2017), but not in others.
For instance, deaf people can accurately estimate the duration of visual stimuli (Poizner & Tallal, 1987) and perform a visual temporal order judgment task (Nava et al., 2008). These findings about the effects of sensory deprivation on the spatial and temporal abilities of blind and deaf individuals may seem controversial. Nonetheless, what could make the difference in these apparently conflicting results is the kind of task used to explore the spatial and the temporal skills. For instance, two tasks in which visually impaired individuals and deaf people were found to be particularly affected are the spatial and the temporal bisection tasks. In the bisection task, three stimuli are presented in sequence, the second stimulus is randomly delivered at two different spatial positions and temporal lags, and participants evaluate whether this second stimulus is spatially (spatial bisection) or temporally (temporal bisection) farther from the first or the third stimulus.
These tasks involve the construction of a metric as they explicitly require participants to spatially and temporally compare external stimuli with each other. It has been suggested that in the bisection the lack of vision may affect the ability to compare the different inputs in space, and the lack of audition the different stimuli in time.
To sum up, past studies suggested that the recruitment of the visual and auditory areas selectively underlies the development of some skills involving spatial and temporal domains, and that the lack of this neural activation in case of sensory deprivation may affect the shaping of fine spatiotemporal representations. Conversely, this mechanism, when developed, is activated independently of the sensory modality involved, suggesting a domain-specific supramodal organization of the brain for which the domains of representation (i.e., space or time), rather than the sensory modalities, primarily shape the human perception (Amedi et al., 2017; Cecchetti et al., 2016; Heimler & Amedi, 2020; Heimler et al., 2015; Ricciardi et al., 2014, 2020; Rosenblum et al., 2017).
Studies on the neural mechanisms underlying multisensory perception support the view that the sensory modality is no longer the primary organizing principle of the sensory brain's architecture. Traditionally, multisensory functions have been considered the domain of association cortices such as the superior temporal sulcus (Beauchamp, 2005), the intraparietal area (Andersen et al., 1997), and the frontal cortex (Fuster et al., 2000). More recently, a body of research has shown that occipital and temporal areas can also support the encoding of multiple sensory modalities (Bueti & Macaluso, 2010; Giard & Peronnet, 1999; Molholm et al., 2002; van Wassenhove & Grzeczkowski, 2015), with anatomical substrates that were noted to sustain multisensory processing at low levels of cortical processing (Cappe & Barone, 2005; Falchier et al., 2002; Rockland & Ojima, 2003). Consequently, multisensory influences were found to take place at all levels of cortical processing, suggesting that the neocortex is essentially multisensory (Ghazanfar & Schroeder, 2006). Finally, research revealed that the encoding of multiple sensory information extends over a wide range of time latencies. For instance, multisensory processes were shown to occur also within the first 100 ms poststimulus onset (early-latency multisensory interactions [eMSI]; reviewed in De Meo et al., 2015) and to directly shape perception and behavior even at these early stages of multisensory encoding (Cappe et al., 2010; Fort, Delpuech, Pernier, Giard, & Thomas, 2002; Gondan & Röder, 2006; Raij et al., 2010; Teder-Sälejärvi et al., 2002).
Despite the increasing knowledge of the mechanisms underlying multisensory perception, it is still not clear whether and how the multisensory nature of sensory areas is modulated by the domain specificity implicit in visual and auditory cortices. In other words, it is unclear whether the preferential domains (i.e., space and time) revealed in sensory areas also influence multisensory processing at the cortical level. Given that visual and auditory regions play an important role in scaffolding spatial and temporal processing, respectively (Amadeo et al., 2020a; Campus et al., 2017, 2019), and that these cortical areas are multisensory in nature too (Bueti & Macaluso, 2010; Giard & Peronnet, 1999; Molholm et al., 2002; van Wassenhove & Grzeczkowski, 2015), we hypothesized that the domains of representation would modulate the cortical activation to multisensory stimuli. More specifically, we expected to find a preferential activation of visual areas for multisensory spatial processing, and of auditory areas for multisensory temporal processing, and that this specialized mechanism would occur at early stages of multisensory processing.

| Participants
A group of 16 adults participated in the study (9 females, mean age ± SD: 24 ± 2.95 years old). Based on a meta-analysis of previous studies testing the neural correlates of spatial and temporal abilities of healthy adults (Amadeo et al., 2020a; Campus et al., 2017), we expected a large effect size. A priori power analysis revealed that a minimum sample size of 15 participants was needed to statistically detect such an effect size (two-tailed t-test, power 0.80, alpha .05). All participants reported no history of neurological, cognitive, and/or sensory deficits. The study was approved by the ethics committee of the local health service (Comitato etico, ASL 3 Genova) and conducted in line with the Declaration of Helsinki. All participants gave written informed consent prior to testing.

| Setup, stimuli, and procedure
All participants performed a spatial bisection task and a temporal bisection task. In both tasks, a trial consisted of three audiovisual (AV) stimuli (namely S1-S3) played at three different spatial positions and time lags. An AV stimulus consisted of a single sound (60 dB SPL at ear level, 500 Hz) spatially aligned with a single red flash (2.3° diameter, 20 cd/m² luminance), presented for 75 ms. The spatial and temporal proximity of the auditory and the visual stimulations allowed participants to perceive them as originating from exactly the same source (Figure 1b). S1 and S3 were always played at −25° and +25°, respectively, and separated by a fixed time interval of 1.5 s. From trial to trial, S2 could be presented randomly from either −2.3° or +2.3° in space, and at either −250 ms or +250 ms in time (with 0 ms representing the middle of the 1.5 s temporal sequence). We chose these spatial positions and time lags on the basis of participants' psychophysical performance in previous studies (for more details see Gori et al., 2012, 2014; Vercillo et al., 2016).
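As a rough cross-check of the a priori power analysis reported for the participant sample, the required sample size can be approximated from normal quantiles plus a common small-sample correction (adding half the squared critical value, often attributed to Guenther). This is only a sketch under stated assumptions: the text reports a "large" expected effect without a number, so Cohen's conventional d = 0.8 is assumed here, as is the one-sample (paired) form of the t-test. The function name is illustrative and this is not necessarily the software or procedure the authors used.

```python
import math
from statistics import NormalDist


def required_n_paired(d, alpha=0.05, power=0.80):
    """Approximate n for a two-tailed one-sample/paired t-test.

    Normal approximation plus the small-sample correction z_alpha^2 / 2.
    `d` is the standardized effect size (an assumption, not from the paper).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = ((z_alpha + z_power) / d) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


# Assumption: "large effect" taken as Cohen's conventional d = 0.8.
print(required_n_paired(0.8))
```

Under these assumptions the approximation returns 15, which is consistent with the minimum sample size stated in the text.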
Four conditions were possible according to this experimental design (Figure 2): (a) S1-S2 distance/interval narrow in space and short in time (i.e., S2 at −2.3° and −250 ms; Figure 2a); (b) S1-S2 distance/interval narrow in space and long in time (i.e., S2 at −2.3° and +250 ms; Figure 2b); (c) S1-S2 distance/interval wide in space and long in time (i.e., S2 at +2.3° and +250 ms; Figure 2c); and (d) S1-S2 distance/interval wide in space and short in time (i.e., S2 at +2.3° and −250 ms; Figure 2d). In conditions (a) and (c), the spatial and temporal components of the AV stimuli were coherent, whereas in conditions (b) and (d) they were in conflict.
The AV stimuli and conditions were identical in both tasks, which differed only in relation to the experimental question. More specifically, in the spatial bisection task participants evaluated whether S2 was spatially farther from S1 or S3, whereas in the temporal bisection task they evaluated whether S2 was temporally farther from S1 or S3. For each task, participants responded after the presentation of S3 by pressing the button corresponding to S1 or S3. The two tasks were counterbalanced across subjects in two separate blocks, and participants could take a break between them. Each block consisted of 240 experimental trials and 15 catch trials (in which S2 was delivered at 0° and at 0 ms to check for participants' stereotypical responses). Participants were asked to maintain a stable head position that was continuously monitored by the experimenter, together with the electrooculogram (EOG) signal.
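The 2 × 2 design described above (two S2 positions crossed with two S2 time lags) can be enumerated as in the following minimal sketch. The variable and function names are illustrative, and the coherence rule simply encodes the pairing stated in the text (narrow with short, wide with long). The S1 position and onset (−25°, −750 ms relative to the sequence midpoint) are taken from the setup description and the Figure 3 caption.

```python
from itertools import product

S1_DEG, S1_MS = -25.0, -750       # S1 position and onset relative to sequence midpoint
S2_POSITIONS_DEG = (-2.3, 2.3)    # possible S2 positions
S2_LAGS_MS = (-250, 250)          # possible S2 time lags


def build_conditions():
    """Return the four S2 conditions with S1-S2 distance, interval, and coherence."""
    conditions = []
    for pos, lag in product(S2_POSITIONS_DEG, S2_LAGS_MS):
        conditions.append({
            "s2_deg": pos,
            "s2_ms": lag,
            "s1_s2_distance_deg": round(pos - S1_DEG, 1),
            "s1_s2_interval_ms": lag - S1_MS,
            # Coherent when narrow goes with short, or wide goes with long.
            "coherent": (pos < 0) == (lag < 0),
        })
    return conditions


for condition in build_conditions():
    print(condition)
```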

| Electroencephalography (EEG) data collection and preprocessing
We recorded EEG from 64 active scalp electrodes using the Biosemi ActiveTwo EEG System. Electrode offsets were kept below 30 mV. A first-order analog antialiasing filter with a half-power cutoff at 3.6 kHz was applied. Data were acquired at 2048 Hz and then downsampled to 512 Hz with a bandwidth of DC to 134 Hz. The EEG recording was referenced to a common mode sense active electrode and a driven right leg passive electrode. To check horizontal ocular movements, two additional electrodes were positioned at the left and right outer canthi for the EOG recordings.
The EEG was filtered between 0.1 and 100 Hz. We removed transient stereotypical (e.g., eye blinks) and non-stereotypical (e.g., movement or muscle bursts) high-amplitude artifacts by applying the artifact subspace reconstruction (ASR) method (Mullen et al., 2015) implemented by the EEGLAB plug-in (Delorme & Makeig, 2004). Sliding windows of 500 ms of EEG data were decomposed via principal component analysis and compared with data from a clean baseline EEG recording. Within each sliding window, the ASR algorithm identifies principal subspaces which significantly deviate from the baseline and then reconstructs these subspaces using a mixing matrix computed from the baseline EEG recording. In this study, a threshold of 3 SD was used to identify corrupted subspaces. Moreover, channels were removed if their correlation with the other channels was <0.85, or if their line noise relative to their signal was more than 4 SD on the basis of the total channel population. Whenever the fraction of contaminated channels exceeded the threshold of 0.25, we removed the corresponding time windows.
FIGURE 1 Experimental setup. (a) A horizontal array of 23 free-field speakers and 23 light emitting diodes (LEDs). (b) Detail of one speaker spatially aligned with one LED. In each trial, a single sound was simultaneously reproduced with a single red flash. Participants reported the auditory and visual stimulations as originating from exactly the same source.
We further cleaned EEG data using independent component analysis (ICA) with two EEGLAB toolboxes, namely SASICA (Chaumon et al., 2015) and IC_MARC (Frølich et al., 2015), keeping all parameters at their default values. For component rejection, we followed the criteria reported in the corresponding validation papers and based rejection on abnormal topographies and/or spectra. Data were referenced to the average of the left and right mastoids (TP7 and TP8 electrodes).
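The final re-referencing step amounts to subtracting, at every sample, the mean of the two mastoid channels from each channel. The following is a minimal sketch of that arithmetic on toy data. The dictionary layout, the function name, and the voltages are hypothetical, and the authors performed this step in EEGLAB rather than with code like this.

```python
def rereference(data, ref_channels=("TP7", "TP8")):
    """Subtract the mean of the reference channels from every channel.

    `data` maps channel name -> list of samples (all of equal length).
    """
    n_samples = len(next(iter(data.values())))
    reference = [
        sum(data[ch][i] for ch in ref_channels) / len(ref_channels)
        for i in range(n_samples)
    ]
    return {
        ch: [value - ref for value, ref in zip(samples, reference)]
        for ch, samples in data.items()
    }


# Toy data in microvolts (hypothetical values, for illustration only).
eeg = {
    "O1": [4.0, 6.0],
    "TP7": [1.0, 2.0],
    "TP8": [3.0, 4.0],
}
reref = rereference(eeg)
print(reref["O1"])
```

After re-referencing, the two reference channels sum to zero at every sample, which is a quick sanity check on the operation.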

| Behavioral-level and sensor-level analysis
Behavioral performance was computed as the percentage of correct responses for each task.
With regard to the neurophysiological data, we compared the neural response to S2 with that to S1 for the spatial and temporal bisection tasks separately. Previous studies involving unisensory stimuli (visual stimuli: Amadeo et al., 2020a; auditory stimuli: Amadeo, Campus, et al., 2019; Campus et al., 2017, 2019) already showed that S2 represents the starting point for the development of spatial and temporal metrics correlated with an early contralateral activation of occipital and temporal areas, respectively. In contrast, S1 was taken as a control since it was fixed in space and time, and S3 was not considered in the analysis since it potentially involves more complex processing related to the metric definition. We expected to find a similar pattern of early activation with multisensory stimuli. To obtain the event-related potentials (ERPs), we considered a time window of 200 ms before S1 onset as baseline and we averaged EEG data in synchrony with S1 or S2 onset, separately for the two tasks. For each participant, a minimum of 100 trials per block was required for each ERP. After artifact rejection, the total number of trials for each ERP was equal to 1707, approximately 107 per participant.
As in previous studies (Amadeo, Campus, & Gori, 2019a, 2020a; Campus et al., 2017, 2019), analysis focused on electrodes related to visual (O1 and O2, in occipital areas) and auditory (T7 and T8, in temporal areas) processing. Also in accordance with these studies, a time window between 50 and 90 ms after stimulus onset was defined as a crucial interval for the earliest stages of multisensory integration. Thus, for both tasks we computed mean ERP amplitude by averaging the voltage in this time window. We then collapsed ERP waveforms across conditions and hemispheres of recording to obtain ERPs recorded on the contralateral and the ipsilateral hemisphere with respect to stimulus position in space (e.g., occipital contralateral response: ERP amplitude to stimulus at −2.3° recorded from O2 electrode; occipital ipsilateral response: ERP amplitude to stimulus at −2.3° recorded from O1 electrode). Consequently, lateralized ERP responses were calculated as the difference between the contralateral and ipsilateral ERP recordings. We performed statistical comparisons by running analysis of variance (ANOVA) on the lateralized mean ERP responses, considering as factors: Area (Occipital, Temporal), Task (Spatial bisection, Temporal bisection), and AV stimulus (S1, S2). Paired two-tailed t-tests were performed as post hoc comparisons with alpha level set at .05 after Bonferroni correction.
FIGURE 2 Four experimental conditions according to S2 spatial and temporal features. (a) S2 from −2.3° at −250 ms, (b) S2 from −2.3° at +250 ms, (c) S2 from +2.3° at +250 ms, and (d) S2 from +2.3° at −250 ms. S1 and S3 were always delivered at −25° and +25°, respectively.
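The sensor-level measure described in this section (trial averaging, mean amplitude in the 50-90 ms window, and the contralateral minus ipsilateral difference) can be sketched as follows. This is an illustration only: all function names and the toy voltages are hypothetical, and baseline correction and artifact rejection are assumed to have been applied already.

```python
def average_trials(trials):
    """Average equally long single-trial epochs sample by sample to get an ERP."""
    n_trials = len(trials)
    return [sum(samples) / n_trials for samples in zip(*trials)]


def mean_amplitude(erp, times_ms, window=(50, 90)):
    """Mean voltage over the samples whose latency falls inside `window`."""
    start, stop = window
    values = [v for v, t in zip(erp, times_ms) if start <= t <= stop]
    return sum(values) / len(values)


def lateralized_amplitude(contra_erp, ipsi_erp, times_ms, window=(50, 90)):
    """Contralateral minus ipsilateral mean amplitude in the window."""
    return (mean_amplitude(contra_erp, times_ms, window)
            - mean_amplitude(ipsi_erp, times_ms, window))


# Toy example: latencies in ms, voltages in microvolts (hypothetical values).
times = [0, 25, 50, 75, 100]
# For a stimulus on the left (-2.3 deg), O2 is contralateral and O1 ipsilateral.
o2_trials = [[0.0, 1.0, 2.0, 4.0, 1.0], [0.0, 1.0, 4.0, 2.0, 1.0]]
o1_trials = [[0.0, 0.5, 1.0, 1.0, 0.5], [0.0, 0.5, 1.0, 1.0, 0.5]]

contra = average_trials(o2_trials)
ipsi = average_trials(o1_trials)
print(lateralized_amplitude(contra, ipsi, times))
```

With these toy values, only the samples at 50 and 75 ms fall inside the window, so the lateralized amplitude is the difference between the two window means.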

| Source-level analysis
In order to estimate the cortical generators of the ERP components, we performed a distributed source analysis using the Brainstorm software (Tadel et al., 2011), similarly to procedures used in previous studies (Amadeo et al., 2020a; Campus et al., 2017, 2019; Gori et al., 2020). Data were re-referenced to the common average. We

| RESULTS
A group of 16 participants performed a spatial and a temporal bisection task in which three AV stimuli were reproduced in sequence and the second of these stimuli randomly delivered at two different spatial positions and according to two separate temporal lags. Participants evaluated whether the second stimulus was spatially or temporally farther from the first or the third stimulus. During both tasks, EEG was recorded and behavioral data were collected.

| Behavioral performance
Behavioral performance was calculated as the percentage of correct responses. Participants performed equally well in the two tasks (t[15] = 1.80, p = .091, Cohen's d = 0.45, 95% CI = [−0.08, 0.98]). This observation allowed for the exclusion of any effect of task difficulty on the cortical responses associated with the two bisection tasks.
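The reported effect size can be related to the test statistic with a one-line calculation. The sketch below assumes that Cohen's d was computed for paired data as d = t / √n (sometimes written d_z). The text does not state which variant was used, but the reported values are consistent with this formula for n = 16. The function name is illustrative.

```python
import math


def cohens_dz_from_t(t_value, n):
    """Cohen's d for paired data (d_z), recovered from a paired t statistic."""
    return t_value / math.sqrt(n)


# Reported values: t[15] = 1.80 with n = 16 participants.
print(round(cohens_dz_from_t(1.80, 16), 2))
```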

| Sensor-level analysis
In Figure 3a, the scalp topographies of the mean ERP in the 50-90 ms time window after S1 show a positivity involving the temporal and the occipital areas contralateral to the AV stimulus position in space (always −25°). The activation pattern appears similar between the temporal and the spatial bisection tasks and likely reflects multisensory cortical processing in the C1 time window (50-90 ms), as already reported in previous literature (Cappe et al., 2010; Giard & Peronnet, 1999; Molholm et al., 2002; Murray, Lewkowicz, et al., 2016; Murray, Thelen, et al., 2016; reviewed in De Meo et al., 2015). In parallel, the scalp maps depicting the same time window after S2 onset (Figure 3b) show a more prominent positivity than after S1 in occipital areas for the spatial bisection task and in temporal areas for the temporal bisection task, always lateralized with respect to AV stimulus position in space.
We demonstrated these results by running an ANOVA on lateralized [...] The Task (Spatial, Temporal) × AV Stimulus (S1, S2) follow-up ANOVA on temporal regions revealed a contralateral temporal activity that was higher during the temporal bisection task than during the spatial bisection task, independently of the stimulus (F[1,15] = 26.76, p < .001, ηp² = 0.64, 95% CI [0.28, 0.80]). However, a significant interaction between Task and AV Stimulus (F[1,15] = 51.63, p < .001, ηp² = 0.77, 95% CI [0.51, 0.88]) suggested that, for the temporal bisection task, the gain modulation differed between S1 and S2. [...] Figure 5). This neural response was stronger during the spatial bisection task than during the temporal bisection task. We also observed a non-lateralized modulation of the later P140 response specific to the spatial bisection task, in agreement with previous studies (Amadeo et al., 2020a; Campus et al., 2017, 2019; Gori et al., 2020). Finally, a modulation occurring in a late poststimulus time window (250-450 ms), more pronounced for the spatial task, was detected, likely involving the auditory-evoked contralateral occipital activation (Feng et al., 2014; McDonald et al., 2013). Over the temporal scalp (Figure 5), an early ERP component contralateral to S2 position in space was stronger during the temporal bisection task. This activation resembled the N1 component usually elicited by auditory stimuli, and also recalled the multisensory responses observed at very short latencies (Calvert & Thesen, 2004; Giard & Peronnet, 1999).
FIGURE 3 Scalp maps of the mean event-related potential amplitude in the 50-90 ms time window after S1 (a) and S2 (b), for the spatial (top) and temporal (bottom) bisection task. On top, a schematic representation of each condition: S1 (a) was always delivered at −25° and −750 ms. S2 (b) was randomly delivered at either −2.3° (b, left panel) or +2.3° (b, right panel) in space, and at either −250 or +250 ms in time.
FIGURE 4 Lateralized mean event-related potential (ERP) amplitude (i.e., difference between the contralateral and ipsilateral ERP responses) in the selected time window (50-90 ms) after S1 and S2 of the two bisection tasks in occipital (left panel) and temporal (right panel) areas. Error bars indicate SEM.
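The partial eta squared values reported for the follow-up ANOVA on temporal regions can be recovered from the F statistics and their degrees of freedom using the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The sketch below is only an arithmetic check of the reported numbers, and the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Reported effects on temporal regions, both with F[1,15].
print(round(partial_eta_squared(26.76, 1, 15), 2))  # main effect of Task
print(round(partial_eta_squared(51.63, 1, 15), 2))  # Task x AV Stimulus interaction
```

Both results agree with the effect sizes given in the text (0.64 and 0.77).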

| Source-level analysis
To provide further evidence that the early activation of the temporal and occipital areas actually involved the auditory and visual cortices, respectively, we performed a source-level analysis (Figure 6). Considering the neural response at S2, the source analysis showed that both bisection tasks elicited a cortical response contralateral to the stimulus spatial position in occipital and temporal regions. However, when comparing the two bisection tasks at source level, we observed that the early activation of occipital regions was stronger during the spatial bisection than during the temporal bisection task, while the neural response of temporal areas was more widely evoked by the temporal bisection task. Although the exact generators of this neural activity were hard to define even with source analysis, the early latency of the response (50-90 ms), together with the neural activation covering a wide region of the temporal and occipital lobes, suggested that the two tasks were probably evoking a neural response involving the auditory and visual cortices. Paired two-tailed t-tests confirmed the significant differences between the two tasks in the recruitment of the auditory and visual cortices.
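According to the Figure 6 caption, the source-level t-tests were thresholded after FDR correction. The source does not state which FDR procedure was used. The sketch below assumes the widely used Benjamini-Hochberg step-up procedure and applies it to a handful of made-up p values. The function name, the p values, and the level q are all hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[index] = True
    return rejected


# Made-up p values for five hypothetical cortical sources.
p = [0.001, 0.20, 0.012, 0.045, 0.025]
print(benjamini_hochberg(p, q=0.05))
```

In this toy case the three smallest p values survive correction, while 0.045 does not, even though it would pass an uncorrected .05 threshold.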

| DISCUSSION
Our environment determines the sense that is the most reliable for processing specific information (Welch & Warren, 1980) by selecting vision as the most appropriate sense for spatial judgments and audition for temporal processing. In this scenario, visual cortices play a pivotal role in spatial representations and auditory regions in temporal representations, independently of the inputs' sensory modality (Amadeo et al., 2020a; Campus et al., 2017). Indeed, some cortical regions process the sensory inputs in a modality-independent manner, since they are mainly driven by specific computations rather than by specific sensory information (Amedi et al., 2017; Cecchetti et al., 2016; Heimler & Amedi, 2020; Heimler et al., 2015; Ricciardi et al., 2014, 2020; Rosenblum et al., 2017). The idea that the sensory cortices are innately specialized is further challenged by multisensory operations occurring at all levels of cortical processing (Ghazanfar & Schroeder, 2006).
In this study, we recorded behavioral data and ERPs in 16 participants performing audiovisual temporal and spatial bisection tasks, to test the hypothesis that the domain-specific organization of visual and auditory brain areas also holds at the multisensory level. Participants evaluated whether, in a sequence of three audiovisual stimuli, the second stimulus (S2) was spatially (spatial bisection task) or temporally (temporal bisection task) farther from the first or the third audiovisual stimulus. Our results showed an S2-selective early activation (50-90 ms) of temporal regions that was stronger when encoding the audiovisual stimuli in a temporal bisection task than in a spatial bisection task. This early response recalled some aspects of the N1 component usually elicited by auditory stimuli (Näätänen & Picton, 1987) and originated in a wide temporal region that presumably involved the auditory cortex. This area generally works together with regions such as the superior temporal sulcus to coordinate many multisensory processes. Complementarily, we found an occipital response resembling the visual-evoked C1 that was still a selective S2 response but larger for the spatial bisection task than for the temporal bisection task. Our findings complemented past studies using unisensory stimuli that support a crucial role of the visual and auditory cortices in spatial and temporal representation, respectively (visual stimuli: Amadeo et al., 2020a; auditory stimuli: Amadeo, Campus, et al., 2019; Campus et al., 2017, 2019). In addition to this evidence, this study showed that the domain-specificity of sensory areas acts also within a multisensory framework.
In many past studies, visual and auditory cortices have been shown to support the encoding of multiple sensory modalities, and this observation suggested that the high-level associative cortices do not hold the absolute primacy of multisensory processes (Bueti & Macaluso, 2010; Giard & Peronnet, 1999; Molholm et al., 2002; van Wassenhove & Grzeczkowski, 2015). Our results showed that when processing multisensory information, the sensory areas took into account also the features of stimuli to be processed and, in particular, the domain of representation (i.e., space and time) to which the stimuli belonged. In particular, occipital areas were preferentially recruited to encode multisensory stimuli spatially rearranged in a complex metric configuration, supporting the idea that the visual circuit is crucially enrolled whenever dealing with spatial representations across multiple sensory modalities. Likewise, we confirmed the crucial role of the auditory cortices in temporal processing by showing the preferential recruitment of these areas in the temporal representation of multisensory stimuli. Because this study lacked unimodal conditions (auditory only and visual only) to compare with the multisensory stimulation, we cannot infer with confidence that the domain-specific neural response we observed was intrinsically multisensory. Indeed, we cannot exclude that participants were taking into account only the most relevant sense for each specific task: the visual stimuli for the spatial bisection task, and the auditory stimuli for the temporal bisection task.
FIGURE 5 ERPs (mean ± SEM) elicited by S2 during the spatial bisection and the temporal bisection tasks in occipital (left panel) and temporal (right panel) electrodes. Both contralateral and ipsilateral ERP responses with respect to S2 position in space are reported. The gray-shaded area delimits the selected time window (50-90 ms).
Nonetheless, from a qualitative comparison between the results of this study and those of past works using the same methodology but with unimodal conditions (visual stimuli: Amadeo et al., 2020a; auditory stimuli: Amadeo, Campus, et al., 2019; Campus et al., 2017, 2019), we observed a neural gain in response to multisensory stimuli, in line with typical processing of multisensory inputs (Fort, Delpuech, Pernier, Giard, & Thomas, 2002; Molholm et al., 2002; reviewed in Ricciardi et al., 2014). Specifically, the multisensory response we observed was larger than the unisensory responses previously described.
FIGURE 6 (a) Average source activity after S2 in the 50-90 ms time window: Left and right panels of each line show the conditions in which S2 was delivered from the left (i.e., −2.3°) or the right (i.e., +2.3°), respectively. (b) Results of the pairwise two-tailed t-tests performed on average source activity in the 50-90 ms time window: Only t values corresponding to p < .0001 after FDR correction are displayed. Reddish and bluish colors indicate stronger activations in spatial and temporal bisections, respectively. Color intensity indicates the significance of the difference (i.e., magnitude of t). A stronger neural response with the spatial bisection occurs in the occipital areas, while in the temporal sites the activation more strongly supports the temporal bisection.
Interestingly, this multisensory gain was detectable in both occipital and temporal areas and was independent of the domain of representation involved (spatial or temporal). However, since we could not quantitatively compare unimodal and multimodal conditions, future investigations in this direction are still needed, with participants being tested in visual-only, audio-only, and audiovisual spatial and temporal tasks. Finally, we showed that the behavioral performance was similar between the two tasks, which confirmed that the neural modulation of sensory areas referred essentially to the task request (rather than to other experimental aspects such as the task difficulty). Overall, the results of this study fit into a framework delineated by Murray, Lewkowicz, et al. (2016), who proposed that the multisensory processing does not always involve a single and fixed schema of neural activation, but encompasses different cortical circuits. In particular, the authors proposed a neural circuit that is recruited among high-order association cortices, such as the prefrontal and parietal cortex, and a second neural circuit that occurs directly between low-level cortices.
Multisensory processes can involve both kinds of schema in a dynamic combination, in relation to the nature of the multisensory stimuli to be processed. The task-specific recruitment of visual and auditory cortices described in our study fits into this dynamic and context-adaptive scenario of multisensory processing occurring between sensory cortices.
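As a minimal sketch of the kind of comparison summarized in Figure 6b, the following Python code computes a paired t statistic for one source and applies a false discovery rate correction across sources. It assumes the Benjamini-Hochberg procedure, which the figure caption does not specify. The function names and the example p-values are hypothetical, and in practice the two-tailed p-values would be obtained from a t distribution with n − 1 degrees of freedom using a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t(spatial, temporal):
    """Paired t statistic for one source.

    Each list holds one value per participant (e.g., mean source
    activity in the 50-90 ms window for each task).
    """
    diffs = [s - t for s, t in zip(spatial, temporal)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 denominator).
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR correction (an assumed choice of procedure).

    Returns the adjusted p-values and a list of reject flags,
    both in the original order of `pvals`.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotonic in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted, [p <= alpha for p in adjusted]


# Hypothetical p-values for four sources (illustration only).
example_pvals = [0.01, 0.04, 0.03, 0.005]
example_adjusted, example_reject = benjamini_hochberg(example_pvals, alpha=0.03)
```

Positive t values from `paired_t` would correspond to stronger activation in the spatial bisection and negative values to stronger activation in the temporal bisection, matching the color convention of the figure.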
The early latency (50-90 ms) we selected in this study supports a task-specificity occurring at a low level of sensory processing. Indeed, this specific time window can be considered an eMSI, which is a functionally early stage of multisensory processing (within the first 100 ms after stimulus onset) allowing the brain to select and encode important external inputs, which can also facilitate later stimulus encoding. However, it is worth noting that activation of visual and auditory regions has been recorded at latencies even earlier than 50 ms in both macaques (Lamme & Roelfsema, 2000; Maunsell & Gibson, 1992) and humans (Brang et al., 2015, 2022), using techniques different from those of our study. In the past literature, eMSI was generally elicited by simple tasks such as discrimination or detection tasks (Cappe et al., 2012; Giard & Peronnet, 1999; Murray, Thelen, et al., 2016; Raij et al., 2010; Talsma et al., 2007; Teder-Sälejärvi et al., 2002), but less was known about whether this mechanism also occurs with more complex requests, such as the bisection tasks we used in this study. The spatial and temporal bisections explore the human ability to build a metric representation of the environment by estimating and comparing different inputs in space and time. In addition, by using the same audiovisual stimuli in the two tasks and changing only the experimental question between them, this experimental paradigm allowed us to detect the early neural effects of the interaction of identical sensory information but with different behavioral goals (in the present investigation, the spatial and the temporal content of the task). The fact that the task-specificity of multisensory processing appeared at an early latency can be regarded as a controversial point, since early multisensory integration is typically considered an automatic process, that is, a hallmark of bottom-up mechanisms (De Meo et al., 2015).
Nevertheless, past studies revealed that top-down factors, such as attention, also influence multisensory integration at very early stages of stimulus processing (Talsma & Woldorff, 2005; Talsma et al., 2010), and that high-level cognitive processes can directly involve the recruitment of auditory and visual areas (reviews on the visual areas: Ricciardi et al., 2020; Roelfsema & de Lange, 2016; review on the auditory areas: Zatorre, 2007). Thus, in light of these findings, we are not surprised to observe a domain-specific early activation of auditory and visual areas for a task such as the bisection. Indeed, in line with the cross-sensory calibration theory (Gori, 2015), for this kind of task the visual and the auditory systems calibrate the other senses for the spatial and temporal representations, respectively, supported by the recruitment of the sensory cortices (Amadeo et al., 2020a; Campus et al., 2017). In this task, an adult-like behavior is achieved only in late development (Amadeo et al., 2019b; Gori et al., 2012) and, when the calibration is not possible (e.g., in blindness or deafness), the spatiotemporal skills involved in the bisection are impaired, together with the related activation of sensory areas. Here, we speculate that with other tasks that do not require such calibration, the domain-specific modulation of the sensory cortices would be less pronounced (as would the deficit in some spatiotemporal skills in case of sensory impairment).
For example, a spatial localization task, for which visual calibration does not seem to be required (the ability to localize sounds in space develops even in the absence of visual experience; Gori et al., 2021; Rohlf et al., 2020), may involve an alternative schema of multisensory processing at the neural level or activate later cortical processes. However, further investigation in this direction is needed.
The findings of this study should be considered in light of some limitations. First, the lack of unimodal conditions (auditory only and visual only) prevents a direct comparison between unimodal and multisensory processing, and the data also lack a computational description, for instance within a Bayesian framework. However, the multisensory gain we qualitatively observed in the occipital and temporal activation for the spatial and temporal bisection, respectively, suggests that this response was likely related to multisensory processing. Second, the lack of correlation between the subjects' neural response and the behavioral performance (i.e., the % of correct responses and/or the spatial and temporal parameters of S2) does not allow us to state that the observed neural modulation was truly responding to the spatiotemporal characteristics of the stimuli. Third, the low spatial resolution of the EEG technique, together with the lack of individualized MRI scans for the source analysis, limits access to the exact cortical locations generating the occipital and temporal activations. However, the similarities between the early occipital and temporal positivity we observed and the canonical visual-evoked and auditory-evoked components of sensory cortices lead us to assume that the neural response was generated at the level of visual and auditory cortices. Finally, this study has a limited sample size, although in line with the sample sizes of past studies using the same methodology (Amadeo et al., 2020a; Campus et al., 2017, 2019).
To conclude, this study provides evidence of early responses of auditory and visual cortices for temporal and spatial multisensory tasks, respectively. This work demonstrates that preferential domains of representation (i.e., space and time) of the sensory areas also persist at the multisensory level, with a task-dependent involvement of auditory and visual regions in the processing of bimodal stimuli. Moreover, if we consider a continuous interaction between multisensory processes and supramodal mechanisms (Ricciardi & Pietrini, 2011), our results may also complement the task-specific supramodal organization of the brain revealed by past studies using unisensory stimulation (visual stimuli: Amadeo et al., 2020a; auditory stimuli: Amadeo, Campus, et al., 2019; Campus et al., 2017, 2019). Overall, these findings may have important implications for the understanding of the multifaceted, dynamic, and context-adaptive multisensory mechanisms at the neural level.

ACKNOWLEDGMENTS
The authors would like to thank the subjects for their willing participation in this study. The research is partially supported by the MYSpace project (PI Monica Gori), which has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 948349).