
www.elsevier.com/locate/ynimg

NeuroImage 33 (2006) 672–680

“What” versus “Where” in the audiovisual domain: An fMRI study


C. Sestieri,a,b,⁎ R. Di Matteo,a,b A. Ferretti,a,b C. Del Gratta,a,b M. Caulo,a,b A. Tartaro,a,b
M. Olivetti Belardinelli,c,d and G.L. Romani a,b

a Department of Clinical Sciences and Bio-Imaging, “G. d’Annunzio” University, Chieti, Italy
b ITAB, Institute for Advanced Biomedical Technologies, “G. d’Annunzio” University Foundation, Chieti, Italy
c Department of Psychology, University of Rome “La Sapienza”, Italy
d ECONA, Interuniversity Centre for Research on Cognitive Processing in Natural and Artificial Systems, Rome, Italy

Received 28 February 2006; revised 8 June 2006; accepted 25 June 2006


Available online 24 August 2006

Similar “what/where” functional segregations have been proposed for both visual and auditory cortical processing. In this fMRI study, we investigated if the same segregation exists in the crossmodal domain, when visual and auditory stimuli have to be matched in order to perform either a recognition or a localization task. Recent neuroimaging research highlighted the contribution of different heteromodal cortical regions during various forms of crossmodal binding. Interestingly, crossmodal effects during audiovisual speech and object recognition have been found in the superior temporal sulcus, while crossmodal effects during the execution of spatial tasks have been found over the intraparietal sulcus, suggesting an underlying “what/where” segregation. In order to directly compare the specific involvement of these two heteromodal regions, we scanned ten male right-handed subjects during the execution of two crossmodal matching tasks. Participants were simultaneously presented with a picture and an environmental sound, coming from either the same or the opposite hemifield and representing either the same or a different object. The two tasks required a manual YES/NO response respectively about location or semantic matching of the presented stimuli. Both group and individual subject analyses were performed. Task-related differences in BOLD response were observed in the right intraparietal sulcus and in the left superior temporal sulcus, providing a direct confirmation of the “what–where” functional segregation in the crossmodal audiovisual domain.
© 2006 Elsevier Inc. All rights reserved.

Keywords: Crossmodal; Audiovisual; fMRI; What; Where

⁎ Corresponding author. Institute for Advanced Biomedical Technologies, University “G. d’Annunzio” of Chieti, Via dei Vestini, 33, 66013 Chieti (CH), Italy. Fax: +39 0871 3556930. E-mail address: carcarlo11@hotmail.com (C. Sestieri).
Available online on ScienceDirect (www.sciencedirect.com).
1053-8119/$ - see front matter © 2006 Elsevier Inc. All rights reserved.
doi:10.1016/j.neuroimage.2006.06.045

Introduction

In 1982, it was proposed that two major pathways originate from the monkey primary visual cortex: a ventral stream for “what” information, projecting to the inferotemporal cortex, and a dorsal route for “where” information, projecting to the posterior parietal cortex (Ungerleider and Mishkin, 1982). This model was subsequently applied to the human brain and progressively supported by anatomical and lesion data (Newcombe et al., 1987; Farah, 1990), physiological recordings (Baizer et al., 1991) and evidence from functional neuroimaging (Haxby et al., 1991, 1994; Aguirre and D’Esposito, 1997).

The first evidence for the presence of a similar functional subdivision in the auditory cortex came from monkey electrophysiology research, showing a specialization in the response of neurons of the lateral belt that surrounds the auditory core area. Caudal regions were more sensitive to sound location and rostral regions were more sensitive to species-specific sounds (Rauschecker et al., 1997; Rauschecker, 1998a,b; Rauschecker and Tian, 2000; Tian et al., 2001). Furthermore, anatomical tract tracing revealed the presence of two separate connections linking anterior and posterior regions of the auditory cortex with distinct regions of the prefrontal cortex (Romanski et al., 1999). These findings led to a model of separate cortical processing of spatial (where) and non-spatial (what) auditory information (Rauschecker and Tian, 2000; Tian et al., 2001). Accordingly, the “where” or “how” stream, projecting to posterior temporo-parietal regions, should be devoted to processing sound motion and location, while the “what” stream, leading to regions anterior and ventral to the primary auditory cortex, should be specialized for processing auditory features. Further support for this model came from neuropsychological data (Clarke et al., 2000, 2002; Clarke and Thiran, 2004) and functional neuroimaging (Alain et al., 2001; Maeder et al., 2001; Warren and Griffiths, 2003; Arnott et al., 2004; Barrett and Hall, 2006). A similar dissociation between a ventral and a dorsal stream has been recently demonstrated in the somatosensory system (Reed et al., 2005), providing new evidence for a general organization of the sensory systems.

We usually perceive the external world by integrating different sensory information at the same time, rather than using each sense in isolation, and although much evidence shows that multisensory performance leads to improvements in many aspects of perceptive
and attentional processes (Welch and Warren, 1986; King and Calvert, 2001), it is still an open question how unimodal sensory processing streams are able to interact with each other. The extensive literature on the crossmodal domain, especially on audiovisual integration, has provided important clues about the brain regions where this kind of information binding is likely to occur. Heteromodal regions in the human brain, analogous to those identified by neuroanatomical studies in primates, are located in the prefrontal cortex, in the posterior parietal cortex, in parts of the lateral temporal cortex and in portions of the parahippocampal gyrus (Ettlinger and Wilson, 1990; Mesulam, 1990). Neuroimaging research on crossmodal processing has shown that different components, belonging to a network of brain areas, are involved in synthesizing different types of crossmodal information (Calvert, 2001; Thesen et al., 2004). Moreover, fMRI studies on human crossmodal object recognition, reviewed by Amedi et al. (2005), suggest that heteromodal regions do not show the same response to all types of crossmodal combinations and therefore have a degree of specialization. Thus, heteromodal areas seem to be differently involved in the analysis of stimulus timing, position and semantic content.

Reconsidering the issue of processing streams, we ask whether it is possible to take the organization of unimodal pathways as a model to predict the selective involvement of different heteromodal regions during different processing demands, and thus try to elucidate crossmodal architecture. Auditory and visual dorsal (where) processing streams converge in the parietal cortex. In non-human primates, the areas around the intraparietal sulcus (IPS) have been shown to integrate neural signals from different sensory modalities for guiding and controlling action in space (Andersen et al., 1997; Rizzolatti et al., 1998; Avillac et al., 2004). Evidence from recent neuroimaging studies in humans supports the claim that posterior parietal heteromodal cortex is implicated in tasks involving crossmodal localization. In particular, the presence of crossmodal effects has been observed in the IPS during the execution of tasks involving spatial perception and attention (Bushara et al., 1999; Macaluso et al., 2000; Macaluso and Driver, 2001; Eimer, 2001) and movement perception (Lewis et al., 2000; Bremmer et al., 2001). Furthermore, a right hemisphere specialization in the processing of spatial information has been proposed both for auditory (Weeks et al., 1999; Zatorre, 2001; Krumbholz et al., 2005) and for visual stimuli, as in the case of neglect (De Renzi et al., 1977) and spatial attention (Miniussi et al., 2002).

In contrast, auditory and visual ventral (what) streams converge in the lateral temporal cortex, along the superior temporal sulcus (STS). This region is therefore ideally located for the integration of visual and auditory “what” information. As a matter of fact, convergent evidence has demonstrated that the STS plays an important role in the integration of naturally and artificially related audiovisual information, during the perception of either verbal or non-verbal stimuli (Beauchamp et al., 2004; Calvert et al., 2000; Callan et al., 2001, 2003; Raij et al., 2000; Hashimoto and Sakai, 2004). One explanation of the observed multimodal response in the STS is that this region is involved in the formation of associations between auditory and visual features representing the same object, although the specific role of semantic consistency in this kind of integration has not been completely understood (Beauchamp et al., 2004). Recent findings suggest that, with respect to unimodal stimulation, congruent audiovisual presentation elicits greater activity in the left parahippocampal gyrus while incongruent presentation activates the left inferior frontal cortex (Olivetti Belardinelli et al., 2004). As with the hemisphere asymmetry proposed for the processing of “where” information, there is agreement on the description of a left hemisphere dominance in the processing of “what” information (Talati and Hirsch, 2005, for a review).

Despite the increasing knowledge about regions supporting crossmodal processing in the human brain, no direct comparison between the processing of “what” and “where” audiovisual information has been conducted so far. In the present fMRI study, we argue that it is possible to highlight the specific involvement of different heteromodal areas during the execution of two crossmodal tasks, respectively involving the processing of “where” and “what” information. In order to avoid any confounding effect due to the use of different stimuli, we designed two crossmodal matching tasks in response to the simultaneous presentation of an image and an environmental sound. Participants were required to report whether the two attended stimuli were matching or mismatching with respect to either spatial position (localization task) or semantic content (recognition task). Our aim was to compare the cortical activation during the execution of the two tasks and to test if the ventral–dorsal distinction is suitable for the crossmodal domain. More specifically, we wanted to test the hypothesis that task-related differences should be observed in both the heteromodal areas located in the superior temporal sulcus and in the intraparietal sulcus, with an opposite outcome. Furthermore, the use of an event-related design allowed us to test whether or not the manipulation of semantic consistency and spatial congruency results in a different BOLD activation among the areas highlighted by the task comparison. Consistent with previous crossmodal studies, we used a continuous fMRI acquisition instead of the sparse sampling technique (Hall et al., 1999), since the latter method requires a detailed knowledge of the hemodynamic delay that is not fully characterized in non-primary areas and in the execution of complex tasks.

Materials and methods

Subjects

Ten healthy, right-handed male subjects (mean age = 26.2; range 20–34 years) participated in this study. All of them were free from neurological diseases and had normal hearing and vision. Participants gave informed consent for a protocol approved by the local Institutional Ethics Committee and were paid for their participation.

Stimuli

Thirty black and white pictures from the Italian standardized version of the Snodgrass and Vanderwart set (Dell’Acqua et al., 2000) and thirty semantically corresponding environmental sounds from a free Internet archive set (Marcell et al., 2000) were used in this experiment. Stimuli belonged to four categories: animals, weapons, musical instruments, and vehicles. Visual stimuli were projected on a screen located behind the scanner bed and viewed through a mirror placed above the subject’s head. Pictures were presented centered on the horizontal meridian within 7° to the left or right of fixation. Auditory stimuli were delivered by means of a pneumatic headset, designed to minimize interference from scanner noise. Sound sampling rate was 22 000 Hz with 16-bit resolution. Sounds were presented in stereo, at a sound pressure level (SPL) of
90 dB. Despite the high-frequency cut-off of the pneumatic device (estimated around 4 kHz), subjects reported that they were able to recognize the auditory stimuli. The duration of the sounds was 2 s and coincided with the presentation of the pictures.

Procedure

Prior to the scanning session, participants were presented with all the visual and auditory stimuli used in the experiment by means of a PC, in order to familiarize them with the stimulus presentation. The fMRI session lasted about 40 min, during which both functional and anatomical data were acquired. The fMRI scan was performed using an event-related paradigm. Subjects performed one run of the object recognition task (REC) and one run of the object localization task (LOC). Each run lasted for 14 min and consisted of the random presentation of thirty matching and thirty mismatching trials. Every trial consisted of a stimulation period of 2 s followed by a rest period of 12 s, during which subjects were told to fixate a black cross located in the center of the white screen. The order of runs was counterbalanced across subjects. The experimental paradigm is described in Fig. 1.

In both tasks, a picture and a sound were simultaneously presented by means of E-Prime software, v.1.1 (Psychology Software Tools). Pictures could be presented either in the left or in the right visual hemifield, and sounds could be presented either on the left or on the right headphone. Furthermore, pictures and sounds could be either semantically corresponding or not (e.g., if the picture of a dog and the sound of barking were presented, this was considered a semantically matching trial; if the picture of a horse and the sound of a violin were presented, this was considered a semantically mismatching trial). Each stimulus was presented a total of two times per run.

During the LOC task, subjects were asked to report if the simultaneously presented stimuli were matching or mismatching in terms of spatial position, ignoring semantic content. During the REC task, participants were asked to report if the stimuli were matching or mismatching in terms of semantic content, ignoring spatial position. Responses were given by pressing two buttons on an fMRI-compatible response pad (Lumina LSC-400 controller, Cedrus, California, USA) placed under the right hand. Accuracy and reaction times were recorded.

Fig. 1. fMRI experimental paradigm.

Functional imaging

Data were acquired with a Siemens Magnetom Vision 1.5 T scanner, by means of T2*-weighted echo planar imaging (EPI) free induction decay (FID) sequences with the following parameters: TR 2 s, TE 60 ms, matrix size 64 × 64, FOV 256 mm, in-plane voxel size 4 mm × 4 mm, flip angle 90°, slice thickness 5 mm and no gap. In each run, a total of 430 functional volumes were acquired, consisting of 16 trans-axial slices, including the cortical regions of interest. A high-resolution structural volume was acquired at the end of the session via a 3D MPRAGE sequence with the following features: axial, matrix 256 × 256, FOV 256 mm, slice thickness 1 mm, no gap, in-plane voxel size 1 mm × 1 mm, flip angle 12°, TR = 9.7 ms, TE = 4 ms.

Data analysis

The pre-processing and the statistical analysis of fMRI data were performed by means of the Brain Voyager 4.9 software (Brain Innovation, The Netherlands). Due to T1 saturation effects, the first 4 scans of each run were discarded from the analysis. Pre-processing included motion and slice scan time corrections, and the removal of linear trends from the time series. Functional 2D images were registered with the 3D high-resolution structural images and normalized in 3D Talairach space (Talairach and Tournoux, 1988). Functional volumes were resampled at a voxel size of 3 mm × 3 mm × 3 mm. Statistical analysis was performed using the general linear model (GLM; Friston et al., 1995), and data analysis was treated as an event-related design.

In each run, regression coefficients were estimated for the two experimental conditions, MATCH and MISMATCH. Regressors were specified using a rectangular waveform, convolved with an empirically derived hemodynamic response function (Boynton et al., 1996). For every participant we obtained two contrast statistical parametric maps (SPMs) corresponding to each task. Thresholding of SPMs was performed correcting for multiple comparisons by means of the False Discovery Rate (FDR; Genovese et al., 2002).
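The regressor construction just described can be sketched in a few lines of Python. This is a minimal illustration, not the Brain Voyager implementation: the gamma-variate HRF and its parameters (`n`, `tau`, `delta`) are assumed placeholder values following the general form of Boynton et al. (1996), rather than the empirically derived response function used in the study.

```python
import numpy as np

TR = 2.0          # s, repetition time
N_VOLS = 430      # functional volumes per run
N_TRIALS = 60     # 30 matching + 30 mismatching trials
STIM_DUR = 2.0    # s of stimulation per trial
TRIAL_DUR = 14.0  # s per trial (2 s stimulus + 12 s rest)

def gamma_hrf(t, n=3, tau=1.2, delta=2.5):
    """Gamma-variate HRF in the spirit of Boynton et al. (1996);
    parameter values here are illustrative placeholders."""
    t = np.asarray(t, dtype=float) - delta
    h = np.where(t > 0, (t / tau) ** (n - 1) * np.exp(-t / tau), 0.0)
    return h / h.sum()  # normalize to unit area

# Rectangular (boxcar) function sampled at the TR:
# 1 during each 2 s stimulation period, 0 during rest
times = np.arange(N_VOLS) * TR
onsets = np.arange(N_TRIALS) * TRIAL_DUR
boxcar = np.zeros(N_VOLS)
for onset in onsets:
    boxcar[(times >= onset) & (times < onset + STIM_DUR)] = 1.0

# Predicted BOLD regressor = boxcar convolved with the HRF
hrf = gamma_hrf(np.arange(0.0, 30.0, TR))
regressor = np.convolve(boxcar, hrf)[:N_VOLS]
```

The numbers also confirm the timing arithmetic of the Procedure section: 60 trials of 14 s give 840 s (14 min) of task, covered by the 430 volumes acquired at a TR of 2 s (860 s).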
First, a fixed-effect group analysis was performed. In this analysis,
the time series from each run and subject were z-normalized and
concatenated prior to the GLM computation allowing the direct
comparison of LOC and REC tasks. Group SPMs were thresholded
at p < 0.005 (FDR corrected; z value = 3.88) and projected onto a
T1-weighted anatomical image of one of the subjects. Anatomical
locations of significant activation foci were characterized using the
Talairach and Tournoux (1988) atlas.
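The FDR correction used for thresholding can be sketched with the Benjamini–Hochberg step-up rule that underlies the Genovese et al. (2002) procedure. This is an illustrative reimplementation, not the software actually used in the study, and the toy p-values below are invented:

```python
import numpy as np

def fdr_threshold(p_values, q=0.005):
    """Benjamini-Hochberg step-up procedure: return the largest p-value
    cutoff such that the expected false discovery rate is at most q."""
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    # Largest rank k (1-based) with p_(k) <= (k / m) * q
    below = p <= (np.arange(1, m + 1) / m) * q
    if not below.any():
        return 0.0  # no voxel survives correction
    return p[below.nonzero()[0].max()]

# Toy example: 1000 null voxels plus 20 strongly activated ones
rng = np.random.default_rng(0)
p_vals = np.concatenate([rng.uniform(size=1000), np.full(20, 1e-6)])
thr = fdr_threshold(p_vals, q=0.005)
surviving = int((p_vals <= thr).sum())
```

Unlike a fixed voxelwise cutoff, the resulting threshold adapts to the distribution of p-values in the map, which is why the corrected z thresholds reported below differ between contrasts.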
A random effect analysis was performed on the basis of
individual subject activation, in order to take into account the
intersubject anatomical variability of the location and extension of
cortical sulci. For each subject, both the left superior temporal
sulcus and the right intraparietal sulcus were accurately anatomi-
cally defined. Within these anatomically defined areas, individual
ROIs were obtained considering those voxels showing a significant
experimental effect (significant response to any experimental
condition) with p < 0.01 (FDR corrected). In each ROI, the BOLD
response to the different experimental conditions was evaluated as
the signal percent change (peak response) with respect to the
baseline. These responses were entered as the dependent variable
of a 2 × 2 (TASK × CONGRUENCY) ANOVA, performed to
estimate the main effect of task and of stimulus congruency as
well as the interaction effect between these factors.
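As a sketch of how these ROI statistics reduce to simple arithmetic: the peak response is a percent change from baseline, and in a 2 × 2 within-subject design each effect has F(1, n − 1) equal to the squared one-sample t statistic of a per-subject contrast score. The helper names and all condition values below are hypothetical, invented for illustration; they are not the study data.

```python
import numpy as np

def percent_signal_change(roi_ts, baseline_idx, peak_idx):
    """Peak BOLD response of an ROI time course, as percent change from baseline."""
    baseline = roi_ts[baseline_idx].mean()
    return 100.0 * (roi_ts[peak_idx].mean() - baseline) / baseline

def contrast_F(c):
    """F(1, n-1) for a within-subject contrast (one score per subject):
    equal to the squared one-sample t statistic of the scores against zero."""
    c = np.asarray(c, dtype=float)
    n = c.size
    t = c.mean() / (c.std(ddof=1) / np.sqrt(n))
    return t ** 2

# Hypothetical per-subject percent signal changes for the 2 x 2 design
# (TASK: LOC/REC x CONGRUENCY: match/mismatch), 10 subjects
rng = np.random.default_rng(1)
loc_m  = 0.58 + 0.05 * rng.standard_normal(10)
loc_mm = 0.58 + 0.05 * rng.standard_normal(10)
rec_m  = 0.36 + 0.05 * rng.standard_normal(10)
rec_mm = 0.36 + 0.05 * rng.standard_normal(10)

F_task        = contrast_F((loc_m + loc_mm) / 2 - (rec_m + rec_mm) / 2)  # main effect of task
F_congruency  = contrast_F((loc_m + rec_m) / 2 - (loc_mm + rec_mm) / 2)  # main effect of congruency
F_interaction = contrast_F((loc_m - loc_mm) - (rec_m - rec_mm))          # interaction
```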
Finally, in order to search for both semantic and spatial
congruency effects (match versus mismatch trials), we performed
a group analysis, separately for the LOC and the REC tasks,
contrasting the two congruency conditions. For the LOC and
the REC task, SPMs were thresholded respectively at p < 0.005
(FDR corrected; z value = 4.52) and p < 0.002 (FDR corrected;
z value = 4.36) and projected onto a T1-weighted anatomical image
of one of the study participants as in the previous task
comparisons.
Results

Behavioral results

The behavioral data are summarized in Fig. 2. A 2 × 2 (TASK × CONGRUENCY) within-subjects analysis of variance (ANOVA) was performed using reaction times and accuracy as the dependent variables of interest.

Subjects did not show differences in reaction times in performing the localization task (mean = 1465 ± 185 ms) versus the recognition task (mean = 1544 ± 211 ms) [F(1,9) = 0.17, n.s.]. However, a weak but significant effect of congruency was observed between matching (M) trials (mean = 1534 ± 174 ms) and mismatching (MM) trials (mean = 1475 ± 176 ms) [F(1,9) = 6.42, p < 0.05], as well as a two-way interaction between task and congruency [F(1,9) = 43.61, p < 0.001], indicating that subjects were slightly faster with mismatching trials, especially during the REC task (see Fig. 2A).

Subjects showed the same pattern of results in terms of accuracy. We did not find a main effect of task in performing the localization task (mean = 93.2%) versus the recognition task (mean = 91.0%) [F(1,9) = 0.31, n.s.]. A significant effect of congruency was observed between matching (mean = 87.8%) and mismatching trials (mean = 96.4%) [F(1,9) = 13.42, p < 0.005], as well as an interaction effect between task and congruency [F(1,9) = 7.74, p < 0.05], indicating also in this case that subjects were more accurate with the mismatching trials, especially during the REC task (Fig. 2B).

Fig. 2. Behavioral results. Means and standard errors measured for reaction times (A) and accuracy (B). In each graph, the two columns on the left indicate the localization task and the two columns on the right indicate the recognition task. M and MM represent respectively congruent and incongruent crossmodal conditions in each task.

fMRI results

Overall, the comparison between the two tasks revealed BOLD signal differences that are consistent with the ventral–dorsal distinction proposed for the respective processing of “what” and “where” information in the visual and the auditory domain. In particular, the group analysis showed that the crossmodal localization task elicited more activity than the recognition task in the left and right precuneus, in the right parietal cortex, including the right inferior parietal lobule and the right intraparietal sulcus, and in the right superior occipital cortex. Results are shown in Table 1 and in Fig. 3.

The 2 × 2 (TASK × CONGRUENCY) within-subject ANOVA performed in the right intraparietal sulcus confirmed the result of the group analysis. A significant effect of task, measured as the percent signal change versus fixation, was observed between the localization task (mean = 0.58 ± 0.07) and the recognition task (mean = 0.36 ± 0.04) [F(1,9) = 14.33, p < 0.005], showing a greater
Table 1
Brain areas activated during the LOC task compared to the REC task at p < 0.005 (FDR corrected for multiple comparisons)

Hemisphere, area                                     BA     x     y    z   Voxels        t
L/R, precuneus                                        7    −2   −63   47       62   12.674
R, intraparietal sulcus/inferior parietal lobule     40    45   −47   35       53    8.342
R, intraparietal sulcus                            40/7    39   −52   47       42    7.761
R, precuneus                                         19    18   −75   42       26   10.414
R, cuneus                                            18     6   −90   21       24    8.204
R, middle frontal gyrus                              46    41    36   23       16    5.806
L, cuneus                                            19   −11   −79   37       11   12.751

x, y, z are mean Talairach coordinates. Clusters restricted to a minimum of 10 voxels in extent.

signal change in the LOC task. Interestingly, no effect of congruency was found in this region of interest [F(1,9) = 0.40, n.s.], nor an interaction effect [F(1,9) = 0.31, n.s.]. The results of the single subject analysis on the right IPS are shown in Fig. 4.

Fig. 4. Results of the individual subject analysis, showing the BOLD percent signal change in the right intraparietal sulcus for all the experimental conditions with respect to the rest condition (*p < 0.005; vertical bars represent standard errors).

Regions responding to the crossmodal recognition task more than the localization task were found in the inferior occipital gyrus, bilaterally, and in the left lateral temporal cortex, including the anterior part of the superior temporal sulcus and the superior temporal gyrus. The results of the group analysis are shown in Table 2 and in Fig. 5.

On the basis of the group results, we selected a region of interest on the left superior temporal sulcus in order to perform an individual subject analysis. The 2 × 2 (TASK × CONGRUENCY) within-subject ANOVA showed that BOLD percent signal change during the recognition task (mean = 0.52 ± 0.04) was significantly greater with respect to that observed during the localization task (mean = 0.32 ± 0.03) [F(1,9) = 20.52, p < 0.001]. The ANOVA did not reveal a main effect of congruency [F(1,9) = 0.13, n.s.] or an interaction effect between task and congruency [F(1,9) = 2.05, n.s.]. The results of the individual subject analysis are shown in Fig. 6.

To elucidate the role of stimulus congruence, we performed two additional group contrasts between the matching and mismatching trials, separately for each crossmodal task. In this way, we could highlight regions showing a congruency effect that were not detected by the previous task comparison.

In the LOC task, the only region responding more to the matching trials than to the mismatching ones was detected in the right superior temporal sulcus, as shown in Table 3.

In the REC task, regions responding more to the matching trials than the mismatching trials were detected bilaterally in a region including the inferior frontal gyrus and the anterior insula, as shown in Table 4.

Fig. 3. Sagittal (x = 46), coronal (y = −46) and transversal (z = 48) views of regions (increasing significance from red to yellow) showing higher BOLD signal during the LOC task compared to the REC task in the precuneus and right intraparietal sulcus (group results). Arrows and arrowheads indicate, respectively, the clusters located in the fundus and the wall of the intraparietal sulcus. Activations were superimposed on the brain of one of the subjects, standardized in Talairach space. The brain is shown in radiological convention (left is right and vice versa).

Discussion

The present fMRI study was aimed at comparing the cortical activity during the execution of two crossmodal tasks, respectively involving the processing of “where” and “what” information, in order to test if the ventral–dorsal distinction is a useful framework for explaining multisensory perception. Specifically, we wanted to test whether: (1) the localization task elicited more activation than the recognition task in the heteromodal cortex located in the intraparietal sulcus; (2) the recognition task elicited more activation than the localization task in the heteromodal cortex located in the superior temporal sulcus; and (3) it was possible to observe congruency effects over those regions highlighted by the task comparison.

Our results suggest the involvement of separate neural substrates for audiovisual object localization (LOC) and recognition (REC). Although it is difficult to recognize an object without
Table 2
Brain areas activated during the REC task compared to the LOC task at p < 0.005 (FDR corrected for multiple comparisons)

Hemisphere, area                                      BA      x     y    z   Voxels        t
L, superior temporal sulcus/middle temporal gyrus  22/21    −58   −18   −3       33   12.708
L, inferior occipital gyrus                        18/19    −27   −88   −5       25   25.286
R, inferior occipital gyrus                        18/19     26   −87   −5       10   22.030

x, y, z are mean Talairach coordinates. Clusters restricted to a minimum of 10 voxels in extent.

localizing it in space and vice versa (Reed et al., 2005), our manipulation of task demands allowed us to reveal a pattern of task-specific activations that is consistent with the ventral/dorsal distinction. These differences cannot be explained by differences in stimulus presentation, which was the same for both tasks, nor by an effect of task presentation order, which was counterbalanced across subjects. Differences can rather be explained in terms of processing demands, i.e., the nature of the information that must be used in order to accomplish the tasks with a good level of performance. As a matter of fact, subjects achieved more than 90% accuracy on both tasks, confirming that participants were actively integrating the crossmodal information in an efficient way.

The direct task comparison showed areas responding more to the LOC task than to the REC task in superior occipital and posterior parietal regions, including the bilateral precuneus, the right inferior parietal lobule and the right intraparietal sulcus. This pattern of activity is consistent with the involvement of (1) the two visual dorsal systems (Creem and Proffitt, 2001; Rizzolatti and Matelli, 2003), one concerned with action planning and the other involved in spatial perception, respectively projecting to the superior and the inferior parietal lobules; (2) the auditory dorsal system projecting to the right inferior parietal lobule, previously described by neurophysiological (Rauschecker et al., 1997; Rauschecker, 1998a,b; Rauschecker and Tian, 2000; Tian et al., 2001), neuropsychological (Clarke et al., 2000, 2002; Clarke and Thiran, 2004) and functional imaging studies (Bushara et al., 1999; Weeks et al., 1999; Alain et al., 2001; Maeder et al., 2001; Warren and Griffiths, 2003; Barrett and Hall, 2006).

Specifically, we predicted that a task-specific effect should be observed over the intraparietal sulcus, consistent with the role of this structure in tasks involving crossmodal integration of spatial information (Calvert, 2001; Macaluso and Driver, 2001). Among the different functionally specialized regions of the human intraparietal sulcus (Grefkes and Fink, 2005), lateral (LIP) and ventral (VIP) areas have been shown to receive convergent multisensory inputs (Lewis and Van Essen, 2000) and to respond to crossmodal information in previous neuroimaging studies (Bushara et al., 1999; Lewis et al., 2000; Bremmer et al., 2001). In our study, the group analysis showed two clusters located in the fundus and the walls of the right intraparietal sulcus; furthermore, the single subject analysis, performed over each anatomically defined intraparietal sulcus to take into account intersubject variability, confirmed the group results. We conclude that the intraparietal sulcus participates in the processing of spatial information, rather than semantic information, of incoming crossmodal stimuli. This result is in agreement with other fMRI studies, which proposed that the IPS is specialized for synthesizing crossmodal spatial coordinate cues and mediating crossmodal links in attention (Bushara et al., 1999; Macaluso et al., 2000; Macaluso and Driver, 2001; Eimer, 2001). Moreover, we suggest a right hemisphere asymmetry in the activation of the intraparietal sulcus during crossmodal localization, which is consistent with the well-documented specialization within the right hemisphere for processing spatial-related information. For example, lesions of the right parietal cortex, compared to the left hemisphere, cause visuospatial impairments including deficits of space perception (Irving-Bell et al., 1999), spatial attention (Miniussi et al., 2002) and neglect (De Renzi et al., 1977).

Fig. 5. Sagittal (x = −54), coronal (y = −14) and transversal (z = −5) views of regions (increasing significance from red to yellow) showing higher BOLD signal during the REC task compared to the LOC task (group results). Arrows and arrowheads indicate, respectively, clusters located in the left superior temporal sulcus and in the right and left inferior occipital gyrus.

Fig. 6. Results of the individual subject analysis showing the BOLD percent signal change in the left superior temporal sulcus for all the experimental conditions with respect to the rest condition (**p < 0.001; vertical bars represent standard errors).
Table 3
Brain areas activated during MATCH trials compared to MISMATCH trials in the LOC task at p < 0.005 (FDR corrected for multiple comparisons)

Hemisphere, area                                    BA     x    y    z    Voxels   t
R, superior temporal sulcus/middle temporal gyrus   22/21  52   −26  −4   10       7.203

Clusters restricted to a minimum of 10 voxels in extent.

Interestingly, this pattern of asymmetry seems to be reflected also in the processing of spatial auditory and tactile information (Coghill et al., 2001; Brunetti et al., 2005; Krumbholz et al., 2005).

The second important result of the direct task comparison concerns regions responding more to the recognition task than to the localization task. Our prediction was that task-related effects should be observed along the superior temporal sulcus, since, due to its anatomical position, it can be considered a good candidate for the binding of the visual and auditory ventral pathways (Beauchamp et al., 2004; Amedi et al., 2005). Our results showed the activation of the left and right inferior occipital gyri and the left superior temporal sulcus.

Activation of the secondary visual cortex (BA 18/19) in object recognition tasks, compared to localization tasks, has been previously reported in experiments dealing with visual stimulation (Haxby et al., 1993; Aguirre and D’Esposito, 1997), reflecting the increased information processing of the ventral pathway. However, we focused on the activation of the left superior temporal sulcus and carried out an individual subject analysis that confirmed the group results. We conclude that the previously described heteromodal region of the lateral temporal lobe (Mesulam, 1990; Calvert, 2001) is more involved in processing the semantic content of incoming crossmodal stimuli than their spatial position. Our result is consistent with recent neuroimaging literature providing evidence that the STS plays an important role in the integration of naturally and artificially related audiovisual information, during the perception of either non-verbal (e.g., common objects; Beauchamp et al., 2004) or verbal material (e.g., audiovisual speech and letter/sound associations; Calvert et al., 2000; Raij et al., 2000; Callan et al., 2001, 2003; Hashimoto and Sakai, 2004). Moreover, we observed a leftward asymmetry in agreement with the well-established left hemisphere specialization in the processing of object-related information (Talati and Hirsch, 2005), probably accompanying the brain asymmetry for language production and comprehension.

The third issue that we wanted to explore concerned the presence of congruency effects in the crossmodal areas that exhibited task-related differences. The ANOVA performed on the individual-subject regions of interest in the right IPS and left STS showed no effect of stimulus congruency (match versus mismatch) during the execution of either task. When directly looking at the regions that exhibited a congruency effect in the REC task, we observed bilateral activation of a region comprising the inferior frontal gyrus (BA 45) and the anterior insula (BA 13). This result suggests a specific role for these regions in crossmodal semantic integration, since modulation of activity has been observed in the insula and in the IFG for crossmodal compared to unimodal stimulation (Callan et al., 2003) and for congruent versus incongruent trials in studies involving audiovisual speech (Calvert et al., 2000). However, as noted by Martin and Chao (2001), extensive neuroimaging research has demonstrated that left anterior and inferior prefrontal regions, comprising BA 47 and the inferior part of BA 45, can be selectively involved in semantic processing (Gabrieli et al., 1998; Poldrack et al., 1999; Wagner, 1999), i.e., in retrieving and manipulating semantic representations.

Table 4
Brain areas activated during MATCH trials compared to MISMATCH trials in the REC task at p < 0.002 (FDR corrected for multiple comparisons)

Hemisphere, area                            BA     x     y    z   Voxels   t
L, inferior frontal gyrus/anterior insula   45/13  −34   21   3   67       12.770
R, inferior frontal gyrus/anterior insula   45/13  36    18   1   44       12.920

Clusters restricted to a minimum of 10 voxels in extent.

In contrast, the comparison between matching and mismatching trials during the execution of the LOC task highlighted a region of the right STS that was more responsive to spatially congruent trials. This result raises some questions about the functional organization of the superior temporal sulcus. The first concerns the left and right hemispheric specialization for “what” and “where” information, respectively. We suggest that only the left STS is specifically involved in the crossmodal binding of semantic information, while the right STS is more generically involved in various forms of crossmodal integration. Secondly, the role of the STS does not seem to be confined merely to the processing of “what” information. The posterior part of the STS contains numerous cross-connections between the two visual pathways (Baizer et al., 1991), and a similar organization has been proposed for auditory information (Hall, 2003). Furthermore, increasing evidence supports a role of the posterior STS in auditory motion processing (Warren et al., 2002) and in the perception of biological motion, such as passive observation of eye, mouth and hand movements (reviewed in Decety and Grezes, 1999).

Relevant to this point is the fact that monkey studies reported that the two auditory processing streams projecting from the core of the auditory cortex are initially directed rostrally and caudally rather than ventrally and dorsally (Hackett et al., 1999; Rauschecker and Tian, 2000; Tian et al., 2001). Taken together, the present and previous results suggest that posterior STS regions, which are closer to the dorsal processing streams, differ in their binding properties from more anterior parts, which are instead closer to the ventral routes. However, the questions regarding the specific relationship between regions showing congruency effects and those exhibiting task effects require future investigation.

In conclusion, the ventral–dorsal model, initially developed to explain visual information processing and subsequently applied to the auditory domain, seems to be valid also in cases of crossmodal audiovisual stimulation. The dissociation between object and spatial processing streams seems to be an organizing principle of cortical functioning, since it has been found to be present also in the somatosensory domain (Reed et al., 2005). Our results are consistent with the view expressed in recent reviews of crossmodal imaging studies (Calvert, 2001; Amedi et al., 2005), which postulate the specific involvement of heteromodal cortical regions in the processing of various kinds

of information, in particular the role of the intraparietal sulcus in the binding of “spatial” audiovisual information and of the superior temporal sulcus in the binding of “semantic” audiovisual information.

Acknowledgment

The authors thank Antonino Raffone for his helpful contribution.

References

Aguirre, G.K., D'Esposito, M., 1997. Environmental knowledge is subserved by separable dorsal/ventral neural areas. J. Neurosci. 17 (7), 2512–2518.
Alain, C., Arnott, S.R., Hevenor, S., Graham, S., Grady, C.L., 2001. “What” and “where” in the human auditory system. Proc. Natl. Acad. Sci. U. S. A. 98 (21), 12301–12306.
Amedi, A., von Kriegstein, K., van Atteveldt, N.M., Beauchamp, M.S., Naumer, M.J., 2005. Functional imaging of human crossmodal identification and object recognition. Exp. Brain Res. 166, 559–571.
Andersen, R.A., Snyder, L.H., Bradley, D.C., Xing, J., 1997. Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annu. Rev. Neurosci. 20, 303–330.
Arnott, S.R., Binns, M.A., Grady, C.L., Alain, C., 2004. Assessing the auditory dual-pathway model in humans. NeuroImage 22 (1), 401–408.
Avillac, M., Olivier, E., Denève, S., Ben Hamed, S., Duhamel, J.R., 2004. Multisensory integration in multiple reference frames in the posterior parietal cortex. Cogn. Process. 5 (3), 159–166.
Baizer, J.S., Ungerleider, L.G., Desimone, R., 1991. Organization of visual inputs to the inferior temporal and posterior parietal cortex in macaques. J. Neurosci. 11 (1), 168–190.
Barrett, D.J.K., Hall, D.A., 2006. Response preferences for ‘what’ and ‘where’ in human non-primary auditory cortex. NeuroImage 32 (2), 968–977.
Beauchamp, M.S., Lee, K.E., Argall, B.D., Martin, A., 2004. Integration of auditory and visual information about objects in superior temporal sulcus. Neuron 41, 809–823.
Boynton, G.M., Engel, S.A., Glover, G.H., Heeger, D.J., 1996. Linear systems analysis of functional magnetic resonance imaging in human V1. J. Neurosci. 16, 4207–4241.
Bremmer, F., Schlack, A., Shah, N.J., Zafiris, O., Kubischik, M., Hoffmann, K., Zilles, K., Fink, G.R., 2001. Polymodal motion processing in posterior parietal and premotor cortex: a human fMRI study strongly implies equivalencies between humans and monkeys. Neuron 29, 287–296.
Brunetti, M., Belardinelli, P., Caulo, M., Del Gratta, C., Della Penna, S., Ferretti, A., Lucci, G., Moretti, A., Pizzella, V., Tartaro, A., Torquati, K., Olivetti Belardinelli, M., Romani, G.L., 2005. Human brain activation during passive listening to sounds from different locations: an fMRI and MEG study. Hum. Brain Mapp. 26 (4), 251–261.
Bushara, K.O., Weeks, R.A., Ishii, K., Catalan, M.J., Tian, B., Rauschecker, J.P., Hallett, M., 1999. Modality-specific frontal and parietal areas for auditory and visual spatial localization in humans. Nat. Neurosci. 2, 759–766.
Callan, D.E., Callan, A.M., Kroos, C., Vatikiotis-Bateson, E., 2001. Multimodal contribution to speech perception revealed by independent component analysis: a single-sweep EEG case study. Cogn. Brain Res. 10, 349–353.
Callan, D.E., Jones, J.A., Munhall, K., Callan, A.M., Kroos, C., Vatikiotis-Bateson, E., 2003. Neural processes underlying perceptual enhancement by visual speech gestures. NeuroReport 14, 2213–2218.
Calvert, G.A., 2001. Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cereb. Cortex 11, 1110–1123.
Calvert, G.A., Campbell, R., Brammer, M.J., 2000. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr. Biol. 10, 649–657.
Clarke, S., Thiran, A.B., 2004. Auditory neglect: what and where in auditory space. Cortex 40 (2), 291–300.
Clarke, S., Bellmann, A., Meuli, R.A., Assal, G., Steck, A.J., 2000. Auditory agnosia and auditory spatial deficits following left hemispheric lesions: evidence for distinct processing pathways. Neuropsychologia 38 (6), 797–807.
Clarke, S., Bellmann Thiran, A., Maeder, P., Adriani, M., Vernet, O., Regli, L., Cuisenaire, O., Thiran, J.P., 2002. What and where in human audition: selective deficits following focal hemispheric lesions. Exp. Brain Res. 147 (1), 8–15.
Coghill, R.C., Gilron, I., Iadarola, M.J., 2001. Hemispheric lateralization of somatosensory processing. J. Neurophysiol. 85, 2602–2612.
Creem, S.H., Proffitt, D.R., 2001. Defining the cortical visual systems: “what”, “where”, and “how”. Acta Psychol. 107, 43–68.
Decety, J., Grezes, J., 1999. Neural mechanisms subserving the perception of human actions. Trends Cogn. Sci. 3, 172–178.
Dell'Acqua, R., Lotto, L., Job, R., 2000. Naming times and standardized norms for the Italian PD/DPSS set of 266 pictures: direct comparisons with American, English, French, and Spanish published databases. Behav. Res. Meth. Instrum. Comput. 32 (4), 588–615.
De Renzi, E., Faglioni, P., Previdi, P., 1977. Spatial memory and hemispheric locus of lesion. Cortex 13, 424–433.
Eimer, M., 2001. Crossmodal links in spatial attention between vision, audition, and touch: evidence from event-related brain potentials. Neuropsychologia 39, 1292–1303.
Ettlinger, G., Wilson, W.A., 1990. Cross-modal performance: behavioural processes, phylogenetic considerations and neural mechanisms. Behav. Brain Res. 40, 169–192.
Farah, M.J., 1990. Visual Agnosia. MIT Press, Cambridge, MA.
Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.P., Frith, C.D., Frackowiak, R.S.J., 1995. Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 2, 189–210.
Gabrieli, J.D.E., Poldrack, R.A., Desmond, J.E., 1998. The role of left prefrontal cortex in language and memory. Proc. Natl. Acad. Sci. U. S. A. 95, 906–913.
Genovese, C.R., Lazar, N.A., Nichols, T., 2002. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage 15 (4), 870–878.
Grefkes, C., Fink, G.R., 2005. The functional organization of the intraparietal sulcus in humans and monkeys. J. Anat. 207, 3–17.
Hackett, T.A., Stepniewska, I., Kaas, J.H., 1999. Prefrontal connections of the parabelt auditory cortex in macaque monkeys. Brain Res. 817, 45–58.
Hall, D.A., 2003. Auditory pathways: are ‘what’ and ‘where’ appropriate? Curr. Biol. 13, R406–R408.
Hall, D.A., Haggard, M.P., Akeroyd, M.A., Palmer, A.R., Summerfield, A.Q., Elliott, M.R., Gurney, E.M., Bowtell, R.W., 1999. “Sparse” temporal sampling in auditory fMRI. Hum. Brain Mapp. 7 (3), 213–223.
Hashimoto, R., Sakai, K.L., 2004. Learning letters in adulthood: direct visualization of cortical plasticity for forming a new link between orthography and phonology. Neuron 42, 311–322.
Haxby, J.V., Grady, C.L., Horwitz, B., Ungerleider, L.G., Mishkin, M., Carson, R.E., Herscovitch, P., Schapiro, M.B., Rapoport, S.I., 1991. Dissociation of object and spatial visual processing pathways in human extrastriate cortex. Proc. Natl. Acad. Sci. U. S. A. 88 (5), 1621–1625.
Haxby, J.V., Grady, C.L., Horwitz, B., Salerno, J., Ungerleider, L.G., Mishkin, M., Schapiro, M.B., 1993. Dissociation of object and spatial visual processing pathways in human extrastriate cortex. In: Gulyas, B., Ottoson, D., Roland, P.E. (Eds.), Functional Organization of Human Visual Cortex. Pergamon Press, Oxford.
Haxby, J.V., Horwitz, B., Ungerleider, L.G., Maisog, J.M., Pietrini, P., Grady, C.L., 1994. The functional organization of human extrastriate cortex: a PET-rCBF study of selective attention to faces and locations. J. Neurosci. 14, 6336–6353.

Irving-Bell, L., Small, M., Cowey, A., 1999. A distortion of perceived space in patients with right-hemisphere lesions and visual hemineglect. Neuropsychologia 37, 919–925.
King, A.J., Calvert, G., 2001. Multisensory integration: perceptual grouping by eye and ear. Curr. Biol. 11, R322–R325.
Krumbholz, K., Schönwiesner, M., von Cramon, D.Y., Rübsamen, R., Shah, N.J., Zilles, K., Fink, G.R., 2005. Representation of interaural temporal information from left and right auditory space in the human planum temporale and inferior parietal lobe. Cereb. Cortex 15, 317–324.
Lewis, J.W., Van Essen, D.C., 2000. Corticocortical connections of visual, sensorimotor, and multimodal processing areas in the parietal lobe of the macaque monkey. J. Comp. Neurol. 428 (1), 112–137.
Lewis, J.W., Beauchamp, M.S., DeYoe, E.A., 2000. A comparison of visual and auditory motion processing in human cerebral cortex. Cereb. Cortex 10, 873–888.
Macaluso, E., Driver, J., 2001. Spatial attention and crossmodal interactions between vision and touch. Neuropsychologia 39, 1304–1316.
Macaluso, E., Frith, C.D., Driver, J., 2000. Modulation of human visual cortex by crossmodal spatial attention. Science 289, 1206–1208.
Maeder, P.P., Meuli, R.A., Adriani, M., Bellmann, A., Fornari, E., Thiran, J.P., Pittet, A., Clarke, S., 2001. Distinct pathways involved in sound recognition and localization: a human fMRI study. NeuroImage 14, 802–816.
Marcell, M.M., Borella, D., Greene, M., Kerr, E., Rogers, S., 2000. Confrontation naming of environmental sounds. J. Clin. Exp. Neuropsychol. 22 (6), 830–864.
Martin, A., Chao, L.L., 2001. Semantic memory and the brain: structure and processes. Curr. Opin. Neurobiol. 11, 194–201.
Mesulam, M.-M., 1990. Large-scale neurocognitive networks and distributed processing for attention, language, and memory. Ann. Neurol. 28, 597–613.
Miniussi, C., Rao, A., Nobre, A.C., 2002. Watching where you look: modulation of visual processing of foveal stimuli by spatial attention. Neuropsychologia 40, 2448–2460.
Newcombe, F., Ratcliff, G., Damasio, H., 1987. Dissociable visual and spatial impairments following right posterior cerebral lesions: clinical, neuropsychological and anatomical evidence. Neuropsychologia 25, 149–161.
Olivetti Belardinelli, M., Sestieri, C., Di Matteo, R., Delogu, F., Del Gratta, C., Ferretti, A., Caulo, M., Tartaro, A., Romani, G.L., 2004. Audio-visual crossmodal interactions in environmental perception: an fMRI investigation. Cogn. Process. 5 (3), 167–174.
Poldrack, R.A., Wagner, A.D., Prull, M.W., Desmond, J.E., Glover, G.H., Gabrieli, J.D.E., 1999. Functional specialization for semantic and phonological processing in the left inferior frontal cortex. NeuroImage 10, 15–35.
Raij, T., Uutela, K., Hari, R., 2000. Audiovisual integration of letters in the human brain. Neuron 28 (2), 617–625.
Rauschecker, J.P., 1998a. Cortical processing of complex sounds. Curr. Opin. Neurobiol. 8 (4), 516–521.
Rauschecker, J.P., 1998b. Parallel processing in the auditory cortex of primates. Audiol. Neuro-Otol. 3 (2–3), 86–103.
Rauschecker, J.P., Tian, B., 2000. Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proc. Natl. Acad. Sci. U. S. A. 97 (22), 11800–11806.
Rauschecker, J.P., Tian, B., Pons, T., Mishkin, M., 1997. Serial and parallel processing in rhesus monkey auditory cortex. J. Comp. Neurol. 382, 89–103.
Reed, C.L., Klatzky, R.L., Halgren, E., 2005. What vs. where in touch: an fMRI study. NeuroImage 25, 718–726.
Rizzolatti, G., Matelli, M., 2003. Two different streams form the dorsal visual system: anatomy and functions. Exp. Brain Res. 153, 146–157.
Rizzolatti, G., Luppino, G., Matelli, M., 1998. The organization of the cortical motor system: new concepts. Electroencephalogr. Clin. Neurophysiol. 106 (4), 283–296.
Romanski, L.M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P.S., Rauschecker, J.P., 1999. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat. Neurosci. 2 (12), 1131–1136.
Talairach, J., Tournoux, P., 1988. Co-Planar Stereotaxic Atlas of the Human Brain. Thieme Medical, New York.
Talati, A., Hirsch, J., 2005. Functional specialization within the medial frontal gyrus for perceptual Go/No-Go decisions based on “What,” “When,” and “Where” related information: an fMRI study. J. Cogn. Neurosci. 17 (7), 981–993.
Thesen, T., Vibell, J.F., Calvert, G.A., Österbauer, R.A., 2004. Neuroimaging of multisensory processing in vision, audition, touch, and olfaction. Cogn. Process. 5 (2), 84–93.
Tian, B., Reser, D., Durham, A., Kustov, A., Rauschecker, J.P., 2001. Functional specialization in rhesus monkey auditory cortex. Science 292, 290–293.
Ungerleider, L.G., Mishkin, M., 1982. Two cortical visual systems. In: Ingle, D.J., Goodale, M.A., Mansfield, R.J.W. (Eds.), Analysis of Visual Behavior. MIT Press, Cambridge, MA, pp. 549–586.
Wagner, A.D., 1999. Working memory contributions to human learning and remembering. Neuron 22, 19–22.
Warren, J.D., Griffiths, T.D., 2003. Distinct mechanisms for processing spatial sequences and pitch sequences in the human auditory brain. J. Neurosci. 23, 5799–5804.
Warren, J.D., Zielinski, B.A., Green, G.G.R., Rauschecker, J.P., Griffiths, T.D., 2002. Perception of sound-source motion by the human brain. Neuron 34, 139–148.
Weeks, R.A., Aziz-Sultan, A., Bushara, K.O., Tian, B., Wessinger, C.M., Dang, N., Rauschecker, J.P., Hallett, M., 1999. A PET study of human auditory spatial processing. Neurosci. Lett. 262, 155–158.
Welch, R.B., Warren, D.H., 1986. Intersensory interactions. In: Boff, K.R., Kaufman, L., Thomas, J.P. (Eds.), Handbook of Perception and Human Performance (Vol. 1: Sensory Processes and Perception). John Wiley and Sons, pp. 1–36.
Zatorre, R.J., 2001. Neural specializations for tonal processing. Ann. N. Y. Acad. Sci. 930, 193–210.
