PRIORITY 2
INFORMATION SOCIETY TECHNOLOGIES
Contract for:
TABLE OF CONTENTS
1. Project summary
2. Project objectives
3. Participant list
5. Potential Impact
9. Ethical issues
A.2 Sub-contracting
Appendix B References
013123 (EmCAP) Annex I, vers. 3 (27/05/05) Approved by EC on 1 June 2005 page 2 of 101
1. Project Summary
Our goal is to investigate how complex cognitive behaviour in artificial systems can
emerge through interacting with an environment, and how, by becoming sensitive to the
properties of the environment, such systems can autonomously develop effective
representations. The underlying hypothesis is that perception is an active process; even in the
absence of overt behaviour, perception involves prediction, and the need for making better
predictions is what drives the development of useful representations and cognitive structures.
We will explore these issues within the realm of music cognition. Music is an ideal domain in
which to investigate cognitive behaviour, since it is a universal phenomenon containing
complex abstractions and temporally extended structures. As music is self-referential there are
no externally determined semantics; the appropriate segmentation of the stream of sounds
depends upon the structure of the signal itself, rather than the need to individuate objects in
the external world. By focusing on music cognition we can directly address problems such as
the autonomous development of representations and processes that support the
characterisation of events and event sequences, the development of categories and useful
abstractions, the representation of situational context, interactions between long-term
knowledge structures and working memory, the role of attention in optimising processing
with respect to the current object of interest, the representation of temporal expectancies, and
the integration of events across many different time scales. We will investigate music
cognition through perceptual experiments and computational modelling studies, embodying
our understanding in the construction of an emergent interactive music system, which will
learn to develop representations and expectations in response to the music it experiences, and
will use these predictions to generate actions in the form of appropriately timed and pitched
sounds.
2. Project objectives
a. Introduction
The goal of this project is to investigate how complex cognitive behaviour in artificial
systems can emerge through interacting with an environment, and how, by becoming sensitive
to the properties of the environment, such systems can develop effective representations and
processing structures autonomously.
The central hypothesis underlying this project is that perception is essentially an active
process. Recent work on auditory processing has shown that the auditory system, far from
being a passive receptor of sounds, is constantly adjusting its processing to reflect the current
acoustic context and task demands [1, 2]. The idea is that perception, even in the absence of
overt behaviour, involves a process of prediction, and that the need for making better
predictions is what drives the development of useful representations and cognitive structures,
ultimately giving rise to intelligent cognition. Conversely, perceptual phenomena and the
representations and processing structures in the brain can only be understood in relation to the
structure of the environment.
We intend to explore these issues within the realm of music cognition. Music is an ideal
domain in which to investigate complex cognitive behaviour, since music, like language, is a
universal phenomenon containing complex abstractions and temporally extended structures,
whose organisation is constrained by underlying rules or conventions that participants need to
understand for effective cognition and interaction. Music shares many other characteristics
with language; perception evolves in time, and the acoustic stimulus is processed within the
context of locally determined expectations, long-term knowledge and the focus of attention.
However, since music is self-referential there are no externally determined semantics; the
appropriate segmentation of the stream of sounds depends upon the structure of the signal
itself, rather than the need to individuate objects in the external world. By focusing our
investigations on music cognition we can directly address problems such as the autonomous
development of representations and processes that support the characterisation of events and
event sequences, the development of categories and useful abstractions, the representation and
evaluation of situational context, interactions between long-term knowledge structures and
working memory, the role of attention in optimising processing with respect to the current
object of interest, the representation of time and temporal expectancies, and the integration of
events across many different time scales.
We will investigate the development of music cognition by combining the
complementary approaches of perceptual experiments using human subjects, functional and
neurocomputational modelling, and the implementation of an interactive embodied cognitive
system. Experimental studies using neonates will determine for the first time whether certain
basic perceptual abstractions, such as pitch, are innate, or whether they develop through early
experience. Adult studies will also explore how the perception of musical form is influenced
by the characteristics of natural language. The results of these experiments will inform
neurocomputational modelling studies. Most existing models of auditory perception are based
upon adult data; however, here we will use the new experimental results to constrain our
models so that they account for the emergence of adult processes and representations through
experience. The computational models will, as far as possible, be consistent with current
understanding of the neurobiology of the human auditory system. This is important both as a
valuable source of guidance and constraint, and to ensure the relevance of predictions made
[Figure: diagram depicting the experimental loop linking Perceptual Experiments, Neurocomputational Modelling, Computational principles and the Emergent Music System]
also be applied in the implementation of an interactive music processing system, the Music
Projector, in which the autonomous development of internal musical codes and expectancies,
and phenomena such as categorization, similarity ranking, and streaming will be investigated.
The system will synthesize, as musical output, the expectancies generated in response to
musical stimuli, which will allow us to compare the musical perceptions and expectancies of
the artificial system with music cognition in humans; thereby closing the experimental loop,
as depicted in the diagram below.
b. Objectives
The principal objectives of the project fall into three categories: i)
experimental investigations into the perception of musically relevant stimuli; ii)
neurocomputational modelling of auditory processes subserving music cognition; and iii)
identification of theoretical principles and implementation of an emergent music cognition
system. Detailed, verifiable and timed objectives are specified within each of the
workpackages.
Experimental investigations into the perception of musically relevant stimuli
- Compare the processing of musically meaningful sounds in neonates and adults in order to distinguish innate from learned levels of abstraction in auditory processes underlying music perception.
- Investigate the role of timbre, in particular the timbres of language, in adult perception of musical form.

Neurocomputational modelling of auditory processes subserving music cognition
- Develop an integrated neurocomputational architecture for auditory processing and music cognition.
- Investigate the role of attention and the computational principles underlying an active listening system.
- Investigate the emergence of representations, processing strategies and perceptual categories through experience.
3. Participant list

List of Participants

Role*  Participant name                                         Short name  Country      Date enter project  Date exit project
CO     University of Plymouth                                   UoP         UK           Month 1             Month 36 (end of project)
CR     Universitat Pompeu Fabra                                 FUPF        Spain        Month 1             Month 36 (end of project)
CR     Magyar Tudományos Akadémia Pszichológiai Kutatóintézet   MTAPI       Hungary      Month 1             Month 36 (end of project)
CR     Universiteit van Amsterdam                               UvA         Netherlands  Month 1             Month 36 (end of project)

*CO = Coordinator; CR = Contractor
auditory stimulus representations as well as what type of regularities can be detected. MMN
studies have shown the effects of general auditory perceptual learning [23, 24] as well as that
of musical training [25-27] on the representation of acoustic regularities and the detection of
regularity violations.
Within the scope of the current project we will take a significant step forward,
determining whether or not some important perceptual processes underlying music perception
are innate. In adults, it has been established that the features of the auditory representations
indexed by MMN closely match perception [11]. As described in the previous section, the
formation of predictions is thought to underlie music cognition. Because the MMN method
determines whether the auditory system of a subject population can produce predictions on
the basis of a given rule, the elicitation of MMN in neonates would suggest that they perceive
the corresponding aspect of music. In contrast, finding differences between the results
obtained in neonates and adults would suggest that the given function develops through
maturation and/or learning. Thus, the experiments of the current proposal will provide new
insights into 1) innate vs. learned processing of musically relevant abstract acoustic features,
2) the operation of model-based auditory prediction at birth, and 3) the perception of music by
newborn babies.
Investigate the influence of timbre on the perception of musical form
Another approach for distinguishing between functions which are innate and those that
develop through experience is to analyse to what extent particular perceptual predispositions
correlate with the properties of important classes of sounds in the environment. The most
significant class of sounds to which humans are exposed during early development is speech.
Speech and other communication sounds are characterised by time-varying spectral patterns,
i.e. by changing timbre, and also by smoothly changing pitch. In most European languages,
timbral patterns generally convey the majority of the semantic information, while pitch tends
to convey complementary information such as intonation and mood. Recent work has shown
that many idiosyncratic phenomena of general pitch perception, as well as common musical
intervals, and the perception of consonance can be predicted from speech spectra and from the
characteristics of pitch in speech [28, 29]; suggesting that these aspects of perception may
develop through normal early experience. This approach can also be used to investigate the
perception of musical form.
The perception of musical form arises from our ability to organise sequences of sounds
into coherent global structures; and interestingly, while musical training may enhance this
ability, it appears to be part of the normal perceptual development of the general population.
We propose to investigate whether typical sequential relationships such as chord progressions,
melodic contours, tension profiles and preferred rhythmic patterns, can similarly be shown to
arise from speech, and whether in this way, the perception of musical form can be predicted
by the characteristics of subjects' native language. This project also benefits from having
access to people from a range of contrasting language backgrounds; which will allow
comparative perceptual experiments to be conducted in order to verify the findings of this
analysis.
In further experiments the influence of musical timbre per se on the perception of form
will be investigated. Given the huge variety of musical instruments, and the unlimited
possibilities of electronic synthesis, composers have a very rich palette of musical timbres, or
tone colours, at their disposal. However, the combination of different timbres can have
unforeseen consequences on form perception; for example, it has been shown to be far more
difficult to compare pitches of different timbres than those of the same timbre [30]. Although
there have been a number of studies of the use of timbre in music [31-33], the influence of
timbre on the perception of large-scale musical form has not been systematically investigated.
Development of an integrated neurocomputational architecture for music cognition
Despite some evidence for specialisation, it is clear that music cognition involves a very
widespread network of processing structures in the brain [34]. In contrast to the visual
system, where very detailed and biologically realistic computational models exist [35],
there are few large-scale models of auditory cortical processing [36]. Instead, auditory
modelling has largely focussed on specific aspects of subcortical auditory processing, e.g.
sound localisation, pitch perception or spectral decomposition in the cochlea. There is an
urgent need for systems-level models of auditory processing to support investigations into
important aspects of audition such as cortical representations and processes, the effects of
early experience, the formation of perceptual categories, and the influence of attention in
modulating the processing of incoming stimuli; none of which have been addressed to any
great extent in previous modelling studies. We therefore propose in this project to develop an
integrated neurocomputational model of music cognition, constrained by the known
architecture of the auditory system, and incorporating neurobiologically realistic models of
processing in the cochlea and sub-cortical auditory nuclei, thalamus, primary auditory cortex,
superior temporal gyrus and prefrontal cortex.
The modelling of auditory cortical processing will be based upon our previous work in
the visual system [37-42], where a powerful theoretical framework was developed and shown
to be able to account simultaneously for empirical evidence from experimental measurements
at three different levels of cognitive neuroscience, namely: microscopic (single-cells) [35, 38],
mesoscopic (fMRI, EEG, neuroanatomy) [39, 43], and macroscopic (psychophysics,
neuropsychology) [44-46]. This approach to the computational modelling of cortex was
motivated by the need to employ a level of description accurate enough to allow the relevant
mechanisms at the level of neurons and synapses to be properly taken into account, while at
the same time simple enough, so that inferences regarding the relevant principles underlying
perception and cognition could be made.
A common assumption is that a proper level of description at the microscopic level is
captured by the spiking and synaptic dynamics of one-compartment, point-like models of
neurons, such as integrate-and-fire models [47]; these dynamics allow the use of realistic
biophysical constants (like conductances and delays) and a thorough study of the actual time
scales and firing rates involved in the evolution of the neural activity underlying cognitive
processes for comparison with experimental data. However, the integrate-and-fire model,
although in itself a simplification of the original work by Hodgkin and Huxley [48], is
actually too elaborate to be simulated completely for a whole network of thousands of
neurons with current technology. One solution to this problem is to simplify the dynamics
using a mean-field approximation, at least for the stationary conditions, and to use this to
exhaustively analyse the bifurcation behaviour of the dynamics. This analysis enables the
selection of parameter regions that show the emergent behaviour of interest. Full nonstationary simulations using the true dynamics of the full integrate-and-fire scheme may then
be run using these parameters [41, 47, 49].
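To make this level of description concrete, here is a minimal sketch of a single leaky integrate-and-fire neuron; this is our own toy illustration with generic textbook parameter values, not the project's model.

```python
# Toy sketch of a single leaky integrate-and-fire neuron (illustration only;
# parameter values are generic textbook choices, not taken from this project).
def simulate_lif(I=2.0e-9, duration=0.5, dt=1e-4, tau=20e-3, R=1e7,
                 v_rest=-70e-3, v_thresh=-54e-3, v_reset=-80e-3):
    """Euler integration of tau * dv/dt = -(v - v_rest) + R * I."""
    v = v_rest
    spike_times = []
    for step in range(int(duration / dt)):
        v += dt * (-(v - v_rest) + R * I) / tau
        if v >= v_thresh:              # threshold crossing: emit a spike
            spike_times.append(step * dt)
            v = v_reset                # and reset the membrane potential
    return spike_times

spikes = simulate_lif()
rate = len(spikes) / 0.5               # mean firing rate in Hz
```

A mean-field simplification replaces this spike-by-spike integration with equations for stationary population firing rates, which is what makes the exhaustive bifurcation analysis described above computationally feasible.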
In order to model each cortical area a network of interconnected excitatory and
inhibitory neurons is defined. Within this structure the strength of connectivity can be
adjusted in order to allow the organisation of functional clusters [43]. This allows inputs from
other regions to be processed in the context of neuronal reverberation, cooperation and
competition biased by task-relevant information. Networks representing each of the cortical
areas involved in auditory processing will be connected according to the known architecture
of the auditory system. In order that the cortical model receives realistic inputs, it will be built
upon an existing modelling system for peripheral auditory processing [50], which includes
well-established models of cochlear and subcortical processing.
The proposed large-scale model of auditory processing will be far more extensive than
any previously developed, and will provide us with a common framework within which other
modelling advances emanating from this project can be incorporated.
Attention and active listening
An important objective of neurocomputational modelling studies within the project will
be to investigate attentional modulation of auditory processing. The integrated
neurocomputational model, described above, will be used to investigate the role of attentional,
or top-down, control on bottom-up, stimulus-driven processing within the peripheral auditory
system, and in auditory streaming.
Auditory streaming is the phenomenon in which a sequence of sounds perceptually
splits into separate streams, the auditory analogue of visual figure-ground segregation. Once
streaming occurs, subjects lose the ability to recognise relationships between sounds falling
into different streams, and only within-stream patterns can be recognised [51]. Composers
typically take account of the features that cause streaming to ensure the perception of coherent
melodic or rhythmic patterns, or to create perceptual ambiguities. Auditory streaming appears
to be an innate function [17]; we therefore propose to investigate streaming using the
large-scale model described above, before the development of experience-dependent
representations.
The role of attention in auditory streaming is somewhat controversial, with some
arguing that streaming is a pre-attentive phenomenon [51], and others that it depends crucially
upon the involvement of attention [52]. Previous models of auditory streaming have been
restricted to simple stimuli, and only very rudimentary models of attention have been used
[53-57].
Attention is much better understood in the visual than in the auditory system. The
notions of object-based attention, and the attentional modulation of low-level processing in
the form of biased competition, are important concepts in current theories of visual attention
[58-62]; however, these ideas have not so far been explored in audition. We have previously
shown how a biased competition model of visual object-based attention can explain many of
the phenomena of visual attention [38, 39, 42, 43, 49]. In this work, a large-scale hierarchical
model of the visual cortex, incorporating biased competition mechanisms at the neuronal
level, was used to simulate and explain visual attention in a wide variety of tasks. The
proposed model of auditory cortex is based upon this visual model, therefore it will allow us
to investigate to what extent the computational principles of biased competition can also
account for the attentive processing of auditory objects, and in particular, whether these
principles can account for auditory streaming.
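As an illustration of the biased competition principle discussed above, the following toy rate model (our own sketch with arbitrary parameters, not the project's large-scale model) shows how a small top-down bias resolves the competition between two mutually inhibiting populations receiving equal bottom-up drive:

```python
import numpy as np

# Toy rate model of biased competition (illustration only; parameters are
# arbitrary choices, not taken from this project): two populations inhibit
# one another, and a small top-down bias to population 0 decides which of
# two equally driven "objects" wins the competition.
def biased_competition(bias=(0.1, 0.0), drive=(1.0, 1.0),
                       w_inh=2.0, tau=10.0, dt=0.1, steps=2000):
    r = np.zeros(2)                                   # firing rates of the two pools
    for _ in range(steps):
        net = np.array(drive) + np.array(bias) - w_inh * r[::-1]
        r += dt / tau * (-r + np.maximum(net, 0.0))   # leaky rate dynamics
    return r

r = biased_competition()
# The biased pool ends up active while the other is suppressed (r[0] > r[1]).
```

The interior fixed point where both pools are equally active is unstable here, so even a weak bias tips the network into a winner-take-all state; this is the sense in which top-down signals can select one auditory object among competing ones.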
The same modelling framework will be used to investigate the implications of top-down
modulation of subcortical processing. There is growing evidence that processing in the
peripheral auditory system is under attentional control. It has been known for some time that
there are extensive feedback projections to, and within, the subcortical auditory system [63],
but investigation of the effect of this feedback has only recently begun. The auditory cortex
can directly influence subcortical processing [64]; and there are observable effects on
responses, even at the level of the cochlea [64, 65].
Although there are numerous models of peripheral processing, these models often exist
only in isolation and generally process stimuli in a strictly feedforward manner. To our
knowledge, there has not been any previous study of the neurocomputational implications of
an active peripheral system. In bats, at least four effects of cortical control on peripheral
processing have been identified, including short-term egocentric selection, long-term
egocentric selection, gain control, and shaping or retuning of response properties [64]. These
properties will be investigated by including known feedback projections within the peripheral
auditory model. In the integrated model, the top-down signals will result from the attentional
modulation of processing in primary auditory cortex, which in turn modulate subcortical
processing; allowing us to investigate cognitive control of peripheral auditory processing. In
doing so this project will directly address a strategic goal of the IST programme, namely that
of creating an intelligent sensory periphery.
This will also influence auditory streaming, in that once the system has begun to
segregate a subset of the total incoming acoustic input, top-down signals associated with this
subset can be used to enhance peripheral processing selectively, causing increased cortical
activity in response to that subset; a possible explanation of the increase in perceived
loudness, or pop-out, of the foreground stream [51].
Development of experience-dependent abstractions
Further computational modelling studies will consider three aspects of auditory
perception particularly relevant to music cognition; namely timbre, rhythm and pitch. We
propose to investigate the development of experience-dependent representations and
processing strategies, and the emergence of perceptual categories in each of these cases.
i. Timbre

Timbre is not a well-defined concept in audition but the consensus is that it is related to
the spectral and temporal envelope properties of sounds. Here we propose to investigate how
the response fields of cells in primary auditory cortex support the representation of timbre. In
doing so we will also consider the computational properties of this network, since the
primary auditory cortex and thalamus are tightly linked in a stereotypical network architecture
through feedforward and feedback connections.
The thalamocortical network is an important computational hub in the auditory system;
it is the point at which there is a sudden loss of phase locking to rapid fluctuations in the
stimulus and the neural code appears to change. The stimulus-determined response of the
thalamocortical network is frequently specified in terms of the spectrotemporal receptive
fields (STRFs) of the principal thalamic and cortical neurons involved in the network. STRFs
indicate influential factors determining the response properties of the cell, and are usually
measured at the level of a single neuron using the technique of reverse correlation [66-71].
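Reverse correlation can be sketched in a few lines. The example below is our own illustration with arbitrary dimensions, not drawn from the cited studies: a simulated neuron applies a known linear filter plus a threshold to a white-noise "spectrogram", and the spike-triggered average of the stimulus recovers the linear part of that filter.

```python
import numpy as np

# Sketch of STRF estimation by reverse correlation (spike-triggered
# averaging); sizes and the threshold are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n_freq, n_lag, n_t = 8, 5, 50000
true_strf = rng.normal(size=(n_freq, n_lag))   # ground-truth linear filter
stim = rng.normal(size=(n_freq, n_t))          # white-noise stimulus

spikes = np.zeros(n_t, dtype=bool)
for t in range(n_lag, n_t):
    drive = np.sum(true_strf * stim[:, t - n_lag:t])
    spikes[t] = drive > 2.0                    # simple threshold nonlinearity

# Spike-triggered average: the mean stimulus patch preceding each spike.
sta = np.mean([stim[:, t - n_lag:t] for t in np.nonzero(spikes)[0]], axis=0)
similarity = np.corrcoef(sta.ravel(), true_strf.ravel())[0, 1]
```

Because the simulated neuron is a thresholded linear filter driven by white noise, the spike-triggered average closely matches the true STRF; for the strongly nonlinear responses to natural stimuli discussed in the text, such a linearly derived estimate captures far less of the response.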
It is important to note that the STRFs, by the nature of their analysis, only represent the
linear relationship between the stimulus and the response. Auditory cortical neurons have
highly nonlinear responses, in particular to natural stimuli, where the linearly derived STRFs
were found to predict, on average, only about 11% of the response power [71]. The
spectrotemporal response pattern of a single neuron is governed by the spatiotemporal activity
of the network in which the neuron is situated. The effect of the network on an individual
neuron's STRF is likely to be: (a) nonlinear, in that the effect of any two neurons on a third is
unlikely to be linear owing to the nature of synaptic and dendritic integration mechanisms; (b)
nonstationary, in that the synaptic connections between neurons are not constant but change
as a function of time; and (c) adapting, in that repetition of stimuli may cause the synaptic
connections between neurons in the network to change. In addition, the local network is
subject to influences from non-local sources of activity that are determined by factors such as
stimulus context, attentional requirements and task demands.
In animal studies it has been found that the STRFs in primary auditory cortex are
formed during an early critical period through normal exposure to the acoustic environment
[72-74]. Although the nature of STRFs in human primary auditory cortex is obviously not
known, we have found suggestive evidence that they too might develop primarily through
early experience of speech sounds; in a recent study we found that ensembles of STRFs
constructed from fragments of speech stimuli can support the robust classification of other
sounds, and furthermore, the spectrotemporal properties of useful speech fragments had
similar characteristics to those measured experimentally in animals [75].
In this project, we propose to construct a detailed computational model of the
thalamocortical system, and to investigate how the dynamical properties of this network can
give rise to the experimentally observed spectrotemporal response fields, and the mechanisms
underlying the development of such response fields through exposure to different auditory
experiences [72, 76, 77]. A major advance will be obtained by incorporating this more
detailed model of the thalamocortical network within the large-scale cortical model, allowing
the development of experience-dependent representations in the integrated model. The ability
of the enhanced model to support the categorisation of ongoing auditory stimuli will be
investigated; in particular, the contextual role of intracortical signals in facilitating the
segmentation and categorisation of ongoing stimuli, and the attentional control of this process.
ii. Rhythm

It is clear that any system interacting with an external world must become sensitive to
the timing of events in that world and the timescales appropriate to understanding different
events. In music, abstracting the regularities due to rhythmic patterns allows the formation of
temporal expectancies which can facilitate perceptual processing, the integration of events
occurring at different time scales, and the generation of well-timed predictions, or actions. In
addition, it is known that attention can be focussed in time [78-80], and there is evidence for
the periodic predictive engagement of attention entrained to rhythmic stimuli [81].
Research in music perception has shown that time, as a subjective structuring of events
in music, is quite different from the concept of time in physics [82]. Listeners to music do not
perceive rhythm on a continuous scale. Instead, rhythmic categories are recognized and
function as a reference relative to which the deviations in timing are appreciated [83, 84]. In
fact, temporal patterns in music combine a number of time scales that are essentially different:
the discrete rhythmic durations as symbolized by, for example, the half and quarter notes in a
musical score; the continuous timing variations that characterize an expressive musical
performance; and tempo, the impression of the speed (or changes thereof) of the performed
pattern, which is related to the music theoretical notion of tactus [85] and the cognitive
process of beat induction [86].
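As a simple illustration of the relationship between continuous timing and discrete rhythmic categories, the sketch below (our own toy example; the beat period and the set of score durations are arbitrary assumptions) snaps expressively timed inter-onset intervals to the nearest rhythmic category:

```python
# Toy sketch of rhythmic categorisation (illustration only): expressively
# timed inter-onset intervals (IOIs) are normalised by an assumed beat
# period and snapped to the nearest discrete score duration.
def quantize(iois, beat, categories):
    return [min(categories, key=lambda c: abs(c - ioi / beat))
            for ioi in iois]

beat = 0.5                                 # assumed beat period in seconds
categories = [0.25, 0.5, 1.0, 2.0]         # sixteenth note .. half note, in beats
performed = [0.52, 0.24, 0.27, 0.98]       # an expressively timed performance
print(quantize(performed, beat, categories))   # -> [1.0, 0.5, 0.5, 2.0]
```

The residual deviations between the performed intervals and the recovered categories are then what a listener hears as expressive timing, relative to the categorical reference.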
A knowledge representation that makes the relationships between rhythmic structure,
tempo and timing explicit, and shows how expressive timing can be expressed in terms of the
temporal structure and global tempo, has been previously proposed [87] and will form the
formal basis for the current study. A central idea in this approach is the notion of rhythm
space [83, 88], i.e. the space of all possible performances of a small number of time intervals.
In this n-dimensional space every point constitutes a different temporal pattern. This infinite
set contains musical and unmusical rhythmic patterns, rhythms often encountered in music,
and those rarely used. The rhythm space captures, in principle, all possible expressive
interpretations in any musical style of any rhythm of n+1 onsets. The cognitive process of
Relative pitch
missing fundamental, and the spectral dominance region, as well as interval sensitivities and
the perception of consonance could be predicted from the characteristics of speech [29, 98].
The claim here is that if developmental influences are considered, and the problem is
reformulated in terms of predictions based on previous experience, then many of the
phenomena of pitch perception, which have previously been difficult to explain, are a natural
consequence. What is currently lacking though is a neurocomputational model of this process.
A fundamental problem is that it is not yet clear how pitch is represented in the brain
[99, 100]. There is evidence for the extraction and representation of periodicity within each
frequency channel, subcortically [101, 102], and in primary auditory cortex (AI) [103]. There is also evidence that
cortex is involved in pitch-related processing; for example, a pitch onset response has been
identified in Heschl's gyrus [104]; fMRI evidence has been found for pitch sensitive-regions
in cortex [105, 106], and for the separate representation of relative and absolute pitch [107];
and lesion studies have shown that interactions between prefrontal and temporal regions of
cortex are necessary for the perception of tonality [108]. However, a more detailed theoretical
account of the representation of pitch at the neuronal and network level is lacking.
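The periodicity-based view of pitch discussed above can be illustrated with a minimal autocorrelation sketch (our own example, not a model from the cited literature): a harmonic complex with its fundamental component removed still has the period of that fundamental, so the autocorrelation peak recovers the "missing fundamental" pitch.

```python
import numpy as np

# Minimal sketch of periodicity extraction by autocorrelation (illustration
# only): a 200 Hz harmonic complex built from its 3rd-5th harmonics has no
# energy at 200 Hz, yet its waveform repeats every 5 ms.
fs = 16000
t = np.arange(0, 0.1, 1 / fs)
x = sum(np.sin(2 * np.pi * 200 * h * t) for h in (3, 4, 5))   # 600, 800, 1000 Hz

ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # autocorrelation, lags >= 0
ac[:fs // 1000] = 0                                 # ignore lags above 1000 Hz
best_lag = int(np.argmax(ac))
pitch_hz = fs / best_lag                            # recovered pitch, about 200 Hz
```

The strongest peak falls at the common period of the harmonics rather than at any individual component, which is the essence of periodicity-based accounts of the missing fundamental.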
In this project, the modelling of pitch perception will be informed by the neonate
experiments, which will identify for the first time those aspects of pitch perception that are
innate, and those which develop through experience. The problem of pitch perception will be
considered in terms of active perception, motivated by the idea that the brain is constantly
trying to abstract regularities from the stimuli it experiences. We propose to apply this
approach to the problem of extracting the regularities associated with the abstraction of pitch,
and aim to formulate a model which can account for the development of discrete pitch
categories, preferred pitch interval relationships and contextual influences on pitch
judgement, in addition to the well-documented characteristics of the perception of individual
pitches.
Extraction of theoretical insights
The fundamental hypothesis that cognition emerges through active perception of the
environment is a guiding principle of this project. The idea that music cognition, even in the
absence of overt behaviour, depends upon the development of expectations, conditioned by
the current musical context and by previous musical experience, is well accepted, but a
detailed theoretical understanding of this process has yet to be developed. We expect
therefore that the proposed experimental and computational studies will suggest many
important theoretical and computational principles of music cognition, and its autonomous
development through experience. By extracting and explicitly identifying these essential
principles, the project will contribute significantly to furthering understanding of the
emergence of autonomous complex cognitive behaviour and its realisation in artificial
systems.
Although cognitive musicology is a comparatively new discipline, and until fairly
recently music was seldom studied scientifically, it is now recognized, along with vision and
language, as an important and informative domain in which to study a variety of aspects of
cognition, including expectation, emotion, perception, and memory [109-111]. Much of the
early research in the field was criticized for focusing too much on low-level issues of
sensation, often using impoverished stimuli (e.g., small rhythmic fragments) or music
restricted to the Western classical repertoire, as well as for a general lack of awareness of the
role of music in its wider social and cultural context [109], and it is only recently that the
neuroscientific basis for music cognition has begun to be explored [34]. However, this is
013123 (EmCAP) Annex I, vers. 3 (27/05/05) Approved by EC on 1 June 2005 page 15 of 101
development of interactive music systems has focused on understanding the interactions that
occur during performance with acoustic musical instruments. These interactions are very
complex and engage several communication channels (tactile, haptic and kinesthetic in
addition to sonic) [117]. In this approach, the idea is that interactive music systems should be
able to affect and modify the performers' expected actions, thus provoking an ongoing
dialogue between the performers and the system.
The focus in this project is rather different in that we are interested in understanding the
mental processes underlying music cognition, the factors which determine the creation of
contextual models, the control of attention in time, and experience-dependent
developmental processes, all of which are necessary to support the emergence of autonomous
intelligent behaviour. We propose to apply the computational principles derived through the
perceptual and computational modelling studies in the implementation of an emergent music
processing system, the Music Projector. In this system the autonomous development of
internal musical codes and expectancies, and phenomena such as categorization, similarity
ranking, and streaming will be investigated. The system will also synthesize, as musical
output, sounds corresponding to the expectancies generated in response to musical stimuli.
This will allow us to compare the musical perceptions and predictions of the artificial system
with music cognition in humans, thereby closing the experimental loop outlined in the
introduction.
b. Summary of innovative aspects of the project
The findings from the experimental investigations will inform the computational
modelling studies, by indicating which aspects of perception should be learnt through
exposure to an acoustic environment, and which aspects can reasonably be hard-wired into
the models, a priori. Perceptual experiments will provide new insights into:
- Innate processing of musically relevant abstract acoustic features, such as pitch, rhythm and sequential grouping;
- Innate abilities to create model-based predictions in response to acoustic stimuli;
- Relationships between language and formal structures in music;
- Timbral influences on the perception of musical form.
The extensive neurocomputational modelling studies, formulated with the aim of
furthering understanding of music processing in the biological system, will provide us with
many new insights into the processing strategies underlying cognition. Innovative aspects of
the modelling studies include:
- The development of a large-scale model of auditory processing, far more extensive than any previously developed, which incorporates active control of peripheral auditory processing mediated through a detailed model of the thalamocortical system, and a model of prefrontal cortex which supports aspects of working memory;
- Investigations into whether the computational principles of biased competition can also account for the attentive processing of auditory objects and for auditory streaming;
- Investigations into the attentional control of peripheral auditory processing;
- Investigations into the development of experience-dependent representations and processing strategies, and the emergence of perceptual categories of timbre, rhythm and pitch;
respect will result from the ability of an artificial system to create and maintain abstract
models, at many different levels, of its own behaviour and that of other parties, and to learn to
attribute meaning to patterns of sensory inputs and to generate meaningful sequences of
actions in response to others.
Finally, an important objective of IST Call 3 is that of facilitating the participation of
organisations from New Member States in the activities of IST. A leading participant in this
project is based in Hungary and funding for the project will support the establishment of a
field laboratory for electrophysiological experiments on newborn babies. The cognitive
abilities of neonates are extremely difficult to assess, and the proposed method is one of the
very few feasible ways of doing so; this work is therefore likely to prove foundational for
developing unique capabilities in a new member state, particularly for developing
screening programmes for the early detection of problems in auditory perception which go
beyond simple audiometric measurements. For this reason, the proposal clearly also relates
to IST-NMP-2 and the development of health monitoring systems.
5. Potential Impact
If the project achieves its ambitious aims, its impact will be large. Major contributions
will relate to enhancing scientific understanding of the cognitive capabilities of neonates and
the computational principles that underlie intelligent perception and cognition. A number of
important technological advances could also stem from this work, particularly in the area of
hearing prosthesis, and enhanced functionality for artificial systems. Finally, the work also
has the potential for significant societal impact in the development of improved hearing
screening programmes, and in the sphere of music education and entertainment. In this
section we expand upon some of the ways in which this project could have an impact, but
firstly we consider why it should be conducted at a European level.
The scope of the project requires that scientists from a range of disciplines collaborate,
including those active in theoretical and computational neuroscience, sensory perception and
cognition, experimental psychology, music technology, musicology and composition. The
proposed project requires participation at the European level because the requisite range of
expertise and mass of critical resources cannot be found at the national level. Furthermore, in
the next decade artificial systems and devices with perceptual capabilities will become an
important part of many people's lives. The development of perceptual and cognitive systems
that are sufficiently flexible to operate in the same environments and under the same conditions
as humans will therefore have a strong impact on the future organisation of daily life. If
Europe is to help shape this future, it is important that such developments occur in Europe and
that critical expertise and know-how is assembled in its member states. In addition, the project
will make a strong contribution to training scientific specialists in Europe in interdisciplinary
topics, meeting the growing demand for such training from academic
institutions. The young investigators taking part in this project will gain skills in
areas that are of fundamental importance to European scientific and economic success.
The cognitive capabilities of neonates are not well understood and are difficult to
investigate. This project will significantly advance understanding of these capabilities, and
may therefore provide important knowledge for helping to devise effective hearing screening
programmes in the future. The early detection of perceptual dysfunction is likely to have
profound effects, since prompt treatment at this time generally has a better chance of success
due to higher brain plasticity in early life, and also minimises related problems such as
deficient communication, late onset of speech, or difficulties in social interaction. Hearing aid
technology could take advantage of the processing mechanisms we find to be important for
Administrative coordinator
The administrative coordinator will provide:
- ongoing monitoring of budget use of all partners;
- receipt of all payments made by the Commission, and transfer of funds to members of the consortium according to the agreed budget;
- liaison with workpackage leaders to ensure consistency of project progress and project expenditure;
- coordination of progress reports after each 6-month period;
- coordination of the mid-term report, interim report, and final report;
- coordination of all contractual issues, such as the project contract, amendments, collaboration agreements, and audit certificates.
Workpackage coordinators
The project is divided into nine workpackages, described in detail in section 7, each of
which will be led by the workpackage coordinator designated below. Workpackage
coordinators will be responsible for:
- planning the scientific and technical work of the workpackage;
- monitoring workpackage progress relative to the project plan in order to ensure that project timescales are maintained;
- reporting any problems or slippages promptly to the project coordinator;
- initiating remedial action plans in the event of project deviations;
- ensuring that relevant information regarding their work is communicated to the project coordinator (and to other members of the consortium where appropriate) promptly and accurately;
- ensuring that the objectives and milestones of the workpackage are achieved;
- ensuring that deliverables are available on time.
[Table: workpackage leaders — Susan Denham (UoP); István Winkler (MTAPI); Eduardo Miranda (UoP); Gustavo Deco (FUPF); Michael Denham (UoP); Henkjan Honing (UvA); Susan Denham (UoP); Susan Denham (UoP); Xavier Serra (FUPF)]
popular meetings, such as the European Science Week. It will contain an interactive
presentation of the project as well as other elements such as those above, also downloadable
from the project's website. The production of version 1 of this kit is scheduled for month 6,
by which time the project is expected to be well under way and the first deliverables
available; it will subsequently be updated periodically throughout the remainder of the
project.
All members of the consortium are affiliated to educational institutions, and during the
course of the project they will offer graduate courses (for PhD and Masters students) about
topics related to the project. In addition, the holding of highly focussed short courses (either
industrially or academically oriented) will be considered as the project matures.
Other aspects of the project which are likely to be of interest to the wider public will be
publicised and communicated as and when sufficient information becomes available; these
include the findings relevant to infant hearing screening programmes, and the possibilities for
early detection and intervention; cultural and language influences on music perception and
cultural differences in the perception of music which could be used to guide the selection of
suitable music for Europe-wide media (e.g. in advertisements); possibilities for improving and
enhancing the learning experience in musical education; and the relevance to contemporary
music practice which may be of interest to composers, and which we will showcase during the
regular public Contemporary Music Weekends which are held biannually under the auspices
of Peninsula Arts in Plymouth.
The ultimate goal of understanding how complex cognitive behaviour can emerge
through experience, and how useful representations and processes result from the need to
interact effectively within an environment, is a very challenging one. In this project we will
restrict ourselves to musical environments and will address these questions with regard to
musically meaningful stimuli. The project is therefore structured so as to allow us to
investigate the perception of musically relevant sounds and sound features experimentally and
through theoretical and modelling studies, and also to distil from these the essential elements
for an artificial interactive system. The research work falls into three distinct areas as
described in the scientific case for support, and this is also reflected in the organisation of the
workpackages. In this section the major milestones and information flows are identified,
while the details of timing, the parties involved and the nature of the interactions are specified
in each of the workpackages.
Experimental investigations into the perception of musically relevant stimuli
WP1: Higher level auditory functions underlying music perception: Innate vs. learned
operations.
Music perception is subserved by auditory processing, a large part of which, including
some higher-level analyses, is regarded as being possibly innate [5]. In previous work, we
have established that several higher-level auditory processes can operate without attention
being focused on sound [12, 120] and that the operation of these processes can be studied
with the mismatch negativity (MMN) event-related brain potential [112]. Because the MMN
method can be employed to study auditory processing in neonates [14, 15], it allows us to test
whether some of the higher-level auditory functions underlying music perception in humans
are innate. Specifically, we will ask the following questions: 1) Do neonates form groups
from repeating pitch patterns? (milestone 1, month 14); 2) Do neonates process pitch
independently of other spectral sound features (timbre)? (milestone 2, month 20); 3) Do
neonates process relative pitch, i.e., equate equal musical steps independently of absolute
pitch? (milestone 3, month 28); and 4) Are pitch steps, which have a special relevance in
Western music, processed preferentially in neonates? (milestone 4, month 36). Because
studying question 4 is dependent on the results of the experiment testing question 3, an
alternative question is considered: Do neonates form groups on the basis of repeating
rhythmic patterns?
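The oddball paradigm underlying the MMN method can be sketched as follows; the stimulus values and "responses" below are hypothetical toy data, shown only to illustrate the standard/deviant logic and the difference wave:

```python
import random

def oddball_sequence(n, standard, deviant, p_deviant=0.1, seed=1):
    """Generate an oddball stimulus sequence: mostly `standard` stimuli,
    with occasional `deviant` stimuli, as in MMN experiments."""
    rng = random.Random(seed)
    return [deviant if rng.random() < p_deviant else standard for _ in range(n)]

def average_by_role(epochs, roles):
    """Average epochs separately for standard and deviant stimuli; the MMN
    is then taken from the difference wave (deviant minus standard)."""
    groups = {"standard": [], "deviant": []}
    for epoch, role in zip(epochs, roles):
        groups[role].append(epoch)
    averages = {}
    for role, eps in groups.items():
        n = len(eps)
        averages[role] = [sum(v) / n for v in zip(*eps)]
    return averages

# Toy example: 1000 stimuli; each epoch is a length-4 "response", with
# deviants eliciting a (hypothetically) larger response than standards.
seq = oddball_sequence(1000, standard=440.0, deviant=494.0)
roles = ["deviant" if s == 494.0 else "standard" for s in seq]
epochs = [[1.0] * 4 if r == "standard" else [2.0] * 4 for r in roles]
avg = average_by_role(epochs, roles)
mmn = [d - s for d, s in zip(avg["deviant"], avg["standard"])]
print(mmn)  # difference wave; uniformly 1.0 here by construction
```

In the real recordings, of course, the difference wave emerges only after averaging out noise across many epochs; the toy data make the deviant-minus-standard logic visible directly.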
These studies are detailed in WP1. The stimuli for the experiments will be designed in
consultation with the other groups. By identifying those aspects of perception which can
reasonably be hard-wired and those which should be learnt through experience of sounds,
these findings will provide fundamental information to guide computational modelling; in
particular, the models of pitch perception (WP6) and rhythm (WP5). In general, in trying to
design an artificial system capable of autonomous behaviour, it is important to understand the
extent to which representations and processes should reflect the stimuli experienced during
development; these results will therefore provide important theoretical insights into design
principles for artificial cognitive systems (WP7), and guidelines for the development of the
interactive music system (WP8).
WP2. Perception of musical form
Musical form arises from the sequential combination of a coherent set of sounds. The
perception of musical form is commonly thought to arise from the interplay between
expectation and surprise [121], normally investigated in terms of pitch, tonality and rhythm,
e.g. [89, 122-124]. Although there is a great deal of understanding about expectations in
response to sound sequences, the strategies the brain employs to follow pieces of music on a
larger scale are not yet well understood. In these studies we will investigate the role of the
timbral properties of sounds in the formation of long-term expectations, and will relate this to
the concepts of predictive modelling and abstraction. We will investigate the role of timbre
through an analytical study of the relationships between typical musical structures and the
timbral patterns in speech, and by means of perceptual experiments.
These studies are detailed in WP2. The findings will inform computational modelling
studies, particularly those relating to the perception of pattern sequences and streaming (WP3)
and pitch perception (WP6). They will also provide theoretical insights (WP7) into how prior
experience with other sound classes, such as speech, can affect the development of
expectancies in music. The experiments and the musical stimuli used will be formulated in
collaboration with researchers working on the interactive music system (WP8) since the
results will be directly relevant to the functionality of the system as well as for setting
performance goals.
Computational modelling of important components subserving music cognition
WP3. Prefrontal cortical function in the control of attention and short term memory
arguably more important to understand how the stimulus-induced activity is combined with
intrinsic cortical activity, which is directly related to the brain's current understanding of the
meaning of the stimulus, as determined by its context and its role in the ongoing perceptual
task.
In order to explore these ideas, we will construct a detailed computational model of the
thalamocortical and associated intracortical network, and replicate within the model the
experimentally observed STRFs for a selected subset of cortical and thalamic neurons. We
will use the model to investigate mechanisms of self-organisation and plasticity in the
development of STRFs through exposure to different auditory experiences, in a manner
consistent with experimentally observed modifications of STRFs in early development [72,
76, 77]. The ability of the model STRFs to support the categorisation of ongoing auditory
stimuli will also be studied; in particular, the role of intracortical signals in the model in
modifying the STRFs in a way which facilitates and improves the categorisation of ongoing
stimuli.
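The linear STRF picture that these studies build on can be sketched as follows; the filter and spectrogram values are hypothetical toy data, illustrating only the quantity r(t) = Σ_f Σ_τ STRF[f][τ]·S[f][t−τ] whose development and modification the model will account for:

```python
def strf_response(strf, spectrogram):
    """Predict a neuron's firing-rate time course as the linear
    spectrotemporal filtering of a spectrogram:
    r(t) = sum_f sum_tau STRF[f][tau] * S[f][t - tau]."""
    n_freq = len(strf)
    n_delay = len(strf[0])
    n_time = len(spectrogram[0])
    response = [0.0] * n_time
    for t in range(n_time):
        for f in range(n_freq):
            for tau in range(n_delay):
                if t - tau >= 0:
                    response[t] += strf[f][tau] * spectrogram[f][t - tau]
    return response

# Toy STRF tuned to frequency channel 1 with a one-step delay.
strf = [[0.0, 0.0],
        [0.0, 1.0],
        [0.0, 0.0]]
spec = [[0, 0, 0, 0],
        [0, 1, 0, 0],   # brief energy burst in channel 1 at t=1
        [0, 0, 0, 0]]
print(strf_response(strf, spec))  # burst appears in the response at t=2
```

The project's central question is precisely where this static linear picture breaks down: how intracortical and contextual signals modify the effective STRF during ongoing processing.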
This work is detailed in WP4. As discussed before, there is clearly considerable overlap
between the detailed modelling of the thalamocortical networks underlying auditory STRFs in
WP4 and the large-scale modelling of auditory cortex in WP3. The studies in WP4 will
provide the opportunity for a detailed investigation of a very important auditory processing
component, and this will allow us later to refine this component within the large-scale model.
WP4 will be informed by the statistical analysis into relationships between music and
language (WP2), and will provide further insights into the development of effective
representations and the way in which context can influence ongoing processing, contributing
directly to our understanding of the processes underlying music cognition (WP7). The work in
WP4 will also suggest useful representations and processing strategies for the interactive
music system (WP8).
WP5. Perception and categorisation of rhythmic patterns
Research in music perception has shown that time, as a subjective structuring of events
in music, is quite different from the concept of time in physics [82]. Listeners do not
perceive rhythm on a continuous scale; instead, they recognize rhythmic categories which
function as a reference relative to which deviations in timing can be appreciated [83, 84].
In fact, temporal patterns in music combine two time scales which are essentially different:
the discrete rhythmic durations as symbolized by, for example, the half and quarter notes in a
musical score, and the continuous timing variations that characterize an expressive musical
performance. Here we will investigate the formation of rhythmic categories (rhythmic
categorization) and the influence of temporal context (such as metrical structure, tempo and
previous exposure) in active perception. Three computational modelling approaches to
categorization will be evaluated using existing empirical data. Based on these modelling
studies, a more comprehensive model of rhythmic expectation will be formulated. Such a
model would form the key temporal component in any system that engages interactively with
an environment in time.
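The two-timescale idea above can be sketched minimally: snap continuous inter-onset intervals onto a discrete category inventory and keep the residual as expressive timing. The category set and beat value below are illustrative assumptions, not the models to be evaluated:

```python
from fractions import Fraction

# Candidate rhythmic categories as fractions of a beat (an illustrative
# inventory; real category sets are learned and style-dependent).
CATEGORIES = [Fraction(1, 4), Fraction(1, 3), Fraction(1, 2),
              Fraction(2, 3), Fraction(1), Fraction(3, 2), Fraction(2)]

def categorize(ioi, beat):
    """Map a continuous inter-onset interval (seconds) onto the nearest
    discrete rhythmic category, given the current beat duration."""
    ratio = ioi / beat
    return min(CATEGORIES, key=lambda c: abs(float(c) - ratio))

def deviations(iois, beat):
    """Expressive timing = residual between each performed IOI and its
    rhythmic category, the second timescale described above."""
    out = []
    for ioi in iois:
        cat = categorize(ioi, beat)
        out.append((cat, ioi - float(cat) * beat))
    return out

# Performed IOIs (seconds) against a 0.5 s beat: slightly uneven eighths
# followed by a quarter and a half note.
performance = [0.26, 0.24, 0.52, 0.98]
for cat, dev in deviations(performance, beat=0.5):
    print(cat, round(dev, 3))
```

Note that this sketch treats the beat and the category inventory as fixed; the models to be evaluated in WP5 address exactly what this omits, namely how metrical context, tempo and prior exposure shape the categories themselves.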
These studies are detailed in WP5. The functionality of the model will be informed by
the relevant findings from previous perceptual experiments, as well as those to be performed
in WP1 and WP2. From the model of rhythmic expectation, processing principles will be
derived and formalised in WP7, and these will form a central component in the interactive
music system (WP8). Rhythmic expectation is also important in the control of attention in
time [3, 128]; the model will therefore also influence the way in which attentional processing
is implemented within the large-scale cortical model (WP3).
effectively disseminated beyond the project participants, we propose a formal mechanism for
collecting and documenting them, which will form the basis of WP7. The culmination of this
aspect of WP7 will be a scientific workshop where the theoretical principles will be
presented. The proceedings of the workshop will also be published in the form of a book.
This workpackage will form the theoretical focal point of the project, the purpose of
which will be to integrate the theoretical advances emanating from all of the previously
described workpackages, both experimental (WP1, WP2), and computational (WP3, WP4,
WP5, WP6). The computational principles underlying music cognition will be used to
formulate a generic functional computational architecture for intelligent perception and
cognition, and the utility of this architecture will be assessed through the implementation of
an emergent music processing system (WP8), allowing us to verify and refine our
conclusions; work in WP8 will also therefore contribute to the theoretical model.
WP8. Interactive music system
The fundamental principles and generic cognitive architecture formalised in WP7 will
guide the design and development of the proposed interactive music system. Although there
has been an enormous amount of experimental work in music cognition, there are very few
artificial systems in existence, and none of these, to our knowledge, has been motivated by the
desire to gain a deeper understanding of the neural processes underlying cognition. Most
music analysis systems exploit domain-specific knowledge and are usually formulated to
maximise performance goals; hence they are generally designed without strong constraints on
their perceptual or cognitive plausibility. In contrast, in this project we propose to implement
a system that will initially have limited a priori knowledge. It will however process musical
stimuli using a biologically realistic low-level acoustic analysis front-end and will learn
through experience to derive a multi-faceted representation of its content. Inspired by the idea
that active perception is the key to understanding self-organisation and autonomous
development, the system will be designed to develop musical expectancies with which it will
compare incoming sounds. By immersing the system in a continuous musical environment,
we aim for it to discover useful features and processing strategies. In order to
support interactive behaviour, these expectancies will also be used to synthesize output
signals.
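The expectancy mechanism described above can be illustrated with a minimal first-order Markov sketch; the proposed system is far richer, and the class below is only a hypothetical stand-in for its learn-predict-compare loop:

```python
import math
from collections import defaultdict

class ExpectancyModel:
    """Minimal sketch of an expectancy mechanism: learn event-to-event
    transition statistics from exposure, predict the next event, and
    score incoming events by how surprising they are."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def expose(self, sequence):
        """Accumulate first-order transition counts from a sequence."""
        for prev, nxt in zip(sequence, sequence[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, context):
        """Most expected next event after `context`, or None if unseen."""
        nxt = self.counts.get(context)
        return max(nxt, key=nxt.get) if nxt else None

    def surprise(self, context, event):
        """Information content (log2 of 1/p) of `event` given `context`."""
        nxt = self.counts.get(context, {})
        total = sum(nxt.values())
        p = nxt.get(event, 0) / total if total else 0.0
        return math.inf if p == 0 else math.log2(1 / p)

# Expose the model to a repeating pitch pattern (MIDI note numbers).
model = ExpectancyModel()
model.expose([60, 62, 64, 60, 62, 64, 60, 62, 64])
print(model.predict(62))        # the model expects 64 after 62
print(model.surprise(62, 64))   # fully expected event: zero surprise
print(model.surprise(62, 61))   # unheard event: infinite surprise
```

Feeding the model's most expected event back out as sound is, in miniature, the synthesis-of-expectancies idea: the system's output is what it predicts, which can then be compared against human expectations.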
The system will provide the opportunity, both within the project and subsequently, to
study the effects of different patterns of exposure to music on the internal high-level
representations they generate, and the extent to which behavioural phenomena in music
perception and cognition (WP1, WP2) can be replicated within an artificial system. This work
is detailed in WP8, and forms a major focus for collaboration within the project. A functional
system using current state-of-the-art algorithms will form the basis for the proposed system.
This system will then be incrementally enhanced and refined by incorporating the models and
theoretical insights derived from the experimental and modelling workpackages (WP1-6). It is
expected that the process of developing and experimenting with the artificial system in more
realistic musical environments will provide additional significant theoretical insights into
music cognition (WP7).
ii.
Here we consider the main scientific risks arising from the three parts of the project:
experimental investigations, computational modelling, formalisation and embodiment.
The experimental investigations will determine some of the challenges to be faced by
the modelling studies and the interactive system, and hence to some extent the complexity of
the project. These experiments will define what aspects of processing should be learnt from
experience, and devising models which can learn and self-organise may prove to be more
difficult than simply building in the known functionality of the adult system. On the other
hand a deeper understanding of how the properties of the environment can be used to
determine the representations and processes within a system would open up many prospects
for the development of more powerful autonomous systems. Since the scientific questions are
clear and the methodology is well established, in themselves the experimental investigations
comprise a relatively low-risk part of the project.
The computational modelling to be undertaken in workpackages WP3-6 presents
significant scientific challenges. The theoretical understanding of audition lags behind that of
vision, and this hampers the development of large-scale models of auditory processing. However,
there is a rapidly growing neuroscientific literature on the perception and processing of
complex sounds, and we believe that incorporating current data within computational models
provides a powerful means for formalising current theories, and for pointing out new
experimental questions and exposing inconsistencies. While the development of a large-scale
cortical model is difficult, and hence a high-risk component of the project, the investigator
responsible has a great deal of experience in computational modelling; in particular, in the
development of large-scale cortical models of vision. Detailed modelling of the
thalamocortical system will similarly derive much from more advanced work in vision, and in
considering the development of representations through experience we will be able to draw
upon previous studies such as [129-131], and our own recent work [132]. The modelling
studies on the perception of rhythm and pitch will both have a considerable base upon which
to build, and therefore although we expect to extend current models, these studies do not carry
as much risk as the others.
The scientific goal of formalizing and communicating the important theoretical insights
into music cognition that we gain during this project does not carry any significant risk.
However, the embodiment of this understanding within an interactive music system, which
can develop effective representations and processes through exposure to musical stimuli, is
very challenging, and carries considerable risk. In order to reduce this risk we will build upon
existing work. An initial baseline system will be developed and then enhanced incrementally,
as we make theoretical advances in the modelling studies. The group responsible for this work
has a great deal of experience in music technology, and the work plan is structured explicitly
so as to allow significant periods of time for collaborative work.
iii.
Efficient and effective management of the project will be essential to ensure the
integration of the work carried out by the consortium members; workpackage WP0 will be
devoted to this aspect of the project. Management activities will include organisational,
technical, administrative, and financial co-ordination, the monitoring of progress on the
project, the enforcement of quality standards, and the facilitation of effective communications
and information flow. All partners will contribute to this workpackage; however, the majority
of this activity and the management of the project will be the primary responsibility of the Project
Coordinator, aided by an Administrative Assistant.
[Figures: per-workpackage information-flow diagrams, showing for each of WP1-WP8 the milestone months (M3-M36) at which experimental designs, stimuli, results, models and theories flow between workpackages and into WP7 and WP8.]
Workpackage list

Workpackage title                                              Lead contractor No  Person-months  End month  Deliverable No
WP1                                                            -                   125            36         D1.1-D1.4
WP2                                                            -                   38             36         D2.1-D2.3
WP3  Prefrontal cortical function in the control of
     attention and short term memory                           -                   68             36         D3.1-D3.6
WP4  Spectrotemporal response fields in the
     thalamocortical system                                    1                   38             36         D4.1-D4.4
WP5  Perception and categorisation of rhythmic patterns        -                   54             36         D5.1-D5.4
WP6                                                            -                   50             36         D6.1.1-D6.3
WP7                                                            -                   30             36         D7.1-D7.4
WP8                                                            -                   68             36         D8.1.1-D8.5
WP0  Management, communication and documentation               -                   25             36         D0.1-D0.3
TOTAL                                                                              496
Deliverables list

Deliverable No  Delivery date (month)  Nature  Dissemination level
D1.1            14                     -       PU
D1.2            20                     -       PU
D1.3            28                     -       PU
D1.4            36                     -       PU
D2.1            12                     -       PU
D2.2            24                     -       PU
D2.3            36                     -       PU
D3.1            12                     R, P    PU, PP
D3.2            18                     -       PU
D3.3            30                     R, P    PU, PP
D3.5            30                     -       PU
D3.6            36                     -       PP
D4.1            12                     R, P    PU, PP
D4.2            24                     R, P    PU, PP
D4.3            -                      R, P    PU, PP
D4.4            36                     -       PP
D5.1            12                     -       PU
D5.2            24                     -       PU
D5.3            30                     -       PP
D5.4            36                     -       PP
D6.1.1          12                     R, P    PU, PP
D6.1.2          32                     R, P    PU, PP
D6.2.1          -                      -       PP
D6.2.2          -                      R, P    PU, PP
D6.2.3          32                     -       PP
D6.3            36                     -       PP
D7.1            -                      -       PU
D7.2            -                      -       PU
D7.3            12                     -       PP
D7.4            24                     R, P    PP
D7.5            36                     -       PU
D7.6            36                     -       PU
D7.7            36                     O, R    PU
D8.1.1          -                      -       PP
D8.1.2          12                     -       PP
D8.2.1          10                     -       PP
D8.2.2          18                     -       PP
D8.3.1          24                     -       PP
D8.3.2          36                     -       PU
D8.4.1          30                     -       PU
D8.4.2          36                     -       PU
D8.5            36                     -       PP
D0.1            12                     -       PP, PU
D0.2            24                     -       PP, PU
D0.3            36                     -       PP, PU
D0.4            36                     -       PP, PU
Objectives
1. Distinguish innate and learned levels of abstraction in auditory processes underlying music
perception
2. Compare auditory model-based predictive functions between neonates and adults
3. Test whether neonates experience pitch similarly to adults
Description of work
Methodology
Event-related brain potentials are measured between electrodes attached to various scalp
locations and a reference electrode. The measurement of these potentials has been standardized
(see [133]). For typical recording parameters and electrode montage in neonates as well as in
adults, see [134]. Epochs, time-locked to the test stimuli, are extracted from the continuous EEG
record. Epochs are then aligned with each other with respect to stimulus onset and averaged in
groups formed according to the role of the stimuli in the stimulus sequence (in the present case,
typically, standard stimuli form one group and deviant stimuli form another group). Averaged
responses are compared between the different stimulus groups and across conditions and/or subject
groups by means of parametric statistical tests, such as ANOVA or MANOVA, including
planned comparisons and possible post-hoc tests.
The parameters (amplitude, latency, etc.) of MMN can be best assessed by subtracting from
the response elicited by the deviant stimulus the response elicited by a control stimulus, which
shares as many features as possible with the deviant stimulus but does not violate any regularity
within its own context. The simplest method is to subtract the response elicited by the standard
stimulus from the deviant-stimulus response. When possible, we shall use a control condition, in
which stimuli that are acoustically identical to the deviant appear with the same probability as the
deviant is presented in the test condition, but the control stimulus is a regular (standard) stimulus
within the control sequences (see [135]).
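The epoching, averaging and difference-wave steps described above can be outlined in a few lines of Python. This is an illustrative sketch only; the function names, single-channel handling and epoch window are hypothetical assumptions, not the laboratory's actual analysis pipeline:

```python
import numpy as np

def average_erp(eeg, onsets, labels, sfreq, tmin=-0.1, tmax=0.4):
    """Extract stimulus-locked epochs from one channel of continuous EEG
    and average them separately for each stimulus role."""
    n0, n1 = int(tmin * sfreq), int(tmax * sfreq)
    erps = {}
    for role in set(labels):
        # epochs time-locked to onsets of stimuli playing this role
        epochs = [eeg[t + n0:t + n1] for t, lab in zip(onsets, labels)
                  if lab == role and t + n0 >= 0 and t + n1 <= len(eeg)]
        erps[role] = np.mean(epochs, axis=0)
    return erps

def mmn_difference_wave(erps):
    """Deviant-minus-standard difference wave; the MMN appears as a
    negativity in this difference."""
    return erps['deviant'] - erps['standard']
```

In the control-condition variant, the response to the acoustically identical control stimulus would simply replace the standard ERP in the subtraction.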
Because MMN elicitation requires the normal functioning of the auditory system, we
regularly perform audiometry on our subjects. This can be done with a simple behavioural
procedure in adults. In neonates, we will employ the objective audiometry procedure that measures
the parameters of the brainstem auditory evoked responses (BAER). BAERs are widely used as an
objective screening for hearing deficits (e.g. [136]). BAER waveforms indicate the functionality of
different stages of the auditory pathway from the cochlea up to the thalamic level. Because
latencies of BAER waveforms reflect maturation at an early age (e.g. [137]), the data may serve as a
basis for further studies investigating effects of maturation on higher-level auditory functions in
healthy pre-term infants. Early detection of possible hearing deficits may be an additional benefit
because corrective measures are more effective while the plasticity of the brain is still high.
In infants, MMN can be elicited in sleep [138]. MMN is elicited in adults while they perform
a visual primary task (such as an n-back task, see [17]). In all experiments, adult subjects will
perform a visual primary task. Recordings in neonates will be carried out during quiet sleep.
Description of the experiments
Experiment 1: Grouping by periodic pitch pattern
In adults, Sussman et al. [22] found that at short inter-stimulus intervals (ISI), no MMN was
elicited by the infrequent tone in a tone sequence having the AAAABAAAAB structure (where
A and B are two tones differing from each other in frequency), although MMN was elicited when
the order of A and B tones was randomised while retaining their ratio (4:1). The authors
interpreted their results in terms of temporal grouping: When the unit of the stimulus sequence
becomes the AAAAB tonal group, the B tone does not violate any rule, because it is part of
the repeating standard and, therefore, no MMN is elicited. This interpretation has been confirmed
when, using a longer ISI, MMN was elicited by the B tones as long as the subject was not informed
about the structure of the tone sequence [139] (see WP1-Figure 1). Since then, the ISI below which
automatic grouping of the AAAABAAAAB sequence occurs (i.e., no MMN is elicited when
subjects do not attend to the sounds) has been established as ca. 400 ms (Sussman, unpublished
data).
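A sequence with the AAAABAAAAB structure and its randomised control (the same tones, in the same 4:1 ratio) can be sketched as follows; the function name and parameters are illustrative, not the actual stimulus-delivery code:

```python
import random

def make_sequence(n_tones, patterned=True, seed=0):
    """Return a list of 'A'/'B' tone labels with a 4:1 A:B ratio.
    patterned=True gives the repeating AAAAB group; patterned=False
    shuffles the identical set of tones into a pseudo-random order."""
    seq = ['A', 'A', 'A', 'A', 'B'] * (n_tones // 5)
    if not patterned:
        random.Random(seed).shuffle(seq)
    return seq
```

Presenting either sequence at ISIs above or below the ca. 400 ms limit then tests whether the B tone is treated as a deviant or as part of the repeating standard.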
WP1-Figure 1. ERP responses to A (Tone 1)
and B (Tone 2) tones in five different
experimental conditions. Randomised sequences
presented 80% A and 20% B tones in a
randomised order. The structure of the patterned
sequences was AAAABAAAAB. Subjects read a
book in the Ignore conditions. In the
Attend-Pitch condition, subjects were instructed
to press a response key when they heard the T
tone, which occurred very infrequently in each
sequence (2.5%) and was lower in pitch than
either A or B. In the Attend-Pattern
condition, subjects were instructed to press the
response key when the repeating AAAAB pattern
of the sequence was violated (that is, they
pressed the key again for the T tones, which
broke the regular structure of the patterned
sequence). The Attend-Pattern condition was
administered after the Attend-Pitch condition.
In the current experiment, we will test whether neonates automatically group this
periodically presented tone pattern and whether the ISI limit of automatic grouping is similar to
that in adults. Control sequences will present the same tones in randomised order. On the basis of
previous studies [14], we expect MMN to be elicited in the control sequences.
Results of this study will provide constraints for models of rhythm perception (WP5) and
music cognition (WP7). The experiment will also provide task design and empirical data to guide
some of the simulations planned for developing the interactive music system (WP8).
Experiment 2: Timbre-independent extraction of pitch
Adults are able to equate the pitch of spectrally very different sounds (e.g., tell that a sound
produced by a flute has the same pitch as another sound produced by a violin), which is an
important prerequisite of music perception. Using the missing fundamental pitch phenomenon
[140], in which pitch is retained although the fundamental (lowest) harmonic of complex tones is
removed from the sound, we have shown that MMN is elicited by a change in (virtual) pitch even
when no spectral component of the pitch-deviant sound was infrequent in the sound sequence
[141, 142]. For these studies, nine complex tones were created, all of which had the same
(removed) fundamental frequency (standard tones). One complex tone was composed of
harmonics, which were also present in four of the other tones, but based on the (removed)
fundamental whose frequency was two times the fundamental frequency of the other nine tones
(deviant tone). This resulted in the deviant tone being perceptually an octave higher than the nine
standard tones. In sequences containing all ten complex tones with equal probability, the deviant
tone elicited the MMN response. This result suggested that 1) the MMN response is based on
perceived (pitch) rather than physical (frequency) stimulus properties and 2) that MMN is elicited
by pitch change even when other spectral parameters vary in the sound sequence. (Note that the
memory representations involved in the MMN-generating process also contain timbre information;
see [143].)
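A minimal sketch of how such missing-fundamental complexes can be synthesised, mirroring the octave-deviant design described above. Equal-amplitude sinusoidal harmonics are a simplifying assumption; the published studies used more carefully controlled stimuli:

```python
import numpy as np

def complex_tone(f0, harmonics, dur=0.3, sr=44100):
    """Synthesise a complex tone from the given harmonic numbers of f0,
    with the fundamental itself omitted from the list, so that the pitch
    heard at f0 is 'virtual' (missing-fundamental phenomenon)."""
    t = np.arange(int(dur * sr)) / sr
    tone = sum(np.sin(2 * np.pi * f0 * h * t) for h in harmonics)
    return tone / len(harmonics)

# Standard: virtual pitch at 200 Hz carried by harmonics 3-6 (no 200 Hz energy)
standard = complex_tone(200.0, [3, 4, 5, 6])
# Deviant: harmonics of 400 Hz (components at 800 and 1200 Hz), so the
# virtual pitch is an octave higher while every component it contains is
# also present in the standard
deviant = complex_tone(400.0, [2, 3])
```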
The current experiment will use an extension of the above procedure. Sounds of identical
pitch produced by various musical instruments will be presented in a sequence (standards).
Responses to occasional sounds produced by the same instruments as the standards but having a
different pitch (deviants) will be checked for signs of MMN elicitation. In control sequences,
responses to the same deviants will be recorded when these follow homogeneous sequences of the
corresponding (same-instrument) standard sound. Behavioural studies in adults have shown that
timbre-independent extraction of pitch is significantly facilitated by placing the sounds within a
musical phrase [30]. In the second condition of this experiment, the same standard and deviant
sounds will be delivered within short, musical phrases. Results will be compared across the
musical-context and isolated-sound conditions. This experiment will reveal whether neonates
separate pitch from other spectral sound features, and whether musical context helps pre-attentive
processing of pitch or whether the advantage found in behavioural experiments is the product of later
processes (e.g., decision).
The experimental design and stimulus materials will be developed in conjunction with UoP
(WP6) and FUPF (WP8). The results will provide a crucial constraint for models of pitch
perception (WP6): they will tell whether the separation of pitch and timbre should be regarded as a
basic feature of the system or learning mechanisms should be postulated to account for this
separation. Results will also be used in modelling music cognition (WP7), and to provide task
design and empirical data to guide some of the simulations planned for developing the interactive
music system (WP8).
Experiment 3: Representation of relative pitch
Relative pitch is more important for music than absolute pitch, since it carries the melody
contour. Thus it is important to test whether there exist innate operations extracting frequency
ratio (the physical parameter underlying relative spectral pitch) independent of the absolute
frequency (absolute spectral pitch) level. In a previous study, Paavilainen and his colleagues [144]
have shown that occasional ascending-pitched tone pairs with a frequency increment that was
higher or lower than that of the majority of tone pairs elicited the MMN, despite the variation of
the absolute frequency level (WP1-Figure 2).
A version of this paradigm will be tested in new-born babies. For maximizing the size of the
response, we will use version 2a with the tones presented in parallel as in 3b (see WP1-Figure 2).
Instead of tones, valid musical sounds (e.g., guitar) and musical intervals will be used. The effect
of musical context can be tested by embedding the sound-pairs in full chords. The experimental
design and stimulus materials will be developed in conjunction with UoP (WP6) and FUPF (WP8).
The results will be used in theoretical models of music cognition (WP7) and they will be especially
important for constraining computational modelling of the perception of relative pitch (WP6) and
for guiding some of the simulations of the interactive music system (WP8).
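The roving-standard design underlying this paradigm (a constant within-pair frequency ratio over a varying absolute frequency, so that only relative pitch distinguishes standards from deviants) can be sketched as below; the function name, ratios and ranges are illustrative assumptions, not the actual paradigm settings:

```python
import random

def make_pair_sequence(n_pairs, standard_ratio=1.5, deviant_ratio=2.0,
                       p_deviant=0.1, base_range=(200.0, 800.0), seed=0):
    """Generate (f1, f2, is_deviant) tone pairs: the base frequency f1
    roves from pair to pair, but the within-pair ratio f2/f1 is fixed for
    standards and larger for occasional deviants."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        f1 = rng.uniform(*base_range)          # absolute frequency roves
        is_dev = rng.random() < p_deviant
        ratio = deviant_ratio if is_dev else standard_ratio
        pairs.append((f1, f1 * ratio, is_dev))
    return pairs
```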
WP1-Figure 2. Left side: Schematic illustration of the stimulus sequences. The tone-pairs are depicted by small
connected boxes. The y axis represents frequency, the x axis time. Tone-pairs were either delivered sequentially (2a
and b, 3a) or in parallel (3b). Frequent regular (standard) tone-pairs are marked with the letter s, infrequent deviant
tone-pairs with d. Right side: ERP responses elicited by the standard and deviant tone pairs (overplotted),
separately for the different conditions and direction of the deviance (2nd row: decreased within-pair frequency
difference; 3rd row: increased within-pair frequency difference). The MMN response appears as the
difference between the standard and deviant ERPs; this difference was statistically significant for all
conditions and deviance directions, and is depicted by shading of the difference between the responses.
Deliverables
D1.1 Manuscript for WP1 experiment 1 (month 14)
D1.2 (month 20)
D1.3 (month 28)
D1.4 (month 36)
Objectives
1. Investigate relationships between the timbres of language and musical structures
2. Determine whether the phonetics of native languages bias musical expectation
3. Investigate whether timbral pattern sequences can affect the perception of musical form
Description of work
Introduction
In this workpackage we will investigate whether musical expectations can be created and
influenced by timbral patterns. Specifically, the following questions will be asked: a) Can the
statistical structure of timbral patterns in speech of specific languages predict the typical musical
structures, such as tension profiles, or idiomatic phrases, found in music of that culture? b) Is it
possible to detect musical sequence universals across a number of different linguistic groups? c)
Can the phonetics of different languages bias musical expectation? d) Can spectral relationships
bring about expectation in music? The statistical experiment will be performed on linguistic and
musical corpora. The psychoacoustic experiments will be performed with realistic experimental
pieces of music composed and synthesised to address the issues in question. These examples will
be prepared in collaboration with the FUPF team and will be informed by the ongoing work
developed in WP8.
Description of the experiments
Experiment 1: The role of speech systems in musical expectation
Statistical analysis of speech and music corpora
It has been demonstrated experimentally that language and music share brain resources; they
can be studied in parallel to address questions of neural specificity in cognitive processing [145,
146], and general cognitive principles are involved when aspects of syntactic processing in
language are compared with aspects of harmonic processing in music [147]. The music of most, if
not all, human cultures shares a number of characteristics believed to be musical universals; e.g.,
division of the continuous dimension of pitch into iterated sets of intervals defining a musical scale
and the preferential use in musical composition of particular subsets of these intervals [148, 149].
Lieberman [150] argues that the sounds of human speech also share a number of features across
most, if not all, human languages. Prosody and intonation may have originated very early in the
course of human evolution, perhaps even before we evolved the neural apparatus to deal with
language and music as two distinct phenomena [151, 152]. The notion that speech plays an
important role in music perception is shared by a number of researchers [85, 153, 154]. However,
although it has recently been demonstrated that the probability distribution of amplitude-frequency
combinations in human utterances of a number of different languages matches the structure of the
chromatic scale intervals [28], the notion that the probability distribution of sequential patterns in
speech may also predict the structure of musical sequences has not yet been investigated.
Therefore this study will address the following questions: Can the statistical structure of timbral
sequences in specific languages predict the structure of musical sequences within music of that
culture? Is it possible to detect musical sequencing universals across a number of different linguistic
groups?
Speech corpora comprising a number of different living languages will be segmented using
variable resolutions [155], and the spectrotemporal pattern of each segment will be characterised.
A corpus of musical pieces (without singing) typical of the cultures that speak the respective
languages will be segmented and analysed in the same way. The probability distribution of the
speech and corresponding musical patterns will be compared in order to identify timbral structures
common to both.
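Once speech and music corpora have been segmented and each segment assigned a discrete timbral category, their sequential statistics can be compared. The sketch below, with hypothetical function names, uses bigram distributions and the Jensen-Shannon divergence as one plausible comparison measure; it is not the analysis method specified by the project:

```python
from collections import Counter
import math

def bigram_dist(symbols):
    """Probability distribution over adjacent-segment (bigram) patterns."""
    counts = Counter(zip(symbols, symbols[1:]))
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two bigram
    distributions; 0 means identical sequential statistics, 1 means
    completely disjoint ones."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a, b):
        return sum(a[k] * math.log2(a[k] / b[k])
                   for k in keys if a.get(k, 0.0) > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A low divergence between the speech-derived and music-derived distributions of a culture would support the hypothesis that speech statistics predict musical sequencing.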
The results of the analysis will reveal whether one can infer rules about musical form based
on the statistical structure of different speech systems, and will contribute to the debate concerning
possible influences of musical experience on speech perception.
Expectations from formant contexts
The findings of the study described above will be verified in perceptual experiments using
musical stimuli. This experiment will address the question: Can the phonetics of different
languages bias musical expectation? Subjects from different linguistic backgrounds will listen to
musical pieces within which musical passages appear in different contexts. They will be asked to
judge the degree of similarity between the passages. The contexts will be derived so that they
correspond to the formant configurations of the vowel systems of different languages. In addition,
the sequential structures identified in the analytical study will be used to suggest typical and
atypical rhythmic patterns, phrases and chord sequences.
The musical passages, designed in consultation with FUPF (WP8), will be carefully crafted
for the following scenarios:
The pitches of the passage in question will be derived from the centre frequencies of the first
three formants of a typical vowel of the mother tongue of the subject, and larger scale form
will be derived from typical sequential structures found in the analytical study;
The pitches of the passage in question will be derived from the centre frequencies of the first
three formants of a vowel that is not characteristic of the mother tongue of the subject, and
the larger scale form will contain atypical sequential patterns;
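Deriving passage pitches from formant centre frequencies can be sketched as a mapping to the nearest equal-tempered notes. The formant values below are approximate textbook values for the vowel /a/, and the nearest-semitone mapping is an illustrative assumption rather than the project's specified procedure:

```python
import math

def formants_to_midi(formants_hz):
    """Map formant centre frequencies to the nearest equal-tempered
    MIDI note numbers (A4 = 440 Hz = MIDI 69)."""
    return [round(69 + 12 * math.log2(f / 440.0)) for f in formants_hz]

# Illustrative approximate formants of /a/: F1 ~ 700, F2 ~ 1220, F3 ~ 2600 Hz
pitches = formants_to_midi([700.0, 1220.0, 2600.0])
```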
The ability to rank the similarity of musical passages embedded in different contexts will
measure the influence of the phonetic and structural properties of different languages on musical
expectation.
The results from the analytical study and this perceptual experiment will inform the
implementation of the music system (WP8) and the theoretical models of music cognition (WP7),
and will also add further insights into the distinction between innate and learned levels of
abstractions in the perception of pitch (WP6) and rhythmic patterns (WP5), and into the likely
nature of learned representations (WP4).
Experiment 2: Expectations from spectral relationships
There have been a number of studies aimed at characterising musical similarity in terms of
pitch and rhythm, including a model to compute approximate repetitions of musical sequences in
terms of pitch distances [156], and a method to measure melodic similarity based on psychological
rating tests with subjects [157]. Recently, a theory of musical understanding has been proposed,
which suggests how musical structure may be processed in the brain, and within which similarity,
derivation, categorisation and schematisation function in an integrated way [158]. Although a few
composers have considered timbre to be the main musical attribute in structuring musical form
[159-161], there is a lack of supporting experimental data. In order to address this problem we will
consider in this experiment whether spectral context can characterise similarity of musical content.
This experiment will address the following question: Can spectral relationships bring about
expectations in music? Subjects will listen to pairs of musical passages that are presented one after
the other, separated by a short delay and will be asked to judge the degree of similarity between
the passages. The second passage will be slightly changed in a number of ways, including
manipulation of the spectral envelope, alteration of the relationships between partials,
transposition, and rhythmic and pitch changes. Details of the experimental design and stimuli will be
decided in consultation with FUPF (WP8).
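One of the listed manipulations, reshaping the spectral envelope while leaving the partial frequencies (and hence the pitch) untouched, can be sketched as follows; the tilt parameterisation is an illustrative assumption, not the project's stimulus specification:

```python
import numpy as np

def tone_with_envelope(f0, n_partials, tilt_db_per_oct, dur=0.5, sr=22050):
    """Harmonic tone whose partial frequencies (hence pitch) are fixed,
    while a spectral tilt (dB/octave) reshapes the envelope, changing
    timbre without changing pitch."""
    t = np.arange(int(dur * sr)) / sr
    sig = np.zeros_like(t)
    for h in range(1, n_partials + 1):
        amp = 10 ** (tilt_db_per_oct * np.log2(h) / 20.0)  # envelope weight
        sig += amp * np.sin(2 * np.pi * f0 * h * t)
    return sig / np.max(np.abs(sig))

bright = tone_with_envelope(220.0, 10, 0.0)    # flat spectral envelope
dull = tone_with_envelope(220.0, 10, -12.0)    # energy falls with frequency
```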
The ability to rank the similarity of spectral transpositions will measure the expectations of
the subjects with relation to the spectral dynamics of a musical piece. The experiment will also
reveal whether timbre can mask the perception of small variations in tonality and rhythm. If this is
the case then it will provide experimental data for the investigation into the role of working
memory in perceptual categorization (WP3). The results from this experiment will inform the
implementation of the music system (WP8) and will have implications for the design of the
theoretical model of musical cognition (WP7).
Deliverables
D2.1 Manuscript for WP2 analytical study (month 12)
D2.2 Manuscript for WP2 experiment 1 (month 24)
D2.3 Manuscript for WP2 experiment 2 (month 36)
WP3. Prefrontal cortical function in the control of attention and short term memory

Workpackage number: 3    Start date or starting event: month 1

Participant id:                 UoP   FUPF   MTAPI   UvA
Person-months per participant:  60    0      6       2
Objectives
1. Develop a large-scale cortical model of auditory processing.
2. Investigate the role of auditory attention in the formation of auditory streams.
3. Extend the large-scale cortical model to include prefrontal brain areas, and investigate the
role of working memory in contextual processing.
4. Integrate models from other modelling studies within the large-scale cortical model.
5. Extract computational principles relevant for a general model of music cognition, and for
technical music applications.
Description of work
Introduction
Contrary to the case in visual perception, where very detailed and biologically realistic
computational models exist (see for example [35] for a review), there are very few models of
auditory cortical processing, in particular, for the analysis of attention and memory in auditory
perception. We have previously developed a theoretical framework, incorporating mathematically
explicit spiking and synaptic dynamics, which enables single neuron responses, fMRI activations,
psychophysical results, the effects of pharmacological agents and the effects of damage to parts of
the neural system under study, to be explicitly simulated and predicted. This framework is
consistent with the leading theory of visual attention, namely the hypothesis of biased competition,
which postulates that populations or pools of activated neurons engage in inhibition-induced
competitive interactions which can be biased toward a specific population by an external input
representing attention or context [58, 59, 61, 62]. In a generalized version of this hypothesis,
neural populations are combined in such a way as to model an individual brain structure (e.g. a
cortical area) and engage in competitive and cooperative interactions, through which they try to
represent their input in a context-dependent way. Different model areas bias each other; through
this interaction, different aspects of the environment are represented by different areas, leading to
a more complete percept [35]. In this workpackage we will develop a similar large-scale model of
auditory processing, and use it to investigate the role of attention in auditory streaming, and in the
active control of peripheral processing. Extensions to the model to include prefrontal regions will
also allow investigations into aspects of working memory and perceptual constancy.
Develop a large-scale cortical model of auditory processing
We will develop a large-scale, neurobiologically realistic cortical model of the primate auditory
cortex, including the medial geniculate nucleus, the auditory areas I and II, and the superior
temporal gyrus. Processes occurring at the AMPA, NMDA and GABA synapses will be
dynamically modelled in an integrate-and-fire implementation to produce realistic spiking
dynamics. We assume a hierarchically organized set of different attractor network pools in the
primate auditory cortex consistent with the brain areas mentioned above. The hierarchical structure
will be organized within the general framework of the biased competition model of attention. In
this approach each cortical area is modelled as a network of interconnected excitatory and
inhibitory neurons, with the strength of connectivity adjusted to reflect the organisation of
functional clusters [43]. This allows inputs from other regions to be processed in the context of
neuronal reverberation, cooperation and competition biased by task-relevant information.
Networks representing each of the cortical areas involved in auditory processing will be connected
according to the known architecture of the auditory system. In order that the cortical model
receives realistic inputs, it will be built upon an existing modelling system for peripheral auditory
processing [50], which includes well-established models of cochlear and subcortical processing.
The common model of peripheral processing will be implemented in WP6 and also used as the
basis of investigations in WP4 and WP6. The enhanced peripheral model with feedback
connections will be incorporated into this modelling framework later in the project.
The proposed large-scale model of auditory processing will be far more extensive than any
previously developed, and will provide us with a common integrating modelling framework for the
project.
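The competition dynamics at the heart of the biased-competition hypothesis can be illustrated with a deliberately reduced, rate-based caricature of the spiking model: two self-exciting, mutually inhibiting pools in which a small top-down bias decides the winner. All parameter values here are illustrative assumptions, not the model's actual synaptic parameters:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def biased_competition(bias_a=0.1, bias_b=0.0, steps=5000, dt=1e-3, tau=0.02):
    """Two excitatory pools with self-excitation and cross-inhibition:
    the pool receiving the small external 'attentional' bias wins the
    competition and suppresses the other."""
    ra = rb = 0.35                      # start near the symmetric state
    drive, w_exc, w_inh = 1.2, 2.0, 3.0
    for _ in range(steps):
        ia = drive + bias_a + w_exc * ra - w_inh * rb   # input to pool A
        ib = drive + bias_b + w_exc * rb - w_inh * ra   # input to pool B
        ra += dt / tau * (-ra + logistic(4.0 * (ia - 1.0)))
        rb += dt / tau * (-rb + logistic(4.0 * (ib - 1.0)))
    return ra, rb
```

The symmetric state is unstable, so even a small bias reliably selects one pool: the rate-based analogue of attention biasing competition between neural populations.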
Investigate the role of auditory attention in the formation of auditory streams
Using the large-scale cortical model we will investigate to what extent the computational
principles of biased competition can also account for the attentive processing of auditory objects,
and in particular, whether these principles can account for auditory streaming. We will analyse the
stationary states in the model via mean-field techniques, and non-stationary transient states via the
full spiking simulations; for a review of this approach see [41, 49]. In particular, we will
concentrate on the process of auditory streaming in response to musical stimuli. Most experiments
on auditory streaming have used simple stimuli, and the relationship between the spectra of
successive sounds and the inter-stimulus interval have been shown to dominate the formation of
streams [51]. However, pitch can also influence stream formation [162]. Pitch is thought to arise
from sub-cortical processing which extracts the dominant periodicities within each frequency
channel [102, 163], although the formation of a global pitch percept and its representation in
cortex remains controversial [99, 164, 165]. Here we will include a simple model of pitch
processing, which will also form the starting point for the model of relative pitch perception
(WP6), in which the autocorrelation of the activity in each channel is used to form a correlogram
[163, 166]. The results of these investigations will have implications for the theoretical model of
musical cognition (WP7), the modelling of relative pitch perception (WP6), and the
implementation of the interactive music system (WP8).
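The correlogram idea, autocorrelating the activity in each frequency channel and summing across channels, can be sketched as below. Pure sinusoids stand in here for the half-wave-rectified auditory-filter outputs that a real peripheral model would supply, and the function names are illustrative:

```python
import numpy as np

def summary_autocorrelation(channels):
    """Autocorrelate the activity in each frequency channel and sum
    across channels to form a summary correlogram."""
    n = channels.shape[1]
    summary = np.zeros(n)
    for ch in channels:
        summary += np.correlate(ch, ch, mode='full')[n - 1:]
    return summary

def pitch_from_correlogram(summary, sr, fmin=50.0, fmax=500.0):
    """The lag of the largest peak within a plausible pitch range gives
    the dominant periodicity, i.e. the pitch period."""
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(summary[lo:hi]))
    return sr / lag

sr = 8000
t = np.arange(2048) / sr
# two 'channels' dominated by the 3rd and 4th harmonics of 200 Hz
channels = np.array([np.sin(2 * np.pi * 600 * t),
                     np.sin(2 * np.pi * 800 * t)])
summary = summary_autocorrelation(channels)
f = pitch_from_correlogram(summary, sr)   # recovers the shared 200 Hz periodicity
```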
Extensions to the large-scale cortical model
We will collaborate with researchers from WP4 in order to incorporate a more detailed
model of the thalamocortical system within the large-scale cortical model. This will provide the
large-scale model with the possibility of developing time-varying receptive fields in response to
acoustic stimuli. We will also work with researchers from WP4 and WP6 in implementing top-down attentional control of peripheral processing (mediated through the thalamocortical system),
and in the investigations into active perception. In collaboration with researchers from WP5 the
model will also be extended to account for the generation of rhythmic expectancies in response to
musical stimuli, and through this the control of attention in time.
Extend the large-scale cortical model to include prefrontal brain areas, and investigate the role
of working memory in contextual processing
We will extend the large-scale cortical model described above to include an explicit model
of cortical prefrontal brain areas (dorsolateral and inferior frontal regions) [40], and its coupling
with the superior temporal gyrus [125, 126]. This will allow us to investigate the role of working
memory in contextual processing.
Deliverables
D3.1 Report describing the large-scale cortical model of auditory processing (month 12)
D3.2 Report describing the investigations into the role of auditory attention in the formation of
auditory streams (month 18)
D3.4 Report describing extensions to large-scale model beyond those documented in year 1
(month 30)
D3.5 Report describing the investigations into the role of working memory in contextual
processing (month 30)
D3.6 Report describing the computational principles emanating from these modelling studies
relevant to a general model of music cognition (month 36)
WP4. Spectrotemporal response fields in the thalamocortical system
Objectives
1. Construct a computational model of the thalamocortical and associated intracortical
network and replicate within the model spectrotemporal response fields (STRFs) such as
those observed experimentally.
2. Investigate mechanisms of self-organisation and plasticity in the development of STRFs
through exposure to different auditory experiences.
3. Investigate the ability of the model STRFs to support the categorisation of ongoing auditory
stimuli within the large-scale cortical modelling framework.
4. Extract computational principles relevant for a general model of music cognition, and for
technical music applications.
Description of work
The aim of this workpackage is to understand how the thalamocortical network of the
auditory system contributes to the representation of timbre and the extraction of meaning from an
auditory stimulus, where the stimulus forms part of a continuous stream of stimuli, such as in
music. Our approach is to develop a detailed computational model of the auditory thalamocortical
network (TCN) and its associated intracortical networks and to investigate its response to auditory
stimuli and how this response is modified by factors which assist in specifying the meaning of a
stimulus. Such factors include the temporal context in which the stimulus appears; the brain's
expectation of the presence of the stimulus in this context; the attentional status of the stimulus
(whether it is being attended to or not at the current time); and the role the stimulus plays in the
current goals of the listener.
The proposed plan of work is presented below in three stages:
Stage 1: Detailed computational model of the auditory thalamocortical system
Construction of a computational model of the thalamocortical and associated intracortical
network; replication within the model of experimentally observed STRFs for a selected subset of
cortical and thalamic neurons, i.e. those with relatively simple receptive field structures with a
small number of excitatory and inhibitory fields distributed both spectrally and temporally.
The proposed computational model will comprise mathematical descriptions of the dynamic
properties of the principal populations of excitatory and inhibitory neurons and their connections,
in the following thalamic and cortical areas: medial geniculate nucleus of the thalamus (MGN);
nucleus reticularis of the thalamus (NRT); and layers 2/3, 4, 5 and 6 of the primary auditory
cortex (AI2-6). The models will be based on neurobiological data on the cellular and synaptic
mechanisms involved, and initially constructed at a range of levels of description, including
population-based models and conductance-based spiking neural network models, the latter
incorporating the membrane currents considered necessary to adequately describe the
spatiotemporal dynamics of the network. If necessary, e.g. to take account of the spatial
distribution of synaptic inputs onto cortical pyramidal cells, multi-compartmental neuron models
will be incorporated into the network model. From these initial models, the final model will be
constructed, as the simplest possible model which is capable of replicating the major dynamical
features of the experimentally observed STRFs of principal thalamic and cortical neurons. This
detailed model of the auditory thalamocortical network will be incorporated into the large-scale
cortical model (WP3), and will provide the interface whereby cortical processing can generate
appropriate top-down signals in order to influence subcortical processing (WP6).
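In its simplest linear form, the STRF formalism at the heart of this stage reduces to correlating the recent spectro-temporal history of the input with a weight matrix. A minimal numerical sketch of that idea follows; the array sizes and the single-channel excitatory field are illustrative assumptions, not part of the proposed model:

```python
import numpy as np

def strf_response(spectrogram, strf):
    """Linear response of a model neuron: at each time step, correlate the
    recent spectro-temporal history of the input with the neuron's STRF.

    spectrogram : (n_freq, n_time) array of input energy
    strf        : (n_freq, n_lag) array; column k weights the input k steps back
    """
    n_freq, n_lag = strf.shape
    _, n_time = spectrogram.shape
    response = np.zeros(n_time)
    for t in range(n_lag - 1, n_time):
        history = spectrogram[:, t - n_lag + 1:t + 1]   # recent past of the input
        response[t] = np.sum(history * strf[:, ::-1])   # lag 0 = most recent column
    return response

# Toy example: a purely excitatory field centred on one frequency channel
rng = np.random.default_rng(0)
spec = rng.random((16, 200))
strf = np.zeros((16, 10))
strf[8, :] = 1.0            # responds to energy in channel 8 over the last 10 steps
r = strf_response(spec, strf)
```

The experimentally observed STRFs to be replicated in Stage 1 would correspond to particular patterns of excitatory and inhibitory weights in such a matrix, realised dynamically by the spiking network rather than by an explicit correlation.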
Stage 2: Mechanisms of self-organisation and plasticity
Investigation into the mechanisms of self-organisation and plasticity in the development of
the STRFs of selected neurons in the model through exposure to different auditory experiences, in
a manner which is consistent with experimentally observed modification of STRFs in early
development.
It is likely that the main features of the STRF properties of selected thalamic and cortical
neurons develop using self-organising mechanisms as a result of exposure to natural auditory
environments during early developmental stages. Exposure to abnormal auditory environments has
been shown experimentally to lead to disruption of normal STRF properties. Because the
mechanisms which determine the early development of STRF properties are presently unknown,
this stage of the workpackage is speculative and based largely on hypothetical mechanisms. It is
thought, however, that sharpening of tuning curves and the refinement of tonotopicity in AI is
dependent on appropriately patterned input activity during development [76, 77, 170, 171] and
there is evidence that modification of neural circuits in AI can be induced by abnormal patterns of
neural activity [76, 77, 170, 171]. These investigations will also be informed by the results from
the statistical study of the relationship between language and music in WP2. The results will
inform the development of the theoretical model of musical cognition (WP7), and suggest suitable
representations for the interactive music system (WP8).
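Since the developmental mechanisms are acknowledged above to be hypothetical, any concrete learning rule is an assumption. One candidate of the kind that could be explored is a normalised Hebbian update, sketched here; the learning rate, normalisation scheme and stimulus statistics are all placeholders:

```python
import numpy as np

def hebbian_strf_update(strf, stimulus_history, response, lr=0.01):
    """One hypothetical plasticity step: strengthen spectro-temporal weights
    in proportion to the product of recent input and postsynaptic response,
    then renormalise so that total synaptic weight is conserved.
    """
    strf = strf + lr * response * stimulus_history   # Hebbian co-activity term
    strf = np.clip(strf, 0.0, None)                  # keep weights excitatory
    total = strf.sum()
    return strf / total if total > 0 else strf

rng = np.random.default_rng(1)
strf = np.full((8, 5), 1.0 / 40)                     # flat initial receptive field
for _ in range(500):
    stim = np.zeros((8, 5))
    stim[3, :] = 1.0                                 # environment dominated by channel 3
    stim += 0.1 * rng.random((8, 5))                 # background activity
    resp = float(np.sum(strf * stim))
    strf = hebbian_strf_update(strf, stim, resp)
# After exposure, the receptive field sharpens around the dominant channel,
# crudely mimicking experience-dependent refinement of tuning.
```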
Stage 3: Categorisation of ongoing auditory stimuli
Investigation of the ability of the STRFs in the model to support the categorisation of
ongoing auditory stimuli, e.g. relating the stimuli to particular components of a complex auditory
scene; the form of intracortical signals in the model which relate to the meaning of the stimulus,
and their ability to modify the STRFs in a way which facilitates and improves the categorisation of
ongoing stimuli.
Sensory information which might be used to create the meaning of a stimulus must be
interpreted in the context of the internal knowledge related to the present task. Neurons and local
neuronal circuits respond to external sensory-evoked input and to internally generated input within
the context of a background network or population activity which exists at the time. In this way the
neuronal network, within which an individual neuron or neuronal circuit participates, can exercise
a fine control over the individual neuron's response, in accordance with the contextual knowledge
represented by the activity of the network. Such a neuronal control mechanism seems to provide a
potential link between context at the neural response level and context at the cognitive response
level.
In this stage of the workpackage we will investigate the way in which contextual
intracortical activity in the primary auditory cortex is created, and how this activity affects
neuronal response properties. It appears that, in the case of a single neuron, contextual network
activity can create spatiotemporal patterns of subthreshold membrane potential activity across the
dendritic tree of the neuron. The pattern of activity at any time will have a strong influence on the
synaptic integration properties of the cell, and therefore on its STRF. Theoretical and
computational studies of these properties in the auditory cortex and thalamocortical network,
aimed at determining how contextual cortical activity modifies neuronal response properties so as
to improve the role of STRFs in categorisation, may provide a link between the neuronal and the
cognitive levels regarding the role of context in understanding the meaning of sensory
stimuli in the brain. In order to conduct this study effectively it will be necessary to integrate the
detailed thalamocortical model into the large-scale cortical model developed in WP3. The results
will inform the development of the theoretical model of musical cognition (WP7), and suggest
suitable representations for the interactive music system (WP8).
Stage 4: Extraction of computational principles relevant for a general model of music cognition,
and for technical music applications
From these modelling studies we will extract the computational principles that we have
found to be important in music cognition. These will be included in the formulation of a generic
architecture for cognition (WP7), and in designing more powerful algorithms for use in the
interactive music system (WP8).
Deliverables
D4.1 Report describing the thalamocortical model (month 12)
D4.2 Report on developmental mechanisms for self-organisation in response to auditory
experience (month 24)
D4.3 Report on investigations into the role of intracortical inputs in modifying the STRF
properties in relation to categorising auditory stimuli (month 32)
D4.4 Report describing the computational principles emanating from these modelling studies
relevant to a general model of music cognition (month 36)
Workpackage leader: UvA
Objectives
1. Rhythmic categorization. Study the formation of rhythmic categories (rhythmic
categorization) and the influence of temporal context (such as metrical structure, tempo and
previous exposure) in active perception.
2. Model selection. Three computational modeling approaches to categorization will be
evaluated on existing empirical data. One approach is based on the Gestalt principles of
perception, simplicity or ease of encoding being a key aspect. An alternative approach,
called memory-based, is based on the notion of likelihood. Here, models try to explain
structural interpretations in terms of the most probable encoding, the probabilities being
extracted from previously heard examples. A third approach is based on the laws of
kinematics, modeling rhythm directly in terms of action, using the apparent similarities
between physical and musical motion.
3. Rhythmic expectation. Based on the results of objective 2, a model of rhythmic expectation
will be formulated, providing the key temporal component in a model of emergent cognition.
4. Extract computational principles relevant for a general model of music cognition, and for
technical music applications.
Description of work
Introduction
Research in music perception has shown that time, as a subjective structuring of events, is
quite different from the concept of time in physics [82]. Listeners to music do not perceive rhythm
on a continuous scale. Instead, rhythmic categories are recognized which function as a reference
relative to which the deviations in timing can be appreciated [83, 84]. In fact, temporal patterns in
music combine two time scales which are essentially different: the discrete rhythmic durations as
symbolized by, for example, the half and quarter notes in a musical score, and the continuous
timing variations that characterize an expressive musical performance. In this workpackage we
will evaluate current theories of rhythmic perception and formulate a model which can generate
rhythmic expectancies in response to musical stimuli using a categorical representation of
rhythmic patterns.
Investigate the perceptual formation of rhythmic categories
Honing [87] proposed a knowledge representation that makes these three aspects (i.e.
rhythmic structure, tempo and timing) explicit by introducing a way in which expressive timing
can be expressed in terms of the temporal structure and global tempo. This representation will
form the formal basis for the current study in trying to disentangle these components by studying
systematically obtained empirical data. The key idea in this approach is the notion of rhythm space
[83, 88]. Instead of using the more common method of studying a corpus of typical examples
[172], we consider the space of all possible performances of a small number of time intervals (or
note durations). In this n-dimensional space every point constitutes a different temporal pattern.
This infinite set contains musical and unmusical rhythmic patterns; rhythms often encountered in
music, and those rarely used. This rhythm space captures, in principle, all possible expressive
interpretations in any musical style of any rhythm of n+1 onsets. For example, in considering
rhythmic patterns of four onsets, any pattern can be represented in three dimensions, with the three
axes representing the three inter-onset intervals (IOIs). All patterns that add up to a fixed total
duration form a diagonal triangular slice in such a space (see Figure WP5-1a). Looking from
above, towards the origin, the triangle can be presented as a ternary plot (see Figure WP5-1b), and
the particular rhythmic pattern represented by any point in this space can be interpreted (see Figure WP5-1c for
two examples).
Figure WP5-1. Rhythm space (A), ternary plot (B), and two example patterns (C) (see text for details).
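The mapping from a performed rhythm to a point in this space is straightforward to state concretely. A small sketch, with invented onset times:

```python
def rhythm_space_point(onsets):
    """Map a performed rhythm of four onsets to a point in rhythm space:
    the three inter-onset intervals, normalised so that total duration is 1.
    Every such point lies on the triangular slice of Figure WP5-1."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    total = sum(iois)
    return tuple(i / total for i in iois)

# Two performances of the same nominal 1-1-2 rhythm at different tempi
# map to the same point, so the representation is tempo-invariant.
p1 = rhythm_space_point([0.0, 0.5, 1.0, 2.0])
p2 = rhythm_space_point([0.0, 0.25, 0.5, 1.0])
# p1 == p2 == (0.25, 0.25, 0.5)
```

Expressive timing then appears as displacement of a performance away from the point occupied by its categorical, metronomic interpretation.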
Deliverables
D5.1 Evaluation of existing approaches to rhythmic categorization on shared data (paper)
(month 12)
D5.2 Comparison of rhythm perception models based on simplicity vs. likelihood (paper)
(month 24)
D5.3 Prototype of rhythmic expectation (model) (month 30)
D5.4 Report describing the computational principles emanating from these modelling studies
relevant to a general model of music cognition (month 36)
Milestones and expected results
1) Evaluation of the predictive power of kinematic, memory-based and perception-based models of
rhythm perception and production (month 12).
2) Construct and evaluate rhythm perception models based on simplicity and on likelihood
(month 24).
3) Model of rhythmic expectation, computational principles (month 36).
Workpackage leader: UvA
Objectives
1. Active perception and the emergence of relative pitch. Investigate the development of
discrete pitch categories through the experience of pitch sequences, and the extension of
current models of absolute pitch perception, to account for the perception of relative pitch,
pitch interval relationships and contextual influences on pitch judgements.
2. Computing with active sensors. Investigate the computational and functional implications
of active control of peripheral sensory processing through gain control and filter retuning.
3. Active pitch perception. Investigate the properties of a model of relative pitch perception,
reformulated to actively modify peripheral processing.
4. Extract computational principles relevant for a general model of music cognition, and for
technical music applications.
Description of work
Introduction
Pitch is a fundamental perceptual attribute of communication sounds and music, and is also a
powerful cue for the grouping and segregation of sounds within a mixture (reviewed in [91]). Here
we propose to formulate a model of pitch perception which can account for contextual influences
on pitch perception and for the emergence of discrete pitch categories. Pitch relationships are an
essential aspect of pitch perception, and we generally experience pitches not in isolation but in
sequences and as part of higher-level cognitive structures. Perceptual judgements of pitch are
influenced by context; judgements can be facilitated if the context is a melodic sequence [30, 92,
93], or a chord sequence with tonality consistent with the target, but can be impaired by the
presence of a non-matching context [93]. However, current models of pitch perception do not
account for the influence of context on perception. In addition, most models of pitch perception
focus on the representation of absolute pitch, a perceptual ability that very few people actually
possess [94], although most people do have a good sense of relative pitch, e.g. judging whether a
note is in tune or not. We propose to formulate an active model of pitch perception that develops
representations by modelling the regularities in the stimuli it experiences, at increasingly higher
levels of abstraction. The investigations into predictive modelling of incoming stimuli will also be
applied to the development of an intelligent sensory periphery and the attentional modulation of
bottom-up processing.
Active perception and the emergence of relative pitch
We will investigate the development of discrete perceptual pitch categories through the
experience of pitch sequences, and formulate extensions to current models of absolute pitch
perception to account for the perception of relative pitch, pitch interval relationships and
contextual influences on pitch judgements.
Models of pitch perception fall into two broad classes, spectral or place models which rely
upon the distribution of energy across the basilar membrane to isolate individual harmonics in a
complex tone, and temporal models which analyse the periodicities in the firing patterns of nerve
fibres. Both sources of information are available in the auditory nerve activity, and the competing
models have different strengths and weaknesses, so there remains some controversy over which is
correct. We will consider a representative model of each class, and investigate their response in the
presence of multiple pitches, and their ability to represent interval relationships and to account for
contextual effects.
An influential temporal model, the summary autocorrelation function (SACF) model [163],
has been shown to account for a wide range of pitch phenomena. It consists of a model of cochlear
processing which includes spectral decomposition by a bank of band pass filters, half wave
rectification and low pass filtering of inner hair cell processing, periodicity estimation within each
frequency channel using an autocorrelation function (ACF), and a linear sum of the ACFs to
give the SACF. The incorporation of adaptive delays into this model would give it contextual
sensitivity. We will investigate whether an adaptive SACF model shows sensitivities to pitch
similarities that account for the preference for simple pitch relationships.
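The SACF pipeline described above can be approximated in a few lines. This sketch substitutes a crude brick-wall FFT filter for the cochlear filterbank and omits the low-pass stage, so it illustrates only the autocorrelation principle, not the actual model of [163]:

```python
import numpy as np

def summary_acf(signal, channels, sr):
    """Simplified SACF: bandpass each channel (a brick-wall FFT filter stands
    in for the cochlear filterbank), half-wave rectify, autocorrelate, and
    sum the per-channel ACFs."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    sacf = np.zeros(len(signal))
    for lo, hi in channels:
        band = spectrum * ((freqs >= lo) & (freqs < hi))
        x = np.fft.irfft(band, n=len(signal))
        x = np.maximum(x, 0.0)                              # half-wave rectification
        sacf += np.correlate(x, x, mode='full')[len(signal) - 1:]
    return sacf

sr = 8000
t = np.arange(2000) / sr                                    # 0.25 s of signal
f0 = 200.0                                                  # harmonics at 200, 400, 600 Hz
tone = sum(np.sin(2 * np.pi * f0 * k * t) for k in (1, 2, 3))
channels = [(100, 300), (300, 500), (500, 700)]
sacf = summary_acf(tone, channels, sr)
lag = 20 + int(np.argmax(sacf[20:80]))                      # search roughly 100-400 Hz
pitch = sr / lag                                            # should be near 200 Hz
```

The adaptive-delay extension proposed here would amount to letting the lag axis of each ACF be retuned by context, so that lags corresponding to expected pitch relationships are favoured.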
Although successful, spectral models have previously been criticised for their assumption of
the existence of harmonic templates used to group harmonics to derive pitch. However, recently it
has been shown how such templates could arise simply through correlations between auditory
nerve spike trains [177]. The model of [178] is a spectral model based upon the idea of a harmonic
sieve and it too explains a great deal of perceptual data. Processing in the model results in a pattern
of activity across a pitch map in which energy levels indicate the degree to which a particular pitch
is activated. There is potential in this model too for the inclusion of contextual effects and
facilitated processing of related pitches, and for combining it with place-time peripheral processing
[177].
Our investigations will evaluate which of the candidate models is better able to represent
pitch intervals, the influence of tonality and the preference for discrete relationships. Both models
extract absolute pitch, and so we will formulate extensions to support pitch-invariant interval
representations. We will use this to inform the formulation of a model of pitch perception in terms
of active perception, motivated by the idea that the brain is constantly trying to abstract regularities
from the stimuli it experiences.
Researchers on WP6 will collaborate with MTAPI (WP1) in order to design suitable stimuli
for distinguishing between learnt and innate levels of pitch perception. The model of pitch
perception developed here will be informed and constrained by the results of the perceptual
experiments (WP1). The model will be integrated into the large-scale modelling framework, in
collaboration with researchers in WP3, in order to investigate the role of interactions between
prefrontal cortex and superior temporal gyrus in the perception of tonality found to be necessary
experimentally [108, 179, 180].
Computing with active sensors
We will investigate the computational and functional implications of active control of
peripheral sensory processing through gain control and filter retuning, and the properties of a
model of relative pitch perception, reformulated to actively modify peripheral processing.
In the bat it has been shown that cortical stimulation can modulate cochlear tuning in a
number of ways [181]; these include a simple gain increase at the site with matching best
frequency, as well as adjustments to the tuning of the basilar membrane filtering properties at
non-matched sites. Depending on the site of cortical stimulation, the retuning can move the best
frequency of the filter towards or away from the stimulating frequency, and can alter the
bandwidth of the filter, and increase or decrease its gain. There are many models of cochlear
processing, and we will base our experiments on that of Meddis and colleagues, since it has
been shown to account for many aspects of cochlear processing [182, 183]. We will extend this
model to incorporate active control of the filter properties, and conduct a systematic empirical
investigation into the computational implications for models operating on its output. In particular,
we will consider the impact on the pitch models, described above, of dynamic adjustments to
cochlear processing.
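One way the proposed control interface might look, with the full Meddis cochlear model replaced by a single two-pole resonator per channel; the class, its parameter names and the resonator itself are illustrative assumptions, not the model we will use:

```python
import numpy as np

class ActiveChannel:
    """One peripheral frequency channel as a two-pole resonator whose centre
    frequency, bandwidth and gain can be retuned by top-down signals."""

    def __init__(self, cf, bw, gain, sr):
        self.sr = sr
        self.retune(cf, bw, gain)

    def retune(self, cf, bw, gain):
        # Pole radius sets bandwidth; pole angle sets centre frequency.
        self.cf, self.bw, self.gain = cf, bw, gain
        r = np.exp(-np.pi * bw / self.sr)
        theta = 2 * np.pi * cf / self.sr
        self.a1, self.a2 = -2 * r * np.cos(theta), r * r

    def filter(self, x):
        y = np.zeros_like(x)
        for n in range(len(x)):
            y[n] = (self.gain * x[n]
                    - self.a1 * (y[n - 1] if n >= 1 else 0.0)
                    - self.a2 * (y[n - 2] if n >= 2 else 0.0))
        return y

sr = 8000
t = np.arange(1600) / sr
probe = np.sin(2 * np.pi * 500 * t)
ch = ActiveChannel(cf=1000, bw=100, gain=1.0, sr=sr)
off_band = np.sqrt(np.mean(ch.filter(probe)[800:] ** 2))
ch.retune(cf=500, bw=100, gain=1.0)      # "cortical feedback" pulls the cf down
on_band = np.sqrt(np.mean(ch.filter(probe)[800:] ** 2))
# on_band > off_band: the retuned channel now responds strongly to the probe
```

The systematic study would then vary such retuning signals and measure their effect on the output of the pitch models operating downstream.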
In collaboration with researchers in WP3 and WP4, the active peripheral model will be
incorporated into the large-scale cortical model in order to simulate the attentional control of
peripheral processing.
Extraction of computational principles relevant for a general model of music cognition, and for
technical music applications
From these modelling studies we will extract the computational principles that we have found to
be important in music cognition. These will be included in the formulation of a generic
architecture for cognition (WP7), and in designing more powerful algorithms for use in the
interactive music system (WP8).
Deliverables
D6.1.1 Comparison between temporal and spectral models of pitch in relationship to the
representation of pitch intervals, the influence of tonality, and categorical perception of relative
pitch (report) (month 12).
D6.1.2 Formulation of an active model of pitch perception, which can account for contextual
influences and the emergence of discrete pitch intervals (model, report) (month 24).
D6.2.1 Baseline peripheral model, distributed to partners (model) (month 3).
D6.2.2 Study of the computational and functional implications of active control of peripheral
sensory processing through gain control and filter retuning (report) (month 18).
D6.2.3 Integration of enhanced peripheral model into large-scale cortical model to investigate
attentional control of peripheral processing (model, report) (month 32).
D6.3 Report describing the computational principles emanating from these modelling studies
relevant to a general model of music cognition (report) (month 36)
Milestones and expected results
1) Comparison between computational models of relative pitch (month 12).
2) Formulation of an active model of pitch perception, which can account for contextual
influences and the emergence of discrete pitch intervals (month 24).
3) Integration of the pitch model into the large-scale cortical model, and investigations into the
role of working memory in extracting pitch relationships (month 30).
4) Empirical investigation of the computational and functional implications of active control of
peripheral sensory processing through gain control and filter retuning (month 18).
5) Integration of enhanced peripheral model into large-scale cortical model to investigate
attentional control of peripheral processing (month 30).
6) Report describing the computational principles emanating from these modelling studies
relevant to a general model of music cognition (month 36).
Workpackage leader: UvA
Objectives
1. Extract and document the essential computational principles and key insights into music
cognition gained through the project as a result of behavioural experiments,
neurocomputational modelling, and the development of an artificial system for real time
interaction.
2. Define a generic functional computational architecture for cognition.
3. Identify potential beneficiaries of project outcomes.
4. Communicate the outcomes of the project to people outside our immediate research
community.
Description of work
The fundamental hypothesis that cognition emerges through active perception of the
environment has been a guiding principle in structuring this proposal. The idea that music
cognition, even in the absence of overt behaviour, depends upon the development of expectations,
conditioned by the current musical context and by previous musical experience, is well accepted,
but a detailed theoretical understanding of this process has yet to be developed. We expect
therefore that the proposed experimental and computational studies will suggest many important
theoretical and computational principles of music cognition, and its autonomous development
through experience. This workpackage will receive as input reports produced during the course of
the work on the other workpackages. Associated with each of the experiments and modelling
studies, workpackage leaders will produce reports for WP7 in which they highlight the theoretical
insights derived during that stage of their work.
In order to ensure that we take advantage of the benefits offered by the diversity of the work
in the project, one of the tasks here will be to compile the theoretical insights we gain from each of
the investigations into a coherent report, and use this work to formulate a generic functional
computational architecture for intelligent perception and cognition. In this way we will contribute
significantly to furthering understanding of the emergence of autonomous complex cognitive
behaviour and its realisation in artificial systems. However, in order to begin the project with a
consensus view of the generic architecture for which we are aiming, an initial prototype will be
formulated through discussions at the project meetings during the first year, and used as a basis for
WP8 during the initial stages of the project.
Communication with the wider public and with potential users and beneficiaries of the
technology developed through this project is an important objective. To this end we will in the
early stages of the project produce a promotional multimedia kit, which will include an interactive
presentation of the project and many of the other interesting introductory and non-technical
elements which are also available on the web-site. This kit will be updated as the project proceeds.
The project web-site will also be established early in the project, and will be a major source of
useful information for people outside the consortium. This web-site will contain a description of
the project and its goals, and the project consortium (with links to other sites and associated
information); all public project documentation; a discussion forum for all the EmCAP community
(including project partners, interested members of the research community, and members of the
public potentially interested in the project deliverables); a news repository that will cover not only
news from the project, but also related news from around the world; an electronic compass which
will introduce newcomers to the field of musical neurocognition with links to relevant tutorials,
papers, researchers and projects; a special section offering non-technical explanations of the goals
and achievements of the project.
A public workshop will be organised and held at the end of the project. This will allow us to
present the work of this project to other scientists in the field. We will also seek out and
communicate with other parties who we identify as having a potential interest in taking advantage
of our findings; such as healthcare professionals, interested in developing hearing screening
programmes or improvements to prosthetic devices (hearing aids, cochlear implants, etc);
technologists interested in exploiting this work in developing commercial music systems, or more
generic applications which require intelligent autonomous or interactive behaviour; and
educationalists who see possible applications in musical education or in extending the public
awareness of science. In addition, the proceedings of the workshop will be published as a book in
order to disseminate the work on this project as widely as possible.
Deliverables
D7.1 Establish the project web-site and populate it with information available initially (FUPF,
month 3).
D7.2 Produce version 1 of the multimedia promotional kit (FUPF, month 6).
D7.3 Initial prototype for the generic computational architecture for intelligent perception (month
12)
D7.4 Updates to prototype for the generic computational architecture for intelligent perception and
multimedia kit (month 24)
D7.5 Compilation of the theoretical insights into music cognition (month 36)
D7.6 Definition of a generic model for intelligent perception and cognition (month 36)
D7.7 Organisation of scientific workshop, and publication of proceedings (month 36)
Workpackage leader: UvA
Objectives
1. Build a music processing system to study the development of internal musical codes,
music expectancies and phenomena of music cognition such as music categorization,
similarity ranking and streaming. The system will be able to synthesize, as musical output,
the expectancies generated after processing short musical excerpts provided as input.
2. Investigate the performance of the music processing system experimentally, considering
such aspects as attention, categorization, similarity, and stream segregation.
3. Integrate, into the music processing system, enhanced algorithms emanating from the work
of other project partners.
Description of work
Introduction
The methodology underlying most applications of music technology, such as music content
processing through the automatic analysis and description of musical sounds, or musically
expressive synthesis and voice performance enhancement, depends upon analysis algorithms
which exploit domain-specific knowledge. These algorithms are generally designed without
strong constraints on their neurobiological plausibility and aim to maximise computational
performance without explicitly emulating any known perceptual or cognitive processes. The
focus in this project is rather different in that we are interested in understanding the mental
processes underlying music cognition, e.g. the factors which determine the creation of contextual
models, the control of attention in time, and experience-dependent developmental processes; all
of which are necessary to support the emergence of autonomous intelligent behaviour. The
success of interactive music improvisation depends to a large extent on consistency between the
predictive models created by each participant, and therefore the degree to which they can predict
each other's behaviour. While the creation of a fully-fledged artificial music improviser is some
way off, we will address the fundamental problems of creating and maintaining predictive models
of the acoustic environment, and the role of developmental experience in shaping these models.
Methodology
We propose to devise and implement a music analysis prototype that initially has minimal
hard-wired musical knowledge (the initial capabilities will be informed by the experiments
conducted in WP1 and WP2). The system, which we have tentatively nicknamed the Music
Projector, will take music data processed by a low-level acoustic analysis front-end and will
elaborate a multi-faceted description of its content, including symbolic and sub-symbolic
representations. The idea is that the system will learn useful abstractions by forming predictive
models of the musical input, i.e. the abstractions that allow it to make accurate predictions. In this
way, the system will develop representations that can be interpreted as music expectancies, which
will be translated into audible music by means of synthesis, thereby providing a kind of
interactive behaviour to the system. The fundamental goal is to include the ability for self-
organization in the implementation: i.e. the system will find useful features and ways for
processing them by being immersed in a continuous musical environment. We also envisage
that such a system could bootstrap itself by forming a hierarchy of increasingly higher-level
concepts.
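As a toy illustration of the predictive-modelling idea, a system could count event transitions in the music it hears and read expectancies off the resulting statistics. This bigram sketch is far simpler than the intended sub-symbolic learning, and every name in it is a placeholder:

```python
from collections import Counter, defaultdict

class ExpectancyModel:
    """Minimal predictive model of the kind the Music Projector could build:
    it counts pitch bigrams in the music it hears and turns the counts into
    expectancies for the next event."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def listen(self, melody):
        for prev, nxt in zip(melody, melody[1:]):
            self.counts[prev][nxt] += 1

    def expect(self, current):
        """Return (next_event, probability) pairs, most expected first."""
        c = self.counts[current]
        total = sum(c.values())
        return [(e, n / total) for e, n in c.most_common()] if total else []

model = ExpectancyModel()
model.listen(['C', 'D', 'E', 'C', 'D', 'G', 'C', 'D', 'E'])
# After hearing 'D', the model most strongly expects 'E'
```

In the proposed system such expectancies would be formed over learnt, multi-level abstractions rather than given symbols, and would be rendered audible through synthesis.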
The basic starting point for the system will need to contain some inbuilt functionality. Our
brains generally exhibit innate properties that constrain the range of feasible processing strategies
and representations. Since the goal of the Music Projector is to emulate human perception at the
functional or algorithmic level, some of those architectural constraints will also be included. In
addition, it is well known that the human brain is particularly malleable during early
development. This means that early experience may affect the tuning of representations and
processes, facilitating or hindering the development of certain knowledge structures,
or the processing of specific types of stimuli (e.g. our exposure to occidental music makes it
difficult for us to understand Indian or Chinese music). The early development of representational
structures is considered in some detail in WP4, and this work will help to guide developments in
the Music Projector. In summary, the goal of the system is to study the effects of different
patterns of exposure to music on the internal high-level representations it generates, and the
approximate replication of music perceptual and cognitive phenomena. The system will detect
regularities across several perceptual dimensions, and organize its internal representations in
order to account for them.
The architecture of the system will include:
- An acoustic front-end that exhibits an acceptable degree of perceptual plausibility. This
  will be the same front-end used in WP3, 4, 5 & 6.
- Simple detectors such as those that have been found or hypothesized in the auditory
  pathway and cortex: for example, detectors of noise, continuity, change, harmonicity,
  correlation across channels, etc.
- Specific detectors for pitch and temporal information, although these will be refined later
  in response to work in WP5 and WP6.
- One or more memory systems with a resonance mechanism, capable of maintaining a
  fading trace of the input for some time, of long-term storage, and of generating
  expectancies.
- One or more learning components that make possible auto-association and also learning
  by external teaching.
- A stream generation component that makes it possible to decompose a complex
  combination of stimuli into a series of simpler auditory streams.
- An attentional component that makes it possible to change the saliency of specific musical
  dimensions or of specific streams, according to the ongoing outputs of the analysis
  processes and the goals given to the system (if any).
- A categorization or chunking mechanism to associate complex streams and sub-streams
  with simpler representations that may act as labels for them.
- A music generation component capable of synthesizing musical sounds that can act
  as auditory feedback of the expectancies generated by the system.
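The memory-with-expectancies component in the list above can be illustrated with a minimal sketch: a store that keeps an exponentially fading activation per event alongside first-order transition counts, from which an expectancy for the next event is derived. The symbolic note events, the decay rate and the first-order statistics are simplifying assumptions for the sake of a self-contained example, not the intended implementation.

```python
from collections import defaultdict

class FadingExpectancyMemory:
    """Toy auditory memory: an exponentially fading trace of recent events
    plus first-order transition counts, from which expectancies are read out.
    Event alphabet and decay rate are illustrative assumptions."""

    def __init__(self, decay=0.8):
        self.decay = decay
        self.trace = defaultdict(float)  # fading activation per event
        self.transitions = defaultdict(lambda: defaultdict(float))
        self.prev = None

    def observe(self, event):
        # all activations fade; the current event is refreshed
        for k in self.trace:
            self.trace[k] *= self.decay
        self.trace[event] += 1.0
        if self.prev is not None:
            self.transitions[self.prev][event] += 1.0
        self.prev = event

    def expectancy(self):
        # distribution over likely next events, given the last one heard
        if self.prev is None or not self.transitions[self.prev]:
            return {}
        nxt = self.transitions[self.prev]
        total = sum(nxt.values())
        return {e: c / total for e, c in nxt.items()}

mem = FadingExpectancyMemory()
for note in ["C", "E", "G", "C", "E", "G", "C"]:
    mem.observe(note)
print(mem.expectancy())  # → {'E': 1.0}: after "C", "E" is expected
```

In the full system the expectancy read-out would feed the music generation component, so that predicted continuations become audible; here it is just a probability distribution.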
Plan of work
The work to be undertaken will be organized around the following tasks:
T1. Reviews of existing literature and software components. The review will focus on: (1)
acoustic front ends, (2) auditory memory and cognition processes, (3) music similarity and
saliency, and (4) music streaming. (months 1-12)
013123 (EmCAP) Annex I, vers. 3 (27/05/05) Approved by EC on 1 June 2005 page 66 of 101
T2. System development. The development of the music projector is expected to follow four
phases:
a. Elaboration of a mock-up of the music analysis system (based upon the initial
architecture formulated in WP7), used as a proof of concept and as a way to elicit
feedback for further development, though some functionalities may be absent or
very limited (months 6-12)
b. Elaboration and testing of the first version of the music projector (months 13-24)
c. Elaboration and testing of the final version of the music projector, incorporating, as
input, theoretical and empirical findings from the other partners, and software
developments from them, if feasible; this version will also be interactive (i.e., it
will generate audible music outputs corresponding to music predictions) (months
25-36)
T3. Simulation experiments. (months 12-36) Different simulation experiments are envisioned:
a. Novelty detection
b. Perceptual learning and categorization
c. Similarity
d. Saliency
e. Stream segregation
f. Implicit learning by exposure to different music cultures
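The novelty-detection simulations (T3a) can be caricatured by scoring each incoming event by its surprise under running statistics, so that events which violate the regularities heard so far stand out. The symbolic events and the Laplace smoothing below are assumptions introduced to keep the example self-contained; the real simulations will operate on the acoustic front-end's output.

```python
import math

def novelty_scores(events):
    """Score each event by its surprise (-log2 probability) under running
    zeroth-order statistics; high scores flag novel events. A deliberately
    simple stand-in for the planned novelty-detection experiments."""
    counts = {}
    total = 0
    scores = []
    for e in events:
        # Laplace-smoothed probability from what has been heard so far
        p = (counts.get(e, 0) + 1) / (total + len(counts) + 1)
        scores.append(-math.log2(p))
        counts[e] = counts.get(e, 0) + 1
        total += 1
    return scores

seq = ["A", "A", "A", "A", "B"]
scores = novelty_scores(seq)
print(max(range(len(seq)), key=lambda i: scores[i]))  # → 4 (the deviant "B")
```

This prediction-error view of novelty mirrors the mismatch-negativity logic that motivates the experimental workpackages: the deviant is defined only relative to the statistics of its context.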
Deliverables
D8.1.1 Survey and evaluation of existing auditory software components and cognitive
architectures (Report, month 7)
D8.1.2 Overview of saliency in music processing (Report, month 12)
D8.2.1 Mock-up of music analysis system (Software prototype, Month 10)
D8.2.2 First version of music projection system (Software prototype, Month 18)
D8.3.3 Music Projector incorporating other partners' innovative contributions (Software
prototype, Month 24)
D8.3.4 Final Music Projector (Software prototype, Month 36)
D8.4.1 First results from experimental simulations (Report, Month 30)
D8.4.2 Results from advanced experimental simulations (Report, Month 36)
D8.5 Computational principles relevant to a general model of music cognition (Report, Month 36)
WP0 (workpackage leader: UvA)
Objectives
1. Ensure effective project coordination and administration.
2. Facilitate effective collaboration, integration, and communication.
Description of work
Project coordination
Liaise with the Commission; prepare and submit reports and deliverables to it. Inform the
Commission about any circumstances that may alter project goals, and negotiate changes in goals.
Receive and disseminate deliverables to and between workpackage leaders, while monitoring their
quality and timeliness. Ensure efficient management of tasks and the establishment of effective
communications between members of the consortium. Mediate and manage in the event of conflict
arising between partners. Monitor risk elements and identify problems or delays; take appropriate
actions and adjust manpower assignment, if necessary. Monitor milestone decision points.
Coordinate the consortium's representation at major meetings that are likely to generate useful
feedback. Identify any developments outside the collaboration that may impact the project.
Coordinate joint publications, ensuring consistent quality standards and that authorship conforms
to the consortium agreement. Organise and coordinate bi-annual meetings of the Project Steering
Committee and ad hoc meetings of smaller groups; together with workpackage leaders, set the
agenda for consortium meetings; record, circulate and agree the minutes of each meeting.
Administrative coordination
Monitor budget and manpower use of all partners. Receive payments from the Commission
and transfer to consortium members according to the agreed budget. Liaise with workpackage
coordinators to ensure consistency of project progress and to assemble six-monthly progress
reports, mid-term reports and the final report. Coordinate travel to consortium and international
meetings. Coordinate organisation of final workshop, including venue, invitations to participants,
publicity and subsequent publication of proceedings. Coordinate contractual issues such as
amendments to the project contract, collaboration agreements and audit certificates.
Ensuring integration, collaboration and scientific and technological progress
Monitor and report on workpackage progress in relationship to the agreed project plan,
paying particular attention to project timescales and integration points. Report problems and
slippages promptly and plan remedial actions. Ensure the timely and accurate communication of
experimental results and modelling advances to all members of the consortium. Ensure that the
objectives and milestones of each workpackage are achieved, and that deliverables are available on
time.
Deliverables
D0.1 Progress report, year 1 (month 12)
ACTIVITIES                    UOP    FUPF   MTAPI   UvA    TOTAL
Research/innovation activities
  WP1                           2      2     120      1     125
  WP2                          36      2       0      0      38
  WP3                           6     60       0      2      68
  WP4                          36      2       0      0      38
  WP5                           0      2       0     52      54
  WP6                          48      2       0      0      50
  WP7                          10     12       4      4      30
  WP8                           5     61       0      2      68
  Total research/innovation   143    143     124     61     471
Management activities
  WP0                          22      1       1      1      25
  Total management             22      1       1      1      25
TOTAL ACTIVITIES              165    144     125     62     496
Two high-performance PCs for theoretical modelling and simulation at €4,000
each for each of the researchers, and €2,000 for Audio/MIDI hardware.
Consumables:
UoP: €7,000
To cover services such as communication, publishing, photocopying and
printing, and equipment maintenance. Software licenses and media storage
needed during the lifespan of the project.
FUPF: €7,000
To cover services such as communication, publishing, photocopying and
printing, and equipment maintenance. Software licenses and media storage
needed during the lifespan of the project.
UvA: €7,000
To cover services such as communication, publishing, photocopying and
printing, and equipment maintenance. Software licenses and media storage
needed during the lifespan of the project.
MTAPI: €28,000
Electrode caps (one adult, one baby; €3,000), consumables for EEG
experiments (disposable electrodes, electrode paste, cleaning materials,
headphone inserts, etc.; €8,000), computer-related consumables,
communication, publishing, photocopying, printing costs and equipment
maintenance (€10,000). Software licenses and media storage needed during the
lifespan of the project (€4,000).
Other
UoP: €20,400
For organisation of the final public workshop on music cognition and the
printing of a collection of papers, €12,000. For a share of the final project
publication, plus the group's project publications (€3,000); conference/seminar
fees (6 researchers in the project at an average of 1 annual conference;
attendance at 18 conferences/seminars @ €300 per conference = €5,400 in
total).
FUPF: €12,000
Establishing and maintaining the web site (€3,600). For a share of the final
project publication, plus the group's project publications (€3,000);
conference/seminar fees (6 researchers in the project at an average of 1 annual
conference; attendance at 18 conferences/seminars @ €300 per conference =
€5,400 in total).
MTAPI: €5,200
For a share of the final project publication, plus the group's project publications
(€2,500); conference/seminar fees (3 researchers in the project at an average
of 1 annual conference; attendance at 9 conferences/seminars @ €300 per
conference = €2,700 in total).
UvA: €5,200
For a share of the final project publication, plus the group's project publications
(€2,500); conference/seminar fees (3 researchers in the project at an average
of 1 annual conference; attendance at 9 conferences/seminars @ €300 per
conference = €2,700 in total).
Management activities
UoP: €51,451
An administrative coordinator will be employed for 2 days per week, at a cost
of €36,376, to assist with project coordination. Other costs requested include:
(subcontracted) audit costs at €6,000, coordination consumables at €1,500, and
overhead of €7,575.
Total: 15 man-months
FUPF: €14,050
9. Ethical issues
WP1: Higher level auditory functions underlying music perception: Innate vs. learned
operations
Prior to the experiment, written consent will be obtained from the subject (adult
subjects) or the parent (neonates) after the goal, procedures, and potential risks of the
experiment, their rights, and data management issues are explained to them in detail. The
information will consist of a written part (included in the signed consent form) and
consultation with the experimenter and, in the case of the babies, with the physician. The parent
(or both parents) will be present at the experiment conducted on newborns. The subject (or the
parent present) can terminate the experiment at any point without the need to give a reason for
the termination.
Subjects will only be excluded from participation if they do not meet the pre-set health or
age criteria (i.e., no neurological diseases or hearing problems; 18-30 years of age for
adults; full-term birth for newborns). No gender or race criteria will be used,
though we will aim at an approximate balance between genders. (Previous research found no
sex- or race-related differences regarding the method used in the experiments.)
Experimenters (both for adults and for neonates) will receive training regarding all
ethical issues (subjects' rights, data management, etc.) as well as about the optimal ways to
communicate with subjects.
Data management
Information that allows subject identification will be treated according to the privacy act
of Hungary (i.e., not disclosed to anyone outside the research team directly involved in the
experiments and kept in safe records only for the time required by the evaluation of the results
and retention of records for the dissemination of the results). The experimental results, which
do not allow subject identification (since we do not collect genetic material or other
identifiable biological information), will be disseminated within the scientific community.
Subjects (or their parents, for newborns) will be given the opportunity to learn the results of the
experiment conducted on them (or their child).
Populations possibly benefiting from the results of the research
Beyond the benefits for basic science, results of the research on newborns may be later
applied to a) develop new screening methods for hearing deficits in newborns, b) provide
early corrective measures for hearing deficits, and c) monitor the effects of such corrective
measures. At this point, we see no gender or race issues involved; however, should a
normative database be set up on the basis of our research, these issues will have to be
considered.
WP2: Perception of music form
The work package includes perceptual experiments in young healthy adult subjects. In
conducting the experiments we will strictly adhere to the applicable national, EU-wide, and
international laws, treaties, and ethical guidelines.
Subject recruitment and rights
All subjects will volunteer for the experiments. They will be recruited from the student
population and they will be paid an hourly fee for their participation. Prior to the experiment,
written consent will be obtained from the subject after the goal, procedures, and potential
risks of the experiment, their rights, and data management issues are explained to them in detail.
The information will consist of a written part (included in the signed consent form) and
consultation with the experimenter. The subject can terminate the experiment at any point
without the need to give a reason for the termination.
Subjects will only be excluded from participation if they do not meet the pre-set health or
age criteria (i.e., no neurological diseases or hearing problems; 18-30 years of age).
No gender or race criteria will be used, though we will aim at an approximate
balance between genders.
Experimenters will receive training regarding all ethical issues (subjects' rights, data
management, etc.) as well as about the optimal ways to communicate with subjects.
Data management
Information that allows subject identification will be treated according to the privacy act
of the U.K. (i.e., not disclosed to anyone outside the research team directly involved in the
experiments and kept in safe records only for the time required by the evaluation of the results
and retention of records for the dissemination of the results). The experimental results, which
do not allow subject identification, will be disseminated within the scientific community.
Subjects will be given the opportunity to learn the results of the experiment conducted on them.
Populations possibly benefiting from the results of the research
Beyond the benefits for basic science, results of the research will be useful to those
interested in language learning deficits in suggesting new ways in which music could enhance
language learning.
Dr Denham will be responsible for overall coordination of the EmCAP project. She will
also be responsible for WP6 (Active perception, relative pitch and the emergence of tonality),
which will primarily involve theoretical work and computational modelling. She will
coordinate activities in WP7 (Theoretical insights into music cognition), principally the
organisation of the workshop and subsequent publication of a collection of theoretical papers
derived from the project. Dr Denham will devote 33% of her time to this project. In addition,
one postdoctoral fellow and a half-time research assistant will be appointed. The postdoctoral
fellow will be responsible for formulating and developing a neurobiologically realistic model
of relative pitch perception, and for collaborative work with researchers in WP3 to incorporate
the model of working memory into this process. The research assistant will carry out investigations
into active peripheral processing.
Relevant publications
Denham, S.L. (2005). "Dynamic Iterated Ripple Noise: further evidence for the importance
of temporal processing in auditory perception", BioSystems, 79(1-3),199-206.
Khurshid A, Denham SL (2004). "A Temporal Analysis Based Pitch Estimation System
for Noisy Speech with a Comparative Study of Performance of Recent Systems"
IEEE Transactions on Neural Networks, Vol. 15(5), 1112-1124.
Lanyon L J, Denham SL (2004) A model of object-based attention that guides active visual
search to behaviourally relevant locations. Lecture Notes in Computer Science,
Paletta L et al. (eds), Vol. 3368, 42-56.
Lanyon L J, Denham SL (2004). "A model of active visual search with object-based
attention guiding scan paths". Neural Networks, Vol. 17(5-6), 873-897
Lanyon L J, Denham SL (2004). "A biased competition computational model of spatial and
object-based attention mediating active visual search". Neurocomputing, Vol. 5860C, 655-662.
Denham SL (2003). Perception of the direction of frequency sweeps in moving ripple
noise stimuli, in Plasticity of the Central Auditory System and Processing of
Complex Acoustic Signals: Merzenich M, Syka S (eds.) Kluwer Plenum, New York,
273-278.
Packham ISJ & Denham SL (2003), "Visualisation Methods for Supporting the
Exploration of High Dimensional Problem Spaces in Engineering
Design", Proceedings of International Conference on Coordinated & Multiple Views
in Exploratory Visualization (CMV2003), Roberts J. (ed.), London, UK, 15 July
2003, IEEE Computer Society, pp. 2-13.
Denham SL (2001). "Cortical synaptic depression and auditory perception". In
Computational Models of Auditory Function, Greenberg S, Slaney M (ed.s), NATO
ASI Series, IOS Press, Amsterdam, 281-296.
Denham SL, Denham MJ (2001). "An investigation into the role of cortical synaptic
depression in auditory processing". In Emergent Neural Computational Architectures
based on Neuroscience, Wermter S, Austin J, Willshaw D (ed.s), Lecture Notes in
Artificial Intelligence, Springer, 494-506.
Borisyuk R, Denham MJ, Denham SL, Hoppensteadt F (1999). Computational models of
predictive and memory-related functions of the hippocampus, Reviews in the
Neurosciences, 10, 213-232.
McCabe SL, Denham MJ (1997). "A model of auditory streaming", J. Acoust. Soc. Am.,
101(3), 1611-1621.
Miranda ER, Kirby S and Todd P (2003). On Computational Models of the Evolution of
Music: From the Origins of Musical Taste to the Emergence of Grammars.
Contemporary Music Review, Vol. 22, No. 3, pp. 91-111.
Westerman G & Miranda ER (2003). Modelling the Development of Mirror Neurons for
Auditory-Motor Integration. Journal of New Music Research, Vol. 31, No. 4, pp.
367-375.
Miranda ER (2003). On the evolution of music in a society of self-taught digital creatures.
Digital Creativity, Vol. 14, No. 1, pp. 29-42.
Miranda ER (2003). On the Music of Emergent Behaviour: What can Evolutionary
Computation Bring to the Musician?, Leonardo, Vol. 36, No. 1, pp. 55-58.
Westerman G & Miranda ER (2002). Integrating Perception and Production in a Neural
Network Model, J. A. Bullinaria and W. Lowe (Eds.), Connectionist Models of
Cognition and Perception, Progress in Neural Processing Vol. 14. London: World
Scientific.
Miranda ER (2002). Emergent Sound Repertoires in Virtual Societies. Computer Music
Journal, Vol. 26, No. 2, pp. 77-90.
Miranda ER (2002). Mimetic Development of Intonation. In C. Anagnostopoulou, M.
Ferrand and A. Smaill (Eds.), Music and Artificial Intelligence, Lecture Notes in
Computer Science (LNAI 2445), pp. 107-118. Berlin: Springer Verlag.
Miranda ER (2002). Generating Source Streams for Extralinguistic Utterances. Journal of
the Audio Engineering Society (AES), Vol. 50, No. 3, pp. 165-172.
Miranda ER (2001). Automatic Sound Identification based on Prosodic Listening.
Proceedings of the 17th International Congress on Acoustics, Rome, Italy. Rome
(Italy): ICA.
Miranda ER (2001). Synthesising Prosody with Variable Resolution. Proceedings of the
110th Audio Engineering Society Convention, Amsterdam, The Netherlands. New
York (NY): AES.
Miranda ER (2001). Improved Synthesis of Ultra-Linguistic Utterances. SONY Research
Forum Technical Digests, Tokyo, Japan. Tokyo (Japan): SONY Corporation.
Deco G & Rolls E (2002). Attention and Working Memory: A Dynamical Model of
Neuronal Activity in the Prefrontal Cortex. European Journal of Neuroscience, 18,
2374-2390.
Corchs S & Deco G (2002). Large-scale Neural Model for Visual Attention: Integration of
Experimental Single Cell and fMRI Data. Cerebral Cortex, 12, 339-348.
Deco G & Rolls E (2002). Object-Based Visual Neglect: A Computational Hypothesis.
European Journal of Neuroscience, 16, 1994-2000.
Deco G, Pollatos O, Zihl J (2002). The Time Course of Selective Visual Attention: Theory
and Experiments. Vision Research, 42, 2925-2945
Deco G & Zihl J (2001). A Neurodynamical Model of Visual Attention: Feedback
Enhancement of Spatial Resolution in a Hierarchical System. Journal of
Computational Neuroscience, 10, 231-251.
Deco G & Zihl J (2001). Top-down Selective Visual Attention: A Neurodynamical
Approach. Visual Cognition, 8, 119-140.
Deco G & Schürmann B (1997). Information Transmission and Temporal Code in Central
Spiking Neurons. Physical Review Letters, 79, 4697-4700.
Participant 5: Prof Xavier Serra
Professor Dr Xavier Serra is the head of the Music Technology Group, Director of the
Audiovisual Institute (IUA) and Director of the Department of Technology of the Pompeu
Fabra University (UPF) in Barcelona, where he has been Professor since 1994. He holds a
Masters degree in Music from Florida State University (1983), a Ph.D. in Computer Music
from Stanford University (1989) and worked for two years as Chief Engineer in Yamaha
Music Technologies USA, Inc. His research interests are in sound analysis and synthesis for
music and other multimedia applications. Specifically, he is working with spectral models and
their application to synthesis, processing and high quality coding, as well as other music
related problems such as: sound source separation, performance analysis and content-based
retrieval of audio.
Dr. Serra is an editor for a number of international journals, a reviewer for several
international conferences and for the 6th Framework Programme of the European Commission,
and a member of a number of professional organisations; he is often invited to conferences and
workshops as a guest speaker. He is the principal investigator of more than 10 major research
projects funded by the European Commission and other public and private institutions. He has
more than 30 patents, most of them filed in Japan and the USA; he has published more
than 30 articles in international journals and proceedings of conferences and he has
contributed to several books. Dr. Serra also remains musically active, playing the
cello and teaching at the Escola Superior de Música de Catalunya (ESMUC), where he is the
head of the Department of Sonology.
Current international activities of Dr. Serra include being the coordinator of the
European project SIMAC (Semantic Interaction with Music Audio Contents), chair of the
ISMIR 2004 (5th International Conference on Music Information Retrieval) and research
chair of the ICMC 2005 (International Computer Music Conference).
Role in present project
Dr. Serra will be responsible for WP8 (Interactive music system), which will involve
the development of a music processing system (the music projector) to study the development
of internal musical codes, music expectancies and music cognition phenomena such as music
Winkler, I., Karmos, G., & Näätänen, R. (1996). Adaptive modeling of the unattended
acoustic environment reflected in the mismatch negativity event-related potential.
Brain Research, 742, 239-252.
Winkler, I., Kushnerenko, E., Horváth, J., Čeponienė, R., Fellman, V., Huotilainen, M.,
Näätänen, R., & Sussman, E. (2003). Newborn infants can organize the auditory
world. Proceedings of the National Academy of Sciences USA, 100, 1182-1185.
Winkler, I., Schröger, E., & Cowan, N. (2001). The role of large-scale perceptual
organization in the mismatch negativity event-related brain potential. Journal of
Cognitive Neuroscience, 13, 59-71.
Winkler, I., Sussman, E., Tervaniemi, M., Ritter, W., Horváth, J., & Näätänen, R. (2003).
Pre-attentive auditory context effects. Cognitive, Affective, & Behavioral
Neuroscience, 3(1), 57-77.
Winkler, I., Tervaniemi, M., & Näätänen, R. (1997). Two separate codes for missing
fundamental pitch in the auditory cortex. Journal of the Acoustical Society of
America, 102, 1072-1082.
van Zuijen, T.L., Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2004).
Pre-attentive grouping of sequential sounds - an event-related potential study
comparing musicians and non-musicians. Journal of Cognitive Neuroscience, 16,
331-338.
Honing H (2001). From time to time: The representation of timing and tempo. Computer
Music Journal, 25(3), 50-61.
Desain P & Honing H (2003). The formation of rhythmic categories and metric priming.
Perception, 32(3), 341-365.
Timmers R & Honing H (2002). On music performance, theories, measurement and
diversity. In M.A. Belardinelli (ed.). Cognitive Processing (International Quarterly
of Cognitive Sciences), 1-2, 1-19.
Desain P, Honing H, van Thienen H & Windsor WL (1998). Computational Modeling of
Music Cognition: Problem or Solution? Music Perception, 16(1), 151-166.
Desain P & Honing H (1998). A reply to S. W. Smoliar's "Modelling Musical Perception:
A Critical View". N. Griffith, & P. Todd (eds.), Musical Networks, Parallel
Distributed Perception and Performance, 111-114.
A.2 Sub-contracting
Each partner will obtain the required audit certificates through sub-contracts agreed
with recognised local auditors.
References
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
Middlebrooks, J.C., The acquisitive auditory cortex. Nat Neurosci, 2003. 6(11): p.
1122-3.
Fritz, J., et al., Rapid task-related plasticity of spectrotemporal receptive fields in
primary auditory cortex. Nat Neurosci, 2003. 6(11): p. 1216-23.
Drake, C. and D. Bertrand, The quest for universals in temporal processing in music.
Annals of the New York Academy of Sciences, 2001. 930: p. 17-27.
Trehub, S., Human Processing predispositions and musical universals. N.L. Wallin,
B. Merker, & S. Brown (Eds.), The origins of music (pp. 427-448). Cambridge
Massachusetts: The MIT press., 2000.
Imberty, M., The question of innate competencies in musical communication, in The
Origins of Music, N.L. Wallin, B. Merker, and S. Brown, Editors. 2000, The MIT
Press: Cambridge, MA. p. 449-462.
Picton, T.W., et al., Mismatch negativity: different water in the same river. Audiol
Neurootol, 2000. 5(3-4): p. 111-39.
Ntnen, R. and K. Alho, Mismatch negativity--the measure for central sound
representation accuracy. Audiol Neurootol, 1997. 2(5): p. 341-53.
Winkler, I., Change detection in complex auditory environment: beyond the oddball
paradigm, in Detection of change: Event-related potential and fMRI findings, J.
Polich, Editor. 2003, Kluwer Academic Publishers: Boston. p. 61-81.
Paavilainen, P., et al., Preattentive extraction of abstract feature conjunctions from
auditory stimulation as reflected by the mismatch negativity (MMN).
Psychophysiology, 2001. 38(2): p. 359-65.
Winkler, I., G. Karmos, and R. Ntnen, Adaptive modeling of the unattended
acoustic environment reflected in the mismatch negativity event-related potential.
Brain Res, 1996. 742(1-2): p. 239-52.
Ntnen, R. and I. Winkler, The concept of auditory stimulus representation in
cognitive neuroscience. Psychol Bull, 1999. 125(6): p. 826-59.
Ntnen, R., The role of attention in auditory information processing as revealed by
event-related potentials and other brain measures of cognitive function. Behavioral
and Brain Sciences, 1990. 13: p. 201-288.
Sussman, E., I. Winkler, and W.J. Wang, MMN and attention: Competition for
deviance detection. Psychophysiology, 2003. 40: p. 430-435.
Alho, K., et al., Event-related brain potential of human newborns to pitch change of
an acoustic stimulus. Electroencephalogr Clin Neurophysiol, 1990. 77(2): p. 151-5.
Kurtzberg, D., et al., Developmental studies and clinical application of mismatch
negativity: problems and prospects. Ear Hear, 1995. 16(1): p. 105-17.
Leppnen, P.H.T., K.M. Eklund, and H. Lyytinen, Event-related brain potentials to
change in rapidly presented acoustic stimuli in newborns. Developmental
Neuropsychology, 1997. 13: p. 175-204.
Winkler, I., et al., Newborn infants can organize the auditory world. Proc Natl Acad
Sci U S A, 2003. 100: p. 1182-1185.
Paavilainen, P., et al., Neuronal populations in the human brain extracting invariant
relationships from acoustic variance. Neurosci Lett, 1999. 265(3): p. 179-82.
Imada, T., et al. Mismatch fields evoked by a rhythm passage. in The 9th International
Conference on Biomagnetism. 1993. Vienna.
Nordby, H., W.T. Roth, and A. Pfefferbaum, Event?related potentials to time?deviant
and pitch?deviant tones. Psychophysiology, 1988. 25: p. 249?261.
013123 (EmCAP) Annex I, vers. 3 (27/05/05) Approved by EC on 1 June 2005 page 93 of 101
21. Winkler, I. and E. Schröger, Neural representation for the temporal structure of sound patterns. Neuroreport, 1995. 6: p. 690-694.
22. Sussman, E., W. Ritter, and H.G. Vaughan, Jr., Predictability of stimulus deviance and the mismatch negativity. Neuroreport, 1998. 9(18): p. 4167-70.
23. Näätänen, R., et al., Development of a memory trace for a complex sound in the human brain. Neuroreport, 1993. 4: p. 503-506.
24. Kraus, N., et al., Neurophysiologic bases of speech discrimination. Ear Hear, 1995. 16(1): p. 19-37.
25. Brattico, E., R. Näätänen, and M. Tervaniemi, Context effects on pitch perception in musicians and non-musicians: Evidence from ERP recordings. Music Perception, 2002. 19: p. 1-24.
26. Koelsch, S., E. Schröger, and M. Tervaniemi, Superior attentive and pre-attentive auditory processing in musicians. NeuroReport, 1999. 10: p. 1309-1313.
27. van Zuijen, T.L., et al., Grouping of sequential sounds--an event-related potential study comparing musicians and nonmusicians. J Cogn Neurosci, 2004. 16(2): p. 331-8.
28. Schwartz, D.A., C.Q. Howe, and D. Purves, The statistical structure of human speech sounds predicts musical universals. J Neurosci, 2003. 23(18): p. 7160-8.
29. Schwartz, D.A. and D. Purves, Pitch is determined by naturally occurring periodic sounds. Hearing Research, 2004. 194(1-2): p. 31-46.
30. Warrier, C.M. and R.J. Zatorre, Influence of tonal context and timbral variation on perception of pitch. Percept Psychophys, 2002. 64(2): p. 198-207.
31. Riotte, A., Quelques réflexions sur le contrôle formel du timbre [Some reflections on the formal control of timbre], in Timbre: Métaphore pour la Composition, J.-B. Barrière, Editor. 1991, Christian Bourgois: Paris.
32. Smalley, D., Spectromorphology: Explaining sound-shapes. Organised Sound, 1997. 2(2): p. 107-126.
33. Lerdahl, F., Les hiérarchies de timbres [The hierarchies of timbres], in Timbre: Métaphore pour la Composition, J.-B. Barrière, Editor. 1991, Christian Bourgois: Paris.
34. Peretz, I. and R.J. Zatorre, The cognitive neuroscience of music. 2003, Oxford: Oxford University Press.
35. Rolls, E.T. and G. Deco, Computational neuroscience of vision. 2002, Oxford: Oxford University Press.
36. Husain, F.T., et al., Relating neuronal dynamics for auditory object processing to neuroimaging activity: a computational modeling and an fMRI study. Neuroimage, 2004. 21(4): p. 1701-20.
37. Deco, G. and J. Zihl, A neurodynamical model of visual attention: feedback enhancement of spatial resolution in a hierarchical system. J Comput Neurosci, 2001. 10(3): p. 231-53.
38. Deco, G. and T.S. Lee, A unified model of spatial and object attention based on inter-cortical biased competition. Neurocomputing, 2002. 44-46: p. 775-781.
39. Corchs, S. and G. Deco, Feature-based attention in human visual cortex: simulation of fMRI data. Neuroimage, 2004. 21(1): p. 36-45.
40. Deco, G. and E.T. Rolls, Attention and working memory: a dynamical model of neuronal activity in the prefrontal cortex. Eur J Neurosci, 2003. 18(8): p. 2374-90.
41. Deco, G., E.T. Rolls, and B. Horwitz, "What" and "where" in visual working memory: a computational neurodynamical perspective for integrating fMRI and single-neuron data. J Cogn Neurosci, 2004. 16(4): p. 683-701.
42. Deco, G. and E.T. Rolls, A neurodynamical cortical model of visual attention and invariant object recognition. Vision Res, 2004. 44(6): p. 621-42.
43. Corchs, S. and G. Deco, Large-scale neural model for visual attention: integration of experimental single-cell and fMRI data. Cereb Cortex, 2002. 12(4): p. 339-48.
44. Deco, G. and E.T. Rolls, Object-based visual neglect: a computational hypothesis. Eur J Neurosci, 2002. 16(10): p. 1994-2000.
45. Deco, G., O. Pollatos, and J. Zihl, The time course of selective visual attention: theory and experiments. Vision Res, 2002. 42(27): p. 2925-45.
46. Heinke, D., et al., A computational neuroscience account of visual neglect. Neurocomputing, 2002. 44-46: p. 811-816.
47. Brunel, N. and X.J. Wang, Effects of neuromodulation in a cortical network model of object working memory dominated by recurrent inhibition. J Comput Neurosci, 2001. 11(1): p. 63-85.
48. Hodgkin, A.L. and A.F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol, 1952. 117(4): p. 500-44.
49. Szabo, M., et al., Cooperation and biased competition model can explain attentional filtering in the prefrontal cortex. Eur J Neurosci, 2004. 19(7): p. 1969-77.
50. Meddis, R. and L.P. O'Mard, DSAM: Development System for Auditory Modelling. Centre for the Neural Basis of Hearing, Essex University. http://www.essex.ac.uk/psychology/hearinglab/dsam.
51. Bregman, A.S., Auditory Scene Analysis. 1990, Cambridge, MA: MIT Press.
52. Carlyon, R.P., et al., Effects of attention and unilateral neglect on auditory stream segregation. J Exp Psychol Hum Percept Perform, 2001. 27(1): p. 115-27.
53. McCabe, S.L. and M. Denham, A model of auditory streaming. J Acoust Soc Am, 1997. 101(3): p. 1611-1621.
54. Wrigley, S. and G. Brown, A neural oscillator model of auditory attention. Lecture Notes in Computer Science, 2001: p. 1163-1170.
55. Wrigley, S. and G. Brown, A neural oscillator model for auditory selective attention, in Advances in Neural Information Processing Systems 14, T.G. Dietterich, S. Becker, and Z. Ghahramani, Editors. 2002, MIT Press.
56. Beauvois, M.W. and R. Meddis, A computer model of auditory stream segregation. Quarterly Journal of Experimental Psychology, 1991. 43A(3): p. 517-541.
57. Beauvois, M.W. and R. Meddis, Computer simulation of auditory stream segregation in alternating-tone sequences. J Acoust Soc Am, 1996. 99(4 Pt 1): p. 2270-80.
58. Moran, J. and R. Desimone, Selective attention gates visual processing in the extrastriate cortex. Science, 1985. 229(4715): p. 782-4.
59. Chelazzi, L., et al., A neural basis for visual search in inferior temporal cortex. Nature, 1993. 363(6427): p. 345-7.
60. Chelazzi, L., Serial attention mechanisms in visual search: a critical look at the evidence. Psychol Res, 1999. 62(2-3): p. 195-219.
61. Duncan, J., Cooperating brain systems in selective perception and action, in Attention and Performance XVI, T. Inui and J.L. McClelland, Editors. 1996, MIT Press: Cambridge, MA. p. 433-458.
62. Reynolds, J.H. and R. Desimone, The role of neural mechanisms of attention in solving the binding problem. Neuron, 1999. 24(1): p. 19-29, 111-25.
63. Spangler, K.M. and W.B. Warr, The descending auditory system, in Neurobiology of Hearing: The Central Auditory System, Altschuler, et al., Editors. 1991, Raven Press.
64. Suga, N., et al., The corticofugal system for hearing: recent progress. Proc Natl Acad Sci U S A, 2000. 97(22): p. 11807-14.
65. Maison, S., C. Micheyl, and L. Collet, Influence of focused auditory attention on cochlear activity in humans. Psychophysiology, 2001. 38(1): p. 35-40.
66. Kowalski, N., D.A. Depireux, and S.A. Shamma, Analysis of dynamic spectra in ferret primary auditory cortex. I. Characteristics of single-unit responses to moving ripple spectra. J Neurophysiol, 1996. 76(5): p. 3503-23.
67. deCharms, R.C., D.T. Blake, and M.M. Merzenich, Optimizing sound features for cortical neurons. Science, 1998. 280(5368): p. 1439-43.
68. Klein, D.J., et al., Robust spectrotemporal reverse correlation for the auditory system: optimizing stimulus design. J Comput Neurosci, 2000. 9(1): p. 85-111.
69. Miller, L.M., et al., Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. J Neurophysiol, 2002. 87(1): p. 516-27.
70. Linden, J.F., et al., Spectrotemporal structure of receptive fields in areas AI and AAF of mouse auditory cortex. J Neurophysiol, 2003. 90(4): p. 2660-75.
71. Machens, C.K., M.S. Wehr, and A.M. Zador, Linearity of cortical receptive fields measured with natural sounds. J Neurosci, 2004. 24(5): p. 1089-100.
72. Zhang, L.I., S. Bao, and M.M. Merzenich, Persistent and specific influences of early acoustic environments on primary auditory cortex. Nat Neurosci, 2001. 4(11): p. 1123-30.
73. Zhang, L.I., S. Bao, and M.M. Merzenich, Disruption of primary auditory cortex by synchronous auditory inputs during a critical period. Proc Natl Acad Sci U S A, 2002. 99(4): p. 2309-14.
74. Chang, E.F. and M.M. Merzenich, Environmental noise retards auditory cortical development. Science, 2003. 300(5618): p. 498-502.
75. Coath, M. and S.L. Denham, Robust sound classification through the representation of similarity using response fields derived from stimuli during early experience. Biological Cybernetics, 2004, submitted.
76. Kral, A., et al., Congenital auditory deprivation reduces synaptic activity within the auditory cortex in a layer-specific manner. Cereb Cortex, 2000. 10(7): p. 714-726.
77. Kral, A., et al., Postnatal cortical development in congenital auditory deprivation. Cereb Cortex, 2004: p. bhh156.
78. Jones, M.R., Time, our lost dimension. Psychol Rev, 1976. 83(5): p. 323-55.
79. Coull, J.T. and A.C. Nobre, Where and when to pay attention: the neural systems for directing attention to spatial locations and to time intervals as revealed by both PET and fMRI. J Neurosci, 1998. 18(18): p. 7426-35.
80. Coull, J.T., et al., Functional anatomy of the attentional modulation of time estimation. Science, 2004. 303(5663): p. 1506-8.
81. Jones, M.R., et al., Temporal aspects of stimulus-driven attending in dynamic arrays. Psychol Sci, 2002. 13(4): p. 313-9.
82. Michon, J.A. and J.L. Jackson, Time, Mind and Behaviour. 1985, Berlin: Springer.
83. Desain, P. and H. Honing, Music, Mind and Machine: Studies in Computer Music, Music Cognition and Artificial Intelligence. 1992, Amsterdam: Thesis Publishers.
84. Clarke, E.F., Rhythm and timing in music, in Psychology of Music, 2nd edition, D. Deutsch, Editor. 1999, Academic Press: New York. p. 473-500.
85. Lerdahl, F. and R. Jackendoff, A Generative Theory of Tonal Music. 1983, Cambridge, MA: MIT Press.
86. Povel, D.J. and P. Essens, Perception of temporal patterns. Music Perception, 1985. 2(4): p. 411-440.
87. Honing, H., From time to time: The representation of timing and tempo. Computer Music Journal, 2001. 25(3): p. 50-61.
88. Honing, H., Structure and interpretation of rhythm and timing. Tijdschrift voor Muziektheorie [Dutch Journal of Music Theory], 2002. 7(3): p. 227-232.
89. Desain, P. and H. Honing, The formation of rhythmic categories and metric priming. Perception, 2003. 32(3): p. 341-65.
90. Honing, H., The final ritard: on music, motion and kinematic models. Computer Music Journal, 2003. 27(3): p. 66-72.
91. Darwin, C.J. and R.P. Carlyon, Auditory grouping, in Handbook of Perception and Cognition, Volume 6: Hearing, B.C.J. Moore, Editor. 1995, Academic Press: Orlando, FL. p. 387-424.
92. Warrier, C.M. and R.J. Zatorre, Right temporal cortex is critical for utilization of melodic contextual cues in a pitch constancy task. Brain, 2004. 127(Pt 7): p. 1616-25.
93. Tillmann, B. and E. Bigand, Further investigation of harmonic priming in long contexts using musical timbre as surface marker to control for temporal effects. Percept Mot Skills, 2004. 98(2): p. 450-8.
94. Zatorre, R.J., Absolute pitch: a model for understanding the influence of genes and development on neural and cognitive function. Nat Neurosci, 2003. 6(7): p. 692-5.
95. Krumhansl, C.L., Rhythm and pitch in music cognition. Psychol Bull, 2000. 126(1): p. 159-79.
96. Krumhansl, C.L. and E.J. Kessler, Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychol Rev, 1982. 89(4): p. 334-68.
97. Tramo, M.J., et al., Neurobiological foundations for the theory of harmony in western tonal music. Ann N Y Acad Sci, 2001. 930: p. 92-116.
98. Schwartz, D.A., C.Q. Howe, and D. Purves, The statistical structure of human speech sounds predicts musical universals. J Neurosci, 2003. 23(18): p. 7160-7168.
99. Nelken, I., et al., Primary auditory cortex of cats: feature detection or something else? Biol Cybern, 2003. 89(5): p. 397-406.
100. Griffiths, T.D., et al., Cortical processing of complex sound: a way forward? Trends Neurosci, 2004. 27(4): p. 181-5.
101. Langner, G., Periodicity coding in the auditory system. Hear Res, 1992. 60(2): p. 115-42.
102. Wiegrebe, L. and R. Meddis, The representation of periodic sounds in simulated sustained chopper units of the ventral cochlear nucleus. J Acoust Soc Am, 2004. 115(3): p. 1207-18.
103. Langner, G., et al., Frequency and periodicity are represented in orthogonal maps in the human auditory cortex: evidence from magnetoencephalography. Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 1997. 181(6): p. 665-676.
104. Krumbholz, K., et al., Neuromagnetic evidence for a pitch processing center in Heschl's gyrus. Cereb Cortex, 2003. 13(7): p. 765-72.
105. Griffiths, T.D., Functional imaging of pitch analysis. Ann N Y Acad Sci, 2003. 999: p. 40-9.
106. Patterson, R.D., et al., The processing of temporal pitch and melody information in auditory cortex. Neuron, 2002. 36(4): p. 767-76.
107. Warren, J.D., et al., Separating pitch chroma and pitch height in the human brain. Proc Natl Acad Sci U S A, 2003. 100(17): p. 10038-42.
108. Zatorre, R.J., A.C. Evans, and E. Meyer, Neural mechanisms underlying melodic perception and memory for pitch. J Neurosci, 1994. 14(4): p. 1908-19.
109. Huron, D., Foundations of Cognitive Musicology. 1999, Berkeley: University of California. [http://www.music-cog.ohio-state.edu/Music220/Bloch.lectures/]
110. Juslin, P. and J.E. Sloboda, Music and Emotion: Theory and Research. 2001, Oxford: Oxford University Press.
111.
112.
113.
114.
115.
116.
117.
118.
119.
120.
121.
122.
123.
124.
125.
126.
127.
128.
129.
130.
131. Lewicki, M.S., Efficient coding of natural sounds. Nat Neurosci, 2002. 5(4): p. 356-63.
132. Coath, M. and S.L. Denham, Robust sound classification through the representation of similarity using response fields derived from stimuli during early experience. Biological Cybernetics, 2004, submitted.
133. Davidson, R.J., C.J. Jackson, and C.L. Larson, Human electroencephalography, in Handbook of Psychophysiology (second edition), J.T. Cacioppo, L.G. Tassinary, and G.G. Bernston, Editors. 2000, Cambridge University Press: Cambridge. p. 27-52.
134. Winkler, I., et al., Preattentive auditory context effects. Cogn Affect Behav Neurosci, 2003. 3(1): p. 57-77.
135. Jacobsen, T., et al., Mismatch negativity to pitch change: varied stimulus proportions in controlling effects of neural refractoriness on human auditory event-related brain potentials. Neurosci Lett, 2003. 344(2): p. 79-82.
136. Messner, A.H., et al., Volunteer-based universal newborn hearing screening program. Int J Pediatr Otorhinolaryngol, 2001. 60(2): p. 123-30.
137. Karmel, B.Z., et al., Brain-stem auditory evoked responses as indicators of early brain insult. Electroencephalogr Clin Neurophysiol, 1988. 71(6): p. 429-42.
138. Friederici, A.D., M. Friedrich, and C. Weber, Neural manifestation of cognitive and precognitive mismatch detection in early infancy. Neuroreport, 2002. 13(10): p. 1251-4.
139. Sussman, E., et al., Top-down effects can modify the initially stimulus-driven auditory organization. Brain Res Cogn Brain Res, 2002. 13(3): p. 393-405.
140. de Boer, E., On the residue and auditory pitch perception, in Handbook of Sensory Physiology: Vol 3, W.D. Keidel and W.D. Neff, Editors. 1976, Springer: New York. p. 479-583.
141. Winkler, I., et al., From objective to subjective: pitch representation in the human auditory cortex. Neuroreport, 1995. 6(17): p. 2317-20.
142. Winkler, I., M. Tervaniemi, and R. Näätänen, Two separate codes for missing-fundamental pitch in the human auditory cortex. J Acoust Soc Am, 1997. 102(2 Pt 1): p. 1072-82.
143. Tervaniemi, M., I. Winkler, and R. Näätänen, Pre-attentive categorization of sounds by timbre as revealed by event-related potentials. Neuroreport, 1997. 8(11): p. 2571-4.
144. Paavilainen, P., et al., Neuronal populations in the human brain extracting invariant relationships from acoustic variance. Neurosci Lett, 1999. 265(3): p. 179-82.
145. Besson, M. and D. Schön, Comparison between language and music. Ann N Y Acad Sci, 2001. 930: p. 232-58.
146. Patel, A.D., et al., Processing syntactic relations in language and music: an event-related potential study. J Cogn Neurosci, 1998. 10(6): p. 717-33.
147. Maess, B., et al., Musical syntax is processed in Broca's area: an MEG study. Nat Neurosci, 2001. 4(5): p. 540-5.
148. Krumhansl, C.L. and R.N. Shepard, Quantification of the hierarchy of tonal functions within a diatonic context. Journal of Experimental Psychology, 1979. 5: p. 579-594.
149. Deutsch, D., Octave generalization of specific interference effects in memory for tonal pitch. Percept Psychophys, 1973. 13: p. 271-275.
150. Lieberman, P., Uniquely Human: The Evolution of Speech, Thought and Selfless Behaviour. 1991, Cambridge, MA: Harvard University Press.
151. Nazzi, T., C. Floccia, and J. Bertoncini, Discrimination of pitch contour by neonates. Infant Behaviour, 1998. 21: p. 543-554.
152.
153.
154.
155.
156.
157.
158.
159.
160.
161.
162.
163.
164.
165.
166.
167.
168.
169.
170.
171.
172.
173.
174.
175.
176.
177.
178.
179.
180.
181.
182.
183.