The Psychology of Music
About this ebook

The Psychology of Music serves as an introduction to an interdisciplinary field of psychology that interprets music in terms of mental function: the ways in which we perceive, remember, create, perform, and respond to music. In particular, the book provides an overview of the perception of musical tones, discussing sound attributes such as loudness, pitch, and timbre, together with the interactions between these attributes. It also discusses the role of computational modeling in the psychological study of music, including models of pitch perception, grouping and voice separation, and harmonic analysis. The book further covers musical development in social and emotional contexts, and it presents ways in which music training can enhance an individual's singing ability. The book can be used as a reference source for perceptual and cognitive psychologists, neuroscientists, and musicians, and as a textbook for advanced courses in the psychology of music.
  • Encompasses the way the brain perceives, remembers, creates, and performs music
  • Contributions from top international researchers in the perception and cognition of music
  • Designed for use as a textbook for advanced courses in psychology of music
Language: English
Release date: October 29, 2012
ISBN: 9780123814616

    The Psychology of Music - Diana Deutsch

    Preface

    The aim of this book is to interpret musical phenomena in terms of mental function—to characterize the ways in which we perceive, remember, create, perform, and respond to music. The book is intended as a comprehensive reference source for perceptual and cognitive psychologists, neuroscientists, and musicians, as well as a textbook for advanced courses on the psychology of music.

    In 1982, when the first edition of The Psychology of Music was published, this interdisciplinary field was in its infancy. Music had no established position within psychology, and few music theorists acknowledged the relevance of empirical research. The book, which drew together the diverse and scattered literature that had accumulated over the previous decade, was written by a group of visionaries from different areas of scholarship—psychologists, neuroscientists, engineers, music theorists and composers—who were committed to establishing this new discipline.

    During the years since the first edition was published, the field has expanded rapidly, and there have been enormous strides in our understanding of the psychology of music, particularly since the publication of the second edition of this volume in 1999. This progress has been due in part to the development of computer technology, and more specifically to the availability of new software that has enabled researchers to generate, analyze and transform sounds with ease, precision and flexibility. Developments in neuroscience—in particular neuroimaging techniques—have led to an enormous increase in findings concerning the neuroanatomical substrates of musical processing. In addition, input from music theorists and composers continues to play a central role in addressing fundamental questions about the way we process musical structures.

    The massive development of research on the psychology of music has resulted in the recent publication of a number of highly readable books on the subject written for a general audience. Among these are Oliver Sacks’ Musicophilia, Philip Ball’s The Music Instinct, and Daniel Levitin’s This Is Your Brain on Music. William Thompson’s Music, Thought, and Feeling serves as an excellent textbook for undergraduate courses on the psychology of music. Other recently published and highly successful books include John Sloboda’s The Musical Mind, Aniruddh Patel’s Music, Language, and the Brain, and David Huron’s Sweet Anticipation. The present volume provides in-depth coverage of research findings and theories in the different subareas of the field, written by world-renowned authorities in these subareas.

    The volume opens with a chapter on The Perception of Musical Tones, by Andrew Oxenham (Chapter 1), which sets the stage for those that follow. Oxenham first reviews psychoacoustic methodology. Then drawing on behavioral and physiological evidence, together with theoretical models, he provides a thoughtful overview of findings concerning tone perception, particularly in musical contexts. Here we find discussions of loudness, pitch, and timbre, together with interactions between these attributes. Consonance, dissonance, and roughness are also explored, as are higher-level interactions that occur when multiple pitches are presented.

    The understanding of timbre perception is of central importance to composers of new music. In his interdisciplinary chapter Musical Timbre Perception (Chapter 2), Stephen McAdams provides a detailed exploration of research on timbre, particularly involving the multidimensional scaling of timbre spaces. Such spaces have been put to intriguing use, for example in defining and exploiting fine-grained relationships between timbres. McAdams also discusses the perceptual blending of instruments to create new timbres, as well as the use of timbre to organize events into coherent groupings and to achieve perceptual separations between groupings.

    Johan Sundberg’s provocative chapter on Perception of Singing (Chapter 3) addresses many puzzling questions. For example, how is it that we can hear a singer’s voice against a loud orchestral background? How are we able to identify sung vowels, even when these differ considerably from those of speech? How do we identify the gender and register of a particular singer even when the range of his or her voice is common to all singers and several registers? These questions are expertly addressed in the context of an overview of the acoustics of the singing voice.

    In Intervals and Scales (Chapter 4), William Thompson examines our sensitivity to pitch relationships in music, and to the musical scales that help us organize these relationships—issues that are essential to the understanding of music perception. The chapter addresses questions such as how musical intervals are processed by the auditory system, whether certain intervals have a special perceptual status, and why most music is organized around scales. One discussion of particular importance concerns the characteristics of scales that appear as cross-cultural universals, and those that appear to be culture-specific.

    The genesis of absolute pitch has intrigued musicians for centuries, and this is explored in Absolute Pitch (Deutsch, Chapter 5). Is it an inherited trait that becomes manifest as soon as the opportunity arises? Alternatively, can it be acquired at any time through extensive practice? Or does it depend on exposure to pitches in association with their names during a critical period early in life? These hypotheses are explored, and evidence for a strong tie with speech and language is discussed. The neuroanatomical substrates of absolute pitch are examined, as are relationships between this ability and other abilities.

    Consider what happens when we listen to a performance by an orchestra. The sounds that reach our ears are produced by many instruments playing in parallel. How does our auditory system sort out this mixture of sounds, so that we may choose to listen to a particular instrument, or to a particular melodic line? Grouping Mechanisms in Music (Deutsch, Chapter 6) examines this and related questions, drawing from perceptual and physiological studies, together with input from music theorists. It is also shown that listeners may perceptually reorganize what they hear, so that striking illusions result.

    The next chapter, on The Processing of Pitch Combinations (Deutsch, Chapter 7), explores how pitch is represented in the mind of the listener at different levels of abstraction. The chapter examines how listeners organize pitches in music so as to perceive coherent phrases, and it is argued that at the highest level of abstraction music is represented in the form of coherent patterns that are linked together as hierarchical structures. The chapter also surveys research on short-term memory for different features of tone, and explores a number of musical illusions that are related to speech.

    With the development of computer resources, computational modeling has assumed increasing importance in the field of music cognition—particularly in combination with behavioral and physiological studies. In Computational Models of Music Cognition (Chapter 8), David Temperley provides a thoughtful overview and evaluation of research in the field. He examines models of key and meter identification in detail. In addition, he discusses models of pitch perception, grouping and voice separation, and harmonic analysis. Models of music performance (including expressivity) are evaluated, as are models of musical experience. Finally, computer algorithms for music composition are considered.

    Research concerning temporal aspects of music perception and cognition has expanded considerably over the last decade. In Structure and Interpretation of Rhythm in Music (Chapter 9), Henkjan Honing provides an overview of findings concerning the perception of rhythm, meter, tempo, and timing, from both a music theoretic and a cognitive perspective. He also considers how listeners distill a discrete rhythmic pattern from a continuous series of intervals, and emphasizes that rhythms as they are perceived often deviate considerably from the temporal patterns that are presented. Related to this, the roles of context, expectations and long-term familiarity with the music are discussed.

    The performance of music draws on a multitude of complex functions, including the visual analysis of musical notations, translating these into motor acts, coordinating information from different sensory modalities, employing fine motor skills, and the use of auditory feedback. In Music Performance: Movement and Coordination (Chapter 10), Caroline Palmer addresses these issues, particularly focusing on recent work involving the use of new motion capture and video analysis techniques. She also considers research on ensemble playing, in particular how musicians conform the details of their performance to those of other members of the ensemble.

    Laurel Trainor and Erin Hannon, in Musical Development (Chapter 11), address fundamental issues concerning the psychology of music from a developmental perspective. Following a discussion of musical capacities at various stages of development, the authors consider innate and environmental influences, including the roles played by critical periods. They consider those aspects of musical processing that appear universal, and those that appear specific to particular cultures. They also review findings indicating that music and language have overlapping neurological substrates. As a related issue, the authors examine effects of musical training on linguistic and other cognitive abilities.

    Continuing with Music and Cognitive Abilities (Chapter 12), Glenn Schellenberg and Michael Weiss provide a detailed appraisal of associations between music and other cognitive functions. The chapter discusses cognitive ability immediately following listening to music (termed the Mozart effect), the effects of background music on cognitive function, and associations between musical training and various cognitive abilities. The authors provide evidence that musical training is associated with general intelligence, and more specifically with linguistic abilities. They argue, therefore, that musical processing is not solely the function of specialized modules, but also reflects general properties of the cognitive system.

    Isabelle Peretz, in The Biological Foundations of Music: Insights from Congenital Amusia (Chapter 13), stresses the opposing view—that musical ability is distinct from language, and is subserved primarily by specialized neural networks. Here she focuses on congenital amusia—a musical disability that cannot be attributed to mental retardation, deafness, lack of exposure, or brain damage after birth. She discusses evidence for an association of this condition with an unusual brain organization, and provides evidence that congenital amusia has a genetic basis.

    Relationships between musical ability and other abilities are further considered by Catherine Wan and Gottfried Schlaug, in Brain Plasticity Induced by Musical Training (Chapter 14). The authors point out that music lessons involve training a host of complex skills, including coordination of multisensory information with bimanual motor activity, development of fine motor skills, and the use of auditory feedback. They review findings showing effects of musical training on brain organization, and they focus on research in their laboratory that explores the therapeutic potential of music-based interventions in facilitating speech in chronic stroke patients with aphasia, and in autistic children.

    The reason why music invokes emotions has been the subject of considerable debate. In their chapter on Music and Emotion (Chapter 15) Patrik Juslin and John Sloboda provide a thoughtful overview of findings and theories in the field. They draw an important distinction between emotion as expressed in music, and emotion as induced in the listener, pointing out that there is no simple relation between the two. They hypothesize that many of the characteristics of musical communication can best be explained, at least in part, in terms of a code for expression of the basic emotional categories by the human voice.

    In Comparative Music Cognition: Cross-Species and Cross-Cultural Studies (Chapter 16), Aniruddh Patel and Steven Demorest address two issues of fundamental importance to the understanding of musical processing. First, which musical capacities are uniquely human, and which do we share with nonhuman species? In addressing this issue, the authors shed light on the evolution of musical abilities. The second issue concerns the enormous diversity of human music across cultures. Theories and research findings that are based on the music of a single tradition are in principle limited in their application. The authors present evidence that certain aspects of music cross cultural lines while others are culture-specific, so clarifying the scope of existing theory.

    The book concludes with Robert Gjerdingen’s Psychologists and Musicians: Then and Now (Chapter 17), which supplies an engaging and informative overview of past and present thinking about the psychology of music. In reviewing approaches to this subject over the centuries, Gjerdingen contrasts those that stress low-level factors such as the physiology of the inner ear with those that consider musical processing in terms of complex, high-order functions. The chapter includes intriguing biographical information concerning some of the notable contributors to the field, which is reflected in their formal writings about music and musical processing. The chapter also provides a critical overview of the psychology of music as it stands today.

    An interdisciplinary volume such as this one can only be considered a group endeavor, and I am grateful to all the authors, who have devoted so much time and thought to bringing the book to fruition. I am grateful to Nikki Levy and Barbara Makinster for their help, and am particularly grateful to Kirsten Chrisman, Publishing Director of Life Sciences Books at Elsevier, for her wise and effective guidance, and to Katie Spiller for her expertise and professionalism in producing the book.

    Diana Deutsch

    1

    The Perception of Musical Tones

    Andrew J. Oxenham

    Department of Psychology, University of Minnesota, Minneapolis

    I. Introduction

    A. What Are Musical Tones?

    The definition of a tone—a periodic sound that elicits a pitch sensation—encompasses the vast majority of musical sounds. Tones can be either pure—sinusoidal variations in air pressure at a single frequency—or complex. Complex tones can be divided into two categories, harmonic and inharmonic. Harmonic complex tones are periodic, with a repetition rate known as the fundamental frequency (F0), and are composed of a sum of sinusoids with frequencies that are all integer multiples, or harmonics, of the F0. Inharmonic complex tones are composed of multiple sinusoids that are not simple integer multiples of any common F0. Most musical instrumental or vocal tones are more or less harmonic but some, such as bell chimes, can be inharmonic.
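
    As a rough illustration of these definitions, the sketch below (in Python, using NumPy; the sampling rate, duration, and equal-amplitude harmonics are arbitrary choices, not values from the chapter) synthesizes a pure tone and a harmonic complex tone by summing sinusoids at integer multiples of a common F0.

```python
import numpy as np

def pure_tone(freq_hz, dur_s=1.0, fs=44100, amp=0.5):
    """Sinusoidal variation in (simulated) pressure at a single frequency."""
    t = np.arange(int(dur_s * fs)) / fs
    return amp * np.sin(2 * np.pi * freq_hz * t)

def harmonic_complex(f0_hz, n_harmonics=10, dur_s=1.0, fs=44100):
    """Sum of sinusoids at integer multiples (harmonics) of the fundamental
    frequency F0; the result is periodic with repetition rate F0."""
    t = np.arange(int(dur_s * fs)) / fs
    signal = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        # Equal-amplitude harmonics for simplicity (an arbitrary choice).
        signal += np.sin(2 * np.pi * k * f0_hz * t)
    return signal / n_harmonics

pt = pure_tone(440.0)          # a 440-Hz pure tone
hc = harmonic_complex(440.0)   # a harmonic complex tone with F0 = 440 Hz
```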

    B. Measuring Perception

    The physical attributes of a sound, such as its intensity and spectral content, can be readily measured with modern technical instrumentation. Measuring the perception of sound is a different matter. Gustav Fechner, a 19th-century German scientist, is credited with founding the field of psychophysics—the attempt to establish a quantitative relationship between physical variables (e.g., sound intensity and frequency) and the sensations they produce (e.g., loudness and pitch; Fechner, 1860). The psychophysical techniques that have been developed since Fechner’s time to tap into our perceptions and sensations (involving hearing, vision, smell, touch, and taste) can be loosely divided into two categories of measures, subjective and objective.

    The subjective measures typically require participants to estimate or produce magnitudes or ratios that relate to the dimension under study. For instance, in establishing a loudness scale, participants may be presented with a series of tones at different intensities and then asked to assign a number to each tone, corresponding to its loudness. This method of magnitude estimation thus produces a psychophysical function that directly relates loudness to sound intensity. Ratio estimation follows the same principle, except that participants may be presented with two sounds and then asked to judge how much louder (e.g., twice or three times) one sound is than the other. The complementary methods are magnitude production and ratio production. In these production techniques, the participants are required to vary the relevant physical dimension of a sound until it matches a given magnitude (number), or until it matches a specific ratio with respect to a reference sound. In the latter case, the instructions may be something like "adjust the level of the second sound until it is twice as loud as the first sound." All four techniques have been employed numerous times in attempts to derive appropriate psychophysical scales (e.g., Buus, Muesch, & Florentine, 1998; Hellman, 1976; Hellman & Zwislocki, 1964; Stevens, 1957; Warren, 1970).

    Other variations on these methods include categorical scaling and cross-modality matching. Categorical scaling involves asking participants to assign the auditory sensation to one of a number of fixed categories; following our loudness example, participants might be asked to select a category ranging from "very quiet" to "very loud" (e.g., Mauermann, Long, & Kollmeier, 2004). Cross-modality matching avoids the use of numbers by, for instance, asking participants to adjust the length of a line, or a piece of string, to match the perceived loudness of a tone (e.g., Epstein & Florentine, 2005). Although all these methods have the advantage of providing a more-or-less direct estimate of the relationship between the physical stimulus and the sensation, they also have a number of disadvantages. First, they are subjective and rely on introspection on the part of the subject. Perhaps because of this, they can be somewhat unreliable, variable across and within participants, and prone to various biases (e.g., Poulton, 1977).

    The other approach is to use an objective measure, where a right and wrong answer can be verified externally. This approach usually involves probing the limits of resolution of the sensory system, by measuring absolute threshold (the smallest detectable stimulus), relative threshold (the smallest detectable change in a stimulus), or masked threshold (the smallest detectable stimulus in the presence of another stimulus). There are various ways of measuring threshold, but most involve a forced-choice procedure, where the subject has to pick the interval that contains the target sound from a selection of two or more. For instance, in an experiment measuring absolute threshold, the subject might be presented with two successive time intervals, marked by lights; the target sound is played during one of the intervals, and the subject has to decide which one it was. One would expect performance to change with the intensity of the sound: at very low intensities, the sound will be completely inaudible, and so performance will be at chance (50% correct in a two-interval task); at very high intensities, the sound will always be clearly audible, so performance will be near 100%, assuming that the subject continues to pay attention. A psychometric function can then be derived, which plots the performance of a subject as a function of the stimulus parameter. An example of a psychometric function is shown in Figure 1, which plots percent correct as a function of sound pressure level.

    This type of forced-choice paradigm is usually preferable (although often more time-consuming) to more subjective measures, such as the method of limits, which is often used today to measure audiograms. In the method of limits, the intensity of a sound is decreased until the subject reports no longer being able to hear it, and then the intensity of the sound is increased until the subject again reports being able to hear it. The trouble with such measures is that they rely not just on sensitivity but also on criterion—how willing the subject is to report having heard a sound if he or she is not sure. A forced-choice procedure eliminates that problem by forcing participants to guess, even if they are unsure which interval contained the target sound.

    Clearly, testing the perceptual limits by measuring thresholds does not tell us everything about human auditory perception; a primary concern is that these measures are typically indirect—the finding that people can detect less than a 1% change in frequency does not tell us much about the perception of much larger musical intervals, such as an octave. Nevertheless, this approach has proved extremely useful in helping us to gain a deeper understanding of perception and its relation to the underlying physiology of the ear and brain.

    Figure 1 A schematic example of a psychometric function, plotting percent correct in a two-alternative forced-choice task against the sound pressure level of a test tone.
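
    To make the forced-choice logic concrete, the toy simulation below (a hypothetical illustration; the threshold and slope values are invented, and the cumulative-Gaussian form is just one common choice) models a two-interval forced-choice detection task in which performance sits at chance (50% correct) at low levels and approaches 100% at high levels, tracing out a psychometric function like the one in Figure 1.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def prob_correct(level_db, threshold_db=20.0, slope_db=3.0):
    """Proportion correct in a two-interval forced-choice detection task:
    chance (0.5) well below threshold, approaching 1.0 well above it."""
    # Cumulative-Gaussian detection probability, mapped to 2AFC proportion
    # correct via P(correct) = 0.5 + 0.5 * P(detect).
    p_detect = 0.5 * (1 + math.erf((level_db - threshold_db)
                                   / (slope_db * math.sqrt(2))))
    return 0.5 + 0.5 * p_detect

def simulate_block(level_db, n_trials=200):
    """Simulate a block of 2AFC trials at one level; return percent correct."""
    p = prob_correct(level_db)
    correct = rng.random(n_trials) < p
    return 100.0 * correct.mean()

# Sweeping the level traces out a psychometric function (cf. Figure 1).
for level in [10, 14, 18, 20, 22, 26, 30]:
    print(f"{level:2d} dB SPL: {simulate_block(level):5.1f}% correct")
```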

    Measures of reaction time, or response time (RT), have also been used to probe sensory processing. The two basic forms of response time are simple response time (SRT), where participants are instructed to respond as quickly as possible by pushing a single button once a stimulus is presented, and choice response time (CRT), where participants have to categorize the stimulus (usually into one of two categories) before responding (by pressing button 1 or 2).

    Although RT measures are more common in cognitive tasks, they also depend on some basic sound attributes, such as sound intensity, with higher intensity sounds eliciting faster reactions, measured using both SRTs (Kohfeld, 1971; Luce & Green, 1972) and CRTs (Keuss & van der Molen, 1982).

    Finally, measures of perception are not limited to the quantitative or numerical domain. It is also possible to ask participants to describe their percepts in words. This approach has clear applications when dealing with multidimensional attributes, such as timbre (see below, and Chapter 2 of this volume), but also has some inherent difficulties, as different people may use descriptive words in different ways.

    To sum up, measuring perception is a thorny issue that has many solutions, all with their own advantages and shortcomings. Perceptual measures remain a crucial systems-level analysis tool that can be combined in both human and animal studies with various physiological and neuroimaging techniques, to help us discover more about how the ears and brain process musical sounds in ways that elicit music’s powerful cognitive and emotional effects.

    II. Perception of Single Tones

    Although a single tone is a far cry from the complex combinations of sound that make up most music, it can be a useful place to start in order to make sense of how music is perceived and represented in the auditory system. The sensation produced by a single tone is typically divided into three categories—loudness, pitch, and timbre.

    A. Loudness

    The most obvious physical correlate of loudness is sound intensity (or sound pressure) measured at the eardrum. However, many other factors also influence the loudness of a sound, including its spectral content, its duration, and the context in which it is presented.

    1. Dynamic Range and the Decibel

    The human auditory system has an enormous dynamic range, with the lowest-intensity sound that is audible being about a factor of 1,000,000,000,000 less intense than the loudest sound that does not cause immediate hearing damage. This very large range is one reason why a logarithmic scale—the decibel or dB—is used to describe sound level. In these units, the dynamic range of hearing corresponds to about 120 dB. Sound intensity is proportional to the square of sound pressure, which is often described in terms of sound pressure level (SPL), using a reference pressure, P0, of 2 × 10⁻⁵ N·m⁻², or 20 μPa (micropascals), which is close to the average absolute threshold for medium-frequency pure tones in young normal-hearing individuals. The SPL of a given sound pressure, P1, is then defined as 20log10(P1/P0). A similar relationship exists between sound intensity and sound level, such that the level is given by 10log10(I1/I0). (The multiplier is now 10 instead of 20 because of the square-law relationship between intensity and pressure.) Thus, a sound level in decibels is always a ratio and not an absolute value.
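
    A brief numerical sketch of these definitions (assuming the 20 μPa reference given above) shows how sound pressures map onto dB SPL, and why a 1-dB step corresponds to roughly a 12% change in sound pressure:

```python
import math

P0 = 20e-6  # reference pressure: 20 micropascals (2 x 10^-5 N/m^2)

def spl_from_pressure(p_pascal):
    """Sound pressure level in dB SPL: 20*log10(P1/P0)."""
    return 20 * math.log10(p_pascal / P0)

def level_from_intensity_ratio(i1_over_i0):
    """Level difference in dB from an intensity ratio: 10*log10(I1/I0)."""
    return 10 * math.log10(i1_over_i0)

print(spl_from_pressure(20e-6))   # 0 dB SPL: at the reference pressure
print(spl_from_pressure(1.0))     # ~94 dB SPL: a pressure of 1 pascal
print(spl_from_pressure(20.0))    # 120 dB SPL: near the top of the dynamic range

# A 1-dB step corresponds to a pressure ratio of 10**(1/20), i.e., roughly
# a 12% change in sound pressure.
print(10 ** (1 / 20) - 1)         # ~0.12
```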

    The dynamic range of music depends on the music style. Modern classical music can have a very large dynamic range, from pianissimo passages on a solo instrument (roughly 45 dB SPL) to a full orchestra playing fortissimo (about 95 dB SPL), as measured in concert halls (Winckel, 1962). Pop music, which is often listened to in less-than-ideal conditions, such as in a car or on a street, generally has a much smaller dynamic range. Radio broadcast stations typically reduce the dynamic range even further using compression to make their signal as consistently loud as possible without exceeding the maximum peak amplitude of the broadcast channel, so that the end dynamic range is rarely more than about 10 dB.

    Our ability to discriminate small changes in level has been studied in great depth for a wide variety of sounds and conditions (e.g., Durlach & Braida, 1969; Jesteadt, Wier, & Green, 1977; Viemeister, 1983). As a rule of thumb, we are able to discriminate changes on the order of 1 dB—corresponding to a change in sound pressure of about 12%. The fact that the size of the just-noticeable difference (JND) of broadband sounds remains roughly constant when expressed as a ratio or in decibels is in line with the well-known Weber’s law, which states that the JND between two stimuli is proportional to the magnitude of the stimuli.

    In contrast to our ability to judge differences in sound level between two sounds presented one after another, our ability to categorize or label sound levels is rather poor. In line with Miller’s (1956) famous 7 plus or minus 2 postulate for information processing and categorization, our ability to categorize sound levels accurately is fairly limited and is subject to a variety of influences, such as the context of the preceding sounds. This may explain why the musical notation of loudness (in contrast to pitch) has relatively few categories between pianissimo and fortissimo—typically just six (pp, p, mp, mf, f, and ff).

    2. Equal Loudness Contours and the Loudness Weighting Curves

    There is no direct relationship between the physical sound level (in dB SPL) and the sensation of loudness. There are many reasons for this, but an important one is that loudness depends heavily on the frequency content of the sound. Figure 2 shows what are known as equal loudness contours. The basic concept is that two pure tones with different frequencies, but with levels that fall on the same loudness contour, have the same loudness. For instance, as shown in Figure 2, a pure tone with a frequency of 1 kHz and a level of 40 dB SPL has the same loudness as a pure tone with a frequency of 100 Hz and a level of about 64 dB SPL; in other words, a 100-Hz tone has to be 24 dB higher in level than a 40-dB SPL 1-kHz tone in order to be perceived as being equally loud. The equal loudness contours are incorporated into an international standard (ISO 226) that was initially established in 1961 and was last revised in 2003.

    Figure 2 The equal-loudness contours, taken from ISO 226:2003.

    Original figure kindly provided by Brian C. J. Moore.

    These equal loudness contours have been derived several times from painstaking psychophysical measurements, not always with identical outcomes (Fletcher & Munson, 1933; Robinson & Dadson, 1956; Suzuki & Takeshima, 2004). The measurements typically involve either loudness matching, where a subject adjusts the level of one tone until it sounds as loud as a second tone, or loudness comparisons, where a subject compares the loudness of many pairs of tones and the results are compiled to derive points of subjective equality (PSE). Both methods are highly susceptible to nonsensory biases, making the task of deriving a definitive set of equal loudness contours a challenging one (Gabriel, Kollmeier, & Mellert, 1997).

    The equal loudness contours provide the basis for the measure of loudness level, which has units of phons. The phon value of a sound is the dB SPL value of a 1-kHz tone that is judged to have the same loudness as the sound. So, by definition, a 40-dB SPL tone at 1 kHz has a loudness level of 40 phons. Continuing the preceding example, the 100-Hz tone at a level of about 64 dB SPL also has a loudness level of 40 phons, because it falls on the same equal loudness contour as the 40-dB SPL 1-kHz tone. Thus, the equal loudness contours can also be termed the equal phon contours.

    Although the actual measurements are difficult, and the results somewhat contentious, there are many practical uses for the equal loudness contours. For instance, in issues of community noise annoyance from rock concerts or airports, it is more useful to know about the perceived loudness of the sounds in question, rather than just their physical level. For this reason, an approximation of the 40-phon equal loudness contour is built into most modern sound level meters and is referred to as the A-weighted curve. A sound level that is quoted in dB (A) is an overall sound level that has been filtered with the inverse of the approximate 40-phon curve. This means that very low and very high frequencies, which are perceived as being less loud, are given less weight than the middle of the frequency range.
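
    For readers who want to experiment with the weighting itself, the sketch below uses the standard closed-form expression for the A-weighting curve (the constants come from the IEC 61672 definition, not from this chapter); it shows how very low and very high frequencies are given less weight than the middle of the range.

```python
import math

def a_weighting_db(f_hz):
    """Approximate A-weighting in dB, relative to the response at 1 kHz,
    using the standard closed-form expression (IEC 61672 constants)."""
    f2 = f_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.00

# Low and high frequencies receive less weight than the middle range.
for f in [100, 1000, 4000, 10000]:
    print(f"{f:5d} Hz: {a_weighting_db(f):6.1f} dB")
```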

    As with all useful tools, the A-weighted curve can be misused. Because it is based on the 40-phon curve, it is most suitable for low-level sounds; however, that has not prevented it from being used in measurements of much higher-level sounds, where a flatter filter would be more appropriate, such as that provided by the much-less-used C-weighted curve. The ubiquitous use of the dB (A) scale for all levels of sound therefore provides an example of a case where the convenience of a single-number measure (and one that minimizes the impact of difficult-to-control low frequencies) has outweighed the desire for accuracy.

    3. Loudness Scales

    Equal loudness contours and phons tell us about the relationship between loudness and frequency. They do not, however, tell us about the relationship between loudness and sound level. For instance, the phon, based as it is on the decibel scale at 1 kHz, says nothing about how much louder a 60-dB SPL tone is than a 30-dB SPL tone. The answer, according to numerous studies of loudness, is not twice as loud. There have been numerous attempts since Fechner’s day to relate the physical sound level to loudness. Fechner (1860), building on Weber’s law, reasoned that if JNDs were constant on a logarithmic scale, and if equal numbers of JNDs reflected an equal change in loudness, then loudness must be related logarithmically to sound intensity. Harvard psychophysicist S. S. Stevens disagreed, claiming that JNDs reflected noise in the auditory system, which did not provide direct insight into the function relating loudness to sound intensity (Stevens, 1957). Stevens’s approach was to use magnitude and ratio estimation and production techniques, as described in Section I of this chapter, to derive a relationship between loudness and sound intensity. He concluded that loudness (L) was related to sound intensity (I) by a power law:

    L = kI^α    (Eq. 1)

    where k is a constant and the exponent, α, has a value of about 0.3 at medium frequencies and for moderate and higher sound levels. This law implies that a 10-dB increase in level results in a doubling of loudness. At low levels, and at lower frequencies, the exponent is typically larger, leading to a steeper growth-of-loudness function. Stevens used this relationship to derive loudness units, called sones. By definition, 1 sone is the loudness of a 1-kHz tone presented at a level of 40 dB SPL; 2 sones is twice as loud, corresponding roughly to a 1-kHz tone presented at 50 dB SPL, and 4 sones corresponds to the same tone at about 60 dB SPL.
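
    The correspondence between sones, phons, and level implied by Stevens’s law can be written compactly. The sketch below is a simplification that assumes a 1-kHz tone at moderate-to-high levels, where loudness level in phons equals the level in dB SPL and loudness doubles for every 10-dB increase; it is not a full loudness model.

```python
def sones_from_phons(loudness_level_phons):
    """Stevens's sone scale: 1 sone at 40 phons, doubling every 10 phons
    (equivalent to loudness growing as intensity**0.3 in this range)."""
    return 2.0 ** ((loudness_level_phons - 40.0) / 10.0)

# For a 1-kHz tone at moderate levels, phons equal dB SPL, so 40, 50,
# and 60 dB SPL correspond to roughly 1, 2, and 4 sones.
for level_db in [40, 50, 60, 80]:
    print(f"{level_db} dB SPL at 1 kHz: about "
          f"{sones_from_phons(level_db):.0f} sone(s)")
```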

    Numerous studies have supported the basic conclusion that loudness can be related to sound intensity by a power law. However, in part because of the variability of loudness judgments, and the substantial effects of experimental methodology (Poulton, 1979), different researchers have found different values for the best-fitting exponent. For instance, Warren (1970) argued that presenting participants with several sounds to judge invariably results in bias. He therefore presented each subject with only one trial. Based on these single-trial judgments, Warren also derived a power law, but he found an exponent value of 0.5. This exponent value is what one might expect if loudness were inversely proportional to the distance of the sound source from the listener, corresponding to a 6-dB decrease in level for every doubling of distance. Yet another study, which tried to avoid bias effects by using the entire (100-dB) level range within each experiment, derived an exponent of only 0.1, implying a doubling of loudness for every 30-dB increase in sound level (Viemeister & Bacon, 1988).

    Overall, it is generally well accepted that the relationship between loudness and sound intensity can be approximated as a power law, although methodological issues and intersubject and intrasubject variability have made it difficult to derive a definitive and uncontroversial function relating the sensation to the physical variable.

    4. Partial Loudness and Context Effects

    Most sounds that we encounter, particularly in music, are accompanied by other sounds. This fact makes it important to understand how the loudness of a sound is affected by the context in which it is presented. In this section, we deal with two such situations, the first being when sounds are presented simultaneously, the second when they are presented sequentially.

    When two sounds are presented together, as in the case of two musical instruments or voices, they may partially mask each other, and the loudness of each may not be as great as if each sound were presented in isolation. The loudness of a partially masked sound is termed partial loudness (Moore, Glasberg, & Baer, 1997; Scharf, 1964; Zwicker, 1963). When a sound is completely masked by another, its loudness is zero, or a very small quantity. As its level is increased to above its masked threshold, it becomes audible, but its loudness is low—similar to that of the same sound presented in isolation but just a few decibels above its absolute threshold. As the level is increased further, the sound’s loudness increases rapidly, essentially catching up with its unmasked loudness once it is about 20 dB or more above its masked threshold.

    The loudness of a sound is also affected by the sounds that precede it. In some cases, loud sounds can enhance the loudness of immediately subsequent sounds (e.g., Galambos, Bauer, Picton, Squires, & Squires, 1972; Plack, 1996); in other cases, the loudness of the subsequent sounds can be reduced (Mapes-Riordan & Yost, 1999; Marks, 1994). There is still some debate as to whether separate mechanisms are required to explain these two phenomena (Arieh & Marks, 2003b; Oberfeld, 2007; Scharf, Buus, & Nieder, 2002). Initially, it was not clear whether the phenomenon of loudness recalibration—a reduction in the loudness of moderate-level sounds following a louder one—reflected a change in the way participants assigned numbers to the perceived loudness, or reflected a true change in the loudness sensation (Marks, 1994). However, more recent work has shown that choice response times to recalibrated stimuli change in a way that is consistent with physical changes in the intensity, suggesting a true sensory phenomenon (Arieh & Marks, 2003a).

    5. Models of Loudness

    Despite the inherent difficulties in measuring loudness, a model that can predict the loudness of arbitrary sounds is still a useful tool. The development of models of loudness perception has a long history (Fletcher & Munson, 1937; Moore & Glasberg, 1996, 1997; Moore et al., 1997; Moore, Glasberg, & Vickers, 1999; Zwicker, 1960; Zwicker, Fastl, & Dallmayr, 1984). Essentially all are based on the idea that the loudness of a sound reflects the amount of excitation it produces within the auditory system. Although a direct physiological test, comparing the total amount of auditory nerve activity in an animal model with the predicted loudness based on human studies, did not find a good correspondence between the two (Relkin & Doucet, 1997), the psychophysical models that relate predicted excitation patterns, based on auditory filtering and cochlear nonlinearity, to loudness generally provide accurate predictions of loudness in a wide variety of conditions (e.g., Chen, Hu, Glasberg, & Moore, 2011).

    Some models incorporate partial loudness predictions (Chen et al., 2011; Moore et al., 1997), others predict the effects of cochlear hearing loss on loudness (Moore & Glasberg, 1997), and others have been extended to explain the loudness of sounds that fluctuate over time (Chalupper & Fastl, 2002; Glasberg & Moore, 2002). However, none has yet attempted to incorporate context effects, such as loudness recalibration or loudness enhancement.

    B. Pitch

    Pitch is arguably the most important dimension for conveying music. Sequences of pitches form a melody, and simultaneous combinations of pitches form harmony—two foundations of Western music. There is a vast body of literature devoted to pitch research, from both perceptual and neural perspectives (Plack, Oxenham, Popper, & Fay, 2005). The clearest physical correlate of pitch is the periodicity, or repetition rate, of sound, although other dimensions, such as sound intensity, can have small effects (e.g., Verschuure & van Meeteren, 1975). For young people with normal hearing, pure tones with frequencies between about 20 Hz and 20 kHz are audible. However, only sounds with repetition rates between about 30 Hz and 5 kHz elicit a pitch percept that can be called musical and is strong enough to carry a melody (e.g., Attneave & Olson, 1971; Pressnitzer, Patterson, & Krumbholz, 2001; Ritsma, 1962). Perhaps not surprisingly, these limits, which were determined through psychoacoustical investigation, correspond quite well to the lower and upper limits of pitch found on musical instruments: the lowest and highest notes of a modern grand piano, which covers the ranges of all standard orchestral instruments, correspond to 27.5 Hz and 4186 Hz, respectively.
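
    The correspondence between musical notes and repetition rates can be illustrated with the standard equal-tempered tuning formula (A4 = 440 Hz, twelve semitones per octave; this is a general convention rather than something derived in the chapter), which also shows that the lowest and highest piano keys sit near the edges of the roughly 30 Hz to 5 kHz region over which pitch is strong enough to carry a melody.

```python
def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal-tempered frequency: A4 (MIDI note 69) = 440 Hz, and each
    semitone step is a factor of 2**(1/12), roughly a 6% change."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)

print(midi_to_hz(21))    # A0, lowest piano key:  27.5 Hz
print(midi_to_hz(69))    # A4, the orchestral A:  440.0 Hz
print(midi_to_hz(108))   # C8, highest piano key: ~4186 Hz
```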

    We tend to recognize patterns of pitches that form melodies (see Chapter 7 of this volume). We do this presumably by recognizing the musical intervals between successive notes (see Chapters 4 and 7 of this volume), and most of us seem relatively insensitive to the absolute pitch values of the individual notes, so long as the pitch relationships between notes are correct. However, exactly how the pitch is extracted from each note and how it is represented in the auditory system remain unclear, despite many decades of intense research.

    1. Pitch of Pure Tones

    Pure tones produce a clear, unambiguous pitch, and we are very sensitive to changes in their frequency. For instance, well-trained listeners can distinguish between two tones with frequencies of 1000 and 1002 Hz—a difference of only 0.2% (Moore, 1973). A semitone, the smallest step in the Western scale system, is a difference of about 6%, or about a factor of 30 greater than the JND of frequency for pure tones. Perhaps not surprisingly, musicians are generally better than nonmusicians at discriminating small changes in frequency; what is more surprising is that it does not take much practice for people with no musical training to catch up with musicians in terms of their performance. In a recent study, frequency discrimination abilities of trained classical musicians were compared with those of untrained listeners with no musical background, using both pure tones and complex tones (Micheyl, Delhommeau, Perrot, & Oxenham, 2006). Initially thresholds were about a factor of 6 worse for the untrained listeners. However, it took only between 4 and 8 hours of practice for the thresholds of the untrained listeners to match those of the trained musicians, whereas the trained musicians did not improve with practice. This suggests that most people are able to discriminate very fine differences in frequency with very little in the way of specialized training.

    Two representations of a pure tone at 440 Hz (the orchestral A) are shown in Figure 3. The upper panel shows the waveform—variations in sound pressure as a function of time—that repeats 440 times a second, and so has a period of 1/440 s, or about 2.27 ms. The lower panel provides the spectral representation, showing that the sound has energy only at 440 Hz. This spectral representation is for an ideal pure tone—one that has no beginning or end. In practice, spectral energy spreads above and below the frequency of the pure tone, reflecting the effects of onset and offset. These two representations (spectral and temporal) provide a good introduction to two ways in which pure tones are represented in the peripheral auditory system.

    Figure 3 Schematic diagram of the time waveform (upper panel) and power spectrum (lower panel) of a pure tone with a frequency of 440 Hz.

    The first potential code, known as the place code, reflects the mechanical filtering that takes place in the cochlea of the inner ear. The basilar membrane, which runs the length of the fluid-filled cochlea from the base to the apex, vibrates in response to sound. The responses of the basilar membrane are sharply tuned and highly specific: a certain frequency will cause only a local region of the basilar membrane to vibrate. Because of its structural properties, the apical end of the basilar membrane responds best to low frequencies, while the basal end responds best to high frequencies. Thus, every place along the basilar membrane has its own best frequency or characteristic frequency (CF)—the frequency to which that place responds most strongly. This frequency-to-place mapping, or tonotopic organization, is maintained throughout the auditory pathways up to primary auditory cortex, thereby providing a potential neural code for the pitch of pure tones.

    The second potential code, known as the temporal code, relies on the fact that action potentials, or spikes, generated in the auditory nerve tend to occur at a certain phase within the period of a sinusoid. This property, known as phase locking, means that the brain could potentially represent the frequency of a pure tone by way of the time intervals between spikes, when pooled across the auditory nerve. No data are available from the human auditory nerve, because of the invasive nature of the measurements, but phase locking has been found to extend to between 2 and 4 kHz in other mammals, depending somewhat on the species. Unlike tonotopic organization, phase locking up to high frequencies is not preserved in higher stations of the auditory pathways. At the level of the auditory cortex, the limit of phase locking reduces to at best 100 to 200 Hz (Wallace, Rutkowski, Shackleton, & Palmer, 2000). Therefore, most researchers believe that the timing code found in the auditory nerve must be transformed to some form of place or population code at a relatively early stage of auditory processing.

    There is some psychoacoustical evidence for both place and temporal codes. One piece of evidence in favor of a temporal code is that pitch discrimination abilities deteriorate at high frequencies: the JND between two frequencies becomes considerably larger at frequencies above about 4 to 5 kHz—the same frequency range above which listeners’ ability to recognize familiar melodies (Attneave & Olson, 1971), or to notice subtle changes in unfamiliar melodies (Oxenham, Micheyl, Keebler, Loper, & Santurette, 2011), degrades. This frequency is similar to the one just described in which phase locking in the auditory nerve is strongly degraded (e.g., Palmer & Russell, 1986; Rose, Brugge, Anderson, & Hind, 1967), suggesting that the temporal code is necessary for accurate pitch discrimination and for melody perception. It might even be taken as evidence that the upper pitch limits of musical instruments were determined by the basic physiological limits of the auditory nerve.

    Evidence for the importance of place information comes first from the fact that some form of pitch perception remains possible even with pure tones of very high frequency (Henning, 1966; Moore, 1973), where it is unlikely that phase locking information is useful (e.g., Palmer & Russell, 1986). Another line of evidence indicating that place information may be important comes from a study that used so-called transposed tones (van de Par & Kohlrausch, 1997) to present the temporal information that would normally be available only to a low-frequency region in the cochlea to a high-frequency region, thereby dissociating temporal from place cues (Oxenham, Bernstein, & Penagos, 2004). In that study, pitch discrimination was considerably worse when the low-frequency temporal information was presented to the wrong place in the cochlea, suggesting that place information is important.

    In light of this mixed evidence, it may be safest to assume that the auditory system uses both place and timing information from the auditory nerve in order to extract the pitch of pure tones. Indeed some theories of pitch explicitly require both accurate place and timing information (Loeb, White, & Merzenich, 1983). Gaining a better understanding of how the information is extracted remains an important research goal. The question is of particular clinical relevance, as deficits in pitch perception are a common complaint of people with hearing loss and people with cochlear implants. A clearer understanding of how the brain uses information from the cochlea will help researchers to improve the way in which auditory prostheses, such as hearing aids and cochlear implants, present sound to their users.

    2. Pitch of Complex Tones

    A large majority of musical sounds are complex tones of one form or another, and most have a pitch associated with them. Most common are harmonic complex tones, which are composed of the F0 (corresponding to the repetition rate of the entire waveform) and upper partials, harmonics, or overtones, spaced at integer multiples of the F0. The pitch of a harmonic complex tone usually corresponds to the F0. In other words, if a subject is asked to match the pitch of a complex tone to the pitch of a single pure tone, the best match usually occurs when the frequency of the pure tone is the same as the F0 of the complex tone. Interestingly, this is true even when the complex tone has no energy at the F0 or the F0 is masked (de Boer, 1956; Licklider, 1951; Schouten, 1940; Seebeck, 1841). This phenomenon has been given various terms, including pitch of the missing fundamental, periodicity pitch, residue pitch, and virtual pitch. The ability of the auditory system to extract the F0 of a sound is important from the perspective of perceptual constancy: imagine a violin note being played in a quiet room and then again in a room with a noisy air-conditioning system. The low-frequency noise of the air-conditioning system might well mask some of the lower-frequency energy of the violin, including the F0, but we would not expect the pitch (or identity) of the violin to change because of it.

    Although the ability to extract the periodicity pitch is clearly an important one, and one that is shared by many different species (Shofner, 2005), exactly how the auditory system extracts the F0 remains for the most part unknown. The initial stages in processing a harmonic complex tone are shown in Figure 4. The upper two panels show the time waveform and the spectral representation of a harmonic complex tone. The third panel depicts the filtering that occurs in the cochlea—each point along the basilar membrane can be represented as a band-pass filter that responds to only those frequencies close to its center frequency. The fourth panel shows the excitation pattern produced by the sound. This is the average response of the bank of band-pass filters, plotted as a function of the filters’ center frequency (Glasberg & Moore, 1990). The fifth panel shows an excerpt of the time waveform at the output of some of the filters along the array. This is an approximation of the waveform that drives the inner hair cells in the cochlea, which in turn synapse with the auditory nerve fibers to produce the spike trains that the brain must interpret.

    Figure 4 Representations of a harmonic complex tone with a fundamental frequency (F0) of 440 Hz. The upper panel shows the time waveform. The second panel shows the power spectrum of the same waveform. The third panel shows the auditory filter bank, representing the filtering that occurs in the cochlea. The fourth panel shows the excitation pattern, or the time-averaged output of the filter bank. The fifth panel shows some sample time waveforms at the output of the filter bank, including filters centered at the F0 and the fourth harmonic, illustrating resolved harmonics, and filters centered at the 8th and 12th harmonic of the complex, illustrating harmonics that are less well resolved and show amplitude modulations at a rate corresponding to the F0.

    Considering the lower two panels of Figure 4, it is possible to see a transition as one moves from the low-numbered harmonics on the left to the high-numbered harmonics on the right: The first few harmonics generate distinct peaks in the excitation pattern, because the filters in that frequency region are narrower than the spacing between successive harmonics. Note also that the time waveforms at the outputs of filters centered at the low-numbered harmonics resemble pure tones. At higher harmonic numbers, the bandwidths of the auditory filters become wider than the spacing between successive harmonics, and so individual peaks in the excitation pattern are lost. Similarly, the time waveform at the output of higher-frequency filters no longer resembles a pure tone, but instead reflects the interaction of multiple harmonics, producing a complex waveform that repeats at a rate corresponding to the F0.

    Harmonics that produce distinct peaks in the excitation pattern and/or produce quasi-sinusoidal vibrations on the basilar membrane are referred to as being resolved. Phenomenologically, resolved harmonics are those that can be heard out as separate tones under certain circumstances. Typically, we do not hear the individual harmonics when we listen to a musical tone, but our attention can be drawn to them in various ways, for instance by amplifying them or by switching them on and off while the other harmonics remain continuous (e.g., Bernstein & Oxenham, 2003; Hartmann & Goupell, 2006). The ability to resolve or hear out individual low-numbered harmonics as pure tones was already noted by Hermann von Helmholtz in his classic work, On the Sensations of Tone (Helmholtz, 1885/1954).

    The higher-numbered harmonics, which do not produce individual peaks of excitation and cannot typically be heard out, are often referred to as being unresolved. The transition between resolved and unresolved harmonics is thought to lie somewhere between the 5th and 10th harmonic, depending on various factors, such as the F0 and the relative amplitudes of the components, as well as on how resolvability is defined (e.g., Bernstein & Oxenham, 2003; Houtsma & Smurzynski, 1990; Moore & Gockel, 2011; Shackleton & Carlyon, 1994).
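
    A rough feel for why low-numbered harmonics are resolved while high-numbered ones are not can be obtained by comparing the spacing between harmonics (equal to the F0) with the equivalent rectangular bandwidth (ERB) of the auditory filters estimated by Glasberg and Moore (1990), ERB ≈ 24.7(4.37F/1000 + 1) Hz. In the sketch below, the criterion that the spacing must exceed about 1.25 ERBs is only an illustrative rule of thumb, not a definition from the chapter.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth of the auditory filter at center
    frequency f, in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def harmonic_resolvability(f0_hz, n_harmonics=15, spacing_criterion=1.25):
    """Crude resolvability check: treat a harmonic as 'resolved' if the
    harmonic spacing (= F0) exceeds spacing_criterion times the ERB at
    that harmonic's frequency. Illustrative rule of thumb only."""
    for k in range(1, n_harmonics + 1):
        f = k * f0_hz
        resolved = f0_hz > spacing_criterion * erb_hz(f)
        print(f"harmonic {k:2d} at {f:7.1f} Hz: "
              f"{'resolved' if resolved else 'unresolved'}")

# For F0 = 440 Hz, the transition falls around the 6th-7th harmonic,
# consistent with the 5th-10th harmonic range given in the text.
harmonic_resolvability(440.0)
```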

    Numerous theories and models have been devised to explain how pitch is extracted from the information present in the auditory periphery (de Cheveigné, 2005). As with pure tones, the theories can be divided into two basic categories—place and temporal theories. The place theories generally propose that the auditory system uses the lower-order, resolved harmonics to calculate the pitch (e.g., Cohen, Grossberg, & Wyse, 1995; Goldstein, 1973; Terhardt, 1974b; Wightman, 1973). This could be achieved by way of a template-matching process, with either hard-wired harmonic templates or templates that develop through repeated exposure to harmonic series, which eventually become associated with the F0. Temporal theories typically involve evaluating the time intervals between auditory-nerve spikes, using a form of autocorrelation or all-interval spike histogram (Cariani & Delgutte, 1996; Licklider, 1951; Meddis & Hewitt, 1991; Meddis & O’Mard, 1997; Schouten, Ritsma, & Cardozo, 1962). This information can be obtained from both resolved and unresolved harmonics. Pooling these spikes from across the nerve array results in a dominant interval emerging that corresponds to the period of the waveform (i.e., the reciprocal of the F0). A third alternative involves using both place and temporal information. In one version, coincident timing between neurons with harmonically related CFs is postulated to lead to a spatial network of coincidence detectors—a place-based template that emerges through coincident timing information (Shamma & Klein, 2000). In another version, the impulse-response time of the auditory filters, which depends on the CF, is postulated to determine the range of periodicities that a certain tonotopic location can code (de Cheveigné & Pressnitzer, 2006). Recent physiological studies have supported at the least the plausibility of place-time mechanisms to code pitch (Cedolin & Delgutte, 2010).
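
    The temporal (autocorrelation) idea can be illustrated with a toy example: computing the autocorrelation of a waveform and picking the lag of the largest peak within a plausible pitch range recovers the period 1/F0, even when there is no energy at the F0 itself. The sketch below operates on the raw waveform rather than on simulated auditory-nerve spike trains, so it is only a schematic stand-in for the models cited above.

```python
import numpy as np

fs = 16000  # sampling rate in Hz (arbitrary choice)

def missing_fundamental(f0_hz, harmonics=(3, 4, 5), dur_s=0.1):
    """Harmonic complex containing only upper harmonics (no energy at F0)."""
    t = np.arange(int(dur_s * fs)) / fs
    return sum(np.sin(2 * np.pi * k * f0_hz * t) for k in harmonics)

def estimate_f0_autocorr(x, fmin=50.0, fmax=1000.0):
    """Pick the autocorrelation peak within a plausible pitch range and
    return the corresponding repetition rate (a schematic temporal model)."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags >= 0
    lo, hi = int(fs / fmax), int(fs / fmin)
    best_lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / best_lag

x = missing_fundamental(200.0)    # components at 600, 800, and 1000 Hz only
print(estimate_f0_autocorr(x))    # ~200 Hz: the 'missing fundamental'
```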

    Distinguishing between place and temporal (or place-time) models of pitch has proved very difficult. In part, this is because spectral and temporal representations of a signal are mathematically equivalent: any change in the spectral representation will automatically lead to a change in the temporal representation, and vice versa. Psychoacoustic attempts to distinguish between place and temporal mechanisms have focused on the limits imposed by the peripheral physiology in the cochlea and auditory nerve. For instance, the limits of frequency selectivity can be used to test the place theory: if all harmonics are clearly unresolved (and therefore providing no place information) and a pitch is still heard, then pitch cannot depend solely on place information. Similarly, the putative limits of phase-locking can be used: if the periodicity of the waveform and the frequencies of all the resolved harmonics are all above the limit of phase locking in the auditory nerve and a pitch is still heard, then temporal information is unlikely to be necessary for pitch perception.

    A number of studies have shown that pitch perception is possible even when harmonic tone complexes are filtered to remove all the low-numbered, resolved harmonics (Bernstein & Oxenham, 2003; Houtsma & Smurzynski, 1990; Kaernbach & Bering, 2001; Shackleton & Carlyon, 1994). A similar conclusion was reached by studies that used amplitude-modulated broadband noise, which has no spectral peaks in its long-term spectrum (Burns & Viemeister, 1976, 1981). These results suggest that pitch can be extracted from temporal information alone, thereby ruling out theories that consider only place coding. However, the pitch sensation produced by unresolved harmonics or modulated noise is relatively weak compared with the pitch of musical instruments, which produce full harmonic complex tones.

    The more salient pitch that we normally associate with music is provided by the lower-numbered resolved harmonics. Studies that have investigated the relative contributions of individual harmonics have found that harmonics 3 to 5 (Moore, Glasberg, & Peters, 1985), or frequencies around 600 Hz (Dai, 2000), seem to have the most influence on the pitch of the overall complex. This is where current temporal models also encounter some difficulty: they can extract the F0 of a complex tone just as accurately from unresolved harmonics as from resolved harmonics, and therefore they do not predict the large difference in pitch salience and accuracy between low- and high-numbered harmonics that is observed in psychophysical studies (Carlyon, 1998). In other words, place models do not predict good enough performance with unresolved harmonics, whereas temporal models predict performance that is too good. The apparent qualitative and quantitative difference in the pitch produced by low-numbered and high-numbered harmonics has led to the suggestion that there may be two pitch mechanisms at work, one coding the temporal envelope repetition rate from high-numbered harmonics and one coding the F0 from the individual low-numbered harmonics (Carlyon & Shackleton, 1994), although subsequent work has questioned some of the evidence proposed for the two mechanisms (Gockel, Carlyon, & Plack, 2004; Micheyl & Oxenham, 2003).

    The fact that low-numbered, resolved harmonics are important suggests that place coding may play a role in everyday pitch perception. Further evidence comes from a variety of studies. The study mentioned earlier, in which low-frequency temporal information was transposed into a high-frequency range (Oxenham et al., 2004), also examined complex-tone pitch perception by transposing the information from harmonics 3, 4, and 5 of a 100-Hz F0 to high-frequency regions of the cochlea (roughly 4 kHz, 6 kHz, and 10 kHz). If temporal information were sufficient to elicit a periodicity pitch, then listeners should have been able to hear a pitch corresponding to 100 Hz. In fact, none of the listeners reported hearing a low pitch or was able to match the pitch of the transposed tones to that of the missing fundamental. This suggests that, if temporal information is used, it may need to be presented to the correct place along the cochlea.
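
    For concreteness, the sketch below constructs a stimulus of the general "transposed" type described above: a high-frequency carrier amplitude-modulated by a half-wave rectified low-frequency sinusoid, so that the temporal pattern of a low frequency is delivered to a high-frequency place in the cochlea. It is a simplified illustration; the stimuli used in the actual experiments involved additional steps, such as low-pass filtering of the envelope, that are omitted here.

        import numpy as np

        fs = 48000                                  # sampling rate in Hz (arbitrary)
        t = np.arange(0, 0.2, 1.0 / fs)

        def transposed_tone(f_low: float, f_carrier: float) -> np.ndarray:
            """High-frequency carrier modulated by a half-wave rectified low-frequency
            sinusoid (simplified; no envelope smoothing or level calibration)."""
            envelope = np.maximum(np.sin(2 * np.pi * f_low * t), 0.0)
            return envelope * np.sin(2 * np.pi * f_carrier * t)

        # Temporal patterns of the 3rd, 4th, and 5th harmonics of a 100-Hz F0,
        # carried at roughly 4, 6, and 10 kHz, as in the study described above.
        stim = sum(transposed_tone(f_low, f_c)
                   for f_low, f_c in [(300.0, 4000.0), (400.0, 6000.0), (500.0, 10000.0)])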

    Another line of evidence has come from revisiting the early conclusion that no pitch is heard when all the harmonics are above about 5 kHz (Ritsma, 1962). That finding led researchers to suggest that timing information was crucial and that, at frequencies above the limits of phase locking, periodicity pitch was not perceived. A more recent study found that, in fact, listeners were well able to hear pitches corresponding to F0s between 1 and 2 kHz, even when all the harmonics were filtered to lie above 6 kHz and were sufficiently resolved to ensure that no temporal envelope cues were available (Oxenham et al., 2011). This outcome leads to an interesting dissociation: tones above 6 kHz on their own do not produce a musically useful pitch, yet the same tones combined with others in a harmonic series can produce a musical pitch sufficient to convey a melody. The results suggest that the upper limit of musical pitch may not in fact be explained by the upper limit of phase locking: the fact that pitch can be heard even when all the tones are above 5 kHz suggests either that temporal information is not necessary for musical pitch or that usable phase locking in the human auditory nerve extends to much higher frequencies than currently believed (Heinz, Colburn, & Carney, 2001; Moore & Sęk, 2009).
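
    A stimulus of the kind used in that demonstration is straightforward to sketch: choose an F0 in the 1-2 kHz range and keep only the harmonics that lie above 6 kHz. The fragment below is a bare-bones illustration; the actual experiments included further controls, such as background noise to mask distortion products, that are not modelled here.

        import numpy as np

        fs = 48000
        f0 = 1400.0                              # illustrative F0 in the 1-2 kHz range
        t = np.arange(0, 0.2, 1.0 / fs)

        # Keep only harmonics above 6 kHz (and below the Nyquist frequency).
        harmonics = [n for n in range(1, 13) if 6000.0 < n * f0 < fs / 2]
        x = sum(np.sin(2 * np.pi * n * f0 * t) for n in harmonics)

        print("Component frequencies (Hz):", [n * f0 for n in harmonics])
        # All components lie above 6 kHz, yet the pitch listeners report corresponds
        # to the 1400-Hz fundamental that is not physically present.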

    A further line of evidence for the importance of place information has come from studies that have investigated the relationship between pitch accuracy and auditory filter bandwidths. Moore and Peters (1992) investigated the relationship between auditory filter bandwidths, measured using spectral masking techniques (Glasberg & Moore, 1990), pure-tone frequency discrimination, and complex-tone F0 discrimination in young and elderly people with normal and impaired hearing. People with hearing impairments were tested because they often have auditory filter bandwidths that are broader than normal. A wide range of results was found: some participants with normal filter bandwidths showed impaired pure-tone and complex-tone pitch discrimination thresholds; others with abnormally wide filters still had relatively normal pure-tone pitch discrimination thresholds. However, none of the participants with broadened auditory filters had normal F0 discrimination thresholds, suggesting that broader filters may result in fewer (or no) resolved harmonics and that resolved harmonics are necessary for accurate F0 discrimination. This question was pursued later by Bernstein and Oxenham (2006a, 2006b), who systematically increased the lowest harmonic present in a harmonic complex tone and measured the point at which F0 discrimination thresholds worsened. In normal-hearing listeners, there is quite an abrupt transition from good to poor pitch discrimination as the lowest harmonic present is increased from the 9th to the 12th (Houtsma & Smurzynski, 1990). Bernstein and Oxenham reasoned that if the transition point is related to frequency selectivity and the resolvability of the harmonics, then it should shift to lower harmonic numbers as the auditory filters become wider. They tested this in hearing-impaired listeners and found a significant correlation between the transition point and the estimated bandwidth of the auditory filters (Bernstein & Oxenham, 2006b), suggesting that harmonics may need to be resolved in order to elicit a strong musical pitch.

    Interestingly, even though resolved harmonics may be necessary for accurate pitch perception, they may not be sufficient. Bernstein and Oxenham (2003) increased the number of resolved harmonics available to listeners by presenting alternating harmonics to opposite ears. In this way, the spacing between successive components in each ear was doubled, thereby doubling the number of peripherally resolved harmonics. Listeners were able to hear out about twice as many harmonics in this condition, but that did not improve their pitch discrimination thresholds for the complex tone. In other words, providing access to harmonics that are not normally resolved does not improve pitch perception. These results are consistent with theories that rely on pitch templates: if certain harmonics are not normally available to the auditory system, they are unlikely to be incorporated into templates and so would not be expected to contribute to the pitch percept when presented by artificial means, such as delivery to alternate ears.
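
    The correlation reported by Bernstein and Oxenham (2006b) between the transition point and filter bandwidth can be caricatured with the same ERB-based bookkeeping used in the earlier sketch: broadening the filters lowers the highest harmonic that counts as resolved. The broadening factors and the one-ERB criterion below are illustrative assumptions, not estimates taken from the study.

        # Broadened auditory filters push the resolved/unresolved boundary to lower
        # harmonic numbers (illustrative only; one-ERB criterion assumed as before).
        def erb(f_hz: float) -> float:
            """Glasberg & Moore (1990) equivalent rectangular bandwidth, in Hz."""
            return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

        def highest_resolved(f0: float, broadening: float = 1.0, n_max: int = 30) -> int:
            resolved = [n for n in range(1, n_max + 1) if f0 > broadening * erb(n * f0)]
            return max(resolved) if resolved else 0

        for k in (1.0, 1.5, 2.0, 3.0):
            print(f"filters broadened x{k}: highest resolved harmonic of a 200-Hz F0 = "
                  f"{highest_resolved(200.0, broadening=k)}")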

    Most sounds in our world, including those produced by musical instruments, tend to have more energy at low frequencies than at high; on average, spectral amplitude decreases at a rate of about 1/f, or -6 dB/octave. It therefore makes sense that the auditory system would rely on the lower-numbered harmonics to determine pitch, as these are the ones most likely to be audible. Also, resolved harmonics (ones that produce a peak in the excitation pattern and elicit a sinusoidal temporal response) are much less susceptible to the effects of room reverberation than are unresolved harmonics. Pitch discrimination thresholds for unresolved harmonics are relatively good (~2%) when all the components have the same starting phase (as in a stream of pulses). However, thresholds are much worse when the phase relationships are scrambled, as they would be in a reverberant hall or church, and can be as poor as 10%, which is more than a musical semitone. In contrast, the response to resolved harmonics is not materially affected by reverberation: changing the starting phase of a single sinusoid does not affect its waveshape (it remains a sinusoid), and frequency discrimination thresholds remain considerably less than 1%.
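
    The contrast drawn here can be illustrated with a short sketch: scrambling the starting phases of a set of high-numbered harmonics flattens the pulse-like temporal envelope, much as reverberation would, whereas a single resolved component keeps the same waveshape whatever its phase. The peak-to-RMS ratio used below is only a crude index of envelope peakiness.

        import numpy as np

        rng = np.random.default_rng(0)
        fs, f0 = 32000, 100.0
        t = np.arange(0, 0.05, 1.0 / fs)
        high_harmonics = range(15, 31)          # high-numbered, nominally unresolved

        def peak_to_rms(x: np.ndarray) -> float:
            return float(np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2)))

        # Same amplitudes, two phase conditions for the unresolved harmonics.
        aligned = sum(np.sin(2 * np.pi * n * f0 * t) for n in high_harmonics)
        scrambled = sum(np.sin(2 * np.pi * n * f0 * t + rng.uniform(0, 2 * np.pi))
                        for n in high_harmonics)

        print(f"aligned phases:   peak/RMS = {peak_to_rms(aligned):.1f}")    # pulse-like
        print(f"scrambled phases: peak/RMS = {peak_to_rms(scrambled):.1f}")  # flattened

        # A single (resolved) component stays sinusoidal whatever its starting phase.
        single = np.sin(2 * np.pi * 400.0 * t + 1.0)
        print(f"single 400-Hz tone: peak/RMS = {peak_to_rms(single):.2f}")   # ~1.41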

    A number of physiological and neuroimaging studies have searched for representations of pitch beyond the cochlea (Winter,
