Psychological Science
Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties
Linda Polka, Matthew Masapollo, and Lucie Ménard
Psychological Science, published online 2 June 2014
DOI: 10.1177/0956797614533571

The online version of this article can be found at:
http://pss.sagepub.com/content/early/2014/05/27/0956797614533571

Published by:
http://www.sagepublications.com
On behalf of:

Association for Psychological Science




Downloaded from pss.sagepub.com at MCGILL UNIVERSITY LIBRARY on June 3, 2014
© The Author(s) 2014
Reprints and permissions: sagepub.com/journalsPermissions.nav
Research Article
Understanding spoken language involves categorization
processes at many levels. The perception of phonemes,
the smallest speech units that carry meaning, entails
recognizing phonetic categories in the presence of
enormous acoustic variability. One of the largest sources of
acoustic variability that the perceiver must contend with
arises from the wide variation in physical attributes of
different talkers related to age and gender (Kuhl, 2004).
In particular, acoustic speech patterns are determined by
length of the vocal folds and length of the vocal tract,
which are highly correlated with an individual's height
(Fitch & Giedd, 1999). Developmental research has
demonstrated that young infants can differentiate phonetic
categories across different talkers (man, woman, child)
in some test situations (Jusczyk, Pisoni, & Mullennix,
1992; Kuhl, 1979, 1983). Other research, however,
suggests that this process is initially difficult for infants,
particularly when the task is more complex or the talkers
are more acoustically dissimilar (e.g., Houston & Jusczyk,
2000).
Currently, research is focused on understanding how
the ability to process phonetic information in complex
multitalker contexts develops and improves during
infancy. This work is motivated by a desire to understand
how speech comprehension skills are acquired. The
ability to track phonetic information across talker differences
has not been addressed from a perspective that considers
the perceptual resources that infants need to acquire
speech production skills. Recent findings show that, like
adults, 4-year-olds not only recognize the phonetic
equivalence between self-generated speech and speech
sounds produced by other talkers, but they can also use
this skill to monitor and adjust their vocal output as they
speak (MacDonald, Johnson, Forsythe, Plante, & Munhall,
Corresponding Author:
Linda Polka, School of Communication Sciences and Disorders, McGill
University, 2001 McGill College, 8th floor, Montreal, Quebec, Canada
H3A 1G1
E-mail: linda.polka@mcgill.ca
Who's Talking Now? Infants' Perception
of Vowels With Infant Vocal Properties
Linda Polka (1,2), Matthew Masapollo (1,2), and Lucie Ménard (2,3)
(1) School of Communication Sciences and Disorders, McGill University;
(2) Centre for Research on Brain, Language and Music, McGill University; and
(3) Département de Linguistique, Université du Québec à Montréal
Abstract
Little is known about infants' abilities to perceive and categorize their own speech sounds or vocalizations produced
by other infants. In the present study, prebabbling infants were habituated to /i/ ("ee") or /a/ ("ah") vowels synthesized
to simulate men, women, and children, and then were presented with new instances of the habituation vowel and
a contrasting vowel on different trials, with all vowels simulating infant talkers. Infants showed greater recovery of
interest to the contrasting vowel than to the habituation vowel, which demonstrates recognition of the
habituation-vowel category when it was produced by an infant. A second experiment showed that encoding the vowel category
and detecting the novel vowel required additional processing when infant vowels were included in the habituation
set. Despite these added cognitive demands, infants demonstrated the ability to track vowel categories in a multitalker
array that included infant talkers. These findings raise the possibility that young infants can categorize their own
vocalizations, which has important implications for early vocal learning.
Keywords
speech perception, infancy, vowel categorization, babbling, talker variability
Received 10/18/13; Revision accepted 4/1/14
Psychological Science OnlineFirst, published on June 2, 2014 as doi:10.1177/0956797614533571
2012). However, it is unknown whether infants can
accurately perceive their own speech or speech produced by
another infant. Research has been silent on this issue
because infant-generated speech has not been
implemented in controlled perceptual experiments.
There are currently two perspectives regarding the
relationship between speech perception and production
capacities in early development. These views make
different assumptions about the perceptual resources that
young infants need to engage in vocal learning. According
to one view, which we call the high-resource/imitation
view, infants' perceptual skills develop well in advance
of production skills and provide a critical infrastructure
that supports emerging production skills via imitation
(Kuhl & Meltzoff, 1996). In this view, young infants can
access phonetic category information across different
talkers (including infants), which they can use to learn to
imitate phonetic categories in the ambient language. This
account is bolstered by evidence that 3- to 5-month-olds
modified their vocalizations in response to target vowels
produced (noninteractively) by an adult on a television
(Kuhl & Meltzoff, 1996).
According to another view, which we term the
low-resource/interaction view, speech perception and
production skills develop concurrently, guided by exchanges
in an interactive context (Howard & Messum, 2011; Zlatin
& Koenigsknecht, 1976). Accordingly, infants are not
obliged to interpret their own utterances; they can rely on
caregivers' imitative and affective responses to indicate
when their productions perceptually match target
sounds in the ambient language. Thus, vocal learning can
proceed even if infant perceptual resources are low or
partially developed. Supporting this view, studies show
that caregivers frequently imitate the vocalizations of their
young infants (Pawlby, 1977) and provide social
stimulation (e.g., smiling or touching), which facilitates more
advanced vocal behavior (Goldstein, King, & West, 2003).
Crucially, the ability to recognize phonetic categories
in infant speech is a prerequisite in the high-resource/imitation
view, but not in the low-resource/interaction
view. Thus, findings pertaining to infant perception of
infant speech speak to the conceptual merits of each
view. Using technical advances in speech synthesis to
generate infant speech, we investigated, for the first time,
how infants perceive vowels produced by other infant
talkers. This can be a challenging task for infants for
several reasons.
First, vowels produced by an infant are acoustically
distinct because an infant's vocal folds and vocal tract are
much shorter than those of an adult or a child (Kent &
Murray, 1982; Kuhl & Meltzoff, 1996; Ménard, Schwartz, &
Boë, 2004; Rvachew, Mattock, Polka, & Ménard, 2006;
Rvachew, Slawinski, Williams, & Green, 1996; Vorperian
& Kent, 2007). The fundamental frequency (corresponding
to voice pitch) and the formant frequencies
(corresponding to the vocal tract resonances) observed in
infant vocalizations are well above the values in adult or
child speech. This is illustrated in spectrograms of adult,
child, and infant vowel sounds in Figure 1. Vowel sounds
are characterized by acoustic energy concentrated in
several narrow frequency bands known as formants. The
first two formants (F1 and F2) provide critical information
for vowel identity. For example, the vowel /i/ ("ee") has
a low F1 and high F2 frequency, whereas /a/ ("ah") has
a high F1 and an intermediate F2 frequency. Vowel
formant frequencies are typically plotted with the F1 and F2
axes reversed, as in Figures 2 and 3. In such displays, the
corner vowels /i/ ("ee"), /a/ ("ah"), and /u/ ("oo") correspond
to extreme articulatory postures (e.g., high front for "ee,"
high back for "oo," and fully open for "ah"), and the
resulting space encompasses all possible vowel sounds
for a given vocal tract length. As illustrated in Figures 2
and 3, the infant acoustic vowel space overlaps only
partially with the adult and child acoustic space. Thus,
introducing infant vowels increases the range of acoustic
variation that infants encounter in the speech they hear.
The second reason that it may be challenging for
infants to perceive vowels produced by other infant
talkers is that until they begin to babble, most infants
have limited experience listening to infant speech. Third,
tracking phonetic information across acoustically dissimilar
adult talkers is cognitively demanding for infants
(Houston & Jusczyk, 2000). Together, these factors
suggest that perceiving vowels in a multitalker context that
includes infant talkers may be particularly difficult for
young infants.
[Figure 1: four spectrogram panels, each with x-axis Time (s), 0 to 0.5, and y-axis Frequency (Hz), 0 to 10^4.]
Fig. 1. Spectrograms of the vowel /i/ ("ee") with the vocal properties of (from left to right) a 6-month-old infant, an 8-year-old child, an
adult female, and an adult male.
We addressed these issues in two experiments designed
to determine whether young (4- to 6-month-old) infants,
who are not yet producing canonical babble, can
recognize the same vowel when produced by adult, child, and
infant talkers. We synthesized isolated vowels, /i/ ("ee")
and /a/ ("ah"), to simulate productions by men, women,
children, and infants. At 4 to 6 months, infants are just
beginning to develop productive control of fully resonant
vowels; their vowel productions typically fall within a
narrow range of the vowel space, with front, low, and central
vowels preferred. Expansion of the infant vowel space to
stabilize the corner vowels (/i/, /u/, and /a/) requires
another year of practice (Rvachew et al., 2006; Rvachew
et al., 1996). Thus, infants in the present study had only
limited exposure (from their own speech) to the full range
of vowel qualities that can be produced by an infant vocal
tract (Kent & Murray, 1982).
We chose the vowels /i/ and /a/ because previous
research has shown that 5- to 6-month-olds can perceive
this phonetic contrast in a multitalker context. Using the
conditioned head-turn procedure, Kuhl (1979) tested
infants with vowels synthesized to simulate the
productions of men, women, and children. Similarly, we used
vowels that were synthesized to control for acoustic
dimensions that are not related to talker age and gender,
but we tested infants with the visual-habituation
procedure, which does not involve conditioning or training. In
this procedure, infants were first habituated to a set of
vowel exemplars (e.g., /i/) produced by different talkers.
As soon as the infants habituated, we presented four test
trials: two with new exemplars from the same (familiar)
vowel category (e.g., /i/) and two with new exemplars
from a novel vowel category (e.g., /a/). Crucially, all
vowels in test trials (familiar and novel) were produced by a
new talker who was not encountered during habituation.
If infants had formed a category representation for the
habituation vowel, we expected them to recognize the
novel vowel as belonging to a different phonetic
category and to show this by recovering interest and listening
longer to the novel vowel than to the familiar vowel.
Infants may also recognize the change in talker and listen
longer to the familiar vowel (produced by a new talker)
compared with the final habituation trials. However,
despite recognizing the talker change, infants should
show a larger recovery of interest to the novel than to the
familiar test vowel if they recognize that the familiar test
vowel is another instance of the vowel presented during
habituation.
Experiment 1
Method
Participants. Data from 56 infants, aged 4 to 6 months
(36 male, 20 female; mean age = 161 days, range =
138–197 days), were analyzed; all infants were exposed to
languages in which /i/ and /a/ are phonemic, with the
majority being from English- or French-speaking families.
Twenty-three additional infants were excluded because
of fussiness (n = 15), caregiver interference (n = 3), or
experimental error (n = 5). All were full term and had no
known health problems.
Stimuli. We selected eight /i/ and eight /a/ isolated
vowels from a large corpus of vowels synthesized using
the variable linear articulatory model (VLAM), described
in Ménard et al. (2004). This corpus simulates talkers
across a broad age range from infancy to adulthood. The
selected vowels simulate eight different talkers: two
6-month-old infants; an 8-, 10-, and 12-year-old child;
two adult females; and one adult male. Example
spectrograms are shown in Figure 1. Details of the VLAM
synthesis and acoustic description of the vowels are
provided in the Supplemental Material available online.
[Figure 2: vowel plot with x-axis F2 (Bark units), 5 to 20, and y-axis F1 (Bark units), 2 to 12; points labeled i0, a0, u0 and i21, a21, u21.]
Fig. 2. A simulation of the maximal acoustic vowel spaces of an adult
male and an infant. As in a standard vowel plot, each vowel in the
simulation set is depicted with the frequency of the first formant (F1)
plotted on the y-axis and the frequency of the second formant (F2)
on the x-axis. For the infant, the corner vowels /i/ ("ee"), /a/ ("ah"),
and /u/ ("oo"), which encompass the full range of vowel articulations,
are represented by circles and labeled i0, a0, u0; for the adult, the
corner vowels are represented by stars and labeled i21, a21, u21. The
axes are scaled in Bark units, which transform formant frequencies into
units of equal perceptual distance. (To convert from Bark to Hz, use
the following formula: F(Hz) = 650 sinh(F(Bark)/7); to convert from
Hz to Bark, use this formula: F(Bark) = 7 asinh(F(Hz)/650); see also
Schroeder, Atal, and Hall, 1979.)
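The Bark conversion given in the Figure 2 caption can be written as a pair of inverse functions. The following is a minimal sketch of those two formulas only; the function names and the example frequencies are illustrative assumptions and are not taken from the study's stimuli.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Hz to Bark, following the caption: F(Bark) = 7 * asinh(F(Hz) / 650)."""
    return 7.0 * math.asinh(f_hz / 650.0)


def bark_to_hz(f_bark: float) -> float:
    """Bark to Hz, following the caption: F(Hz) = 650 * sinh(F(Bark) / 7)."""
    return 650.0 * math.sinh(f_bark / 7.0)


if __name__ == "__main__":
    # Illustrative frequencies only (assumed values, not the study's formants).
    for f in (300.0, 1000.0, 3000.0):
        b = hz_to_bark(f)
        print(f"{f:7.1f} Hz -> {b:5.2f} Bark -> {bark_to_hz(b):7.1f} Hz")
```

Because sinh and asinh are inverses, converting a frequency to Bark and back should return the starting value up to floating-point error.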
All vowels were 500-ms long and matched in intensity
and intonation contour. The stimuli were judged to be
intelligible, natural-sounding exemplars of each vowel
category by English- and French-speaking adults. Adults
also accurately identified the age and gender differences
simulated in the stimulus set. For testing, we created 16
stimulus files (one per vowel for each talker); each 30-s
file included 20 repetitions of the same vowel with 1-s
interstimulus intervals.
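The stimulus-file figures above can be checked with simple arithmetic. This sketch assumes that each 500-ms vowel is followed by one 1-s interstimulus interval, including the final repetition, which the text does not state explicitly.

```latex
\[
20 \times (0.5\,\mathrm{s} + 1\,\mathrm{s}) = 30\,\mathrm{s},
\qquad
2\ \text{vowels} \times 8\ \text{talkers} = 16\ \text{files}.
\]
```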
Procedure. Infants were tested using the visual-habituation
(look-to-listen) procedure (Polka, Jusczyk, &
Rvachew, 1995). The infant sat on the caregiver's lap at a
distance of about 150 cm facing a 21-in. television monitor
in a dimly lit, curtained, soundproof booth. Audiotrak
(Jooan-Dong, Nam-Gu, Incheon, Korea) BSI-90 loudspeakers
and a Sony digital video camera were located
behind the curtain just below the TV screen; the camera
lens protruded through an opening in the curtain. An
experimenter observed the infant outside of the testing
room on a monitor linked to the video camera. The caregiver
wore noise-canceling headphones and listened to
masking music during the entire procedure to avoid
influencing the infant's behavior. Habit software (Cohen,
Atkinson, & Chaput, 2000) was used to present stimuli
and record looking (listening) times, which we refer to as
listening times for simplicity.
[Figure 3, top panel: plot with x-axis F2 (Bark units), 0 to 20, and y-axis F1 (Bark units), 0 to 12, showing the /i/, /a/, and /u/ vowels for each talker: Male (M), Female (F1, F2), 8-Year-Old Child (C8), 10-Year-Old Child (C10), 12-Year-Old Child (C12), Infant (IN1, IN2).]
[Figure 3, bottom panel: example trial sequences, written as vowel(talker).]
Phase: Pretest | Habituation (number of trials varied) | Test (four trials) | Posttest
Experiment 1: Music | i(M) i(C8) i(F2) i(M) i(C12) i(F2), until criterion | i(IN1) a(IN2) a(IN2) i(IN2) or a(IN2) i(IN2) i(IN2) a(IN2) | Music
Experiment 1: Music | a(M) a(C8) a(F1) a(M) a(C12) a(F2), until criterion | i(IN1) a(IN2) a(IN2) i(IN2) or a(IN2) i(IN2) i(IN2) a(IN2) | Music
Experiment 2: Music | i(IN1) i(C8) i(F1) i(IN2) i(C12) i(F2), until criterion | i(M) a(M) a(M) i(M) or a(M) i(M) i(M) a(M) | Music
Fig. 3. Stimuli and design of the two experiments. The graph shows the frequency of the first formant (F1) as a function of the frequency of the
second formant (F2) for the /i/ ("ee") and /a/ ("ah") vowel stimuli used in the present study, separately for each speaker type. Frequencies for the
/u/ ("oo") vowel are also plotted. The axes are scaled in Bark units (see Fig. 2). The bottom panel shows an example of the vowel stimuli presented
in each phase of Experiments 1 and 2. In the habituation phase of Experiment 1, participants heard either /i/ or /a/ vowels produced by an adult
male (M), two adult females (F1, F2), and 8-, 10-, and 12-year-old children (C8, C10, and C12, respectively). In the test phase, there were four trials,
two with the same vowel heard during habituation and two with a different vowel, with the order of the familiar and the novel vowels varying across
the two possibilities shown. Each vowel in the test phase was produced by one of two 6-month-old infants (IN1, IN2). The design of Experiment
2 was the same as that of Experiment 1, except that during the habituation phase, only the /i/ vowel was spoken and the male voice was replaced
by an infant voice, whereas during the test phase, an adult male voice was heard instead of an infant voice.
At the start of each trial, a red flashing light was
presented to direct attention, followed by a black-and-white
checkerboard. The experimenter (who could not hear
the stimuli) pressed a key when the infant fixated on the
checkerboard; this activated the auditory stimulus and
provided an index of the infant's listening time. When the
infant looked away for more than 2 s, the sound stopped
and the screen went black. The minimum look time for a
trial was 1 s. If the infant looked away for less than 2 s,
the sound continued to play but the look-away time was
not included in the looking time for that trial. The trial
was terminated when the infant looked away for more
than 2 s or when the complete stimulus file (30-s long)
had played. After a brief pause, the red flashing light
returned to start the next trial.
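The trial-timing rules in this paragraph can be sketched as a small scoring function. This is an illustration under stated assumptions, not the Habit software's implementation: the representation of gaze as (is_looking, duration) segments, the names `TrialResult` and `score_trial`, and the choice simply to flag trials under the 1-s minimum are all hypothetical, and a look-away of exactly 2 s (which the text does not cover) is treated as short.

```python
from dataclasses import dataclass

MAX_TRIAL_S = 30.0       # length of one stimulus file
LOOK_AWAY_LIMIT_S = 2.0  # a look-away longer than this ends the trial
MIN_LOOK_S = 1.0         # minimum look time for a trial


@dataclass
class TrialResult:
    looking_time: float   # seconds of looking while the sound played
    ended_by: str         # "look_away", "stimulus_end", or "incomplete"
    meets_minimum: bool   # whether looking_time reached MIN_LOOK_S


def score_trial(segments):
    """Score one trial from ordered (is_looking, duration_s) gaze segments.

    Timing starts when the infant fixates and the sound begins (assumed).
    """
    elapsed = 0.0
    looking = 0.0
    for is_looking, duration in segments:
        remaining = MAX_TRIAL_S - elapsed
        if is_looking:
            used = min(duration, remaining)
            looking += used
            elapsed += used
        elif duration > LOOK_AWAY_LIMIT_S and remaining > LOOK_AWAY_LIMIT_S:
            # Long look-away: the trial ends and none of it counts as looking.
            return TrialResult(looking, "look_away", looking >= MIN_LOOK_S)
        else:
            # Short look-away: the sound keeps playing but the time is not counted.
            elapsed += min(duration, remaining)
        if elapsed >= MAX_TRIAL_S:
            return TrialResult(looking, "stimulus_end", looking >= MIN_LOOK_S)
    # The gaze record stopped before either ending condition was reached.
    return TrialResult(looking, "incomplete", looking >= MIN_LOOK_S)


if __name__ == "__main__":
    # Hypothetical gaze record: look 4 s, glance away 1.5 s, look 3 s, look away 2.5 s.
    print(score_trial([(True, 4.0), (False, 1.5), (True, 3.0), (False, 2.5)]))
```

In the hypothetical record above, the 1.5-s glance is excluded from the looking time and the 2.5-s look-away ends the trial.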
Design. The experiment consisted of four consecutive
phases: pretest, habituation, test, and posttest (see Fig. 3),
with no breaks or pauses between. Instrumental music
was presented during pre- and posttest trials. On each
habituation and test trial, infants heard a vowel produced
repeatedly by the same talker; a different talker was
presented in each trial. The vowel presented during
habituation was counterbalanced across participants. During
habituation, the order of talkers was randomized within
blocks: Each block contained three trials in which a man,
a woman, and a child talker spoke. The software tracked
a running average of listening time across a three-trial
window. The habituation criterion was met when the
running average dropped below 65% of the longest
three-trial average for that infant. Thus, the number of
habituation trials varied across infants; however, all
infants completed at least four habituation trials (most
completed six or more) and were exposed to all three
talker types (man, woman, child).
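The habituation criterion described above can be sketched as follows. This is a minimal illustration that assumes a sliding (overlapping) three-trial window, which is consistent with the four-trial minimum but is not confirmed by the text; the function name and the listening times are hypothetical.

```python
def trials_to_criterion(listening_times, window=3, proportion=0.65):
    """Return the 1-based trial on which the criterion is met, or None.

    The criterion is met when the current three-trial running average
    falls below 65% of the longest three-trial average seen so far.
    A sliding window is assumed here.
    """
    longest = 0.0
    for end in range(window, len(listening_times) + 1):
        average = sum(listening_times[end - window:end]) / window
        if longest > 0.0 and average < proportion * longest:
            return end
        longest = max(longest, average)
    return None


if __name__ == "__main__":
    # Hypothetical listening times in seconds for one infant.
    times = [20.0, 18.0, 16.0, 6.0, 5.0, 4.0]
    print(trials_to_criterion(times))
```

With the hypothetical times above, the first window averages 18 s, so the threshold is 11.7 s; the window ending on trial 5 averages 9 s, which meets the criterion.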
During the test phase, there were four trials containing
only vowels produced by infants; two trials contained the
same vowel as during habituation (F = familiar), and two
contained the contrasting vowel not heard during
habituation (N = novel). Test trials were presented in one of
two fixed orders: FNNF or NFFN. Infants were assigned
to four conditions (two habituation conditions, two test
orders) as shown in Figure 3.
Results
Data from the two habituation groups were combined
because they showed no differences in total habituation
time, number of habituation trials, or posttest listening
times. For each infant, listening time was averaged across
the last two trials in habituation and for each test-trial
type (novel, familiar).
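The per-infant scores described above (the mean of the last two habituation trials and the mean for each test-trial type) could be computed along these lines. This is a sketch only: the function name, the use of an order string such as "FNNF" to label test trials, and the listening times are assumptions for illustration.

```python
from statistics import mean


def summarize_infant(habituation_times, test_times, test_order):
    """Average the last two habituation trials and each test-trial type.

    test_order labels each test trial as "F" (familiar) or "N" (novel),
    for example "FNNF" or "NFFN".
    """
    if len(test_times) != len(test_order):
        raise ValueError("one label is needed per test trial")
    familiar = [t for t, kind in zip(test_times, test_order) if kind == "F"]
    novel = [t for t, kind in zip(test_times, test_order) if kind == "N"]
    return {
        "habituation": mean(habituation_times[-2:]),
        "familiar": mean(familiar),
        "novel": mean(novel),
    }


if __name__ == "__main__":
    # Hypothetical listening times in seconds for one infant.
    scores = summarize_infant(
        habituation_times=[14.0, 12.0, 9.0, 6.0, 5.0],
        test_times=[8.0, 12.0, 10.0, 6.0],
        test_order="FNNF",
    )
    print(scores)
```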
Group means (collapsed across habituation condition
and test-trial order) are shown in Figure 4a. The scores
were submitted to a mixed analysis of variance (ANOVA)
with test-trial order (FNNF vs. NFFN) as a between-subjects
factor and trial type (habituation vs. familiar vs.
novel) as a within-subjects factor. There was no main
effect of test-trial order or interaction of test-trial order
with trial type, F < 1. There was a main effect of trial type,
F(2, 108) = 23.81, p < .001, $\eta_p^2 = .306$. Post hoc
comparisons showed that infants listened longer to the novel
(M = 11.7 s, SD = 6.3) than to the familiar test vowel
(M = 9.8 s, SD = 6.1), t(55) = 2.46, p = .017, $r^2 = .314$,
and longer to the familiar test vowel than to the habituation
vowel (M = 6.5 s, SD = 3.0), t(55) = 4.30, p < .001, $r^2 = .501$.
[Figure 4: bar graphs for (a) Experiment 1 and (b) Experiment 2; y-axis: Mean Listening Time (s), 0 to 16; bars: Last Two (Habituation Trials), Familiar and Novel (Test Trials); asterisks mark significant differences.]
Fig. 4. Mean listening time to the last two habituation trials and to the
novel and familiar test trials in (a) Experiment 1 and (b) Experiment 2.
Error bars show standard errors of the mean. The asterisk indicates a
significant difference between trial types (p < .05).
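As a consistency check, the partial eta squared reported for the main effect of trial type (.306) can be recovered from the F ratio and its degrees of freedom using the standard identity, partial eta squared = (F x df_effect) / (F x df_effect + df_error). The sketch below applies that identity to the values reported above; the function name is an assumption, and this is not a description of how the authors computed the statistic.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # Main effect of trial type in Experiment 1: F(2, 108) = 23.81.
    print(round(partial_eta_squared(23.81, 2, 108), 3))
```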
Discussion
Experiment 1 shows that young infants can track a change
in vowel category and a change in talker when they hear
vowels produced by an infant talker. Thus, young infants
have some ability to recognize vowel categories across
acoustically and perceptually distinct talkers. Because
infants encountered the infant vowels at the end of the
task, Experiment 1 provided little insight into the
processing demands involved in this task. Experiment 2 addressed
this issue.
Experiment 2
Perceiving speech in a multitalker context entails some
processing costs for infants as well as adults (e.g.,
Houston & Jusczyk, 2000; Jusczyk et al., 1992; Mullennix
& Pisoni, 1990; Schmale & Seidl, 2009). Moreover,
stimulus complexity affects infant processing and category
formation in many domains (Hunter & Ames, 1988). Thus,
we reasoned that because infant speech augments
stimulus complexity, it might increase the cognitive demands
associated with perceiving speech in a multitalker
context. We tested this prediction in Experiment 2 using the
same task and stimuli as in Experiment 1, but with one
change: the infant vowels were included in the habituation
set (replacing the adult male vowels), and only adult
male vowels were presented in the test phase (replacing
the infant vowels; see Fig. 3). This manipulation increased
the acoustic variability present in the habituation set, but
not the number of talkers. We predicted that this change
would raise the stimulus complexity encountered during
habituation and increase cognitive demands in the
categorization task.
If this turned out to be the case, then infants would
need more time to encode the phonetic category information
in the habituation stage well enough to recognize the
novel category in the test phase. This leads to two specific
predictions. First, infants will listen longer during habituation
in Experiment 2 compared with Experiment 1 before
reaching the habituation criterion. Second, infants' categorization
performance will be affected by their level of
engagement during habituation, that is, total listening time
during habituation will be positively correlated with the
magnitude of the novelty effect observed in test trials. It is
important to note that prior categorization studies using
this paradigm show that individual differences in categorization
ability are typically negatively correlated with
total habituation time. That is, in both auditory and visual
tasks, infants who are better categorizers process stimuli
more efficiently and thus require less time to habituate to
a stimulus set (Arterberry & Bornstein, 2002; Polka,
Rvachew, & Molnar, 2008). Thus, a positive correlation
between total habituation time and novelty score reflects
an increase in stimulus complexity and processing load
rather than individual differences in categorization ability.
(Total habituation time and magnitude of novelty scores
were not correlated in Experiment 1; $r^2 = .209$, p = .122.)
Method
Participants. Data from 27 infants, aged 4 to 6 months
(10 males, 17 females; mean age = 158 days, range =
132–195 days), were analyzed; infants' language background
was the same as in Experiment 1. Nine additional
infants were excluded because of fussiness (n = 7), failure
to habituate (n = 1), or experimental error (n = 1). All
were full term with no known health problems.
Stimuli and procedure. The stimuli and procedure
were the same as in Experiment 1.
Design. The design was identical to that in Experiment
1, except that all infants were habituated to /i/, the habituation
trials included simulations of vowels spoken by an
infant (but not a man), and vowels presented on test
trials simulated an adult male.
Results
Total listening time during habituation (summed across
habituation trials) was computed for each infant. Group
means for infants in Experiment 2 and the /i/ habituation
condition of Experiment 1 are plotted in Figure 5, which
shows that infants listened significantly longer during
habituation in Experiment 2 compared with Experiment
1, t(55) = 3.41, p = .001, $r^2 = .417$. Thus, as predicted,
infants required more time to encode the variability associated
with the different talkers when infant vowels were
present in the habituation set (M = 118.1 s, SD = 12.9)
compared with when only adult and child vowels were
present (M = 71.6 s, SD = 5.7); the number of talkers was
the same in both experiments. There was no difference
between experiments in the number of habituation trials
needed to reach criterion or in posttest listening times.
For each infant, listening time was averaged across the
last two habituation trials and for each test-trial type
(novel, familiar). Group means for these scores (collapsed
across test-trial order) are shown in Figure 4b. As
in Experiment 1, these scores were submitted to a mixed
ANOVA with test-trial order (FNNF vs. NFFN) as a
between-subjects factor and trial type (habituation vs.
familiar vs. novel) as a within-subjects factor. There was
no main effect of test-trial order or interaction of test-trial
order with trial type, F < 1. There was a main effect of
trial type, F(2, 50) = 3.57, p = .035, $\eta_p^2 = .125$. Post hoc
comparisons showed that infants listened longer to the
novel test vowel (M = 11.9 s, SD = 6.7) than to the familiar
test vowel (M = 8.6 s, SD = 4.0), t(26) = 2.29, p = .031,
$r^2 = .409$. Listening times to the habituation vowel (M =
9.3 s, SD = 3.8) and to the familiar test vowel were not
significantly different, t(26) = 0.633, p = .532.
To compare the novelty response across experiments,
we conducted a mixed ANOVA with experiment
(Experiment 1: /i/ habituation group vs. Experiment 2) as
a between-subjects factor and test-trial type (familiar vs.
novel) as a within-subjects factor. There was no effect of
experiment or interaction of experiment with test-trial
type, F < 1. There was only a main effect of test-trial type,
F(1, 55) = 5.10, p = .028, $\eta_p^2 = .085$, which shows that the
novelty response was comparable across experiments. As
predicted, total habituation time was positively correlated
with the size of the novelty score (listening time on novel
test trials minus listening time on familiar test trials;
$r^2 = .460$, p = .014). A follow-up analysis showed that roughly
half of the infants in Experiment 2 (n = 13) displayed total
habituation listening times comparable with those in
Experiment 1 (within 1 SD), and the other half (n = 14)
listened much longer, with levels more than 1 standard
deviation above the mean of those in Experiment 1 (M =
164.19 s, SD = 68.95). As shown in Figure 6, a reliable novelty
effect was observed only in the long-listener subgroup
in Experiment 2, t(12) = 2.42, p = .032. Thus, increased
listening time during habituation was clearly linked to successful
recognition of the novel vowel in this task.
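The novelty score defined above (listening time on novel test trials minus listening time on familiar test trials) and its correlation with total habituation time can be sketched as follows. This is an illustration only: the function names are assumptions, the Pearson correlation is a generic textbook implementation rather than the authors' analysis code, and every data value in the example is hypothetical.

```python
import math


def novelty_score(novel_s: float, familiar_s: float) -> float:
    """Novel-trial listening time minus familiar-trial listening time."""
    return novel_s - familiar_s


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-infant values in seconds (not data from the study).
    total_habituation = [60.0, 85.0, 110.0, 150.0, 190.0]
    novel = [9.0, 10.0, 12.0, 14.0, 17.0]
    familiar = [9.5, 9.0, 9.0, 8.5, 9.0]
    scores = [novelty_score(n, f) for n, f in zip(novel, familiar)]
    print(scores, pearson_r(total_habituation, scores))
```

A positive coefficient in such an analysis would correspond to the pattern reported for Experiment 2, in which infants who listened longer during habituation showed larger novelty scores.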
Discussion
As predicted, processing demands increased when the
infant vowels were part of the stimulus set that infants
needed to encode to form a habituation category. Despite
the added demands, infants were able to track changes in
vowel quality and displayed a novelty response comparable
with the effect observed in Experiment 1. However,
unlike in Experiment 1, the magnitude of the novelty
score in Experiment 2 was directly related to the amount
of listening time invested during habituation. Overall,
Experiment 2 shows that including infant vowels in a multitalker
context increases processing demands, but the
added costs fall within the cognitive abilities of infants.
General Discussion
The perception of talker variability is a focal issue in
research on infant speech perception. Until now, infant
speech has been left out of the picture despite its relevance
in infant speech development. The findings reveal
that young infants' ability to track vowel categories across
talkers extends to infant vowel productions. We observed
this ability in prebabbling infants who lack the motor
skills required to produce the target vowels in a controlled
way. Therefore, unless they engage in frequent
interactions with older babies, they will have limited
exposure to infant speech. Moreover, infants displayed
this skill in the visual-habituation procedure, which relies
on infants' spontaneous listening behavior with no conditioning
or reinforcement to support or guide them.
Thus, without explicit training or reinforcement, and with
minimal exposure, infants were able to extrapolate
beyond their immediate experience and adapt quickly to
large acoustic shifts related to the size of the talker.
[Figure 5: bar graph; x-axis: Experiment 1, Experiment 2; y-axis: Mean Listening Time (s), 0 to 160; an asterisk marks the difference between experiments.]
Fig. 5. Mean listening time of the /i/ habituation groups in Experiment
1 and Experiment 2. Error bars show standard errors of the mean.
The asterisk indicates a significant difference between experiments
(p < .05).
[Figure 6: bar graph; x-axis: Trial Type (Familiar, Novel) for the Short Subgroup and the Long Subgroup; y-axis: Mean Listening Time (s), 0 to 25; an asterisk marks a significant difference.]
Fig. 6. Mean listening time during familiar and novel test trials for
infants in the two habituation subgroups of Experiment 2. Error bars
show standard errors of the mean. The asterisk indicates a significant
difference between trial types (p < .05).
The precise mechanisms underlying this perceptual
capacity are not fully understood. Previous research has
suggested that tracking vowel quality across talker differ-
ences related to physical attributes (size) may be a natural
process involving innate auditory processes (Smith,
Patterson, Turner, Kawahara, & Irino, 2005). Further work
is also needed to determine the precise information
infants used to recognize the same vowel category across
talkers; this issue is not fully resolved even for adults
(Johnson, 2005). The vowel spaces plotted in Figure 2
show that it is not possible for infants to recognize vowel
categories on the basis of raw formant values. Infants may
be relying on complex relational or configural patterns, or
more global spectral signal properties. Nevertheless, the
present findings indicate that infants can process the
"who" (talker) and the "what" (vowel quality) aspects of
the speech signal across a diverse talker array.
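
As a rough illustration of why raw formant values cannot by themselves identify a vowel across talkers of very different sizes, consider the textbook idealization of the vocal tract as a uniform tube closed at one end, whose resonances are F_n = (2n - 1)c / (4L). The sketch below is not the articulatory model used to synthesize the present stimuli; the tract lengths, the speed-of-sound value, and the function names are assumptions chosen only for illustration. It shows that halving the tube length doubles every resonance, whereas a simple relational description (each formant divided by F1) is unchanged.

```python
# Idealized uniform-tube model (an assumption for illustration, not the study's method).
SPEED_OF_SOUND_CM_PER_S = 35000.0  # approximate speed of sound in warm, moist air


def tube_formants(tract_length_cm, n_formants=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * tract_length_cm)
        for n in range(1, n_formants + 1)
    ]


def formant_ratios(formants):
    """Express each formant relative to F1, one simple relational description."""
    return [f / formants[0] for f in formants]


# Hypothetical tract lengths (cm), chosen so that one is half the other.
longer_tract = tube_formants(17.5)
shorter_tract = tube_formants(8.75)

print("Raw formants (Hz), longer tract: ", longer_tract)
print("Raw formants (Hz), shorter tract:", shorter_tract)
print("Ratios to F1, longer tract: ", formant_ratios(longer_tract))
print("Ratios to F1, shorter tract:", formant_ratios(shorter_tract))
```

Real vocal tracts do not scale uniformly with body size, so this sketch only conveys the direction of the problem; which relational or spectral cues infants actually use remains an open question.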
Our results also show that prebabbling infants have
some perceptual skills that may allow them to assess
their own vocal output well before they are accomplished
babblers. Thus, both the high-resource/imitation view
and the low-resource/interaction view of early vocal
learning remain viable. The perceptual skills documented
here show that learning via auditory imitation alone is
possible, at least when infants are targeting highly dis-
tinct phonetic categories. Learning via interaction, which
does not require highly refined perceptual skills, is also
feasible and may well be critical when the target sounds
involve more subtle phonetic differences. However, it is
important to note that although the present findings
uncover relevant perceptual skills, they do not show how
infants perceive their own productions during vocal
learning. This remains an important issue for future
research.
Across both experiments, infants recognized the novel
vowel. Yet listening time increased when the habituation
set included infant vowels, and longer habituation times
were clearly linked to recognition of the novel vowel.
This suggests that cognitive demands increased when
infant speech was part of the multitalker array during
habituation. In other studies, infants have not adapted
quickly and effectively to talker differences involving
much smaller acoustic differences. The robust, rapid
adjustments observed in this study may reflect a natural
response to the sharp shift in acoustic variability or nov-
elty tied to the infant signals, a perceptual bias favoring
infant speech, or some combination of these factors. It is
noteworthy that high voice pitch and expanded vowel
space characterize infant speech as well as infant-directed
speech, which is known to attract infant attention and
facilitate speech processing (Fernald & Kuhl, 1987). This
raises the intriguing possibility that infant-directed speech
may help prime the infant perceptual system to perceive
infant vocalizations.
In summary, the present study provides the first evi-
dence that prebabbling infants can recognize infant-
produced vowels as phonetically similar to adult and
child productions. The present study breaks new ground.
Exploring how infants perceive speech with infant vocal
properties brings researchers a step closer to understand-
ing the interplay between speech perception and pro-
duction in early language development.
Author Contributions
L. Polka developed the study concept. L. Ménard created the
auditory stimuli. Infants were tested by M. Masapollo. All
authors contributed to the data analysis and writing of the man-
uscript. All authors approved the final version of the manuscript
for submission.
Acknowledgments
We thank Leanne Ma and Sunah Jeon for assistance with infant
recruitment and testing, and Athena Vouloumanos, Susan
Rvachew, Kris Onishi, Michael Tyler, Terry Gottfried, and Janet
Werker for feedback on this manuscript.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with
respect to their authorship or the publication of this article.
Funding
This work was supported by Discovery Grants from the Natural
Sciences and Engineering Research Council awarded to L. Polka
(DG 105397) and to L. Ménard (DG 312395).
Supplemental Material
Additional supporting information may be found at
http://pss.sagepub.com/content/by/supplemental-data
References
Arterberry, M. E., & Bornstein, M. H. (2002). Variability and
its sources in infant categorization. Infant Behavior &
Development, 25, 515–528.
Cohen, L. B., Atkinson, D. J., & Chaput, H. H. (2000). Habit X:
A new program for training and organizing data in infant
perception and cognition studies (Version 1.0). Austin:
University of Texas.
Fernald, A., & Kuhl, P. (1987). Acoustic determinants of infant
preference for motherese speech. Infant Behavior &
Development, 10, 279–293.
Fitch, W. T., & Giedd, J. (1999). Morphology and development
of the human vocal tract: A study using magnetic resonance
imaging. Journal of the Acoustical Society of America, 106,
1511–1522.
Goldstein, M. H., King, A. P., & West, M. J. (2003). Social
interaction shapes babbling: Testing parallels between
birdsong and speech. Proceedings of the National Academy of
Sciences, USA, 100, 8030–8035.
Houston, D. M., & Jusczyk, P. W. (2000). The role of
talker-specific information in word segmentation by infants.
Journal of Experimental Psychology: Human Perception
and Performance, 26, 1570–1582.
Howard, I. S., & Messum, P. (2011). Modeling the development
of pronunciation in infant speech acquisition. Motor
Control, 15, 85–117.
Hunter, M. A., & Ames, E. W. (1988). A multifactor model of
infant preferences for novel and familiar stimuli. In L. P.
Lipsitt (Ed.), Advances in child development and behavior
(pp. 69–95). New York, NY: Academic Press.
Johnson, K. (2005). Speaker normalization in speech perception.
In D. B. Pisoni & R. Remez (Eds.), The handbook of speech
perception (pp. 363–389). Oxford, England: Blackwell.
Jusczyk, P. W., Pisoni, D. B., & Mullennix, J. (1992). Some
consequences of stimulus variability on speech processing by
2-month-old infants. Cognition, 43, 252–291.
Kent, R. D., & Murray, A. D. (1982). Acoustic features of infant
vocalic utterances at 3, 6, and 9 months. Journal of the
Acoustical Society of America, 72, 353–365.
Kuhl, P. K. (1979). Speech perception in early infancy: Perceptual
constancy for spectrally dissimilar vowel categories. Journal
of the Acoustical Society of America, 66, 1668–1679.
Kuhl, P. K. (1983). Perception of auditory equivalence classes
for speech in early infancy. Infant Behavior & Development,
6, 263–285.
Kuhl, P. K. (2004). Early language acquisition: Cracking the
speech code. Nature Reviews Neuroscience, 5, 831–843.
Kuhl, P. K., & Meltzoff, A. N. (1996). Infant vocalizations in
response to speech: Vocal imitation and developmental
change. Journal of the Acoustical Society of America, 100,
425–438.
MacDonald, E. N., Johnson, E. K., Forsythe, J., Plante, P., &
Munhall, K. (2012). Children's development of self-regulation
in speech production. Current Biology, 22, 113–117.
Ménard, L., Schwartz, J.-L., & Boë, L.-J. (2004). Role of vocal
tract morphology in speech development: Perceptual
targets and sensorimotor maps for synthesized French vowels
from birth to adulthood. Journal of Speech, Language,
and Hearing Research, 47, 1059–1080.
Mullennix, J. W., & Pisoni, D. B. (1990). Stimulus variability and
processing dependencies in speech perception. Attention,
Perception, & Psychophysics, 47, 379–390.
Pawlby, S. J. (1977). Imitation interaction. In H. R. Schaffer
(Ed.), Studies in mother-infant interaction (pp. 203–223).
London, England: Academic Press.
Polka, L., Jusczyk, P. W., & Rvachew, S. (1995). Methods for
studying speech perception in infants and children. In W.
Strange (Ed.), Speech perception and linguistic experience:
Issues in cross-language research (pp. 49–89). Baltimore,
MD: York Press.
Polka, L., Rvachew, S., & Molnar, M. (2008). Speech perception
by 6- to 8-month-olds in the presence of distracting sounds.
Infancy, 13, 421–439.
Rvachew, S., Mattock, K., Polka, L., & Ménard, L. (2006).
Developmental and crosslinguistic variation in the infant
vowel space: The case of Canadian English and Canadian
French. Journal of the Acoustical Society of America, 120,
2250–2259.
Rvachew, S., Slawinski, E. B., Williams, M., & Green, C. (1996).
Formant frequencies of vowels produced by infants with
and without early onset otitis media. Canadian Acoustics,
24, 19–28.
Schmale, R., & Seidl, A. (2009). Accommodating variability in
voice and foreign accent: Flexibility of early word
representations. Developmental Science, 12, 583–601.
Schroeder, M., Atal, B., & Hall, J. (1979). Optimizing digital
speech coders by exploiting masking properties of the
human ear. Journal of the Acoustical Society of America,
66, 1647–1652.
Smith, D. R. R., Patterson, R. D., Turner, R., Kawahara, H., &
Irino, T. (2005). The processing and perception of size
information in speech sounds. Journal of the Acoustical
Society of America, 117, 305–318.
Vorperian, H. K., & Kent, R. D. (2007). Vowel acoustic space
development in children: A synthesis of acoustic and
anatomic data. Journal of Speech, Language, and Hearing
Research, 50, 1510–1545.
Zlatin, M., & Koenigsknecht, R. (1976). Development of the
voicing contrast: A comparison of voice onset time in stop
perception and production. Journal of Speech and Hearing
Research, 19, 93–111.