Acoustic Vowel Reduction As A Function of Sentence Accent, Word Stress, and Word Class

Speech Communication 12 (1993) 1-23
North-Holland
Dick R. van Bergem
Institute of Phonetic Sciences, University of Amsterdam, Herengracht 338, 1016 CG Amsterdam, The Netherlands
Received 31 March 1992

Revised 20 November 1992
Abstract. The effect of sentence accent, word stress, and word class (function words versus content words) on the acoustic
properties of 9 Dutch vowels in fluent speech was investigated. A list of sentences was read aloud by 15 male speakers. Each
sentence contained one syllable of interest. This could be a monosyllabic function word, an unstressed syllable of a content
word, or a stressed syllable of a content word. The same syllable occurred in all three conditions. Sentence accent was
manipulated with questions that preceded the sentences. A total of 3465 vowels were segmented from the syllables
and analysed. It was found that all three factors had a significant effect both on the steady-state formant
frequencies (F1 and F2) and on the duration of the vowels. Word stress and word class had a stronger effect on the vowels
than sentence accent. A listening experiment showed the perceptual significance of the acoustic measurements. It appeared
that spectral vowel reduction could be better interpreted as the result of increased contextual assimilation than as a
tendency to centralize. We also studied changes in the dynamics of the formant tracks due to the experimental conditions. It
was found that the formant tracks of reduced vowels became flatter, which supports the view of increased contextual assimilation.
Three simple models of vowel reduction are discussed.
Keywords. Acoustic vowel reduction; lexical vowel reduction; sentence accent; word stress; word class; centralization;
increased contextual assimilation.
0167-6393/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved.
1. Introduction

A lot of information in written or spoken language is usually redundant for the proper understanding of the message. This becomes apparent if one thinks of, for instance, telegrams, the speech of young children, or the way you talk in a foreign country if you have little knowledge of the language. These examples show that the relevant information in written or spoken language can be compressed into only a few key words.

For an economical transfer of information in a natural speech situation, a speaker will probably strive to pronounce only the important parts of an utterance clearly and relax his articulation in the less important parts, knowing that the listener is capable of filling in fuzzy or missing parts. Semantic and pragmatic knowledge sources in particular can help the listener a great deal in restoring acoustically incomplete messages. But which parts of utterances are especially important for a proper transfer of the message, and which parts are less important? The following three factors may play a role in this respect:

Sentence accent. The most salient feature of sentence accent from a perceptual point of view is a sudden rise or fall in the pitch contour ('t Hart et al., 1990). Accentuation is used to draw the listener's attention to certain words 'in focus'. That is, a speaker will usually place an accent on a certain part of an utterance in order to highlight the information conveyed in it (Bolinger, 1972, 1985). This means that a speaker regards accented words as important parts of the message, so accented words may be more carefully pronounced than unaccented words.

Word stress. In our view, a word is perceived as a whole unit and not, for instance, as a string of phonemes which are recognized one by one and then merged into words. Following Klatt (1989), we believe that acoustic templates of words are matched with the acoustic input. The best match is established with a probabilistic strategy based on acoustic evidence as well as on other evidence, such as semantic and pragmatic sources of information. For a successful match it is probably not necessary that all parts of a word are pronounced equally well. Stressed syllables may serve as a kind of anchor point in this word recognition strategy (Van Bergem, 1990a). Hence stressed syllables may be more carefully pronounced than unstressed syllables.

Another point of interest with respect to word stress is that several studies have shown an 'informational' advantage of stressed syllables over unstressed ones, e.g. (Carter, 1987). This means that in most cases the number of word candidates is smaller when a search in the lexicon is based on the stressed syllables of words than when it is based on their unstressed syllables. Thus, the uniqueness of words in terms of their phonemic composition is mainly determined by their stressed syllables. A study by Altman and Carter (1989) has shown that this informational advantage of stressed syllables is caused by a greater variety of vowels used in these syllables. From an informational point of view, recognition of words could therefore be easier if a speaker clearly pronounced the stressed syllables (and especially the vowels they contain).

Word class (content words versus function words). Content words are essential to comprehend the meaning of an utterance. Function words, on the other hand, mainly serve to specify the relations between content words and make only a small contribution to the meaning of an utterance (Bolinger, 1975). In addition, function words usually have a high frequency of occurrence. Many investigations have shown that words with a high frequency of occurrence are more easily recognized than words with a low frequency of occurrence, e.g. (Morton, 1969; Forster, 1979). The high a priori probability of function words, as it were, lowers their 'recognition threshold'. This suggests that function words may be less carefully pronounced than content words.

The present experiment was restricted to the investigation of vowels, because they lend themselves well to spectral analysis in the sense that their spectral quality can be represented fairly well in a simple two-dimensional formant space, see e.g. (Pols et al., 1973). Our main goal was to find out how the formant frequencies (F1 and F2) and the duration of vowels are influenced by sentence accent, word stress, and word class. Apart from this major question we were also interested in the
relation between spectral vowel reduction and coarticulation, and in differences between speakers. Parts of the results from this experiment were reported earlier in (Van Bergem, 1990b, 1991a).

Phoneticians often refer to vowel reduction as "a loss of vowel quality", whereas linguists usually describe vowel reduction as "the substitution of a full vowel with a schwa". Consider the following pairs of semantically related words in which the stressed syllable in the words on the left has been replaced by an unstressed syllable in the words on the right (word stress in capitals; the underlined vowel is enclosed in underscores):

MAjor      m_a_JORity     (1)
MAriner    m_a_RINE       (2)
MInor      m_i_NORity     (3)
CAPture    c_a_pTIVity    (4)

According to the English dictionary (Collins, 1989), the underlined vowels in the words on the right are replaced by a schwa /ə/ in word pairs (1) and (2), but remain unchanged in word pairs (3) and (4) with respect to their counterparts on the left. This means that a native speaker will aim at producing a full vowel in the words "minority" and "captivity". However, the absence of primary word stress in the first syllable of these words may to some extent cause a change in vowel quality. In general, not only stress but also many other factors, such as sentence accent, word class, or speech style, may influence vowel quality. Whenever a loss of spectral quality occurs in vowels that were intended to be full vowels, we will refer to this phenomenon as acoustic vowel reduction. In word pairs (1) and (2), on the other hand, the full vowel in the words on the left is simply replaced by a schwa in the words on the right. That is, according to the standards of the linguistic community of English-speaking people, the words "majority" and "marine" are 'officially' pronounced with a schwa. We will refer to this phenomenon as lexical vowel reduction, for the schwa has become a characteristic part of the word.

Some researchers use the terms phonetic and phonological vowel reduction, e.g. (Fourakis, 1991), which have the same meaning as our terms acoustic and lexical vowel reduction. The term phonological vowel reduction refers to the process described by phonologists that causes unstressed vowels to be realized as schwas. This process applies to most unstressed vowels of (American) English, although there are exceptions (Ladefoged, 1982). However, as remarked by Nord (1986), for other languages such as Swedish there is no general phonological rule that changes a vowel phoneme into a schwa. This is also true for Dutch, where the schwa only occasionally replaces a full vowel in unstressed syllables. (It is, however, a very frequently occurring phoneme in Dutch affixes.) Therefore, we prefer to use the term lexical vowel reduction, which indicates that a schwa replacement is specific to particular words rather than to an entire sound system. As regards the term phonetic vowel reduction, we think that it is rather vague and that the term acoustic vowel reduction more accurately describes the change in vowel quality.

Lexical vowel reduction is probably a systematic extension of acoustic vowel reduction. A vowel that is often affected by acoustic reduction may gradually be replaced with a schwa by the linguistic community, and in the end this may lead to a permanent lexical substitution. During this process of language change two variants of the same word exist, one with the original full vowel and one with the schwa. Note that the lexical substitution of a full vowel with a schwa can only occur in languages that have accepted the schwa as a phoneme. The distinction between acoustic and lexical vowel reduction is discussed more thoroughly in (Van Bergem, 1991b).

The present study is exclusively concerned with acoustic vowel reduction. In all the test words used in the experiment described below, the vowel of interest could not be replaced by a schwa without the word sounding awkward and unnatural. This means that the speaker's intention would almost certainly be to produce a full vowel in the selected words and not a schwa. All references to vowel reduction in the following sections should therefore be read as acoustic vowel reduction.

The outline of this article is as follows. The experimental design and the measuring procedures are described in Sections 2 and 3, respectively. In Section 4 the results are presented. In Section 4.1 the effect of the experimental conditions on vowel quality is discussed. In Section 4.2 we focus on the
Vol. 12, No. 1, March 1993

topics of centralization and contextual assimilation. In Section 4.3 differences between speakers are discussed. In Section 4.4 the results of a listening experiment with the segmented vowels are given, and Section 4.5 deals with some simple reduction models. A general discussion and the conclusions are given in Sections 5 and 6, respectively.

2. Experimental design

The main aim of this study was to investigate how the spectral characteristics and the duration of vowels are influenced by sentence accent, word stress, and word class. We wanted to do this in speech that was as natural as possible. In order to be able to create experimental conditions in which the effect of these three factors could be established, we opted for sentences containing existing words (no 'nonsense' words) that were read aloud by a number of speakers. Our test material consisted of CVC syllables that exist as
- a monosyllabic function word,
- an unstressed syllable in a content word,
- a stressed syllable in a content word.
It should be noted that all function words in this experiment are stressed, because monosyllabic words by definition have primary word stress.

Although only Dutch words were used in the actual experiment, we will illustrate the experimental procedure with the English syllable "can", which exists as
- a function word "can" (an auxiliary verb),
- an unstressed syllable in the content word "canteen",
- a stressed syllable in the content word "candy".
By using identical syllables in such triplets of words, coarticulation effects would be similar for the vowel of interest (the /æ/ in the English example) in all experimental conditions.

Each word was placed in a sentence that was read aloud by 15 male speakers. The sentences were constructed in such a way that each test syllable occurred in about the same position in all experimental conditions. All sentences were read twice, once with an accent on the test word and once with an accent elsewhere in the sentence. The place of the sentence accent was manipulated by means of questions, which had been recorded in advance. The speakers heard the questions through a loudspeaker and then answered by reading the corresponding test sentences in an appropriate way. Apart from the 6 test conditions created in this way (3 words with and without accent), a 7th condition was added in which the speakers uttered the test syllable in isolation. This condition was used to obtain a kind of 'ideal' pronunciation of the vowel in each test syllable. Test sentences for the (fictitious) English example could be as follows. (For the sake of clarity, word stress is printed in capitals and sentence accent is marked with asterisks in the examples below. This was not done in the reading list of the actual experiment.)
1. (What did you buy for your mother?)
   I bought *CANdy* for my mother.
2. (For whom did you buy candy?)
   I bought CANdy for my *mother*.
3. (Where do they sell beer?)
   In our *canTEEN* they sell beer.
4. (What do they sell in our canteen?)
   In our canTEEN they sell *beer*.
5. (What can your sister do for hours?)
   My sister CAN *talk* for hours.
6. (How long can your sister talk?)
   My sister CAN talk for *hours*.
7. CAN (spoken in isolation).

In the case of function words, the sentence accent was never placed directly on the function word itself. In a normal speech situation accented function words are rather exceptional (Kruyt, 1985). In addition, it is very difficult to think of questions that could lead to an accentuation of function words. What question could, for instance, be posed that would lead in a natural way to an accented "can" in the sentence "My sister can talk for hours"? This problem could of course have been avoided by printing test words in bold or in capitals in order to evoke an accentuation, but in our view this would focus the speaker's attention too much on these words and would lead to a less natural pronunciation. Instead of trying to make a condition with accented function words, we therefore attempted to create the same rhythmical pattern in the word sequence containing a function word as in content words with an unstressed test syllable. This was done by placing an accent on the word following (or sometimes preceding) the function word. (Compare "can talk" and "canteen" in the
examples given above.) In this way we could find out whether vowel quality in these cases is influenced by the rhythm of the utterance or by semantic aspects, as explained below.

The actual realization of the pitch movement causing the perception of accent, e.g. ('t Hart et al., 1990), occurs on the syllable with primary word stress. Thus, for the word "canteen" the syllable "teen" gets a pitch accent, but nevertheless the entire word is highlighted. A pitch accent on the word "talk" in the sequence "can talk" gives rise to a rhythmical pattern similar to that in the accented word "canteen", but only the word "talk" will be highlighted, not the function word "can". Does the similar rhythmical structure due to accentuation in "canteen" and "can talk" influence the quality of the vowel /æ/ in the same way, or does the different semantic structure cause differences in vowel quality?

In summary, apart from the control condition "syllables spoken in isolation", the test conditions contain three different syllable types combined with the presence or absence of sentence accent:
1. stressed syllable in an accented content word (S-A),
2. stressed syllable in an unaccented content word (S-NA),
3. unstressed syllable in an accented content word (NS-A),
4. unstressed syllable in an unaccented content word (NS-NA),
5. function word followed (or preceded) by an accented content word (F-A),
6. function word followed (or preceded) by an unaccented content word (F-NA),
7. syllable spoken in isolation (ISO).

Appropriate test words were collected in the following way. First, all possible CVC syllables that exist as a function word in the Dutch language were selected. Although function words are said to form a closed set of words, there is no general agreement about which words belong to this set. In a strict sense, function words are all words that belong to the closed grammatical classes such as articles, prepositions, auxiliary verbs, etc. However, several of those words are rather formal or archaic and are hardly ever used. For practical purposes some authors, e.g. (Van Coile, 1987), therefore prefer to define function words on the basis of a high frequency of occurrence, regardless of the grammatical class they belong to. For the present experiment two inventories of function words were consulted. Van Wijk and Kempen (1980) list all the function words according to their strict definition. The inventory of Baart (1987) is smaller, but he incorporated a small number of high-frequency verbs. We selected all the function words mentioned in these two inventories that had a CVC structure as well as a frequent usage. After this selection, the Dutch lexical database CELEX (1985) was searched for content words in which any of these CVC syllables occurs. For each CVC syllable two different types of content words had to be found, one with a stressed test syllable and another with an unstressed test syllable. A total of 33 appropriate triplets of words were gathered in this way. The content words always contained either two or three syllables. According to metrical phonology, at least three degrees of stress can be assigned to the syllables in a word: primary stress, secondary stress, and no stress (Booij, 1981). The 'unstressed' syllables in the present experiment did not have primary stress and in almost all cases no secondary stress either.

For all these words test sentences and questions were constructed. The list of questions was read aloud by a male speaker who had no knowledge of the aims of the experiment. These prerecorded questions were subsequently presented to the 15 male speakers at intervals of 5 seconds, allowing them to respond by reading the test sentences. The order of the test sentences (and the corresponding preceding questions) was random within each test condition; the order of the test conditions was also random. The random ordering of test sentences and test conditions was different for each speaker. The speech was recorded on the audio channel of a Panasonic video recorder in an anechoic room.

The total number of vowels to be analysed was 3465 (15 speakers × 7 test conditions × 33 test syllables). Not all 12 Dutch monophthongs occurred in the test material, and some occurred more frequently than others (in agreement with their frequency of occurrence in Dutch). The vowels that were absent were /œ, y, ø:/, which are rather exceptional in Dutch words. (Together they

constitute less than 5% of all vowel occurrences.) A complete list of all triplets of words that were used in the experiment is given in Table 1.

Table 1
Complete list of all 33 triplets of words that were used in the present experiment

Vowel  IPA    Function word  Unstressed syllable  Stressed syllable
u      mut    moet           weemoed              vermoedt
ɔ      kɔm    kom            kompas               komma
ɔ      kɔn    kon            consult              consul
ɔ      tɔx    toch           hertog               tochtig
ɑ      dɑn    dan            danseres             pedante
ɑ      kɑn    kan            kantine              kandelaar
ɑ      lɑx    lag            glimlach             gelach
ɑ      mɑx    mag            magneet              machtig
ɑ      pɑs    pas            pastoor              passer
ɑ      vɑn    van            vandalen             havanna
ɑ      wɑs    was            afwas                gewas
ɛ      bɛn    ben            benzine              bendelid
ɛ      hɛp    heb            hebzuchtig           hebzuchten
ɛ      mɛn    men            mentaal              mentor
ɛ      pɛr    per            perceel              perzik
ɛ      wɛl    wel            weldadig             weldaden
ɛ      zɛt    zet            inzet                bezet
ɪ      lɪx    lig            voorlichting         verlichting
ɪ      wɪl    wil            knotwilgen           verwilderd
ɪ      zɪx    zich           inzichten            voorzichtig
ɪ      zɪt    zit            bijzit               bezit
i      lit    liet           getralied            satelliet
i      nit    niet           deugniet             graniet
o:     do:r   door           doortastend          doorsnede
o:     vo:r   voor           voordelig            voordelen
a:     da:r   daar           standaarden          bedaarde
a:     la:t   laat           uitlaat              verlaat
a:     na:r   naar           evenaar (noun)       evenaar (verb)
a:     wa:r   waar           waardering           waardeloos
e:     he:l   heel           heelkundig           heelkunde
e:     me:r   meer           zeemeermin           kalmeerde
e:     ve:l   veel           veelzijdig           veelvouden
e:     we:r   weer           onweer               geweer

3. Measuring procedures

The test sentences were lowpass filtered at 4.5 kHz and digitized at a sample rate of 10 kHz with 12-bit precision. The vowels to be analysed were segmented from the speech waveform together with a considerable part of the surrounding consonants. These segments were analysed with a 25 ms Hamming window that was shifted in steps of 2 ms. On each analysis frame a 10th-order LPC analysis was performed, and subsequently continuous formant tracks were obtained with a method due to Willems (1986) in which the Split-Levinson algorithm is used. All formant frequencies were transformed to a mel scale, which is more in line with the hearing process than a linear frequency scale. In this way formant frequencies were weighted appropriately for the calculation of Euclidean distances in the F1-F2 plane. For the mel transformation we used the formula

$$ m = 2595 \log_{10}\left(1 + \frac{f}{700}\right), $$

where f is the formant frequency in Hz and m the mel-converted value (Makhoul and Cosell, 1976).

The onset and the offset of the vowels, which defined their duration, were determined by measuring the spectral variation over time. For this purpose we used cepstral coefficients, which turned out to be more sensitive than formant frequencies. From the 10 LPC coefficients, 24 cepstral coefficients were calculated (Markel and Gray, 1976), and these 24 values were in turn transformed to 8 coefficients on a mel-like scale with a bilinear transformation (Oppenheim and Johnson, 1972). Such bilinearly transformed cepstral coefficients are successfully applied in automatic speech recognition systems, e.g. (Lee, 1989). The spectral variation of the speech signal at the k-th frame was calculated over a (2L+1)-frame interval [k-L, k+L]. We chose L = 5, so that the frame interval was 22 ms. The spectral variation V(k) at the k-th frame was defined as the standard deviation within the frame interval,

$$ V(k) = \sqrt{\frac{1}{N}\sum_{i=k-5}^{k+5}\sum_{j=1}^{8}\left(c_{ij}-\bar{c}_{j}\right)^{2}}, $$

in which N denotes the total number of cepstral coefficients in the frame interval (8 × 11 = 88), $c_{ij}$ is the j-th mel-based cepstral coefficient of the i-th frame, and $\bar{c}_{j}$ is the mean value of the 11 j-th cepstral coefficients within the frame interval. The contour V(k) was subsequently smoothed by filtering it repeatedly (5 times) with a 3-point Hanning window. This procedure for measuring spectral variation resembles the one described by Svendsen and Soong (1987).
[Figure 1: the word "KAN", with segmentation markers O, M, S, O]
Fig. 1. Example of a vowel segmentation: the /ɑ/ is segmented from the function word /kɑn/. The bottom panel shows the oscillogram, the middle panel the spectral variation contour V(k) (the segmentation threshold at a value of 0.075 is indicated with a dashed line), and the top panel three formant tracks on a mel scale. For an explanation of the segmentation procedure, see text.
As an example, Figure 1 shows the spectral variation in the Dutch function word "kan" (the auxiliary verb "can"). In the lower part of Figure 1 the oscillogram is shown, in the middle part the spectral variation contour V(k), and in the upper part three formant tracks on a mel scale. The onset and the offset of the vowels were found with a semi-automatic procedure that consisted of two steps:
1. The search for a starting point within the vowel part of the speech segment. The position in the speech segment with the highest instantaneous amplitude is proposed as a starting point (marker 'M' (maximum) in Figure 1). In most cases this position is located somewhere within the vowel and is then accepted as a starting point. If not, the user must overrule the proposed starting point and indicate any position within the vowel part as a new starting point.
2. The search for the onset and the offset of the vowel. Boundaries between vowels and consonants are characterized by rapid spectral changes, which result in peaks in the spectral variation contour V(k). Spectral transitions within the vowel are more gradual and result in lower values in the spectral variation contour. Once a starting point within the vowel has been established, the procedure continues by tracing the spectral variation contour for peaks on either side of the starting point and defines these peak positions as the onset and the offset of the vowel (markers 'O' in Figure 1). The peak value has to surpass a threshold value in order to be selected (we used a threshold value of 0.075, which was determined by trial and error). This threshold is indicated with a dashed line in Figure 1. If the threshold value is higher than all occurring peaks on one side of the starting point, it is lowered in small steps until a peak is found.

The proposed vowel boundaries were carefully checked by listening to selected parts of the speech signal and by visual inspection of the speech waveform, the contour V(k), and the formant tracks. In

about 9% of all cases a proposed boundary was rejected. This occurred when the spectral variation within the vowel was large enough to produce a peak above the threshold, or when the transition from consonant to vowel or from vowel to consonant was too gradual to produce a clear peak and a larger peak near the edges of the speech segment was chosen instead by the algorithm. In these cases the markers were placed by hand at a more appropriate local maximum in the spectral variation contour V(k).

The steady-state vowel part was subsequently defined as that part of the vowel where a minimum value in the spectral variation contour occurred between the onset and the offset of the vowel (marker 'S' in Figure 1). Only occasionally was this position hand-corrected (in about 6% of all cases). This occurred for vowels with a long duration that sometimes contained more than one steady-state part. In such cases the most appropriate steady-state part was determined by looking at global minima and maxima in the formant tracks.

[Figure 2a: "Short vowels", F1 against F2 in mel (with Hz scales); legend S-A, NS-A, F-A, S-NA, NS-NA, F-NA]
Fig. 2a. Formant plot (F1 and F2) in mel and in Hz of the short vowels for 7 test conditions (see text), averaged over 15 speakers and all consonantal contexts. Formant frequencies were measured at the steady-state vowel part.
4. Results

4.1. Test conditions

4.1.1. Steady-state formant frequencies

[Figure 2b: "Long vowels", F1 against F2 in mel (with Hz scales); legend S-A, NS-A, F-A, S-NA, NS-NA, F-NA]
Fig. 2b. Formant plot (F1 and F2) in mel and in Hz of the long vowels for 7 test conditions (see text), averaged over 15 speakers and all consonantal contexts. Formant frequencies were measured at the steady-state vowel part.

In Figure 2a the steady-state formant frequencies (F1 and F2) of the Dutch short vowels /u, ɔ, ɑ, ɛ, ɪ, i/ in each test condition have been plotted, averaged over all consonantal contexts and all speakers. In Figure 2b the same has been done for the long vowels /o:, a:, e:/. The average positions of vowels from test syllables spoken in isolation (condition ISO), which we regard as 'ideal' vowels in those specific contexts, are indicated with an asterisk. It can be seen that in the other test conditions the mean formant frequencies of vowels shift away from the 'ideal' target position in the direction of the schwa. The position of the schwa was determined by averaging the formant frequencies of schwas uttered by the 15 speakers. For this purpose the first 20 schwas that were encountered in the test sentences of each speaker were analysed. Since each speaker had a different random ordering of test sentences, the schwas were taken from different words for each speaker. The gathering of schwa
items posed no problems, because the schwa is a frequently occurring phoneme in Dutch (especially in affixes).
In order to determine the effect of the test conditions on the formant shifts, the mel-scaled Euclidean distances in an F1-F2 space were calculated between each vowel item and its 'ideal' counterpart. The mean distances for each test condition, averaged over all test syllables and all speakers, are shown in Figure 3. This figure reveals that the distance to the 'ideal' vowel position increases in the order "stressed syllable"-"unstressed syllable"-"function word". It can also be seen that the distance to the 'ideal' vowel position is larger for unaccented vowels than for accented vowels.
An analysis of variance with repeated measures was done on these distances with the factor "speaker" as independent variable (15 levels) and two trial factors: "accent" (2 levels) and "syllable type" (3 levels). The factor "speaker", which was significant (F = 7.6, p < 0.001), will be discussed in Section 4.3. The factors "accent" (F = 127.0, p < 0.001) and "syllable type" (F = 217.0, p < 0.001) were also significant. In Figure 3 it can be seen that the factor "syllable type" shows larger shifts of formant frequencies than the factor "accent". This means that word stress and word class ("syllable type") have a greater effect on the spectral quality of vowels than sentence accent. This is especially so if we consider that a 'spontaneous' accentuation of words by speakers, which is most common in conversational speech, is usually even weaker than an accentuation evoked by questions. The interaction between "accent" and "syllable type" was also significant (F = 11.4, p < 0.001). Tests for pairwise contrasts between conditions showed that the conditions NS-NA, F-A and F-NA were not significantly different from each other (p > 0.05). From now on we will refer to the strongly reduced vowels from these three test conditions together as the group RED.
As explained in Section 2, accentuation was used to create a similar rhythmical pattern in the word sequence containing a function word ("can talk") and in content words with an unstressed test syllable ("canteen"). The significant difference between the conditions NS-A and NS-NA shows that the quality of the vowel in the unstressed test syllable of a content word (the /æ/ in "canteen") is influenced by accentuation of this word, i.e. by the presence or absence of an accent on the stressed syllable "teen". The lack of a significant difference between the conditions F-A and F-NA, on the other hand, shows that the presence or absence of an accent on the word following the function word ("talk") has no influence on the quality of the vowel in the function word (the /æ/ in "can"). This can be explained by
the fact that the entire content word "canteen" as a semantic unit is highlighted by accentuation, whereas accentuation in the word sequence "can talk" only highlights the word "talk" and not the function word "can". So it appears that spectral vowel reduction is governed by semantics rather than by rhythm in this particular case.
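The distance measure can be sketched as follows. The article does not say which mel formula was used, so the common conversion m = 2595 log10(1 + f/700) is an assumption here. The helper names and the (condition, key, point) data layout are also illustrative rather than taken from the study. Because the averaging helper only works with Euclidean distances between 2-D points, the same function would also serve for the curvature distances of Section 4.1.2. In that case each point would hold the c2 coefficients of the F1 and F2 tracks instead of mel frequencies.

```python
import math
from collections import defaultdict


def hz_to_mel(f_hz):
    """Assumed mel conversion m = 2595 * log10(1 + f / 700).

    The article does not state which mel formula was used."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_point(f1_hz, f2_hz):
    """Map an (F1, F2) pair in Hz to a point in the mel-scaled F1-F2 plane."""
    return (hz_to_mel(f1_hz), hz_to_mel(f2_hz))


def mean_distance_per_condition(tokens, ideal):
    """Average Euclidean distance of each token to its 'ideal' (ISO) counterpart.

    tokens : iterable of (condition, key, point); key identifies speaker and
             test syllable, point is a 2-D tuple
    ideal  : dict mapping key -> point of the matching ISO token
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for condition, key, point in tokens:
        total[condition] += math.dist(point, ideal[key])
        count[condition] += 1
    return {c: total[c] / count[c] for c in total}


# Hypothetical measurements (Hz) for one speaker and one test syllable.
ideal = {("S1", "wel"): mel_point(600, 1800)}
tokens = [
    ("S-A", ("S1", "wel"), mel_point(600, 1800)),  # identical to the ISO token
    ("F-NA", ("S1", "wel"), mel_point(420, 1150)),
]
print(mean_distance_per_condition(tokens, ideal))
```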
[Figure 3: "AVERAGE FORMANT DISTANCE PER CONDITION", distance (mel) against syllable type S, NS, F, with separate lines for A and NA]
Fig. 3. Average Euclidean distances (F1 and F2) in mel between vowels from each test condition and their 'ideal' counterpart (vowels from syllables spoken in isolation). Syllable types are stressed syllables (S), unstressed syllables (NS) and function words (F). These can occur in accented words (A), or in unaccented words (NA).

4.1.2. Dynamics of the formant tracks
Apart from shifts in the steady-state formant frequencies, we also wanted to study the changes in the dynamics of the formant tracks (F1 and F2) as a result of the experimental conditions. In order to measure changes in the curvature, all formant tracks were modelled with a second-degree polynomial,
$$F(t) = c_0 + c_1 t + c_2 t^2, \qquad -1 \le t \le 1.$$
The coefficient c2 is a measure of the amount of (parabolic) curvature of the tracks. For a proper comparison of the curvature in formant tracks
from different vowels, each track was 'time normalized' by fitting it with the polynomial on the interval [-1, 1].
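The following is a minimal sketch of the time-normalized fit. It assumes equally spaced analysis frames. `fit_track` is a hypothetical helper built on NumPy's least-squares `polyfit`; it is not the author's code. Mapping every track onto [-1, 1] makes c2 comparable between tracks of different duration.

```python
import numpy as np


def fit_track(freqs):
    """Fit F(t) = c0 + c1*t + c2*t**2 to one formant track.

    The frames are assumed to be equally spaced in time; they are mapped onto
    t in [-1, 1] ('time normalization') before the least-squares fit.
    Returns (c0, c1, c2); c2 < 0 is concave downward, c2 > 0 concave upward.
    """
    freqs = np.asarray(freqs, dtype=float)
    if freqs.size < 3:
        raise ValueError("a second-degree fit needs at least 3 frames")
    t = np.linspace(-1.0, 1.0, freqs.size)
    c2, c1, c0 = np.polyfit(t, freqs, deg=2)  # polyfit lists the highest power first
    return c0, c1, c2


# A synthetic F1 track (Hz) rising into the vowel and falling again.
t = np.linspace(-1.0, 1.0, 11)
track = 550.0 + 20.0 * t - 150.0 * t**2
print(fit_track(track))  # recovers c0 = 550, c1 = 20, c2 = -150 up to rounding error
```

Because the fit is an ordinary least-squares polynomial, a track that follows the same parabola gives the same coefficients whether it is sampled at 11 or at 21 frames. This independence from track length is what the time normalization is meant to achieve.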
In Figure 4a the coefficients c2 for the formant tracks of the short vowels /u, ɔ, ɑ, ɛ, ɪ, i/ have been plotted, averaged over all consonantal contexts and all speakers. In Figure 4b the same has been done for the long vowels /o:, a:, e:/. In order to get the usual configuration of vowels, we plotted -c2 in the figures. For the sake of clarity we have only shown the coefficients for the condition ISO (indicated with asterisks) and the group RED (indicated with dots), because the complete plots with curvature coefficients from all test conditions were somewhat cluttered. The appearance of the 'vowel triangle' in Figures 4a and 4b indicates that the curvature of formant tracks is specific to each vowel. A similar observation was made by Van Son and Pols (1991). As might be expected, the curvature of F1-tracks increases in more open vowels. The curvature of F2-tracks is concave downward for front vowels and concave upward for back vowels. The figures show a reduction in the curvature of formant tracks for the group RED compared to the condition ISO (if c2 = 0 the track has no curvature at all). This means that the formant tracks become flatter. For the schwa the curvature of F1 and F2 is almost zero, so the formant tracks of the schwa approach a straight line between the onset and the offset.

[Figure 4a, panel "SHORT VOWELS": -c2 for F2 plotted against -c2 for F1]
Fig. 4a. Curvature of formant tracks (F1 and F2) of the short vowels in the conditions ISO (asterisks) and RED (dots) averaged over 15 speakers and all consonantal contexts. The curvature is indicated with the coefficient -c2 from the polynomial fit of the formant tracks: $F(t) = c_0 + c_1 t + c_2 t^2$.
[Figure 4b, panel "LONG VOWELS": -c2 for F2 plotted against -c2 for F1]
Fig. 4b. Curvature of formant tracks (F1 and F2) of the long vowels in the conditions ISO (asterisks) and RED (dots) averaged over 15 speakers and all consonantal contexts. The curvature is indicated with the coefficient -c2 from the polynomial fit of the formant tracks: $F(t) = c_0 + c_1 t + c_2 t^2$.

A comparison of Figures 4a and 4b with Figures 2a and 2b suggests that the shift of steady-state formant frequencies coincides with a reduced curvature of the formant tracks. Lindblom (1963) refers to this phenomenon as "formant undershoot". If the curvature is concave downward, we observe a decrease in steady-state formant frequencies of reduced vowels. If, on the other hand, the curvature is concave upward, we observe an increase in steady-state formant frequencies of reduced vowels. For F1-tracks the curvature is always concave downward, which agrees with the fact that steady-state F1-values of reduced vowels do not increase (see also Van Bergem, 1989; Van Son and Pols, 1990, 1992). The reduction in formant track curvature for different vowels shown in Figures 4a and 4b is thus clearly related to the shift of steady-state formant frequencies shown in Figures 2a and 2b.
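The link between flatter tracks and shifted steady-state values can be made explicit with the polynomial of Section 4.1.2. The short derivation below rests on two simplifying assumptions. First, the steady-state value is read at the midpoint t = 0, whereas the 'S' marker need not lie exactly there. Second, the track endpoints stay where they are, which corresponds to model II discussed in Section 4.2.

$$
\begin{aligned}
F(t) &= c_0 + c_1 t + c_2 t^2, \\
F(-1) &= c_0 - c_1 + c_2, \qquad F(1) = c_0 + c_1 + c_2, \\
F(0) &= c_0 = \tfrac{1}{2}\bigl[F(-1) + F(1)\bigr] - c_2 .
\end{aligned}
$$

With fixed endpoints, flattening the track (c2 moving towards 0) moves F(0) towards the mean of the endpoint values. A concave-downward track (c2 < 0) therefore loses height in its middle, so its steady-state value falls. A concave-upward track (c2 > 0) gains height, so its steady-state value rises. This is the same direction of shift as stated above.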
In order to determine the effect of each test condition on the curvature, the Euclidean distances in a two-dimensional c2-space were calculated between each vowel and its 'ideal' counterpart (in the same design as was used for the steady-state formant frequencies). The mean distances for each test condition, averaged over all test syllables and all speakers, are shown in Figure 5. This figure shows the same trend in distance for the curvature of formant tracks as Figure 3 did for steady-state formant frequencies. The distance to the 'ideal' vowel curvature increases in the order "stressed syllable"-"unstressed syllable"-"function word". It can also be seen that the distance to the 'ideal' vowel curvature is larger for unaccented vowels than for accented vowels. The factor "syllable type" shows a greater flattening effect on the tracks than the factor "accent".
An analysis of variance with repeated measures was done with the same design that was used in the previous section. The factor "speaker", which was significant (F = 2.7, p < 0.001), will be discussed in Section 4.3. Both the factors "accent" (F = 23.9, p < 0.001) and "syllable type" (F = 80.0, p < 0.001) were significant. The interaction between "accent" and "syllable type" was not significant (p > 0.05). Tests for pairwise contrasts between conditions showed that the conditions NS-NA, F-A and F-NA (the group RED) were again not significantly different from each other (p > 0.05).

[Figure 5: "AVERAGE CURVATURE DISTANCE PER CONDITION", distance against syllable type S, NS, F, with separate lines for A and NA]
Fig. 5. Average Euclidean curvature distances (F1 and F2) between vowels from each test condition and their 'ideal' counterpart (vowels from syllables spoken in isolation). Syllable types are stressed syllables (S), unstressed syllables (NS) and function words (F). These can occur in accented words (A) or in unaccented words (NA).

4.1.3. Durations
The average durations of vowels in each test condition are shown in Figure 6. This figure reveals that the durations of vowels decrease in the order "stressed syllable"-"unstressed syllable"-"function word". It can also be seen that unaccented vowels are shorter than accented vowels. The factor "syllable type" has a larger influence on vowel durations than the factor "accent". Notice that Figure 6 is the mirror image of Figures 3 and 5.

[Figure 6: "AVERAGE DURATION PER CONDITION", duration (ms) against syllable type S, NS, F, with separate lines for A and NA]
Fig. 6. Average durations in ms for each of the 6 'natural' test conditions. Syllable types are stressed syllables (S), unstressed syllables (NS) and function words (F). These can occur in accented words (A) or in unaccented words (NA).

An analysis of variance with a design similar to that of the previous sections was done on vowel durations. The factor "speaker", which was significant (F = 3.9, p < 0.001), will be discussed in Section 4.3. Both "accent" (F = 60.3, p < 0.001) and "syllable type" (F = 228.8, p < 0.001) were significant factors. The interaction between "accent" and "syllable type" was also significant (F = 6.3, p < 0.005). Tests for pairwise contrasts between conditions showed that the conditions NS-NA, F-A and F-NA (the group RED) were again not significantly different from each other (p > 0.05).
In Table 2 mean durations in all conditions are given for short vowels and long vowels separately. It appears that long vowels in particular are affected by the test conditions. An analysis of variance on the durations of short vowels and long vowels
separately revealed that for both vowel groups the factors "accent" and "syllable type" were significant (p < 0.001).

Table 2
Mean durations in ms of short vowels and long vowels separately in the 6 'natural' test conditions averaged over 15 speakers

                  Short vowels          Long vowels
Syllable type     + accent  - accent    + accent  - accent
+ stress          82        82          150       131
- stress          76        71          104       93
func              70        67          91        88
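A quick arithmetic check on Table 2 illustrates why long vowels in particular are said to be affected. The sketch compares the longest and the shortest condition within each vowel group. This comparison of extremes is chosen here for illustration and is not an analysis reported in the article; the dictionary layout and function name are hypothetical.

```python
# Mean vowel durations in ms, copied from Table 2:
# (syllable type, accent) -> duration, separately for short and long vowels.
TABLE2 = {
    "short": {("+stress", "+acc"): 82, ("+stress", "-acc"): 82,
              ("-stress", "+acc"): 76, ("-stress", "-acc"): 71,
              ("func", "+acc"): 70, ("func", "-acc"): 67},
    "long": {("+stress", "+acc"): 150, ("+stress", "-acc"): 131,
             ("-stress", "+acc"): 104, ("-stress", "-acc"): 93,
             ("func", "+acc"): 91, ("func", "-acc"): 88},
}


def shortening(durations):
    """Absolute (ms) and relative (%) drop from the longest to the shortest condition."""
    longest, shortest = max(durations.values()), min(durations.values())
    return longest - shortest, 100.0 * (longest - shortest) / longest


for group, durations in TABLE2.items():
    ms, pct = shortening(durations)
    print(f"{group} vowels: {ms} ms shorter, {pct:.1f}%")
# short vowels: 15 ms shorter, 18.3%
# long vowels: 62 ms shorter, 41.3%
```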
4.2. Centralization or contextual assimilation
Many acoustic studies on vowel reduction (e.g. Stålhammar et al., 1973; Koopmans-Van Beinum, 1980; the present study) suggest that reduced vowels move towards a more neutral place in the vowel plane, usually referred to as the schwa position. Is this centralization effect a phenomenon in its own right, or is it the result of a contextual assimilation of the vowel with surrounding phonemes? Delattre (1969) suggests that all vowels have a tendency to become a schwa (i.e. to centralize), regardless of the effect of surrounding phonemes. Moreover, he claims that in some languages (for instance English) this tendency is stronger than in other languages (for instance Spanish). However, Van Bergem (1991b) showed that Delattre confused lexical vowel reduction with acoustic vowel reduction. In a study on vowel reduction in Swedish, Nord (1975) concluded that vowel reduction is caused by an increased contextual assimilation of the vowel with surrounding phonemes. In an extended version of his study this conclusion was further supported (Nord, 1986). The same view was taken by Lindblom (1963).
To gain a better view of the role of the surrounding consonants in the CVC syllables of the present study, the shift of steady-state formant frequencies was examined in detail for the vowel /ɛ/ in the six consonantal frames in which it occurs. In Figure 7 vowels from the condition ISO (indicated with asterisks) and the group RED (indicated with dots) are compared. Apparently, the shift of formant frequencies in each particular /ɛ/-item depends on the consonantal frame. Notice, for instance, that the F2-value of the /ɛ/ from the syllable /wɛl/ shifts to a frequency of about 1100 Hz, which gives an /ɑ/-like sound. Apparently, the formant pattern for this particular /ɛ/ is not centralizing. The observed effect is most likely caused by an increased assimilation of the vowel with the surrounding consonants /w/ and /l/. A similar strong assimilation effect in the syllable /wɛl/ was found by Pols (1977), and for English by Lindblom and Moon (1988). The F2-value of the /ɛ/ from the syllable /hɛp/, on the other hand, shifts to a slightly higher value, which makes it an /ɪ/-like sound. In general, both the amount and the direction of the shift of steady-state formant frequencies in reduced vowels seem to be specific to the consonantal frame in which they occur. The clear 'centralization' observed in Figures 2a and 2b is the result of averaging different contextual influences on the vowels. The reduced formant frequencies of the vowels in one particular CVC syllable do not necessarily centralize.

[Figure 7: "THE VOWEL /ɛ/", F2 plotted against F1 in mel and Hz; /ɛ/ tokens labelled hep, ben, men, zet, per, wel]
Fig. 7. Shift of formant frequencies (F1 and F2) for the vowel /ɛ/ in 6 different consonantal frames. Vowels from the condition ISO (asterisks) and the group RED (dots) are compared. The vowels /i, u, a:/, adopted from Figures 2a and 2b, have been added for reference only.

Is centralizing the same as becoming a schwa? People who think it is regard the schwa as a vowel that is produced with a completely neutral vocal tract (a uniform tube). For an average male such
a vowel would have an F1 and F2 of 500 Hz and 1500 Hz, respectively (Fant, 1960). In this case the schwa position more or less coincides with the centre of the vowel triangle. However, we recently measured the formant frequencies of schwas in systematically varying phonemic surroundings. We found that the schwa can appear almost anywhere in the vowel plane, depending on its context (Van Bergem, Submitted). In our view, the schwa is not a vowel with a target in the centre of the vowel triangle. Rather, it is a vowel-like segment without a target that is completely assimilated with its phonemic context. Suppose reduced vowels are regarded as vowels that have been partially assimilated with their phonemic context. Then the formant frequencies of reduced vowels shift towards the position in the vowel plane that a schwa would have in the same phonemic context. As an illustration of this idea, we measured the F1 and F2 of the schwa in the word "gruwel" (/ɣrywəl/), where the schwa occurs in the same consonantal context as the /ɛ/ from the test syllable /wɛl/. For this purpose we asked 5 speakers who had been subjects in the present experiment (and who were still available) to read a short sentence containing the word "gruwel" ("gruel" in English). The average F1 for this schwa was 346 Hz and the average F2 was 940 Hz. As discussed above, the reduced /ɛ/ from the test syllable /wɛl/ did not centralize, but shifted to an /ɔ/-like position in the vowel plane (more specifically, to an F1 of 393 Hz and an F2 of 1146 Hz). This shows that the /ɛ/ in /wɛl/ is reducing towards the formant pattern of a schwa in the same consonantal context.
The covariation of shifts in steady-state formant frequencies and the flattening of the formant tracks supports the "undershoot" concept suggested by Lindblom (1963). However, spectral undershoot can in principle be accounted for by three different models, as shown in Figure 8. The thick line in each model indicates the 'ideal' formant track of a vowel in a CVC syllable and the thin line the same formant track in its reduced form. For a proper understanding of the models, note that assimilation occurs even in clearly pronounced CVC syllables (represented by the condition ISO in the present experiment). There, the vowel can to some extent be assimilated with the consonants, and the consonants with the vowel. That is, due to contextual influences the vowel and consonants may not reach their (context-free) canonical target positions (Daniloff and Hammarberg, 1973). This is why the (quasi-)target formant frequencies of, for instance, the /ɛ/ in /wɛl/ are different from those in other syllables containing the /ɛ/ (see Figure 7).
In all three models shown in Figure 8 the steady-state formant frequencies of the reduced vowels show "undershoot", indicating an increased assimilation of the vowel with the surrounding consonants. However, the models differ considerably in the role that the surrounding consonants play in the reduction process. In model I the formant onset and offset of the reduced vowel shift towards the vowel target, which means that the surrounding consonants are assimilating more with the vowel. In model II the formant onset and offset of the reduced vowel stay put, indicating a stationary assimilation of the consonants with the vowel. In model III the formant onset and offset of the reduced vowel shift away from the vowel target. This means that the consonants are breaking away from the influence of the vowel and are probably moving to their own canonical target positions. Although the effects on the formant onset and offset have been presented symmetrically in the models, in practice onset and offset may each follow a different model.

[Figure 8: three schematic panels, Model I, Model II, Model III]
Fig. 8. Schematic picture of three "undershoot" models (see text). The thick line in each model indicates the 'ideal' formant track of a vowel in a CVC syllable and the thin line the same formant track in its reduced form.

The plausibility of each model was tested with the present dataset. In order to get robust estimates of the onset and the offset of the formant tracks, the second-degree polynomial approximations were used (see Section 4.1.2). Since the polynomials were fitted on the interval [-1, 1], the onset frequency could be determined by inserting t = -1 and the offset frequency by inserting t = 1 in the polynomial equation:
$$f_{\mathrm{onset}} = c_0 - c_1 + c_2, \qquad f_{\mathrm{offset}} = c_0 + c_1 + c_2.$$
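The endpoint evaluation above can be combined with the assignment rule given in the next paragraph. Under that rule, a boundary shift of less than 25 mel means model II. Otherwise, opposite directions of the boundary shift and the steady-state shift mean model I, and equal directions mean model III. The sketch below makes three assumptions: all values are already in mel, the steady-state values are passed in separately as measured at the 'S' marker, and the function names are hypothetical.

```python
def endpoints(c0, c1, c2):
    """Onset (t = -1) and offset (t = 1) values of F(t) = c0 + c1*t + c2*t**2."""
    return c0 - c1 + c2, c0 + c1 + c2


def assign_model(target_ss, reduced_ss, target_edge, reduced_edge, threshold=25.0):
    """Assign one vowel boundary (onset or offset) of one formant to model I, II or III.

    All values are in mel. target_* come from the ISO track, reduced_* from the
    RED track; *_ss are steady-state values, *_edge are onset or offset values.
    """
    edge_shift = reduced_edge - target_edge
    ss_shift = reduced_ss - target_ss
    if abs(edge_shift) < threshold:
        return "II"   # boundary stays put
    if edge_shift * ss_shift < 0:
        return "I"    # boundary moves opposite to the undershoot, i.e. towards the vowel target
    # Same direction as the undershoot. A zero steady-state shift also ends up
    # here; the article does not discuss that case.
    return "III"


# Hypothetical F1 tracks in mel: ISO target versus reduced token.
iso_onset, iso_offset = endpoints(600.0, 0.0, -150.0)    # (450.0, 450.0)
red_onset, red_offset = endpoints(540.0, 0.0, -60.0)     # (480.0, 480.0)
print(assign_model(600.0, 540.0, iso_onset, red_onset))  # I
```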
Table 3
Percentages of assignments to each of the three models shown in Figure 8 for the vowels in all test syllables at the onset and the offset of F1 and F2

           F1                  F2
Model      onset    offset     onset    offset
I          0        3          15       24
II         39       45         21       12
III        61       52         64       64

For all CVC syllables the onset and the offset of F1-tracks and F2-tracks were calculated in this way and subsequently averaged over subjects. The formant tracks of vowels from the condition ISO were defined as "target" formant tracks and those of vowels from the group RED as "reduced" formant tracks. For each CVC syllable the difference in formant frequency at the onset, steady-state part, and offset was calculated between target tracks and reduced tracks (both for F1 and F2). If the formant shift at a vowel boundary was less than 25 mel, it was assigned to model II. Otherwise, the formant shift at the steady-state part was compared with the formant shift at the vowel boundary. If these shifts had opposite directions, model I was selected; if they had the same direction, model III was selected.
In Table 3 the percentages of assignments to each model are given for F1-tracks and F2-tracks, both at the onset and at the offset. It appears that model III occurs most frequently, followed by model II. With respect to the onset frequency of F2, Krull (1989) had similar results. The occurrence of each model may very well depend on the particular vowel and consonants under consideration, but the present dataset was too small to uncover systematic tendencies.
4.3. Differences between speakers
In order to compare the vowel space of different speakers, we wanted to measure the discriminability of all their vowels in the 6 'natural' test conditions. For a clear separation of vowel clusters, (phonologically) identical vowels should be close together in a formant space and different vowel clusters should be far apart. We chose the following definition for vowel separation σ, based on Rabiner et al. (1983), which resembles the F-ratio in an analysis of variance:
$$\sigma = \frac{\sum_{i=1}^{M} N_i \, d(\bar{F}_i, \bar{F})}{\sum_{i=1}^{M} \sum_{j=1}^{N_i} d(F_{ij}, \bar{F}_i)}.$$
Here M denotes the number of vowel clusters (in our case 9), and N_i is the number of vowels in cluster i (varying between 6 for /u/ and 42 for /ɑ/). d(F̄_i, F̄) is the Euclidean distance in mels (in the two dimensions F1 and F2) between the mean formant vector of cluster i and the overall mean formant vector. d(F_ij, F̄_i) is the Euclidean distance between the j-th formant vector in cluster i and the mean formant vector of cluster i. The numerator of the formula is the (weighted) sum of between-cluster distances, and the denominator is the sum of within-cluster distances. The vowel separation for each speaker is shown in Figure 9.

[Figure 9: "VOWEL SEPARATION PER SPEAKER", σ for speakers S1 to S15]
Fig. 9. Sigma ratio for each speaker, indicating the amount of vowel separation (see text).

It can be seen that the σ-ratios of the speakers do not differ very much, except for speaker S1, who has an extremely good vowel separation, and speaker S15, whose vowel separation is rather poor. It will come as no surprise that speaker S1 is a professional speaker (a newsreader) who is well known for his clear pronunciation. As an illustration of the differences in vowel space between speakers, the clusters of short vowels from speaker
S1 and speaker S15 are shown in Figures 10a and 10b (ellipses of one standard deviation around mean values). The professional speaker S1 has a very large vowel triangle and his compact vowel clusters are clearly separated. Speaker S15, on the other hand, has a rather small vowel triangle and his different vowel clusters are large and overlapping.

[Figure 10a, "SPEAKER 1": short-vowel clusters, F2 plotted against F1 in mel and Hz]
Fig. 10a. Clusters of short vowels for speaker S1, who had the best vowel separation according to Figure 9.
[Figure 10b, "SPEAKER 15": short-vowel clusters, F2 plotted against F1 in mel and Hz]
Fig. 10b. Clusters of short vowels for speaker S15, who had the worst vowel separation according to Figure 9.

In Figure 11 the average durations of all vowels from the 6 'natural' test conditions are shown for each speaker. There is hardly any relation between the vowel separation and the average vowel durations of speakers (the rank correlation is 0.23). The average vowel duration of, for instance, speakers S2 and S15 is comparable, but their vowel separation is clearly different.

[Figure 11: "AVERAGE DURATION PER SPEAKER", duration (ms) for speakers S1 to S15]
Fig. 11. Average duration in ms of the vowels from each speaker in the 6 'natural' test conditions.

The analyses of variance discussed in Section 4.1 showed that the factor "speaker" was significant for spectral distances (F = 7.6, p < 0.001), for curvature distances (F = 2.7, p < 0.001), and for durations (F = 3.9, p < 0.001). In order to determine whether the experimental conditions affected the vowel quality of every speaker, an analysis of variance on spectral distances, curvature distances and durations was done for the vowels of each speaker separately. The significance level was set to 0.003 (= 0.05/15) because of the large number of simultaneous tests. At this level of significance the factor "syllable type" was significant for all 15 speakers in the analysis of variance on spectral distances, for 8 speakers in the analysis of variance on curvature distances, and for 14 speakers in the analysis of variance on durations. The factor "accent", on the other hand, was significant for only 6 speakers in the analysis of variance on spectral distances, for only 1 speaker in the analysis of variance on curvature distances, and for only 4 speakers in the analysis of variance on durations. This supports the view that sentence accent has a minor influence on vowel quality compared to word class and word stress.
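The per-speaker significance level is a Bonferroni correction: the family-wise level 0.05 divided by the 15 simultaneous per-speaker tests. The following is a small sketch of that arithmetic. The p-values in the example are invented for illustration and are not results from the study.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test level that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


def count_significant(p_values, alpha):
    """Number of tests (e.g. one per speaker) whose p-value falls below alpha."""
    return sum(p < alpha for p in p_values)


alpha = bonferroni_alpha(0.05, 15)  # one test per speaker
print(f"{alpha:.4f}")               # 0.0033; the article rounds this down to 0.003

# Hypothetical per-speaker p-values for one factor (not data from the study).
print(count_significant([0.0001, 0.002, 0.004, 0.03], alpha))  # 2
```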
[Figure 12: "ERROR PERCENTAGES PER CONDITION", error (%) against syllable type S, NS, F, with separate lines for A and NA]
Fig. 12. Mean error percentages for the 6 'natural' test conditions from a listening experiment with 24 subjects in which the vowels from 3 speakers had to be identified.

4.4. Listening experiment
In order to test the perceptual significance of the acoustic measurements, we performed a listening experiment with part of the data set. All segmented vowels from three speakers were used for this experiment. The speakers were selected on the basis of their vowel separation as shown in Figure 9. Speakers S1 and S15 were chosen because they represent the extremes of vowel separation, and speaker S8 was added to represent the group of speakers with a mediocre vowel separation.
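The speaker selection relies on the vowel separation σ defined in Section 4.3. Below is a minimal NumPy sketch of that ratio. It assumes that each vowel cluster is given as an array of mel (F1, F2) points. It also assumes that the 'overall mean formant vector' is the mean over all tokens; the text leaves open whether the mean is taken over all tokens or over the cluster means. `sigma_ratio` and the toy clusters are hypothetical.

```python
import numpy as np


def sigma_ratio(clusters):
    """Vowel separation sigma for one speaker.

    clusters : list of arrays, one per vowel category, each of shape (N_i, 2)
               holding (F1, F2) in mel for every token of that vowel.
    sigma = sum_i N_i * d(mean_i, grand_mean) / sum_i sum_j d(x_ij, mean_i),
    with d the Euclidean distance. Taking the grand mean over all tokens is an
    assumption.
    """
    arrays = [np.asarray(c, dtype=float) for c in clusters]
    grand_mean = np.vstack(arrays).mean(axis=0)
    between = sum(len(c) * np.linalg.norm(c.mean(axis=0) - grand_mean) for c in arrays)
    within = sum(np.linalg.norm(c - c.mean(axis=0), axis=1).sum() for c in arrays)
    return float(between / within)


# Two toy clusters on the F1 axis: means 10 mel apart, tokens 1 mel from their mean.
tight = [[[0.0, 0.0], [2.0, 0.0]], [[10.0, 0.0], [12.0, 0.0]]]
print(sigma_ratio(tight))  # 5.0
```

Spreading the tokens of each cluster further from their mean enlarges the denominator and lowers σ. This matches the contrast between the compact clusters of speaker S1 and the overlapping clusters of speaker S15.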
The segmented vowels were presented to 24 listeners (12 male and 12 female) in a blocked design. The vowels kept their original duration as established with the semi-automatic segmentation procedure described in Section 2. The listeners were for the most part students from the University of Amsterdam. In order to level out annoying loudness differences between the stimuli, the stimuli were scaled to a fixed maximum amplitude value. Artificial clicks were prevented by shifting the vowel boundaries to the sample value closest to zero within an interval of 2 ms around the boundaries. All vowels from the 7 different test conditions were placed in a random order for each of the three speakers separately. The order of speakers was also randomized. The random ordering of stimuli and speakers was different for each subject in the listening experiment. In order to familiarize the subjects with the voice of each speaker, the 10 stimuli at the end of a speaker block were added at the beginning of the series as trial items (without feedback). In this way a total of 723 stimuli were presented to the 24 listeners (3 speakers × 7 conditions × 33 test syllables + 30 trial items).

Table 4
Mean error percentages for all test conditions from a listening experiment with 24 subjects in which the vowels from 3 speakers had to be identified. The error percentages are given for each speaker separately

Condition   S1    S8    S15   Average
ISO         13    18    23    18
S-A         21    22    28    24
S-NA        21    26    41    30
NS-A        29    41    57    42
NS-NA       35    54    71    53
F-A         31    50    66    49
F-NA        40    51    70    54

Each subject did the listening test on-line and separately at the terminal of a μVAX computer. One listening session took about 45 minutes. Subjects heard a stimulus and had to respond by pressing one of 12 buttons on the keyboard of the terminal, representing the 12 Dutch monophthongs. These were indicated in their orthographic form (unique for Dutch) on stickers that were placed on the buttons in the following format:
O   A   E   I   U
OE  OO  AA  EE  IE  UU  EU
In this format, vowels that might be confused because of similar orthographic symbols or similar sounds have been placed close to each other. The top row shows the vowels /ɔ, ɑ, ɛ, ɪ, œ/ in their orthographic form, and the bottom row the vowels /u, o:, a:, e:, i, y, ø:/. Only 9 monophthongs were actually present in the test material. Nevertheless, all 12 Dutch monophthongs were offered as response possibilities, because the vowels that the speakers intended could actually have been pronounced as any Dutch vowel. As soon
as subjects had responded, the next stimulus was presented to them. This design allowed listeners to establish their own pace and it prevented listeners from skipping responses.
In Figure 12 the mean error percentages of the 24 subjects for the 6 'natural' test conditions are shown, averaged over test syllables and speakers. A response was defined as incorrect if it did not agree with the vowel that was (or should have been) intended by the speaker. Notice the similarity of Figure 12 with Figures 3, 5 and 6. The error percentages increase in the order "stressed syllable"-"unstressed syllable"-"function word". Furthermore, the error percentages are higher for unaccented vowels than for accented vowels. The factor "syllable type" causes larger differences in error percentages than the factor "accent".

Table 5
Confusion matrix in percentages for all vowels in the test syllables from the group RED pooled over three speakers. Correct scores are marked with asterisks

Intended   Vowel responses
syllable   u     ɔ     ɑ     ɛ     ɪ     i     y     œ     o:    a:    e:    ø:
/mut/      *77*  2     1     0     0     2     8     7     0     0     0     1
/kɔm/      20    *64*  2     0     2     6     2     2     1     0     0     0
/kɔn/      20    *73*  1     0     1     1     0     0     3     0     0     0
/tɔ?/      1     *62*  13    1     0     0     0     12    9     0     0     2
/dɑn/      1     8     *68*  0     1     0     0     8     3     7     0     3
/k?ɑn/     2     19    *63*  2     0     0     0     9     0     2     0     1
/lɑ?/      0     21    *74*  0     0     0     0     3     0     2     0     0
/mɑ?/      2     10    *84*  0     0     0     0     1     0     4     0     0
/pɑs/      2     48    *23*  0     1     0     0     20    0     1     0     3
/vɑn/      8     8     *62*  1     0     2     5     13    0     0     0     0
/wɑs/      6     51    *24*  1     0     0     0     10    5     2     0     2
/bɛn/      0     0     1     *49*  8     6     6     22    0     3     1     4
/hɛp/      0     0     0     *42*  49    0     3     4     0     0     2     0
/mɛn/      3     0     2     *62*  2     2     15    11    0     2     0     0
/pɛr/      0     1     1     *13*  3     0     0     75    0     0     0     7
/wɛl/      0     48    34    *7*   1     0     0     6     2     0     0     0
/zɛt/      0     0     1     *47*  18    0     0     27    0     0     1     6
/hɪ?/      0     1     0     6     *49*  1     6     29    0     0     3     5
/wɪl/      2     6     0     0     *19*  1     9     53    1     1     2     6
/zɪ?/      0     0     0     6     *70*  0     4     13    0     0     6     0
/zɪt/      0     0     0     0     *86*  7     1     2     0     0     3     0
/lit/      0     0     0     2     25    *56*  6     5     0     0     2     2
/nit/      2     0     0     0     6     *85*  5     1     0     0     0     0
/do:r/     23    14    0     0     0     0     2     19    *38*  0     0     3
/vo:r/     38    21    0     0     1     0     5     17    *15*  0     0     1
/da:r/     0     0     5     3     0     0     1     7     0     *70*  0     13
/la:t/     0     4     46    1     0     0     0     7     0     *41*  0     0
/na:r/     2     0     19    2     1     0     1     7     0     *63*  0     4
/wa:r/     0     7     41    2     0     0     0     19    1     *18*  0     10
/he:l/     2     0     13    33    25    0     1     12    0     4     *10*  0
/me:r/     0     0     0     7     29    20    8     4     0     0     *31*  0
/ve:l/     0     5     6     7     19    0     1     37    2     0     *12*  10
/we:r/     0     0     0     9     34    2     16    3     0     0     *28*  7

An analysis of variance with repeated measures was done on the error percentages with the same design as used before. Since the data consisted of proportions, which are not normally distributed, an inverse sine transformation was applied (Kirk, 1982). These transformed scores are more suitable for an analysis of variance than the original scores. Both the factor "accent" (F = 34.1, p < 0.001) and "syllable type" (F = 58.0, p < 0.001) were significant. Tests for pairwise contrasts between conditions showed that the conditions NS-NA, F-A and
Vol. 12, No, I, March 1993

F-NA (the group R E D ) were not significantly frequencies with respect to those of 'ideal' vowels
different from each other (p > 0.05). In addition, (condition ISO) will be considered in the following
the condition NS-A was not significantly different four conditions which turned out to be significantly
from the condition F-A (p > 0.05) for this limited different in Section 4.1.1:
data set (only 3 speakers). 1. Stressed syllables in accented content
The factor "speaker" was also significant ( F = words (S-A),
9.5, p<0.001). The error percentages in the test 2. Stressed syllables in unaccented
conditions for each speaker separately are shown content words (S-NA),
in Table 4. As might be expected, the vowels from 3. Unstressed syllables in accented
speaker S~ are more often correctly identified than content words (NS-A),
those from speaker $8 and speaker S~5, for whom 4. Unstressed syllables in unaccented
the error percentages are highest The rather high content words, and all function
error percentages for the condition ISO are mainly words (RED).
caused by confusions of long vowels with their The first model that could be used to predict spec-
short counterparts (and vice versa). tral vowel reduction is described with the following
Despite the fact that the acoustic reduction can formula:
be considerable, especially in the vowels from the
AF = a + bFtarget.
group RED, listeners are yet often capable to cor-
rectly identify them. Table 5 shows a confusion The input to this model is the steady-state target
matrix in percentages for the vowels in the test formant frequency Ftarget (either F~ or F2) of a
syllables from the group R E D pooled over the vowel in a specific consonantal context. Formant
three speakers. Most of the errors that are made in frequencies of vowels from the condition ISO are
the vowel identifications are confusions with neigh- considered to be target formant frequencies. The
bouring vowels or confusions of long vowels with output of the model is the amount of formant shift
their short counterpart (and vice versa). The vowel for this vowel (in the same consonantal context)
/ ~ / f r o m the sylable/w~l/was frequently confused due to one of the four experimental conditions
with the vowels / 3 / and /a/, and the vowel / e / mentioned above,
from the syllable / h e p / was frequently confused AF= Ftarget -- Freduced,
with the vowel / l / . This is in agreement with the where Ftarg~t denotes the steady-state target for-
observed shifts of formant frequencies discussed in mant frequency and Freauced the steady-state for-
Section 4.2. The 'centralization' effect is shown by mant frequency of the reduced vowel in a particular
the confusions with the central vowels /oe/ and test condition. The constants a and b of this 'static'
/¢ :/, which were not present in the actual experi- model were obtained from the present data set
ment, but only existed as response possibility. through a least square fit. They are given in Table
About 15% of all vowels from the group RED were 6 for F~ or F2 in each condition. The fit of the
confused with either t h e / c e / o r t h e / ¢ :/. various versions of the model indicated with the
correlation coefficients turns out to be fairly good.
4.5. Models of spectral vowel reduction
Table 6
Parameters for the 'static' reduction model in 4 different condi-
The present dataset, which is based on fairly
tions (see text). The intercept of the model is a, and the slope
natural speech, can be used to find models that is b. The fit of each model is indicated with the correlation
predict the amount of spectral reduction in vowels coefficient r
(i.e. the shift of steady-state formant frequencies).
In order to eliminate noise caused by physiological F~ F2
differences between speakers and idiosyncratic Condition a b r a b r
peculiarities, the average formant values (and cur- S-A -62 0.13 0.61 -138 0.13 0.77
vature values) of the 15 speakers will be used. In S-NA -161 0.38 0.88 -241 0.22 0.87
this way we can model the vowels of an 'average NS-A -158 0.39 0.79 -388 0.33 0.85
male speaker'. The shifts of steady-state formant RED -202 0.50 0.87 -483 0.41 0.87
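The 'static' model is an ordinary least-squares line relating target formant frequency to formant shift. The Python sketch below illustrates that computation under our own assumptions: the function names are hypothetical, the inputs are plain lists of per-vowel averages, and the target value of 700 is a made-up example in the same frequency unit as Table 6. Only the RED F1 parameters (a = -202, b = 0.50) are taken from Table 6.

```python
from math import sqrt


def fit_static_model(targets, shifts):
    """Least-squares fit of shift = a + b * target; returns (a, b, r)."""
    n = len(targets)
    if n != len(shifts) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(targets) / n
    mean_y = sum(shifts) / n
    sxx = sum((x - mean_x) ** 2 for x in targets)
    syy = sum((y - mean_y) ** 2 for y in shifts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(targets, shifts))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    r = sxy / sqrt(sxx * syy)     # Pearson correlation of the fit
    return a, b, r


def predict_shift(a, b, f_target):
    """Static model: Delta F = a + b * F_target."""
    return a + b * f_target


def reduced_formant(a, b, f_target):
    """F_reduced implied by Delta F = F_target - F_reduced."""
    return f_target - predict_shift(a, b, f_target)


# RED condition, F1 (Table 6); the target value 700 is hypothetical.
A_RED_F1, B_RED_F1 = -202, 0.50
print(predict_shift(A_RED_F1, B_RED_F1, 700),
      reduced_formant(A_RED_F1, B_RED_F1, 700))  # 148.0 552.0
```

With b between 0 and 1, the fitted line crosses zero at $F_{\text{target}} = -a/b$ (404 for RED F1): targets above that value are predicted to be lowered and targets below it raised. This follows from the linear form alone.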
The second model that could be used to predict spectral vowel reduction is described with the formula

$$\Delta F = a + b\,\mathit{curv}_{\text{target}}.$$

The input to this 'dynamic' model is the parabolic target curvature $\mathit{curv}_{\text{target}}$ of the formant track (either F1 or F2) of a vowel in a specific consonantal context, calculated as described in Section 4.1.2. Curvature values of vowels from the condition ISO are considered to be target curvature values. The output is again the amount of formant shift for this vowel (in the same consonantal context) due to one of the four experimental conditions. Table 7 gives the model parameters for F1 and F2 in each condition. Since formant curvature and formant frequency shifts are relative measures without reference to absolute formant values, this model can also be fitted to F1 and F2 simultaneously. The model parameters for this combined model are given in the last column of Table 7. The dynamic approach of this model may be more realistic than the static approach of the former one, but the performance of both models is comparable.
The third model we want to discuss is Lindblom's (1963) model, which is described by the following formula:

$$\Delta F = a\,(F_{\text{target}} - F_{\text{onset}})\,e^{-b\,\mathit{dur}}.$$

This model has two different inputs. The first input is the "locus-target" distance $F_{\text{target}} - F_{\text{onset}}$, which is the difference between the target formant frequency and the formant frequency at the initial vowel boundary. This is a measure of the coarticulatory effect between the vowel and the preceding consonant and resembles the curvature measure of the 'dynamic' model mentioned above. The second input is the total duration $\mathit{dur}$ of the vowel segment. According to Lindblom, these two measures determine the amount of formant shift independent of linguistic conditions, like, for instance, stress or sentence accent.
The model parameters a and b were estimated by Lindblom for three different consonantal contexts /b-b/, /d-d/ and /g-g/ with the data of one male speaker. These data consisted of 24 CVC nonsense syllables (8 short vowels × 3 consonantal contexts) uttered either at the beginning or at the end of a carrier sentence and either with or without a 'sentence accent'. The experimental conditions generated vowel durations in the range of approximately 100-250 msec. Consequently, Lindblom's models are based on this range of durations. In more natural speech, however, the durations of vowels are usually shorter. In the present experiment, for instance, the durations of the short vowels in the 6 'natural' conditions varied in the range 50-95 msec (one standard deviation around the mean). It is interesting to see that the F2-shifts predicted by Lindblom's equations can become rather extreme as the vowel durations get shorter. The predicted F2-shifts for the /u/ and the /a/ in the syllables /dud/ and /gag/, for instance, are larger than 1000 Hz for vowel durations of 30 msec. Such durations were not unusual for the short vowels in the present experiment, but the formant shifts we observed were never larger than about 500 Hz. Krull (1989), who investigated vowel reduction in spontaneous speech, also reports F2-shifts that do not become larger than about 500 Hz. So it seems unlikely that Lindblom's models, based on vowels with a rather long duration in nonsense syllables, can simply be extrapolated to vowels with shorter durations that occur in more natural speech.
In the present experiment we did not systematically vary the vowels in one particular consonantal context as Lindblom did, so it was not possible to reliably determine the parameters of his model.
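To see why Lindblom's equation produces extreme shifts at short durations, it can be evaluated directly. The sketch below is illustrative only: Lindblom's fitted constants are not reproduced in this paper, so the scale factor, the decay rate (per millisecond) and the locus-target distance are hypothetical values chosen to show the shape of the exponential, and `lindblom_shift` is our own name.

```python
import math


def lindblom_shift(a, b, locus_target_distance, dur_ms):
    """Lindblom-style undershoot: a * (F_target - F_onset) * exp(-b * dur)."""
    return a * locus_target_distance * math.exp(-b * dur_ms)


# Hypothetical values for illustration; not Lindblom's published constants.
A_HYP = 0.9            # dimensionless scale factor
B_HYP = 0.02           # decay rate per millisecond
DISTANCE_HZ = 800.0    # F_target - F_onset in Hz

for dur in (250, 100, 50, 30):
    shift = lindblom_shift(A_HYP, B_HYP, DISTANCE_HZ, dur)
    print(f"dur = {dur:3d} ms -> predicted shift = {shift:6.1f} Hz")
```

Because a and the locus-target distance only scale the curve, the ratio between the predictions at two durations, $e^{-b(\mathit{dur}_1 - \mathit{dur}_2)}$, depends on b alone, so extrapolating from 100-250 msec down to 30 msec multiplies the predicted shift by a factor that grows exponentially with the gap in duration.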
Table 7
Parameters for the 'dynamic' reduction model in 4 different conditions (see text). The intercept of the model is a and the slope is b. The fit of each model is indicated with the correlation coefficient r

             F1                 F2                 F1 and F2
Condition    a     b     r      a     b      r      a     b      r
S-A        -15   0.18  0.56    19  -0.24  0.83     4  -0.14  0.56
S-NA       -23   0.52  0.77    28  -0.40  0.92    14  -0.33  0.80
NS-A       -24   0.61  0.79    24  -0.63  0.92     6  -0.52  0.85
RED        -28   0.77  0.86    30  -0.78  0.95     7  -0.65  0.89
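The 'dynamic' model can be evaluated directly with the Table 7 parameters. In the sketch below the (a, b, r) triples are copied from Table 7; the dictionary layout, the key "F1+F2" for the combined fit, and the target curvature of -100 (in the curvature unit of Section 4.1.2) are our own hypothetical choices. Holding the target curvature fixed isolates the effect of the condition.

```python
# (a, b, r) of the 'dynamic' model, copied from Table 7.
TABLE7 = {
    "S-A":  {"F1": (-15, 0.18, 0.56), "F2": (19, -0.24, 0.83), "F1+F2": (4, -0.14, 0.56)},
    "S-NA": {"F1": (-23, 0.52, 0.77), "F2": (28, -0.40, 0.92), "F1+F2": (14, -0.33, 0.80)},
    "NS-A": {"F1": (-24, 0.61, 0.79), "F2": (24, -0.63, 0.92), "F1+F2": (6, -0.52, 0.85)},
    "RED":  {"F1": (-28, 0.77, 0.86), "F2": (30, -0.78, 0.95), "F1+F2": (7, -0.65, 0.89)},
}


def dynamic_shift(condition, formant, curv_target):
    """Dynamic model: Delta F = a + b * curv_target."""
    a, b, _r = TABLE7[condition][formant]
    return a + b * curv_target


# Hypothetical target curvature (unit as in Section 4.1.2).
CURV_TARGET = -100.0
for cond in ("S-A", "S-NA", "NS-A", "RED"):
    print(cond, round(dynamic_shift(cond, "F2", CURV_TARGET), 1))
```

For F2 the predicted shifts grow from S-A to RED (43, 68, 87 and 108) simply because the slope magnitude grows in that order; for the same target curvature the model therefore predicts the largest formant shift in the most reduced condition.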

Table 8
Rank correlations per vowel between steady-state formant frequencies (F1 and F2) in mel and vowel durations, averaged over test syllables

        u     ɔ     a     ɛ     ɪ     i     o:    a:    e:
F1    0.56  0.80  0.81  0.80  0.50  0.57  0.16  0.82  0.50
F2    0.71  0.80  0.36  0.84  0.82  0.80  0.92  0.59  0.85
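Table 8 summarizes rank correlations between vowel duration and steady-state formant frequency, each computed for one CVC syllable over its n = 7 condition means and then averaged over the syllables of a vowel. The text does not name the coefficient; the sketch below assumes Spearman's rho with average ranks for ties, the function names are hypothetical, and the seven duration and F1 values in the example are invented for illustration.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def vowel_coefficient(per_syllable_rhos):
    """Average per-syllable coefficients over consonantal contexts."""
    return mean(per_syllable_rhos)


# Hypothetical condition means for one syllable (n = 7): duration in ms, F1.
durations = [95, 88, 80, 72, 65, 60, 55]
f1_values = [520, 510, 515, 480, 470, 450, 455]
print(round(spearman_rho(durations, f1_values), 3))  # 0.929
```

Because rho depends only on ranks, it is unchanged by any strictly increasing rescaling of the formant axis, so computing it on mel values, as in Table 8, or on the underlying Hz values gives the same coefficient.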
Nevertheless, we could check whether there is a monotonic relation between vowel duration and spectral reduction for each of the 33 CVC syllables separately. For this purpose we calculated the rank correlation between the duration of a vowel and its steady-state formant frequency (F1 or F2) for each CVC syllable. Each rank correlation was based on average measurements from all test conditions (n = 7). In Table 8 the rank correlations per vowel (averaged over consonantal contexts) are given for both F1 and F2. In most cases there seems to be a fair relation between vowel duration and formant undershoot. However, we do not agree with Lindblom's view that spectral reduction is caused by a decreasing vowel duration. We believe that both duration and spectral quality usually decrease at the same time as the pronunciation gets sloppier.

5. Discussion

In the present experiment it was found that the spectral quality of vowels can be influenced by a variety of factors. An important one is stress. In polysyllabic words the vowels from stressed syllables are generally more clearly pronounced, and hence closer to their target form, than the vowels from unstressed syllables. Stressing and destressing are properties that can only be assigned to existing words; it is difficult to simulate the natural destressing of syllables in nonsense words. A demonstration of this point is the attempt to gather 'reduced' diphones for a Dutch speech synthesis system through the simulation of natural stress patterns in nonsense words (Drullman and Collier, 1991). It appeared that the spectral quality of the vowels from 'reduced' and 'unreduced' diphones hardly differed. In addition, Koopmans-Van Beinum (1992) showed that the vowels from these 'reduced' diphones formed a larger vowel triangle than the vowels from stressed syllables of 'focus words' in a text that was read aloud by the same speaker. Apparently, the lack of a clear pronunciation of unstressed syllables can only occur naturally in existing words that have been uttered thousands of times by a speaker.
Apart from stress, it was found that spectral reduction is influenced by surrounding phonemes (see Figure 7). In general, vowels that are surrounded by 'compatible' phonemes will retain a rather high quality under various linguistic and extralinguistic circumstances. Vowels that are surrounded by 'incompatible' phonemes, on the other hand, will show effects of contextual assimilation that can rapidly increase when the speech gets sloppier. This view on the relation between compatibility and the amount of spectral reduction is supported by the fairly good fit of the 'dynamic' model discussed in Section 4.5 (if we regard the amount of curvature of formant tracks in clearly pronounced vowels as a measure of compatibility between the vowel and the surrounding consonants).
The factor word class can also play an important role. In the present experiment it was found that the vowels from monosyllabic function words with primary stress had the same poor quality as unstressed vowels from content words. This shows that the effect of stress on vowel reduction can be overruled by the factor word class. It is not clear whether the poor articulation of the vowels in function words is mainly related to their small 'semantic load' or to their high frequency of occurrence. Both factors may play a role, but we suspect that frequency of occurrence is of greater importance, because the frequency effect is also clearly present across word classes. Van Coile (1987) found, for instance, that vowel durations in read-aloud speech could be better predicted by a word dichotomy based on frequency of occurrence than by a word dichotomy based on word class (in its strict sense).
Another factor that can influence the amount of spectral vowel reduction is sentence accent, which
is used to highlight parts of utterances. However, the results of the present experiment suggest that this effect is of minor importance compared to word stress and word class. This may be explained by the fact that sentence accent is only assigned temporarily to words in specific situations, whereas word stress, word class, and frequency of occurrence are fixed properties of words.
A final factor of importance with respect to vowel reduction, which was not explicitly investigated in the present study, is speech style, which covers many related phenomena. Each person speaks in his own way due to idiosyncratic features and to his geographic and sociolinguistic background. All speakers in the present experiment spoke 'standard Dutch'. However, some had a rather theatrical style of reading, whereas others read in a more neutral style. Different speech styles can also be used by one and the same speaker in different situations; a conversation with friends requires a different speech style than, for instance, a job application or a lecture. A less formal speech style will usually lead to a sloppier pronunciation and more vowel reduction (Koopmans-Van Beinum, 1980; Van Bergem and Koopmans-Van Beinum, 1989; Krull, 1989; Duez, 1991). We also regard differences in tempo as differences in speech style. Tempo is often influenced by the emotional state of the speaker. Someone who is, for instance, excited about something may speak fast and rather sloppily, resulting in strongly reduced vowels. However, someone who is, for instance, angry may speak very fast and yet have a clear articulation. Thus, a higher speaking rate can cause vowel reduction, but only if it evokes a sloppy pronunciation.
According to Lindblom (1963) the degree of spectral reduction of a vowel is, apart from the influence of surrounding phonemes, determined by the duration of the vowel segment. His hypothesis was that the articulators are unable to reach an 'ideal' vowel target position if there is not enough time. Although superficially attractive, this clear relation between vowel duration and spectral quality has been contradicted by several other investigations (e.g. Gay, 1978; Nord, 1986; Engstrand, 1988; Van Son and Pols, 1990). In the present study it was found that there is no clear relation between the vowel separation of a speaker and the duration of his vowels (Section 4.3). Some speakers produce vowels with relatively long durations and have a poor vowel separation, whereas others produce vowels with relatively short durations and have a clear vowel separation. This most likely depends on the particular speech style a speaker uses and not on physiological constraints of his articulatory organs. In our view spectral reduction is not caused by a decreasing vowel duration; rather, both duration and spectral quality usually decrease at the same time if the pronunciation gets sloppier. This covariation may occur frequently, but not necessarily. The sloppy pronunciation can in turn be caused by a great variety of factors such as stress, word class, emotion, etc.
The amount of vowel reduction depends on the specific combination of factors mentioned above. However, these factors are only related to vowel reduction in a statistical sense. No combination of factors is able to block the occurrence of a clear vowel articulation; each speaker has the possibility to pronounce any speech sound as clearly as he likes, regardless of reduction factors. For every combination of reduction factors there is only a certain probability that sloppy speech will occur and that the vowels will be reduced to some extent. In general, this will be the case if the particular part of the utterance is of minor importance for a proper understanding of the message by the listener.
The present study was exclusively concerned with acoustic vowel reduction. In future research we plan to investigate the relation between acoustic and lexical vowel reduction, especially with regard to word stress and frequency of occurrence of words.

6. Conclusions

In the present experiment it was found that
1. Sentence accent, word stress, and word class have a significant effect on the steady-state formant frequencies and on the duration of vowels. The effect of sentence accent is of minor importance compared to the effect of word stress and word class.

2. Spectral vowel reduction is not a tendency of vowels to centralize, but rather the result of an increased contextual assimilation. The observed centralization of formant patterns is a frequently occurring and probably natural consequence of this assimilation process. In reduced vowels formant tracks become flatter.
3. The perceptual significance of the acoustic measurements was confirmed in a listening experiment. There is a clear influence of the factors sentence accent, word stress, and word class on the identification of vowels.

Acknowledgments

I would like to thank Louis Pols, Florien Koopmans-van Beinum, and Gitta Laan for their critical discussions about the experimental design of this investigation and for their useful comments on earlier versions of this article. I would also like to thank Björn Lindblom for his very thorough review and many valuable suggestions.

References

G. Altman and D.M. Carter (1989), "Lexical stress and lexical discriminability: Stressed syllables are more informative, but why?", Comput. Speech Language, Vol. 3, pp. 265-275.
J.L.G. Baart (1987), Focus, syntax, and accent placement, Doctoral Dissertation, University of Leiden.
D. Bolinger (1972), "Accent is predictable (If you're a mind-reader)", Language, Vol. 48, pp. 633-644.
D. Bolinger (1975), Aspects of Language (Harcourt Brace Jovanovich, New York).
D. Bolinger (1985), "Two views of accent", J. Linguistics, Vol. 21, pp. 79-123.
G.E. Booij (1981), Generatieve Fonologie van het Nederlands (Het Spectrum, Utrecht).
D.M. Carter (1987), "An information-theoretic analysis of phonetic dictionary access", Comput. Speech Language, Vol. 2, pp. 1-11.
CELEX-report (1985), Proposal for creating a national, multilingual, lexical database, University of Nijmegen.
Collins Dictionary of the English Language (1989), Second Edition (Collins, London).
R.G. Daniloff and R.E. Hammarberg (1973), "On defining coarticulation", J. Phonetics, Vol. 1, pp. 239-248.
P. Delattre (1969), "An acoustic and articulatory study of vowel reduction in four languages", IRAL, Vol. 7, pp. 295-325.
R. Drullman and R. Collier (1991), "On the combined use of accented and unaccented diphones in speech synthesis", J. Acoust. Soc. Amer., Vol. 90, pp. 1766-1775.
D. Duez (1991), "Some factors affecting F2-patterns in spontaneous speech in French", Proc. ESCA Workshop Phonetics and Phonology of Speaking Styles: Reduction and Elaboration in Speech Communication, Barcelona, pp. 21:1-21:5.
O. Engstrand (1988), "Articulatory correlates of stress and speaking rate in Swedish CVC utterances", J. Acoust. Soc. Amer., Vol. 83, pp. 1863-1875.
G. Fant (1960), Acoustic theory of speech production (Mouton, The Hague).
K.I. Forster (1979), "Levels of processing and the structure of the language processor", in Psycholinguistic Studies Presented to Merrill Garrett, ed. by W.E. Cooper and E.C.T. Walker, pp. 27-85.
M. Fourakis (1991), "Tempo, stress, and vowel reduction in American English", J. Acoust. Soc. Amer., Vol. 90, pp. 1816-1827.
T. Gay (1978), "Effect of speaking rate on vowel formant movements", J. Acoust. Soc. Amer., Vol. 63, pp. 223-230.
R.E. Kirk (1982), Experimental Design: Procedures for the Behavioral Sciences (Wadsworth, Belmont, CA).
D.H. Klatt (1989), "Review of selected models of speech perception", in Lexical Representation and Process, ed. by W.D. Marslen-Wilson (MIT Press, Cambridge, MA), pp. 169-226.
F.J. Koopmans-Van Beinum (1980), Vowel contrast reduction: An acoustic and perceptual study of Dutch in various speech conditions, Doctoral Dissertation, University of Amsterdam.
F.J. Koopmans-Van Beinum (1992), "The role of focus words in natural and in synthetic continuous speech: Acoustic aspects", Speech Communication, Vol. 11, Nos. 4-5, pp. 439-452.
D. Krull (1989), "Second formant locus patterns and consonant-vowel coarticulation in spontaneous speech", PERILUS X, pp. 87-108.
J.G. Kruyt (1985), Accents from speakers to listeners, Doctoral Dissertation, University of Leiden.
P. Ladefoged (1982), A Course in Phonetics (Harcourt, Brace, Jovanovich, New York).
K.F. Lee (1989), Automatic Speech Recognition: The Development of the SPHINX System (Kluwer Academic Publishers, Boston).
B.E.F. Lindblom (1963), "Spectrographic study of vowel reduction", J. Acoust. Soc. Amer., Vol. 35, pp. 1773-1781.
B. Lindblom and S.J. Moon (1988), "Formant undershoot in clear and citation-form speech", PERILUS VIII, pp. 21-33.
J. Makhoul and L. Cosell (1976), "LPCW: An LPC vocoder with linear predictive spectral warping", Proc. Internat. Conf. Acoust. Speech Signal Process. 1976, pp. 466-469.
J.D. Markel and A.H. Gray (1976), Linear Prediction of Speech (Springer, New York).
J. Morton (1969), "Interaction of information in word recognition", Psychological Rev., Vol. 2, pp. 165-178.
L. Nord (1975), "Vowel reduction: Centralization or contextual assimilation?", in Speech Communication, Vol. 2, ed. by G. Fant (Almqvist & Wiksell, Stockholm), pp. 149-154.
L. Nord (1986), "Acoustic studies of vowel reduction in Swedish", KTH-QPSR, Stockholm 4, pp. 19-36.
A.V. Oppenheim and D.H. Johnson (1972), "Discrete representation of signals", Proc. IEEE, Vol. 60, pp. 681-691.
L.C.W. Pols (1977), Spectral analysis and identification of Dutch vowels in monosyllabic words, Doctoral Dissertation, Free University of Amsterdam.
L.C.W. Pols, H.R.C. Tromp and R. Plomp (1973), "Frequency analysis of Dutch vowels from 50 male speakers", J. Acoust. Soc. Amer., Vol. 53, pp. 1093-1101.
L.R. Rabiner, S.E. Levinson and M.M. Sondhi (1983), "On the application of vector quantization and Hidden Markov Models to speaker-independent, isolated word recognition", Bell Syst. Tech. J., Vol. 62, pp. 1075-1105.
U. Stålhammar, I. Karlsson and G. Fant (1973), "Contextual effects on vowel nuclei", KTH-QPSR, Stockholm 4, pp. 1-18.
T. Svendsen and F.K. Soong (1987), "On the automatic segmentation of speech signals", Proc. Internat. Conf. Acoust. Speech Signal Process., pp. 77-80.
J. 't Hart, R. Collier and A. Cohen (1990), A Perceptual Study of Intonation: An Experimental-phonetic Approach to Speech Melody (Cambridge Univ. Press, Cambridge).
D.R. Van Bergem (1989), "Phonetic and linguistic aspects of vowel reduction", Proc. Institute of Phonetic Sciences Amsterdam, Vol. 13, pp. 97-105.
D.R. Van Bergem (1990a), "In defense of a probabilistic view on human word recognition", Proc. Institute of Phonetic Sciences Amsterdam, Vol. 14, pp. 53-66.
D.R. Van Bergem (1990b), "The influence of linguistic factors on vowel reduction", Proc. Linguistics and Phonetics '90, Charles University, Prague, pp. 427-436.
D.R. Van Bergem (1991a), "The influence of sentence accent, word stress, and word class on the quality of vowels", Proc. Eurospeech '91, Genova, Vol. 3, pp. 1455-1458.
D.R. Van Bergem (1991b), "Acoustic and lexical vowel reduction", Proc. ESCA Workshop Phonetics and Phonology of Speaking Styles: Reduction and Elaboration in Speech Communication, Barcelona, pp. 10:1-10:5.
D.R. Van Bergem, Submitted, to appear in Speech Communication.
D.R. Van Bergem and F.J. Koopmans-Van Beinum (1989), "Vowel reduction in natural speech", Proc. Eurospeech '89, Paris, Vol. 2, pp. 285-288.
B.M. Van Coile (1987), "A model of phoneme durations based on the analysis of a read Dutch text", Proc. European Conf. on Speech Technology, Edinburgh, Vol. 2, pp. 233-236.
R.J.J.H. Van Son and L.C.W. Pols (1990), "Formant frequencies of Dutch vowels in a text, read at normal and fast rate", J. Acoust. Soc. Amer., Vol. 88, pp. 1683-1693.
R.J.J.H. Van Son and L.C.W. Pols (1991), "The influence of speaking rate on vowel formant track shape as modelled by Legendre polynomials", Proc. Institute of Phonetic Sciences Amsterdam, Vol. 15, pp. 43-61.
R.J.J.H. Van Son and L.C.W. Pols (1992), "Formant movements of Dutch vowels in a text, read at normal and fast rate", J. Acoust. Soc. Amer., Vol. 92, pp. 121-127.
C. Van Wijk and G. Kempen (1980), "Funktiewoorden: Een inventarisatie voor het Nederlands", ITL, Rev. Appl. Linguistics, Vol. 47, pp. 53-68.
L.F. Willems (1986), "Robust formant analysis", IPO-APR, Vol. 21, pp. 34-40.
