www.elsevier.nl/locate/specom
Abstract
The present tutorial paper is addressed to a wide audience with different discipline backgrounds as well as variable expertise on intonation. The paper is structured into five sections. In Section 1, "Introduction", basic concepts of intonation and prosody are summarised and cornerstones of intonation research are highlighted. In Section 2, "Functions and forms of intonation", a wide range of functions from morpholexical and phrase levels to discourse and dialogue levels are discussed and forms of intonation with examples from different languages are presented. In Section 3, "Modelling and labelling of intonation", established models of intonation as well as labelling systems are presented. In Section 4, "Applications of intonation", the most widespread applications of intonation and especially technological ones are presented and methodological issues are discussed. In Section 5, "Research perspective", research avenues and ultimate goals as well as the significance and benefits of intonation research in the upcoming years are outlined. © 2001 Elsevier Science B.V. All rights reserved.
Zusammenfassung
This overview article is addressed to a broad readership from different disciplines and with varying levels of knowledge of intonation research. The paper is divided into five sections. Section 1, "Introduction", outlines the basic concepts of prosody and intonation and the most important directions in intonation research. A range of functions of intonation, from the morpholexical and phrase levels to the levels of discourse and dialogue, is discussed in Section 2, "Functions and forms of intonation", and forms of intonation are illustrated with examples from different languages. Section 3, "Modelling and labelling of intonation", presents established intonation models as well as systems for annotating and labelling intonational features. The most widespread applications of intonation, with an emphasis on speech technology, are presented in Section 4, "Applications of intonation", and some methodological problems are discussed. Finally, Section 5, "Research perspectives", points out possible directions and goals for future research in the field of intonation. © 2001 Elsevier Science B.V. All rights reserved.
Résumé
This review article is addressed to a broad audience from different disciplines as well as to specialists in intonation. It comprises five parts. The first part, Introduction, briefly recalls the basic concepts of intonation and prosody and highlights the pivotal periods of intonation research. In the second part, Functions and forms of intonation, a wide range of functions at the morpholexical, phrasal, discourse and dialogue levels is examined; intonation forms from different languages are presented and compared. The third part, Modelling and transcription of intonation, refers to current models of intonation and to operational labelling systems. The fourth part, Applications of intonation, presents the most common applications of intonation, in particular technological applications, and discusses methodological problems. In the fifth part, Research perspectives, research directions are outlined, the goals to be reached are specified, and the significance and interest of intonation research for the coming years are underlined. © 2001 Elsevier Science B.V. All rights reserved.

* Corresponding author. Tel.: +46-500-448-915; fax: +46-500-448-949.
E-mail address: antonis.botinis@isp.his.se (A. Botinis).
0167-6393/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0167-6393(00)00060-1
A. Botinis et al. / Speech Communication 33 (2001) 263–296
extralinguistic (or non-linguistic) function with further reference to personal characteristics and indexing, such as sex, age and socio-economic status.

1.2. Background of intonation

Aspects of intonation and tonal distinctions have been studied throughout man's literate history. In classical Greece, for example, Plato and Aristotle discussed the prosodic system and raised basic questions about accentual distinctions. The term itself is derived from the Greek "tonos" (tension) through the Latin "intonatio" and Old French "intonation". In modern times, intonation has been studied extensively from both theoretical and experimental points of view.

In the framework of structural linguistic theory, particularly in the first half of the 20th century, formal descriptions of phonological systems and distinctions among different languages are established and the role of prosody in linguistic analysis and theory is discussed (Bloomfield, 1933; Trubetzkoy, 1939; Martinet, 1954; Hockett, 1955; Malmberg, 1967). Stress, pitch, and juncture variations are classified and up to four pitch levels are distinguished (cf. Pike, 1945; Trager and Smith, 1951). The role of intonation in linguistic theory is also emphasised in the framework of generative (transformational) grammar, especially in the second half of the 20th century, with mainstream work on the relation between intonation and syntax as well as semantics (e.g., Bresnan, 1971; Chomsky, 1972; Jackendoff, 1972; Stockwell, 1972). Tonal analysis also has a central place in autosegmental phonology, i.e., post-generative phonology (see Goldsmith, 1976a,b, 1990). Intonation and information structure relations are also investigated and basic thematic notions with intonation correlates denoting the most important part of the utterance are brought to light (e.g., Bolinger, 1958; Daneš, 1960; Halliday, 1967; Lambrecht, 1996). The ultimate interpretation of an utterance in a given context, i.e., the pragmatics of intonation, as well as the relation of intonation to the intended meaning, in the broad framework of speech act theory (see Searle, 1969, 1976, 1979), has also drawn considerable attention in the study of intonation. Intonation is widely acknowledged with reference to the organisation of text, discourse and dialogue as well as various interactive functions, and considerable research is being carried out with reference to these areas (Brazil et al., 1980; Brown et al., 1980; Brown and Yule, 1983; Brazil et al., 1997).

Experimental phonetics, although dating from earlier times in Europe and the USA (e.g. Wheatstone, 1837; Helmholtz, 1877; Bell, 1879), reached a crucial turning point with the historic invention of the spectrograph in the 1940s (reported in Potter, 1945; Koenig et al., 1946; Potter et al., 1947). Ever since, the steady development of sophisticated laboratory devices and the increased interest in prosodic phenomena have contributed to an upsurge of research on the main aspects of intonation. Thus, significant research has been conducted on the physiology of intonation (Ladefoged and McKinney, 1963; Ladefoged, 1967; Lieberman, 1967; Collier, 1975; Atkinson, 1978; Ohala, 1978), the acoustics of intonation (Hadding-Koch, 1961; Lehiste, 1970; Cooper and Sorensen, 1981), as well as the perception of intonation (Fry, 1958; Hadding-Koch and Studdert-Kennedy, 1964; Rossi, 1978; 't Hart et al., 1990). Although the main bulk of research has been conducted on functional aspects of intonation, microprosodic effects, i.e., the effects of different segments on the realisation of F0, have also been studied extensively (see Lehiste, 1970; Di Cristo and Hirst, 1986; Fischer-Jørgensen, 1990; Whalen and Levitt, 1995; Fourakis et al., 1999). Reports and publications with intonation data have grown immensely during the past two or three decades and the repertoire of knowledge on a considerable number of different languages is steadily growing. Contrastive and dialectal studies are also accelerating research areas, and language-dependent as well as language-independent intonation features are continuously brought to light (Gårding, 1977a; Bruce and Gårding, 1978; Gårding et al., 1982; Beckman and Pierrehumbert, 1986; Vaissière, 1995).

Accumulated knowledge on the nature of intonation, mostly on forms and functions, resulted in the development of intonation models with predictive power which, formalised mainly in the 1970s (e.g. Bruce, 1977; Thorsen, 1978;
Pierrehumbert, 1980), have been tested and applied to a considerable number of languages (Gårding et al., 1982; Cutler and Ladd, 1983; Hirst and Di Cristo, 1998a). The modelling of intonation, in addition to forms and functions, is closely related to labelling and transcription of intonation, for which several systems have been proposed, among them Tones and Break Indices (ToBI), applied initially in American English (see Silverman et al., 1992; also Beckman and Ayers, 1997), and the INternational Transcription System for INTonation (INTSINT), applied in several languages (see Hirst and Di Cristo, 1998a).

The significance of intonation is also widely acknowledged with reference to speech and language technology, speech pathology and phoniatrics, as well as applied linguistics and language education. In speech synthesis, e.g., the contribution of intonation is not confined to the intelligibility of tonal and prosodic distinctions such as stress, focus and phrase boundaries but has a decisive effect on the naturalness of the system as a whole (for evaluation of speech synthesis see, among others, Monaghan and Ladd, 1990; Van Santen, 1993; Véronis et al., 1998; Tatham and Morton, 2000; Di Cristo et al., 2000; for an overview of speech synthesis systems in Europe see Monaghan, 1998).

1.3. Study and research of intonation

lands), Department of Linguistics and Phonetics at Lund University (Lund, Sweden), Department of Linguistics at the Ohio State University (Columbus, USA), and AT&T Research Labs and Lucent Technologies Bell Labs (New Jersey, USA) ought to be mentioned. Significant research is, however, being carried out in phonetics laboratories and linguistics departments in most of Europe and the USA as well as by individual researchers and research groups worldwide.

Intonation is a central concern for many established as well as emerging disciplines ranging from theoretical linguistics and experimental phonetics to computer sciences and signal processing. Like any area of phonetics, intonation has interdisciplinary dimensions with reference to speech physiology, speech acoustics, and speech perception, which are related to human anatomy, physics and auditory sciences, respectively. On the other hand, intonation applications are related to language technology, language pathology and language education. In short, intonation studies and research may be mostly found in phonetics and linguistics departments and, to a considerable degree, in various language, technology, and medicine university departments. In addition, research and applications of intonation are carried out in research institutes and industrial companies with reference to high technology speech and language products.
complex structures with fairly continuous forms into local and global features, which may have discrete linguistic functions. Although much remains to be done, there is considerable knowledge on basic aspects of functions and forms of intonation at lexical, phrase, and sentence levels as well as steadily accelerating knowledge at discourse and dialogue levels.

2.1. Functions of intonation

The main functions of intonation are centred round the notions of prominence, grouping and discourse, which are related to various grammatical components as well as linguistic levels. Prominence is related to weight structuring of linguistic units such as syllables and words. Grouping is related to coherence and segmentation structuring of speech units into prosodic units. Discourse is related to structuring of prosodic units with reference to topics of discussion and turn regulations between speakers involved in a conversation. Intonation is thus involved in linguistic structuring with variable distinctive functions in accordance with the level of application.

2.1.1. Lexical functions

Intonation may have a distinctive function at lexical level with reference to the prosodic categories of tone, stress and accent, and thus languages with corresponding distinctions are often referred to and classified as tone languages (such as Chinese, Thai and Vietnamese), (dynamic) stress languages (such as Greek, Italian, Russian and Spanish) and (pitch) accent languages (such as Japanese and Swedish). Although the overwhelming majority of the European languages are stress languages, tone languages are widespread in many parts of the world including Asia, Australia, Africa and America, whereas accent languages are found in many language families with either stress or tone dominant distinctions. It should be noted, however, that a prosodic taxonomy is mainly indicative rather than restrictive. In Chinese, e.g., in addition to tones there are stress distinctions whereas, in Swedish, there are accent as well as stress distinctions. On the other hand, Japanese has only accent distinctions.

2.1.1.1. Tone distinctions. Tone is associated with different tonal patterns within a syllable, which may have a distinctive function with reference to static and dynamic tones (e.g., low versus high or rising versus falling) in lexicon/morphology. Chinese, e.g., regardless of the neutral tone, has four distinctive tones, H, R, L, and F, also referred to as Tones 1, 2, 3, 4, respectively, and determined by the height and shape of the tonal patterns. These tones are described as high-level (H), mid-rising (R), low-dipping (L), and high-falling (F). Polysyllabic words may have different combinations of contrastive tones in a row whereas monosyllabic words may have all four contrastive tones with the corresponding transcription such as mā 'mother', má 'hemp', mǎ 'horse', and mà 'scold'. Thus, tone, much like segmental but not prosodic distinctions in general, is in a paradigmatic contrast. Tones in languages where the contrast is based on the height of the tones are known as register tones whereas tones involving contrastive shapes are known as contour tones.

2.1.1.2. Stress distinctions. Stress is associated with syllabic prominence which, in combination with other prosodic features (i.e., duration, intensity, and vowel quality), may have a distinctive function in lexicon/morphology. The word ˈnomos 'law' in Greek, e.g., with stress on the first syllable, has a different lexical meaning from noˈmos 'county', with stress on the final syllable, and stress distribution is thus more or less free, i.e., has a distinctive function. Greek has a relatively simple stress pattern, i.e., one stress per word at lexical level in principle, whereas other languages such as the Germanic ones may have a complex stress pattern reflecting the lexical and morphological constituency of the words. Traditionally, the more prominent stress is referred to as main (or primary) stress and the less prominent stress as secondary stress. In some languages, however, stress distribution is bound to a particular syllable and thus has a demarcative function, i.e. the lexical boundaries may be predicted from the position of stress. Finnish and Polish, e.g., have stress distribution on the first and second last (penultimate) syllable, respectively. Stress represents a typical syntagmatic contrast, i.e., a syllable stands out and is
more prominent in relation to other syllables at word domain.

2.1.1.3. Accent distinctions. Accent is also associated with syllabic tonal prominence in lexicon/morphology. The tonal pattern is however the distinctive factor, in comparison to stress, the tonal pattern of which may have a large variability. In Japanese, accent may have a three-way distinction in series of words like ˈkaki 'oyster', kaˈki 'hedge' and kaki, with first syllable versus last syllable versus accentless distribution. In Swedish, on the other hand, the words tanken 'tank' and tanken 'thought' do have accent distinction on the first syllable, i.e., acute (accent 1) versus grave (accent 2). Furthermore, the first syllable is more prominent than the second one and thus, in Swedish, stress distribution is a basic condition for the application of accent. Accent has a distinct classification in relation to tone and stress with regard to prosodic typology, although a syntagmatic contrast, much like stress, is the most usual description (cf. Garde, 1968).

2.1.2. Phrase and sentence functions

At phrase and sentence levels intonation may be associated with prominence relations and phrasing as well as sentence type variations and distinctions.

2.1.2.1. Tonal prominence. Tonal prominence, apart from prosodic distribution at lexical level (see above), is mainly associated with focus (and more or less synonymous terms such as nucleus, sentence stress and focal accent) distribution at phrase and/or sentence levels. Focus has been studied from a wide range of perspectives with reference to linguistic theory and thus syntactic, semantic, pragmatic and information structures and representations (see, among others, Gussenhoven, 1984; Rossi, 1985; Ladd, 1996; Lambrecht, 1996; Cruttenden, 1997; Hirst and Di Cristo, 1998a). Focus has mostly been related to presupposition, according to which focus designates new information and presupposition shared or old information in a speaker–listener relation (cf. Jackendoff, 1972). The focus/presupposition concept may be found in a similar or closely related use with rheme/theme, comment/topic, new/given and foreground/background terminology and thus focus, somewhat simplified, has a highlighting function associated with the most important information in a speech unit. On the other hand, although focus traditionally refers to information structuring, in much of the current international literature as well as here focus simply refers to the prosodic distribution. Emphasis and contrast, with the corresponding prosodic terminology emphatic stress and contrastive stress, also have a highlighting function with more or less similar use to that of focus (for relevant discussion see Hirst and Di Cristo, 1998a).

2.1.2.2. Tonal phrasing. Phrasing (also referred to as grouping) is associated with the segmentation of utterances into variable prosodic units, and prosodic theory and phonological studies refer to several prosodic categories and units ranging from syllable to utterance (Selkirk, 1984; Nespor and Vogel, 1986). For intonation description and analysis, however, the syllable, stress group and intonation phrase are mostly acknowledged (cf. Thorsen, 1978; Botinis, 1989; Hirst, 1993; Hirst and Di Cristo, 1998b).

The syllable may be an anchoring point of tonal structures associated with prosodic distinctions at lexical level, i.e., tone, stress or accent. The basic distinction between stressed and unstressed syllables, e.g., may determine the distribution of tonal "turning points", i.e., an abrupt tonal change associated with the stressed syllable, and may thus have a substantial effect on the structure of intonation.

The stress group (and fairly synonymous terms such as foot, tonal unit, prosodic word, etc.) is the immediate prosodic unit above the syllable and refers to a stressed syllable and any unstressed syllable(s) up to, but not including, the next stressed syllable. A prosodic sequence of stressed and unstressed syllables is thus structured into stress groups, which may correlate with distinct tonal gestures the onset and offset of which are aligned with the beginning and end of the corresponding stress groups. The distribution and relation among stress groups within an utterance define the rhythm, which is a basic characteristic of languages with different prosodic structures.
The intonation phrase (also referred to as intonation unit, prosodic sentence, breath group, etc.) refers to coherent intonation structures with no major prosodic break. An intonation phrase may consist of a single syllable up to syntactic phrases, clauses and sentences, and a larger utterance may thus be pronounced with one or several intonation phrases, which may not have any predictive one-to-one correspondence with syntactic or semantic units. Accordingly, the correlation of intonation phrase boundaries and syntactic boundaries is casual rather than causal. On the other hand, intonation phrases may be associated with different sentence types, which may be defined as statements, questions, commands, etc. Intonation phrasing may also be related to information units, with reference to which speech units should be marked as more or less autonomous information units on behalf of the speaker.

In between the stress group and the intonation phrase, acknowledged for the description and analysis of different languages, an intermediate category has been suggested, i.e. the intermediate phrase, according to which an intonation phrase may be decomposed into several intermediate phrases (see Beckman and Pierrehumbert, 1986; Pierrehumbert and Beckman, 1988).

2.1.3. Discourse and dialogue functions

At discourse and dialogue levels intonation may structure larger speech units above sentence level in different ways, in accordance with intraspeaker as well as interspeaker interactive functions between/among speakers. Discourse intonation and dialogue intonation are more or less overlapping terms and may be found rather interchangeably in the international literature. Somewhat simplified, discourse and dialogue intonation may structure thematic units such as topics and sub-topics, i.e., what the discussion is about and aspects of the discussion respectively, as well as turn units between speakers such as turn-taking, turn-keeping and turn-leaving interplay, i.e., the contribution of each speaker to the development of spoken discourse. In phonetic studies, both terms usually refer to the study of spontaneous speech in contrast to controlled read speech in a laboratory condition, i.e. lab speech. The study of intonation is also related to the analysis of read texts, within the subject-area of text linguistics, referred to as text intonation in current literature.

2.2. Forms of intonation

Intonation is based on the vibration of the vocal folds, which is an inherent characteristic of the speech production process and thus, in other words, once there is speech there is normally intonation too. Monotonous intonation would be laborious to maintain from a physiological point of view, as there are variations of subglottal pressure due to biological reasons such as breathing. On the other hand, from a perceptual point of view, monotonous intonation would be tiresome and uninteresting, which is not compatible with the fundamental function of speech to open and maintain a channel of information exchange. Since intonation variations are inherently related to speech production, an attribution of distinctive functions fulfils a basic principle of speech communication economy, i.e., to produce the maximum linguistic information with the least effort.

The forms of intonation are the merger of various physiological, linguistic, paralinguistic and extralinguistic contributions into any speech unit in principle. The physiology of voiced sounds is associated with measurable tonal production, as a result of vocal fold vibrations, whereas voiceless sounds lack tonal production. There is however perceptual concatenation and thus intonation is perceived in a continuous rather than a gap-like way. Furthermore, microprosodic variability is considerable and thus high vowels may have higher tonal realisation than low vowels, voiceless stops may trigger higher tonal onset of the succeeding vowel than voiced stops, etc. Linguistic categories such as stress and focus may be associated with higher tonal patterns and/or tonal changes whereas finality of variable speech units such as phrase, sentence and discourse may be associated with a lowered intonation and/or tonal changes. Paralinguistic factors such as excitement, involvement and aggressiveness may increase the tonal range whereas sadness, boredom and indifference may decrease tonal variations (although there is large variability among different speakers
and languages). Extralinguistic features such as age and sex have a physiologically determined effect on the height of intonation, which mainly depends on the size and form of the vocal folds, i.e., smaller vocal folds produce higher intonation, and hence the higher intonation of children versus women versus men. Furthermore, hierarchical relations, cultural attitudes and socioeconomic status, among other extralinguistic factors, may have considerable effects on intonation.

In prosodic studies, including intonation, the "isolation method", i.e. the analysis of one phenomenon at a time, is a standard method and the choice of the speech materials is in accordance with the objectives of the analysis. Even "nonsense" materials, i.e. speech productions with no meaning, are fairly common, much as in other areas of experimental phonetics. In spontaneous speech and dialogues, however, which are the most authentic types of speech production, the speech material is more or less unrestricted and the decomposition of intonation and the factoring out of different contributions on a speech unit are much more complicated. No matter what type of speech material or what particular aspects of intonation are analysed, the decomposition of intonation is usually based on a few dimensions and parameters. Two dimensions are most relevant, i.e. a local and a global one, whereas the parameters are mostly related to tonal change events and tonal range magnitudes. The alignment of intonation with the segmental realisation of the speech material is also an important aspect of intonation analysis (see Bruce, 1977; Botinis, 1989; House, 1990).

continuous form irrespective of word boundaries. Third, tonal distribution at lexical level is in a trade-off relation with higher level intonation such as phrase, sentence and discourse and thus a decomposition of intonation associated with each level is a standard procedure for intonation description and analysis. Tonal analysis of lexical words in citation forms may be found in the literature, whereas key words in simple, declarative sentence carriers, pronounced with no special focus or emphasis, are a standard context for intonation analysis at lexical level, as there is limited interference from higher level intonation. Intonation forms associated with the latter context are usually referred to as "neutral" intonation (for factors and principles of tonal distribution see Monaghan, 1993).

2.2.1.1. Tone patterns. Fig. 1 shows tonal distribution of four contrastive tones in (Mandarin) Chinese in one-word declarative utterances consisting of a segmentally homophonous syllable, i.e., mā, má, mǎ and mà.

The tonal pattern of the four contrastive tones is in fairly good accordance with traditional classifications. The high, rising, low and falling tones have relatively high-level, rising, convex falling-rising and falling tonal shapes respectively. Thus, the high tone is typically static whereas the rising and falling tones, and even the low tone, are more or less dynamic.

Fig. 2 shows tonal distribution of four contrastive tones in (Mandarin) Chinese in a simple declarative utterance context (Xu, 1999). The
speech material consists of three words, two disyllabic and one intervening monosyllabic, i.e., five syllables, pronounced in a neutral way. All syllables have a high tone except for the first word's second syllable, which has all four contrastive tones, i.e., HH māomī {HR māomí, HL māomǐ, HF māomì} H mō HH māomī 'Kitty' {'cat-fan', 'cat-rice', 'cat-honey'} 'touches' 'Kitty'.

The basic tonal patterns of the four contrastive tones are also reasonably maintained in the utterance context, with reference to the high versus low tonal levels in combination with the falling versus rising tonal movements. However, whereas the sequence of the high tones is associated with a rather even tonal pattern throughout the utterance, the alternation of consecutive tones triggers two main tonal coarticulation effects: anticipatory and carryover. These effects are related to the falling and low tones, which trigger a tonal raising as well as a tonal lowering of the preceding and following high tones respectively. The rising tone also has a similar effect with regard to the preceding syllable but hardly the following one (see Xu and Wang, 2001, this issue). The raising of the preceding tone and lowering of the following tone are related to the notion of downstep, i.e., the tendency of consecutive tones to form a rightward lowering pattern (see also Section 2.2).

The later part of the syllable, which exhibits the maximum contrast in tone production, is also assumed to be the most relevant part in tone perception as the tonal pattern of the earlier part of the syllable is subject to both perturbation by the initial consonants (cf. Hombert, 1978) and carryover influence by the preceding tone (Xu, 1999; Xu and Wang, 2001, this issue).

2.2.1.2. Stress patterns. Fig. 3 shows tonal distribution of stress distinctions of Greek in a simple declarative utterance context. The speech material consists of the test words ˈnomos 'law' and noˈmos 'county' in the context i maˈria ˈiksere to __ kaˈla 'Maria knew the __ well' pronounced in a neutral way (Botinis, 1989, 1998).

The stressed syllables of the test words are included in a tonal gesture, i.e. a tonal hop, which consists of three phases: a rise, a plateau, and a fall. The onset of the tonal rise is aligned with the very beginning of the stressed syllable, i.e. the consonant, and the offset is completed at the end of the stressed syllable. The tonal plateau spans the poststressed syllable whereas the tonal fall is completed by the beginning of the next stressed syllable, although, in this context, the tonal fall is fairly suppressed. This suppression depends mainly on tonal coarticulation effects related to the next and final stressed syllable, which is subjected to the final juncture interference, triggering a tonal lowering. Thus, the tonal gesture spans the whole stress group, irrespective of word boundaries. The division of the speech material into stress groups is evident across the test sentences, which consist of four lexical words each and thus four stressed syllables and the respective stress groups.

The tonal rise, whenever associated with a stressed syllable, is aligned with the very beginning of the stressed syllable as a rule in Greek whereas the tonal plateau and the tonal fall may show considerable variability, which depends on the immediate prosodic context. The tonal plateau mainly depends on the size of the interstress interval whereas the tonal fall may be truncated, e.g.
in tonal upstepping patterns where the tonal rise and the tonal plateau, but not the tonal fall, are the tonal correlates for consecutive stress groups. In Greek, the entire syllable rather than the vocalic part is the relevant unit of stress as, apart from tonal evidence, there is duration and intensity evidence, according to which there is considerable augmentation of both consonantal and vocalic parts (see Botinis, 1989; Fourakis et al., 1999; Botinis et al., 1999).

The tonal rise of the stressed syllable as well as the tonal pattern of the stress group in Greek is also fairly regular in different languages such as Danish and German (see Thorsen, 1982; Bannert, 1985; Bannert and Thorsen, 1988; Möbius, 1993). However, instead of, or in addition to, a tonal rise, languages may have a tonal fall, which may also depend on the prosodic context, and thus a tonal change, rather than a tonal rise or tonal fall, associated with stress distinctions (see Gårding, 1977b; Hyman, 1977; Gårding et al., 1982; Thorsen, 1982). The "hat pattern", introduced for the analysis of Dutch intonation, is a typical example, according to which consecutive stressed syllables may be correlated with an alternation of tonal rises

Tonal distribution has traditionally been assumed to be the main perceptual factor for stress distinctions. Even a hierarchy has been proposed according to which: (1) a change in F0, (2) increased duration, (3) increased intensity, and (4) a change in vowel quality (or timbre), in this ranking order, constitute the unmarked prosodic universality (see Fry, 1958; Bolinger, 1958; Lehiste, 1970; Hyman, 1977; Berinstein, 1979; Hirst and Di Cristo, 1998a). In a series of perceptual experiments in Greek (Botinis, 1989, 1998) tonal alignment as well as tonal height of stressed syllables were manipulated and resynthesised speech stimuli were subjected to perceptual analysis. Only F0 manipulations were carried out whereas duration and intensity were constant (in an LPC environment) and the conclusions are based on 10 listeners' responses. Fig. 4(a) shows tonal displacement in 8 equal steps, from ˈnomo to noˈmo and vice versa, respectively, and thus 16 stimuli in total for the tonal alignment experiment. Fig. 4(b), on the other hand, shows displacement of tonal height in 7 equal steps in ˈnomo and noˈmo, respectively, and thus 14 stimuli in total for the tonal height experiment.
and falls whereas the corresponding interstress Tonal alignment displacements comprising to-
interval may form a tonal plateau (t'Hart and tally 16 stimuli divided the listeners' responses into
Collier, 1975; t'Hart et al., 1990; t'Hart, 1998). fairly categorical groups. Rightward displacements
In several prosodic contexts, such as the post- (Fig. 4(a), over) divided the corresponding eight
focal one, stress distinctions may not correlate stimuli into two main groups: a group comprising
with any tonal change but a low tonal ¯attening stimuli 1±5, identi®ed as 0 nomo (over 90%), and
which is a regular tonal pattern in Greek as well as another group comprising stimuli 7±8, identi®ed as
many languages (see Section 2.1.4). On the other no0 mo (over 90%) whereas stimulus 6 was ambig-
hand, unstressed syllables may correlate with a uous. Leftward displacements (Fig. 4(a), under)
tonal change, either a rise or a fall, which are not also divided the corresponding eight stimuli into
associated with stress distinctions at lexical level two groups: a group comprising stimuli 9±12,
but with prosodic distinctions at higher levels such identi®ed as no0 mo (over 90%), and another group
as sentence and discourse (see Sections 2.2 and comprising stimuli 14±16, identi®ed as 0 nomo (over
2.3). 90%) whereas stimulus 13 was ambiguous. Thus,
In summary, a stressed syllable may correlate displacements of tonal alignment may cause a
with a tonal change whereas an unstressed syllable complete identi®cation change provided that a
is usually the carrier of a tonal change already critical point is crossed over. It should be noted
started on the stressed syllable (see the stress group however that this critical point is not the same for
notion above). An unstressed syllable may also the two reference words as duration and intensity
correlate with a tonal change but this is for into- are presumably in a trade-o relation with tonal
nation boundary distinctions associated with alignment.
higher level intonation structuring and not stress Tonal height displacements comprising totally
distinctions at lexical level. 14 stimuli (Fig. 4(b)) had no or negligible percep-
A. Botinis et al. / Speech Communication 33 (2001) 263–296 273
Fig. 4. Greek reference tonal patterns (solid lines) and manipulated synthetic stimuli (broken lines) with regards to stress perception in
tonal alignment (a) and tonal range (b) dimensions (adopted from Botinis, 1989, 1998).
2.2.1.3. Accent patterns. Fig. 5 shows tonal distribution of the Swedish word accents in one-word
2.2.1.4. Lexical and focus interplay. Fig. 6 shows tonal distribution of focus as well as focus and
Fig. 9. Greek (left) and Swedish (right) tonal contours of test sentence productions with focus-neutral (0) as well as focus-initial (1),
focus-medial (2) and focus-final distribution (3). In Swedish, Mona has grave accent whereas Molly and London have acute accents
(adopted from Botinis and Bannert, 1997).
First, tonal range manipulations had a substantial effect but did not cause a complete perceptual change in accordance with the focus-target manipulation in either Greek or Swedish. Second, poststressed tonal flattening manipulations had a major perceptual effect on both Greek and Swedish and caused a complete perceptual change in Greek but not in Swedish. Third, tonal shift manipulations had a major perceptual effect on both Greek and Swedish and caused a complete perceptual change in Greek but not in Swedish. Fourth, tonal neutralisation manipulations had a major perceptual effect and caused a defocalisation in both Greek and Swedish. In summary, Greek is more sensitive to tonal manipulations than Swedish with reference to focus perception. This is in accordance with the acoustics of focus where duration is a constant correlate of focus in Swedish but not in Greek (Botinis et al., 1999). Thus, it seems that tonal and duration patterns are in a trade-off relation for focus perception in Swedish but less in Greek, the tonal pattern of which is by far the critical perceptual correlate of focus.

2.2.2. Phrase and sentence forms
Apart from, and in combination with, morphological and/or syntactic markers, intonation may define phrasing structures as well as different types of sentences. Aspects of intonation phrasing as well as sentence types and intonation interplay will be discussed below.
are related to intonation phrase boundaries which may be associated with a tonal change, either a fall or a rise. Duration patterns, most usually lengthening of the boundary material (as well as silent pauses), may also be correlated with intonation phrasing.
Fig. 10 shows one aspect of intonation phrasing in Swedish, i.e., the tonal structuring of two consecutive intonation phrases with a distinctive function. The speech material consists of the sentences fast man offrade bonden, och löparen hälsade kungen `but we sacrificed the pawn, and the bishop greeted the king' and fast man offrade bonden och löparen, hälsade kungen `though we sacrificed the pawn and the bishop, the king greeted us'.
The main difference between the two sentences is confined to a deep versus a shallow tonal valley at the boundary of the word bonden, which defines intonation phrasing in accordance with the alternative syntactic structures.
Fig. 11 shows another aspect of intonation phrasing in Swedish, which has been extracted from a larger speech context. This extract consists of the speech material ``... om det så är bönor, eller malet kaffe ...'' (... whether (coffee) beans, or ground coffee ...).
The main tonal correlate of phrasing is a tonal rise at the final boundary of the word bönor which reaches the top of the tonal variations within the phrase. The distribution of this tonal rise is not
The stress groups in Danish have a regular bell-like pattern with (1) a tonal change (rise) aligned with the stressed syllable, (2) a tonal top at the post-stressed syllable and (3) a tonal fall to the end of the stress group much like other languages such as German and Greek (see Section 2.2.1.2). The different sentence types are correlated with different declination slopes with reference to the stressed syllables (but also tonal range variations). Thus, the syntactically unmarked questions do not show any declination pattern and final declarative statements show maximum declination whereas other types of sentences are in between. This taxonomy is related to the development of an intonation model in Danish (an elaborated version of the model is reported in Grønnum, 1995; see also Section 3.2).
Fig. 13 shows intonation forms and sentence functions in French. The sentence functions correspond to (1) question, (2) continuation, (3) command and (4) vocative of the one-word utterance ``Anne Marie''.
The distinct sentence functions in Fig. 13 are related to both global and local tonal features. Question (a) and continuation (b) functions have a final tonal rise whereas command (c) and vocative (d) have a final tonal fall. On the other hand, command has a left tonal dominance whereas vocative has a right one (see Di Cristo, 1998).
Fig. 13. Question (a), continuation (b), command (c) and vocative (d) intonation forms in French produced in one-word utterance (adopted from Di Cristo, 1998).

2.2.3. Discourse and dialogue intonation forms
Intonation forms encountered in isolated sentence productions may be heavily modified in the context of larger discourse and dialogue units. Aspects of topic and dialogue boundaries associated with intonation forms will be presented below.

2.2.3.1. Topic boundaries and intonation forms. Topic boundaries, apart from syntax and morphology, may be marked by a variety of local and global tonal patterns. Most usually, higher tonal patterns are associated with topic (or aspects of topic, i.e., subtopics) initiality and lower tonal patterns with topic finality. Fig. 14 shows a topic initiality in Greek consisting of the spontaneous speech material liˈpon ksekiˈname aˈmesos ``well, we start right away''.
The word liˈpon is the topic initiality of the speech material in Fig. 14. There is a tonal rise aligned with the first (unstressed) syllable as if that syllable were stressed. However, this tonal rise is most likely a discourse marker related to the onset of the topic rather than lexical prominence and stress. Presumably, lexical prominence at the second (stressed) syllable is correlated with an augmentation of duration and intensity. It should be noted that there is no word ˈlipon (with stress on the first syllable) in the Greek lexicon.
Fig. 14. Tonal marking of topic initiality in Greek realised in the first (unstressed) syllable of the word liˈpon produced in a spontaneous dialogue context (adopted from Botinis, 1992).
Fig. 15 shows intonation forms of topic marking in a spontaneous dialogue. The speech material, apart from the reorganisation repetition ˈpezis ja to ... `you play for', consists of four intonation phrases: (1) ˈpezis ja to padeˈloni ˈdzin `you play for a trouser jean', (2) aˈksias ˈδeka xiˈljadon δraxˈmon tis kaˈrera `worth ten thousand drachmas of Karera', (3) ˈpezis ja to ˈfuter tis ˈtri ˈgaiz `you play for the blouse of Three Gaiz' and (4) ke ˈpezis ke ja ˈena δerˈmatino xartoˈfilaka `and you play for a leather briefcase'.
The speech material is structured in intonation phrases with a final boundary tonal rise (often referred to as ``continuation rises'' in the international literature), regardless of the stressed/unstressed distribution. The tonal rises are presumably ``turn-keeping'' markers, in the sense that the speaker wants to keep his turn within a topic, whereas the tonal fall is a ``turn-leaving'' and/or topic finality marker, in the sense that the speaker has concluded his turn and/or his topic. There are hardly any declination tendencies, either at the onset or the offset of the intonation phrases, which is an indication that declination may not be evident in certain types of speech material and contexts. On the other hand, there is a tonal fall aligned with the last stressed syllable of the final word xartoˈfilaka `briefcase', which is most probably a (sub)topic/turn finality rather than a prominence marker. This tonal pattern has an interspeaker dialogue effect, according to which the second conversationalist resumes his turn with e ...
Fig. 15. Intonation phrases with continuation rises as well as turn finality tonal marking in Greek produced in a spontaneous dialogue context. Arrows indicate main points of discourse and dialogue intonation forms (adopted from Botinis, 1992).
Thus, discourse-triggered tonal rises may be aligned with unstressed syllables and tonal falls with stressed syllables. These patterns are not usually found in isolated sentence productions and stress group realisations. This is an indication that concrete tonal distribution may be related to different levels of abstraction and have different functions in accordance with the level of application.

2.2.3.2. Dialogue boundaries and intonation forms. Among other factors, such as type of speech material and linguistic as well as situation context, intonation realisation is related to tonal patterns across different speakers. Thus, there is a tonal interplay between speakers which is evident, among other domains, at turn-unit boundaries. Fig. 16 shows tonal patterns of turn-unit boundaries in a spontaneous dialogue between two speakers. One speaker is leading the development of the dialogue and another speaker follows in a cooperative way. Fig. 16(a) consists of three turn-units: (1) sefxariˈsto `thanks' (first speaker), (2) ˈela ja ke xaˈra su `well, bye-bye' (second speaker) and (3) ja ja `bye-bye' (first speaker). Fig. 16(b) consists of two turn-units: (1) δilaˈδi ˈnane kliˈsto `which means off' (first speaker) and (2) ne `yes' (second speaker).
In Fig. 16(a) and (b) the turn-units between the two speakers form a coherent intonation structure. These patterns are rather regular between speakers involved in a cooperative dialogue even in cases of
Fig. 16. Interspeaker pitch-concord tonal patterns. Solid and broken lines correspond to two different speakers (adopted from Botinis,
1992).
interspeaker personal differences such as age and sex (see Botinis, 1992). The tonal choices of a speaker may thus have interspeaker effects and, accordingly, the ultimate intonation form in dialogues and spontaneous discourse is a constant interplay between/among speakers, much like vocabulary and other grammatical components.
emphasises the functional aspect and shows how phonetic details, such as the location of an F0 peak relative to the segmental structure, can change the meaning of the utterance. Finally, acoustic stylisation approaches aim at a robust computational analysis and synthesis of F0 contours.
3.1. Phonological models of intonation
movement between a pitch accent and a boundary tone.
Boundary tones, denoted by the ``%'' symbol (H%, L%). Boundary tones are aligned with the edges of an intonational phrase. The initial and final boundary tones control the onset and offset pitch, respectively, of the intonational phrase.
The model thus introduces a three-level hierarchy of intonational domains which obey the strict layer hypothesis: An intonational phrase consists of one or more intermediate phrases; each intermediate phrase is composed of one or more prosodic words. The intonation contour of an utterance is described as a sequence of relative (H and L) tones. Well-formed sequences are predicted by a finite-state grammar (Fig. 17).
This abstract tonal representation is converted into F0 contours by applying a set of phonetic realisation rules. The phonetic rules determine the F0 values of the H and L tones, based on the metric prominence of the syllables they are associated with, and on the F0 values of the preceding tones. Calculation of the F0 values of tones is performed strictly from left to right, depending exclusively upon the already processed tone sequence and not taking into account the subsequent tones. The phonetic rules also compute the temporal alignment of the tones with the accented syllables.
Pierrehumbert's intonation model is predominantly sequential; what is treated in other frameworks as the correlates of the phrase structure of a sentence or as global trends, such as question or declarative intonation patterns, is conceptualised as elements of the tonal sequence and their (local) interaction. In this model, the English question intonation is embodied in the tonal sequence L* H- H%, and there is no separate phrase-level ``question intonation contour'' that these tones are superimposed on. Similarly, the downward trend observed in some types of sentences, particularly in list intonation, is accounted for by the downstep effect triggered by certain accents, such as H*+L, rather than being attributed to a phrase-level intonation that affects all pitch accents.
There are a few aspects of the model that are hierarchical or non-local. The model is situated at the interface between intonation and metrical phonology and inherits the hierarchical organisation of the metrical stress rules. Another element whose effect is global is declination, onto which the linear sequence of tones is overlaid. Given these properties, Ladd (1988) characterised Pierrehumbert's model as a hybrid between the superposition
Fig. 17. Phonological model with pitch accent, phrase accent and boundary tone labelling (adopted from Pierrehumbert, 1980).
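The finite-state grammar of tone sequences referred to in Fig. 17 can be illustrated with a minimal sketch. The tone inventories and the function name `is_well_formed` below are assumptions made for illustration only (the pitch accent inventory in particular is a commonly cited one for English, not a reproduction of the figure); the sketch merely accepts sequences of the form optional initial boundary tone, one or more pitch accents, one phrase accent, one final boundary tone.

```python
# Hypothetical tone inventories, assumed for illustration only.
PITCH_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "H*+L", "H+L*"}
PHRASE_ACCENTS = {"H-", "L-"}
BOUNDARY_TONES = {"H%", "L%"}


def is_well_formed(tones):
    """Accept: [boundary tone] pitch accent(s) phrase accent boundary tone."""
    state = "start"
    for tone in tones:
        if state == "start" and tone in BOUNDARY_TONES:
            state = "initial"      # optional initial boundary tone
        elif state in ("start", "initial", "accents") and tone in PITCH_ACCENTS:
            state = "accents"      # one or more pitch accents
        elif state == "accents" and tone in PHRASE_ACCENTS:
            state = "phrase"       # exactly one phrase accent
        elif state == "phrase" and tone in BOUNDARY_TONES:
            state = "final"        # final boundary tone closes the phrase
        else:
            return False           # no transition defined: reject
    return state == "final"


if __name__ == "__main__":
    # A rising question tune with a low accent, high phrase accent, high boundary.
    print(is_well_formed(["L*", "H-", "H%"]))
    # A boundary tone directly after a pitch accent skips the phrase accent.
    print(is_well_formed(["L*", "H%"]))
```

The state machine is deliberately flat: it checks only the linear order of tone categories and says nothing about phonetic realisation, which the model handles by separate rules.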
and the tone sequence approach. Furthermore, discourse structure is hierarchically organised, and the information is used to control F0 scaling so that the pitch height of discourse segments reflects the discourse hierarchy (Hirschberg and Pierrehumbert, 1986; Silverman, 1987). The strongest position with respect to the local nature of tone scaling was taken by Liberman and Pierrehumbert (1984) who concluded that most of the observed downward trend is attributed to downstep and that there is no evidence of declination in English.
Ladd's phonological intonation model (Ladd, 1983) is based on Pierrehumbert's work but integrates some aspects of the IPO approach (t'Hart et al., 1990) and the Lund intonation model (Bruce, 1977; Gårding, 1983; see also Section 3.2). Like Pierrehumbert, Ladd applies the framework of autosegmental and metrical phonology. He attempts to extend the principles of feature classification from segmental to suprasegmental phonology, which would also facilitate cross-linguistic comparisons. In Ladd's model, F0 contours are analysed as a sequence of structurally relevant points, viz., accent peaks and valleys, and boundary end points, each of which is characterised by a bundle of features. Acoustically, each tone is described in terms of its height and its position relative to the segmental chain. Tones are connected by straight-line or smoothed F0 transitions. Key elements of this model are also presented in Ladd's more recent ``Intonational Phonology'' (Ladd, 1996).
The tone sequence theory of intonation has been formalised into the tone and break indices (ToBI) transcription system (Silverman et al., 1992). ToBI was originally designed for transcribing the intonation of three varieties of spoken English, viz., general American, standard Australian and southern British English, and the authors were sceptical about the possibility of using it to describe the intonation systems of other dialects of English, let alone other languages. After all, the tone sequence theory provides an inventory of phonological entities; the ToBI system may thus be characterised as a broad phonemic system. The phonetic details of F0 contours in a given language have to be established independently. These considerations notwithstanding, the basic principles of ToBI have by now been adopted to develop transcription systems for a large number of languages, including Japanese, German, Italian, and Bulgarian. ToBI labels, in conjunction with F0 generation rules, are also frequently used in the intonation components of text-to-speech (TTS) synthesis systems.

3.2. Acoustic–phonetic models of intonation
Grønnum (Thorsen) developed a model of Danish intonation (summarised in Grønnum, 1992, 1995) that is conceptually quite different from the tone sequence approach. Her intonation model is hierarchically organised and includes several simultaneous, non-categorical components of different temporal scopes. The components are layered, i.e., a component of short temporal scope is superimposed on a component of longer scope.
Grønnum's model integrates the following components. The highest level of description is the text or paragraph, which requires a discourse-dependent intonational structuring (text contour). Beneath the text there are influences of the sentence or the utterance (sentence intonation contour) and of the prosodic phrase (phrase contour). The lowest linguistically relevant level is represented by stress group patterns. The four components are language-dependent and actively controlled by the speaker. The model also includes a component that describes microprosodic effects, such as vowel intrinsic and coarticulatory F0 variations, which are generally assumed not to be under the conscious control of the speaker. Finally, a Danish-specific component models the stød, a creaky voice phenomenon at the end of phonologically long vowels, or on the post-vocalic consonant in the case of short vowels.
All components of the model are highly interactive and jointly determine the F0 contour of an utterance. Therefore, for the interpretation of observed natural F0 curves, a hierarchical concept is needed that allows the analytical separation of the effects of a multitude of factors on the intonation contour.
Similar to Grønnum's work, the Lund intonation model (Bruce, 1977; Gårding, 1983) analyses the intonation contour of an utterance as the
complex result of the effects of several factors. A tonal grid, whose shape is determined by the sentence mode and by pivots at major syntactic boundaries, serves as a reference frame for local F0 movements. It is thus implicitly assumed that the speaker pre-plans the global intonation contour. At first glance, the Lund model also includes elements of the tone sequence approach in that it represents accents by sequences of high and low tones. But in the Lund model position and height of the tones are determined by the tonal grid, which is, by definition, a non-local component. Yet, the Lund model suggests that it is possible to integrate aspects of the superpositional and the tone sequence approaches.
The classical superpositional intonation model has been presented by Fujisaki (Fujisaki, 1983, 1988). It can be characterised as a functional model of the production of F0 contours by the human speech production apparatus, more specifically by the laryngeal structures; the approach is based on work by Öhman and Lindqvist (1966). The model represents each partial glottal mechanism of fundamental frequency control by a separate component. Although it does not include a component that models intrinsic or coarticulatory F0 variations, such a mechanism could easily be added in case it is considered essential for, e.g., natural-sounding speech synthesis.
Fujisaki's model additively superimposes a basic F0 value (Fmin), a phrase component, and an accent component, on a logarithmic scale (Fig. 18). The control mechanisms of the two components are realised as critically damped second-order systems responding to impulse commands in the case of the phrase component, and rectangular commands in the case of the accent component. These functions are generated by two different sets of parameters: (1) amplitudes and timing of phrase commands, and damping factors of the phrase control mechanism; (2) amplitudes and timing of the onsets and offsets of accent commands, and the damping factors of the accent control mechanism.
The values of these parameters are constant for a defined time interval: the parameters of the phrase component within one prosodic phrase; the parameters of the accent component within one accent group; and the basic value Fmin within the whole utterance.
The F0 contour of a given utterance can be decomposed into the components of the model by applying an analysis-by-synthesis procedure. This is achieved by successively optimising the parameter values, eventually yielding a close approximation of the original F0 curve. Thus, the model provides a parametric representation of intonation contours.
The model has been applied to a number of languages, including Japanese, Swedish, Mandarin Chinese, French, Greek, German, and English. With the exception of English, where the model failed to produce certain low or low-rising accentual contours (Liberman and Pierrehumbert, 1984; Taylor, 1994), very good approximations of natural F0 contours were generally obtained. The compatibility of several key assumptions of the tone sequence approach with a Fujisaki-style model has been discussed and, to some extent, experimentally shown in Möbius's work on German intonation (Möbius, 1993, 1995), in which a linguistic motivation and interpretation of the phrase and accent commands has been attempted.
Fig. 18. Superpositional model with phrase and accent components (adopted from Fujisaki, 1988).
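The additive superposition on a logarithmic scale described for Fujisaki's model can be sketched in a few lines. This is a minimal sketch, not the author's implementation: the function names, the command tuples, and the default damping values `alpha`, `beta` and ceiling `gamma` are illustrative assumptions. It uses the commonly cited response functions of the model, an impulse response for the phrase control mechanism and a clipped step response for the accent control mechanism, both critically damped second-order systems.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (alpha: assumed damping)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, clipped at ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fmin, phrase_cmds, accent_cmds,
                alpha=3.0, beta=20.0, gamma=0.9):
    """F0 in Hz at time t (s).

    phrase_cmds: list of (onset_time, amplitude) impulse commands.
    accent_cmds: list of (onset_time, offset_time, amplitude) rectangular commands.
    """
    log_f0 = math.log(fmin)  # basic value Fmin on the logarithmic scale
    for t0, amp in phrase_cmds:
        log_f0 += amp * phrase_response(t - t0, alpha)
    for t1, t2, amp in accent_cmds:
        # A rectangular command is the difference of two step responses.
        log_f0 += amp * (accent_response(t - t1, beta, gamma)
                         - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)


if __name__ == "__main__":
    # Hypothetical commands: one phrase command and one accent command.
    phrases = [(0.0, 0.5)]
    accents = [(0.2, 0.5, 0.4)]
    for ms in range(0, 1001, 100):
        t = ms / 1000.0
        print(f"{t:.1f} s  {fujisaki_f0(t, 100.0, phrases, accents):.1f} Hz")
```

In an analysis-by-synthesis setting, the command timings and amplitudes passed to `fujisaki_f0` would be the quantities adjusted until the generated curve approximates a measured F0 contour; the optimisation itself is not shown here.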
The concept of superposition is also exploited in the Bell Labs intonation model (Van Santen and Möbius, 1997, 2000), which focuses on the temporal aspects of intonation. In this model an F0 contour is computed by adding up three types of time-dependent curves: a phrase curve, which depends on the type of phrase, e.g., declarative versus interrogative; accent curves, one for each accent group; and segmental perturbation curves. The model incorporates results from Van Santen and Hirschberg (1994), who had shown that there is a relationship between accent group duration and F0 peak location. Another important factor is the segmental structure of onsets and codas of stressed syllables.
In the Bell Labs model the phonological unit of a pitch accent is the accent group, not the accented syllable. An accent group is defined as an entity that consists of an accented syllable followed by zero or more unaccented syllables. The time course of an accent curve depends on the segmental and temporal structure of the entire accent group, not only on the properties of the accented syllable. This dependency is complicated but regular: all else being equal, the pitch peak, as measured from the start of the accented syllable, is shifted to the right as any part of the accent group is lengthened. It is shown that this regularity can be captured by a simple linear alignment model.
Based on these findings, the model predicts F0 peak location in a given accent group by computing a weighted sum of the onset and rhyme durations of the stressed syllable, and the duration of the remainder of the accent group. It is assumed that the three factors exert different degrees of influence on peak location. For any given segmental structure, the ensemble of regression weights is called an alignment parameter matrix, and for each given pitch accent type the alignment parameter matrix characterises how accent curves are aligned with accent groups. The accent curves also have to be scaled to some appropriate pitch range, reflecting the relative prominence of the pitch accent.
F0 curves of pitch accents that belong to the same perceptual or phonological class are generated from a common basic shape or template by applying a common set of alignment parameters. Predicted accent curve shapes can thus be considered as time-warped versions of a common template. It is stipulated that pitch accents of the same class sound the same because they are aligned in the same way with the segmental structure of the accent group they are associated with. Conversely, two accent curves are phonologically distinct if they cannot be generated from the same template using the same alignment parameter matrix.
This model is used in the Bell Labs TTS system for English, French, German, Italian, Spanish, Russian, Romanian, and Japanese (Van Santen et al., 1998; Venditti et al., 1998).

3.3. Perceptual models of intonation
The starting point of the best-known perceptual model of intonation, the model developed at IPO (Institute of Perception Research, Eindhoven), was the observation that certain F0 movements are perceptually relevant whereas others are not. Intonation analysis according to the IPO method (t'Hart et al., 1990) consists of three steps. First, the perceptually relevant movements are stylised by straight lines. The procedure results in a sequence of straight lines, a close copy contour, that is perceptually indistinguishable from the original intonation contour: the two contours are perceptually equivalent. The motivation for stylising the original intonation is that the enormous variability of raw F0 curves presents a serious obstacle for finding regularities.
In a second step, common features of the close copy contours, expressed in terms of duration and range of the F0 movements, are standardised and collected as an inventory of discrete, phonetically defined types of F0 rises and falls. These movements are categorised according to whether or not they are accent lending; for example, in both Dutch and German, F0 rises occurring early in a syllable cause this syllable to be perceived as stressed, while rises of the same duration and range, but late in the syllable, are not accent lending. Similarly, late falls produce perceived syllabic stress but early ones do not. The notion of accent lending adds a functional aspect to the otherwise purely melodic character of the model.
In the third and final step, a grammar of possible and permissible combinations of F0 movements is written. The grammar describes both the grouping of pitch movements into longer-range contours and the sequencing of contours across prosodic phrase boundaries. The contours must comply with two criteria: they are required to be perceptually similar to, and as acceptable as, naturally produced contours. Thus, the complete model describes the melodic possibilities of a language.
The IPO model was originally developed for Dutch, but it was later also applied to English (de Pijper, 1983), German (Adriaens, 1991), and Russian (Ode, 1989). It has been implemented in speech synthesis systems for Dutch (Terken, 1993; Van Heuven and Pols, 1993), English (Willems et al., 1988), and German (Van Hemert et al., 1987).
Stylisation in the IPO model is performed by a human experimenter. The method can therefore produce inconsistent results when the same original contour is stylised more than once, either by the same or by different researchers, which may yield different parameter values. It is claimed, however, that in practice any inconsistencies are below perceptual thresholds (Adriaens, 1991, p. 38) and thus negligible.
Methods for automatic stylisation of intonation contours on perceptual grounds have been proposed by Mertens and d'Alessandro (Mertens, 1987; D'Alessandro and Mertens, 1995). The authors base their approach on two assumptions. First, perception studies have provided evidence that F0 contours should always be interpreted along with co-occurring segmental and prosodic properties of the speech signal, not in isolation, a point also emphasised by Kohler (1991). Second, it is hypothesised that the syllable may be the appropriate domain for the perception of intonation, and that perceived pitch contours within a syllable can be further decomposed into elementary contours, viz. tonal segments. During the stylisation process the tonal segments that make up a syllabic pitch contour are determined by applying thresholds on the slopes of rising and falling F0 curves. Finally, a pitch target is assigned to each tonal segment. This approach has been applied to Dutch and French, in both automatic speech recognition and speech synthesis tasks (Mertens, 1989; Malfrere et al., 1998).
3.4. A functional model of intonation
The Kiel intonation model (KIM; Kohler, 1991) can be characterised as a generative, functional model of German intonation, based on research on F0 production and perception. Unlike many other intonation models, first, KIM does not ignore microprosodic F0 variations and, second, it integrates syntactic, semantic, pragmatic, and expressive functions (meaning functions). The model applies two types of rules. Input information to the symbolic feature rules is a sequence of segmental symbols annotated for stress as well as pragmatic and semantic features. The rules convert this input into sequences of binary features, such as [late] or [terminal]. Finally, parametric rules generate duration and F0 values and control the alignment of the F0 contour elements with the segmental structure of the target utterance. The parametric rules include rules for the downstepping of accent peaks during the course of the utterance as well as microprosodic rules.
One of the starting points in the development of KIM was the study of accent peak shifts (Kohler, 1987, 1990), which discovered three distinct locations of the F0 peak relative to the segmental structure of the stressed syllable: early peaks signal established facts that leave no room for discussion; medial peaks convey new facts or start a new argument; late peaks put emphasis on a new fact and contrast it to what may already exist in the speaker's (or listener's) mind. Thus, shifting a peak backwards from the early location causes a category switch from given to new information. KIM has been implemented in the German version of the INFOVOX speech synthesis system (Carlson et al., 1990).
3.5. Acoustic stylization models of intonation
The Tilt intonation model (Taylor, 2000) was designed to provide a robust computational analysis and synthesis of intonation contours. The model analyses intonation as a sequence of phonetic intonation events, such as pitch accents and
boundary tones. Whereas in customary terminology pitch accents and boundary tones are assumed to be phonological entities, in the Tilt model they are events that are characterised by continuous acoustic-phonetic parameters, an approach that has been criticised by some authors (e.g., Ladd, 1996) as being paralinguistic. Each of these events consists of a rising and a falling component of varying size; rise or fall may also be absent. The mid point of an event is defined as the end of the rise and the start of the fall.
The events are described by the so-called Tilt parameters: (a) amplitude or F0 excursion of the event; (b) its duration; (c) a dimensionless shape parameter that is computed from the relative sizes of the rise and the fall and ranges between 1 (rise only) and −1 (fall only), a value of 0 indicating that the rising and the falling component are of the same size; (d) F0 position, expressing the distance (in Hz) between a baseline and the mid point of the event; (e) position of the event in the utterance.
The model proposed by Möhler (1998a) implements an F0 parameterisation procedure that is similar to the Tilt model. It uses parameters that express the shape and steepness of intonation events. Further parameters control the alignment of the F0 curve with the syllable and the scaling of the event within the speaker's local pitch range. The perceptual adequacy of the parameterisation was tested and confirmed in a series of perception experiments. The model has been implemented in the German version of the Festival TTS system (Möhler, 1998b). For synthesis or prediction of intonation contours, the model offers an interface to syntactic and semantic analysis by way of interpreting prosodic labels and pitch range in its input.
Both Möhler's and the Tilt parameters are appropriate for the synthesis of F0 contours because, unlike in ToBI-based approaches, no rules for the realisation of F0 from abstract units are required. Both models may be characterised as F0 coding or F0 data reduction, with potential problems concerning their linguistic interpretability. The authors argue, however, that the parameters of their models are meaningful; for instance, the amplitude parameter is related to perceived prominence, F0 position may be used to model downstep or declination, and the timing of the event may have linguistic function (cf. Kohler's early and late peaks).
Automatic analysis and generation of intonation contours is also performed by a collection of tools that were developed in Aix-en-Provence (see Hirst et al., 1994; Veronis and Campione, 1998; Veronis et al., 1998; Di Cristo et al., 2000) in the context of the MULTEXT project (Multilingual Text Tools and Corpora). The toolkit allows the automatic modelling of the F0 curve from the speech signal following the method described in (Hirst et al., 1991). The output of this approach is a sequence of target points, specified in time and frequency, that represents a stylisation of the F0 curve. For F0 synthesis the target points are interpolated by a quadratic spline function. The target points also serve as input to the symbolic coding of intonation according to the INTSINT system (International Transcription System for Intonation; Hirst and Di Cristo, 1998b). Finally, the symbolic coding of intonation is automatically aligned with the segmental annotation of the utterance (Fig. 19).
The INTSINT system provides a narrow phonetic transcription of intonation, which has been shown to be applicable to a number of languages with quite diverse intonation systems. In fact, as Hirst and Di Cristo (1998b) point out, the attempt to design a transcription system that would be equally suitable for both English and French intonation was one of the original motivations for the development of INTSINT. Unlike ToBI, which presupposes that the inventory of F0 patterns of the language in question is already known, INTSINT can be used to explore and analyse the intonation of languages whose tonal systems have not already been described in acoustic detail.
4. Applications of intonation
Since intonation forms such a central part of human speech communication, not only conveying diverse linguistic information, but also information about the speaker, the speaker's mood and attitude, it certainly ought to be useful in many
prosody model within the KTH text-to-speech system. This makes it possible to use the system without a teacher on-line or to use training material that is not in the teacher database. This material could be supplied as text by the teacher, by the student or by a program that is monitoring the student's progress. In the training, model-governed variation in intonation could be introduced through different commands to the synthesiser, such as emphasis, speaking style or even dialect. The visual-auditory display could show the production of the ``teacher'', that of the student, and the combined student production with mapped prosody. No evaluation with real students has been performed yet.
4.3.2. Foreign language training
For persons with normal hearing, auditory feedback is obviously efficient in acquiring the mother tongue. However, we know from experience that correct prosody is notoriously difficult to master in a foreign language. Could visual feedback improve this situation?
Even if the literature is not in full agreement on this point, there is a strong indication that visual displays together with the normal auditory feedback result in more efficient intonation training. Weltens and de Bot (1984) describe a series of experiments that show clear improvement in the audio-visual condition compared to auditory feedback only. Some indication is given that the advantage is stronger for inexperienced learners. A surprising result is that delaying the feedback by 250 ms or until the sentence is completed did not diminish the effect compared to immediate feedback.
In a recent study by Öster (1997), the Speech Viewer was used in an experimental teaching class for adult immigrants learning Swedish. The added computer-supported pronunciation training comprised both prosodic and segmental training. The attitude to this method was very positive and many of the students showed quite striking improvements. Long-term retention of the improvement still needs to be studied.
In the EU Spell project several alternative intonation training strategies were investigated. The problem with acceptable variability was solved by heavily smoothing the student's F0 curve, normalising it to the model utterance and allowing some variation within a so-called ``pitch tunnel'' between the ``pitch anchor points'' of the F0 curve (Rooney et al., 1992). The segmentation that is needed to correctly identify these points in the student utterances is performed with HMM techniques.
4.3.3. Providing feedback for deaf students
For many years, intonation displays have been developed for use in speech and language training. Simple analogue instruments with lights and dials were used to provide feedback for deaf persons in an effort to place F0 within an appropriate range (Risberg, 1976).
An early, classical example of using computers is reported by Nickerson et al. (1976). In this case the training was designed as a game, a precursor of computer games. The F0 of a subject controlled the vertical position of a dot (ball) that travelled from left to right on the screen. A stylised obstacle had to be avoided and a target point (the basket) had to be reached by varying the F0. Many of these early devices never found their way into real-life applications and their pedagogical impact has been questioned.
Around 1985 a PC-based speech training aid was developed at IBM in Paris. It contained a variety of programs aimed at teaching deaf students different aspects of pronunciation. Both game-like programs and more analytic modules were contained in the package, many aimed at intonation. This device has been further developed and commercialised as the IBM Speech Viewer. For deaf and severely hard-of-hearing persons the visual feedback of their intonation is essential in establishing acceptable pronunciation habits. The expectation is that associated tactile and proprioceptive feedback could help in maintaining the skills.
Clearly, better theories, descriptions and models of intonation, and prosody in general, could be profitably exploited in many areas, with reference to interdisciplinary technological, medical, and educational applications. Significant work is being carried on not only in established areas of
intonation applications but also in areas where segmental phonetics has been the main concern.
5. Research perspectives
Before concluding this tutorial, some general considerations are to be made, namely on the meaning as well as the benefits of intonation studies in a wider perspective. Intonation, as well as phonetics and linguistics in general, is par excellence a humanities area and the relevant knowledge concerns the human as a social being. Apart from knowledge as such, linguistic expertise may however have multi-dimensional effects with regard to social relations and quality of life. Speech and language technology, for example, requires integrated linguistic knowledge of both theoretical and application aspects, which has given a boost to linguistic research in recent years.
There is a circular relation between linguistic knowledge and linguistic applications: there is far more knowledge than diverse applications may adequately respond to and, at the same time, applications require more and more thorough linguistic knowledge. Furthermore, the repertoire of linguistic applications grows steadily bigger, which puts new demands on linguistic knowledge. In addition, applications may have empirical implications for existing hypotheses and theoretical aspects as well as lead to new questions, methodological procedures and theory development.
As outlined in previous sections, the main applications of intonation are related to technology, medicine and education. Technological applications, and especially speech technology, constitute the most widespread among the applications of intonation. However, in recent years, speech technology has found direct application in language education as well as language pathology within the broad areas of educational technology and medical technology respectively. Thus the closer link between intonation research and technological applications opens new perspectives to technological, educational and medical aspects of interdisciplinary research in the future years.
Intonation research is steadily widening and new issues as well as investigation areas are being developed. Well-established research paradigms in controlled speech environments are standard sources of new knowledge production, and spontaneous speech analyses as well as discourse studies are well on the way, with promising expectations in the immediate future. Apart from basic knowledge about aspects of discourse intonation such as accentuation, boundary signalling and topic changes, much is to be learned on the nature and meaning of intonation in spontaneous discourse. A basic taxonomy of tonal categories and tonal units as well as distinctive functions with regard to the organisation of spoken discourse are prominent issues.
More fundamental questions about the nature of intonation, such as the contribution of intonation to the speech communication process as well as the relation of intonation to information units, ought also to be considered. Intonation is part of the linguistic system and thus a natural question is its degree of contribution. In other words, what would be the effects of intonation neutralisation and lack of tonal distinctions in speech communication? And, subsequently, what would be the compensation from other linguistic components? On the other hand, language, and thus intonation, is the means to a communicative end, which is the intended message in, presumably, an organised structure of information units. What is the relation between abstract messages and concrete language forms? How can we study the way messages and thus higher order thoughts are organised and realised through language? This type of question may be related to invariance and variability issues, in the sense that linguistic forms may have several degrees of variability, in accordance with the levels of abstraction up to and including message and information structure units in speech communication.
In summary, the way the study of intonation is carried out encompasses a wide range of methodological approaches and theoretical backgrounds. New horizons are continuously opening and more and more disciplines are involved, which are expected to give a further boost to intonation studies in the upcoming years.
Acknowledgements
For comments and much useful feedback our thanks go to Robert Bannert, Gösta Bruce, Albert Di Cristo, Gunnar Fant, Nina Grønnum, Carlos Gussenhoven, Alex Monaghan, Mark Tatham, Jean Veronis and Yi Xu.
References
Adriaens, L.M.H., 1991. Ein Modell Deutscher Intonation. Ph.D. Dissertation, Technical University Eindhoven.
Atal, B.S., 1972. Automatic speaker recognition based on pitch contours. J. Acoust. Soc. Amer. 52, 1687–1697.
Atkinson, J.A., 1978. Correlation analysis of the physiological factors controlling fundamental voice frequency. J. Acoust. Soc. Amer. 63, 211–222.
Bannert, R., 1985. Towards a model for German prosody. Folia Linguistica XIX, 321–341.
Bannert, R., 1990. På Väg mot Svenskt Uttal. Studentlitteratur, Lund.
Bannert, R., Thorsen, N., 1988. Empirische Studien zur Intonation des Deutschen und Dänischen: Ähnlichkeiten und Unterschiede. Kopenhagener Beiträge zur Germanistischen Linguistik 24, 26–50.
Beckman, M.E., Ayers, M., 1997. Guidelines for ToBI labelling (3rd version). Department of Linguistics, The Ohio State University (URL:http://www.ling.ohio-state.edu/phonetics/E_ToBI).
Beckman, M.E., Pierrehumbert, J.B., 1986. Intonation structure in Japanese and English. In: Phonology Yearbook, Vol. 3, pp. 255–309.
Bell, A.G., 1879. Vowel theories. American Journal of Otology 1, 163–180. Reprinted in: Bell, A.G. (Ed.), 1916. The Mechanisms of Speech. 8th ed. Funk and Wagnalls, New York, pp. 117–129.
Berinstein, A.E., 1979. A cross linguistic study of perception and production of stress. UCLA Working Papers in Phonetics, pp. 1–59.
Bertenstam, J., Granström, B., Gustafson, K., Hunnicutt, S., Karlsson, I., Meurlinger, C., Nord, L., Rosengren, E., 1997. The VAESS communicator: a portable communication aid with new voice types and emotions. Phonum 4, Department of Phonetics, Umeå University, Sweden, pp. 57–60.
Bloomfield, L., 1933. Language. Holt, Rinehart and Winston, New York.
Bolinger, D.L., 1958. A theory of pitch accent in English. Word 14, 109–149.
Botinis, A., 1989. Stress and Prosodic Structure in Greek. Lund University Press, Lund.
Botinis, A., 1992. Accentual distribution in Greek discourse. Travaux de l'Institut de Phonetique d'Aix 14, 13–52.
Botinis, A., 1998. Intonation in Greek. In: Hirst, D., Di Cristo, A. (Eds.), Intonation Systems: A Survey of Twenty Languages. Cambridge University Press, Cambridge, pp. 288–310.
Botinis, A., Bannert, R., 1997. Tonal perception of focus in Greek and Swedish. In: Proceedings of the ESCA Workshop on Intonation. Athens, Greece, pp. 47–50.
Botinis, A., Erkenborn, S., Isacsson, C., Westin, P., 1999. Prosodic variability and segmental durations in Greek and Swedish. In: Proceedings of the Swedish Phonetics Conference Fonetik 99. Gothenburg, Sweden, pp. 41–44.
Brazil, D., Coulthard, M., Johns, C., 1980. Discourse Intonation and Language Teaching. Longman, London.
Brazil, D., Hewings, M., Cauldwell, R., 1997. The Communicative Value of Intonation in English. Cambridge University Press, Cambridge.
Bresnan, J., 1971. Sentence stress and syntactic transformations. Language 47, 257–280.
Brown, G., Yule, G., 1983. Discourse Analysis. Cambridge University Press, Cambridge.
Brown, G., Currie, K., Kenworthy, J., 1980. Questions of Intonation. Croom Helm, London.
Bruce, G., 1977. Swedish Word Accents in Sentence Perspective. Gleerup, Lund.
Bruce, G., 1998. Allmän och Svensk Prosodi. In: Practical Linguistics, Vol. 16. Department of Linguistics, Lund University.
Bruce, G., Gårding, E., 1978. A prosodic typology for Swedish dialects. In: Gårding, E., Bruce, G., Bannert, R. (Eds.), Nordic Prosody. Gleerup, Lund, pp. 219–228.
Bruce, G., Granström, B., Gustafson, K., House, D., 1993. Prosodic phrasing in Swedish. In: Proceedings of the ESCA Workshop on Prosody, Lund, Sweden, pp. 180–183.
Carlson, R., Granström, B., Hunnicutt, S., 1990. Multi-language text-to-speech development and applications. In: Ainsworth (Ed.), Advances in Speech, Hearing, and Language Processing. JAI Press, London, pp. 269–296.
Chomsky, N., 1972. Deep structure, surface structure and semantic interpretation. In: Chomsky, N. (Ed.), Studies on Semantics in Generative Grammar. Mouton, The Hague, pp. 62–119.
Collier, R., 1975. Physiological correlates of intonation patterns. J. Acoust. Soc. Amer. 58, 249–255.
Cooper, W.E., Sorensen, J.M., 1981. Fundamental Frequency in Sentence Production. Springer, New York.
Cruttenden, A., 1997. Intonation, 2nd ed. Cambridge University Press, Cambridge.
Cutler, A., Ladd, D.R. (Eds.), 1983. Prosody: Models and Measurements. Springer, Heidelberg.
D'Alessandro, C., Mertens, P., 1995. Automatic pitch contour stylisation using a model of tonal perception. Comput. Speech Language 9, 257–288.
Danes, F., 1960. Sentence intonation from a functional point of view. Word 16, 34–54.
de Pijper, J.R., 1983. Modelling British English Intonation. Foris, Dordrecht.
Di Cristo, A., 1998. Intonation in French. In: Hirst, D., Di Cristo, A. (Eds.), Intonation Systems: A Survey of Twenty Languages. Cambridge University Press, Cambridge, pp. 195–218.
Di Cristo, A., Hirst, D., 1986. Modelling French micromelody: analysis and synthesis. Phonetica 43, 11–30.
Di Cristo, A., Di Cristo, Ph., Campione, E., Veronis, J., 2000. A prosodic model for text-to-speech synthesis in French. In: Botinis, A. (Ed.), Intonation: Analysis, Modelling and Technology. Kluwer Academic Publishers, Dordrecht (in press).
Fant, G., Kruckenberg, A., Liljencrants, J., 2000. Acoustic-phonetic prominence in Swedish. In: Botinis, A. (Ed.), Intonation: Analysis, Modelling and Technology. Kluwer Academic Publishers, Dordrecht (in press).
Fischer-Jørgensen, E., 1990. Intrinsic F0 in tense and lax vowels with specific reference to German. Phonetica 47, 99–140.
Fourakis, M., Botinis, A., Katsaiti, M., 1999. Acoustic characteristics of Greek vowels. Phonetica 56, 28–43.
Fry, D.B., 1958. Experiments in the perception of stress. Language and Speech 1, 126–152.
Fujisaki, H., 1983. Dynamic characteristics of voice fundamental frequency in speech and singing. In: MacNeilage, P.F. (Ed.), The Production of Speech. Springer, New York, pp. 39–55.
Fujisaki, H., 1988. A note on the physiological and physical basis for the phrase and accent components in the voice fundamental frequency contour. In: Fujimura, O. (Ed.), Vocal Physiology: Voice Production, Mechanisms and Functions. Raven, New York, pp. 347–355.
Garde, P., 1968. L'Accent. Presses Universitaires, Paris.
Gårding, E., 1977a. The Scandinavian Word Accents. Gleerup, Lund.
Gårding, E., 1977b. The importance of turning points for the pitch patterns of Swedish accents. In: Hyman, L.M. (Ed.), Studies in Stress and Accent. Southern California Occasional Papers in Linguistics 4, Los Angeles, pp. 27–35.
Gårding, E., 1983. A generative model of intonation. In: Cutler, A., Ladd, D.R. (Eds.), Prosody: Models and Measurements. Springer, Berlin, pp. 11–25.
Gårding, E., Botinis, A., Touati, P., 1982. A comparative study of Swedish, Greek and French intonation. Working Papers 22, Department of Linguistics & Phonetics, Lund University, pp. 137–153.
Gimson, A.C., 1962. An Introduction to the Pronunciation of English. Arnold, London.
Goldsmith, J.A., 1976a. Autosegmental Phonology. Ph.D. Dissertation, MIT (distributed by IULC and published 1979 by Garland Press, New York).
Goldsmith, J.A., 1976b. An overview of autosegmental phonology. Linguistic Analysis 2, 23–68.
Goldsmith, J.A., 1990. Autosegmental and Metrical Phonology. Basil Blackwell, Oxford.
Grønnum (Thorsen), N., 1992. The Groundworks of Danish Intonation: An Introduction. Museum Tusculanum Press, Copenhagen.
Grønnum (Thorsen), N., 1995. Superposition and subordination in intonation – a non-linear approach. In: Proceedings of the 13th International Congress – Phon. Sc. Stockholm, pp. 124–131.
Grønnum (Thorsen), N., 1998. Intonation in Danish. In: Hirst, D., Di Cristo, A. (Eds.), Intonation Systems: A Survey of Twenty Languages. Cambridge University Press, Cambridge, pp. 131–151.
Gussenhoven, C., 1984. On the Grammar and Semantics of Sentence Accents. Foris, Dordrecht.
Hadding-Koch, K., 1961. Acoustico-phonetic Studies in the Intonation of Southern Swedish. Gleerup, Lund.
Hadding-Koch, K., Studdert-Kennedy, M., 1964. An experimental study of some intonation contours. Phonetica 1, 175–185.
Halliday, M.A.K., 1967. Notes on transitivity and theme in English. J. Linguist. 3, 199–244.
Hamon, C., Moulines, E., Charpentier, F., 1989. A diphone system based on time-domain prosodic modifications of speech. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 89, pp. 238–241.
Helmholtz, H., 1877. On the Sensations of Tone (translated by Ellis, A.J., 1885, Dover, New York).
Hirschberg, J., Pierrehumbert, J., 1986. The intonational structuring of discourse. In: Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics. New York, pp. 136–144.
Hirst, D., 1993. Detaching intonation phrases from syntactic structure. Linguist. Inquiry 24, 781–788.
Hirst, D., Di Cristo, A. (Eds.), 1998a. Intonation Systems: A Survey of Twenty Languages. Cambridge University Press, Cambridge.
Hirst, D., Di Cristo, A., 1998b. A survey of intonation systems. In: Hirst, D., Di Cristo, A. (Eds.), Intonation Systems: A Survey of Twenty Languages. Cambridge University Press, Cambridge, pp. 1–44.
Hirst, D.J., Ide, N., Veronis, J., 1994. Coding fundamental frequency patterns for multi-lingual synthesis with INTSINT in the MULTEXT project. In: Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis. New Paltz, NY, pp. 77–80.
Hirst, D.J., Nicolas, P., Espesser, R., 1991. Coding the F0 of a continuous text in French: an experimental approach. In: Proceedings of the 12th International Congress – Phon. Sc. Aix-en-Provence. France, pp. 234–237.
Hockett, C.F., 1955. A Manual of Phonology. Waverley Press, Baltimore.
Hombert, J.M., 1978. Consonant types, vowel quality, and tone. In: Fromkin, V.A. (Ed.), Tone: A Linguistic Survey. Academic Press, New York, pp. 77–111.
House, D., 1990. Tonal Perception in Speech. Lund University Press, Lund.
Hyman, L.M., 1977. On the nature of linguistic stress. In: Hyman, L.M. (Ed.), Studies in Stress and Accent. Southern California Occasional Papers in Linguistics 4, Los Angeles, pp. 37–82.
Jackendoff, R., 1972. Semantic Interpretation in Generative Grammar. MIT Press, Cambridge, MA.
Jones, D., 1956. Outline of English Phonetics, 8th ed. Heffer, Cambridge.
Jun, S.-A., Fougeron, C., 2000. A phonological model of French intonation. In: Botinis, A. (Ed.), Intonation: Analysis, Modelling and Technology. Kluwer Academic Publishers, Dordrecht (in press).
Koenig, W., Dunn, H.K., Lacy, L.Y., 1946. The sound spectrograph. J. Acoust. Soc. Amer. 18, 19–49.
Kohler, K.J., 1987. The linguistic functions of F0 peaks. In: Proceedings of the 11th International Congress – Phon. Sc. Tallinn, pp. 149–152.
Kohler, K.J., 1990. Macro and micro F0 in the synthesis of intonation. In: Kingston, J., Beckman, M.E. (Eds.), Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech. Cambridge University Press, Cambridge, pp. 115–138.
Kohler, K.J., 1991. Prosody in speech synthesis: the interplay between basic research and TTS application. J. Phonetics 19, 121–138.
Ladd, D.R., 1983. Phonological features of intonational peaks. Language 59, 721–759.
Ladd, D.R., 1988. Declination `reset' and the hierarchical organization of utterances. J. Acoust. Soc. Amer. 84, 530–544.
Ladd, D.R., 1996. Intonational Phonology. Cambridge University Press, Cambridge.
Ladefoged, P., 1967. Three Areas of Experimental Phonetics. Oxford University Press, London.
Ladefoged, P., McKinney, N.P., 1963. Loudness, sound pressure and sub-glottal pressure in speech. J. Acoust. Soc. Amer. 35, 454–460.
Lambrecht, K., 1996. Information Structure and Sentence Form. Cambridge University Press, Cambridge.
Lea, W.A., 1980. Prosodic aids to speech recognition. In: Lea, W.A. (Ed.), Trends in Speech Recognition. Prentice-Hall, Englewood Cliffs, NJ, pp. 166–205.
Leben, W., 1976. The tones in English intonation. Linguist. Anal. 2, 69–107.
Lehiste, I., 1970. Suprasegmentals. MIT Press, Cambridge, MA.
Liberman, M.Y., Pierrehumbert, J., 1984. Intonational invariants under changes in pitch range and length. In: Aronoff, M., Oehrle, R.T. (Eds.), Language Sound Structure. MIT Press, Cambridge, MA, pp. 157–233.
Liberman, M.Y., Prince, A., 1977. On stress and linguistic rhythm. Linguistic Inquiry 8, 249–336.
Lieberman, Ph., 1967. Intonation, Perception, and Language. MIT Press, Cambridge, MA.
Lieske, C., Bos, J., Gambäck, B., Emele, M., Rupp, C.J., 1997. Giving prosody a meaning. In: Proceedings of the European Conference on Speech Communication and Technology Eurospeech 97. Rhodes, Greece, pp. 1431–1434.
Malfrere, F., Dutoit, T., Mertens, P., 1998. Automatic prosody generation using suprasegmental unit selection. In: Proceedings of the Third ESCA Workshop on Speech Synthesis. Jenolan Caves, Australia, pp. 323–328.
Malmberg, B., 1967. Structural Linguistics and Human Communication. Springer, New York.
Martinet, A., 1954. Accent et tons. Miscellanea Phonetica 2, 13–24.
Matsui, T., Furui, S., 1990. Text-independent speaker recognition using vocal tract and pitch information. In: Proceedings of the International Conference on Spoken Language Processing 90. Kobe, Japan, pp. 137–140.
Meron, Y., Hirose, K., 1996. Language training system utilising speech modification. In: Proceedings of the International Conference on Spoken Language Processing 96. Philadelphia, PA, pp. 1449–1452.
Mertens, P., 1987. L'intonation du Français: de la description linguistique à la reconnaissance automatique. PhD Dissertation, Katholieke Universiteit Leuven, Leuven.
Mertens, P., 1989. Automatic recognition of intonation in French and Dutch. In: Proceedings of the European Conference on Speech Communication and Technology Eurospeech 89. Paris, pp. 46–49.
Meyer, E.A., 1937. Die Intonation im Schwedischen I: Die Sveamundarten. Studies Scand. Philol. 10, University of Stockholm.
Meyer, E.A., 1954. Die Intonation im Schwedischen II: Die norrländischen Mundarten. Studies Scand. Philol. 11, University of Stockholm.
Möbius, B., 1993. Ein quantitatives Modell der deutschen Intonation: Analyse und Synthese von Grundfrequenzverläufen. Niemeyer, Tübingen.
Möbius, B., 1995. Components of a quantitative model of German intonation. In: Proceedings of the 13th International Congress – Phon. Sc. Stockholm. Sweden, pp. 108–115.
Möhler, G., 1998a. Theoriebasierte Modellierung der deutschen Intonation für die Sprachsynthese. University of Stuttgart, Stuttgart.
Möhler, G., 1998b. IMS Festival (http://www.ims.uni-stuttgart.de/phonetik/synthesis/index.html).
Monaghan, A.I.C., 1993. What determines accentuation? J. Pragmat. 19, 559–584.
Monaghan, A.I.C., 1998. State-of-the-art summary of European synthetic prosody R&D (http://www.compapp.dcu.ie/alex/PUB/soap.html).
Monaghan, A.I.C., Ladd, D.R., 1990. Symbolic output as the basis for evaluating intonation in text-to-speech systems. Speech Communication 9, 305–314.
Morlec, Y., Bailly, G., Auberge, V., 2001. Generating prosodic attitudes in French: Data, model and evaluation. Speech Communication 33 (4), 357–371.
Morton, K., Tatham, M., 1995. Pragmatic effects in speech synthesis. In: Proceedings of the European Conference on Speech Communication and Technology Eurospeech 95. Madrid, Spain, pp. 1819–1822.
Nespor, M., Vogel, I., 1986. Prosodic Phonology. Foris, Dordrecht.
Nickerson, R.S., Kalikow, D.N., Stevens, K.N., 1976. Computer-aided speech training for the deaf. J. Speech Hearing Disorders 41, 120–132.
A. Botinis et al. / Speech Communication 33 (2001) 263±296 295
Niemann, H., N oth, E., Kiebling, A., Kompe, R., Batliner, A., Searle, J.R., 1976. A classi®cation of illocutionary acts.
1997. Prosodic Processing and its use in Verbmobil. In: Language in Society 5, 1±23.
Proceedings of the IEEE International Conference on Searle, J.R., 1979. Expression and Meaning. Cambridge Uni-
Acoustics, Speech, and Signal Processing 97. M unchen, versity Press, Cambridge.
Germany, pp. 75±789. Silkerk, E.O., 1984. Phonology and Syntax: The Relation
Nilsonne, A:, 1987. Speech in depression: a methodological between Sound and Structure. MIT Press, Cambridge, MA.
study of prosody. Ph.D. Dissertation, Karolinska Institute, Silverman, K., 1987. The Structure and Processing of Funda-
Stockholm. mental Frequency Contours. Ph.D. Dissertation, Univer-
Nord, N., Hammarberg, B., Lundstr om, E., 1995. Laryngecto- sity of Cambridge, Cambridge.
mee speech in noise ± voice eort, speech rate and Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M.,
intelligibility. Scand. J. Logopedics Phoniatrics 20, 107± Wightman, C., Price, P., Pierrehumbert, J., Hirschberg,
112. J., 1992. ToBI: A standard for labelling English prosody.
Ode, C., 1989. Russian Intonation: A Perceptual Description. In: Proceedings of the International Conference on
Rodopi, Amsterdam. Spoken Language Processing 92. Ban, Alberta, pp. 867±
Ohala, J.J., 1978. Production of tone. In: Fromkin, V.A. (Ed.), 870.
Tone: A Linguistic Survey. Academic Press, New York, Stockwell, R.P., 1972. The role of intonation: reconsiderations
pp. 5±39. and other considerations. In: Bolinger, D.L. (Ed.), Into-
Ohman, S.E.G., 1967. Word and sentence intonation: a nation. Penguin Books, Harmondsworth, pp. 87±109.
quantitative model. STL-QPSR 2±3, 20±54. Sundstr om, A., 1997. Speech technology in language learning.
Ohman, S.E.G., Lindqvist, J., 1966. Analysis-by-synthesis of M.Sc. thesis, KTH, Stockholm (in Swedish).
prosodic pitch contours. STL-QPSR 4, 1±6. Tams, A., Tatham, M., 1995. Describing speech styles using
Oster, A.-M., 1997. Auditory and visual feedback in spoken L2 prosody. In: Proceedings of the European Conference on
teaching. Phonum 4, Department of Phonetics, Ume a Speech Communication and Technology Eurospeech 95.
University, Sweden, pp. 145±148. Madrid, Spain, pp. 2081±2084.
Parris, E., Carey, M. 1996. Language independent gender Tatham, M., Morton, K., 1995. Speech synthesis in dialogue
identi®cation. In: Proceedings of the IEEE International systems. In: Dalsgaard, P. (Ed.), Spoken Dialogue Systems.
Conference on Acoustics, Speech, and Signal Processing (Visgo, Denmark, ESCA, 1995), pp. 221±225.
96. Atlanta GE, pp. 685±688. Tatham, M., Morton, K., 2000. Speech prosodics for synthe-
Pierrehumbert, J.B., 1980. The Phonology and Phonetics of sis - perspectives. In: Proceedings of the Swedish
English Intonation. PhD Dissertation, MIT, MA (Pub- Phonetics Conference Fonetik 2000. Sk ovde, Sweden,
lished in 1988 by IULC). pp. 133±136.
Pierrehumbert, J.B., Beckman, M.E., 1988. Japanese Tone Taylor, P.A., 1994. A Phonetic Model of Intonation in English.
Structure. MIT Press, Cambridge, MA. Indiana University Linguistics Club, Bloomington.
Pike, K.L., 1945. The Intonation of American English. Taylor, P.A., 2000. Analysis and synthesis of intonation using
University of Michigan Press, Ann Arbor. the Tilt model. J. Acoust. Soc. Amer. 107, 1697±1714.
Potter, R.K., 1945. Visible patterns of science. Science 102, Terken, J., 1993. Synthesizing natural ± sounding intonation for
463±470. Dutch: rules and perceptual evaluation. Comput Speech
Potter, R.K., Kopp, G.A., Green, H.C., 1947. Visible Speech. Language 7, 27±48.
Van Nostrand, New York. t'Hart, J., 1998. Intonation in Dutch. In: Hirst, D., Di Cristo,
Risberg, A., 1976. Visual aids for speech correction. Amer. A. (Eds.), Intonation Systems: A Survey of Twenty
Ann. Deaf, 178±194. Languages. Cambridge University Press, Cambridge,
Rooney, E., Hiller, S., Laver, J., Jack, M., 1992. Prosodic pp. 96±111.
features for automated pronunciation improvement in the t'Hart, J., Collier, R., 1975. Integrating dierent levels of
SPELL system. In: Proceedings of the International intonation analysis. J. Phonetics 3, 235±255.
Conference on Spoken Language Processing 92. Ban, t'Hart, J., Collier, R., Cohen, A., 1990. A perceptual Study of
Alberta, pp. 413±416. Intonation. Cambridge University Press, Cambridge.
Rossi, M., 1978. Interaction of intensity glides and frequency Thorsen (Grùnnum), N., 1978. An acoustical analysis of
glissandos. Language and Speech 21, 284±396. Danish intonation. J. Phonetics 6, 151±175.
Rossi, M., 1985. L'intonation et l'organisation de l' enonce. Thorsen (Grùnnum), N., 1982. On the variability of F0
Phonetica 42, 135±153. patterning and the function of F0 timing in languages
Rossi, M., 1999. L'Intonation, le Systeme du Francais: where pitch cues stress. Phonetica 39, 302±316.
Description et Modelisation. Editions Ophrys, Paris. Thyme-Gobbel, A., Hutchins, S., 1996. On using prosodic cues
Rossi, M., 2000. Intonation: past, present, future. In: Botinis, in automatic language identi®cation. In: Proceedings of the
A. (Ed.), Intonation: Analysis, Modelling and Technology. International Conference on Spoken Language Processing
Kluwer Academic Publishers, Dordrecht (in press). 96, Philadelphia, PA, pp. 1768±1771.
Searle, J.R., 1969. Speech Acts. Cambridge University Press, Trager, G.L., Smith, H.L., 1951. An Outline of English
Cambridge. Structure. Battenburg Press, Norman, Oklahoma.