Representing Intonational Variation: Julia Hirschberg CS 4706

Representing Intonational Variation
Julia Hirschberg CS 4706
4/30/2012
Today
How can we represent meaningful speech variation so that we can assign it in TTS?
Expanded vs. compressed pitch range?
Louder vs. softer speech?
Faster vs. slower speech?
Differences in intonational prominence?
Differences in intonational phrasing?
Differences in pitch contours?
Schemes for Representing Intonational Variation

An early proposal: Joshua Steele
Language Learning Approaches
/ IS it INteresting / / d'you feel ANGry? / / WHAT'S the PROBlem? / (McCarthy, 1991:106)
What aspects of speech to capture?
Continuous or categorical?
If categorical, what are the possible classes?
Many Models
Auditory: Language teachers: what representations can learners understand?
Acoustic: Examine the speech signal for critical vs. accidental variation
Experimental approaches:
  Identify potential meaningful variation
  Design production or perception studies to test
  E.g. what does a contour mean?
F0 Models
Basic division into linear and superpositional models
Linear models: sequence of independent choices from an intonation lexicon
Superpositional models: hierarchy of phonological components (utterance to prosodic phrase)
Intonation Models
Linear or tone sequence models
  British school (Kingdon 58, O'Connor & Arnold 73, Cruttenden 97): based on auditory analysis
  American school (Pierrehumbert 80, ToBI): mainly acoustic analysis
  Dutch school ('t Hart, Collier and Cohen 1990): perceptual data
Superpositional models (Fujisaki 1983, Möbius et al. 1993): acoustic/physiological
Superpositional models
Pitch pattern of intonation modeled with two components: phrase component and accent component. Phrase has basic shape, and pitch movements for individual accents are superimposed over basic shape:
[Figure: phrase component plus accent component = resulting pitch contour]
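The superposition idea can be sketched numerically. The example below is only an illustration under stated assumptions: the decaying baseline, the Gaussian accent bumps, and every parameter value are hypothetical choices made for this sketch, and they are not the equations of the Fujisaki model or of any other published model.

```python
import math

def phrase_component(t, start_hz=140.0, decay_per_s=0.15):
    """Hypothetical phrase curve: a slowly falling baseline in log-F0."""
    return math.log(start_hz) - decay_per_s * t

def accent_component(t, accents):
    """Sum of Gaussian bumps, one per (center_s, height, width_s) accent."""
    return sum(h * math.exp(-((t - c) ** 2) / (2 * w ** 2)) for c, h, w in accents)

def f0_contour(times, accents):
    """Add the two components in the log domain and convert back to Hz."""
    return [math.exp(phrase_component(t) + accent_component(t, accents)) for t in times]

times = [i * 0.1 for i in range(21)]               # 0.0 to 2.0 seconds
accents = [(0.4, 0.25, 0.08), (1.5, 0.20, 0.08)]   # made-up accent placements
contour = f0_contour(times, accents)               # phrase plus accents
baseline = f0_contour(times, [])                   # phrase component alone
```

Comparing `contour` with `baseline` shows the accents as local excursions riding on the falling phrase curve.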
Apples, oranges and tomatoes
Good for modeling utterance-level trends

Declination: downtrend in f0 over the course of an utterance
Best seen as statistical abstraction: if one takes f0 measurements from enough utterances, over time, a downtrend in f0 will emerge
Lily and Rosa thought this was divine. Prince William was gorgeous and he was looking for a bride. They dreamed of wedding bells.
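One way to read "statistical abstraction" is as a trend line fitted over pooled measurements. The sketch below assumes an ordinary least-squares line as the estimator, which is a choice made here for illustration; the f0 values and the three utterances are invented.

```python
def fit_line(points):
    """Ordinary least-squares fit of f0 = slope * t + intercept."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_f = sum(f for _, f in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in points)
    slope = sxy / sxx
    return slope, mean_f - slope * mean_t

# Made-up (time in s, f0 in Hz) samples from three hypothetical utterances.
utterances = [
    [(0.0, 210), (0.5, 230), (1.0, 195), (1.5, 205), (2.0, 180)],
    [(0.0, 200), (0.5, 190), (1.0, 215), (1.5, 185), (2.0, 175)],
    [(0.0, 220), (0.5, 205), (1.0, 200), (1.5, 210), (2.0, 185)],
]
pooled = [p for utt in utterances for p in utt]
slope, intercept = fit_line(pooled)   # slope is in Hz per second
```

Each invented utterance contains local rises, yet the pooled slope is negative, which is the sense in which the downtrend emerges only across many measurements.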
Superpositional models
Advantages
  Good at modeling declination in intonation languages
  Successful in speech synthesis for languages like Japanese (e.g. little variation in accent type)
  Capture prosodic structure in languages which have both tone and intonation (e.g. Mandarin)
Disadvantages
  Too rigid: all contours must be modeled with an accent and a phrase component
  Many SAE contours cannot be captured easily
  No account of different accent types, or variations in phrase endings
  No notation system which allows users to share observations from large speech corpora or to compare contours
  Used primarily for synthesis
Tone sequence models

Claim: Intonation is generated from sequences of (possibly) categorically different and phonologically distinctive tones
Types of Tone-sequence Models

Type 1: based on pitch movements
  The British School
  The Dutch School
Type 2: based on pitch levels
  The American School
[Figure: the word "target" illustrated under each type; the Type 2 example is marked with H and L levels]
The British School

Tone sequence model and pitch movement analysis (e.g. falling vs. rising intonation)
Basic unit of intonational description: intonation phrase (tone unit)
  Delimited by pauses, phrase-final lengthening, pitch movement
Syllables within a tone unit can be stressed or accented (example word: telephone)
Accented syllables are stressed and pitch prominent
An example
There's a point where you have to clean it and I think it's horrible...
[Figure: pitch track of "...a POINT where you have to CLEAN it and I think it's HOrrible"]
Intonation Phrases
Internal structure
Determined by location of accents in an IP

Each accent defines the beginning of a prosodic constituent
Intonation phrase structure
[Figure: the phrase "But JOHN's never BEEN to Jamaica" annotated with the labels prehead, head, prenuclear accent unit, nuclear accent unit, nucleus, and stressed syllable]
Six nuclear choices in English
falling
rising
rising-falling
falling-rising
rising-falling-rising
level
[Figure: each nuclear tone drawn on the word "Jamaica"]
The American School

American school-type models make a distinction between accents (what makes a particular word prominent) and boundary tones (how a phrase ends)
Autosegmental-metrical or two-tone models
Only two tones, which may be combined
  H = high target
  L = low target
Pierrehumbert 1980
Contours = pitch accents, phrase accents, boundary tones
Pitch Accents*: H*, L*, L*+H, L+H*, H*+L, H+L*
Phrase Accents*: L-, H-
Boundary Tone: L%, H%
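The inventory is small enough to enumerate. As a minimal sketch, assuming a contour with exactly one pitch accent followed by one phrase accent and one boundary tone (a simplification chosen here, since the slide allows repetition), the combinations can be listed as follows. The variable names are illustrative only.

```python
from itertools import product

PITCH_ACCENTS = ["H*", "L*", "L*+H", "L+H*", "H*+L", "H+L*"]
PHRASE_ACCENTS = ["L-", "H-"]
BOUNDARY_TONES = ["L%", "H%"]

# Every phrase accent can combine with every boundary tone.
endings = [pa + bt for pa, bt in product(PHRASE_ACCENTS, BOUNDARY_TONES)]

# Simplifying assumption: one pitch accent per contour.
single_accent_contours = [f"{acc} {end}" for acc, end in product(PITCH_ACCENTS, endings)]
```

Here `endings` holds the four phrase endings L-L%, L-H%, H-L%, and H-H%, and six pitch accents times four endings give 24 single-accent contours.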
Price, Ostendorf et al
Break indices: degree of juncture between words
0 to 8 (none to a lot)
Example: What I'd like is a nice roast beef sandwich.
To(nes and)B(reak)I(ndices)
Developed by prosody researchers in four meetings over 1991-94
Putting Pierrehumbert 80 and Price, Ostendorf, et al together

Goals:
  devise common labeling scheme for Standard American English that is robust and reliable
  promote collection of large, prosodically labeled, shareable corpora
ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, ...
Minimal ToBI transcription:
  Recording of speech
  F0 contour
  ToBI tiers:
    orthographic tier: words
    break-index tier: degrees of junction (Price et al 89)
    tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert 80)
    miscellaneous tier: disfluencies, non-speech sounds, etc.
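The four tiers map naturally onto a small record type. The sketch below is a hypothetical container, not a format defined by ToBI or by any labeling tool: the class name, the field names, the time-stamped tuples, and the example labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ToBITranscription:
    """Hypothetical container for the four ToBI tiers; times are in seconds."""
    words: List[Tuple[float, str]]           # orthographic tier: (end time, word)
    break_indices: List[Tuple[float, int]]   # break-index tier: (time, index)
    tones: List[Tuple[float, str]]           # tonal tier: (time, label)
    misc: List[Tuple[float, str]] = field(default_factory=list)  # miscellaneous tier

    def check(self) -> bool:
        """Assumed convention: one break index per word, at its right edge."""
        return len(self.words) == len(self.break_indices)

# Invented labels for a three-word fragment.
example = ToBITranscription(
    words=[(0.30, "roast"), (0.62, "beef"), (1.05, "sandwich")],
    break_indices=[(0.30, 1), (0.62, 1), (1.05, 4)],
    tones=[(0.20, "H*"), (0.85, "H*"), (1.05, "L-L%")],
)
```

Keeping the tiers as separate time-stamped lists mirrors the way the tiers are described independently above; the recording and the F0 contour would live alongside this record.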
Sample ToBI Labeling
Online training material available at: http://anita.simmons.edu/~tobi/index.html
Evaluation
  Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. 92, Pitrelli et al. 94)
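Agreement figures of this kind can be computed in several ways. The sketch below assumes a simple pairwise measure (the share of labeler pairs per item that agree); the cited studies may define their metric differently, and the labels shown are invented.

```python
from itertools import combinations

def pairwise_agreement(labelings):
    """Fraction of (labeler pair, item) comparisons on which both labelers agree."""
    agree = total = 0
    for a, b in combinations(labelings, 2):
        for x, y in zip(a, b):
            agree += (x == y)
            total += 1
    return agree / total

# Made-up pitch-accent labels from three hypothetical labelers for five words
# ("-" means no accent was marked).
labelers = [
    ["H*", "-", "L+H*", "-", "H*"],
    ["H*", "-", "H*",   "-", "H*"],
    ["H*", "-", "L+H*", "-", "-"],
]
# Collapse category labels to presence/absence before comparing.
presence = [[lab != "-" for lab in row] for row in labelers]

category_agreement = pairwise_agreement(labelers)
presence_agreement = pairwise_agreement(presence)
```

Agreement on presence/absence is at least as high as agreement on the category label, because two labelers can both mark an accent while disagreeing on its type.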
Pitch Accent/Prominence in ToBI

Which items are made intonationally prominent and how: tonal targets/levels not movement
Accent type:
  H* simple high (declarative)
  L* simple low (yes-no question)
  L*+H scooped, late rise (uncertainty/incredulity)
  L+H* early rise to stress (contrastive focus)
  H+!H* fall onto stress (implied familiarity)
Downstepped accents: !H*, L+!H*, L*+!H
Degree of prominence:
  within a phrase: HiF0 (~nuclear accent)
  across phrases: ??
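The accent labels follow a regular notation: the starred tone aligns with the stressed syllable, an optional second tone leads or trails it, and "!" marks downstep. A minimal sketch of a label parser is shown below. The regular expression and the `parse_accent` helper are hypothetical, and the pattern checks only the shape of a label, so it also accepts strings that are not in the ToBI inventory.

```python
import re

# Optional leading tone, starred tone, optional trailing tone; "!" marks downstep.
ACCENT_RE = re.compile(
    r"^(?:(?P<lead>[HL])\+)?(?P<star>!?[HL])\*(?:\+(?P<trail>!?[HL]))?$"
)

def parse_accent(label):
    """Split a pitch-accent label into its tones (shape check only)."""
    m = ACCENT_RE.match(label)
    if m is None:
        raise ValueError(f"not a pitch-accent label: {label!r}")
    lead, star, trail = m.group("lead", "star", "trail")
    return {
        "starred": star.lstrip("!"),
        "leading": lead,
        "trailing": trail.lstrip("!") if trail else None,
        "downstepped": "!" in label,
    }

parsed = {lab: parse_accent(lab) for lab in ["H*", "L*+H", "L+H*", "H+!H*", "L*+!H"]}
```

Phrase accents and boundary tones such as `H-` or `L%` are rejected, since they carry no starred tone.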
Prosodic Phrasing in ToBI

Levels of phrasing:
  intermediate phrase: one or more pitch accents plus a phrase accent, H- or L-
  intonational phrase: one or more intermediate phrases plus a boundary tone, H% or L%
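The two levels of phrasing amount to a small grammar over tone labels. The sketch below checks a sequence against that grammar under two assumptions made for this example: each tone is a separate token (so an ending is written as `L-`, `L%` and not as `L-L%`), and the pitch-accent set is limited to the labels named in these slides. The function name is hypothetical.

```python
PITCH_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "H+!H*", "!H*", "L+!H*", "L*+!H"}
PHRASE_ACCENTS = {"H-", "L-"}
BOUNDARY_TONES = {"H%", "L%"}

def is_intonational_phrase(tones):
    """True if tones = (pitch accent+ phrase accent)+ boundary tone."""
    if len(tones) < 3 or tones[-1] not in BOUNDARY_TONES:
        return False
    accents_seen = 0      # pitch accents in the current intermediate phrase
    closed = False        # did the last tone close an intermediate phrase?
    for tone in tones[:-1]:
        if tone in PITCH_ACCENTS:
            accents_seen += 1
            closed = False
        elif tone in PHRASE_ACCENTS and accents_seen > 0:
            accents_seen = 0
            closed = True
        else:
            return False
    return closed

ok_simple = is_intonational_phrase(["H*", "L-", "L%"])
ok_two = is_intonational_phrase(["H*", "H-", "L+H*", "L-", "H%"])
bad_no_phrase_accent = is_intonational_phrase(["H*", "L%"])
bad_open = is_intonational_phrase(["H*", "L-", "H*", "L%"])
```

The last case fails because its second intermediate phrase has a pitch accent but no phrase accent before the boundary tone.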
ToBI break-index tier

0 no word boundary
1 word boundary
2 strong juncture with no tonal markings
3 intermediate phrase boundary
4 intonational phrase boundary
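Given one break index per word, phrases at either level can be recovered by cutting wherever the index reaches the relevant value. The sketch below assumes each index describes the juncture after its word; the `split_phrases` helper and the index values are invented for illustration.

```python
def split_phrases(words, breaks, level):
    """Group words into phrases that end wherever the break index >= level."""
    phrases, current = [], []
    for word, b in zip(words, breaks):
        current.append(word)
        if b >= level:
            phrases.append(current)
            current = []
    if current:                      # trailing words with no closing boundary
        phrases.append(current)
    return phrases

words = "What I'd like is a nice roast beef sandwich".split()
breaks = [1, 1, 3, 1, 1, 1, 1, 1, 4]   # made-up labels, one per word

intermediate = split_phrases(words, breaks, level=3)
intonational = split_phrases(words, breaks, level=4)
```

Using `>=` means that a level 4 boundary also closes an intermediate phrase, consistent with an intonational phrase being built from intermediate phrases.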
[Figure: F0 contours for the pitch accents H*, L*, and L*+H, each combined with the phrase endings L-L%, L-H%, H-L%, and H-H%]
[Figure: F0 contours for L+H*, H+!H*, and H* !H*, each combined with the phrase endings L-L%, L-H%, H-L%, and H-H%]
Next Class
Predicting prosodic assignments from text
