Analysis and Modeling of 3D Jaw Motion in Speech and Mastication

Eric Vatikiotis-Bateson* and David J. Ostry**
*ATR Human Information Processing Research Laboratories, 2-2 Hikaridai, Seika-cho, Kyoto 619-0288, Japan, bateson@hip.atr.co.jp
**Department of Psychology, McGill University, Montreal, PQ, Canada, ostry@motion.psych.mcgill.ca

ABSTRACT

This paper examines the jaw's contribution to functionally distinct orofacial activities such as mastication and speech. We focus on the degrees of freedom (df) of jaw motion, the coordination of the jaw with other orofacial articulators, and modeling studies of jaw motion control. Jaw df were obtained from the six component rotations and translations which fully describe rigid-body jaw motion. Mastication and speech were both characterized by independent motion in four df. However, the relative prominence of the constituent movements differed in the two behaviors: for speech, pitch rotation and vertical and horizontal translation were most prominent; for mastication, pitch, yaw, and sagittal plane translation were dominant [1]. Jaw motion is coordinated with motion of the tongue. Even though the jaw is usually thought of as an indirect or remote speech articulator, it mirrors the behavior of proximal vocal tract constrictions made by the tongue: 1) consonant-specific differences in motion paths and positional variability are essentially the same for the jaw and the tongue region creating the constriction; and 2) the same relative positions of midsagittal vowel targets are maintained by the jaw and tongue. Our empirical studies have contributed to the development of vocal tract models. In a control model of the tongue, jaw, hyoid bone, and larynx [2], motor command inputs to modeled muscles are transformed into functionally appropriate movements. In other modeling, the mapping between muscles for jaw opening (anterior digastric, ABD) and closing (medial pterygoid, MPT) and the resulting jaw motion has been successfully estimated using linear and nonlinear techniques [3, 4]. We have also demonstrated that the jaw is a major component of the correlation between the time-varying behavior of the vocal tract and facial motion [5]. Its contribution to speech-related facial motion is so strong that motions of the tongue tip, for example, can be reliably estimated from the 3D motion of the facial surface.

1. JAW DEGREES OF FREEDOM

In a series of studies of human jaw movements in speech and mastication we have examined the three orientation angles and three positions which fully characterize the 3D motion of the jaw [1, 6]. In addition to comparing the jaw's motion in mastication and speech, our aim has been to determine the independent dimensions of jaw motion control. Human jaw motion involves a combination of six rotations and translations around and along the three principal coordinate axes, as shown in Figure 1. For both speech and mastication, the jaw rotates downward (pitch) and translates both forward (horizontal) and downward (vertical) during jaw opening. During closing, the pattern is reversed. Significant lateral motions are observed primarily in jaw closing movements during mastication. The mapping between jaw muscle actions and the mechanical degrees of freedom of jaw motion is complex. All muscles contribute to both rotation and translation. Thus, in order to produce movements involving rotation or translation, either alone or in combination, the control signals to jaw muscles must be coordinated.

Figure 1. The jaw's 3D orientation and position: pitch, roll, and yaw rotations and vertical, horizontal, and lateral translations.

In earlier sagittal plane analyses of 2D jaw motion [7], it was observed that sagittal plane jaw orientation (pitch) and horizontal position may vary independently. Qualitative observations have also been reported of non-sagittal plane motions for the mathematically reconstructed 3D jaw during speech [6]. Finally, a quantitative three-dimensional analysis of jaw motion has shown that four of the six degrees of freedom of jaw motion may be independently varied in speech and mastication: sagittal plane jaw orientation (pitch), vertical jaw position, horizontal jaw position, and coronal plane jaw orientation (yaw) [1]. These components are not equally prominent in the two behaviors. Speech primarily involves movements in the midsagittal plane, consisting of changes to the pitch angle and to horizontal and vertical position. Mastication, on the other hand, primarily involves pitch, yaw, and vertical translation. In speech, for example, sagittal plane orientation and horizontal position may vary independently or in combination depending on the phonetic context of the spoken material (see Figure 2). The position of the jaw may also shift vertically without affecting the overall shape of its motion path in the sagittal plane (see Figure 4, right panel). The magnitude of yaw motions in speech is typically small, yet systematic. Finally, analogous to the phonetic specification of the relation between pitch and horizontal position, plots of pitch against yaw show that both the slopes and intercepts vary, suggesting that yaw is also independently controlled in speech [for details, see 1].
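As a concrete illustration, the six-component rigid-body description above (three rotations plus three translations) can be sketched in a few lines of code. This is an illustrative sketch only: the assignment of pitch, roll, and yaw to the x, y, and z axes and the rotation order are assumptions made here for the example, not conventions of the cited measurement studies.

```python
import numpy as np

def rot(axis, angle):
    """3x3 rotation matrix about a principal coordinate axis (angle in radians)."""
    c, s = np.cos(angle), np.sin(angle)
    if axis == "x":  # pitch: sagittal plane rotation (assumed convention)
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == "y":  # roll (assumed convention)
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])  # "z": yaw

def jaw_pose(pitch, roll, yaw, translation):
    """4x4 homogeneous transform encoding all six rigid-body df."""
    T = np.eye(4)
    T[:3, :3] = rot("z", yaw) @ rot("y", roll) @ rot("x", pitch)
    T[:3, 3] = translation
    return T

def decompose(T):
    """Recover the six components from a pose (valid away from roll = +/-90 deg)."""
    R = T[:3, :3]
    pitch = np.arctan2(R[2, 1], R[2, 2])
    roll = -np.arcsin(R[2, 0])
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return pitch, roll, yaw, T[:3, 3]

# A jaw-opening-like pose: 8 deg downward pitch, small roll and yaw,
# with translations in millimetres (values chosen for illustration)
pose = jaw_pose(np.radians(-8.0), np.radians(0.5), np.radians(2.0), [3.0, -5.0, 0.4])
pitch, roll, yaw, trans = decompose(pose)
```

Composing and then decomposing the pose recovers the original six components exactly, which is what makes the independent-variation analyses described above well defined.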
Figure 2. Sagittal motion paths during speech showing pitch angle versus horizontal jaw position. Panels A and B give data for the same speaker producing the same utterances 18 months apart using different prosthetic appliances on the jaw. The paths show loud-volume speech for different vowel-consonant-vowel sequences where the vowel is /i/ and the consonant is as shown (e.g., /p, f/, /s, s&, t/, /k/, /l/, /r/). The only sagittal component of /ili/ utterances is horizontal translation. Panel C shows trials involving the consonant-vowel sequences /so/, /ro/, /lo/, and /to/; the paths showing pure rotation are for /to/. Panel D is for trials containing /sh/: the two sets of paths involving almost pure rotation are for /sho/ in normal (shorter paths) and loud conditions, and the remaining set of paths is for loud-volume trials with /sha/.

Jaw movements in mastication are also characterized by independent motion in four degrees of freedom. As shown in Figure 3, the slopes and intercepts relating pitch to horizontal position are more restricted in their range than those observed in speech, but nevertheless differ statistically with factors such as bolus material and size. Jaw vertical and horizontal position were observed to vary independently, particularly as a consequence of changes to the size of the food bolus (see Figure 4). The timing of pitch and yaw movements was also independent, with the phasing of movement involving motion in one degree of freedom, the other, or the two combined.

Figure 3. Jaw motion in mastication for two subjects (A, B) shown as a function of sagittal plane orientation and horizontal position. The figure shows bilateral chewing movements for six types of food with a normal bolus size (large-diameter boluses give the same pattern).

Figure 4. Sagittal plane vertical position varies independently of horizontal position, due to changes of bolus size and food material (left panel) and loudness of speaking voice (right panel).

2. CONTROL OF JAW MOTION

Because the jaw is a rigid structure, indeed one of only two rigid structures that move in the orofacial system, it is possible to obtain a complete and accurate description of the jaw's geometric degrees of freedom. This in turn makes it possible to characterize the control of jaw motion in both mastication and speech. We and our colleagues have approached the issue of control from two distinct, but related, directions: 1) direct mapping between jaw motion and the simultaneously observed activity of jaw muscles; and 2) models that use hypothesized control parameters to emulate observed jaw motion. These two approaches are briefly described below.

2.1. Jaw motion and physiology

In a series of speech production studies, midsagittal motion of the jaw, lips, and tongue was recorded along with muscle activity (EMG). Speakers of Japanese, English, and French produced sentence utterances with varying degrees of spontaneity. Using linear and nonlinear correlation techniques [3, 4], sagittal plane jaw motion could be accurately estimated from the activity of just two muscles: the jaw opener, ABD, and one of the two closing muscles, MPT. The other jaw closing muscle, the masseter, is easy to record but is active in mastication, not speech. When the sagittal plane motion was decomposed into sagittal orientation and translation components, instances of independent horizontal (anterior-posterior) jaw translation could not be adequately described, because horizontal retraction is effected by activity of the lateral branch of the pterygoid (LPT). Unfortunately, the location of LPT makes it too dangerous to record, even though the anatomy makes its role in positioning the jaw during speech quite clear. Therefore, we have had to rely on models incorporating the anatomically determined structure and hypothesized activity of such muscles, as described in the next sub-section.
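The linear variant of the EMG-to-motion mapping described above can be sketched as an ordinary least-squares fit of jaw position on the activity of one opener and one closer muscle. Everything below is synthetic: the EMG "envelopes", the weights, and the noise level are stand-ins chosen for illustration, not measured data, and the published work also fitted nonlinear (neural network) estimators.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1500
t = np.linspace(0.0, 3.0, n)

# Synthetic rectified-and-smoothed EMG envelopes: alternating opener/closer bursts
abd = np.clip(np.sin(2 * np.pi * 1.5 * t), 0.0, None)           # opener (ABD stand-in)
mpt = np.clip(np.sin(2 * np.pi * 1.5 * t + np.pi), 0.0, None)   # closer (MPT stand-in)

# Assumed "true" jaw lowering: opener activity drives opening, closer opposes it
jaw = 6.0 * abd - 4.0 * mpt + 0.2 * rng.standard_normal(n)

# Least-squares fit of jaw position from the two envelopes (plus an offset term)
A = np.column_stack([abd, mpt, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, jaw, rcond=None)
estimate = A @ coef
r = np.corrcoef(estimate, jaw)[0, 1]
```

With only two muscle signals the fit recovers the generating weights and correlates highly with the "measured" trajectory; by the same token, a motion component driven by an unrecorded muscle (such as LPT-driven retraction) would be invisible to such a fit, which is the limitation noted above.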

2.2. Modeling jaw motion


Based on our empirical studies and related work [8, 9], two sagittal plane models incorporating the equilibrium point (EP) hypothesis of motor control [10] have been proposed: a model of jaw and hyoid motion [11] and a model of tongue, jaw, hyoid and larynx motion [2]. The models contain neural control signals and reflexes, muscle mechanics, realistic musculo-skeletal geometry, and the dynamics of soft tissue and bony structures. The modeled mechanical properties include the dependence of force on muscle length and velocity, reflex damping, and graded force development due to calcium kinetics. The EP hypothesis proposes that movements result from shifts in the equilibrium position of the neuromuscular system. The postulated shifts arise as a result of changes in control signals that act at the level of the motoneurone (MN) pool. The control signals correspond to threshold muscle lengths (lambdas) for alpha-MN recruitment. According to the model, force develops in proportion to the difference between the actual muscle length and the centrally specified threshold length. Thus, by shifting lambda, the system may move to a new equilibrium position.

Jaw motions in the model have two degrees of freedom: orientation in the sagittal plane and translation along the articular surface of the temporal bone. The hyoid has three degrees of freedom: horizontal and vertical position and sagittal plane orientation. The larynx is modeled as a point mass with a single kinematic degree of freedom, vertical position. Midsagittal plane tongue movements are modeled using finite element (FEM) techniques. The geometrical arrangement of modeled muscles is shown in Figure 5.

Figure 5. Panels (from left to right) show the model components (tongue, jaw, hyoid, larynx, and vocal tract outline), laryngo-hyoid muscles (SH: sternohyoid, ST: sternothyroid, TH: thyrohyoid), jaw opening and retraction muscles (OP, RE: anterior and posterior digastric, respectively), and muscles for raising the jaw (CL: masseter and medial pterygoid; AT, PT: anterior and posterior temporalis; SP, IP: superior and inferior heads of the lateral pterygoid).

A comparison of empirical and modeling results for jaw movements is given in Figure 6. The figure provides empirical patterns of jaw rotation and translation during production of the utterance [isisa]. The modeled equilibrium shifts and predicted kinematics are also shown. The simulations test the idea that jaw motions arise when joint equilibrium orientations and positions start to shift simultaneously and each shifts at the same relative velocity [7]. The time-varying kinematics are satisfactorily approximated under this assumption. Note as well that the smooth changes in jaw position and orientation may be simulated using constant-rate changes in the jaw equilibrium angle and position.

Figure 6. Measured and modeled motion data for [isisa]. Empirical data are shown with dashed lines [for details, see 6]; simulation results are shown with solid lines; dotted lines show central commands. In each movement, the central commands for rotation and translation start and stop at the same time. The co-contraction level is fixed.
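The lambda-style force law and the constant-rate equilibrium shifts described above can be illustrated with a minimal one-degree-of-freedom simulation of an agonist-antagonist muscle pair. All constants (stiffness, mass, damping, and the ramped lambda commands) are illustrative assumptions for this sketch, not parameters of the published jaw model.

```python
# Minimal 1-df sketch of the equilibrium point (EP) idea: force develops in
# proportion to how far muscle length exceeds its threshold (lambda), and
# constant-rate shifts of the lambdas move the system to a new equilibrium.

def muscle_force(length, lam, k=2.0):
    """Graded force once muscle length exceeds the threshold lambda."""
    return k * max(0.0, length - lam)

def ramp(t, t0, t1, v0, v1):
    """Constant-rate shift of a control signal between times t0 and t1."""
    if t <= t0:
        return v0
    if t >= t1:
        return v1
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def simulate(T=1.0, dt=0.001, L=10.0, m=0.05, b=0.8):
    # Agonist length = x, antagonist length = L - x (single jaw coordinate x, mm)
    lam1 = lambda t: ramp(t, 0.2, 0.5, 4.0, 6.0)   # agonist threshold command
    lam2 = lambda t: ramp(t, 0.2, 0.5, 4.0, 2.0)   # antagonist threshold command
    x, v, t = 5.0, 0.0, 0.0                        # start at the initial equilibrium
    while t < T:
        f1 = muscle_force(x, lam1(t))              # pulls x downward
        f2 = muscle_force(L - x, lam2(t))          # pulls x upward
        a = (f2 - f1 - b * v) / m                  # damped point-mass dynamics
        v += a * dt
        x += v * dt
        t += dt
    return x

final_x = simulate()
# With both muscles above threshold, equilibrium is (lam1 + L - lam2) / 2 = 7.0 mm
```

Ramping the two thresholds at a constant rate moves the equilibrium smoothly from 5 mm to 7 mm, and the damped dynamics settle there, mirroring in miniature the constant-rate equilibrium shifts used in the simulations of Figure 6.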

3. THE JAW AND OROFACIAL MOTION

As shown in Section 1, the patterning of jaw motion appears to be more richly varied in speech than in mastication. During mastication, the jaw and tongue are carefully coordinated to achieve proper positioning of the bolus for reduction and swallowing. During speech, on the other hand, the jaw interacts with articulators along the entire length of the vocal tract (from larynx to lips) in order to shape the time-varying speech acoustics. This would seem to imply a highly complex control scheme for achieving the required degree of coordination among speech articulators, e.g., the jaw and lips, the jaw and tongue, even the jaw and larynx. However, as described in Section 2, jaw behavior can be simulated adequately with the relatively simple control scheme provided by the EP hypothesis. In this section, we examine some of the functional and structural features of the vocal tract and orofacial systems that may account for the simultaneously rich, yet simply controlled, patterning of jaw behavior in audiovisual speech.

3.1 Functional constraints on jaw motion

Unlike mastication, in which the jaw plays a direct and necessary role, the jaw's role as a primary articulator in speech production is limited. Vowel sounds (e.g., a, i, u, e, o) are defined by the shape characteristics of the entire vocal tract, in which the jaw accommodates postural requirements of the tongue surface and the overall degree of vocal tract opening. Consonants (e.g., p, t, k, n, s) are defined by constrictions in the vocal tract, usually at highly specific locations, caused either by bringing some part of the tongue (the tip, blade, or body) into contact with the palate, upper maxillary arch (alveolar ridge), or teeth, or by bringing the lips into contact with each other or with the teeth. In neither case does the jaw directly create a constriction. Indeed, the full range of speech gestures can be generated without motion of the jaw, as shown anecdotally by the pipe-smoker's ability to speak with a pipe firmly clenched between the teeth, and experimentally by bite-block studies [e.g., 12-14].

Figure 7. A schematic showing the approximate distribution of vowels in the midsagittal plane of the vocal tract.

The exact nature of the coordination between articulators such as the tongue and jaw has been the subject of considerable examination for some time. In some instances, the jaw may compensate for the motion of the tongue so that the place of constriction is changed while consistent tract volume is maintained [e.g., 15]. In other cases, jaw motion augments or follows the patterning of the lips [e.g., 16] and the positioning of the tongue. For example, vowels are produced with a certain amount of distinctiveness in articulatory and acoustic space [for discussion, see 17]. In particular, the distribution of vowels in midsagittal vocal tract space forms a triangle or quadrilateral, as schematized in Figure 7. Figure 8 exemplifies how jaw motion adheres to similar pattern constraints in midsagittal orientation and position. The trajectories associated with the different target consonants, p and sh, differ considerably, yet the spatial distribution of the trajectory endpoints (usually down and to the right) adheres to the ordering of vowels shown in Figure 7.

Figure 8. Sagittal plane orientation plotted as a function of horizontal translation for VCV utterances where V is a, i, e, or o and C is p or sh (e.g., ipi, ishi, epe, eshe, opo, osho, apa, asha). Note that the vowel portions of the paths lie down and typically to the right (positive translation) of the consonant orientation and position.

3.2 The jaw and facial motion

Recently, there has been rapid growth of interest in the production of multi-modal, audiovisual speech behavior, in order to investigate how information from the auditory and visual modalities is processed by perceivers. A first concern has been to determine the extent to which the behavior of vocal tract articulators is correlated with events visible on the face. In a series of studies, the role of the jaw and other articulators in producing visible phonetic correlates on the face has been examined for speakers of Japanese, English, and French [e.g., 4, 5]. Analysis of midsagittal vocal tract and 3D facial position data has revealed strong correlations between jaw elevation and the facial deformations relevant to speech. A schematic overview of the types of data analyzed is shown in Figure 9.

Figure 9. Schematic of measurement locations for the midsagittal vocal tract (tongue, lips, jaw) at left and the 3D face surface at right.

Principal component analysis (PCA) of facial motion for Japanese and English speakers has shown the jaw to be the major component of visible speech behavior, larger even than lip shape (rounding) information. For French speakers, lip shape information tends to be stronger than the jaw component. This language-specific difference demonstrates the flexible coupling of structural (jaw height) and functional (lip shape) constraints on spatiotemporal organization. Just as the jaw is the strongest component in recovering facial motion, it is also the strongest component for recovering vocal tract motion, being loosely coupled structurally to the lips and tongue and tightly coupled functionally to both. Specifically, in attempting to estimate midsagittal vocal tract behavior from the 3D motion of the face, vertical jaw motion inferred from the surface of the chin (Figure 9, right) accounted for more than 50 percent of the overall variability of the tongue tip measured midsagittally (see Figure 10). In Figure 10, note the similarity between the measured vertical jaw position and the estimate of vertical tongue tip position. This indicates a strong degree of functional, rather than structural, coupling between motion of the jaw and tongue tip. Correlations for the tongue body and jaw were substantially worse, though this can be attributed to a confounding of sometimes negative and other times positive correlation between the articulators. The other major contributor to the estimation was the motion of the cheeks. Together these two components accounted for as much as 90 percent of the tongue motion, as shown in Figure 11 [for details, see 5].

Figure 10. Measured (thin line) motion of midsagittal tongue tip position, vertical jaw position, and acoustic RMS amplitude, compared to values estimated (thick line) from face motion. Global correlations are given at the right of each panel.

Figure 11. Measured (thin line) motion and values estimated (thick lines) from facial motion for the position of the midsagittal tongue tip and vertical jaw, and acoustic RMS amplitude.
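The PCA-plus-linear-estimation analysis described above can be sketched on synthetic data: a latent "jaw" signal drives most of a set of face-marker channels, and a tongue-tip signal functionally coupled to the jaw is then estimated from the leading face components. All signals, loadings, and couplings below are fabricated for illustration; the published analyses used measured 3D marker data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
jaw = np.sin(np.linspace(0, 20, n))           # latent jaw elevation (synthetic)
lips = 0.3 * rng.standard_normal(n)           # weaker independent lip component

# 12 synthetic face-marker channels, mostly driven by the jaw component
face = (np.outer(jaw, rng.standard_normal(12)) * 2.0
        + np.outer(lips, rng.standard_normal(12)) * 0.5
        + 0.05 * rng.standard_normal((n, 12)))

# Tongue tip assumed functionally coupled to the jaw, plus measurement noise
tongue_tip = 0.8 * jaw + 0.1 * rng.standard_normal(n)

# PCA of face motion via SVD of the centered data matrix
X = face - face.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
var_explained = S**2 / np.sum(S**2)           # leading component ~ jaw
scores = X @ Vt[:2].T                         # first two face components

# Linear estimate of tongue-tip motion from the face components
w, *_ = np.linalg.lstsq(scores, tongue_tip - tongue_tip.mean(), rcond=None)
estimate = scores @ w + tongue_tip.mean()
r = np.corrcoef(estimate, tongue_tip)[0, 1]
```

Because the jaw-driven component dominates the face channels, the first principal component captures most of the facial variance and the linear estimate of the tongue tip from face components correlates highly with the "measured" signal, the same qualitative pattern reported for the real data in Figures 10 and 11.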

The high global correlations observed between, for example, jaw motion and the spectral acoustics or facial motion can be attributed in large part to the association of the jaw and vowel production, which we believe accounts for about 70 percent of the time spent producing speech. That is, the lips and tongue may be more important to the production of consonants than the jaw, but their involvement is short in duration and discontinuous by nature [see 18].

4. CONCLUSION

This paper has summarized several types of investigation of the jaw's movement behavior. Our comparative studies suggest that the jaw's behavior is more varied in producing speech than in mastication. A major question that remains for further study is whether the clear differences in task requirements are implemented by adjusting the parameters of a single neuromotor control system, or whether quite different neural mechanisms control jaw motion in speech and mastication.

REFERENCES

[1] D. J. Ostry, E. Vatikiotis-Bateson, and P. L. Gribble, An examination of the degrees of freedom of human jaw motion in speech and mastication, Journal of Speech, Language, and Hearing Research, vol. 40, pp. 1341-1351, 1997.
[2] V. Sanguineti, R. Laboissière, and D. J. Ostry, A dynamic biomechanical model for the neural control of speech production, Journal of the Acoustical Society of America, vol. 103, pp. 1615-1627, 1998.
[3] M. Hirayama, E. Vatikiotis-Bateson, and M. Kawato, Physiologically based speech synthesis using neural networks, IEICE Transactions, vol. E76-A, pp. 1898-1910, 1993.
[4] E. Vatikiotis-Bateson and H. Yehia, Physiological modeling of facial motion during speech, Trans. Tech. Com. Psycho. Physio. Acoust., vol. H-96-65, pp. 1-8, 1996.
[5] H. C. Yehia, P. E. Rubin, and E. Vatikiotis-Bateson, Quantitative association of vocal-tract and facial behavior, Speech Communication, vol. 26, pp. 23-44, 1998.
[6] E. Vatikiotis-Bateson and D. J. Ostry, An analysis of the dimensionality of jaw motion in speech, Journal of Phonetics, vol. 23, pp. 101-117, 1995.
[7] D. J. Ostry and K. G. Munhall, Control of jaw orientation and position in mastication and speech, Journal of Neurophysiology, vol. 71, pp. 1515-1532, 1994.
[8] J. Edwards and K. S. Harris, Rotation and translation of the jaw during speech, Journal of Speech and Hearing Research, vol. 33, pp. 550-562, 1990.
[9] J. R. Westbury, Mandible and hyoid bone movements during speech, Journal of Speech and Hearing Research, vol. 31, pp. 405-416, 1988.
[10] A. G. Feldman, Once more on the equilibrium point hypothesis of motor control, Journal of Motor Behavior, vol. 18, pp. 17-54, 1986.
[11] R. Laboissière, D. Ostry, and A. Feldman, Control of multi-muscle systems: human jaw and hyoid movements, Biological Cybernetics, vol. 74, pp. 373-384, 1996.
[12] O. M. Hughes and J. H. Abbs, Labial-mandibular coordination in the production of speech: Implications for the operation of motor equivalence, Phonetica, vol. 33, pp. 199-221, 1976.
[13] J. F. Lubker, The reorganization times of bite-block vowels, Phonetica, vol. 36, pp. 273-293, 1979.
[14] T. J. Gay, B. Lindblom, and J. Lubker, Production of bite-block vowels: Acoustic equivalence by selective compensation, Journal of the Acoustical Society of America, vol. 69, pp. 802-810, 1981.
[15] M. Stone, A three-dimensional model of tongue movement based on ultrasound and x-ray microbeam data, Journal of the Acoustical Society of America, vol. 87, pp. 2207-2217, 1990.
[16] V. L. Gracco and J. H. Abbs, Variant and invariant aspects of speech movements, Experimental Brain Research, vol. 65, pp. 156-166, 1986.
[17] P. Ladefoged, A Course in Phonetics. New York: Harcourt Brace Jovanovich, 1975.
[18] M. J. Macchi, Segmental and suprasegmental features and lip and jaw articulators, New York University, 1985.
