Perceptual assessment of voice - past, present and future

Jody Kreiman, PhD
University of California, Los Angeles

Credits

The ideas presented here were developed in collaboration with Bruce Gerratt (UCLA). This research was supported by grant DC01797 from the National Institute on Deafness and Other Communication Disorders.

Outline
Quick review of the venerable history of current quality assessment protocols;
Discussion of theoretical reasons why these protocols remain unsatisfactory measurement tools;
Presentation of a psychological model of quality perception; and
Description of the way in which this perceptual model can lead to psychoacoustic models of voice quality and reliable, valid, practical clinical measurement protocols.

Learner objectives and outcomes


Address historical changes in dysphonia assessment as well as future directions for researchers and clinicians.
Describe key aspects of the perception of voice.

I. INTRODUCTION

Why care about voice quality?


Most relevant aspect of voice to patient
Document treatment progress
Assess treatment efficacy
Voices convey substantial information about speakers

Why include the listener? Why not just measure the acoustic signal?
Just as loudness and pitch do not exist without the listener, vocal quality is an acoustic-PERCEPTUAL phenomenon. We must be able to model listeners' responses in order to reach our ultimate goal: a theoretical understanding sufficient to relate the perceived sound of a voice to the physiology that produced it, and physiology to the resultant percept.

II. THE PAST

The venerable approach to quality measurement


Create lists of terms to describe listeners' auditory impressions
Long history of verbal rating scales for voice quality.
They are ingrained in Western culture, familiar, easy to apply, easy to understand, and have the ring of truth.

Ancient and modern labels for voice quality


Julius Pollux     Moore, 1964          Gelfer, 1988
Brassy            Brassy, metallic     Metallic
Brilliant         Brilliant, bright    Bright, vibrant
Clear             Clear, white         Clear
Deep              Deep                 Resonant, low
Dull              Dull, dead           Dull
Harsh             Harsh, strident      Harsh
Shrill, sharp     Shrill, sharp        Shrill, sharp
Thin              Thin                 Thin

Rating scale approaches to quality measurement


GRBAS protocol
Grade, Roughness, Breathiness, Asthenicity, Strain

CAPE-V protocol
Developed from a consensus meeting
Design goals:
minimal set of meaningful parameters
measures obtainable expediently
applicable to a broad range of voices and settings
reliable and valid, with exemplars available for training

Stockholm Voice Evaluation Consensus Model


Aphonic, Breathy, Tense, Lax, Creaky, Rough, Grating, Unstable, Voice Breaks, Diplophonic

Same familiar, traditional scales (breathiness, roughness, strain, loudness, pitch)

Why consider alternate approaches?


Atheoretical approach
Which scales to include?
Redundancies and ambiguities
Multidimensional scaling and factor-analytic studies have not resolved this problem

Vagaries of scale definition


Breathiness = dry, hard, excited, pointed, cold, choked, rough, cloudy, sharp, poor, bad (Isshiki et al.)
or:
Breathiness = breathy, wheezing, lack of timbre, moments of aphonia, husky, not creaky? (Hammarberg et al.)

Voice profile analysis


Consistent with phonetic theory
Specifies how scales are related to each other: e.g., hoarse voice = deep, (loud), harsh/ventricular, whispery voice; gruff voice = deep, harsh, whispery, creaky voice
Specifies where information about quality might be, but does not model listeners' behavior

The first elephant in the room: Validity


The problem of what qualities to measure has never been solved, leaving validity up in the air.

The second elephant: reliability


What we REALLY want to know: what is the likelihood that another rater will produce the same rating for a given voice sample? This is not what most assessments of reliability measure.

Standard reliability approaches


Standard reliability tests use statistics that measure the likelihood that a new random sample of raters would produce the same mean rating as the group studied, averaged across all the voices studied.
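
The difference between these two notions of reliability can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used in the studies discussed here: the ratings are invented, Cronbach's alpha is assumed as a stand-in for "reliability of the mean rating," and the function names are hypothetical.

```python
from itertools import combinations
from statistics import variance

# Hypothetical ratings (rows = voices, columns = raters) on a 7-point scale.
# The raters order the voices identically, but each uses a different
# part of the scale.
RATINGS = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [4, 5, 6],
    [5, 6, 7],
]


def cronbach_alpha(ratings):
    """Reliability of the mean rating, treating raters as 'items'.

    Assumed statistic: the source does not say which coefficient was used.
    """
    n_raters = len(ratings[0])
    rater_columns = list(zip(*ratings))
    sum_of_rater_variances = sum(variance(col) for col in rater_columns)
    variance_of_totals = variance([sum(row) for row in ratings])
    return n_raters / (n_raters - 1) * (
        1 - sum_of_rater_variances / variance_of_totals
    )


def exact_agreement_probability(ratings):
    """Proportion of rater pairs, over all voices, giving the identical rating."""
    agreements = 0
    comparisons = 0
    for row in ratings:
        for a, b in combinations(row, 2):
            agreements += (a == b)
            comparisons += 1
    return agreements / comparisons


if __name__ == "__main__":
    print("alpha:", round(cronbach_alpha(RATINGS), 3))
    print("P(exact):", exact_agreement_probability(RATINGS))
```

With these deliberately extreme invented ratings, the mean rating is perfectly reliable in the alpha sense, yet no two raters ever give the same rating to any voice.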

How to measure reliability?


These two approaches can lead to very different conclusions about reliability.

Study                      ICC     (unlabeled)
Kreiman et al., 1993       0.99    0.99
Kreiman et al., 1994       0.93    0.97
Kreiman & Gerratt, 1996    0.89    0.90

P (exact) = 0.32, 0.21, 0.26

Probabilities of exact agreement

[Figure: probabilities of exact agreement; panels: Roughness ratings, Breathiness ratings]

Solutions to unreliable rater problem


Average ratings to achieve reliable mean
Other creative statistical techniques
Train listeners
Use fewer scale values
Anchored protocols
Give up and just ask the patient about satisfaction with voice/quality of life
Substitute objective measures

Objective approaches to quality assessment


Acoustic assessment protocols
Dysphonia Severity Index
Hoarseness Diagram
Multidimensional Voice Program (MDVP)

Depend on inconsistent correlations with perceptual measures for validity as measures of quality

What to do?
Find the sources of variability. Develop alternative measurement approaches that target and reduce this variability.

III. THE PRESENT


A psychological model of voice perception

How listeners introduce variability


The literature provides evidence for four factors that introduce variability into measurements of quality:
Instability of internal standards for different qualities;
Difficulties isolating individual attributes in complex acoustic voice patterns;
Measurement scale resolution;
The magnitude of the attribute being measured.

Experimental evidence
Four experimental factors, corresponding to these four theoretical factors:
Presence/absence of comparison stimuli;
Comparison stimuli that were/were not matched to the voices being rated;
Visual analog versus 6-point rating scales;
The overall mean rating for each voice.

Listeners should agree best when all factors are controlled, and worst when nothing is controlled.

Controlling the factors


Six experimental tasks:
Four with and two without comparison stimuli
Two with custom comparison stimuli, and two with generic comparison stimuli

Results
These four factors accounted for 84.2% of the variance in the likelihood that listeners would agree exactly in their ratings.

Continuous (visual analog) versus 6-point scales for breathiness
Overall mean rating for each voice included as a covariate in the ANCOVA

Unmatched anchors are worse than no anchors

So: An ideal quality assessment protocol would


1. Avoid reliance on internal standards and help listeners focus attention
2. Not depend on selection/definition of labels for quality dimensions
3. Have fine scale resolution

An analysis-by-synthesis approach meets these criteria.

Continuous scale, matched anchor stimuli

Six-point scale, generic anchor stimuli

IV. THE FUTURE

What now?
So: We have a model of voice quality perception that shows us how to measure quality reliably and validly. Based on this model, we have developed a tool for assessing the perceptual importance of different acoustic parameters. Unfortunately, this model may never translate directly into a practical clinical application. HOWEVER

What now?
We can use these methods to perceptually validate acoustic measures and derive a true PSYCHOACOUSTIC model of voice quality.
For example, dB is a perceptually validated measure that relates intensity to perceived loudness.

Psychoacoustic modeling
Such a psychoacoustic model could eliminate the need for subjective quality measures, because:
The perceptual importance of each acoustic parameter can be established; interactions among parameters can be modeled; and the composite set of parameters can be selected so that it is adequate to specify voice quality.

How to build a model


Listeners typically pay attention to attributes that vary from voice to voice, so
Find the acoustic attributes that vary the most from voice to voice.
Test the perceptual significance of these parameters.
Using speech synthesis, evaluate the effectiveness of the parameters as a set for modeling voice quality.
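
The first of these steps, finding the attributes that vary most across voices, can be sketched as follows. The source does not say how variability is quantified, so this sketch assumes the coefficient of variation as one possible scale-free measure; the parameter names, the numbers, and the function names are all invented for illustration. The later steps (perceptual testing and synthesis) require listeners and are not shown.

```python
from statistics import mean, stdev

# Hypothetical measurements: one value per voice for each acoustic parameter.
# Parameter names and numbers are invented for illustration only.
MEASUREMENTS = {
    "f0_hz": [110.0, 120.0, 210.0, 200.0],
    "jitter_percent": [0.5, 1.0, 3.0, 0.5],
    "noise_to_harmonics_ratio": [0.10, 0.12, 0.11, 0.13],
}


def coefficient_of_variation(values):
    """Scale-free spread: sample standard deviation divided by the mean.

    Assumes positive, ratio-scale values; it is not meaningful for
    parameters whose mean is near zero or that can change sign.
    """
    return stdev(values) / mean(values)


def rank_by_variability(measurements):
    """Return parameter names ordered from most to least variable across voices."""
    return sorted(
        measurements,
        key=lambda name: coefficient_of_variation(measurements[name]),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_by_variability(MEASUREMENTS):
        print(name, round(coefficient_of_variation(MEASUREMENTS[name]), 3))
```

A ranking like this only nominates candidate parameters; whether listeners actually attend to them still has to be tested perceptually.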

How to build a model


The end point of psychoacoustic modeling efforts is a set of perceptually valid acoustic parameters. These parameters can be used to evaluate changes in voice quality in the clinic, because they are objective measures whose relationship to quality is understood theoretically.

The final step: A theory of voice


Development of a set of perceptually valid acoustic measures would represent a step towards a complete theory of voice that relates changes in patterns of vocal fold vibration to changes in quality. Such a model would provide a theoretical basis for clinical assessment, because it would specify causal links from laryngeal physiology, to voice acoustics, to quality, and back.

A theory of voice
We submit that development of such a comprehensive theory should be the primary goal of voice research.

Summary
The last 2000 years have produced awareness and descriptions of the importance of voice and its uses.
Previous work has not led to very much understanding of the whys of quality, so measurement techniques remain unsatisfying.
We may be quite near a solution to this long-term problem.
Even more ambitious goals are obtainable once the problem of generating reliable and valid measures of voice is solved.

Conclusion
When we cannot measure, our knowledge is meager and unsatisfactory.
Attributed to Lord Kelvin

If it exists, it exists in amounts, and if it exists, it can be measured.


Attributed to E. L. Thorndike
