
sorry, I didn’t catch that!


an investigation of non-understandings
and recovery strategies
Dan Bohus www.cs.cmu.edu/~dbohus
Alexander I. Rudnicky www.cs.cmu.edu/~air

Computer Science Department


Carnegie Mellon University
Pittsburgh, PA, 15213
systems often do not understand correctly
 non-understandings and misunderstandings

NON-understanding: System cannot extract any meaningful information from the user’s turn
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]

MIS-understanding: System extracts incorrect information from the user’s turn
S: What city are you leaving from?
U: Birmingham [BERLIN PM]
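
The distinction between the two error types can be sketched in a few lines. This is an illustrative sketch only: the function name, the labels, and the idea that the system's extraction is either missing (None) or a single string are assumptions, as is the extracted value "Berlin" for the second example.

```python
from typing import Optional


def classify_turn(extracted: Optional[str], intended: str) -> str:
    """Label one user turn by comparing what the system extracted
    with what the user actually meant (hypothetical helper)."""
    if extracted is None:
        # Nothing meaningful was extracted from the turn.
        return "non-understanding"
    if extracted != intended:
        # Something was extracted, but it is wrong.
        return "misunderstanding"
    return "correct"


# The two examples above, with assumed extraction results.
print(classify_turn(None, "Urbana Champaign"))
print(classify_turn("Berlin", "Birmingham"))
```

Note that only the first case is visible to the system at runtime; a misunderstanding looks like a normal turn until the error surfaces later.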

2
systems often do not understand correctly

NON-understanding: System cannot extract any meaningful information from the user’s turn
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
 detection
    typically trivial, although diagnosis is not
 strategies
    large space of strategies
    tradeoffs between them not well understood
 policy (knowing how to engage the strategies)
    simple heuristics: “incremental prompting”
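
A heuristic policy of this kind can be sketched as follows, assuming that "incremental prompting" means offering progressively more help as consecutive non-understandings accumulate. The escalation order and the function name are hypothetical; the slides name the heuristic but do not give the sequence.

```python
# Hypothetical escalation order (an assumption, not taken from the slides).
ESCALATION = ["NOTIFY", "ASK REPEAT", "YOU CAN SAY", "FULL HELP"]


def incremental_prompting(consecutive_non_understandings: int) -> str:
    """Pick a strategy that gives more help as non-understandings pile up."""
    if consecutive_non_understandings < 1:
        raise ValueError("expected at least one non-understanding")
    # Stay on the most helpful strategy once the list is exhausted.
    index = min(consecutive_non_understandings, len(ESCALATION)) - 1
    return ESCALATION[index]


for n in range(1, 6):
    print(n, incremental_prompting(n))
```

Such a policy ignores everything except the count of consecutive failures, which is what makes it "uninformed" relative to a policy that also looks at the likely cause of the error.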

3
questions under investigation
 data
 what are the main causes of non-understandings?
 how large is their impact on performance?
 how do various recovery strategies compare to each other?
 what are the relationships between strategies and user behaviors?
 can we improve global dialog performance by using a smarter policy?
 if yes, can we learn a better policy from data?

4
data collection

 Roomline
 phone-based, mixed-initiative system
 conference room reservations
 experimental design
 control group: uninformed recovery policy
 wizard group: recovery policy implemented by wizard
 46 participants, first-time users
 tasks & experimental procedure
 up to 10 scenario-driven interactions

5
non-understanding recovery strategies
S: For when do you need the conference room?
1. ASK REPEAT
Could you please repeat that?
2. ASK REPHRASE
Could you please try to rephrase that?
3. NOTIFY (NTFY)
Sorry, I didn’t catch that ...
4. YIELD TURN (YLD)

5. REPROMPT (RP)
For when do you need the conference room?
6. DETAILED REPROMPT (DRP)
Right now I need to know the date and time for when you need the reservation …
7. MOVE-ON
Sorry, I didn’t catch that. For which day do you need the room?
8. YOU CAN SAY (YCS)
Sorry, I didn’t catch that. For when do you need the conference room? You can say something
like tomorrow at 10 am …
9. TERSE YOU CAN SAY (TYCS)
Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
10. FULL HELP (HELP)
Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you.
Right now I need to know the date and time for when you need the reservation. You can say
something like tomorrow at 10 am …

6
corpus statistics

 449 sessions
 8278 user turns
 utterances transcribed and checked
 manual annotations
 misunderstandings
 correct concept values at each turn
 sources of understanding errors
 user response-types to recovery strategies

7
causes of non-understandings
[Diagram: user and system communicate across four levels; the system-side representations run Goal, Semantics, Text, Audio from top to bottom]
 conversation level: Interpretation
 intention level: Parsing
 signal level: Recognition
 channel level: End-pointing

9
causes of non-understandings
 conversation level: out-of-application, 16%
 intention level: out-of-grammar, 16%
 signal level: ASR error, 62%
 channel level: endpointer error

10


modeling impact on performance
 logistic regression
 $P(\text{Task Success}) = \dfrac{1}{1 + e^{-(\alpha + \beta \cdot F_{NON})}}$
[Figure: P(Task Success = 1), from 0 to 1, plotted against the percentage of non-understandings (FNON), from 0 to 50%]
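
The model above can be evaluated directly. The sketch below is illustrative only: the coefficient values ALPHA and BETA are hypothetical placeholders, since the slides do not report the fitted values, and the function name is an assumption.

```python
import math


def p_task_success(f_non: float, alpha: float, beta: float) -> float:
    """Logistic model: probability of task success given the fraction
    of non-understandings in the session (f_non in [0, 1])."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * f_non)))


# Hypothetical coefficients, chosen only to show a decreasing curve.
ALPHA, BETA = 2.0, -6.0

for f_non in (0.0, 0.1, 0.3, 0.5):
    p = p_task_success(f_non, ALPHA, BETA)
    print(f"FNON={f_non:.0%}  P(task success)={p:.2f}")
```

With a negative beta, each additional percentage point of non-understandings lowers the predicted probability of task success, which is the relationship the plot depicts.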

12


strategy performance – recovery rate
[Figure: bar chart of recovery rate (y-axis, 0% to 80%) for each of the ten recovery strategies]

 overall logistic ANOVA


 significant differences in mean recovery rates
 all pairs comparison (corrected using FDR)
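
With ten strategies there are 45 pairwise comparisons, so some correction for multiple testing is needed. The slides say only "FDR"; the sketch below assumes the Benjamini-Hochberg step-up procedure, which is one common way to control the false discovery rate. The function name and the example p-values are invented for illustration.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether it is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected


# Invented p-values for four pairwise comparisons.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

The procedure is step-up: a comparison that misses its own threshold is still rejected if a larger p-value further down the list meets its threshold.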

14


user response types

 tagging scheme by Shin


 also used by Choularton, Raux

 5 categories
 repeat
 rephrase
 contradict
 change
 other

16
response types after non-understanding
[Figure: distribution of user response types (rephrase, repeat, contradict, change, other; y-axis 0% to 50%) in three corpora: Communicator (Shin et al.), Pizza (Choularton & Dale), Roomline (this study)]

17
user response types by strategy

[Figure: stacked bars (0% to 100%) showing the share of Repeat, Rephrase, Change, and Other responses for each strategy: Reprompt, Notify, Detailed Reprompt, Move On, Yield, Ask Repeat, Ask Rephrase, You Can Say, Help, Terse You Can Say]

18
summary
 sources of non-understandings
 ASR, but also “language” errors → more shaping strategies …
 impact on performance
 regression model allows better quantitative assessment
 strategy comparison
 help, “move-on” → further investigate “move-on”
 user responses
 margin for improving control over user responses
 can we improve global dialog performance by
using a smarter policy?
 yes
 can we learn a better policy from data?
 preliminary results promising …

19
thank you! questions …

20
rejections

[Figure 3. Misunderstandings and non-understandings before and after rejections: stacked horizontal bars (0 to 100%) showing the shares of misunderstandings, non-understandings, and correct understandings before and after the rejection mechanism, with correct rejections and false rejections marked]

21
strategy performance assessment
 recovery rate
 recovery utility
 weighted sum of correctly and incorrectly acquired
concepts
 weights are determined in a data-driven fashion
 recovery efficiency
 also takes time to recovery into account
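
The recovery utility described above is a weighted sum, which can be sketched as follows. The weights here are hypothetical placeholders: the slides state that the real weights are determined from data but do not report them, and the function name is an assumption.

```python
def recovery_utility(n_correct: int, n_incorrect: int,
                     w_correct: float, w_incorrect: float) -> float:
    """Weighted sum of correctly and incorrectly acquired concepts."""
    return w_correct * n_correct + w_incorrect * n_incorrect


# Hypothetical weights: an incorrectly acquired concept is assumed to
# cost twice as much as a correctly acquired one gains.
W_CORRECT, W_INCORRECT = 1.0, -2.0

print(recovery_utility(2, 0, W_CORRECT, W_INCORRECT))
print(recovery_utility(2, 1, W_CORRECT, W_INCORRECT))
```

Unlike the plain recovery rate, this measure penalizes a strategy that gets the dialog moving again by acquiring wrong concept values.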

22
experimental design: scenarios
 10 scenarios, fixed order
 presented graphically (explained during briefing)

23
strategy pair-wise comparison
 recovery performance ranked list, based on pair-wise
t-tests:
RNK  strategy  MOVE  HELP  TYCS  RP    YCS   ARPH  DRP   NTFY  AREP  YLD
1    MOVE      -     -     -     1.31  1.33  1.35  1.71  1.8   1.91  2.06
2    HELP      -     -     -     -     -     -     1.55  1.64  1.73  1.87
3    TYCS      -     -     -     -     -     -     1.5   1.58  1.68  1.81
4    RP        -     -     -     -     -     -     -     -     1.46  1.58
5    YCS       -     -     -     -     -     -     -     -     1.44  1.55
6    ARPH      -     -     -     -     -     -     -     -     1.42  1.53
?    DRP       -     -     -     -     -     -     -     -     -     -
?    NTFY      -     -     -     -     -     -     -     -     -     -
?    AREP      -     -     -     -     -     -     -     -     -     -
?    YLD       -     -     -     -     -     -     -     -     -     -

 CER evaluation shows similar results

24
recovery for various response-types

[Figure: recovery rate (y-axis, 0 to 80%) by user response type: Repeat, Rephrase, Change, Other]

25
26
impact of recovery rate on performance
 recovery = next turn is correctly understood
 $P(\text{Task Success}) = \dfrac{1}{1 + e^{-(\alpha + \beta \cdot \text{RecoveryRate})}}$
[Figure: P(Task Success = 1), from 0 to 1, plotted against the non-understanding recovery rate, from 0 to 100%]
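
Using the definition on this slide (recovery = next turn is correctly understood), a per-strategy recovery rate can be computed from a log of recovery attempts. This is a sketch under assumptions: the log format, the function name, and the example entries are invented for illustration and are not the study's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recovery_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Each event is (strategy used after a non-understanding,
    whether the next user turn was correctly understood)."""
    totals: Dict[str, int] = defaultdict(int)
    recovered: Dict[str, int] = defaultdict(int)
    for strategy, next_turn_understood in events:
        totals[strategy] += 1
        if next_turn_understood:
            recovered[strategy] += 1
    # Fraction of attempts followed by a correctly understood turn.
    return {s: recovered[s] / totals[s] for s in totals}


# Invented log entries, for illustration only.
log = [("HELP", True), ("HELP", False), ("YLD", False), ("MOVE", True)]
print(recovery_rates(log))
```

The same per-session rate could then be fed into the logistic model above as the predictor of task success.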

27
