Sigdial 05

sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies
Dan Bohus www.cs.cmu.edu/~dbohus
Alexander I. Rudnicky www.cs.cmu.edu/~air
Computer Science Department

Carnegie Mellon University
Pittsburgh, PA, 15213
systems often do not understand correctly
 non-understandings and misunderstandings
NON-understanding: System cannot extract any meaningful information from the user’s turn
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
MIS-understanding: System extracts incorrect information from the user’s turn
U: Birmingham [BERLIN PM]
systems often do not understand correctly
NON-understanding: System cannot extract any meaningful information from the user’s turn
U: Urbana Champaign [OKAY IN THAT SAME PAY]
 detection
 typically trivial, although diagnosis is not
 strategies
 large space of strategies
 tradeoffs between them not well understood
 policy (knowing how to engage the strategies)
 simple heuristics: “incremental prompting”
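“Incremental prompting” is named here only as the typical simple heuristic. A minimal sketch, assuming it means escalating to a more informative prompt after each consecutive non-understanding on the same question. The escalation order (built from strategy names used later in this deck) and the function name are hypothetical, not the authors' policy.

```python
# Hypothetical escalation ladder: each consecutive non-understanding on the
# same question moves one step down the list; once the ladder is exhausted
# the last (most informative) strategy is reused. The order is an assumption.
ESCALATION = ["NOTIFY", "REPROMPT", "DETAILED REPROMPT", "YOU CAN SAY", "FULL HELP"]


def incremental_prompt(consecutive_nonunderstandings: int) -> str:
    """Pick a recovery strategy given how many non-understandings in a row
    have occurred on the current question (1 = the first one)."""
    if consecutive_nonunderstandings < 1:
        raise ValueError("call only after at least one non-understanding")
    index = min(consecutive_nonunderstandings - 1, len(ESCALATION) - 1)
    return ESCALATION[index]


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, incremental_prompt(n))
```

The point of such a fixed ladder is that it ignores everything except the count of failures, which is what makes it a "simple" heuristic compared with a learned policy.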
questions under investigation
 what are the main causes of non-understandings?
 data
 how large is their impact on performance?
 how do various recovery strategies compare to each other?
 what are the relationships between strategies and user behaviors?
 can we improve global dialog performance by using a smarter policy?
 if yes, can we learn a better policy from data?
data collection
 Roomline
 phone-based, mixed-initiative system
 conference room reservations
 experimental design
 control group: uninformed recovery policy
 wizard group: recovery policy implemented by wizard
 46 participants, first-time users
 tasks & experimental procedure
 up to 10 scenario-driven interactions
non-understanding recovery strategies
S: For when do you need the conference room?
1. ASK REPEAT
Could you please repeat that?
2. ASK REPHRASE
Could you please try to rephrase that?
3. NOTIFY (NTFY)
Sorry, I didn’t catch that ...
4. YIELD TURN (YLD)
…
5. REPROMPT (RP)
For when do you need the conference room?
6. DETAILED REPROMPT (DRP)
Right now I need to know the date and time for when you need the reservation …
7. MOVE-ON
Sorry, I didn’t catch that. For which day you need the room?
8. YOU CAN SAY (YCS)
Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
9. TERSE YOU CAN SAY (TYCS)
Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
10. FULL HELP (HELP)
Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
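The control group used an "uninformed recovery policy". A minimal sketch, assuming "uninformed" means picking uniformly at random among the ten strategies above, regardless of dialog state; the deck does not spell out the mechanism, and the function name is hypothetical. The short codes are the abbreviations the deck uses in its comparison table.

```python
import random

# The ten non-understanding recovery strategies listed above, by short code.
STRATEGIES = ["AREP", "ARPH", "NTFY", "YLD", "RP", "DRP", "MOVE", "YCS", "TYCS", "HELP"]


def uninformed_policy(rng: random.Random) -> str:
    """Assumed control-group behaviour: ignore all dialog context and choose
    one strategy uniformly at random."""
    return rng.choice(STRATEGIES)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    counts = {s: 0 for s in STRATEGIES}
    for _ in range(10_000):
        counts[uninformed_policy(rng)] += 1
    print(counts)
```

A random baseline like this also spreads recovery attempts across every strategy, which is what makes a per-strategy comparison possible from the collected data.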
corpus statistics
 449 sessions
 8278 user turns
 utterances transcribed and checked
 manual annotations
 misunderstandings
 correct concept values at each turn
 sources of understanding errors
 user response-types to recovery strategies
causes of non-understandings
level          user        system
conversation   Goal        Interpretation
intention      Semantics   Parsing
signal         Text        Recognition
channel        Audio       End-pointing
causes of non-understandings
source                level          share of non-understandings
out-of-application    conversation   16%
out-of-grammar        intention      16%
ASR error             signal         62%
endpointer error      channel
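The shares above are fractions of all annotated non-understandings attributed to each source (sources of understanding errors are among the deck's manual annotations). A minimal sketch of that tally, assuming each non-understanding carries exactly one source label; the label strings and toy data are made up, not the study's annotations.

```python
from collections import Counter


def source_breakdown(labels: list[str]) -> dict[str, float]:
    """Return each error source's share of the labelled non-understandings."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()}


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    toy = ["asr"] * 6 + ["out-of-grammar"] * 2 + ["out-of-application"] + ["endpointer"]
    for source, share in sorted(source_breakdown(toy).items(), key=lambda kv: -kv[1]):
        print(f"{source:20s} {share:.0%}")
```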

modeling impact on performance
 logistic regression
 $P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot F_{NON})}}$, where $F_{NON}$ is the percentage of non-understandings
[Figure: P(Task Success = 1), from 0 to 1, plotted against % non-understandings (FNON), 0–50%]
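The model above is an ordinary two-parameter logistic regression. The deck does not say how it was fit, so the following is a minimal pure-Python sketch that estimates α and β by maximum likelihood (Newton–Raphson) from per-session pairs of FNON and task success. The toy data and function names are hypothetical.

```python
import math


def fit_logistic(x: list[float], y: list[int], iterations: int = 25) -> tuple[float, float]:
    """Maximum-likelihood fit of P(y = 1) = 1 / (1 + exp(-(alpha + beta * x)))
    by Newton-Raphson. Assumes the data are not perfectly separable;
    otherwise no finite maximum exists."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # Fisher information (negated Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: theta <- theta + I^{-1} g, with the 2x2 inverse written out.
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


def p_success(alpha: float, beta: float, fnon: float) -> float:
    """Predicted probability of task success for a given FNON."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * fnon)))


if __name__ == "__main__":
    # Toy per-session data (made up): fraction of non-understandings, task success.
    fnon = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
    success = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
    a, b = fit_logistic(fnon, success)
    print(f"alpha={a:.2f} beta={b:.2f}")
    for f in (0.0, 0.2, 0.4):
        print(f, round(p_success(a, b, f), 2))
```

Newton–Raphson is used here only because two parameters make the Hessian trivial to invert by hand. Any standard logistic regression routine would give the same estimates.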

strategy performance – recovery rate
[Figure: recovery rate (0–80%) for each of the ten recovery strategies]
 overall logistic ANOVA

 significant differences in mean recovery rates
 all pairs comparison (corrected using FDR)
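The deck defines recovery as the next user turn being correctly understood. Under that definition, a strategy's recovery rate is its fraction of successful attempts. A minimal sketch, assuming each recovery attempt is logged as a (strategy, next-turn-correct) pair; the event format and the sample events are hypothetical.

```python
from collections import defaultdict


def recovery_rates(events: list[tuple[str, bool]]) -> dict[str, float]:
    """events: one (strategy, next turn correctly understood?) pair per
    non-understanding the system tried to recover from."""
    attempts: dict[str, int] = defaultdict(int)
    recovered: dict[str, int] = defaultdict(int)
    for strategy, ok in events:
        attempts[strategy] += 1
        recovered[strategy] += int(ok)
    return {s: recovered[s] / attempts[s] for s in attempts}


if __name__ == "__main__":
    # Made-up events for illustration; not data from the study.
    events = [("MOVE", True), ("MOVE", True), ("MOVE", False), ("AREP", False), ("AREP", True)]
    for strategy, rate in sorted(recovery_rates(events).items()):
        print(f"{strategy}: {rate:.0%}")
```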

user response types
 tagging scheme by Shin et al.

 also used by Choularton, Raux
 5 categories
 repeat
 rephrase
 contradict
 change
 other
response types after non-understanding
[Figure: distribution of user response types (rephrase, repeat, contradict, change, other), 0–50%, in Communicator (Shin et al.), Pizza (Choularton & Dale), and Roomline (this study)]
user response types by strategy
[Figure: distribution of user response types (Repeat, Rephrase, Change, Other), 0–100%, for each recovery strategy]
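The per-strategy breakdown above normalizes tagged user responses within each strategy. A minimal sketch, assuming every user response following a recovery prompt is tagged with one of the five categories from the tagging scheme; the input format and example tags are hypothetical.

```python
from collections import Counter, defaultdict

# The five response categories of the tagging scheme.
RESPONSE_TYPES = ["repeat", "rephrase", "contradict", "change", "other"]


def response_profile(tagged: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """tagged: (strategy, response type) pairs. Returns, for each strategy,
    the fraction of user responses that fall in each category."""
    by_strategy: dict[str, Counter] = defaultdict(Counter)
    for strategy, rtype in tagged:
        if rtype not in RESPONSE_TYPES:
            raise ValueError(f"unknown response type: {rtype}")
        by_strategy[strategy][rtype] += 1
    profile = {}
    for strategy, counts in by_strategy.items():
        total = sum(counts.values())
        profile[strategy] = {t: counts[t] / total for t in RESPONSE_TYPES}
    return profile


if __name__ == "__main__":
    # Made-up tags for illustration only.
    tags = [("AREP", "repeat"), ("AREP", "repeat"), ("AREP", "rephrase"), ("HELP", "change")]
    for strategy, shares in response_profile(tags).items():
        print(strategy, {t: round(v, 2) for t, v in shares.items() if v})
```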
summary
 sources of non-understandings
 ASR errors, but also “language” errors → more shaping strategies …
 impact on performance
 regression model allows better quantitative assessment
 strategy comparison
 help and “move-on” rank highest → further investigate “move-on”
 user responses
 margin for improving control over user responses
 can we improve global dialog performance by using a smarter policy?
 yes
 can we learn a better policy from data?
 preliminary results promising … 
thank you! questions …
rejections
[Figure: proportions (0–100%) of correct understandings, misunderstandings, and non-understandings before and after the rejection mechanism, with correct rejections and false rejections marked]
Figure 3. Misunderstandings and non-understandings before and after rejections
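Rejection turns some understood turns into non-understandings. Rejecting a misunderstanding is a correct rejection; rejecting a correct understanding is a false rejection. The deck does not describe how the rejection decision is made. A minimal sketch, assuming a hypothetical per-turn confidence score compared against a threshold; the `Turn` type, field names, and threshold are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    outcome: str        # "correct", "misunderstanding" or "non-understanding"
    confidence: float   # hypothetical understanding confidence in [0, 1]


def apply_rejection(turns: list[Turn], threshold: float) -> tuple[list[str], int, int]:
    """Reject understood turns whose confidence is below threshold. Returns the
    post-rejection outcomes plus counts of correct and false rejections."""
    outcomes, correct_rej, false_rej = [], 0, 0
    for t in turns:
        if t.outcome != "non-understanding" and t.confidence < threshold:
            if t.outcome == "misunderstanding":
                correct_rej += 1   # a wrong interpretation was discarded
            else:
                false_rej += 1     # a correct interpretation was discarded
            outcomes.append("non-understanding")
        else:
            outcomes.append(t.outcome)
    return outcomes, correct_rej, false_rej


if __name__ == "__main__":
    turns = [Turn("correct", 0.9), Turn("correct", 0.2), Turn("misunderstanding", 0.1)]
    print(apply_rejection(turns, threshold=0.5))
```

Raising the threshold trades more correct rejections for more false rejections, which is why the figure tracks both.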
strategy performance assessment
 recovery rate
 recovery utility
 weighted sum of correctly and incorrectly acquired concepts
 weights are determined in a data-driven fashion
 recovery efficiency
 also takes the time needed to recover into account
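Recovery utility is described as a weighted sum of correctly and incorrectly acquired concepts. A minimal sketch of that sum and its average over a strategy's attempts, with the weights passed in. The deck says the weights are determined from data, which this sketch does not attempt. The negative weight in the example is an assumption that wrongly acquired concepts hurt the dialog, and all values are made up.

```python
def recovery_utility(n_correct: int, n_incorrect: int,
                     w_correct: float, w_incorrect: float) -> float:
    """Weighted sum of concepts acquired correctly and incorrectly in the
    turn following one recovery attempt."""
    return w_correct * n_correct + w_incorrect * n_incorrect


def mean_utility(attempts: list[tuple[int, int]],
                 w_correct: float, w_incorrect: float) -> float:
    """Average utility over a strategy's recovery attempts; each attempt is
    (concepts acquired correctly, concepts acquired incorrectly)."""
    if not attempts:
        raise ValueError("no recovery attempts")
    total = sum(recovery_utility(c, i, w_correct, w_incorrect) for c, i in attempts)
    return total / len(attempts)


if __name__ == "__main__":
    # Made-up weights and attempts, for illustration only.
    print(mean_utility([(1, 0), (1, 0), (0, 1), (0, 0)], w_correct=1.0, w_incorrect=-2.0))
```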
experimental design: scenarios
 10 scenarios, fixed order
 presented graphically (explained during briefing)
strategy pair-wise comparison
 recovery performance ranked list, based on pair-wise t-tests:
      RNK        MOVE  HELP  TYCS  RP    YCS   ARPH  DRP   NTFY  AREP  YLD
MOVE  1    MOVE: -     -     -     1.31  1.33  1.35  1.71  1.8   1.91  2.06
HELP  2    HELP: -     -     -     -     -     -     1.55  1.64  1.73  1.87
HELP  3    TYCS: -     -     -     -     -     -     1.5   1.58  1.68  1.81
SIG   4    RP:   -     -     -     -     -     -     -     -     1.46  1.58
HELP  5    YCS:  -     -     -     -     -     -     -     -     1.44  1.55
SIG   6    ARPH: -     -     -     -     -     -     -     -     1.42  1.53
SIG   ?    DRP:  -     -     -     -     -     -     -     -     -     -
SIG   ?    NTFY: -     -     -     -     -     -     -     -     -     -
SIG   ?    AREP: -     -     -     -     -     -     -     -     -     -
SIG   ?    YLD:  -     -     -     -     -     -     -     -     -     -
 CER evaluation shows similar results
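The ranking above comes from all-pairs comparisons (45 pairs for ten strategies) corrected for false discovery rate. The deck says only "FDR", so this minimal sketch assumes the common Benjamini–Hochberg step-up procedure. The comparison names and p-values in the example are made up.

```python
def benjamini_hochberg(p_values: dict[str, float], q: float = 0.05) -> set[str]:
    """Return the comparisons declared significant at false discovery rate q
    by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])  # ascending p
    # Find the largest rank k with p_(k) <= (k / m) * q; reject hypotheses 1..k.
    cutoff = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k / m * q:
            cutoff = k
    return {name for name, _ in ranked[:cutoff]}


if __name__ == "__main__":
    # Made-up p-values for four pairwise comparisons.
    pvals = {"MOVE-YLD": 0.01, "HELP-YLD": 0.04, "RP-YLD": 0.03, "YCS-AREP": 0.2}
    print(sorted(benjamini_hochberg(pvals)))
```

The step-up rule keeps every comparison ranked at or below the largest passing rank, so a p-value that misses its own threshold can still be declared significant.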
recovery for various response-types
[Figure: recovery rate (0–80%) by user response type: Repeat, Rephrase, Change, Other]
impact of recovery rate on performance
 recovery = next turn is correctly understood
 $P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot \mathit{RecoveryRate})}}$
[Figure: P(Task Success = 1), from 0 to 1, plotted against the non-understanding recovery rate, 0–100%]
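Because the model is linear in the log-odds, β has a direct reading. Raising the recovery rate by Δ (for example 0.10) multiplies the odds of task success by e^{βΔ}, whatever the starting rate r. The deck reports no fitted α or β, so they stay symbolic here. The same reading applies to the earlier FNON model.

```latex
\log \frac{P(\text{Task Success})}{1 - P(\text{Task Success})} = \alpha + \beta \cdot \mathit{RecoveryRate}
\quad\Longrightarrow\quad
\frac{\mathrm{odds}(r + \Delta)}{\mathrm{odds}(r)}
  = \frac{e^{\alpha + \beta (r + \Delta)}}{e^{\alpha + \beta r}}
  = e^{\beta \Delta}
```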