an investigation of non-understandings
and recovery strategies
Dan Bohus www.cs.cmu.edu/~dbohus
Alexander I. Rudnicky www.cs.cmu.edu/~air
2
systems often do not understand correctly
3
questions under investigation
what are the main causes of non-understandings?
data
how large is their impact on performance?
4
data collection
Roomline
phone-based, mixed-initiative system
conference room reservations
experimental design
control group: uninformed recovery policy
wizard group: recovery policy implemented by wizard
46 participants, first-time users
tasks & experimental procedure
up to 10 scenario-driven interactions
5
non-understanding recovery strategies
S: For when do you need the conference room?
1. ASK REPEAT
Could you please repeat that?
2. ASK REPHRASE
Could you please try to rephrase that?
3. NOTIFY (NTFY)
Sorry, I didn’t catch that ...
4. YIELD TURN (YLD)
…
5. REPROMPT (RP)
For when do you need the conference room?
6. DETAILED REPROMPT (DRP)
Right now I need to know the date and time for when you need the reservation …
7. MOVE-ON
Sorry, I didn’t catch that. For which day do you need the room?
8. YOU CAN SAY (YCS)
Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
9. TERSE YOU CAN SAY (TYCS)
Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
10. FULL HELP (HELP)
Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
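The ten strategies above, together with the control group's uninformed policy, can be sketched as a small lookup plus a selection function. This is only an illustration: `STRATEGIES` and `uninformed_policy` are hypothetical names, and modeling the "uninformed" policy as a uniform random choice is an assumption that the slides do not state.

```python
import random

# Strategy labels as they appear in the list above.
STRATEGIES = (
    "ASK REPEAT",
    "ASK REPHRASE",
    "NOTIFY",
    "YIELD TURN",
    "REPROMPT",
    "DETAILED REPROMPT",
    "MOVE-ON",
    "YOU CAN SAY",
    "TERSE YOU CAN SAY",
    "FULL HELP",
)


def uninformed_policy(rng: random.Random) -> str:
    """Pick a strategy without looking at the dialog state.

    Assumption: "uninformed" is modeled as a uniform random choice.
    """
    return rng.choice(STRATEGIES)


rng = random.Random(0)  # fixed seed so repeated runs give the same picks
picks = [uninformed_policy(rng) for _ in range(5)]
print(picks)
```

A wizard-driven or learned policy would replace `uninformed_policy` with a function that also receives the dialog state.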
6
corpus statistics
449 sessions
8278 user turns
utterances transcribed and checked
manual annotations
misunderstandings
correct concept values at each turn
sources of understanding errors
user response-types to recovery strategies
8
causes of non-understandings
Figure: user and system, with the system's processing stages grouped by level
conversation level: Goal, Interpretation
intention level: Semantics, Parsing
signal level: Text, Recognition
channel level: Audio, End-pointing
9
causes of non-understandings
out-of-application (conversation level): 16%
out-of-grammar (intention level): 16%
ASR error (signal level): 62%
endpointer error (channel level)
10
Figure: P(Task Success = 1) (y-axis, 0 to 0.8) as a function of % non-understandings (FNON) (x-axis, 0 to 50%)
12
Figure: recovery rate (y-axis, 0% to 70%) for each of the ten recovery strategies (x-axis)
14
5 categories
repeat
rephrase
contradict
change
other
16
response types after non-understanding
Figure: share of each user response type (x-axis: rephrase, repeat, contradict, change, other; y-axis, 0% to 50%); legend includes Communicator (Shin et al.) and Pizza (Choularton & Dale)
17
user response types by strategy
Figure: distribution of user response types (Repeat, Rephrase, Change, Other; y-axis, 0% to 100%) for each recovery strategy (x-axis)
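A per-strategy breakdown of this kind can be computed from annotated turns with a simple cross-tabulation. The sketch below is an assumption about how such a tally might look: the `turns` rows are invented for illustration and are not corpus data, and `response_distribution` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# Hypothetical annotated turns: (strategy used, user response type).
# These rows are made up for illustration only.
turns = [
    ("ASK REPEAT", "repeat"),
    ("ASK REPEAT", "repeat"),
    ("ASK REPEAT", "rephrase"),
    ("ASK REPHRASE", "rephrase"),
    ("ASK REPHRASE", "rephrase"),
    ("ASK REPHRASE", "other"),
    ("MOVE-ON", "change"),
    ("MOVE-ON", "change"),
]


def response_distribution(rows):
    """Return {strategy: {response_type: fraction of that strategy's turns}}."""
    counts = defaultdict(Counter)
    for strategy, response in rows:
        counts[strategy][response] += 1
    return {
        strategy: {resp: n / sum(c.values()) for resp, n in c.items()}
        for strategy, c in counts.items()
    }


dist = response_distribution(turns)
for strategy, shares in dist.items():
    print(strategy, {resp: f"{p:.0%}" for resp, p in shares.items()})
```

Each inner dictionary sums to 1, so it maps directly onto one stacked bar of the chart.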
18
summary
sources of non-understandings
ASR, but also “language” errors → more shaping strategies …
impact on performance
regression model allows better quantitative assessment
strategy comparison
help, “move-on” → further investigate “move-on”
user responses
margin for improving control over user responses
can we improve global dialog performance by
using a smarter policy?
yes
can we learn a better policy from data?
preliminary results promising …
19
thank you! questions …
20
rejections
Figure 3. Misunderstandings and non-understandings before and after rejections
(bars before and after the rejection mechanism, 0 to 100%, split into misunderstandings, non-understandings, and correct understandings, with correct rejections and false rejections marked)
21
strategy performance assessment
recovery rate
recovery utility
weighted sum of correctly and incorrectly acquired
concepts
weights are determined in a data-driven fashion
recovery efficiency
also takes time to recovery into account
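The first two metrics can be sketched in a few lines. Everything named here is hypothetical: the `RecoveryAttempt` record, the placeholder weights (the slides say the real weights are determined from data), and the choice to average the utility per attempt. Recovery efficiency is left out because the slides do not say how time enters the measure.

```python
from dataclasses import dataclass


@dataclass
class RecoveryAttempt:
    """One use of a recovery strategy after a non-understanding (hypothetical record)."""
    recovered: bool          # was the next user turn correctly understood?
    correct_concepts: int    # concepts acquired correctly in the next turn
    incorrect_concepts: int  # concepts acquired incorrectly in the next turn


def recovery_rate(attempts):
    """Fraction of attempts whose next turn was correctly understood."""
    return sum(a.recovered for a in attempts) / len(attempts)


def recovery_utility(attempts, w_correct=1.0, w_incorrect=-2.0):
    """Mean weighted sum of correctly and incorrectly acquired concepts.

    The default weights are placeholders, not values from the study.
    """
    total = sum(
        w_correct * a.correct_concepts + w_incorrect * a.incorrect_concepts
        for a in attempts
    )
    return total / len(attempts)


attempts = [
    RecoveryAttempt(True, 2, 0),
    RecoveryAttempt(False, 0, 0),
    RecoveryAttempt(True, 1, 0),
    RecoveryAttempt(False, 0, 1),
]
print(recovery_rate(attempts))     # 2 of 4 attempts recovered: 0.5
print(recovery_utility(attempts))  # (2 + 0 + 1 - 2) / 4 = 0.25
```

Computing both per strategy gives one row of a comparison table for each of the ten strategies.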
22
experimental design: scenarios
10 scenarios, fixed order
presented graphically (explained during briefing)
23
strategy pair-wise comparison
recovery performance ranked list, based on pair-wise
t-tests:
RNK MOVE HELP TYCS RP YCS ARPH DRP NTFY AREP YLD
24
recovery for various response-types
Figure: recovery rate (y-axis, 0 to 80%) by user response type (x-axis: Repeat, Rephrase, Change, Other)
26
impact of recovery rate on performance
recovery = next turn is correctly understood
P(Task Success) = 1 / (1 + e^{-(α + β·RecoveryRate)})
Figure: P(Task Success = 1) (y-axis, 0 to 1) as a function of non-understanding recovery rate (x-axis, 0 to 100%)
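The logistic model on this slide can be evaluated directly once α and β are known. The sketch below uses placeholder coefficients chosen only to make the example concrete; they are not the fitted values from the study, and `p_task_success` is a hypothetical helper name.

```python
import math


def p_task_success(recovery_rate, alpha, beta):
    """Logistic model: 1 / (1 + exp(-(alpha + beta * recovery_rate)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * recovery_rate)))


# Placeholder coefficients (assumption), not estimates from the corpus.
ALPHA, BETA = -2.0, 4.0

for rate in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"recovery rate {rate:.0%}: P(success) = {p_task_success(rate, ALPHA, BETA):.3f}")
```

With a positive β the predicted task success rises monotonically with the recovery rate, which is the shape of the curve plotted on the slide.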
27