Aaron Brown
CS 294-4 ROC Seminar
Outline
Human error and computer system failures
A theory of human error
Human error and accident theory
Addressing human error
Slide 2
[Figure: % of system crashes by cause over time (1985-1993). Categories: Hardware failure, Software failure, System management, Other. Extracted data labels: 18%, 53%, 18%, 10%.]
Slide 3
[Figure: breakdown of failure causes. Categories: Human-co., Human-ext., Hardware Failure, Software Failure, Overload, Vandalism. Extracted data labels: 9%, 22%, 5%, 47%, 17%.]
Comparison with 1992-94 data shows that human error is the only factor that is not
improving over time
Slide 4
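[Table: failure causes, 2001 vs. 1992-94, with a trend column. Surviving row labels: Software, Overload, Vandalism. Extracted values, in order: 98, 176, 100, 75, 49, 49, 15, 12, 314, 60, 3.]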
Slide 5
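[Table: results by operating system. Columns: Windows, Solaris, Linux. Surviving row label: Unsuccessful Repair. Extracted values, in order: 35, 33, 31.]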
Slide 6
Outline
Human error and computer system failures
A theory of human error
Human error and accident theory
Addressing human error
Slide 8
2) storage
the selected plan is stored in memory until it is
appropriate to carry it out
3) execution
the plan is implemented by the process of carrying out
the actions specified by the plan
Slide 9
rule-based: mistakes
usually a result of picking an inappropriate rule
caused by misconstrued view of state, over-zealous
pattern matching, frequency gambling, deficient rules
knowledge-based: mistakes
due to incomplete/inaccurate understanding of system,
confirmation bias, overconfidence, cognitive strain, ...
Error frequencies
In raw frequencies, skill-based (SB) >> rule-based (RB) > knowledge-based (KB)
61% of errors are at skill-based level
27% of errors are at rule-based level
11% of errors are at knowledge-based level
Slide 13
Outline
Human error and computer system failures
A theory of human error
Human error and accident theory
Addressing human error
general guidelines
the ROC approach: system-level undo
Undo details
Examples where Undo would help:
reverse the effects of a mistyped command (rm -rf *)
roll back a software upgrade without losing user data
retroactively install virus filter on email server; effects
of virus are squashed on redo
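The virus-filter example can be sketched as an undo / repair / redo cycle over a log of operations. This is only a toy illustration under stated assumptions, not the ROC implementation: the MailServer class, its undo_repair_redo method, and the no_virus filter are hypothetical names, and the mail store is modeled as a plain list of strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model: messages are strings, a filter returns True to accept.
Message = str
Filter = Callable[[Message], bool]


@dataclass
class MailServer:
    filters: List[Filter] = field(default_factory=list)
    inbox: List[Message] = field(default_factory=list)
    # Every delivery attempt is logged in order so it can be replayed later.
    log: List[Message] = field(default_factory=list)

    def deliver(self, msg: Message, record: bool = True) -> None:
        if record:
            self.log.append(msg)
        # A message is stored only if every installed filter accepts it.
        if all(f(msg) for f in self.filters):
            self.inbox.append(msg)

    def undo_repair_redo(self, repair: Callable[["MailServer"], None]) -> None:
        # Undo: discard the state derived from the logged operations.
        self.inbox = []
        # Repair: change the system the way the operator wishes it had been.
        repair(self)
        # Redo: replay the logged operations against the repaired system.
        for msg in self.log:
            self.deliver(msg, record=False)


def no_virus(msg: Message) -> bool:
    # Assumed, deliberately naive virus check for illustration only.
    return "VIRUS" not in msg


server = MailServer()
for m in ["hello", "VIRUS payload", "meeting at 3"]:
    server.deliver(m)
infected_inbox = list(server.inbox)

# Retroactively install the filter; the infected message is dropped on redo.
server.undo_repair_redo(lambda s: s.filters.append(no_virus))
clean_inbox = list(server.inbox)
```

Because the log keeps user-level operations rather than raw state, the legitimate messages survive the rollback while the effect of the virus does not reappear on replay.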
Summary
Humans are critical to system dependability
human error is the single largest cause of failures