
Reliability

The probability that an item will fail in the interval from 0 to time t is F(t).

Reliability is R(t) = 1 - F(t).

"Reliability is the probability of a device performing its purpose adequately for the period of time intended under the operating conditions encountered." Billington and Allen (1983).

Reliability theory was originally developed for estimating the reliabilities of physical devices.
The source of the reliability failures of physical devices is typically the physical deterioration of the materials used in their construction.
This physical deterioration provides the basis of stochastic reliability modeling, since the deterioration is assumed to vary randomly with time.

IEEE. Software Engineering Standards. Third ed., New York: IEEE, 1989.

Handbooks

In many cases users of electronic components are able to access reliability estimates for individual components, such as resistors, or families of components such as germanium small signal transistors (AT&T Reliability Handbook).
Engineers use such individual reliability estimates, along with the mathematical theory of reliability, when deciding if an assembly including them meets reliability requirements.

Reliability Theory

Used to estimate the reliabilities of individual devices, such as electronic components, and the reliabilities of systems constructed of components.
Mathematically based on probability theory.

Component Level Estimation

Reliability estimation at the component level can be based either on physical principles or on data collection.
Reliability estimates based on physical principles use knowledge of the materials in the device to make estimates.
For example, automobile tires eventually wear out. We know that they will not last for a billion miles of use, or for a million.

Data Collection Approach

Using the data collection approach, the physical device is observed and failure data recorded.
This data can be collected under laboratory conditions, or by observing the devices in the field.
This information is then used in the context of a mathematical model to calculate a reliability figure.
For example, aircraft manufacturers collect extensive failure data on aircraft components such as motors, hydraulic pumps, etc.
Since it will be impossible to run these tests under all environmental conditions, some extrapolation is necessary to develop more widely applicable reliability estimates.
Extrapolations are necessary to get from typical testing conditions such as higher temperatures to realistic use conditions which may involve much lower temperatures.

Definitions

ANSI/IEEE Standard Glossary of Software Engineering Terminology.
An error is "A discrepancy between a computed, observed, or measured value or condition and the true, specified, or theoretically correct value or condition."
A failure is "The termination of the ability of a functional unit to perform its required function."
A fault is "An accidental condition that causes a functional unit to fail to perform its required function."

Reliability Indicators

- expected number of failures in a given time period
- average time between failures
- average down time
- expected revenue loss due to failure
- expected loss of output due to failure

Reliability of a Series System

[Figure: blocks R1, R2, R3 connected in series]

$R_s = \prod_{i=1}^{n} R_i$

Assumes that items in the series are independent.
All items must work for the system to work.

Series Example

System R has three components:
R1 = 0.90
R2 = 0.92
R3 = 0.98
The system reliability is Rs = (0.90)(0.92)(0.98) = 0.81
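
The series example can be checked with a few lines of Python. This is only a minimal sketch under the slide's independence assumption; the function name `series_reliability` is illustrative and does not come from any reliability package.

```python
from math import prod


def series_reliability(reliabilities):
    """Reliability of a series system of independent components.

    Every component must work, so the system reliability is the
    product of the component reliabilities.
    """
    for r in reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reliability out of range: {r}")
    return prod(reliabilities)


# Component values from the series example.
rs = series_reliability([0.90, 0.92, 0.98])
print(f"Rs = {rs:.4f}")  # Rs = 0.8114
```

Because each factor is at most 1, adding a component to a series system can never raise the system reliability.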

Software Pipeline Architecture

If the three components in the last example are software filters in a pipeline architecture, can we do the reliability estimate?
- How do we get the reliabilities of the individual components?
- Are they independent?
- What might cause them to fail?

Parallel System Reliability

$R_s = 1 - \prod_{i=1}^{n} (1 - R_i)$

Parallel Example

System R has three components:
R1 = 0.90
R2 = 0.92
R3 = 0.98
The system reliability is Rs = 1 - (1 - 0.90)(1 - 0.92)(1 - 0.98) = 0.9998

Complex Systems

A complex system may contain series and parallel components and may include cross links.
There are various equations for computing the reliabilities of these systems.
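
The parallel example (R1 = 0.90, R2 = 0.92, R3 = 0.98) can be reproduced with a short Python sketch. It assumes the components fail independently, as in the series case; the name `parallel_reliability` is illustrative only.

```python
from math import prod


def parallel_reliability(reliabilities):
    """Reliability of a parallel system of independent components.

    The system fails only if every component fails, so the system
    reliability is one minus the product of the failure probabilities.
    """
    for r in reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reliability out of range: {r}")
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Component values from the parallel example.
rs = parallel_reliability([0.90, 0.92, 0.98])
print(f"Rs = {rs:.5f}")  # Rs = 0.99984
```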

Complex System Example

[Figure: block diagram of the complex system]

R2,3,4 = 0.936
R6,7 = 0.96
Rs = R1 * R2,3,4 * R5 * R6,7 * R8 = 0.761

k out of n systems

A system with n components in parallel will function if and only if at least k of those parallel components are functioning (1 <= k <= n).
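
A k-out-of-n reliability can be computed by building up the distribution of the number of working components. The sketch below is one possible approach, not a formula taken from the slides; it assumes the components are independent, and the function name `k_out_of_n_reliability` is hypothetical.

```python
def k_out_of_n_reliability(k, reliabilities):
    """Probability that at least k of n independent components work."""
    n = len(reliabilities)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")

    # dist[j] is the probability that exactly j of the components
    # processed so far are working.
    dist = [1.0]
    for r in reliabilities:
        new_dist = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            new_dist[j] += p * (1.0 - r)   # this component has failed
            new_dist[j + 1] += p * r       # this component is working
        dist = new_dist

    return sum(dist[k:])


components = [0.90, 0.92, 0.98]
print(f"{k_out_of_n_reliability(2, components):.5f}")  # 0.98872
```

With k = 1 this reduces to the parallel formula, and with k = n it reduces to the series product.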

SUPER

SUPER is a software package that provides computational support for the separate maintenance model and for other useful system reliability descriptions.
SUPER has been used in the telecommunications industry for over 15 years.

SUPER RBD Language

A simple formal language, the RBD language, was developed to allow the description of block diagrams in SUPER. For example, a series system called unamit comprising units U1, U2, and U3 is described in SUPER as
unamit = s(U1/U2/U3).
Nested RBDs are handled in the RBD language by naming subtending blocks and using that name in the RBD language statement defining the structure that includes those blocks.
For instance, the RBD in Figure 2 can be represented by the statements
U3 = p(W1/W2)
unamit = s(U1/U2/U3).

[Figure 2: U1 and U2 in series with block U3, which consists of W1 and W2 in parallel]
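
The nesting idea behind the RBD statements can be imitated in Python. This is not SUPER and does not implement the separate maintenance model; it is a sketch in which the functions `s` and `p` merely mirror the notation, components are assumed independent, and the unit reliabilities are made-up values.

```python
from math import prod


def s(*blocks):
    """Series block: works only if every sub-block works."""
    return prod(blocks)


def p(*blocks):
    """Parallel block: works if at least one sub-block works."""
    return 1.0 - prod(1.0 - b for b in blocks)


# Hypothetical unit reliabilities (not from the slides).
U1, U2 = 0.95, 0.99
W1, W2 = 0.90, 0.90

# Mirrors the statements U3 = p(W1/W2) and unamit = s(U1/U2/U3).
U3 = p(W1, W2)
unamit = s(U1, U2, U3)
print(f"U3 = {U3:.4f}, unamit = {unamit:.4f}")  # U3 = 0.9900, unamit = 0.9311
```

Naming the inner block first and then using that name in the outer statement is what lets arbitrarily deep series and parallel structures be described one level at a time.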

Software Reliability

"Software reliability is defined as the probability that a software fault that causes deviations from the required output by more than a specified tolerance, in a specified environment, does not occur during a specified exposure period." Ralston and Reilly (1983)
Software reliability is the probability of failure free operation of a computer program in a specified environment for a specified time. Musa, Iannino, and Okumoto (1987).
E.g. program myprog has a reliability estimate of 0.92 over ten hours of processing time. That is, if myprog were executed 100 times and used 10 hours of processing time each time, it would be expected to fail in about 8 of the 100 runs.
The applicability of hardware reliability theory to software is the subject of debate.

Mean Time Between Failures

MTBF = MTTF + MTTR
where MTTF is mean time to failure and MTTR is mean time to repair.

Mean Time to Failure

$\mathrm{MTTF} = \frac{1}{i} \sum_{k=1}^{i} t_k$

t_k = time between failures
i = number of failures

Times to failures: 280, 675, 315, 212, 278, 503, 431

MTTF = 2694 / 7 = 384.86

System Availability

Availability is
1. The probability that software will be able to perform its designated system function when required for use.
2. The ratio of system up-time to total operating time.
3. The ability of an item to perform its designated function when required for use.

IEEE Software Engineering Standards, 3rd Edition
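
The MTTF calculation, the relation MTBF = MTTF + MTTR, and the up-time ratio definition of availability can be tied together in a short Python sketch. The seven listed times sum to 2694. The repair time `mttr` below is a hypothetical value, and estimating availability as MTTF / (MTTF + MTTR) is an assumption (a long-run average reading of the up-time ratio), not something stated on the slides.

```python
def mean_time_to_failure(times):
    """Average of the observed times between failures."""
    if not times:
        raise ValueError("at least one failure time is required")
    return sum(times) / len(times)


# Times to failure listed on the slide.
times = [280, 675, 315, 212, 278, 503, 431]
mttf = mean_time_to_failure(times)

# Hypothetical mean time to repair, in the same time units.
mttr = 4.0

mtbf = mttf + mttr            # MTBF = MTTF + MTTR
availability = mttf / mtbf    # up-time divided by total operating time

print(f"MTTF = {mttf:.2f}")
print(f"MTBF = {mtbf:.2f}")
print(f"Availability = {availability:.4f}")
```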

Key Difference

A key difference between software and hardware reliability analysis is that software does not physically wear out.
Thus the stochastic modeling must be based on some other source.
What are the root causes of software failure?

Failure Modes Analysis

Failure modes analysis is a standard engineering technique for process and product improvement, providing a systematic procedure for determining and classifying the ways that a product or process can fail.

Reuse Failure Modes Model

[Figure: reuse failure modes model, a sequence of steps]
1. Try to Reuse
2. Part Exists
3. Part Available
4. Part Found
5. Part Understood
6. Part Valid
7. Part Integrated

Reuse Survey - Problems (1 of 2)

What problems have you had in trying to reuse software?
- compiler problems and missing include files
- no library available
- no time allowed for search
- didn't compile with new compiler
- wasn't portable to new hardware
- wasn't fast enough
- typically, new engineers are not told of existing reuse libraries.

Reuse Survey - Problems (2 of 2)

- classified information cannot be reused
- cultural problems
- lack of flexibility to fit in my design.
- too much dependence on ancillary (global) software to be portable.
- lack of usable documentation.
- specify exactly what is needed and identifying the appropriate parts.
- the code I was trying to reuse usually had to be modified to some degree as it was not created with reuse in mind.
- knowing what I wanted it to do and finding out what it did.
- different language. so part not 100% applicable, so have to modify it, so have to understand it in detail.
- knowing where to look.

Reuse Failure Modes (1 of 3)

Failure               Cause                        Responses
No Attempt to Reuse   Lack of Education            xxxxx
                      No Economic Incentive
                      No Success Model
                      NIH Syndrome
                      Time Constraints             x
                      More Fun to Write
                      Non-Egoless Programming
                      Legal Problems
                      Utility of Reuse Unclear     xxx
                      Insufficient Funding         x
                      Reuse Technology Immature    x

Reuse Failure Modes (2 of 3)

Failure                Cause                             Responses
Part Doesn't Exist     No Economic Incentive
                       Novel Technology
Part Isn't Available   No Import Organization            x
                       Part Can't Be Scavenged
                       Part Not Designed for Reuse       x
                       Part Can't Be Found Out There     x
                       Part is Proprietary/Classified    x
                       Source Code Missing
Part Isn't Found       Insufficient Representation
                       Poor or No Search Tools           xx
                       Inability to Specify Search       x

Reuse Failure Modes (3 of 3)

Failure                    Cause                             Responses
Part Isn't Understood      Insufficient Representation       x
                           Poor Education
                           Part Too Complex                  x
Part Isn't Valid           Poor Testing
                           Insufficient Information          x
                           Lack of Standards                 xx
Part Can't Be Integrated   Language Incompatibilities        xxx
                           Improper Form                     x
                           Non-Functional Specs              x
                           Hardware Incompatibilities        x
                           Linkage to extraneous software    x
                           Too Much Modification Required    x
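
One way to read the reuse failure modes model is as a chain of steps in which a reuse attempt is classified by the first step that does not succeed. The Python sketch below encodes that reading. The pairing of each step with a failure mode follows the slide headings, but the ordered first-failure rule and the names `REUSE_STEPS` and `first_failure_mode` are assumptions made for illustration.

```python
# (step in the model, failure mode recorded when that step does not succeed)
REUSE_STEPS = [
    ("Try to Reuse", "No Attempt to Reuse"),
    ("Part Exists", "Part Doesn't Exist"),
    ("Part Available", "Part Isn't Available"),
    ("Part Found", "Part Isn't Found"),
    ("Part Understood", "Part Isn't Understood"),
    ("Part Valid", "Part Isn't Valid"),
    ("Part Integrated", "Part Can't Be Integrated"),
]


def first_failure_mode(outcomes):
    """Return the failure mode of the first unsuccessful step.

    `outcomes` maps step names to True (succeeded) or False (failed).
    Steps missing from the mapping are treated as failed.
    Returns None when every step succeeded.
    """
    for step, failure_mode in REUSE_STEPS:
        if not outcomes.get(step, False):
            return failure_mode
    return None


# Hypothetical reuse attempt: the part exists and is available,
# but the engineer could not locate it.
attempt = {
    "Try to Reuse": True,
    "Part Exists": True,
    "Part Available": True,
    "Part Found": False,
}
print(first_failure_mode(attempt))  # Part Isn't Found
```

Tallying the returned failure modes over many attempts would give counts comparable to the Responses column in the tables above.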
References
Billington, R. and Allen, R., Reliability Evaluation of Engineering
Systems, Marshfield, MA, Pitman Books, 1983.
Frakes, William B. and Christopher J. Fox. "Quality Improvement
Using A Software Reuse Failure Modes Model" IEEE Transactions on
Software Engineering, 22(4), pp. 274-279, April, 1996.
J. D. Musa, A. Iannino, and K. Okumoto. Software Reliability -
Measurement, Prediction, Application. McGraw-Hill, New York,
1987.
Schiff, D., Dagastino, R., Practical Engineering Statistics, New York :
John Wiley, 1996.
Tortorella, M. and Frakes, W.B., A Computer Implementation of the
Separate Maintenance Model for Complex System Reliability.
