
Introduction to Reliability

Reliability is:
- An inherent feature of design
- Concerned with performance in the field, as opposed to quality of production (conformance to design specs)

Definition: Reliability is the probability that a system will perform in a satisfactory manner for a given period of time when used under specified operating conditions.

Introduction to Reliability (cont)


What is satisfactory?
- All critical functions
- Time-oriented quantitative factors: MTBF; $P(X > t_0)$, with $X$ = lifetime
- Qualitative factors, too

Operating conditions:
- Use
- Handling, transport, installation, storage

Reliability in the System Life-Cycle


Conceptual Design Phase
- Define reliability requirements of a system
- Plan reliability program

Preliminary Design Phase


- Allocate reliability requirements
- Predict reliability of components/subsystems
- Provide reliability estimates to cost estimating and design trade-off studies
- Participate in design reviews
- Assess subsystem/component supplier reliability estimates

Reliability in the System Life-Cycle (cont)


Detail Design Phase
- More detailed reliability prediction
- Assist in detail design decisions
- Assist in logistic support analysis
- Assist in prototype development
- Recommend changes prior to production
- Evaluate reliability of prototype
- Participate in other test and evaluation activities as related to reliability

Reliability in the System Life-Cycle (cont)


Production/Construction Phase
- Monitor production
- Perform reliability tests of selected items
  - Qualification tests: prior to production, repetitive tests to determine MTBF, degradation, failure modes
  - Acceptance tests: random or 100% testing of items exiting production, to assure that the reliability demonstrated during qualification testing is being achieved in production items
- Collect and analyze data on operational tests (product evaluation tests at a designated site)
- Recommend corrective action
- Continue to update reliability models and predictions

Reliability in the System Life-Cycle (cont)


System Use Phase
- Data collection and analysis
- Reliability improvement studies
- Change recommendations
- Equipment redesign projects

Measures of Reliability
- Let $T$ = random variable measuring the lifetime of an item (time to first, or next, failure)
- Range space of $T$: $\{t : t \ge 0\}$
- Tests to establish the PDF and parameters of $T$ are called life testing
- The cumulative distribution function $F(t) = P(T \le t)$ is called the failure distribution function

Measures of Reliability(cont)
The reliability function is:

$$R(t) = P(T > t) = 1 - F(t) = \int_t^{\infty} f(x)\,dx$$

[Figure: "Reliability Density Function" plot showing F(t) and R(t) as probability (0 to 1) against time t]

Four ways to determine R(t) for a particular system:
- Test many systems to failure. Develop the curve empirically.
- Test many subsystems, use historical field data on others, develop subsystem reliability functions, and use a reliability system model to combine them.
- Extrapolate past experience with similar systems.
- Physical properties: hypothesize a certain distribution.

Failures and Failure Rates


Three types of failure (see Figure 12.4):
- Initial (failure at t = 0)
- Random
- Wearout

If initial failures are to be disregarded in your analysis, then use

$$f(t) = \frac{g(t)}{1 - P(T = 0)}, \quad t > 0,$$

as the density for $(T \mid T > 0)$, where $g(t)$ is the original lifetime density.

Failures and Failure Rates(cont)


The hazard function is the instantaneous failure rate at time t, given survival up to t. Its formula is:

$$h(t) = \frac{f(t)}{R(t)} = -\frac{R'(t)}{R(t)}, \quad t \ge 0$$

Note: $H(t) = \int_0^t h(x)\,dx$, the number of failures in $[0, t]$, is called the failure count function.

Failures and Failure Rates(cont)


How are H(t), R(t), F(t) related?

$$H(t) = \int_0^t \frac{-R'(x)}{R(x)}\,dx = -\log_e R(x)\Big]_0^t = -\log_e R(t) + \log_e R(0)$$

Since $R(0) = 1$, $\log_e R(0) = 0$. So,

$$R(t) = e^{-H(t)}, \qquad F(t) = 1 - e^{-H(t)}$$

Mean Lifetime (Time Between Failures)


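Mean life:

$$E(T) = \int_0^{\infty} t\,f(t)\,dt \quad \text{or} \quad E(T) = \int_0^{\infty} [1 - F(t)]\,dt = \int_0^{\infty} R(t)\,dt$$

Example: Random failures are often modeled with an exponential time to failure with rate $\lambda$:

$$f(t) = \lambda e^{-\lambda t},\ t \ge 0 \quad (= 0 \text{ otherwise})$$

$$F(t) = 1 - e^{-\lambda t}, \qquad R(t) = e^{-\lambda t}$$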

Example (cont)
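Then,

$$h(t) = \frac{f(t)}{R(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda = \text{constant}$$

Also, because $R(t) = e^{-H(t)}$, $H(t) = \lambda t$ is linear in $t$. With $\theta = E(T) = 1/\lambda$:

$$P(T < \theta) = F(\theta) = 1 - e^{-\lambda\theta} = 1 - e^{-1} = 1 - 0.3679 = 0.6321$$

$$P(T \ge \theta) = 0.3679, \text{ independent of } \lambda \text{ (or } \theta\text{)}$$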
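The exponential results above can be checked numerically. The following is a minimal Python sketch; the function names and the three sample rates are illustrative assumptions, not values from the slides. It shows that the probability of surviving past the mean life is about 0.3679 whatever the rate.

```python
import math


def exponential_reliability(rate: float, t: float) -> float:
    """R(t) = exp(-rate * t) for a constant failure rate."""
    return math.exp(-rate * t)


def exponential_mean_life(rate: float) -> float:
    """theta = E(T) = 1 / rate."""
    return 1.0 / rate


# Illustrative rates (assumed values, not from the slides).
for rate in (0.001, 0.05, 2.0):
    theta = exponential_mean_life(rate)
    survive = exponential_reliability(rate, theta)
    # P(T >= theta) should not depend on the rate.
    print(f"rate={rate}: theta={theta:g}, P(T >= theta)={survive:.4f}")
```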

Examples on Pages 349


Example 1
- 5 components did not fail in 600 hours
- 5 others failed at various points

$$\lambda = \frac{5 \text{ failures}}{4180 \text{ hours}} = 0.001196$$

Example 2
- Operating cycle = 168.8 hours
- Downtime = 26.8 hours
- Operating time = 142 hours

Examples on Pages 352-353(cont)


- Number of failures = 6
- $\lambda = 6 / 142 = 0.042$
- MTBF $= 1/\lambda = 23.81$ hours
- Operational availability (only if we treat MTBF = MTBM, i.e., instant maintenance):

$$A_o = \frac{MTBM}{MTBM + MDT} = \frac{23.81}{23.81 + 4.4666} \approx 0.84$$

Other examples are on handouts:
- Hines and Montgomery, example 15-7
- Halpern, examples 10-1 through 10-6

Note: For the exponential failure model, $R(t) = e^{-\lambda t}$ is the first term (zero failures) of a Poisson distribution with parameter $\lambda t$.
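As a rough check of the arithmetic in Example 2, the sketch below recomputes the failure rate, MTBF, mean downtime, and operational availability from the slide's inputs. The function names are hypothetical, and taking MDT as downtime divided by the number of failures is an assumption (it reproduces the 4.4666 hours used above). Carrying unrounded values gives an MTBF near 23.67 hours; the 23.81 on the slide follows from rounding the rate to 0.042 first.

```python
def failure_rate(failures: int, operating_hours: float) -> float:
    """Point estimate of a constant failure rate."""
    return failures / operating_hours


def operational_availability(mtbm: float, mdt: float) -> float:
    """A_o = MTBM / (MTBM + MDT)."""
    return mtbm / (mtbm + mdt)


# Inputs taken from Example 2.
operating_hours = 142.0
downtime_hours = 26.8
failures = 6

lam = failure_rate(failures, operating_hours)
mtbf = 1.0 / lam                    # treated as MTBM, as the slide does
mdt = downtime_hours / failures     # assumption: downtime spread evenly over failures
a_o = operational_availability(mtbf, mdt)

print(f"lambda = {lam:.4f} per hour")
print(f"MTBF   = {mtbf:.2f} hours")
print(f"MDT    = {mdt:.4f} hours")
print(f"A_o    = {a_o:.3f}")
```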

What if Failure Rate Not Constant?


Distribution   Failure rate h(t) behavior
Normal         Increasing function
Lognormal      Various shapes
Weibull        Decreasing (shape parameter < 1); constant (= 1); increasing (> 1)
Gamma          Decreasing (n < 1); constant (n = 1); increasing (n > 1)

What if Failure Rate Not Constant(cont)


- Have a different h(t) for each time interval where the rate is constant
- Use the average failure rate (AFR) between $t_1$ and $t_2$:

$$AFR(t_1, t_2) = \frac{\int_{t_1}^{t_2} h(t)\,dt}{t_2 - t_1} = \frac{H(t_2) - H(t_1)}{t_2 - t_1} = \frac{\ln R(t_1) - \ln R(t_2)}{t_2 - t_1}$$

Note:

$$AFR(0, t) = \frac{H(t)}{t} = \frac{-\ln R(t)}{t}$$
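The AFR formula can be sketched directly from a reliability function. In the Python below, the Weibull parameterization $R(t) = e^{-(t/\text{scale})^{\text{shape}}}$ and all parameter values are assumptions chosen for illustration; the helper names are hypothetical.

```python
import math
from typing import Callable


def average_failure_rate(R: Callable[[float], float], t1: float, t2: float) -> float:
    """AFR(t1, t2) = (ln R(t1) - ln R(t2)) / (t2 - t1)."""
    if t2 <= t1:
        raise ValueError("t2 must be greater than t1")
    return (math.log(R(t1)) - math.log(R(t2))) / (t2 - t1)


def weibull_reliability(shape: float, scale: float) -> Callable[[float], float]:
    """Return R(t) = exp(-(t / scale) ** shape) (assumed parameterization)."""
    def R(t: float) -> float:
        return math.exp(-((t / scale) ** shape))
    return R


# Increasing failure rate: shape > 1 (illustrative values).
R = weibull_reliability(shape=2.0, scale=1000.0)
print(average_failure_rate(R, 0.0, 500.0))     # early-life average
print(average_failure_rate(R, 500.0, 1000.0))  # later average is higher
```

For a constant failure rate the AFR over any interval equals the rate itself, which is a quick sanity check on the function.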

Concepts Our Text Skips


- Renewal rate function: r(t) = instantaneous failure rate at time t, accounting for replacement of failed items with new components from the same population as the original parts
- Censored Type I data: A fixed test duration T is pre-set. Units that do not fail before T are censored, in that the data doesn't account for their survival beyond time T. If T is poorly chosen, there may be no failures by time T. Then what?

Concepts Our Text Skips (cont)


- Censored Type II data: A fixed number of failures r is prespecified; n items are tested until r fail. If r is poorly chosen, the test may take too long.
- Readout time data: Record actual failure times of each failed component

Estimation of $\lambda$ for Exponential Life


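$\hat{\lambda}$ = (number of failures) / (total unit test hours)

Type I censored data (n items, r failures, test ends at time T):

$$\hat{\lambda} = \frac{r}{\sum_{i=1}^{r} t_i + (n - r)T}, \qquad t_i = \text{time of the } i\text{th failure}$$

Type II censored data (test ends at the rth failure time $t_r$):

$$\hat{\lambda} = \frac{r}{\sum_{i=1}^{r} t_i + (n - r)t_r}$$

If a system has n components and the system fails when the first component fails:

$$\lambda_s = \sum_{i=1}^{n} \lambda_i$$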
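The two censored-data estimators can be sketched as follows. The function names are hypothetical, and the individual failure times are made up so that the total unit test time matches the 4180 hours quoted in Example 1 (the slides do not list the actual times).

```python
from typing import Sequence


def estimate_rate_type1(failure_times: Sequence[float], n_items: int,
                        test_duration: float) -> float:
    """Type I censoring: test stops at a preset duration T."""
    if any(t > test_duration for t in failure_times):
        raise ValueError("failure times cannot exceed the test duration")
    r = len(failure_times)
    # Failed units contribute their failure times; survivors contribute T each.
    total_hours = sum(failure_times) + (n_items - r) * test_duration
    return r / total_hours


def estimate_rate_type2(failure_times: Sequence[float], n_items: int) -> float:
    """Type II censoring: test stops at the r-th failure time t_r."""
    times = sorted(failure_times)
    r = len(times)
    t_r = times[-1]
    # Survivors are credited with running until the last observed failure.
    total_hours = sum(times) + (n_items - r) * t_r
    return r / total_hours


# Hypothetical failure times: 10 items, 600-hour test, 5 failures.
times = [100.0, 180.0, 250.0, 300.0, 350.0]
print(f"{estimate_rate_type1(times, n_items=10, test_duration=600.0):.6f}")
```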

System Reliability Models


Defined: Math models of the system that show functional relationships among subsystems, components, etc.

Examples:
- Reliability block diagram
  - Shows all possible success/failure combinations
  - Series and parallel; also k-out-of-n configuration
  - Any closed path through the system is a success
  - May not resemble the system physically
  - Standby redundancy

System Reliability Models(cont)


- Coherent systems models
- Fault tree analysis and other cause-consequence diagrams
  - Work from top-level events (failures) to primary events (causes)

Series Configuration
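[Figure: components 1, 2, ..., n connected in series]

Static model:

$$R_s = \prod_{i=1}^{n} R_i = R_1 R_2 \cdots R_n$$

Dynamic model:

$$R_s(t) = \prod_{i=1}^{n} R_i(t), \qquad h_s(t) = \sum_{i=1}^{n} h_i(t), \qquad H_s(t) = \sum_{i=1}^{n} H_i(t)$$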

Example
Exponential subsystem failure models:

$$R_s(t) = e^{-(\lambda_1 + \lambda_2 + \cdots + \lambda_n)t}$$

$$h_s(t) = \sum_{i=1}^{n} \lambda_i = \text{constant}$$

$$\theta_s = MTBF = \frac{1}{\sum_{i=1}^{n} \lambda_i}$$

See example on page 354.
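A short Python sketch of the series formulas follows. The function names and the three subsystem failure rates are assumptions for illustration only.

```python
import math
from typing import Sequence


def series_reliability(component_reliabilities: Sequence[float]) -> float:
    """Static model: product of the component reliabilities."""
    return math.prod(component_reliabilities)


def series_reliability_exponential(rates: Sequence[float], t: float) -> float:
    """Dynamic model with exponential subsystems: exp(-(sum of rates) * t)."""
    return math.exp(-sum(rates) * t)


def series_mtbf_exponential(rates: Sequence[float]) -> float:
    """MTBF of the series system: 1 / (sum of rates)."""
    return 1.0 / sum(rates)


# Illustrative failure rates per hour (assumed values).
rates = [0.001, 0.002, 0.0005]
print(f"R_s(100 h) = {series_reliability_exponential(rates, 100.0):.4f}")
print(f"MTBF       = {series_mtbf_exponential(rates):.2f} hours")
```

Multiplying the individual subsystem reliabilities at time t gives the same result as the closed form, since the exponents add.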

Active Parallel Configuration


[Figure: components 1, 2, ..., i, ..., n connected in parallel]

Static:

$$R_a = 1 - \prod_{i=1}^{n} (1 - R_i)$$

Dynamic:

$$R_a(t) = 1 - \prod_{i=1}^{n} (1 - R_i(t))$$

Identical components:

$$R_a(t) = 1 - [1 - R(t)]^n$$

System fails only if all n subsystems fail.

Example 1
Always Keep in Mind Redundancy Has a Cost

# of components in parallel   Reliability   Wt.     Benefit/cost
1                             0.95          5 lb    -
2                             0.9975        10 lb   0.0475 / 5 lb
3                             0.999875      15 lb   0.002375 / 5 lb
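The diminishing return from each added unit can be reproduced with a few lines of Python. This is a sketch assuming identical components with reliability 0.95 and 5 lb each, as in the table; the function name is hypothetical.

```python
def parallel_reliability(r: float, n: int) -> float:
    """Active parallel, identical components: 1 - (1 - r) ** n."""
    return 1.0 - (1.0 - r) ** n


component_reliability = 0.95
weight_per_unit_lb = 5

previous = 0.0
for n in (1, 2, 3):
    current = parallel_reliability(component_reliability, n)
    gain = current - previous  # reliability added by the n-th unit
    print(f"n={n}: R={current:.6f}, weight={n * weight_per_unit_lb} lb, "
          f"gain from last unit={gain:.6f}")
    previous = current
```

The second unit adds 0.0475 of reliability for 5 lb, while the third adds only 0.002375 for the same weight.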

Example 2
Exponential Subsystem Lifetime, Identical Subsystems

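$$R_a(t) = 1 - \left(1 - e^{-\lambda t}\right)^n$$

$$\theta_a = \int_0^{\infty} R_a(t)\,dt = \theta \sum_{i=1}^{n} \frac{1}{i} = \sum_{i=1}^{n} \frac{\theta}{i}$$

e.g., if n = 3 and $1/\lambda = \theta = 1000$ hours:

$$\theta_a = \frac{1000}{1} + \frac{1000}{2} + \frac{1000}{3} = 1000 + 500 + 333.33 = 1833.33 \text{ hours}$$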

Special Configurations
K-out-of-n configuration: the system works only if at least k of n components are working. Assume identical components with reliability R(t):

$$R_s(t) = \sum_{i=k}^{n} \binom{n}{i} [R(t)]^i [1 - R(t)]^{n-i}$$

If $R(t) = e^{-\lambda t} = e^{-t/\theta}$ (exponential), then

$$\theta_s = \sum_{i=k}^{n} \frac{\theta}{i}$$
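The k-out-of-n sum and its exponential mean life can be sketched in Python as below. The function names are hypothetical; the checks reuse the 0.95 component reliability and the 1000-hour mean life from the parallel examples, since active parallel is the k = 1 case.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float) -> float:
    """P(at least k of n identical components work), each with reliability r."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def k_out_of_n_mean_life(k: int, n: int, theta: float) -> float:
    """Mean life with exponential components of mean theta: sum of theta / i."""
    return theta * sum(1.0 / i for i in range(k, n + 1))


print(f"1-of-3: {k_out_of_n_reliability(1, 3, 0.95):.6f}")  # active parallel
print(f"2-of-3: {k_out_of_n_reliability(2, 3, 0.95):.6f}")
print(f"3-of-3: {k_out_of_n_reliability(3, 3, 0.95):.6f}")  # series
print(f"1-of-3 mean life: {k_out_of_n_mean_life(1, 3, 1000.0):.2f} hours")
```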

Special Configurations (cont)


Combined series-parallel. Key: treat components in parallel as a single component, then expand.

$$R_s = R_A \cdot R_{B \cup C} = R_A\,[1 - (1 - R_B)(1 - R_C)]$$

$$R_s = R_{A \cup B} \cdot R_{C \cup D} = [1 - (1 - R_A)(1 - R_B)]\,[1 - (1 - R_C)(1 - R_D)]$$

See pages 354 - 355

Availability Measurement
Inherent Availability (Ideal Support Environment)
$$A_i = \frac{MTBF}{MTBF + \bar{M}_{ct}}$$

$\bar{M}_{ct}$ = mean corrective maintenance time = mean time to repair (MTTR)

Does not include preventive maintenance, logistics delay, or administrative delay.

Achieved Availability (Ideal Support Environment)

$$A_a = \frac{MTBM}{MTBM + \bar{M}}$$

- $\bar{M}$ = mean active maintenance time = weighted average of corrective and preventive maintenance time
- MTBM = mean time between any maintenance action, corrective or preventive

Availability Measurement
Operational Availability ( Actual Support Environment)

$$A_o = \frac{MTBM}{MTBM + MDT}$$

MDT = mean downtime = weighted average of active maintenance (corrective and preventive) and delays (logistical and administrative).

Comments on Availability
Availability is a function of both:
- Reliability of a prime item
- The logistics support subsystem

The equipment designer can exert little control over support operations, but can design in:
- Built-in diagnostics
- Easy access
- Rapid disconnect/connect

Comments on Availability (cont)


- The proper balance of R&M must be decided in early stages, when flexibility is great.
- Discussion of availability is always in some context:
  - Actual failure or not
  - Which mission, what is critical to success
  - Maintenance crew, equipment, spares availability

Reliability Techniques in System Design Phase


Conceptual Design Phase
- Assignment of system reliability goal based on:
  - Mission analysis
  - Cost analysis
  - Technical limits

Preliminary Design
- Block diagram models
- Estimation of $R_i(t)$ functions
- Study of failure points, solutions

Reliability Techniques in System Design Phase(cont)


Preliminary Design Phase (cont)
- Definition of success/failure criteria
- Budgeting/revision of reliability requirements

Detail Design
- Material and parts selection
- Standardization
- Test and evaluation requirements for suppliers
- Series-parallel recommendations
- De-rating

Standardization
Standardization means selecting components and materials whose reliability characteristics are known, as well as their degradation under stress and aging.

This indirectly eases the burden on spare parts inventories, by having the same component used in several systems.

De-rating
De-rating: use a part in an application below its rated value.
- A type of overdesign to provide reliability margin
- Steps:
  - Identify operating interval
  - Select de-rating % (see RCA Corp. table)
  - Calculate de-rated value of component to be used
- Example: ceramic capacitor for a 100 V (max) application
  - RCA recommends 70% de-rating
  - X (0.7) = 100, so X = 142.86 V is the minimum rating required for the component

Binomial Expansion to Explain Parallel-Redundant Systems


Consider 3 Identical Components in Parallel P = Probability of Operation of Each Q = Probability of Failure of Each
$$(P + Q)^3 = P^3 + \binom{3}{1} P^2 Q + \binom{3}{2} P^1 Q^2 + \binom{3}{3} P^0 Q^3 = P^3 + 3P^2 Q + 3P Q^2 + Q^3$$

- $P^3$: all 3 up
- $3P^2 Q$: 2 up, one failed
- $3P Q^2$: one up, two failed
- $Q^3$: all 3 down

$$P(\text{system operating}) = 1 - Q^3 = 1 - (1 - P)^3 = P^3 + 3P^2 Q + 3P Q^2$$
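The binomial terms can be confirmed by brute-force enumeration of all up/down states. The Python below is an illustrative sketch; the function name and the choice P = 0.9 are assumptions.

```python
from itertools import product
from typing import List


def state_probabilities(p: float, n: int = 3) -> List[float]:
    """Return totals[m] = probability that exactly m of n components are up."""
    q = 1.0 - p
    totals = [0.0] * (n + 1)
    # Enumerate every combination of up (True) and down (False) components.
    for state in product((True, False), repeat=n):
        up = sum(state)
        totals[up] += p**up * q ** (n - up)
    return totals


totals = state_probabilities(0.9)
for up in range(3, -1, -1):
    print(f"{up} up: {totals[up]:.3f}")

# A parallel system operates whenever at least one component is up.
print(f"P(system operating) = {sum(totals[1:]):.3f}")
```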

Binomial Expansion to Explain Parallel-Redundant Systems


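Let $P_A = P_B = P_C = P_D = 0.9$. Which configuration is more reliable? Why?

[Figure: two configurations of components A, B, C, D, each drawn with A and C in the top row and B and D in the bottom row]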
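The diagrams for this question did not survive extraction, so the sketch below assumes the two usual layouts for this kind of comparison: (1) two series paths, A-C and B-D, placed in parallel, and (2) the parallel pair A, B in series with the parallel pair C, D. Both layouts and the helper names are assumptions, not taken from the slide.

```python
import math


def series(*reliabilities: float) -> float:
    """All blocks must work."""
    return math.prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one block must work."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


p = 0.9

# Assumed layout 1: path A-C in parallel with path B-D.
paths_in_parallel = parallel(series(p, p), series(p, p))

# Assumed layout 2: (A parallel B) in series with (C parallel D).
pairs_in_series = series(parallel(p, p), parallel(p, p))

print(f"series paths in parallel: {paths_in_parallel:.4f}")
print(f"parallel pairs in series: {pairs_in_series:.4f}")
```

Under these assumptions the second layout is more reliable, because it succeeds on four paths (A-C, A-D, B-C, B-D) rather than two.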

Parallel Redundancy Has Its Drawbacks


Limitations
- Each subsystem must have a switch to assure its failure doesn't disable the remaining components
- Sometimes necessary to disconnect failed system
- Redundancy increases weight, volume, cost and sometimes complexity
- The failure sensing device may be unreliable

Alternatives to Redundancy
- Reduce number of parts
- Simplify
- Improve reliability level of parts used, especially at critical nodes
- Burn-in of parts
- On-board spares, repairs

Standby Redundancy
- Assume cold standby: not energized until failure detected in original component
- Assume reliability of the decision switch (DS) is 100%
- Lifetime variable is $T = T_1 + \cdots + T_n$
- Standby is always more reliable than simple parallel, if the switch is 100% reliable

[Figure: components 1 and 2 connected through a decision switch (DS)]

Standby Redundancy (cont)


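Assume the lifetime variable is as follows:

$$T = T_1 + T_2 + \cdots + T_n, \qquad E(T) = \sum_i E(T_i), \qquad V(T) = \sum_i V(T_i)$$

If each $T_i$ is exponential with rate $\lambda$, $T$ is gamma($\lambda$, n):

$$E(T) = \frac{n}{\lambda}, \qquad V(T) = \frac{n}{\lambda^2}$$

n = 2:

$$R(t) = P(\text{system life} > t \mid \text{one standby}) = e^{-\lambda t} + (\lambda t)e^{-\lambda t}$$

n = 3:

$$R(t) = P(\text{system life} > t \mid \text{two standbys}) = e^{-\lambda t} + (\lambda t)e^{-\lambda t} + \frac{(\lambda t)^2}{2!}e^{-\lambda t}$$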
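The standby formulas generalize to n units as a truncated Poisson sum, which makes the comparison with active parallel easy to sketch. The Python below assumes identical exponential units and a perfect switch, as the slides do; the function names, rate, and mission time are illustrative assumptions.

```python
import math


def standby_reliability(rate: float, t: float, n_units: int) -> float:
    """Cold standby, perfect switch: exp(-x) * sum of x**k / k! for k < n_units."""
    x = rate * t
    return math.exp(-x) * sum(x**k / math.factorial(k) for k in range(n_units))


def active_parallel_reliability(rate: float, t: float, n_units: int) -> float:
    """Active parallel with identical exponential units."""
    return 1.0 - (1.0 - math.exp(-rate * t)) ** n_units


rate, t = 0.001, 1000.0  # illustrative values, so rate * t = 1
for n in (2, 3):
    print(f"n={n}: standby={standby_reliability(rate, t, n):.4f}, "
          f"active parallel={active_parallel_reliability(rate, t, n):.4f}")
```

With these values the standby arrangement comes out ahead of active parallel for both n = 2 and n = 3, consistent with the claim on the previous slide.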

Benefits of Computerized Reliability Models

- Helps keep track of reliability relationships
  - Across levels of design
  - Within a given level
- Rapid sensitivity analysis
  - Is the overall R goal even feasible?
  - Study effect of different R allocations
  - Study effects of configuration changes on R
  - Study effects of substituting different components
  - Perform worst-case analysis
- Can be adapted to multiple missions: in essence, one model for each set of mission equipment/conditions
- Can be used to evaluate proposed modifications to existing system

Analytical Methods to Support Reliability Estimation and Assist in Design Decisions


- Stress-Strength Analysis
- Critical-Useful-Life Analysis
- For complex systems (radar, missiles, computers):
  - Failure Mode and Effect Analysis
  - Worst-Case Analysis
  - Sneak-Circuit Analysis
- Safety analysis techniques:
  - Fault-Tree Analysis
  - Task and Error Analysis
  - Hazard Analysis

Discussion of Stress-Strength Analysis


- Measures resistance to stress (strength)
- Examples:
  - Operating wattage versus rated wattage
  - Operating temperature versus rated temperature
  - Pounds/square inch
- Includes:
  - Stress distribution, especially maximum stress
  - Stress causes, timing, frequency
  - Stress testing, such as metal fatigue tests

Discussion of Critical-Useful-Life Analysis


Critical-Useful-Life Analysis: Identification of the critical item list and the requirements of each of these items for preventive maintenance, corrective maintenance, and replacement. Includes studies of how to eliminate critical items through redesign.

Discussion of FMEA
Failure Mode and Effect Analysis:
- Identification of all possible failure modes of equipment, the possible causes, and the possible immediate/ultimate effects on the system and operation
- Formal documentation in words, not diagrams
- Estimation of probability of occurrence
- Classify each failure by criticality
- Describe corrective action alternatives

Discussion of Worst-Case Analysis


Worst-Case Analysis: Examining how the performance of an electrical circuit (or other device) will change over time as a result of drift in part characteristics. Provides guidance on how to allow for part parameter variation in design.

Discussion of Sneak-Circuit Analysis


Sneak-Circuit Analysis: Use of math models to identify any unanticipated performance signal paths in a circuit that may degrade performance or introduce failure.


Reliability Prediction at Part, Circuit, and Subsystem Level


Based On:
- Similar equipment: extrapolate. Not very accurate.
- Number and complexity of active element groups (these are controllers or converters of energy)
- Part types, counts, and failure rates combined into an estimate of system reliability
- Prediction based on testing, such as stress tests

Used For:
- Higher-level reliability prediction
- As input to maintenance and logistic support analysis
- Comparison with requirement: where are we over/under reliability?

Reliability Degradation Studies/ Action


Determine and correct potential/actual adverse effects due to:
- Storage, packing, transportation, handling
- Unpacking, assembly, set-up
- Preventive and corrective maintenance
  - Carelessness
  - Wrong tools and equipment
  - Didn't follow/know proper procedure

Reliability Test and Evaluation


- To answer the question: will the mature system achieve its MTBF requirement in operation?
- Should be part of an integrated test plan to test the entire spec.
- Type I tests are early enough in the design process that design changes are fairly cheap
- Type II and III tests must:
  - Follow approved procedures (first drafts of tech manuals and training courses)
  - Use test and support equipment that was specified in the maintenance concept and detailed in LSA
  - Be provided with (test) supply support
  - Be carefully planned, instrumented, documented, analyzed

Type II Reliability Testing


Evaluation of prototype and early production models, using producer personnel. Includes:
- Reliability qualification tests, to determine:
  - MTBF
  - MTBM
  - Failure sequences, detection, performance degradation
  - Maintenance procedure adequacy
  - Maintenance-induced failures
- Production sampling acceptance tests

Types of Type 2 Tests


Sequential Qualification Tests
- Environmental test chambers
- Environmental test cycle, equipment duty cycle
- Multiple identical test items
- Statistics-based accept-reject test plan
  - Producer's risk and consumer's risk usually range from 0.05 to 0.25 (negotiated)

Types of Type 2 Tests (cont)


- Reliability acceptance testing: plot MTBF versus time, look for growth/decline
- Reliability life testing: to determine failure distribution
  - Continuous (steady)
    - Fixed time, count failures
    - Fixed number of failures, count time
  - Step-stress (accelerated) testing
    - Step up stress until all units fail
    - Aids in planning burn-in

Type 3 Testing
Definition: operational testing using:
- A group of production units
- Designated field test site
- Representative mix of mission profiles
- User personnel (first trained)
- First sets of support equipment; spares

Uniqueness
- All elements of the system are operational and evaluated together
- Where the true R, M, A and other performance measures are known for the first time, rather than estimated via models plus some Type 1 and 2 test data
