Understanding Safety Engineering and Failure Analysis

RAM (RELIABILITY, AVAILABILITY AND MAINTAINABILITY)
WHAT IS SAFETY ENGINEERING?
Safety engineering is an applied science strongly related to systems engineering and to its subset, system safety engineering. Safety engineering assures that a life-critical system behaves as needed even when pieces fail. The term "safety engineering" refers to any act of accident prevention by a person qualified in the field. Failure to identify risks to safety, and the resulting inability to address or "control" these risks, can result in massive costs, both human and economic.
WHAT ARE FAULTS AND FAILURES?
FAILURE (UNRELIABILITY): AN ITEM IS CONSIDERED TO HAVE FAILED WHEN
- IT BECOMES COMPLETELY INOPERABLE
- IT OPERATES BUT IS NO LONGER IN A POSITION TO PERFORM THE REQUIRED FUNCTION
- IT IS UNSAFE FOR CONTINUED USE
A failure is "the inability of a system or component to perform its required functions within specified performance requirements", while a fault is "a defect in a device or component, for example: a short circuit or a broken wire". System-level failures are caused by lower-level faults, which are ultimately caused by basic component faults.
CAUSES OF FAILURE
- DEFICIENCIES IN DESIGN
- DEFICIENCIES IN MATERIAL
- DEFICIENCIES IN PROCESSING
- ERRORS IN ASSEMBLY
- IMPROPER SERVICE CONDITIONS
- INADEQUATE MAINTENANCE
- VARIATIONS IN OPERATING & MAINTENANCE CONDITIONS
PHASES OF FAILURE AND METHODS OF PREVENTION
1. INITIAL FAILURES: ARISE FROM THE PROBABILITY OF DEFECTIVE DESIGN, MANUFACTURE OR ASSEMBLY. PREVENTION: OPERATING THE ITEM FOR SEVERAL HOURS & REPLACING THE PARTS THAT BECOME DEFECTIVE; WARRANTY.
2. RANDOM FAILURES: OCCUR BY CHANCE. PREVENTION: REDUNDANCY.
3. WEAR-OUT FAILURES: CAUSED BY AGEING. PREVENTION: PROPER MAINTENANCE.
NATURE OF FAILURES
AN ITEM MAY FAIL IN MANY WAYS. AN UNDERSTANDING OF THESE FAILURES HELPS IN TAKING APPROPRIATE CORRECTIVE MEASURES FOR ACHIEVING BETTER RELIABILITY.
- CATASTROPHIC FAILURES: A NORMALLY OPERATING ITEM SUDDENLY BECOMES INOPERATIVE. Ex: BLOWING OF A FUSE
- DEGRADATION (CREEPING) FAILURES: OCCUR BECAUSE OF SOME CHANGE OF PARAMETERS. Ex: CHANGE OF VALUE OF A RESISTOR
- INDEPENDENT FAILURES: Ex: FAN BELT OF A CAR
- SECONDARY FAILURES: OCCUR AS A RESULT OF A PRIMARY FAILURE. Ex: SPOKES OF A CYCLE BENT DUE TO A TYRE BURST
- MISUSE FAILURES: FAILURES ATTRIBUTABLE TO THE APPLICATION OF STRESSES BEYOND THE STATED CAPABILITIES OF THE ITEM, OWING TO MISHANDLING OR IMPROPER USE
DIFFERENT MODES OF SAFE OPERATION
A probabilistically safe system has no single point of failure, and enough redundant sensors, computers and effectors so that it is very unlikely to cause harm (usually "very unlikely" means, on average, less than one human life lost in a billion hours of operation). An inherently safe system is a clever mechanical arrangement that cannot be made to cause harm; this is obviously the best arrangement, but it is not always possible. A fail-safe system is one that cannot cause harm when it fails. A fault-tolerant system can continue to operate with faults, though its operation may be degraded in some fashion.
For example, most biomedical equipment is only "critical", and often another identical piece of equipment is nearby, so it can be merely "probabilistically fail-safe". Train signals can cause "catastrophic" accidents (imagine chemical releases from tank-cars) and are usually "inherently safe". Aircraft "failures" are "catastrophic" (at least for their passengers and crew) so aircraft are usually "probabilistically fault-tolerant". Without any safety features, nuclear reactors might have "catastrophic failures", so real nuclear reactors are required to be at least "probabilistically fail-safe", and some such as pebble bed reactors are "inherently fault-tolerant".
Analysis techniques
The two most common fault modeling techniques are failure modes and effects analysis (FMEA) and fault tree analysis (FTA). These techniques are just ways of finding problems and of making plans to cope with failures.
FAILURE MODES AND EFFECTS ANALYSIS

In the technique known as "failure mode and effects analysis" (FMEA), an engineer starts with a block diagram of a system. The safety engineer then considers what happens if each block of the diagram fails. The engineer then draws up a table in which failures are paired with their effects and an evaluation of the effects. The design of the system is then corrected, and the table adjusted until the system is not known to have unacceptable problems. It is very helpful to have several engineers review the failure modes and effects analysis.
FAULT TREE ANALYSIS

In the technique known as "fault tree analysis", an undesired effect is taken as the root ('top event') of a tree of logic. There should be only one Top Event and all concerns must tree down from it. Then, each situation that could cause that effect is added to the tree as a series of logic expressions. When fault trees are labeled with actual numbers about failure probabilities, which are often in practice unavailable because of the expense of testing, computer programs can calculate failure probabilities from fault trees.
The tree is usually written out using conventional logic gate symbols. The route through a tree between an event and an initiator in the tree is called a cutset. The shortest credible way through the tree from fault to initiating event is called a minimal cutset.
Usually a failure in safety-certified systems is acceptable if, on average, less than one life per $10^9$ hours of continuous operation is lost to failure.
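The way a program can calculate a top-event probability from a labeled fault tree can be sketched as follows. This is a minimal illustration that assumes all basic events are statistically independent; the tree shape, the event names and the probabilities are hypothetical and do not come from the slides.

```python
from math import prod


def and_gate(probs):
    """Output event occurs only if every (independent) input event occurs."""
    return prod(probs)


def or_gate(probs):
    """Output event occurs if at least one (independent) input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical tree: top event = (pump A fails AND pump B fails) OR controller fails
p_pump_a = 1e-3       # assumed probability, illustration only
p_pump_b = 1e-3       # assumed probability, illustration only
p_controller = 1e-5   # assumed probability, illustration only

p_top = or_gate([and_gate([p_pump_a, p_pump_b]), p_controller])
print(f"Top event probability: {p_top:.3e}")
```

In this sketch the redundant pumps contribute far less to the top event than the single controller, which is the kind of single point of failure the tree is meant to expose.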
FAILURE RATE
The equipment reliability can be expressed with the failure rate:
$\lambda = \text{number of faults per unit time} = 1/M$
The MTBF is given by $M$ and is expressed in hours; the corresponding units of $\lambda$ are faults per hour. For individual components $\lambda$ is extremely small, so the units may be altered to give convenient numbers. Thus failure rates may be quoted as a percentage per 100 hrs, per $10^6$ hours or per $10^9$ hours.
For example, a system with an MTBF of 2000 hrs has a failure rate of $\lambda = 1/2000 = 0.0005$ failures per hour, which is 50% (0.5) per 1000 hrs, or equivalently 100% (1) per 2000 hrs. The equipment with the greatest MTBF will be the most reliable, regardless of the period of observation. Thus MTBF provides a most convenient index of reliability.
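The unit conversions for the 2000 hrs example can be checked with a short sketch. The function names are illustrative choices, not part of the original material.

```python
def failure_rate_per_hour(mtbf_hours):
    """Failure rate (faults per hour) as the reciprocal of the MTBF."""
    return 1.0 / mtbf_hours


def percent_per_1000_hours(rate_per_hour):
    """Express a per-hour failure rate as a percentage per 1000 hours."""
    return rate_per_hour * 1000 * 100


lam = failure_rate_per_hour(2000)
print(lam)                          # faults per hour
print(percent_per_1000_hours(lam))  # percent per 1000 hours
```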
PROBABILITY
Probability $P = S/n$, where $S$ is the number of favourable results and $n$ is the total number of possible results (failures and successes together).
Probability of success: $P_s = a/(a+b)$, where $a$ is the number of ways of succeeding and $b$ is the number of ways of failing.
Probability of failure: $P_f = b/(a+b)$.
$P_s + P_f = a/(a+b) + b/(a+b) = 1$.
Given the probability $P$ of some event, the probability of its complement, that is, that the event will not take place, is $(1 - P)$. If $P_1$ and $P_2$ are the probabilities of success of two independent events, the probability that both occur is $P = P_1 \times P_2$; when each trial has the same probability of success $P$, the probability that two trials will both succeed is $P^2$. The product rule is directly applicable to a series system, in which the input of each unit is connected to the output of the previous one.
COMPOUND EVENTS(PROBABILITY)
In order for the complete system to operate correctly, each unit must operate correctly. Let the probabilities of success, in other words the reliabilities, of the units be $R_1, R_2, R_3, \ldots, R_n$. The probability that they will all operate correctly, i.e. that the system will function, is given by
$R = R_1 \times R_2 \times R_3 \times \cdots \times R_n$
For $n$ similar units of reliability $R_r$, this is $R = (R_r)^n$.
For example, a radio station has a receiver, a transmitter and an aerial with power unit, with reliabilities for a given operating period of 0.9, 0.85 and 0.8 respectively. The system reliability is $R = 0.9 \times 0.85 \times 0.8 = 0.612$. The probability of a system fault during this period is thus $P_f = 1 - R = 0.388$.
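The product rule for a series system is easy to verify numerically. The sketch below reproduces the radio station figures, assuming the unit failures are independent; the function name is an illustrative choice.

```python
from math import prod


def series_reliability(reliabilities):
    """Reliability of a series system: every unit must work (independent units)."""
    return prod(reliabilities)


# Receiver, transmitter, aerial with power unit
r_system = series_reliability([0.9, 0.85, 0.8])
p_fault = 1.0 - r_system

print(round(r_system, 3))  # system reliability
print(round(p_fault, 3))   # probability of a system fault
```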
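The product rule gives the joint probability that a number of events will all be successful. In other circumstances, we require the probability that one or more of several events will be successful. For example, a box contains eight 0.1 µF, seven 0.5 µF and five 1 µF capacitors, so the total number of capacitors is 20. If we pick a capacitor at random, the probabilities of the three values are $P_1 = 8/20 = 0.4$ (0.1 µF), $P_2 = 7/20 = 0.35$ (0.5 µF) and $P_3 = 5/20 = 0.25$ (1.0 µF). Because the three outcomes are mutually exclusive, their probabilities add: the chance of picking either a 0.1 µF or a 0.5 µF capacitor, for instance, is $0.4 + 0.35 = 0.75$.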
FAILURE ANALYSIS OF SERIES SYSTEM

If the separate probabilities of a failure developing in the components are $P_1, P_2, P_3, P_4, P_5$, the joint probability of a fault occurring in one or more of the five components is, for small probabilities, approximately
$P = P_1 + P_2 + P_3 + P_4 + P_5$
The probabilities $P_1$ to $P_5$ depend upon the duration of the test or the prescribed operating period. For calculating reliability the failure rate is considered. If the failure rates for the components are $\lambda_1, \lambda_2, \lambda_3$ etc., the expected numbers of failures are:
$n_1 = \lambda_1 T$ for component 1
$n_2 = \lambda_2 T$ for component 2
and so on, where $T$ is the duration of the test or operating period. Since we are considering a series system, any one of these faults will cause a system failure. Thus
$N_s = (\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5) T$
is the total number of faults expected during the interval $T$, assuming that $\lambda_1, \lambda_2, \ldots, \lambda_5$ are constant. The reliability $R$ is given (for $N_s$ much smaller than 1) by
$R = 1 - N_s$
EXAMPLE OF SERIES SYSTEM

Component            Number    Failure rate (percent per 1000 hrs)
Silicon diode        45        0.02
Silicon transistor   25        0.05
Resistor             90        0.005
Capacitor            25        0.01
Numerical data are used to estimate reliability. The system failure rate is calculated by adding together the total failure rate for each class of component:

Silicon diode        45 x 0.02  = 0.90
Silicon transistor   25 x 0.05  = 1.25
Resistor             90 x 0.005 = 0.45
Capacitor            25 x 0.01  = 0.25
Total failure rate              = 2.85 percent per 1000 hrs
So the failure rate is 2.85/100 = 0.0285 failures per 1000 hrs. The failure rate per hour is $\lambda = 0.0285/1000 = 0.0000285$, and the MTBF is $M = 1/\lambda = 1/0.0000285 \approx 35100$ hrs. Suppose a ship carrying equipment with the above failure rate must operate it continuously for 750 hrs until the ship returns. The expected number of faults in 750 hrs is $n = (750 \times 0.0285)/1000 = 0.0214$; thus the reliability for each voyage is $R = 1 - n = 0.9786$.
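The parts-count calculation above can be reproduced with a short sketch. The component counts and failure rates are the ones from the example; the variable names are illustrative, and the exponential form $R = e^{-\lambda T}$ is added here only for comparison, as the standard constant-failure-rate result that $1 - \lambda T$ approximates.

```python
import math

# (number of parts, failure rate in percent per 1000 hrs) for each component class
components = {
    "silicon diode": (45, 0.02),
    "silicon transistor": (25, 0.05),
    "resistor": (90, 0.005),
    "capacitor": (25, 0.01),
}

# Series system: the class failure rates simply add up
total_pct_per_1000h = sum(n * rate for n, rate in components.values())

lam = total_pct_per_1000h / 100 / 1000   # failures per hour
mtbf = 1.0 / lam                         # hours

mission_hours = 750
expected_faults = lam * mission_hours
r_approx = 1.0 - expected_faults         # approximation used in the example
r_exponential = math.exp(-expected_faults)  # constant-rate model, for comparison

print(round(total_pct_per_1000h, 2), round(mtbf), round(r_approx, 4), round(r_exponential, 4))
```

For a short mission such as this one the two reliability figures agree to about three decimal places, which is why the linear approximation is acceptable here.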
RELIABILITY OF PARALLEL SYSTEM

Suppose three systems are connected in parallel. If any one of the three works, the overall system will not fail; it fails only if all three fail. Thus the probability of system failure is
$P_s = P_1 \times P_2 \times P_3$
where $P_1$, $P_2$ and $P_3$ are the probabilities of failure of the individual systems for a specified interval of time. The system reliability, assuming the failures are independent and $P$ is small, is then
$R = 1 - P_s = 1 - P_1 \times P_2 \times P_3$
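A minimal sketch of the parallel rule, assuming independent failures. The three failure probabilities are hypothetical values chosen only to show how quickly redundancy reduces the system failure probability.

```python
from math import prod


def parallel_reliability(failure_probs):
    """Reliability of a parallel system: it fails only if every unit fails."""
    return 1.0 - prod(failure_probs)


# Hypothetical failure probabilities for the three redundant systems
r_parallel = parallel_reliability([0.1, 0.2, 0.05])
print(round(r_parallel, 3))
```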
EXAMPLE OF PARALLEL SYSTEM

A generating system is built from machines that each have a mean time between failures of 5000 hrs. What will be the reliability of the system for a 500 hrs operating period if there are five identical units and any three of them can supply the required load? The conditions imply that the system will fail if three, four or five machines fail. The combined probability of any one of these three situations occurring is given by the sum of the three separate probabilities.
Let $P$ be the probability of failure of one machine during the 500 hrs interval. The total probability of failure is
$P_s = 10P^3(1-P)^2 + 5P^4(1-P) + P^5$
where
- $10P^3(1-P)^2$ covers three faulty machines and two working;
- $5P^4(1-P)$ covers four faulty machines and one working;
- $P^5$ covers all five machines faulty.
Expanding and collecting terms, the total probability of failure is
$P_s = 10P^3 - 15P^4 + 6P^5$    (1)
The mean time between failures for a single machine is 5000 hrs. Thus the probable number $n_f$ of failures in a 500 hrs period is $n_f = 500/5000 = 0.1$.
If we use the approximation that this is equal to the probability of a failure, taking $P = n_f$, we can use the value $P = 0.1$ in expression (1) for $P_s$. This gives the probability of a system failure as
$P_s = 10(0.1)^3 - 15(0.1)^4 + 6(0.1)^5 = 0.01 - 0.0015 + 0.00006 = 0.00856$
The probability that the system does not fail, that is, the reliability, is given by
$R = 1 - P_s = 1 - 0.00856 = 0.99144$
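The same result follows from summing binomial terms directly, which avoids expanding the polynomial by hand. The sketch below assumes independent, identical machines; the function name and argument names are illustrative.

```python
from math import comb


def system_failure_probability(n_units, n_required, p_fail):
    """P(system fails) when at least n_required of n_units must keep working."""
    # The system goes down once this many units have failed
    min_failed = n_units - n_required + 1
    return sum(
        comb(n_units, k) * p_fail**k * (1.0 - p_fail) ** (n_units - k)
        for k in range(min_failed, n_units + 1)
    )


p = 500 / 5000   # approximation P = n_f from the example
ps = system_failure_probability(n_units=5, n_required=3, p_fail=p)
reliability = 1.0 - ps

print(round(ps, 5), round(reliability, 5))
```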
RELIABILITY
EVERY FAILURE MUST BE REGARDED AS SIGNIFICANT UNTIL ACTION HAS BEEN TAKEN TO PREVENT ITS RECURRENCE.
RELIABILITY IS THE ABILITY OF AN ITEM TO PERFORM A REQUIRED FUNCTION UNDER STATED OPERATING & ENVIRONMENTAL CONDITIONS FOR A GIVEN PERIOD OF TIME.
The probability of survival, R(t), plus the probability of failure, F(t), is always unity. Expressed as a formula: F(t) + R(t) = 1 or, F(t)=1 - R(t).
The required function includes both a definition of satisfactory and unsatisfactory operation (failure). The stated conditions are the total physical environment, including mechanical, thermal, and electrical conditions. The stated period of time is the time during which satisfactory operation is desired.
CONCEPTS:
- PROBABILITY: RELIABILITY IS EXPRESSED IN TERMS OF PROBABILITY. Ex: 0.95 FOR 60 HRS
- REQUIRED FUNCTION: Ex: LIGHTING OF 10 CANDLES
- TIME: Ex: MISSILE
- OPERATING & ENVIRONMENTAL CONDITIONS: Ex: TYRE & ROAD CONDITIONS
MEASURES OF RELIABILITY
1. FAILURE RATE: EXPRESSED IN TERMS OF FAILURES PER HOUR, PER 100 HRS, PER 1000 HRS, OR % FAILURES PER 1000 HRS. Ex: THE FAILURE RATE OF RELAYS HAS BEEN CALCULATED FROM PAST EXPERIENCE AS 0.4623 PER 1000 HRS. THIS MEANS THAT OUT OF 10,000 RELAYS, 4623 ARE EXPECTED TO FAIL DURING 1000 HRS OF OPERATION.
2. PROBABILITY OF SURVIVAL: EXPRESSED AS A DECIMAL FRACTION OR PERCENTAGE WHICH INDICATES THE PROBABLE OR EXPECTED NUMBER OF ITEMS THAT WILL OPERATE FOR A REQUIRED PERIOD OF TIME. Ex: 90% MEANS THAT 90 OUT OF 100 MACHINES ARE EXPECTED TO SURVIVE THE REQUIRED PERIOD.
3. MEAN TIME BETWEEN FAILURES (MTBF): APPLICABLE TO REPAIRABLE ITEMS. EXPRESSED IN HOURS.
The MTBF of a system (given by $M$) may be measured by testing it for a total period (given by $T$) during which $N$ faults occur. Each fault is repaired and the equipment is put back on test, the repair time being excluded from the total test time $T$. The observed MTBF is then given by $M = T/N$.
IF AN EQUIPMENT FAILS 6 TIMES OVER A PERIOD OF 3000 HRS, THE MTBF WOULD BE 3000/6 = 500 HRS. THIS IS ALWAYS TAKEN AS AN AVERAGE TIME.
4. MEAN TIME TO FAILURE (MTTF): APPLICABLE TO NON-REPAIRABLE ITEMS. EXPRESSED AS AN AVERAGE TIME; IT IS THE TIME AN ITEM IS EXPECTED TO FUNCTION BEFORE FAILING.
MEAN TIME TO FAILURE (MTTF)
The mean time between failures (MTBF) is a measure of reliability for repairable equipment. A similar measure, the mean time to failure (MTTF), is useful for components such as resistors, capacitors and transistors, which are throwaway items that cannot be repaired. MTTF may be calculated from the results of life testing as follows. Let a set of $n$ items be tested until all have failed, the times to failure being $t_1, t_2, t_3, \ldots, t_i, \ldots, t_n$.
The observed MTTF is given by
$M = \frac{1}{n} \sum_{i=1}^{n} t_i$
For example, if six units were tested until failure, and the times to failure were 320, 250, 380, 290, 310 and 400 hrs, the total test time would be 1950 hrs and the MTTF would be $M = 1950/6 = 325$ hrs.
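Both observed measures reduce to simple averages, as this short sketch shows using the figures from the two examples (6 faults in 3000 hrs, and six units tested to failure). The function names are illustrative.

```python
def observed_mtbf(total_test_hours, n_faults):
    """Observed MTBF for repairable equipment: M = T / N (repair time excluded from T)."""
    return total_test_hours / n_faults


def observed_mttf(times_to_failure):
    """Observed MTTF for non-repairable items: mean of the recorded times to failure."""
    return sum(times_to_failure) / len(times_to_failure)


mtbf = observed_mtbf(3000, 6)
mttf = observed_mttf([320, 250, 380, 290, 310, 400])

print(mtbf, mttf)
```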
MAINTAINABILITY
MAINTENANCE IS THE ACTIVITY BY WHICH THE USEFUL LIFE OF AN ITEM CAN BE EXTENDED BY CARRYING OUT CORRECTIVE ACTIONS AT SPECIFIED INTERVALS.
GOOD MAINTENANCE AIMS TO KEEP PRODUCTION MACHINERY & EQUIPMENT IN EFFICIENT WORKING CONDITION ALL THE TIME.
DEFINITION: MAINTENANCE IS A COMBINATION OF ANY ACTIONS CARRIED OUT TO RETAIN AN ITEM IN, OR RESTORE IT TO, AN ACCEPTABLE STANDARD. (BRITISH SPEC. 3811, 1974)
MAINTAINABILITY IS A CHARACTERISTIC OF EQUIPMENT DESIGN & INSTALLATION WHICH IS EXPRESSED IN TERMS OF EASE & ECONOMY OF MAINTENANCE, AVAILABILITY OF THE EQUIPMENT, AND SAFETY & ACCURACY IN THE PERFORMANCE OF MAINTENANCE ACTIONS.
OBJECT OF MAINTAINABILITY
TO DESIGN & DEVELOP SYSTEMS & EQUIPMENTS WHICH CAN BE MAINTAINED AT THE LEAST TIME & AT THE LEAST COST AND WITH MINIMUM EXPENDITURE OF SUPPORTING RESOURCES WITHOUT ADVERSELY AFFECTING THE ITEM PERFORMANCE OR SAFETY.
OBJECTIVES OF MAINTENANCE
- TO EXTEND THE USEFUL LIFE
- TO ASSURE OPTIMUM AVAILABILITY OF THE INSTALLED EQUIPMENT
- TO ENSURE OPERATIONAL READINESS OF ALL EQUIPMENT REQUIRED FOR EMERGENCIES
- TO ENSURE SAFETY FOR PERSONNEL USING SUCH FACILITIES
FORMS OF MAINTENANCE
- PREVENTIVE MAINTENANCE (PM)
- CORRECTIVE MAINTENANCE (CM)
PREVENTIVE MAINTENANCE(PM)
- TUNING OR ADJUSTMENTS
- LUBRICATION
- INSPECTION
- CLEANING ETC.
A MAJOR PART OF PM INVOLVES INSPECTION BY LOOK, FEEL & LISTEN.
MAJOR ADVANTAGES OF PM
- LESS PRODUCTION DOWN TIME
- LESS OVERTIME PAY FOR MAINTENANCE FOR ORDINARY ADJUSTMENTS
- FEWER LARGE-SCALE REPAIRS
- LESS REDUNDANCY REQUIRED
- BETTER SPARE PART CONTROL (MINIMUM INVENTORY)
- GREATER SAFETY FOR MAINTENANCE STAFF & WORKING STAFF
- LOWER UNIT COST OF MANUFACTURE
CORRECTIVE MAINTENANCE (CM)

CARRIED OUT ONLY WHEN IT IS NECESSARY DUE TO MALFUNCTION OR FAILURE
FACTORS AFFECTING MAINTAINABILITY

- DESIGN: RELIABILITY, COMPLEXITY, INTERCHANGEABILITY, REPLACEABILITY, COMPATIBILITY, VISIBILITY & CONFIGURATION
- INSTALLATION: GENERALLY RELATES TO HUMAN FACTORS SUCH AS EXPERIENCE, TRAINING, SKILL & SUPERVISION
Maintainability
In telecommunication and several other engineering fields, the term maintainability has the following meanings:
1. A characteristic of design and installation, expressed as the probability that an item will be retained in or restored to a specified condition within a given period of time, when the maintenance is performed in accordance with prescribed procedures and resources.
2. The ease with which maintenance of a functional unit can be performed in accordance with prescribed requirements.
Maintainability is defined as the probability of performing a successful repair action within a given time. In other words, maintainability measures the ease and speed with which a system can be restored to operational status after a failure occurs. For example, if it is said that a particular component has a 90% maintainability in one hour, this means that there is a 90% probability that the component will be repaired within an hour. In maintainability, the random variable is time-to-repair, in the same manner as time-to-failure is the random variable in reliability.
What one chooses to include in the time-to-repair varies but can include:
1. The time it takes to successfully diagnose the cause of the failure.
2. The time it takes to procure or deliver the parts necessary to perform the repair.
3. The time it takes to gain access to the failed part or parts.
4. The time it takes to remove the failed components and replace them with functioning ones.
5. The time involved with bringing the system back to operating status.
6. The time it takes to verify that the system is functioning within specifications.
7. The time associated with "closing up" a system and returning it to normal operation.
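To make the "90% maintainability in one hour" statement concrete, the sketch below assumes that time-to-repair follows an exponential distribution, so that $M(t) = 1 - e^{-t/\text{MTTR}}$. That distribution is an assumption made here purely for illustration (the slides do not specify one), and the function names are hypothetical.

```python
import math


def maintainability(t_hours, mttr_hours):
    """P(repair completed within t_hours), assuming exponential time-to-repair."""
    return 1.0 - math.exp(-t_hours / mttr_hours)


def mttr_for_target(t_hours, target_probability):
    """Mean time to repair needed to reach target_probability within t_hours."""
    return -t_hours / math.log(1.0 - target_probability)


# Mean repair time consistent with 90% maintainability in one hour (exponential assumption)
mttr = mttr_for_target(1.0, 0.90)
print(round(mttr, 3))                        # hours
print(round(maintainability(1.0, mttr), 2))  # back to the 0.90 target
```

Under this assumption a 90% chance of repair within an hour corresponds to a mean repair time of well under half an hour, which shows that the maintainability figure and the mean repair time are related but not interchangeable.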
AVAILABILITY
SUCCESS CAN BE ACHIEVED BY PROVIDING VERY HIGH SYSTEM RELIABILITY, WHICH RESULTS IN A REDUCED PROBABILITY OF FAILURE. AVAILABILITY IS THE COMBINATION OF TWO ELEMENTS:
1. RELIABILITY: SYSTEM CAPABILITY OF SURVIVAL
2. MAINTAINABILITY: SYSTEM CAPABILITY OF REPAIR
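Availability: $A = \frac{U}{U + D}$, where $U$ is the up time, during which the machine is in working order, and $D$ is the down time, during which the machine is faulty and being repaired.
In terms of the MTBF $M$ and the mean repair time $R$:
$A = \frac{M}{M + R} = \frac{1/\lambda}{1/\lambda + 1/\mu} = \frac{\mu}{\lambda + \mu}$
where $\lambda$ is the failure rate ($\lambda = 1/M$, or $M = 1/\lambda$) and $\mu$ is the repair rate ($\mu = 1/R$, or $R = 1/\mu$).
Unavailability: $B = \frac{D}{U + D} = \frac{R}{M + R} = \frac{\lambda}{\lambda + \mu}$
$A + B = \frac{U}{U + D} + \frac{D}{U + D} = 1$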
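The equivalence of the time-based and rate-based forms can be checked numerically. The MTBF and mean repair time below are hypothetical values chosen for illustration, and the function names are not from the original.

```python
def availability_from_times(mtbf_hours, mean_repair_hours):
    """A = M / (M + R)."""
    return mtbf_hours / (mtbf_hours + mean_repair_hours)


def availability_from_rates(failure_rate, repair_rate):
    """A = mu / (lambda + mu), the same quantity written with rates."""
    return repair_rate / (failure_rate + repair_rate)


# Hypothetical equipment: MTBF of 500 hrs, mean repair time of 10 hrs
m, r = 500.0, 10.0
a_times = availability_from_times(m, r)
a_rates = availability_from_rates(1.0 / m, 1.0 / r)
unavailability = r / (m + r)

print(round(a_times, 4), round(a_rates, 4), round(a_times + unavailability, 4))
```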
Availability is defined as a percentage measure of the degree to which machinery and equipment is in an operable and committable state at the point in time when it is needed. This definition includes operable and committable factors that are attributable to the equipment itself, the process being performed, and the surrounding facilities and operations. It therefore incorporates all aspects of malfunctions and delays relating to equipment, process, and facility issues. If one considers both reliability (the probability that the item will not fail) and maintainability (the probability that the item is successfully restored after failure), then an additional metric is needed for the probability that the component or system is operational at a given time, t (i.e. it has not failed, or it has been restored after failure). This metric is availability.
Availability Classifications
The definition of availability is somewhat flexible and is largely based on what types of downtimes one chooses to consider in the analysis. As a result, there are a number of different classifications of availability, such as:
- Instantaneous (or Point) Availability
- Average Up-Time Availability (or Mean Availability)
- Steady State Availability
- Inherent Availability
- Achieved Availability
- Operational Availability
Instantaneous or Point Availability, A(t)
Instantaneous (or point) availability is the probability that a system (or component) will be operational (up and running) at any random time, t.
Average Uptime Availability (or Mean Availability)
The mean availability is the proportion of time during a mission or time period that the system is available for use. It represents the mean value of the instantaneous availability function over the period (0, T).
Steady State Availability
The steady state availability of the system is the limit of the instantaneous availability function as time approaches infinity. The instantaneous availability function will start approaching the steady state availability value after a time period of approximately four times the average time-to-failure.
Inherent Availability Inherent availability is the steady state availability when considering only the corrective downtime of the system. It is defined as the expected level of availability for the performance of corrective maintenance only. Inherent availability is determined purely by the design of the equipment. It assumes that spare parts and manpower are 100 percent available with no delays. It excludes logistics time, waiting or administrative downtime, and preventive maintenance downtime. It includes corrective maintenance downtime. Inherent availability is generally derived from analysis of an engineering design.
AVAILABILITY (INHERENT): THE PROBABILITY THAT A SYSTEM, WHEN USED UNDER STATED CONDITIONS, WITHOUT CONSIDERATION FOR ANY PREVENTIVE ACTION, IN AN IDEAL SUPPORT ENVIRONMENT, SHALL OPERATE SATISFACTORILY AT ANY GIVEN POINT OF TIME.
$A_i = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$
Achieved Availability The probability that an item will operate satisfactorily at a given point in time when used under stated conditions in an ideal support environment (i.e., that personnel, tools, spares, etc. are instantaneously available). It excludes logistics time and waiting or administrative downtime. It includes active preventive and corrective maintenance downtime. Achieved availability is defined as the achieved level of availability for the performance of corrective and preventive maintenance. Achieved availability is determined by the hard design of the equipment and the facility. Aa also assumes that spare parts and manpower are 100 percent available with no delays. Achieved availability is very similar to inherent availability with the exception that preventive maintenance (PM) downtimes are also included.
Operational Availability Operational availability is a measure of the average availability over a period of time and it includes all experienced sources of downtime, such as administrative downtime, logistic downtime, etc. It is the probability that an item will operate satisfactorily at a given point in time when used in an actual or realistic operating and support environment. It includes logistics time, ready time, and waiting or administrative downtime, and both preventive and corrective maintenance downtime. The operational availability is the availability that the customer actually experiences.
AVAILABILITY (OPERATIONAL)
$A_o = \frac{\text{MTBF}}{\text{MTBF} + \text{MEAN TIME WAITING FOR SPARES} + \text{ADMINISTRATIVE TIME} + \text{MEAN TIME FOR REPAIRS}}$
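The difference between inherent and operational availability is easiest to see side by side. In the sketch below all time values are hypothetical and in hours; the function and parameter names are illustrative.

```python
def inherent_availability(mtbf, mttr):
    """A_i: only corrective repair time counts as downtime."""
    return mtbf / (mtbf + mttr)


def operational_availability(mtbf, mttr, wait_for_spares, admin_time):
    """A_o: spares waiting time and administrative time are added to the downtime."""
    return mtbf / (mtbf + wait_for_spares + admin_time + mttr)


# Hypothetical equipment, all times in hours
a_i = inherent_availability(mtbf=1000.0, mttr=5.0)
a_o = operational_availability(mtbf=1000.0, mttr=5.0, wait_for_spares=15.0, admin_time=5.0)

print(round(a_i, 4), round(a_o, 4))
```

With these assumed figures the logistics and administrative delays, not the repair itself, account for most of the lost availability, which is why the operational figure is the one the customer actually experiences.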
