Reliability

Reliability engineering
Reliability engineering is engineering that emphasizes dependability in the lifecycle management of a product. Dependability, or reliability, describes the ability of a system or component to function under stated conditions for a specified period of time.[1] Reliability engineering represents a sub-discipline within systems engineering. Reliability is theoretically defined as the probability of success (one minus the probability of failure), as the frequency of failures, or in terms of availability, a probability derived from reliability and maintainability. Maintainability and maintenance may be defined as a part of reliability engineering. Reliability plays a key role in the cost-effectiveness of systems.
Although reliability is defined and affected by stochastic parameters, according to some acknowledged specialists, quality, reliability and safety are not achieved by mathematics and statistics. Nearly all teaching and literature on the subject emphasizes these aspects, and ignores the reality that the ranges of uncertainty involved largely invalidate quantitative methods for prediction and measurement.[2]
Reliability engineering relates closely to safety engineering and to system safety, in that they use common methods for their analysis and may require input from each other. Reliability engineering focuses on costs of failure caused by system downtime, cost of spares, repair equipment, personnel and cost of warranty claims. Safety engineering normally emphasizes not cost, but preserving life and nature, and therefore deals only with particular dangerous system failure modes. High reliability (safety factor) levels are here also the result of good engineering and attention to detail, and almost never the result of only re-active failure management (reliability accounting / statistics).[3]
Former US Secretary of Defense James R. Schlesinger once stated: "Reliability is, after all, engineering in its most practical form."[2]

1 Overview
Reliability engineering for complex systems requires a different, more elaborate systems approach than for non-complex systems. Reliability engineering may involve:
• Use (load) studies and requirements specification (system level analysis)
• Inherent (system) Design Reliability: Hardware and Software design, System Diagnostics design
• Functional (Failure) analysis
• Human Factors / Errors
• Manufacturing induced failures
• Maintenance induced failures
• Transport induced failures
• Storage induced failures
• Failure / reliability testing
• Spare-parts stocking (Availability control)
• Technical documentation
• Data and information acquisition/organisation
Effective reliability engineering requires understanding of the basics of failure mechanisms, for which experience, broad engineering skills and good knowledge from many different special fields of engineering are needed,[4] like:
• Tribology
• Stress (mechanics)
• Fracture mechanics / Fatigue (material)
• Thermal engineering
• Fluid mechanics / shock loading engineering
• Electrical engineering
• Chemical engineering (e.g. Corrosion)
Reliability may be defined in the following ways:
• The idea that an item is fit for a purpose with respect to time
• The capacity of a designed, produced or maintained item to perform as required over time
• The capacity of a population of designed, produced or maintained items to perform as required over specified time
• The resistance to failure of an item over time
• The probability of an item to perform a required function under stated conditions for a specified period of time
• The durability of an object.
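The probabilistic definition in the list above, and the statement that availability is derived from reliability and maintainability, can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it assumes a constant failure rate (exponential model) and uses the common steady-state formula MTBF / (MTBF + MTTR). The function names and all numbers are hypothetical and not taken from the text.

```python
import math


def reliability(t_hours: float, failure_rate_per_hour: float) -> float:
    """R(t) = exp(-lambda * t): probability of surviving t hours.

    Assumes a constant failure rate (exponential model), an illustrative
    simplification that the text does not prescribe.
    """
    if t_hours < 0 or failure_rate_per_hour < 0:
        raise ValueError("time and failure rate must be non-negative")
    return math.exp(-failure_rate_per_hour * t_hours)


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the item is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical item: 1 failure per 10,000 h, 1,000 h mission, 8 h mean repair.
print(f"R(1000 h)    = {reliability(1000.0, 1e-4):.4f}")
print(f"Availability = {steady_state_availability(10_000.0, 8.0):.5f}")
```

Both results depend entirely on the input failure rate, which, as the introduction stresses, is usually the most uncertain number involved.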
Many engineering techniques are used in reliability engineering, such as reliability hazard analysis, failure mode and effects analysis (FMEA), failure modes, mechanisms, and effects analysis (FMMEA),[5] fault tree analysis (FTA), material stress and wear calculations, fatigue and creep analysis, finite element method, reliability prediction, thermal (stress) analysis, corrosion analysis, human error analysis, reliability testing, statistical uncertainty estimations, Monte Carlo simulations, design of experiments, reliability centered maintenance (RCM), and failure reporting and corrective actions management. Because of the large number of reliability techniques, their expense, and the varying degrees of reliability required for different situations, most projects develop a reliability program plan to specify the reliability tasks that will be performed for that specific system.
Consistent with the creation of safety cases, for example ARP4761, the goal is to provide a robust set of qualitative and quantitative evidence that use of a component or system will not be associated with unacceptable risk. The basic steps to take[6] are to:
• First thoroughly identify relevant unreliability "hazards", e.g. potential conditions, events, human errors, failure modes, interactions, failure mechanisms and root causes, by specific analysis or tests
• Assess the associated system risk, by specific analysis or testing
• Propose mitigation, e.g. requirements, design changes, detection, maintenance, training, by which the risks may be lowered and controlled at an acceptable level
• Determine the best mitigation and get agreement on final, acceptable risk levels, possibly based on cost-benefit analysis
Risk is the combination of probability and severity of the failure incident (scenario) occurring.
In a de minimis definition, severity of failures includes the cost of spare parts, man hours, logistics, damage (secondary failures) and downtime of machines which may cause production loss. A more complete definition of failure can also mean injury, dismemberment and death of people within the system (witness mine accidents, industrial accidents, space shuttle failures) and the same to innocent bystanders (witness the citizenry of cities like Bhopal, Love Canal, Chernobyl or Sendai and other victims of the 2011 Tōhoku earthquake and tsunami). In this case, reliability engineering becomes system safety. What is acceptable is determined by the managing authority or customers or the affected communities. Residual risk is the risk that is left over after all reliability activities have finished. It includes the unidentified risk and is therefore not completely quantifiable.
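As a rough illustration of risk as a combination of probability and severity, the sketch below ranks a few failure scenarios by the product of the two. Taking "combination" to mean a simple product is an assumption, and the scenario names, probabilities and cost figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureScenario:
    name: str
    probability_per_year: float  # assumed estimate, usually very uncertain
    severity_cost: float         # assumed consequence expressed as a cost

    @property
    def risk(self) -> float:
        # Risk taken here as probability times severity (an assumption).
        return self.probability_per_year * self.severity_cost


# Hypothetical scenarios for illustration only.
scenarios = [
    FailureScenario("pump seal leak", 0.20, 15_000.0),
    FailureScenario("controller lock-up", 0.05, 40_000.0),
    FailureScenario("pressure vessel rupture", 0.0001, 50_000_000.0),
]

# Highest risk first, so mitigation effort can be prioritised.
ranked = sorted(scenarios, key=lambda s: s.risk, reverse=True)
for s in ranked:
    print(f"{s.name:26s} risk per year = {s.risk:10.1f}")
```

A ranking like this covers only the identified scenarios. The unidentified part of the residual risk does not appear in it at all.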
2 Reliability and availability program plan

A reliability program plan is used to document exactly what "best practices" (tasks, methods, tools, analysis and tests) are required for a particular (sub)system, as well as to clarify customer requirements for reliability assessment. For large scale, complex systems, the reliability program plan should be a separate document. Resource determination for manpower and budgets for testing and other tasks is critical for a successful program. In general, the amount of work required for an effective program for complex systems is large.
A reliability program plan is essential for achieving high levels of reliability, testability, maintainability and the resulting system availability. It is developed early during system development and refined over the system's lifecycle. It specifies not only what the reliability engineer does, but also the tasks performed by other stakeholders. A reliability program plan is approved by top program management, which is responsible for allocation of sufficient resources for its implementation.
A reliability program plan may also be used to evaluate and improve the availability of a system by focusing on increasing testability and maintainability rather than reliability. Improving maintainability is generally easier than improving reliability. Maintainability estimates (repair rates) are also generally more accurate. However, because the uncertainties in the reliability estimates are in most cases very large, they are likely to dominate the availability (prediction uncertainty) problem, even when maintainability levels are very high. When reliability is not under control, more complicated issues may arise, like manpower (maintainers / customer service capability) shortages, spare part availability, logistic delays, lack of repair facilities, extensive retrofit and complex configuration management costs, and others. The problem of unreliability may also be increased by the "domino effect" of maintenance-induced failures after repairs. Focusing only on maintainability is therefore not enough. If failures are prevented, none of the other issues are of any importance, and therefore reliability is generally regarded as the most important part of availability. Reliability needs to be evaluated and improved in relation to both availability and the cost of ownership (due to cost of spare parts, maintenance man-hours, transport costs, storage cost, part obsolescence risks, etc.). But, as GM and Toyota have belatedly discovered, total cost of ownership (TCO) also includes the downstream liability costs when reliability calculations do not sufficiently or accurately address customers' personal bodily risks. Often a trade-off is needed between the two. There might be a maximum ratio between availability and cost of ownership. Testability of a system should also be addressed in the plan, as this is the link between reliability and maintainability. The maintenance strategy can influence the reliability of a system (e.g.
by preventive and/or predictive maintenance), although it can never bring it above the inherent reliability.
The reliability plan should clearly provide a strategy for availability control. Whether only availability or also cost of ownership is more important depends on the use of the system. For example, a system that is a critical link in a production system (e.g. a big oil platform) is normally allowed to have a very high cost of ownership if this translates to even a minor increase in availability, as the unavailability of the platform results in a massive loss of revenue which can easily exceed the high cost of ownership. A proper reliability plan should always address RAMT analysis in its total context. RAMT stands in this case for reliability, availability, maintainability/maintenance and testability in the context of the customer's needs.

3 Reliability requirements
For any system, one of the first tasks of reliability engineering is to adequately specify the reliability and maintainability requirements derived from the overall availability needs and, more importantly, from proper failure analysis or preliminary test results. Requirements should constrain the designers from designing particular unreliable systems. Setting only availability (reliability, testability and maintainability) targets (e.g. maximum failure rates) is not appropriate. This is a broad misunderstanding about reliability requirements engineering. Reliability requirements address the system itself, including test and assessment requirements, and associated tasks and documentation. Reliability requirements are included in the appropriate system or subsystem requirements specifications, test plans and contract statements. Creation of proper lower level requirements is critical.
Provision of only quantitative minimum targets (e.g. MTBF values / failure rates) is not sufficient, for several reasons. One reason is that a full validation (related to correctness and verifiability in time) of a quantitative reliability allocation (requirement spec) on lower levels for complex systems can often not be made, as a consequence of 1) the fact that the requirements are probabilistic, 2) the extremely high level of uncertainties involved in showing compliance with all these probabilistic requirements, and 3) the fact that reliability is a function of time, and accurate estimates of a (probabilistic) reliability number per item are available only very late in the project, sometimes only many years after in-service use. Compare this problem with the continuous (re-)balancing of, for example, lower level system mass requirements in the development of an aircraft, which is already often a big undertaking. Notice that in this case masses differ only by a few percent, are not a function of time, and the data is non-probabilistic and already available in CAD models. In the case of reliability, the levels of unreliability (failure rates) may change by factors of decades (1000s of %) as a result of very minor deviations in design, process or anything else. The information is often not available without huge uncertainties within the development phase. This makes the allocation problem almost impossible to do in a useful, practical, valid manner which does not result in massive over- or under-specification. A pragmatic approach is therefore needed, for example the use of general levels / classes of quantitative requirements depending only on severity of failure effects. Also, the validation of results is a far more subjective task than for any other type of requirement. (Quantitative) reliability parameters, in terms of MTBF, are by far the most uncertain design parameters in any design.
Furthermore, reliability design requirements should drive a (system or part) design to incorporate features that prevent failures from occurring or limit the consequences of failure in the first place, and not only serve to make some predictions, which could potentially distract the engineering effort into a kind of accounting work. A design requirement should be precise enough that a designer can "design to" it and can also prove, through analysis or testing, that the requirement has been achieved, if possible within a stated confidence. Any type of reliability requirement should be detailed and could be derived from failure analysis (finite element stress and fatigue analysis, reliability hazard analysis, FTA, FMEA, human factor analysis, functional hazard analysis, etc.) or from any type of reliability testing. Also, requirements are needed for verification tests, e.g. required overload loads (or stresses) and test time needed. To derive these requirements in an effective manner, a systems engineering based risk assessment and mitigation logic should be used. These practical design requirements shall drive the design and not only be used for verification purposes. These requirements (often design constraints) are in this way derived from failure analysis or preliminary tests. Understanding of this difference with only pure quantitative requirement specification (e.g. failure rate / MTBF setting) is paramount in the development of successful (complex) systems.[7]
The maintainability requirements address the costs of repairs as well as repair time. Testability (not to be confused with test requirements) requirements provide the link between reliability and maintainability and should address detectability of failure modes (on a particular system level), isolation levels and the creation of diagnostics (procedures).
As indicated above, reliability engineers should also address requirements for various reliability tasks and documentation during system development, test, production, and operation. These requirements are generally specified in the contract statement of work and depend on how much leeway the customer wishes to provide to the contractor. Reliability tasks include various analyses, planning, and failure reporting. Task selection depends on the criticality of the system as well as cost. A safety critical system may require a formal failure reporting and review process throughout development, whereas a non-critical system may rely on final test reports. The most common reliability program tasks are documented in reliability program standards, such as MIL-STD-785 and IEEE 1332. Failure reporting analysis and corrective action systems are a common approach for product/process reliability monitoring.

4 Reliability culture
In practice, most failures can ultimately be traced back to root causes involving human error of some kind. For example, human errors in:
• Use studies
• Requirement analysis / setting
• Configuration control
• Assumptions
• Calculations / simulations / FEM analysis
• Design
• Design drawings
• Testing (incorrect load settings or failure measurement)
• Statistical analysis
• Manufacturing
• Quality control
• Maintenance
• Maintenance manuals
• Incorrect feedback of information
• etc.
However, humans are also very good at detecting (the same) failures, correcting failures and improvising when abnormal situations occur. The policy that human actions should be completely ruled out of any design and production process to improve reliability may therefore not be effective. Some tasks are better performed by humans and some are better performed by machines.[8]
Furthermore, human errors in management and in the organization of data and information, or the misuse or abuse of items, may also contribute to unreliability. This is the core reason why high levels of reliability for complex systems can only be achieved by following a robust systems engineering process with proper planning and execution of the validation and verification tasks. This also includes careful organization of data and information sharing and creating a "reliability culture", in the same sense that having a "safety culture" is paramount in the development of safety critical systems.

5 Design for reliability
Reliability design begins with the development of a (system) model. Reliability and availability models use block diagrams and fault trees to provide a graphical means of evaluating the relationships between different parts of the system. These models may incorporate predictions based on failure rates taken from historical data. While the (input data) predictions are often not accurate in an absolute sense, they are valuable for assessing relative differences in design alternatives. Maintainability parameters, for example MTTR, are other inputs for these models.
The most important fundamental initiating causes and failure mechanisms are to be identified and analyzed with engineering tools. A diverse set of practical guidance and practical performance and reliability requirements should be provided to designers so they can generate low-stressed designs and products that protect or are protected against damage and excessive wear. Proper validation of input loads (requirements) may be needed, as well as verification of reliability performance by testing.
Figure: A fault tree diagram
One of the most important design techniques is redundancy. This means that if one part of the system fails, there is an alternate success path, such as a backup system. The reason why this is the ultimate design choice is that high-confidence reliability evidence for new parts or items is often not available or is extremely expensive to obtain. By creating redundancy, together with a high level of failure monitoring and the avoidance of common cause failures, even a system with relatively poor single-channel (part) reliability can be made highly reliable (mission reliability) at system level. No testing of reliability has to be required for this. Furthermore, by using redundancy together with dissimilar design and manufacturing processes (different suppliers) for the single independent channels, sensitivity to quality issues (early childhood failures) is reduced, and very high levels of reliability can be achieved at all moments of the development cycles (early life times and long term). Redundancy can also be applied in systems engineering by double checking requirements, data, designs, calculations, software and tests to overcome systematic failures.
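The effect of redundancy on mission reliability can be sketched with the standard k-out-of-n formula. This is a minimal illustration, assuming identical and statistically independent channels (no common cause failures) and perfect failure detection and switching. The function name and the 0.90 channel reliability are hypothetical.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, channel_reliability: float) -> float:
    """Probability that at least k of n channels survive the mission.

    Assumes identical, independent channels; common cause failures, which
    the text says must be avoided, would make the real figure lower.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if not 0.0 <= channel_reliability <= 1.0:
        raise ValueError("channel reliability must be a probability")
    r = channel_reliability
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


single = 0.90  # hypothetical, relatively poor single-channel reliability
print(f"single channel: {single:.3f}")
print(f"1oo3 redundant: {k_out_of_n_reliability(1, 3, single):.3f}")
print(f"2oo3 voting:    {k_out_of_n_reliability(2, 3, single):.3f}")
```

Under these assumptions three 0.90 channels in a 1-out-of-3 arrangement give 1 - 0.1^3 = 0.999, which illustrates why redundancy can compensate for weak evidence about individual parts.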
Another design technique to prevent failures is called physics of failure. This technique relies on understanding the physical static and dynamic failure mechanisms. It accounts for variation in load, strength and stress leading to failure at a high level of detail, made possible by modern finite element method (FEM) software programs that can handle complex geometries and mechanisms like creep, stress relaxation, fatigue and probabilistic design (Monte Carlo simulations / DOE). The material or component can be re-designed to reduce the probability of failure and to make it more robust against variation. Another common design technique is component derating: selecting components whose tolerance significantly exceeds the expected stress, such as using a heavier gauge wire that exceeds the normal specification for the expected electrical current.
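A derating check for the wire example above can be sketched as a simple stress-ratio comparison. The 50% limit and the current ratings below are hypothetical values chosen for illustration. Real derating limits come from the applicable design guideline, which the text does not specify.

```python
def stress_ratio(applied: float, rated: float) -> float:
    """Fraction of the component's rating that the expected stress uses."""
    if rated <= 0:
        raise ValueError("rating must be positive")
    return applied / rated


def is_adequately_derated(applied: float, rated: float,
                          max_stress_ratio: float = 0.5) -> bool:
    """True if the expected stress stays within the (assumed) derating limit."""
    return stress_ratio(applied, rated) <= max_stress_ratio


expected_current_a = 8.0  # hypothetical expected load current in amperes

# Hypothetical candidate wires and their rated currents in amperes.
for name, rated_a in [("standard gauge", 10.0), ("heavier gauge", 20.0)]:
    ratio = stress_ratio(expected_current_a, rated_a)
    verdict = "ok" if is_adequately_derated(expected_current_a, rated_a) else "too highly stressed"
    print(f"{name}: stress ratio {ratio:.2f}, {verdict}")
```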
Another effective way to deal with unreliability issues is to perform analysis that predicts degradation, so that unscheduled down events / failures can be prevented from occurring. RCM (Reliability Centered Maintenance) programs can be used for this.
Many tasks, techniques and analyses are specific to particular industries and applications. Commonly these include:
• Built-in test (BIT) (testability analysis)
• Failure mode and effects analysis (FMEA)
• Reliability hazard analysis
• Reliability block-diagram analysis
• Dynamic reliability block-diagram analysis[9]
• Fault tree analysis
• Root cause analysis
• Sneak circuit analysis
• Accelerated testing
• Reliability growth analysis (re-active reliability)
• Weibull analysis (re-active reliability)
• Thermal analysis by finite element analysis (FEA) and / or measurement
• Thermal induced, shock and vibration fatigue analysis by FEA and / or measurement
• Electromagnetic analysis
• Statistical interference
• Avoidance of single point of failure
• Functional analysis and functional failure analysis (e.g., function FMEA, FHA or FFA)
• Predictive and preventive maintenance: reliability centered maintenance (RCM) analysis
• Testability analysis
• Failure diagnostics analysis (normally also incorporated in FMEA)
• Human error analysis
• Operational hazard analysis
• Manual screening
• Integrated logistics support
Results are presented during the system design reviews and logistics reviews. Reliability is just one requirement among many system requirements. Engineering trade studies are used to determine the optimum balance between reliability and other requirements and constraints.

6 Reliability prediction and improvement
Reliability prediction is the combination of creating a proper reliability model, estimating (and justifying) the input parameters for this model (like failure rates for a particular failure mode or event and the mean time to repair the system for a particular failure), and finally providing a system (or part) level estimate for the output reliability parameters (system availability or a particular functional failure frequency).
Some recognized reliability engineering specialists, e.g. Patrick O'Connor and R. Barnard, have argued that too much emphasis is often given to the prediction of reliability parameters and that more effort should be devoted to the prevention of failure (reliability improvement).[2] Failures can and should be prevented in the first place in most cases. The emphasis on quantification and target setting in terms of (e.g.) MTBF might suggest that there is a limit to the amount of reliability that can be achieved. In theory there is no inherent limit, and higher reliability does not need to be more costly in development. Another of their arguments is that prediction of reliability based on historic data can be very misleading, as a comparison is only valid for exactly the same designs, products, manufacturing processes and maintenance under exactly the same loads and environmental context. Even a minor change in detail in any of these could have major effects on reliability. Furthermore, the most unreliable and important items (the most interesting candidates for a reliability investigation) are normally the ones most often subjected to modifications and changes. Engineering designs are in most industries updated frequently. This is the reason why the standard (re-active or pro-active) statistical methods and processes as used in the medical industry or insurance branch are not as effective for engineering. Another surprising but logical argument is that, to be able to accurately predict reliability by testing, the exact mechanisms of failure must in most cases be known, and they can therefore in most cases be prevented. Following the incorrect route of trying to quantify and solve a complex reliability engineering problem in terms of MTBF or probability, using the re-active approach, is referred to by Barnard as "Playing the Numbers Game" and is regarded as bad practice.[3]
For existing systems, it is arguable that responsible programs would directly analyse and try to correct the root cause of discovered failures, and may thereby render the initial MTBF estimate fully invalid, as new assumptions (subject to high error levels) about the effect of the patch/redesign must be made. Another practical issue concerns a general lack of availability of detailed failure data, inconsistent filtering of failure (feedback) data, or ignoring statistical errors, which are very high for rare events (like reliability related failures). Very clear guidelines must be present to be able to count and compare failures related to different types of root causes (e.g. manufacturing-, maintenance-, transport-, system-induced or inherent design failures). Comparing different types of causes may lead to incorrect estimations and incorrect business decisions about the focus of improvement.
Performing a proper quantitative reliability prediction for systems may be difficult and may be very expensive if done by testing. At part level, results can often be obtained with higher confidence, as many samples might be used for the available testing budget. Unfortunately, these tests might lack validity at system level due to the assumptions that had to be made for part level testing. These authors argue that it cannot be emphasized enough that testing for reliability should be done to create failures in the first place, learn from them and improve the system / part. The general conclusion is drawn that an accurate and absolute prediction of reliability, by field data comparison or testing, is in most cases not possible. An exception might be failures due to wear-out problems like fatigue failures. In the introduction of MIL-STD-785 it is written that reliability prediction should be used with great caution if not only used for comparison in trade-off studies.
See also: Risk Assessment#Quantitative risk assessment (Critics paragraph)

7 Reliability theory
Main articles: Reliability theory, Failure rate and Survival analysis
Reliability is defined as the probability that a device will perform its intended function during a specified period of time under stated conditions. Mathematically, this may be expressed as,
$$R(t) = \Pr\{T > t\} = \int_t^{\infty} f(x)\,dx$$
where f(x) is the failure probability density function and t is the length of the period of time (which is assumed to start from time zero).
There are a few key elements of this definition:
1. Reliability is predicated on "intended function": Generally, this is taken to mean operation without failure. However, even if no individual part of the system fails, but the system as a whole does not do what was intended, then it is still charged against the system reliability. The system requirements specification is the criterion against which reliability is measured.
2. Reliability applies to a specified period of time. In practical terms, this means that a system has a specified chance that it will operate without failure before time t. Reliability engineering ensures that components and materials will meet the requirements during the specified time. Units other than time may sometimes be used.
3. Reliability is restricted to operation under stated (or explicitly defined) conditions. This constraint is necessary because it is impossible to design a system for unlimited conditions. A Mars Rover will have different specified conditions than a family car. The operating environment must be addressed during design and testing. That same rover may be required to operate in varying conditions requiring additional scrutiny.

8 Quantitative system reliability parameters theory
Quantitative requirements are specified using reliability parameters. The most common reliability parameter is the mean time to failure (MTTF), which can also be specified as the failure rate (this is expressed as a frequency or conditional probability density function (PDF)) or the number of failures during a given period. These parameters may be useful for higher system levels and systems that are operated frequently, such as most vehicles, machinery, and electronic equipment. Reliability increases as the MTTF increases. The MTTF is usually specified in hours, but can also be used with other units of measurement, such as miles or cycles. Using MTTF values
on lower system levels can be very misleading, especially if the failure modes and mechanisms concerned (the F in MTTF) are not specified with it.[10]
In other cases, reliability is specified as the probability of mission success. For example, reliability of a scheduled aircraft flight can be specified as a dimensionless probability or a percentage, as in system safety engineering.
A special case of mission success is the single-shot device or system. These are devices or systems that remain relatively dormant and only operate once. Examples include automobile airbags, thermal batteries and missiles. Single-shot reliability is specified as a probability of one-time success, or is subsumed into a related parameter. Single-shot missile reliability may be specified as a requirement for the probability of a hit. For such systems, the probability of failure on demand (PFD) is the reliability measure, which actually is an unavailability number. This PFD is derived from failure rate (a frequency of occurrence) and mission time for non-repairable systems. For repairable systems, it is obtained from failure rate, mean-time-to-repair (MTTR) and test interval. This measure may not be unique for a given system, as it depends on the kind of demand. In addition to system level requirements, reliability requirements may be specified for critical subsystems. In most cases, reliability parameters are specified with appropriate statistical confidence intervals.

9 Reliability modelling
Reliability modelling is the process of predicting or understanding the reliability of a component or system prior to its implementation. Two types of analysis that are often used to model the availability behavior of a complete system (including effects from logistics issues like spare part provisioning, transport and manpower) are fault tree analysis and reliability block diagrams. At component level the same types of analysis can be used together with others. The input for the models can come from many sources: testing, earlier operational experience, field data, or data handbooks from the same or mixed industries. In all cases, the data must be used with great caution, as predictions are only valid when the same product is used in the same context. Often predictions are only made to compare alternatives.
Figure: A reliability block diagram showing a 1oo3 (1 out of 3) redundant designed subsystem
For part level predictions, two separate fields of investigation are common:
• The physics of failure approach uses an understanding of the physical failure mechanisms involved, such as mechanical crack propagation or chemical corrosion degradation or failure;
• The parts stress modelling approach is an empirical method for prediction based on counting the number and type of components of the system, and the stress they undergo during operation.
Software reliability is a more challenging area that must be considered when it is a considerable component of system functionality.

10 Reliability test requirements
10 Reliability test requirements

Reliability test requirements can follow from any analysis for which the first estimate of failure probability, failure mode or effect needs to be justified. Evidence can
be generated with some level of confidence by testing.
With software-based systems, the probability is a mix of
software and hardware-based failures. Testing reliability
requirements is problematic for several reasons. A single test is in most cases insufficient to generate enough
statistical data. Multiple tests or long-duration tests are
usually very expensive. Some tests are simply impractical, and environmental conditions can be hard to predict
over a system's life-cycle.
Reliability engineering is used to design a realistic and
affordable test program that provides empirical evidence
that the system meets its reliability requirements. Statistical confidence levels are used to address some of these
concerns. A certain parameter is expressed along with a
corresponding confidence level: for example, an MTBF
of 1000 hours at 90% confidence level. From this specification, the reliability engineer can, for example, design
a test with explicit criteria for the number of hours and
number of failures until the requirement is met or failed.
Different sorts of tests are possible.
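To make the "MTBF of 1000 hours at 90% confidence" example concrete, the sketch below computes how many total test hours a time-terminated demonstration test would need for a given number of allowed failures. It assumes a constant failure rate, so that the number of failures in the test follows a Poisson distribution; the function names and the bisection approach are illustrative choices, not a method taken from the text. With zero allowed failures, the result reduces to T = -MTBF * ln(1 - C).

```python
import math


def poisson_cdf(c, m):
    """P(N <= c) for a Poisson-distributed failure count N with mean m."""
    term = math.exp(-m)
    total = term
    for k in range(1, c + 1):
        term *= m / k
        total += term
    return total


def required_test_hours(mtbf, confidence, allowed_failures=0):
    """Total test hours so that passing with at most `allowed_failures`
    failures demonstrates `mtbf` at the given confidence level.

    Solves P(N <= allowed_failures | mean = T / mtbf) = 1 - confidence for T.
    """
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while poisson_cdf(allowed_failures, hi) > target:
        hi *= 2.0
    for _ in range(200):  # bisection on the expected number of failures
        mid = (lo + hi) / 2.0
        if poisson_cdf(allowed_failures, mid) > target:
            lo = mid
        else:
            hi = mid
    return mtbf * (lo + hi) / 2.0


if __name__ == "__main__":
    for failures in (0, 1, 2):
        hours = required_test_hours(1000.0, 0.90, failures)
        print(failures, "allowed failures:", round(hours), "test hours")
```

Under these assumptions, demonstrating 1000 hours MTBF at 90% confidence takes about 2303 failure-free test hours, and more if failures are to be tolerated, which illustrates why such tests are expensive.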
The combination of required reliability level and required
confidence level greatly affects the development cost and
the risk to both the customer and producer. Care is
needed to select the best combination of requirements,
e.g. with respect to cost-effectiveness. Reliability testing may be performed at various levels, such as component, subsystem
and system. Also, many factors must be addressed during
testing and operation, such as extreme temperature and
humidity, shock, vibration, or other environmental factors (like loss of signal, cooling or power; or other catastrophes such as fire, floods, excessive heat, physical or
security violations or other myriad forms of damage or
degradation). For systems that must last many years, accelerated life tests may be needed.
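The risk to customer and producer mentioned above can be quantified for a simple fixed-duration test plan: run for a fixed number of hours and accept if no more than a given number of failures occur. The sketch below assumes a constant failure rate (Poisson failure counts). The plan parameters, the function names and the choice of a "design" MTBF twice the required value are hypothetical.

```python
import math


def poisson_cdf(c, m):
    """P(N <= c) for a Poisson-distributed failure count N with mean m."""
    term = math.exp(-m)
    total = term
    for k in range(1, c + 1):
        term *= m / k
        total += term
    return total


def acceptance_probability(true_mtbf, test_hours, max_failures):
    """Probability that a system with the given true MTBF passes the test."""
    return poisson_cdf(max_failures, test_hours / true_mtbf)


def plan_risks(required_mtbf, design_mtbf, test_hours, max_failures):
    """Return (producer_risk, consumer_risk) for a fixed-duration test plan.

    consumer risk: a system that only just has the required MTBF is accepted.
    producer risk: a system that achieves the design MTBF is rejected.
    """
    consumer = acceptance_probability(required_mtbf, test_hours, max_failures)
    producer = 1.0 - acceptance_probability(design_mtbf, test_hours, max_failures)
    return producer, consumer


if __name__ == "__main__":
    producer, consumer = plan_risks(
        required_mtbf=1000.0, design_mtbf=2000.0, test_hours=3890.0, max_failures=1
    )
    print("producer risk:", round(producer, 3))
    print("consumer risk:", round(consumer, 3))
```

In this hypothetical plan the customer's risk is held near 10%, but a design with twice the required MTBF would still be rejected more often than not. Lowering both risks at once requires more test time, which is the cost trade-off the paragraph describes.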
11 Reliability testing

[Figure: A reliability sequential test plan]

The purpose of reliability testing is to discover potential problems with the design as early as possible and, ultimately, provide confidence that the system meets its reliability requirements.

Reliability testing may be performed at several levels and there are different types of testing. Complex systems may be tested at component, circuit board, unit, assembly, subsystem and system levels [11]. (The test level nomenclature varies among applications.) For example, performing environmental stress screening tests at lower levels, such as piece parts or small assemblies, catches problems before they cause failures at higher levels. Testing proceeds during each level of integration through full-up system testing, developmental testing, and operational testing, thereby reducing program risk. However, testing does not mitigate unreliability risk.

With each test, both a statistical type 1 and a type 2 error can be made; their probabilities depend on sample size, test time, assumptions and the needed discrimination ratio. Taking the hypothesis under test to be that the design meets its requirement, there is the risk of incorrectly rejecting a good design (type 1 error, the producer's risk) and the risk of incorrectly accepting a bad design (type 2 error, the consumer's risk).

It is not always feasible to test all system requirements. Some systems are prohibitively expensive to test; some failure modes may take years to observe; some complex interactions result in a huge number of possible test cases; and some tests require the use of limited test ranges or other resources. In such cases, different approaches to testing can be used, such as (highly) accelerated life testing, design of experiments, and simulations.

The desired level of statistical confidence also plays a role in reliability testing. Statistical confidence is increased by increasing either the test time or the number of items tested. Reliability test plans are designed to achieve the specified reliability at the specified confidence level with the minimum number of test units and test time. Different test plans result in different levels of risk to the producer and consumer. The desired reliability, statistical confidence, and risk levels for each side influence the ultimate test plan. The customer and developer should agree in advance on how reliability requirements will be tested.

A key aspect of reliability testing is to define "failure". Although this may seem obvious, there are many situations where it is not clear whether a failure is really the fault of the system. Variations in test conditions, operator differences, weather and unexpected situations create differences between the customer and the system developer. One strategy to address this issue is to use a scoring conference process. A scoring conference includes representatives from the customer, the developer, the test organization, the reliability organization, and sometimes independent observers. The scoring conference process is defined in the statement of work. Each test case is considered by the group and "scored" as a success or failure. This scoring is the official result used by the reliability engineer.

As part of the requirements phase, the reliability engineer develops a test strategy with the customer. The test strategy makes trade-offs between the needs of the reliability organization, which wants as much data as possible, and constraints such as cost, schedule and available resources. Test plans and procedures are developed for each reliability test, and results are documented.

11.1 Accelerated testing

The purpose of accelerated life testing (ALT) is to induce field failure in the laboratory at a much faster rate by providing a harsher, but nonetheless representative, environment. In such a test, the product is expected to fail in the lab just as it would have failed in the field, but in much less time. The main objective of an accelerated test is either of the following:

To discover failure modes
To predict the normal field life from the high stress lab life

An accelerated testing program can be broken down into the following steps:

Define objective and scope of the test
Collect required information about the product
Identify the stress(es)
Determine level of stress(es)
Conduct the accelerated test and analyze the collected data.

Common ways to determine a life-stress relationship are:

Arrhenius model
Eyring model
Inverse power law model
Temperature-humidity model
Temperature non-thermal model
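As an illustration of the first model in the list, the sketch below computes an Arrhenius acceleration factor, AF = exp((Ea / k) * (1/T_use - 1/T_stress)), with temperatures in kelvin and k the Boltzmann constant, and uses it to scale a lab life to a predicted field life. The activation energy of 0.7 eV, the temperatures and the function names are assumptions for illustration; in practice the activation energy has to be established for the specific failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(activation_energy_ev, use_temp_c, stress_temp_c):
    """Ratio of life at use temperature to life at stress temperature."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


def predicted_field_life(lab_life_hours, acceleration_factor):
    """Scale a life observed under stress to the expected life under use conditions."""
    return lab_life_hours * acceleration_factor


if __name__ == "__main__":
    # Hypothetical test: 0.7 eV activation energy, 40 C in the field, 85 C in the lab.
    af = arrhenius_acceleration_factor(0.7, use_temp_c=40.0, stress_temp_c=85.0)
    print("acceleration factor:", round(af, 1))
    print("field life for 1000 lab hours:", round(predicted_field_life(1000.0, af)))
```

The extrapolation is only meaningful if the stress level excites the same failure mechanism as field use, which is why the steps above call for a harsher but still representative environment.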

12 Software reliability
Further information: Software reliability

Software reliability is a special aspect of reliability engineering. System reliability, by definition, includes all parts of the system, including hardware, software, supporting infrastructure (including critical external interfaces), operators and procedures. Traditionally, reliability engineering focuses on critical hardware parts of the system. Since the widespread use of digital integrated circuit technology, software has become an increasingly critical part of most electronics and, hence, nearly all present day systems.

There are significant differences, however, in how software and hardware behave. Most hardware unreliability is the result of a component or material failure that results in the system not performing its intended function. Repairing or replacing the hardware component restores the system to its original operating state. However, software does not fail in the same sense that hardware fails. Instead, software unreliability is the result of unanticipated results of software operations. Even relatively small software programs can have astronomically large combinations of inputs and states that are infeasible to exhaustively test. Restoring software to its original state only works until the same combination of inputs and states results in the same unintended result. Software reliability engineering must take this into account.

Despite this difference in the source of failure between software and hardware, several software reliability models based on statistics have been proposed to quantify what we experience with software: the longer software is run, the higher the probability that it will eventually be used in an untested manner and exhibit a latent defect that results in a failure (Shooman 1987), (Musa 2005), (Denney 2005).

As with hardware, software reliability depends on good requirements, design and implementation. Software reliability engineering relies heavily on a disciplined software engineering process to anticipate and design against unintended consequences. There is more overlap between software quality engineering and software reliability engineering than between hardware quality and reliability. A good software development plan is a key aspect of the software reliability program. The software development plan describes the design and coding standards, peer reviews, unit tests, configuration management, software metrics and software models to be used during software development.

A common reliability metric is the number of software faults, usually expressed as faults per thousand lines of code. This metric, along with software execution time, is key to most software reliability models and estimates. The theory is that software reliability increases as the number of faults (or fault density) decreases. Establishing a direct connection between fault density and mean-time-between-failure is difficult, however, because of the way software faults are distributed in the code, their severity, and the probability of the combination of inputs necessary to encounter the fault. Nevertheless, fault density serves as a useful indicator for the reliability engineer. Other software metrics, such as complexity, are also used. This metric remains controversial, since changes in software development and verification practices can have a dramatic impact on overall defect rates.

Testing is even more important for software than hardware. Even the best software development process results in some software faults that are nearly undetectable until tested. As with hardware, software is tested at several levels, starting with individual units, through integration and full-up system testing. Unlike hardware, it is inadvisable to skip levels of software testing. During all phases of testing, software faults are discovered, corrected, and re-tested. Reliability estimates are updated based on the fault density and other metrics. At a system level, mean-time-between-failure data can be collected and used to estimate reliability. Unlike hardware, performing exactly the same test on exactly the same software configuration does not provide increased statistical confidence. Instead, software reliability uses different metrics, such as code coverage.

Eventually, the software is integrated with the hardware in the top-level system, and software reliability is subsumed by system reliability. The Software Engineering Institute's capability maturity model is a common means of assessing the overall software development process for reliability and quality purposes.

13 Reliability engineering vs safety engineering

Reliability engineering differs from safety engineering with respect to the kind of hazards that are considered. Reliability engineering is in the end only concerned with cost. It relates to all reliability hazards that could transform into incidents with a particular level of loss of revenue for the company or the customer. These can be costs due to loss of production due to system unavailability, unexpected high or low demands for spares, repair costs, man hours, (multiple) re-designs, interruptions to normal production (e.g. due to high repair times or due to unexpected demands for non-stocked spares) and many other indirect costs.

Safety engineering, on the other hand, is more specific and regulated. It relates only to very specific system safety hazards that could potentially lead to severe accidents, and is primarily concerned with loss of life, loss of equipment, or environmental damage. The related system functional reliability requirements are sometimes extremely high. It deals with unwanted dangerous events (for life, property, and environment) in the same sense as reliability engineering, but normally does not look directly at cost and is not concerned with repair actions after failures or accidents (at system level). Another difference is the level of impact of failures on society and the control of governments. Safety engineering is often strictly controlled by governments (e.g. nuclear, aerospace, defense, rail and oil industries).

Furthermore, safety engineering and reliability engineering may even have contradicting requirements. This relates to system level architecture choices. For example, in train signal control systems it is common practice to use a fail-safe system design concept. In this concept, wrong-side failures need to be fully controlled to an extremely low failure rate. These failures are related to possible severe effects, like frontal collisions (2* GREEN lights). Systems are designed in such a way that the vast majority of failures will simply result in a temporary or total loss of signals or open contacts of relays and generate RED lights for all trains. This is the safe state. All trains are stopped immediately. This fail-safe logic might unfortunately lower the reliability of the system. The reason for this is the higher risk of false tripping, as any full or temporary, intermittent failure is quickly latched in a shut-down (safe) state. Different solutions are available for this issue. See the section Fault tolerance below.

13.1 Fault tolerance

Reliability can be increased here by using a 2oo2 (2 out of 2) redundancy on part or system level, but this does in turn lower the safety levels (more possibilities for wrong-side and undetected dangerous failures). Fault tolerant voting systems (e.g. 2oo3 voting logic) can increase both reliability and safety on a system level. In this case the so-called "operational" or "mission" reliability as well as the safety of a system can be increased. This is also common practice in aerospace systems that need continued availability and do not have a fail-safe mode (e.g. flight computers and related electrical and/or mechanical and/or hydraulic steering functions always need to be working. There are no safe fixed positions for the rudder or other steering parts when the aircraft is flying).

13.2 Basic reliability and mission (operational) reliability

The above example of a 2oo3 fault tolerant system increases both mission reliability and safety. However, the "basic" reliability of the system will in this case still be lower than that of a non-redundant (1oo1) or 2oo2 system. Basic reliability refers to all failures, including those that might not result in system failure, but do result in maintenance repair actions, logistic cost, use of spares, etc. For example, the replacement or repair of 1 channel in a 2oo3 voting system that is still operating with one failed channel (which in this state has actually become a 1oo2 system) contributes to basic unreliability but not to mission unreliability. Also, for example, the failure of the taillight of an aircraft is not considered a mission loss failure, but does contribute to the basic unreliability.
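The difference between basic and mission reliability can be shown numerically. The sketch below assumes independent, identical channels with a perfect voter, and treats basic reliability as the probability that no channel fails at all (so that no repair action is needed). The channel reliability of 0.9 and the function names are illustrative assumptions.

```python
def mission_reliability_2oo3(r):
    """Probability that at least 2 of 3 independent channels work."""
    return 3 * r**2 * (1 - r) + r**3


def basic_reliability(r, channels):
    """Probability that none of the channels fails, i.e. no maintenance is triggered."""
    return r**channels


if __name__ == "__main__":
    r = 0.9  # hypothetical reliability of a single channel over the mission
    print("1oo1 mission/basic:", r, basic_reliability(r, 1))
    print("2oo2 mission/basic:", r**2, basic_reliability(r, 2))
    print("2oo3 mission/basic:", mission_reliability_2oo3(r), basic_reliability(r, 3))
```

With these assumed numbers the 2oo3 system has the highest mission reliability of the three architectures but the lowest basic reliability, because three channels offer more opportunities for a failure that requires repair.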

13.3 Detectability and common cause failures

When using fault tolerant (redundant) architectures or systems that are equipped with protection functions, detectability of failures and avoidance of common
cause failures become paramount for safe functioning
and/or mission reliability.

14 Reliability versus Quality (Six Sigma)

The everyday usage term "quality of a product" is loosely taken to mean its inherent degree of excellence. In industry, this is made more precise by defining quality to be "conformance to requirements at the start of use". Assuming the product specifications adequately capture customer needs, the quality level can now be precisely measured by the fraction of units shipped that meet the detailed product specifications.[12]

But are requirements and related product specifications validated? Will items later become worn, be changed by fatigue or corrosion mechanisms, or fail due to maintenance-induced failures, and how many of these systems still meet their function and fulfill customer needs after a week of operation? What performance loss do we see, or is the main function lost entirely? What happens after a month, or at the end of a one-year warranty period? That is where "reliability" comes in. Quality is a snapshot at the start of life and mainly related to control of product specifications, whereas reliability is more of a system level motion picture of the day-by-day operation. Time zero defects are manufacturing mistakes that escaped final test (Quality Control). The additional defects that appear over time are "reliability defects" or reliability fallout. These reliability issues may just as well occur due to inherent design issues, which may have nothing to do with non-conformance to product specifications. Items that are produced perfectly, according to all product specifications, may fail over time due to any failure mechanism (e.g. mechanical, electrical, chemical or human-error related). Theoretically, all items will functionally fail over infinite time.[13] The quality level might be described by a single fraction defective. To describe reliability fallout, a probability model that describes the fraction fallout over time is needed. This is known as the life distribution model.[12]
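The contrast between a single fraction defective and a life distribution model can be sketched as follows. The Weibull distribution is used here only as one commonly chosen life distribution; the text does not name a specific model, and the parameter values and function names are hypothetical.

```python
import math


def weibull_fraction_failed(t, scale, shape):
    """Cumulative fraction of units failed by time t under a Weibull life distribution."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def fraction_not_working(t, fraction_defective, scale, shape):
    """Time-zero defects (quality) plus fallout over time (reliability)
    among the units that initially conformed."""
    fallout = weibull_fraction_failed(t, scale, shape)
    return fraction_defective + (1.0 - fraction_defective) * fallout


if __name__ == "__main__":
    # Hypothetical product: 1% defective at shipment, Weibull scale 20000 h, shape 1.5.
    for label, hours in [("start of use", 0.0), ("1 week", 168.0),
                         ("1 month", 730.0), ("1 year", 8760.0)]:
        print(label, round(fraction_not_working(hours, 0.01, 20000.0, 1.5), 4))
```

The first printed value is the quality "snapshot"; the later values trace the "motion picture" that only a time-dependent model can describe.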

Quality is therefore related to manufacturing, and reliability is more related to the validation of sub-system or lower item requirements and design solutions. Items that do not conform to (any) product specification will in general do worse in terms of reliability (having a lower MTTF), but this does not always have to be the case. The full quantification (in statistical models) of this combined relation is in general very difficult. In case manufacturing variances can be effectively reduced, six sigma tools may be used to find optimal process solutions and may thereby also increase reliability. Other reliability solutions are generally found by simplifying a system, understanding all mechanisms of failure involved, increasing robustness (against variation from manufacturing variances and failure mechanisms) and possibly using redundancy and fault tolerant systems in case of high availability needs (see the section Reliability engineering vs safety engineering above).

15 Reliability operational assessment

After a system is produced, reliability engineering monitors, assesses and corrects deficiencies. Monitoring includes electronic and visual surveillance of critical parameters identified during the fault tree analysis design stage. Data collection is highly dependent on the nature of the system. Most large organizations have quality control groups that collect failure data on vehicles, equipment and machinery. Consumer product failures are often tracked by the number of returns. For systems in dormant storage or on standby, it is necessary to establish a formal surveillance program to inspect and test random samples. Any changes to the system, such as field upgrades or recall repairs, require additional reliability testing to ensure the reliability of the modification. Since it is not possible to anticipate all the failure modes of a given system, especially ones with a human element, failures will occur. The reliability program also includes a systematic root cause analysis that identifies the causal relationships involved in the failure such that effective corrective actions may be implemented. When possible, system failures and corrective actions are reported to the reliability engineering organization.

One of the most common methods to apply to a reliability operational assessment is a failure reporting, analysis, and corrective action system (FRACAS). This systematic approach develops a reliability, safety and logistics assessment based on failure/incident reporting, management, analysis and corrective/preventive actions. Organizations today are adopting this method and use commercial systems, such as a web-based FRACAS application, enabling an organization to create a failure/incident data repository from which statistics can be derived to view accurate and genuine reliability, safety and quality performance.

It is extremely important to have one common source FRACAS system for all end items. Also, test results should be able to be captured here in a practical way. Failure to adopt one integrated system that is easy to handle (easy data entry for field engineers and repair shop engineers) and easy to maintain is likely to result in a FRACAS program failure.

Some of the common outputs from a FRACAS system include: field MTBF, MTTR, spares consumption, reliability growth, and failure/incident distribution by type, location, part no., serial no., symptom, etc.

The use of past data to predict the reliability of new comparable systems/items can be misleading, as reliability is a function of the context of use and can be affected by small changes in design or manufacturing.
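Some of the FRACAS outputs listed above can be derived directly from a failure/incident repository. The sketch below is a minimal, hypothetical data model: the `FailureRecord` fields, the function names and the sample data are assumptions for illustration and do not correspond to any particular FRACAS product.

```python
from dataclasses import dataclass


@dataclass
class FailureRecord:
    part_no: str         # part number of the failed item
    repair_hours: float  # active repair time for this failure


def field_mtbf(total_operating_hours, records):
    """Field MTBF: fleet operating hours divided by the number of recorded failures."""
    return total_operating_hours / len(records)


def mttr(records):
    """Mean time to repair over all recorded failures."""
    return sum(r.repair_hours for r in records) / len(records)


def failures_by_part(records):
    """Failure distribution by part number."""
    counts = {}
    for r in records:
        counts[r.part_no] = counts.get(r.part_no, 0) + 1
    return counts


if __name__ == "__main__":
    records = [
        FailureRecord("A-1", 2.0),
        FailureRecord("A-1", 4.0),
        FailureRecord("B-7", 6.0),
    ]
    print("field MTBF:", field_mtbf(3000.0, records))
    print("MTTR:", mttr(records))
    print("failures by part:", failures_by_part(records))
```

Both averages require at least one recorded failure. As the text notes, such field figures describe the system in its actual context of use and should be transferred to new designs only with caution.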

16 Reliability organizations

Systems of any significant complexity are developed by organizations of people, such as a commercial company or a government agency. The reliability engineering organization must be consistent with the company's organizational structure. For small, non-critical systems, reliability engineering may be informal. As complexity grows, the need arises for a formal reliability function. Because reliability is important to the customer, the customer may even specify certain aspects of the reliability organization.

There are several common types of reliability organizations. The project manager or chief engineer may employ one or more reliability engineers directly. In larger organizations, there is usually a product assurance or specialty engineering organization, which may include reliability, maintainability, quality, safety, human factors, logistics, etc. In such a case, the reliability engineer reports to the product assurance manager or specialty engineering manager.

In some cases, a company may wish to establish an independent reliability organization. This is desirable to ensure that the system reliability, which is often expensive and time consuming, is not unduly slighted due to budget and schedule pressures. In such cases, the reliability engineer works for the project day-to-day, but is actually employed and paid by a separate organization within the company.

Because reliability engineering is critical to early system design, it has become common for reliability engineers, however the organization is structured, to work as part of an integrated product team.

17 Certification

The American Society for Quality has a program to become a Certified Reliability Engineer (CRE). Certification is based on education, experience, and a certification test; periodic re-certification is required. The body of knowledge for the test includes: reliability management, design evaluation, product safety, statistical tools, design and development, modeling, reliability testing, collecting and using data, etc.

Another highly respected certification program is the CRP (Certified Reliability Professional). To achieve certification, candidates must complete a series of courses focused on important reliability engineering topics, successfully apply the learned body of knowledge in the workplace and publicly present this expertise in an industry conference or journal.

18 Reliability engineering education

Some universities offer graduate degrees in reliability engineering. Other reliability engineers typically have an engineering degree, which can be in any field of engineering, from an accredited university or college program. Many engineering programs offer reliability courses, and some universities have entire reliability engineering programs. A reliability engineer may be registered as a professional engineer by the state, but this is not required by most employers. There are many professional conferences and industry training programs available for reliability engineers. Several professional organizations exist for reliability engineers, including the IEEE Reliability Society, the American Society for Quality (ASQ), and the Society of Reliability Engineers (SRE).

19 See also

Brittle systems
Burn-in
Cauchy stress tensor
Factor of safety
Failing badly
Fault tree analysis
FMEA
Fault-tolerant system
Fracture mechanics
Highly accelerated life test
Highly accelerated stress test
Human reliability
Industrial engineering
Integrated logistics support
Logistic engineering
Performance engineering
Product qualification
Professional engineer
Quality assurance
RAMS
Redundancy (engineering)
Redundancy (total quality management)
Reliability (disambiguation)
Reliability, availability and serviceability (computer hardware)
Reliability theory
Reliability theory of aging and longevity
Reliable system design
Risk assessment
Safety engineering
Safety integrity level
Security engineering
Single point of failure (SPOF)
Software engineering
Software reliability testing
Solid mechanics
Spurious trip level
Strength of materials
Structural fracture mechanics
Systems engineering
Temperature cycling

20 References

[1] Institute of Electrical and Electronics Engineers (1990). IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. New York, NY. ISBN 1-55937-079-3.
[2] O'Connor, Patrick D. T. (2002), Practical Reliability Engineering (Fourth Ed.), John Wiley & Sons, New York. ISBN 978-0-4708-4462-5.
[3] Barnard, R.W.A. (2008). "What is wrong with Reliability Engineering?". Lambda Consulting. Retrieved 30 October 2014.
[4] Articles - Where Do Reliability Engineers Come From? - ReliabilityWeb.com: A Culture of Reliability.
[5] Using Failure Modes, Mechanisms, and Effects Analysis in Medical Device Adverse Event Investigations, S. Cheng, D. Das, and M. Pecht, ICBO: International Conference on Biomedical Ontology, Buffalo, NY, July 26-30, 2011, pp. 340-345.
[6] Federal Aviation Administration (19 March 2013). System Safety Handbook (PDF). U.S. Department of Transportation. Retrieved 2 June 2013.
[7] System Reliability Theory, second edition, Rausand and Hoyland - 2004.
[8] The Blame Machine, Why Human Error Causes Accidents - Whittingham, 2007.
[9] Salvatore Distefano, Antonio Puliafito: Dependability Evaluation with Dynamic Reliability Block Diagrams and Dynamic Fault Trees. IEEE Trans. Dependable Sec. Comput. 6(1): 4-17 (2009).
[10] Practical Reliability Engineering, O'Connor, 2001.
[11] Ben-Gal I., Herer Y. and Raz T. (2003). Self-correcting inspection procedure under inspection errors. IIE Transactions on Quality and Reliability, 34(6), pp. 529-540.
[12] 8.1.1.1. Quality versus reliability.
[13] The Second Law of Thermodynamics, Evolution, and Probability.

21 Further reading

Blanchard, Benjamin S. (1992), Logistics Engineering and Management (Fourth Ed.), Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
Breitler, Alan L. and Sloan, C. (2005), Proceedings of the American Institute of Aeronautics and Astronautics (AIAA) Air Force T&E Days Conference, Nashville, TN, December, 2005: System Reliability Prediction: towards a General Approach Using a Neural Network.
Ebeling, Charles E., (1997), An Introduction to Reliability and Maintainability Engineering, McGraw-Hill Companies, Inc., Boston.
Denney, Richard (2005) Succeeding with Use Cases: Working Smart to Deliver Quality. Addison-Wesley Professional Publishing. ISBN . Discusses the use of software reliability engineering in use case driven software development.
Gano, Dean L. (2007), Apollo Root Cause Analysis (Third Edition), Apollonian Publications, LLC., Richland, Washington.
Holmes, Oliver Wendell, Sr. The Deacon's Masterpiece.
Kapur, K.C., and Lamberson, L.R., (1977), Reliability in Engineering Design, John Wiley & Sons, New York.
Kececioglu, Dimitri, (1991) Reliability Engineering Handbook, Prentice-Hall, Englewood Cliffs, New Jersey.
Trevor Kletz (1998) Process Plants: A Handbook for Inherently Safer Design, CRC. ISBN 1-56032-619-0.
Leemis, Lawrence, (1995) Reliability: Probabilistic Models and Statistical Methods, Prentice-Hall. ISBN 0-13-720517-1.
Frank Lees (2005). Loss Prevention in the Process Industries (3rd ed.). Elsevier. ISBN 978-0-7506-7555-0.
MacDiarmid, Preston; Morris, Seymour; et al., (1995), Reliability Toolkit: Commercial Practices Edition, Reliability Analysis Center and Rome Laboratory, Rome, New York.
Modarres, Mohammad; Kaminskiy, Mark; Krivtsov, Vasiliy (1999), Reliability Engineering and Risk Analysis: A Practical Guide, CRC Press, ISBN 0-8247-2000-8.
Musa, John (2005) Software Reliability Engineering: More Reliable Software Faster and Cheaper, 2nd Edition, AuthorHouse. ISBN
Neubeck, Ken (2004) Practical Reliability Analysis, Prentice Hall, New Jersey.
Neufelder, Ann Marie, (1993), Ensuring Software Reliability, Marcel Dekker, Inc., New York.
O'Connor, Patrick D. T. (2002), Practical Reliability Engineering (Fourth Ed.), John Wiley & Sons, New York. ISBN 978-0-4708-4462-5.
Shooman, Martin, (1987), Software Engineering: Design, Reliability, and Management, McGraw-Hill, New York.
Tobias, Trindade, (1995), Applied Reliability, Chapman & Hall/CRC, ISBN 0-442-00469-9.
Springer Series in Reliability Engineering.
Nelson, Wayne B., (2004), Accelerated Testing: Statistical Models, Test Plans, and Data Analysis, John Wiley & Sons, New York, ISBN 0-471-69736-2.
Bagdonavicius, V., Nikulin, M., (2002), Accelerated Life Models. Modeling and Statistical Analysis, Chapman & Hall/CRC, Boca Raton, ISBN 1-58488-186-0.
21.1
US standards, specications, and

handbooks
Aerospace
Report
Number:
TOR2007(8583)6889 Reliability Program Requirements for Space Systems, The Aerospace
Corporation (10 Jul 2007)
DoD 3235.1-H (3rd Ed) Test and Evaluation of System Reliability, Availability, and Maintainability (A
Primer), U.S. Department of Defense (March 1982)
.
NASA GSFC 431-REF-000370 Flight Assurance
Procedure: Performing a Failure Mode and Eects
Analysis, National Aeronautics and Space Administration Goddard Space Flight Center (10 Aug 1996).
IEEE 13321998 IEEE Standard Reliability Program for the Development and Production of Electronic Systems and Equipment, Institute of Electrical
and Electronics Engineers (1998).
MIL-STD-690D Failure Rate Sampling Plans and

Procedures, U.S. Department of Defense (10 Jun
2005).
MIL-HDBK-338B Electronic Reliability Design
Handbook, U.S. Department of Defense (1 Oct
1998).
MIL-HDBK-2173 Reliability-Centered Maintenance (RCM) Requirements for Naval Aircraft,
Weapon Systems, and Support Equipment, U.S. Department of Defense (30 JAN 1998); (superseded
by NAVAIR 00-25-403).
MIL-STD-1543B Reliability Program Requirements
for Space and Launch Vehicles, U.S. Department of
Defense (25 Oct 1988).
MIL-STD-1629A Procedures for Performing a Failure Mode Eects and Criticality Analysis, U.S. Department of Defense (24 Nov 1980).
MIL-HDBK-781A Reliability Test Methods, Plans,
and Environments for Engineering Development,
Qualication, and Production, U.S. Department of
Defense (1 Apr 1996).
NSWC-06 (Part A & B) Handbook of Reliability Prediction Procedures for Mechanical Equipment,
Naval Surface Warfare Center (10 Jan 2006).
SR-332 Reliability Prediction Procedure for Electronic Equipment, Telcordia Technologies (January
2011).
FD-ARPP-01 Automated Reliability Prediction Procedure, Telcordia Technologies (January 2011).
JPL D-5703 Reliability Analysis Handbook, National Aeronautics and Space Administration Jet Propulsion Laboratory (July 1990).
MIL-STD-785B Reliability Program for Systems and Equipment Development and Production, U.S. Department of Defense (15 Sep 1980). (*Obsolete, superseded by ANSI/GEIA-STD-0009-2008 titled Reliability Program Standard for Systems Design, Development, and Manufacturing, 13 Nov 2008)
MIL-HDBK-217F Reliability Prediction of Electronic Equipment, U.S. Department of Defense (2 Dec 1991).
MIL-HDBK-217F (Notice 1) Reliability Prediction of Electronic Equipment, U.S. Department of Defense (10 Jul 1992).
MIL-HDBK-217F (Notice 2) Reliability Prediction of Electronic Equipment, U.S. Department of Defense (28 Feb 1995).
21.2 UK standards
In the UK, there are more up-to-date standards maintained under the sponsorship of UK MOD as Defence Standards. The relevant Standards include:
DEF STAN 00-40 Reliability and Maintainability (R&M)
PART 1: Issue 5: Management Responsibilities and Requirements for Programmes and Plans
PART 4: (ARMP-4) Issue 2: Guidance for Writing NATO R&M Requirements Documents
PART 6: Issue 1: IN-SERVICE R & M
PART 7 (ARMP-7) Issue 1: NATO R&M Terminology Applicable to ARMPs
DEF STAN 00-42 RELIABILITY AND MAINTAINABILITY ASSURANCE GUIDES
PART 1: Issue 1: ONE-SHOT DEVICES/SYSTEMS
PART 2: Issue 1: SOFTWARE
PART 3: Issue 2: R&M CASE
PART 4: Issue 1: Testability
PART 5: Issue 1: IN-SERVICE RELIABILITY
DEMONSTRATIONS
DEF STAN 00-43 RELIABILITY AND MAINTAINABILITY ASSURANCE ACTIVITY
PART 2: Issue 1: IN-SERVICE MAINTAINABILITY DEMONSTRATIONS
DEF STAN 00-44 RELIABILITY AND MAINTAINABILITY DATA COLLECTION AND CLASSIFICATION
PART 1: Issue 2: MAINTENANCE DATA &
DEFECT REPORTING IN THE ROYAL NAVY,
THE ARMY AND THE ROYAL AIR FORCE
PART 2: Issue 1: DATA CLASSIFICATION AND
INCIDENT SENTENCING GENERAL
PART 3: Issue 1: INCIDENT SENTENCING
SEA
PART 4: Issue 1: INCIDENT SENTENCING
LAND
DEF STAN 00-45 Issue 1: RELIABILITY CENTERED
MAINTENANCE
DEF STAN 00-49 Issue 1: RELIABILITY AND MAINTAINABILITY MOD GUIDE TO TERMINOLOGY
DEFINITIONS
These can be obtained from DSTAN. There are also many
commercial standards, produced by many organisations
including the SAE, MSG, ARP, and IEE.
21.3 French standards
FIDES. The FIDES methodology (UTE-C 80-811) is based on the physics of failures and supported by the analysis of test data, field returns and existing modelling.
UTE-C 80810 or RDF2000. The RDF2000 methodology is based on the French telecom experience.
21.4 International standards
TC 56 Standards: Dependability
22 External links
Prognostics Journal, an open-access journal, provides an international forum for the electronic publication of original research and industrial experience
articles in all areas of systems reliability and prognostics.
Models and methods regarding reliability analysis
Structural Safety
23 Text and image sources, contributors, and licenses
23.1 Text
Reliability engineering Source: http://en.wikipedia.org/wiki/Reliability_engineering?oldid=632134781 Contributors: AxelBoldt, Tedernst, Edward, Michael Hardy, Rp, Karada, Ronz, CatherineMunro, Silvonen, Nurg, Aetheling, Tom harrison, Daveplot, Jonel, Andreas
Kaufmann, Spalding, Mdd, Musiphil, Velella, Jack4875, Wyatts, Versageek, Ringbang, Woohookitty, David Haslam, Agaran, Macaddct1984, Mandarax, Rjwilmsi, Ckelloug, Chobot, DVdm, Gdrbot, Pinecar, RussBot, Sasuke Sarutobi, Xaviervd, SamJohnston, Welsh,
Jpbowen, Tony1, Eclipsed, Tribaal, Georgecmu, Nelson50, JLaTondre, Allens, SmackBot, Sonoma-rich, Slashme, WalNi, Commander
Keane bot, Thumperward, Shalom Yechiel, Frap, KaiserbBot, Allan McInnes, MichaelBillington, Kuru, Copeland.James.H, Bushsf, Petter73, SjoerdOptLand, RichardDenney, Levineps, Pqrstuv, Gavrilov, CmdrObot, David s gra, Egoadvocate, Hertzsprung, Gregbard, Mr
pand, Vincentvikram, Noclevername, Ingolfson, JAnDbot, Yocto42, MER-C, PhilKnight, Hlperng, Upholder, SunSw0rd, JaGa, Jr2349,
Mfhall, R'n'B, EdBever, Pharaoh of the Wizards, Rlsheehan, Maurice Carbonaro, Tonyshan, Bernard S. Jansen, Deevakar k, Modarres, Inwind, Rei-bot, Senarvi, Seb az86556, Michaeldsuarez, Hanwufu, Antosheryl, Vitalikk, Finnrind, Itemuk, Reinderien, SouthLake,
Sanya3, S2000magician, Melcombe, Adamantios1, Fa1275, Johnqtodd, Schbrownie, Bassplr19, J.Trew, Niemeyerstein en, Ashchori,
Niceguyedc, Rockfang, Sv1xv, PixelBot, Sun Creator, Maurizio.Cattaneo, Fede.Campana, Dthomsen8, Printz150, WikHead, Addbot,
Longcr, Deangano, MrOllie, LaaknorBot, Favonian, Damsun, -, Luckas-bot, Yobot, Les boys, AnomieBOT, Boreman, Tripodian, Kokcharov, Crzer07, KrazyKotik, Rjackchapman, Aldfavoweb, FrescoBot, CG-Bradley, Downsize43, Alexdfromald, Louperibot,
Monnini1, Kiefer.Wolfowitz, Vasywriter, Tmcgibbo, Mrausand, Rosenlind, Jandalhandler, SchreyP, Daniel dulay, Makki98, DexDor,
Dewritech, Ibbn, Telcoterry, GoingBatty, WCroslan, Hhhippo, Pokeme444, Akerans, Druzhnik, Thine Antique Pen, ASQ-Reliability-Div,
ClueBot NG, Jack Greenmaven, Urul, Dougmcdonell, CasualVisitor, Helpful Pixie Bot, Scwarebang, Wbm1058, BG19bot, Academy633,
Shreyankh, Mark Arsten, Atoine85, FeralOink, M2OS, ALD1984, BattyBot, Anna.nozik, ChrisGualtieri, CouchSurfer222, FMEA Expert,
Mogism, Standardschecker, ABCZzzWriter1, Zzzz12345678, Matjaz285, Londella, Dependability, On.the.same.page, Cmattison387,
Deedmonds, Vebjorl, IrfanSha, Aecannon12, JaneningUMD and Anonymous: 152
23.2 Images
File:Commons-logo.svg Source: http://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: ? Contributors: ? Original artist: ?
File:Edit-clear.svg Source: http://upload.wikimedia.org/wikipedia/en/f/f2/Edit-clear.svg License: ? Contributors: The Tango! Desktop Project. Original artist: The people from the Tango! project. And according to the meta-data in the file, specifically: Andreas Nilsson, and Jakub Steiner (although minimally).
File:Fault_tree.png Source: http://upload.wikimedia.org/wikipedia/commons/d/d6/Fault_tree.png License: Public domain Contributors:
Transferred from en.wikipedia Original artist: Original uploader was Wyatts at en.wikipedia
File:Folder_Hexagonal_Icon.svg Source: http://upload.wikimedia.org/wikipedia/en/4/48/Folder_Hexagonal_Icon.svg License: ? Contributors: ? Original artist: ?
File:Nuvola_apps_kcmsystem.svg Source: http://upload.wikimedia.org/wikipedia/commons/7/7a/Nuvola_apps_kcmsystem.svg License: LGPL Contributors: Own work based on Image:Nuvola apps kcmsystem.png by Alphax originally from [1] Original artist:
MesserWoland
File:Reliability_block_diagram.png Source: http://upload.wikimedia.org/wikipedia/commons/0/03/Reliability_block_diagram.png License: Public domain Contributors: DOD USA Original artist: User:Wyatts
File:Reliability_sequential_test_plan.png Source: http://upload.wikimedia.org/wikipedia/en/4/41/Reliability_sequential_test_plan.png License: ? Contributors: ? Original artist: ?
File:Text_document_with_red_question_mark.svg Source: http://upload.wikimedia.org/wikipedia/commons/a/a4/Text_document_with_red_question_mark.svg License: Public domain Contributors: Created by bdesham with Inkscape; based upon Text-x-generic.svg from the Tango project. Original artist: Benjamin D. Esham (bdesham)
23.3 Content license
Creative Commons Attribution-Share Alike 3.0
