Regular Paper

Improving System
Reliability by Failure-Mode
Avoidance Including Four
Concept Design Strategies
Don Clausing¹,* and Daniel D. Frey²

¹ Massachusetts Institute of Technology (retired)
² Massachusetts Institute of Technology, 77 Massachusetts Avenue, Room 3-449D, Cambridge, MA 02139
IMPROVING SYSTEM RELIABILITY BY FAILURE-MODE AVOIDANCE

Received 25 March 2005; Accepted 6 June 2005, after one or more revisions
Published online in Wiley InterScience (www.interscience.wiley.com).
DOI 10.1002/sys.20034

ABSTRACT
To be reliable, a system must be robust: it must avoid failure modes even in the presence of a broad range of conditions including harsh environments, changing operational demands, and internal deterioration. This paper discusses and codifies techniques for robust system design that operate by expanding the range of conditions under which the system functions. A distinction is introduced between one-sided and two-sided failure modes, and four strategies are presented for creating larger windows between sets of one-sided failure modes. Each strategy is illustrated through two examples from industrial practice. For each strategy, one example is from paper handling and another is from jet engines. By showing that every strategy has been successfully applied to each system, we seek to illustrate that the strategies are widely applicable and highly effective. © 2005 Wiley Periodicals, Inc. Syst Eng 8: 245–261, 2005

Key words: reliability; robust design; operating window; system architecture

*Author to whom all correspondence should be addressed: 245 Bishops Forest Drive, Waltham, MA 02452 (e-mail: dontqd@comcast.net).
Systems Engineering, Vol. 8, No. 3, 2005
© 2005 Wiley Periodicals, Inc.

1. MOTIVATION: RELIABILITY AND SYSTEMS ENGINEERING

Reliability is among the most important topics in systems engineering. Reliability is the proper functioning of the system under the full range of conditions experienced in the field. Reliability requires two critical conditions:

1. Mistake avoidance
2. Robustness

By “mistake” we refer to the plethora of design decisions and manufacturing operations that may be grossly in error. Examples of mistakes are installing a switch backwards, or interpreting a software command as being expressed in inches when it represents centimeters. Reliability can be improved by reducing the incidence of such mistakes by a combination of knowledge-based engineering and the problem-solving process.

By “robustness” we refer to the ability of a system to function (i.e., to avoid failure) under the full range of conditions that may be experienced in the field. It is one sort of challenge to develop a system that functions for a demonstration under tightly controlled conditions such as in a laboratory. It is an entirely different challenge to make a system that functions reliably throughout its lifecycle as it experiences a broad set of real-world environmental and operating conditions. Effective systems engineering is the second challenge, not the first one.

In its traditional formulation reliability is stated as the probability of failure-free operation under specified operating conditions. A typical textbook that addresses reliability will present a set of probabilistic concepts such as a survival function, failure rates, and mean times between failures. These concepts are then related to a model of the causes of failure such as component reliabilities or material and environmental variability. To make the model quantitative, specified operating conditions are stipulated as an agreed upon range of allowable conditions or an estimated probability density function for uncertain or variable parameters. This approach is well suited to calculating predicted failure rates once all of the data are available. This is the general approach of texts that emphasize reliability analysis (such as Ushakov [1994]) as well as texts oriented toward design for reliability (such as Rao [1992]). This is a very sound approach, but here we present an alternative formulation of reliability that has proven very effective in the improvement of reliability early in the development of a new system.

An alternative conception of reliability engineering is based on what we call “failure-mode avoidance.” Many changes in system design that improve reliability do so by moving the physical failure modes. In fact, we argue that the most significant improvements in reliability come about by this means. Although this approach can be integrated with probability theory, it is not necessary to use probability theory to understand how these design changes bring about their effects.

We claim that, especially in the early development of systems, the failure-mode avoidance approach will lead to many improvements being made with a minimum amount of data required, just enough to guide the next improvement. The failure-mode avoidance approach is deeply rooted in the physics of the system and is therefore tangible to the engineers, which facilitates the needed creative insights for concept design. This advantage is supported by recent results from cognitive psychology.

Gigerenzer and Edwards [2003] conducted an experiment in which medical doctors were given data regarding tests for cancer. If the data are presented in terms of probabilities, the doctors typically perform very poorly (Fig. 1). However, given the same basic scenario described in frequency formats, doctors perform far better. In interpreting these results, Gigerenzer states “our perceptual system has been shaped by the environment in which our ancestors evolved, which is often referred to as the ‘environment of evolutionary adaptiveness’ or EEA … I propose that human reasoning algorithms are … designed for information that comes in a format that was present in the EEA” [Gigerenzer, 1998, p. 10]. Gigerenzer goes on to say “I believe we can be as certain as we ever can be: Probabilities and percentages were not the way organisms encountered information” and “I propose the original format was event frequencies, acquired by natural sampling,” p. 12. Gigerenzer then makes a more general claim: “Information needs representation. If a representation is recurrent and stable during human evolution, one can expect that mental algorithms are designed to operate on this representation,” p. 29.

We seek to apply Gigerenzer’s insight to reliability engineering. What kind of information about reliability was recurrent and stable during human evolution? We propose that our ancestors observed failures in the systems they were crafting (spears, fields of crops, pottery, etc.) and could directly perceive the conditions that led to failures. Because of this, humans find it natural to reason about failure modes and their physical causes. Humans also have very natural visual thinking abilities and may find it natural to reason about failure-mode boundaries, that is, regions of a map of the parameter space that lead to failure. “Failure-mode avoidance” is a design activity in which these failure-mode boundaries are changed to create a large region in which the system can function.

A further advantage of the “failure-mode avoidance” approach is that it reduces the salience of so-called “specified operating conditions.” Such a set of specified operating conditions is an approximation that helps to guide concept selection. At an early stage of system development, one cannot reasonably define a complete
Figure 1. Two ways of thinking about uncertainty—probabilistic and naturalistic. Medical doctors were asked questions in two
different formats. Their answers are graphed here as dots and the correct answer is annotated. These results suggest that the more
naturalistic formulation leads to far more accurate judgments by professional practitioners [adapted from Gigerenzer and Edwards,
2003].

set of conditions that a system is likely to experience in its lifecycle. Although an approximate set of conditions can be defined, it will surely miss some important combinations of conditions. Later on, these unanticipated operating conditions may arise and the system may cease to function. When this happens, it is tempting to say that, since the condition was not specified, the system did not actually fail but was misused. It is essential for systems engineers to recognize that nature does not care what systems engineers think the “specified operating conditions” are. When the system fails to function under the conditions the system actually experiences, that constitutes a failure. This point is well understood by some reliability engineers. For example, Thomas, Ayers, and Pecht [2002] discuss “trouble not identified” warranty returns in the auto industry and conclude: “[I]t must not be assumed that a returned module that passes tests associated with an engineering specification is good,” p. 650. Because of uncertainty regarding specified operating conditions, we argue that an effective approach is to increase the set of conditions under which the system operates and do this as quickly and economically as one can manage within the time available. This implies that systems engineers should not spend much energy on predicting field reliability but instead use that same energy to increase field reliability [Clausing, 1994].

It seems that the creative design work that leads to reliability improvement is a very natural activity and is consistent with our “failure-mode avoidance” conception of reliability.

We propose that thinking of reliability as failure-mode avoidance can have real advantages, especially in the early stages of system design or in a long-term scenario such as technology development. In early stages of system design, probability theory may be too quantitative for the task at hand. Probability density functions imply a level of precision in modeling the scenario that is often unwarranted, especially during early development. As a project advances through its development stages the probabilistic view of reliability becomes increasingly useful. Analysis of reliability using probability theory is useful for component selection, system validation, and the management of
field-service operations. The value of the failure-mode avoidance conception of reliability is greatest for technology strategy, systems architecting, concept design, and for some robust parameter design activities, all done early during the development of the system.

2. REVIEW OF RELATED WORK

This paper is intended to help engineers with the early-stage, conceptual phase of design. Therefore, an important related development is the Theory of Inventive Problem Solving (sometimes described by the acronyms TRIZ or TIPS). The theory was first described by Altschuller [1984] and was recently placed in a broader context of innovation by Clausing and Fey [2004]. The theory is based on a study of thousands of patents that revealed patterns among inventive solutions. An important underlying hypothesis is that inventive problems can be viewed as conflicts which the inventive solutions resolve. This enabled large numbers of patents to be organized in a useful taxonomy. It has also given rise to commercial software products that facilitate the use of the theory by professional practitioners. However, we note that many patents claim robustness as their primary advantage: they do not deliver new functions, but deliver existing functions over a broader range of conditions. While TRIZ is helpful in development of new functions and elimination of harmful side effects, it does not seem to support reliability innovations to the extent we desire. Therefore, this paper analyzes patents and seeks new patterns of inventive engineering work.

A development in reliability engineering closely related to this paper is the “physics-of-failure” (PoF) approach developed at the Computer Aided Life Cycle Engineering (CALCE) Electronic Products and Systems Center at the University of Maryland. The first instance in archival literature of the term “physics of failure” is Pecht et al. [1990], which emphasizes use of a physics-based model for reliability prediction and design for reliability. This approach has been extended to product development by Pecht and Dasgupta [1995] and to accelerated life testing by Kimseng et al. [1999]. This paper builds upon the conception of physics-of-failure and seeks to extend this conception to the earliest, creative phases of system design.

An important development in reliability engineering is robust parameter design pioneered by Genichi Taguchi [Taguchi, 1993]. For any design concept, there is a potentially large space of control factor settings that will nominally place the function at the desired target value. In robust parameter design, the engineer explores the design space seeking changes that will make the system more robust while still keeping the performance on target. Taguchi’s method employs orthogonal arrays to explore the design space. At the same time, outer arrays or compounded noises are used to explore the range of possible operating conditions. Signal-to-noise ratios are used as measures of the robustness of the system and guide the engineer to preferable levels of the control factors.

Taguchi’s philosophy of robust design is consistent with the approach to reliability engineering discussed here. Taguchi rejected the “goal post” mentality inherent in tolerance limits and specifications. His notion of a quality-loss function replaced consideration of defect rates and process yields with an emphasis on reducing variance followed by adjustment to target. Taguchi encouraged engineers to deliberately expose designs to harsh conditions in experiments. To do this requires a transformation in the culture of an engineering organization. The emphasis must shift from demonstrating adequate performance with high statistical confidence to aggressive improvement followed by adequate confirmation.

Robust parameter design is among the most important developments in systems engineering in the 20th century. These methods seem to have accounted for a significant part of the quality differential that made Japanese manufacturing so dominant during the 1970s. The methods were subsequently adopted outside of Japan. The timing of that adoption in the West corresponded closely with improvement in quality that improved competitiveness of North American and European manufacturers. Robust design methods were surely a significant part of both the rise of Japanese industry and the response to that competitive challenge. Robust design methods have continued to be refined and are still an active area of systems engineering innovation.

Another approach relevant to this paper, known as “operating window methods,” was developed and practiced at Xerox Corporation in the 1970s. The operating window is the set of conditions under which the system operates without failure. In operating window methods, reliability is improved by making the operating window larger. Clausing [2004] described the approach in detail in a recent issue of Technometrics, but the essence of the approach is simple enough to present here:

1. Increase the value of the noise factors so that the failure rate is high.
2. Change the value of the control factors to seek a broader operating window at a fixed failure rate.

This approach was used, for example, to improve the reliability of paper handling machines. At Xerox, paper stacks were designed and constructed to deliberately
produce a large magnitude of variation. The papers varied in their weight, surface condition, geometry, and so on. These paper stacks were similar to the worst stacks one would encounter in field use, and, in conjunction with operation near the limit of the operating window, they brought about higher failure rates than would normally be encountered, on the order of 1 in 10 rather than 1 in 10,000. These high failure rates enabled the engineers to discern more quickly how the failure rate changed with changes in the control factors such as stack forces, feed belt angles, and so on. This approach worried managers since they observed the machines jamming with high frequency, but they eventually came to understand why this was needed. As a consequence the engineers were able to quickly converge to more reliable machine configurations.

Despite the use of failure rates as a measure of performance, the operating window method is, upon closer examination, consistent with Taguchi’s quality philosophy. Because failure rates were greatly increased by applying aggressive noises, improvements could be made rapidly, even though they sacrificed the ability to accurately predict field reliability. The term “operating window” may seem to imply an emphasis on goal posts, but in fact the “customer-specified” limits are viewed as irrelevant and the expansion of actual physical limits is valued instead.

Operating window methods continue to be an active area of research in quality engineering. Joseph and Wu [2004] showed that under certain conditions a failure rate of 50% maximizes the information gained from robust design using an operating window. As an example, they carried out a case study wherein line width in a lithography process was set at a much finer pitch than actually needed in practice. The control factor settings that improved the robustness at the finer pitch also improved the robustness at the pitch needed in operation. The basic concept of operating windows was therefore further corroborated.

While retaining the benefits of Taguchi’s quality philosophy, operating window methods may have a further advantage. In operating window methods, the progress in reliability is measured in physical terms by the size of the operating window. This may be preferable to measuring results with a more abstract measure such as signal-to-noise ratios. For example, operating window methods encouraged engineers at Xerox to devise ways to double the range of paper weights the machine could feed rather than contemplate how to increase signal-to-noise ratios by 6 decibels. As previously discussed, cognitive psychology suggests there is an advantage in maintaining a connection to physical quantities rather than probabilistic measures. We propose that a mental connection to the physics and logic of the system is even more critical for early stage system design than it is for later stage parameter design.

As discussed in this section, the basic concept of operating windows is to seek a larger set of conditions under which the system functions. While the idea is very simple, implementation is challenging, requiring deep knowledge of the system and the creativity to develop the needed design innovations. This paper seeks to help engineers implement early stage robustness work via operating window methods. The next section covers some theoretical developments. The subsequent sections present specific strategies for implementation.

3. OPERATING WINDOWS AND FAILURE MODES FORMALIZED

This section develops a formal treatment of operating windows and failure modes. The details developed here are not regarded by the authors as necessary for implementing the four strategies presented in this paper. The formal framework may, however, justify the approach and will be helpful to those who seek a deeper understanding of the strategies. However, those readers who are primarily interested in the operational aspects could skip to Section 4.

To formalize the idea of operating windows, it is helpful to define failure modes mathematically. A failure-mode criterion is an inequality that applies to a functional response of a system, Yi(X, Z) > Li or Yi(X, Z) < Ui. The criteria are defined such that, if the criteria are satisfied, the failure will not occur. The inputs X and Z are vectors of physical variables in the engineering system. The physical variables are sorted into two types, not necessarily disjoint: noise factors Z and control factors X. The control factors are variables the designer may change during the parameter-design phase of systems engineering. The noise factors are physical variables that vary in the environment, manufacture, or lifecycle of the system. Yi is a functional response of the system and the mapping Yi(X, Z) describes the physical or logical process by which the system responds to the control and noise factors. Li and Ui are lower and upper limits on a response defined so that exceeding that limit constitutes a system failure.

To illustrate these ideas, consider a jet engine. A functional response of an engine is the thrust it develops. If thrust were to fall below some prescribed limit, we could define that condition as a failure. The thrust is affected by control factors such as the chord of the fan blades. The thrust is also affected by noise factors such as the inlet temperature and angle of attack of the free stream into the engine inlet. A reliable engine is
designed so that the thrust is within acceptable limits over a wide range of the noise factors.

To make these ideas operational, we have found it necessary to introduce a distinction between two types of failure modes: one-sided and two-sided failure modes [Clausing and Frey, 2004]. A one-sided failure mode is a functional response and the associated physical process Yi(X, Z) with either a lower or upper limit but not both. A common one-sided failure mode is plastic deformation of a material. When plastic deformation is unacceptable or reaches a prescribed limit, the designer will define that as a failure. Plastic deformation often occurs when a level of stress is exceeded, so the failure criterion would naturally fit the form Yi(X, Z) < Ui where Yi denotes stress in physical units such as pounds per square inch. If there is no parallel failure mode for low values of stress, then it is most natural to think of plastic deformation as a one-sided failure mode.

A two-sided failure mode is a functional response and the associated physical process Yi(X, Z) with both a lower and an upper limit. Two-sided failure modes are frequently found in measurement or metering functions within a system. If a measuring system is inaccurate, the designer will regard it as a failure when the readings are too high or too low compared to the true quantity, so the failure criterion would naturally fit the form Li < Yi(X, Z) < Ui where Yi denotes, for example, measurement error in physical units such as volts.

Note that, given the definitions here, a two-sided failure mode is driven by the same physical process description Yi at both the high and low failure-mode boundaries. Thus, a single noise factor like ambient temperature can be limited from above and below by a single physical phenomenon. For example, a fluid metering system may operate in a limited temperature range due to the fact that the fluid viscosity is a function of temperature. This single physical phenomenon of temperature dependence of viscosity may make too much fluid flow at high temperatures and too little fluid flow at low temperatures. In this paper, a two-sided failure mode is necessarily governed by a single set of failure-mode physics.

In the presence of a two-sided failure mode, robust parameter design is critical. Figure 2 depicts a two-sided failure mode applied to a response. The operating conditions give rise to a variation in the functional response; therefore, the response has a probability distribution p(Yi). In the scenario on the left side of Figure 2, the variability is so wide that it cannot be accommodated within the limits between the failure-mode boundaries. If robust parameter design were applied, the sensitivity of the response would be reduced, resulting in a tighter distribution of the response and enabling both sides of the failure mode to be avoided. Thus, robust parameter design is essential in the presence of two-sided failure modes and, indeed, much of the research in robust design is oriented toward scenarios with two-sided failure modes. This paper by contrast concerns itself primarily with one-sided failure modes, which seem to admit a wider range of robust design approaches.

It is common for a single noise factor to be limited from above and below by two different physical failure modes. Here, this situation is characterized as an operating window between two one-sided failure modes rather than a two-sided failure mode. To illustrate the difference, consider fluid metering again. It is possible that an upper limit on the noise factor of temperature is set by the physical process of boiling while the lower limit on temperature is set by the previously discussed increase in viscosity with reduced temperature. It therefore seems more natural to consider two failure modes governed by two different functional responses, Yi(X, Z) < Ui and Yi+1(X, Z) > Li+1. The difference here is reflected in the fact that the two responses have different indices. In theory this seems minor, but in practice we regard this as highly significant. Robust parameter design might still be applied with success, but it seems that other approaches will also be applicable. All of the

Figure 2. Robust parameter design accomplishes failure-mode avoidance in the presence of two-sided failure modes.
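
As a rough numerical illustration of the situation in Figure 2, the sketch below computes the probability that a response stays between a lower and an upper limit before and after its spread is reduced. The normal distribution, the limits of -1 and +1, and the two standard deviations are assumptions chosen only for illustration; none of them come from the paper.

```python
import math


def normal_cdf(y, mean, sigma):
    """Cumulative distribution function of a normal response (assumed shape)."""
    return 0.5 * (1.0 + math.erf((y - mean) / (sigma * math.sqrt(2.0))))


def prob_within_limits(lower, upper, mean, sigma):
    """Probability that the response satisfies lower < Y < upper."""
    return normal_cdf(upper, mean, sigma) - normal_cdf(lower, mean, sigma)


# Hypothetical two-sided failure-mode limits L_i and U_i on one response.
LOWER, UPPER = -1.0, 1.0

# Wide response distribution: much of p(Y_i) falls outside the limits.
p_before = prob_within_limits(LOWER, UPPER, mean=0.0, sigma=1.0)

# After a reduction in sensitivity to noise (assumed to quarter the spread).
p_after = prob_within_limits(LOWER, UPPER, mean=0.0, sigma=0.25)

print(f"P(no failure) with wide spread:   {p_before:.4f}")
print(f"P(no failure) with narrow spread: {p_after:.4f}")
```

With both limits acting on the same response, tightening the distribution is the only way in this sketch to move probability away from both boundaries at once, which is the role the text assigns to robust parameter design.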
four strategies presented here illustrate specific alternatives for such cases.

Now that we have defined failure-mode criteria of various types, we may define the operating window formally. The operating window is the set of noise conditions Z that satisfy the full set of failure-mode criteria:

$$
\left\{ Z \;\middle|\;
\begin{array}{ll}
Y_i(X, Z) \ge L_i & \text{for all } i \text{ with lower-limited one-sided failure modes} \\
Y_i(X, Z) \le U_i & \text{for all } i \text{ with upper-limited one-sided failure modes} \\
L_i \le Y_i(X, Z) \le U_i & \text{for all } i \text{ with two-sided failure modes}
\end{array}
\right\}
$$

To simplify the notation, we will compress this down to {Z | Yi(X, Z) ∈ Wi for all i} where Wi therefore defines a window, which may be one-sided or two-sided.

A goal of systems engineering is to make the system robust by adding more points to this set. Given this concept of failure-mode criteria and operating windows, it is possible to identify design changes that improve reliability without any recourse to probability. The development principle is to add points to the operating window as rapidly as possible. The theorem below formalizes this concept for parameter design.

Operating Windows and Parameter Design -- If the design parameters of a system are changed from X to X′ and the new operating window holds the old operating window as a subset, {Z | Yi(X, Z) ∈ Wi for all i} ⊆ {Z | Yi(X′, Z) ∈ Wi for all i}, then reliability has improved.

This theorem is mathematically straightforward. Reliability is traditionally defined as the probability of functioning without failure, where probability and probability density are defined so that integration over a set gives a probability. Since probability density cannot be negative, integrating over a set must give a larger or equal probability than integrating over any of its subsets. Although mathematically basic, the theorem may be important in system design due to its practical implications. The probability density function of the noise factors is generally known only very approximately. If changes can be identified that meet the conditions of the theorem above, then reliability can be improved in spite of our ignorance about the probability density function of the noise factors.

A graphical illustration of this theorem is Figure 3, in which one axis represents a single noise factor and another axis represents a single control factor. Two different functional responses define constraints within the space defined. At the initial setting of the control factor X1, there is an operating window. A change is made in the control factor setting, making it X1′. Since the new range of the noise factor Z1 completely contains the old range of Z1, the operating window has been increased and reliability has been improved.

It is instructive to consider coupling among failure modes. In pursuing robustness to the ith failure mode, the designer may consider changing the value of control factor Xk. If a change in Xk that affects the set satisfying the ith failure-mode criterion also changes the set satisfying the jth failure-mode criterion, then we say that failure-mode criteria i and j are coupled by control factor Xk. This definition of coupling is consistent with the definition of coupling among equations in mathematics [Borowski and Borwein, 1991]. The definition

Figure 3. Robust parameter design can accomplish failure-mode avoidance in the presence of multiple one-sided failure modes.
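
The situation in Figure 3 can be sketched numerically. The two response functions, their limits, and the two control factor settings below are hypothetical, chosen only so that both one-sided failure-mode criteria depend on the same control factor. The sketch evaluates the operating window on a grid of noise values at each setting and checks whether the new window contains the old one.

```python
# Hypothetical one-sided failure-mode criteria on a single noise factor z
# and a single control factor x (illustrative only, not from the paper).
U1 = 4.0  # upper limit on response Y1
L2 = 3.0  # lower limit on response Y2


def y1(x, z):
    """Response with an upper-limited one-sided failure mode: Y1 <= U1."""
    return z - 0.5 * x


def y2(x, z):
    """Response with a lower-limited one-sided failure mode: Y2 >= L2."""
    return z + x


def operating_window(x, grid_indices, step=0.1):
    """Grid indices whose noise value z = index * step satisfies every criterion."""
    return {
        i for i in grid_indices
        if y1(x, i * step) <= U1 and y2(x, i * step) >= L2
    }


grid = range(0, 81)  # noise values 0.0, 0.1, ..., 8.0

old_window = operating_window(x=1.0, grid_indices=grid)  # initial setting X1
new_window = operating_window(x=2.0, grid_indices=grid)  # changed setting X1'

# The condition of the parameter-design theorem: old window is a subset of new.
window_grew = old_window <= new_window and len(new_window) > len(old_window)
print("old window contained in new window:", old_window <= new_window)
print("points added:", len(new_window) - len(old_window))
```

Both criteria move when x changes, so they are coupled by the control factor in the sense defined in the text, yet the subset condition still holds in this example.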
is also similar to the definition of coupling in Axiomatic Design [Suh, 1990] except that coupling occurs among failure modes rather than functional requirements. It should be evident that the two failure modes in Figure 3 are coupled by control factor X1. In this instance, however, the coupling is not such that it negatively affects the robust parameter design process. The theorem’s conditions are satisfied and reliability improvements may proceed despite the coupling. However, it should also be clear that when failure modes are not coupled, robust parameter design may be simpler to accomplish.

In the absence of coupling, any control factor Xi affects at most one failure-mode criterion. Once the direction of the dependence is determined, the operating window can be increased by sequentially maximizing or minimizing the size of the set as a function of that single control factor. This is frequently accomplished by driving the value of the control factor to its technical or architectural limits. An example of this is found in paper-feeding machinery. A higher friction coefficient of the feed rolls helps to prevent misfeeds and does not particularly encourage multifeeds. For this reason, developers of paper handlers worked to increase the friction coefficient of feed rolls as far as technically feasible. Even though these technical developments improved the system, the reliability was still not sufficient and further improvements had to be sought. Because of this phenomenon, in any system that is fairly mature, it is common for the parameters that do not couple multiple failure modes to be set near their physical or architectural limits. Since consideration of uncoupled parameters is straightforward, much of the attention in systems engineering is therefore directed to dealing with parameters that are coupled to multiple failure modes.

It is often necessary to consider the operating window with respect to two or more noise factors simultaneously. This requires a representation of multidimensional failure-mode boundaries. Figure 4 is a graphical depiction of a two-dimensional operating window formed between three one-sided failure-mode criteria. A key distinction between Figure 3 and Figure 4 is that in Figure 4 two noise factors are represented rather than one. In addition, no control factors are represented using an axis. Instead Figure 4 represents the operating window at two distinct design configurations X and X′. The shape and size of the window can vary with the design parameters. A useful goal, as before, is to add points to the operating window without removing any points. This condition holds in Figure 4, so the change in design will improve the system’s reliability.

The theorem discussed previously applies to parameter design, but the idea depicted in Figure 4 can be readily extended to conceptual design in which not only the control factors are changed, but the functional response of the system is modified as well. All that is required is the idea that the functional response itself can be varied as well as the control factors.

Operating Windows and Conceptual Design -- If the conceptual design of a system is changed, including a change in functional responses Yi to Yi′ and the corresponding design parameter changes from X to X′, and the new operating window holds the old operating window as a subset, {Z | Yi(X, Z) ∈ Wi for all i} ⊆ {Z | Yi′(X′, Z) ∈ Wi for all i}, then reliability has improved.

At the earliest stages of system design when our latitude to make changes is greatest, it is these types of conceptual changes that are most critical to find and implement. Although robust parameter design has been a valued development in systems engineering, large changes in system reliability observed over time cannot be explained by parameter design alone. As an example, vehicles sold today are far more reliable than those that

Figure 4. Robust design with a two-dimensional operating window.


IMPROVING SYSTEM RELIABILITY BY FAILURE-MODE AVOIDANCE 253

were sold 30 years ago. The majority of that reliability defined in Section 3. Such control factors should be
improvement is due to the scores of system design and maximized or minimized to create the greatest possible
technological changes made over these decades. Elec- distance from the affected one-sided failure mode con-
tronic spark timing replaced the distributor. Fuel injec- sistent with any constraints on the control factor. As the
tion systems replaced carburetion. In addition there system is placed under greater demands over time due
were many other less widely known innovations that to system evolution and competition, the operating
created large improvements in reliability. We propose window afforded under the current system constraints
that at an early stage of system design, many design may become insufficient. Under these circumstances,
opportunities exist that meet the criteria of the theorems the constraint can often be relaxed by making changes
presented here. Much of early stage, conceptual reli- in the system architecture or by changes in technology.
ability engineering can therefore be undertaken without The relaxed constraint enables further changes to the
any probabilistic modeling freeing up engineers for uncoupled control factor, which opens the operating
deep thought about patterns of innovation in reliability window.
engineering. This is a principal message of this paper, Primary Case Study—Paper Feeder. As an indus-
and it will be emphasized by presenting specific strate- trial example, we present the Xerox paper feeder that
gies for carrying out this suggestion. first went into production in 1981, and has appeared in
many different Xerox copiers and printers. This paper
4. FOUR STRATEGIES FOR IMPROVED feeder is known as a friction-retard feeder (Fig. 5).
ROBUSTNESS The feedbelt rests on the paper stack, and drags the
top sheet forward. The friction of the retard roll holds
Up to this point, this paper has focused on the interre- back (retards) the second sheet if it tries to come
lated concepts of reliability, robustness, and one-sided through. Thus, the retard roll prevents multifeeds (feed-
failure modes. From this point forward, the paper con- ing of more than one sheet). Therefore, the wrap angle
centrates on strategies to avoid one-sided failure modes. between the feedbelt and the retard roll only affects the
All of these strategies involve concept design rather failure mode of multifeeds. The other primary failure
than parameter design. The design changes considered mode is misfeeds (no sheet is fed). This failure mode is
here are not only changes in the values of design pa- not affected by the wrap angle between the feedbelt and
rameters but also additions of new features or compo- the retard roll. Because multifeeds are reduced by a
nents, changes in the configuration of the system, or large wrap angle and misfeeds are unaffected, it is clear
even new inventions. We present four strategies along that the wrap angle should be as big as possible.
these lines: Despite the desirability of having a large wrap angle,
the previous-generation feeder (ca. 1975) had a wrap
1. Relax a constraint limit on an uncoupled control angle of only 13°, which was constrained by the system
factor. architecture. In the new design that first went into
2. Use physics of incipient failure to avoid failure.
3. Create two distinct operating modes for two dif-
ferent demand conditions.
4. Exploit interdependence between two operating-
window system variables.

To illustrate these strategies and demonstrate their


versatility, we present two different example applica-
tions of each strategy, a primary example that is de-
scribed in considerable detail and a supplementary
example that is described in less detail. Two engineer-
ing domains are used throughout—paper feeders and
jet engines. The next four subsections present these
strategies.

4.1. Relax a Constraint Limit on an


Uncoupled Control Factor
A control factor that affects only one of the one-sided Figure 5. Friction-retard feeder, U.S. Patent #4,475,732
failure modes in a system is said to be uncoupled as [Clausing et al., 1984].
254 CLAUSING AND FREY

Figure 6. The architecture on the left has a nearly linear paper path, U.S. Patent # 3,390,725 [Jones and Van Deluyster, 1976]. A
newer architecture on the right has a looping paper path, which enabled a larger wrap angle, U.S. Patent # 4,475,732 [Clausing
et al., 1984].

production in 1981 the wrap angle was increased to 45°. case of wrap angles in paper feeders, innovation en-
This large improvement in wrap angle was enabled by abled a critical parameter to be pushed past its previous
a change in the total system architecture. In large copi- constraints to move a one-sided failure-mode boundary
ers and printers the next subsystem after the paper and increase the operating window.
feeder is the registration subsystem, which aligns the Summary of the Strategy. When a system variable
sheet with the image. In the new design the architecture only affects one of the one-sided failure modes, take its
was changed so that the paper came out of the feeder value to its constraint limit. If the operating window is
and turned down to reach the registration subsystem still not large enough, seek new architectures or tech-
(Fig. 6), which was underneath the feeder. This enabled nologies that relax the constraint.
the wrap angle to be greatly increased. This architecture
also reduced the width of the copier/printer, which is 4.2. Use Physics of Incipient Failure To
desirable. This paper feeder with the large wrap angle Avoid Failure
has been very successful in many generations of Xerox
copiers and printers. In some systems the physics of the incipient failure can
Supplementary Case Study—Jet Engines. A be used to prevent or delay the failure mode. All one-
similar approach was used to improve the reliability of sided failure modes are associated with underlying
axial-flow fans in jet engines. A fan is a component of physical phenomena. In many cases the failure mode
modern high by-pass commercial jet engines that pro- exhibits distinct physical mechanisms that become ac-
vides a significant increase in the total mass flow, and tive as the onset of the failure mode is approached. In
therefore improvement in propulsive efficiency. A criti- some systems there exists an opportunity to exploit the
cal failure mode of such fans is flutter vibration due to physics of incipient failure to increase the size of the
the length of the blades and their exposure to inlet flow operating window.
distortions. It had long been known that increasing the Primary Case Study—Jet Engines. An example is
chord of a fan blade stiffened the blade and thereby afforded by the use of shaped grooves in compressor
reduced the incidence of the failure mode of flutter, but casings in modern jet engines. An axial flow compres-
the chord of the blade was limited by constraints on sor is comprised of multiple alternating stages of rotor
weight [Koff, 2004]. Eventually, new technologies for assemblies and stators. To limit engine complexity and
manufacturing hollow blades enabled engine manufac- weight, a large pressure rise per stage is desired so that
turers to increase chords significantly without added the desired pressure rise in the compressor can be
weight. For example both Patent #4,345,877 [Monroe, accomplished with a small number of stages. However,
1980] and Patent #4,720,244 [Kluppel and Monroe, the pressure increase of each stage is limited by a failure
1987] contributed to these advances. Wide-chord fans mode of aerodynamic stall and surge. A stall involves
provided much greater resistance to flutter and have separation of airflow from a blade, which at any given
thereby greatly improved engine reliability. As in the time may affect only one stage or even a group of stages.

Figure 7. The arrangements of slots in an axial flow compressor. Adapted from U.S. Patent #4,086,022 [Freeman and Moritz,
1978].
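The trade-off stated in the primary case study, that a larger pressure rise per stage allows a smaller number of stages, can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that every stage delivers the same pressure ratio and that the stage ratios multiply to give the overall ratio. The function name and the numerical values (an overall ratio of 30, stage ratios of 1.2 and 1.4) are hypothetical and do not come from the paper.

```python
import math


def stages_required(overall_pressure_ratio, stage_pressure_ratio):
    """Smallest number of identical stages whose pressure ratios
    multiply to at least the overall target ratio.

    Assumption (not from the paper): all stages share one ratio.
    """
    if overall_pressure_ratio <= 1 or stage_pressure_ratio <= 1:
        raise ValueError("pressure ratios must be greater than 1")
    return math.ceil(
        math.log(overall_pressure_ratio) / math.log(stage_pressure_ratio)
    )


# Hypothetical target and stage ratios, for illustration only.
for stage_ratio in (1.2, 1.4):
    print(stage_ratio, stages_required(30.0, stage_ratio))
```

Under these assumptions a modest increase in the stage ratio removes several stages, which is why the stall and surge boundary that limits the per-stage pressure rise matters so much to the designer.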

A compressor surge generally refers to a complete flow breakdown throughout the compressor. The value of airflow and pressure ratio at which a surge occurs is termed the “surge point,” and “surge margin” is a term for the difference between the airflow and compression ratio at which the compressor will normally be operated and the airflow and compression ratio at which a surge will occur. Thus, we can readily interpret surge margin as the distance from the one-sided failure mode of compressor surge.

In the late 1970s new technologies known as “casing treatments” were developed. In one casing treatment technology assigned to Rolls-Royce, Patent #4,086,022 [Freeman and Moritz, 1978], a series of angled channels is placed in the casing of the compressor, extending from the leading edge of the rotors to just aft of the trailing edge (see Fig. 7). If a surge begins to occur, then “a rotating annulus of pressurized gas will begin to build up about the tips of the blades.” Because of the geometry of the slots, “the annulus of air will be directed into the slots … thus reducing or eliminating the surge” [Freeman and Moritz, 1978, p. 5].

To understand how the casing treatments are related to the operating window, it is useful to consider Figure 8, adapted from Cumpsty [1997]. The abscissa in the figure is mass flow of air into the engine. The mass flow in an engine may vary due to changes in inlet conditions caused by atmospheric conditions or aircraft maneuvers; therefore, mass flow is a noise factor as defined in Section 3. The ordinate in Figure 8 is pressure rise across a stage of the compressor. When conditions are at their nominal state, the engine will generally remain on the operating line, with mass flow and pressure rise both changing as a function of the throttle position set by the pilot. At a fixed throttle position, when mass flow is reduced due to maneuvers or environmental conditions, the state of the engine moves toward the surge line as indicated in step 1 of Figure 8. This pushes the engine off the operating line and toward the failure-mode boundary. The amount of mass-flow drop that can be tolerated before failure (step 3a or step 3b) is sometimes called the “surge margin,” which we interpret as an indication of the operating window size. The technology described in Patent #4,086,022 can be viewed as a means to exploit the incipient failure-mode physics (the rotating annulus of air, step 2) to increase the surge margin. The treatments are designed so that the incipient physics will lead to a pressure relief across the stage (step 3b). The advanced casing treatment “increased fan stall margin by a staggering 20% under distorted inlet flow and with little loss in efficiency” [Koff, 2004, p. 582].

Figure 8. The effect of casing treatment on surge of jet engine compressors [adapted from Cumpsty, 1997].

Supplementary Case Study: Paper Feeder. A similar approach was used to improve the reliability of paper feeders. For friction-retard paper feeders, the stack force between the feedbelt and the paper stack is a critical system variable. If it is too large, the multifeed rate will be excessive. If the stack force is too small, the misfeed rate will be excessive. Therefore, there is an operating window between these two one-sided failure modes (Fig. 9).

Figure 9. Operating window for friction retard paper feeder.

When the range of papers is moderate, it is easy to develop a sufficient operating window so that both the multifeed rate and the misfeed rate are very small. However, for the large range of papers that are typically used in large production copiers and printers, it is very difficult, or impossible, to develop a sufficient operating window, as shown on the left of Figure 9.

On the left-hand side of Figure 9, it is evident that no single value of stack force will simultaneously avoid both multifeeds and misfeeds over the full range of paper weights. This was still true after robust parameter design had been completed, so there was little hope of improving it further beyond the great improvement that had already been achieved.

The problem was resolved through the development of a “stack force relief/enhancement” technology, U.S. Patent #4,561,644 [Clausing, 1985]. This technology uses two different values of the stack force: a small value for most papers and a larger value for heavy papers (as depicted on the right side of Fig. 9). Under normal conditions, the stack force is set to the small value. For most common paper weights this works very reliably. If a larger paper weight is used, a misfeed condition may begin to emerge. A sensor near the retard roll is designed to sense the arrival of the lead edge of the sheet. If an incipient misfeed occurs, the paper will not arrive within the desired time period. Under this condition, the stack force is increased to the large value. This was done by energizing the solenoid 90 in Figure 5, which pushed the feeder around the pivot 11, thus increasing the stack force. Thus, the machine was able to reliably feed the full range of paper weights.

Summary of the Strategy. Exploit the physical mechanisms associated with an incipient failure to offset the failure mode, thereby increasing the size of the operating window.

4.3. Employ Two Different Operating Modes

In some cases, the development process reaches a state in which the system has a limited operating window between multiple one-sided failure modes and therefore cannot operate reliably. In such cases, it is often advisable to change from a single operating mode to two operating modes. Separately designing two distinct operating modes enables significant design freedom to seek better resistance to the failure modes. This strategy is often similar to the strategy “use physics of incipient failure to avoid failure,” and in fact the two strategies can overlap. However, two key distinctions should be made: (1) incipient failure-mode physics do not always lead to clearly distinct operating modes, and (2) the switch between two modes need not be cued by incipient failure physics and can instead be cued by operator inputs or state variables of the system.

Primary Case Study: Paper Feeder. A failure mode of friction retard paper feeders (Fig. 5) is excessive wear of the retard roll. In previous designs the roll had been rotated approximately once per hour to distribute the wear over the entire roll. Nevertheless, the wear was excessive and was a considerable expense in service cost and lost production of the copier/printer. The critical variable that determines the wear of the retard roll is the force between the feedbelt and the retard roll, F, multiplied by the contact distance D between the feedbelt and the retard roll. The product, FD, is the work that the retard roll can do to remove energy from the second sheet, and thus stop the second sheet. However, this is also the work that causes wear of the retard roll.

The result is as shown in Figure 10. With the previous design, one system variable FD has control of both of the one-sided failure modes, excessive multifeeds and excessive wear of the retard roll. Maurice Holmes at Xerox recognized that this problem could be resolved through a redesign of the retard mechanism by adding a second operating mode. The innovation was included in the advanced paper feeder that first went into production in the Xerox 1075 copier in 1981, Patent #4,475,732 [Clausing et al., 1984].

Figure 10. Two failure modes, one system variable. (Cross-hatched region is negative operating window: no safe range.)

The inventive process that led to this invention is well described in terms of the theory of inventive problem solving (TRIZ). The TRIZ process generally begins by framing the current problem as a conflict. In this case, there was an engineering conflict between avoiding multifeeds and avoiding excessive wear. In TRIZ, one effective way to seek a conflict resolution is through “Sufield” or “substance-field” analysis [Clausing and Fey, 2004]. Simple Sufield diagrams are in the form of a triad. The relevant triad diagram for the retard-roll problem is shown in the left-hand side of Figure 11. Here the substances are (1) the paper and (2) the roll/shaft. The field is the contact force. TRIZ includes many standards for the creative revision of the Sufield. One of the standards is: “To enhance the effectiveness of the Sufield, transform one substance into an independently controlled Sufield, thus generating a chain Sufield” (p. 112). This can be implemented by introducing a field between the retard roll and its shaft (as shown in the right-hand side of Fig. 11).

This is as far as Sufield analysis will take us. Now we have to use science and art to identify a field, and a component for creating the field, that will open an operating window. One such approach is to insert a friction brake with a brake torque T into the design to produce a field between the retard roll and its shaft (U.S. Patent #4,475,732). This field creates the possibility of two distinct operating modes: (1) when the torque that is applied to the roll is less than T, the roll remains stationary, and (2) when the torque that is applied to the roll is greater than T, the roll rotates.

The torque that is applied to the retard roll is produced by the friction from the belt or the paper, whichever is contacting the roll. When one sheet of paper is between the roll and the feedbelt, the friction coefficient has a value of 2, which overcomes the brake torque. Therefore, the roll rotates, and there is no wear. When two sheets of paper are between the roll and the feedbelt, the friction coefficient is 0.6, and the brake torque prevents rotation of the retard roll. Thus the second sheet is stopped.
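The two operating modes created by the friction brake can be sketched as a simple threshold rule. The sketch below is a minimal illustration and not the authors' design calculation: it assumes a simple Coulomb model in which the torque applied to the roll equals friction coefficient times normal force times roll radius. The friction coefficients (2 and 0.6) are the values quoted in the text; the normal force, roll radius, and all function names are hypothetical.

```python
# Friction coefficients quoted in the text.
MU_ONE_SHEET = 2.0    # one sheet between the retard roll and the feedbelt
MU_TWO_SHEETS = 0.6   # two sheets between the retard roll and the feedbelt

# Hypothetical values chosen only for illustration (not from the paper).
NORMAL_FORCE_N = 3.0
ROLL_RADIUS_M = 0.012


def applied_torque(mu, normal_force=NORMAL_FORCE_N, radius=ROLL_RADIUS_M):
    """Torque applied to the retard roll under an assumed Coulomb model."""
    return mu * normal_force * radius


def brake_torque_window():
    """Range of brake torque T that separates the two operating modes."""
    return applied_torque(MU_TWO_SHEETS), applied_torque(MU_ONE_SHEET)


def roll_mode(mu, brake_torque):
    """Roll rotates only when the applied torque exceeds the brake torque."""
    return "rotates" if applied_torque(mu) > brake_torque else "stationary"


low, high = brake_torque_window()
brake_torque = 0.5 * (low + high)  # any value strictly between low and high

print(roll_mode(MU_ONE_SHEET, brake_torque))   # one sheet: roll rotates
print(roll_mode(MU_TWO_SHEETS, brake_torque))  # two sheets: roll is held
```

Under these assumptions, any brake torque strictly between the two applied torques yields the behavior described in the text: the roll turns freely with a single sheet and is held stationary when a second sheet is present.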

Figure 11. Sufield diagrams for retard roll.

The addition of the new operating mode created an additional design parameter, “brake torque,” which sets the condition for the switch between the two modes. Thus, the design space expands from a 1-D operating window to a 2-D operating window (Fig. 12). If the brake torque is set to an appropriate value, the retard roll will only rub against the paper when the incipient multifeed condition actually occurs. In this case, the excessive-wear failure-mode boundary is never active and a new failure mode (paper damage) becomes the limiting factor on parameter FD, leaving a greatly increased operating window.

Figure 12. Operating window for improved retard-roll design.

Supplementary Case Study: Jet Engines. A similar approach was used to simultaneously avoid two one-sided failure modes associated with combustion in jet engines. A combustor is a part of a jet engine in which fuel is injected into the air stream, mixed with air, and burned. Two key failure modes of a combustor are concerned with the composition of the exhaust gas, which is tightly regulated to protect the environment. One failure mode is excessive production of carbon monoxide (CO), which occurs with an overly lean mixture and low temperature in the combustion zone. Another failure mode is excessive production of oxides of nitrogen (NOX), which is associated with overly high temperature in the combustion zone. Given the changes in the thrust demands (and many other parameters that vary), it is a challenge to maintain the combustion conditions in the small operating window between the failure modes. In the 1970s a new technology called “two-zone” or “staged” combustion substantially increased the operating window by affording multiple operating modes [Markowski, Lohmann, and Reilly, 1976; Lefebvre, 1999]. When the demand for thrust is low, all the combustion takes place in a single “primary zone.” When thrust demands are highest, the engine automatically switches to a mode in which combustion occurs in two different zones, each of which is functioning within the operating window between the CO and NOX related failure modes. This technology has been developed through many inventions, including Patent #4,052,844 [Caruel, Quillevere, and Gastebois, 1977], and has become popular especially in gas turbine engines for ground-based power [Washam, 1983]. As in the case of the paper feeders with a friction brake, the system automatically switches between two modes of operation in order to increase the operating window between two coupled one-sided failure-mode boundaries.

Summary of the Strategy. When it is not possible to simultaneously avoid two one-sided failure modes due to a wide range of noise values, consider defining two distinct operating modes so that at least one of the failure modes will be moved to increase the size of the operating window.

4.4. Identify and Exploit Dependencies Among Failure Modes

In the operating-window approach, the parameter space is sketched out and the failure-mode boundaries are identified. In the sketch, it is often the case that the parameters associated with the axes are not independent. A small change induced in one parameter will have an associated effect on the other one. It seems clear that such dependencies can influence system reliability. What is sometimes overlooked is that they often provide an opportunity to use the dependence to stay within the operating window.

Primary Case Study: Jet Engines. An example is afforded by turbine blade cooling systems [Sidwell, 2004]. The physical layout of the system is described in Figure 13. Air from the compressor is routed to the first-stage turbine blades. The cooling flow path includes a Tangential On-Board Injector, which brings the flow from a supply at Ps into the rotating parts of the engine. The area between the rotating seal and the blades acts as a plenum storing compressed gas at a pressure Pp. The gas then flows through each of the many first-stage blades. The purpose of this flow is to cool the surface of the blades and thereby avoid the failure mode of early blade oxidation.

Figure 13. Physical layout of a cooling system for a turbine blade.

To apply operating-window methods to this scenario, one may first sketch the parameter space and the failure-mode boundaries. Figure 14 depicts a highly simplified window with just two failure modes, oxidation of blade #1 and oxidation of blade #2. Manufacturing variation may excite failure mode #1 (oxidation of blade #1) if its flow passages are constricted, causing m1 to drop. However, the schematic diagram of Figure 14 suggests that there is a dependency among the failure modes. Any small drop in m1 tends to cause a rise in plenum pressure and a resulting rise in m2. The reverse is also true: any small drop in m2 tends to cause a rise in plenum pressure and a resulting rise in m1. This interdependency of the failure modes creates an opportunity to create larger distance from both failure modes. Turbine blades are routinely tested for their flow characteristics. Sidwell proposed that this test could be used to sort the blades into low-flow, medium-flow, and high-flow classes. In this way, a second interdependency is added to the system. The low m1 due to the sorting process brings about a low m2. The nature of the interdependency caused by the plenum causes the two effects to cancel (or very nearly cancel), as depicted in Figure 14. Sidwell [2004] estimated that “binning” turbine blades will increase the life of the high-flow and medium-flow blades by 50% or more and would enable low-flowing blades to be used with approximately the same life as current engines.

Figure 14. The failure-mode boundaries in a simplified, two-blade system.

Supplementary Case Study: Paper Feeder. In a document feeder for a copier it is highly desirable to feed from the bottom of the stack of documents. This leaves the top of the stack free to receive the recirculated document after it has been copied. The most advanced document-feeder technology uses air to move the document, which minimizes damage to the document. Such feeders typically use a combination of positive air pressure and negative air pressure (vacuum). The positive air pressure is used to levitate the document stack (otherwise the weight of the document stack would tend to cause both misfeeds and multifeeds). Therefore, a sufficient pressure under the stack is required to avoid both misfeeds and multifeeds. However, excessive pressure under the stack could cause the last sheet to blow away. Therefore, good system design requires an operating window between inadequate pressure and excessive pressure, as shown in Figure 15.

Figure 15. Operating window for bottom-feeding vacuum document feeder.

The simple approach to achieve robust document feeding is to arrange a natural dependence between the weight of the paper stack and the air pressure under the stack. This is done by careful sizing of all of the flow impedances. Thus the pressure under the stack is maintained proportional to the stack weight without the need for any additional components. This strategy was used in a series of patents at Xerox which made successive improvements in robustness [Stange, 1977; Silverberg, 1981; Browne, 1983].

Summary of the Strategy. When there are dependencies among failure modes, look for ways to use those dependencies to counteract the effects of noise factors.

5. SUMMARY

Reliability is one of the most important characteristics of an engineering system. Probabilistic formulations of reliability are useful for component selection, verification testing, and field-service management. However, at the early stages of system architecting and concept design, probabilistic formulations are not as helpful. We propose that thinking in terms of physical mechanisms of failure is much more effective and that the fundamental principle of reliability engineering is failure-mode avoidance.

A useful reliability-engineering concept is the operating window, which is the region in noise parameter space that avoids failure modes. In this paper we have given a mathematical definition of the operating window. We have shown that adding to the window increases the reliability regardless of the probability distributions of the noise factors. To this we add the principle that this should be done early and rapidly during the system development. In particular, concept design changes frequently add large regions to the operating window and account for some of the largest improvements to reliability of systems over the course of their development.

To illustrate this approach, we have described four strategies for increasing the operating window through concept design. Each strategy is illustrated by two case studies, one from the field of paper feeders for copiers and printers, and the other from the field of jet engines. Each case study includes past inventions that significantly improved reliability. By showing the theory and eight case studies we have displayed both the fundamentals and the diversity of industrial applications of this important approach to the development of reliable systems.

REFERENCES

G.S. Altschuller, Creativity and an exact science: The theory of the solution of inventive problems, Gordon and Breach, New York, 1984.
E.J. Borowski and J.M. Borwein (Editors), The Harper Collins dictionary of mathematics, Harper Collins, New York, 1991.
J.M. Browne, Bottom sheet feeding apparatus, U.S. Pat. #4,411,417, 1983.
J.E.J. Caruel, H.A. Quillevere, and P.M.D. Gastebois, Gas turbine combustion chambers, U.S. Pat. #4,052,844, 1977.
D.P. Clausing, Sheet feeding and separating apparatus with stack force relief/enhancement, U.S. Pat. #4,561,644, 1985.
D.P. Clausing, Total quality development, ASME Press, New York, 1994.
D.P. Clausing, Operating window–an engineering measure for robustness, Technometrics 46(1) (2004), 25–29.
D.P. Clausing and D.D. Frey, Failure modes and two types of robustness, INCOSE Annual Symp, 2004, CD, Paper number 321.
D.P. Clausing and V. Fey, Effective innovation, ASME Press, New York, 2004.
D.P. Clausing, M.F. Holmes, R.A. Povio, and R.P. Rebres, Sheet feeding and separating apparatus with stack force relief/enhancement, U.S. Pat. #4,475,732, 1984.
N.A. Cumpsty, Jet propulsion: A simple guide to the aerodynamic and thermodynamic design and performance of jet engines, Cambridge University Press, Cambridge, UK, 1997.
C. Freeman and R.R. Moritz, Gas turbine engine with improved compressor casing for permitting higher air flow and pressure ratios before surge, U.S. Pat. #4,086,022, 1978.

G. Gigerenzer, “Ecological intelligence: An adaptation for frequencies,” The evolution of mind, D.D. Cummins and C. Allen (Editors), Oxford University Press, New York, 1998, http://www.mpib-berlin.mpg.de/dok/full/gg/ggejuevm_/ggejuevm_.html.
G. Gigerenzer and A. Edwards, Simple tools for understanding risks: From innumeracy to insight, Br Med J 327 (2003), 741–744.
H. Jones and J.W. Van Deluyster, Multiple sheet feeding system for electrostatographic printing machines, U.S. Pat. #3,930,725, 1976.
R.V. Joseph and C.F.J. Wu, Failure amplification method: An information maximization approach to categorical response optimization, Technometrics 46(1) (2004), 1–12.
K. Kimseng, M. Hoit, N. Tiwari, and M. Pecht, Physics-of-failure assessment of a cruise control module, Microelectron Reliab 39(10) (1999), 1423–1444.
G.E. Kluppel and R.C. Monroe, Fan blade for an axial flow fan and method of forming same, U.S. Pat. #4,720,244, 1987.
B.L. Koff, Gas turbine technology evolution: A designer’s perspective, AIAA J Propulsion Power 18(14) (2004), 577–595.
A.H. Lefebvre, Gas turbine combustion, Taylor & Francis, Philadelphia, 1999.
S.J. Markowski, R.P. Lohmann, and R.S. Reilly, Vorbix burner: A new approach to gas turbine combustors, ASME J Eng Power 98(1) (1976), 123–129.
R.C. Monroe, Axial flow fans and blades therefore, U.S. Pat. #4,345,877, 1980.
M. Pecht and A. Dasgupta, Physics-of-failure: An approach to reliable product development, J Inst Environ Sci 38 (1995), 30–34.
M. Pecht, A. Dasgupta, D. Barker, and C.T. Leonard, The reliability physics approach to failure prediction modeling, Qual Reliab Eng Int 6 (1990), 267–273.
S.S. Rao, Reliability-based design, McGraw Hill, New York, 1992.
C.V. Sidwell, On the impact of variability and assembly on turbine cooling flow and oxidation life, Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, 2004.
M. Silverberg, Interrupted jet air knife for sheet separator, U.S. Pat. #4,275,877, 1981.
K.K. Stange, Air floatation bottom feeder, U.S. Pat. #4,014,537, 1977.
N.P. Suh, The principles of design, Oxford University Press, New York, 1990.
G. Taguchi, Taguchi on robust technology development, ASME Press, New York, 1993.
D.A. Thomas, K. Ayers, and M. Pecht, The trouble not identified phenomenon in automotive electronics, Microelectron Reliab 42(4–5) (2002), 641–651.
I.A. Ushakov (Editor), Handbook of reliability engineering, Wiley, New York, 1994.
R.M. Washam, Dry low NOX combustion system for utility gas turbine, ASME Paper 83-JPGC-GT-13, ASME, New York, 1983.

Don Clausing received the B.S. degree in mechanical engineering from Iowa State University in 1952.
After working for nine years he again became a full-time student, and received his M.S. (1962) and Ph.D.
(1966) degrees from the California Institute of Technology (Caltech). He worked in industry for a total
of 29 years before serving as a half-time faculty member at MIT from 1986 until 2000. Since about
1975 he has had a role in the major improvements in product development and systems engineering that
have enhanced the competitiveness of many commercial industries. This includes the publication (1994)
of his book Total Quality Development—World-Class Concurrent Engineering. He now has a new book
(2004), co-authored with Victor Fey, Effective Innovation—The Development of Winning Technologies.
Clausing has long been a leader in robust design, a key to reliable systems. During the 1970s he led in the
development of the operating-window method to achieve robustly reliable systems.

Dan Frey earned the B.S. degree in aeronautical engineering from Rensselaer Polytechnic Institute in
1987. After serving as a Naval Officer for 4 years, he earned his M.S. from the University of Colorado in
1993 and Ph.D. from the Massachusetts Institute of Technology in 1997. Since then, he has been a faculty
member conducting research in robust design, statistics, design methodology, and systems engineering.
He currently holds a dual key faculty position at MIT in the Department of Mechanical Engineering and
in the Engineering Systems Division.
