Springer
Applications of Mathematics
Stochastic Models
in Reliability
With 12 Figures
Springer
Terje Aven
Department of Mathematics
University of Oslo
N-0316 Oslo, Norway

Uwe Jensen
Department of Stochastics
University of Ulm
D-89069 Ulm, Germany
Managing Editors
I. Karatzas
Departments of Mathematics and Statistics
Columbia University
New York, NY 10027, USA
M. Yor
CNRS, Laboratoire de Probabilités
Université Pierre et Marie Curie
4, Place Jussieu, Tour 56
F-75252 Paris Cedex 05, France
Mathematics Subject Classification (1991): 62L20, 93E10, 93E23, 65C05, 93E35, 93-02, 90C15
December 1998
Contents
1 Introduction 1
1.1 Lifetime Models . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Complex Systems . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Damage Models . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Different Information Levels . . . . . . . . . . . . . . 5
1.1.4 Simpson’s Paradox . . . . . . . . . . . . . . . . . . . 5
1.1.5 Predictable Lifetime . . . . . . . . . . . . . . . . . . 5
1.1.6 A General Failure Model . . . . . . . . . . . . . . . . 6
1.2 Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.1 Availability Analysis . . . . . . . . . . . . . . . . . . 8
1.2.2 Optimization Models . . . . . . . . . . . . . . . . . . 9
1.3 Reliability Modeling . . . . . . . . . . . . . . . . . . . . . . 10
1.3.1 Nuclear Power Station . . . . . . . . . . . . . . . . . 12
1.3.2 Gas Compression System . . . . . . . . . . . . . . . 14
References 253
Index 266
1
Introduction
This chapter gives an introduction to the topics covered in this book: failure
time models, complex systems, different information levels, maintenance
and optimal replacement. We also include a section on reliability modeling,
where we draw attention to some important factors to be considered in the
modeling process. Two real life examples are presented: a reliability study of
a system in a power plant and an availability analysis of a gas compression
system.
T = inf{t ∈ R+ : Xt ≥ S},
i.e., as the first time the damage hits a given level S. Here S can be a
constant or, more generally, a random variable independent of the damage
process. Some examples of damage processes X = (Xt ) of this kind are
described in the following subsections.
Wiener Process
The damage process is a Wiener process with positive drift starting at 0
and the failure threshold S is a positive constant. The lifetime of the sys-
tem is then known to have an inverse Gaussian distribution. Models of
this kind are especially of interest if one considers different environmental
conditions under which the system is working, as, for example, in so-called
burn-in models. An accelerated aging caused by additional stress or dif-
ferent environmental conditions can be described by a change of time. Let
τ : R+ → R+ be an increasing function. Then Zt = Xτ (t) denotes the ac-
tual observed damage. The time transformation τ drives the speed of the
deterioration. One possible way to express different stress levels in time is a
piecewise linear time transformation: for time points 0 = t0 < t1 < t2 < · · · ,

τ(t) = Σ_{j=0}^{i−1} βj (tj+1 − tj) + βi (t − ti),  t ∈ [ti, ti+1),  βj > 0.
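As an illustration (a sketch of ours, not taken from the book; the parameters μ, σ, and S are arbitrary), the Wiener damage model can be simulated directly: the first grid time at which the simulated damage reaches S approximates the inverse Gaussian lifetime, whose mean is S/μ.

```python
import random

def first_passage_time(mu, sigma, S, dt=1e-3, t_max=50.0):
    """Euler simulation of the damage X_t = mu*t + sigma*B_t (X_0 = 0);
    returns the first grid time with X_t >= S (capped at t_max)."""
    x, t, sqdt = 0.0, 0.0, dt ** 0.5
    while t < t_max:
        x += mu * dt + sigma * sqdt * random.gauss(0.0, 1.0)
        t += dt
        if x >= S:
            return t
    return t_max  # truncation; very rare for the parameters below

random.seed(1)
mu, sigma, S = 1.0, 0.5, 2.0
times = [first_passage_time(mu, sigma, S) for _ in range(2000)]
mean_T = sum(times) / len(times)
print(mean_T)  # the inverse Gaussian lifetime has mean S/mu = 2.0
```

The empirical mean lifetime should be close to S/μ = 2, up to Monte Carlo noise and a small discretization bias.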
This example shows that it makes a great difference whether only the
system lifetime can be observed (aging property: IFR) or additional infor-
mation about the component lifetimes is available (aging property: DFR).
The aging property of the system lifetime of a complex system depends not
only on the joint distribution of the component lifetimes but also, of course,
on the structure function. Instead of a two-component parallel system,
consider a series system where the component lifetimes have the same
distributions as in Proposition 1. Then the failure rate of Tser = T1 ∧ T2
decreases, whereas Tpar = T1 ∨ T2 has an increasing failure rate.
A possible objection to this model is that the paths of X are not monotone. As a partial answer, one can
respond that maintenance actions also lead to improvements and thus X
could be decreasing at some time points. A more severe criticism from
the point of view of the available information is the following. It is often
assumed that in this model the paths of the damage process can be observed
continuously. But this would make the lifetime T a predictable random
time (a precise definition follows in Chapter 3), i.e., there is an increasing
sequence τn , n ∈ N, of random time points that announces the failure. In
this model one could choose τn = inf{t ∈ R+ : Xt ≥ S − 1/n}, and take
n large enough and stop operating the system at τn “just” before failure,
to carry out some preventive maintenance, cf. Figure 1.1. This does not
usually apply in practical situations. This example shows that one has to
distinguish carefully between the different information levels for the model
formulation (complete information) and for the actual observation (partial
information).
FIGURE 1.1. Predictable Stopping Time: a path of Xt crosses the level S − 1/n at τn shortly before hitting S at the failure time T.
needs knowledge about the random component lifetimes Ti . Now the failure
rate λt is a stochastic process and the information about the status of the
components at time t is represented by a filtration. The model allows for
changing the information level and the ordinary failure rate can be derived
from λt on the lowest level possible, namely no information about the
component lifetimes.
The modern theory of stochastic processes allows for the development of
a general failure model that incorporates the above aspects: time dynamics
and different information levels. Chapter 3 presents this model. The failure
rate process λt is one of the basic parameters of this set-up. If we consider
the lifetime T, under some mild conditions we obtain the failure rate process
on {T > t} as the limit of conditional expectations with respect to the pre-
t-history (σ-algebra) Ft ,
λt = lim_{h→0+} (1/h) P(T ≤ t + h | Ft),
extending the classical failure rate λ(t) of the system. To apply the set-up,
focus should be placed on the failure rate process (λt ). When this process
has been determined, the model has basically been established. Using the
above interpretation of the failure rate process, it is in most cases rather
straightforward to determine its form. The formal proofs are, however, often
quite difficult.
If we go one step further and consider a model in which the system can
be repaired or replaced at failure, then attention is paid to the number Nt
of system failures in [0, t]. Given certain conditions, the counting process
N = (Nt), t ∈ R+, has an "intensity" that, as an extension of the failure
rate process, can be derived as the limit of conditional expectations

λt = lim_{h→0+} (1/h) E[Nt+h − Nt | Ft].
1.2 Maintenance
To prolong the lifetime, to increase the availability, and to reduce the prob-
ability of an unpredictable failure, various types of maintenance actions are
being implemented. The most important maintenance actions include:
candidate for an optimal replacement time with respect to some cost cri-
terion, at least if c is “small” compared to k. How can we prove that this
random time T1 ∧ T2 is optimal among all possible replacement times? How
can we characterize the set of all possible replacement times?
These questions can only be answered in the framework of martingale
theory and are addressed in Chapter 5.
One can imagine that thousands of models (and papers) can be created
by combining the different types of lifetime models with different mainte-
nance actions. The general optimization framework formulated in Chapter
5 incorporates a number of such models. Here the emphasis is placed on
determining the optimal replacement time of a deteriorating system. The
framework is based on the failure model of Chapter 3, which means that
rather complex and very different situations can be studied. Special cases
include monotone systems, (minimal) repair models, and damage processes,
with different information levels.
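The flavor of such optimization problems can be conveyed by the classical age replacement criterion (a standard elementary cost rate, not the book's general martingale framework; the Weibull lifetime and the cost values below are arbitrary illustrative choices): replace the system at age t or at failure, whichever comes first, at preventive cost c and failure cost k, and minimize the long-run expected cost per unit time.

```python
import math

def cost_rate(t, c, k, shape=2.0, scale=1.0, n=2000):
    """Long-run expected cost per unit time for age replacement at age t:
    C(t) = (c*Fbar(t) + k*F(t)) / integral_0^t Fbar(u) du,
    where c is the preventive and k the failure replacement cost, and the
    lifetime is Weibull(shape, scale) with survival function Fbar."""
    Fbar = lambda u: math.exp(-((u / scale) ** shape))
    h = t / n  # trapezoidal rule for the expected cycle length
    integral = 0.5 * h * (Fbar(0.0) + Fbar(t)) + h * sum(Fbar(i * h) for i in range(1, n))
    return (c * Fbar(t) + k * (1.0 - Fbar(t))) / integral

c, k = 1.0, 10.0  # preventive replacement much cheaper than failure
grid = [0.05 * i for i in range(1, 101)]   # candidate replacement ages
costs = [cost_rate(t, c, k) for t in grid]
t_star = grid[costs.index(min(costs))]
print(t_star)
```

Because the Weibull failure rate is increasing and c is small compared to k, a finite optimal replacement age exists; for an exponential lifetime (constant failure rate) no finite optimum would exist.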
It is recommended that one start with a simple model describing the main
features of the system.
The following application examples give further insight into the situa-
tions that can be modeled using the theory presented in this book.
[Figure: reliability block diagram with valves V1, V2, V3 and blocks LP, RHR, SWS, CCWS, HP]
In both cases the failure time distributions and the failure rates have to
be estimated. One essential condition for the derivation of the above for-
mulae is that all components have stochastically independent failure times
or lifetimes. In some cases such an independence condition does not ap-
ply. In Chapter 3 a general theory is developed that also includes the case
of complex systems with dependent component lifetimes. The framework
presented covers different information levels, which allow updating of reli-
ability predictions using observations of the condition of the components
of the system, for example.
Normal production is 100%. For case i) this means that the train is
operating normally and a failure stops production completely. For case ii)
both trains are operating normally. If one train fails, production is reduced
to 50%. If both trains are down, production is 0.
Each train comprises compressor–turbine, cooler, and scrubber. A fail-
ure of one of these “components” results in the shutdown of the train.
Thus a train is represented by a series structure of the three components
compressor–turbine, cooler, and scrubber.
The following failure and repair time data were assumed:

Component            Failures per year   Mean repair time (hours)
Compressor–turbine   10                  12
Cooler               2                   50
Scrubber             1                   20
Some dependencies were present, but by looking into the failure causes and
the way the components were defined, the assumption of independence was
not considered to be a serious weakness undermining the results of the
analysis.
To determine the repair time distribution, expert opinions were used. The
repair times, which also include fault diagnosis, repair preparation, test and
restart, were assessed for different failure modes. As for the uptimes, it was
assumed that no major changes over time take place concerning component
design, operational procedures, etc.
Uncertainty related to the input quantities used was not considered. In-
stead, sensitivity studies were performed with the purpose of identifying
how sensitive the results were with respect to variations in input parame-
ters.
Of the results obtained, we include the following examples:
• The gas train is down 2.7% of the time in the long run.
• For alternative i), the average system failure rate, i.e., the average
number of system failures per year, equals 13. For alternative ii) a
distinction is made between failures resulting in production below 100%
and failures resulting in production below 50%. The average system
failure rates for these levels are approximately 26 and 0.7, respectively.
Alternative ii) has a probability of about 50% of having one or more
complete shutdowns during a year.
• The mean lost production equals 2.7% for both alternatives. The
probability that the lost production during one year is more than 4%
of demand is approximately equal to 0.16 for alternative i) and 0.08
for alternative ii).
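A back-of-envelope check of ours (not part of the study, which worked at the failure mode level) shows that the assumed data are consistent with the reported long-run downtime, reading the two data columns as failures per year and mean repair time in hours:

```python
# Assumed data from the study: (failures per year, mean repair time in hours)
data = {
    "compressor-turbine": (10, 12),
    "cooler": (2, 50),
    "scrubber": (1, 20),
}

downtime_hours = sum(rate * mttr for rate, mttr in data.values())
unavailability = downtime_hours / 8760.0  # fraction of the year the train is down
print(downtime_hours, f"{unavailability:.1%}")
```

The 240 expected downtime hours per year give 240/8760 ≈ 2.7%, matching the reported long-run result for the gas train.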
An early survey of maintenance and replacement models is that of Pierskalla
and Voelker [140], which appeared with 259 references in 1976; it was
updated by Sherif and Smith [155] with an extensive bibliography of 524
references in 1981, followed by Valdez-Flores and Feldman [170] with 129
references in 1989. Bergman's review [41] reflects the author's experience
in industry and emphasizes the usefulness of reliability methods in appli-
cations. Gertsbakh’s paper [83] reviews asymptotic methods in reliability
and especially investigates under what conditions the lifetime of a com-
plex system with many components is approximately exponentially dis-
tributed. Natvig [136] gives a concise overview of importance measures for
monotone systems. The surveys of Arjas [4] and Koch [119] consider reliability
models using more advanced mathematical tools such as marked point
processes and martingales. A guided tour for the non-expert through point
process and intensity-based models in reliability is presented in the arti-
cle of Hokstad [98]. The book of Thompson [167] gives a more elementary
presentation of point processes in reliability. Other reliability books that
we would like to draw attention to are Aven [16], Barlow and Proschan
[33, 34], Beichelt and Franken [38], Bergman and Klefsjö [42], Gaede [79],
Gertsbakh [82], Høyland and Rausand [99], and Kovalenko, Kuznetsov, and
Pegg [121]. Some of the models addressed in this introduction are treated
in the overview of Jensen [105] where related references can also be found.
2
Basic Reliability Theory
[Figure: series structure with components 1, 2, . . . , n between endpoints a and b]
Structural properties
We consider a system comprising n components, which are numbered con-
secutively from 1 to n. In this section we distinguish between two states: a
functioning state and a failure state. This dichotomy applies to the system
as well as to each component. To indicate the state of the ith component,
we assign a binary variable xi to component i:
xi = 1 if component i is in the functioning state; xi = 0 if component i is in the failure state.
(The term binary variable refers to a variable taking on the values 0 or 1.)
Similarly, the binary variable Φ indicates the state of the system:
Φ = 1 if the system is in the functioning state; Φ = 0 if the system is in the failure state.
We assume that

Φ = Φ(x),

i.e., the state of the system is a function of the states of the components,
x = (x1, x2, . . . , xn).

[Figures: a parallel structure and further example structures of components 1, 2, 3]
Condition 1 says that the system cannot deteriorate (that is, change from
the functioning state to the failed state) by improving the performance
of a component (that is, replacing a failed component by a functioning
component). Condition 2 says that if all the components are in the failure
state, then the system is in the failure state, and if all the components are
in the functioning state, then the system is in the functioning state.
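For a small system, Conditions 1 and 2 can be checked mechanically by enumerating all state vectors. The following is a sketch of ours (the helper name and the 2-out-of-3 example structure are our choices, not from the book):

```python
from itertools import product

def is_monotone(phi, n):
    """Check that a binary structure function phi is nondecreasing in each
    argument and satisfies phi(0,...,0) = 0 and phi(1,...,1) = 1."""
    if phi((0,) * n) != 0 or phi((1,) * n) != 1:
        return False
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if phi(y) < phi(x):  # improving component i must not hurt the system
                    return False
    return True

two_out_of_three = lambda x: 1 if sum(x) >= 2 else 0
print(is_monotone(two_out_of_three, 3))
```

The check is exponential in n, so it is only a verification aid for small examples.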
All the systems we consider are monotone. In the reliability literature,
much attention has been devoted to coherent systems, which form a subclass
of monotone systems. Before we define a coherent system we need some
notation.
The vector (·i, x) denotes a state vector where the state of the ith component
is equal to 1 or 0; (1i, x) denotes a state vector where the state of
the ith component is equal to 1, and (0i, x) denotes a state vector where
the state of the ith component is equal to 0; the state of component j, j ≠ i,
equals xj. If we want to specify the states of some components, say i ∈ J
(J ⊂ {1, 2, . . . , n}), we use the notation (·J, x). For example, (0J, x) denotes
the state vector where the states of the components in J are all 0
and the state of component i, i ∉ J, equals xi.
[Figure: structure of five components, with components 1 and 4 on the upper path, 2 and 5 on the lower path, and component 3 connecting them]
pi = P (Xi = 1)
qi = P (Xi = 0)
h = h(p) = P (Φ(X) = 1) (2.2)
g = g(q) = P (Φ(X) = 0),
For a series system of independent components,

h = P(Φ(X) = 1) = P(∏_{i=1}^{n} Xi = 1)
  = P(X1 = 1, X2 = 1, . . . , Xn = 1)
  = ∏_{i=1}^{n} P(Xi = 1) = ∏_{i=1}^{n} pi. (2.3)

For a parallel system,

h = 1 − ∏_{i=1}^{n} (1 − pi) = ∐_{i=1}^{n} pi. (2.4)
and

g = P(∪_{j=1}^{k} Aj).

Furthermore, let

w1 = Σ_{j=1}^{k} P(Aj)
w2 = Σ_{i<j} P(Ai Aj)
⋮
wr = Σ_{1≤i1<i2<···<ir≤k} P(∩_{j=1}^{r} Aij).
g = w1 − w2 + w3 − · · · + (−1)k+1 wk (2.5)
and for r ≤ k
g ≤ w1 − w2 + w3 − · · · + wr , r odd
g ≥ w1 − w2 + w3 − · · · − wr , r even.
Although in general it is not true that the upper bounds decrease and the
lower bounds increase, in practice it may be necessary to calculate only a
few wr terms to obtain a close approximation. If the component unrelia-
bilities qi are small, i.e., the reliabilities pi are large, then the w2 term will
usually be negligible compared to w1 , such that g ≈ w1 . Note that w1 is
an upper bound for g. By using w1 as an estimate for the system unrelia-
bility, we will overestimate the system unreliability. In most cases, such an
underestimation of reliability is preferable compared to an overestimation
of reliability.
With a large number of minimal cut sets, the exact calculation using
(2.5) will be extensive. The number of terms in the sum in wr equals the
binomial coefficient (k choose r). Thus the total number of terms is

Σ_{r=1}^{k} (k choose r) = (1 + 1)^k − 1 = 2^k − 1.
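A sketch of formula (2.5) and the alternating bounds for independent components, using the minimal cut sets and component unreliabilities of the numerical example later in this chapter (q1 = q2 = q3 = 0.02, q4 = q5 = 0.01); the function and variable names are ours:

```python
from itertools import combinations

# Minimal cut sets K_j and component unreliabilities q_i
cuts = [{1, 5}, {4, 5}, {1, 2, 3}, {2, 3, 4}]
q = {1: 0.02, 2: 0.02, 3: 0.02, 4: 0.01, 5: 0.01}

def w(r):
    """w_r: sum over r-subsets of cut sets of P(all components in their union fail),
    valid for independent components."""
    total = 0.0
    for subset in combinations(cuts, r):
        union = set().union(*subset)
        prob = 1.0
        for i in union:
            prob *= q[i]
        total += prob
    return total

k = len(cuts)
g_exact = sum((-1) ** (r + 1) * w(r) for r in range(1, k + 1))  # formula (2.5)
bounds = [sum((-1) ** (j + 1) * w(j) for j in range(1, r + 1)) for r in range(1, k + 1)]
print(g_exact, bounds)  # odd partial sums are upper bounds, even ones lower bounds
```

Here w1 ≈ 3.12 · 10⁻⁴ already dominates w2 ≈ 2.2 · 10⁻⁶, illustrating why g ≈ w1 for small component unreliabilities.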
w2 = q1 q4 q5 + q1 q2 q3 q5 + q1 q2 q3 q4 q5 + q1 q2 q3 q4 q5 + q2 q3 q4 q5 + q1 q2 q3 q4
= 2.2 · 10⁻⁶.
There also exist other bounds and approximations for the system reliability.
For example, it can be shown that

g ≤ 1 − ∏_{j=1}^{k} (1 − ∏_{i∈Kj} qi) = 1 − ∏_{j=1}^{k} ∐_{i∈Kj} pi.

The minimal cut sets also give the representation

Φ(X) = ∏_{j=1}^{k} ∐_{i∈Kj} Xi,
and by multiplying out the right-hand side of this expression, we can find
an exact expression of h (or g). As an illustration consider a 2-out-of-3
system. Then

Φ = (X1 ∐ X2)(X1 ∐ X3)(X2 ∐ X3)

and, multiplying out the right-hand side,

Φ = X1 X2 + X1 X3 + X2 X3 − 2 X1 X2 X3.
State Enumeration Method. Of the direct methods that do not use the
minimal cut (path) sets, the state enumeration method is conceptually the
simplest. With this method reliability is calculated using
h = EΦ(X) = Σ_x Φ(x) P(X = x) = Σ_{x: Φ(x)=1} ∏_{i=1}^{n} pi^{xi} (1 − pi)^{1−xi}.
This method, however, is not suitable for larger systems, since the number
of terms in the sum can be extremely large, up to 2^n − 1.
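The state enumeration formula is straightforward to implement for small n (a sketch of ours; the function name is our choice). For the 2-out-of-3 system it reproduces the multiplied-out expression given above:

```python
from itertools import product
from math import prod

def h_enumeration(phi, p):
    """System reliability h = E[phi(X)] by enumerating all 2^n component states."""
    n = len(p)
    return sum(
        prod(pi if xi else 1.0 - pi for pi, xi in zip(p, x))
        for x in product((0, 1), repeat=n)
        if phi(x)
    )

phi = lambda x: 1 if sum(x) >= 2 else 0  # 2-out-of-3 structure
p = (0.9, 0.8, 0.7)
p1, p2, p3 = p
h_closed = p1*p2 + p1*p3 + p2*p3 - 2.0*p1*p2*p3  # multiplied-out exact expression
print(h_enumeration(phi, p), h_closed)
```

Both computations agree, as they must; the enumeration simply sums the probabilities of all functioning states.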
where h(xi, p) equals the reliability of the system given that the state of
component i is xi. Formula (2.6) follows from the law of total probability.
The decomposition is repeated until the system comprises only series–parallel
structures. To illustrate the method we give an example.
[Figures: the bridge structure with components 1–5 and pivotal component 3; the structure given x3 = 1 (components 1, 2 in parallel, in series with components 4, 5 in parallel); the structure given x3 = 0 (series pairs 1–4 and 2–5 in parallel)]
These two structures are both of series–parallel form, and we see that

h(13, p) = (p1 + p2 − p1 p2)(p4 + p5 − p4 p5)
h(03, p) = p1 p4 + p2 p5 − p1 p4 p2 p5.
Thus a formula for the exact computation of h(p) is established. Note that
it was sufficient to perform only one pivotal decomposition in this case. If
the structure given x3 = 1 had not been in a series–parallel form, we would
have had to perform another pivotal decomposition, and so on.
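The decomposition can be verified numerically. The sketch below is ours: the bridge structure and the component reliabilities are assumptions chosen to match the five-component example, and the result of the pivotal decomposition on component 3 is cross-checked against full state enumeration.

```python
from itertools import product
from math import prod

def bridge(x):
    """Bridge structure: series pairs 1-4 and 2-5, with component 3 as the bridge."""
    x1, x2, x3, x4, x5 = x
    ok = (x1 and x4) or (x2 and x5) or (x1 and x3 and x5) or (x2 and x3 and x4)
    return 1 if ok else 0

def h_enum(phi, p):
    """System reliability by enumerating all component states."""
    return sum(prod(pi if xi else 1.0 - pi for pi, xi in zip(p, x))
               for x in product((0, 1), repeat=len(p)) if phi(x))

p = (0.95, 0.9, 0.99, 0.9, 0.95)
p1, p2, p3, p4, p5 = p
h_3_up = (p1 + p2 - p1*p2) * (p4 + p5 - p4*p5)  # parallel pairs in series
h_3_down = p1*p4 + p2*p5 - p1*p4*p2*p5          # series pairs in parallel
h_pivot = p3 * h_3_up + (1.0 - p3) * h_3_down   # pivotal decomposition on 3
print(h_pivot, h_enum(bridge, p))
```

The two values coincide, confirming that one pivotal decomposition suffices for this structure.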
2.1 Complex Systems 29
Time Dynamics
The above theory can be applied to different situations, covering both re-
pairable and nonrepairable systems. As an example, consider a monotone
system in a time interval [0, t0 ], and assume that the components of the
system are “new” at time t = 0 and that a failed component stays in the
failure state for the rest of the time interval. Thus the component is not
repaired or replaced. This situation, for example, can describe a system
with component failure states that can only be discovered by testing or
inspection. We assume that the lifetime of component i is determined by
a lifetime distribution Fi (t) having failure rate function λi (t). To calculate
system reliability at a fixed point in time, i.e., the reliability function at
this point, we can proceed as above with qi = Fi (t) and pi = F̄i (t). Thus,
for a series system the reliability at time t takes the form
h = ∏_{i=1}^{n} F̄i(t). (2.8)
But F̄i(t) can be expressed by means of the failure rate λi(t):

F̄i(t) = exp{−∫_0^t λi(u) du}. (2.9)
By inserting (2.9) into formula (2.8) we obtain

h = exp{−∫_0^t [Σ_{i=1}^{n} λi(u)] du}. (2.10)
From (2.10) we can conclude that the failure rate of a series structure of
independent components equals the sum of the failure rates of the com-
ponents of the structure. In particular this means that if the components
have constant failure rates λi, i = 1, 2, . . . , n, then the series structure has
constant failure rate Σ_{i=1}^{n} λi.
For a parallel structure we do not have a similar result. With constant
failure rates of the components, the system will have a time-dependent
failure rate; cf. Example 2, p. 6.
g = {1 − p1 p4 }{1 − p5 (p2 + p3 − p2 p3 )} ≈ w1
w1 = q1 q5 + q4 q5 + q1 q2 q3 + q2 q3 q4
   = 0.02 · 0.01 + 0.01 · 0.01 + 0.02 · 0.02 · 0.02 + 0.02 · 0.02 · 0.01
   = 3 · 10⁻⁴.
where h(p) is the reliability of the system and h(1i , p) is the reliability as-
suming that component i is in the best state 1. The measure IiA expresses
the system reliability improvement potential of the component, in other
words, the unreliability that is caused by imperfect performance of compo-
nent i. This measure can be used for all types of reliability definitions, and
it can be used for repairable or nonrepairable systems.
For a highly reliable monotone system the measure IiA is equivalent to
the well-known Vesely–Fussell importance measure [95]. In fact, in this case
IiA is approximately equal to the sum of the unreliabilities of the minimal
cut sets that include component i, i.e.,
IiA ≈ Σ_{j: i∈Kj} ∏_{l∈Kj} ql. (2.11)
IiB = ∂h/∂pi.
Thus Birnbaum’s measure equals the partial derivative of the system reli-
ability with respect to pi . The approach is well known from classical sensi-
tivity analyses. We see that if IiB is large, a small change in the reliability
of component i will give a relatively large change in system reliability.
Birnbaum’s measure might be appropriate, for example, in the operation
phase where possible improvement actions are related to operation and
maintenance parameters. Before looking closer into specific improvement
actions of the components, it will be informative to measure the sensitivity
of the system reliability with respect to small changes in the reliability of
the components.
To compute IiB the following formula is often used:

IiB = h(1i, p) − h(0i, p),

which follows from the pivotal decomposition h(p) = pi h(1i, p) + (1 − pi) h(0i, p).
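A sketch of ours computing the classical difference formula IiB = h(1i, p) − h(0i, p) by state enumeration (the 2-out-of-3 example and all names are our choices); since h is linear in each pi, this difference equals the partial derivative ∂h/∂pi.

```python
from itertools import product
from math import prod

def h(phi, p):
    """System reliability by state enumeration."""
    return sum(prod(pi if xi else 1.0 - pi for pi, xi in zip(p, x))
               for x in product((0, 1), repeat=len(p)) if phi(x))

def birnbaum(phi, p, i):
    """I_i^B = h(1_i, p) - h(0_i, p), equal to dh/dp_i by linearity of h in p_i."""
    p_up = p[:i] + (1.0,) + p[i + 1:]
    p_down = p[:i] + (0.0,) + p[i + 1:]
    return h(phi, p_up) - h(phi, p_down)

phi = lambda x: 1 if sum(x) >= 2 else 0  # 2-out-of-3 system
p = (0.9, 0.8, 0.7)
# closed form for component 1: dh/dp1 = p2 + p3 - 2*p2*p3
print(birnbaum(phi, p, 0), p[1] + p[2] - 2.0 * p[1] * p[2])
```

For the 2-out-of-3 system the enumeration reproduces the closed-form partial derivative exactly.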
We see that for this example the Birnbaum measure gives the same ranking
of the components as the measure IiA . However, this is not true in general.
Dependent Components
One of the most difficult tasks in reliability engineering is to analyze de-
pendent components (often referred to as common mode failures). It is
xi0 , xi1 , xi2 , . . . , xiMi (xi0 < xi1 < xi2 < · · · < xiMi ).
The set comprising these states is denoted Si . The states xij represent, for
example, different levels of performance, from the worst, xi0 , to the best,
xiMi . The states xi0 , xi1 , . . . , xi,Mi −1 are referred to as the failure states of
the components.
Similarly, Φ = Φ(x) denotes the state (level) of the system. The various
values Φ can take are denoted
[Figure: flow network from a to b, with components 1 and 2 in parallel followed by component 3]
System level 2
Minimal cut vectors: (0, 1, 2), (1, 0, 2), and (1, 1, 1)
Minimal path vectors : (1, 1, 2)
System level 1
Minimal cut vectors: (0, 0, 2) and (1, 1, 0)
Minimal path vectors : (0, 1, 1) and (1, 0, 1).
We call hj the reliability of the system at system level j. For the flow network
example above, a represents the expected throughput (flow) relative
to the maximum throughput (flow) level.
The problem is to compute hj for one or more values of j, and a, based
on the probabilities pij . We assume that the random variables Xi are in-
dependent.
where (z1r, z2r, . . . , znr) represents the rth cut vector for level j and the
positive error term is bounded by

Σ_{r<l} ∏_{i=1}^{n} P(Xi ≤ min{zir, zil}).
Discussion
The traditional reliability theory based on a binary approach has recently
been generalized by allowing components and the system to have an arbitrary
2.2 Basic Notions of Aging 37
shows that F is uniquely determined by the failure rate. One notion of ag-
ing could be an increasing failure rate (IFR). However, this IFR property is
in some cases too strong and other intuitive notions of aging have been sug-
gested. Among them are the increasing failure rate average (IFRA) prop-
erty and the notions of new better than used (NBU) and new better than
for all t such that Λ(t) < ∞, where ∆Λ(s) = Λ(s) − Λ(s−) is the jump
height at time s and Λc(t) = Λ(t) − Σ_{s≤t} ∆Λ(s) is the continuous part of
Λ (cf. [2], p. 91 or [126], p. 436). In the case that F is continuous, we obtain
the simple exponential formula F̄(t) = exp{−Λ(t)} or Λ(t) = − ln F̄(t).
Examples can be constructed which show that none of the above impli-
cations can be reversed.
where p^α = (p1^α, . . . , pn^α).
Proof. We prove the result for binary structures that are nondecreasing
in each argument (nondecreasing structures) but do not necessarily satisfy
Φ(0) = 0 and Φ(1) = 1. We use induction on n, the number of components
in the system. For n = 1 the assertion is obviously true. The induction step
is carried out by means of the pivotal decomposition formula:

h(p^α) = pn^α h(1n, p^α) + (1 − pn^α) h(0n, p^α).
F̄(αt) = h(F̄1(αt), . . . , F̄n(αt)) ≥ h(F̄1^α(t), . . . , F̄n^α(t))
       ≥ h^α(F̄1(t), . . . , F̄n(t)) = F̄^α(t)
for 0 < α ≤ 1. For α = 0 this inequality holds true since F (0) = 0. This
proves the IFRA property of F.
Proof. The inequality for the survival probability follows from Lemma 6
with α = 1/µ, where in the degenerate case t∗ = µ we have tµ = t∗ = µ.
It remains to show tµ ≥ µ. To this end we first confine ourselves to the
continuous case and assume that F has no jump at t∗ . Then F (T ) has a
uniform distribution on [0, 1] and we obtain E[ln F̄ (T )] = −1. Now
F̄(t + x)/F̄(t) = exp{−(Λ(t + x) − Λ(t))}
1 = E[− ln F̄ (T )] ≥ − ln F̄ (µ).
A lot of other bounds for the survival probability can be set up under
various conditions (see the references listed in the Bibliographic Notes).
Next we want to give one example of how such bounds can be carried over
to monotone systems. As an immediate consequence of the last lemma we
obtain the following corollary.
Actually the inequality holds true for t < min{tµ1 , ..., tµn }. The idea of
this inequality is to give a bound on the reliability of the system at time t
only based on h and µi and the knowledge that the Fi are of IFR type. If
A general set-up should include all basic failure time models, should take
into account the time-dynamic development, and should allow for different
information and observation levels. Thus, one is led in a natural way to the
theory of stochastic processes in continuous time, including (semi-) martin-
gale theory, in the spirit of Arjas [3], [4] and Koch [119]. As was pointed out
in Chapter 1, this theory is a powerful tool in reliability analysis. It should
be stressed, however, that the purpose of this chapter is to present and
introduce ideas rather than to give a far reaching excursion into the theory
of stochastic processes. So the mathematical technicalities are kept to the
minimum level necessary to develop the tools to be used. Also, a number of
remarks and examples are included to illustrate the theory. Yet, to benefit
from reading this chapter a solid basis in stochastics is required. Section
3.1 summarizes the mathematics needed. For a more comprehensive and
in-depth presentation of the mathematical basis, we refer to Appendix A
and to monographs such as by Brémaud [52], Dellacherie and Meyer [67],
[68], Kallenberg [112], or Rogers and Williams [143].
ft = lim_{h→0+} D(t, h)
C2 For all t ∈ R+ , (hD(t, h)), h ∈ R+ , has P -a.s. paths, which are abso-
lutely continuous.
The following theorem shows that these conditions are sufficient for a
smooth semimartingale representation.
3.1 Notation and Fundamentals 51
Proof. We have to show that with (ft) from condition C1 the right-continuous
process Mt = Zt − Z0 − ∫_0^t fs ds is an F-martingale, i.e., that for
all A ∈ Ft and s ≥ t, s, t ∈ R+, E[IA Ms] = E[IA Mt], where IA denotes the
indicator variable. This is equivalent to

E[IA(Ms − Mt)] = ∫_A (Zs − Zt − ∫_t^s fu du) dP = 0.
where the second equality follows from Fubini's theorem. Then (3.2) and
(3.3) together yield

E[IA(Zs − Zt)] = ∫_t^s E[IA fu] du = E[IA ∫_t^s fu du],
which is, for each realization ω, a right-continuous step function with jumps
of magnitude 1 and N0 (ω) = 0. Nt counts the number of time points Tn ,
which occur up to time t. Since (Nt ), t ∈ R+ , and (Tn ), n ∈ N, obviously
carry the same information, the associated counting process is sometimes
also called a point process.
A slight generalization is the notion of a multivariate point process. Let
(Tn ), n ∈ N, be a point process as before and (Vn ), n ∈ N, a sequence of
random variables with values in a finite set {a1 , ..., am }. Then the sequence
of pairs (Tn , Vn ), n ∈ N, is called a multivariate point process and the
associated m-variate counting process Nt = (Nt (1), ..., Nt (m)) is defined
by
Nt(i) = Σ_{k≥1} I(Tk ≤ t) I(Vk = ai), i ∈ {1, . . . , m}.
which gives
E[IB (Ns − As )] = E[IB (Nt − At )].
Hence, N − A is a martingale.
Conversely, if N − A is a martingale, then A is integrable and we obtain
(3.5). In the general case, (3.4) can be established using the monotone class
theorem.
λt dt = E[dNt |Ft− ]
Ft = σ(Ns , Ys , 0 ≤ s ≤ t).
Markov Processes
The question whether Markov processes admit semimartingale representa-
tions can generally be answered in the affirmative: (most) Markov processes
and bounded functions of such processes have an SSM representation.
Let (Xt ), t ∈ R+ , be a right-continuous homogeneous Markov process on
(Ω, F, P x ) with respect to the (internal) filtration Ft = σ(Xs , 0 ≤ s ≤ t)
with values in a measurable space (S, B(S)). For applications we will often
confine ourselves to S = R with its Borel σ-field B. Here P x , x ∈ S,
denotes the probability measure on the set of paths, which start in X0 = x:
P x (X0 = x) = 1.
Let B denote the set of bounded, measurable functions on S with values in
R and let E x denote expectation with respect to P x . Then the infinitesimal
generator A is defined as follows: If for f ∈ B the limit
lim_{h→0+} (1/h) (E^x f(Xh) − f(x)) = g(x)
exists for all x ∈ S with g ∈ B, then we set Af = g and say that f belongs
to the domain D(A) of the infinitesimal generator A. It is known that if
f ∈ D(A), then
Mt^f = f(Xt) − f(X0) − ∫_0^t Af(Xs) ds
defines a martingale (cf., e.g., [112], p. 328). This shows that a function
Zt = f (Xt ) of a homogeneous Markov process has an SSM representation
if f ∈ D(A).
Xt = Σ_{n=1}^{Nt} Yn
if E^x τ < ∞ and g ∈ D(A) (see Dynkin [74], p. 133). This formula can
be applied as in the following example.
For some R > 0 and |x| < R we consider the stopping time σ = inf{t ∈
R+ : |Bt| ≥ R} with respect to the internal filtration, which is the first exit
time of the ball KR = {y ∈ Rk : |y| < R}. By means of the Dynkin formula
we can determine the expectation E^x σ in the following way. Let us assume
E^x σ < ∞ and choose g(x) = |x|². Dynkin's formula then yields

E^x g(Bσ) = R² = |x|² + E^x [∫_0^σ (1/2) 2k ds] = |x|² + k E^x σ,

so that E^x σ = (R² − |x|²)/k.
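A Monte Carlo sketch of ours (Euler discretization, so only approximate) illustrates this computation: for k = 2 dimensions, R = 1, and start x = 0, the formula gives E^x σ = (R² − |x|²)/k = 0.5.

```python
import random

def exit_time(x0, R, dt=1e-3, t_max=20.0):
    """Euler walk of multidimensional Brownian motion started at x0,
    run until |B_t| >= R (capped at t_max)."""
    x = list(x0)
    t, sqdt = 0.0, dt ** 0.5
    while sum(c * c for c in x) < R * R and t < t_max:
        x = [c + sqdt * random.gauss(0.0, 1.0) for c in x]
        t += dt
    return t

random.seed(2)
R, x0 = 1.0, (0.0, 0.0)  # k = 2 dimensions
mean_sigma = sum(exit_time(x0, R) for _ in range(1500)) / 1500
print(mean_sigma)  # Dynkin's formula gives (R**2 - |x0|**2) / k = 0.5
```

The empirical mean exit time should be close to 0.5, up to Monte Carlo noise and a small positive discretization bias (excursions beyond the ball between grid points are missed).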
Random Stopping
One example is the stopping of a process Z, i.e., the transformation from
Z = (Zt ) to the process Z ζ = (Zt∧ζ ), where ζ is some stopping time. If
Z = (f, M ) is an F-SSM and ζ is an F-stopping time, then Z ζ is again an
F-SSM with representation
Zt^ζ = Z0 + ∫_0^t I(ζ > s) fs ds + Mt∧ζ, t ∈ R+.
A Product Rule
A second example of a transformation is the product of two SSMs. To see
under which conditions such a product of two SSMs again forms an SSM,
some further notations and definitions are required, which are presented in
Appendix A. Here we only give the general result. For the conditions and
a detailed proof we refer to Appendix A.6, Theorem 108, p. 239.
Let Z = (f, M) and Y = (g, N) be F-SSMs with M, N ∈ M_0^2 and
MN ∈ M_0. Then, under suitable integrability conditions, ZY is an F-SSM
with representation
Zt Yt = Z0 Y0 + ∫_0^t (Ys fs + Zs gs) ds + Rt,
Remark 5 (i) If Z = (f, M ) and Y = (g, N ) are two SSMs and f and
g are considered as “derivatives,” then Y f + Zg is the “derivative” of the
product ZY in accordance with the ordinary product rule.
(ii) Martingales M, N , for which M N is a martingale are called orthog-
onal. This property can be interpreted in the sense that the increments of
the martingales are “conditionally uncorrelated,” i.e.,
E[(Mt − Ms )(Nt − Ns )|Fs ] = 0
for all 0 ≤ s ≤ t.
60 3. Stochastic Failure Models
A Change of Filtration
Another transformation is a certain change of the filtration, which allows
the observation of a stochastic process on different information levels.
is an A-SSM, where
(i) Ẑ is A-adapted with a.s. right-continuous paths with left-hand limits
and Ẑt = E[Zt |At ] for all t ∈ R+ ;
(ii) fˆ is A-progressively measurable with fˆt = E[ft |At ] for almost all
t ∈ R+ (Lebesgue measure);
(iii) M̄ is an A-martingale.
If in addition Z0, ∫_0^∞ |fs| ds ∈ L^2 and M ∈ M_0^2, then Ẑ0, ∫_0^∞ |f̂s| ds ∈ L^2
and M̄ ∈ M_0^2.
In Section 5.2.1, Theorem 71, p. 175, conditions are given under which
the stopping problem for an SSM Z can be solved. If these conditions
apply to Ẑ, then we can solve this optimal stopping problem on the A-level
according to Theorem 71. Could the stopping problem be solved on the
F-level, then we get a bound for the stopping value on the A-level in view
of the inequality
The general lifetime model is then defined by the filtration F and the cor-
responding F-SSM representation of the indicator process.
is just the well-known relation between the standard failure rate and the
distribution function. This shows that if the hazard rate process λ is de-
terministic, it coincides with the ordinary failure rate.
T = inf{t ∈ R+ : Φ(Xt ) = 0}
for all nonnegative predictable processes C. Since (λt (J)) are the F-failure
rate processes corresponding to TJ , we have for all J ⊂ {1, ..., n}
E ∫_0^∞ C_s(J) dV_s(J) = E ∫_0^∞ C_s(J) I(T_J > s) λ_s(J) ds
and therefore
Σ_{J⊂{1,...,n}} E ∫_0^∞ C_s(J) dV_s(J) = Σ_{J⊂{1,...,n}} E ∫_0^∞ C_s(J) I(T_J > s) λ_s(J) ds    (3.11)
holds true for all nonnegative predictable processes (Ct (J)). If we especially
choose for some nonnegative predictable process C
C_t(J) = C_t f_{t−},
Remark 6 (i) The proof follows the lines of Arjas (Theorem 4.1 in [6]) except for the definition of the set Γ_Φ(t) of the critical failure patterns at time t. In [6] this set includes on {T > t} all cut sets, whereas in our definition those cut sets J are excluded for which at time t “it is known” that T_J = ∞. However, this deviation is harmless because in [6] only extra zeros are added.
(ii) We now have a tool that allows us to determine the failure rate
process corresponding to the lifetime T of a complex system in an easy
way: Add at time t the failure rates of those patterns that are critical at t.
As an immediate consequence we obtain the following corollary.
Then the F-failure rate processes λ({i}) are deterministic, λt ({i}) = λt (i)
and on {T > t}
λ_t = Σ_{i=1}^n (Φ(1_i, X_t) − Φ(0_i, X_t)) λ_t(i) = Σ_{{i}∈Γ_Φ(t)} λ_t(i),   t ∈ R_+.    (3.12)
λt ({2}) = βI(T1 ≤ t)
To see how formula (3.12) can be used we resume Example 25, p. 62.
Using the results of Example 28, p. 65, the F-failure rate process of the
system lifetime is seen to be
I(T > t)λ_t = β_12 I(T_1 > t, T_2 > t) + (β_1 + β_12) I(T_1 > t, T_2 ≤ t) + (β_2 + β_12) I(T_1 ≤ t, T_2 > t),
or, equivalently,
I(T > t)λ_t = β_12 I(T > t) + β_1 I(T_1 > t, T_2 ≤ t) + β_2 I(T_1 ≤ t, T_2 > t).
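The equivalence of the two displays can be checked mechanically over all indicator configurations. A minimal sketch (the rate values β_1, β_2, β_12 are hypothetical, and a parallel system is assumed, so that {T > t} = {T_1 > t} ∪ {T_2 > t} as in the example):

```python
# Check that the two expressions for I(T > t)·λ_t agree for every
# configuration of the component indicators i1 = I(T1 > t), i2 = I(T2 > t).
from itertools import product

def expr_long(i1, i2, b1, b2, b12):
    return (b12 * i1 * i2
            + (b1 + b12) * i1 * (1 - i2)
            + (b2 + b12) * (1 - i1) * i2)

def expr_short(i1, i2, b1, b2, b12):
    iT = 1 if (i1 or i2) else 0   # parallel system: up while some component is up
    return (b12 * iT
            + b1 * i1 * (1 - i2)
            + b2 * (1 - i1) * i2)

b1, b2, b12 = 0.3, 0.5, 0.2       # hypothetical rates
for i1, i2 in product([0, 1], repeat=2):
    assert expr_long(i1, i2, b1, b2, b12) == expr_short(i1, i2, b1, b2, b12)
```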
Definition 22 Let an F-SSM representation (3.8) hold true for the positive
random variable T with failure rate process λ. Then λ is called F-increasing
(F-IFR, increasing failure rate) or F-decreasing (F-DFR, decreasing failure
rate), if λ has P -a.s. nondecreasing or nonincreasing paths, respectively,
for t ∈ [0, T ).
Proof. Under the assumptions of the lemma no patterns with two or more
components are critical. Since the system is monotone, the number of ele-
ments in ΓΦ (t) is nondecreasing in t. So from (3.12), p. 68, it can be seen
that if all component failure rates are nondecreasing, the F-failure rate
process λ is also nondecreasing for t ∈ [0, T ).
Such a closure theorem does not hold true for the ordinary failure rate
of the lifetime T as can be seen from simple counterexamples (see Section
2.2.1 or [34], p. 83). From the proof of Proposition 16 it is evident that we
cannot draw an analogous conclusion for decreasing failure rates.
where λ̂t = E[λt |At ]. Note that, in general, this formula only holds for
almost all t ∈ R+ . In all our examples we can find A-progressive versions
of the conditional expectations. The projection theorem shows that it is
possible to obtain the failure rate on a lower information level merely by
forming conditional expectations under some mild technical conditions.
Remark 9 This is a more technical remark to show how one can proceed
if F̄ is not continuous. Let (F̄t− ), t ∈ R+ , be the left-continuous version of
F̄ . Equation (3.15) can be rewritten as
F̄_t = 1 − ∫_0^t F̄_{s−} λ_s ds − M̄_t.
Under mild conditions an A-martingale L can be found such that M̄ can be represented as the (stochastic) integral M̄_t = ∫_0^t F̄_{s−} dL_s; take
L_t = ∫_0^t I(F̄_{s−} > 0)/F̄_{s−} dM̄_s.
3.3 Point Processes in Reliability − Failure Time and Repair Models 73
With the semimartingale Z, Z_t = −∫_0^t λ_s ds − L_t, equation (3.15) becomes
F̄_t = 1 + ∫_0^t F̄_{s−} dZ_s.
The mark Vn describes the event occurring at time Tn , for example the
magnitude of the shock arriving at a system at time Tn (see Figure 3.1).
For each A ∈ S we associate the counting process (N_t(A)), t ∈ R_+,
N_t(A) = Σ_{n=1}^∞ I(V_n ∈ A) I(T_n ≤ t),
FIGURE 3.1. A marked point process: marks V_1, V_2, V_3 plotted against the time points T_1, T_2, T_3
where [a] denotes the integer part of a. The mark sequence is deterministic
and alternating between 0 and 1:
V_n = (1 + (−1)^n)/2.
We see that Nt ({0}) counts the number of failures and Nt ({1}) the number
of completed repairs up to time t.
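In this alternating set-up the failure and repair counts can be read off directly from the event times. A minimal sketch (the event times are hypothetical):

```python
# Alternating failure/repair marked point process: marks V_n = (1 + (-1)^n)/2,
# so V_1 = 0 (failure), V_2 = 1 (completed repair), V_3 = 0, ...
# N_t({0}) counts failures, N_t({1}) completed repairs up to time t.
def mark(n):
    return (1 + (-1) ** n) // 2

def counts(times, t):
    # times = event times T_1 < T_2 < ...; returns (N_t({0}), N_t({1}))
    failures = sum(1 for n, T in enumerate(times, start=1) if T <= t and mark(n) == 0)
    repairs = sum(1 for n, T in enumerate(times, start=1) if T <= t and mark(n) == 1)
    return failures, repairs

times = [1.2, 1.7, 3.0, 3.4, 5.1]      # hypothetical event times
assert counts(times, 4.0) == (2, 2)    # two failures, two repairs by t = 4
```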
We now want to extend the concept of stochastic intensities from point
processes to marked point processes. The internal filtration FN of (Tn , Vn )
is defined by
FtN = σ(Ns (A), 0 ≤ s ≤ t, A ∈ S).
This filtration is equivalently generated by the history {(Tn , Vn ), Tn ≤ t}
of the marked point process.
where g_n(ω, s, B) is, for fixed B, a measurable function and, for fixed (ω, s), a finite measure on (S, S). Then the process given by
λ_t(C) = g_n(t − T_n, C)/G_n([t − T_n, ∞), S) = g_n(t − T_n, C)/(1 − ∫_0^{t−T_n} g_n(s, S) ds)
is the F^N-intensity of (N_t(C)); that is, the compensated process N_t(C) − ∫_0^t λ_s(C) ds
is an F^N-martingale.
is an F-martingale.
In the following subsections we consider some examples and particular
cases. As was mentioned in Example 33 a point process (Tn ) and its asso-
ciated counting process (Nt ) are special cases of marked point processes.
Point process models in our SSM set-up require the assumption that the
counting process (Nt ), t ∈ R+ , on a filtered probability space (Ω, F, F, P )
has an absolutely continuous compensator or, what amounts to the same,
admits an F-SSM representation
N_t = ∫_0^t λ_s ds + M_t.    (3.17)
This point process model is consistent with the general lifetime model con-
sidered in Section 3.2. If the process N is stopped at T1 , then (3.17) reduces
to (3.13):
N_{t∧T_1} = I(T_1 ≤ t) = ∫_0^{t∧T_1} λ_s ds + M_{t∧T_1}
= ∫_0^t I(T_1 > s) λ_s ds + M_t,
whereas the repair times follow a distribution G with density g and hazard
rate η(t) = g(t)/Ḡ(t). Note that the failure/hazard rate is always set to 0
outside the support of the distribution. Then Nt ({0}) counts the number
of failures up to time t with an intensity λt ({0}) = ρ(t − Tn )X(t) on
(Tn , Tn+1 ], where X(t) = Vn on (Tn , Tn+1 ] indicates whether the system
is up or down at t. The corresponding internal intensity for Nt ({1}) is
λt ({1}) = η(t − Tn )(1 − X(t)). If the operating times are exponentially
distributed with rate ρ > 0, the expected number of failures up to time t
is given by
EN_t({0}) = ρ ∫_0^t EX(s) ds.
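The objects in this repair model are easy to make concrete on a single trajectory. A minimal sketch (the up/down durations are hypothetical; with exponential uptimes of rate ρ, averaging such trajectories would recover EN_t({0}) = ρ ∫_0^t EX(s) ds):

```python
# Alternating up/down state process X(t) (1 = up, 0 = down) and the failure
# count N_t({0}) built from given up/down durations.
def trajectory(ups, downs):
    """Return the event list [(time, kind)] with kind 0 = failure,
    kind 1 = repair completion, and the state function X(t)."""
    events, t = [], 0.0
    for u, d in zip(ups, downs):
        t += u; events.append((t, 0))  # failure after an uptime
        t += d; events.append((t, 1))  # repair completion after a downtime
    def X(s):
        st = 1                          # system starts in the up state
        for time, kind in events:
            if time <= s:
                st = 1 if kind == 1 else 0
        return st
    return events, X

events, X = trajectory([2.0, 1.5], [0.5, 0.7])   # hypothetical durations
failures = [t for t, k in events if k == 0]
assert X(1.0) == 1 and X(2.2) == 0 and X(3.0) == 1
assert sum(1 for t in failures if t <= 4.2) == 2  # N_{4.2}({0}) = 2
```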
λ_t(Γ) = Σ_{i=1}^m {Φ(1_i, X_t) − Φ(0_i, X_t)} ρ_t(i) X_t(i)
Proof. We know that ρt (i)Xt (i) are intensities of Nt (i) and thus
M_t(i) = N_t(i) − ∫_0^t ρ_s(i) X_s(i) ds
defines a martingale (also with respect to the internal filtration of the super-
position because of the independence of the component processes). Define
and let ∆Φt− (i) be the left-continuous and therefore predictable version of
this process. Since at a jump of Nt (i) no other components change their
status (P -a.s.), we have
∫_0^t ΔΦ_s(i) dN_s(i) = ∫_0^t ΔΦ_{s−}(i) dN_s(i).
It follows that
N_t(Γ) − ∫_0^t λ_s(Γ) ds = Σ_{i=1}^m ∫_0^t ΔΦ_s(i) dM_s(i)
= Σ_{i=1}^m ∫_0^t ΔΦ_{s−}(i) dM_s(i).
with N (t) = N (t, R), which gives the total damage up to t, and we want
to derive the infinitesimal characteristics or the “intensity” of this process,
i.e., to establish an SSM representation. We might also think of repair
models, in which failures occur at random time points Tn . Upon failure,
repair is performed. If the cost for the nth repair is Vn , then Xt describes
the accumulated costs up to time t.
To derive an SSM representation of X, we first assume that we are given
a general intensity λt (C) of the marked point process with respect to some
filtration F. The main point now is to observe that
X_t = ∫_0^t ∫_S z N(ds, dz).
Then we can use Theorem 18, p. 76, with the predictable process H(s, z) = z to see that
M_t^F = ∫_0^t ∫_S z (N(ds, dz) − λ_s(dz) ds)
is a martingale if E ∫_0^t ∫_S |z| λ_s(dz) ds < ∞. Equivalently, we see that X has the F-SSM representation X = (f, M^F), with
f_s = ∫_S z λ_s(dz).
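For the compound Poisson special case (shock rate ν, i.i.d. marks with law F), the intensity factorizes as λ_s(dz) = ν F(dz), so the drift is the constant f = ν E[V] and EX_t = ν E[V] t. A tiny numerical sketch (the discrete mark distribution below is hypothetical):

```python
# Drift f_s = ∫_S z λ_s(dz) of the accumulated damage/cost process X for a
# compound Poisson process with rate nu and a discrete mark law.
nu = 2.0                                   # shock arrival rate (assumed)
marks = {1.0: 0.5, 2.0: 0.3, 5.0: 0.2}     # z -> probability (assumed law F)
EV = sum(z * p for z, p in marks.items())  # E[V] = 2.1
f = nu * EV                                # constant SSM drift
t = 3.0
assert abs(EV - 2.1) < 1e-12
assert abs(f * t - 12.6) < 1e-9            # E X_3 = nu * E[V] * t
```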
If the failure probability does not depend on the number of failures N and the shock magnitudes are deterministic, Y_n = 1, then we have
T = inf{Tn : Wn = 1},
T = inf{t ∈ R_+ : Σ_{i=1}^{N(t,S)} Y_i ≥ K} = inf{T_n : Σ_{i=1}^n Y_i ≥ K}.
This failure model seems to be quite different from the previous one.
However, we see that it is just a special case setting the failure probability
function p_k(x) of (3.19) for all k equal to the indicator of the interval [K, ∞):
p_k(x) = p(x) = I_{[K,∞)}(x).
Then we get
Example 36 Let us again consider the compound Poisson case with shock
arrival rate ν and Fn = F for all n ∈ N0 . Since rn (s − Tn ) = ν and
(K − XTn ) = (K − Xt ) on {Tn < t < Tn+1 }, we get
I(T ≤ t) = ∫_0^t I(T > s) ν F̄((K − X_s)−) ds + M_t.
0
• “The ... assumption is made that the system failure rate is not dis-
turbed after performing minimal repair. For instance, after replacing
a single tube in a television set, the set as a whole will be about
as prone to failure after the replacement as before the tube failure”
(Barlow and Hunter [32]).
• “A minimal repair is one which leaves the unit in precisely the con-
dition it was in immediately before the failure” (Phelps [139]).
The definition of the state of the system immediately before failure de-
pends to a considerable degree on the information one has about the sys-
tem. So it makes a difference whether all components of a complex system
are observed or only failure of the whole system is recognized. In the first
case the lifetime of the repaired component (tube of TV set) is associated
with the residual system lifetime. In the second case the only information
about the condition of the system immediately before failure is the age.
So a minimal repair in this case would mean replacing the system (the
whole TV set) by another one of the same age that as yet has not failed.
Minimal repairs of this kind are also called black box or statistical mini-
mal repairs, whereas the component-wise minimal repairs are also called
physical minimal repairs.
P(T_2 − T_1 ≤ x | T_1 = t) = P(T_1 ≤ t + x | T_1 > t) = 1 − e^{−x} (2 − e^{−(t+x)})/(2 − e^{−t}).
It is (perhaps) no surprise that the total lifetime after a black box mini-
mal repair is stochastically greater than after a physical minimal repair:
Theorem 20 Assume that P (Tn < ∞) = 1 for all n ∈ N and that there
exist versions of conditional probabilities Ft (n) = E[I(Tn ≤ t)|Ftλ ] such
that for each n ∈ N (Ft (n)), t ∈ R+ , is an (Fλ -progressive) stochastic
process.
(i) Then the point process (Tn ) is an MRP if and only if for each n ∈ N
there exists some t ∈ R+ such that
P (0 < Ft (n) < 1) > 0.
(ii) If furthermore (Ft ) = (Ft (1)) has P -a.s. continuous paths of bounded
variation on finite intervals, then
1 − F_t = exp{−∫_0^t λ_s ds}.
Proof. (i) To prove (i) we show that P (Ft (n) ∈ {0, 1}) = 1 for all t ∈ R+
is equivalent to Tn being an Fλ -stopping time. Since we have F0 (n) = 0
and by the dominated convergence theorem for conditional expectations lim_{t→∞} F_t(n) = 1,
the assumption that P (Ft (n) ∈ {0, 1}) = 1 for all t ∈ R+ is equivalent to
Ft (n) = I(Tn ≤ t) (P -a.s.). But as (Ft (n)) is adapted to Fλ this means that
T_n is an F^λ-stopping time. This shows that under the given assumptions P(0 < F_t(n) < 1) > 0 is equivalent to T_n not being an F^λ-stopping time.
(ii) For the second assertion we apply the exponential formula (3.16) as
described on p. 72.
T1 = X1 ∨ X2 , Tn+1 = Tn + Xn+2 , n ∈ N.
where
X_t = Σ_{k=1}^∞ Y_k I(T_k* ≤ t)
Theorem 21 If 0 < p(x) < 1 for all x holds true, then the point process
(Tn ) driven by the intensity (3.21) is an MRP.
Our aim is to extend the definition of λ_t also on {T_1 ≤ t}. To this end we extend the definition of X_t(i) on {Z_i ≤ t}, following the idea that upon system failure the component that caused the failure is repaired minimally, in the sense that it is restored and operates at the same failure rate as if it had not failed before. So we define X_t(i) = 0 on {Z_i ≤ t} if the first failure of component i caused no system failure; otherwise we set X_t(i) = 1 on {Z_i ≤ t} (note that in the latter case the value of X_t(i) is redefined for t = Z_i). In this way we define X_t and, by (3.22), the process λ_t for all t ∈ R_+.
This completed intensity λt induces a point process (Nt ) which counts
the number of minimal repairs on the component level. The corresponding
complete information filtration F = (Ft ), t ∈ R+ , is given by
which describe the time when component i becomes critical, i.e., the time
from which on a failure of component i would lead to system failure. It
follows that
λ_t = Σ_{i=1}^m I(Y_i ≤ t) λ_t(i),
Ftλ = σ(I(Yi ≤ s), 0 ≤ s ≤ t, i = 1, ..., m).
it is the ordinary failure rate of T1 . For the time to the first system failure
we have the two representations
I(T_1 ≤ t) = ∫_0^t I(T_1 > s) λ_s ds + M_t    (F-level)
= ∫_0^t I(T_1 > s) λ̂_s ds + M̄_t    (A^T-level).
describes the MRP on the A^T-level. Comparing these two information levels, Example 37 suggests EN_t ≥ EN̄_t for all positive t. A general comparison, also for arbitrary subfiltrations, seems to be an open problem (cf. [4] and [135]).
To interpret this result one should note that on the component level only
the critical component which caused the system to fail is repaired. A black
box repair, which is a replacement by a system of the same age that has
not yet failed, could be a replacement by a system with both components
working.
q_i = lim_{h→0+} (1/h) P(Y_h ≠ i | Y_0 = i),
q_ij = lim_{h→0+} (1/h) P(Y_h = j | Y_0 = i),   i, j ∈ S, i ≠ j,
q_ii = −q_i = −Σ_{j≠i} q_ij.
• The time points (T_n) form a point process and N = (N_t), t ∈ R_+, is the corresponding counting process, N_t = Σ_{n≥1} I(T_n ≤ t), which has a stochastic intensity λ_{Y_t} depending on the unobservable state, i.e.,
R_t = u + ct − Σ_{n=1}^{N_t} X_n
describes the available capital at time t as the difference of the income and the total amount of costs for minimal repairs up to time t.
The process R is commonly used in other branches of applied probability
like queueing or collective risk theory. In risk theory one is mainly interested
in the distribution of the time to ruin τ = inf{t ∈ R+ : Rt ≤ 0}.
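The ruin time can be sketched directly from its definition. Since R decreases only at the repair (claim) epochs T_n, it suffices to check R at those times (the data below are hypothetical):

```python
# Risk reserve R_t = u + c*t - sum_{n <= N_t} X_n and the ruin time
# tau = inf{t : R_t <= 0}; ruin can only occur at a jump time T_n.
def ruin_time(u, c, claim_times, claim_sizes):
    total = 0.0
    for T, X in zip(claim_times, claim_sizes):
        total += X
        if u + c * T - total <= 0:
            return T
    return float('inf')                     # no ruin within the observed claims

# hypothetical data: initial capital u = 5, premium rate c = 1
assert ruin_time(5.0, 1.0, [1.0, 2.0, 3.0], [2.0, 2.0, 5.0]) == 3.0
assert ruin_time(5.0, 1.0, [1.0, 2.0], [1.0, 1.0]) == float('inf')
```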
Ft = σ(Ns , Ys , Xi , 0 ≤ s ≤ t, i = 1, ..., Nt ).
The failure rate process h = (ht ), t ∈ R+ , can be derived in the same way
as was done for shock models with failures of threshold type (cf. p. 81).
Note that ruin can only occur at a failure time; therefore, the ruin time is
a hitting time of a compound point process:
τ = inf{t ∈ R_+ : A_t = Σ_{n=1}^{N_t} B_n ≥ u} = inf{T_n : A_{T_n} ≥ u},
Lemma 22 Let τ = τ (u) be the ruin time and F the distribution of the
claim sizes, F̄ (x) = F ((x, ∞)) = P (X1 > x), x ∈ R. Then the F-failure
rate process h is given by
h_t = λ_{Y_t} F̄(R_t−) = Σ_{i=1}^m λ_i I(Y_t = i) F̄(R_t−),   t ∈ R_+.
Note that the paths of R have only countable numbers of jumps such that
under the integral sign Rs − can be replaced by Rs . Taking expectations
on both sides of (3.24) one gets by Fubini’s theorem
ψ(u, t) = ∫_0^t E[I(τ(u) > s) λ F̄(R_s)] ds    (3.25)
= ∫_0^t (1 − ψ(u, s)) λ E[F̄(R_s) | τ(u) > s] ds.
This shows that the (possibly defective) distribution of τ (u) has the
hazard rate
λE[F̄ (Rt )|τ (u) > t].
Now let N^X be the renewal process generated by the sequence (X_i), i ∈ N, N_t^X = sup{k ∈ N_0 : Σ_{i=1}^k X_i ≤ t}, and A(u, t) = ∫_0^t a(u, s) ds, where a(u, s) = λ P(N^X_{u+cs} = N_s). Then bounds for ψ(u, t) can be established.
Proof. For the lower bound we use the representation (3.26) and simply
observe that E[F̄ (Rs )|τ (u) > s] ≥ F̄ (u + cs).
For the upper bound we start with formula (3.24). Since {τ (u) > t} ⊂
{Rt ≥ 0}, we have
V_t = ∫_0^t I(τ(u) > s) λ F̄(R_s) ds + M_t
≤ ∫_0^t I(R_s ≥ 0) λ F̄(R_s) ds + M_t.
It remains to show that a(u, t) = λE[I(R_t ≥ 0) F̄(R_t)]. Denoting the k-fold convolution of F by F^{*k} and T_k = Σ_{i=1}^k X_i, it follows by the independence of the claim arrival process and (X_i), i ∈ N, that
E[I(R_t ≥ 0) F̄(u + ct − Σ_{i=1}^{N_t} X_i)]
= Σ_{k=0}^∞ E[I(u + ct − Σ_{i=1}^k X_i ≥ 0) F̄(u + ct − Σ_{i=1}^k X_i)] P(N_t = k)
= Σ_{k=0}^∞ ∫_0^{u+ct} F̄(u + ct − x) dF^{*k}(x) P(N_t = k)
= Σ_{k=0}^∞ {F^{*k}(u + ct) − F^{*(k+1)}(u + ct)} P(N_t = k)
= Σ_{k=0}^∞ P(N^X_{u+ct} = k) P(N_t = k)
= P(N^X_{u+ct} = N_t),
P (NJ ≤ k), k ∈ N0 ,
M (J) = ENJ ,
A[u, v] = P(Φ_t = 1, ∀t ∈ [u, v]) = P(Φ_u = 1, N_{(u,v]} = 0).
P(Y_J ≤ y), y ∈ R_+,
A_D(J) = EY_J / |J|,
where |J| denotes the length of the interval J. The measure AD (J) is
in the literature sometimes referred to as the interval unavailability,
but we shall not use this term here.
FIGURE 4.1. Time Evolution of a Failure and Repair Process for a One-Component System Starting at Time t = 0 in the Operating State
S_n = T_1 + Σ_{k=1}^{n−1} (R_k + T_{k+1}),   n ∈ N,
100 4. Availability Analysis of Complex Systems
and
S_n° = Σ_{k=1}^n (T_k + R_k),   n ∈ N.
By convention, S0 = S0◦ = 0, and sums over empty sets are zero. We see
that Sn represents the nth failure time, and Sn◦ represents the completion
time of the nth repair.
The Sn sequence generates a modified (delayed) renewal process N with
renewal function M . The first interarrival time has distribution F . All other
interarrival times have distribution F ∗ G (convolution of F and G), with
mean µF + µG . Let H (n) denote the distribution function of Sn . Then
H (n) = F ∗ (F ∗ G)∗(n−1) ,
(cf. (B.2), p. 244, in Appendix B). The Sn◦ sequence generates an ordinary
renewal process N ◦ with renewal function M ◦ . The interarrival times, Tk +
Rk , have distribution F ∗ G, with mean µF + µG . Let H ◦(n) denote the
distribution function of Sn◦ . Then
H ◦(n) = (F ∗ G)∗n .
Let αt denote the forward recurrence time at time t, i.e., the time from t
to the next event:
αt = SNt +1 − t on {Xt = 1}
and
α_t = S°_{N°_t+1} − t on {X_t = 0}.
Hence, given that the system is up at time t, the forward recurrence time
αt equals the time to the next failure time. If the system is down at time
t, the forward recurrence time equals the time to complete the repair. Let
Fαt and Gαt denote the conditional distribution functions of αt given that
X_t = 1 and X_t = 0, respectively. Then we have for x ∈ R
F_{α_t}(x) = P(α_t ≤ x | X_t = 1) = P(S_{N_t+1} − t ≤ x | X_t = 1)
and
G_{α_t}(x) = P(α_t ≤ x | X_t = 0) = P(S°_{N°_t+1} − t ≤ x | X_t = 0).
Similarly for the backward recurrence time, we define βt , Fβt , and Gβt . The
backward recurrence time βt equals the age of the system if the system is
4.2 One-Component Systems 101
up at time t and the duration of the repair if the system is down at time t,
i.e.,
β_t = t − S°_{N°_t} on {X_t = 1}
and
β_t = t − S_{N_t} on {X_t = 0}.
which gives
A(t) = EX_t = F̄(t) + Σ_{n=1}^∞ ∫_0^t F̄(t − x) dH^{◦(n)}(x)
= F̄(t) + ∫_0^t F̄(t − x) dM°(x).
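The pointwise availability can also be computed from the equivalent renewal equation A(t) = F̄(t) + ∫_0^t A(t − x) d(F ∗ G)(x), since the process renews at the completion of the first repair. A numerical sketch under assumed exponential uptimes (rate lam) and repair times (rate eta), checked against the closed form known for this special case:

```python
# Solve A(t) = exp(-lam*t) + ∫_0^t A(t-x) (f*g)(x) dx on a grid (rectangle
# rule) and compare with the exact exponential-case availability.
import math

lam, eta = 1.0, 4.0                         # assumed failure and repair rates
h, n = 0.001, 2000                          # grid step; horizon t = n*h = 2.0

def fg(x):                                  # density of F*G = Exp(lam)*Exp(eta)
    return lam * eta * (math.exp(-lam * x) - math.exp(-eta * x)) / (eta - lam)

fgt = [fg(j * h) for j in range(n + 1)]
A = [1.0]                                   # A(0) = 1: system starts up
for k in range(1, n + 1):
    conv = h * sum(A[k - j] * fgt[j] for j in range(1, k + 1))
    A.append(math.exp(-lam * k * h) + conv)

# closed form: A(t) = eta/(lam+eta) + lam/(lam+eta) * exp(-(lam+eta)*t)
exact = eta / (lam + eta) + lam / (lam + eta) * math.exp(-(lam + eta) * n * h)
assert abs(A[n] - exact) < 1e-2
```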
Some closely related results are stated below in Propositions 24 and 25.
Proof. The result clearly holds for n = 0. For n ≥ 1, the result follows by
observing that
If the downtimes are much smaller than the uptimes in probability (which is the common situation in practice), then N is close to a renewal process generated by the uptimes alone. Hence, if the times to failure are exponentially distributed, the process N is close to a homogeneous Poisson process. Formal asymptotic results will be established later; see Section 4.4.
In the following two propositions we relate the distribution of the forward
and backward recurrence times and the renewal functions M and M ◦ .
Proof. The results follow by applying the Key Renewal Theorem (see
Appendix B, p. 247) to formulas (4.6), (4.7), (4.10), and (4.11).
Let us introduce
F_∞(w) = ∫_0^w F̄(x) dx / μ_F,    (4.13)
G_∞(w) = ∫_0^w Ḡ(x) dx / μ_G.    (4.14)
and
lim_{t→∞} Ḡ_{α_t}(w) = lim_{t→∞} Ḡ_{β_t}(w) = Ḡ_∞(w).    (4.15)
Proof. To establish these formulas, we use (4.2) (see p. 101), Theorem 29,
and identities like
P(α_t > w | X_t = 1) = P(X_t = 1, α_t > w) / A(t).
Theorem 31 For n ∈ N0 ,
Proof. The result follows from the expression for the distribution of the
number of failures given in Theorem 26, p. 103, combined with the limiting
availability formula (4.2), p. 101, and Proposition 30.
The expected number of system failures can be found from the distribu-
tion function. Obviously, M (v) ≈ M ◦ (v) for large v. The exact relationship
between M (v) and M ◦ (v) is given in the following proposition.
where λ is the failure rate function and βv is the backward recurrence time
at time v, i.e., the relative age of the system at time v, cf. Section 3.3.2, p.
77. We have m(v) = Eηv , where m(v) is the renewal density of M (v). Thus
if the system has an exponential lifetime distribution with failure rate λ,
This bound can be used to establish an upper bound also for the unavail-
ability Ā(t).
It follows that
Ā(t) ≤ sup_{s≤t} λ(s) ∫_0^t Ḡ(t − x) dx = sup_{s≤t} λ(s) ∫_0^t Ḡ(u) du ≤ [sup_{s≤t} λ(s)] μ_G,
Now suppose at time t that the system is functioning and the relative
age is u. What can we then say about the intensity process at time t + v
(v > 0)? The probability distribution of ηt+v is determined if we can find
the distribution of the relative age at time t + v. But the relative age is
given by (4.10), p. 104, slightly modified to take into account that the first
uptime has distribution given by Fu (x) = 1 − F̄ (u + x)/F̄ (u) for 0 ≤ u ≤ t:
lim_{u→∞} A_D(u) = lim_{u→∞} EY_u/u = Ā.    (4.27)
With probability one,
lim_{u→∞} Y_u/u = Ā.    (4.28)
Proof. Using the definition of Y_u and Fubini’s theorem we find that
EY_u = E ∫_0^u (1 − Φ_t) dt = ∫_0^u E(1 − Φ_t) dt = ∫_0^u Ā(t) dt.
This proves (4.26). Formula (4.27) follows by using (4.26) and the limiting
availability formula (4.2), p. 101. Alternatively, we can use the Renewal
Reward Theorem (Theorem 122, p. 250, in Appendix B), interpreting Yu
as a reward. From this theorem we can conclude that EYu /u converges
to the ratio of the expected downtime in a renewal cycle and the expected
length of a cycle, i.e., to the limiting unavailability Ā. The Renewal Reward
Theorem also proves (4.28).
Now we look into the problem of finding formulas for the downtime
distribution.
Let Nsop denote the number of system failures after s units of operational
time, i.e.,
N_s^op = Σ_{n=1}^∞ I(Σ_{k=1}^n T_k ≤ s).
Note that
N_s^op ≥ n ⇔ Σ_{k=1}^n T_k ≤ s,   n ∈ N.    (4.29)
Let Zs denote the total downtime associated with the operating time s,
but not including s, i.e.,
Z_s = Σ_{i=1}^{N_{s−}^op} R_i,
where
N_{s−}^op = lim_{u→s−} N_u^op.
Define
Cs = s + Z s .
We see that C_s represents the calendar time after an operational time of s time units and the completion of the repairs associated with the failures that occurred up to, but not including, s.
The following theorem gives an exact expression of the probability dis-
tribution of Yu , the total downtime in [0, u].
We have used that the repair times are independent of the process Nsop and
that F is continuous. This proves (4.30). Formula (4.31) follows by using
(4.29).
In the case that F is exponential with failure rate λ the following simple
bounds apply
The lower bound follows by including only the first two terms of the sum
in (4.30), observing that Ntop is Poisson distributed with mean λt, whereas
the upper bound follows by using (4.30) and the inequality
In the case that the interval is rather long, the downtime will be approxi-
mately normally distributed, as is shown in Theorem 38 below.
where
τ² = (μ_F² σ_G² + μ_G² σ_F²) / (μ_F + μ_G)³.
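Assuming Theorem 38 states the usual central limit form (Y_u approximately normal with mean Ā u and variance τ² u), the parameter τ² is cheap to evaluate. The moment values below are hypothetical:

```python
# Normal approximation parameters for the total downtime Y_u in [0, u]:
# tau^2 = (muF^2 * sigG2 + muG^2 * sigF2) / (muF + muG)^3.
import math

muF, muG = 10.0, 1.0                        # mean uptime / mean repair time (assumed)
sigF2, sigG2 = 100.0, 0.25                  # their variances (assumed)
tau2 = (muF**2 * sigG2 + muG**2 * sigF2) / (muF + muG)**3
Abar = muG / (muF + muG)                    # limiting unavailability
u = 1000.0
mean, sd = Abar * u, math.sqrt(tau2 * u)    # approximate law of Y_u: N(mean, sd^2)
assert abs(tau2 - 125.0 / 1331.0) < 1e-12
assert abs(mean - 1000.0 / 11.0) < 1e-9
```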
Proof. The result follows by applying Theorem 123, p. 251, in Appendix B,
observing that the length of the first renewal cycle equals S1◦ = T1 + R1 ,
the downtime in this cycle equals YS1◦ = R1 and
FIGURE 4.2. Time Evolution of a Failure and Repair Process for a One-Component System Starting at Time t = 0 in the Failure State
respectively, and A(0) = A, then it can be shown that the process (Xt , αt )
is stationary; see Birolini [46]. This means that we have, for example,
A(t) = A, ∀t ∈ R_+,
A[u, u + w] = ∫_w^∞ F̄(x) dx / (μ_F + μ_G), ∀u, w ∈ R_+,
M(u, u + w] = w / (μ_F + μ_G), ∀u, w ∈ R_+.
Theorem 39 The system availability at time t, A(t), and the limiting sys-
tem availability, limt→∞ A(t), are given by
A(t) = h(A_1(t), A_2(t), . . . , A_n(t)) = h(A(t)),    (4.33)
lim_{t→∞} A(t) = h(A_1, A_2, . . . , A_n) = h(A).    (4.34)
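Theorem 39 reduces availability computation to evaluating the reliability function h at the vector of component availabilities. A small sketch for the two standard structures (the availability values are hypothetical; independent components are assumed):

```python
# System availability via the reliability function h for series and
# parallel structures of independent components.
from functools import reduce

def h_series(avails):
    return reduce(lambda a, b: a * b, avails, 1.0)

def h_parallel(avails):
    return 1.0 - reduce(lambda a, b: a * b, [1.0 - a for a in avails], 1.0)

A = [0.99, 0.95, 0.90]                      # hypothetical component availabilities
assert abs(h_series(A) - 0.99 * 0.95 * 0.90) < 1e-12
assert abs(h_parallel(A) - (1 - 0.01 * 0.05 * 0.10)) < 1e-12
```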
= Σ_{i=1}^n ∫_0^u [h(1_i, A(t)) − h(0_i, A(t))] m_i(t) dt
= Σ_{i=1}^n ∫_0^u [h(1_i, A(t)) − h(0_i, A(t))] Eη_t(i) dt
≤ u λ̃,
where λ̃ = Σ_{i=1}^n λ_i.
Next we will generalize the asymptotic results (4.23)−(4.25), p. 108.
lim_{u→∞} EN_u/u = Σ_{i=1}^n [h(1_i, A) − h(0_i, A)]/(μ_{F_i} + μ_{G_i}),    (4.37)
lim_{u→∞} EN_{(u,u+w]}/w = Σ_{i=1}^n [h(1_i, A) − h(0_i, A)]/(μ_{F_i} + μ_{G_i}),    (4.38)
lim_{u→∞} N_u/u = Σ_{i=1}^n [h(1_i, A) − h(0_i, A)]/(μ_{F_i} + μ_{G_i}).    (4.39)
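The limit (4.37) is easy to evaluate once the structure function is fixed. A minimal sketch, assuming a two-component parallel system h(a_1, a_2) = 1 − (1 − a_1)(1 − a_2) with hypothetical mean up/downtimes:

```python
# Asymptotic system failure rate: sum_i [h(1_i, A) - h(0_i, A)]/(muF_i + muG_i).
muF = [10.0, 20.0]                          # mean uptimes (assumed)
muG = [1.0, 2.0]                            # mean repair times (assumed)
A = [muF[i] / (muF[i] + muG[i]) for i in range(2)]

def h(a1, a2):
    return 1.0 - (1.0 - a1) * (1.0 - a2)    # parallel system

imp = [h(1.0, A[1]) - h(0.0, A[1]),         # h(1_1, A) - h(0_1, A)
       h(A[0], 1.0) - h(A[0], 0.0)]         # h(1_2, A) - h(0_2, A)
rate = sum(imp[i] / (muF[i] + muG[i]) for i in range(2))
# for a parallel system the importance of i is the unavailability of the other
assert abs(imp[0] - (1.0 - A[1])) < 1e-12
assert abs(rate - 3.0 / 242.0) < 1e-12
```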
Proof. To prove these results, we make use of formula (4.35). Dividing this
formula by u and using the Elementary Renewal Theorem (see Appendix
B, p. 247), formula (4.37) can be shown to hold noting that E[Φ(1i , Xt ) −
Φ(0i , Xt )] → [h(1i , A) − h(0i , A)] as t → ∞. Let h∗i (t) = E[Φ(1i , Xt ) −
Φ(0i , Xt )] and h∗i its limit as t → ∞. Then we can write formula (4.35)
divided by u in the following form:
Σ_{i=1}^n { h_i^* M_i(u)/u + (1/u) ∫_0^u [h_i^*(t) − h_i^*] dM_i(t) }.
where mi (t) denotes the renewal density of Mi (t), we see that EλΦ (t) → λΦ
as t → ∞ provided that mi (t) → 1/(µFi + µGi ). From renewal theory, see
Theorem 118, p. 248, in Appendix B, we know that if the renewal cycle
lengths Tik + Rik have a density function h with h(t)p integrable for some
p > 1, and h(t) → 0 as t → ∞, then Mi has a density mi such that
mi (t) → 1/(µFi +µGi ) as t → ∞. See the remark following Theorem 118 for
other sufficient conditions for mi (t) → 1/(µFi + µGi ) to hold. If component
i has an exponential lifetime distribution with parameter λi , then mi (t) =
λi Ai (t), (cf. (4.18), p. 107), which converges to 1/(µFi + µGi ).
It is intuitively clear that the process X is regenerative if the components
have exponential lifetime distributions. Before we prove this formally, we
formulate a result related to ENu◦ : the expected number of visits to the best
state (1, 1, . . . , 1) in [0, u]. The result is analogous to (4.35) and (4.37).
Furthermore,
lim_{u→∞} EN_u°/u = Π_{j=1}^n A_j Σ_{i=1}^n 1/μ_{F_i}.    (4.43)
lim_{u→∞} EN_u°/u = Σ_{i=1}^n Π_{j≠i} A_j · 1/(μ_{F_i} + μ_{G_i})
= Π_{j=1}^n A_j Σ_{i=1}^n 1/μ_{F_i}.
The above result can be shown heuristically using the same type of arguments as in Remark 11. For highly available components we have A_i ≈ 1; hence the limit (4.43) is approximately equal to
Σ_{i=1}^n 1/μ_{F_i}.
For all t ∈ R+ ,
P(N_t° ≥ i) ≤ P(τ_i < ∞),
and it follows that
EN_t° = Σ_{i=1}^∞ P(N_t° ≥ i) ≤ Σ_{i=1}^∞ (1 − ε)^i = (1 − ε)/(1 − (1 − ε)) = (1 − ε)/ε < ∞.
Under the given set-up the regenerative property only holds true if the
lifetimes of the components are exponentially distributed. However, this
can be generalized by considering phase-type distributions with an enlarged
state space, which also includes the phases; see Section 4.7.1, p. 156.
If the repair times are small compared to the lifetimes and the lifetimes
are exponentially distributed with parameter λi , then clearly the number
of failures of component i in the time interval (u, u + w], Nu+w (i) − Nu (i),
is approximately Poisson distributed with parameter λi w. If the system
is a series system, and we make the same assumptions as above, it is also
clear that the number of system failures in the interval (u, u + w] is approximately Poisson distributed with parameter Σ_{i=1}^n λ_i w. The number of system failures in [0, t], N_t, is approximately a Poisson process with intensity Σ_{i=1}^n λ_i.
If the system is highly available and the components have constant failure
rates, the Poisson distribution (with the asymptotic rate λΦ ) will in fact also
produce good approximations for more general systems. As motivation, we
observe that EN(u,u+w] /w is approximately equal to the asymptotic system
failure rate λΦ , and N(u,u+w] is “nearly independent” of the history of N up
to u, noting that the process X frequently restarts itself probabilistically,
i.e., X re-enters the state (1, 1, . . . , 1).
Refer to [24, 91] for Monte Carlo simulation studies of the accuracy of
the Poisson approximation. As an illustration of the results obtained in
these studies, consider a parallel system of two identical components where
the failure rate λ is equal to 0.05, the repair times are all equal to 1, and
the expected number of system failures is equal to 5. This means, as shown
below, that the time interval is about 1000 and the expected number of
component failures is about 100. Using the definition of the system failure
rate λΦ (cf. (4.41), p. 115) with µG = 1, we obtain
EN_u/u = 5/u ≈ λ_Φ = 2Ā_1 · 1/(μ_{F_1} + μ_{G_1}) = 2 · [μ_G/(1/λ + μ_G)] · [1/(1/λ + μ_G)] ≈ 2λ² = 0.005.
Hence u ≈ 1000 and 2EN_u(i) ≈ 2λu ≈ 100. Clearly, this is an approximate
steady-state situation, and we would expect that the Poisson distribution
gives an accurate approximation. The Monte Carlo simulations in [24] con-
firm this. The distance measure, which is defined as the maximum distance
between the Poisson distribution (with mean λΦ u) and the “true” distri-
bution obtained by Monte Carlo simulation, is equal to 0.006. If we take
instead λ = 0.2 and ENu = 0.2, we find that the expected number of
component failures is about 1. Thus, we are far away from a steady-state
situation and as expected the distance measure is larger: 0.02. But still the
Poisson approximation produces relatively accurate results.
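The arithmetic of this illustration is easy to reproduce from the cited definition of λ_Φ (numbers as in the text: two identical parallel components, λ = 0.05, constant repair time 1):

```python
# Numerical illustration: asymptotic system failure rate of a two-component
# parallel system with failure rate lam and mean repair time muG = 1.
lam, muG = 0.05, 1.0
muF = 1.0 / lam                             # mean uptime
Abar1 = muG / (muF + muG)                   # limiting unavailability of one component
lam_Phi = 2.0 * Abar1 / (muF + muG)         # asymptotic system failure rate
# close to the 2*lam^2 = 0.005 quoted in the text (within ~10%)
assert abs(lam_Phi - 2.0 * lam**2) / (2.0 * lam**2) < 0.1
u = 5.0 / lam_Phi                           # horizon with ~5 expected system failures
assert 900 < u < 1200                       # "u ≈ 1000" in the text
```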
In the following we look at the problem of establishing formalized asymp-
totic results for the distribution of the number of system failures. We first
consider the interval reliability.
4.4 Distribution of the Number of System Failures 119
Results from Monte Carlo simulations presented in [24] show that the factors
q/E0 S, q/ES, and 1/ETΦ typically give slightly better results (i.e., better
fit to the exponential distribution) than the system failure rate λΦ . From
a computational point of view, however, λΦ is much more attractive than
the other factors, which are in most cases quite difficult to compute. We
therefore normally use λΦ as the normalizing factor.
The basic idea of the proof of the asymptotic exponentiality of αTΦ is as
follows: If we assume that X is a regenerative process and the probability
that a system failure occurs in a renewal cycle, i.e., q, is small (converges
to zero), then the time to the first system failure will be approximately
equal to the sum of a number of renewal cycles having no system failures;
and this number of cycles is geometrically distributed with parameter q.
Now if q → 0 as j → ∞, the desired result follows by using Laplace
transformations. The result can be formulated in general terms as shown
in the lemma below.
Note that series systems are excluded since such systems have q = 1. We
will analyze series systems later in this section; see Theorem 54, p. 135.
S* = Σ_{i=1}^{ν−1} S_i.
q → 0    (4.45)
and
qc_S² → 0,    (4.46)
[L(qx/a) − 1 + qx]/q → 0.
≤ E[(qx/a)S]²/(2q)
= (x²/2)(q/a²) ES²
= (x²/2) q(1 + c_S²).
The desired conclusion (4.47) follows now since q → 0 and qc2S → 0 (as-
sumptions (4.45) and (4.46)).
q → 0,    (4.48)
qc_{0S}² → 0,    (4.49)
qE_1S/ES → 0,    (4.50)
E_1(N_S − 1) → 0.    (4.51)
Then
A[0, u/λ_Φ] → e^{−u},  i.e.,  λ_Φ T_Φ →_D Exp(1).    (4.52)
Proof. Using Lemma 45, we first prove that under conditions (4.48)−(4.50)
we have
T_Φ q/E_0S →_D Exp(1).    (4.53)
Let ν denote the renewal cycle index associated with the time of the first
system failure, TΦ . Then it is seen that TΦ has the same distribution as
Σ_{k=1}^{ν−1} S_{0k} + W_ν,
where (S0k ) and (Wk ) are independent sequences of i.i.d. random variables
with
P (S0k ≤ s) = P0 (S ≤ s)
and
P (Wk ≤ w) = P1 (TΦ ≤ w).
Both sequences are independent of ν, which has a geometrical distribution
with parameter q = P (NS ≥ 1). Hence, (4.53) follows from Lemma 45
provided that
W_ν q/E_0S →_P 0.    (4.54)
By a standard conditional probability argument it follows that
ES = (1 − q)E0 S + qE1 S,
and by noting that
q E_1TΦ/E_0S = q EW/E_0S ≤ q E_1S/E_0S = q E_1S (1 − q)/(ES − q E_1S)

  = (q E_1S/ES) · (1 − q)/(1 − q E_1S/ES) → 0, (4.55)

and

E_0S/ES = (1 − q E_1S/ES)/(1 − q) → 1
4.4 Distribution of the Number of System Failures 123
for α equal to λΦ. But the result also holds for the normalizing factors
q/E_0S, q/ES, and 1/ETΦ. For q/E_0S and q/ES this is seen from the proof
of the theorem. To establish the result for 1/ETΦ, let
S* = Σ_{i=1}^{ν−1} S_{0i}.
q ≤ Σ_{i=1}^{n} λ_i ∫_0^∞ t dG_i(t) = d.
Consequently, d → 0 implies q → 0.
It remains to show (4.56). Clearly, the busy period will only increase if we
assume that the flow of failures of component i is a Poisson flow with pa-
rameter λi , i.e., we adjoin failures that arise according to a Poisson process
on intervals of repair of component i, assuming that repair begins immedi-
ately for each failure. This means that the process can be regarded as an
M/G/∞ queueing process, where the Poisson input flow has parameter λ̃
and there are an infinite number of devices with servicing time distributed
according to the law
G(t) = Σ_{i=1}^{n} λ̃_i G_i(t).

Note that the probability that a “failure is due to component i” equals λ̃_i.
It is also clear that the busy period increases still more if, instead of an
infinite number of servicing devices, we take only one, i.e., the process is an
M/G/1 queueing process. Thus, E(S″)^2 ≤ E(S̃″)^2, where S̃″ is the busy
period in a single-line system with a Poisson input flow λ̃ and servicing
distribution G(t). It is a well-known result from the theory of queueing
processes (and branching processes) that the second-order moment of the
busy period (extinction time) equals ER_G^2/(1 − λ̃ ER_G)^3, where R_G is the
service time having distribution G; see, e.g., [89]. Hence, by introducing
d_2 = Σ_{i=1}^{n} λ_i ER_i^2, we obtain

λ̃^2 E(S″)^2 ≤ λ̃ d_2/(1 − d)^3 ≤ n^2 c_1^2 c_2/(1 − d)^3.
where t∗ = sup{t ∈ R+ : Ḡi (t) > 0}. We see that µ̆i expresses the maximum
expected residual repair time of component i. We might have µ̆i = ∞, but
we shall in the following restrict attention to the finite case. We know from
Section 2.2, p. 39, that if Gi has the NBUE property, then
µ̆i ≤ µGi .
If the repair times are bounded by a constant c, i.e., P (Rik ≤ c) = 1, then
µ̆i ≤ c. Let
µ̃ = Σ_{i=1}^{n} µ̆_i.
≤ λ̃µ̃.
Since NS ≥ 2 implies Lt ≥ 1, formula (4.57) is shown for k = 2 and
P1 conditional on the event that the first system failure occurs at time t.
Integrating over the failure time t, we obtain (4.57) for k = 2. Now assume
that P1 (NS ≥ k) ≤ (λ̃µ̃)k−1 for a k ≥ 2. We must show that
P1 (NS ≥ k + 1) ≤ (λ̃µ̃)k .
We have
Suppose that the kth system failure in the renewal cycle occurs at time t.
Then if at least one more system failure occurs in the renewal cycle, there
must be at least one component failure before all components are again
functioning, i.e., Lt ≥ 1. Repeating the above arguments for k = 2, the
inequality (4.58) follows.
Remark 13 The inequality (4.57) states that the number of system fail-
ures in a renewal cycle when it is given that at least one system failure
occurs is bounded in distribution by a geometrical random variable with
parameter λ̃µ̃ (provided this quantity is less than 1).
Theorem 49 Assume that the system has no components in series with the
rest of the system. Furthermore, assume that component i has an exponen-
tial lifetime distribution with failure rate λi > 0, i = 1, 2, . . . , n. If d → 0,
where d = λ̃µ̃, and there exist constants c1 and c2 such that λi ≤ c1 < ∞
and ERi2 ≤ c2 < ∞ for all i, then the conditions (4.48) − (4.51) of The-
orem 46 (p. 121) are all met, and, consequently, the limiting result (4.52)
holds, i.e., λΦ TΦ →_D Exp(1).
E_1(N_S − 1) ≤ d/(1 − d),
The above results show that the time to the first system failure is ap-
proximately exponentially distributed with parameter q/E0 S ≈ q/ES ≈
1/ETΦ ≈ λΦ . For a system comprising highly available components, it is
clear that P (Xt = 1) would be close to one, hence the above approxima-
tions for the interval reliability can also be used for an interval (t, t + u].
TΦ (1) + TΦ (2)(1 − I(N(1) ≥ 2)) ≤ TΦ∗ (1) + TΦ∗ (2) ≤ TΦ (1) + TΦ (2) + Sν ,
where N(1) equals the number of system failures in the first renewal cycle
having one or more system failures, and Sν equals the length of this cycle
(ν denotes the renewal cycle index associated with the time of the first
system failure). For α being one of the normalizing factors (i.e., q/E0 S,
q/ES, 1/ETΦ , or λΦ ), we will prove that αTΦ (2)I(N(1) ≥ 2) converges in
probability to zero. It is sufficient to show that P (N(1) ≥ 2) → 0 noting
that
P(α TΦ(2) I(N(1) ≥ 2) > ε) ≤ P(N(1) ≥ 2).
But
where the last expression converges to zero in view of (4.51), p. 121. The dis-
tribution of Sν is the same as the conditional probability of the cycle length
given a system failure occurs in the cycle, cf. Theorem 46 and its proof.
Thus, if (4.48)−(4.51) hold, it follows that α(TΦ∗ (1) + TΦ∗ (2)) converges in
distribution to the sum of two independent exponentially distributed ran-
dom variables with parameter 1, i.e.,
Asymptotic Normality
Now we turn to a completely different way to approximate the distribution
of Nt . Above, the up and downtime distributions are assumed to change
such that the system availability increases and after a time rescaling Nt
converges to a Poisson variable. Now we leave the up and downtime dis-
tribution unchanged and establish a central limit theorem as t increases to
infinity. The theorem generalizes (4.16), p. 106.
Below we argue that if the system failure rate is small, then we have
γΦ^2 ≈ λΦ.
We obtain

γΦ^2/λΦ = E(N − λΦ S)^2/(λΦ ES)
  = [q^{−1} EN^2 + q^{−1} λΦ^2 ES^2 − 2 q^{−1} λΦ E[N S]]/(q^{−1} λΦ ES)
  = [E_1N^2 + q^{−1} λΦ^2 ES^2 − 2 q^{−1} λΦ E[N S]]/(q^{−1} λΦ ES).
Since the denominator converges to 1 (the denominator equals the ratio
between two normalizing factors), the result follows if we can show that
E1 N 2 converges to 1 and all the other terms of the numerator converge to
zero.
This gives

E_0S = 1/(2λ) + (1/(1 − q)) ∫_0^∞ r e^{−λr} dG(r).
From the Taylor formula we have e−x = 1−xO(1), x → 0, where |O(1)| ≤ 1.
Using this and noting that
∫_0^∞ r e^{−λr} dG(r) = µ_G [1 + λ µ_G (c_G^2 + 1) O(1)],
it can be shown that if the failure rate λ and the squared coefficient of
variation c2G are bounded by a finite constant, then the normalizing factor
q/E0 S is asymptotically given by
q/E_0S = 2λ^2 µ_G + o(λ µ_G), λ µ_G → 0.
Now we will show that the system failure rate λΦ , defined by (4.41), p.
115, is also approximately equal to 2λ2 µG . First note that the unavailability
of a component, Ā, is given by Ā = λµG /(1 + λµG ). It follows that
λΦ = 2Ā/(λ^{−1} + µ_G) = 2λ^2 µ_G + o(λ µ_G), λ µ_G → 0, (4.64)
provided that the failure rate λ is bounded by a finite constant.
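As a numerical illustration of (4.64), with the hypothetical values λ = 0.01 and µ_G = 1 the exact expression 2Ā/(λ^{−1} + µ_G) and the approximation 2λ^2 µ_G differ by roughly 2λµ_G, i.e., about 2% here.

```python
lam, muG = 0.01, 1.0                      # failure rate and mean repair time (example values)
Abar = lam * muG / (1 + lam * muG)        # component unavailability
lam_Phi = 2 * Abar / (1 / lam + muG)      # exact expression in (4.64)
approx = 2 * lam**2 * muG                 # leading-order approximation
print(lam_Phi, approx, abs(lam_Phi - approx) / approx)
```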
Next we will compute the exact distribution and mean of TΦ . Let us
denote this distribution by FTΦ (t). In the following FX denotes the distri-
bution of any random variable X and FiX (t) = Pi (X ≤ t), i = 0, 1, where
P0 (·) = P (·|NS = 0) and P1 (·) = P (·|NS ≥ 1). Observe that the length of
a renewal cycle S can be written as S′ + S″, where S′ represents the time
to the first failure of a component, and S″ represents the “busy” period,
i.e., the time from when one component has failed until the process returns
to the best state (1, 1). The variables S′ and S″ are independent and S′ is
exponentially distributed with rate λ̃ = 2λ. Now, assume a component has
failed. Let R denote the repair time of this component and let T denote
the time to failure of the operating component. Then
F_{1T}(t) = P(T ≤ t | T ≤ R) = (1/q) ∫_0^∞ (1 − e^{−λ(t∧r)}) dG(r),
where a ∧ b denotes the minimum of a and b. Furthermore,
F_{0R}(t) = P(R ≤ t | R < T) = (1/q̄) ∫_0^t e^{−λr} dG(r),
where q̄ = 1 − q. Now, by conditioning on whether a system failure occurs
in the first renewal cycle or not, we obtain
FTΦ (t) = qP (TΦ ≤ t|NS ≥ 1) + q̄P (TΦ ≤ t|NS = 0)
= qF1TΦ (t) + q̄F0TΦ (t). (4.65)
To find an expression for F1TΦ (t) we use a standard conditional probability
argument, yielding
F_{1TΦ}(t) = ∫_0^t P_1(TΦ ≤ t | S′ = s) dF_{S′}(s)
  = ∫_0^t P(T ≤ t − s | T ≤ R) dF_{S′}(s)
  = ∫_0^t F_{1T}(t − s) dF_{S′}(s).
Consider now F0TΦ (t). By conditioning on S = s, we obtain
F_{0TΦ}(t) = ∫_0^t P_0(TΦ ≤ t | S = s) dF_{0S}(s)
  = ∫_0^t F_{TΦ}(t − s) dF_{0S}(s).
Combining these expressions we obtain

F_{TΦ}(t) = h(t) + q̄ ∫_0^t F_{TΦ}(t − s) dF_{0S}(s),

where

h(t) = q ∫_0^t F_{1T}(t − s) dF_{S′}(s). (4.66)
Hence, FTΦ (t) satisfies a renewal equation with the defective distribution
q̄F0S (s), and arguing as in the proof of Theorem 111, p. 245, in Appendix B,
it follows that

F_{TΦ}(t) = h(t) + ∫_0^t h(t − s) dM_0(s), (4.67)

where the renewal function M_0(s) equals

Σ_{j=1}^∞ q̄^j F_{0S}^{*j}(s).
L_{F_{TΦ}}(v) = [2λ^2/(λ + v)] · [1 − L_G(v + λ)]/[v + 2λ(1 − L_G(v + λ))].
The mean ETΦ can be found from this formula, or alternatively by using
a direct renewal argument. We obtain
ETΦ = ES′ + E(TΦ − S′)
  = 1/(2λ) + E min{R, T} + (1 − q) ETΦ,
noting that the time one component is down before system failure occurs
or the renewal cycle terminates equals min{R, T }. If a system failure does
not occur, the process starts over again. It follows that
ETΦ = 1/(2qλ) + E min{R, T}/q.
Note that

E min{R, T} = ∫_0^∞ F̄(t) Ḡ(t) dt = ∫_0^∞ e^{−λt} Ḡ(t) dt.
It follows that

ETΦ = (3/(2λ)) · [1 − (2/3) L_G(λ)]/[1 − L_G(λ)].
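Since q = 1 − L_G(λ) and E min{R, T} = (1 − L_G(λ))/λ = q/λ, the mean reduces to ETΦ = 1/(2λq) + 1/λ. The following simulation sketch checks this for constant repair times (example parameter values), using the memorylessness of the exponential lifetimes to restart each renewal cycle.

```python
import math, random

def time_to_system_failure(lam, c, rng):
    """Two identical components in parallel, Exp(lam) lifetimes and
    constant repair time c; returns the first time both are down."""
    t = 0.0
    while True:
        t += rng.expovariate(2 * lam)     # both up: time to first component failure
        T = rng.expovariate(lam)          # residual life of the surviving component
        if T <= c:                        # second failure before the repair completes
            return t + T                  # system failure
        t += c                            # repair completes first; by memorylessness
                                          # the cycle restarts from "both up"

lam, c = 0.1, 1.0
q = 1 - math.exp(-lam * c)                # q = 1 - L_G(lam) for constant repair c
exact = 1 / (2 * lam * q) + 1 / lam       # ETphi = 1/(2*lam*q) + Emin{R,T}/q
rng = random.Random(7)
n = 100000
mc = sum(time_to_system_failure(lam, c, rng) for _ in range(n)) / n
print(exact, mc)
```

The Monte Carlo mean matches the closed form to within sampling error.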
Theorem 54 Assume that Φ is a series system and the lifetimes are ex-
ponentially distributed. Let λi be the failure rate of component i. If d → 0
(as j → ∞), then (as j → ∞)
N_{t/λ̃} →_D Poisson(t).
Proof. Let NtP (i) be the Poisson process with intensity λi generated by
the consecutive uptimes of component i. Then it is seen that
Σ_{i=1}^{n} N^P_{t/λ̃}(i) − D = N_{t/λ̃} ≤ Σ_{i=1}^{n} N^P_{t/λ̃}(i),
where
D = Σ_{i=1}^{n} N^P_{t/λ̃}(i) − N_{t/λ̃}.
E Σ_{i=1}^{n} N^P_{t/λ̃}(i) = Σ_{i=1}^{n} (t/λ̃) λ_i = t. (4.68)
which gives

EN_{t/λ̃} = Σ_{i=1}^{n} ∫_0^{t/λ̃} [Π_{k≠i} A_k(s)] λ_i A_i(s) ds
  = λ̃ ∫_0^{t/λ̃} Π_{k=1}^{n} A_k(s) ds.
Using this expression together with (4.68), the inequality 1 − Π_i(1 − q_i) ≤
Σ_i q_i, and the component unavailability bound (4.22) of Proposition 34, p.
107 (Ā_i(t) ≤ λ_i µ_{Gi}), we find that

ED = λ̃ ∫_0^{t/λ̃} [1 − Π_{i=1}^{n} A_i(s)] ds
  ≤ λ̃ ∫_0^{t/λ̃} Σ_{i=1}^{n} Ā_i(s) ds
  ≤ λ̃ (t/λ̃) Σ_{i=1}^{n} λ_i µ_{Gi}
  = td.
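Theorem 54 can be illustrated by simulation. In a series system every component failure brings the system down, so with short repair times (d small) the number of failures on [0, t/λ̃] is close to Poisson with mean t. The sketch below uses example failure rates and a small constant repair time (not values from the text); with repairs this short, essentially every component failure is a system failure.

```python
import random

def count_failures(lams, repair, horizon, rng):
    """Count component failures in [0, horizon] when each component
    alternates an Exp(lam_i) uptime and a constant repair time."""
    count = 0
    for lam in lams:
        t = rng.expovariate(lam)
        while t <= horizon:
            count += 1
            t += repair + rng.expovariate(lam)
    return count

rng = random.Random(3)
lams = [0.5, 1.0, 1.5]                    # component failure rates (example values)
lam_tilde = sum(lams)
t, repair = 2.0, 0.001                    # small repair times: d = sum(lam_i * mu_Gi) is small
horizon = t / lam_tilde
counts = [count_failures(lams, repair, horizon, rng) for _ in range(20000)]
mean = sum(counts) / len(counts)
print(mean)                               # close to t = 2.0, as the Poisson limit suggests
```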
where aj = λ̃A /αB . Now in view of Remark 14 above and the conditions
of the theorem, it is sufficient to show that D∗ , defined as the expected
number of times system A fails while system B is down, or vice versa,
converges to zero. But noting that the probability that system A (B) is
not functioning is less than or equal to d (the unreliability of a monotone
system is bounded by the sum of the component unreliabilities, which in
its turn is bounded by d, cf. (4.22), p. 107), it is seen that
D* ≤ d [EN^A_{a_j t/λ̃^A} + EN^B_{t/α^B}] ≤ d [λ̃^A a_j t/λ̃^A + EN^B_{t/α^B}]
  = d [a_j t + EN^B_{t/α^B}].
To find a suitable bound on EN^B_{t/α^B}, we need to refer to the argumentation
in the proof of Theorem 59, formulas (4.88) and (4.93), p. 148. Using these
results we can show that EN^B_{t/α^B} → t. Hence, D* → 0 and the theorem is
proved.
assuming that the limit exists. It turns out that it is quite simple to
establish the asymptotic (steady-state) downtime distribution of a parallel
system, so we first consider this category of systems.
where Gαt (y) = P (αt (i) > y|Xi (t) = 0) denotes the distribution of the
forward recurrence time in state 0 of a component. But we know from
(4.14) and (4.15), p. 105, that the asymptotic distribution of Gαt (y) is
given by

lim_{t→∞} Ḡ_{αt}(y) = [∫_y^∞ Ḡ(x) dx]/µ_G = Ḡ_∞(y). (4.69)
Thus we have proved the following theorem.
4.5 Downtime Distribution Given System Failure 139
Theorem 57 Let mi (t) be the renewal density function of Mi (t), and as-
sume that mi (t) is right-continuous and satisfies
lim_{t→∞} m_i(t) = 1/(µ_{Fi} + µ_{Gi}). (4.71)
where

c_i = (1/µ_{Gi}) / Σ_{k=1}^{n} (1/µ_{Gk}). (4.72)
Proof. The proof follows the lines of the proof of Theorem 56, the difference
being that we have to take into consideration which component causes
system failure and the probability of this event given system failure. Clearly,
Clearly,

1 − Ḡ_i(y) Π_{k≠i} [∫_y^∞ Ḡ_k(x) dx / µ_{Gk}]

gives the asymptotic downtime distribution when component i causes system
failure. Since

λΦ^{(i)} = (1/(µ_{Fi} + µ_{Gi})) Π_{k≠i} Ā_k
represents the expected number of system failures per unit of time caused
by failures of component i, an intuitive argument gives that the asymptotic
[(1/µ_{Gi}) Π_{k=1}^{n} Ā_k] / [Σ_{l=1}^{n} (1/µ_{Gl}) Π_{k=1}^{n} Ā_k] = c_i.
Then

c_i(t) = lim_{h→0+} P(N^c_{[t,t+h)}(i) = 1) / P(N^c_{[t,t+h)} = 1)
  = lim_{h→0+} [(1/h) EN^c_{[t,t+h)}(i) − o_i(1)] / [(1/h) EN^c_{[t,t+h)} − o(1)], (4.73)

where

o_i(1) = E[N^c_{[t,t+h)}(i) I(N^c_{[t,t+h)}(i) ≥ 2)]/h

and

o(1) = E[N^c_{[t,t+h)} I(N^c_{[t,t+h)} ≥ 2)]/h.
Hence it remains to study the limit of the ratio of the first terms of (4.73).
Using that

EN^c_{[t,t+h)}(i) = ∫_{[t,t+h)} [h(1_i, A(s)) − h(0_i, A(s))] m_i(s) ds,
where

r_k = λ_{K_k} / Σ_l λ_{K_l}.
Here λKk and GKk denote the asymptotic (steady-state) failure rate of min-
imal cut set Kk and the asymptotic (steady-state) downtime distribution
of minimal cut set Kk , respectively, when this set is considered in isolation
(i.e., we consider the parallel system comprising the components in Kk ).
We see that rk is approximately equal to the probability that minimal cut
set Kk causes system failure. Refer to [25, 80] for more detailed analyses in
the general case. In [80] it is formally proved that the asymptotic downtime
distribution exists and is equal to the steady-state downtime distribution.
[Figure 4.3 appears here.]
FIGURE 4.3. The Distance Di(y), i = 1, 2, as a Function of y for a Parallel
System of Two Components with Constant Repair Times, µG = 1, λ = 0.1
where Ĝi,Φ (y) equals the “true” downtime distribution of the ith system
failure obtained by Monte Carlo simulations. In Figure 4.3 the distance
measure of the first and second system failure have been plotted as a func-
tion of y for a parallel system of two identical components with constant
repair times and exponential lifetimes. As we can see from the figure, the
distance is quite small; the maximum distance is about 0.012 for i = 1 and
0.004 for i = 2.
Only for some special cases are explicit expressions for the downtime dis-
tribution of the ith system failure known. Below we present such expressions
for the downtime distribution of the first failure for a two-component paral-
lel system of identical components with exponentially distributed lifetimes.
It is seen that the downtime distribution G1,p2 (y) equals the conditional
distribution of Y given that Y > 0. The equality (4.74) follows if we can
show that
P(Y > y) = Ḡ(y) ∫_0^∞ ∫_0^s 2 Ḡ(y + s − x) dF(x) dF(s). (4.76)

Introducing r = y + s − x gives

1 − ∫_0^∞ ∫_y^∞ 2λ e^{−2λx} G(r) λ e^{−λ(r−y)} dr dx
  = 1 − ∫_y^∞ G(r) λ e^{−λ(r−y)} dr
  = ∫_y^∞ (1 − e^{−λ(r−y)}) dG(r).
Thus the formulas (4.75) and (4.74) in the theorem are identical. This
completes the proof of the theorem.
Now what can we say about the limiting downtime distribution of the
first system failure as the failure rate converges to 0? Is it equal to the
steady-state downtime distribution GΦ ? Yes, for the above example we
can show that if the failure rate converges to 0, the distribution G1,p2 (y)
converges to the steady-state formula, i.e.,
lim_{λ→0} G_{1,p2}(y) = 1 − Ḡ(y) [∫_y^∞ Ḡ(r) dr]/µ_G = G_Φ(y).
This is seen by noting that

lim_{λ→0} [∫_y^∞ (1 − e^{−λ(r−y)}) dG(r)] / [∫_0^∞ (1 − e^{−λr}) dG(r)]
  = [∫_y^∞ (r − y) dG(r)] / [∫_0^∞ r dG(r)]
  = [∫_y^∞ Ḡ(r) dr] / [∫_0^∞ Ḡ(r) dr].
Y_u ≈ Σ_{i=1}^{N_u} Y_i ≈ CP_u. (4.77)
To motivate this result, we note that the expected number of system failures
per unit of time when considering calendar time is approximately equal to
the asymptotic (steady-state) system failure rate λΦ , given by (cf. formula
(4.41), p. 115)
λΦ = Σ_{i=1}^{n} [h(1_i, A) − h(0_i, A)] / (µ_{Fi} + µ_{Gi}).
Then observing that the ratio between calendar time and operational time
is approximately 1/h(A), we see that the expected number of system fail-
ures per unit of time when considering operational time, EN^{op}(u, u+w]/w,
is approximately equal to λΦ/h(A). Furthermore, N^{op}_{(u,u+w]} is “nearly in-
dependent” of the history of N^{op} up to u, noting that the state pro-
cess X frequently restarts itself probabilistically, i.e., X re-enters the state
(1, 1, . . . , 1). It can be shown by repeating the proof of the Poisson limit
Theorem 50, p. 128, and using the fact that h(A) → 1 as λ_i µ_{Gi} → 0,
that N^{op}_{t/α} has an asymptotic Poisson distribution with parameter t. The
system downtimes given system failure are approximately identically dis-
tributed with distribution function G(y), say, independent of N op , and
approximately independent observing that the state process X with a high
probability restarts itself quickly after a system failure. The distribution
[Figure 4.4 appears here.]
FIGURE 4.4. P10(y) and P(Y10 ≤ y) for a Parallel System of Two Components
with Constant Repair Times, µG = 1, λ = 0.1
where the equality is given by definition. Formula (4.78) gives good approx-
imations for “typical real life cases” with small component availabilities; see
[91]. Figure 4.4 presents the downtime distribution for a parallel system of
two components with the repair times identical to 1 and µF = 10 using
the steady-state formula GΦ for G (formula (4.70), p. 139). The “true”
distribution is found using Monte Carlo simulation. We see that formula
(4.78) gives a good approximation.
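The compound Poisson approximation (4.77) is straightforward to sample. The sketch below uses hypothetical values for the system failure rate and for the downtime distribution G; its empirical mean should be close to (rate) × (interval length) × (mean downtime).

```python
import random

def compound_poisson_downtime(rate, horizon, sample_downtime, rng):
    """Total downtime in [0, horizon]: a Poisson number of system
    failures, each contributing an i.i.d. downtime."""
    # Sample N ~ Poisson(rate * horizon) via exponential interarrival times.
    n, t = 0, rng.expovariate(rate)
    while t <= horizon:
        n += 1
        t += rng.expovariate(rate)
    return sum(sample_downtime(rng) for _ in range(n))

rng = random.Random(5)
rate, horizon = 0.1, 100.0                     # hypothetical failure rate and interval
down = lambda r: r.uniform(0.0, 1.0)           # hypothetical downtime distribution G
samples = [compound_poisson_downtime(rate, horizon, down, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(mean)                                    # close to 0.1 * 100 * 0.5 = 5.0
```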
Note that if the first cycle is a “fiasco” cycle, then U1 = 0. Starting from
the beginning of the renewal cycle ν(1) + 1, we define U2 as the time to
the starting point of the next “fiasco” renewal cycle. Similarly we define
U_3, U_4, . . .. The random variables U_i equal the interarrival times of the
renewal process N_t, i.e.,

N_t = Σ_{k=1}^{∞} I(Σ_{i=1}^{k} U_i ≤ t).
Using that the process N_t and the random variables Y_i are independent,
and the fact that Y_{i1} →_D G*_Φ (assumption (4.83)), it follows that

Σ_{i=1}^{N_{t/α}} Y_{i1} →_D CP(t, G*_Φ). (4.86)
A formal proof of this can be carried out using Moment Generating Func-
tions.
Next we introduce Ñ_t as the renewal process having interarrival times
with the same distribution as U_1 + S_{ν(1)}, i.e., the renewal cycle also includes
the “fiasco” cycle. It follows from the proof of Theorem 46, using condition
(4.81), that Ñ_{t/α} has the same asymptotic Poisson distribution as N_{t/α}.
It is seen that
where N(i) equals the number of system failures in the ith “fiasco” cycle.
Note that N_t is at least the number of “fiasco” cycles up to time t, including
the one that is possibly running at t, and Ñ_t equals the number of finished
“fiasco” cycles at time t without the one possibly running at t.
4.6 Distribution of the System Downtime in an Interval 149
Now to prove the result (4.84) we will make use of the following inequal-
ities:
Y_{t/α} ≤ Σ_{i=1}^{N_{t/α}} Y_{i1} + Σ_{i=1}^{N_{t/α}} Y_{i2}, (4.89)

Y_{t/α} ≥ Σ_{i=1}^{N_{t/α}} Y_{i1} − Σ_{i=Ñ_{t/α}+1}^{N_{t/α}} Y_{i1}. (4.90)

In view of (4.86), and the inequalities (4.89) and (4.90), we need to show
that

Σ_{i=1}^{N_{t/α}} Y_{i2} →_P 0, (4.91)

Σ_{i=Ñ_{t/α}+1}^{N_{t/α}} Y_{i1} →_P 0. (4.92)
since

P(Y_{i2} > ε) ≤ P_1(N_S ≥ 2) ≤ E_1(N_S − 1) → 0

by (4.82). Using Moment Generating Functions it can be shown that (4.91)
holds.
The key part of the proof of (4.92) is to show that (N_{t/α}) is uniformly
integrable in j (t fixed). If this result is established, then since N_{t/α} →_D
Poisson(t) by (4.85) it follows that

EN_{t/α} → t. (4.93)

And because of the inequality (4.87), (Ñ_{t/α}) is also uniformly integrable so
that EÑ_{t/α} → t, and we can conclude that (4.92) holds noting that

P(N_{t/α} − Ñ_{t/α} ≥ 1) ≤ EN_{t/α} − EÑ_{t/α} → 0.

Thus it remains to show that (N_{t/α}) is uniformly integrable.
Let F_U denote the probability distribution of U and let V_l = Σ_{i=1}^{l} U_i.
Then we obtain

E[N_{t/α} I(N_{t/α} ≥ k)] = Σ_{l=k}^{∞} P(N_{t/α} ≥ l) + (k − 1) P(N_{t/α} ≥ k)
  = Σ_{l=k}^{∞} P(V_l ≤ t/α) + (k − 1) P(V_k ≤ t/α)
  = Σ_{l=k}^{∞} F_U^{*l}(t/α) + (k − 1) F_U^{*k}(t/α)
  ≤ Σ_{l=k}^{∞} (F_U(t/α))^l + (k − 1)(F_U(t/α))^k
  = (F_U(t/α))^k/(1 − F_U(t/α)) + (k − 1)(F_U(t/α))^k.
Since F_U(t/α) → 1 − e^{−t} as j → ∞, it follows that for any sequence F_{ij}, G_{ij}
satisfying the conditions (4.79)−(4.82), (N_{t/α}) is uniformly integrable. To
see this, let ε be given such that 0 < ε < e^{−t}. Then for j ≥ j_0 (say) we
have

sup_{j≥j_0} E[N_{t/α} I(N_{t/α} ≥ k)] ≤ (1 − e^{−t} + ε)^k/(e^{−t} − ε) + (k − 1)(1 − e^{−t} + ε)^k.
Consequently,

lim_{k→∞} sup_j E[N_{t/α} I(N_{t/α} ≥ k)] = 0,

i.e., (N_{t/α}) is uniformly integrable, and the proof is complete.
Asymptotic Normality
We now study convergence to the normal distribution. The theorem below
is “a time average result” − it is not required that the system is highly
available. The result generalizes (4.32), p. 111.
Proof. The result (4.94) follows by applying the Central Limit Theorem
for renewal reward processes, Theorem 123, p. 251, in Appendix B.
where Y1 is the downtime of the first system failure (note that Y1 = Y11 ).
The idea used to establish (4.97) is the following: As before, let S be equal
to the time of the first return to the best state (1, 1, . . . , 1). Then (4.97)
follows by using (4.95), (4.96), Ā ≈ 0, the fact that Y ≈ Y1 if a system
failure occurs in the renewal cycle, the probability of two or more failures
occurring in the renewal cycle is negligible, and λΦ = ENS /ES (by the
Renewal Reward Theorem, p. 250). We obtain
Φt = Φ(Xt , Dt ).
Performance Measures
The performance measures introduced in Section 4.1, p. 98, can now be
generalized to the above model.
P (NJ ≤ k), k ∈ N0 ,
ENJ ,
P (Φt ≥ Dt , ∀t ∈ J) = P (NJ = 0).
c) Let

Y_J = ∫_J (D_t − Φ_t) dt = ∫_J D_t dt − ∫_J Φ_t dt.

P(Y_J ≤ y), y ∈ R_+,
EY_J/|J|,
E ∫_J Φ_t dt / E ∫_J D_t dt, (4.98)
where |J| denotes the length of the interval J. The measure (4.98) is
called throughput availability.
4.7 Generalizations and Related Models 153
d) Let

Z_J = (1/|J|) ∫_J I(Φ_t ≥ D_t) dt.
The random variable Z_J represents the portion of time the throughput
rate equals (or exceeds) the demand rate. We consider the following
performance measures related to Z_J:

P(Z_J ≤ y), y ∈ R_+,
EZ_J.
As in the binary case we will often use in practice the limiting values of
these performance measures.
The above performance measures are the most common measures used in
reliability studies of offshore oil and gas production and transport systems,
see, e.g., Aven [16]. In particular, the throughput availability is very much
used when predicting the performance of various design options. For eco-
nomic analysis and as a basis for decision-making, however, it is essential
to be able to compute the total distribution of the throughput loss, and not
only the mean. The measures related to the number of times the system is
below a certain demand level are also useful, but more from an operational
and safety point of view.
Computation
We now briefly look into the computation problem for some of the mea-
sures defined above. To simplify the analysis we shall make the following
assumptions:
Assumptions
1. J = [0, u].
2. The demand rate Dt equals the maximum throughput rate ΦM for
all t.
3. The n component processes (Xt (i)) are independent. Furthermore,
with probability one, the n component processes (Xt (i)) make no
transitions (“jumps”) at the same time.
4. The process (Xt (i)) generates an alternating renewal process Ti1 , Ri1 ,
Ti2 , Ri2 , . . ., as described in Section 4.2, p. 99, where Tim represents
the time spent in the state xiMi during the mth visit to this state,
and Rim represents the time spent in the states {xi0 , xi1 , . . . , xi,Mi −1 }
during the mth visit to these states.
exist.
Arguing as in the binary case we can use results from regenerative and
renewal reward processes to generalize the results obtained in the previous
sections. To illustrate this, we formulate some of these extensions below.
The proofs are omitted. We will focus here on the asymptotic results. Refer
to Theorems 39, p. 113, and 42, p. 114, for the analogous results in the
binary case. We need the following notation:
µ_i = ET_{im} + ER_{im}
N_t^k = N^k_{[0,t]} (k is fixed)
p_{ir}(t) = P(X_t(i) = x_{ir}); if t is fixed, we write p_{ir} and X(i)
p = (p_{10}, . . . , p_{nM_n})
a = (a_{10}, . . . , a_{nM_n})
Φ_k(X) = I(Φ(X) ≥ Φ_k)
h_k(p) = EΦ_k(X)
h(p) = EΦ(X)
(1_{ir}, p) = p with p_{ir} replaced by 1 and p_{il} = 0 for l ≠ r.
Theorem 62 Let
γilr = hk (1il , a) − hk (1ir , a)
and let filr denote the expected number of times component i makes a tran-
sition from state xil to state xir during a cycle of component i. Assume
filr < ∞. Then the expected number of times the system state is below Φk
per unit of time in the long run equals
The limit (4.99) is denoted λΦ . If the random variables Tim are expo-
nentially distributed, then X is regenerative, cf. Theorem 44, p. 116.
It is also possible to extend the asymptotic results related to the distri-
bution of the number of system failures at level k, and the distribution of
the lost volume (downtime). We can view the system as a binary system of
binary components, and the asymptotic results of Sections 4.4−4.6 apply.
From this we find that the asymptotic unavailability Ā, given by formula
(4.2), p. 101, for a train equals 0.027, assuming 8760 hours per year. The
number of system failures per unit of time is given by the system failure
rate λΦ . For alternative i) there is only one failure level and λΦ = 13. For
alternative ii) we must distinguish between failures resulting in production
below 100% and below 50%. The system in these two cases can be viewed
as a series system of the two trains and a parallel system of the two trains,
respectively. Hence the system failure rate for these levels is approximately
equal to 26 and 0.7, respectively. Note that for the latter case (cf. (4.64),
p. 133),
λΦ ≈ 2 · Ā · 13.
Using that the number of system failures is approximately Poisson dis-
tributed, we can compute the probability that a certain number of failures
156 4. Availability Analysis of Complex Systems
occurs during a specific period of time. For example, we find that for alter-
native ii) there is a probability of about e^{−0.7} ≈ 0.50 of having no complete
shutdowns during a year.
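The Poisson approximation used here can be evaluated directly. A small sketch with the complete-shutdown rate of 0.7 per year from alternative ii):

```python
import math

lam_Phi = 0.7                                   # complete-shutdown rate per year (from the text)

def poisson_pmf(k, mu):
    return math.exp(-mu) * mu**k / math.factorial(k)

p0 = poisson_pmf(0, lam_Phi)                    # no complete shutdowns in a year
p_le1 = p0 + poisson_pmf(1, lam_Phi)            # at most one shutdown
print(round(p0, 2), round(p_le1, 2))            # prints 0.5 0.84
```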
Let EY denote the asymptotic mean lost production relative to the de-
mand. For alternative i) it is clear that EY equals 0.027, observing that a
failure results in 100% loss and the unavailability equals 0.027. For alterna-
tive ii), we obtain the same value for the asymptotic mean lost production,
as is seen from the following calculation
The first term in the sum represents the contribution from failures leading
to 50% loss, whereas the second term represents the contribution from fail-
ures leading to 100% loss. The latter contribution is in practice negligible
compared to the former one. To compute the distribution of the lost pro-
duction, we need to know more about the distribution of the repair time
R of the train. It was assumed in this application that ER2 = 1000, which
corresponds to a squared coefficient of variation equal to 1.9 and a standard
deviation equal to 25.7. The unit of time is hours. This assumption makes
it possible to approximate the distribution of the lost production during a
year, using the normal approximation. We know the mean (EY = 0.027)
and need to estimate the variance of Y . To do this we make use of (4.97),
p. 151, stating that the variance in the binary case is approximately equal
to λΦ EY12 /t, where t is the length of the time period considered and Y1 is
the downtime of the first system failure. For alternative i) we find that the
variance equals approximately
and for alternative ii) (we ignore situations with both components down so
that the lost production is approximately 50% of the downtime)
From this we estimate, for example, that the probability that the lost pro-
duction during one year is more than 4% of demand, to be 0.16 for alter-
native i) and 0.08 for alternative ii).
scale parameter), cf., e.g., Asmussen [9] and Tijms [168]. It is common to use
a mixture of two Erlang distributions with the first two moments matching
the distribution considered. Now assume that the lifetime of component i,
Fi , can be described by the sum of Mi random variables, each of which
is exponentially distributed with rate λi0 , i.e., the lifetime of component
i is Erlangian distributed with parameters λi0 and Mi . Then we have a
situation that fits into the above multistate framework and the asymptotic
results can be applied. The state space for component i is {0, 1, ..., Mi }.
The component process (Xt (i)) starts in state Mi , it stays there a time
governed by an exponential random variable with rate λi0 and jumps to
state Mi − 1, it stays there a time governed by an exponential random
variable with rate λi0 and jumps to state Mi − 2, and this continues until
the process reaches state 0. After a duration having distribution Gi in state
0 it returns to state Mi . We see that filr = 1 if l = r + 1 and filr = 0
otherwise (for r < l). Furthermore,
µ_i = M_i (1/λ_{i0}) + µ_{Gi} = µ_{Fi} + µ_{Gi},

λΦ = Σ_{i=1}^{n} (1/µ_i)[h_1(1_{i1}, a) − h_1(1_{i0}, a)] = Σ_{i=1}^{n} (1/µ_i)[h(1_i, A) − h(0_i, A)],

a_{i0} = µ_{Gi}/µ_i = Ā_i,
using the terminology from the binary theory. Remember that the formulas
established in Sections 4.2 and 4.3 for the expected cycle length and the
steady-state (un)availability of component i, and the system failure rate,
are applicable also for nonexponential distributions.
Thus by modifying the state space, we have been able to extend the
results, i.e., the Theorems 46 (p. 121), 50 (p. 128), and 59 (p. 147), in the
previous sections to Erlang distributions.
Now assume that the lifetime distribution of component i is a mixture of
Erlang distributions, i.e., with probability pir > 0 the distribution equals
an Erlang distribution with parameters λi0 and Mir , r = 1, 2, . . . , ri . This
situation can be analyzed as above with the state space for component
i given by {0, 1, ..., Mi }, where Mi =maxr {Mir }. If the component state
process (Xt (i)) is in state 0, it will go to state Mir with probability pir . Then
the component stays in this state for a time governed by an exponential
distribution with parameter λi0 , before it jumps to state Mir − 1, etc.
As above we can use the formulas for the binary case to compute the
expected cycle length and steady-state (un)availability of component i,
and the system failure rate. It is seen that
µ_{Fi} = (1/λ_{i0}) Σ_{r=1}^{r_i} p_{ir} M_{ir}.
We can conclude that the set-up also covers mixtures of Erlang distribu-
tions, and Theorems 46, 50, and 59 apply.
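The mixture construction can be checked by direct sampling. The sketch below (example parameters, not values from the text) draws a stage count M_r with probability p_r, adds M_r exponential stages, and compares the empirical mean with µ_F = (1/λ_{i0}) Σ_r p_r M_r.

```python
import random

def sample_mixture_erlang_lifetime(lam0, stages, probs, rng):
    """Lifetime distributed as a mixture of Erlang(lam0, M_r) laws:
    pick the stage count M_r with probability p_r, then sum M_r
    independent Exp(lam0) stage times."""
    u, acc = rng.random(), 0.0
    for m, p in zip(stages, probs):
        acc += p
        if u <= acc:
            return sum(rng.expovariate(lam0) for _ in range(m))
    return sum(rng.expovariate(lam0) for _ in range(stages[-1]))

rng = random.Random(11)
lam0, stages, probs = 2.0, [2, 5], [0.3, 0.7]   # example parameters
n = 100000
mean = sum(sample_mixture_erlang_lifetime(lam0, stages, probs, rng) for _ in range(n)) / n
exact = sum(p * m for m, p in zip(stages, probs)) / lam0   # (1/lam0) * sum p_r M_r
print(mean, exact)
```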
Note that we have not proved that the limiting results obtained in the
previous sections hold true for general lifetime distributions Fij . We have
shown that if the distributions Fij all belong to a certain class of mixtures
of Erlang distributions, then the results hold. Starting from general distri-
butions Fij , we can write Fij as a limit of Fijr , r → ∞, where Fijr are
mixtures of Erlang distributions. But interchanging the limits as j → ∞
and as r → ∞ is not justified in general. Refer also to Bibliographic Notes,
p. 166, for some comments related to the non-exponential case.
ES = 1/(2λ) + µ_G/(1 − q).

Hence

λ_Φ = EN/ES = (q/(1 − q))/ES = 2λq/(1 − q + 2λµ_G).
The transition probabilities are obtained by observing that if the state is i and the repair is completed at time s, then
the probability that the process jumps to state j, where j ≤ i ≤ n − 1,
equals the probability that i − j + 1 components fail before s and j − 1
components survive s; and, furthermore, if the state is i and the repair is
completed at time s, then the probability that the process jumps to state
i + 1 equals the probability that i components survive s. Now if the process
is in state n, it stays there for an exponential time with rate nλ, and jumps
to state n − 1.
Having established the transition probabilities, we can compute a number
of interesting performance measures for the system using results from semi-
Markov theory. For example, we have an explicit formula for the asymptotic probability P(Φ_t = k) as t → ∞, which depends on the mean time
spent in each state and the limiting probabilities of the embedded discrete-
time Markov chain; see Ross [145], p. 104.
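The jump probabilities described above can be computed directly when the repair time is deterministic, since the integral against G then reduces to a single point evaluation; for a general G one would integrate the same binomial expression against dG. A sketch with hypothetical parameter values (the function name is ours):

```python
import math

def jump_probs(i, lam, s):
    """Transition probabilities out of state i (i components operating
    while one repair finishes at the fixed time s), assuming exponential
    lifetimes with rate lam.  If k of the i components fail before s,
    the chain jumps to state j = i - k + 1 (survivors plus the repaired
    component).  Returns a dict {j: probability}."""
    f = 1.0 - math.exp(-lam * s)          # P(a component fails before s)
    probs = {}
    for k in range(i + 1):                # k failures among i components
        pk = math.comb(i, k) * f**k * (1.0 - f)**(i - k)
        probs[i - k + 1] = probs.get(i - k + 1, 0.0) + pk
    return probs

P = jump_probs(3, 0.5, 1.0)
total = sum(P.values())
```

For these (illustrative) values the most likely jump is to state 3, i.e., exactly one failure during the repair.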
Model
The following assumptions are made:
Let Φt denote the state of the system at time t, i.e., the number of
components functioning at time t (Φt ∈ {n, n−1, . . . , 0}). For repair regime
R1, Φ is generally a regenerative process, or a modified regenerative process.
For a two-component system it is seen that the time points when Φ jumps
to state 1 are regenerative points, i.e., the time points when i) the operating
component fails and the second component is not under repair (the process
jumps from state 2 to 1) or ii) both components are failed and the repair of
the component being repaired is completed (the process jumps from state 0
to 1). For n > 2, the points in time when the process jumps from state 0 to
1 are regenerative points, noting that the situation then is characterized by
one “new” component, and n − 1 in a repair queue. Assuming exponential
lifetimes, we can define other regenerative points, e.g., consecutive visits to
the best state n, or consecutive visits to state n − 1.
Also for a two-component system under repair regime R2, Φ is generally a (modified) regenerative process. The regenerative
points are given by the points when the process jumps from state 2 to
1 (case i) above). If the system has more than two components (n > 2),
the regenerative property is not true for a general failure time distribution.
However, under the assumption of an exponential time to failure, the pro-
cess is regenerative. Regenerative points are given by consecutive visits to
state n, or points when the process jumps from state n to state n − 1. In
the following, when considering a system of more than two components, we
assume an exponential lifetime distribution. Remember that a cycle refers
to the length between two consecutive regenerative points.
Performance Measures
The system can be considered as a special case of a multistate monotone
system, with the demand rate Dt set to n − 1. Hence the performance
measures defined in Section 4.7.1, p. 151, also apply to the system analyzed
in this section. Availability refers to the probability that at least n − 1
components are functioning, and system failure refers to the event that the
state process Φ is below n − 1. Note that we cannot apply the computation
results of Section 4.7.1 since the state processes of the components are
not stochastically independent. The general asymptotic results obtained in
Sections 4.4–4.6 for regenerative processes are however applicable.
Of the performance measures we will put emphasis on the limiting avail-
ability, and the limiting mean of the number of system failures in a time
interval.
We need the following notation for i = n, n − 1, . . . , 0:
p_i(t) = P(Φ_t = i),   p_i = lim_{t→∞} p_i(t),
provided the limits exist. Clearly, the availability at time t, A(t), is given
by
A(t) = pn (t) + pn−1 (t)
and the limiting availability, A, is given by
A = pn + pn−1 .
Computation
First, we focus on the limiting unavailability Ā, i.e., the expected portion
of time in the long run that at least two components are not functioning.
Under the assumption of constant failure and repair rates this unavailability
can easily be computed using Markov theory, noting that Φ is a birth and
death process. The probability p̃i of having i components down is given by
(cf. [16], p. 303)
p̃_i = p_{n−i} = z_i/(1 + Σ_{j=1}^n z_j),    (4.100)
where

z_i = ((n − 1)(n − 1)!/(n − i)!) · δ^i/(u_1 u_2 · · · u_i),   i = 1, 2, . . . , n,   z_0 = 1,

δ = µ_G/µ_F, and u_l = 1 under repair regime R1, u_l = l under repair regime R2.
For small δ we obtain the approximation

Ā ≈ p̃_2 ≈ ((n − 1)^2/u_2) δ^2.    (4.101)
We can also write
Ā = ((n − 1)^2/u_2) δ^2 + o(δ^2),   δ → 0.
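Formula (4.100) is straightforward to evaluate numerically, and doing so confirms the small-δ approximation (4.101). A sketch (the function name and the parameter values are ours):

```python
import math

def steady_state_down_probs(n, delta, regime="R1"):
    """Limiting probabilities p~_i of having i components down, formula
    (4.100): p~_i = z_i/(1 + sum_{j=1}^n z_j), with z_0 = 1 and
    z_i = (n-1)(n-1)!/(n-i)! * delta^i / (u_1 * ... * u_i)."""
    def u(l):
        return 1.0 if regime == "R1" else float(l)
    z = [1.0]                                    # z_0 = 1
    for i in range(1, n + 1):
        zi = (n - 1) * math.factorial(n - 1) / math.factorial(n - i) * delta**i
        for l in range(1, i + 1):
            zi /= u(l)
        z.append(zi)
    total = sum(z)                               # = 1 + sum_{j=1}^n z_j
    return [zi / total for zi in z]

n, delta = 4, 0.01
p_R1 = steady_state_down_probs(n, delta, "R1")
p_R2 = steady_state_down_probs(n, delta, "R2")
approx_R1 = (n - 1) ** 2 * delta**2 / 1.0        # (4.101) with u_2 = 1
approx_R2 = (n - 1) ** 2 * delta**2 / 2.0        # (4.101) with u_2 = 2
```

For δ = 0.01 the exact probability p̃_2 and the approximation (4.101) differ by only a few percent under both repair regimes.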
In general we can find expressions for the limiting unavailability by using
the regenerative property of the process Φ. Defining Y and S as the system
downtime in a cycle and the length of a cycle, respectively, it follows from
the Renewal Reward Theorem (Theorem 122, p. 250, in Appendix B) that
Ā = EY/ES.    (4.102)
Here system downtime corresponds to the time two or more of the com-
ponents are not functioning. Let us now look closer into the problem of
computing Ā, given by (4.102), under repair regime R1.
Ā = E[max{R − T, 0}]/(ET + E[max{R − T, 0}])    (4.103)

  = (µ_G − w)/(µ_F + (µ_G − w)),    (4.104)
where
w = E[min{R, T}] = ∫_0^∞ F̄(t)Ḡ(t) dt.
0 ≤ Ā′ − Ā ≤ (Ā′)^2 + (δ^3/6)(ER^3/µ_G^3).    (4.107)
Hence Ā′ overestimates the unavailability, and the error term will be negligible provided that δ = µ_G/µ_F is sufficiently small.
Next, let us compare the approximation formula Ā′ with the standard “Markov formula” Ā_M = δ^2, obtained by assuming exponentially distributed failure and repair times (replace c_G^2 by 1 in the expression (4.106) for Ā′, or use the Markov formula (4.101), p. 162). It follows that
Ā′ = Ā_M · (1/2)[1 + c_G^2].    (4.110)
From this, we see that the use of the Markov formula when the squared coefficient of variation of the repair time distribution, c_G^2, is not close to 1 will introduce a relatively large error. If the repair time is a constant, then c_G^2 = 0 and the unavailability using the Markov formula is two times Ā′. If c_G^2 is large, say 2, then the unavailability using the Markov formula is 2/3 of Ā′.
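These claims can be checked in a small numerical experiment for n = 2 under R1 with a constant repair time, using the exact expression (4.104); the parameter choices below are illustrative only.

```python
import math

# Two-component parallel system under R1: exponential lifetimes (rate lam)
# and a *constant* repair time mu_G, so c_G^2 = 0.  We compare the exact
# unavailability (4.104) with A' = (delta^2/2)*(1 + c_G^2) and with the
# Markov formula A_M = delta^2.
lam, mu_G = 1.0, 0.01
mu_F = 1.0 / lam
delta = mu_G / mu_F
cG2 = 0.0                                    # squared coeff. of variation of G

w = (1.0 - math.exp(-lam * mu_G)) / lam      # E[min(R, T)] for constant R
unavail_exact = (mu_G - w) / (mu_F + (mu_G - w))
unavail_approx = delta**2 / 2.0 * (1.0 + cG2)
unavail_markov = delta**2
```

As the text states, Ā′ slightly overestimates the exact value, while the Markov formula is off by a factor of two for a deterministic repair time.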
Assume now n > 2. The repair regime is R1 as before. Assume that
δ is relatively small. Then it is possible to generalize the approximations
obtained above for n = 2.
Since δ is small, there will be a negligible probability of having Φ ≤
n − 3, i.e., three or more components not functioning at the same time. By
neglecting this possibility we obtain a simplified process that is identical
to the process for the two-component system analyzed above, with failure
rate (n − 1)λ. Hence by replacing λ with (n − 1)λ, formula (4.105) is valid
for general n, i.e., Ā ≈ Ā′, where

Ā′ = ([(n − 1)δ]^2/2)[1 + c_G^2].
The error bounds are, however, more difficult to obtain, see [30].
The relation between the approximation formulas Ā′ and Ā_M, given by (4.101), p. 162, is the same for all n ≥ 2. Hence Ā′ = Ā_M · (1/2)[1 + c_G^2] (formula (4.110)) holds for n > 2 too.
Next we will establish results for the long run average number of system
failures. It follows from the Renewal Reward Theorem that ENt /t and
E[Nt+s − Nt ]/s converge to λΦ = EN/ES as t → ∞, where N equals the
number of system failures in one renewal cycle and S equals the length of
the cycle as before. With probability one, Nt /t converges to the same value.
Under repair regime R1, N ∈ {0, 1}. Hence EN equals the probability that
the system fails in a cycle, i.e., EN = q using the terminology of Sections 4.3
and 4.4. Below we find expressions for λΦ in the case that the repair regime
is R1. The regenerative points are consecutive visits to state n − 1.
Theorem 65 If n = 2, then

λ_Φ = q/(µ_F + EY),    (4.111)

where

q = ∫_0^∞ F(t) dG(t),    (4.112)

EY = ∫_0^∞ F(t)Ḡ(t) dt.
Proof. First note that EY equals the expected downtime in a cycle, and that the expected cycle length is given by

ES = µ_F + EY.    (4.113)
Suppose the system has just jumped to state 1. We then have one compo-
nent operating and one undergoing repair. Now if a system failure occurs
(i.e., T ≤ R), then the cycle length equals R, and if a system failure does
not occur (i.e., T > R), then the cycle length equals T . Consequently,
We see from (4.111) that if F (t) is exponential with rate λ and the
components are highly available, then
λ_Φ ≈ λ^2 µ_G.
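For a concrete check of Theorem 65 and of this approximation, take F exponential and, for simplicity, a degenerate repair distribution G concentrated at µ_G (an illustrative choice of ours, under which the integrals in (4.112) reduce to closed forms):

```python
import math

# Theorem 65 for n = 2: F is Exp(lam) and the repair time is a constant
# mu_G, so G is the point mass at mu_G (illustrative special case).
lam, mu_G = 0.1, 0.5
mu_F = 1.0 / lam

q = 1.0 - math.exp(-lam * mu_G)     # (4.112): integral of F dG = F(mu_G)
EY = mu_G - q / lam                 # integral of F(t)*Gbar(t) dt over [0, mu_G]
rate = q / (mu_F + EY)              # (4.111): system failure rate
rate_approx = lam**2 * mu_G         # high-availability approximation
```

With λµ_G = 0.05 the exact rate and the approximation λ^2 µ_G agree to within a few percent, as expected for highly available components.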
If n > 2 and the repair regime is R1, it is not difficult to see that q is
given by (4.112) with F (t) replaced by 1 − e−(n−1)λt . It is however more
difficult to find an expression for ES. For highly available components,
we can approximate the system with a two-state system with failure rate
(n − 1)λ; hence,
λ_Φ ≈ [(n − 1)λ]^2 µ_G,

with

ES ≈ 1/((n − 1)λ).
When the state process of the system jumps from state n to n − 1, it will
return to state n with a high probability and the sojourn time in state n−1
will be relatively short; consequently, the expected cycle length is approx-
imately equal to the expected time in the best state n, i.e., 1/((n − 1)λ).
C_t = (1/t) Σ_{i=1}^{N_t} Z_{τ_i},
The objective is to find the replacement age that minimizes this long run
average cost per unit time. Inserting the cost function Zt = c + kI(T ≤ t)
we get
K_s = (c + kF(s)) / ∫_0^s (1 − F(x)) dx.    (5.1)
Now elementary analysis can be used to find the optimal replacement age
s, i.e., to find s∗ with
In most cases the renewal function is not known explicitly. In such a case
asymptotic expansions like Theorem 114, p. 247 in Appendix B or numerical
methods have to be used. As is to be expected in the case of an Exp(λ)
distribution, preventive replacements do not pay: M (s) = λs and s∗ = ∞.
(d/ds)M(s) = M(s)/s + c/(s(c + k)).
The solution s∗ is finite if and only if c/(c + k) < 1/4, i.e., if failure replacements are more than four times as expensive as preventive replacements.
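The minimization of (5.1) is straightforward to carry out numerically. The sketch below does this for an illustrative Weibull lifetime (an IFR distribution, so a finite optimal age exists when k is large relative to c); the crude grid search and trapezoid rule are our own choices, not the book's.

```python
import math

def age_cost(s, c, k, shape):
    """Long-run cost rate K_s of (5.1) for a Weibull lifetime with
    F(t) = 1 - exp(-t^shape); the denominator integral of the survival
    function 1 - F is evaluated with a trapezoid rule."""
    n = 2000
    h = s / n
    surv = [math.exp(-((i * h) ** shape)) for i in range(n + 1)]
    denom = h * (sum(surv) - 0.5 * (surv[0] + surv[-1]))
    return (c + k * (1.0 - math.exp(-(s ** shape)))) / denom

c, k, shape = 1.0, 10.0, 2.0                 # failure replacement costs c + k
grid = [0.05 * i for i in range(1, 101)]     # candidate ages in (0, 5]
s_star = min(grid, key=lambda s: age_cost(s, c, k, shape))
cost_star = age_cost(s_star, c, k, shape)
cost_late = age_cost(5.0, c, k, shape)       # essentially replacing at failure only
```

For these parameters the optimal age is an interior point well below the mean lifetime, and the optimal cost rate is roughly half of what replacement at failure costs.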
The age and block replacement policies will result in a finite optimal value
of s only if there is some aging and wear-out of the units, i.e., in probabilistic
terms the lifetime distribution F fulfills some aging condition like IFR,
NBU, or NBUE (see Chapter 2 for these notions). To judge whether it pays
to follow a certain policy and in order to compare the policies it is useful
to consider the number of failures and the number of planned preventive
replacements in a time interval [0, t].
This result says that IFR is characterized by the reasonable aging con-
dition that the number of failures is growing with increasing replacement
age. Somewhat weaker results hold true for the block policy (see Shaked
and Zhu [154] for proofs):
K_s = (c + R(s))/s,    (5.3)
where c is the cost of a preventive replacement and R(s) = ∫_0^s r(x) dx denotes the total expected costs due to deterioration over an interval of length s. The derivative r, called the (marginal) deterioration cost rate,
is assumed to be continuous and piecewise differentiable. If in the block
replacement model of the preceding Section 5.1.2 the lifetime distribution
function F has a bounded density f, then it is known (see Appendix B, p.
248) that the corresponding renewal function M also admits a density m, and we have R(s) = ∫_0^s (c + k)m(x) dx, which shows that this is a special
case of this block-type model. Now certain properties of the marginal cost
rate can be carried over to the cost function K. The proof of the following
theorem is straightforward and can be found in [69].
(a) lim_{t→∞} r(t) = ∞ or (b) lim_{t→∞} r(t) = a and lim_{t→∞} (at − R(t)) > c,
K_s = (c + ∫_0^s r(x)F̄(x) dx) / ∫_0^s F̄(x) dx,    (5.4)
This shows that behind these two quite different models the same optimization mechanism is at work. This has been exploited by Aven and Bergman
in [22] (see also [23]). They recognized that for many replacement models
the optimization criterion can be written in the form
(E[∫_0^τ a_t h_t dt] + c_0) / (E[∫_0^τ h_t dt] + p_0),    (5.5)
Without any conditions on the structure of the process Z one cannot hope
to find an explicit solution of the stopping problem. A condition called
monotone case in the discrete time setting can be transferred to continuous
time as follows.
ζ = inf{t ∈ R+ : ft ≤ 0}
EZζ = sup{EZτ : τ ∈ C F }.
For the latter equality we make use of our agreement that σ(·) denotes the completion of the generated σ-algebra so that, for instance, the event {ρ = t} = ∩_{n∈N} {t − 1/n < ρ ≤ t} is also included in σ(ρ ∧ t). Then we define
so that the more general conditions mentioned in the above remark are
fulfilled with Mζ = 0. This yields
EZζ = 1 = sup{EZτ : τ ∈ C F }.
For τ ∈ CTF we consider the ratio Kτ in (5.7). The stopping problem is then
to find a stopping time σ ∈ CTF , with
As in Section 3.1 we use the short notation Z = (f, M ) and X = (g, L). Al-
most all of the stochastic processes used in applications without predictable
jumps admit such SSM representations. The following general assumption
is made throughout this section:
Assumption (A). Z = (f, M) and X = (g, L) are SSMs with EZ_0 > 0, EX_0 ≥ 0, g_s > 0 for all s ∈ R_+, and M^T, L^T ∈ M_0 are uniformly integrable martingales, where M_t^T = M_{t∧T}, L_t^T = L_{t∧T}.
Remember that all relations between real random variables hold (only)
P -almost surely. The first step to solve the optimization problem is to
establish bounds for K∗ in (5.8).
Proof. Because T ∈ CTF only the lower bound has to be shown. Since the
martingales M T and LT are uniformly integrable, the optional sampling
theorem (see Appendix A, p. 231) yields EMτ = ELτ = 0 for all τ ∈ CTF
and therefore
K_τ ≥ (EZ_0 + qE[X_τ − X_0])/EX_τ = (EZ_0 − qEX_0)/EX_τ + q ≥ b_l.
The lower bound is derived observing that EX0 ≤ EXτ ≤ EXT , which
completes the proof.
The following example gives these bounds for the basic age replacement
policy.
where λ is the usual failure rate of the lifetime T . It follows that the pro-
cesses Z and X have representations
Z_t = c + ∫_0^t I(T > s)kλ(s) ds + M̃_t,   M̃_t = kM_t,
and

X_t = t = ∫_0^t ds.
Lemma 72 then yields the bounds

b_u = EZ_T/EX_T = (c + k)/ET,

b_l = c/ET + kλ(0).
These bounds could also be established directly by using (5.1), p. 170.
The benefit of Lemma 72 lies in its generality, which also allows the bounds
to be found in more complex models as the following example shows.
R_t = Σ_{n=1}^{N_t} V_n,
where N_t = Σ_{n=1}^∞ I(T_n ≤ t) is the number of shocks that have arrived up to time t. The lifetime of the system is modeled as the first time R_t reaches a fixed threshold S > 0:
T = inf{t ∈ R+ : Rt ≥ S}.
We stick to the simple cost structure of the basic age replacement model,
i.e.,
Zt = c + kI(T ≤ t).
But now we want to minimize the expected costs per number of arrived
shocks in the long run, i.e.,
X_t = N_t.
This cost criterion is appropriate if we think, for example, of systems which
are used by customers at times Tn . Each usage causes some random dam-
age (shock). If the customers arrive with varying intensities governed by
external circumstances, e.g., different intensities at different periods of a
day, it makes no sense to relate the costs to time, and it is more reasonable
to relate the costs to the number of customers served.
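A small simulation makes the “cost per arrived shock” criterion concrete. The sketch below uses a simple shock-count replacement rule and illustrative assumptions of ours (exponential damages, threshold S, costs c = 1, k = 4); it is meant only to show how the ratio EZ_τ/EN_τ behaves.

```python
import random

def simulate_cycle(rng, mean_damage, S, n_stop, c=1.0, k=4.0):
    """One replacement cycle of the shock model: each arriving shock adds
    an Exp(1/mean_damage) damage to R, the system fails when R reaches S,
    and we replace preventively after n_stop shocks.  Returns the
    replacement cost and the number of shocks in the cycle."""
    damage, shocks = 0.0, 0
    while True:
        shocks += 1
        damage += rng.expovariate(1.0 / mean_damage)
        if damage >= S:
            return c + k, shocks        # replacement after failure
        if shocks >= n_stop:
            return c, shocks            # preventive replacement

rng = random.Random(7)
cost_per_shock = {}
for n_stop in (2, 50):
    tot_cost = tot_shocks = 0.0
    for _ in range(4000):
        cost, shocks = simulate_cycle(rng, 1.0, 5.0, n_stop)
        tot_cost += cost
        tot_shocks += shocks
    cost_per_shock[n_stop] = tot_cost / tot_shocks   # long-run cost per shock
```

Replacing early (after 2 shocks) gives a clearly lower cost per shock than essentially waiting for failure (n_stop = 50), illustrating why a control-limit policy pays here.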
Equally, for x < K∗ and v(x) = −EZ_0 we have v(x) < v(K∗) = 0.
Corollary 74 If (f_t) and (g_t) are deterministic with inverse of the ratio r^{−1}(x) = inf{t ∈ R_+ : r_t = f_t/g_t ≥ x}, x ∈ R, and X_0 ≡ 0, then σ = t∗ ∧ T is optimal with t∗ = r^{−1}(K∗) ∈ R_+ ∪ {∞} and

K∗ = inf{x ∈ R : ∫_0^{r^{−1}(x)} (xg_s − f_s)P(T > s) ds ≥ EZ_0}.
K∗ = EZ_0/EX_T + r_0 and σ = T.
K_σ = inf{K_τ = EZ_τ/EX_τ : τ ∈ C_ζ^A} = inf{K̂_τ = EẐ_τ/EX̂_τ : τ ∈ C_ζ^A},
since all A-stopping times are also F-stopping times, and the question of to what extent the information level influences the cost minimum has to be investigated.
5.3 Applications
The general set-up of minimizing a ratio of expectations allows for many special cases covering a variety of maintenance models. A few of these are presented in this section, showing how the general approach can be exploited.
K∗ = K_σ = EZ_σ/EX_σ = inf{K_τ : τ ∈ C_T^F},
b_l = c/ET + λ_min ≤ K∗ ≤ b_u = (c + 1)/ET.
b_l = (2α/3)c,   b_u = (2α/3)(c + 1).
Since Ab and Ac are subfiltrations of F and include Ad as a subfiltration,
we must have for the optimal stopping values
equivalent to c ≤ 1/2;

• α < x. In this case we have ρ_x = T, Eρ_x = 3/(2α), EZ_{ρ_x} = c + 1, so that x∗ = b_u; x∗ > α is equivalent to c > 1/2.
The other information levels are treated in a similar way. Only case b) needs some special attention because the failure rate process λ̂^b is no longer monotone but only piecewise nondecreasing. To meet the (b_l, b_u)-increasing condition, we must have λ̂_h^b < b_l, i.e., 2α(1 − (2 − e^{−αh})^{−1}) < (2α/3)c. This inequality holds for all h ∈ R_+ if c ≥ 3/2, and for h < h(α, c) = −(1/α) ln((3 − 2c)/(3 − c)) if 0 < c < 3/2.
We summarize these considerations in the following proposition, the proof of which follows the lines above and is elementary but not straightforward.
Proposition 75 For 0 < c ≤ 1/2 the optimal stopping times and values K∗ are

a) K_a∗ = 2αc, σ_a = X_1 ∧ X_2;

b) K_b∗ = α(c + (1 − e^{−αh})^2)/(0.5 + (1 − e^{−αh})^2), σ_b = ((X_1 ∧ X_2) ∨ h) ∧ T, if 0 < h < h(α, c);

c) K_c∗ = α√(2c), σ_c = X_1 ∧ (−(1/α) ln(1 − √(2c)));

d) K_d∗ = 2α(√(c^2/4 + c) − c/2), σ_d = T ∧ (−(1/α) ln(1 − c/2 − √(c^2/4 + c))).

For c > 1/2 we have on all levels K∗ = b_u and σ = T.
For decreasing c the differences between the cost minima increase. If the costs c for a preventive replacement are greater than half of the penalty costs, i.e., c > (1/2)k = 1/2, then extra information and preventive replacements are not profitable.
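The cost minima of Proposition 75 can be compared numerically. The snippet below evaluates levels a), c), d) and the upper bound b_u for illustrative values of α and c (our choices), showing how more information lowers the achievable cost, and that all levels meet the bound b_u at c = 1/2.

```python
import math

# Cost minima for the two-component parallel system with Exp(alpha)
# components and penalty k = 1; alpha and c are illustrative.
alpha, c = 1.0, 0.1
K_a = 2.0 * alpha * c                                       # level a
K_c = alpha * math.sqrt(2.0 * c)                            # level c
K_d = 2.0 * alpha * (math.sqrt(c**2 / 4.0 + c) - c / 2.0)   # level d
b_u = 2.0 * alpha * (c + 1.0) / 3.0                         # upper bound

# At the boundary c = 1/2 the cost minima coincide with b_u.
K_c_half = alpha * math.sqrt(1.0)
K_d_half = 2.0 * alpha * (math.sqrt(0.0625 + 0.5) - 0.25)
b_u_half = 2.0 * alpha * 1.5 / 3.0
```

For c = 0.1 the minima are ordered K_a < K_c < K_d < b_u: the richer the observation filtration, the lower the optimal cost.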
Now F is the internal history generated by (Tn , Vn ) and (λt ) the F-intensity
of (Nt ). The costs of a preventive replacement are c > 0 and for a replace-
ment at failure c + k, k > 0, which results in a cost process Zt = c + kI(T ≤
t). The aim is to minimize the expected cost per arriving shock in the long
run, i.e., to find σ ∈ CTF with
K∗ = K_σ = inf{K_τ = EZ_τ/EX_τ : τ ∈ C_T^F},
where Xt = Nt . The only assumption concerning the shock arrival process
is that the intensity λ is positive: λt > 0 on [0, T ). According to Example
46 and Section 3.3.3 we have the following SSM representations:
Z_t = c + ∫_0^t I(T > s)kλ_s F̄((S − R_s)−) ds + M_t,

X_t = ∫_0^t λ_s ds + L_t.
Then the cost rate process r is given on [0, T ) by rt = k F̄ ((S −Rt )−), which
is obviously nondecreasing. Under the integrability assumptions of Theorem
73, p. 183, we see that the optimal stopping time is σ = ρx∗ = inf{t ∈ R+ :
rt ≥ x∗ }, where the limit x∗ = inf{x ∈ R : xEXρx − EZρx ≥ 0} = K ∗ has
to be found numerically. Thus the optimal stopping time is a control-limit
rule for the process (Rt ) : Replace the system the first time the accumulated
damage hits a certain control limit.
ρ_x = inf{t ∈ R_+ : k exp{−ν(S − R_t)} ≥ x} ∧ T = inf{t ∈ R_+ : R_t ≥ (1/ν) ln(x/k) + S} ∧ T.
We set g(x) = (1/ν) ln(x/k) + S and observe that ρ_x = inf{t ∈ R_+ : R_t ≥ g(x)} if 0 < x ≤ k. For such values of x we find
EX_{ρ_x} = νg(x) + 1,

EZ_{ρ_x} = c + kP(T = ρ_x) = c + ke^{−ν(S−g(x))} = c + x.
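With these two expectations the optimal control limit can be found by a one-dimensional search over x; at the optimum the fixed-point property K(x∗) = x∗ from the general theory must hold. A sketch with illustrative parameters of ours (the grid is restricted to levels x for which g(x) > 0):

```python
import math

# Control-limit search: K(x) = (c + x)/(nu*g(x) + 1) with
# g(x) = (1/nu)*ln(x/k) + S, for 0 < x <= k.  Parameters illustrative.
nu, k, S, c = 1.0, 1.0, 2.0, 0.5

def K(x):
    g = math.log(x / k) / nu + S
    return (c + x) / (nu * g + 1.0)

# restrict the search to levels with g(x) > 0, i.e. x > k*exp(-nu*S)
lo = k * math.exp(-nu * S) + 1e-3
grid = [lo + i * (k - lo) / 2000.0 for i in range(2001)]
x_star = min(grid, key=K)
K_star = K(x_star)
```

The grid minimizer satisfies K(x∗) ≈ x∗, the fixed-point characterization x∗ = inf{x : xEX_{ρ_x} − EZ_{ρ_x} ≥ 0} of the optimal stopping value.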
If all failure rates λt (i) are of IFR-type, then the F-failure rate process λ
and the ratio process r are nondecreasing. Therefore, Theorem 73, p. 183,
can be applied to yield σ = ρx∗ . So the optimal stopping time is among
the control-limit rules
ρ_x = inf{t ∈ R_+ : r_t ≥ x} ∧ T = inf{t ∈ R_+ : λ_t ≥ (α/k)(c + x)} ∧ T.
k
This means: Replace the system the first time the sum of the failure rates of critical components reaches a given level x∗. This level has to be determined as x∗ = inf{x ∈ R : xEX_{ρ_x} − EZ_{ρ_x} ≥ 0}.
If the A-failure rate process λ̂t = E[λt |At ] is (bl , bu )-increasing, then the
stopping problem can also be solved on the lower information level by
means of Theorem 73. We want to carry this out in more detail in the next
section, allowing also for dependencies between the component lifetimes. To
keep the complexity of the calculations on a manageable level, we confine
ourselves to a two-component parallel system.
In the following it is assumed that βi ≤ β̄i , i ∈ {1, 2}, and β̄1 ≤ β̄2 ,
i.e., after failure of one component the stress placed on the surviving one
is increased. Without loss of generality the penalty costs for replacements
after failures are set to k = 1. The solution of the stopping problem will be
outlined in the following. More details are contained in [93].
Inserting q = −c + β_{12}α^{−1} in (5.15) we get the bounds for the stopping value K∗:

b_l = cv/(1 − v) + β_{12}/α and b_u = (c + 1)v/(1 − v),
where v = E[e−αT ] can be determined by means of the distribution H.
Since the failure rate process is monotone on [0, T), the optimal stopping time is again of control-limit type, with stopping values

x_1∗ = (1/α)(cβ + β_{12}),

x_2∗ = (1/α)(c(β_1 + β_{12}) + β_{12} + ((c + 1)β_2β̄_1 − cβ_1β_2)/(β̄_1 + β_2 + β_{12} + α)),

x_3∗ = b_u.
The explicit formulas for the optimal stopping value were only presented
here to show how the procedure works and that even in seemingly sim-
ple cases extensive calculations are necessary. The main conclusion can be
drawn from the structure of the optimal policy. For small values of c (note
that the penalty costs for failures are k = 1) it is optimal to stop and
replace the system at the first component failure. For mid-range values of
c, the replacement should take place when the “better” component with a
lower residual failure rate (β̄_1 ≤ β̄_2) fails. If the “worse” component fails first, this results in a replacement after system failure. For high values of c, preventive replacements do not pay, and it is optimal to wait until system failure. In this case the optimal stopping value is equal to the upper bound x∗ = b_u.
The paths of the failure rate process λ depend only on the observable
component lifetime T1 and not on T2 . The paths are nondecreasing so that
the same procedure as before can be applied. For γ1 = β1 + β2 − β̄1 > 0
the following results can be obtained:
ρ_{x∗} = T_1 ∧ b∗ for 0 < c ≤ c_1;   ρ_{x∗} = T_1 for c_1 < c ≤ c_2;   ρ_{x∗} = T for c_2 < c;

and

x∗ = x_1∗ for 0 < c ≤ c_1;   x∗ = x_2∗ for c_1 < c ≤ c_2;   x∗ = x_3∗ for c_2 < c.
The constants c1 , c2 and the stopping values x∗2 , x∗3 are the same as in the
complete information case. What is optimal on a higher information level
Information about T
On this lowest level B, no additional information about the state of the
components is available up to the time of system failure. The failure rate
is deterministic and can be derived from the distribution H:
λ_t = −(d/dt) ln(1 − H(t)).
In this case the replacement times ρx = T ∧ b, b ∈ R+ ∪ {∞}, are the
well-known age replacement policies. Even if λ is not monotone, such a
policy is optimal on this B-level. The optimal values b∗ and x∗ have to be
determined by minimizing Kρx as a function of b.
Numerical Examples
The following tables show the effects of changes of two parameters, the
replacement cost parameter c and the “dependence parameter” β12 . To be
able to compare the cost minima K ∗ = x∗ , both tables refer to the same
set of parameters: β1 = 1, β2 = 3, β̄1 = 1.5, β̄2 = 3.5, α = 0.08. The optimal
replacement times are denoted:
Table 1 shows the cost minima x∗ for different values of c. For small
values of c, the influence of the information level is greater than for mod-
erate values. For c > 1.394 preventive replacements do not pay, and additional information concerning T is not profitable.
Table 2 shows how the cost minimum depends on the parameter β12 . For
increasing values of β12 the difference between the cost minima on different
information levels decreases, because the probability of a common failure of
both components increases and therefore extra information about a single
component is not profitable.
assume the items to have different failure rates during and after burn-in, λ_j^0(t) and λ_j^1(t), respectively, where it is supposed that λ_j^0(t) ≥ λ_j^1(t) for all t ≥ 0. We assume that the lifelength T_j of the jth item admits the following representation:

I(T_j ≤ t) = ∫_0^t I(T_j > s)λ_j^{Y_s}(s) ds + M_t(j),   j = 1, . . . , m,    (5.18)
where Yt = I(τ < t), τ is the burn-in time and M (j) ∈ M is bounded in
L2 .
This representation can also be obtained by modeling the lifelength of
the jth item in the following way:
Since we assume that the failure time of any item can be observed during
the burn-in phase, the observation filtration, generated by the lifelengths
of the items, is given by
In other words, at any time t the observer has to decide whether to stop
or to continue with burn-in with respect to the available information up to
time t. Since Z is not adapted to F, i.e., Zt cannot be observed directly, we
consider the conditional expectation
Ẑ_t = E[Z_t|F_t] = c Σ_{j=1}^m I(T_j > t)E[(T_j − t)^+ |T_j > t] − mc_F + (c_F − c_B) Σ_{j=1}^m I(T_j ≤ t).    (5.21)
As an abbreviation we use
µ_j(t) = E[(T_j − t)^+ |T_j > t] = (1/F̄_j(t)) ∫_t^∞ F̄_j(x) dx,   t ∈ R_+,
for the mean residual lifelength. The derivative with respect to t is given by µ_j′(t) = −1 + λ_j^1(t)µ_j(t). We are now in a position to apply Theorem
71, p. 175, and formulate conditions under which the monotone case holds
true.
Then

ζ = inf{t ∈ R_+ : Σ_{j=1}^m I(T_j > t)g_j(t) ≤ 0}

and

EZ_ζ = sup{EZ_τ : τ ∈ C^F}.
Substituting

I(T_j > s) = 1 + ∫_0^s (−I(T_j > x)λ_j^0(x)) dx + M_j(s)
yields

µ_j(t)I(T_j > t) = µ_j(0) + ∫_0^t [−µ_j(s)I(T_j > s)λ_j^0(s) + I(T_j > s)µ_j′(s)] ds + ∫_0^t µ_j(s) dM_j(s)

= µ_j(0) + ∫_0^t I(T_j > s)[−1 − µ_j(s)(λ_j^0(s) − λ_j^1(s))] ds + M̃_j(t),
so that

Ẑ_t = −mc_F + c Σ_{j=1}^m µ_j(0) + ∫_0^t Σ_{j=1}^m I(T_j > s)g_j(s) ds + L_t.
Since for all ω ∈ Ω and all t ∈ R_+ there exists some J ⊆ {1, . . . , m} such that Σ_{j=1}^m I(T_j > t)g_j(t) = Σ_{j∈J} g_j(t), condition (5.22) in the theorem ensures that the monotone case (MON), p. 175, holds true. Therefore we get the desired result by Theorem 71 and the proof is complete.
Remark 21 The structure of the optimal stopping time shows that high
rewards per unit operating time lead to short burn-in times whereas great
differences cF − cB between costs for failures in different phases lead to
long testing times, as expected.
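The mean residual lifelength and the derivative formula µ_j′(t) = −1 + λ_j^1(t)µ_j(t) can be verified numerically; the sketch below does this for an illustrative Weibull survival function (our choice, purely for the check).

```python
import math

def mrl(t, h=1e-4, upper=10.0):
    """Mean residual lifelength mu(t) = (1/Fbar(t)) * int_t^inf Fbar(x) dx
    for the illustrative Weibull survival function Fbar(t) = exp(-t^2),
    computed with a trapezoid rule (the tail beyond `upper` is negligible)."""
    n = int((upper - t) / h)
    vals = [math.exp(-((t + i * h) ** 2)) for i in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return integral / math.exp(-(t ** 2))

t = 0.7
failure_rate = 2.0 * t                          # for Fbar = exp(-t^2)
numeric_deriv = (mrl(t + 1e-3) - mrl(t - 1e-3)) / 2e-3
formula_deriv = -1.0 + failure_rate * mrl(t)    # mu'(t) = -1 + lambda(t)*mu(t)
```

The central difference of the computed mean residual life agrees with the closed-form derivative, which is the key identity used in the proof above.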
and therefore we know that, if there exists an optimal finite stopping time
σ, then it is among the indexed stopping times
ρ_x = inf{t ∈ R_+ : λ_t ≥ x/c},   0 ≤ x ≤ b_u,
provided λ has nondecreasing paths. We summarize this in a corollary to
Theorem 73, p. 183.
R_t = u + ct − Σ_{n=1}^{N_t} X_n,
describes the available capital at time t as the difference of the income and
the total amount of costs for minimal repairs up to time t.
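A sample-path simulation of R_t is straightforward. The sketch below uses illustrative parameters of ours (exponential repair costs at Poisson repair epochs); it records the time to ruin inf{t : R_t < 0} along each path and checks the terminal mean u + (c − λEX_1)t.

```python
import random

def reserve_path(rng, u, c, lam, mean_cost, horizon):
    """One path of R_t = u + c*t - sum_{n<=N_t} X_n on [0, horizon]:
    income at rate c, repair costs X_n ~ Exp(1/mean_cost) at the points
    of a Poisson process with rate lam.  Returns the terminal capital
    and the first time R_t < 0 (None if the path never goes negative)."""
    t, R, ruin = 0.0, u, None
    while True:
        gap = rng.expovariate(lam)          # time until the next repair
        if t + gap >= horizon:
            return R + c * (horizon - t), ruin
        t += gap
        R += c * gap - rng.expovariate(1.0 / mean_cost)
        if ruin is None and R < 0.0:
            ruin = t

rng = random.Random(3)
paths = [reserve_path(rng, 5.0, 2.0, 1.0, 1.0, 50.0) for _ in range(2000)]
avg_terminal = sum(R for R, _ in paths) / len(paths)
expected_terminal = 5.0 + (2.0 - 1.0 * 1.0) * 50.0   # u + (c - lam*EX)*horizon
ruin_fraction = sum(r is not None for _, r in paths) / len(paths)
```

With a positive safety loading (c > λEX_1) the capital drifts upward and ruin is a rare event, in line with the collective risk interpretation mentioned below.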
The process R is well-known in other branches of applied probability like
queueing or collective risk theory, where the time to ruin τ = inf{t ∈ R+ :
Rt < 0} is investigated (cf. Section 3.3.9). Here the focus is on determining
the optimal operating time with respect to the given reward structure. To
achieve this goal one has to estimate the unobservable state of the system
at time t, given the history of the process R up to time t. This can be done
using results in filtering theory as is shown below. Stopping at a fixed time
t results in the net gain
Z_t = R_t − Σ_{j=1}^m k_j U_t(j),
EZτ ∗ = sup{EZτ : τ ∈ C A }.
Ft = σ(Ys , Ns , Xi , 0 ≤ s ≤ t, i = 1, ..., Nt ).
Z_t = u − Σ_{j=1}^m k_j U_0(j) + ∫_0^t Σ_{j=1}^m U_s(j)r_j ds + M_t,   t ∈ R_+,    (5.25)
The integrand Σ_{j=1}^m Û_s(j)r_j with Û_s(j) = E[U_s(j)|A_s] = P(Y_s = j|A_s) is the conditional expectation of the net gain rate at time s given the observations
up to time s. If this integrand has nonincreasing paths, then we know that
we are in the “monotone case” (cf. p. 175) and the stopping problem could
be solved under some additional integrability conditions. To state mono-
tonicity conditions for the integrand in (5.26), an explicit representation
of Ût (j) is needed, which can be obtained by means of results in filtering
theory (see [52] p. 98, [104]) in the form of “differential equations”:
• at jumps

Û_{T_n}(j) = λ_j Û_{T_n−}(j) / Σ_{i=1}^m λ_i Û_{T_n−}(i),    (5.28)

where Û_{T_n−}(j) denotes the left limit.
The following conditions ensure that the system ages, i.e., it moves from
the “good” states with high net gains and low failure rates to the “bad”
states with low and possibly negative net gains and high failure rates, and
it is never possible to return to a “better” state:
the first time the conditional expectation of the net gain rate falls below 0.
Theorem 79 Let τ ∗ be the A-stopping time (5.30) and assume that con-
ditions (5.29) hold true. If, in addition, qim > λm − λi , i = 1, ..., m − 1,
then τ ∗ is optimal:
EZτ ∗ = sup{EZτ : τ ∈ C A }.
Proof. Because of EZτ = E Ẑτ for all τ ∈ C A we can apply Theorem 71,
p. 175, of Chapter 3 taking the A-SSM representation (5.26) of Ẑ. We will
proceed in two steps:
a) First, we prove that the monotone case holds true.
b) Second, we show that the martingale part M̄ in (5.26) is uniformly
integrable.
a) We start by showing that the integrand Σ_{j=1}^m Û_s(j)r_j has nonincreasing paths. A simple rearrangement gives

Σ_{j=1}^m Û_s(j)r_j = r_m + (r_{m−1} − r_m) Σ_{j=1}^{m−1} Û_s(j) + · · · + (r_1 − r_2)Û_s(1).
Between jumps we therefore get, for j = 1, . . . , m − 1,

(d/ds) Σ_{ν=1}^j Û_s(ν) = Σ_{ν=1}^j Σ_{i=1}^m Û_s(i)q_{iν} + Σ_{ν=1}^j Û_s(ν)(λ̄(s) − λ_ν)

= Σ_{i=1}^j Û_s(i)(−Σ_{k=j+1}^m q_{ik} + λ̄(s) − λ_i),

using q_{ij} = 0 for i > j and q_{ii} = −Σ_{k=i+1}^m q_{ik}, i = 1, . . . , m − 1. From q_{im} > λ_m − λ_i ≥ λ̄(s) − λ_i it follows that

(d/ds) Σ_{ν=1}^j Û_s(ν) ≤ 0,   j = 1, . . . , m − 1.
At the jump times T_n we have

Σ_{ν=1}^j (Û_{T_n}(ν) − Û_{T_n−}(ν)) = Σ_{ν=1}^j Û_{T_n−}(ν)(λ_ν − λ̄(T_n−))/λ̄(T_n−),

and, since the λ_ν are nondecreasing,

0 = Σ_{ν=1}^m Û_{T_n−}(ν)(λ_ν − λ̄(T_n−))/λ̄(T_n−) ≥ Σ_{ν=1}^j Û_{T_n−}(ν)(λ_ν − λ̄(T_n−))/λ̄(T_n−).
with a nonnegative integrand. This shows that Û_t(m) is a bounded submartingale. Thus the limit lim_{t→∞} Û_t(m) exists, and for ε > 0 small enough,

τ∗ = inf{t ∈ R_+ : Σ_{j=1}^m Û_t(j)r_j ≤ 0} ≤ inf{t ∈ R_+ : Û_t(m) ≥ 1 − ε} < ∞.
Û_t(2) = P(σ ≤ t|A_t) = 1 − e^{−g_n(t)}/(d_n + (λ_2 − λ_1) ∫_{T_n}^t e^{−g_n(s)} ds),   T_n ≤ t < T_{n+1},

Û_{T_n}(2) = λ_2Û_{T_n−}(2)/(λ_1 + (λ_2 − λ_1)Û_{T_n−}(2)),

where d_n = (1 − Û_{T_n}(2))^{−1}, g_n(t) = (q − (λ_2 − λ_1))(t − T_n). The stopping time τ∗ in (5.30) can now be written as
τ∗ = inf{t ∈ R_+ : Û_t(2) > z∗},   z∗ = r_1/(r_1 − r_2).
For 0 < q < λ2 − λ1 , Ût (2) increases as long as Ût (2) < q/(λ2 − λ1 ) = r.
When Ût (2) jumps above this level, then between jumps Ût (2) decreases
but not below the level r. So even in this case under conditions (5.31) the
monotone case holds true if z ∗ ≤ q/(λ2 −λ1 ). As a consequence of Theorem
79 we have the following corollary.
At the complete information level F the change time point σ can be ob-
served, and it is obvious that under conditions (5.31) the F-stopping time
σ is optimal in C F . Thus, we have the following upper and lower bounds
bu and bl :
bl ≤ sup{EZτ : τ ∈ C A } ≤ bu
with
bl = sup{EZt : t ∈ R+ },
bu = sup{EZτ : τ ∈ C F } = EZσ .
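The two-state filter is easy to put into code. The sketch below simulates the hidden change point and the observed point process, runs the filter for Û_t(2) with Euler steps between observation points and the normalization (5.28) at each observed point, and checks the calibration property E[Û_t(2)] = P(Y_t = 2); all numerical parameters are illustrative choices of ours.

```python
import math
import random

def filtered_prob(rng, lam1, lam2, q, T, dt=0.001):
    """Simulate the hidden change (rate q from state 1 to state 2, no
    return) and the observed point process (intensity lam1 before the
    change, lam2 after), and run the filter for pi_t = U_t(2):
    an Euler step of the filter ODE between observed points and the
    Bayes normalization (5.28) at each observed point."""
    switch = rng.expovariate(q)             # hidden 1 -> 2 transition time
    t, pi = 0.0, 0.0                        # the system starts in state 1
    while t < T:
        rate = lam2 if t >= switch else lam1
        lbar = lam1 * (1.0 - pi) + lam2 * pi
        pi += (q * (1.0 - pi) + pi * (lbar - lam2)) * dt
        if rng.random() < rate * dt:        # an observed point in (t, t+dt]
            pi = lam2 * pi / (lam1 * (1.0 - pi) + lam2 * pi)
        pi = min(max(pi, 0.0), 1.0)
        t += dt
    return pi

rng = random.Random(11)
lam1, lam2, q, T = 1.0, 4.0, 0.5, 2.0
finals = [filtered_prob(rng, lam1, lam2, q, T) for _ in range(400)]
avg = sum(finals) / len(finals)
target = 1.0 - math.exp(-q * T)             # P(Y_T = 2), unconditionally
```

Averaging the filtered probabilities over many paths recovers the unconditional probability of the change having occurred, as the tower property of conditional expectations demands.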
theory. Shaked and Szekli [153] and Last and Szekli [127] compare replace-
ment policies via point process methods. A good source for overviews of the rapidly growing literature on replacement and maintenance optimization
models is the book Reliability and Maintenance of Complex Systems edited
by Özekici in the NATO ASI Series.
The presentation in Section 5.2 follows the lines of [107]. A general set-up for cost minimization problems, similar to Bergman [40] and Aven and Bergman [22], is introduced in Jensen [107]. It allows for specialization in different directions. As an example, the model presented by Aven [21], covering the total expected discounted cost criterion, is included. What goes beyond the results in [22] is the possibility of taking different information levels into account.
There are many multivariate extensions of the univariate exponential distribution; for an overview see Hutchinson and Lai [101] or Basu [35], which also cover the models of Freund [77] and Marshall and Olkin [132].
A detailed derivation, statistical properties, and methods of parameter es-
timation of the combined exponential distribution can be found in [92].
The optimization problem also for more general cost structures is treated
in Heinrich and Jensen [94]. An alternative approach to solve optimization
problems of the kind treated in this chapter is to use Markov decision pro-
cesses. It has not been within the scope of this book to develop this theory
here. An introduction to this theory can be found in the books of Puterman [141], Bertsekas [43], Davis [65], and Van der Duyn Schouten [171], the last of which also contains applications in reliability.
An overview of several problems related to burn-in and the corresponding
literature is given in the review articles by Block and Savits [48], Kuo and
Kuo [124], and Leemis and Beneke [129]. The problem of sequential burn-in,
where the failures of the items are observed and the burn-in time depends
on these failures, is treated in the article of Marcus and Blumenthal [131].
In the papers of Costantini and Spizzichino [61] and Spizzichino [161] the
assumption that the component lifelengths are independent is dropped and
replaced by certain dependence models.
The problem of finding optimal replacement times for general repair
processes has been treated by Aven in [15] and [18]. The presentation
of Markov-modulated minimal repair processes follows the lines of [104]
and [106] which include the technical details. A similar model considering
interest rates has been investigated by Schöttl [147].
Appendix A
Background in Probability and
Stochastic Processes
$$\limsup_{h \to h_0} \frac{|f(h)|}{|g(h)|} < \infty.$$
An integral $\int f(s)\,ds$ of a real-valued measurable function is always an integral with respect to Lebesgue measure. Integrals $\int_a^b$ over finite intervals, $a \le b$, are always integrals $\int_{[a,b]}$ over the closed interval $[a, b]$.
The indicator function of a set A taking only the values 1 and 0 is
denoted I(A). This notation is preferred rather than IA or IA (a) in the
case of descriptions of sets A by means of random variables.
In the following we always refer to a basic probability space (Ω, F, P ). Let S be a metric space and O the collection of its open sets. Then the σ-algebra generated by O is called the Borel σ-algebra and is denoted B(S); in particular we write B = B(R).
If A and C are two sub-σ-algebras of F, then A ∨ C denotes the σ-algebra
generated by the union of A and C. The product σ-algebra of A and C,
generated by the sets A × C, where A ∈ A and C ∈ C, is denoted A ⊗ C.
• Convergence almost surely:

$$P\big(\lim_{n\to\infty} X_n = X\big) = 1.$$

• Convergence in probability: We say $X_n \xrightarrow{P} X$ in probability if, for every ε > 0,

$$\lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0.$$

• Convergence in distribution: We say $X_n \xrightarrow{D} X$ in distribution if, for every x in the set of continuity points of F,

$$\lim_{n\to\infty} F_n(x) = F(x).$$
$$\sup_{r,s \ge k} \|Y_r - Y_s\|_p \to 0 \quad \text{for } k \to \infty.$$

$$\|X\|_p \le \|X\|_q.$$
The short proof of this result can be found in Williams [176]. The theo-
rem states that there is one unique element in the subspace K that has the
P (Y = Y ∗ ) = 1.
A random variable Y ∈ L1 (Ω, A, P ) with property (A.1) is called (a
version of) the conditional expectation E[X|A] of X given A. We write
Y = E[X|A] noting that equality holds P -a.s.
The standard proof of this theorem uses the Radon–Nikodym theorem (cf. for example Billingsley [44]). A more constructive proof is via
the Orthogonal Projection Theorem 81. In the case that EX 2 < ∞, i.e.,
X ∈ L2 (Ω, F, P ), we can use Theorem 81 directly with K = L2 (Ω, A, P ).
Let Y be the projection of X in K. Then property (ii) of Theorem 81
yields E[(X − Y )Z] = 0 for all Z ∈ K. Take Z = IA , A ∈ A. Then
E[(X − Y )IA ] = 0 is just condition (A.1), which shows that Y is a version
of the conditional expectation E[X|A]. If X is not in L2 , we split X as
X + − X − and approximate both parts by sequences Xn+ = X + ∧ n and
Xn− = X − ∧ n, n ∈ N, of L2 -random variables. A limiting argument for
n → ∞ yields the desired result (see [176] for a complete proof).
Conditioning with respect to a σ-algebra is in general not very concrete,
so the idea of projecting onto a subspace may give some additional insight.
Another point of view is to look at conditioning as an averaging operator.
The sub-σ-algebra A lies between the extremes F and G = {∅, Ω}, the triv-
ial σ-field. As can be easily verified from the definition, the corresponding
conditional expectations of X are X = E[X|F] and EX = E[X|G]. So for
A with G ⊂ A ⊂ F the conditional expectation E[X|A] lies “between” X
(no averaging, complete information about the value of X) and EX (overall
average, no information about the value of X). The more events of F are included in A, the more E[X|A] varies and the closer this conditional expectation is to X, in a sense made precise in the following proposition.
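This averaging picture is easy to make concrete on a finite sample space. The sketch below is our illustration (not from the book): E[X|A] for a σ-algebra generated by a finite partition is the cell-wise average, and a finer partition yields a conditional expectation closer to X in L².

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=12)        # a random variable on Omega = {0,...,11}, uniform P

def cond_exp(X, partition):
    """E[X|A] when A is generated by a finite partition of Omega:
    on each cell the conditional expectation equals the cell average."""
    Y = np.empty_like(X)
    for cell in partition:
        idx = list(cell)
        Y[idx] = X[idx].mean()
    return Y

coarse = [range(0, 6), range(6, 12)]                           # generates A1
fine = [range(0, 3), range(3, 6), range(6, 9), range(9, 12)]   # generates A2, finer than A1

Y1 = cond_exp(X, coarse)       # E[X|A1]
Y2 = cond_exp(X, fine)         # E[X|A2]
EX = cond_exp(X, [range(12)])  # trivial sigma-algebra: the overall mean, constant

def l2(Z):
    """L2 norm with respect to the uniform probability on Omega."""
    return float(np.sqrt(np.mean(Z ** 2)))
```

Since A1 ⊂ A2, the projection property gives l2(X − Y2) ≤ l2(X − Y1), and the tower property makes the average of E[X|A] equal to EX.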
(i) $\|X - Y_2\|_2 \le \|X - Y_1\|_2 \le \|X - Y_2\|_2 + \|Y_2 - Y_1\|_2$.

(ii) $\|Y_1 - EX\|_2 \le \|Y_2 - EX\|_2 \le \|Y_1 - EX\|_2 + \|Y_2 - Y_1\|_2$.
Proof. The right-hand side inequalities are just special cases of the triangle
law for the L2 -norm or Minkowski’s inequality. So we need to prove the left-
hand inequalities.
(i) Since Y2 is the projection of X on L2 (Ω, A2 , P ) and
Y1 ∈ L2 (Ω, A1 , P ) ⊂ L2 (Ω, A2 , P ),
$$\|Y_2\|_2^2 = \|Y_2 - Y_1 + Y_1\|_2^2 = \|Y_2 - Y_1\|_2^2 + \|Y_1\|_2^2,$$
is a version of E[g(X)|A].
E[g(X)|Y ] = E[g(X)|σ(Y )]
E[g(X)|Y ] = h(Y ).
But even if the set {Y = y} has probability 0, we are now able to deter-
mine the conditional expectation of g(X) given that Y takes the value y
(provided we know h). Consider the case that a joint density fXY (x, y)
of X and Y is known. Let $f_Y(y) = \int_{\mathbb{R}} f_{XY}(x, y)\,dx$ be the density of the (marginal) distribution of Y and

$$f_{X|Y}(x|y) = \begin{cases} f_{XY}(x,y)/f_Y(y) & \text{if } f_Y(y) \neq 0, \\ 0 & \text{otherwise.} \end{cases}$$
equals

$$E[h(Y)I_B(Y)] = \int h(y) I_B(y) f_Y(y)\,dy$$
for all B ∈ B. But this follows directly from Fubini’s Theorem, which proves
the assertion.
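Numerically, h(y) can be computed from the conditional density exactly as above. A small sketch with an illustrative joint density of our own choosing, f_XY(x, y) = x + y on the unit square, for which E[X|Y = y] = (1/3 + y/2)/(1/2 + y) in closed form:

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 2001)   # grid for x on [0, 1]

def trap(vals, grid):
    """Trapezoidal rule for an integral over the given grid."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def f_xy(x, y):
    # illustrative joint density on [0,1]^2 (not from the book)
    return x + y

def h(y):
    """h(y) = E[g(X)|Y = y] with g(x) = x, via the conditional density."""
    f_y = trap(f_xy(xs, y), xs)          # marginal density f_Y(y)
    f_cond = f_xy(xs, y) / f_y           # conditional density f_{X|Y}(x|y)
    return trap(xs * f_cond, xs)

y0 = 0.25
exact = (1.0 / 3.0 + y0 / 2.0) / (0.5 + y0)   # closed form for this density
approx = h(y0)
```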
A.3 Stochastic Processes on a Filtered Probability Space
{τ ≤ t} = {ω : τ (ω) ≤ t} ∈ Ft .
The first three events of this union are clearly in Ft. The fourth event satisfies

$$\{0 < \sigma < t,\ \sigma + \tau > t\} = \bigcup_{r \in \mathbb{Q} \cap [0,t)} \{r < \sigma < t,\ \tau > t - r\}.$$
For a sequence of stopping times (τn ) the random variables sup τn , inf τn
are stopping times, so that lim sup τn , lim inf τn and lim τn (if it exists) are
also stopping times.
We now define the σ-algebra of the past of a stopping time τ.
Fτ = {A ∈ F∞ : A ∩ {τ ≤ t} ∈ Ft for all t ∈ R+ }.
B ∩ {τ ≤ t} = B ∩ {σ ≤ t} ∩ {τ ≤ t} ∈ Ft ,
A ∩ {σ ≤ τ } ∩ {τ ≤ t} = A ∩ {σ ≤ t} ∩ {τ ≤ t} ∩ {σ ∧ t ≤ τ ∧ t}.
Fσ∧τ ⊂ Fσ ∩ Fτ .
A ∩ {σ ∧ τ ≤ t} = (A ∩ {σ ≤ t}) ∪ (A ∩ {τ ≤ t}) ∈ Ft ,
This theorem shows that some of the properties known for fixed time
points s, t also hold true for stopping times σ, τ. Next we consider the link
between a stochastic process X = (Xt ), t ∈ R+ , and a stopping time σ.
It is natural to investigate variables Xσ(ω) (ω) with random index and the
stopped process Xtσ (ω) = Xσ∧t (ω) on {σ < ∞}. To ensure that Xσ is a
random variable, we need that Xt fulfills a measurability requirement in t.
Most important for applications are those random times σ that are de-
fined as first entrance times of a stochastic process X into a Borel set B:
σ = inf{t ∈ R+ : Xt ∈ B}. In general, it is very difficult to show that σ is a
stopping time. For a discussion of the usual conditions in this connection,
see Rogers and Williams [143], pp. 183–191. For a complete proof of the
following theorem we refer to Dellacherie and Meyer [67], p. 116.
is an F-stopping time.
Note that the right-continuity of the paths was used to express {σ < t}
as the union of events {Xr ∈ B} and that we could restrict ourselves to a
countable union because B is an open set.
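For a piecewise-constant cadlag path, the first entrance time into an open set B = (b, ∞) can be computed exactly as the first jump time at which the level lands in B. A small simulation sketch (our construction, an increasing pure-jump path; parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# An increasing, right-continuous pure-jump path with X_0 = 0:
# jumps at times T_1 < T_2 < ..., positive jump sizes.
n = 200
jump_times = np.cumsum(rng.exponential(1.0, n))
levels = np.cumsum(rng.exponential(0.5, n))   # X_t = levels[k-1] on [T_k, T_{k+1})

def first_entrance(times, values, b):
    """sigma = inf{t in R+ : X_t in B} for B = (b, infinity).
    Since the path is piecewise constant and increasing from X_0 = 0,
    sigma is the first jump time whose level exceeds b (inf if never)."""
    hit = np.argmax(values > b)
    if not values[hit] > b:
        return np.inf
    return float(times[hit])

sigma = first_entrance(jump_times, levels, 5.0)
k = int(np.argmax(levels > 5.0))   # index of the entering jump
```

Right-continuity is what makes this exact: at the jump time itself the path is already at the new level, so the infimum is attained at a jump epoch.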
A.5 Martingale Theory
Xt ≥ E[Xs |Ft ],
Xt ≤ E[Xs |Ft ].
and Xt − t is a martingale.
The last lemma is often applied with functions g(x) = |x|^p, p ≥ 1. So, if M is a square integrable martingale, then X = M² defines a submartingale.
One key result in martingale theory is the following convergence theorem
(cf. [68], p. 72).
X = A + M,
Remark 25 1. Several proofs of this and more general results, not re-
stricted to uniformly integrable processes, are known (cf. [68], p. 198 and
[112], p. 412). Some of these also refer to local martingales, which are not
needed for the applications we have presented and which are therefore not
introduced here.
2. The process A in the theorem above is often called compensator.
3. In the case of discrete time such a decomposition is easily constructed
in the following way. Let (Xn ), n ∈ N0 , be a submartingale with respect to
a filtration (Fn ), n ∈ N0 . Then we define
$$X_n = A_n + M_n,$$

where $A_0 = 0$, $A_n = \sum_{i=1}^{n} E[X_i - X_{i-1} \mid \mathcal{F}_{i-1}]$, and $M_n = X_n - A_n$; the process $(A_n)$ is predictable and increasing, and $(M_n)$ is a martingale.
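The discrete construction is easy to verify on a standard example (ours, not the book's): for a symmetric ±1 random walk S, X_n = S_n² is a submartingale with compensator A_n = n, since E[X_i − X_{i−1} | F_{i−1}] = 1.

```python
import numpy as np

rng = np.random.default_rng(2)

steps = rng.choice([-1.0, 1.0], size=2000)      # i.i.d. +-1 increments
S = np.concatenate([[0.0], np.cumsum(steps)])   # random walk, S_0 = 0
X = S ** 2                                      # submartingale

# Doob decomposition X_n = A_n + M_n with
# A_n = sum_{i<=n} E[X_i - X_{i-1} | F_{i-1}] = n (predictable, increasing)
A = np.arange(len(X), dtype=float)
M = X - A                                       # martingale part: S_n^2 - n

# Martingale increments: M_i - M_{i-1} = 2 S_{i-1} * step_i
# (since step_i^2 = 1), which has conditional mean 0 given F_{i-1}.
incr = np.diff(M)
```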
The continuous time result needs much more care and uses several lem-
mas, one of which is interesting in its own right and will be presented here.
or in heuristic form:
E[dMt |Ft− ] = 0.
Furthermore,
yielding
as n → ∞ and the span of the partition 0 = t0 < t1 < ... < tn = t tends to
0.
[M] with

$$[M]_t = \langle M^c \rangle_t + \sum_{s \le t} (\Delta M_s)^2$$
$$[M, L] = \frac{1}{4}\big([M + L] - [M - L]\big).$$
The following proposition helps to understand the name quadratic co-
variation.
A.6 Semimartingales
A decomposition of a stochastic process into a (predictable) drift part and
a martingale, as presented for submartingales in the Doob–Meyer decom-
position, also holds true for more general processes. We start with the moti-
vating example of a sequence (Xn ), n ∈ N0 , of integrable random variables
adapted to the filtration (Fn ). This sequence admits a decomposition
$$X_n = X_0 + \sum_{i=1}^{n} f_i + M_n$$

with adapted increments $f_i$, and

$$\hat M_n = M_n - \sum_{i=1}^{n} (\hat f_i - f_i),$$

where $\hat f_i = E[f_i \mid \mathcal{F}_{i-1}]$, so that $X_n = X_0 + \sum_{i=1}^{n} \hat f_i + \hat M_n$ separates a predictable drift part from the martingale $(\hat M_n)$.
The function g is said to have finite variation if Vg (t) < ∞ for all t ∈ R+ .
The class of cadlag processes A with finite variation starting in A0 = 0 is
denoted V.
For any A ∈ V there is a decomposition $A_t = B_t - C_t$ with increasing processes B, C ∈ V and

$$B_t + C_t = V_A(t) = \int_0^t |dA_s|.$$
Zt = Z0 + At + Mt ,
where A ∈ V and M ∈ M0 .
There is a rich theory based on semimartingales that relies on the remarkable property that semimartingales are stable under many sorts of operations: changes of time, of probability measures, and of filtrations preserve the semimartingale property, and products and convex functions of semimartingales are again semimartingales (cf. Dellacherie and Meyer [68], pp. 212–252). The importance of semimartingales lies also in the fact that stochastic integrals

$$\int_0^t H_s \, dZ_s$$

of predictable processes H with respect to a semimartingale Z can be introduced, replacing Stieltjes integrals. It is beyond the scope of this book to present the whole theory of semimartingales; we confine ourselves to the
is an F-semimartingale.
Theorem 105 If Z is an F-semimartingale, then Z² is an F-semimartingale:

$$Z_t^2 = M_t + R_t$$
The first term of the sum is an ordinary Stieltjes integral. Since the paths of Z have at most countably many jumps, it follows that

$$\int_{(0,t]} Z_{s-}\, d\Big(\int_0^s g_u\,du\Big) = \int_0^t Z_s\, g_s\,ds.$$
where

$$R_t = \int_{(0,t]} Z_{s-}\,dL_s + \int_{(0,t]} Y_{s-}\,dM_s + [M, L]_t$$
Sometimes the product rule is used for a product one factor of which
is the one point process I(ζ ≤ t) with a stopping time ζ. Because of the
special structure of this factor less restrictive conditions are necessary to
establish a product rule.
where R ∈ M0 .
Proof. The product ZY can be represented in the form

$$Z_t Y_t = Z_t - Z_{t \wedge \zeta} + \int_0^t Z_s\,dY_s$$
Appendix B
Renewal Processes

In this appendix we present some definitions and results from the theory
of renewal processes, including renewal reward processes and regenerative
processes. Key references are [1, 9, 46, 64, 145, 168].
The purpose of this appendix is not to give an all-inclusive presentation of the theory. Only definitions and results needed to establish the results of Chapters 1–5 (in particular Chapter 4) are covered.
$$S_0 = 0, \qquad S_j = \sum_{i=1}^{j} T_i, \quad j \in \mathbb{N},$$

and define

$$N_t = \sup\{j : S_j \le t\},$$
or equivalently,

$$N_t = \sum_{j=1}^{\infty} I(S_j \le t). \tag{B.1}$$
$$M(t) = \frac{\lambda t}{2} - \frac{1 - e^{-2\lambda t}}{4}.$$
Refer to [1, 33, 34] for more general formulas for the renewal function of the
Gamma distribution and expressions and bounds for other distributions.
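The Gamma formula above can be checked against a direct numerical solution of the renewal equation M = F + F ∗ M. The sketch below is our discretization (λ = 1, a simple right-endpoint rule on a fine grid):

```python
import numpy as np

lam, h, t_max = 1.0, 0.001, 5.0
t = np.arange(0.0, t_max + h, h)

# Erlang(2, lam) = Gamma(2, lam) interarrival cdf
F = 1.0 - np.exp(-lam * t) * (1.0 + lam * t)
dF = np.diff(F)

# Solve M(t) = F(t) + int_0^t M(t - s) dF(s) on the grid,
# approximating the convolution with a right-endpoint rule in s
# (explicit recursion, since M(0) = 0 and dF starts at 0).
M = np.zeros_like(t)
for i in range(1, len(t)):
    M[i] = F[i] + float(np.dot(M[i - 1 :: -1][:i], dF[:i]))

# closed form: M(t) = lam t / 2 - (1 - e^{-2 lam t}) / 4
M_exact = lam * t / 2.0 - (1.0 - np.exp(-2.0 * lam * t)) / 4.0
err = float(np.max(np.abs(M - M_exact)))
```

With step h = 0.001 the maximal deviation from the closed form stays well below 10⁻².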
In Proposition 110 we show how M can be determined (at least in theory)
from F . It turns out that M uniquely determines F .
or equivalently

$$L_F(s) = \frac{L_M(s)}{1 + L_M(s)}.$$
B.1 Basic Theory of Renewal Processes
Theorem 111 If the function g satisfies equation (B.4) and h is bounded on finite intervals, then

$$g(t) = h(t) + (h * M)(t)$$

is a solution to (B.4), and it is the unique solution that is bounded on finite intervals.
where the second last equality follows from (B.3). Since the Laplace trans-
form uniquely determines the function, this gives the desired result.
Using the (strong) law of large numbers, many results related to renewal
processes can be established, including the following.
$$\frac{N_t}{t} \to \frac{1}{\mu} \quad \text{as } t \to \infty.$$
$$S_{N_t} \le t \le S_{N_t+1}.$$

Hence,

$$\frac{S_{N_t}}{N_t} \le \frac{t}{N_t} \le \frac{S_{N_t+1}}{N_t}.$$
Now the strong law of large numbers states that with probability one,
Sj /j → µ as j → ∞. As can be easily shown, Nt → ∞ as t → ∞, and thus
$$\frac{S_{N_t}}{N_t} \to \mu \quad \text{as } t \to \infty \ (P\text{-a.s.}).$$
By the same argument, we also see that with probability one,
$$\frac{S_{N_t+1}}{N_t} = \frac{S_{N_t+1}}{N_t+1} \cdot \frac{N_t+1}{N_t} \to \mu \cdot 1 = \mu \quad \text{as } t \to \infty.$$
where $f^{*1} = f$ and

$$f^{*j}(t) = \int_0^t f^{*(j-1)}(t - s) f(s)\,ds, \quad j = 2, 3, \ldots.$$
$$\alpha_t = S_{N_t+1} - t, \qquad \beta_t = t - S_{N_t}.$$
The recurrence times αt and βt are the time intervals from t forward to the
next renewal point and backward to the last renewal point (or to the time
origin), respectively. Let Fαt and Fβt denote the distribution functions of
αt and βt , respectively. The following result is a consequence of the Key
Renewal Theorem.
Theorem 120 Assume that the distribution F is not periodic. Then the asymptotic distributions of the forward and backward recurrence times are given by

$$\lim_{t\to\infty} F_{\alpha_t}(x) = \lim_{t\to\infty} F_{\beta_t}(x) = \frac{\int_0^x \bar F(s)\,ds}{\mu}.$$
This asymptotic distribution of αt and βt is called the equilibrium dis-
tribution.
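The equilibrium distribution can be observed by sampling α_t at large t. A sketch with uniform(0, 2) interarrivals, our choice: here μ = 1, the limiting density is F̄(s)/μ = 1 − s/2 on [0, 2], and the limiting mean is E[T²]/(2μ) = 2/3.

```python
import numpy as np

rng = np.random.default_rng(4)

T = rng.uniform(0.0, 2.0, size=400_000)   # interarrivals, mu = ET = 1
S = np.cumsum(T)

# forward recurrence times alpha_t = S_{N_t + 1} - t at well-separated times
ts = np.linspace(10_000.0, 350_000.0, 5_000)
idx = np.searchsorted(S, ts, side="right")   # N_t; 0-indexed, S[idx] = S_{N_t+1}
alpha = S[idx] - ts

mean_alpha = float(alpha.mean())   # limit: E[T^2] / (2 mu) = (4/3)/2 = 2/3
```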
A simple formula exists for the mean forward recurrence time: since $\alpha_t = S_{N_t+1} - t$,

$$E\alpha_t = ES_{N_t+1} - t = \mu(1 + M(t)) - t. \tag{B.5}$$

Formula (B.5) is a special case of Wald's equation (see, e.g., Ross [145]) and follows by writing
$$ES_{N_t+1} = E\Big[\sum_{k \ge 1} S_k I(N_t + 1 = k)\Big] = E\Big[\sum_{k \ge 1} \sum_{j=1}^{k} T_j I(N_t + 1 = k)\Big]$$
$$= E\Big[\sum_{j \ge 1} T_j I(N_t + 1 \ge j)\Big] = E\Big[\sum_{j \ge 1} T_j I(S_{j-1} \le t)\Big]$$
$$= \sum_{j \ge 1} ET_j\, EI(S_{j-1} \le t) = \mu \sum_{j \ge 0} F^{*j}(t) = \mu(1 + M(t)).$$
Finally in this section we prove a result used in the proof of Theorem 42,
p. 114.
Then

$$\lim_{t\to\infty} \frac{1}{t} \int_0^t g(s)\,dM(s) = \frac{g}{\mu}.$$
The limiting value of the average return is established using the law of large
numbers and is given by the following result (cf. [145]).
(i)

$$\frac{Z_t}{t} \to \frac{EY}{ET} \quad \text{as } t \to \infty,$$

(ii)

$$\frac{EZ_t}{t} \to \frac{EY}{ET} \quad \text{as } t \to \infty.$$
where

$$\tau^2 = \mathrm{Var}\Big[Y - \frac{EY}{ET}\,T\Big] = \mathrm{Var}[Y] + \Big(\frac{EY}{ET}\Big)^2 \mathrm{Var}[T] - 2\,\frac{EY}{ET}\,\mathrm{Cov}[Y, T].$$
this quantity is also equal to the average probability that the process is in
state k.
M(t + h) − M(t) = h/ET.
References
[2] Andersen, P. K., Borgan, Ø., Gill, R. and Keiding, N. (1992) Statis-
tical Models Based on Counting Processes. Springer, New York.
[7] Arjas, E. and Norros, I. (1989) Change of life distribution via hazard
transformation: An inequality with application to minimal repair.
Mathematics of Operations Research 14, 355–361.
[11] Asmussen, S., Frey, A., Rolski, T. and Schmidt, V. (1995) Does
Markov-modulation increase risk? Astin Bulletin 25, 49–66.
[16] Aven, T. (1992) Reliability and Risk Analysis. Elsevier Applied Sci-
ence, London.
[23] Aven, T. and Dekker, R. (1997) A useful framework for optimal re-
placement models. Reliability Engineering and System Safety 58, 61–
67.
[55] Chow, Y., Robbins, H. and Siegmund, D. (1971) Great Expectations: The Theory of Optimal Stopping. Houghton Mifflin, Boston.
[71] Doksum, K. (1991) Degradation rate models for failure time and sur-
vival data. CWI Quarterly 4, 195–203.
[91] Haukås, H. and Aven, T. (1996) Formulae for the downtime distribu-
tion of a system observed in a time interval. Reliability Engineering
and System Safety 52, 19–26.
[98] Hokstad, P. (1997) The failure intensity process and the formulation
of reliability and maintenance models. Reliability Engineering and
System Safety 58, 69–82.
[115] Keilson, J. (1966) A limit theorem for passage times in ergodic re-
generative processes. Ann. Math. Stat. 37, 866–870.
[118] Kijima, M. (1989) Some results for repairable systems. J. Appl. Prob.
26, 89–102.
[119] Koch, G. (1986) A dynamical approach to reliability theory. Proc. Int.
School of Phys. “Enrico Fermi,” XCIV. North-Holland, Amsterdam,
pp. 215–240.
[124] Kuo, W. and Kuo, Y. (1983) Facing the headaches of early failures: a state-of-the-art review of burn-in decisions. Proceedings of the IEEE 71, 1257–1266.
[126] Last, G. and Brandt, A. (1995) Marked Point Processes on the Real Line: The Dynamic Approach. Springer, New York.
[128] Last, G. and Szekli, R. (1998) Time and Palm stationarity of re-
pairable systems. Stoch. Proc. Appl., to appear.
[139] Phelps, R. (1983) Optimal policy for minimal repair. J. Opl. Res. 34,
425–427.
[144] Rolski, T., Schmidli, H., Schmidt, V. and Teugels, J. (1999) Stochastic
Processes for Insurance and Finance. Wiley, Chichester.
[156] Smeitink, E., Dijk, N., and Haverkort, B. R. (1992) Product forms for availability models. Applied Stochastic Models and Data Analysis 8, 283–302.
[159] Smith, M., Aven, T., Dekker, R. and van der Duyn Schouten, F.A.
(1997) A survey on the interval availability of failure prone systems.
In: Proceedings ESREL’97 conference, Lisbon, 17–20 June, 1997, pp.
1727–1737.
[160] Solovyev, A.D. (1971) Asymptotic behavior of the time to the first
occurrence of a rare event. Engineering Cybernetics 9 (6), 1038–1048.
[171] Van der Duyn Schouten, F. A. (1983) Markov Decision Processes with
Continuous Time Parameter. Math. Centre Tracts 164, Amsterdam.