Monte Carlo Sampling of Dynamic Fault Trees For Reliability Prediction

G.V. Berg
g.v.berg@student.utwente.nl
ABSTRACT
In the past few decades the field of reliability engineering has
developed several useful techniques to achieve its main
objective: systematically analyzing systems for
reliability and robustness. One important technique is Fault
Tree Analysis, which over time has been extended into the more
versatile method of Dynamic Fault Tree (DFT) analysis. With
a DFT one can compute the probability that a system fails within a
certain mission time. Calculating this probability exactly can
be computationally expensive. In this paper we describe a tool
which reduces the computational cost of calculating the
failure probability for a DFT. The tool estimates DFT
failure probabilities using Monte Carlo simulation techniques.
Keywords
Monte Carlo simulation, reliability prediction, dynamic fault
tree
1. INTRODUCTION
The IEEE Reliability Society defines reliability as follows:
"Reliability is a design engineering discipline which applies
scientific knowledge to assure a product will perform its intended
function for the required duration within a given environment"
[IRS07]. One aspect of reliability engineering is reliability
prediction. NASA, for example, is eager to know the chance of
a component in one of its satellites failing and how such a
failure affects other components (and thus the functioning of
the satellite as a whole).
One established theory for calculating such probabilities
(based on known failure rates of individual components) is a
technique called Fault Tree Analysis (FTA). The Fault Tree
Handbook published by the United States Nuclear Regulatory
Commission [VGR+81] has set the basic standard for
analyzing the safety of mission critical systems such as nuclear
reactors. Since then much progress has been made in making
FTs more expressive. In particular, Dugan et al. [DBB92] have
extended FTs with what are called dynamic gates, which gave
rise to the dynamic fault tree formalism. The DFT formalism
makes it possible to express the order in which events occur.
The most notable implementation for analysing (Dynamic) FTs
is Galileo [BSC99]. Galileo attempts to analytically solve
DFTs. It does so by trying to calculate the exact reliability
value of a DFT using a combination of Markov Chains and
Binary Decision Diagrams (BDDs) [And97]. The BDDs are
used when the FT contains no dynamic gates.
For DFTs Galileo computes the system reliability by solving
the underlying Markov Chains. These Markov Chains suffer
from state space explosions. A linear increase in DFT size will
make the state space of the Markov Chain exponentially
greater. For complex DFTs Galileo needs lots of memory and
time before it can compute the answer.
To counter these state space explosions Monte Carlo sampling
techniques are useful. Instead of computing the exact answer,
sampling techniques approximate it in a computationally
less expensive way. The main idea is that, when enough
samples are taken, the answer will be accurate enough (i.e. close
to the analytically computed value).
Several approaches have been tried to sample system
reliabilities. Boyd and Bavuso [BB93] used a variance reduction
technique called importance sampling. They write: "[..]
analytical solution techniques are preferable whenever the
model is small enough [..] simulation is preferred [..] when
the model is too large or exhibits system behaviour too
complex to be accommodated by analytical solution
techniques."
Gedam and Beaudet [GB00] also did work on using sampling
techniques in the field of reliability engineering. They used
Monte Carlo sampling to solve Reliability Block Diagrams
(RBDs). This diagram technique is a combinatorial one. A
static FT can be translated into an RBD and vice versa. For DFTs
this is not possible.
In this paper we describe a tool we have implemented. The tool
uses Monte Carlo sampling to compute the reliability of DFTs.
It does not use Markov Chains or BDDs; it works directly on
the FT. We will show that the approach of Gedam and Beaudet
is also effective for DFTs. Our tool avoids the computational
complexity and state space explosion of the traditional Galileo
methods. We will demonstrate this using the results of
our case studies. Our cases are based upon the ones Boyd and
Bavuso [BB93] used to test their Galileo implementation.
2. BACKGROUND
2.1 Fault trees and dynamic fault trees
A Fault Tree (FT) is a Directed Acyclic Graph (DAG) in which
the leaves are basic events (BEs) and the other elements are
gates. This definition is based on [BCS07]. BEs model
component failures whereas the gates model how component
failures induce a system failure. Fault Trees have three types of
gates:
1. AND gate (Figure 1.a), which fails if all of its inputs fail.
2. OR gate (Figure 1.b), which fails if at least one of its
   inputs fails.
3. K/M gate, also known as VOTING gate (Figure 1.c), which
   fails if at least K (the threshold) out of its M inputs fail.
Dynamic Fault Trees (DFTs) are a superset of (Static) Fault
Trees (1). They can additionally specify the order in which
the BEs occur. DFTs extend SFTs with the following gates (2):
1. PAND gate (Figure 1.d), a Priority AND gate which fails
   when all its inputs fail in order from left to right.
2. SPARE gate (3) (Figure 1.e), which has one primary input and
   zero or more spare inputs. All inputs are BEs. When the
   primary input fails it is replaced by the first available
   spare input. When that one fails it is replaced by the next
   available spare input, etc. If the primary and all the spares
   have failed the SPARE gate fails.
3. FDEP gate (Figure 1.f), a functional dependency gate
   which has a trigger event (i.e. a failure) and a set of
   dependent events [BCS07]. When the trigger event occurs
   all dependent components become inaccessible or
   unusable. Dependent events are BEs and are assumed to
   have failed once the trigger event has taken place. An FDEP
   gate has a dummy output which is not taken into account
   when calculating the system failure probability.
Figure 1: DFT gates
The gates exert influence on each other depending on
their type and they define how failures propagate through the
entire system. Given an FT and the failure rates of all BEs in
it, a reliability engineer can compute the chance of the system
failing during a specified mission time. This
information can be used to support engineering and
management decisions, trade-off analysis and risk assessments.
Each BE has a failure distribution. This is a statistical
distribution; the ones most commonly used are the Weibull and
the exponential distribution.
For static FTs computing the reliability value is straightforward
since no ordering of events takes place. This makes the
reliabilities solvable by using Binary Decision Diagrams
(BDDs). These BDDs are data structures (rooted DAGs) used
to represent boolean functions [And97]. For DFTs we need the
ordering of events. Computing the system failure of a DFT is
done by solving the corresponding Markov Chain. This tends
to get difficult since it suffers from state space explosion as the
DFT gets bigger.
(1) From now on we will use the term Static Fault Trees (SFTs)
    when referring to FTs without DFT extensions.
(2) A Sequence Enforcing (SEQ) gate has been defined in
    [DBB92], but since it is also expressible in terms of SPARE
    gates we ignored it.
(3) In most literature ([BCS07], [DBB92]) SPARE inputs are
    divided into cold, warm and hot SPARE inputs. We will
    return to this subject in section 3.1.6.
2.2 Example of a DFT
Suppose a cyclist has a unicycle which fails when its tire fails.
The cyclist has also brought a spare tire: if the tire fails he can
replace it with the spare one. Once the spare tire is in use,
failure of the tire will bring the system (i.e. the unicycle)
down.
This situation is modelled by the DFT shown in Figure 2.
Figure 2: Unicycle DFT (a SPARE gate "unicycle fails" with
primary input "tire fails" and spare input "spare tire fails")
It is also a clear example of how reliability engineers have to
make choices on which BEs to include in their FTs and which
ones not to include. If the cyclist loses the spare tire he carries
with him, the unicycle will fail immediately after the collapse of
the first tire (since there would be no spare to replace it with).
The DFT in Figure 2 does not take this situation (losing the
spare tire, i.e. failure of that BE before the primary) into
account.
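As an illustration of the definitions in this section, the unicycle DFT can be written down as a small data structure. This is a minimal sketch and not the representation used by our tool: the class names `BasicEvent` and `Gate`, the helper `basic_events` and the rate value are assumptions made only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BasicEvent:
    name: str
    rate: float  # parameter lambda of an exponential failure distribution


@dataclass
class Gate:
    name: str
    kind: str  # "AND", "OR", "VOTING", "PAND", "SPARE" or "FDEP"
    inputs: List[Union["Gate", BasicEvent]] = field(default_factory=list)
    threshold: int = 0  # K, only meaningful for VOTING (K/M) gates


def basic_events(node):
    """Collect the names of all BEs below a node; shared BEs are listed once."""
    if isinstance(node, BasicEvent):
        return [node.name]
    names = []
    for child in node.inputs:
        for name in basic_events(child):
            if name not in names:
                names.append(name)
    return names


# The unicycle: one SPARE gate, primary input first, spare input second.
# The rate 2.0 is a placeholder value for illustration.
tire = BasicEvent("tire fails", rate=2.0)
spare_tire = BasicEvent("spare tire fails", rate=2.0)
unicycle = Gate("unicycle fails", "SPARE", inputs=[tire, spare_tire])

print(basic_events(unicycle))
```

Keeping the inputs in an ordered list matters for the dynamic gates, where the position of an input (left to right, primary before spares) carries meaning.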
2.3 Monte Carlo simulation

Monte Carlo simulation is a technique which can help us
solve DFTs: many rounds of random sampling are used to
produce worthwhile results [KW86].
The main components of Monte Carlo sampling are:
- probability distribution functions (PDF)
- random number generators (RNG)
2.3.1 Example
We take the unicycle DFT in Figure 2. We assume both tires
(i.e. BEs) have the same failure distribution, namely the
exponential failure distribution. This means that the
PDF of the BE is defined as:
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
and the cumulative distribution function (CDF, i.e. the
cumulative failure distribution of our tires) is defined as:
$$F(x) = 1 - e^{-\lambda x}, \quad x \ge 0$$
Both tires have lambda = 2.
We use a random number generator (RNG) to select a random
number u in the range [0, 1]. We then solve CDF(x) = F(x) = u
for x, which gives:
$$x = -\ln(1 - u) / \lambda$$
When applied to a failure distribution, x denotes the time at
which the BE failed. If we do this for lots of random numbers
and then compute the average time at which the BE failed, we
obtain an approximation of the Mean Time To Failure (MTTF).
A few samples are shown in the table below.
Table 1. Unicycle samples
Primary tire fails      Spare tire fails
RNG = u    Time t       RNG = u    Time t
0.9        1.151        0.8        0.8047
0.4        0.2554       0.6        0.4581
0.5        0.3466       0.3        0.1783
0.6        0.4581       0.7        0.6020
We can now approximate the MTTF of the first tire failing.
This is the average of the four sampled times. The
approximated MTTF of the first tire is 0.5528. For the second
tire it is 0.5108. If we had taken more samples the answer
would come closer to 0.5. Please note that for exponential
distributions the MTTF is 1 / lambda, which in this case is
1 / 2.
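The sampling step of this example can be reproduced with a few lines of Python. This is a minimal sketch for illustration, not the code of our tool; the function name `sample_failure_time` and the seed are assumptions. It inverts the exponential CDF for the uniform draws of Table 1 and then repeats the experiment with many more draws.

```python
import math
import random

LAMBDA = 2.0  # failure rate of both tires, as in the example


def sample_failure_time(u, lam):
    """Solve F(x) = 1 - exp(-lam * x) = u for x (inverse CDF)."""
    return -math.log(1.0 - u) / lam


# The uniform draws listed in Table 1.
primary_u = [0.9, 0.4, 0.5, 0.6]
spare_u = [0.8, 0.6, 0.3, 0.7]

primary_t = [sample_failure_time(u, LAMBDA) for u in primary_u]
spare_t = [sample_failure_time(u, LAMBDA) for u in spare_u]

mttf_primary = sum(primary_t) / len(primary_t)
mttf_spare = sum(spare_t) / len(spare_t)

# With many more draws the sample mean approaches 1 / lambda = 0.5.
rng = random.Random(2008)  # arbitrary seed, only for repeatability
many = [sample_failure_time(rng.random(), LAMBDA) for _ in range(100_000)]
mttf_many = sum(many) / len(many)

print([round(t, 4) for t in primary_t], round(mttf_primary, 4))
print([round(t, 4) for t in spare_t], round(mttf_spare, 4))
print(round(mttf_many, 3))
```

Small differences in the last digit with respect to the table can occur, because the table averages values that were already rounded.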
To compute the value of the spare gate we need to take one
more thing into account. The spare tire cannot fail before the
first tire, as we have established in section 2.2. So we have to
take the time of the primary tire failing into account.
This yields the following PDF definition for the spare input,
with pfail denoting the time at which the primary tire failed:
$$f(x \mid pfail) = \begin{cases} 0 & x < pfail \\ \lambda e^{-\lambda (x - pfail)} & x \ge pfail \end{cases}$$
This PDF describes the failure distribution of the spare tire
with parameter lambda at time x under the condition that the
primary tire failed at time pfail.
We do not have to resample the values for the spare tire,
since the inverse CDF just returns the time that elapses until
the component fails. For the spare tire we add pfail to this
time to adjust for the failing of the primary tire. In short,
for each row of Table 1 we add the values in column 2 and
column 4 together. For each row this value
denotes the time at which the spare tire failed (after the
primary had already failed). Doing this delivers the
following values:
Table 2. Sampled values
1.9557
0.7135
0.5249
1.0601
From these values we can compute the approximated MTTF
for the spare gate as a whole, since the unicycle spare gate fails
as soon as the one and only spare fails. This yields an
approximated MTTF value for the unicycle DFT of 1.0637 (the
mean of the values in the table above).
For an example mission time of 0.6, only one of the four
sampled values lies before the mission time, so the sampled
unreliability of the unicycle DFT is 1 / 4 = 0.25.
3. MONTE CARLO SIMULATION OF DYNAMIC FAULT TREES
In this section we describe the method used to sample the
reliability of a Dynamic Fault Tree. All the BEs have
exponential failure distributions since our research only
focussed on those types of BEs.
3.1 Sampling
The sampling starts by sampling all the BEs. For each BE we
generate a random number r and solve (with the inverse CDF)
for Fcdf(t) = r as described before in the unicycle example. This
yields a list of failure times for all the BEs. We now move on
to the propagation of these failure times through the tree.
We take the BEs we just sampled and move up the tree
according to the arcs coming out of the BEs. We hand these
values to the gates. We then compute the values of all
these gates, after which we move those values further up the
tree according to the arcs. This is done until we arrive at the top
node, at which point we have a sampled failure time of the
entire tree.
The methodology for calculating the failure time of each gate is
described below:
3.1.1 AND gate
This gate only fails after all its inputs have failed. So the
output failure time of an AND gate is equal to the largest
failure time of its inputs (or infinity if not all of its inputs fail).
3.1.2 OR gate
This gate fails after one or more of its inputs have failed. The
output failure time of an OR gate is equal to the lowest failure
time of its inputs (or infinity if none of its inputs fail).
3.1.3 K/M gate
The K/M gate, also known as VOTING gate, fails if at least K
out of M inputs have failed. We take the M inputs and sort their
failure times in ascending order. The K-th smallest value (index
K - 1 when counting from 0) of the
sorted failure times denotes the time at which the K/M gate
fails.
3.1.4 PAND gate
The PAND gate fails if all its inputs have failed in order from
left to right. So we take all the inputs of the gate (in order) and
check if the values are sorted in ascending order. If they are we
return the last value. If they are not we return infinity, since the
failure of its inputs occurred in another order than specified, so
the PAND gate will not fail.
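The four rules above translate almost directly into code. The following is a minimal sketch, not the code of our tool; the function names are assumptions, and an input that never fails is assumed to be represented by `math.inf`.

```python
import math

INF = math.inf  # assumed marker for "this input never fails"


def and_gate(times):
    """Fails when the last input fails; INF if some input never fails."""
    return max(times)


def or_gate(times):
    """Fails when the first input fails; INF if no input ever fails."""
    return min(times)


def voting_gate(times, k):
    """K/M gate: fails at the K-th smallest failure time (index k - 1)."""
    return sorted(times)[k - 1]


def pand_gate(times):
    """Inputs are given left to right; the gate fails only if they failed in that order."""
    for earlier, later in zip(times, times[1:]):
        if earlier > later:
            return INF  # wrong order: the PAND gate never fails
    return times[-1]


# A 2/3 VOTING gate fed by an AND, an OR and a PAND gate.
a = and_gate([3.0, 1.0])   # 3.0
b = or_gate([4.0, 2.5])    # 2.5
c = pand_gate([2.0, 1.0])  # INF, the inputs failed right to left
print(voting_gate([a, b, c], k=2))
```

Because infinity compares greater than every finite time, a PAND gate with a never-failing input needs no special treatment: either the order check fails or the last value returned is itself infinity.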
3.1.5 FDEP gate
For the FDEP gate things become more complicated. The
FDEP gate doesn't propagate failure times up the tree; in other
words, it doesn't have any functional output. It exerts its
influence from the first input BE (the trigger) towards the rest
of the input BEs. As soon as the first input BE fails all the
other input BEs have also failed. This means, when propagating,
that we have to be careful when we encounter a BE which is also
an input for an FDEP gate. Take the following example (depicted
in Figure 3):
- An FDEP gate has trigger input BE1 and one other, functionally
  dependent, input named BE2.
- BE1 is sampled in the sampling step as failing at time t = 3.
- BE2 is sampled in the sampling step as failing at time t = 4.
Figure 3: FDEP gate
In this case we cannot, for BE2, just take the time t = 4 as the
sampled value and continue propagating. Because of the FDEP
gate BE2 fails at the time the trigger input of the FDEP gate
fails. In this case BE1 fails at t = 3, so BE2 also fails at t = 3
and not at t = 4. If BE2 had a sampled failure time
t = 2 we would not have to worry about the FDEP dependency,
since BE2 would already have failed before the FDEP gate
triggers.
To incorporate the behavior illustrated above in our
methodology we do the following: after the sampling, and
before propagating any of the failure times, we check all the
FDEP gates present in the system. Since their input events are
already sampled we can also track all the dependent BEs for
each FDEP gate. If the sampled failure time of a dependent
BE (as in the example above) is higher than the failure time of
the trigger of the FDEP gate, we reset the failure time of the BE
to the one supplied by the FDEP gate. Otherwise the initially
sampled failure time of the BE stays the same.
3.1.6 SPARE gate
The SPARE gate is the most complicated one. It basically
consists of three subtypes, also present in Galileo, namely the
Cold Spare (CSP), Warm Spare (WSP) and Hot Spare (HSP)
gate. Since we would like to stick to the correct terminology
we will talk about SPARE gates here only.
The spare input BEs to a spare gate also have a parameter
called the dormancy factor. This expresses whether the spare
input can fail before the primary input has failed. The
dormancy factor influences the failure rate of the spare input if
the primary input has not failed yet. It is denoted by mu. The
dormancy factor is a number between 0 and 1 inclusive and is
only taken into account when the primary has not failed yet.
With pfail denoting the time at which the primary input failed,
a spare input's failure distribution becomes exponential with the
time-dependent failure rate:
$$\lambda_{spare}(t) = \begin{cases} \mu \lambda & t < pfail \\ \lambda & t \ge pfail \end{cases}$$
Sampling a SPARE gate is easy when there are only CSP and
HSP inputs attached. For CSP inputs the process is the same as
described in the unicycle example. For HSP inputs the whole
gate becomes an AND gate, since its inputs may fail at any time
whereas the gate fails only once all of them have failed. In
other cases our methodology does the following:
1. Take a sample t' from the failure distribution of the spare
   input, assuming it fails before the primary fails (which
   fails at pfail).
2. Take a sample t'' from the failure distribution of the spare
   input, assuming the primary has already failed.
3. Take the sampled time at which the primary fails, denoted
   by pfail.
4. If pfail > t' we assume the primary fails after the spare
   input, so the failure time of the spare gate as a whole
   becomes pfail + t'.
5. If pfail < t' we assume the primary fails before the spare
   input, so the failure time of the spare gate as a whole
   becomes pfail + t''.
The failure time of the spare gate is computed as
described above and is propagated up the tree.
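The steps listed above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code of our tool: the function names are hypothetical, t' is assumed to be drawn with the dormant rate mu * lambda and t'' with the full rate lambda, and the sketch handles one primary and one spare input only. The comparison rule itself is transcribed literally from the list.

```python
import math
import random


def exp_sample(rng, rate):
    """Exponential failure time by inverse CDF; a rate of 0 means 'never fails'."""
    if rate <= 0.0:
        return math.inf
    return -math.log(1.0 - rng.random()) / rate


def spare_gate_time(pfail, t_dormant, t_active):
    """Combine pfail, t' (t_dormant) and t'' (t_active) as listed in the text."""
    if pfail > t_dormant:
        # Rule as listed. One could argue that the gate should fail at pfail
        # here, since the spare is already lost; the listed rule is kept.
        return pfail + t_dormant
    return pfail + t_active


def sample_spare_gate(rng, primary_rate, spare_rate, mu):
    """One sample of a SPARE gate with one primary and one spare input."""
    pfail = exp_sample(rng, primary_rate)
    if mu == 1.0:
        # Hot spare: the gate behaves like an AND gate.
        return max(pfail, exp_sample(rng, spare_rate))
    t_dormant = exp_sample(rng, mu * spare_rate)  # math.inf for a cold spare
    t_active = exp_sample(rng, spare_rate)
    return spare_gate_time(pfail, t_dormant, t_active)


# Cold spare, first row of the unicycle example: 1.151 + 0.8047.
print(round(spare_gate_time(1.151, math.inf, 0.8047), 4))

rng = random.Random(1)  # arbitrary seed, only for repeatability
print(sample_spare_gate(rng, primary_rate=0.006, spare_rate=0.008, mu=0.5))
```

For a cold spare the dormant sample is infinite, so the second branch is always taken and the result reduces to pfail + t'', which is the computation used in the unicycle example.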
Two special cases of the dormancy factor deserve separate
mention.
The first one is where mu is 0. This means the spare cannot
fail before the primary. This is the same as in the unicycle
example in section 2. In Galileo this type of gate is also
dubbed a CSP gate. The spare itself can also be referred to as
being a cold spare.
The second one is where mu is 1. This means the spare can fail
before the primary, but it will do so with effectively the
standard failure distribution and not with a changed failure
rate. This makes the spare gate (if it doesn't have any other
spare inputs with dormancy factors other than 1) behave like an
AND gate, since its inputs may fail at any time whereas the
gate fails only once all of them have failed.
3.2 Calculating system reliability
After all the sampling and propagation has been done we can
calculate the system reliability and the mean time to failure
(MTTF). Suppose we have calculated N failure times of the
top node of the tree. The MTTF is the sum of all the failure
times divided by N. To get the system unreliability we
count the number of failure times in our list which are below
our threshold (the mission time). Call this M. The system
unreliability is M divided by N. To get the system
reliability (the chance it stays up during the given mission
time) we compute 1 - unreliability.
3.3 Confidence interval
The confidence interval is a measurement of the range in
which the real system reliability lies, based upon all the
samples. It is used to determine if an answer is accurate
enough. To compute the confidence interval we first need to
compute the standard deviation of all the samples we have
taken.
We first calculate the mean of all the samples:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
The standard deviation is given by:
$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
The confidence offset, with s the standard deviation and n the
number of samples, is given by:
$$z \cdot \frac{s}{\sqrt{n}}$$
z is a value we can look up in tables for the normal
distribution. According to Moore and McCabe [MM03] we are
allowed to assume that for a large enough n the distribution
of the sample mean will behave like a normal distribution.
3.3.1 Example
Suppose we take 1000 samples. The standard deviation is
equal to 50. The mean of the sampled values is 300. For a
confidence of 99.9% the table in Moore and McCabe yields
3.291 as a value for z.
This yields a confidence offset of:
$$3.291 \cdot 50 / \sqrt{1000} \approx 5.2035$$
With a confidence of 99.9% we can now say that the real mean
value of the distribution lies between:
(300 - 5.2035, 300 + 5.2035)
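The calculations of sections 3.2 and 3.3 can be sketched as follows. This is a minimal sketch, not the code of our tool: the function names are assumptions, the standard deviation uses n - 1 in the denominator, and the confidence offset of the unreliability is assumed to be computed over 0/1 indicators that record whether a sampled failure time lies below the mission time.

```python
import math


def mean_and_std(samples):
    """Sample mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, math.sqrt(variance)


def confidence_offset(std, n, z=3.291):
    """Half-width z * s / sqrt(n); z = 3.291 corresponds to 99.9% confidence."""
    return z * std / math.sqrt(n)


def summarize(top_times, mission_time, z=3.291):
    """Return MTTF, unreliability M / N and the offset of the unreliability."""
    n = len(top_times)
    mttf = sum(top_times) / n
    indicators = [1.0 if t < mission_time else 0.0 for t in top_times]
    unreliability, std = mean_and_std(indicators)
    return mttf, unreliability, confidence_offset(std, n, z)


# The four sampled unicycle failure times of Table 2, mission time 0.6.
mttf, unreliability, offset = summarize([1.9557, 0.7135, 0.5249, 1.0601], 0.6)
print(mttf, unreliability, 1.0 - unreliability, offset)

# Figures of the example in section 3.3.1: s = 50, n = 1000, z = 3.291.
print(confidence_offset(50.0, 1000))
```

With only four samples the offset is of course far too wide to be useful; the point of the first call is only to show which quantities are derived from the list of top-node failure times.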
4. IMPLEMENTATION
In this section we will briefly describe our implementation.
The tool we have built requires the following input:
- the DFT in Galileo text format
- the mission time for which we want to calculate the
  reliability of the DFT
- the number of samples to take
As output it will give:
- the Mean Time To Failure of the DFT
- the unreliability for the given mission time
- the 99.9% confidence interval of the unreliability
The tool has been implemented in Python. Python was chosen
because it's an agile programming language in which one can
prototype programs with relative ease. For the sampling we
used SciPy, which is a Python package for complex scientific
calculations [Sci07]. It contains, for
example, a lot of standard probability distributions.
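To illustrate how these inputs and outputs relate, the following is a minimal end-to-end sketch: sample the BEs, apply the FDEP adjustment, propagate to the top node and report the three outputs. It is not the code of our tool. The tree (an AND gate over BE2 and BE3, with an FDEP gate that has trigger BE1 and dependent BE2), the rates and all function names are hypothetical, and only the standard library is used so that the example is self-contained.

```python
import math
import random

# Hypothetical three-BE tree, not one of the case studies.
RATES = {"BE1": 0.003, "BE2": 0.003, "BE3": 0.003}


def sample_be(rng, rate):
    """Exponential failure time by inverse CDF."""
    return -math.log(1.0 - rng.random()) / rate


def apply_fdep(times, trigger, dependents):
    """A dependent BE fails no later than the trigger of its FDEP gate."""
    for name in dependents:
        times[name] = min(times[name], times[trigger])


def simulate(n_samples, mission_time, seed=0, z=3.291):
    """Return (MTTF, unreliability, confidence offset of the unreliability)."""
    rng = random.Random(seed)
    top_times = []
    for _ in range(n_samples):
        times = {name: sample_be(rng, rate) for name, rate in RATES.items()}
        apply_fdep(times, trigger="BE1", dependents=["BE2"])
        top_times.append(max(times["BE2"], times["BE3"]))  # AND gate on top
    mttf = sum(top_times) / n_samples
    hits = [1.0 if t < mission_time else 0.0 for t in top_times]
    unreliability = sum(hits) / n_samples
    variance = sum((h - unreliability) ** 2 for h in hits) / (n_samples - 1)
    offset = z * math.sqrt(variance) / math.sqrt(n_samples)
    return mttf, unreliability, offset


# The example of section 3.1.5: BE1 fails at t = 3, BE2 is sampled at t = 4.
example = {"BE1": 3.0, "BE2": 4.0}
apply_fdep(example, trigger="BE1", dependents=["BE2"])
print(example)

print(simulate(n_samples=20_000, mission_time=100.0, seed=1))
```

The FDEP adjustment is applied to the sampled BE times before any gate is evaluated, which is the order described in section 3.1.5.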
The parser for the Galileo format files is based upon supplied
ANTLR grammar files. They were written to produce Java
code, so we had to translate the ANTLR files into ones
producing Python parsers.
5. CASE STUDIES
For comparing the tool to existing techniques we performed
several case studies. These case studies are based upon two
case studies from the paper by Dugan et al. [DBB92]. The
DFTs in that paper model several types of Fault-Tolerant
Computer Systems. These systems were explicitly designed to
be as redundant and reliable as possible. The Dugan case
studies are depicted in appendix A.
The first case in Dugan's paper was chosen because it has four
FDEP and twelve PAND gates. This means that when solving
that DFT in the traditional way an enormous state space
explosion will take place in the underlying
Markov Chain. This makes it difficult to completely compute
the system reliability, so our methodology might prove a good
alternative.
The second case in Dugan's paper was chosen because it has
four SPARE gates. The SPARE gates also have shared spare
inputs. This means that for analytically computing the system
reliability of the DFT the Markov Chain will be very complex,
especially because the shared spare inputs make
all the SPARE gates dependent on each other. Being able to
solve this faster with our methodology would be quite
desirable.
We have drawn a few DFTs based upon the selection criteria
mentioned above and simulated those. The cases
themselves and the results for each case are shown
below.
Each case will detail the DFT analyzed and give the
unreliability output by our tool. These unreliabilities will also
be accompanied by the confidence interval and the number
of samples taken. We will also analyze the DFTs with several
mission times.
Our results will be compared against the unreliability
measures computed by Galileo. Based upon this we are going
to show that the methodology and its implementation are working.
5.1 Case Study #1: DFT without FDEP gates
This DFT is based upon the one depicted in appendix B,
except that the FDEP gates included there were ignored.
The DFT contains:
- 1 K/M gate (2/3)
- 3 OR gates
- 3 AND gates
- 3 PAND gates
- 12 BEs
Compared with appendix B the FDEP gates named FDEP1 and
FDEP2 are excluded. So are the NE1 and NE2 Basic Events.
All the Basic Events have exponential failure distributions. For
every BE the parameter lambda is 0.003.
The table below shows, for each mission time and number of
samples taken, the computed unreliability
(including the confidence interval). As expected, the
confidence interval becomes smaller when more samples are
taken.
Table 3. Case #1: results
mission time   number of samples taken
               1,000                   10,000
100            0.018000 +/- 0.019989   0.028300 +/- 0.011081
250            0.349000 +/- 0.112466   0.321300 +/- 0.035109
Galileo computes the following unreliability measures:
0.0271989 for mission time 100 and 0.321822 for mission time
250. The tool clearly is converging to these answers and with
more samples we could approach the analytically computed
value of Galileo even better.
5.2 Case Study #2: DFT with FDEP gates
This DFT is the one in appendix B. It contains:
- 1 K/M gate (2/3)
- 3 OR gates
- 3 AND gates
- 3 PAND gates
- 2 FDEP gates
- 14 BEs
All the basic events have exponential failure distributions. The
basic events NE1 and NE2 have parameter lambda = 0.004.
The other basic events have parameter lambda = 0.003.
The table below shows the results of the analysis.
Table 4. Case #2: results
mission time   number of samples taken
               1,000                   10,000                  100,000
100            0.09000 +/- 0.054845    0.079400 +/- 0.028019   0.082670 +/- 0.009373
250            0.528000 +/- 0.12520    0.509200 +/- 0.069478   0.512460 +/- 0.008879
Galileo was not able to compute these unreliabilities. It
continuously crashed after a few minutes no matter what type
of computation (simulation, calculation) was entered. We have
not validated the result analytically. Since the other gates
have all been tested we assumed the new FDEP gate in our
tool doesn't introduce additional problems.
For 100,000 samples this yields the following range for the
unreliability of the DFT:
- for mission time 100: (0.073290, 0.092030)
- for mission time 250: (0.503581, 0.521339)
This example shows a case where the conventional analytical
solution used by Galileo is not able to produce an answer. Our
tool is able to give an answer and with reasonable confidence
we can say it's near the real answer.
5.3 Case Study #3: DFT with warm spare inputs
Appendix C shows the DFT for this case study. It's a DFT
specifically designed to test the sampling of multiple SPARE
gates in a DFT. It contains the following gates:
- 1 K/M gate (2/3)
- 2 spare gates
- 5 BEs
The spare gates have warm spare inputs. All the basic events
have dormancy factors of 0.5. For BE1, BE3 and BE5, which are
not spare input BEs, the dormancy factor is set but it's not taken
into account at any point in the simulation.
The lambdas for each BE are shown in the table below:
Table 5. BE Lambda values
BE1     BE2     BE3     BE4     BE5
0.006   0.008   0.006   0.009   0.02
The results after running the simulation are listed in the
table of Case #3 results.
Galileo outputted for a mission time of 100 an unreliability of
0.339212 and for mission time 250 an unreliability of
0.860949. The answers by our tool seem to be somewhat off the
mark. The unreliability computed by Galileo always seems to
be at the end of our simulated confidence interval.
This behavior was only observed with SPARE gates with warm
spare inputs. Due to time constraints we were not able to
figure out whether there's a bug in our implementation or not.
Our methodology seems to work for simple DFTs with only
one SPARE gate and one warm spare input. It might be a side
effect of the simulation but it is more probable that we have
stumbled upon an implementation error. This will have to be
investigated further.
6. CONCLUSION
The methodology described in this paper seems to work well
for smaller cases. For the cases we have been able to test the
results are encouraging. The simulation seems to give accurate
enough answers and is able to calculate the unreliabilities for
systems which Galileo could not analyze at all.
The sampling of the warm spare inputs in larger DFTs does not
seem to match the rest of the test data. Most of our
methodology has been put to the test and held up, so this
might give other researchers a good start to further improve our
implementation and find more accurate ways of Monte Carlo
simulation for DFTs.
6.1 Future work

The tool should be extended with a feature where, instead of
the number of samples, it gets a confidence interval as input.
This way a user doesn't have to guess the number of samples
needed to get an accurate enough sampled answer.
The implementation should just continue sampling and
terminate as soon as the calculated confidence offset is
below the threshold specified by the user.
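Such a stopping rule could look as follows. This is a minimal sketch of the proposed extension, not existing functionality of our tool: the function name `sample_until`, the batch size, the sample cap and the stand-in sampler are all assumptions. The sampler passed in is expected to return one sampled failure time of the top node per call.

```python
import math
import random


def sample_until(sample_top, mission_time, max_offset, z=3.291,
                 batch=1000, max_samples=1_000_000):
    """Sample in batches until the confidence offset drops below max_offset."""
    n = 0
    failures = 0
    while n < max_samples:
        for _ in range(batch):
            if sample_top() < mission_time:
                failures += 1
        n += batch
        p = failures / n
        # Sample standard deviation of the 0/1 failure indicators.
        std = math.sqrt(p * (1.0 - p) * n / (n - 1))
        offset = z * std / math.sqrt(n)
        # Guard: with no failures (or only failures) observed so far the
        # estimated deviation is 0 and the offset would be misleadingly small.
        if 0 < failures < n and offset <= max_offset:
            break
    return p, offset, n


# Stand-in for a full DFT sampler: a single BE with lambda = 2.
rng = random.Random(42)  # arbitrary seed, only for repeatability
p, offset, n = sample_until(lambda: rng.expovariate(2.0),
                            mission_time=0.6, max_offset=0.02)
print(p, offset, n)
```

The cap on the number of samples keeps the loop from running indefinitely when the requested offset is unrealistically small or when failures are so rare that none is observed.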
The tool should be extended with support for DFTs which
contain SPARE gates that share spare inputs. This is quite a
common occurrence (as can be seen in some of the case studies
mentioned before) in systems which need to be modeled.
The problems regarding warm spare inputs (as mentioned
before in the case study section) should be investigated.
7.ACKNOWLEDGMENTS
The author wants to thank his supervisor, Hichem Boudali, for
making available several references and discussing several
aspects of this paper. Gratitude is also expressed towards his
supervisors, Mariëlle Stoelinga and Lodewijk Bergmans, and
his fellow students for reviewing and commenting on this
paper.
8. REFERENCES
[And97] H. R. Andersen.
An Introduction to Binary Decision Diagrams.
Lecture notes for 49285 Advanced Algorithms E97, October 1997,
Dept. of Information Technology, Technical University of Denmark,
http://www.itu.dk/people/hra/bdd97.ps, accessed 21 March 2008.
[BB93] Mark A. Boyd and Salvatore J. Bavuso.
Simulation modeling for long duration spacecraft control systems.
Proceedings of the Annual Reliability and Maintainability
Symposium, pages 106-113, 1993.
[BCS07] H. Boudali, P. Crouzen, M. Stoelinga.
A compositional semantics for Dynamic Fault Trees in terms of
Interactive Markov Chains.
Dept. of Computer Science, University of Twente, to be published.
[BSC99] Joanne Bechta Dugan, Kevin J. Sullivan, and David Coppit.
Developing a low-cost, high-quality software tool for dynamic
fault tree analysis.
Transactions on Reliability, December 1999, pages 49-59.
[Dug04] J.B. Dugan.
Fault Tree Analysis of Computer-Based Systems.
Lecture at Reliability and Maintainability Symposium,
University of Virginia, 2004.
[DBB92] J.B. Dugan, S.J. Bavuso, M.A. Boyd.
Dynamic Fault-Tree Models for Fault-Tolerant Computer Systems.
IEEE Transactions on Reliability, vol. 41, number 3.
IEEE institute of electrical and electronics, USA.
[Gen03] J.E. Gentle.
Random number generation and Monte Carlo methods, 2nd edition.
Springer-Link, New York, 2003.
[GB00] G. Gedam, Steven T. Beaudet.
Monte Carlo Simulation using Excel Spreadsheet for Predicting
Reliability of a Complex System.
Motorola Satellite Communications Group, Chandler.
Proceedings IEEE Communications and Reliability Symposium 2000.
[IRS07] IEEE Reliability Society website,
www.ewh.ieee.org/soc/rs/, accessed 11 March 2008.
[KW86] Kalos H.M., Whitlock P.A.
Monte Carlo methods, Volume 1: Basics.
Courant Institute of Mathematical Sciences, New York University.
John Wiley & Sons, New York, 1986.
[MM03] D.S. Moore, G.P. McCabe.
Statistiek in de praktijk, 3rd edition.
Academic Service, the Hague, October 2003.
[Sci07] SciPy website,
http://www.scipy.org/, accessed 25 March 2008.
[VGR+81] W.E. Vesely, F.F. Goldberg, N.H. Roberts, D.F. Haasl.
Fault Tree Handbook, U.S. Nuclear Regulatory Commission,
January 1981.
U.S. Government Printing Office, Washington.

Case #3: results (section 5.3)
mission time   number of samples taken
               1,000                   10,000                  100,000
100            0.333000 +/- 0.118184   0.293700 +/- 0.038013   0.294760 +/- 0.018844
250            0.765000 +/- 0.264136   0.803100 +/- 0.059303   0.8032900 +/- 0.025621
APPENDIX A: CASE STUDIES FROM DUGAN
APPENDIX B: DFT #1
APPENDIX C: DFT CASE #2
