
IEEE TRANSACTIONS ON RELIABILITY, VOL. R-36, NO. 1, 1987 APRIL 133

Computer-Assisted Markov Failure Modeling of Process Control Systems

Tunc Aldemir
The Ohio State University, Columbus

Key Words-Markov failure analysis, Process control system, Computer-assisted model construction, Dynamic failure modeling, Common-cause failure

Reader Aids-
Purpose: Widen state of the art
Special math needed for explanations: Elementary probability, Elementary analysis
Special math needed to use results: None
Results useful to: Reliability analysts

Summary & Conclusions-Process control systems (PCS) are systems with control loops and continuous state dynamic variables such as pressure, temperature, and liquid level. Existing computer-assisted failure modeling schemes for PCS are based on a static description of system operation (eg, by digraphs, signal-flow-based graphs). This paper presents a dynamic approach to the failure modeling of PCS. The givens for the methodology are: 1) a set of first order differential equations with feedback describing the interaction between system variables, 2) failure and repair rates for the control units constituting the PCS. The methodology is based on the discrete state space-discrete time representation of PCS dynamics. Probabilistic system behavior is simulated by a Markov chain. An algorithm is developed for the mechanized construction of the transition matrix. Input preparation for the algorithm is illustrated by examples. Useful features of the methodology are: 1) failure model accuracy can be verified or improved by a change in the input data for mechanized model construction, 2) effect of changes in system parameters on PCS failure characteristics can be quantified. These features are demonstrated on a simple level-control system. The limitations of the methodology are discussed.

Editors' Note: It was not feasible to reconcile all the differences between all of the referees and the author. These differences centered on the practical utility of the method, and the best way of presenting it (but not on its correctness). Correspondence is invited on this topic.

1. INTRODUCTION

Process control systems (PCS) are used in nuclear and chemical plants to maintain the magnitudes of the process variables within specified ranges. Examples are:
* Nitric acid cooler in [1]
* Cascade level control system in [20]
* Pressure tank system in [22]
* Emergency core cooling system in a nuclear plant.

PCS are dynamic systems. Existing computer-assisted schemes for the failure modeling of systems with control loops [1-5] are based on a non-dynamic description of system operation (eg, by di-graphs [1,2], signal-flow-based graphs [3]). The effect of control unit action on the evolution of process variables is represented by input-output relations. The definition of input-output relations may require an arbitrary discretization of the control space (eg, large, small, zero gain [1,2]) and/or the assumption of a specific type of relation between process variables due to control unit action (eg, proportional change [1,2,3]). The applicability and the accuracy of these non-dynamic schemes are discussed in [5-10,20].

The purpose of this paper is to present a systematic procedure for the failure modeling of PCS from the dynamic description of system operation. Section 4 lists the assumptions on PCS properties. Section 5 illustrates the PCS under consideration by a hypothetical level control system (Example System #1). Nomenclature is defined more precisely in the next section.

Owing to the interaction between the process variables and control units, the PCS do not have the Markov property [11-13] when regarded in continuous time and the continuous state space of the process variables. Markov model construction involves:
* defining the discrete system states by specifying a partitioning [25] of the control space and its complement (section 6.1)
* choosing a time interval in which the control units do not change states.

Section 6.2 shows that such a state space-time discretization leads to a homogeneous Markov chain [11] to describe the probabilistic PCS behavior. An algorithm is developed for the mechanized construction of the transition matrix.

The methodology provides an alternative to direct system simulation (eg, Monte Carlo techniques) when the propagation of system disturbances through the control loops is difficult to express in terms of time-independent input-output relations. Useful features of the methodology are:
1. Failure model accuracy can be verified or improved by a change in the input data for mechanized model construction. (Section 7 describes this feature).
2. The effect of changes in system parameters on PCS failure characteristics can be quantified.

Section 8 illustrates both features on Example System #1. Computational aspects of the methodology and the capability to model common-cause failures are discussed in sections 9.1 and 9.2, respectively.

0018-9529/87/0400-0133$01.00 © 1987 IEEE


2. NOMENCLATURE

process control system (PCS): a system with control loops and controlled continuous state dynamical variables such as pressure, temperature, and liquid level
process variables: continuous state dynamical variables of PCS
control units: single components or subsystems regulating process variable evolution
control laws: rules which specify the desired response of control units to changes in system variables
operational unit state: a control unit state which is consistent with the control laws
failed unit state: a control unit state which is only partly consistent or is inconsistent with the control laws
control space: a domain in the process variable space bounded by the allowable magnitudes of process variables
control regions: domains in the control space bounded by system setpoints and/or boundaries of the control space
nominal control region: range of process variables under normal system operation

3. NOTATION

$L$: number of process variables
$K$: number of control units
$x_\ell(t)$: magnitude of the process variable $\ell$ at time $t$ ($\ell = 1, \ldots, L$)
$x(t), x$: L-vectors with components $x_\ell(t)$ and $x_\ell$, respectively
$E$: L-dimensional Euclidean space
$a_\ell, b_\ell$: lowest and highest allowable magnitudes of process variable $\ell$, respectively
$V$: control space, $V = \{x;\ a_\ell < x_\ell < b_\ell\}$
$\bar{V}$: complement of $V$, $V \cup \bar{V} = E$
$R$: number of control regions in $V$
$V^r$: control region $r$, $r = 1, \ldots, R$
$V_j$: pairwise disjoint intervals in $V$, $V_j = \{x;\ a_\ell \le a_{\ell,j} < x_\ell < b_{\ell,j} \le b_\ell\}$
$J^r, J_r$: number of $V_j$ in $V^r$ and ordering of $V_j$ in $V^r$, respectively; $J_r = J_{r-1} + J^r$ for $r = 1, \ldots, R$; $J_0 = 0$
$\Gamma$: number of system failure types
$V_\gamma$: pairwise disjoint intervals in $\bar{V}$, $\gamma = J_R + 1, \ldots, J_R + \Gamma$
$I_k$: number of states for unit $k$
$N$: $\prod_{k=1}^{K} I_k$
$r_{kr}, r_{kj}$: operational states of unit $k$ for $x \in V^r$ and $x \in V_j$, respectively
$m_{kr}, m_{kj}$: failed states of unit $k$ for $x \in V^r$ and $x \in V_j$, respectively
$i_k(t), i_k$: state of unit $k$ at time $t$
$i(t), i$: ordered set of unit states $\{i_1(t), i_2(t), \ldots, i_K(t)\}$
$S_{n_k}$: set of $i$ with $i_k = n_k$
$S_n$: ordered set $\{n_1, \ldots, n_K\}$
$f_{n,\ell}(x), f_{n,\ell}$: time rate of change in $x_\ell$ for $i(t) = S_n$
$f_n(x)$: L-vector with components $f_{n,\ell}(x)$
$\lambda_k(m_{kj} \mid r_{kj})$: transition rate for unit $k$ from $r_{kj}$ to $m_{kj}$
$\mu_k(m_{kj})$: repair rate for unit $k$ failed in state $m_{kj}$
$\alpha^{(k)}_{\ell,r}$: setpoint $r$ for control unit $k$; $\ell$ is the process variable initiating unit action
$p_{n,j}(t)$: $\Pr\{i(t) = S_n,\ x(t) \in V_j\}$
$p(n,x,t)$: $\Pr\{i(t) = S_n,\ x(t) = x\}$
$\delta_{n',n}$: Kronecker delta, $\delta_{n',n} = 1$ if $n' = n$; 0 otherwise
$T_\gamma$: mean time to failure in state $\gamma$
$F_\gamma(t)$: Cdf of failure in state $\gamma$
$(I_M)_{n_k,\gamma}$: s-importance of the event that system fails in state $\gamma$ with $i_k = n_k$

subscripts
$n$: ordering of control unit state combinations, $n = 1, \ldots, N$

More notation is given in sections 5, 6.2, 9.2. Other, standard notation is given in "Information For Readers & Authors" at rear of each issue.

4. SYSTEM DESCRIPTION

Consider a PCS with K control units and L process variables. Under normal operation, the process variables are within a predefined nominal control region. If the failure of a control unit causes system disturbance, the process variables can move out of the nominal control region. However, unit failure does not necessarily lead to system failure. The process variables can be brought back to the nominal control region or can be kept within the control space by the action of remaining operational units. The action of the operational control units is specified by the control laws. The system fails if any process variable is outside the control space.

Assumptions on system properties

1. Control units have discrete states.
2. The state of an operational control unit at a given time depends only on the magnitude of the process variables at that time and not on the states of other control units. Each control unit has one operational state associated with a control region, ie,

$$i_k(t) = \begin{cases} r_{kr}, & x(t) \in V^r \text{ (unit } k \text{ operational)} \\ m_{kr}, & \text{otherwise}, \quad m_{kr} = 1, \ldots, I_k - 1 \end{cases} \qquad (1)$$

3. Time rates of change for process variables are not explicit functions of time or the history of system operation, ie,
$$\frac{dx_\ell(t)}{dt} = f_{n,\ell}(x), \quad \text{for } i(t) = S_n,\ x \in V;\ \ell = 1, \ldots, L;\ n = 1, \ldots, N. \qquad (2)$$

4. Unit failures and repairs are mutually statistically independent.
5. Unit failure and repair rates are constant.

The level control system (Example System #1) in section 5 illustrates the PCS. Assumption #4 implies that: a) the control units do not share common elements, and b) common-cause failures are not considered. However, control units can interact through the process variables (characteristic #1 for Example System #1). The methodology can sometimes be extended to handle common-cause failures (section 9.2). Assumptions #3 and #5 allow the description of probabilistic PCS behavior as a homogeneous Markov chain (section 6.2).

5. A HYPOTHETICAL LEVEL CONTROL SYSTEM - EXAMPLE SYSTEM #1

Notation

$x(t), x$: liquid level in the tank at time $t$
$a, b$: lowest and highest allowable liquid levels, respectively
$\alpha_1, \alpha_2$: system setpoints
$\chi_k$: magnitude of flow rate through unit $k$
$f_n$: rate of liquid level change in the tank as a function of control unit states
$i_k = 1, 2$: on and off states of control unit $k$, respectively

subscripts
$\gamma = o, d$: system failure by overflow and dryout, respectively

Figure 1 shows a simple PCS. The process variable is the liquid level in the tank. The control units consist of a drain unit (unit 1) and two supply units (units 2 and 3). Each unit has a separate level sensor. Unit definition includes the level sensor.

The control units are actuated by level signals and respond instantaneously (ie, negligible time delay). Under normal operation $\alpha_1 < x < \alpha_2$ (nominal control region), with unit 1 on, unit 2 on and unit 3 off. The system fails by dryout if $x < a$ or by overflow if $x > b$.

Fig. 1. Example #1 - Hypothetical Level Control System

TABLE 1
Example System #1 - Operational Unit States As A Function of Liquid Level

Liquid Level (x)            Unit 1   Unit 2   Unit 3
$\alpha_1 < x < \alpha_2$   on       on       off
$\alpha_2 < x$              on       off      off
$x < \alpha_1$              off      on       on

TABLE 2
Example System #1 - Rate of Liquid Level Change As A Function of Control Unit States and Corresponding Sets $S_n$

Unit 1   Unit 2   Unit 3   n   Rate of Level Change ($f_n$)   Set $S_n$ ($n_1\ n_2\ n_3$)
on       on       on       1   $-\chi_1 + \chi_2 + \chi_3$     1 1 1
on       on       off      2   $-\chi_1 + \chi_2$              1 1 2
on       off      on       3   $-\chi_1 + \chi_3$              1 2 1
on       off      off      4   $-\chi_1$                       1 2 2
off      on       on       5   $\chi_2 + \chi_3$               2 1 1
off      on       off      6   $\chi_2$                        2 1 2
off      off      on       7   $\chi_3$                        2 2 1
off      off      off      8   0                               2 2 2

Given

1. Control units are either on or off.
2. System setpoints and control laws are as shown in Table 1.
3. Rate of liquid level change in the tank satisfies

$$\frac{dx(t)}{dt} = f_n, \quad \text{for } x \in V = \{x;\ a < x < b\},\ i(t) = S_n;\ n = 1, \ldots, 8. \qquad (3)$$

Table 2 lists $f_n$ and $S_n$. The $\chi_k$ in table 2 are constant.
4. Units are not repaired and unit failure rates do not depend on the mode of failure, viz,

$$\lambda_k(m_{kr} \mid r_{kr}) = \lambda_k, \quad \mu_k(m_{kr}) = 0, \quad \text{for } r_{kr}, m_{kr} = 1, 2. \qquad (4)$$

5. Unit failure rates are constant.

Example System #1 satisfies assumptions #1-#5 on PCS properties. If units 1-3 do not respond instantaneously, then the identity in (1) is not valid. However, assumption #3
does not require $\chi_k$ to be constant as in given #3 for Example System #1. The $\chi_k$ can be functions of the liquid level in the tank (eg, $\chi_1$ depends on hydrostatic pressure, $\chi_2$ and $\chi_3$ depend on the deviation of x from a reference level). On the other hand, assumption #3 is violated if the liquid supply to units 2 and 3 is exhaustible within the process duration.

Characteristics of Example System #1 typical for PCS are:
1. Control units interact through the process variable.
2. The transitions $x(t') \to x(t)$ and $i(t') \to i(t)$ are statistically dependent.
3. The sequencing of transitions from operational to failed unit states can affect the type of system failure.

These characteristics follow from (3) and tables 1-2. Figure 2 illustrates characteristic #3 by two event-trees [21]. Figure 2 also shows that unit failure does not necessarily lead to system failure. In fault tree methodology, systems with characteristic #3 can be analyzed by the initiating-enabling event approach [22].

Fig. 2. Example System #1 - Event trees for Initiating Events [22] "Unit 1 Fails Off", "Unit 2 Fails On" With x3 x... (tree columns: Initiating Event, Unit Action, Consequence)

6. METHODOLOGY

Markov model construction for the failure analysis of PCS described in section 4 is based on the discrete state space-time representation of system dynamics. State space-time discretization involves choosing a partitioning [25] $V_j$, $V_\gamma$ of $E$ and a time step $\Delta t$. Since the events following system failure are not under consideration, system failure is described as transition to absorbing states.

Modeling assumptions

1. The $p(n, x, t)$ is constant for $x \in V_j$.
2. Control units do not change states in the interval $[t, t + \Delta t)$. Units may change states instantaneously at $t + \Delta t$.

Section 6.2 shows the relevance of modeling assumptions #1-#2 in defining an algorithm for mechanically generating the transition matrix.

Given

1. Possible operational and failed unit states
2. System setpoints and control laws
3. Boundaries of the control space and $f_{n,\ell}(x)$
4. Failure and repair rates for control units

Section 6.1 describes input preparation for the algorithm from the givens for the methodology. Input preparation is illustrated by Example Systems #1 and #2:

Example System #2 Consider a PCS with two process variables $x_1$, $x_2$ (eg, temperature and pressure) and two control units.

Given

1. Each control unit is either on ($i_k = 1$) or off ($i_k = 2$).
2. Setpoints and control laws are as shown in figure 3.
3. $V = \{x;\ 0 < a_\ell < x_\ell < b_\ell,\ \ell = 1, 2\}$

The failure and repair rates are not relevant to the illustrations.

Fig. 3. Example System #2 - Operational Unit States As A Function of the Process Variables

6.1 Inputs For The Mechanical Construction Of The Transition Matrix

The inputs for generating the transition matrix are:

1. $f_{n,\ell}(x)$ ($n = 1, \ldots, N$; $\ell = 1, \ldots, L$)
2. Boundaries of $V_j$, $V_\gamma$
3. $\lambda_k(m_{kj} \mid r_{kj})$, $\mu_k(m_{kj})$
4. A time step $\Delta t$
Input #1 is known by given #3 for the methodology. Input #4 specifies how the time is made discrete. The choice of $\Delta t$ depends upon the accuracy of (9) in describing the evolution of the process variables. In that respect, $\Delta t$ is arbitrary. Model accuracy is determined by observing the sensitivity of system failure characteristics to the choice of $\Delta t$ (section 7). The steps in specifying inputs #2 and #3 are:

Step 1. Identify the control regions

The $V^r$ are identified from the system setpoints and the boundary of $V$. Table 3 and figure 4 show $V^r$ for Example Systems #1 and #2, respectively.

TABLE 3
Example System #1 - Control Regions

Region # (r)   Control Region ($V^r$)
1              $a < x < \alpha_1$
2              $\alpha_1 < x < \alpha_2$
3              $\alpha_2 < x < b$

Fig. 4. Example System #2 - Control Regions

Step 2. Choose a partitioning $V_j$, $V_\gamma$ of $E$:

$$V^r = \bigcup_{j = J_{r-1}+1}^{J_r} V_j \quad (r = 1, \ldots, R) \qquad (5)$$

$$\bigcup_{\gamma = J_R+1}^{J_R+\Gamma} V_\gamma = \bar{V}. \qquad (6)$$

Eqs (6) and (5) are necessary for obtaining (8) and (11), respectively. For Example System #1, (5), (6), table 3, and ...

$$a_j = \alpha_2 + (j - J_2 - 1)(\Delta x)_3, \quad b_j = \alpha_2 + (j - J_2)(\Delta x)_3, \quad j = J_2 + 1, \ldots, J_3$$

$$(\Delta x)_1 = (\alpha_1 - a)/J^1, \quad (\Delta x)_2 = (\alpha_2 - \alpha_1)/J^2, \quad (\Delta x)_3 = (b - \alpha_2)/J^3.$$

$$V_d = \{x;\ x < a\}, \quad V_o = \{x;\ x > b\}.$$

The choice of $J^r$ is arbitrary. Figures 5 and 6 show two possible partitionings for Example System #2 with minimum $J^r$. Since the $V^r$ for Example System #1 are disjoint intervals themselves, $V_j$ are identical to $V^r$ for minimum $J^r$ (viz, $J^r = 1$) and the partitioning is unique.

Fig. 5. Example System #2 - A Possible Partitioning (#1) $V_j$, $V_\gamma$ ($\gamma$ = 8,11)
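For a single process variable, step 2 amounts to cutting each control region into equal-width intervals. The sketch below does this for Example System #1 under that equal-width assumption; the function name `partition` and the numeric boundaries are hypothetical choices for illustration.

```python
def partition(edges, counts):
    """Split each control region (edges[r], edges[r+1]) into counts[r] equal intervals.

    edges  = (a, alpha1, alpha2, b) for Example System #1
    counts = number of intervals V_j wanted in each control region
    Returns the list of (a_j, b_j) in increasing order of j.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need one count per control region")
    intervals = []
    for r, n_r in enumerate(counts):
        lo, hi = edges[r], edges[r + 1]
        dx = (hi - lo) / n_r  # interval width in region r
        for m in range(n_r):
            intervals.append((lo + m * dx, lo + (m + 1) * dx))
    return intervals


# Illustrative boundaries: a = -3, alpha1 = -1, alpha2 = +1, b = +3.
print(partition((-3, -1, 1, 3), (1, 1, 1)))  # one interval per region
print(partition((-3, -1, 1, 3), (2, 2, 2)))  # refined partitioning
```

With one interval per region the result coincides with the control regions of table 3, which is the unique minimum partitioning mentioned above.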
The $r_{kj}$ and $m_{kj}$ are determined by (1) (table 1 and figure 3 for Example Systems #1 and #2, respectively) and (5). Table 4 lists $r_{kj}$, $m_{kj}$ for Example System #1 with $J^r = 1$ ($r = 1, 2, 3$). Table 5 shows $r_{kj}$, $m_{kj}$ for Example System #2 and the partitioning in figure 5. Once $r_{kj}$, $m_{kj}$ are found, $\lambda_k(m_{kj} \mid r_{kj})$, $\mu_k(m_{kj})$ are obtained from given #4 for the methodology (by (4) for Example System #1).

TABLE 4
Example System #1 - Operational and Failed Unit States in $V_j$

            Operational States          Failed States
Interval    Unit 1   Unit 2   Unit 3    Unit 1   Unit 2   Unit 3
$V_1$       2        1        1         1        2        2
$V_2$       1        1        2         2        2        1
$V_3$       1        2        2         2        1        1

TABLE 5
Example System #2 - Operational and Failed Unit States in $V_j$ For The Partitioning in Figure 5

            Operational States    Failed States
Interval    Unit 1   Unit 2       Unit 1   Unit 2
$V_1$       2        1            1        2
$V_2$       2        2            1        1
$V_3$       2        1            1        2
$V_4$       1        1            2        2
$V_5$       1        2            2        1
$V_6$       2        2            1        1
$V_7$       1        2            2        1

6.2 Development of Algorithm for Model Construction

Notation

$dv$: infinitesimal volume element in $E$
$v_j$: volume of $V_j$, $v_j = \prod_{\ell=1}^{L} (b_{\ell,j} - a_{\ell,j})$
$q(n,x \mid n',x',\Delta t), q$: $\Pr\{i(t+\Delta t) = S_n,\ x(t+\Delta t) = x \mid i(t) = S_{n'},\ x(t) = x'\}$
$h(n \mid n', x' \to x, \Delta t), h$: $\Pr\{i(t+\Delta t) = S_n \mid i(t) = S_{n'},\ x(t) = x' \to x(t+\Delta t) = x\}$
$h(n \mid n', j' \to j, \Delta t)$: $\Pr\{i(t+\Delta t) = S_n \mid i(t) = S_{n'},\ x(t) \in V_{j'} \to x(t+\Delta t) \in V_j\}$
$c_k(n_k \mid n_k', j' \to j, \Delta t), c_k$: $\Pr\{i_k(t+\Delta t) = n_k \mid i_k(t) = n_k',\ x(t) \in V_{j'} \to x(t+\Delta t) \in V_j\}$
$q_{n,j}^{n',j'}(\Delta t)$: elements of the Markov transition matrix
$e_j(x)$: step function on $E$, $e_j(x) = 1$ if $x \in V_j$; 0 otherwise

Definitions of $p(n,x,t)$, $q(n,x \mid n',x',\Delta t)$ and $N$ imply [15]:

$$p(n,x,t+\Delta t) = \sum_{n'=1}^{N} \left( \int_{V} dv' + \int_{\bar{V}} dv' \right) q(n,x \mid n',x',\Delta t)\, p(n',x',t). \qquad (7)$$

Let $x \in V_\gamma$ be absorbing states. The appendix shows that modeling-assumption #1 leads to:

$$p_{n,j}(t+\Delta t) = \sum_{n'=1}^{N} \left[ \sum_{j'=1}^{J_R} \frac{p_{n',j'}(t)}{v_{j'}} \int_{V_j} dv \int_{V_{j'}} dv'\, q(n,x \mid n',x',\Delta t) + \delta_{n',n} \sum_{\gamma=J_R+1}^{J_R+\Gamma} \delta_{\gamma,j}\, p_{n',\gamma}(t) \right]. \qquad (8)$$

The $q(n,x \mid n',x',\Delta t)$ in (8) refers to the simultaneous occurrence of the events $A \equiv x(t) = x' \to x(t+\Delta t) = x$ and $B \equiv i(t) = S_{n'} \to i(t+\Delta t) = S_n$. Thus both Pr{A}, Pr{B|A} or both Pr{B}, Pr{A|B} have to be known to determine q. In principle, neither Pr{A} nor Pr{B} can be determined individually since A and B are statistically dependent (characteristic #2 for Example System #1). The methodology approximates Pr{A} using modeling-assumption #2:

Let $i(t') = i(t) = S_{n'}$ for $t \le t' < t + \Delta t$. If $\Delta t$ is sufficiently small, by (2) and the continuity of $x(t)$

$$x_\ell(t+\Delta t) = x'_\ell + f_{n',\ell}(x')\,\Delta t, \quad \text{for } x(t) = x',\ i(t) = S_{n'}. \qquad (9)$$

Eq (9) implies that, given $x'$ and $n'$, Pr{A} is known. Then -

$$q(n,x \mid n',x',\Delta t) = h(n \mid n', x' \to x, \Delta t)\, g(x \mid x', n', \Delta t). \qquad (10)$$

Eq (10) follows from Pr{AB} = Pr{B|A}Pr{A} and the definition of h and g. The arguments $x'$, $x$ in h account for the interaction of the control units through the process variables. In that respect the methodology is similar to the method of supplementary variables [19]. The appendix shows:

$$\int_{V_j} dv \int_{V_{j'}} dv'\, q(n,x \mid n',x',\Delta t) = h(n \mid n', j' \to j, \Delta t) \int_{V_{j'}} dv'\, e_j(x' + f_{n'}(x')\Delta t) \qquad (11)$$

$$h(n \mid n', j' \to j, \Delta t) = \prod_{k=1}^{K} c_k(n_k \mid n_k', j' \to j, \Delta t), \quad j, j' = 1, \ldots, J_R. \qquad (12)$$

The $c_k$ are analogous to the elements of the matrices $A^{(i)}$ in [17]. Both the $A^{(i)}$ and $c_k$ describe the statistical dependence of transitions between system component states in mechanical Markov model construction. In [17] the transition matrix is generated using the Kronecker algebra. The elements of $A^{(i)}$ are input data for model construction.

By (5) and assumption #2 on system properties (section 4) each $j'$, $j$ in $c_k$ is associated with a unique operational unit state $r_{kj'}$, $r_{kj}$, respectively. Figure 7 shows:
Fig. 7. Flow Chart Illustrating the Logic in Determining $c_k$

$$c_k(n_k \mid n_k', j' \to j, \Delta t) = \cdots \qquad (13)$$

Eqs (8), (11)-(13) show that probabilistic PCS behavior is described by:

$$p_{n,j}(t+\Delta t) = \sum_{n'=1}^{N} \sum_{j'=1}^{J_R+\Gamma} q_{n,j}^{n',j'}(\Delta t)\, p_{n',j'}(t) \qquad (14)$$

$$q_{n,j}^{n',j'}(\Delta t) = \begin{cases} \dfrac{1}{v_{j'}} \displaystyle\prod_{k=1}^{K} c_k(n_k \mid n_k', j' \to j, \Delta t) \int_{V_{j'}} dv'\, e_j(x' + f_{n'}(x')\Delta t), & j' = 1, \ldots, J_R;\ j = 1, \ldots, J_R \\[2ex] \dfrac{\delta_{n',n}}{v_{j'}} \displaystyle\int_{V_{j'}} dv'\, e_j(x' + f_{n'}(x')\Delta t), & j' = 1, \ldots, J_R;\ j = J_R+1, \ldots, J_R+\Gamma \\[2ex] \delta_{n',n}\, \delta_{j',j}, & \text{otherwise.} \end{cases} \qquad (15)$$

The $q_{n,j}^{n',j'}(\Delta t)$ constitute a stochastic matrix (by (13), (15), step 2 in section 6.1 and the definition of N). Since neither $c_k$ nor $f_{n'}(x')$ in (15) are explicit functions of time (by assumptions #3, #5 on system properties) (14) is a homogeneous Markov chain. The mechanized construction of the transition matrix follows from (13) and (15).

7. FAILURE ANALYSIS AND CHECK OF MODEL ACCURACY

Once the transition matrix for the Markov model is generated and initial system state is specified, several techniques can be used for determining the time dependent and the steady-state failure characteristics of the PCS [11, 13, 16]. Some PCS failure parameters of interest are:

$$F_\gamma(t) = \sum_{n=1}^{N} p_{n,\gamma}(t), \quad \gamma = J_R+1, \ldots, J_R+\Gamma \qquad (16)$$

$$T_\gamma = \sum_{k=1}^{\infty} k\Delta t \sum_{n=1}^{N} \left[ p_{n,\gamma}(k\Delta t) - p_{n,\gamma}((k-1)\Delta t) \right] \qquad (17)$$

$$w_{n,\gamma}(t) = (1/\Delta t)\left[ p_{n,\gamma}(t+\Delta t) - p_{n,\gamma}(t) \right] \qquad (18)$$

$$(I_M)_{n_k,\gamma}(t) = \sum_{n \in S_{n_k}} w_{n,\gamma}(t) \Big/ \sum_{n=1}^{N} w_{n,\gamma}(t) \qquad (19)$$

The s-importance function given by (19) is similar to the one used by Dunglinson & Lambert [22, (4)-(5)]. However (19) does not distinguish between initiating and enabling events. s-Importance functions for unit failures in example system #1 are shown in table 6.

TABLE 6
Example System #1 - s-Importance Functions for Unit Failures

System Failure   Event (Unit/Failure type)   s-Importance Function
Dryout           1/fails on                  $\sum_{n=1}^{4} w_{n,d} \big/ \sum_{n=1}^{8} w_{n,d}$
Dryout           2/fails off                 $(w_{3,d} + w_{4,d} + w_{7,d} + w_{8,d}) \big/ \sum_{n=1}^{8} w_{n,d}$
Dryout           3/fails off                 $(w_{2,d} + w_{4,d} + w_{6,d} + w_{8,d}) \big/ \sum_{n=1}^{8} w_{n,d}$
Overflow         1/fails off                 $\sum_{n=5}^{8} w_{n,o} \big/ \sum_{n=1}^{8} w_{n,o}$
Overflow         2/fails on                  $(w_{1,o} + w_{2,o} + w_{5,o} + w_{6,o}) \big/ \sum_{n=1}^{8} w_{n,o}$
Overflow         3/fails on                  $(w_{1,o} + w_{3,o} + w_{5,o} + w_{7,o}) \big/ \sum_{n=1}^{8} w_{n,o}$

The transition matrix (15) contains the arbitrary parameters $J^r$ (by step #2 in section 6.1) and $\Delta t$. Check of model accuracy involves determining the sensitivity of predicted system behavior to the choice of $J^r$, $\Delta t$ (sections 8.2 and 8.3). Eqs (8) and (9) show that $p_{n,j}(t)$ approach $p(n, x, t)$ with increasing $J^r$ and decreasing $\Delta t$.
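For one process variable and a rate that is constant over the source interval (as in Example System #1), the volume integral in (15) divided by the source volume is simply the fraction of the source interval that the step (9) carries into the target interval. The sketch below computes that fraction; the function name `move_fraction`, the cell list and the numbers are assumptions chosen for illustration, not the paper's implementation.

```python
import math


def move_fraction(src, dst, rate, dt):
    """Fraction of interval src carried into interval dst by x' -> x' + rate*dt.

    One-dimensional analogue of (1/v_j') * integral over V_j' of
    e_j(x' + f*dt), assuming the rate f is constant over src.
    """
    lo, hi = src[0] + rate * dt, src[1] + rate * dt  # shifted source interval
    overlap = min(hi, dst[1]) - max(lo, dst[0])
    return max(overlap, 0.0) / (src[1] - src[0])


# Illustrative numbers: control space (-3, 3) with set points -1 and +1,
# plus the two absorbing half-lines for dryout and overflow.
cells = [(-math.inf, -3.0), (-3.0, -1.0), (-1.0, 1.0), (1.0, 3.0), (3.0, math.inf)]
row = [move_fraction((-1.0, 1.0), c, rate=0.01, dt=30.0) for c in cells]
print(row)
```

The fractions over all cells add up to one, which is the geometric part of the statement that the transition elements form a stochastic matrix. If the step `rate * dt` exceeds the width of a neighboring control region, part of the source interval skips that region entirely in one step, so the control action belonging to that region is never applied in the model.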
TABLE 7
Example System #1 - Data For Test Cases A - L

        Set Points (meters)        Flow Rates (meter level change/minute)   Number of $V_j$ in $V^r$   $\Delta t$
Case    $\alpha_1$   $\alpha_2$    $\chi_1$   $\chi_2$   $\chi_3$            $J^1$   $J^2$   $J^3$      (minutes)
A       -1           +1            0.01       0.01       0.01                1       1       1          30
B       -1           +1            0.01       0.01       0.01                1       1       1          60
C       -1           +1            0.01       0.01       0.01                1       1       1          120
D       -1           +1            0.01       0.01       0.01                1       1       1          150
E       -1           +1            0.01       0.01       0.01                1       1       1          240
F       -1           +1            0.01       0.01       0.005               1       1       1          30
G       -1           +1            0.01       0.01       0.01                2       2       2          30
H       -1           +1            0.01       0.01       0.01                4       4       4          30
J       -1           +1            0.01       0.01       0.02                1       1       1          30
K       -0.5         +0.5          0.01       0.01       0.01                1       1       1          30
L       -1.5         +1.5          0.01       0.01       0.01                1       1       1          30

        Unit Failure Rates ($\times 10^{-5}$/minute)      Allowed Liquid Level Range (meters)
Case    $\lambda_1$   $\lambda_2$   $\lambda_3$           $a$    $b$
A-L     5.2           7.6           9.5                   -3     3

Fig. 8. Cdf for Overflow ($F_o(t)$) and Dryout ($F_d(t)$) - Case F

8. NUMERICAL IMPLEMENTATION OF THE METHODOLOGY

The methodology was implemented on Example System #1 for the test cases and data shown in table 7. The objectives in the selection of these test cases were:

1. To compare the predictions by (14) to analytical results
2. To investigate the sensitivity of (16)-(19) to the choice of $J^r$ (state space discretization) and $\Delta t$ (time discretization)
3. To demonstrate that the methodology can be used to quantify the effect of variations in PCS parameters (such as setpoints) on the failure characteristics of the system. Setpoint drift can be a concern in nuclear safety [18].

For all the test cases the liquid level in the tank is initially in the nominal control region:

$$\Pr\{x(0) = x,\ i(0) = S_n\} = \begin{cases} \delta_{n,2}/(\alpha_2 - \alpha_1), & \text{if } \alpha_1 < x < \alpha_2 \\ 0, & \text{otherwise.} \end{cases}$$

8.1 Comparison of the Analytic and Numeric Results

For cases A-D in table 7 the normal operating conditions of example system #1 correspond to $n = 2$, $j = 2$ in $p_{n,j}(t)$. The exact expression for pdf{example system #1 normally operating at $t = k\Delta t$ ($k = 1, 2, \ldots$)} =

$$p_{2,2}(k\Delta t) = \exp[-(\lambda_1 + \lambda_2 + \lambda_3)k\Delta t]. \qquad (20)$$

The results obtained from (14) for cases A-D indicate that (20) is approximated by:

$$p_{2,2}(k\Delta t) = [1 - (\lambda_1 + \lambda_2 + \lambda_3)\Delta t]^k. \qquad (21)$$

Eq. (21) is consistent with the discrete-time nature of (14).

For case F in table 7, the action of unit 3 cannot restore the process variables to the nominal control region (ie, $\alpha_1 < x < \alpha_2$) if unit 1 is failed on when unit 2 fails off. The exact asymptotic values of Cdf for dryout and overflow are:

$$F_d(\infty) = \cdots \qquad (22)$$

$$F_o(\infty) = \lambda_2/(\lambda_1 + \lambda_2). \qquad (23)$$

Figure 8 shows that $F_d(t)$ and $F_o(t)$ as predicted by (16) approach (22) and (23) respectively for $t \to \infty$.

The exact pdf for dryout and overflow for case F are difficult to find since the time lag between a unit failure and system failure depends on the liquid level at the time of unit failure. For $t > (\alpha_1 - a)/\chi_k$ a good approximation to exact pdf for dryout is

$$\sum_{n=1}^{N} w_{n,d}(t) = [1 - \exp(-\lambda_2 t)]\exp(-\lambda_1 t). \qquad (24)$$

Figure 9 compares the prediction by (24) to results obtained from (18) for dryout. The pdf for overflow predicted by (18) is also shown. As anticipated, there is an observable difference between the results of (18) and (24) in the beginning. The difference becomes negligible for t > 150 hours.

8.2 Sensitivity of the Analysis Results to Time Discretization

Figure 10 shows the change in pdf for dryout with $\Delta t$. Eqs (20), (21) and figure 10 imply that, for $\lambda_k \Delta t \ll 1$, predictions by (14) are not appreciably affected by choice of $\Delta t$. The implication is true only if $\Delta t$ allows correct description of system dynamics under modeling assumption #2: For both cases D and E in table 7 $\lambda_3 \Delta t \ll 1$ ($\lambda_3 \Delta t \approx 0.02$). However, in case E, $\Delta t > (\alpha_1 - a)/\chi_1$ so that if:
Fig. 9. pdf for Overflow and Dryout - Case F (axes: pdf (/hour) against time (100 hours))

... respectively). Figure 11 shows the difference in predicted pdf for dryout. The corresponding errors on $T_d$ and $F_d(\infty)$ are 68% and 27%, respectively. Case E predicts a shorter $T_d$ and larger $F_d(\infty)$ as anticipated.

Fig. 11. pdf for Dryout - Cases D and E (axes: pdf (/hour) against time (100 hours))

Fig. 12. Variation of p(x, t) with Partitioning $V_j$ - Cases A, G, H (at t = 200 hours, plotted against liquid level)

8.3 Sensitivity of the Analysis Results to State Space Discretization

The discrete representation of the continuous state-space of the process variables in probabilistic model construction is not unique to the proposed methodology. Some potential impacts of discretized state space on predicted system behavior are discussed in [9].

The sensitivity of the analysis results to discretized state space was investigated through cases A, G and H. The shape of the distribution

$$p(x, t) = \sum_{n=1}^{N} p(n, x, t) = \sum_{n=1}^{N} \sum_{j=1}^{J_3} p_{n,j}(t)\, e_j(x) \qquad (25)$$
was found to be sensitive to the choice of $J^r$ (figure 12). The changes in pdf for overflow and dryout were negligible. However, the results cannot be generalized to other types of PCS (eg, nonlinear systems).

8.4 Variation in System Parameters

Case J in table 7 reflects an increase in $\chi_3$ with respect to case A. Cases K and L represent changes in the system set points. Figures 13 and 14 show that these changes can lead to a redistribution of the Cdf for system failure among the different failure states. Figures 15 and 16 show the corresponding changes in pdf. Table 8 shows the effect of system parameter variations on the s-importance of the event, "unit 1 fails off" to system failure by overflow.

Fig. 13. Variation in Cdf Due to Variation in the Flow Rate Through Unit 3 - Cases A and J (axes: Cdf against time (100 hours))

Fig. 14. Variation in Cdf for Overflow Due to Variation in Set Points - Cases A, K, L (axes: Cdf against time (100 hours))

Fig. 15. Variation in pdf Due to Variation in the Flow Rate Through Unit 3 - Cases A and J (axes: pdf (/hour) against time (100 hours))

Fig. 16. Variation in pdf for Overflow Due to Variation in Set Points - Cases A, K, L (axes: pdf (/hour) against time (100 hours))

TABLE 8
Example System #1 - Effect of PCS Parameter Changes on s-Importance of "Unit 1 Fails Off - System Overflows"

Case    Steady-State s-Importance
A       0.72
F       1.00
J       0.91
K       0.70
L       0.74
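The comparison between (20) and (21) in section 8.1 can be checked numerically with the failure rates of table 7. The sketch below evaluates both expressions at the same elapsed time for two time steps; the function names are hypothetical, and the code only evaluates the two closed-form expressions rather than the full Markov model.

```python
import math

# Unit failure rates of table 7, per minute.
RATES_PER_MIN = (5.2e-5, 7.6e-5, 9.5e-5)
TOTAL_RATE = sum(RATES_PER_MIN)


def exact_survival(k, dt):
    """Eq (20): probability of normal operation at t = k*dt."""
    return math.exp(-TOTAL_RATE * k * dt)


def chain_survival(k, dt):
    """Eq (21): the discrete-time counterpart obtained from the Markov chain."""
    return (1.0 - TOTAL_RATE * dt) ** k


t = 6000.0  # minutes (100 hours)
for dt in (30.0, 240.0):
    k = round(t / dt)
    exact, chain = exact_survival(k, dt), chain_survival(k, dt)
    print(dt, exact, chain, (exact - chain) / exact)
```

Because 1 - y never exceeds exp(-y), the chain value lies below the exact value, and the gap grows with the time step; this is the time-discretization effect that section 8.2 examines for the full model.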
ALDEMIR: COMPUTER-ASSISTED MARKOV FAILURE MODELING OF PROCESS CONTROL SYSTEMS 143

9. DISCUSSION

The results on Example System #1 show that our methodology can be effective for the accurate failure analysis of PCS and is suitable for sensitivity analyses. However, obtaining the givens for the methodology may necessitate using other failure analysis techniques. For example, if the control units in Example System #1 are subsystems with many components, a fault-tree analysis may be necessary to determine the failed control unit states and failure rates.

The assumptions on PCS properties in section 4 indicate the limitations of the methodology regarding system type. Section 9.1 comments on the computational limitations. Section 9.2 addresses common-cause failure events.

9.1 Computational Limitations

The size of the transition matrix defined by (15) is a function of Jr, as well as K and N. For PCS with more than a few process variables and control units, a refined partitioning of V (large Jr) may lead to prohibitive computer storage requirements. Sometimes these requirements can be reduced by merging [14] or sparse matrix [23] techniques.

A refined partitioning of V may not always be necessary for accurate model construction. The steady-state probabilities for dryout and overflow in Example System #1 are not influenced by the x dependence of p(x, t) in (25) (observation #1). Singh & Billinton made an analogous observation (observation #2) when using the method of supplementary variables for the reliability modeling of a bank of transformers [19]. Since our methodology is similar to the method of supplementary variables, the implication of observations #1 and 2 is that correct steady-state failure characteristics can be obtained with minimum Jr (ie, Jr = 1 if Vr are pairwise disjoint intervals). Thus computer-storage requirements can be reduced without compromising model accuracy. Further research with nonlinear systems is needed to confirm this implication.

Computer storage requirements can also be reduced by choosing Δt small enough that only the transitions between neighboring Vj are possible. Consider case H in table 7. The transition matrix has 12544 elements. If Δt < 25 minutes (instead of 30 minutes as in table 7), the data for case H and table 2 imply that Pr{x(t) ∈ Vj' → x(t + Δt) ∈ Vj} = 0 for j > j' + 1. Thus the transition matrix is block tridiagonal and 9856 elements of the transition matrix are zero. Such a Δt also assures the correct description of system dynamics under modeling assumption #2. A drawback of choosing a small Δt to reduce storage requirements is that, for highly reliable control units, using (14) to determine the steady-state system behavior leads to long computation times. However, the steady-state failure characteristics can sometimes be determined directly by

* expressing the k-step transition probabilities in terms of the eigenvalues and eigenvectors of the transition matrix (15)
* Z-transform [24] or geometric transform [11] of (14).

9.2 Common-Cause Failure

As indicated in section 4, assumption #4 on system properties implies that common-cause failure events are not under consideration. Eq (12) follows from assumption #4. However, the methodology can model common-cause failures given:

* Type and frequency of common-cause events
* Control unit states following common-cause events
* Transition rates from operational to failed unit states when there is no common-cause failure.

Example System #3. Suppose the control units 1-3 in Example System #1 share a common level sensor.

Given

1. Sensor can fail high or low with frequency η1 and η2, respectively
2. If sensor fails high, the states of units 1, 2, 3 are on, off, off respectively (i = S4 by table 2)
3. If sensor fails low, the states of units 1, 2, 3 are off, on, on respectively (i = S5 by table 2)
4. When the sensor is good, the transition rates from operational to failed states of units 1, 2, 3 are λ1, λ2, λ3 respectively.

Then -

$$h(n \mid n', j' \to j, \Delta t) = \tilde{h}(n \mid n', j' \to j, \Delta t)\,[1 - (\eta_1 + \eta_2)\Delta t] + \Delta t\,(\delta_{4,n}\,\eta_1 + \delta_{5,n}\,\eta_2)$$

The $\tilde{h}(n \mid n', j' \to j, \Delta t)$ are determined by (12), (13) and given #4 for Example System #3.

Notation

η1, η2    frequency of sensor failing high or low, respectively
$\tilde{h}(n \mid n', j' \to j, \Delta t)$    Pr{i(t + Δt) = Sn | i(t) = Sn', x(t) ∈ Vj' → x(t + Δt) ∈ Vj, common-cause event has not occurred}

APPENDIX

Derivations

Derivation of (8)

By definition of Vj and Vγ, (7) can be written as

$$p_{n,j}(t + \Delta t) = \sum_{n'=1}^{N} \sum_{j'=1}^{J_R} \int_{V_j} dv \int_{V_{j'}} dv'\, q(n, x \mid n', x', \Delta t)\, p(n', x', t) + \sum_{n'=1}^{N} \sum_{\gamma=J_R+1}^{J_R+r} \int_{V_j} dv \int_{V_\gamma} dv'\, q(n, x \mid n', x', \Delta t)\, p(n', x', t). \quad (A.1)$$

Since x' ∈ Vγ are absorbing states:

$$\int_{V_j} dv\, q(n, x \mid n', x', \Delta t) = e_j(x')\, \delta_{n,n'}, \qquad x' \in V_\gamma \quad (A.2)$$

Let p(n, x, t) be constant within Vj (modeling assumption #1). Then by definition of $p_{n,j}(t)$ -

$$p(n, x, t) = \sum_{j=1}^{J_R} \frac{p_{n,j}(t)}{v_j}\, e_j(x) \quad (A.3)$$

Substituting (A.2) and (A.3) in (A.1) yields (8).

Derivation of (11)

By (1), (5), and definition of Jr

$$h(n \mid n', x' \to x, \Delta t) = \sum_{j=1}^{J_R} \sum_{j'=1}^{J_R} h(n \mid n', j' \to j, \Delta t)\, e_j(x)\, e_{j'}(x'). \quad (A.4)$$

Thus (10) and (A.4) imply -

$$\int_{V_j} dv \int_{V_{j'}} dv'\, q(n, x \mid n', x', \Delta t) = h(n \mid n', j' \to j, \Delta t) \int_{V_{j'}} dv' \int_{V_j} dv\, g(x \mid n', x', \Delta t), \qquad j = 1, \ldots, J_R \quad (A.5)$$

By assumption #4 on system properties

$$h(n \mid n', j' \to j, \Delta t) = \prod_{k=1}^{K} c_k(n_k \mid n', j' \to j, \Delta t). \quad (A.6)$$

By definition of g

$$\int_{V_j} dv\, g(x \mid n', x', \Delta t) = \Pr\{x(t + \Delta t) \in V_j \mid i(t) = S_{n'},\ x(t) = x'\} \quad (A.7)$$

By (9)

$$\Pr\{x(t + \Delta t) \in V_j \mid i(t) = S_{n'},\ x(t) = x'\} = e_j\big(x' + f_{n'}(x')\,\Delta t\big). \quad (A.8)$$

Eq (A.6) is (12) in section 6.2. Substituting (A.7) and (A.8) into (A.5) yields (11).

REFERENCES

[1] S. A. Lapp, G. J. Powers, "Computer aided synthesis of fault trees", IEEE Trans. Reliability, vol R-26, 1977 Apr, pp 2-13.
[2] S. A. Lapp, G. J. Powers, "Update of Lapp-Powers fault tree synthesis algorithm", IEEE Trans. Reliability, vol R-28, 1979 Apr, pp 2-5.
[3] H. Kumamoto, E. J. Henley, K. Inoue, "Signal-flow-based graphs for failure-mode analysis of systems with control loops", IEEE Trans. Reliability, vol R-30, 1981 Jun, pp 110-116.
[4] J. R. Taylor, "An algorithm for fault-tree construction", IEEE Trans. Reliability, vol R-31, 1982 Jun, pp 137-145.
[5] H. Kumamoto, E. J. Henley, "Safety and reliability synthesis of systems with control loops", AIChE Journal, vol 25, 1979 Jun, pp
[6] E. J. Henley, H. Kumamoto, "Comment on: Computer aided synthesis of fault trees", IEEE Trans. Reliability, vol R-26, 1977 Dec, pp 316-317.
[7] M. O. Locks, "Synthesis of fault trees: An example of noncoherence", IEEE Trans. Reliability, vol R-28, 1979 Apr, pp 12-15.
[8] P. K. Andow, "Difficulties in fault-tree synthesis for process plants", IEEE Trans. Reliability, vol R-27, 1980 Apr, pp 1-9.
[9] P. K. Andow, "Fault trees and failure analyses: Discrete-state representation problems", TransIChemE, vol 59, 1981 Jan, pp 1-9.
[10] M. Galluzo, P. K. Andow, "Failures in control systems", Reliability Engineering, vol 7, 1984 Feb, pp 125-128.
[11] R. A. Howard, Dynamic Probabilistic Systems, Volume I: Markov Models, John Wiley & Sons, 1971.
[12] M. L. Shooman, Probabilistic Reliability: An Engineering Approach, McGraw Hill, 1968.
[13] S. K. Srinivasan, K. M. Meheta, Stochastic Processes, McGraw Hill, 1978.
[14] I. A. Papazoglu, E. P. Gyftopulos, "Markov processes for reliability analysis of large systems", IEEE Trans. Reliability, vol R-26, 1977 Aug, pp 232-237.
[15] A. Renyi, Probability Theory, North-Holland Publishing Co., 1970.
[16] M. F. Neuts, Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach, The Johns Hopkins University Press, 1981.
[17] V. Amoia, G. De Micheli, M. Santomauro, "Computer-oriented formulation of transition-rate matrices via Kronecker algebra", IEEE Trans. Reliability, vol R-30, 1981 Jun, pp 123-132.
[18] D. D. Sharma, D. W. Miller, "A risk-cost benefit approach to establishing threshold values for critical nuclear plant safety parameters", Trans. Am. Nucl. Soc., vol 44, 1983 Nov, pp 361-363.
[19] C. Singh, R. Billinton, "Reliability modeling in systems with non-exponential time distributions", IEEE Trans. Power App. Sys., vol PAS-92, 1973 Apr, pp 790-800.
[20] E. J. Henley, H. Kumamoto, Designing for Reliability and Safety Control, Prentice-Hall, 1985.
[21] N. J. McCormick, Reliability and Risk Analysis, Academic Press, 1981.
[22] C. Dinglinson, H. Lambert, "Interval reliability for initiating and enabling events", IEEE Trans. Reliability, vol R-32, 1983 Jun, pp 150-163.
[23] R. P. Tewarson, Sparse Matrices, Academic Press, 1973.
[24] B. C. Kuo, Analysis and Synthesis of Sampled Data Systems, Prentice Hall, 1963.
[25] N. B. Haaser, J. A. Sullivan, Real Analysis, Van Nostrand Reinhold Co., 1971.

AUTHOR

Tunc Aldemir; 206 West 18th Avenue; Columbus, Ohio 43210 USA.

Tunc Aldemir is Assistant Professor of Mechanical and Nuclear Engineering at the Ohio State University. He received his PhD in Nuclear Engineering from the University of Illinois in 1978. T. Aldemir worked at the Cekmece Nuclear Research and Training Center, Istanbul, Turkey between 1978-1983. His research interests are nuclear reactor safety, probabilistic system analysis, and distributed parameter system optimization.

Manuscript TR85-066 received 1985 July 15; revised 1986 June 19.
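
Illustrative note on section 9.1. The remark that stepping with (14) is slow for highly reliable units, while an eigenvalue expression gives the k-step result directly, can be sketched on a toy chain. This is only a sketch under stated assumptions: it assumes (14) has the form p(t + Δt) = Q p(t), and it uses a hypothetical two-state unit (operational, failed and absorbing) with per-step failure probability `a`. The function names and the value of `a` are illustrative and are not taken from the paper.

```python
import math


def iterate_chain(a, k):
    """Apply p <- Q p a total of k times, with Q = [[1 - a, 0], [a, 1]]
    (columns sum to 1) and initial state p(0) = (1, 0)."""
    p_ok, p_failed = 1.0, 0.0
    for _ in range(k):
        # The right-hand side is evaluated with the old p_ok in both terms.
        p_ok, p_failed = (1.0 - a) * p_ok, a * p_ok + p_failed
    return p_ok, p_failed


def closed_form(a, k):
    """k-step result from the eigen-decomposition of Q.

    Q has eigenvalues (1 - a) and 1 with eigenvectors (1, -1) and (0, 1).
    Since p(0) = (1, 0) is the sum of the two eigenvectors,
    p(k) = (1 - a)**k * (1, -1) + (0, 1).
    """
    decay = (1.0 - a) ** k
    return decay, 1.0 - decay


def steps_to_reach(a, target):
    """Smallest k for which the failed-state probability is at least target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - a))


if __name__ == "__main__":
    a = 1.0e-3  # hypothetical failure probability per time step
    print(iterate_chain(a, 1000))
    print(closed_form(a, 1000))
    print(steps_to_reach(a, 0.99))
```

The eigenvector for eigenvalue 1 is the steady state, so it can be read off without iterating; the number of steps needed by the iteration grows roughly as 1/a, which is the long computation time mentioned in section 9.1.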

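
Illustrative note on section 9.2. The common-cause relation for Example System #3 scales the no-common-cause probabilities by 1 - (η1 + η2)Δt and adds η1Δt to state S4 and η2Δt to state S5. A minimal sketch follows, assuming the no-common-cause probabilities for one fixed (n', j' → j) are already available as a dictionary over n. The function name, the numerical values, and the dictionary layout are hypothetical; only the state indices 4 and 5 come from the example.

```python
import math


def with_common_cause(h_no_cc, eta_high, eta_low, dt, high_state=4, low_state=5):
    """Apply the common-cause correction of Example System #3.

    h_no_cc maps the control-unit state index n to the probability of
    reaching S_n in one step when no common-cause event occurs.
    """
    p_cc = (eta_high + eta_low) * dt  # probability of a sensor failure in dt
    if not 0.0 <= p_cc <= 1.0:
        raise ValueError("dt is too large for the first-order approximation")
    # Scale every entry by the probability that no common-cause event occurs.
    h = {n: p * (1.0 - p_cc) for n, p in h_no_cc.items()}
    # Sensor fails high -> state S4; sensor fails low -> state S5.
    h[high_state] = h.get(high_state, 0.0) + eta_high * dt
    h[low_state] = h.get(low_state, 0.0) + eta_low * dt
    return h


if __name__ == "__main__":
    # Hypothetical one-step probabilities (they sum to 1) and sensor rates.
    h0 = {1: 0.97, 2: 0.01, 3: 0.01, 4: 0.005, 5: 0.005}
    h = with_common_cause(h0, eta_high=1.0e-4, eta_low=2.0e-4, dt=0.5)
    print(h)
    print(math.fsum(h.values()))
```

If the input probabilities sum to one over n, the corrected ones do as well, because the mass removed by the scaling factor, (η1 + η2)Δt, is exactly what is added back to S4 and S5.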