Reliability Engineering and System Safety: Andre Kleyner, Vitali Volovoi

ARTICLE IN PRESS
Reliability Engineering and System Safety 95 (2010) 606–613
Application of Petri nets to reliability prediction of occupant safety systems with partial detection and repair
Andre Kleyner a,*, Vitali Volovoi b
a Delphi Corporation, Electronics and Safety Division, P.O. Box 9005, M.S. CTC 2E, Kokomo, IN 46904, USA
b School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Article info
Article history:
Received 17 July 2009
Received in revised form 15 January 2010
Accepted 19 January 2010
Available online 29 January 2010
Keywords: Safety critical; Failure on demand; Occupant safety; Petri nets; System availability; Fault detection; Airbag; IEC 61508; ISO 26262
Abstract
This paper presents an application of stochastic Petri nets (SPN) to calculate the availability of safety-critical on-demand systems. Traditional methods of estimating system reliability include standards-based or field return-based reliability prediction methods. These methods do not take into account the effect of fault-detection capability and penalize the addition of detection circuitry due to the higher parts count. Therefore, calculating system availability, which can be linked to the system's probability of failure on demand (Pfd), can be a better alternative to reliability prediction. The process of estimating the Pfd of a safety system can be further complicated by the presence of system imperfections such as partial fault detection by users and untimely or uncompleted repairs. Additionally, most system failures cannot be represented by a Poisson process, and Markov chain methods, which are commonly used to estimate Pfd, are not well suited to the analysis of non-Poisson failures. This paper suggests a methodology and presents a case study of SPN modeling that adequately handles most of the above problems. The model is illustrated with a case study of an automotive electronics airbag controller as an example of a safety-critical on-demand system.
© 2010 Elsevier Ltd. All rights reserved.
1. Introduction
1.1. Reliability of safety-critical systems
Reliability of safety-critical systems receives special attention in many industries, including automotive, aviation, energy and chemical. Examples of safety-critical systems include emergency power generators, fire alarms and occupant safety systems such as airbags, seatbelt pretensioners and knee bolsters. In many cases product specifications include the expected reliability or some other value reflecting the probability that the system will be operational when required. In some instances reliability numbers are legislated. For example, the International Electrotechnical Commission established an international standard, IEC 61508 [1], that applies to almost all electrical/electronic/programmable electronic safety-related systems; the automotive industry standard ISO 26262 is associated with it. In the case of occupant safety systems such as airbags, reliability numbers are specified by the automotive OEMs and are treated as safety critical.
* Corresponding author.
E-mail addresses: andre.v.kleyner@delphi.com (A. Kleyner), vitali.volovoi@ae.gatech.edu (V. Volovoi).
0951-8320/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.ress.2010.01.008
The ability to estimate the reliability of a safety system is critical to successful system design, and there are various ways to approach this problem. Reliability prediction is one of the most common forms of reliability analysis for calculating failure rate and mean time between failures (MTBF). When actual product reliability data is not available, standards-based reliability predictions [2] may be used to evaluate design feasibility, compare design alternatives, identify potential failure areas, trade off system design factors and track reliability improvements [3]. However, reliability prediction of a safety system working in on-demand mode might require an approach involving the probability of the system being available on demand (1 − Pfd), where Pfd is the probability of failure on demand. A good example of an on-demand emergency system is an occupant safety system such as an airbag controller, which includes functions such as crash sensing and generating the signal to deploy an airbag. Depending on the design, an airbag controller may have a number of firing loops ranging from 2 to 24 or more (see, for example, [4]). Automotive occupant safety systems have been evolving for the past 20 years to reduce the number of automobile injuries and deaths. Initially, individual passive devices and features such as seat belts, airbags and knee bolsters were developed to help save lives and minimize injuries when an accident occurred. Today, heightened industry and consumer safety initiatives and increased government regulations strive to
provide increased protection to vehicle occupants under any condition. To account for fault detection and the consequent repair of the system, the availability, or (1 − Pfd), should be evaluated instead of simple reliability. In addition, reliability requirements in product specifications for safety-related systems are typically very high (0.9999 and higher), which corresponds to the SIL 3–4 categories of IEC 61508 [1] or ASIL C–D of ISO 26262. Therefore, traditional reliability demonstration testing would be cost prohibitive because of the extremely large number of test samples required to demonstrate such numbers [5]. The only reasonable option to meet the specification would be to conduct comprehensive modeling of the system availability. To this end, the use of stochastic Petri nets (SPN) can be suggested, as described next.
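To give a feel for the sample sizes involved, the following sketch uses the zero-failure (success-run) binomial relation n = ln(1 − C)/ln(R). This relation is a common textbook formula and is an assumption here, not a method taken from this paper. The confidence level C = 0.90 and the function name are likewise illustrative.

```python
import math

def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Units that must all pass a zero-failure test to demonstrate
    `reliability` at `confidence` (success-run relation, assumed here)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

# Reliability targets quoted in the text; the confidence level is an assumption.
for target in (0.9999, 0.999999):
    n = success_run_sample_size(target, confidence=0.90)
    print(f"R = {target}: about {n} units tested with zero failures")
```

Under these assumptions, even the lower target calls for tens of thousands of test units, which is consistent with the argument that demonstration testing is impractical at these reliability levels.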
1.2. Reliability modeling using stochastic Petri nets
A graphical framework called Petri nets was introduced by C.A. Petri [6]. This framework focuses on modeling the states of the components that comprise the system, so that the state of the system can be inferred from the states of its components. Possible states (called places) are denoted by circles, and objects called tokens (denoted by small filled circles) each occupy one of the places at a time. The combined position of all the tokens in a Petri net is referred to as its marking. Possible paths of token movement among places are modeled using so-called transitions, depicted as filled rectangles. Movements of the tokens correspond to the firing of transitions, where tokens are removed from all input places of the transition and deposited in all of its output places. Importantly, a transition can fire only when it is enabled, i.e., when certain conditions are satisfied. For example, an inhibitor arc that connects a place anywhere in the model to a transition (the arc is depicted with a hollow circle at the transition's end) can disable the transition if there is a sufficient number of tokens in the place (this place does not have to be an input or output place for this transition). The original Petri net did not include the concept of time, so enabled transitions fire immediately. Such Petri nets can be particularly useful in safety assessments, as formal methods are available to analyze the so-called reachability of undesirable (unsafe) states and to identify non-trivial scenarios that can lead to unsafe states [7]. These scenarios are of great importance for safety and reliability, as they are analogous to cut sets in fault trees in a dynamic context where the order of events is taken into account. However, the likelihood of those scenarios cannot be quantitatively evaluated without an explicit account of timing.
To this end, an extension called stochastic Petri nets (SPN) was developed some years later [8]. It is a subset of so-called non-autonomous Petri nets [9] and is of particular relevance to the modeling of time-dependent system reliability (see, for example, [10,11]). SPN introduces delays between the enabling and the firing of a transition. These delays are attributes of the transition and can be absent, deterministic, or sampled from a given distribution (stochastic). Using exponential distributions for the firing delays yields a model equivalent to the Markov representation. SPN is often used as a modeling preprocessor: the model is internally converted to a Markov state space and solved using standard Markov methods [12]. However, a discrete-event (e.g., Monte Carlo) simulation can be used to solve the SPN directly [13] instead of the Markov method, which allows the use of non-exponential statistical distributions. Depending on the configuration of the system, the error introduced by replacing a non-exponential distribution with an exponential delay of the same mean value can be quite significant [14,15].
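The size of this error can be illustrated with a single delay. The sketch below compares the probability that a Weibull-distributed delay has elapsed early in life with the same probability for an exponential delay of equal mean. The mean value, the shape parameter and the evaluation time are illustrative assumptions, not values from this paper.

```python
import math

def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """Probability that a Weibull-distributed delay has elapsed by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))

mean_delay = 10.0                                   # illustrative value (assumption)
shape = 3.0                                         # wear-out type delay (assumption)
scale = mean_delay / math.gamma(1.0 + 1.0 / shape)  # Weibull scale giving that mean

t = 0.2 * mean_delay                                # a point early in life
p_weibull = weibull_cdf(t, shape, scale)
p_exponential = weibull_cdf(t, 1.0, mean_delay)     # shape 1 is the exponential case

print(f"Weibull (shape 3): {p_weibull:.4f}")
print(f"Exponential:       {p_exponential:.4f}")
```

With these assumed values the exponential substitute overstates the early firing probability by more than an order of magnitude, although both delays have the same mean.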
If, as in colored Petri nets [16], tokens can have unique identities (labels), an alternative interpretation of firing facilitates the preservation of information about the system's past states: rather than treating the removal of a token from the transition's input place and the deposit of a different token in the output place as two disjoint actions, one can unite them into a single action of moving the same token from the input place to the output place. Memory can be assigned to tokens, which results in aging tokens [17]. Such tokens can move freely throughout the Petri net without losing their memory. While the proliferation of a great variety of versions and modeling styles in SPN modeling can be construed as a testimony to its popularity and flexibility, it also creates confusion among reliability practitioners who are used to relatively rigid and standardized frameworks such as reliability block diagrams and fault trees. In this context, clarity and simplicity of modeling are of great importance [18], and the reader is invited to compare the models presented in this paper with previously published models that address a similar application [19].
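The enabling and firing rules described in this section, including the inhibitor arc, can be sketched in a few lines. This is a minimal untimed illustration, not the tool used by the authors. The class, function and place names are hypothetical and chosen only to resemble the detection and repair setting discussed later.

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    name: str
    inputs: dict                                     # place -> tokens consumed
    outputs: dict                                    # place -> tokens produced
    inhibitors: dict = field(default_factory=dict)   # place -> blocking token count

def is_enabled(marking: dict, tr: Transition) -> bool:
    """Enabled if every input place holds enough tokens and no inhibitor
    place holds its blocking number of tokens."""
    has_inputs = all(marking.get(p, 0) >= n for p, n in tr.inputs.items())
    blocked = any(marking.get(p, 0) >= n for p, n in tr.inhibitors.items())
    return has_inputs and not blocked

def fire(marking: dict, tr: Transition) -> dict:
    """Return the new marking after firing an enabled transition."""
    if not is_enabled(marking, tr):
        raise ValueError(f"transition {tr.name!r} is not enabled")
    new = dict(marking)
    for p, n in tr.inputs.items():
        new[p] = new.get(p, 0) - n
    for p, n in tr.outputs.items():
        new[p] = new.get(p, 0) + n
    return new

# Hypothetical two-token net: repair is blocked once detection has failed.
marking = {"system_failed": 1, "det_ok": 1, "det_failed": 0}
repair = Transition("repair", {"system_failed": 1}, {"system_ok": 1},
                    inhibitors={"det_failed": 1})
det_failure = Transition("det_failure", {"det_ok": 1}, {"det_failed": 1})

print(is_enabled(marking, repair))   # detection token still in det_ok
after = fire(marking, det_failure)
print(is_enabled(after, repair))     # inhibitor place now holds a token
```

The marking is kept as a plain dictionary so that the state of the whole net is simply the combined position of its tokens, as in the description above.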
2. Model formulation
This section presents several modeling scenarios for the reliability/availability of an automotive occupant safety system modeled as an on-demand emergency system.
2.1. Modeling procedure
The function of the automotive occupant safety system is to provide an emergency function (e.g., airbag deployment) in response to an event such as a vehicle crash. The simplified version of an emergency on-demand system in Fig. 1 consists of the fault-detection system, power supply and user warning system, which can be as simple as a warning light. The system's failure to perform its function (i.e., to deploy an airbag in the case of a vehicle crash) can occur when an emergency system failure is combined with one of the following conditions:
1. The system failure is not detected by the fault-detection system.
2. The system detected the problem but failed to notify the user.
3. The system notified the user, but the user failed to take reparative action.
4. The repair was scheduled or initiated but was not completed before the vehicle crash.
Fig. 1. Simplified diagram of an automotive occupant safety system with fault detection.
Condition 4 may occur because it takes a certain amount of time for the system to be repaired. In exponential form this feature is expressed through the repair rate μ, the reciprocal of the mean time to repair (MTTR) [5]. It is important to note that most vehicles remain in use for extended time periods, often exceeding 10–15 years. It is also noteworthy that the human factor is involved in the key decision-making and repair processes. Therefore, the modeling of the safety system is further complicated by the following factors, classified as imperfections:
1. Detectability of the system failure is less than 100%.
2. The emergency system power supply (e.g., vehicle battery) may fail during the crash.
3. The warning light can go unnoticed after the fault is detected (the human factor).
4. Repairs may not be initiated due to financial, timing, or other considerations. For example, vehicle age and market value become important factors in the repair decision: the percentage of repaired vehicles diminishes as vehicle age increases and market value decreases.
Accounting for the factors above makes the modeling more challenging, but also more realistic.
2.2. System availability
If no repair of the safety system is considered, the reliability of the system at the end of the design life is R(T). When system failures follow an exponential distribution, they can be represented by a constant failure rate λ. If all failures are detected and repaired with a mean time to repair MTTR = 1/μ, where μ is the repair rate [20], the unavailability (the probability that the system will not work on demand) can be well approximated by the steady-state solution, providing the following estimate:
$$P_{fd}^{s} = \frac{\lambda}{\lambda + \mu} \qquad (1)$$
where $P_{fd}^{s}$ is the probability of failure on demand (fd) given by the steady-state solution (s). However, taking into account the considerations listed in Section 2.1, Eq. (1) might represent the system unavailability inaccurately. Because both detection and repairs take place less than 100% of the time, the real time-dependent probability of failure on demand $P_{fd}(t)$ will lie somewhere between this lower bound and the unreliability at the end of its life of a system that never undergoes repairs:
$$P_{fd}^{s} \le P_{fd}(t) \le P_{fd}^{nr}(t) = 1 - R(t) \qquad (2)$$
where $P_{fd}^{nr}$ is the probability of failure on demand for a system that does not undergo repairs and R(t) is the reliability of the system under the no-repair condition (the conventional reliability function). Importantly, the bounds in (2) are quite wide, which motivates a more refined analysis. Due to the factors listed as imperfections in Section 2.1, a certain percentage of the vehicle population will not be repaired after the fault has been detected. In the simplest, static scenario, the total population of systems can be separated into two subpopulations based on whether a detected failure will be repaired or not. Let us define θ as the fraction of the vehicle population subject to repair. This fraction includes the drivers who respond to the warning light, as opposed to those who ignore or do not notice it. Next, we can consider the combined effects of failure of the detection system and the presence of a subpopulation that does not notice the warning or fails to act upon it. Consequently, a fraction (1 − θ) of systems will not be repaired after fault detection. Therefore, the overall probability of failure on demand under a perfect-detection scenario, $P_{fd}^{pd}$, is easily calculated as
$$P_{fd}^{pd}(t) = (1 - \theta)\,[1 - R(t)] + \theta\,P_{fd}^{s}(t) \qquad (3)$$
where $P_{fd}^{pd}$ is the probability of failure on demand when all failures are detected (perfect detection) and (1 − θ) is the portion of the population that would not repair the failed system for economic reasons or because of a failure to notice the warning light. It is important to note that in certain cases the function R(t) can be represented by a mixture of statistical distributions to reflect a change in failure rate, for example in accordance with the bathtub curve (see, for example, [21]). In those cases R(t) should be addressed accordingly in the modeling process.
2.3. Dynamic modeling
Rather than separating the whole population into two subgroups, let us assume that the decision whether to repair a detected failure is made every time the warning signal appears, with probabilities θ and 1 − θ, respectively. This decision is considered independent of previous repair decisions for this system (e.g., a system that was repaired the first time might not be repaired the second time). In addition, for the moment we assume that θ does not depend on time (this assumption will be relaxed later). While the difference is subtle, it results in a need for dynamic (i.e., state-space based) reliability modeling. To this end, Markov analysis is widely used in modeling electronics reliability [12], but it has two well-recognized deficiencies. The first deficiency is related to the large number of possible system states (on the order of $k^n$, where k is the number of possible states for each component and n is the number of such components) that are needed to represent all possible permutations. Although this issue can be mitigated by the use of symmetry and hierarchical (nested) calculations, it remains an important limitation. The second limitation is the natural use of constant transition rates (following the exponential distribution) due to the Markovian property. To illustrate the dynamic solution, let us present the system described in Section 2.2 as a state-space solution (Fig. 2), i.e., a Markov chain. Initially the system is in state A, which corresponds to a fully operational safety system with the detection system functioning as intended. Transitioning from state A to state B indicates that
Fig. 2. Markov chain for a system with imperfect detection (θ: portion of the vehicles subject to repair).
the main function has failed with the corresponding transition rate λ, but the detection system is still operational, and hence the driver receives a warning. On the other hand, if the detection subsystem fails first (with the corresponding transition rate ν), the system transitions to state D (note that the detection subsystem is considered to be non-repairable; this assumption can be relaxed, e.g., if periodic inspections are introduced). The transition from D to E corresponds to the failure of the main function of the safety system after the detection system has failed, so this failure cannot be detected and therefore is not repaired; hence there is no reverse transition from E to D. Once the system transitions into state B, a decision is made whether to repair the system or not, with probabilities θ and 1 − θ, respectively. This decision is modeled using a fictitious transition rate c that is very large (the specific value of c is immaterial as long as it is several orders of magnitude larger than the other transition rates in the model). Assigning a transition rate cθ from state B to state C (ready to repair) and a rate c(1 − θ) from state B to state E (non-repairable system failure) ensures that B is a vanishing (transitional) state and that the choice between repair (state C) and non-repair (state E) occurs with the desired probabilities.
Stochastic Petri nets are capable of addressing the main shortcomings of Markov chains. As mentioned in the Introduction, SPN focuses on modeling the states of the components that comprise the system, so that the state of the system can be inferred from the states of its components rather than defined explicitly, as required by a Markov state space. Places in SPN are similar to Markov states, but SPN tokens can represent individual components of the system and therefore allow differentiation among the state spaces for those components.
As a result, the marking (i.e., the combined position of all the tokens) provides a means to describe the system as a whole implicitly, without the need to explicitly depict the corresponding system state, thus potentially mitigating the state-space explosion. Effective system modeling using SPN involves decomposing the system into a set of relevant entities, where an entity does not necessarily represent a physical component of the system but may describe a phase of operation or an environmental condition. Fig. 3 provides the SPN model of the system that Fig. 2 represents as a Markov state space. The top two places describe the two possible states of the failure detection subsystem, while the bottom part describes the possible states of the main subsystem. Exponential delays with parameters λ, μ, and ν are used for the transitions (System failure, Repair, and Det failure, respectively). If the detection system is operating normally (the token is in the Det system OK place) when the main system fails (the token moves to the System failed place), then two transitions are enabled at the same time (to Ready to repair and No repair). Just as in the Markov model (see Fig. 2), the decision is modeled by assigning those two transitions exponential rates cθ and c(1 − θ), respectively. However, when the detection system fails first, the corresponding token moves to Det Sys Failed, and the inhibitor originating in that place prevents the transition of the system token to the Ready to repair place (the transition becomes disabled).
2.4. Modeling time-dependent parameters of the problem related to vehicle aging
In the real world, the owner's repair priorities often change with vehicle age. With declining vehicle market value, the number of repairs that owners consider non-essential increases. Since the cost of repair remains virtually the same while the vehicle value declines, the number of owners who choose to ignore warning lights steadily increases with vehicle age when the problem is considered non-critical to vehicle performance. To model that phenomenon, we introduce the renewal attrition function as the ratio of the number of repairs to the number of failures:
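$$r(t) = \frac{\#\,\text{Parts Repaired}(t)}{\#\,\text{Parts Failed}(t)} \qquad (4)$$
A typical renewal attrition function has the shape presented in Fig. 4, where $T_{Life}$ is the expected vehicle life (e.g., 10 years, 15 years, etc.) and $T_W$ is the warranty term duration (e.g., 3 years, 5 years, etc.). The assumption is made that while the vehicle is under warranty all the required repairs will be performed:
$$r(t) = \begin{cases} 1.0 & \text{when } t \le T_W \\ f(t) & \text{when } T_W < t \le T_{Life} \end{cases} \qquad (5)$$
The following conditions apply: $0 \le f(t) \le 1$ and $f(T_W) = 1$. Therefore, once past the warranty term $T_W$, the fraction of the repaired population θ will be further reduced by the diminishing function r(t), and (3) will take the form
$$P_{fd}^{pd}(t) = (1-\theta)\,[1 - R_{nr}(t)] + r(t)\,\theta\,P_{fd}^{s}(t) + [1 - r(t)]\,\theta\,[1 - R_{nr}(t)] \qquad (6)$$
where $R_{nr}(t)$ denotes the no-repair reliability function R(t) introduced with (2), and consequently
$$P_{fd}^{pd}(t) = \theta\,r(t)\,P_{fd}^{s}(t) + [1 - \theta\,r(t)]\,[1 - R_{nr}(t)] \qquad (7)$$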
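As a numerical illustration of Eqs. (5) and (7), the sketch below evaluates the static estimate at a few vehicle ages. It borrows the exponential attrition form f(t) = A e^(−Bt) and the parameter values (A, B, the warranty term, the failure rate, the repair rate and the repaired fraction) from the case study in Section 3. The function names are hypothetical, exponential failures are assumed for the no-repair reliability, and the steady-state value of Eq. (1) is used for the steady-state term as a simplifying assumption.

```python
import math

LAMBDA = 3.42e-3      # failure rate per year (case study, Section 3)
MU = 26.07143         # repair rate per year (case study, Section 3)
THETA = 0.98          # fraction of detected failures repaired (case study)
A, B = 1.0942, 0.03   # attrition parameters of the case study
T_W = 3.0             # warranty term in years (case study)

def attrition(t: float) -> float:
    """Renewal attrition function r(t), Eq. (5), with f(t) = A*exp(-B*t)."""
    return 1.0 if t <= T_W else A * math.exp(-B * t)

def pfd_static(t: float) -> float:
    """Static perfect-detection estimate of Eq. (7)."""
    p_steady = LAMBDA / (LAMBDA + MU)            # Eq. (1)
    unreliability = 1.0 - math.exp(-LAMBDA * t)  # 1 - R_nr(t), exponential failures
    repaired = THETA * attrition(t)              # fraction actually repaired at age t
    return repaired * p_steady + (1.0 - repaired) * unreliability

for age in (3, 5, 10, 15):
    print(f"t = {age:2d} years: r(t) = {attrition(age):.3f}, Pfd = {pfd_static(age):.2e}")
```

With A and B as given, f(t) is very close to 1 at the end of the warranty term, so r(t) is practically continuous there, in line with the condition stated after Eq. (5).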
SPN provides a flexible tool to model even minute nuances of system behavior. To demonstrate this flexibility, let us contrast the dynamic model in Fig. 3 with a model that represents a static choice (see Fig. 5). At the beginning of the simulation the token from the place Choice can move either to the No repair or to the Repair
Fig. 3. SPN model for a system with imperfect detection and with a portion of the vehicles subject to repair.
Fig. 4. Renewal attrition function.
Fig. 5. SPN model describing the selection of detected failures for repair that takes place in a static manner (two sub-populations of drivers with regard to addressing the warning light).
place with the specified probabilities. If this token moves to the No repair place, then the corresponding inhibitor precludes the repair, even if the detection system operates properly (the token in the place Det system OK does not move). The results of the simulation using this SPN model are consistent with the analytical Eq. (7). Finally, to demonstrate a more realistic SPN model, we incorporate both attrition and demand (Fig. 6), where the choice whether to repair takes place dynamically. Here the token in the place Warranty (the top portion of the model) represents the attrition: the Warranty ends transition is simply a fixed delay $T_W$. Note that if either of the top two tokens in Fig. 6 has moved into Repair Inhibited by the time the system fails and the system token moves from System OK to System failed, then the inhibitor from Repair Inhibited prevents the system token from moving to the Ready to repair place. The demand for system operation is modeled at the bottom of Fig. 6. Here, the Demand timing transition can simply be assigned a uniform distribution, which corresponds to time averaging of the probability of failure on demand (accident), or any other statistical distribution appropriate for the task. When the token moves to the Demand place, there are two fixed transitions with durations ε (an arbitrarily small number) and 2ε. If both transitions are enabled, the former fires first and the token moves to the Success place. On the other hand, if the first transition is disabled, the token moves to the Failure place. This transition should be disabled as long as the system token is not in the System OK place, and we could use three inhibitors starting at the three other places where this token can be (System failed, Ready to repair, and No repair). However, there is a more compact and direct way to model this situation: a negative inhibitor (enabler), which acts in the opposite way to a regular inhibitor (and so is denoted with a negative number). More precisely, additional conditions required for enabling a transition can be expressed by such enablers: in the presence of a negative inhibitor of multiplicity −k (here k > 0), the transition is enabled only if the number of tokens n in the input place of that inhibitor satisfies n ≥ k. In our case, the enabler requires a token in the System OK place for the transition from the place Demand to the place Success to be enabled.
Fig. 6. SPN model with attrition function and demand.
2.5. Time averaging
As mentioned before, for an on-demand repairable system the reliability function is of limited use, and the availability function is a more reasonable measure. However, because the system is not fully renewable and availability cannot reach a steady state, either a time average of availability should be considered or a full description of availability as a function of time should be given. Time averaging can be motivated by the following consideration: if a demand occurs at a random time (i.e., uniformly distributed throughout the life of the system), the relevant measure is the probability that the system will be available on demand. The complementary time-averaged probability of failure on demand is given by
$$\bar{P}_{fd} = \frac{1}{T}\int_{0}^{T} P_{fd}(t)\,dt \qquad (8)$$
While this formula can be used, SPN also provides an opportunity to simulate the demand directly, as shown in Fig. 6. Moreover, non-uniform distributions can be used for the transition from the place Demand if the demand for the safety-related system varies with time. When this transition fires the token into the System failed place, this token has a choice of moving either to Failed on Demand or to Success, depending on whether the inhibitor emanating from the System OK place is engaged or not. This can be implemented by assigning the transition to Success a slightly smaller fixed delay than the transition to Failure, by assigning distinct colors to the two tokens that can appear in the System failed place, and by implementing color-dependent transitions from this place. In Fig. 6 the difference between the delays of the transitions to Success and to Failure is not important as long as the former fires first when both transitions are enabled. In the following case study, fixed delays ε and 2ε are selected for the transitions to the Success and Failure places, respectively, with ε arbitrarily assigned the small value of $10^{-6}$ month.
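Eq. (8) can be evaluated numerically once the probability of failure on demand is available as a function of time. The sketch below is a minimal illustration under stated assumptions: it integrates the Markov chain of Fig. 2 with the vanishing state B eliminated (so that A leads to C at rate θλ and to E at rate (1 − θ)λ), assumes that repair returns the system from C to A at rate μ, and includes only the transitions named in the text. The values of λ, μ and θ are those of the case study in Section 3. The detection failure rate `NU` is purely hypothetical, since the text does not give one, and the fixed-step Runge-Kutta integrator and all function names are simply convenient choices.

```python
LAMBDA = 3.42e-3    # main-function failure rate, per year (case study)
MU = 26.07143       # repair rate, per year (case study)
THETA = 0.98        # probability that a detected failure is repaired (case study)
NU = 1.0e-3         # detection failure rate, per year: HYPOTHETICAL value
T_LIFE = 15.0       # design life, years

def derivative(p, lam=LAMBDA, mu=MU, theta=THETA, nu=NU):
    """Kolmogorov equations for states (A, C, D, E); state B is eliminated."""
    a, c, d, e = p
    return (
        -(lam + nu) * a + mu * c,           # A: fully operational
        theta * lam * a - mu * c,           # C: ready to repair
        nu * a - lam * d,                   # D: detection failed, function OK
        (1.0 - theta) * lam * a + lam * d,  # E: failed, never repaired
    )

def rk4_step(p, dt, **rates):
    """One classical fourth-order Runge-Kutta step."""
    k1 = derivative(p, **rates)
    k2 = derivative([x + 0.5 * dt * k for x, k in zip(p, k1)], **rates)
    k3 = derivative([x + 0.5 * dt * k for x, k in zip(p, k2)], **rates)
    k4 = derivative([x + dt * k for x, k in zip(p, k3)], **rates)
    return [x + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)]

def pfd_history(t_end=T_LIFE, steps=15000, **rates):
    """Return (Pfd at t_end, time-averaged Pfd over [0, t_end])."""
    dt = t_end / steps
    p = [1.0, 0.0, 0.0, 0.0]          # the system starts in state A
    pfd_prev, integral = 0.0, 0.0
    for _ in range(steps):
        p = rk4_step(p, dt, **rates)
        pfd = p[1] + p[3]             # main function unavailable in C and E
        integral += 0.5 * (pfd_prev + pfd) * dt   # trapezoidal rule for Eq. (8)
        pfd_prev = pfd
    return pfd_prev, integral / t_end

final_pfd, mean_pfd = pfd_history()
print(f"Pfd(T) = {final_pfd:.3e}, time-averaged Pfd = {mean_pfd:.3e}")
```

Two limiting cases give a quick sanity check of the sketch: with θ = 1 and no detection failures the result settles at the steady-state value of Eq. (1), and with θ = 0 it reduces to the no-repair unreliability.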
3. Case study: automotive occupant safety system
The concept of an emergency, on-demand system is utilized in automotive safety systems and particularly in the design of an airbag controller unit. The original data in this case study has been modified to protect the proprietary nature of this information. The modern airbag controller is a complicated electronic system containing crash sensors capable of detecting various types of crashes (e.g., side impact vs. front collision) and 4–24 firing loops. The number of firing loops depends on the occupant safety options of the vehicle, such as driver and passenger airbags, side curtains, rear passenger protection, belt pretensioners and dual-phase deployment. On-time deployment triggered by the vehicle crash is a safety-critical feature of a controller [22]; therefore system reliability requirements are high and, depending on the specific automotive customer, could range from 0.9999 to 0.999999. Each modern airbag controller is equipped with a fault-detection circuit that detects a system failure and triggers a warning, such as a light indicator, to alert the driver. The subsequent action may be either to repair or to replace the faulty component [19]. Conversely, the vehicle owner may not act on the warning, due either to an inability to heed the warning in a timely manner or to a conscious decision not to repair the system for financial or other reasons. In order to obtain a renewal attrition function for an airbag controller, an analysis of the warehouse shipping history for this product was conducted (the details of this method are outside the scope of this paper). The following function was obtained:
$$r(t) = \begin{cases} 1.0 & \text{when } t \le T_W \\ A e^{-Bt} & \text{when } T_W < t \le T_{Life} \end{cases} \qquad (9)$$
where A = 1.0942, B = 0.03, t is the time in years of service, and $T_W$ is the 3-year warranty term. Since an airbag is a typical on-demand repairable system, we need to estimate its probability of failure on demand, or availability, instead of utilizing the traditional reliability function.
Let us consider an example where the demonstrated reliability for this system is 0.95 for 15 years. Assuming an exponential distribution, the corresponding constant failure rate is λ = $3.42 \times 10^{-3}$ per year. Let us further consider a repair with a mean duration of 14 days, which corresponds to the equivalent constant repair rate μ = 26.07143 per year. If all the failures are detected and repaired, the unavailability (the probability that the airbag will not work on demand) can be well approximated by the steady-state solution:
$$P_{fd}^{s} = \frac{\lambda}{\lambda + \mu} = 1.31 \times 10^{-4} \qquad (10)$$
In reality both detection and repairs take place less than 100% of the time, so this probability will lie somewhere between this lower bound and the unreliability at the end of its life of a system that never undergoes repairs, per (2):
$$P_{fd}^{s} = 1.31 \times 10^{-4} < P_{fd} < 1 - R(15\ \text{years}) = 0.05 \qquad (11)$$
Fig. 7. Comparison of several approaches to account for the fact that only a fraction θ = 0.98 of all detected failures is repaired: static (when the two populations are separated at the beginning) and dynamic (when the decision is made upon demand).
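The case-study values quoted above (the failure rate, the repair rate, and the bounds of Eqs. (10) and (11)) can be reproduced with a few lines of arithmetic. The sketch below is only a check of those numbers. The variable names are hypothetical, and the use of 365 days per year is an assumption that happens to reproduce the quoted repair rate.

```python
import math

T_LIFE = 15.0          # years, design life used in the case study
R_LIFE = 0.95          # demonstrated reliability at T_LIFE
MTTR_DAYS = 14.0       # mean time to repair

# Exponential failures: R(T) = exp(-lambda * T)
failure_rate = -math.log(R_LIFE) / T_LIFE      # per year
# 365 days per year is an assumption; it reproduces the quoted repair rate
repair_rate = 365.0 / MTTR_DAYS                # per year

pfd_steady = failure_rate / (failure_rate + repair_rate)   # Eq. (10)
pfd_no_repair = 1.0 - R_LIFE                               # upper bound in Eq. (11)

print(f"lambda = {failure_rate:.3e} per year")
print(f"mu     = {repair_rate:.5f} per year")
print(f"{pfd_steady:.3e} < Pfd < {pfd_no_repair:.2f}")
```

The two bounds differ by more than two orders of magnitude, which is the motivation given earlier for a more refined model.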
In order to model the imperfections listed in Section 2.1, let us assume θ = 0.98, meaning that 98% of the population will decide to repair the faulty system. Fig. 7 shows a comparison of the dynamic and static scenarios for this value. One can observe that the dynamic scenario shows a slightly higher probability of failure. Qualitatively, this can be explained by the fact that when the two populations are separated at the beginning (static scenario), the sub-group that makes the repairs is less likely to fail, and therefore the effective, dynamic fraction of this population will be slightly higher than θ = 0.98. The effect is minimal for the presented values and can therefore be neglected. Please note that the results shown in Fig. 7 for the dynamic model are presented using both Markov chains and SPN (obtained using 100 million Monte Carlo runs). The result for the scenario where all detected failures are repaired is also provided for reference purposes. In some instances, the difference between the static and dynamic models can be more significant. For the case where the demonstrated reliability is R(15 years) = 0.5 and θ = 0.5 (a hypothetical scenario), the impact is quite noticeable (see Fig. 8). Please note that while the Markov model provides a description of the system as a whole, SPN focuses on the behavior of system components. If constant transition rates are used, the results should be identical (see, for example, Fig. 7, where the results obtained by SPN and Markov chains are practically indistinguishable for dynamic modeling). However, it is quite difficult to directly implement in the Markov chain model a static subdivision into two populations that would provide a model analogous to the one given in Fig. 5. Another important advantage of SPN is its ability to model transitions that have variable rates when simulation is used to
Fig. 8. Difference between two scenarios: static (when the two populations are separated at the beginning) and dynamic (when the decision is made upon demand).
evaluate the model. To demonstrate this capability, let us investigate how the probability of failure on demand changes as various parameters of the failure distribution are considered. Specifically, let us compare Weibull distributions with shape parameters β = 1 (exponential), 2, and 3 (wear-out mode). The corresponding scale parameters are calculated to match the reliability R(T) = 0.95 if no repairs are possible. This yields Weibull characteristic lives of η = 292.436, 66.2309, and 40.3711 years, respectively. Larger values of β imply that failures are relatively more likely to occur later rather than sooner, so if the reliability at the end of the design life is matched and no repairs take place, a larger β implies a smaller average Pfd. Those values are: average Pfd = 0.0252, 0.0168, and 0.0126, respectively. The results for the scenarios with no attrition, and with the decision to repair made dynamically with probability θ = 0.98 when failures are detected, are shown in Fig. 9. Note that, unlike the cases with no repairs, the probability of failure on demand at the end of the design life increases with β rather than remaining relatively the same. However, the average value of the probability of failure on demand still decreases; those
Fig. 11. Comparing probability of failure on demand as a function of time for different failure models with attrition that provide the same reliability if no repairs take place. Weibull with shape parameters β = 1 (exponential), 2, and 3.
values are: average Pfd = 8.12 × 10⁻⁴, 6.33 × 10⁻⁴, and 5.28 × 10⁻⁴ for β = 1 (exponential), 2, and 3, respectively.
Next, let us consider the effect of attrition and focus first on exponential failures (see Fig. 10). Note that for the first 3 years there is no difference in the results, since r(t) = 1 within the warranty period (9).
Finally, let us observe how changing the assumptions about the failures affects the probability of failure on demand. Using the same assumptions as above for the models without attrition, we can observe (see Fig. 11) that the negative impact of attrition increases with the value of the shape parameter β. Indeed, even the average value of the probability of failure on demand will not always decrease; the corresponding values are: average Pfd = 2.54 × 10⁻³, 2.68 × 10⁻³, and 2.51 × 10⁻³ for β = 1 (exponential), 2, and 3, respectively.
In cases where power source survival (vehicle battery) is a design concern (see Fig. 1), its effect on the model can be easily accounted for by multiplying the probability of successful airbag deployment (1 − Pfd) by the probability of battery survival during the crash.
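The qualitative difference between the static and dynamic scenarios compared earlier in this section can be illustrated with a deliberately simplified Python sketch. It is not the authors' model: it assumes a three-state Markov chain (working, failed and under repair, failed and never repaired) with constant rates and ignores detection circuitry, user response time, and attrition, so its numbers are not expected to reproduce the published figures. All function and variable names are hypothetical.

```python
import math

def dynamic_pfd(lam, mu, theta, t_end, h=1e-3):
    """Pfd(t_end) when each failure is repaired with probability theta.

    Integrates a simplified 3-state chain with a fixed-step RK4 scheme:
    working -> under repair at rate theta*lam, working -> never repaired
    at rate (1-theta)*lam, under repair -> working at rate mu.
    """
    def deriv(pw, pf):
        # pw: P(working); pf: P(failed and under repair)
        return (-lam * pw + mu * pf, theta * lam * pw - mu * pf)

    pw, pf = 1.0, 0.0
    for _ in range(int(round(t_end / h))):
        k1 = deriv(pw, pf)
        k2 = deriv(pw + 0.5 * h * k1[0], pf + 0.5 * h * k1[1])
        k3 = deriv(pw + 0.5 * h * k2[0], pf + 0.5 * h * k2[1])
        k4 = deriv(pw + h * k3[0], pf + h * k3[1])
        pw += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        pf += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return 1.0 - pw

def static_pfd(lam, mu, theta, t):
    """Pfd(t) when a fraction theta always repairs and the rest never repairs."""
    repairing = lam / (lam + mu) * (1.0 - math.exp(-(lam + mu) * t))
    never_repairing = 1.0 - math.exp(-lam * t)
    return theta * repairing + (1.0 - theta) * never_repairing

MU = 365.0 / 14.0   # repair rate per year for a 14-day mean repair time
T = 15.0            # design life in years
for reliability, theta in ((0.95, 0.98), (0.5, 0.5)):
    lam = -math.log(reliability) / T
    print(f"R(T)={reliability}, theta={theta}: "
          f"static={static_pfd(lam, MU, theta, T):.6f}, "
          f"dynamic={dynamic_pfd(lam, MU, theta, T):.6f}")
```

Even in this reduced form, the dynamic value exceeds the static one only marginally for R(15 years) = 0.95 and θ = 0.98, and by a clearly visible margin for the hypothetical case R(15 years) = 0.5 and θ = 0.5, which is in line with the trend described for Figs. 7 and 8.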
Fig. 9. Comparing probability of failure on demand as a function of time for different failure models that provide the same reliability if no repairs take place. Weibull with shape parameters β = 1 (exponential), 2, and 3.
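The Weibull scale parameters and the no-repair averages quoted in this section can be checked with a short Python sketch. It assumes the two-parameter Weibull reliability R(t) = exp(−(t/η)^β), a design life of T = 15 years, and that the quoted Pfd values are time averages of the failure probability over the design life; the function names are illustrative.

```python
import math

def weibull_scale(reliability, t_design, beta):
    """Characteristic life eta such that exp(-(t_design/eta)**beta) == reliability."""
    return t_design / (-math.log(reliability)) ** (1.0 / beta)

def average_pfd_no_repair(eta, beta, t_design, steps=20000):
    """Time average of the Weibull CDF over [0, t_design], using the midpoint rule."""
    h = t_design / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += 1.0 - math.exp(-((t / eta) ** beta))
    return total * h / t_design

T = 15.0
for beta in (1, 2, 3):
    eta = weibull_scale(0.95, T, beta)
    avg = average_pfd_no_repair(eta, beta, T)
    print(f"beta={beta}: eta={eta:.4f} years, average Pfd={avg:.4f}")
```

With these assumptions the sketch agrees with the characteristic lives and the no-repair averages stated in the text to the quoted precision. The cases with repair and attrition require the full SPN simulation and are not covered here.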
4. Conclusions
The proposed method illustrates numerous advantages of applying the concept of availability and stochastic Petri nets in the early stages of design reliability analysis, especially when dealing with safety-critical on-demand systems. The method combines various real-life factors, such as the probability that the user notices the warning signal, the reliability of the detection circuitry, the user's response time to the warning light, the duration of repair, the estimated down time, system age, and other relevant factors. It shows that the application of stochastic Petri nets (SPN) provides a clear advantage in performing such analyses. The presented case study with a real-life example illustrates the sensitivity of the probability of failure on demand to the various factors that need to be accounted for to provide an accurate solution to real-life problems. This model presents a more realistic, flexible, and accurate estimate of the system's failure rates and reliability compared to more traditional reliability analysis techniques. In addition, SPN provides graphical traceability of the solution, as opposed to some stochastic methods, such as custom-made Monte Carlo simulation.
Fig. 10. Effect of attrition for exponential failure.
This method can also easily accommodate time-dependent input variables, such as system age, which in turn may affect the renewal rate of the system. To add flexibility, the SPN method can be effectively combined with traditional reliability analysis techniques, such as Markov chains, standards-based reliability prediction, block diagrams, Weibull analysis, Monte Carlo simulation, etc. In summary, this method provides an efficient synthesis of a practical engineering approach with the academic rigor of modern stochastic simulation techniques.


#Parts Repaired t #Parts Failed

Fig. 4. Renewal attrition function.

Fig. 6. SPN model with attrition function and demand.

μ = 26.07143 per year. If all the failures are detected and repaired,
