Comparison of Programmable Electronic Safety Systems Architectures

Comparison of Programmable Electronic Safety-Related System Architectures
Anton A. Frederickson, Mr., Dr. Independent Consultant member of Safety Users Group Network 10 January, 2003
Abstract
This paper discusses the concepts of risk, safety lifecycle, and safety integrity for safety-related electrical/ electronic/ programmable electronic systems (E/E/PES) contained in the International Electrotechnical Commission (IEC) 61508 Standard: Functional safety of electrical/electronic/programmable electronic safety-related systems, Parts 1 through 7. This paper utilizes information from various parts of the IEC 61508 so the concepts and methodologies can be presented in an abridged form. This paper also shows a number of PES architectures used in safety-related applications. Markov Models are used to calculate the PFDavg so the suitability of using the architectures in applications requiring different safety integrity levels can be determined. Markov Models are also used to compute MTTFspurious for all the PES architectures so the impact of spurious trips can be taken into account when selecting a PES architecture.
1. INTRODUCTION
The emphasis in this paper is on computer-based systems (referred to as programmable electronic systems (PESs)) that are increasingly being used to perform safety functions. While the PESs provide a flexible way of implementing safety functions and providing extensive diagnostics, great care must be taken to ensure the resulting systems meet the required level of safety integrity. The IEC 61508 addresses design and assessment methodologies that must be used to ensure the PESs are safe.

The IEC 61508 Standard consists of seven parts. The first three parts are normative; Part 1 provides the general requirements, Part 2 provides the hardware requirements, and Part 3 provides the software requirements. The remaining four informative parts provide definitions, bibliographies, and guidelines for applying Parts 1, 2 and 3. The seven-part standard is generic and applies to all safety-related systems irrespective of the application. Examples of the application sectors coming within the scope include but are not limited to:

- Process industries (emergency shutdown systems, fire and gas detection systems, burner controls)
- Manufacturing industries (industrial robots, machine tools)
- Transportation (railway signaling, braking systems, lifts)
- Medical (miscellaneous electro-medical apparatus, radiography)
The IEC 61508 introduces the concept of Safety Integrity Levels that relate to the safety integrity required for the hardware and software used in the safety-related system. The IEC 61508 also introduces the concept of an Overall Safety Lifecycle to ensure that all activities necessary to achieve the required Safety Integrity Level are performed. Figure 1 shows the Overall Safety Lifecycle and for each phase of the lifecycle the standard specifies:

- The objectives to be achieved
- The requirements to meet the objective
- The scope of each phase
- The required inputs to the phase
- The deliverables required to meet the requirements of each phase
www.safetyusersgroup.com
The objectives of this paper are to explain how the IEC 61508 relates the risk associated with the equipment under control (EUC) to the required safety integrity level of the safety-related system and how the PE logic system architecture can be selected to meet the required safety integrity level.
[Figure 1: flow chart of the Overall Safety Lifecycle, with the following numbered phases]

1 Concept
2 Overall Scope Definition
3 Hazard & Risk Analysis
4 Overall Safety Requirements
5 Safety Requirements Allocation
Overall Planning:
6 Overall Operation & Maintenance Planning
7 Overall Validation Planning
8 Overall Installation & Commissioning Planning
9 Safety-related systems: E/E/PES (Realisation)
10 Safety-related systems: Other Technology (Realisation)
11 External Risk Reduction Facilities (Realisation)
12 Overall Installation & Commissioning
13 Overall Safety Validation
14 Overall Operation & Maintenance
15 Overall Modification & Retrofit
16 Decommissioning

The diagram also carries a return path labeled "Back to appropriate Overall Safety Lifecycle phase".

FIGURE 1. Overall Safety Lifecycle

Note 1: Functional Safety Assessment and Verification activities are not shown for reasons of clarity but are relevant to all Safety Lifecycle phases.
Note 2: Boxes 10 & 11 are shown shaded to indicate that this International Standard does not deal in detail with these phases.
2. RISK AND SAFETY INTEGRITY LEVEL

One of the challenges in developing safety-related systems for the protection of the equipment under control (EUC) is determining the required safety integrity level of the safety-related system. The safety integrity level required is directly related to the risk reduction necessary to achieve the required level of safety for the EUC. This section draws on parts of Annex A of the IEC 61508 - Part 5 to discuss risk and the determination of the safety integrity level.

The IEC 61508 - Part 1 defines four safety integrity levels to accommodate the wide range of risk reduction, or safety integrity, that safety-related systems will have to achieve. Table 1 shows the Safety Integrity Levels (SILs) for safety-related systems operating in a demand mode of operation and in a continuous/high demand mode of operation. Target failure measures are shown for each of the four SILs to ensure that the hardware safety integrity is achieved. The seven parts of the IEC 61508 define procedures, techniques, measures, etc. that must be used for each of the four safety integrity levels to ensure the systematic safety integrity is also achieved.
SAFETY INTEGRITY   DEMAND MODE OF OPERATION         CONTINUOUS / HIGH DEMAND MODE OF OPERATION
LEVEL              Probability of failure to        Probability of a dangerous failure
                   perform its design function      per year
                   on demand (PFDavg)

4                  >= 10^-5 to < 10^-4              >= 10^-5 to < 10^-4
3                  >= 10^-4 to < 10^-3              >= 10^-4 to < 10^-3
2                  >= 10^-3 to < 10^-2              >= 10^-3 to < 10^-2
1                  >= 10^-2 to < 10^-1              >= 10^-2 to < 10^-1

TABLE 1. Safety Integrity Levels & target failure measures
Safety integrity is defined as "The probability of a safety-related system satisfactorily performing the required safety functions under all stated conditions within a stated period of time." Safety integrity consists of two elements:

- Hardware Safety Integrity: The achievement of the specified level of hardware safety integrity can normally be estimated to a reasonable level of accuracy since the hardware safety integrity is related to the dangerous random hardware failures and hardware common cause failures. The IEC 61508 addresses the dangerous random hardware failures by specifying target failure measures for the safety-related systems (see Table 1 above). The target values are a function of the safety integrity level.
- Systematic Safety Integrity: Systematic failure rates are hard to predict since they can be caused by hardware design errors, software errors, operational errors, etc. The IEC 61508 addresses systematic safety integrity by specifying procedures, techniques, measures, etc. that reduce systematic failures. The techniques, measures, etc. specified are a function of the safety integrity level.
Risk is defined as "The probable rate of occurrence of a hazard causing harm and the degree of severity of the harm." In other words, risk has two elements: the frequency or probability at which the hazard occurs, and the consequences of the hazardous event. The main tests that are applied in regulating risks are similar to those we apply in daily life. They involve determining whether:

1. The risk is so great it must be refused altogether; or
2. The risk is or has been made so small as to be insignificant; or
3. The risk falls between the two states specified in 1. and 2. above and it has been reduced to the lowest level practicable, bearing in mind the benefits flowing from its acceptance and taking into account the costs of any further reduction.

The purpose of determining the risk level for a specific hazard is to state what is deemed reasonable with respect to both the frequency (or probability) of the hazardous event and its specific consequences. This risk is the risk required to meet the required level of safety. Figure 2 shows the general concept of risk reduction to achieve the required level of safety. This figure shows the EUC Risk (the risk of the equipment under control without a safety system), the risk to meet the required level of safety, and the necessary minimum risk reduction (ΔR). The risk reduction can be achieved by external risk reduction facilities, E/E/PES safety-related systems and other technology safety-related systems.
[Figure 2: diagram relating the Equipment under Control (EUC), the consequence and frequency of the hazardous event, the EUC Risk level, the necessary minimum risk reduction, and the risk to meet the level of safety. The risk reduction is provided by External Risk Reduction Facilities, E/E/PES Safety-related Systems, and Other Technology Safety-related Systems.]

FIGURE 2. Risk and safety integrity concepts for safety-related protection system
Annex C of the IEC 61508 - Part 5 illustrates a quantitative technique for calculating the necessary risk reduction required to meet the required level of safety. The required safety integrity level for a single E/E/PES protection system can be determined directly from the required risk reduction the single protection system must provide. For example, if a risk reduction of 5 × 10^-4 is required, then the probability that the single protection system fails to perform its design function on demand must also be 5 × 10^-4. In other words, the required risk reduction is equal to the target failure measure, PFDavg, for a protection system operating in demand mode. Hence from Table 1, a safety-related protection system with a SIL of 3 is required for this example. Once the safety integrity level is known, the safety system designer can follow the techniques and measures specified for that level.
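The lookup from a target PFDavg to a safety integrity level can be sketched in a few lines of Python. This is only an illustration of how the demand-mode column of Table 1 is read; the names `SIL_BANDS` and `required_sil` are hypothetical and do not come from the standard.

```python
# Demand-mode PFDavg bands from Table 1.
# Each entry maps a SIL to (lower bound inclusive, upper bound exclusive).
SIL_BANDS = {
    4: (1e-5, 1e-4),
    3: (1e-4, 1e-3),
    2: (1e-3, 1e-2),
    1: (1e-2, 1e-1),
}


def required_sil(target_pfd_avg):
    """Return the SIL whose demand-mode band contains target_pfd_avg.

    Returns None when the value lies outside the range covered by Table 1.
    """
    for sil, (low, high) in SIL_BANDS.items():
        if low <= target_pfd_avg < high:
            return sil
    return None


if __name__ == "__main__":
    # Worked example from the text: a target PFDavg of 5 x 10^-4.
    print(required_sil(5e-4))
```

For the example in the text, a target of 5 × 10^-4 falls in the band from 10^-4 to 10^-3, which is the SIL 3 row.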
3. INTRODUCTION AND ASSUMPTIONS USED TO DEVELOP IEC PES ARCHITECTURAL REQUIREMENTS
The remainder of this paper discusses the IEC requirements for sensor, PE logic system, and final element architectures for each of the safety integrity levels. The diagnostic coverage requirements, off-line proof test interval (TI) requirements and mean time to a spurious trip (MTTFspurious) are shown for each architecture. Portions of Annex A of the IEC 61508 - Part 2 and Annex B of the IEC 61508 - Part 6 are used in the following sections. The analysis of the PE logic system was based upon the Markov Models developed by the ISA SP84.02 subcommittee responsible for developing a technical report on safety integrity evaluation techniques (TR84.0.02). (See Reference 9 in Section 8.)

The analysis used to determine the architectures and their associated requirements is based upon the following assumptions:

1. The safety-related systems are assumed to be protection systems operating in demand mode.
2. The safety-related systems are assumed to operate in normally energized mode, and hence outputs are de-energized to put the EUC in a safe state.
3. Only random hardware failures are used in the quantitative analysis to determine the basic configuration requirements. Systematic failures including common cause failures are taken into account by required hardware and software techniques that are defined for each safety integrity level.
4. In the dual and triple sensor, PE logic system, or final element configurations, the elements in each channel are assumed to have the same failure rates and diagnostic coverage factors. The logic solvers or PE in redundant architectures are assumed to be identical. In other words the PE is assumed to have identical redundancy instead of diverse redundancy.
5. The failure rates, percent safe failures, and configuration data used to analyze the architectures are shown in Tables A-1 and A-2. The configuration data is for one typical safety shutdown loop or interlock.
6. The basic failure rates assumed for calculating Table 3 are shown in Table A-1. These failure rates represent conservative values consistent with calculated values using guidelines of MIL-HDBK-217. Other reliable sources for such data could be published and verified failure rate databases. If such sources are not available, the MIL-HDBK-217 could be used. MIL-HDBK-217 contains two methods of reliability prediction: parts stress analysis and parts count. The parts stress analysis requires more information, produces better results, and is recommended.
7. Failure rate estimates computed using MIL-HDBK-217 can differ from actual failure rates by an order of magnitude or greater. SINTEF has studied the comparison using MIL-HDBK-217D and recommends the use of the following assumptions to achieve better correlation with actual failure rates.
   a. Ground fixed environment should be used.
   b. The quality level/factor is based on the parts type as follows:

      Part Type                 Quality Level   Quality Factor
      Integrated Circuits       C               8
      Discrete Semiconductors   JAN             Varies with type
      Resistors                 M               1
      Capacitors                M               1

8. The analysis assumes the process or equipment under control (EUC) is shut down whenever a dangerous failure is detected in the safety system and the safety system has no redundancy. This is assumed since dangerous detected failures should be repaired as soon as possible. A shutdown is required to make this repair when there is no module redundancy.

Section 4 explains the PE logic system architectures. Section 5 describes the Dual PE, 1oo2D logic system configuration and illustrates the use of Markov Models to determine the PFDavg and MTTFspurious for the configuration. Section 6 shows the sensitivity of PFDavg and MTTFspurious to diagnostic coverage and test interval for all of the PE logic system architectures. Details of the quantitative analysis and the results used to develop the tables are contained in a separate document. Definitions of terms used in the development of the configuration requirements are in Section 7.
4. PE LOGIC SYSTEM ARCHITECTURES FOR SAFETY-RELATED SYSTEMS

The following six PE logic system architectures were used for the development of the comparison data:
- Single PE with Single I/O and External Watchdog
- Dual PE with Single I/O and External Watchdogs
- Dual PE with Dual I/O, Interprocessor Communication, and 1oo2 Shutdown Logic
- Dual PE with Dual I/O, External Watchdogs, Interprocessor Communication, and 2oo2 Shutdown Logic
- Dual PE with Dual I/O, External Watchdogs, Interprocessor Communication, and 1oo2D Shutdown Logic
- Triple PE with Triple I/O, Interprocessor Communication, and 2oo3 Shutdown Logic
4.1 Single PE with Single I/O and External Watchdog

This configuration is shown in Figure 4-1. This configuration has no redundancy. The external watchdog (diagnostic) function provides a secondary means of de-energizing the outputs and putting the EUC in a safe state. This external watchdog function de-energizes the secondary contact output if a dangerous failure is detected in the logic solver or the associated output module. The outputs are shown as contacts but can be realized by solid-state switches or other means. All safe failures result in a false trip of the EUC. All dangerous detected failures also result in a false trip of the EUC since the system has to be shut down to replace any of the modules.
[Figure 4-1: block diagram showing Sensor, Input Termination, IC, IP, PE, OP, OC, Output Termination, Final Element, Diagnostics, PSU 1 and PSU 2]
FIGURE 4-1 Single PE with Single I/O and External Watchdog
4.2 Dual PE with Single I/O and External Watchdogs

This dual configuration is shown in Figure 4-2 and has redundant main processors and external watchdogs. The switch shown in Figure 4-2 is controlled by the watchdog functions that are monitoring the diagnostic results of the main processors. The secondary means of de-energization will be activated if both the diagnostic inputs to the switch are de-activated. The switch is periodically changed to the other position so that its own functionality, and the functionality and diagnostics of each processing part, can be checked in the other state. The two processors compare results and if a discrepancy is detected, both of the watchdogs are commanded to de-activate the outputs. Hence any discrepancy between the processing parts will result in the outputs being de-energized to put the EUC in a safe state. Detected failures in any of the single I/O modules will also result in the outputs being de-energized. Safe undetected failures of the logic solver as well as the comparison errors mentioned above result in a false trip of the EUC. Other detected safe and dangerous failures of either PE can be repaired on line. If a dangerous failure of the processor driving the outputs is undetected, the safety system will be in a fail-to-function state.
[Figure 4-2: block diagram showing Sensor, Input Termination, IC, IP, PE A, PE B, Switch, OP, OC, Output Termination, Final Element, two Diagnostics blocks, PSU - I/O, PSU - PE A and PSU - PE B]
FIGURE 4-2 Dual PE with Single I/O and External Watchdogs
4.3 Dual PE with Dual I/O, Interprocessor Communication, and 1oo2 Shutdown Logic
This dual configuration shown in Figure 4-3 has two independent channels. Each output from one channel is wired in series with the output from the other channel, and hence each channel can open the output circuit and put the EUC in a safe state. This wiring produces a 1oo2 voting of the channel outputs and obviously enhances the safety of the system. The figure shows external watchdogs that are employed to increase the diagnostic coverage of the PE in each channel. These watchdogs provide a secondary means of de-energizing the output of a leg if a dangerous failure of a processor is detected. The communication between the main processors in each channel improves the overall system diagnostics. Without the interprocessor communication, the diagnostic coverage is determined by the effectiveness of the self diagnostics of the processors. With the interprocessor communication the overall diagnostic coverage of the main processors can be increased because of the comparison testing that can also be performed. The communication also allows the processors to compare input values and continue operation with a healthy input in the event of a detected failure of the other input. All undetected safe failures of the inputs result in a false trip since the system is operating with 1oo2 shutdown logic. Dangerous detected failures in the system require the system to be shutdown so they can be repaired.
[Figure 4-3: block diagram showing Sensor, Input Termination, two channels (A and B) each with IC, IP, PE, OP, OC and Diagnostics, power supplies PSU A1, PSU A2, PSU B1 and PSU B2, Output Termination and Final Element]
FIGURE 4-3 Dual PE with Dual I/O, Interprocessor Communication and 1oo2 Shutdown Logic
4.4 Dual PE with Dual I/O, External Watchdogs, Interprocessor Communication, and 2oo2 Shutdown Logic
This configuration is shown in Figure 4-4 and has two independent channels or legs. The outputs to the final elements from each channel are wired in parallel to reduce the number of false or spurious trips. Hence both channels must command the outputs to open before an output is opened. This wiring produces a 2oo2 voting of the outputs from each channel. The system has external watchdogs in each channel or leg to improve the diagnostics and hence the safety. These watchdogs provide a secondary means of de-energizing the output of a leg if a dangerous failure of a processor is detected. The interprocessor communication enhances the diagnostic capability since comparisons can be made between the output states of the two channels. The interprocessor communication also allows the processors to perform a 1oo2 vote on each input value and to continue operation with a healthy input in the event of a detected failure of the other input. Because of the 2oo2 voting and dual redundancy, any detected safe or dangerous failure in this system that can be localized to a channel can be repaired on-line without shutting down the EUC. However, all dangerous undetected failures in any module in either channel of the system will put the system in a fail-to-function state.
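The difference between the series output wiring of Section 4.3 (1oo2) and the parallel output wiring described above (2oo2) can be illustrated with a small Python sketch. It models each channel output as an ideal contact that is either closed or open and assumes the de-energize-to-trip convention used throughout this paper; the function names are hypothetical, and diagnostics, watchdogs and I/O are deliberately left out.

```python
def output_energized_series(channel_a_closed, channel_b_closed):
    """Series wiring (1oo2): current flows only if both contacts are closed."""
    return channel_a_closed and channel_b_closed


def output_energized_parallel(channel_a_closed, channel_b_closed):
    """Parallel wiring (2oo2): current flows if either contact is closed."""
    return channel_a_closed or channel_b_closed


def tripped(energized):
    """De-energize-to-trip: the EUC is put in its safe state when the output is de-energized."""
    return not energized


if __name__ == "__main__":
    # Truth table: A closed, B closed, trip with series wiring, trip with parallel wiring.
    for a in (True, False):
        for b in (True, False):
            series_trip = tripped(output_energized_series(a, b))
            parallel_trip = tripped(output_energized_parallel(a, b))
            print(a, b, series_trip, parallel_trip)
```

With one contact stuck closed, the series circuit still trips when the healthy channel opens, while the parallel circuit stays energized. With one contact opening spuriously, the series circuit trips and the parallel circuit does not. This is the safety versus false-trip trade-off described in Sections 4.3 and 4.4.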
[Figure 4-4: block diagram showing two channels (A and B) each with IP, PE, OP, OC and Diagnostics, power supplies PSU A1, PSU A2, PSU B1 and PSU B2, Output Termination and Final Element]
FIGURE 4-4 Dual PE with Dual I/O, External Watchdogs, Interprocessor Communication, and 2oo2 Shutdown Logic
4.5 Dual PE with Dual I/O, External Watchdogs, Interprocessor Communication, and 1oo2D Shutdown Logic
Figure 4-5 shows this configuration, which has two independent and redundant channels or legs. The system has high-speed communication between the main processors. The outputs to the final elements from each channel are wired in parallel to reduce the number of false or spurious trips. The system has external watchdogs in each channel or leg to improve the safety. These watchdogs provide a secondary means of de-energizing the output of a leg if a dangerous failure of a processor or an output is detected. This system also has the capability of detecting failures in the other channel by use of the high-speed interprocessor communication. By use of the interprocessor communication, each processor will request a shutdown when a comparison error cannot be localized to a specific processor or other leg failure. All detected failures in this system that can be localized to a channel can be repaired on-line. However, dangerous detected and safe detected failures that cannot be localized to a particular leg of the system result in a false trip.

In this system each processor performs a 1oo2 vote using its outputs and the outputs of the other channel obtained through the communication link. Hence if a discrepancy exists between the outputs, the safe output is selected and sent to the output module. However, a dangerous undetected failure of either processor will result in a fail-to-function state of the system. This occurs since the faulty processor will leave its outputs energized, and the other channel will not be aware of this dangerous situation since it obtains its information from the faulty processor through the interprocessor communication link. This fail-to-function condition has been previously documented in a paper by Rainier Faller from TÜV Bayern, IQSE (Reference 10 in Section 8). The 1oo2D architecture is referred to as the 1oo2 (2v2) / 2oo2 (1v2) architecture in that paper.
[Figure 4-5: block diagram showing two channels (A and B) each with IP, PE, OP, OC, Watchdog Diagnostics and "&" logic gates, power supplies PSU A1, PSU A2, PSU B1 and PSU B2, Output Termination and Final Element]
FIGURE 4-5 Dual PE with Dual I/O, External Watchdogs, Interprocessor Communication, and 1oo2D Shutdown Logic
4.6 Triple PE with Triple I/O, Interprocessor Communication, and 2oo3 Shutdown Logic
This configuration is shown in Figure 4-6 and contains three redundant channels or legs with interprocessor communication. Each output to the final element utilizes a fault tolerant hex output voter circuit that performs a 2oo3 vote on the three inputs to the voter. Utilizing the interprocessor communication, the processors can perform a 2oo3 vote on the triplicated sensors read by the system. The 2oo3 voting also allows a failure in any of the three legs to be outvoted. Any detected safe or dangerous failure in the triple system can be repaired on-line without shutting down the EUC.
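The 2oo3 vote can be written as a simple majority function. The Python sketch below is a logical illustration only, with a hypothetical function name; it does not model the hex output voter circuit or the interprocessor communication.

```python
def vote_2oo3(a, b, c):
    """Return True when at least two of the three channel demands are True."""
    return (a and b) or (a and c) or (b and c)


if __name__ == "__main__":
    # One channel wrongly demanding a trip is outvoted by the two healthy channels.
    print(vote_2oo3(True, False, False))
    # One channel failing to demand a trip is also outvoted.
    print(vote_2oo3(True, True, False))
```

A single faulty leg cannot change the result in either direction, which is why one failed leg can be outvoted while the system keeps running.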
[Figure 4-6: block diagram showing Sensor, Input Termination, three channels (A, B and C) each with IC, IP, PE, OP and OC, power supplies PSU A1, PSU A2, PSU B1, PSU B2, PSU C1 and PSU C2, Output Termination and Final Element]
FIGURE 4-6 Triple PE with Triple I/O, Interprocessor Communication, and 2oo3 Shutdown Logic
5. MARKOV MODEL FOR DUAL PE WITH DUAL I/O, INTERPROCESSOR COMMUNICATION, WATCHDOGS, AND 1oo2D SHUTDOWN LOGIC
Figure 4-5 in the previous section shows the Dual PE system with 1oo2D shutdown logic. The dual system with 1oo2D logic operates like a dual system with 1oo2 logic even though the outputs from the two channels are wired in parallel. This provides improved safety integrity. Hence if an undetected safe failure of an input module or the main processor occurs, the system will shut down the process. Since the outputs of the 1oo2D system are wired in parallel, extensive diagnostics are required to check the outputs and the main processors of the dual system. High diagnostic coverage of the main processors is achieved by comparison of data between the two processors by use of the interprocessor communication links. If a comparison error occurs, the processors check their own status based upon self-diagnostics, and if one of the processors has located the fault, the system will continue to operate for a limited time with the remaining good processor. The faulty processor will be alarmed for repair.

The diagnostic coverage of the main processors (excluding the memory) using self-tests is typically limited to 90 %. Memory self-tests using inverted memories, etc. can have very high diagnostic coverage (99.99 %). The failure rates of the memories are typically 10 % to 20 % of the failure rate of the total module. When the diagnostic coverage of the main processor failures is improved by comparison testing, the coverage gained in excess of 90 % comes from comparison errors that cannot be localized to a specific main processor. When this situation occurs, the system must be shut down to locate and repair the faulty module. To model this situation, the safe detected failure rate of the main processor is split into two parts: the safe failures detected by self-test and the additional safe failures detected by the comparison testing. The dangerous detected failure rate is split in the same way. The equations are as follows:
$$\lambda^{SD}_{MP} = \lambda^{SDS}_{MP} + \lambda^{SDC}_{MP}$$

Where $\lambda^{SDS}_{MP}$ is the safe detected main processor failures detected by the main processor self-tests, and $\lambda^{SDC}_{MP}$ is the additional safe detected main processor failures detected by the comparison testing.

$$\lambda^{DD}_{MP} = \lambda^{DDS}_{MP} + \lambda^{DDC}_{MP}$$

Where $\lambda^{DDS}_{MP}$ is the dangerous detected main processor failures detected by the main processor self-tests, and $\lambda^{DDC}_{MP}$ is the additional dangerous detected main processor failures detected by the comparison testing.

For the modeling of the dual system with 1oo2D logic, it is assumed that 100 % of the memory failures on the main processor module are detected and the memory failure rates are 20 % of the main processor module failure rate. The other detected processor failures are assumed to be 91 % detected by self test and the remaining 9 % by comparison testing. The main processor failure rates are modeled using the equations below, where C is the diagnostic coverage:

$$\lambda^{SD}_{MP} = \lambda^{DD}_{MP} = 5 + 20C$$

$$\lambda^{SDS}_{MP} = \lambda^{DDS}_{MP} = 5 + 20(0.91)C$$

$$\lambda^{SDC}_{MP} = \lambda^{DDC}_{MP} = 20(0.09)C$$

$$\lambda^{SU}_{MP} = \lambda^{S}_{MP} - \lambda^{SD}_{MP} = 20 - 20C$$

$$\lambda^{DU}_{MP} = \lambda^{D}_{MP} - \lambda^{DD}_{MP} = 20 - 20C$$
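As a numerical check, the split of the main processor failure rate can be evaluated with a short Python sketch. The default values (a total rate of 50 failures per million hours, 50 % safe failures, memory at 20 % of the module rate, and a 91 % self-test share) are the assumptions stated in this section and in Table A-1; the function and parameter names are hypothetical.

```python
def main_processor_rates(coverage, total_rate=50.0, mode_fraction=0.5,
                         memory_fraction=0.2, self_test_share=0.91):
    """Split the safe (or dangerous) main processor failure rate by detection method.

    Rates are in failures per million hours. mode_fraction is the share of the
    total rate in the failure mode of interest (0.5 for both safe and dangerous
    failures under the assumptions of this section).
    """
    mode_rate = total_rate * mode_fraction   # 25: safe (or dangerous) failures
    memory = memory_fraction * mode_rate     # 5: memory failures, assumed 100 % detected
    other = mode_rate - memory               # 20: remaining processor failures
    return {
        "detected": memory + other * coverage,
        "by_self_test": memory + other * self_test_share * coverage,
        "by_comparison": other * (1.0 - self_test_share) * coverage,
        "undetected": other * (1.0 - coverage),
    }


if __name__ == "__main__":
    for name, value in main_processor_rates(0.99).items():
        print(name, round(value, 3))
```

Calling `main_processor_rates(0.99)` evaluates the case of 99 % diagnostic coverage; the self-test and comparison parts always sum to the detected rate.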
Note: The failure rates above are in failures per million hours. The main processor total failure rate is assumed to be 50 failures per million hours and the percent safe failures are 50 %. See Table A-1 in Appendix A. Hence if the diagnostic coverage is 99 %, the total detected safe or dangerous failure rate of the main processor is 5 + 20 × 0.99 = 24.8, with 1.782 detected by the comparison testing and 23.018 detected by self-test. The safe and dangerous undetected failure rates are each 0.2.

Since the outputs from the two channels of the 1oo2D system are wired in parallel, both of the parallel paths must be opened to put the process in the safe state if a process upset occurs. Dangerous undetected failures of the main processors result in a fail-to-function condition. This occurs since a dangerous undetected failure of a processor results in its output being left in the on state and at the same time the other processor is not aware of the problem since it receives its information through the communication link to the faulty processor. Hence one of the paths continues to operate in a dangerous mode with its switches closed. The equations below show the elements of the fail-to-function transition matrix for the Markov Model shown in Figure 5-1:
$$\lambda_{1,2} = 2l\,\lambda^{DU}_{PS}, \qquad \lambda_{1,3} = 2n\,\lambda^{DU}_{IP}, \qquad \lambda_{1,4} = 2n\,n_{IC}\,\lambda^{DU}_{IC}$$

$$\lambda_{1,5} = 4m\,n_{OC}\,\lambda^{DU}_{OC}, \qquad \lambda_{1,6} = 2m\,\lambda^{DU}_{OP}, \qquad \lambda_{1,7} = 2k\,\lambda^{DU}_{WD}$$

[The expressions for the remaining transitions 1,14; 2,8; 2,14; 3,9; 3,14; 4,10; 4,14; 5,11; 5,14; 6,12; 6,14; 7,13 and 7,14, and for the return transitions 8,2; 9,3; 10,4; 11,5; 12,6 and 13,7, are not reproduced here.]

[Figure 5-1: Markov state diagram with states 1 through 14, including a state labeled "SYSTEM WILL FAIL-TO-FUNCTION"]
FIGURE 5-1 Fail-to-Function Markov Model for Dual PE with Dual I/O, Interprocessor Communication, Watchdogs, and 1oo2D Shutdown Logic
The Fail Safe Markov Model for the dual system with 1oo2D shutdown logic is shown below in Figure 5-2.
[Figure 5-2: Markov state diagram with states 1 through 9, including a state labeled "System Fails Safe"]
FIGURE 5-2 Fail Safe Markov Model for Dual PE with Dual I/O, Interprocessor Communication, Watchdogs, and 1oo2D Shutdown Logic
The equations below show the elements of the fail safe transition matrix for the fail safe Markov Model shown in Figure 5-2:
$$\lambda_{1,2} = 2l\,\lambda^{S}_{PS}, \qquad \lambda_{1,3} = 2n\,\lambda^{SD}_{IP}, \qquad \lambda_{1,4} = 2n\,n_{IC}\,\lambda^{SD}_{IC}, \qquad \lambda_{1,5} = 2\,\lambda^{SDS}_{MP}$$

$$\lambda_{1,6} = 2m\,\lambda^{S}_{OP}, \qquad \lambda_{1,7} = 4m\,n_{OC}\,\lambda^{S}_{OC}, \qquad \lambda_{1,8} = 2k\,\lambda^{S}_{WD}$$

[The expressions for the remaining transitions 1,9; 2,9; 3,9; 4,9; 5,9; 6,9; 7,9 and 8,9, and for the return transitions 2,1; 3,1; 4,1; 5,1; 6,1; 7,1 and 8,1, are not reproduced here. The transitions into state 9 contain dangerous detected failure rates multiplied by a factor of 0.5.]
It should be noted that the dangerous detected failures in the above equations occur because the repair of the module with a dangerous detected failure may result in a false trip, if the module interacts with a previously failed module that has a safe fault. If the module with the dangerous detected failure is repaired before the module with the safe fault, a false trip will result. If the module with the safe fault is repaired first, then there will not be a false trip. Hence it is assumed that 50 % of the time a false trip can occur and hence the 0.5 factor has been applied to the dangerous detected failures in the equations above.
6. COMPARISON OF PE LOGIC SYSTEM ARCHITECTURES

The PE logic systems are compared by showing the effect of diagnostic coverage (C) and test interval (TI) on PFDavg and MTTFspurious. The random hardware failure rates used in the analysis are shown in Appendix A.

Figures 6-1 and 6-2 show the effect of diagnostic coverage (C) for a test interval (TI) of 1 year. These figures show very clearly the importance of diagnostic coverage. In many cases an increase in diagnostic coverage of 10 % reduces PFDavg by a factor of 100. In order to achieve SIL3, the diagnostic coverage should be higher than 98 %.

Figures 6-3 and 6-4 show the effect of test interval (TI) for PE logic systems with high diagnostic coverage (99 %). Typically, a 6-month decrease in the test interval decreases PFDavg by a factor of at least 2 to 3. The figures clearly show that the Triple PE, Triple I/O, 2oo3 Logic and the Dual PE, Dual I/O, 1oo2 Logic systems are the only configurations suitable for SIL3 (i.e. PFDavg less than 0.75*10^-4).

The Dual PE, Dual I/O, 1oo2 Logic architecture has a false trip rate that is typically more than 100 times that of the Triple PE, Triple I/O, 2oo3 Logic architecture. The Triple PE, Triple I/O, 2oo3 Logic architecture provides both high safety integrity and a very low false trip rate (MTTFspurious > 100 years). The Dual PE, Dual I/O, 1oo2 Logic architecture achieves its high safety integrity at the cost of a very high false trip rate (MTTFspurious < 1 year).
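The sensitivity to diagnostic coverage and test interval can be explored with a first-order calculation. The sketch below does not reproduce the Markov results behind Figures 6-1 to 6-4. It assumes a single channel and the common simplified approximation PFDavg = (undetected dangerous rate)*TI/2 + (detected dangerous rate)*MTTR. The function name and the dangerous failure rate of 25*10^-6 per hour are illustrative assumptions.

```python
HOURS_PER_MONTH = 730.0  # 8760 hours per year / 12


def pfd_avg_single_channel(lam_dangerous, coverage, ti_hours, mttr_hours=8.0):
    """First-order PFDavg for one channel (simplified, not a Markov model)."""
    lam_du = lam_dangerous * (1.0 - coverage)  # found only by the proof test
    lam_dd = lam_dangerous * coverage          # found by on-line diagnostics
    return lam_du * ti_hours / 2.0 + lam_dd * mttr_hours


if __name__ == "__main__":
    lam_d = 25.0e-6  # dangerous failures per hour, assumed for illustration

    print("Effect of diagnostic coverage (TI = 12 months):")
    for c in (0.90, 0.95, 0.99):
        pfd = pfd_avg_single_channel(lam_d, c, 12 * HOURS_PER_MONTH)
        print(f"  C = {c:.0%}: PFDavg = {pfd:.2e}")

    print("Effect of test interval (C = 99 %):")
    for months in (18, 12, 6):
        pfd = pfd_avg_single_channel(lam_d, 0.99, months * HOURS_PER_MONTH)
        print(f"  TI = {months:2d} months: PFDavg = {pfd:.2e}")
```

For a single channel the undetected term scales linearly with (1 - C) and with TI. Redundant architectures combine failures from more than one channel, so their sensitivity to coverage is steeper than this sketch shows.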
[Plot: PFDavg (log scale, 1.0E-6 to 1.0E+0) versus Diagnostic Coverage Factor (50 % to 99 %) for six architectures: Single PE & Single I/O; Dual PE & Single I/O; Dual PE, Dual I/O & 2oo2 Logic; Dual PE, Dual I/O & 1oo2 Logic; Triple PE, Triple I/O, Hex Voter & 2oo3 Logic; Dual PE, Dual I/O & 1oo2D Logic.]

FIGURE 6-1 PFDavg vs. Diagnostic Coverage for Test Interval = 1 Year
[Plot: MTTFspurious in years (axis up to 1000) versus Diagnostic Coverage Factor (50 % to 99 %) for six architectures: Single PE & Single I/O; Dual PE & Single I/O; Dual PE, Dual I/O & 2oo2 Logic; Dual PE, Dual I/O & 1oo2 Logic; Triple PE, Triple I/O, Hex Voter & 2oo3 Logic; Dual PE, Dual I/O & 1oo2D Logic.]

FIGURE 6-2 MTTFspurious vs. Diagnostic Coverage for Test Interval = 1 Year
[Plot: PFDavg (log scale, 1.0E-7 to 1.0E-2) versus Test Interval (TI) (1 to 18 months) for six architectures: Single PE & Single I/O; Dual PE & Single I/O; Dual PE, Dual I/O & 2oo2 Logic; Dual PE, Dual I/O & 1oo2 Logic; Triple PE, Triple I/O, Hex Voter & 2oo3 Logic; Dual PE, Dual I/O & 1oo2D Logic.]

FIGURE 6-3 PFDavg vs. Test Interval (TI) for C = 99 %
[Plot: MTTFspurious in years (axis up to 1000) versus Test Interval (TI) (0 to 18 months) for six architectures: Single PE & Single I/O; Dual PE & Single I/O; Dual PE, Dual I/O & 2oo2 Logic; Dual PE, Dual I/O & 1oo2 Logic; Triple PE, Triple I/O, Hex Voter & 2oo3 Logic; Dual PE, Dual I/O & 1oo2D Logic.]

FIGURE 6-4 MTTFspurious vs. Test Interval (TI) for C = 99 %
7. DEFINITIONS OF TERMS
7.1 Channel
A channel is an element or a group of elements (I/O modules, logic solver, sensors, final elements, etc.) that independently perform(s) a function. For example, a dual channel configuration is one with two channels that independently perform the same function. The term can be used to describe a complete system, or a portion of a system (sensors or final elements).

7.2 Diagnostic Coverage
The fraction of hardware failures of an element detected by the on-line diagnostics embedded in the safety-related system.

7.3 Off-line Proof Test Interval (TI)
The time interval between periodic off-line tests performed on the safety-related system. These tests are performed to detect failures in the safety system so the system can be restored to an as-new condition or as close as practical to this condition.

7.4 Watchdog
A watchdog is a combination of diagnostics and an output device (typically a switch) whose purpose is to monitor the correct operation of the programmable electronic (PE) device and take action upon detection of an incorrect operation.
NOTE: The watchdog can be used to de-energize a group of safety outputs when dangerous failures are detected in order to put the EUC into a safe state. The watchdog is used to increase the on-line diagnostic coverage of the PE logic system.

7.5 Dangerous hardware failures
Hardware failures which, in a single channel configuration, put the safety-related system in a dangerous or fail-to-function state.
NOTE: In systems with redundant devices, a dangerous hardware failure of a device may not result in a dangerous or fail-to-function failure of the safety-related system.

7.6 Safe hardware failures
Hardware failures which, in a single channel configuration, cause the safety-related system to erroneously shut down the EUC.
NOTE 1: In systems with redundant channels, a safe hardware failure may not result in an erroneous shutdown.
NOTE 2: Other names used for safe hardware failure are: nuisance failure, spurious failure, false trip failure, or fail-to-safe failure.

7.7 Detected hardware failure
A hardware failure that is detected by the on-line diagnostics performed by the safety-related system.
NOTE: Other names used for detected hardware failure are: overt failure or revealed failure.

7.8 Undetected hardware failure
A hardware failure that is not detected by the on-line diagnostics performed by the safety-related system.
NOTE: Other names used for undetected failures are: covert failure, unrevealed failure, latent failure, hidden failure or dormant failure.

7.9 1ooN System
A safety-related system, or part thereof, made up of N independent channels which are connected so that any one of the channels is sufficient to perform the correct safety function.

7.10 2ooN System
A safety-related system, or part thereof, made up of N independent channels which are connected so that any two of the channels are sufficient to perform the correct safety function.

7.11 MTTFspurious
The mean time to a spurious trip of the EUC.

7.12 Mean time to repair (MTTR)
The mean time to repair a safety-related system, or part thereof.
NOTE: It includes the time to detect the failure and the time to repair the module once the failure has been detected.

7.13 PFDavg
The average probability of the safety-related system failing to perform its design function on demand. This is the target failure measure for a safety-related system operating in demand mode.

7.14 EUC
Equipment under control or protection of the safety-related system.
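The 1ooN and 2ooN definitions in 7.9 and 7.10 can be read as a simple voting rule. The sketch below is a minimal illustration. The function name moon_trips and the representation of each channel as a boolean "demands a trip" value are assumptions made for the example, not part of IEC 61508 or of any vendor's logic.

```python
def moon_trips(channel_demands, m):
    """Return True if at least m of the N channels demand a trip (MooN)."""
    n = len(channel_demands)
    if not 1 <= m <= n:
        raise ValueError("m must be between 1 and the number of channels")
    return sum(1 for demand in channel_demands if demand) >= m


if __name__ == "__main__":
    # 1oo2: a single channel demanding a trip is enough, so one spurious
    # (safe) failure of either channel also trips the system.
    print("1oo2, one channel demands:", moon_trips([True, False], 1))

    # 2oo3: one channel alone cannot trip the system, so a single safe
    # failure is tolerated; two agreeing channels are required.
    print("2oo3, one channel demands:", moon_trips([True, False, False], 2))
    print("2oo3, two channels demand:", moon_trips([True, True, False], 2))
```

The same rule shows why lowering M improves safety integrity but increases spurious trips: fewer channels need to agree before the EUC is shut down.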
8. REFERENCES
1. Hardware Architecture Selection According to IEC 61508, Dr. A. Anton Frederickson, Triconex Corporation and Ron Bell, Health and Safety Executive, Presented at 1st International Symposium on PLCs in Safety Related Applications, September 1994.
2. IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems, Part 1: General requirements, First edition 1998-12.
3. IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems, Part 2: Requirements for electrical/electronic/programmable electronic systems.
4. IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems, Part 3: Software requirements.
5. IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems, Part 4: Definitions and Abbreviations of Terms.
6. IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems, Part 5: Guidance on the application of Part 1.
7. IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems, Part 6: Guidance on the application of Parts 2 and 3.
8. IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems, Part 7: Bibliography of techniques and measures.
9. ISA Draft Technical Report dTR84.0.02, Safety Instrumented Systems (SIS)-Safety Integrity Level (SIL) Evaluation Techniques, Parts 4 and 5, Sept. 1999.
10. TÜV Bayern, IQSE Document - STAT0583.DOC, Version 1.4E, Safety and Availability Calculations Including Detailed System Structure, Automatic Checks, and Time Requirements, Rainier Faller and Dr. Heiler Hundhammer, 24 October 1993.
11. ANSI/ISA S84.01-1996 Standard, Application of Safety Instrumented Systems for the Process Industries.
Appendix A Failure Rate, Diagnostic Coverage, and Configuration Data Used for Comparisons of Architectures
Table A-1 shows the failure rate and diagnostic coverage factor data used in the analysis of the configurations. This data is the same as that used in the ISA Draft Technical Report (TR84.0.02) produced by the SP84.02 Subcommittee. The failure rate data is based upon a detailed parts count methodology defined in MIL-HDBK-217. Common cause and other systematic failures were not included in the analysis. The configuration data used for each of the different architectures is shown in Table A-2. This configuration data corresponds to a typical shutdown interlock or loop. A typical safety system will have a number of these interlocks.

Item                 Typical Failure Rate   Typical % Safe Failures   Diagnostic Coverage
Main Processor       50.0*10^-6 / hr        50                        50% to 99%
I/O Processor        5.0*10^-6 / hr         50                        50% to 99%
Input Circuit        0.2*10^-6 / hr         50                        50% to 99%
Output Circuit       0.2*10^-6 / hr         50                        50% to 99%
Power Supply         5.0*10^-6 / hr         90                        50% to 99%
External Watchdog    1.0*10^-6 / hr         70                        50% to 99%

TABLE A-1 Failure Rate, Percent Safe Failures and Diagnostic Coverage Data Used for Comparisons of PE Logic System Configurations
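The Table A-1 data can be turned into the safe, dangerous detected, and dangerous undetected rates that feed a model. The sketch below assumes a simple proportional split: the safe fraction is taken directly from the table, and the dangerous remainder is divided by the diagnostic coverage C. This split and the names TABLE_A1 and split_failure_rate are assumptions for illustration. The paper's own partition, in particular for main processor failures, may differ.

```python
# (total failure rate per hour, fraction of failures that are safe),
# transcribed from Table A-1.
TABLE_A1 = {
    "Main Processor": (50.0e-6, 0.50),
    "I/O Processor": (5.0e-6, 0.50),
    "Input Circuit": (0.2e-6, 0.50),
    "Output Circuit": (0.2e-6, 0.50),
    "Power Supply": (5.0e-6, 0.90),
    "External Watchdog": (1.0e-6, 0.70),
}


def split_failure_rate(item, coverage):
    """Split an item's total rate into safe / dangerous detected / undetected.

    Assumes a proportional split (an illustration, not the paper's method).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    total, safe_fraction = TABLE_A1[item]
    lam_safe = total * safe_fraction
    lam_dangerous = total - lam_safe
    return {
        "safe": lam_safe,
        "dangerous_detected": lam_dangerous * coverage,
        "dangerous_undetected": lam_dangerous * (1.0 - coverage),
    }


if __name__ == "__main__":
    for item in TABLE_A1:
        rates = split_failure_rate(item, coverage=0.99)
        print(item, {name: f"{value:.3e}" for name, value in rates.items()})
```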
Item      Description                                Number
k         No. of Ext. Watchdogs / Channel            1
l         No. of Power Supplies / Channel            2
m         No. of Output Modules / Channel            1
noc       No. of Output Circuits / Output Module     2
n         No. of Input Modules / Channel             1
nic       No. of Input Circuits / Input Module       2
MTTRot    Mean Time To Repair (Hours)                8

TABLE A-2 Configuration Data Used for Comparisons of PE Logic System Configurations
This document has been prepared by: Anton A. Frederickson, Mr. Dr. For more information see full contact details in Safety Users Group Directory