
C.

Mokkapati

A PRACTICAL RISK AND SAFETY ASSESSMENT METHODOLOGY FOR SAFETY-CRITICAL SYSTEMS

Chinnarao Mokkapati
Ansaldo Signal
Union Switch & Signal Inc.
1000 Technology Drive
Pittsburgh, PA 15219

Abstract

This paper presents a practical methodology for a) assessment of risks associated with the intended application of a safety-critical system, and b) verification that the system meets the safety design requirements that enable the risks to be kept at acceptable levels throughout its lifecycle. The methodology consists of the following steps: 1) Define the system and analyze its intended operation to determine all potential hazards; 2) Analyze the risks (potential consequences after considering the available procedural, circumstantial and physical risk reduction barriers in the intended operation of the system); 3) Determine the tolerable hazard rates for the system functions by comparing the remaining risks with industry-accepted tolerable levels; 4) Apportion the tolerable hazard rates and corresponding safety integrity levels to various subsystems/equipment within the system; and 5) Analyze the design of the subsystems/equipment and the system to show that the tolerable hazard rates will not be exceeded, and that the required levels of safety integrity (assurance against systematic failures) have been built into the system. Suitability of the methodology for railroad signaling systems is shown with the help of an example.

1.0 INTRODUCTION

When an organization such as a Railway desires to install a new product/system for the purpose of improving the efficiency and/or safety of its operations, there must be verifiable proof that the new product/system does indeed provide the desired improvements. Specific to safety, the improvements should come in the form of a reduced level of risk (of accidents/mishaps) relative to the current level of risk (if known), or relative to commonly accepted tolerable risk levels.

This paper presents an approach that can be used for risk and safety assessment of a safety-critical system. This approach, broadly based upon U.S. Military Standard 882C (1), AREMA C&S Manual Section 17 (2), IEEE Standard 1483-2000 (3), and the CENELEC Standards EN 50126 (4), EN 50128 (5), and EN 50129 (6), has been used by the author's organization for the assessment of Automatic Train Control Systems furnished for the Copenhagen Metro and the Kuala Lumpur Monorail System. It can be applied in a practical manner to other systems, such as PTC Systems, Train Protection Warning Systems, and Train Collision Avoidance Systems, that use newer technologies and architectures for meeting defined risk and safety requirements. The concepts of Safety Integrity Levels (SILs) and Tolerable Hazard Rates (THRs) are used in this approach. Reference (6) provides a detailed description of the concepts of SILs and THRs.

Section 2 of this paper presents an overview of the risk and safety analysis methodology. Section 3 presents details of risk analysis while Section 4 outlines the system design analysis that provides proof that the system meets its safety requirements derived from the risk analysis. Section 5 gives an example.

2.0 OVERVIEW OF RISK ANALYSIS AND SYSTEM DESIGN ANALYSIS

A methodology, derived from CENELEC Report prR009-004 (7), for risk analysis and system design analysis is presented in this section. At the heart of this approach is a well-defined interface between the operational environment and the architectural design of the system. From the safety point of view, this interface is defined by a list of hazards and tolerable hazard rates associated with the system.

The general steps of the risk analysis and system design analysis methodology are shown in Figure 1 and can be summarized as follows:

1. Define the system adequately.
2. Identify key operational hazards.
3. Determine the tolerable hazard rate (THR) for each hazard by analyzing the consequences of the hazards (taking into account the operational parameters).
4. For each hazard: analyze the causes down to a functional level, taking into account the system definition and architecture.
5. Decide which functions are implemented by which subsystem. Then, for each subsystem:
   - Collect the contributions of each function that is realized by the subsystem to all hazards.
   - Calculate the overall tolerable hazard rate THR_S for the subsystem.
   - Translate THR_S into a safety integrity level SIL_S for the subsystem using a SIL table.
   - Determine failure rates for the system elements to meet THR_S for the subsystem.
   - Verify and validate that the THRs and SILs are met.

This methodology, shown in the flowchart of Figure 1, can be divided into two parts: Risk Analysis, consisting of Steps 1-3, and System Design Analysis, consisting of Steps 4-5. Risk Analysis deals with the real world of the system operation. System Design Analysis deals with the technical solutions for managing the risks.
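The "collect contributions" part of Step 5 can be sketched as a small bookkeeping exercise. The sketch below is only an illustration: the hazard, function, and subsystem names and all numeric values are hypothetical, and it assumes that the contributions allocated to one subsystem are simply added, which the paper does not state explicitly.

```python
from collections import defaultdict

# Hypothetical allocation rows: (hazard, function, subsystem, THR share per hour).
# None of these names or numbers come from the paper.
CONTRIBUTIONS = [
    ("H1", "emergency_brake_on_spad", "onboard", 1.0e-7),
    ("H1", "transmit_signal_aspect", "trackside", 1.0e-7),
    ("H2", "speed_supervision", "onboard", 5.0e-8),
]


def subsystem_thr(rows):
    """Sum the hazard-rate shares of all functions realized by each subsystem.

    Assumption: shares combine additively (a rare-event, OR-type combination).
    """
    totals = defaultdict(float)
    for _hazard, _function, subsystem, thr_share in rows:
        totals[subsystem] += thr_share
    return dict(totals)


totals = subsystem_thr(CONTRIBUTIONS)
for name, thr in sorted(totals.items()):
    print(f"{name}: {thr:.2e} per hour")
```

Keeping the hazard and function on every row makes it possible to trace each subsystem figure back to the hazard it was derived from.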


3.0 DETAILS OF RISK ANALYSIS

The Risk Analysis steps are shown in Figure 2.

3.1 System Definition

The system under investigation must be defined completely. This is typically done in the form of the following documents:

- System Requirements Specification
- System Architecture Description
- System Design Description Documents

These documents should give details of the system's:

- Functional Requirements
- Type of Operation (e.g., signaling principles)
- Operational Parameters (e.g., train schedules, speeds, density)
- System Boundaries

3.2 Hazard Identification

Through a structured Hazard Identification study (e.g., as described in AREMA C&S Manual Part 17.3.5), and based on existing data from the End User's sources, the potential hazards associated with the intended operation of the system shall be identified and documented in a Hazard Log.

The following terminology is used:

1. An individual i uses the technical system (e.g., a train or a Level Crossing (LC)). The usage profile is described by the number of uses Ni (per year or per hour). For reference, a total exposure per use Ei (hours) may be defined (i.e., the duration of a train journey or the time needed to pass an LC).

2. While using the technical system, the individual is exposed to hazards arising from failure of the technical system (or its subsystems, etc.). Let there be n hazards associated with the technical system. Let each hazard Hj have a hazard rate HRj hazards/hour, j = 1, ..., n. The tolerable value of each HRj is what we are trying to determine through the Risk Analysis process. The probability that the individual is exposed to the hazard depends additionally on the hazard duration Dj and the exposure time Eij of the individual to the hazard. This probability is the sum of the probability that the hazard already exists when the individual enters the system (approximately HRj x Dj) and the probability that the hazard occurs while the individual is exposed (approximately HRj x Eij). Note that the exposure to the hazard Hj may be shorter than or equal to the total exposure: Eij ≤ Ei.

3.3 Risk Determination

From each hazard, one or several types of accidents may occur. This is described for each hazard by the consequence probability Cjk that accident k occurs. Associated with each type of accident Ak is a corresponding severity, which from the individual point of view is described as the probability of fatality Fik for the single individual. This causal chain corresponds one to one to the individual risk of fatality, given by

$$ IRF_i = \sum_{\text{all hazards } H_j} N_i \left( HR_j \times (D_j + E_{ij}) \times \sum_{\text{accidents } A_k} C_{jk} \times F_{ik} \right) \qquad (1) $$


If, as a result, the IRF is less than the Tolerable Individual Risk (TIR), usually expressed in fatalities per year, then the calculated or estimated hazard rates (HR) are called tolerable hazard rates (THR). In Formula (1), the individual probability of fatality Fik can be calculated from the severity Sk (e.g., number of fatalities) in accident k, out of a population of Nk exposed to accident k (concept of collective measure of severity). That is,

$$ F_{ik} = S_k / N_k \qquad (2) $$
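Formulas (1) and (2) translate directly into a few lines of code. The following is a minimal sketch, not the author's tooling: the function names, the dictionary layout, and every input value are hypothetical and were chosen only to exercise the formulas.

```python
def fatality_probability(severity, population):
    """Formula (2): F_ik = S_k / N_k."""
    if population <= 0:
        raise ValueError("exposed population must be positive")
    return severity / population


def individual_risk(uses_per_year, hazards):
    """Formula (1): sum over hazards of N_i * HR_j * (D_j + E_ij) * sum_k(C_jk * F_ik)."""
    total = 0.0
    for hazard in hazards:
        loss = sum(c_jk * f_ik for c_jk, f_ik in hazard["accidents"])
        window = hazard["duration_h"] + hazard["exposure_h"]
        total += uses_per_year * hazard["rate_per_h"] * window * loss
    return total


# Hypothetical inputs (not taken from the paper).
hazards = [
    {
        "rate_per_h": 1.0e-7,   # HR_j, hazards per hour
        "duration_h": 10.0,     # D_j, hours
        "exposure_h": 0.0,      # E_ij, hours
        "accidents": [(1.0e-4, fatality_probability(2, 10))],  # (C_jk, F_ik)
    }
]
irf = individual_risk(1000, hazards)
print(f"IRF = {irf:.1e} fatalities per year")
```

With N_i given per year and the hazard rate per hour, the product HR_j x (D_j + E_ij) is dimensionless, so the result is in fatalities per year and can be compared directly with a TIR.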

Note: Accident k could result in other types of potential losses, namely commercial loss and environmental loss. It is possible to quantify these losses (convert them into an equivalent number of fatalities) in order to include them in the term Sk in Equation (2). Discussion and agreement with the User are needed in this regard.

3.4 Risk Tolerability Criteria and THR Determination

To determine the tolerable level of risk, either the GAMAB, the ALARP, or the MEM principle can be used. Reference (8), a report by Dr. Hendrik Schäbe of the Institute for Software, Electronics, Railroad Technology, TÜV InterTraffic GmbH, provides a detailed treatment of these principles.

The GAMAB principle requires the risk of the new system to be no higher than that associated with the system being replaced. An upper and a lower bound on TIR (fatality rate in fatalities per year) can be derived from the ALARP principle. A single value of TIR can be derived from the MEM principle.


The IRFi in Formula (1) is now equated to the TIR in order to determine the tolerable value of each hazard rate HRj. These tolerable values are denoted THRj.
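Written out, equating Formula (1) to the TIR gives one equation, which fixes a single THR only when there is a single hazard. For several hazards, some rule for dividing the TIR among them is needed. The derivation below assumes, purely for illustration, that hazard j is allotted a share w_j of the TIR; these weights are a hypothetical device and are not part of the paper.

```latex
% Assumption (not in the paper): hazard j receives a share w_j of the TIR,
% with \sum_j w_j = 1. For a single hazard, w_1 = 1.
N_i \, THR_j \, (D_j + E_{ij}) \sum_{k} C_{jk} F_{ik} = w_j \, TIR
\quad\Longrightarrow\quad
THR_j = \frac{w_j \, TIR}{N_i \, (D_j + E_{ij}) \sum_{k} C_{jk} F_{ik}}
```

Summing the left-hand sides over all hazards recovers Formula (1) with IRF_i = TIR, so the individual THR_j values stay consistent with the overall risk target.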

4.0 DETAILS OF SYSTEM DESIGN ANALYSIS

The System Design Analysis process is shown in Figure 3.

The Risk Analysis detailed in Section 3.0 results in a list of n hazards H1, ..., Hn together with their tolerable hazard rates THR1, ..., THRn, respectively.

Further analysis is then required to arrive at a suitable system architecture for the control of such hazards. This is called System Design Analysis, which is essentially a causal analysis of the hazards H1, ..., Hn. It consists of the following tasks:

- Define the system functions and architecture (technical solution)
- Analyze the causes leading to each hazard
- Determine the safety integrity requirements (SIL and hazard rates) for the subsystems
- Determine the reliability requirements for the equipment

Causal analysis of hazards consists of two key phases. In the first phase, each THR is apportioned to a functional level (system functions). The hazard rate for a function is then translated to a SIL using the SIL table below, taken from (6). The SILs are defined at this functional level for the subsystems implementing the functionality.

Tolerable Hazard Rate (THR) per hour and per function    Safety Integrity Level
THR < 10^-8                                               4
10^-8 < THR < 10^-7                                       3
10^-7 < THR < 10^-6                                       2
10^-6 < THR < 10^-5                                       1


A subsystem, i.e., a combination of equipment, may implement a number of safety-related functions, each of which could require a different SIL. Where this is the case, the subsystem must be designed to meet the highest Safety Integrity Level of those functions.
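The table lookup and the "highest SIL wins" rule can be sketched as follows. The function names and sample values are hypothetical. The table as printed leaves open which band a THR exactly on a boundary belongs to; this sketch assumes it falls into the less demanding band.

```python
# Upper bounds (per hour and per function) from the SIL table, most demanding first.
SIL_BANDS = [(1e-8, 4), (1e-7, 3), (1e-6, 2), (1e-5, 1)]


def sil_for_thr(thr_per_hour):
    """Return the SIL whose band contains the given tolerable hazard rate.

    Assumption: a value exactly on a boundary (e.g. 1e-8) is placed in the
    less demanding band, because the comparison is strict.
    """
    if thr_per_hour <= 0:
        raise ValueError("THR must be positive")
    for upper_bound, sil in SIL_BANDS:
        if thr_per_hour < upper_bound:
            return sil
    raise ValueError("THR of 1e-5 per hour or more is outside the SIL table")


def subsystem_sil(function_thrs):
    """A subsystem must meet the highest SIL required by any of its functions."""
    return max(sil_for_thr(thr) for thr in function_thrs.values())


# Hypothetical functions hosted on one subsystem.
example = {"brake_command": 2e-7, "aspect_decoding": 5e-9}
print(subsystem_sil(example))
```

Here the subsystem inherits SIL 4 from the more demanding function, even though the other function alone would only need SIL 2.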

In the second phase of the causal analysis, the hazard rates for subsystems are further apportioned, leading to failure rates for the equipment, but at this physical or implementation level the SIL remains unchanged. Consequently, the software SIL defined in (5) would also be the same as the subsystem SIL, except as described in clause 5.2.3 of (5).

The apportionment process may be performed by any method that allows a suitable representation of the combinational logic, e.g., reliability block diagrams, failure modes and effects analyses, fault trees, binary decision diagrams, or Markov models. In any case, particular care must be taken when independence of items is required. While functional independence is required in the first part of the causal analysis (i.e., the failures of functions shall be independent with respect to systematic and random faults), physical independence is sufficient in the second part (i.e., the failures of subsystems shall be independent with respect to random faults). Assumptions made in the causal analysis must be checked and may lead to safety-relevant application rules for the implementation.
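As a rough illustration of how combinational logic turns equipment failure rates into a hazard rate, the sketch below evaluates a two-level fault tree. It is an assumption-laden example rather than the paper's method: the OR gate uses the rare-event approximation (rates add), the AND gate uses a common first-order approximation for two independent items with finite fault-detection times, and all rates and times are hypothetical. Common-cause failures are not modeled.

```python
def or_gate(*rates_per_hour):
    """Any one input event causes the output event: rates add (rare-event approximation)."""
    return sum(rates_per_hour)


def and_gate(rate_a, rate_b, detect_a_h, detect_b_h):
    """Both independent items must be in a failed state at the same time.

    First-order approximation: item A fails and B fails before A's failure is
    detected and negated (rate_a * rate_b * detect_a_h), plus the symmetric case.
    Valid only when rate * detection time is much smaller than 1.
    """
    return rate_a * rate_b * (detect_a_h + detect_b_h)


# Hypothetical equipment: two redundant channels plus one single-channel item.
two_channel = and_gate(1e-5, 1e-5, 100.0, 100.0)   # per hour
hazard_rate = or_gate(two_channel, 5e-9)            # per hour
print(f"two-channel part: {two_channel:.1e}/h, hazard rate: {hazard_rate:.1e}/h")
```

The AND gate only delivers this reduction if the two inputs really are independent, which is why each AND gate calls for the independence checks described above.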

System design analysis is essentially a combination of various qualitative and quantitative hazard analyses and safety verification and validation steps. A disciplined approach to system design analysis using a structured Safety Assurance Program (e.g., as outlined in AREMA C&S Manual Part 17.3.1) is recommended.

5.0 EXAMPLE

A hypothetical Train Protection Warning System (TPWS), shown in Figure 4, is used as an example for detailing the steps involved in the Risk Analysis. The Safety Analysis portion is not covered in detail for this hypothetical system.

The desired functions of the TPWS are a) provide Emergency Brake application to prevent Signals Passed at Danger (SPADs), and b) provide driver warning and speed supervision with the ability to stop the train if an overspeed condition is ignored by the driver. This system is intended to be used on a Railroad with heavy passenger train traffic, and the goal is to reduce the risk of fatalities due to SPADs to a tolerable level. The following steps are as outlined in Section 3. The quantitative numbers used in the example calculations are the author's assumed data and are not reflective of any particular Railroad's statistics.

HAZARD H1: TPWS fails to prevent a SPAD that could result in a collision and ensuing fatalities.

RISK ANALYSIS

1. Determine Risk Tolerability

A reasonably practical scheme shall be implemented with the aim of ensuring that train collisions due to SPADs pose a risk of fatality no higher than 1 in 1,000,000 per year. That is,

Tolerable Individual Risk (TIR) = 10^-6 per year

(Risk of SPAD-caused fatality to the train driver, also assumed to be the same for a passenger if the train involved in the event is a passenger-carrying train.)

2. Determine Risk Exposure

Ni = Number of times/year train i passes signals = 10,000
D1 = Duration of Hazard H1 = 10 hours (a pessimistic estimate)
Ei1 = Exposure time of the train to Hazard H1 (the time taken by the train to pass a signal at a failed TPWS location; very short relative to D1, and therefore ignored)

Hazard H1 exists when the TPWS has a wrong-side (hazardous) failure that remains non-negated or unrepaired. Hazard H1 has a hazard rate of HR1 failures/hour. The goal is to determine this HR1 before the design of the TPWS can proceed.

3. Cause-Consequence Analysis

Done in the form of an Event Tree Analysis (ETA), as shown in Figure 5.

4. Loss Analysis

From the ETA, two types of accidents and their probabilities of occurrence are determined and listed below. For the sake of simplicity, assume the probabilities of fatality in each accident as shown below.

No. (k)   Accident (Ak)          Probability of Occurrence (C1k)   Probability of Fatality (Fik)
1         High Speed Collision   0.00005                           0.9
2         Low Speed Collision    0.00001                           0.5

5. Determine THR

Substitute the above values in Equation (1):

$$ IRF_i = N_i \left\{ HR_1 \times (D_1 + E_{i1}) \times \sum_{k} (C_{1k} \times F_{ik}) \right\} = 10{,}000 \times HR_1 \times 10 \times (0.00005 \times 0.9 + 0.00001 \times 0.5) \le TIR = 10^{-6} $$

The bracketed sum is 0.00005, so IRFi = 5 x HR1. This results in HR1 = 2 x 10^-7 failures/hour, which is now called THR1.

SYSTEM DESIGN ANALYSIS

Apportion THR1 to individual pieces of equipment in the TPWS by using Failure Modes and Effects Analysis (FMEA) and Fault Tree Analysis (FTA) techniques. Guidance given in AREMA C&S Manual Part 17.3.3 (2) and IEEE Std 1483 (3) can be used. Make sure physical, functional and process dependencies within the TPWS equipment are properly handled with the use of AND gates in the FTA. An iterative approach is needed to arrive at a cost-effective design. Different parts of the TPWS equipment may end up being designed to different SILs for systematic failure integrity.
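The risk-analysis arithmetic in step 5 of this example can be re-checked with a short script. The numbers are the example's own assumed data; the variable names are hypothetical, and Ei1 is set to zero because the example ignores it.

```python
N_I = 10_000          # signal passes per year by train i
D1_HOURS = 10.0       # duration of hazard H1 (pessimistic estimate)
E_I1_HOURS = 0.0      # exposure time, ignored in the example
TIR = 1e-6            # tolerable individual risk, fatalities per year

# (C_1k, F_ik) for the high speed and low speed collisions.
ACCIDENTS = [(0.00005, 0.9), (0.00001, 0.5)]

loss_per_hazard = sum(c_1k * f_ik for c_1k, f_ik in ACCIDENTS)
irf_per_unit_rate = N_I * (D1_HOURS + E_I1_HOURS) * loss_per_hazard
thr1 = TIR / irf_per_unit_rate

print(f"IRF_i = {irf_per_unit_rate:.1f} x HR1")
print(f"THR1 = {thr1:.1e} failures/hour")
```

The script prints a coefficient of 5.0 and THR1 = 2.0e-07 failures/hour, matching the value derived in step 5.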

6. CONCLUSIONS
A practical methodology for risk and safety analysis using the concepts of tolerable risk, safety integrity levels, and tolerable hazard rates is presented in this paper with the help of a simple example. This methodology can be applied to signaling and train control systems that use new technologies and architectures, and is expected to provide a cost-effective approach to both design and assessment of such systems.

7. REFERENCES

(1) United States Department of Defense (January 19, 1993). Military Standard MIL-STD-882C: System Safety Program Requirements.

(2) AREMA Communications & Signal Manual, Section 17: Quality Principles. Parts 17.3.1 (2004), 17.3.3 (2004), and 17.3.5 (2004).

(3) IEEE Standard 1483-2000: Verification of Vital Functions for Processor-Based Systems Used in Signal and Train Control.

(4) CENELEC Standard EN 50126: Railway Applications - The Specification and Demonstration of Dependability, Reliability, Availability, Maintainability and Safety (RAMS). Issue: March 2000.

(5) CENELEC Standard EN 50128: Railway Applications - Communications, signaling and processing systems - Software for railway control and protection systems. Issue: March 2001.

(6) CENELEC Standard EN 50129: Railway Applications - Communications, signaling and processing systems - Safety related electronic systems for signaling. Issue: May 2002.

(7) CENELEC Report prR009-004: Railway Applications - Systematic Allocation of Safety Integrity Requirements (March 1999).

(8) Different Approaches for Determination of Tolerable Hazard Rates, by Dr. Hendrik Schäbe, Institute for Software, Electronics, Railroad Technology, TÜV InterTraffic GmbH, 51105 Köln.

List of Figures in the Paper "A Practical Risk and Safety Assessment Methodology for Safety-Critical Systems"

Figure 1. Risk and Safety Analysis Overview (From Reference (4))
Figure 2. Process Details of Risk Analysis (From Reference (4))
Figure 3. System Design Analysis Summary (From Reference (4))
Figure 4. A Simple Train Protection Warning System
Figure 5. Cause-Consequence Analysis (Determination of External Risk Reduction)

[Figure 1: flowchart with Input, Activity, and Output columns; image not reproduced. Activities:
1. Define System (functions, boundary, interfaces, environment, ...)
2. Identify (system) hazards
3. Analyze consequences of hazards
4. Analyze causes of hazards; identify additional hazards
5. Allocate Safety Integrity Requirements to subsystems/equipment
Activities 1-3 are grouped as Risk Analysis; activities 4-5 are grouped as System Design Analysis and are iterated until system element level. Other labels in the figure: System definition; top level hazards; Hazard Log; Risk tolerability criteria (Safety); THRs; System Requirements Specification; (Sub-)System Architecture; Hazard Analysis; SILs, Failure Rates; Subsystem Requirements Specification.]

Figure 1. Risk and Safety Analysis Overview (From Reference (4))


[Figure 2: flowchart; image not reproduced. Process steps: System Definition; Analyze Operation; Identify Hazards; Estimate Hazard Rates; Identify Consequences (Accidents, Near Misses, Safe State); Determine Risk; Determine THR. Other labels in the figure: Hazard Log; Risk Tolerability Criteria (Safety); System Requirements Specification (Safety Requirements); System Design Analysis.]

Figure 2. Process Details of Risk Analysis (From Reference (4))


[Figure 3: flowchart; image not reproduced. Starting point: Hazards H1, ..., Hn and their tolerable hazard rates. For each hazard: use FMEAs, FTAs, Reliability Block Diagrams, Binary Decision Diagrams, Markov models, etc., as appropriate. For each AND: Common Cause Failure Analysis; fault detection mechanism and time; safety-related application conditions. For each subsystem: 1. Collect contributions to hazards; 2. Determine THR and SIL. Subsequent steps and results: SIL and THR for subsystems; Apportion failure rates to elements; SIL and THR for elements; Conduct Verification & Validation of SILs and THRs. Other labels in the figure: System Architecture; SIL Table.]

Figure 3. System Design Analysis Summary (From Reference (4))


[Figure 4: system diagram; image not reproduced. Numbered components:]

1. Onboard Computer (OBC)
2. Transponder Transmission Module
3. Transponder Antenna
4. Driver's Console
5. Tachometer
6. Emergency Brake Interface
7. Signal Control Logic
8. Lineside Electronic Unit
9. Transponder

BASIC FUNCTIONALITY DESIRED: Provide driver warning, then Emergency Brake application to prevent Signal Passed at Danger. Provide driver warning and speed supervision with the ability to stop the train if an overspeed condition is ignored by the driver.

Figure 4. A Simple Train Protection Warning System


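[Figure 5: event tree starting from hazard H1; image not reproduced. Event labels: Train approaches a Signal at Danger; Engineer passes Signal at Danger; Engineer does not notice obstruction, plows ahead; Engineer notices obstruction, starts braking, but can't stop short of obstruction. "Yes" branch probabilities appearing in the figure: 0.1, 0.001, 0.5 (engineer does not notice obstruction), and 0.2 (engineer notices but can't stop short). Outcomes: High Speed Collision 0.00005; Low Speed Collision 0.00001; Safe State 0.99994.]

Figure 5. Cause-Consequence Analysis (Determination of External Risk Reduction)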
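The outcome probabilities of the event tree can be reproduced by multiplying the branch probabilities along each path. The sketch below is an illustration under stated assumptions: the extracted figure does not show unambiguously which events the values 0.1 and 0.001 belong to, so they are kept as anonymous branches, and the low speed collision is assumed to lie on the "No" side of the "does not notice obstruction" branch. The constant names are hypothetical.

```python
# "Yes" branch probabilities as they appear in Figure 5.
P_BRANCH_A = 0.1        # event assignment not recoverable from the extracted figure
P_BRANCH_B = 0.001      # event assignment not recoverable from the extracted figure
P_NOT_NOTICED = 0.5     # engineer does not notice obstruction, plows ahead
P_CANNOT_STOP = 0.2     # engineer notices, brakes, but cannot stop short

# Probability of reaching the point where the engineer's reaction matters.
p_reach = P_BRANCH_A * P_BRANCH_B

p_high_speed = p_reach * P_NOT_NOTICED
# Assumption: the low speed path follows the "No" side of the not-noticed branch.
p_low_speed = p_reach * (1 - P_NOT_NOTICED) * P_CANNOT_STOP
p_safe = 1 - p_high_speed - p_low_speed

print(f"high speed collision: {p_high_speed:.5f}")
print(f"low speed collision:  {p_low_speed:.5f}")
print(f"safe state:           {p_safe:.5f}")
```

The three results agree with the outcome values 0.00005, 0.00001, and 0.99994 shown in the figure, which supports the assumed tree structure.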
