
Reliability Engineering and System Safety 79 (2003) 59–67
www.elsevier.com/locate/ress

The classification and regression tree approach to pump failure rate analysis

Maurizio Bevilacqua a, Marcello Braglia b,*, Roberto Montanari c

a Dipartimento di Ingegneria delle Costruzioni Meccaniche, Nucleari, Aeronautiche e di Metallurgia, Università degli Studi di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy
b Dipartimento di Ingegneria Meccanica, Nucleare e della Produzione, Facoltà di Ingegneria, Università degli Studi di Pisa, Via Bonanno Pisano 25/B, 56126 Pisa, Italy
c Dipartimento di Ingegneria Industriale, Università degli Studi di Parma, Viale delle Scienze, 43100 Parma, Italy

Received 22 March 2002; accepted 11 September 2002

Abstract

In this article, a technique based on rule induction is suggested as a non-parametric alternative to determine the expected failure rates of 143 centrifugal pumps included in an oil refinery plant and subjected to different operating conditions. At the same time, the procedure makes it possible to determine the critical operating factors influencing the reliability of the pumps. In particular, the classification and regression tree approach is used to automatically generate rules from an extended database of the plant containing information about failures and operating conditions of the different facilities. © 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Failure rate; Reliability; Rule induction; Decision tree; Classification and regression tree

1. Introduction

For years, industrial and other organisations concentrated most of their attention on production, generally ignoring the maintenance function and viewing it as a necessary evil. During the last 10 years there has been a gradual change in how corporate managers view the maintenance function. One of the most important factors forcing this change was that maintenance departments became major cost centres within those organisations. Today, with general operating costs rising at the rate of 15–70 percent of total production costs, there is the potential for significant savings in the maintenance department that deserves serious scrutiny. By implementing certain of the advanced management practices outlined here, significant savings can be made. Through the application of good management practices, and with the use of sound technical expertise, cost reductions in the range of 20–35% are within the realms of possibility. Several alternative maintenance policies can be evaluated to obtain savings in maintenance costs. One of the
* Corresponding author. Tel.: +39-50-913029; fax: +39-50-913040. E-mail address: m.bragglia@ing.unipi.it (M. Braglia).

most frequently used in industrial plants is the preventive maintenance strategy. Preventive maintenance is aimed at the prevention of breakdowns and defects. Its primary goal is to prevent the failure of equipment before it actually occurs. It is therefore designed to preserve and enhance equipment reliability. Daily activities of preventive maintenance include equipment checks, precision measurements, partial or complete overhauls at specified periods, oil changes, lubrication and so on. In addition, workers record equipment deterioration, so that they know when to replace or repair worn parts before they cause problems. Preventive maintenance is based on component reliability characteristics. This means that a high level of confidence in, and a high quality of, facility reliability data are necessary. Too much unscheduled downtime and frequent equipment breakdowns indicate that preventive maintenance programs are not working as they should. Unfortunately, the failure rates of facilities in production systems potentially depend on several factors, and the correct identification and quantification of the independent variables leading to facility failure presents a difficult problem. Statistical techniques and/or models based on (complex) mathematical expressions have been generally used to rank

0951-8320/03/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0951-8320(02)00180-1


critical variables in terms of their sensitivity to failure phenomena. Examples of these approaches are the Arrhenius exponential equation, which links the failure rate with the temperature [9], and the complex mathematical models proposed to define the failure rates of electronic components as a function of expected operating conditions in terms of temperature, vibration levels, humidity, etc. Classification and regression tree (CART) methodology shows promise because of distinct advantages over standard statistical techniques. In this paper, the identification of the main operating conditions influencing the failure rates of 143 different centrifugal pumps in an important Italian oil refinery is studied. The analysis is based on the CART method [4]. With this approach, after an analysis of the historical pump failure data, a final decision tree is obtained. The tree makes it possible to recognise the most important operating variables influencing the failure rate of the pumps, and represents an easy and robust tool to support the realisation of a decision support system for the maintenance staff. The aim is to predict the class each pump belongs to with respect to, for example, risk of failure (e.g. failure rate). Based on this decision tree, the management is able to define improved preventive maintenance policies. In addition, the identification of the most critical operating conditions gives the company executives a basis for future proactive maintenance actions on the production plant design.

2. The API oil refinery plant: a brief description

The process plant whose pumps are examined in this study belongs to one of the most important Italian and European industrial groups working in the oil refining sector. Its activity is performed in an integrated way by managing the entire petrochemical cycle, from oil supply to the refining process and the distribution of finished products. The transformation process is performed by an oil refinery whose processing and services plants occupy about 650,000 m², with 3000 km of piping. The oil refinery productive cycle is of the medium–high conversion type, operating through thermal processes, and it has: a storage capacity of more than 1,500,000 m³; a production capacity of about 3,900,000 ton of oil per year; an oil tanker receiving capacity of up to 400,000 ton displacement. The plant has a closed loop water system which is able to deliver up to 7000 m³/h, and a fire fighting system which is able to supply up to 3000 m³/h of sea water. An integrated gasification combined cycle plant of 287 MW power is currently starting up its operations. This plant burns a synthesis gas obtained from a heavy oil refining products gasification plant, whose production

capacity is equal to 1250 ton per day. This plant has auxiliary oxygen production, gas washing, sulphur recovery, effluent treatment and heavy metals recovery utilities. The oil refinery operation can schematically be described as shown in Fig. 1, which depicts the simplified cycle of the process plant. The monthly feed of 340,000 ton to the primary distillation process (Topping) comprises 300,000 ton of oil and 40,000 ton of primary residuum. The main oil refinery processing plants are formed by two recently introduced distillation units: an atmospheric one, with a production capacity of 10,500 ton per day, and a vacuum one, whose distillation capacity is equal to 2500 ton per day. The light distillation fractions, mainly formed by liquid petroleum gas (LPG) and petrol, feed a hydrogenation process (Unifining), which is used to stabilise some components and to remove undesired elements such as sulphur. After the hydrogenation process, LPG is ready for use while petrol undergoes further processing to enhance its octane number (isomerisation, to produce light petrol without aromatics, and platforming, to obtain a very high octane number). The middle distillation fraction (kerosene, light and heavy gas oil) is subjected to a desulphurisation process (HDS1 and HDS2 plants), while the heavy fraction and the distillation residuum are processed through cracking plants (thermal cracking and visbreaking) to improve the oil conversion by increasing the production of light products. All the main oil refinery plants have been substantially revamped during recent years. This has led to an increase in crude oil yield and to an improvement in product quality.

3. Overview of rule induction and CART approach

Inductive reasoning derives its conclusions from a set of observed data and, thus, is based on observations from experience rather than on predetermined rules or predicates. It starts with observed cases and ultimately generalises from them to build new rules. These rules are the natural kernel of the prediction-based evaluation system. Parametric statistical techniques (e.g. regression analysis) and non-parametric classification algorithms (e.g. rule induction) are similar in that they both use a set of data consisting of a number of cases or examples, each of which consists of a number of experimental observations. Both methods use induction to determine the relationship between these observations, which can be used for predicting one of the variables (the objective variable). The differences between the two approaches can be quite significant. The use of statistical techniques, such as linear regression, assumes that the data are continuous and measured on an interval scale. In addition, statistical approaches (e.g. regression analysis, discriminant analysis,


Fig. 1. Oil refinery simplified cycle.

etc.) generally assume that the same relationship exists between the independent and dependent variables throughout the measurement space and, in order to calculate significance tests on the results, the variables are supposed to be normally distributed. In reality these hypotheses are often difficult to fulfil. A classification tree is an empirical rule for predicting the class of an object from the values of predictor variables. Rule induction techniques are classificatory, since the dependent variables are nominal. The independent variables may be nominal or interval, and the relationships which are induced are logical rather than functional. The variables are not required to be independent or to follow any particular distribution. The data may also lack homogeneity, i.e. different relationships may exist between variables in different parts of the measurement space. For a complete comparison of rule induction with multiple regression analysis, the reader may refer to Ref. [8]. The details of these decisions are beyond the scope of this paper but are explained at length in the standard reference on CART [4]. Only some notes on the method are reported here. Common features of classification tree methods are the following:

1. Merging: relative to the target variable, non-significant predictor categories are grouped with the significant categories;
2. Splitting: selecting the split point. The variable used to split the population is chosen by comparison with all the others;

3. Stopping: rules which determine how far to extend the splitting of nodes;
4. Pruning: removing branches that add little to the predictive value of the tree;
5. Validation and error estimation: the methods used for evaluating and comparing a given classifier are the same regardless of which methods are used for generation. Measurement of true error versus apparent error, and validation of classifiers using separate or resampled data, are performed identically for the main classification tree algorithms adopted in the literature and, in particular, for the CART method.

CART is a computational statistical algorithm which produces its predictions in the form of a decision tree. The CART procedure can be defined as a method that partitions data into terminal nodes by a sequence of binary splits, starting at a parent node. CART starts by partitioning the initial training data set into two sub-sets, so that the cases in every sub-set are more homogeneous than in the original (single group) set. CART repeats the partition for each child node, continuing recursively until the required level of homogeneity in the generic node is obtained or a given stopping criterion is satisfied. Typically, the following conditions will cause the algorithm to terminate: (a) the maximum tree depth has been reached; (b) no more splits can be made, because all terminal nodes meet one or more of the following conditions:


- there is no significant predictor variable left to split the node;
- the number of cases in the terminal node is less than the minimum number of cases for parent nodes;
- if the node were split, the number of cases in one or more child nodes would be less than the minimum number of cases for child nodes.
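The termination conditions listed above can be sketched as a single check. This is a minimal illustration rather than the AnswerTree implementation: the names `StopRules` and `should_stop` are hypothetical, the default values mirror the settings reported later in Section 4.1, and treating "no significant predictor left" as "best impurity reduction below a minimum threshold" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopRules:
    # Hypothetical container; defaults mirror the settings of Section 4.1.
    max_depth: int = 5        # maximum number of levels below the root
    min_parent: int = 3       # minimum cases a node needs to be split
    min_child: int = 1        # minimum cases allowed in each child node
    min_gain: float = 0.0001  # minimum impurity reduction worth a split


def should_stop(depth, n_cases, best_gain, n_left, n_right, rules=StopRules()):
    """Return True when the node must become a terminal node."""
    if depth >= rules.max_depth:
        return True   # (a) maximum tree depth reached
    if n_cases < rules.min_parent:
        return True   # too few cases to act as a parent node
    if best_gain is None or best_gain < rules.min_gain:
        return True   # assumption: no predictor gives a useful split
    if min(n_left, n_right) < rules.min_child:
        return True   # a child node would be too small
    return False
```

A tree-growing loop would call `should_stop` on each candidate node, after searching for its best split, and recurse only when it returns `False`.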

The CART splitting algorithm in each node is based on the concept that each child node must be purer than the original parent, where purity is a concept linked to the values of a given variable. For example, in a completely pure node, all cases have the same value of the splitting variable (and, as a consequence, a variance equal to zero). Several algorithms are available to measure the level of impurity. Examples of different measures of impurity are the Gini and twoing indices (for categorical objective variables), or the ordered twoing index (for ordinal categorical variables). Since, in this application, the objective variables are all continuous, the least squared deviation (LSD) method is applied as the impurity measure. The function adopted by the LSD approach for the split of a parent node $t$ into two child nodes $t_L$ (left) and $t_R$ (right) is the following:

$$F(t) = R^2(t) - p_L R^2(t_L) - p_R R^2(t_R) = \frac{1}{N_t}\sum_{i \in t}\left(y_i - \bar{y}(t)\right)^2 - p_L \frac{1}{N_{t_L}}\sum_{i \in t_L}\left(y_i - \bar{y}(t_L)\right)^2 - p_R \frac{1}{N_{t_R}}\sum_{i \in t_R}\left(y_i - \bar{y}(t_R)\right)^2,$$

where $p_L$ is the proportion of cases in parent node $t$ classified in the left node $t_L$; $p_R$, the proportion of cases in parent node $t$ classified in the right node $t_R$; $y_i$, the value of the objective variable for the experimental case $i$; $\bar{y}(t)$, the average value of parent node $t$; $\bar{y}(t_x)$, the average value of child node $t_x$; $N_{t_x}$, the number of cases classified in child node $t_x$, $x \in \{R, L\}$; and $R^2(t_x)$ is the weighted variance associated with child node $t_x$. The best split (and, as a consequence, the corresponding predictor variable) is the one that maximises the function $F(t)$, which indicates the reduction of impurity of the tree due to the split. At the best split node, both the most important variable and the critical value of that variable are identified. The same predictor variable can be used more than once in the tree. The splitting process builds a tree structure based on a set of if–then rules that guide the decision maker to better decisions, because the variables are allowed to take on a hierarchical prioritisation and are also allowed to interact and have different effects under different conditions. Such a tree can be used: (a) to provide easily interpretable information about the main factors (i.e. operating variables) and the interactions between factors that are critical in predicting the goal variable (i.e. the failure rate of pumps); (b) to communicate the classification decision to others; (c) to automatically classify or predict new cases.

CART is a versatile method because it makes it possible to consider the risk of classification error. For a continuous objective variable (i.e. the failure rate), an estimate of the risk of classification error can be based on the concept of node variance. Briefly, the total variance is equal to the sum of the variance within the nodes (error) plus the variance between the nodes (explained):

$$\sigma^2_{tot} = \sigma^2_{err} + \sigma^2_{explained}.$$

The proportion of variance due to the error is given (as a percentage) by:

$$\%_{err} = \frac{\sigma^2_{err}}{\sigma^2_{tot}} \times 100.$$

As a consequence, the proportion of variance explained by the model is equal to:

$$\%_{explained} = 100 - \%_{err} = \left(1 - \frac{\sigma^2_{err}}{\sigma^2_{tot}}\right) \times 100,$$

and it represents the reference index used to estimate the performance of a given tree in the sections that follow.

Considering their attractiveness, CART and other classification tree algorithms have been widely applied in different fields of research (e.g. production [7], portfolio management [10], veterinary science [11], metallurgy [12], medicine [13]) but never, as far as the authors know, in reliability studies. In addition, the growth of industrial applications of classification tree analyses is now supported and favoured by the availability of commercial software tools. An example of such a tool is AnswerTree, the SPSS software module devoted to classification tree analyses. It offers four powerful algorithms that enable the analyst to choose the best-fit model for the data: CART [4], CHAID [5], Exhaustive CHAID [3], and QUEST [6]. Some of the other capabilities of AnswerTree are the following: it displays models visually; it supports nominal categorical, ordinal categorical and continuous variables; it allows three tree-generating modes: (a) automatic; (b) interactive; (c) production, which includes a scripting language that enables the application to be run in batch mode.
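The least squared deviation criterion and the explained-variance index described in this section can be illustrated with a short sketch. It is not the AnswerTree code: the function names are hypothetical, only a single continuous predictor is handled, and placing candidate thresholds at midpoints between consecutive distinct predictor values is an assumption.

```python
def variance(ys):
    """R^2(t): mean squared deviation from the node average."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def lsd_gain(parent, left, right):
    """F(t) = R^2(t) - p_L * R^2(t_L) - p_R * R^2(t_R)."""
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return variance(parent) - p_left * variance(left) - p_right * variance(right)


def best_threshold(xs, ys):
    """Return (gain, threshold) of the split maximising F(t), or None."""
    pairs = sorted(zip(xs, ys))
    all_y = [y for _, y in pairs]
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # identical predictor values cannot be separated
        # Assumption: candidate thresholds are midpoints between neighbours.
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        gain = lsd_gain(all_y, all_y[:k], all_y[k:])
        if best is None or gain > best[0]:
            best = (gain, threshold)
    return best


def explained_pct(total_var, error_var):
    """Percentage of variance explained: (1 - s2_err / s2_tot) * 100."""
    return (1 - error_var / total_var) * 100


# Made-up data: six pumps, one predictor (capacity) and a failure rate.
capacity = [1, 2, 3, 10, 11, 12]
rate = [1, 1, 1, 9, 9, 9]
gain, threshold = best_threshold(capacity, rate)
```

With these made-up values the best split falls between capacities 3 and 10, where both child nodes become completely pure, so the gain equals the variance of the parent node.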

4. Pump failure analysis

The training data set concerns a sample of 143 centrifugal pumps installed in the API refinery plant.

Table 1
Potential dependent and independent variables considered during pump failure analysis

Potential objective (dependent) variables:
- Failure rate λ (no. of failures per 10,000 operative hours)
- Total number of failures
- Number of failures due to fluid losses (P)
- Number of failures due to irregular working (FI)
- Number of failures due to vibrations (V)
- Number of mechanical failures (GM)
- Number of electrical failures (GE)
- Number of failures to start (AVV)

Potential predictor (independent) variables:
- Fluid density (kg/m³)
- Fluid type
- Plant type
- Pump head (m)
- Pump power (kW)
- Pump capacity (m³/h)
- Soot
- Fluid temperature (°C)
- Operative time (h)
- Type of seal
- Kinematic viscosity of the fluid (cSt)

These pumps have been monitored for 18 months, recording all failures that occurred, the operating time and the total down time of the facilities. Several operating conditions for each pump and fluid processed have been considered and collected, even if some data were unavailable (Table 1). Unfortunately, even though CART is able to manage variables with missing data, the quantity of unavailable data for some operating attributes (such as required NPSH and pump efficiency) was so large that these factors have not been evaluated as potential predictor variables in the classification tree analysis. An accurate analysis has been conducted to define the best classification tree. Note that, properly deployed, CART is not a black box. For example, the input variables and the dependent variables are comparable to the data set one would use in multiple linear regression or discriminant analysis. The variable choices and specifications should be causal and rational, just as with traditional statistical approaches. Another aspect is the possibility of pruning away some branches of the final tree. CART produces more robust results by generating what is called a maximal tree and then examining smaller trees obtained by pruning away branches of the maximal tree. The important point is that CART does not stop in the middle of the tree-growing process, because there might still be information to be discovered by drilling down several more levels. So, the trees are frequently grown larger than they need to be and, as a consequence, they must be selectively pruned back to improve the quality (i.e. the classification capacity) of the final decision diagram. For these reasons, several sets of variables have been tested and different pruned trees have been considered. Considering the high number of potentially interesting variables (see Table 1), we have decided to

develop a progressive strategy of analysis. In other words, we have developed a succession of trees characterised by the same objective variable and by a growing number of predictor variables. In this way we have been able to evaluate the impact of each new variable on the prediction capacity of the tree. Owing to the high number of tests executed, for the sake of brevity only the most interesting results are reported in the rest of the paper. With regard to the variables listed in Table 1, the following brief considerations are made. With the term 'type of plant' we indicate the part of the refinery plant (Fig. 1) where the centrifugal pump is employed (for example, visbreaking, topping, vacuum, thermal cracking, splitter, etc.). A critical element for the reliability of a pump is the presence of soot in the processed fluid. The soot consists of the solid carbon-based particles present in the fluid, which have a disruptive action on the pump seals. It is very hard to obtain data concerning the kinematic viscosity of the different fluids processed at different temperatures. Since these data are not available from the Company, we have decided to use the viscosity–temperature diagrams reported in the ASTM D341-93 standard [2]. The pump seals have been clustered into four different classes: single seal, dual seal, lip seal, tandem seal. During the different operations, the fluids processed by the refinery can change their chemical structure and physical properties considerably. Since it is impossible to be sure of the exact composition of each fluid processed by the plant, we have decided to classify the different fluids into categories such as: crude oil, kerosene, gasoline, diesel oil, water, thermal tar, turpentine, Lvgo, DEA, liquid propane, hydrocarbons, soda, distillation residuals, and hexane. It is evident that in this way we have introduced a certain level of approximation. This approximation is, in any case, tolerable and compatible with the type of model/analysis adopted. Finally, considering that

$$\text{Failure rate of a pump} = \frac{\text{Number of failures recorded}}{\text{Operative time}},$$

it is evident that the operating time will not be tested as a predictor variable when the failure rate is the dependent variable. As one can note, the variables reported in Table 1 are generally defined by continuous data. Only four of them (fluid type, plant type, soot, and type of seal) are nominal variables.

4.1. Experimental results

Considering the limited quantity of data analysed, in order to better distinguish the most critical cases, the following rules for deciding when to continue or stop splitting a node have been adopted:


- a maximum number of levels equal to 5 (root node not included);
- a minimum number of cases for each parent node equal to 3;
- a minimum number of cases for each child node equal to 1.

The selective pruning of the trees is obtained automatically from the full trees by adopting the 1-SE (one standard error) rule [1]. This choice has been made considering that this rule generally gives the best results.
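The 1-SE rule named above is commonly stated as follows: among the candidate sub-trees, choose the smallest one whose estimated error is within one standard error of the lowest estimated error. The sketch below illustrates only that selection step. The function name `one_se_choice` is hypothetical, the candidate values are made up for illustration, and how AnswerTree estimates the errors is not described in the source.

```python
def one_se_choice(candidates, multiplier=1.0):
    """Pick a sub-tree by the one-standard-error rule.

    candidates: list of (n_terminal_nodes, estimated_error, std_error).
    Returns the smallest tree whose error does not exceed the minimum
    error plus `multiplier` standard errors of that minimum.
    """
    best = min(candidates, key=lambda c: c[1])
    limit = best[1] + multiplier * best[2]
    eligible = [c for c in candidates if c[1] <= limit]
    return min(eligible, key=lambda c: c[0])


# Made-up sequence of nested sub-trees, from the full tree to a stump.
subtrees = [
    (21, 0.30, 0.05),
    (12, 0.33, 0.05),
    (5, 0.40, 0.06),
    (2, 0.60, 0.07),
]
chosen = one_se_choice(subtrees)
```

With a multiplier of 1.0 (the value listed in the tables of results) the 12-node tree is chosen, because its error lies within one standard error of the best error; a multiplier of 0 would keep the full tree.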
Table 2
Results of CART analysis

Objective variable                                   Failure rate  Number of failures  P            FI
Predictor variables (x = used, - = not used)
  Operative time                                     -             x                   x            x
  Plant                                              x             x                   x            -
  Seals                                              x             x                   x            x
  Fluid                                              x             x                   x            -
  Soot                                               x             x                   x            x
  Temperature                                        x             x                   x            x
  Kinematic viscosity                                x             x                   x            x
  Pump head                                          x             x                   x            x
  Pump capacity                                      x             x                   x            x
  Fluid density                                      x             x                   x            x
  Pump power                                         x             x                   x            x
Final tree
  Total number of nodes                              41            41                  41           31
  Total number of levels                             5             5                   5            5
  Total number of final nodes                        21            21                  21           16
  Total variance (root node)                         74.4132       5.44056             2.6503       0.0908406
  Proportion of variance due to error (%)            22.8          21.6                17.1         3.8
  Proportion of variance explained by the model (%)  77.25         78.4                82.9         96.2
  Predictor variable splitting root node             Plant         Plant               Temperature  Soot
Pruned tree
  Total number of nodes                              9             23                  31           29
  Total number of levels                             4             5                   5            5
  Total number of final nodes                        5             12                  16           15
  Total variance (root node)                         74.4132       5.44056             2.6503       0.908406
  Proportion of variance due to error (%)            33.0          26.0                19.4         4.5
  Proportion of variance explained by the model (%)  67.0          74.0                80.6         95.5

Characteristics of the algorithm (identical for all four objective variables): impurity measure, LSD; stop criteria: maximum number of levels, 5; minimum number of cases for parent node, 3; minimum number of cases for child node, 1; minimum change of impurity level, 0.0001; pruning rules: sub-tree selection, 1-SE rule; multiplier, 1.0 standard error.

Table 3
Results of CART analysis

Objective variable                                   V           GM          GE          AVV
Predictor variables (x = used, - = not used)
  Operative time                                     x           x           x           x
  Plant                                              x           x           -           -
  Seals                                              x           x           x           x
  Fluid                                              x           x           x           -
  Soot                                               x           x           x           x
  Temperature                                        x           x           x           x
  Kinematic viscosity                                x           x           x           x
  Pump head                                          x           x           x           x
  Pump capacity                                      x           x           x           x
  Fluid density                                      x           x           x           x
  Pump power                                         x           x           x           x
Final tree
  Total number of nodes                              45          29          31          23
  Total number of levels                             5           5           5           5
  Total number of final nodes                        23          15          16          12
  Total variance (root node)                         0.289696    0.181623    0.21869     0.230623
  Proportion of variance due to error (%)            14.6        24.3        25.45       18.5
  Proportion of variance explained by the model (%)  85.4        75.7        75.55       81.5
  Predictor variable splitting root node             Plant       Plant       Fluid       Pump head
Pruned tree
  Total number of nodes                              39          23          21          17
  Total number of levels                             5           5           5           5
  Total number of final nodes                        20          12          11          9
  Total variance (root node)                         0.289696    0.181623    0.21869     0.230623
  Proportion of variance due to error (%)            18.0        30.0        29.96       20.9
  Proportion of variance explained by the model (%)  82.0        70.0        70.04       79.1

Characteristics of the algorithm (identical for all four objective variables): impurity measure, LSD; stop criteria: maximum number of levels, 5; minimum number of cases for parent node, 3; minimum number of cases for child node, 1; minimum change of impurity level, 0.0001; pruning rules: sub-tree selection, 1-SE rule; multiplier, 1.0 standard error.

Tables 2 and 3 provide the CART experimental results for the eight different objective variables reported in Table 1. The results refer only to those trees characterised by the best values of explained variance. As one can see, all trees show a high value of explained variance (over 70%). These results confirm the soundness of the choice of predictor variables, considering in particular the rather small size of the examined sample. The residual variance can in fact be partially ascribed to the size and the quality of the data of the analysed sample (e.g. possible inaccuracies in data recording, incomplete data, etc.). It is also interesting to note how the most important predictor variable, the one used to split the root node, changes as a function of the type of failure considered. The analyses performed with the AnswerTree software [1], whose findings are briefly shown in the previous tables, make it possible to draw the following general considerations:

(a) as expected, CART trees are characterised by a rather large number of nodes, and the same predictor is used as splitting variable more than once during the tree growth process. These variables are distinguished by several categorical values. In particular, this is the case for the Fluid type and Plant type variables;
(b) the explained variance becomes larger as the number of predictor variables increases. Moreover, categorical variables such as Fluid type and Plant type provide a small enhancement (only a few percentage points) of the final explained variance;
(c) the pruned CART trees report worse but comparable values of explained variance. However, they show a simplified decision structure with a reduced number of nodes, and they are easier to read and to use.


Starting from the considerations previously reported (e.g. try to avoid the use of some categorical predictor variables) and with the aim to obtaining a good trade-off between accuracy of the classication tree and its readability (i.e. dimension), it is possible to dene different decision trees. As an example, Table 4 reports the main features of a different (pruned) tree developed for the failure rate objective function. With a reduced loss of explained variance with respect to the corresponding full tree (about 3%), it is possible to obtain a pruned tree with only 19 nodes compared with the original 35 nodes, with a dramatic reduction of the tree complexity. This is obtained by eliminating the several
Table 4 Results of CART analysis Objective variable Name Predictor variables

redundant splits giving only a small increase in the explained variance. In addition, it is relevant to note that the (critical) categorical predictor variables Type of plant and Type of uid are not included in the tree. Fig. 2 shows the structure of the pruned tree. The information contained in this last tree, can be used as reference to generate decision rules that are useful for maintenance operations. In Table 5, three examples of these rules obtained by using AnswerTree software are shown. They concern the rst three critical nodes of the pruned tree, i.e. the nodes characterised by the pumps with the higher failure rates. Similar results can be also obtained for the other seven objective variables reported in Table 1.

Characteristics of the algorithm
  Impurity measure                                        LSD
  Stop criteria
    Max number of levels                                  5
    Minimum number of cases for parent node               3
    Minimum number of cases for child node                1
    Minimum change of impurity level                      0.0001
  Pruning rules
    Sub-tree selection                                    1-SE Rule
    Multiplier                                            1.0 standard error

Final tree
  Dimensions
    Total number of nodes                                 35
    Total number of levels                                5
    Total number of final nodes                           18
  Risk analysis (Performance)
    Total variance (root node)                            74.4132
    Proportion of variance due to error (%)               26.7
    Proportion of variance explained by the model (%)     73.3
  Predictor variable splitting root node                  Pump capacity

Pruned tree
  Dimensions
    Total number of nodes                                 19
    Total number of levels                                5
    Total number of final nodes                           10
  Risk analysis (Performance)
    Total variance (root node)                            74.4132
    Proportion of variance due to error (%)               29.7
    Proportion of variance explained by the model (%)     70.3

Fig. 2. Structure of the pruned tree for failure rate analysis. (Labels appearing in the diagram: Failure rate, Seals, Soot, Temperature, Kinematic viscosity, Pump head, Pump capacity, Fluid density, Pump power.)

Table 5
The most important decision rules for the pruned tree described in Table 4

Node 17
  IF (Capacity [m³/h] is missing OR (Capacity [m³/h] > 5.25))
  AND (Seals = T OR Seals = D OR Seals = SL)
  AND (Capacity [m³/h] is missing OR (Capacity [m³/h] ≤ 228))
  AND (Power [kW] not missing AND (Power [kW] > 34.85))
  AND (Head [m] not missing AND (Head [m] ≤ 80))
  THEN Node = 17, Estimate = 50

Node 3
  IF (Capacity [m³/h] not missing AND (Capacity [m³/h] ≤ 5.25))
  AND (Temperature [°C] not missing AND (Temperature [°C] ≤ 39))
  THEN Node = 3, Estimate = 46.615

Node 15
  IF (Capacity [m³/h] is missing OR (Capacity [m³/h] > 5.25))
  AND (Seals = S)
  AND (Temperature [°C] not missing AND (Temperature [°C] > 186.5))
  AND (Kinematic viscosity [cSt] not missing AND (Kinematic viscosity [cSt] > 0.7532))
  AND (Temperature [°C] not missing AND (Temperature [°C] ≤ 316))
  THEN Node = 15, Estimate = 28.853
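As an illustration of how the Table 5 rules could be applied to a single pump record, the sketch below encodes the three rules as a Python function. It assumes that the comparison symbols in the rules read as > and ≤, that the seal codes are compared by equality, and that a missing measurement is represented by `None`; the function and parameter names are hypothetical and are not part of AnswerTree.

```python
from typing import Optional, Tuple


def _missing_or(value, test):
    """True when the value is missing (None) or passes the test."""
    return value is None or test(value)


def _present_and(value, test):
    """True only when the value is present and passes the test."""
    return value is not None and test(value)


def match_table5_rule(capacity=None, power=None, head=None,
                      temperature=None, viscosity=None,
                      seals=None) -> Optional[Tuple[int, float]]:
    """Return (node, estimated failure rate) for the three rules of Table 5.

    Units follow the table: capacity in m3/h, power in kW, head in m,
    temperature in deg C, kinematic viscosity in cSt.
    Returns None when the pump falls in a node not listed in Table 5.
    """
    # Node 17: mid-range capacity, seal type T, D or SL, high power, low head
    if (_missing_or(capacity, lambda c: c > 5.25)
            and seals in {"T", "D", "SL"}
            and _missing_or(capacity, lambda c: c <= 228)
            and _present_and(power, lambda p: p > 34.85)
            and _present_and(head, lambda h: h <= 80)):
        return 17, 50.0
    # Node 3: small capacity and low fluid temperature
    if (_present_and(capacity, lambda c: c <= 5.25)
            and _present_and(temperature, lambda t: t <= 39)):
        return 3, 46.615
    # Node 15: seal type S, hot fluid (186.5 < T <= 316), viscosity above 0.7532
    if (_missing_or(capacity, lambda c: c > 5.25)
            and seals == "S"
            and _present_and(temperature, lambda t: 186.5 < t <= 316)
            and _present_and(viscosity, lambda v: v > 0.7532)):
        return 15, 28.853
    return None


print(match_table5_rule(capacity=100, power=40, head=60, seals="T"))
```

Because the rules describe terminal nodes of the same tree they are mutually exclusive, so the order in which they are checked does not affect the result.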


5. Conclusions

The decision tree generated by the CART technique gives the manager a convenient way to extract rules for determining the expected failure rates of facilities as a function of their operating conditions. In the application example presented, a complex oil refinery plant was analysed and a total of 143 centrifugal pumps were considered. The decision tree obtained, based on an extended record of failure and operating condition information, provides a manager with a practical and useful methodology. The analyses showed that the CART approach consistently selected important prognostic variables and effectively segregated pumps into groups with similar failure conditions. CART also identified previously unappreciated pump subsets; it is a useful method for dissecting complex failure situations and identifying homogeneous pump populations for future maintenance actions.

From the decision diagrams obtained it is possible to recognise the most critical pumps as a function of the objective variable considered. Such pumps can then be subjected to constant monitoring with the aim of improving efficiency and reducing the failure rate.

The decision trees also reveal the hierarchy of the predictor (independent) variables. In general, the variable selected to split the root node is of primary importance and should be carefully taken into account when maintenance policies are defined. For example, if the predictor of a critical branch (e.g. one characterised by pumps with a high failure rate) is the high temperature of the fluid processed, the economic suitability and technical feasibility of investing in a heat exchanger to lower the temperature upstream of the pump should be evaluated. If the critical situation is due to the high capacity required of the pumps, a solution could be based on a parallel configuration of two or more pumps. If the critical predictor variable is the pump head, a system based on pumps in series is probably the most appropriate solution to reduce the reliability problems. Clearly, changes to the characteristics of the fluids processed, even where these are critical, are less practicable than a plant solution.

References
[1] AnswerTree 2.0, User Guide. Ireland: SPSS Inc; 1988.
[2] ASTM D341-93 Norm. Viscosity temperature charts for liquid petroleum products; 1998.
[3] Biggs DB, de Ville B, Suen E. A method of choosing multiway partitions for classification and decision trees. J Appl Stat 1991;18:49–62.
[4] Breiman L, Friedman J, Olshen R, Stone CJ. Classification and regression trees. Belmont, CA: Wadsworth; 1984.
[5] Kass G. An exploratory technique for investigating large quantities of categorical data. Appl Stat 1980;29(2):119–27.
[6] Loh WY, Shih YS. Split selection methods for classification trees. Stat Sinica 1997;7:815–40.
[7] Markham IS, Mathieu RG, Wray BA. A rule induction approach for determining the number of kanbans in a just-in-time production system. Comput Ind Engng 1998;34(4):717–27.
[8] Mingers J. Rule induction with statistical data: a comparison with multiple regression. J Oper Res Soc 1987;38(4):347–51.
[9] O'Connor PDT. Practical reliability engineering. New York: Wiley; 1991.
[10] Sorensen EH, Miller KL, Ooi CK. The decision tree approach to stock selection. J Portfolio Mgmt 2000;27(1):42–52.
[11] Stark KDC, Pfeiffer DU. The application of non-parametric techniques to solve classification problems in complex data sets in veterinary epidemiology - an example. Intell Data Anal 1999;3:23–35.
[12] Voracek J. Prediction of mechanical properties of cast irons. Appl Soft Comput 2001;1(2):119–25.
[13] Zhang H, Yu CY, Singer B, Xiong M. Recursive partitioning for tumor classification with gene expression microarray data. Proc Natl Acad Sci USA 2001;98(12):6730–5.
