


Control Eng. Practice, Vol. 5, No. 10, pp. 1373-1384, 1997. Copyright © 1997 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0967-0661/97 $17.00 + 0.00

PII: S0967-0661(97)00134-2

CLASSIFYING PILOT-PLANT DISTILLATION COLUMN FAULTS USING NEURAL NETWORKS


D.A. Brydon*, J.J. Cilliers* and M.J. Willis**

*Department of Chemical Engineering, University of Manchester Institute of Science and Technology (UMIST), P.O. Box 88, Manchester, M60 1QD, UK
**Department of Chemical and Process Engineering, University of Newcastle upon Tyne, Merz Court, Newcastle upon Tyne, NE1 7RU, UK

(Received November 1996; in final form July 1997)

Abstract: An approach to fault detection is described which uses neural-network pattern classifiers trained using data from a rigorous differential-equation-based simulation of a pilot-plant column. Two case studies are presented, both considering only plant data. For two classes of process data, a neural network and a K-Means classifier both produced excellent diagnoses. Extending the study to include three additional classes of plant operation, a neural network again gave accurate classifications, while a K-Means classifier failed to correctly categorise the data. Principal components analysis is used to visualise data clusters. The robustness of the neural networks was found to be generally good.
Copyright 1997 Elsevier Science Ltd

Keywords: Fault detection and diagnosis, neural networks, pattern classification, distillation columns.

1. INTRODUCTION

The potential speed, accuracy and inherent nonlinearity of the feedforward artificial neural network (FANN) technique are very well suited to interpreting sensor information from complex chemical processes. Unlike expert systems, which use deductive analysis to produce an interpretation of system inputs, FANNs use inductive reasoning, i.e. a general inference is made based on a set of known past instances (Leonard and Kramer, 1991). The majority of published research into fault and pattern classification in the process industries uses simulated plant, since it is generally undesirable and inconvenient to introduce faults into a real process. In this contribution, the behaviour of a pilot-plant distillation column is classified using neural networks

and the K-Means clustering algorithm. The data used to develop each model are derived solely from a differential-equation-based model of the column. The significance of employing a simulator in this manner is two-fold. Firstly, plant fault data are typically sparse and insufficient for the development of a satisfactory classifier, and hence the use of simulated measurements enables the available plant data to be reserved for testing. This is particularly important when a process shift occurs which requires rapid modification of the existing classifier. If insufficient archive plant data are available, then the classifier cannot be modified until a suitable number of samples have been logged from the process, and hence may perform sub-optimally for the new operating point during the logging period.


Interestingly, for such a large network, only 200 samples were used for training. Nevertheless, accurate classification is reported for both the training and test phases. Hoskins et al. (1991) also briefly discuss the idea of using a state-space model in conjunction with FANNs. Venkatasubramanian and co-workers have examined in detail many aspects of FANN-based techniques of fault classification. In particular, it was reported that a FANN trained to recognise faults originating from a single cause could generalise well when presented with faults arising from multiple sources (Venkatasubramanian and Chan, 1989). Similar results were presented for a simulated reactor-distillation column case study. Both Kavuri and Venkatasubramanian (1992, 1994) and Hoskins and Himmelblau (1988) use simulated plant and report very good fault diagnosis. Patton et al. (1995) present a succinct review of several techniques of fault detection and diagnosis, such as residual analysis, pattern recognition, parameter estimation and fuzzy logic. Although Leonard and Kramer (1990, 1991) did not specifically consider the application of neural-network techniques to the detection of faults in a chemical process, their contribution to the literature is significant in many respects. The general conclusion of Leonard and Kramer (1990) was that distance-based classification techniques such as the nearest-neighbour method tend to exhibit robustness properties that are superior to those of sigmoidal neural networks trained using back-propagation. This was based on data, extracted from a very simple simulated plant, that produced clusters in the input space that were quite symmetrical. Such clusters are not frequently observed in plant data (Kavuri and Venkatasubramanian, 1994), and the orientation of the clusters tends to be much more random than in the data used by Leonard and Kramer (1990, 1991).

Measurements of new conditions can be acquired from the simulator at a much greater rate than they are available from the plant; hence a classifier may be modified using these synthetic samples while testing is performed using process data as soon as they become available. Adopting this methodology enables a classification model to be modified, tested and redeployed in the period in which measurements are collected for a strategy based solely upon plant data. The second advantage of using a rigorous differential-equation-based model is the ability to estimate unmeasured or unmeasurable variables. For example, the model employed here provides estimates of quantities such as tray efficiencies and residence times, both of which may be useful in establishing the existence of internal faults before detection would be possible using only measurements such as temperature or flow. The main objective of this contribution is to demonstrate the feasibility of using simulated data to develop simple neural-network models that are capable of accurately classifying the behaviour of a pilot-plant column. Two cases are examined here, the first of which considers a simple example of pattern recognition in which neural network and K-Means classifiers are trained using simulated data to distinguish between two easily separable process conditions. The aim is to illustrate the feasibility of using simulated data to develop a classifier capable of correctly classifying process data. Secondly, a more complex classification task is considered, in which the models are trained to detect real plant faults. The differences between the measurement patterns for these faults are more subtle than in the first case, thus providing a sterner test of the discriminatory abilities of the candidate classifiers.
Robustness to sensor faults and biases is also addressed, since such conditions frequently occur in plant environments, and a classifier must perform satisfactorily under these conditions to be considered viable.

2. PATTERN CLASSIFICATION AS APPLIED TO CHEMICAL PROCESS PROBLEMS

Sorsa and Koivo (1993) compare the abilities of four neural-network architectures to classify faulty behaviour in a simulated heat exchanger-reactor system, concluding that back-propagation-trained networks and radial basis function networks give a similar classification performance. Hoskins et al. (1991) discuss the application of a single network to a simulation of a complex plant. A very large FANN was used, comprising 418 inputs, 20 outputs and two hidden layers, one fixed at 40 nodes and the other varying between 18 and 30 nodes.

Further, Leonard and Kramer (1990, 1991) discuss in detail the propensity of sigmoidal neural networks to generate poor classifications without providing a similar treatment of the limitations of distance-based methods. These methods can also produce erroneous results if classes of data overlap or are asymmetrical. Although the technique proposed by Kavuri and Venkatasubramanian (1994) addresses many of the limitations of sigmoidal networks described by Leonard and Kramer (1990, 1991), such approaches require a significant amount of time for data pre-processing. In addition, the technique necessitates a substantial degree of expertise in the statistical methods used, and is hence not immediately transparent to plant personnel unfamiliar with the required theory. Thus, a further aim of this work is to demonstrate the development of a methodology constructed from simple classification models that is readily accessible, robust and easily adaptable.

3. EXPERIMENTAL RIG

The pilot-plant distillation column used in this work (shown in Figure 1) is approximately 7.5 m high and 0.3 m wide, and separates a binary mixture of methanol and isopropanol using 30 sieve trays. Twenty-four variables can be measured, including the top and bottom product temperatures, in addition to those on six intermediate trays. The reflux, feed, steam, product and cooling-water flows are all measurable, and an approximate pressure profile can be established using differential-pressure measurements from the top, bottom and feed plates. The availability of inlet and outlet cooling-water temperatures enables the performance of the overhead condenser to be monitored. A small amount of butanol (0.5-1.0 mol%) is also present in the feed. The L-V control structure is used, in which reflux (L) controls the top product purity and steam (essentially vapour flow, V) controls the bottom product composition. Further details can be found in Brydon (1995).


4. DIFFERENTIAL-EQUATION-BASED SIMULATION OF THE PILOT PLANT

The simulated data are derived from a differential-equation-based model of the pilot-plant column. The model, which is based on those presented by Grassi (1992) and Luyben (1990) and has been validated using steady-state and dynamic plant data, includes dynamic calculations of the column tray efficiencies via the AIChE method and rigorous pressure-drop calculations. The equations representing the column states are given in the appendix. The second-order Runge-Kutta method is used to integrate these equations, and a procedure has been developed to enable estimates of the initial conditions to be generated on-line. These conditions are then used as the basis for on-line simulations. In this work the model is used solely off-line to generate synthetic training data for the neural network and K-Means models. The model can simulate one minute of plant time in approximately 6.6 seconds, and can hence generate simulated measurements at a rate appreciably quicker than that at which readings can be acquired from the plant.

5. CLASSIFIER DEVELOPMENT

The basic structure of feedforward neural networks and the training methods available have been discussed extensively in the literature (e.g. Bremermann and Anderson, 1989; Werbos, 1990; Lippmann, 1987) and will not be presented here.
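A second-order (midpoint) Runge-Kutta step of the kind used to integrate the column equations can be sketched as follows. This is a minimal illustration, not the authors' simulator: the toy derivative function (two states relaxing to a set-point) stands in for the actual column state equations.

```python
import numpy as np

def rk2_step(f, t, x, h):
    """One second-order Runge-Kutta (midpoint) step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + 0.5 * h, x + 0.5 * h * k1)
    return x + h * k2

# Toy derivative standing in for the column equations: each state relaxes
# towards a set-point with time constant tau, dx/dt = -(x - x_ss) / tau.
def column_rhs(t, x):
    tau, x_ss = 5.0, np.array([1.0, 2.0])
    return -(x - x_ss) / tau

h, x = 0.01, np.zeros(2)
for step in range(2000):            # integrate 20 time units
    x = rk2_step(column_rhs, step * h, x, h)
print(x)                            # both states close to the set-point
```

With a fixed step the midpoint method gives second-order accuracy at two derivative evaluations per step, a reasonable trade-off for a stiff-free column model run faster than real time.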

Fig. 1. Schematic of the pilot plant column


Here, back-propagation is used to train the neural networks, the hidden and output nodes of which consist of sigmoidal functions. The partitions described above are created by hyperplanes, the orientation of which is determined by the weights of the network. Poorly positioned decision boundaries lead to incorrect classifications, as discussed in detail by Leonard and Kramer (1990, 1991). Distance-based classification algorithms, however, partition data by placing samples into the nearest cluster. Proximity is typically defined by the Euclidean distance, and the position of a cluster by the average of the samples it contains. This type of classification is more intuitive and has been shown to be more robust than neural networks trained using back-propagation (Leonard and Kramer, 1991). Here, the K-Means algorithm is used as an alternative to the neural network, as it is a computationally simple algorithm that may be incorporated into more complex techniques such as radial basis function (RBF) networks. Testing the classifications on plant measurements indicates the extent to which the decision regions formed by the network (from the synthetic training data) are applicable to the process. Training and testing are carried out sequentially, with a single iteration consisting of one pass through the training data followed by a single pass through the test data. The parameters of the network are adapted using only the training data and are held constant during the test phase. This procedure is frequently termed "cross-validation" and is used to avoid overfitting (Leonard and Kramer, 1991), which occurs when the network approximates the classifications of the training data more closely than the overall probability distribution of the whole class. Overfitting is signified by a continually increasing error on the test data.
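As a concrete illustration of the distance-based alternative, the following minimal sketch (not the authors' code; the two-dimensional data and deterministic initialisation are assumptions for the example) fits a plain K-Means model to synthetic "simulated" measurements and then classifies new "plant" samples by their nearest saved centre:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "simulated" training data: two well-separated operating modes
# in a two-dimensional measurement space (stand-ins for the real inputs).
class1 = rng.normal([0.0, 0.0], 0.3, size=(50, 2))
class2 = rng.normal([3.0, 3.0], 0.3, size=(50, 2))
X = np.vstack([class1, class2])

def kmeans(X, centres, iters=20):
    """Plain K-Means: assign each sample to the nearest centre, then move
    each centre to the mean of the samples assigned to it."""
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centres = np.array([X[labels == j].mean(axis=0)
                            for j in range(len(centres))])
    return centres, labels

# Deterministic initialisation: one seed point from each end of the data.
centres, _ = kmeans(X, X[[0, -1]].astype(float))

# "Plant" test samples are classified by the nearest saved centre,
# analogous to saving the weights of a trained network.
test = np.array([[0.1, -0.2], [2.8, 3.1]])
d = np.linalg.norm(test[:, None, :] - centres[None, :, :], axis=2)
preds = d.argmin(axis=1)
print(preds)
```

Saving the converged centres and re-using them on unseen data mirrors the train-then-test procedure described in the text.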
5.1 Case Study 1

Fig. 2. A feedforward artificial neural network

Consider a neural network with n inputs, a single hidden layer of m nodes and j output nodes (i.e., an n-m-j network), as depicted in Figure 2. The inputs to the m hidden nodes form so-called "decision boundaries" that partition the input space, thus forming classification areas known as "decision regions". The outputs of a pattern classifier indicate the class to which the input data belong. Here, two neural-network classifiers are developed, the first of which separates two classes of data, and the second an additional three classes. A single output node is frequently used when two classes of data are to be separated (Leonard and Kramer, 1991), where the desired nodal output is usually unity when the input data represent a fault and zero if the process is operating normally. Two outputs may also be specified, in which case training proceeds by specifying the first output node to produce a value of 1 when the inputs belong to Class 1 and zero when the inputs fall into the second category. Similarly, the second output node produces a value of 1 for inputs from Class 2 and zero when presented with measurements from Class 1. Two output nodes are specified in this work to enable a direct comparison to be made with a K-Means cluster model, which must use two clusters to classify two classes of data. Similarly, five outputs are specified for the second case study, each node producing a high output when the inputs represent the class associated with that node, and a low output otherwise. The association of output node j with class j is common in the literature, and is employed in all the aforementioned references. In practice, the target outputs are usually set slightly above zero and slightly below 1. This is because the sigmoid transfer function only approaches unity and zero as its argument tends to ±∞, implying that infinitely large weights would be required to attain these values exactly (Sorsa and Koivo, 1993).
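The output coding described above, one node per class with targets pulled slightly in from 0 and 1, can be sketched as follows. The 0.1/0.9 target values are an assumed choice for illustration; the paper does not state the exact figures used.

```python
import numpy as np

def soft_targets(labels, n_classes, lo=0.1, hi=0.9):
    """One output node per class; the active node's target sits slightly
    below 1 and the rest slightly above 0, so the sigmoid can reach them
    with finite weights."""
    T = np.full((len(labels), n_classes), lo)
    T[np.arange(len(labels)), labels] = hi
    return T

# Five-class example as in the second case study: samples from
# Classes 1, 3 and 5 (zero-indexed here as 0, 2 and 4).
T = soft_targets(np.array([0, 2, 4]), 5)
print(T)
```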

The proposed neural-network and K-Means fault classifiers will be trained to discriminate between two modes of operation. The first mode (termed Phase 1) is defined by the steady state obtained using the steady-state conditions shown in Table 1, while the second (Phase 2) is considered to be a step-change from 50wt% to 70wt% in the methanol concentration in the feed from this steady condition. Both conditions are simulated using the mechanistic model. Although the selection of the most significant variables to use as classifier inputs is an important task (Leonard and Kramer, 1990), here an intuitive choice of inputs is adequate, since the aim is solely to demonstrate the feasibility of the approach rather than to present a detailed analysis of the model structure. Furthermore, the causal variable of the change in operating point, in this case the feed composition, is not specified as an input. Hence, the classifier must infer the presence of the fault using the inputs given in Table 2.

Table 1 Nominal operating point for the column


Feed flow rate (litres/hour): 60
Top control temperature setpoint (°C): 68
Bottom control temperature setpoint (°C): 80
Top product purity (mol% methanol): 99.9
Bottom product purity (mol% isopropanol): 97.7


The dynamic responses of these inputs vary, with the closed-loop time constant of a typical uncontrolled temperature response (such as those on trays 23, 21 and 15) being appreciably quicker than that of the flows. For both classes, fifty samples (with a sample period of one minute) of each input were extracted from the simulation to form the training data. An identical amount of process data was used to compile a propagation set. Note that the measurements representing Phase 2 were taken from the new steady state resulting from the increased methanol concentration in the feed. Thus, the final neural-network fault classifier (NNFC) input set for the first case consists of 200 samples of each variable, of which the first one hundred comprised the simulated training data and the remainder the process propagation data. Using back-propagation with a learning rate of 0.3 and a momentum term of 0.5 to train networks with hidden layers ranging in size from 1 to 10 nodes, a minimum sum-of-squared error on the test set was achieved using 9 hidden nodes after 80 iterations (Figure 3). All the simulated and process data from both classes were classified correctly. The clusters within the data may be visualised using Principal Components Analysis (PCA), an approach which was adopted by Sorsa and Koivo (1993). A detailed description of PCA is beyond the scope of this work; a concise treatment of the technique is provided by Johnson and Wichern (1988).
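The training procedure described above can be sketched as a batch back-propagation loop on a 2-9-2 sigmoidal network with a learning rate of 0.3 and a momentum of 0.5, with a test pass after every training pass. The two-dimensional toy data, the 0.1/0.9 soft targets and the iteration count are assumptions for illustration; they do not reproduce the column data or the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy stand-in for the column data: two separable classes in two
# dimensions, two output nodes with soft targets (0.1 / 0.9); half of
# the samples are used for training, half for testing.
X = np.vstack([rng.normal(-1, 0.4, (50, 2)), rng.normal(1, 0.4, (50, 2))])
T = np.full((100, 2), 0.1)
T[:50, 0] = 0.9
T[50:, 1] = 0.9
idx = rng.permutation(100)
Xtr, Ttr = X[idx[:50]], T[idx[:50]]
Xte, Tte = X[idx[50:]], T[idx[50:]]

n_in, n_hid, n_out = 2, 9, 2
W1 = rng.normal(0, 0.5, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.5, (n_hid, n_out)); b2 = np.zeros(n_out)
vW1, vb1 = np.zeros_like(W1), np.zeros_like(b1)
vW2, vb2 = np.zeros_like(W2), np.zeros_like(b2)
lr, mom = 0.3, 0.5

for it in range(500):
    # one pass through the training data: batch gradient step with momentum
    H = sig(Xtr @ W1 + b1)
    Y = sig(H @ W2 + b2)
    dY = (Y - Ttr) * Y * (1 - Y)            # SSE gradient through sigmoid
    dH = (dY @ W2.T) * H * (1 - H)
    vW2 = mom * vW2 - lr * (H.T @ dY) / len(Xtr); W2 += vW2
    vb2 = mom * vb2 - lr * dY.mean(0);            b2 += vb2
    vW1 = mom * vW1 - lr * (Xtr.T @ dH) / len(Xtr); W1 += vW1
    vb1 = mom * vb1 - lr * dH.mean(0);              b1 += vb1
    # one pass through the test data, parameters held constant
    Yte = sig(sig(Xte @ W1 + b1) @ W2 + b2)
    test_sse = ((Yte - Tte) ** 2).sum()

acc = (Yte.argmax(1) == Tte.argmax(1)).mean()
print(acc, round(test_sse, 3))
```

Monitoring `test_sse` after each training pass is the cross-validation check described in the text: a sustained rise in the test error would signal overfitting.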

Fig. 3. Training and test errors for the two-output neural network

Briefly, PCA reduces the dimension of the data to a number of principal components which explain different proportions of the variance within the original data. Typically, the first two to three components account for the majority of the variation, and may be used to represent the data in a lower dimension. The data were normalised by subtracting the means and dividing by the standard deviations prior to performing a PCA, the results of which are shown in Figure 4. Note that the circular and elliptical boundaries in Figure 4 are used purely to aid visualisation of the clusters, and do not represent the true class boundaries or neural-network decision regions. It was found that the first two principal components accounted for 70% of the total variance within the original data. The two classes appear quite widely separated, indicating that 9 hidden nodes may lead to spuriously classified regions. Further, the simulated training data for both classes form relatively compact clusters in the upper quadrants of the plot, while the plant measurements form more diffuse clusters in the opposing quadrants. The first principal components for both cluster pairs are of the same order, while the second principal components for the plant data are negative. This implies that the level of variance accounted for by the second principal components of the simulated data is different from that of the plant data. This may be confirmed by an analysis of the eigenvalues of the original data matrix, although such an examination is beyond the scope of the current work. A K-Means clustering model, with two clusters, was also trained. The simulated data were correctly categorised very quickly, with training requiring 2 iterations. The co-ordinates of the cluster centres were saved (analogous to saving the weights of a neural network), and were found to classify all process test data correctly.
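The normalisation and projection steps just described can be sketched as follows. The toy data matrix (three latent factors mixed into eleven correlated variables) is an assumption for the example, not the column measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data matrix standing in for the measurements: 100 samples of
# 11 correlated variables driven by 3 underlying factors plus noise.
factors = rng.normal(size=(100, 3))
X = factors @ rng.normal(size=(3, 11)) + rng.normal(0, 0.1, (100, 11))

# Normalise each variable to zero mean and unit standard deviation,
# then obtain the principal components from the SVD of the scaled data.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = Xs @ Vt[:2].T                       # first two principal components
explained = (s[:2] ** 2).sum() / (s ** 2).sum()
print(scores.shape, round(float(explained), 2))
```

Plotting the two columns of `scores` against each other gives exactly the kind of cluster visualisation shown in Figure 4, with `explained` reporting the fraction of total variance the plot retains.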

Table 2 Neural-network inputs

Input 1: Top temp.        Input 7: Temp. tray 25
Input 2: Bottom temp.     Input 8: Temp. tray 5
Input 3: Feed             Input 9: Temp. tray 23
Input 4: Reflux           Input 10: Temp. tray 21
Input 5: Bottom           Input 11: Temp. tray 15
Input 6: Distillate



Thus, fifty samples of test and training data were specified for Class 4, while 40 samples of training data were generated for Class 3 and 35 for Class 5. This results in an input set consisting of 420 samples of data. Networks were trained using back-propagation (with the same learning rate and momentum as in Case 1) with a single hidden layer ranging in size from 25 to 35 hidden nodes. A hidden layer of 27 nodes produced the lowest error on the test data, as can be seen from Figure 5. It was noted that hidden layers containing 24 hidden nodes or fewer could not correctly classify the fifth class of data, i.e. the level-sensor fault. The training time for the network containing 27 hidden nodes was substantially greater than for the previous case. During training it was observed that the network categorised the data from the steam-valve fault, the isopropanol increase and the original steady state relatively quickly. The differentiation of the methanol-increase samples from those representing the level-sensor fault (Classes 2 and 5) constituted a significant proportion of the training period, indicating that the characteristics of these two classes are quite similar. A PCA was performed on these data in the same manner as for Case 1, as shown in Figure 6. Note that the first two components explain 73% of the variance in the original data. The training and test data for Class 3 are reasonably close, as is the case for Class 4. The training data cluster for Class 1, however, is close to that for Class 2, while the corresponding test data sets are quite distant. The area between the training and test data for these two classes is occupied by both sets of Class 5 data, which clearly overlap Class 2. This explains the slow training and the large number of nodes required in the hidden layer, since quite convoluted decision boundaries are clearly required to separate Classes 2 and 5.
The presence of such intricate partitions in the input space also implies that the network may lack robustness to shifts in the operating point.


Fig. 4. PCA of two-class data

The rapid training times and high classification accuracies indicate that the data are easy to separate. This can be attributed to the step-change in the feed composition, which causes a large characteristic perturbation to propagate through the column, resulting in a new steady state that is quite distinct from the original operating point. When the aim is to detect faults, however, the differences between the measurement patterns for each class may not be as pronounced as in this case, and a finer level of discrimination is required. The following case study describes such a situation.

5.2 Case Study 2

For the second example, the data set used in Case 1 was augmented with simulated measurements of a further three process conditions, namely a step in the isopropanol feed concentration (of the same magnitude as for the methanol step-change), plus faults with the steam flow control valve and the reflux drum level transmitter. These classes are summarised in Table 3. Although fifty samples of process data were included for the isopropanol change, only 35 samples were available for the level sensor fault and 10 for the steam valve case. Simulated data of the last two conditions are clearly required since the existing process measurements are insufficient for both training and testing.

Table 3 Plant fault classes

Class 1: Normal steady-state
Class 2: Excess methanol in feed
Class 3: Faulty steam control valve
Class 4: Excess isopropanol in feed
Class 5: Faulty top level transmitter

Fig. 5. Training and test errors for five-class data




6.1 Determination of the base performance of the classifier


Fig. 6. PCA of five-class data

The decision boundaries will clearly be very close to the edges of both Class 2 and Class 5; hence minor adjustments in, say, the temperature profile in the column could result in Class 2 data erroneously appearing in a region defined by the network as Class 5. The following study evaluates the performance of this network when the sensors supplying the input data are assumed to be faulty or biased.

The performance of the classifier when all inputs were devoid of biases or faults is first established. Using the validation data set described previously, the five-output neural network correctly classified all Phase 1 data and 80.4% of the Phase 2 measurements. The network was unable to classify the first 15 minutes of the column response to the step, a period approximately equal to the closed-loop time constant of the plant. These misclassifications can be attributed to the lack of transient data in the training set, although it must be noted that repeated attempts to develop classifiers capable of correctly categorising the transient response proved unsuccessful. This also implies that the product flow rates dominate the classification process within the network, since the values of these variables after one time constant are approximately equal to those in the training data for Phase 2. Nevertheless, a correct classification of the step-change is produced after 15 minutes have elapsed, and thereafter all measurements are classified correctly. Thus, it would appear that the distillate and bottoms product flow rates determine the rate at which the classifier can respond to a step-increase in the feed composition. It was also noted that samples taken in the period 60 to 90 minutes after the introduction of the step were misclassified as Phase 1 data. At this point in the column response, the product flow rates are at the trough of a slow downward oscillation caused by the controllers and at this instant are within 10% of the Phase 1 measurements in the training data. Consequently, the network places these samples into the class defining Phase 1, thus providing further evidence of the influence of the product flow rates in classifying the column behaviour.

7. ROBUSTNESS TESTS

To investigate the robustness of the classifiers introduced in the previous section, a temperature bias of ±1.5°C is assumed, as similar off-sets have been observed on the plant, while flow biases of ±10%, ±15% and ±20% are considered. A faulty sensor is assumed to produce readings indicative of normal behaviour when the process is faulty. This is achieved by calculating the average value of the sensor reading while the process is normal and using this value as the classifier input during both normal and faulty conditions. Cases in which a single sensor is faulty are considered first, followed by examples of single-sensor bias. The validation data described previously are used. The performance of the classifiers is quantified using the classification accuracy, defined as the percentage of samples of a given class that are classified correctly.
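The fault and bias injections just described can be sketched as follows. This is an illustrative harness only: the three-sensor data and the nearest-mean rule standing in for the trained classifier are assumptions, and all numerical values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: "normal" (Phase 1) and "fault" (Phase 2) operation
# measured by three sensors; a nearest-mean rule plays the role of the
# trained classifier. All numbers are invented for illustration.
normal = rng.normal([1.0, 1.0, 1.0], 0.05, (80, 3))
fault = rng.normal([1.4, 0.7, 1.0], 0.05, (97, 3))
centres = np.array([normal.mean(axis=0), fault.mean(axis=0)])

def classify(X):
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return d.argmin(axis=1)

def stuck_sensor(X, j):
    """Faulty sensor: reads the average normal value whatever the process does."""
    Y = X.copy()
    Y[:, j] = normal[:, j].mean()
    return Y

def biased_sensor(X, j, frac):
    """Biased sensor: reading offset by a fixed fraction (frac=0.2 is +20%)."""
    Y = X.copy()
    Y[:, j] *= 1.0 + frac
    return Y

base = (classify(fault) == 1).mean()                       # fault-class accuracy
stuck = (classify(stuck_sensor(fault, 0)) == 1).mean()     # sensor 0 stuck
biased = (classify(biased_sensor(fault, 0, 0.2)) == 1).mean()
print(base, stuck, biased)
```

Comparing the degraded accuracies against `base` is exactly the classification-accuracy comparison reported in Figures 7 to 11.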

6. EVALUATING THE ROBUSTNESS OF THE CLASSIFIERS OFF-LINE

A third set of data was compiled to enable the performance of the five-output classifier to be examined off-line, prior to on-line implementation. None of the measurements contained in this validation set were present in the training or test sets. The validation set covers an entire step-change experiment, thus encompassing the plant steady state (the normal class), the transient behaviour following the introduction of the step-change and the subsequent new steady state (the fault class). With a sample time of one minute, the first 80 samples of this set represent the normal class while the remaining 97 samples define the faulty state of the column. Hence, the dynamic response to the perturbation, samples of which were not included in the original training data, is included in the Class 2 data. Practical experience indicates that, in subsequent use, the actual neural-network outputs will lie close to the values desired during training, while perhaps not matching them exactly. Thus, for the robustness tests and on-line experiments, thresholds are specified that define the process state. A class is deemed active if its output is greater than 0.7 and inactive if below 0.3. It is difficult to attach any meaning to outputs lying between these two thresholds, other than that the corresponding input data are not definite members of a pre-defined class.
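The 0.7/0.3 threshold rule described above can be sketched directly; the example output vector is invented for illustration.

```python
import numpy as np

def interpret(outputs, hi=0.7, lo=0.3):
    """Map raw network outputs to class states: a class is active above
    0.7, inactive below 0.3, and indeterminate in between."""
    return ["active" if y > hi else "inactive" if y < lo else "indeterminate"
            for y in outputs]

# Example five-node output vector: the second class is clearly active,
# the last is ambiguous and therefore indeterminate.
states = interpret(np.array([0.05, 0.92, 0.10, 0.02, 0.45]))
print(states)
```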

7.1 Single sensor faults


For all 12 cases of single sensor faults, the neural network correctly classified all Phase 1 data while individual measurements in 8 inputs yielded Phase 2 accuracies that were within 5.1% of the base performance established above. Figure 7 shows the accuracy for faults in each of the 11 network inputs. A fault with the bottoms flow meter, however, leads to almost all Phase 2 data being placed into the category defining Phase 1. The network performance also degrades when faced with faulty inputs from the temperature sensors on trays 23 and 21, with the latter case resulting in a particularly poor Phase 2 accuracy.

Fig. 8. Accuracies for 20% flow bias

7.2 Single sensor biases - flow measurements

Considering first positive and negative biases of 20% magnitude, Figure 8 shows that six of the ten cases yield Phase 1 accuracies of 72.5% or greater. In addition, the performance in the face of positive bias is generally superior to that for the negative-bias cases, with the former producing four Phase 2 accuracies of 80.4% or above. Separate negative bias of the feed and reflux flows leads to poor classifications of both phases, while the same bias applied to the bottoms flow results in poor categorisation of Phase 1. A lack of robustness is indicated by the poor results for negatively biased feed and reflux flows. When compared with the excellent classifications for a positive bias of the same magnitude, it may be inferred that negative bias of these variables leads to placement of the resultant measurement patterns into a region that does not define the required fault. Reducing the magnitude of the bias to 15% yields a similar trend, the smaller bias resulting in a modest improvement in the Phase 2 accuracies for the cases discussed above (Figure 9).

Again, the network appears to be more robust to positive bias. This trend is repeated, to a lesser extent, for positive and negative biases of 10%, as can be seen from Figure 10. Notably, negative bias of the feed and reflux flows leads to overlaps between Classes 1, 2 and 5. This was also noted for the 20% bias cases but was not apparent with a bias of 15% magnitude.

7.3 Single sensor biases - temperature measurements

Figure 11 illustrates that the Phase 1 accuracy for all cases of temperature bias was generally excellent. Negative bias of the temperature on tray 21 leads to a modest but acceptable degradation in the classification performance, whereas the response to a biased reading from tray 15 is poor. This sensor is the closest to the point at which the feed enters the column and will hence be the first to detect a change in temperature caused by a change in the feed composition. This indicates that the temperature on tray 15 is significant for the detection of such faults.

Fig. 7. Single sensor fault accuracy

Fig. 9. Classification accuracies for 15% flow bias


Fig. 10. Classification accuracies for 10% flow bias

Fig. 11. Accuracies for temperature biases

Hence positive offsets in this temperature caused by measurement bias are attributed to Class 4, since an increase in the amount of isopropanol (i.e., the higher-boiling component) in the feed will lead to a corresponding increase in the temperature on tray 15. The classifications of Phase 2 were also good and, once again, the network clearly produced the superior performance when the bias was positive.

8. IMPLEMENTING THE NEURAL NETWORK ON-LINE

The neural-network classifier was installed on a PC connected to the main plant control computer via a serial link. Measurements are available from the plant at intervals of 7 seconds, a sample time almost an order of magnitude lower than that of the off-line training, test and validation data. The aim of this investigation was to evaluate the performance of the network in a realistic situation where the experimental conditions were slightly different from those on which the classifier was trained. Note that the plant measurements were passed to the network without filtering or other preprocessing. A step change was made in the feed composition, again from 50wt% methanol to 70wt%. The network correctly classified 85.6% of the data before the step and 53.4% afterwards. In addition, almost 30% of the Phase 2 data were classified as belonging to Class 1, while the remaining 17% were placed into Class 5. All other outputs were correct in both phases. This is a further example of the overlap between these three classes, but it also represents an encouraging performance from the classifier, since the majority of the Phase 2 data are placed into Class 2, indicating that this is the most likely source of the disturbance.
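The per-phase scoring used above can be reproduced mechanically from a stream of winning output nodes. The following sketch uses a purely hypothetical prediction stream (the 53/30/17 split is invented to resemble, not reproduce, the reported Phase 2 figures):

```python
import numpy as np

def phase_report(pred, true_class, n_classes=5):
    """Fraction of samples placed in each class during one phase of an
    on-line test; `pred` holds the winning output node (1..n_classes)
    at each 7-second sample."""
    pred = np.asarray(pred)
    counts = np.bincount(pred, minlength=n_classes + 1)[1:n_classes + 1]
    dist = counts / len(pred)
    return dist[true_class - 1], dist

# Purely hypothetical Phase 2 stream: mostly Class 2, with spill-over
# into Classes 1 and 5 of the kind reported above.
stream = [2] * 53 + [1] * 30 + [5] * 17
accuracy, distribution = phase_report(stream, true_class=2)
print(f"Phase accuracy: {accuracy:.1%}; class distribution: {distribution}")
```

The class distribution, rather than the accuracy alone, is what identifies Class 2 as the most likely source of the disturbance.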

9. DISCUSSION

Cases 1 and 2 presented here demonstrate that simulated data extracted from a valid plant model can be used to develop classifiers capable of classifying data from the corresponding plant. Further, Case 2 also illustrates that distance-based algorithms may form incorrect classifications if classes are similar or overlap. Such algorithms can ascribe equal variance to all directions of the fault measurements when this is not the case (Kavuri and Venkatasubramanian, 1994) and hence lead to incorrect decision regions. The method proposed by Kavuri and Venkatasubramanian (1994) uses bounded ellipsoidal hidden nodes to approximate the variance structure of the classes using PCA. Moreover, in the presence of process noise or class overlap, the nearest neighbour in the input space may be a member of a different class.

The K-Means model was clearly the more parsimonious of the two developed in Case 1, particularly when the size of the neural-network hidden layer is considered. The PCA plot of this data (Figure 4) shows two quite widely separated classes of data, yet the neural network with the lowest sum-of-squared error on the test data comprised 9 hidden nodes. This implies that areas of the input space have been partitioned arbitrarily and unintuitively, as the PCA plot suggests that fewer decision boundaries are required to separate the data. Similarly, the large hidden layer used in the second case study indicates that very intricate decision boundaries were necessary to separate the overlapping data. Given the extent of the overlap between Classes 2 and 5, it is probable that the decision regions will be fragmented, suggesting a lack of robustness.

The inability of the K-Means algorithm to correctly classify the five classes of data in Case 2 when using the same inputs as the neural network suggests the limited applicability of such techniques when the classes are asymmetrical and overlap. This example does not agree with the view expressed by Leonard and Kramer (1990) that distance-based methods are generally superior to sigmoidal neural networks.
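The failure mode described above can be reproduced with a plain K-Means on synthetic data: because the algorithm implicitly scores every direction with equal variance, an elongated class that overlaps its neighbour is partly misassigned. A self-contained sketch (all data and numbers are synthetic illustrations, not the plant measurements):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-Means: assign each point to its nearest centroid by
    Euclidean distance, then recompute the centroids, and repeat."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # (n_points, k) matrix of point-to-centroid distances
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # keep the old centroid if a cluster loses all its members
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels, centroids

# Two synthetic "classes": one tight and one strongly elongated so that its
# tail overlaps the first -- loosely mimicking the Class 2 / Class 5 overlap.
rng = np.random.default_rng(1)
tight = rng.normal([0.0, 0.0], [0.3, 0.3], size=(200, 2))
elongated = rng.normal([2.0, 0.0], [2.5, 0.3], size=(200, 2))
X = np.vstack([tight, elongated])
true = np.r_[np.zeros(200, dtype=int), np.ones(200, dtype=int)]

labels, _ = kmeans(X, k=2)
# cluster indices are arbitrary, so score against both alignments
acc = max(np.mean(labels == true), np.mean(labels != true))
print(f"K-Means accuracy with overlapping, anisotropic classes: {acc:.2f}")
```

Because the decision boundary is the perpendicular bisector between centroids, the overlapping tail of the elongated class is systematically misassigned, in line with the behaviour observed in Case 2.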

The response of the five-output classifier to shifts in individual variables was, however, generally good, although sensitivity to the product flows and the temperature on tray 21 was noted. Although it is difficult to extract from the network weights meaningful interpretations of the significance of the inputs, it was noted that the weights connecting the distillate and bottoms flows to the hidden layer were generally large. In particular, the magnitude of the bottoms flow weights exceeded that of the corresponding distillate parameters for 20 of the 27 hidden nodes. Further, an indication of the contribution of each variable to each principal component is given by the elements of the eigenvectors of the input data. Performing such an analysis reveals that the bottoms flow rate possesses the second-largest element for the first component, while the temperature on tray 21 is weighted by the largest eigenvector element for the second component. This appears to substantiate previous observations that the classifier is sensitive to these inputs.

The performance of the neural network in the cases studied here is encouraging. Firstly, no attempt was made to pre-process the data in order to extract the most relevant information. Such data analysis can be time-consuming, and may not be possible if a classifier has to be developed and implemented rapidly. Thus, the methodology presented here offers the possibility of producing accurate classifiers in a short period of time using an abundance of (simulated) training data. Some data processing will clearly be required to account for class overlap, should this be expected. In essence, it is conjectured that the methodology proposed here will be complementary to more complex methods, while being relatively simple and transparent. Furthermore, the robustness of the network is particularly encouraging given that data analysis and feature extraction were kept to a minimum.

Finally, it is conjectured that incorporation of the PCA-based method proposed by Dennis and Philips (1991) to analyse the hidden layer will further improve the diagnostic capabilities of the methodology presented here. This method performs a PCA on the outputs of the hidden nodes, which tend to form clusters for each class of input data. Hence, regions in the hidden-node space can be defined for each class, and novel data can be classified via a distance measure to the nearest cluster. An indication can then be provided of the proximity of the current input data to known neural-network decision regions. This would provide a simple combination of sigmoidal neural networks, multivariate statistics and distance-based classification approaches. It is intended to develop this methodology in further work.

10. CONCLUSIONS

Neural networks were trained using simulated measurements and were subsequently found to correctly classify over 80% of the plant test data. The five-output network exhibited a reasonable degree of sensitivity to changes in the product flows and the temperature on tray 21. A loose indication of this was derived from the network weights and input-data PCA. Furthermore, the product flows determine the minimum response time to a feed step, which was found to be approximately one closed-loop time constant. This network also showed good robustness to a wide range of sensor faults and biases. The K-Means algorithm was unable to classify the five-class data due to overlap, which was particularly evident between Classes 1, 2 and 5, although in the two-class case the distance-based model proved to be the least complex. Neither the neural network nor the K-Means model could classify the transient behaviour following a feed step.
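The hidden-layer PCA scheme of Dennis and Philips (1991) discussed above can be sketched as follows: project the hidden-node activations onto their principal components, store one cluster centre per class, and classify (or flag as novel) by distance to the nearest centre. The activations, hidden-layer size and distance threshold below are invented for illustration:

```python
import numpy as np

def pca_fit(H, n_components=2):
    """PCA of the hidden-node outputs via eigendecomposition of their covariance."""
    mu = H.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(H - mu, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    return mu, eigvecs[:, order]

def project(H, mu, V):
    return (np.atleast_2d(H) - mu) @ V

# Invented hidden-layer activations: two classes clustering in a
# hypothetical 9-node hidden space.
rng = np.random.default_rng(0)
H1 = rng.normal(0.2, 0.05, size=(50, 9))
H2 = rng.normal(0.8, 0.05, size=(50, 9))
mu, V = pca_fit(np.vstack([H1, H2]))
centres = {1: project(H1, mu, V).mean(axis=0),
           2: project(H2, mu, V).mean(axis=0)}

def classify(h, threshold=1.0):
    """Nearest cluster centre in the PCA score space; inputs far from
    every known cluster are flagged as novel (class None)."""
    z = project(h, mu, V)[0]
    dists = {c: np.linalg.norm(z - ctr) for c, ctr in centres.items()}
    best = min(dists, key=dists.get)
    return (best, dists[best]) if dists[best] < threshold else (None, dists[best])

print(classify(np.full(9, 0.8)))  # should fall near the Class 2 cluster
print(classify(np.full(9, 3.0)))  # far from both clusters, so flagged novel
```

The reported distance doubles as the extrapolation indicator: it measures how close the current input is to a known decision region.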

REFERENCES

Brydon, D.A. (1995), M.Sc. Thesis, UMIST, UK.

Bremermann, H.J. and Anderson, R.W. (1989), An alternative to back-propagation: A simple rule for synaptic modification for neural net training and memory, Internal report, Dept. of Maths, Uni. of California, Berkeley.

Dennis and Philips (1991), Analysis tools for neural networks, Neural Networks and Engineering Applications, International Symposium, University of Newcastle upon Tyne, UK, Oct.

Grassi, V. (1992), Rigorous modelling and conventional simulation, in Practical Distillation Control (Ed. W. Luyben), Van Nostrand Reinhold, N.Y., pp. 29-47.

Hoskins, J.C. and Himmelblau, D.M. (1988), Artificial neural network models of knowledge representation in chemical engineering, Comput. & Chem. Eng., 12(9), pp. 881-890.

Hoskins, J.C., Kaliyur, K.M. and Himmelblau, D.M. (1991), Fault detection in complex chemical plant using artificial neural networks, AIChE Journal, 37(1), pp. 137-141.

Johnson, R.A. and Wichern, D.W. (1988), Applied Multivariate Statistical Analysis, Prentice Hall International, N.J.

Kavuri, S.N. and Venkatasubramanian, V. (1992), Combining pattern classification and assumption-based techniques for process fault detection, Comput. & Chem. Eng., 16(4), pp. 299-312.

Kavuri, S.N. and Venkatasubramanian, V. (1994), Neural network decomposition strategies for large-scale fault detection, Int. J. Control, 59(3), pp. 767-792.

Leonard, J.A. and Kramer, M.A. (1990), Diagnosis using back-propagation neural networks - analysis and criticism, Comput. & Chem. Eng., 14(12), pp. 1323-1338.

Leonard, J.A. and Kramer, M.A. (1991), Radial basis function networks for classifying process faults, IEEE Cont. Sys. Mag., April, pp. 31-38.

Lippmann, R.P. (1987), An introduction to computing with neural nets, IEEE ASSP Mag., April, pp. 4-22.

Luyben, W.L. (1990), Process Modeling, Simulation and Control for Chemical Engineers, 2nd Edition, McGraw-Hill Publishing Co., New York.

Patton, R.J., Chen, J. and Nielsen, S.B. (1995), Model-based methods for fault diagnosis: Some guidelines, Trans. Inst. Measurement and Control, 17(2), pp. 73-83.

Perry, R.H. and Green, D. (1984), Perry's Chemical Engineers' Handbook, McGraw-Hill Book Co., New York.

Sorsa, T. and Koivo, H.N. (1993), Application of artificial neural networks in process fault detection, Automatica, 29(4), pp. 843-849.

Stephanopoulos, G. (1984), Chemical Process Control, McGraw-Hill Book Co., New York.

Venkatasubramanian, V. and Chan, K. (1989), A neural network methodology for process fault detection, AIChE Journal, 35(12), pp. 1993-2002.

Werbos, P.J. (1990), Backpropagation through time: What it does and how to do it, Proc. IEEE, 78(10), pp. 1550-1560.

APPENDIX

Mathematical Model Differential Equations

The differential equations that form the mathematical model of the column have been discussed in many contributions; in particular, concise accounts are provided by Stephanopoulos (1984), Luyben (1990) and Perry and Green (1984). The variables appearing in the differential equations are depicted in Fig. A.1.

[Figure omitted]
Fig. A.1. A general column tray

where:
V = vapour flow;
y_n, y_{n-1} = vapour compositions on the nth and (n-1)th trays respectively;
x_n, x_{n+1} = liquid compositions on the nth and (n+1)th trays respectively;
L_n, L_{n+1} = liquid flow rates from the nth and (n+1)th trays respectively;
M_n = liquid hold-up on plate n.

The total mass balance around the ith plate is given by:

dM_i/dt = L_{i+1} - L_i    (A.1)

where L_i = the liquid flow from plate i. The corresponding component balance for the ith plate is:

d(M_i x_i)/dt = L_{i+1} x_{i+1} + V y_{i-1} - L_i x_i - V y_i    (A.2)

where V = vapour flow and x_i, y_i = liquid and vapour compositions on plate i.

The mass balance around the feed tray (plate f) is given by:

dM_f/dt = L_{f+1} - L_f + F    (A.3)

The feed plate component balance is:

d(M_f x_f)/dt = L_{f+1} x_{f+1} + V y_{f-1} - L_f x_f - V y_f + F z_j    (A.4)

where z_j = mole fraction of component j in the feed stream.

Mass and component balances around the condenser and reflux drum yield:

dM_D/dt = V - R - D    (A.5)

and

d(M_D x_D)/dt = V y_{NT} - R x_D - D x_D    (A.6)

The corresponding equations for the top tray are:

dM_{NT}/dt = R - L_{NT}    (A.7)

with the component balance given by:

d(M_{NT} x_{NT})/dt = R x_D - L_{NT} x_{NT} + V y_{NT-1} - V y_{NT}    (A.8)

Considering now the bottom of the column, the mass and component balances for the first tray above the reboiler are:

dM_1/dt = L_2 - L_1    (A.9)

and

d(M_1 x_1)/dt = L_2 x_2 + V y_B - L_1 x_1 - V y_1    (A.10)

Finally, a mass balance around the base of the column yields:

dM_B/dt = L_1 - V - B    (A.11)

while the associated component balance results in:

d(M_B x_B)/dt = L_1 x_1 - V y_B - B x_B    (A.12)

Luyben (1990) determines that a distillation column model based upon these relationships requires a further four equations to be formulated if the system is to be specified completely. The required equations represent the controllers necessary to maintain the holdup and desired purities within the column.
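The interior-tray balances (A.1)-(A.2) can be stepped numerically once a vapour-liquid equilibrium relation is chosen. Below is a minimal explicit-Euler sketch assuming equimolal overflow and a constant relative volatility; the value 2.46 and all operating numbers are illustrative assumptions, not parameters of the pilot-plant model:

```python
ALPHA = 2.46  # assumed constant relative volatility (illustrative value)

def equilib(x, alpha=ALPHA):
    """Vapour composition in equilibrium with liquid composition x."""
    return alpha * x / (1.0 + (alpha - 1.0) * x)

def interior_tray_derivs(x, L_in, x_in, L_out, V, y_in):
    """Right-hand sides of Eqs (A.1)-(A.2) for an interior tray,
    assuming equimolal overflow (the same vapour rate V enters and leaves)."""
    y = equilib(x)
    dM_dt = L_in - L_out                                   # Eq. (A.1)
    dMx_dt = L_in * x_in + V * y_in - L_out * x - V * y    # Eq. (A.2)
    return dM_dt, dMx_dt

# One explicit-Euler step with purely illustrative operating numbers.
M, x = 5.0, 0.50          # holdup and liquid mole fraction on the tray
L_in, x_in = 2.0, 0.55    # liquid arriving from the tray above
L_out, V = 2.0, 3.0       # liquid leaving; vapour rate
y_in = equilib(0.45)      # vapour arriving from the tray below
dt = 1.0

dM, dMx = interior_tray_derivs(x, L_in, x_in, L_out, V, y_in)
M_new = M + dt * dM                  # integrate (A.1)
x_new = (M * x + dt * dMx) / M_new   # integrate (A.2), then recover x
print(M_new, round(x_new, 4))
```

A full column model repeats this step for every tray, with the feed, condenser, top-tray and reboiler trays using Eqs (A.3)-(A.12), plus the four controller equations noted by Luyben (1990).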