
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 2, NO. 3, AUGUST 2006

Combustion Efficiency Optimization and Virtual Testing: A Data-Mining Approach


Andrew Kusiak, Member, IEEE, and Zhe Song
AbstractIn this paper, a data-mining approach is applied to optimize combustion efciency of a coal-red boiler. The combustion process is complex, nonlinear, and nonstationary. A virtual testing procedure is developed to validate the results produced by the optimization methods. The developed procedure quanties improvements in the combustion efciency without performing live testing, which is expensive and time consuming. The ideas introduced in this paper are illustrated with an industrial case study. Index TermsCombustion efciency, data mining, nonstationary process, process control, temporal data mining.

Manuscript received December 6, 2005; revised February 13, 2006; accepted March 7, 2006. This work was supported in part by the Iowa Energy Center under Grant 04-06. The authors are with the Intelligent Systems Laboratory, Department of Mechanical and Industrial Engineering, The University of Iowa, Iowa City, IA 52242-1527 USA (e-mail: andrew-kusiak@uiowa.edu). Digital Object Identifier 10.1109/TII.2006.873598

I. INTRODUCTION

The power industry is interested in optimizing combustion efficiency, as it reduces fuel consumption and emissions. The existing approaches to optimization of combustion efficiency can be grouped into three categories. The first category includes analytical models based on thermodynamics and chemistry. The second is soft computing, i.e., neural networks (NNs), fuzzy logic, and evolutionary computation; among these techniques, NNs are the most widely used in the power industry. The third class includes hybrid systems combining analytical modeling with soft-computing methods.

Li et al. [12] developed an online search method for finding the optimal air/coal ratio based on a simple thermal efficiency equation, which was applied to a control system to improve combustion efficiency. Havlena et al. [9], [10] developed a MIMO predictive control model (assumed to be time invariant and intended mainly for controlling boiler pressure and air/fuel ratio) to improve the boiler's combustion efficiency and reduce NOx emissions. Kuprianov [11] discussed optimization models that minimize various types of costs (internal and external) related to boiler operations. Li et al. [19] studied control strategies for the boiler steam pressure and power output loops to reduce harmful system vibrations. Chu et al. [15] used an NN to predict the performance index and some nonanalytical constraints, thus speeding up the trial-and-error process of finding the operating points optimizing the boiler's combustion process. Rusinowski et al. [17] focused on finding the optimal rate of travel of the grate and the height of the fuel layer, two controllable parameters neglected in most of the published research. Büche et al. [20] applied an evolutionary computation algorithm to find an optimal design of the burner, reducing the emissions of NOx as well as the pressure fluctuation of the flames. Wang et al. [21] applied a naive intelligent control algorithm to determine the best air supply for the boiler. Cass and Radl [18] combined NN and evolutionary computation techniques to determine the optimal fuel/air ratio. Booth and Roland [5] developed an NN model to optimize the boiler's operations, reducing NOx emissions and improving the boiler's performance. Chong et al. [6] applied an NN model to identify the dynamic process of nitrogen oxide and carbon monoxide emissions. Other applications of soft-computing techniques in power plants are discussed by Azid et al. [2], Chang and Hou [7], Kalogirou [16], and Tascillo et al. [24].

Building a comprehensive and accurate mathematical model of an industrial-scale combustion process is difficult. Therefore, most optimization methods and control schemes are reduced to a subset of parameters that impact combustion efficiency, e.g., the excess air ratio or boiler pressure. Another common characteristic of the research published in the literature is that the time-shifting nature of the combustion process is ignored in the optimization of control set points. Although an NN is a useful modeling technique, it suffers from slow training and poor interpretability of the results. NNs are also not particularly suited for real-time optimization of the temporal combustion process.

Power plants routinely collect boiler run-time data from numerous sensors. The sampling frequency may vary; however, recording the data at one-minute intervals prevails. This large volume of data contains information about the boiler that can be analyzed with data-mining algorithms to optimize the combustion process as well as to predict possible faults. Ogilvie et al. [13] applied an association-rule algorithm to mine relationships among parameters of a power plant; the induced rules were intended for building an expert system. Burns et al. [4] used a genetic-wrapper approach to select a subset of parameters for mining boiler data to improve combustion efficiency. In comparison to basic science models, data mining offers advantages in optimizing combustion efficiency by considering many parameters and generating accurate results. Some data-mining algorithms produce explicit knowledge that is understandable to a user.

The temporal nature of the combustion process has not been fully investigated in the published literature. The underlying concepts (laws) governing combustion change in time, which makes any a priori control strategy inaccurate and inefficient; e.g., the importance of some combustion parameters changes in time. This paper is focused on data-driven methods (clustering algorithms, temporal linear regression, decision-tree (DT) algorithms, and NNs) to optimize combustion efficiency. In comparison to the approaches discussed in the published literature, the data-driven methods are easy to understand and implement, and they are able to handle temporal data.


TABLE I SAMPLE BOILER DATA

II. BACKGROUND AND PROBLEM DESCRIPTION

A control and monitoring system continuously monitors and records real-time values of the boiler parameters, such as feeder speed, fan speed, pressure, steam temperature, megawatt load, and so on. At the same time, the boiler's efficiency is computed from predefined equations and experimental data [14]. A small sample of the boiler data is shown in Table I. Some of the boiler parameters are controllable, e.g., the feeder speed, and some are not, e.g., the outside air temperature.

The goal of the research presented in this paper is to improve combustion efficiency by adjusting controllable parameters in the presence of operational constraints; for example, the total megawatt load cannot exceed the demand. The research goal is accomplished with data-driven methods that determine appropriate control settings improving combustion efficiency. These control settings are discovered with data-mining algorithms. For example, consider a boiler with four feeders A, B, C, and D. The total speed of the four feeders determines the amount of combustion fuel. It is interesting to note that in some time periods, increasing the speed of feeder A and lowering the speed of feeder B, while keeping the same average speed, increases combustion efficiency. There are numerous reasons for this phenomenon, such as the boiler installation, heat ball location, age of the boiler, and so on.

III. GENERATION OF CONTROL SIGNATURES WITH THE k-MEANS CLUSTERING ALGORITHM

A clustering algorithm groups similar vectors of parameter values (i.e., values collected at a particular instance) and determines patterns of interest. The vectors are clustered based on a similarity metric, e.g., the Euclidean distance. In this paper, a modified k-means clustering algorithm is used to determine control signatures (here, cluster centroids and related statistics) optimizing combustion efficiency. The concept of a control signature for decision rules extracted with a data-mining algorithm was introduced by Kusiak [22]. In this paper, control signatures are extracted from clusters, and they are essentially equivalent to the cluster centroids generated by the following k-means algorithm [27].

1) Given a training data set with n data points, select k points as the initial cluster centroids.
2) Form k clusters by assigning each data point to its closest centroid.
3) After all data points have been assigned, recalculate each cluster's centroid using the data points in the cluster.
4) Repeat Steps 2 and 3 until the centroids no longer move or a predefined error criterion is satisfied.
5) Output the final centroids and the clusters.

For example, when applying the k-means algorithm [27] to the data in Table I, each row (excluding the time stamp and boiler efficiency) is regarded as a five-dimensional point. The centroid c_j of cluster C_j is calculated as c_j = (1/n_j) * (sum of the points assigned to C_j), where n_j is the number of points assigned to cluster C_j. The k-means algorithm maximizes the similarity within each cluster, and its output is used to construct control signatures.

The continuously recorded combustion parameter values can be represented as a set D = {d_t, t = 1, 2, ..., T}, where d_t is the vector of parameter values recorded at time t. The vector d_t represents the values of parameters such as temperature, flow, pressure, and so on. Of the n parameters, m are controllable and the remaining n - m are noncontrollable, so d_t = (cv_1t, ..., cv_mt, ncv_1t, ..., ncv_(n-m)t, eff_t), where cv_1t is the value of the first controllable parameter at time t, ncv_1t is the value of the first noncontrollable parameter at time t, and eff_t is the combustion efficiency at time t.

The basic idea behind the clustering-based control scheme is discussed next. Given a future data point d_s, search the set D for the most similar (nearest) point d_r whose corresponding efficiency eff_r is greater than the efficiency eff_s. The distance (or similarity) between the two data points is computed using the weighted Euclidean distance d(d_s, d_r) = sqrt(sum_i w_i (d_is - d_ir)^2), where w_i is the weight of parameter i. By changing the values of the controllable parameters to the settings corresponding to d_r, the boiler combustion efficiency improves from eff_s to eff_r. However, the large volume of combustion data makes this naive search mechanism impractical for real-time optimization of the boiler. Moreover, the historical data contain numerous data points that are similar and statistically identical due to random noise. Searching the entire data set for the nearest point is therefore replaced with searching a much smaller database of cluster centroids for the nearest centroid.

In this paper, control signatures are constructed from centroids and used for real-time optimization of the combustion process. A control signature involves a set of values of controllable and noncontrollable parameters, the corresponding combustion efficiency, and the related statistics, such as support (i.e., the number of points in a cluster) and confidence (i.e., the variation of the data points in a cluster). The control signature is expressed as

s = (cv_1, ..., cv_m, ncv_1, ..., ncv_(n-m), eff, support, confidence)    (1)

178

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 2, NO. 3, AUGUST 2006

TABLE II LEVELS OF BOILER EFFICIENCY

where cv_1 is the value of the first controllable parameter and ncv_1 is the value of the first noncontrollable parameter. The support and confidence impact the decision to update the values of the controllable parameters according to the control signature. The control signatures extracted from the historical data are stored in a knowledge base; searching this knowledge base, rather than processing the original data, is relatively easy. The proposed approach for efficiency optimization adjusts the controllable parameters of the boiler based on the control signatures stored in the knowledge base.

Since the combustion efficiency corresponding to each control data point is continuous, it is more convenient to discretize the continuous efficiency values into categorical values. By categorizing the continuous boiler efficiency values into levels (intervals), all the points in the set D are assigned discrete values. Table II shows an example of boiler efficiency discretized into ten levels (intervals), where LT_86 stands for Lower Than 86%, 86_86.25 means the efficiency is in the interval [86%, 86.25%), and HT_88 denotes efficiency Higher Than 88%. For simplicity, the ten intervals are represented by the letters A to J.

Using the efficiency levels, the efficiency eff_t corresponding to the data point d_t becomes categorical (the efficiency level at time t), and the efficiency of each control signature is also categorical. If a run-time data point of the boiler has a given efficiency level, a nearest control signature with a higher efficiency level can be determined by calculating the weighted Euclidean distance. Then, by changing the controllable parameters to the settings in that signature, the boiler efficiency is expected to move to the desired (higher) level. Fig. 1 shows that for each efficiency level there is a set of clusters (each with a certain number of points). Improving combustion efficiency from a lower level to a higher level thus reduces to finding the nearest cluster at a higher efficiency level.

Using the above approach, control signatures are built in two phases: an offline learning mode and an online learning mode.

A. Offline Learning Mode

In this mode, control signatures are constructed from the historical data. The offline learning involves the following four basic steps.
1) Preprocess the raw data points by denoising, scaling, removing missing values, and discretizing the continuous efficiency values.
2) Group the preprocessed data points into ten data sets (ten efficiency levels) based on the boiler efficiency level associated with each data point.
3) For each of the ten data sets of Step 2, apply the k-means algorithm to generate a set of centroids. The number of centroids can


Fig. 1. Two-dimensional view of clusters for different efficiency levels.

be determined based on the number of data points at the corresponding efficiency level, i.e., 5% of the data points. Other metrics can also be considered, e.g., the variance.
4) Save each computed control signature in the knowledge base, provided that it differs from the control signatures already stored in the knowledge base. The degree of difference is determined by a similarity threshold.

In Step 3 of the above algorithm, the k-means algorithm is applied to each of the ten data sets. For example, consider an efficiency level with 1000 data points in the historical data set. Since every point has the same efficiency level, the efficiency value can be removed from the vector. With 5% of the points used as the number of clusters, 50 final centroids and clusters are created by the k-means algorithm of Section III. For each cluster, the following statistics are computed: the standard deviation, the mean value, and the number of instances in the cluster. A smaller standard deviation indicates higher confidence. Each cluster centroid becomes a control signature; the number of control signatures for this efficiency level is 50, which is equal to the number of centroids (a code sketch of this offline procedure is given at the end of this section).

B. Online Learning Mode

In this mode, control signatures can be collected and updated continuously while the combustion efficiency is being optimized. The boiler's run-time data are collected and preprocessed. For a selected run-time control setting, the knowledge base is searched for the nearest similar signature at the same efficiency level. If the nearest signature is found, the corresponding signature statistics, such as the standard deviation and the number of instances in the cluster, are updated. If a signature leading to an improved efficiency level is not found in the knowledge base, the signature reflecting the current boiler status becomes a new control signature and is stored in the knowledge base.
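The offline learning mode can be sketched as follows. This is a minimal illustration, assuming the minute-level records are available as NumPy arrays; the ten-level discretization of Table II, the 5% rule for the number of centroids, and the support/confidence statistics follow the text above, while all function and variable names are illustrative rather than taken from the original implementation.

import numpy as np
from collections import defaultdict

def efficiency_level(eff):
    # Map a continuous efficiency value (%) to one of the ten levels A..J of Table II.
    bounds = [86.0, 86.25, 86.5, 86.75, 87.0, 87.25, 87.5, 87.75, 88.0]
    return "ABCDEFGHIJ"[sum(eff >= b for b in bounds)]

def kmeans(points, k, n_iter=100, seed=0):
    # Plain k-means (Steps 1-5 of Section III): returns centroids and cluster labels.
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centroids = np.array([points[labels == j].mean(axis=0)
                                  if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

def build_signatures(X, eff):
    # Offline learning: group the rows of X by efficiency level, cluster each group,
    # and keep the centroid, support, and confidence as a control signature.
    by_level = defaultdict(list)
    for row, e in zip(X, eff):
        by_level[efficiency_level(e)].append(row)
    knowledge_base = []
    for level, rows in by_level.items():
        rows = np.asarray(rows)
        k = max(1, int(0.05 * len(rows)))            # Step 3: about 5% of the points
        centroids, labels = kmeans(rows, k)
        for j, centroid in enumerate(centroids):
            members = rows[labels == j]
            if len(members) == 0:
                continue
            knowledge_base.append({
                "level": level,                                   # efficiency level A..J
                "centroid": centroid,                             # parameter settings
                "support": int(len(members)),                     # points in the cluster
                "confidence": float(members.std(axis=0).mean()),  # spread; smaller is better
            })
    return knowledge_base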


Fig. 2. Distance between a control signature and the current parameter settings.
Fig. 3. Concept of virtual testing.

IV. CONTROL STRATEGIES

Control signatures that are similar to the current boiler setting are retrieved from the knowledge base, and a signature at a higher efficiency level is selected. The retrieved control signature defines new settings for the directly controllable boiler parameters, thus leading to higher efficiency (see Fig. 2). Consider a boiler efficiency of 86_86.25 measured during real-time operation of the boiler. To increase it to the next level, i.e., 86.25_86.5, a control signature that is similar to the current boiler status and control settings is retrieved. Since the boiler data point and the control signature include controllable and noncontrollable parameters, their similarity can be evaluated using a distance metric. The distance can be computed based on the noncontrollable parameters only or on all parameters. A small distance between controllable parameters implies that efficiency can be improved with small adjustments of the controllable parameters.

Three strategies for selecting the desired control signature are defined: conservative, neutral, and aggressive. The three control strategies differ in the type of parameters used to compute the similarity (distance) between the current parameter settings and the control signatures. The conservative strategy selects control signatures based on the distance computed over all parameters (controllable and noncontrollable) and on the support and confidence of the control signature. If a strong (i.e., high support and confidence) control signature is not found, the current settings of the boiler's controllable parameters are not modified; the threshold for the strength is defined for each specific application. In the neutral strategy, the distance is computed using all parameters, and the desired efficiency level is set based on the current boiler status. The aggressive strategy considers noncontrollable parameters only to compute the distance; this strategy assumes that the values of all controllable parameters can be adjusted. If the nearest control signature in terms of the noncontrollable-parameter distance is found (see Fig. 2), the controllable parameters can be adjusted to the corresponding values.

To test the effectiveness of the three control strategies, a virtual testing approach was developed. The developed virtual testing approach and computational experiments with the three control strategies are discussed in the next section.
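The difference among the three strategies can be summarized in the following sketch, which assumes the signature dictionaries produced by the offline learning sketch above; the unit parameter weights (as in the case study of Section VII) and the support/confidence thresholds are illustrative assumptions, not values from the paper.

import numpy as np

LEVELS = "ABCDEFGHIJ"

def weighted_distance(x, y, weights, mask):
    # Weighted Euclidean distance restricted to the parameters selected by mask.
    diff = (np.asarray(x) - np.asarray(y))[mask]
    return float(np.sqrt(np.sum(np.asarray(weights)[mask] * diff ** 2)))

def select_signature(current, current_level, knowledge_base, strategy,
                     controllable_mask, weights, min_support=30, max_confidence=0.5):
    # Retrieve the nearest control signature at a higher efficiency level.
    #   conservative: distance over all parameters; only strong signatures
    #                 (high support, low spread) are accepted, otherwise no change.
    #   neutral:      distance over all parameters.
    #   aggressive:   distance over the noncontrollable parameters only, i.e., all
    #                 controllable parameters may be adjusted.
    controllable_mask = np.asarray(controllable_mask, dtype=bool)
    mask = ~controllable_mask if strategy == "aggressive" else np.ones_like(controllable_mask)
    candidates = [s for s in knowledge_base
                  if LEVELS.index(s["level"]) > LEVELS.index(current_level)]
    if strategy == "conservative":
        candidates = [s for s in candidates
                      if s["support"] >= min_support and s["confidence"] <= max_confidence]
    if not candidates:
        return None  # keep the current settings
    return min(candidates,
               key=lambda s: weighted_distance(current, s["centroid"], weights, mask))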

V. VIRTUAL TESTING

To validate the effectiveness of a new control strategy, testing, ideally on an operating boiler, is needed. Performing live tests would be disruptive to the boiler operations and would require numerous approvals due to safety and environmental regulations. In this paper, a virtual test is proposed in place of live boiler testing. The basic idea behind virtual testing is to develop a valid computational model of the boiler with which to test the proposed control signatures. The virtual model is derived from the historical data, and it simulates the real-time boiler process. It can be used to test the proposed control strategies before their industrial implementation. Fig. 3 illustrates the basic steps of virtual testing.

As the combustion process is temporal, the system function S describing boiler efficiency changes in time. The basic concept of virtual testing is explained with an example. Consider data set 1 collected over a seven-day period. The boiler system model S built on data set 1 is assumed to be valid for the seven days. There are several ways to identify the system S from the data; in this paper, a DT algorithm and an NN are used. Although an NN is not the most appropriate tool for real-time combustion control, it is a good approximator of high-dimensional nonlinear functions. Once the system model is built, it represents a virtual boiler for the seven-day period. The input to the system model S is the current status of the boiler (all the parameters except the boiler efficiency); the output is the boiler efficiency.

Control signatures are extracted from data set 2, i.e., the knowledge for increasing the boiler's efficiency is learned from the historical data of day 1 to day 6. Data set 3 then represents the boiler's running status at day 7. Assume that at day 7, the knowledge (control signatures) learned from the previous six-day data set is available. Based on these signatures, the run-time boiler controllable parameters are adjusted at day 7 so that the efficiency is improved. Since the boiler run-time data are collected every minute, data set 3 includes 1440 instances representing the boiler status at minute-long intervals. This leads to 1440 opportunities to adjust the controllable parameters according to the control signatures derived from the previous six-day data set. The controllable parameters in the 1440 instances can be changed based on the signatures generated according to the control strategies discussed in the previous section. Note that there is no need to change the controllable parameters every minute: identifying any one of the 1440 opportunities over a day would increase the boiler efficiency, because signatures pointing to lower efficiency would not be implemented. Also, using a larger data set (more than six days) for the knowledge extraction will result in more control signatures, thus producing higher efficiency gains over the future control horizon.

Will the boiler efficiency increase after applying the control signatures learned from the historical data? The most obvious way to answer this question is to experiment with the actual boiler, which is not practical. Since a model of the boiler for the seven-day period has been built, the 1440 instances controlled by the signatures can be tested instead. The model S predicts the efficiency with the altered parameter values, and these predictions are compared against the actual historic values. The difference between the two efficiencies is used to validate the control strategies. The results of virtual testing based on the historical data have shown that the boiler efficiency can be improved.
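As a rough illustration of this procedure, the following sketch fits a stand-in boiler model on the seven-day data and compares the efficiency levels predicted for the original and the signature-controlled day-7 instances. A scikit-learn decision tree is used here purely as a placeholder for the paper's DT and NN models, and the function and parameter names are assumptions.

from sklearn.tree import DecisionTreeClassifier  # placeholder for the DT/NN boiler model S

LEVELS = "ABCDEFGHIJ"

def virtual_test(X_week, levels_week, X_day7_original, X_day7_controlled):
    # Fit the virtual boiler model S on the seven-day data (data set 1), then compare
    # the efficiency levels it predicts for the day-7 instances with and without the
    # signature-based adjustment of the controllable parameters.
    model = DecisionTreeClassifier(min_samples_leaf=5).fit(X_week, levels_week)
    original = [LEVELS.index(lv) for lv in model.predict(X_day7_original)]
    controlled = [LEVELS.index(lv) for lv in model.predict(X_day7_controlled)]
    increases = sum(c > o for c, o in zip(controlled, original))
    decreases = sum(c < o for c, o in zip(controlled, original))
    return increases, decreases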


VI. REAL-TIME OPTIMIZATION OF COMBUSTION EFFICIENCY

Although the boiler efficiency can be improved based on the historical data, the discovery of control regions that are not reflected in the historical data is also of interest. Such signatures could lead to even higher efficiency levels, i.e., some previously unseen control signatures could improve combustion efficiency. In fact, such signatures are likely to exist, as the boiler tends to operate in a specific region of the space defined by the process parameters. For example, in this paper, 18 directly controllable (impact) parameters are considered, each with its own minimum and maximum value. Some parameters may also be bounded by joint constraints, e.g., the total speed of all feeders cannot exceed an upper limit established at the boiler system design stage. The variety of parameters and constraints, and the fact that the data are temporal, make the space of control signatures large; therefore, some regions of that space may never have been visited during standard boiler operation. One of the goals of the research reported in this paper is to search the unknown control regions that are not represented in the historical data set. These previously unknown control settings could maximize the boiler efficiency beyond the levels reported in the historical data. The underlying problem is the real-time maximization of boiler efficiency with a changing objective function (2) and operating constraints (3) and (4)

maximize  eff_t = S_t(cv_t, ncv_t, eff_(t-1))    (2)
s.t.
cv_t >= cv_min    (3)
cv_t <= cv_max    (4)

In the model (2)-(4), eff_t is the boiler efficiency at time t, and S_t is the system efficiency function (model) at time t (it is time dependent). The boiler efficiency is a function of the controllable parameters cv_t (e.g., feeder speed, pre-coil temperature), the uncontrollable parameters ncv_t (e.g., outside temperature, inlet circulating water temperature), and the previous-period efficiency eff_(t-1). The constraints imposed on the process parameters are represented by (3) and (4); for example, the feeder speed constraint imposes a lower limit (0, the feeder is off) and an upper limit (e.g., 10.25). Since the combustion process is nonstationary, it is difficult to obtain an accurate expression of the objective function (2). One way to approximate the objective function is to develop a temporal linear regression model. The basic idea of temporal linear regression is to use historical data (e.g., from the previous 60 min, i.e., the period t - 60 to t) and build a linear regression model that can be assumed to be valid and to represent the objective function

at time t + 1. It is also assumed that the linear regression model remains valid at time t + r, where r is greater than or equal to 1 but smaller than some fixed value, e.g., 5 min later. When the linear regression model generates large errors for the objective function, it needs to be recomputed using the data from the most recent time interval.

By approximating the objective function (2) with a linear regression model, the real-time optimization problem (2)-(4) becomes a linear programming problem. The solution to the linear programming problem is the optimal control setting maximizing the boiler efficiency at time t + 1. The optimal control setting is usually on the boundary of the solution space and, when implemented, should lead to increased efficiency. In summary, the least that can be accomplished with a piecewise linear approximation of the nonlinear and nonstationary combustion process over a short time period is a direction leading to higher combustion efficiency.

Computational experiments have been performed to simulate the concept of temporal linear regression using the model (2)-(4). In simulating the implementation of a new control signature at time t, a time delay (several minutes) needs to be considered before the impact on the boiler efficiency is visible. The structure of the training and testing data sets is described next. A historical data set is used to train an NN, where each training instance consists of the values of all recorded boiler parameters (except the boiler efficiency) at time t, the boiler efficiency at time t - 1, and, as the output, the boiler efficiency at time t. The reason for training the NN is to predict the boiler efficiency at time t given the boiler status at time t and the efficiency at time t - 1. The NN is trained with one day of data, i.e., 1440 instances (the time unit is a minute), and thus models the boiler over that day.

The NN model is used to predict the boiler efficiency after the controllable parameters are updated according to the optimal control settings generated from the linear programming model. With all controllable and noncontrollable parameters remaining stable and unchanged, the boiler efficiency predicted by the NN is fed back as an input to the NN to predict the boiler efficiency at the next time step. The entire process lasts several minutes so that a trend of the boiler efficiency can be observed.

Two experiments were performed, with the results shown in Tables III and IV. For example, at time t = 61, a linear regression model of the boiler efficiency is built based on the previous 60 min of data (t = 1, ..., 60). The boiler efficiency model can be written as

eff_61 = b_0 + b_1*cv_1 + b_2*cv_2 + ... + b_18*cv_18 + (terms for the noncontrollable parameters held at their current values)

where the parameters cv_1 through cv_18 are controllable, and they are the variables in the linear programming model. To maximize the boiler efficiency at the 61st minute, the controllable parameters have to be adjusted to their maximum or minimum possible values based on the signs of the coefficients b_1 to b_18 and the operating constraints defined in (3) and (4).
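Under the simple bound constraints (3) and (4), the linear-programming solution can be read directly from the signs of the regression coefficients. The following sketch illustrates this; it assumes plain box constraints only (a joint constraint such as a total feeder-speed limit would require a general LP solver), and all names are illustrative.

import numpy as np

def settings_from_temporal_regression(X_hist, eff_hist, controllable_idx, lower, upper):
    # Fit eff ~ X on the most recent window (e.g., the previous 60 min) and pick the
    # controllable settings that maximize the fitted linear model under the box
    # constraints (3) and (4): a positive coefficient pushes the parameter to its
    # upper limit, a negative one to its lower limit.
    A = np.column_stack([np.ones(len(X_hist)), X_hist])   # intercept + all parameters
    coef, *_ = np.linalg.lstsq(A, eff_hist, rcond=None)   # b_0, b_1, ..., b_n
    b = coef[1:]
    settings = {}
    for j, i in enumerate(controllable_idx):
        settings[i] = upper[j] if b[i] >= 0 else lower[j]
    return settings, coef

At t = 61, X_hist would hold the data of minutes 1 to 60, and the returned settings would be applied at the 61st minute.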


TABLE III TESTING RESULTS AT TIME t = 61 AND THE FOLLOWING 5-MIN TREND

TABLE IV TESTING RESULTS AT TIME t = 196 AND THE FOLLOWING 5-MIN TREND
Fig. 5. Boiler efficiency responses to the optimized control settings.

TABLE V EXPERIMENT DATA SETS DESCRIPTION

TABLE VI PREDICTION ACCURACY OF THE UNCONTROLLED DATA SET 3

Fig. 4. Boiler efficiency response to the optimized control settings.

Solving the linear programming model (2)-(4) produces an optimal control signature. The values of the boiler controllable parameters at the 61st minute are then replaced with the optimal control settings, and the NN model predicts the resulting boiler efficiency. It can be seen from Table III that the boiler efficiency increases from 86.04% to 86.66% after the controllable parameters have been reset; originally, the boiler efficiency was 86.01%. At the next time step, the boiler status and the predicted boiler efficiency are used to predict the boiler efficiency, which is 86.75% (see Table III). Without changing the controllable parameters, the original boiler efficiency would have been 86.01%. The efficiency kept increasing until reaching a plateau (see Fig. 4). Note that the assumption was made here that the uncontrollable parameters (e.g., outside temperature) did not change in the 5-min period. Table IV and Fig. 5 show the results of another experiment with temporal linear regression and the corresponding boiler efficiency response.
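The feedback loop used to generate these 5-min trends can be sketched as follows; predict_next stands in for the trained NN described above, and the noncontrollable parameters are held fixed as assumed in the experiments.

def simulate_trend(predict_next, params, previous_eff, horizon=5):
    # Feed the predicted efficiency back as the "previous efficiency" input for
    # `horizon` minutes, with all other parameters held fixed, to obtain the kind
    # of trend shown in Figs. 4 and 5.
    trend = []
    eff = previous_eff
    for _ in range(horizon):
        eff = predict_next(params, eff)   # trained NN: (boiler status, previous eff) -> eff
        trend.append(eff)
    return trend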

VII. INDUSTRIAL CASE STUDY

Numerous computational experiments have been performed with industrial data collected from a boiler operated by an energy company. The original data set contained 77 parameters (including the time stamp and the boiler efficiency); 18 of these parameters are directly controllable. The continuous boiler efficiency values are discretized based on the intervals shown in Table II, and the weight of each parameter is assumed to be one.

To validate the concepts presented in Sections III and IV, the virtual testing approach of Section V is applied to a randomly selected seven-day data set. Control signatures extracted from the first six-day data set are used to control the boiler at day 7. For each minute of day 7, the nearest control signature is determined based on each of the three control strategies and used to modify the 18 controllable parameters. The controlled data set of day 7 is then evaluated by the models constructed from the seven-day data set; in this paper, a DT model and an NN model were used. Five seven-day data sets were selected for the experiments. The desired efficiency levels for the neutral and aggressive control strategies are listed in Table V. Two criteria were used to select the desired efficiency level: it should be higher than the efficiency levels of most boiler data points in the day-7 data set (data set 3), and a sufficient number of control signatures should be associated with it.

A DT algorithm [23] was applied to build a predictive model based on the seven-day data set (data set 1). The results in Table VI show that the boiler model built on data set 1 (experiment 1) is highly accurate (98.75%) in predicting the boiler's efficiency at day 7 (the original data set 3, unmodified by control signatures). The same DT model was applied to predict the boiler efficiency for data set 3 controlled by the control signatures extracted from the previous six-day data set. Note that in experiment 2, the prediction accuracy of the NN is lower than in the other cases; the reason is a failure of the boiler's computer system, which was later confirmed by the domain experts.

Based on the virtual testing approach, five experiments were performed with the three control strategies. A DT and an NN are used to model the boiler and predict whether the controlled data

182

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 2, NO. 3, AUGUST 2006

TABLE VII NUMBER OF INCREASES IN BOILER EFFICIENCY FOR DATA SET 3

Fig. 6. PIIE for different control strategies based on five experiments.
TABLE VIII NUMBER OF DECREASES OF BOILER EFFICIENCY FOR DATA SET 3

Fig. 7. PIDE for different control strategies based on five experiments.

set 3 (1440 instances, or data points, in total) will have increased boiler efficiency. Table VII shows the number of instances with increased boiler efficiency predicted by the DT and the NN over the five experiments. To be conservative about the efficiency increase, the minimum of the DT and NN predictions is taken. For example, in experiment 1, the DT predicts 1042 instances with increased boiler efficiency, while the NN predicts 897; 897 is taken as the number of instances with increased efficiency. From Table VII, the neutral control strategy has the largest average number of instances with increased efficiency, and the conservative strategy has the smallest; the neutral strategy performs slightly better than the aggressive one.

Table VIII shows the number of instances with decreased boiler efficiency predicted by the DT and the NN after applying the three control strategies to data set 3. To be pessimistic about the efficiency decrease, the maximum of the DT and NN predictions is taken. For example, in experiment 1, the DT predicts 42 instances with decreased boiler efficiency, while the NN predicts 98; 98 is taken as the number of instances with decreased efficiency. From Table VIII, the conservative control strategy has the smallest average number of instances with decreased efficiency, and the aggressive strategy has the largest; the neutral strategy again performs slightly better than the aggressive one.

The neutral and aggressive control strategies clearly perform better than the conservative strategy in terms of increasing the boiler efficiency. The computational results in Tables VII and VIII also confirm that the neutral and conservative control strategies provide more stable results (a smaller number of decreases) than the aggressive one, which can be explained by the larger number of parameters used by these two strategies to compute distances.

Two metrics are formulated to evaluate the performance of the control strategies. One is the total percentage of instances with improved efficiency (PIIE). The other is the total percentage of

instances with decreased efficiency (PIDE). Control strategies with a high PIIE value and a low PIDE value are preferred. The metrics are defined as follows:

PIIE = (prediction accuracy of the boiler model on the original data set 3) × (total number of instances predicted with higher efficiency levels in the controlled data set 3) / (total number of instances in data set 3);

PIDE = (prediction accuracy of the boiler model on the original data set 3) × (total number of instances predicted with lower efficiency levels in the controlled data set 3) / (total number of instances in data set 3).
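Stated compactly, the two metrics reduce to the following, where accuracy is the boiler model's prediction accuracy on the original data set 3 and the instance counts come from the Min and Max columns of Tables VII and VIII; the function names are illustrative.

def piie(accuracy, n_higher, n_total):
    # Total percentage of instances with improved efficiency.
    return accuracy * n_higher / n_total

def pide(accuracy, n_lower, n_total):
    # Total percentage of instances with decreased efficiency.
    return accuracy * n_lower / n_total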

PIIE and PIDE can be computed from Tables VII and VIII (the Min and Max columns); Figs. 6 and 7 show the computed PIIE and PIDE values for the five experiments. Note that the model prediction accuracy used in the computation corresponds to whichever of the DT or NN was selected by the Min or Max function. For example, the PIIE for experiment 1 is computed from the Min entry of Table VII, and the PIDE for experiment 5 from the Max entry of Table VIII.

It is also interesting to observe the average efficiency increase or decrease (measured in levels, where one level corresponds to a 0.25% efficiency interval; see Table II) achieved by the three control strategies. Figs. 8 and 9 show that the neutral and aggressive strategies perform similarly in terms of average efficiency improvement and better than the conservative strategy; however, the conservative strategy outperforms the other two in terms of average decrease in efficiency. More details are included in Tables IX-XI.

In a real-time implementation of the concept proposed in this paper, a tandem efficiency-monitoring system needs to be developed to disallow changes of controllable parameters once a bad control signature (leading to decreased efficiency) has been retrieved.


The experiments to determine unknown control signatures with temporal linear regression validated the concept discussed in Section VI. In some cases, the data from a 60-min interval are not sufficient to build a linear regression model because of an emerging trend in the data; in such cases, data collected over a 120-min period can be used. A decrease in combustion efficiency after the control parameters have been adjusted signals that the temporal linear regression model is no longer correct. The controllable parameters should then be adjusted back to their original values to prevent a further decrease of the efficiency.
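A minimal sketch of such a safeguard is given below; the apply_settings and read_efficiency callables, the tolerance, and the waiting time are illustrative placeholders, not part of the original system.

def apply_with_guard(apply_settings, read_efficiency, old_settings, new_settings,
                     predicted_eff, tolerance=0.2, wait_minutes=5):
    # Apply the regression-derived settings; if the measured efficiency falls short of
    # the predicted value, revert the settings and signal that the temporal regression
    # model should be rebuilt on fresher data.
    apply_settings(new_settings)
    measured = read_efficiency(wait_minutes)
    if measured < predicted_eff - tolerance:
        apply_settings(old_settings)
        return False   # caller should recompute the regression model
    return True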
Fig. 8. Average increase of efficiency for different control strategies.

VIII. CONCLUSION

In this paper, different data-mining techniques were applied to improve combustion efficiency. A clustering algorithm extracted control signatures from the boiler's historical data; when implemented, the control signatures act as a digital boiler operator optimizing boiler efficiency. Temporal linear regression formed control signatures that were not reflected in the historical data; often, these regression-determined signatures lie on the boundary of the control region. A DT algorithm and an NN were used to validate the efficiency gains offered by the derived control signatures. A new virtual test was developed in this paper, and three different strategies for the selection of control signatures were tested.

Future research will focus on the development of an algorithm for dynamic generation of parameter weights. This research has shown that the importance of parameters changes over time, and it is believed that such weights could further improve the efficiency of the combustion process. The ideas introduced in this paper were applied to an industrial data set.

REFERENCES

Fig. 9. Average decrease of efficiency for different control strategies.

TABLE IX PERCENTAGE (RELATIVE NUMBER OF INSTANCES) AND AVERAGE (IN LEVELS) INCREASE AND DECREASE IN EFFICIENCY (NEUTRAL CONTROL STRATEGY)

TABLE X PERCENTAGE (RELATIVE NUMBER OF INSTANCES) AND AVERAGE (IN LEVELS) INCREASE AND DECREASE IN EFFICIENCY (AGGRESSIVE CONTROL STRATEGY)

TABLE XI PERCENTAGE (RELATIVE NUMBER OF INSTANCES) AND AVERAGE (IN LEVELS) INCREASE AND DECREASE IN EFFICIENCY (CONSERVATIVE CONTROL STRATEGY)


[1] A. M. Annaswamy and A. F. Ghoniem, "Active control of combustion instability: Theory and practice," IEEE Control Syst., vol. 22, no. 6, pp. 37-54, Dec. 2002.
[2] I. A. Azid, Z. M. Ripin, M. S. Aris, A. L. Ahmad, K. N. Seetharamu, and R. M. Yusoff, "Predicting combined-cycle natural gas power plant emissions by using artificial neural networks," in Proc. IEEE TENCON: Intelligent Systems and Technologies for the New Millennium, Kuala Lumpur, Malaysia, 2000, pp. 517-512.
[3] S. S. Anand and A. G. Buchner, Decision Support Using Data Mining. London, U.K.: Financial Times Pitman, 1998.
[4] A. Burns, A. Kusiak, and T. Letsche, "Mining transformed data sets," in Knowledge-Based Intelligent Information and Engineering Systems, R. Khosla, R. J. Howlett, and L. C. Jain, Eds. Heidelberg, Germany: Springer-Verlag, 2004, vol. I, LNAI 3213, pp. 148-154.
[5] R. C. Booth and W. B. Roland, "Neural network-based combustion optimization reduces NOx emissions while improving performance," in Proc. IEEE Industry Applications Dynamic Modeling Control Applications for Industry Workshop, 1998, pp. 1-6.
[6] A. Z. S. Chong, S. J. Wilcox, and J. Ward, "Neural network models of the combustion derivatives emanating from a chain grate stoker fired boiler plant," in Proc. Inst. Elect. Eng. Seminar Advanced Sensors and Instrumentation Systems for Combustion Processes, 2000, pp. 6/1-6/4.
[7] P. S. Chang and H. S. Hou, "A fast neural network learning algorithm and its application," in Proc. IEEE 29th Southeastern Symp. Systems Theory, Cookeville, TN, 1997, pp. 206-210.
[8] A. Figueroa, B. A. Barna, and A. Kursenhauser, "Modeling and optimization of steam generating equipment," in Proc. 32nd Intersociety Energy Conversion Engineering Conf., Honolulu, HI, 1997, vol. 3, pp. 1600-1605.
[9] V. Havlena, J. Findejs, and D. Pachner, "Combustion optimization with inferential sensing," in Proc. American Control Conf., Anchorage, AK, 2002, pp. 3890-3895.
[10] V. Havlena and J. Findejs, "Application of model predictive control to advanced combustion control," Control Eng. Practice, vol. 13, no. 6, pp. 671-680, 2005.

184

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 2, NO. 3, AUGUST 2006

[11] V. I. Kuprianov, "Applications of a cost-based method of excess air optimization for the improvement of thermal efficiency and environmental performance of steam boilers," Renew. Sustain. Energy Rev., vol. 9, no. 5, pp. 474-498, 2005.
[12] J. Q. Li, J. Z. Liu, Y. G. Niu, and C. L. Niu, "Online self-optimizing control of coal-fired boiler combustion system," in Proc. IEEE TENCON, Chiang Mai, Thailand, 2004, pp. 589-592.
[13] T. Ogilvie, E. Swidenbank, and B. W. Hogg, "Use of data mining techniques in the performance monitoring and optimization of a thermal power plant," in Proc. Inst. Elect. Eng. Colloq. Knowledge Discovery and Data Mining, 1998, pp. 7/1-7/4.
[14] H. Taplin, Combustion Efficiency Tables. Lilburn, GA: Fairmont Press, 1991.
[15] J. Z. Chu, S. S. Shieh, S. S. Jang, C. I. Chien, H. P. Wan, and H. H. Ko, "Constrained optimization of combustion in a simulated coal-fired boiler using artificial neural network model and information analysis," Fuel, vol. 82, no. 6, pp. 693-703, 2003.
[16] S. A. Kalogirou, "Artificial intelligence for the modeling and control of combustion processes: A review," Progr. Energy Combust. Sci., vol. 29, no. 6, pp. 515-566, 2003.
[17] H. Rusinowski, M. Szega, A. Szlęk, and R. Wilk, "Methods of choosing the optimal parameters for solid fuel combustion in stoker-fired boilers," Energy Convers. Manage., vol. 43, no. 9-12, pp. 1363-1375, 2002.
[18] R. Cass and B. Radl, "Adaptive process optimization using functional-link networks and evolutionary optimization," Control Eng. Pract., vol. 4, no. 11, pp. 1579-1584, 1997.
[19] S. Y. Li, H. B. Liu, W. J. Cai, Y. C. Soh, and L. H. Xie, "A new coordinated control strategy for boiler-turbine system of coal-fired power plant," IEEE Trans. Control Syst. Technol., vol. 13, no. 6, pp. 943-954, Nov. 2005.
[20] D. Büche, P. Stoll, R. Dornberger, and P. Koumoutsakos, "Multiobjective evolutionary algorithm for the optimization of noisy combustion processes," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 32, no. 4, pp. 460-473, Nov. 2002.
[21] S. Y. Wang, G. L. Bai, Y. Cao, and B. Li, "An algorithm of simulating human intelligent control for combustion system," in Proc. IEEE Int. Conf. Intelligent Processing Systems, Beijing, China, 1997, pp. 768-771.
[22] A. Kusiak, "A data mining approach for generation of control signatures," ASME Trans.: J. Manuf. Sci. Eng., vol. 124, no. 4, pp. 923-926, 2002.
[23] J. R. Quinlan, "Induction of decision trees," Mach. Learn., vol. 1, no. 1, pp. 81-106, 1986.

[24] A. Tascillo, C. Gearhart, and J. Fridrich, "Nonlinear forecasting with wavelet neural networks," in Proc. IEEE Int. Conf. Systems, Man, and Cybernetics, Orlando, FL, 1997, pp. 1111-1116.
[25] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. San Mateo, CA: Morgan Kaufmann, 2005.
[26] M. J. A. Berry and G. S. Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. New York: Wiley, 2004.
[27] J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. Mathematical Statistics and Probability, vol. 1. Berkeley, CA: Univ. California Press, 1967, pp. 281-297.
[28] D. E. Seborg, T. F. Edgar, and D. A. Mellichamp, Process Dynamics and Control, 2nd ed. New York: Wiley, 2003.

Andrew Kusiak (M'89) is a Professor in the Department of Mechanical and Industrial Engineering, The University of Iowa, Iowa City. He is interested in applications of computational intelligence in automation, energy, manufacturing, product development, and healthcare. He has published numerous books and technical papers in journals sponsored by professional societies, such as AAAI, ASME, IEEE, IIE, ESOR, IFIP, IFAC, INFORMS, ISPE, and SME. He speaks frequently at international meetings, conducts professional seminars, and consults for industrial corporations. Dr. Kusiak has served on the editorial boards of over thirty journals. He is an IIE Fellow and the Editor-in-Chief of the Journal of Intelligent Manufacturing.

Zhe Song received the B.S. and M.S. degrees from the China University of Petroleum, Beijing, China, in 2001 and 2004, respectively. He is currently pursuing the Ph.D. degree in industrial engineering at The University of Iowa, Iowa City. His research concentrates on data mining, computational intelligence, optimization, and their applications in the process and discrete manufacturing industries. He is a member of the Intelligent Systems Laboratory.
