
Seventh International Conference on Hybrid Intelligent Systems

Application of a Hybrid Classifier to the Recognition of Petrochemical Odors


E. M. J. Oliveira, P. G. Campos, T. B. Ludermir, F. A. T. de Carvalho, W. R. de Oliveira
Center of Informatics, Federal University of Pernambuco, CP 7851, 50740-540, PE, Brazil
E-mail: {emjo, pgc, tbl, fatc}@cin.ufpe.br, wilson.rosa@gmail.com

0-7695-2946-1/07 $25.00 © 2007 IEEE. DOI 10.1109/HIS.2007.21

Abstract

Nowadays there are several data mining algorithms applied to the resolution of many different problems, such as the classification of patterns. However, when these algorithms are used separately to classify, they usually present inferior performance compared to that obtained by combined models. The Bagging and Boosting techniques combine models of the same kind in a competitive form; in other words, the output is generally provided by the winning classifier. Alternatively, Stacking usually combines different algorithms, constituting a hybrid model. Nevertheless, Stacking has a high cost, due to the search for the best models to be combined to solve a certain problem. Thus, we present a Hybrid Classifier (HC), applied to the recognition of gases derived from petrol, built at a lower cost and in a cooperative way.

1. Introduction

In the literature there are several applications of data mining methods to many problems, such as time series prediction, association, clustering and pattern classification [2][3][8][9][11][12]. In classification problems, combined models generally present better performance than a single classifier [2], which justifies, in practice, the importance of combining models. The Bagging and Boosting techniques [3][8][9] perform a competitive combination of models of the same kind: the response of the constructed classifier is provided through voting or through the average of the outputs of the individual models. Another alternative is the Stacking technique [2][12], where different models are combined; in this way, a hybrid model can be obtained. However, instead of deciding for the most voted output, this combined model finds out which algorithm is the most adequate for the problem to which it has been submitted. Note that, in this way, the cost of the acquisition of the combined model is high. Models built adequately by using Stacking tend to generate better results than Bagging and Boosting, but at a high cost. Aiming at obtaining a classifier with better performance and a lower development cost, we present a Hybrid Classifier (HC) to identify gases derived from petrol. The petrochemical compounds are ethane, methane, butane, carbon monoxide (CO) and propane, which have the peculiarity that human beings cannot smell them and the disadvantage that they may lead people to death. This work is divided into six sections. Sections 2 and 3 describe the HC. Section 4 presents details of the data used in the experiments. Section 5 details the accomplished experiments. Section 6 contains a summary and the conclusions of this paper.

2. Hybrid Classifier

As mentioned in the introduction, combined models normally present better results than an isolated model. In this context, three techniques stand out: Bagging, Boosting and Stacking [2][3][8][9][12]. Combining distinct models by using Stacking generally leads to results superior to the ones obtained by the competitive junction of models of the same kind, by using Bagging or Boosting. However, the construction of hybrid models with Stacking is much more expensive, since it tries to find out which of the models is more adequate for the solution of a certain problem. Therefore, we present in this section an algorithm for the construction of a cooperative hybrid classifier which requires less computational effort while keeping a classification performance superior to that of the isolated models applied to the same problem. As will be seen ahead, the principle behind the HC is divide-and-conquer: the idea is to evaluate the database in question concerning the existence of at least one linearly separable class. In this

case, note that it is possible to divide the initial database into two other smaller and less complex databases: one linearly separable, and the other with a smaller quantity of patterns. This algorithm, described informally, consists of the following steps:
1. Detect, through a 3D graph, whether the working database has any linearly separable class;
2. If so, substitute the non-linearly separable classes of the output attribute with a standard value, for instance OTHERS, using a copy of this database;
3. With this new database, build a classifier using any supervised data mining algorithm to identify the linearly separable set(s) (classifier 1);
4. After that, obtain a reduced database made up of only the non-linearly separable patterns of the original database;
5. Based on this reduced database, build another classifier using a supervised mining method suited to non-linearly separable data (classifier 2);
6. At the end of this process, join these two classifiers, obtaining the Hybrid Classifier. The junction is such that if the output of classifier 1 is OTHERS, classifier 2 is responsible for the response of the hybrid classifier; otherwise, the final output is provided by classifier 1 (Figure 1).

Figure 1. Illustration of the join of classifiers 1 and 2, components of the hybrid classifier

Observe that classifiers 1 and 2 are combined in a cooperative way: the first classifier identifies the linearly separable class and the second one the non-linearly separable classes, depending on the pattern provided at the input of the HC. Besides, with this algorithm it is possible to reduce the complexity of the acquisition of the HC, probably presenting better results than a single classifier. It is worth emphasizing that this methodology is indicated for cases in which the database has a minimum of three classes, of which at least two are not linearly separable. Otherwise, the traditional known methods should evidently be used.

3. Construction of a Hybrid Classifier

Based on the algorithm of the previous section, the adopted database first has to be analyzed. We intend to detect whether the database satisfies the requirements that make the acquisition of an HC worthwhile, that is, whether it has a minimum of three classes with at least two of them not linearly separable. The database used in this work has five classes. According to step 1, we can detect whether there is any linearly separable class through a 3D graph of the database. Since the database used in this work has eight input variables, we apply a dimensionality reduction technique, Principal Components Analysis (PCA) [1]. Thus, we obtained three new variables (PC1, PC2 and PC3) to represent the clean database (detailed in the following section). With them we could elaborate the desired 3D graph (Figure 2), using Minitab [10].

Figure 2. 3D graph of the clean database

By analyzing Figure 2, it can be clearly seen that the gas butane (the patterns that stand out in the 3D graph) is indeed linearly separable from the other four gases, while the gases CO, ethane, methane and propane are not linearly separable among themselves. Therefore, we can proceed with the development of the HC. According to step 2, in a copy of this clean database we substituted all categories of the output attribute different from butane (the only linearly separable set of this database) with OTHERS. Next, we obtained classifier 1 from this new database by using Discriminant Analysis [1]. Then, in step 4, we used another copy of the clean database and removed all patterns whose output attribute has the category butane. The resulting database remained with 3860 patterns, a reduction of around 20% of the clean database. Afterwards, for the acquisition of classifier 2, we chose to apply an MLP network [5] to this reduced database. Finally, we made the junction of these two classifiers as described in step 6 of the algorithm, obtaining an HC. The choice of Discriminant Analysis as classifier 1 of the HC is due to the fact that the subproblem it addresses is linearly separable and therefore does not need a more robust method; any other linear classifier could be used. On the other hand, an MLP network was used as classifier 2 because the database to which it was applied is not linearly separable. Classifier 1 removed the easy (linearly separable) patterns from the data set, so the MLP is trained on the most difficult patterns, avoiding possible degradation of the knowledge acquired from the easy patterns.
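As a minimal sketch of the two-stage construction described above, the snippet below relabels the non-separable classes as OTHERS, trains a linear classifier (here scikit-learn's Linear Discriminant Analysis) as classifier 1 and an MLP as classifier 2, and joins them cooperatively. Since the gas database is not publicly available, synthetic data stands in for it: one well-separated cluster plays the role of butane, and four overlapping clusters play the roles of CO, ethane, methane and propane.

```python
# Sketch of the hybrid classifier: LDA (classifier 1) + MLP (classifier 2).
# Synthetic data is an assumption standing in for the sensor database.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier

rng = np.random.RandomState(0)

# Four overlapping classes plus one far-away, linearly separable class.
X_hard, y_hard = make_blobs(n_samples=800, centers=4,
                            cluster_std=3.0, random_state=0)
X_easy = rng.normal(loc=30.0, scale=1.0, size=(200, 2))
X = np.vstack([X_hard, X_easy])
y = np.concatenate([y_hard, np.full(200, 4)])  # class 4 plays "butane"

# Step 2: relabel every non-separable class as OTHERS for classifier 1.
OTHERS = -1
y_stage1 = np.where(y == 4, 4, OTHERS)
clf1 = LinearDiscriminantAnalysis().fit(X, y_stage1)

# Steps 4-5: train classifier 2 only on the reduced, hard patterns.
mask = y != 4
clf2 = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                     random_state=0).fit(X[mask], y[mask])

# Step 6: cooperative junction of the two classifiers.
def hybrid_predict(X_new):
    out = clf1.predict(X_new)
    needs_clf2 = out == OTHERS
    final = out.copy()
    if needs_clf2.any():
        final[needs_clf2] = clf2.predict(X_new[needs_clf2])
    return final

print((hybrid_predict(X) == y).mean())
```

The junction mirrors step 6: classifier 2 is only consulted when classifier 1 answers OTHERS, so each model works on the part of the problem it handles best.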

4. The Database

The database has eight real numerical input attributes, corresponding to the sensors (conducting polymers), which vary their behavior (electrical resistance) according to the presence of chemical substances in the air, and one output attribute, the type of gas (ethane, methane, butane, carbon monoxide or propane). The data were obtained with nine data acquisitions for each gas with an artificial nose [6], by recording the resistance value of each sensor every twenty seconds during 40 minutes. In the acquisition phase, each sensor thus produced 120 resistance values for each gas. A pattern is a vector of eight elements representing the resistance values recorded by the sensor array; thus, each acquisition contains 600 patterns, formed from the 960 values of each of the gases. Table 1 shows the initial distribution of the database. Before using the database, we created the graph Resistance of the Sensor versus Pattern to have a general view of it (Figure 3); in this way it is easier to verify whether the database needs pre-processing. Data visualization, mainly when the input variables are continuous, permits, in a rapid and simple way, the detection of patterns that are inconsistent with the other ones existent in the database [4]. As a matter of fact, looking at the graphs in Figure 3 (mainly those of the DBS, ASA and PER sensors), it can be observed that there are strong indications of the presence of outliers in this database, justifying a pre-processing stage [4].

Table 1. Distribution of the initial database

Class      # of Patterns   Proportion
Butane     1086            19.99 %
CO         1088            20.03 %
Ethane     1088            20.03 %
Methane    1084            19.96 %
Propane    1086            19.99 %
Total      5432            100.00 %

Figure 3. Resistance of the Sensor versus Pattern graphs of the initial database

4.1. Pre-Processing

Nowadays, real-world application databases are highly exposed to noisy, absent and inconsistent data. Pre-processing the database is therefore strongly recommended, improving the quality of the data and reducing the time necessary to mine them [4]. The main goal of this stage is thus to improve the quality of the database and, consequently, to facilitate its mining and the acquisition of the HC.

In this stage the data were cleaned. During the cleaning, 606 outliers were detected through visual inspection of the graphs shown in Figure 3 and through the distance-based outlier detection technique [4]. After the removal of these abnormal patterns, the clean database remained with 4826 stratified patterns, according to Table 2.
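The distance-based criterion cited above [4] flags a pattern as an outlier when more than a fraction p of the other patterns lie farther than a distance d from it. A minimal NumPy sketch follows; the values of d and p below are illustrative assumptions, since the paper does not report the thresholds used.

```python
# Sketch of distance-based (DB(p, d)) outlier detection, after Han &
# Kamber [4]. The thresholds d and p are illustrative, not the paper's.
import numpy as np

def distance_based_outliers(X, d, p):
    """Return a boolean mask marking the DB(p, d)-outliers of X."""
    # Pairwise Euclidean distances between all patterns.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    n = len(X)
    # Fraction of *other* patterns farther than d from each pattern.
    far_fraction = (dist > d).sum(axis=1) / (n - 1)
    return far_fraction > p

# Tiny illustration: a tight cluster plus one abnormal pattern.
rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), [[10.0, 10.0]]])
mask = distance_based_outliers(X, d=5.0, p=0.9)
print(mask.sum())  # prints 1: only the abnormal pattern is flagged
```

In practice d and p would be tuned per sensor, and the quadratic distance matrix would be replaced by a spatial index for large databases.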


Table 2. Distribution of the clean database

Class      # of Patterns   Proportion
Butane     966             20.02 %
CO         966             20.02 %
Ethane     967             20.04 %
Methane    963             19.95 %
Propane    964             19.97 %
Total      4826            100.00 %

4.2. Descriptive Statistics of the Initial and Clean Databases

To better evidence the importance of the database-cleaning phase, Tables 3 and 4 show the minimum and maximum values, the mean and the standard deviation of the electrical resistance, in Ohms, for each sensor of the initial and of the clean database.

Table 3. Statistical description of the sensors of the initial database

Sensor   Minimum   Maximum    Mean      Standard Deviation
DBS      3999.0    145000.0   91799.7   22707.6
OSA      350.0     816.0      498.8     123.4
NBS      582.0     1060.0     737.3     124.7
ASA      136.8     58800.0    1701.9    7419.5
PTSA     259.6     367.4      297.4     26.6
CAS      28600.0   123200.0   60995.2   25245.4
PER      30220.0   362200.0   73516.8   59709.6
OSA2     452.0     1120.0     596.8     148.5

Table 4. Statistical description of the sensors of the clean database

Sensor   Minimum   Maximum    Mean      Standard Deviation
DBS      68600.0   145000.0   94884.5   22178.6
OSA      414.0     816.0      516.4     119.7
NBS      642.0     1060.0     753.8     122.6
ASA      136.8     155.6      141.7     5.7
PTSA     277.8     367.4      301.2     25.8
CAS      35320.0   123200.0   64702.3   24334.6
PER      36500.0   121000.0   64017.7   23721.6
OSA2     496.0     1120.0     611.0     151.4

By observing the standard deviation values of the ASA and PER sensors shown in Table 3, it can be seen that the removal of the outliers reduced these values significantly, according to Table 4. Thus, we obtained a clean database with lower variability. In Figure 4, we can visualize how the initial database looks after the cleaning phase; for example, there is a massive difference in the scale of the ASA sensor between Figure 3 (where the range is between 136 and 58800) and Figure 4 (where the range is between 136 and 155).

Figure 4. Resistance of the Sensor versus Pattern graphs of the clean database

5. Experiments

To measure the performance of the HC, we compare its results in the classification of the gases with the results of the Logistic Regression (LR) technique [7], of a Multi-Layer Perceptron (MLP) artificial neural network with one hidden layer [5], of rule extraction (Prism) [12] and of decision trees (ID-3 and C4.5) [12]. Besides, to estimate the accuracy of these classifiers (both the isolated ones and the HC) and their expected precision on data not used in training, we adopted the 10-fold cross-validation technique [4] in all of our experiments, which were repeated 10 times with the same model configuration and with randomization of the patterns.
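As an illustration, this evaluation protocol — repeated 10-fold cross-validation followed by a paired comparison of the per-fold error rates — can be sketched as below. Scikit-learn's digits dataset and the two classical classifiers are stand-ins, since the gas database is not publicly available.

```python
# Sketch of repeated stratified 10-fold cross-validation and a one-sided
# paired Student's t-test on per-fold error rates. The dataset and the
# two models are assumptions standing in for the paper's setup.
import numpy as np
from scipy import stats
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_digits(return_X_y=True)
model_a = DecisionTreeClassifier(random_state=0)
model_b = GaussianNB()

errors_a, errors_b = [], []
for repetition in range(10):  # 10 repetitions, folds re-randomized each time
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=repetition)
    errors_a.extend(1.0 - cross_val_score(model_a, X, y, cv=cv))
    errors_b.extend(1.0 - cross_val_score(model_b, X, y, cv=cv))

# H0: equal mean errors; H1: model A has the smaller mean error.
t_stat, p_two_sided = stats.ttest_rel(errors_a, errors_b)
p_one_sided = p_two_sided / 2 if t_stat < 0 else 1.0 - p_two_sided / 2
print(np.mean(errors_a), np.mean(errors_b), p_one_sided)
```

Pairing the fold-by-fold errors, as in the paper's t-test, removes the variance contributed by the folds themselves and makes the comparison between classifiers more sensitive.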


Table 5 shows the average percentage of classification errors obtained with the data mining methods mentioned above, used separately, and with the HC, after the execution of 10 experiments using 10-fold cross-validation on the database under study.

Table 5. Average percentage errors obtained by the classifiers

Fold        LR (%)   MLP (%)   Prism (%)   ID-3 (%)   C4.5 (%)   HC (%)
01          35.62    15.10     11.60       9.20       9.37       2.14
02          35.62    15.66     11.83       9.18       9.32       2.05
03          35.41    12.03     11.71       9.22       9.34       2.34
04          35.64    13.34     11.62       9.14       9.32       2.59
05          35.74    11.00     11.67       9.18       9.30       2.59
06          35.56    12.47     11.67       9.12       9.28       1.93
07          35.64    9.36      11.62       9.24       9.26       2.76
08          35.74    13.38     11.58       9.20       9.41       1.86
09          35.62    13.42     11.58       9.24       9.39       2.70
10          35.66    13.94     11.64       9.24       9.45       2.61
Mean        35.62    12.97     11.65       9.20       9.34       2.36
Std. Dev.   0.0934   1.8639    0.0752      0.0419     0.0602     0.3372

Aiming at verifying the significance of the results obtained with the HC compared to the ones obtained by the other data mining methods used separately, we applied Student's t-test for paired samples. In other words, we tested the alternative hypothesis (H1) that the average error of the HC is statistically smaller than that of each of the other classifiers, with 95% confidence. The null hypothesis (H0) and the alternative one (H1) are:

H0: μ1 = μ2 versus H1: μ2 < μ1

where μ1 is the mean error of a traditional classifier and μ2 is the mean error of our HC. When the p-value obtained through the test is less than 0.05 (p < 0.05), there is evidence to reject the null hypothesis and adopt the alternative hypothesis with 95% confidence. The p-value represents the probability of committing a type I error, that is, of rejecting the null hypothesis when it is true. Table 6 shows the results of the t-test for the comparison of the means of the paired samples.

Table 6. Student's t-test for paired samples

Compared Classifiers   p-value         Decision
HC versus LR           0.00            Accept H1
HC versus MLP          2.60 x 10^-8    Accept H1
HC versus PRISM        1.97 x 10^-4    Accept H1
HC versus ID-3         8.00 x 10^-14   Accept H1
HC versus C4.5         3.24 x 10^-7    Accept H1

By analyzing Table 6, we conclude that the HC is statistically better than the other classifiers compared, with 95% confidence. Finally, the error rate obtained with the proposed HC on this database is compatible with the error rate obtained by other authors [13] when they used an MLP with backpropagation, although they obtained a smaller error rate when they used more sophisticated hybrid systems.

6. Conclusions

The problem of recognition of petrochemical odors demands a classifier with the lowest possible error level, since humans cannot smell such substances and they are lethal to man. An alternative to obtain a classifier with such characteristics is to combine models, because combined models usually present better classification performance than an isolated model. Moreover, combining distinct models by using Stacking normally leads to results superior to the ones obtained through the competitive junction of models of the same kind by using Bagging or Boosting. However, the construction of hybrid models with Stacking is much more expensive, since it has to find out which are the ideal models to be combined for the problem that we want to solve. Thus, we presented a cooperative way of integrating two supervised data mining algorithms, so as to obtain a single hybrid classifier with high performance and simplified development, applied to the recognition of gases derived from petrol. After 10 repetitions of the experiments with 10-fold cross-validation, by using some isolated classical


classifiers and an HC built from a linear classifier (Discriminant Analysis) and a more robust one (an MLP network) on the clean database in question, we showed through Student's t-test that this HC presents performance statistically superior to that of the models used separately, with 95% confidence, as shown in the previous section. Besides, the average classification error of the best isolated model was around 9%, while the HC reached around 2%. In future works we will apply the HC algorithm to problems with databases similar to the one used in this article, in order to verify the performance of the hybrid classifier on other databases. The new results will be compared to the ones obtained with the database of gases derived from petrol.

Acknowledgements

The authors would like to thank CNPq, CAPES and FINEP (Brazilian research agencies) for their financial support.

References

[1] Duda, R. O., Hart, P. E., Stork, D. G.: Pattern Classification. 2nd ed., John Wiley & Sons (2006).
[2] Dzeroski, S., Zenko, B.: Is Combining Classifiers with Stacking Better than Selecting the Best One? In: Giraud-Carrier, C., Vilalta, R., Brazdil, P. (eds.): Machine Learning, Vol. 54. Kluwer Academic Publishers, Netherlands (2004) 255-273.
[3] Grandvalet, Y.: Bagging Equalizes Influence. In: Schapire, R. (ed.): Machine Learning, Vol. 55. Kluwer Academic Publishers, Netherlands (2004) 251-270.
[4] Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2001).
[5] Haykin, S.: Neural Networks: A Comprehensive Foundation. 2nd ed., Prentice Hall (1999).
[6] Ludermir, T. B., Yamazaki, A., Zanchettin, C.: An Optimization Methodology for Neural Network Weights and Architectures. IEEE Transactions on Neural Networks, Vol. 17, No. 6 (2006) 1452-1459.
[7] Hosmer, D. W., Lemeshow, S.: Applied Logistic Regression. John Wiley & Sons (1989).
[8] Klivans, A. R., Servedio, R. A.: Boosting and Hard-Core Set Construction. In: Gentile, C. (ed.): Machine Learning, Vol. 51. Kluwer Academic Publishers, Netherlands (2003) 217-238.
[9] Long, P. M., Vega, V. B.: Boosting and Microarray Data. In: Sebastiani, P., Kohane, I. S., Ramoni, M. F. (eds.): Machine Learning, Vol. 52. Kluwer Academic Publishers, Netherlands (2003) 31-44.
[10] Minitab Inc.: Minitab Release 14 (http://www.minitab.com).
[11] Oliveira, E. M. J.: Forecasting the Price of IBOVESPA Index Using Neural Networks and Statistics. Master Thesis (in Portuguese). Federal University of Pernambuco, Center of Computer Science, Recife (2001).
[12] Witten, I. H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (1999).
[13] Zanchettin, C., Ludermir, T. B.: Wavelet Filter for Noise Reduction and Signal Compression in an Artificial Nose. International Conference on Hybrid Intelligent Systems (2003) 907-916.

