
Kernel Ensemble Learning with Heterogeneous and Subspace Oriented Learners


Shuyi Chen1,* and Gang Zhang1,2
1 School of Automation, Guangdong University of Technology
2 School of Information Science and Technology, SUN YAT-SEN University

* Corresponding author: Shuyi Chen.

Abstract: It is well accepted that a combination of learners can achieve better generalization ability than individual learners. Theoretical analysis indicates that learners of high accuracy and high diversity yield relatively good results. Towards this goal, we propose two ensemble strategies based on mapping the input space to a kernel-as-similarity representation, so as to obtain learners that are both accurate and diverse. The first strategy trains different learning models on the new data representation before ensembling; the second trains the same learning model on different attribute subsets to obtain diversity. We also propose three attribute subset division strategies, so as to get a comprehensive view of how attribute subsets contribute to the effect of the ensemble. The proposed algorithms are evaluated on the UCI benchmark data repository to show their effectiveness.

Keywords: ensemble learning; kernel mapping; heterogeneous ensemble; subspace oriented learners; kernel-as-similarity

1. INTRODUCTION
In recent years, ensemble learning and learning with kernels have been two active topics in machine learning. Ensemble learning tries to improve generalization ability by combining several learners, and it has been shown to outperform an individual learner in many machine learning tasks [4]. Different ensemble strategies have been reported in the machine learning literature, such as ensemble of all learners [3], selective ensemble [2], weighted ensemble [3] and ensemble pruning [5], each of which plays an important role in certain areas and applications. It has been shown empirically and theoretically that ensembles of learners can improve overall accuracy. The accuracy and the diversity of the individual learners are two key factors that significantly affect the effectiveness of an ensemble, as analysed theoretically by G. Brown in [4]. Many recent ensemble methods build on this result, adopting different techniques to improve both the accuracy and the diversity of the individual learners, mainly by weighting, sorting or directly applying an optimization procedure to obtain optimal parameters.

The goal of our work is also to obtain individual learners of good accuracy and diversity, but we take a route different from previous work: we put the data points into different spaces spanned by different bases, and then train individual learners with supervised learning algorithms. Training on data from different spaces provides diversity between the learners, while supervised learning keeps the accuracy of the individual learners high.

Kernel learning is one of the most important learning frameworks in machine learning and has been widely used in numerous empirical and theoretical studies [6][7][8]. It provides a way to map a data set from the input space to a feature space induced by a kernel function. The mapping procedure is implicit, and the feature space is of higher dimension. The kernel function value reflects the similarity of any two elements of the data set: a high value is obtained when the elements are highly similar, and a low value when they are not. An effective kernel therefore plays a crucial role in many kernel-based machine learning techniques [3].

The work presented in this paper is motivated by [1], which showed that ensemble learning with kernel mapping outperforms simple ensemble learning without the kernel method. Going a step further, we propose a kernel ensemble learning method with two kinds of learners to establish a stronger learning method; hybrid models are considered a good way to reduce the error rate. In order to improve the generalization ability and accuracy of kernel learning, we propose a learning framework named kernel ensemble learning (KEL), which applies a kernel-as-similarity strategy to map samples onto some pre-set base samples by means of a kernel function.

Two ensemble strategies are introduced in our work. The first is Kernel Ensemble with Heterogeneous Learners (KEHL), which aims to improve the diversity of individual learners by training learners of different types on the kernel-as-similarity representation of the training data. The second is Kernel Ensemble with Subspace-oriented Learners (KESL), which aims to improve diversity by training learners of the same model on different subspaces of the whole data set. In our setting, a subspace refers to a projection onto some attribute subset. We propose three different attribute subset generation strategies and address some important issues related to the individual learners. Our work is motivated by the observation, studied in previous machine learning literature, that different kernel-as-similarity data spaces exhibit high diversity. To the best of our knowledge, this is the first work to improve diversity through a kernel-as-similarity data representation. Compared to recent weighted ensemble and ensemble pruning methods, the proposed framework is also easy to apply, since it does not require solving an optimization problem.

The remainder of this paper is organized as follows: Section 2 presents the main algorithm, including the kernel mapping algorithm, the subspace generation strategies and the ensemble methods. Section 3 reports the evaluation results, with some baseline algorithms as comparison. Finally, we conclude the paper in Section 4.

2. MAIN ALGORITHM
As mentioned above, the work of this paper is based on Q. Pan et al. [1]. To make the paper self-contained, we first summarize the kernel mapping algorithm presented in [1].

2.1 Review of Kernel Mapping
Q. Pan et al. [1] proposed a kernel mapping method that works out a kernel-as-similarity data representation and then applies it to ensemble learning. The kernel-as-similarity representation is obtained by a kernel mapping procedure which maps each data point onto a fixed base set. Formally, let $X_{tr}$ be the training set, $X_{te}$ the test set, and $B$ the base set onto which points are mapped. An RBF kernel function $k(\cdot, \cdot)$ is applied to calculate similarity matrices $K_{tr}$ and $K_{te}$ as Eqs. (1) and (2) illustrate:
$$K_{tr}(i, j) = k(x_i, b_j), \quad x_i \in X_{tr},\ b_j \in B \qquad (1)$$
$$K_{te}(i, j) = k(x_i, b_j), \quad x_i \in X_{te},\ b_j \in B \qquad (2)$$
The training and test sets are thus mapped into a space induced by the RBF kernel, and several models (both single learners and ensembles) are trained on the new data representation. Q. Pan et al. adopted bootstrapping on the training set to obtain slightly different bases for the mapping, resulting in slightly different learners, and reported that ensembles with kernel mapping significantly improved the final accuracy on most evaluation data sets from the UCI repository. We argue, however, that bootstrapping the training set alone does not yield much diversity when the number of bootstrap rounds is limited.
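As a concrete illustration of the kernel-as-similarity mapping reviewed above, the following minimal Python sketch builds the similarity matrices of Eqs. (1) and (2) with an RBF kernel. The helper name, the toy data, and the choice of the training points themselves as the base set are our assumptions, not part of [1].

```python
import numpy as np

def rbf_similarity(X, B, sigma=1.0):
    """Kernel-as-similarity matrix K(i, j) = exp(-||x_i - b_j||^2 / (2 sigma^2)), cf. Eqs. (1), (2)."""
    # Squared Euclidean distances between every row of X and every row of the base set B.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * X @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

# Toy data: 6 training points and 2 test points in a 3-dimensional input space.
rng = np.random.default_rng(0)
X_tr, X_te = rng.normal(size=(6, 3)), rng.normal(size=(2, 3))

B = X_tr                          # one possible base set: the training points themselves
K_tr = rbf_similarity(X_tr, B)    # new representation of the training data, Eq. (1)
K_te = rbf_similarity(X_te, B)    # test data mapped onto the same base set, Eq. (2)
print(K_tr.shape, K_te.shape)     # (6, 6) (2, 6)
```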

2.2 KEHL Algorithm
For simplicity, we only consider binary classification in our work. We now describe the main steps of the Kernel Ensemble with Heterogeneous Learners (KEHL) algorithm. As mentioned above, the motivation of KEHL is that heterogeneous learning models trained on kernel-mapped data representations achieve high diversity while retaining accuracy. The steps of KEHL are as follows.

Step 1: Bootstrapping. Given the input data set, we obtain data sets that differ slightly from the original one by bootstrapping $T$ times. Eq. (3) formalizes this procedure:
$$D^{(t)} = \mathrm{Bootstrap}(D), \quad t = 1, \ldots, T \qquad (3)$$
The bootstrap is invoked $T$ times so as to generate $T$ slightly different training sets $D^{(1)}, \ldots, D^{(T)}$.

Step 2: Mapping with the RBF kernel. After bootstrapping, we apply an RBF kernel function to map both the training data and the test data onto the base set:
$$K^{(t)}(i, j) = k(x_i, b_j), \quad x_i \in D^{(t)},\ b_j \in B \qquad (4)$$
which refers to the mapping from the training data to the base set through the RBF kernel function. Note that there are different ways to determine the base set, and this choice significantly affects the effectiveness of the ensemble, according to how much information the base set contains.

Step 3: Ensemble learning with heterogeneous learners. We consider three kinds of learners for the ensemble: Artificial Neural Network (ANN), Decision Tree (DT) and Support Vector Machine (SVM). Formally, writing $\phi(x)$ for the kernel-as-similarity representation of a sample $x$, we have:
$$y_{ANN}(x) = h_{ANN}(\phi(x)) \qquad (5)$$
$$y_{DT}(x) = h_{DT}(\phi(x)) \qquad (6)$$
$$y_{SVM}(x) = h_{SVM}(\phi(x)) \qquad (7)$$
The values returned in (5) to (7) are the outputs of the heterogeneous learners, and a majority voting procedure is used to determine the final output of the ensemble:
$$H(x) = \mathrm{sign}\big(y_{ANN}(x) + y_{DT}(x) + y_{SVM}(x)\big) \qquad (8)$$
Eq. (8) details the majority voting over the outputs of the three heterogeneous learners; the classification result of a sample $x$, with labels in $\{-1, +1\}$, is finally calculated by it.
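A minimal sketch of the three KEHL steps, using scikit-learn learners as stand-ins for the paper's ANN, decision tree (J48) and SVM, could look as follows. The function name, the single bootstrap round (the paper repeats it $T$ times), the use of the bootstrap sample as the base set, and the learner hyperparameters are our assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

def kehl_fit_predict(X_tr, y_tr, X_te, gamma=1.0, seed=0):
    """Sketch of KEHL: bootstrap, kernel-as-similarity mapping, heterogeneous learners, majority vote."""
    rng = np.random.default_rng(seed)
    # Step 1: one bootstrap replicate of the training data (Eq. (3)); the paper repeats this T times.
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    X_boot, y_boot = X_tr[idx], y_tr[idx]
    # Step 2: map training and test data onto the base set with the RBF kernel (Eq. (4));
    # here the bootstrap sample itself serves as the base set, one possible choice.
    B = X_boot
    K_boot = rbf_kernel(X_boot, B, gamma=gamma)
    K_te = rbf_kernel(X_te, B, gamma=gamma)
    # Step 3: train three heterogeneous learners on the kernel-as-similarity features (Eqs. (5)-(7)).
    learners = [MLPClassifier(max_iter=2000, random_state=seed),  # KMA: kernel mapping + ANN
                DecisionTreeClassifier(random_state=seed),        # KMJ: kernel mapping + decision tree
                SVC(random_state=seed)]                           # KMS: kernel mapping + SVM
    votes = np.zeros(len(X_te))
    for clf in learners:
        clf.fit(K_boot, y_boot)
        votes += clf.predict(K_te)   # labels are -1/+1, so summing the predictions implements voting
    return np.sign(votes)            # majority vote, Eq. (8); no ties are possible with three voters
```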
2.3 KESL Algorithm
We now describe the main steps of the Kernel Ensemble with Subspace-oriented Learners (KESL) algorithm. As mentioned above, the motivation of KESL is that training models of the same type on different attribute subsets of the original data set leads to high diversity while retaining accuracy. To this end, we propose three strategies for feature subspace division; a code sketch of the random and non-intersecting strategies is given at the end of this subsection.

Random division. We randomly select $m$ attributes each time and run the procedure $n$ times. Thus we get $n$ random subspaces of probably overlapping attributes: the set of all attributes is divided into $n$ subsets randomly (the subsets may overlap), and the number of attributes in each subset is equal to $m$.

Prior knowledge based division. This strategy makes use of prior knowledge to divide the attributes into several groups related to different topics. Attribute groups with different application backgrounds can often be separated naturally.

Non-intersecting subsets. This strategy is straightforward. The first $m$ attributes make up a subset, the next $m$ attributes make up the following subset, and the rest are handled in the same manner. The subsets of attributes can therefore be described as in Eq. (10):
$$A_i = \{a_{(i-1)m+1}, a_{(i-1)m+2}, \ldots, a_{im}\}, \quad i = 1, \ldots, \lceil d/m \rceil \qquad (10)$$
where $d$ is the total number of attributes and the last subset may contain fewer than $m$ attributes.
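The sketch below illustrates the random and non-intersecting division strategies and the subspace-oriented ensemble (the prior knowledge based division depends on domain information and is not shown). The helper names and the choice of a decision tree as the common learner are our assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.tree import DecisionTreeClassifier

def random_subspaces(d, m, n, seed=0):
    """Random division: n attribute subsets of size m, possibly overlapping."""
    rng = np.random.default_rng(seed)
    return [rng.choice(d, size=m, replace=False) for _ in range(n)]

def disjoint_subspaces(d, m):
    """Non-intersecting division (Eq. (10)): consecutive blocks of m attributes."""
    return [np.arange(start, min(start + m, d)) for start in range(0, d, m)]

def kesl_fit_predict(X_tr, y_tr, X_te, subsets, gamma=1.0):
    """Sketch of KESL: the same learner type trained on each attribute subspace, combined by voting."""
    votes = np.zeros(len(X_te))
    for attrs in subsets:
        # Project onto the attribute subset, then build the kernel-as-similarity representation.
        B = X_tr[:, attrs]
        K_tr = rbf_kernel(X_tr[:, attrs], B, gamma=gamma)
        K_te = rbf_kernel(X_te[:, attrs], B, gamma=gamma)
        clf = DecisionTreeClassifier().fit(K_tr, y_tr)
        votes += clf.predict(K_te)   # -1/+1 labels
    # Majority vote over the subspace learners; with three subsets, as in the paper, no ties occur.
    return np.sign(votes)
```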

2.4 Kernel Mapping
We perform kernel mapping with the well-known RBF kernel function. The original real-world data set is mapped into the kernel space by this RBF kernel, so that a non-linear problem can be transformed into a linearly separable one. Eq. (11) gives the definition of the RBF kernel function:
$$k(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \qquad (11)$$
where $\|x_i - x_j\|$ is the Euclidean distance between any two objects $x_i$ and $x_j$ of the input space, and $\sigma$ is the width parameter which controls the radial range.

2.5 Heterogeneous Learners with KEL (KEHL)
Q. Pan et al. [1] proposed an ensemble learning method with homogeneous learners. In this paper, we consider a new ensemble learning method with heterogeneous learners, namely ANN, J48 and SVM. First, we use the kernel function to obtain a kernel-as-similarity expression of the input data, and then combine it with the three learners. The first combination is kernel mapping with ANN (named KMA), the second is kernel mapping with J48 (the same as the KMJ proposed in [1]), and the third is kernel mapping with SVM (which we call KMS). Finally, the three algorithms are combined by ensemble learning. The whole method is called KEHL, short for Kernel Ensemble learning with Heterogeneous Learners, and we use it to obtain a stronger learner.

3. EVALUATION
The experiments aim to evaluate the effectiveness of the KEHL and KESL methods, and to compare ensemble learning with non-ensemble learning within each method. The data set description and experiment settings are given below.

3.1 Data Set Description and Settings
Our data sets were obtained from the UCI machine learning repository [15]; Table 1 describes the data sets used in our experiments. We only consider binary classification here, so multi-class problems are transformed into binary classification problems: if a data set has more than two classes, we divide it into two classes labelled -1 and +1. If the classification labels of a data set are not numerical, we convert them to numerical values.

Table 1: Data set description
Data Set        Attr.   Classes   Instances
SPECTF Heart    44      2         267
Ionosphere      34      2         351
Wine            14      3         187
Wdbc            31      2         569
Dermatology     35      4         366
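A minimal sketch of the label preprocessing described in Section 3.1 is shown below. The paper does not state which classes are grouped together when a multi-class problem is binarized, so that grouping is left as a parameter; the function name is ours.

```python
import numpy as np

def to_binary_labels(y, positive_classes):
    """Map a possibly non-numeric, possibly multi-class label vector to {-1, +1}.

    `positive_classes` lists the original class labels treated as +1; every other
    class becomes -1. The grouping of classes is a choice left to the caller.
    """
    y = np.asarray(y)
    return np.where(np.isin(y, list(positive_classes)), 1, -1)

# Example: Wine has classes 1, 2 and 3; one possible binarisation keeps class 1
# as +1 and merges classes 2 and 3 into -1.
print(to_binary_labels([1, 2, 3, 1, 2], positive_classes={1}))   # [ 1 -1 -1  1 -1]
```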

3.2 Evaluation Results
Each data set was randomly divided into a training set and a test set, with a scale ratio parameter p controlling the split; p ranges from 0.1 to 0.9 with a step length of 0.1. All results of the KEHL and KESL methods are listed in Table 2 and Table 3; they were obtained with the same experimental settings and on the same data sets. For each data set we compare the average accuracy of the different algorithms; Figure 1 shows the comparison for KEHL as a histogram, and Figure 2 the one for KESL.
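One plausible reading of this protocol, written as a short sketch, is given below; the helper name and the averaging of accuracy over the nine values of p are our interpretation of the description above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def average_accuracy(X, y, fit_predict, seed=0):
    """Mean test accuracy over training ratios p = 0.1, 0.2, ..., 0.9."""
    accuracies = []
    for p in np.arange(1, 10) / 10.0:                 # 0.1, 0.2, ..., 0.9
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=float(p), random_state=seed)
        y_pred = fit_predict(X_tr, y_tr, X_te)        # e.g. a wrapper around kehl_fit_predict
        accuracies.append(np.mean(y_pred == y_te))
    return float(np.mean(accuracies))
```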

Table 2: Performance of KEHL method
Data Set        KMJ      KMA      KMS      KEHL
SPECTF Heart    82.3%    83.6%    81.9%    88.6%
Ionosphere      84.2%    86.2%    85.4%    86.0%
Wine            90.2%    89.6%    90.5%    90.3%
Wdbc            87.2%    85.4%    89.6%    90.4%
Dermatology     79.4%    78.1%    79.4%    81.4%

Table 2 shows that the performance of KEHL is better than that of the kernel method with only a single learner. This is especially clear on Ionosphere, Wdbc and Dermatology: on these data sets the performance of KMA is poor and the accuracies of KMJ and KMS are not very high, yet KEHL, which combines KMJ, KMA and KMS through ensemble learning, obtains high accuracy. The voting procedure produces a final result from the three algorithms, so the error rate is reduced by ensemble learning in the KEHL algorithm.
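The error-reduction argument can be checked with a toy simulation: three members that are each correct 80% of the time and err independently form a committee that is correct roughly 89.6% of the time. Real ensemble members are of course correlated, which is exactly why the methods above work to increase their diversity; the numbers here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100_000, 0.8
member_correct = rng.random((3, n)) < p                  # each of 3 members is correct with prob. 0.8
committee_correct = member_correct.sum(axis=0) >= 2      # the majority vote is correct
print(member_correct.mean(), committee_correct.mean())   # approximately 0.8 and 0.896
```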


Figure 1. Evaluation result comparison (KEHL)

Figure 1 visually shows the comparison for each data set. Based on the above observations, our conclusion for the KEHL method is that kernel ensemble learning with heterogeneous learners provides a stronger learner: individual members of the ensemble may perform poorly, but when their predictions are combined by the voting procedure, the committee reduces the error rate. KEHL thus performs better than kernel mapping with only a single learner.

KESL is kernel ensemble learning with subspace-oriented learners: the attributes of each data set are divided into three subsets, and the prediction results obtained on these three subsets are combined by ensemble learning. We focus on the influence of the different subsets on prediction and try to find out which portion of the attributes affects the prediction most; this can also be applied to attribute selection. In addition, KESL provides a good way to reduce the computational overhead.

Table 3: Performance of KESL method
Data Set        S1       S2       S3
SPECTF Heart    88.6%    86.3%    89.2%
Ionosphere      86.0%    87.2%    84.1%
Wine            90.3%    89.8%    87.7%
Wdbc            90.4%    88.6%    91.3%
Dermatology     81.4%    83.6%    80.4%

Table 3 shows the performance of KESL under the three division strategies, and Figure 2 is the histogram of this comparison.

Figure 2. Evaluation result comparison (KESL)

4. CONCLUSION
In this paper, we have proposed a kernel ensemble learning method with heterogeneous learners and with subspace-oriented learners. The main contribution of this paper is to show that the kernel ensemble learning (KEL) method is better than using ensemble learning or kernel mapping alone. We proposed two combination methods within KEL to support this idea: kernel ensemble learning with heterogeneous learners (KEHL) and kernel ensemble learning with subspace-oriented learners (KESL). KEHL focuses on whether KEL is applicable to different types of learners, while KESL considers the influence of different attribute subsets on KEL with homogeneous learners. The experimental results show that the KEHL method increases prediction accuracy by combining the predictions of its members, and that it performs better than kernel mapping with a single learner.

Acknowledgment
This work is supported by the 2012 College Student Career and Innovation Training Plan Project of Guangdong Prov. (No. 16).

References
[1] Q. Pan, G. Zhang, X.-Y. Zhang, Z.-J. Cen, Z.-M. Huang, and S.-Q. Chen, "Ensemble learning with kernel mapping," in Proceedings of the 2011 International Conference of Soft Computing and Pattern Recognition (SoCPaR 2011), Dalian, China, IEEE Computer Society, 2011, pp. 253-257.
[2] N. Li, "Selective ensemble under regularization framework," in Proceedings of the 8th International Workshop on Multiple Classifier Systems (MCS'09), Reykjavik, Iceland, LNCS 5519, 2009, pp. 293-303.
[3] G. Brown, "Ensemble learning," in Encyclopedia of Machine Learning, C. Sammut and G. I. Webb (Eds.), Springer, 2010.
[4] Z.-H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: Many could be better than all," Artificial Intelligence, 137(1-2):239-263, 2002.
[5] G. Brown, J. L. Wyatt, and P. Tino, "Managing diversity in regression ensembles," Journal of Machine Learning Research, pp. 1621-1650, 2005.
[6] I. Partalas, "Focused ensemble selection: A diversity-based method for greedy ensemble selection," in Proceedings of the 18th European Conference on Artificial Intelligence, pp. 117-121, 2008.
[7] Z. Lu, "Ensemble pruning via individual contribution ordering," in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'10), pp. 871-880, 2010.



[8] Z.-H. Zhou, "When semi-supervised learning meets ensemble learning," Frontiers of Electrical and Electronic Engineering in China, 2010.
[9] D. Rosenberg, "A kernel for semi-supervised learning with multi-view point cloud regularization," IEEE Signal Processing Magazine, 2009.
[10] J. Zhuang, "Simple non-parametric kernel learning," in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009.
[11] A. Bordes, "Fast kernel classifiers with online and active learning," Journal of Machine Learning Research, 6:1579-1619, 2005.
[12] Z. Lu, "Exploiting multiple classifier types with active learning," in Proceedings of the 2009 Genetic and Evolutionary Computation Conference, Montreal, Canada (poster).
[13] M. Belkin, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," Journal of Machine Learning Research, pp. 2399-2434, 2006.
[14] T. K. Ho, "The random subspace method for constructing decision forests," IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832-844, 1998.
[15] A. Asuncion and D. Newman, "UCI machine learning repository," 2007.

AUTHORS
Shu-yi Chen received his B.S. degree in Network Engineering from Guangdong University of Technology, where he studied from 2008 to 2012. He is an embedded software engineer who applies machine learning algorithms to make embedded products more intelligent. His current research interests include embedded system development, wireless sensor networks, machine learning, and embedded machine learning approaches. He has published several papers on these topics in international conferences and journals.

Gang Zhang is a PhD candidate in the School of Information Science and Technology at SUN YAT-SEN University, China. He received his MSc degree in Computer Software and Theory from SUN YAT-SEN University, China, in 2005. His current research interests include data mining, machine learning, and their applications to bioinformatics and Traditional Chinese Medicine. He is currently a lecturer in the School of Automation, Guangdong University of Technology.
