the negative class. We denote by $\mu_{ki}^{+}$ the degree of membership of $x_i$ in the cluster $x_k^{+}$; it is defined as follows:

$$\mu_{ki}^{+} = e^{-\frac{1}{2}\left(\frac{x_i - \bar{x}_k^{+}}{\sigma}\right)^{2}}, \quad k = 1, 2, \ldots, l. \quad (7)$$

Then define the membership of $x_i$ in the class $\{y_i = +1\}$, $\mu_i^{+}$, as

$$\mu_i^{+} = \max(\mu_{1i}^{+}, \mu_{2i}^{+}, \ldots, \mu_{li}^{+}) \quad (8)$$
The membership of $x_i$ in the class $\{y_i = -1\}$, $\mu_i^{-}$, is defined as

$$\mu_{ki}^{-} = e^{-\frac{1}{2}\left(\frac{x_i - \bar{x}_k^{-}}{\sigma}\right)^{2}}, \quad k = 1, 2, \ldots, m. \quad (9)$$

$$\mu_i^{-} = \max(\mu_{1i}^{-}, \mu_{2i}^{-}, \ldots, \mu_{mi}^{-}) \quad (10)$$

We are given a set $S$ of training points: $S = \{(x_1, \mu_1^{-}, \mu_1^{+}), (x_2, \mu_2^{-}, \mu_2^{+}), \ldots, (x_n, \mu_n^{-}, \mu_n^{+})\}$.
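As a concrete illustration of the membership construction in Eqs. (7)-(10), the sketch below computes $\mu_i^{+}$ and $\mu_i^{-}$ from per-class cluster centres. The use of k-means to obtain the centres $\bar{x}_k^{+}$, $\bar{x}_k^{-}$, the reading of $(x_i - \bar{x}_k)/\sigma$ as a Euclidean distance, and all function names are assumptions of this sketch, not details fixed by the text:

```python
import numpy as np
from sklearn.cluster import KMeans

def class_memberships(X, y, n_pos_clusters=3, n_neg_clusters=3, sigma=1.0):
    """Fuzzy memberships mu_plus, mu_minus per Eqs. (7)-(10) (a sketch)."""
    pos, neg = X[y == +1], X[y == -1]
    # Cluster each class separately to obtain centres standing in for
    # the x-bar_k^+ and x-bar_k^- of Eqs. (7) and (9).
    c_pos = KMeans(n_clusters=n_pos_clusters, n_init=10).fit(pos).cluster_centers_
    c_neg = KMeans(n_clusters=n_neg_clusters, n_init=10).fit(neg).cluster_centers_

    def membership(points, centres):
        # Eq. (7)/(9): Gaussian membership to every cluster centre;
        # Eq. (8)/(10): take the maximum over the centres.
        d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        return np.exp(-0.5 * (d / sigma) ** 2).max(axis=1)

    return membership(X, c_pos), membership(X, c_neg)
```

A point deep inside one class thus gets a membership near 1 for that class and a small membership for the other, which is what the weighted slack terms below rely on.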
The optimal hyperplane problem is regarded as the solution to

$$\text{Minimize} \quad \frac{1}{2}\langle w, w\rangle + C\sum_{i=1}^{n}\mathrm{sgn}(\mu_i^{+}-\mu_i^{-})\,(\mu_i^{+}\xi_i^{+}-\mu_i^{-}\xi_i^{-})$$

$$\text{subject to} \quad \mathrm{sgn}(\mu_i^{+}-\mu_i^{-})(w x_i + b) \;\geq\; 1 - \mathrm{sgn}(\mu_i^{+}-\mu_i^{-})(\mu_i^{+}\xi_i^{+}-\mu_i^{-}\xi_i^{-}),$$

$$\xi_i^{+}\geq 0,\;\; \xi_i^{-}\geq 0,\;\; C\geq 0, \quad i = 1, 2, \ldots, n. \quad (11)$$

where $C$ defines how much weight is given to the minimization of the slack vector as compared to the weight vector [7]. Hence the Lagrange function $L(w, b, \alpha, \beta, \xi_i^{+}, \xi_i^{-})$ could be obtained, and setting its partial derivatives to zero yields the optimality conditions

$$\sum_{i=1}^{n}\alpha_i\,\mathrm{sgn}(\mu_i^{+}-\mu_i^{-}) = 0 \quad (14)$$

$$\frac{\partial L(w, b, \alpha, \beta, \xi_i^{+}, \xi_i^{-})}{\partial\alpha} = 0 \;\rightarrow\; \sum_{i=1}^{n}\mathrm{sgn}(\mu_i^{+}-\mu_i^{-})(w x_i + b + \mu_i^{+}\xi_i^{+}-\mu_i^{-}\xi_i^{-}) - 1 = 0 \quad (15)$$

$$\frac{\partial L(w, b, \alpha, \beta, \xi_i^{+}, \xi_i^{-})}{\partial\beta} = 0 \;\rightarrow\; \sum_{i=1}^{n}\mathrm{sgn}(\mu_i^{+}-\mu_i^{-})(\mu_i^{+}\xi_i^{+}-\mu_i^{-}\xi_i^{-}) = 0 \quad (16)$$

According to the KKT conditions, the solutions $\alpha_i^{*}$ to Eq. (15) satisfy the following formulae:

$$\alpha_i^{*}\left[\mathrm{sgn}(\mu_i^{+}-\mu_i^{-})(w^{*}x_i + b^{*} + \mu_i^{+}\xi_i^{+}-\mu_i^{-}\xi_i^{-}) - 1\right] = 0,$$

$$\xi_i^{+}(C\mu_i^{+}-\alpha_i^{*}) = 0, \qquad \xi_i^{-}(C\mu_i^{-}-\alpha_i^{*}) = 0 \quad (17)$$
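The practical effect of Eq. (11) is that each training point carries its own misclassification penalty, scaled by its class membership. A minimal sketch of that idea, assuming scikit-learn's SVC: per-sample weights scale $C$, which approximates the membership-weighted slack terms rather than solving the exact two-slack quadratic program above:

```python
import numpy as np
from sklearn.svm import SVC

def fit_fuzzy_svm(X, y, mu_plus, mu_minus, C=1.0, sigma=1.0):
    """Approximate the FSVM of Eq. (11) with a per-sample weighted SVC.

    A point's effective penalty becomes C * mu_i, where mu_i is its
    membership in its own class. This mimics the membership-weighted
    slack terms; it is an approximation, not the exact formulation.
    """
    # Membership in the point's own class decides how strongly its
    # slack variable is penalised (noisy points get small weights).
    weights = np.where(y == +1, mu_plus, mu_minus)
    # RBF kernel with gamma = 1 / (2 * sigma^2), matching Eq. (7)'s width.
    clf = SVC(C=C, kernel="rbf", gamma=1.0 / (2.0 * sigma**2))
    clf.fit(X, y, sample_weight=weights)
    return clf
```

The fitted classifier exposes its support vectors and dual coefficients in the usual way, which is what the evaluation function of the next section consumes.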
4 The evaluation function of SVM's kernel parameters
The performance of the SVM classifier depends on the choice of the kernel parameters. For most classes of kernels, for example the polynomial kernel or the RBF kernel, it is always possible to find kernel parameters for which the data become separable. However, forcing separation of the data can easily lead to overfitting, particularly when noise is present in the data. On the other hand, the support points contain the information necessary to reconstruct the hyperplane; in general, the fewer the support vectors, the better the generalization that can be expected, but too few support vectors will reduce the accuracy of the SVM classifier.

Fig. 1 and Fig. 2 show an SVM learning a data set using RBF kernels with different values of $\sigma$ and $C$. In both cases the classification of the training set is consistent, but the larger the value of $\sigma$, the smaller the number of support vectors and the more classification errors are present.

In general, one problem with the maximum-margin SVM is the choice of parameters: typically a range of values must be tried before the best choice for a particular training set can be selected. Different parameters produce different sets of support vectors. Let $S = \{x_1, x_2, \ldots, x_l\}$ be the intersection of all these support vector sets; the margin function is

$$w(\theta) = \sum_{i,j=1}^{l} y_i y_j \bar{\alpha}_i \bar{\alpha}_j K(x_i, x_j), \quad x_i, x_j \in S \quad (19)$$

where $x_i$, $x_j$ are support vectors, $\theta$ is the kernel's parameter vector, and $\bar{\alpha}_i$ is the average, over all the parameter settings tried, of the optimal solution values $\alpha_i$ attached to support vector $x_i$. We then define $M$ as the maximum of $w(\theta_i)$, $M = \max_i w(\theta_i)$, where $\theta_i$ is the $i$-th parameter setting among all those tried during training, and define the margin membership $\mu(\theta_i)$ of parameter $\theta_i$ as
$$\mu(\theta_i) = \frac{w(\theta_i)}{M} \quad (20)$$

The kernel's parameters evaluation function is defined as

$$g(\theta_i) = \exp\left[-\frac{1}{2}\left(\mu(\theta_i) - 1\right)^{2}\right] \quad (21)$$
where $g(\theta_i) \in [0, 1]$ estimates the effect of parameter $\theta_i$ on the SVM classifier. Thus the most suitable value of the kernel parameters is the $\theta_i$ selected by this evaluation function.
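Putting Eqs. (19)-(21) together, the following is a minimal sketch of the parameter-evaluation loop, assuming an RBF kernel with width $\sigma$ (so $\gamma = 1/(2\sigma^2)$), scikit-learn's SVC as the solver, and the absolute values of its dual coefficients as the $\alpha_i$ of the support vectors; the candidate grid is hypothetical:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def evaluate_kernel_params(X, y, sigmas, C=1.0):
    """Select an RBF width by the evaluation function of Eqs. (19)-(21)."""
    alphas, sv_sets = [], []
    for s in sigmas:
        clf = SVC(C=C, kernel="rbf", gamma=1.0 / (2.0 * s**2)).fit(X, y)
        a = np.zeros(len(X))
        a[clf.support_] = np.abs(clf.dual_coef_).ravel()  # alpha_i of each SV
        alphas.append(a)
        sv_sets.append(set(clf.support_))
    # S: intersection of the support vector sets over all parameter
    # settings (assumed non-empty here).
    S = np.array(sorted(set.intersection(*sv_sets)))
    # alpha-bar: average alpha_i of each x_i in S across the settings.
    v = y[S].astype(float) * np.mean(alphas, axis=0)[S]
    # Eq. (19): w(theta) = sum_{i,j} y_i y_j abar_i abar_j K_theta(x_i, x_j)
    w = np.array([v @ rbf_kernel(X[S], gamma=1.0 / (2.0 * s**2)) @ v
                  for s in sigmas])
    mu = w / w.max()                    # Eq. (20)
    g = np.exp(-0.5 * (mu - 1.0) ** 2)  # Eq. (21)
    return sigmas[int(np.argmax(g))], g
```

Note that $g$ peaks where $\mu(\theta_i) = 1$, i.e. where the margin function attains $M$, so the selected setting is the one maximizing $w(\theta_i)$ over the candidates.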
Table 1. Comparisons of two SVM's classification accuracy

Method    Parameter     Training AC    Testing AC
LS-SVM    (0.01, 1)     91.2%          83.2%
LS-SVM    (0.16, 5.2)   95.5%          89.2%
LS-SVM    (1, 10)       87.9%          79.3%
FSVM      (0.01, 1)     92.1%          84.7%
FSVM      (0.16, 5.2)   98.6%          92.5%
FSVM      (1, 10)       88.4%          82.6%

From Table 1, the SVM classifier is more effective when the kernel's parameters evaluation function is used. Compared with the standard SVM classification results, the SVM using fuzzy theory not only improves the classifier's accuracy but also yields an optimized kernel-parameter learning algorithm.

Figure 3. Parameters evaluation value

6 Conclusion

In this paper, we proposed a fuzzy support vector machine for classification. The proposed FSVM resolves unclassifiable regions caused by conventional support vector machines. We apply a fuzzy membership function to each data point and reformulate the SVM so that different input points make different contributions to the learning of the Lagrange function. On the other hand, the use of the kernel's parameters evaluation function provides suitable parameters for the SVM's kernel. The experimental results indicate that the proposed method enhances the SVM by reducing the effect of noise. FSVM is suitable for applications in which data points have modeled characteristics. This demonstrates the superiority of FSVM over the standard SVM, and the kernel's parameters evaluation function improves the SVM classifier's accuracy.

References

[1] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, 2000.

[2] V.N. Vapnik, An Overview of Statistical Learning Theory, IEEE Transactions on Neural Networks, 10(5): 988-999, 1999.

[3] C.F. Lin and S.D. Wang, Fuzzy Support Vector Machines, IEEE Transactions on Neural Networks, 13(3): 466-471, 2002.

[4] D. Tsujinishi and S. Abe, Fuzzy Least Squares Support Vector Machines for Multiclass Problems, Neural Networks, 16: 785-792, 2003.

[5] T. Inoue and S. Abe, Fuzzy Support Vector Machines for Pattern Classification, International Joint Conference on Neural Networks, 2: 1449-1454, July 2001.

[6] V.N. Vapnik, Statistical Learning Theory, John Wiley and Sons, New York, 1998.

[7] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, 2004.

[8] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.

[9] C.F. Lin and S.D. Wang, Training Algorithms for Fuzzy Support Vector Machines with Noisy Data, Pattern Recognition Letters, 25(10): 1647-1656, 2004.

[10] C.Y. Yang, Support Vector Classifier with a Fuzzy-Value Class Label, Lecture Notes in Computer Science, Springer-Verlag, Berlin, 3173: 506-511, 2004.

[11] B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, MIT Press, Cambridge, 2002.

[12] C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases, Dept. Inform. Comput. Sci., Univ. California, Irvine, CA. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html