Support Vector Machines
• Topics
  - SVM classifiers for linearly separable classes
  - SVM classifiers for non-linearly separable classes
  - SVM classifiers for nonlinear decision boundaries
  - kernel functions
  - Other applications of SVMs
  - Software
Linearly separable classes

- One possible solution
- Another possible solution
- Other possible solutions
- Which one is better, B1 or B2? How do you define "better"?
[Figure: two candidate separating hyperplanes, B1 (margin edges b11, b12) and B2 (margin edges b21, b22), shown with their margins and a test sample]
The hyperplane that maximizes the margin will have better generalization
=> B1 is better than B2
Jeff Howbert Introduction to Machine Learning Winter 2012 9
Support vector machines
B1 is the decision boundary:  $w \cdot x + b = 0$
Margin edges b11 and b12:  $w \cdot x + b = +1$  and  $w \cdot x + b = -1$

$$y_i = f(x_i) = \begin{cases} +1 & \text{if } w \cdot x_i + b \ge 1 \\ -1 & \text{if } w \cdot x_i + b \le -1 \end{cases}$$

$$\text{margin} = \frac{2}{\| w \|}$$
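To make the margin formula concrete, here is a small numerical sketch (the weight vector and bias are made up for illustration, not from the slides) checking that the distance between the hyperplanes $w \cdot x + b = +1$ and $w \cdot x + b = -1$ is exactly $2/\|w\|$:

```python
import math

# Hypothetical weight vector and bias, chosen so ||w|| = 5.
w = [3.0, 4.0]
b = -2.0

norm_w = math.hypot(*w)

# A point on the plus-margin hyperplane w.x + b = +1:
# x = (1 - b) * w / ||w||^2  (the closest such point to the origin).
x_plus = [(1 - b) * wi / norm_w**2 for wi in w]

# Step from x_plus along -w/||w|| by 2/||w|| to reach w.x + b = -1.
x_minus = [xi - (2 / norm_w) * wi / norm_w for xi, wi in zip(x_plus, w)]

dist = math.dist(x_plus, x_minus)
print(dist, 2 / norm_w)  # both 0.4
```

The printed distance agrees with $2/\|w\| = 2/5 = 0.4$, the margin width for this $w$.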
• We want to maximize:  $\text{margin} = \frac{2}{\| w \|}$
• Which is equivalent to minimizing:  $L(w) = \frac{\| w \|^2}{2}$
  subject to the constraints:
  $$y_i = f(x_i) = \begin{cases} +1 & \text{if } w \cdot x_i + b \ge 1 \\ -1 & \text{if } w \cdot x_i + b \le -1 \end{cases}$$
This is a constrained convex optimization problem. Solve with numerical approaches, e.g. quadratic programming.
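In general the minimization needs a QP solver, but for a toy problem with one training point per class the max-margin solution has a closed form (the perpendicular bisector of the two points, scaled so the constraints are tight). A minimal sketch with hypothetical data points, not from the slides:

```python
import math

# Two linearly separable training points; with one point per class,
# both are support vectors of the max-margin hyperplane.
x_neg, x_pos = (0.0, 0.0), (2.0, 2.0)   # labels -1 and +1

# w points from the negative to the positive support vector,
# scaled so that w.x + b = -1 at x_neg and +1 at x_pos.
d = [p - n for p, n in zip(x_pos, x_neg)]
gap_sq = sum(di * di for di in d)            # squared distance between the points
w = [2 * di / gap_sq for di in d]            # w = 2 d / ||d||^2
b = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, x_pos, x_neg))

margin = 2 / math.sqrt(sum(wi * wi for wi in w))
print(w, b, margin)  # [0.5, 0.5], -1.0, 2*sqrt(2)
```

The resulting margin equals the distance between the two points, as it must when both are support vectors.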
Setting the derivative of the primal Lagrangian with respect to b to zero gives:
$$\frac{\partial L_p}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \lambda_i y_i = 0$$
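As a sanity check on that stationarity condition, here is a tiny sketch (support vectors and multipliers chosen by hand for a symmetric two-point toy problem, not taken from the slides) verifying that $\sum_i \lambda_i y_i = 0$ and that $w = \sum_i \lambda_i y_i x_i$ recovers the weight vector:

```python
# Hypothetical toy problem: support vectors x1=(0,0) with y=-1 and
# x2=(2,2) with y=+1; by symmetry the dual multipliers are equal.
xs = [(0.0, 0.0), (2.0, 2.0)]
ys = [-1, 1]
lams = [0.25, 0.25]

# Stationarity w.r.t. b: sum_i lambda_i * y_i must vanish.
constraint = sum(l * y for l, y in zip(lams, ys))

# Stationarity w.r.t. w recovers the weight vector: w = sum_i lambda_i y_i x_i.
w = [sum(l * y * x[j] for l, y, x in zip(lams, ys, xs)) for j in range(2)]
print(constraint, w)  # 0.0 and [0.5, 0.5]
```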
• With SVMs, we can circumvent all the above via the kernel trick
• Kernel trick:
  - Don't need to specify the attribute transform $\varphi(x)$
  - Only need to know how to calculate the dot product of any two transformed samples:
    $$k(x_1, x_2) = \varphi(x_1) \cdot \varphi(x_2)$$
  - The kernel function k is substituted into the dual of the Lagrangian, allowing determination of a maximum-margin hyperplane in the (implicitly) transformed space $\varphi(x)$
  - All subsequent calculations, including predictions on test samples, are done using the kernel in place of $\varphi(x_1) \cdot \varphi(x_2)$
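The identity $k(x_1, x_2) = \varphi(x_1) \cdot \varphi(x_2)$ can be checked directly for a degree-2 polynomial kernel on 2-D inputs, where the explicit feature map is small enough to write out (the test vectors here are arbitrary examples):

```python
import math

def poly_kernel(x, z, c=1.0, d=2):
    # k(x, z) = (x.z + c)^d, evaluated without any feature map
    return (sum(xi * zi for xi, zi in zip(x, z)) + c) ** d

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel with c=1
    # on 2-D inputs: expanding (x.z + 1)^2 gives phi(x).phi(z) with
    # phi(x) = (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2) x1, sqrt(2) x2, 1)
    x1, x2 = x
    r2 = math.sqrt(2)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]

x, z = (1.0, 2.0), (3.0, -1.0)
lhs = poly_kernel(x, z)                              # kernel evaluation
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))     # dot product in feature space
print(lhs, rhs)  # both 4.0
```

The kernel evaluation costs a 2-D dot product, while the explicit map works in 6 dimensions; that gap is what the kernel trick exploits.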
linear:      $k(x_1, x_2) = x_1 \cdot x_2$
polynomial:  $k(x_1, x_2) = (x_1 \cdot x_2 + c)^d$
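Both kernels are one-liners; note that the linear kernel is the polynomial kernel's special case with $c = 0$, $d = 1$. A quick sketch with arbitrary example vectors:

```python
def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def poly_kernel(x, z, c, d):
    return (linear_kernel(x, z) + c) ** d

x, z = (1.0, -2.0, 0.5), (3.0, 1.0, 4.0)
print(linear_kernel(x, z))             # 3 - 2 + 2 = 3.0
print(poly_kernel(x, z, c=0.0, d=1))   # reduces to the linear kernel: 3.0
print(poly_kernel(x, z, c=1.0, d=3))   # (3 + 1)^3 = 64.0
```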
Other applications of SVMs:
- One-class classification
- Regression
- Transduction (semi-supervised learning)
- Ranking
- Clustering
- Structured labels
• Software
  - SVMlight: http://svmlight.joachims.org/
  - libSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
    (includes MATLAB / Octave interface)
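Both packages read the same sparse text format for training data: one sample per line, `<label> <index>:<value> ...` with 1-based feature indices and zero-valued features omitted. A small hypothetical writer for that format:

```python
def to_libsvm_line(label, features):
    # Emit one sample in SVMlight / libSVM sparse format:
    # "<label> <index>:<value> ..." with 1-based indices, zeros skipped.
    pairs = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0)
    return f"{label} {pairs}"

print(to_libsvm_line(+1, [0.5, 0.0, 2.0]))  # "1 1:0.5 3:2"
print(to_libsvm_line(-1, [1.0, 3.5, 0.0]))  # "-1 1:1 2:3.5"
```

Files in this format can be fed directly to libSVM's `svm-train` or SVMlight's `svm_learn`.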