Kernel Methods
Dave Krebs
CS 3750
Fall 2007
Sources
Bierens, Herman J. "Introduction to Hilbert Spaces." Pennsylvania State University, 24 June 2007.
Bousquet and Pérez-Cruz. "Kernel Methods and Their Potential Use in Signal Processing." IEEE Signal Processing Magazine, May 2004.
Burges, Christopher. "A Tutorial on Support Vector Machines for Pattern Recognition."
Cristianini, Shawe-Taylor, and Saunders. "Kernel Methods: A Paradigm for Pattern Analysis." In Kernel Methods in Bioengineering, Signal and Image Processing, 2007.
Schölkopf, Bernhard. "Statistical Learning and Kernel Methods."
Schölkopf, Bernhard. "The Kernel Trick for Distances."
Outline
Motivation
Kernel Basics
Definition
Example
Application
Modularity
Mercer's Condition
Definitions
Constructing a Feature Space
Hilbert Spaces
Motivation
Motivation
Kernel Definition
k(x, z) = ⟨φ(x), φ(z)⟩
where k is a kernel function and φ : X → F is a feature map
An Important Point
Kernel Example
Consider a two-dimensional input space X ⊆ ℝ² with the feature map:
φ : x = (x₁, x₂) ↦ φ(x) = (x₁², x₂², √2 x₁x₂) ∈ F = ℝ³
Kernel Example (continued)
Then k(x, z) = ⟨x, z⟩²
But k(x, z) is also the kernel that computes the inner product of the map φ:
⟨φ(x), φ(z)⟩ = x₁²z₁² + x₂²z₂² + 2x₁x₂z₁z₂ = (x₁z₁ + x₂z₂)² = ⟨x, z⟩²
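The identity above can be checked numerically. A minimal sketch in Python, assuming NumPy; the sample vectors are arbitrary:

```python
import numpy as np

def phi(x):
    # Explicit feature map: (x1, x2) -> (x1^2, x2^2, sqrt(2)*x1*x2) in R^3
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

def k(x, z):
    # The kernel computes the same inner product directly in R^2
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# <phi(x), phi(z)> and k(x, z) agree without ever forming phi explicitly
assert np.isclose(np.dot(phi(x), phi(z)), k(x, z))
```

Here ⟨x, z⟩ = 11, so both sides evaluate to 121.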
Modularity
The basic approach to using kernel methods is:
Choose an algorithm that uses only inner products between inputs
Combine this algorithm with a kernel function that calculates inner products between input images in a feature space
Using kernels, the algorithm is then implemented in a high-dimensional space
Another nice property of kernels is modularity: the same kernel can be reused and adapted to very different real-world problems
Kernels can be combined together to form complex learning systems. If k₁ and k₂ are kernels on X, then so are:
1. k(x, z) = k₁(x, z) + k₂(x, z)
2. k(x, z) = αk₁(x, z), where α ∈ ℝ⁺
3. k(x, z) = k₁(x, z)k₂(x, z)
4. k(x, z) = f(x)f(z), where f(·) is a real-valued function on X
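Each closure rule can be spot-checked by confirming that the combined Gram matrix on sample data has no negative eigenvalues. A sketch assuming NumPy; the choices of k₁, k₂, f, the scaling constant, and the data are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))  # 10 input patterns in R^2

def gram(kernel, X):
    # Gram matrix K_ij = kernel(x_i, x_j)
    return np.array([[kernel(x, z) for z in X] for x in X])

def k1(x, z):
    return np.dot(x, z)            # linear kernel

def k2(x, z):
    return np.dot(x, z) ** 2       # quadratic kernel

def f(x):
    return np.sin(x[0])            # an arbitrary real-valued function

combos = [
    lambda x, z: k1(x, z) + k2(x, z),   # rule 1: sum of kernels
    lambda x, z: 3.0 * k1(x, z),        # rule 2: positive scaling
    lambda x, z: k1(x, z) * k2(x, z),   # rule 3: product of kernels
    lambda x, z: f(x) * f(z),           # rule 4: f(x) f(z)
]

for kc in combos:
    # Every combination yields a positive semidefinite Gram matrix
    assert np.linalg.eigvalsh(gram(kc, X)).min() > -1e-9
```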
Kernel Trick
k(x, z) = ⟨φ(x), φ(z)⟩
Mercer's Condition
Mercer's Condition (continued)
A symmetric function K(x, y) is a valid kernel if and only if, for any g(x) such that ∫ g(x)² dx is finite, we have ∫∫ K(x, y) g(x) g(y) dx dy ≥ 0
Any kernel K(x, y) = Σₚ cₚ (x · y)ᵖ (p = 0, 1, 2, …), where the cₚ are positive real coefficients, satisfies Mercer's condition
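The integral condition can be illustrated with a crude Monte Carlo estimate for a kernel known to be valid. A sketch assuming NumPy; the Gaussian kernel on ℝ, the integration box, and the sign-changing test function g are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def K(x, y):
    # Gaussian kernel on R, which satisfies Mercer's condition
    return np.exp(-(x - y) ** 2)

def g(x):
    # An arbitrary square-integrable test function that changes sign
    return x * np.exp(-x ** 2)

# Crude Monte Carlo estimate of the double integral over [-5, 5]^2
n = 200_000
xs = rng.uniform(-5.0, 5.0, size=n)
ys = rng.uniform(-5.0, 5.0, size=n)
area = 10.0 * 10.0
estimate = area * np.mean(K(xs, ys) * g(xs) * g(ys))

# Mercer's condition guarantees the true integral is nonnegative
assert estimate > 0.0
```

Even though g changes sign, the estimate comes out positive, as the condition requires.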
Definitions
Gram matrix
K := (k(xᵢ, xⱼ))ᵢⱼ
where k is a kernel, x₁, …, xₘ are input patterns, and K is m × m
Positive matrix
An m × m matrix Kᵢⱼ satisfying
Σᵢⱼ cᵢ cⱼ Kᵢⱼ ≥ 0 (i, j = 1, …, m) for all cᵢ ∈ ℂ
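Both definitions are easy to exercise numerically: build the Gram matrix of a known kernel on sample patterns and confirm it is symmetric with no negative eigenvalues (the real-coefficient case of the positivity condition). A sketch assuming NumPy; the Gaussian RBF kernel and the data are arbitrary choices:

```python
import numpy as np

def rbf(x, z, gamma=0.5):
    # Gaussian RBF kernel, a standard positive definite kernel
    return np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))  # m = 8 input patterns in R^3

# Gram matrix K_ij = k(x_i, x_j), an m x m matrix
K = np.array([[rbf(xi, xj) for xj in X] for xi in X])

assert np.allclose(K, K.T)                     # symmetric
assert np.linalg.eigvalsh(K).min() > -1e-10    # no negative eigenvalues
```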
Definitions (continued)
Constructing a Feature Space
Constructing a Feature Space (continued)
Hilbert Spaces
Vector Space
A set V endowed with the addition operation and the scalar
multiplication operation such that these operations satisfy certain
rules
It is trivial to verify that the Euclidean space ℝⁿ is a real vector space
Inner Product (on a real vector space)
A real function ⟨·,·⟩ : V × V → ℝ such that for all x, y, z in V and all c in ℝ we have
⟨x, y⟩ = ⟨y, x⟩
⟨cx, y⟩ = c⟨x, y⟩
⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
⟨x, x⟩ > 0 when x ≠ 0
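These axioms can be spot-checked for the Euclidean inner product on ℝ⁴. A minimal sketch assuming NumPy; the test vectors and the scalar c are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
x, y, z = rng.normal(size=(3, 4))  # three vectors in R^4
c = 2.5

ip = np.dot  # the Euclidean inner product <x, y> = sum_i x_i y_i

assert np.isclose(ip(x, y), ip(y, x))                 # symmetry
assert np.isclose(ip(c * x, y), c * ip(x, y))         # homogeneity
assert np.isclose(ip(x + y, z), ip(x, z) + ip(y, z))  # additivity
assert ip(x, x) > 0                                   # positivity for x != 0
```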
Hilbert Spaces (continued)
The associated norm ‖x‖ = √⟨x, x⟩ satisfies
2. ‖cx‖ = |c| ‖x‖
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality)
Cauchy Sequence
A sequence of elements xₙ of a metric space (i.e. a space endowed with a metric) with metric d(·,·) such that for every ε > 0 there exists an n₀ such that for all k, m ≥ n₀ we have d(xₖ, xₘ) < ε
Hilbert Space
A vector space H endowed with an inner product and
associated norm and metric such that every Cauchy
sequence in H has a limit in H
Hilbert Spaces (continued)
When the feature space is larger than the
number of training examples (continued)
For SVMs, the maximum margin principle does this for
us
We choose the simplest solution (the linear solution with the
largest margin) from the set of solutions with the smallest error
Rationale: We expect a simple solution to perform better
on unseen patterns
Kernels as Generalized Distances
Kernels as Generalized Distances
(continued)
Given the last result, using conditionally positive definite (cpd) kernels we can generalize all algorithms based on distances to corresponding algorithms operating in feature spaces
We thus have a kernel trick for distances
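Concretely, squared distances in feature space can be computed from kernel evaluations alone via ‖φ(x) − φ(z)‖² = k(x, x) − 2k(x, z) + k(z, z). A sketch assuming NumPy; the inhomogeneous quadratic kernel and its explicit feature map are illustrative choices, used here only to verify the identity:

```python
import numpy as np

def k(x, z):
    # Inhomogeneous quadratic kernel
    return (np.dot(x, z) + 1.0) ** 2

def feature_distance_sq(x, z):
    # Squared feature-space distance from kernel values only
    return k(x, x) - 2.0 * k(x, z) + k(z, z)

def phi(x):
    # Explicit feature map realizing this kernel (for verification)
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0])

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])

d2 = feature_distance_sq(x, z)
assert np.isclose(d2, np.sum((phi(x) - phi(z)) ** 2))
```

For these two points, both computations give a squared distance of 6.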
How to choose the best feature space
Approaches to choosing the best kernel
Learning the kernel matrix
Implementation Issues
Advantages of Kernels
Questions?