Multi-class Support Vector Machines
Yann Guermeur
LORIA - CNRS
http://www.loria.fr/~guermeur
Summer School NN2008
July 4, 2008
Overview
Guaranteed risk for large margin multi-category classifiers
- Theoretical framework
- Basic uniform convergence result
- $\gamma$-$\Psi$-dimensions
Guaranteed risks for multi-class SVMs
- Bounds on the covering numbers
- Use of the Rademacher complexity
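Theoretical framework
$(X, Y)$: $\mathcal{X} \times \mathcal{Y}$-valued random pair, with $x \in \mathcal{X}$ and $y \in \mathcal{Y} = [\![ 1, Q ]\!]$, distributed according to a probability measure $P$
$P$ is unknown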
What is available
- $D_m = ((X_i, Y_i))_{1 \le i \le m}$: i.i.d. $m$-sample drawn from $(X, Y)$
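Classification rules $f$, from $\mathcal{X}$ into $\mathcal{Y} \cup \{*\}$ ($*$: dummy category)
The goal
- minimizing the risk $R(f) = P(f(X) \neq Y)$ over the class of rules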
Margin operator $M$
$\forall (v, k) \in \mathbb{R}^Q \times [\![ 1, Q ]\!], \quad M(v, k) = \frac{1}{2}\left(v_k - \max_{l \neq k} v_l\right)$
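The margin operator can be sketched in Python as follows. This is an illustration only: the function name `margin` is hypothetical, an output vector is assumed to be a list of $Q \ge 2$ reals, and categories are numbered from 1 as in $[\![ 1, Q ]\!]$. A positive value means that the $k$-th component is the unique maximum.

```python
from typing import Sequence


def margin(v: Sequence[float], k: int) -> float:
    """Margin operator M(v, k) = (v_k - max_{l != k} v_l) / 2.

    Categories are numbered 1..Q, as in the slides (hypothetical helper).
    """
    q = len(v)
    if q < 2 or not 1 <= k <= q:
        raise ValueError("need Q >= 2 and 1 <= k <= Q")
    # Largest competing output, i.e. the max over l != k.
    best_other = max(v[l] for l in range(q) if l != k - 1)
    return 0.5 * (v[k - 1] - best_other)


if __name__ == "__main__":
    outputs = [2.0, 0.5, -1.0]  # Q = 3
    print(margin(outputs, 1))  # 0.75: category 1 wins with margin 0.75
    print(margin(outputs, 2))  # -0.75: category 2 loses
```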
Empirical margin risk:
$R_{\gamma,m}(g) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}_{\{ g_{Y_i}(X_i) < \gamma \}}$
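Class of functions of interest:
For $\gamma \in \mathbb{R}_+^*$, let $\pi_\gamma : \mathbb{R} \to [-\gamma, \gamma]$, $t \mapsto \operatorname{sign}(t) \min(|t|, \gamma)$.
$\Delta^{\#} \mathcal{G}_\gamma = \left\{ \Delta^{\#} g_\gamma : g \in \mathcal{G} \right\}$, with $\Delta^{\#} g_\gamma = \left(\pi_\gamma \circ \Delta^{\#} g_k\right)_{1 \le k \le Q}$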
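A minimal sketch of the empirical margin risk, under the assumption (not spelled out on the slide) that the quantity compared with $\gamma$ is the margin $M(g(X_i), Y_i)$ of the output vector, squashed by $\pi_\gamma$. Since $\pi_\gamma(t) < \gamma$ exactly when $t < \gamma$, the squashing does not change the count; it matters for the covering numbers, not for the indicator. Function names are hypothetical.

```python
from typing import Sequence


def pi_gamma(t: float, gamma: float) -> float:
    """pi_gamma(t) = sign(t) * min(|t|, gamma), i.e. t clipped to [-gamma, gamma]."""
    return max(-gamma, min(gamma, t))


def margin(v: Sequence[float], k: int) -> float:
    """M(v, k) = (v_k - max_{l != k} v_l) / 2, categories numbered 1..Q."""
    best_other = max(v[l] for l in range(len(v)) if l != k - 1)
    return 0.5 * (v[k - 1] - best_other)


def empirical_margin_risk(outputs: Sequence[Sequence[float]],
                          labels: Sequence[int], gamma: float) -> float:
    """Fraction of examples whose squashed margin is smaller than gamma."""
    errors = sum(1 for v, y in zip(outputs, labels)
                 if pi_gamma(margin(v, y), gamma) < gamma)
    return errors / len(labels)


if __name__ == "__main__":
    g_values = [[2.0, 0.0, -1.0], [0.2, 0.1, 0.0], [0.0, 1.0, -1.0]]
    print(empirical_margin_risk(g_values, [1, 1, 1], gamma=0.5))  # 2/3
```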
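Figure 1: $\epsilon$-net and $\epsilon$-cover of a set $E'$ in a pseudo-metric space $(E, \rho)$
$\mathcal{N}(\epsilon, E', \rho)$: $\epsilon$-covering number of $E'$, i.e., the smallest cardinality of an $\epsilon$-net of $E'$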
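For a finite subset, an $\epsilon$-net, and hence an upper bound on $\mathcal{N}(\epsilon, E', \rho)$, can be obtained greedily. The sketch below assumes closed balls ($\rho(x, c) \le \epsilon$) and centers taken in $E'$ itself; the greedy set is generally not minimal, so it only bounds the covering number from above. The name `greedy_epsilon_net` is hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_epsilon_net(points: Sequence[T], eps: float,
                       rho: Callable[[T, T], float]) -> List[T]:
    """Return an eps-net of the finite set `points` for the pseudo-metric rho.

    Every point is within eps of some returned center, and the centers are
    pairwise more than eps apart, so len(result) >= N(eps, points, rho).
    """
    centers: List[T] = []
    for p in points:
        # p becomes a new center only if no existing center covers it.
        if all(rho(p, c) > eps for c in centers):
            centers.append(p)
    return centers


if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    net = greedy_epsilon_net(grid, 0.25, lambda a, b: abs(a - b))
    print(net)  # [0.0, 0.3, 0.6, 0.9]
```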
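With probability at least $1 - \delta$, for all $f \in \mathcal{F}$,
$R(f) \le R_m(f) + \sqrt{\frac{1}{m}\left(\ln \mathbb{E}\, \mathcal{N}\left(\mathcal{F}, (X_i)_{1 \le i \le 2m}\right) + \ln \frac{4}{\delta}\right)} + \frac{1}{m},$
where $\ln \mathbb{E}\, \mathcal{N}\left(\mathcal{F}, (X_i)_{1 \le i \le 2m}\right)$ is the annealed entropy of $\mathcal{F}$ on the sample $(X_i)_{1 \le i \le 2m}$.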
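$d_{x^n}$: pseudo-metric on $\mathcal{G}$ associated with $x^n = (x_i)_{1 \le i \le n} \in \mathcal{X}^n$
For $\epsilon \in \mathbb{R}_+^*$, let $\mathcal{N}(\epsilon, \mathcal{G}, n) = \sup_{x^n \in \mathcal{X}^n} \mathcal{N}(\epsilon, \mathcal{G}, d_{x^n})$.
With probability at least $1 - \delta$, for all $g \in \mathcal{G}$,
$R(g) \le R_{\gamma,m}(g) + \sqrt{\frac{2}{m}\left(\ln\left(2\mathcal{N}^{(p)}\left(\gamma/4, \Delta^{\#}\mathcal{G}_\gamma, 2m\right)\right) + \ln\frac{2}{\gamma\delta}\right)} + \frac{1}{m}.$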
$\gamma$-$\Psi$-dimensions
Remark 1
Remark 2
Growth function: $\Pi_{\mathcal{F}}(n) = \sup_{s_{\mathcal{X}^n} \subset \mathcal{X}} N(\mathcal{F}, s_{\mathcal{X}^n})$
In contrast with the annealed entropy, the growth function is distribution-free.
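VC dimension
Definition 8 (VC dimension, Vapnik & Chervonenkis, 1971)
Let $\mathcal{F}$ be a class of indicator functions on a domain $\mathcal{X}$. A subset $s_{\mathcal{X}^n} = \{x_i : 1 \le i \le n\}$ of $\mathcal{X}$ is said to be shattered by $\mathcal{F}$ if for each vector $v_y$ in $\{-1, 1\}^n$, there is a function $f_y$ in $\mathcal{F}$ satisfying $(f_y(x_i))_{1 \le i \le n} = v_y$. The VC dimension of $\mathcal{F}$ is the maximal cardinality of a subset of $\mathcal{X}$ shattered by $\mathcal{F}$, if this cardinality is finite. If no such maximum exists, $\mathcal{F}$ is said to have infinite VC dimension.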
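Shattering and the VC dimension can be checked by brute force when both the domain and the class are finite. This sketch (hypothetical function names; indicator functions take values in $\{-1, 1\}$) recovers the textbook values 1 for thresholds and 2 for intervals of the real line.

```python
from itertools import combinations, product
from typing import Callable, Sequence

Indicator = Callable[[float], int]


def is_shattered(points: Sequence[float], functions: Sequence[Indicator]) -> bool:
    """True if every sign vector in {-1, 1}^n is realized on `points`."""
    patterns = {tuple(f(x) for x in points) for f in functions}
    return all(v in patterns for v in product((-1, 1), repeat=len(points)))


def vc_dimension(domain: Sequence[float], functions: Sequence[Indicator]) -> int:
    """VC dimension of a finite class restricted to a finite domain."""
    d = 0
    for n in range(1, len(domain) + 1):
        # Subsets of a shattered set are shattered: stop at the first failure.
        if any(is_shattered(s, functions) for s in combinations(domain, n)):
            d = n
        else:
            break
    return d


if __name__ == "__main__":
    domain = [0.0, 1.0, 2.0, 3.0]
    thresholds = [lambda x, t=t: 1 if x >= t else -1
                  for t in (-0.5, 0.5, 1.5, 2.5, 3.5)]
    intervals = [lambda x, a=a, b=b: 1 if a <= x <= b else -1
                 for a in domain for b in domain if a <= b]
    intervals.append(lambda x: -1)  # empty interval
    print(vc_dimension(domain, thresholds), vc_dimension(domain, intervals))  # 1 2
```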
$\Psi$-dimensions
$\left(\psi^{(i)}(f_y(x_i))\right)_{1 \le i \le n} = v_y.$
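Fat-shattering or $P_\gamma$ dimension
Definition 12 (Fat-shattering dimension, Kearns & Schapire, 1994)
Let $\mathcal{G}$ be a class of real-valued functions on a set $\mathcal{X}$. For $\gamma \in \mathbb{R}_+^*$, a subset $s_{\mathcal{X}^n} = \{x_i : 1 \le i \le n\}$ of $\mathcal{X}$ is said to be $\gamma$-shattered by $\mathcal{G}$ if there is a vector $v_b = (b_i)$ in $\mathbb{R}^n$ such that, for each vector $v_y = (y_i)$ in $\{-1, 1\}^n$, there is a function $g_y$ in $\mathcal{G}$ satisfying
$\forall i \in [\![ 1, n ]\!], \quad y_i \left(g_y(x_i) - b_i\right) \ge \gamma.$
The fat-shattering dimension with margin $\gamma$, or $P_\gamma$ dimension, of the class $\mathcal{G}$, $P_\gamma\text{-dim}(\mathcal{G})$, is the maximal cardinality of a subset of $\mathcal{X}$ $\gamma$-shattered by $\mathcal{G}$, if this cardinality is finite. If no such maximum exists, $\mathcal{G}$ is said to have infinite $P_\gamma$ dimension.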
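The $\gamma$-shattering condition can be tested directly for a finite class, a finite set of points and a given witness vector $v_b$. Computing the $P_\gamma$ dimension would also require a search over witnesses, which this sketch leaves to the caller. The class $g_a(x) = a x$, $a \in \{-1, 1\}$, and the function name are hypothetical examples.

```python
from itertools import product
from typing import Callable, Sequence

RealFunction = Callable[[float], float]


def is_gamma_shattered(points: Sequence[float], functions: Sequence[RealFunction],
                       witness: Sequence[float], gamma: float) -> bool:
    """Check the gamma-shattering condition for a *given* witness vector v_b:
    for each sign vector v_y, some g_y must satisfy y_i (g_y(x_i) - b_i) >= gamma
    for every i."""
    for signs in product((-1, 1), repeat=len(points)):
        if not any(all(y * (g(x) - b) >= gamma
                       for x, y, b in zip(points, signs, witness))
                   for g in functions):
            return False
    return True


if __name__ == "__main__":
    lines = [lambda x, a=a: a * x for a in (-1.0, 1.0)]
    print(is_gamma_shattered([1.0], lines, [0.0], gamma=0.5))            # True
    print(is_gamma_shattered([1.0], lines, [0.0], gamma=2.0))            # False
    print(is_gamma_shattered([1.0, 2.0], lines, [0.0, 0.0], gamma=0.5))  # False
```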
Let
Definition 13 ($\gamma$-$\Psi$-dimensions)
Q = 2.
set
is a set
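The Natarajan dimension with margin $\gamma$ of the class $\Delta^{\#}\mathcal{G}$, $\text{N-dim}(\Delta^{\#}\mathcal{G}, \gamma)$, is the maximal cardinality of a subset of $\mathcal{X}$ $\gamma$-N-shattered by $\Delta^{\#}\mathcal{G}$, if this cardinality is finite. If no such maximum exists, $\Delta^{\#}\mathcal{G}$ is said to have infinite Natarajan dimension with margin $\gamma$.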
Sauer-Shelah lemma (Classes of indicator functions)
Lemma 1 (Vapnik & Chervonenkis, 1971; Sauer, 1972; Shelah, 1972)
Let $\mathcal{F}$ be a class of indicator functions on a set $\mathcal{X}$ and let $\Pi_{\mathcal{F}}$ be its growth function. If its VC dimension $d$ is finite, then for $n \ge d$,
$\Pi_{\mathcal{F}}(n) \le \sum_{i=0}^{d} C_n^i < \left(\frac{en}{d}\right)^d.$
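Intervals of the real line have VC dimension 2 and attain the Sauer-Shelah bound exactly, which makes them a convenient numerical check of Lemma 1. Dichotomies are counted by brute force on $n$ distinct points (for intervals, any $n$ distinct points give the same count, so this is the growth function); helper names are hypothetical.

```python
from math import comb, e


def interval_growth(n: int) -> int:
    """Number of dichotomies induced on n distinct points of the real line by
    indicators of intervals [a, b], plus the empty set."""
    points = range(n)
    patterns = {tuple(-1 for _ in points)}  # empty interval
    for a in points:
        for b in range(a, n):
            patterns.add(tuple(1 if a <= x <= b else -1 for x in points))
    return len(patterns)


def sauer_sum(n: int, d: int) -> int:
    """sum_{i=0}^{d} C_n^i."""
    return sum(comb(n, i) for i in range(d + 1))


def sauer_closed_form(n: int, d: int) -> float:
    """(e n / d)^d, for n >= d >= 1."""
    return (e * n / d) ** d


if __name__ == "__main__":
    d = 2  # VC dimension of intervals
    for n in (2, 5, 10, 50):
        print(n, interval_growth(n), sauer_sum(n, d), round(sauer_closed_form(n, d), 1))
```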
$\Pi_{\mathcal{F}}(n) \le \sum_{i=0}^{d} C_n^i \left(C_{Q+1}^2\right)^i < \left(\frac{(Q+1)^2 e n}{2d}\right)^d$
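The second inequality follows from $C_{Q+1}^2 = Q(Q+1)/2 \le (Q+1)^2/2$, so that $\left(C_{Q+1}^2\right)^i \le \left((Q+1)^2/2\right)^d$ for $i \le d$, combined with $\sum_{i=0}^d C_n^i < (en/d)^d$. A quick numerical check of the chain on a small grid, assuming $n \ge d \ge 1$ and $Q \ge 2$ (helper names are hypothetical):

```python
from math import comb, e


def natarajan_sum(n: int, d: int, q: int) -> int:
    """sum_{i=0}^{d} C_n^i (C_{Q+1}^2)^i."""
    a = comb(q + 1, 2)
    return sum(comb(n, i) * a ** i for i in range(d + 1))


def natarajan_closed_form(n: int, d: int, q: int) -> float:
    """((Q + 1)^2 e n / (2 d))^d."""
    return ((q + 1) ** 2 * e * n / (2 * d)) ** d


if __name__ == "__main__":
    for q in (2, 3, 5):
        for d in (1, 2, 4):
            for n in range(d, 40):
                assert natarajan_sum(n, d, q) < natarajan_closed_form(n, d, q)
    print("bound chain verified on the grid")
```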
$\mathcal{N}(\epsilon, \mathcal{G}, n) < 2\left(\frac{4n}{\epsilon^2}\right)^{d \log_2\left(2en/(d\epsilon)\right)}$
N (p) (, G, n) < 2 n Q2 (Q 1)
3MG
2 !
j 3M k m
2
G
2
d log2 enCQ
1 /d
j 12M k m
u
G
!
u
d
log
emQ(Q1)
2
1
/d
2
2
u2
12M
1
2M
G
G
u ln 4 2m Q2 (Q 1)
+
+
ln
tm
d
m
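$\lim_{m \to +\infty} \sup_P P\left(\sup_{n \ge m} \sup_{g \in \mathcal{G}} \left(R(g) - R_{\gamma,n}(g)\right) > \epsilon\right) = 0$
$\lim_{m \to +\infty} \sup_P P\left(\sup_{n \ge m} \sup_{g \in \mathcal{G}} \left|R_\gamma(g) - R_{\gamma,n}(g)\right| > \epsilon\right) = 0$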
- $Q$ SVMs: each one separates the $k$-th category from the $Q - 1$ other ones
- $C_Q^2$ SVMs: one for each pair of categories
- Use of error correcting output codes (ECOC) (Allwein et al., 2000): coding matrix
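The decoding step of the ECOC approach can be sketched as follows. The sketch assumes a coding matrix with entries in $\{-1, 0, +1\}$ (one row per category, one column per binary classifier) and the Hamming-type distance of Allwein et al. (2000), in which a zero entry contributes $1/2$. The one-versus-rest matrix is only an example, and the function names are hypothetical.

```python
from typing import Sequence


def sign(t: float) -> int:
    """-1, 0 or +1."""
    return (t > 0) - (t < 0)


def hamming_decode(coding_matrix: Sequence[Sequence[int]],
                   outputs: Sequence[float]) -> int:
    """Category (numbered from 1) whose code word is closest to the signs of the
    binary classifiers' outputs; ties go to the smallest index."""
    def distance(row: Sequence[int]) -> float:
        return sum((1 - sign(m * f)) / 2 for m, f in zip(row, outputs))

    distances = [distance(row) for row in coding_matrix]
    return 1 + distances.index(min(distances))


if __name__ == "__main__":
    one_vs_rest = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]  # Q = 3
    print(hamming_decode(one_vs_rest, [-0.3, 0.8, -0.5]))  # 2
```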
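Let $\mathcal{X}$ be a space and $(\mathbf{H}, \langle \cdot, \cdot \rangle_{\mathbf{H}})$ a Hilbert space of functions on $\mathcal{X}$ ($\mathbf{H} \subset \mathbb{R}^{\mathcal{X}}$).
1. $\forall x \in \mathcal{X}$, $\kappa_x = \kappa(x, \cdot) \in \mathbf{H}$;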
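Theorem 4 (Moore-Aronszajn)
Let $\kappa$ be a symmetric function on $\mathcal{X}^2$ of positive type, i.e., such that for all $n \in \mathbb{N}^*$, $(x_i)_{1 \le i \le n} \in \mathcal{X}^n$ and $(a_i)_{1 \le i \le n} \in \mathbb{R}^n$,
$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \kappa(x_i, x_j) \ge 0.$
Then there exists a unique RKHS whose reproducing kernel is $\kappa$.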
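The positive-type condition of Theorem 4 can be probed numerically on a finite sample by evaluating the quadratic form for random coefficients. This sketch (hypothetical names; Gaussian kernel chosen only as an example) can refute the condition, but a negative result is evidence, not a proof, of positive type.

```python
import math
import random
from typing import Callable, Sequence

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def gaussian_kernel(x: Point, z: Point, sigma: float = 1.0) -> float:
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))


def quadratic_form(kernel: Kernel, xs: Sequence[Point], a: Sequence[float]) -> float:
    """sum_i sum_j a_i a_j kappa(x_i, x_j)."""
    n = len(xs)
    return sum(a[i] * a[j] * kernel(xs[i], xs[j]) for i in range(n) for j in range(n))


def refutes_positive_type(kernel: Kernel, xs: Sequence[Point], trials: int = 200,
                          seed: int = 0, tol: float = 1e-12) -> bool:
    """Randomized search for coefficients making the quadratic form negative.
    True proves that kappa is not of positive type."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in xs]
        if quadratic_form(kernel, xs, a) < -tol:
            return True
    return False


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(refutes_positive_type(gaussian_kernel, sample))  # False
```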
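Let $\kappa$ be a positive type function on $\mathcal{X}^2$ and let $(\mathbf{H}_\kappa, \langle \cdot, \cdot \rangle_{\mathbf{H}_\kappa})$ be the corresponding RKHS. Let $\bar{\mathcal{H}} = \left(\mathbf{H}_\kappa, \langle \cdot, \cdot \rangle_{\mathbf{H}_\kappa}\right)^Q$ and $\mathcal{H} = \left(\left(\mathbf{H}_\kappa, \langle \cdot, \cdot \rangle_{\mathbf{H}_\kappa}\right) + \{1\}\right)^Q$.
$\mathcal{H}$ is the class of functions $h = (h_k)_{1 \le k \le Q}$ from $\mathcal{X}$ into $\mathbb{R}^Q$ such that:
$h(\cdot) = \left(\sum_{i=1}^{m_k} \beta_{ik} \kappa(x_{ik}, \cdot) + b_k\right)_{1 \le k \le Q}$
with $x_{ik} \in \mathcal{X}$ and $\beta_{ik}, b_k \in \mathbb{R}$.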
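Evaluating a function of $\mathcal{H}$ and assigning the category with the largest output can be sketched as below. Scalar inputs, the names `m_svm_outputs` and `decide`, and the choice to return `None` when the maximum is not unique are assumptions of this illustration.

```python
from typing import Callable, List, Optional, Sequence

Kernel = Callable[[float, float], float]


def m_svm_outputs(x: float, centers: Sequence[Sequence[float]],
                  beta: Sequence[Sequence[float]], b: Sequence[float],
                  kernel: Kernel) -> List[float]:
    """h_k(x) = sum_i beta_ik kappa(x_ik, x) + b_k, for k = 1..Q."""
    return [sum(coef * kernel(c, x) for coef, c in zip(beta_k, centers_k)) + b_k
            for centers_k, beta_k, b_k in zip(centers, beta, b)]


def decide(outputs: Sequence[float]) -> Optional[int]:
    """Category (numbered from 1) with the largest output, or None on a tie."""
    best = max(outputs)
    winners = [k + 1 for k, v in enumerate(outputs) if v == best]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    linear = lambda u, v: u * v  # linear kernel on scalars
    h = m_svm_outputs(2.0, [[1.0], [1.0]], [[1.0], [-1.0]], [0.0, 0.0], linear)
    print(h, decide(h))  # [2.0, -2.0] 1
```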
$\Phi(\mathcal{X}) = \{\Phi(x) : x \in \mathcal{X}\}$
Let $E_{\Phi(\mathcal{X})}$ be the vector space spanned by $\Phi(\mathcal{X})$.
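Correspondence between $\bar{\mathcal{H}}$ and $E_{\Phi(\mathcal{X})}^Q$: $\bar{h} = \left(\langle w_k, \Phi(\cdot) \rangle\right)_{1 \le k \le Q}$
$\|\bar{h}\|_{\bar{\mathcal{H}}} = \sqrt{\sum_{k=1}^{Q} \|h_k\|_{\mathbf{H}_\kappa}^2} = \sqrt{\sum_{k=1}^{Q} \langle w_k, w_k \rangle} = \sqrt{\sum_{k=1}^{Q} \|w_k\|^2} = \|w\|$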
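M-SVM ($Q \ge 3$)
Training set: $((x_i, y_i))_{1 \le i \le m} \in \left(\mathcal{X} \times [\![ 1, Q ]\!]\right)^m$
Loss function $\ell_{\text{M-SVM}}$ (multi-class hinge loss)
Problem 1
$\min_{h \in \mathcal{H}} \left\{ \sum_{i=1}^{m} \ell_{\text{M-SVM}}(y_i, h(x_i)) + \lambda \|\bar{h}\|_{\bar{\mathcal{H}}}^2 \right\}$
s.t. $\sum_{k=1}^{Q} h_k = 0$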
Representer theorem
This theorem states that training (solving Problem 1) amounts to finding the values of the coefficients $\beta_{ik}$ and $b_k$ in
$h(\cdot) = \left(\sum_{i=1}^{m} \beta_{ik} \kappa(x_i, \cdot) + b_k\right)_{1 \le k \le Q}$
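Bi-class SVM
Training set: $((x_i, y_i))_{1 \le i \le m} \in \left(\mathcal{X} \times \{-1, 1\}\right)^m$
$\ell_{\text{SVM}}$ (hinge loss): $\ell_{\text{SVM}}(y, h(x)) = \left(1 - y h(x)\right)_+$
With the M-SVM notation and $Q = 2$: $h(x) = \frac{1}{2}\left(\langle w_1 - w_2, \Phi(x) \rangle + b_1 - b_2\right)$
Problem 2
$\min_{h \in \mathcal{H}} \left\{ \sum_{i=1}^{m} \ell_{\text{SVM}}(y_i, h(x_i)) + \lambda \|\bar{h}\|_{\mathbf{H}_\kappa}^2 \right\}$
Representer theorem
This theorem states that training (solving Problem 2) amounts to finding the values of the coefficients $\alpha_i$ and $b$ in
$h(\cdot) = \sum_{i=1}^{m} \alpha_i \kappa(x_i, \cdot) + b$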
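A sketch of the regularized hinge-loss objective evaluated at a kernel expansion of the form given by the representer theorem. The weight $\lambda$ (named `lam`), the scalar inputs and the helper names are assumptions of this illustration; the norm term uses the reproducing property $\|\sum_j \alpha_j \kappa(x_j, \cdot)\|^2 = \sum_{i,j} \alpha_i \alpha_j \kappa(x_i, x_j)$.

```python
from typing import Callable, Sequence

Kernel = Callable[[float, float], float]


def hinge(y: int, t: float) -> float:
    """l_SVM(y, t) = (1 - y t)_+."""
    return max(0.0, 1.0 - y * t)


def svm_objective(xs: Sequence[float], ys: Sequence[int], alpha: Sequence[float],
                  b: float, lam: float, kernel: Kernel) -> float:
    """sum_i (1 - y_i h(x_i))_+ + lam * ||h_bar||^2 for
    h = sum_j alpha_j kappa(x_j, .) + b."""
    n = len(xs)
    h = [sum(alpha[j] * kernel(xs[j], xs[i]) for j in range(n)) + b for i in range(n)]
    # Reproducing property: squared norm of the kernel expansion (bias excluded).
    norm2 = sum(alpha[i] * alpha[j] * kernel(xs[i], xs[j])
                for i in range(n) for j in range(n))
    return sum(hinge(y, hi) for y, hi in zip(ys, h)) + lam * norm2


if __name__ == "__main__":
    linear = lambda u, v: u * v
    print(svm_objective([1.0, -1.0], [1, -1], [0.5, -0.5], 0.0, 0.1, linear))  # 0.1
```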
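$d_{\text{M-SVM}} = \left(d_{\text{M-SVM},kl}\right)_{1 \le k < l \le Q}$
$d_{\text{M-SVM},kl} = \min\left(\min_{i : y_i = k}\left(h_k(x_i) - h_l(x_i)\right), \min_{j : y_j = l}\left(h_l(x_j) - h_k(x_j)\right)\right) - 1$
$\gamma_{kl} = \frac{1 + d_{\text{M-SVM},kl}}{\|w_k - w_l\|}$
$\sum_{k<l} \|w_k - w_l\|^2 = Q \sum_{k=1}^{Q} \|w_k\|^2 - \left\|\sum_{k=1}^{Q} w_k\right\|^2 = Q \sum_{k=1}^{Q} \|w_k\|^2 \quad \text{since } \sum_{k=1}^{Q} w_k = 0$
$\sum_{k=1}^{Q} \|w_k\|^2 = \frac{1}{Q} \sum_{k<l} \left(\frac{1 + d_{\text{M-SVM},kl}}{\gamma_{kl}}\right)^2$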
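The identity used above holds for any vectors, $\sum_{k<l}\|w_k - w_l\|^2 = Q\sum_{k} \|w_k\|^2 - \|\sum_{k} w_k\|^2$, and reduces to $Q \sum_{k} \|w_k\|^2$ under the constraint $\sum_{k} w_k = 0$. A small numerical check, with finite-dimensional vectors standing in for the $w_k$ (an assumption of the illustration; helper names are hypothetical):

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_norm(v: Vector) -> float:
    return sum(t * t for t in v)


def pairwise_sum(ws: Sequence[Vector]) -> float:
    """sum_{k<l} ||w_k - w_l||^2."""
    q = len(ws)
    return sum(sq_norm([a - b for a, b in zip(ws[k], ws[l])])
               for k in range(q) for l in range(k + 1, q))


def identity_rhs(ws: Sequence[Vector]) -> float:
    """Q sum_k ||w_k||^2 - ||sum_k w_k||^2."""
    total: List[float] = [sum(col) for col in zip(*ws)]
    return len(ws) * sum(sq_norm(w) for w in ws) - sq_norm(total)


if __name__ == "__main__":
    centred = [[1.0, 2.0], [-1.0, 0.0], [0.0, -2.0]]  # sum_k w_k = 0
    print(pairwise_sum(centred), identity_rhs(centred))  # 30.0 30.0
```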
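M-SVM of Weston and Watkins
$\min_{h \in \mathcal{H}} \left\{ \frac{1}{2} \sum_{k=1}^{Q} \|w_k\|^2 + C \sum_{i=1}^{m} \sum_{k \ne y_i} \xi_{ik} \right\}$
s.t. $\langle w_{y_i} - w_k, \Phi(x_i) \rangle + b_{y_i} - b_k \ge 1 - \xi_{ik}$, $(1 \le i \le m), (1 \le k \ne y_i \le Q)$
$\xi_{ik} \ge 0$, $(1 \le i \le m), (1 \le k \ne y_i \le Q)$
Remark 6
The constraint $\sum_{k=1}^{Q} h_k = 0$ is implicit.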
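For fixed $(w, b)$, the smallest feasible slacks are $\xi_{ik} = \left(1 - (h_{y_i}(x_i) - h_k(x_i))\right)_+$, so the primal problem penalizes the sum of these hinge terms. A sketch working directly on an output vector $h(x_i) = (h_k(x_i))_{1 \le k \le Q}$, with hypothetical helper names:

```python
from typing import List, Sequence


def ww_slacks(h: Sequence[float], y: int) -> List[float]:
    """Optimal slacks xi_ik = (1 - (h_y - h_k))_+ for k != y (categories 1..Q)."""
    return [max(0.0, 1.0 - (h[y - 1] - h[k])) for k in range(len(h)) if k != y - 1]


def ww_loss(h: Sequence[float], y: int) -> float:
    """Sum of the optimal slacks of one example."""
    return sum(ww_slacks(h, y))


if __name__ == "__main__":
    print(ww_loss([2.0, 0.5, -1.0], 1))  # 0.0: all margins at least 1
    print(ww_loss([0.0, 0.5, 0.5], 1))   # 3.0: two violated constraints
```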
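Dual problem ($\alpha_{ik}$: Lagrange multiplier of the constraint indexed by $(i, k)$, with the convention $\alpha_{iy_i} = 0$)
$\min_{\alpha} \left\{ \frac{1}{2} \alpha^T H_{WW} \alpha - \mathbf{1}_{Qm}^T \alpha \right\}$
s.t. $0 \le \alpha_{ik} \le C$, $(1 \le i \le m), (1 \le k \ne y_i \le Q)$
$\sum_{i : y_i = k} \sum_{l=1}^{Q} \alpha_{il} - \sum_{i=1}^{m} \alpha_{ik} = 0$, $(1 \le k \le Q - 1)$
$H_{WW} = \left(\left(\delta_{y_i,y_j} - \delta_{y_i,l} - \delta_{y_j,k} + \delta_{k,l}\right) \kappa(x_i, x_j)\right)_{1 \le i,j \le m,\, 1 \le k,l \le Q}$
$w_k = \sum_{i : y_i = k} \sum_{l=1}^{Q} \alpha_{il} \Phi(x_i) - \sum_{i=1}^{m} \alpha_{ik} \Phi(x_i) = \sum_{i=1}^{m} \sum_{l=1}^{Q} \left(\delta_{y_i,k} - \delta_{k,l}\right) \alpha_{il} \Phi(x_i)$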
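The expression of $H_{WW}$ can be checked against the expansion of the $w_k$: with $\alpha_{iy_i} = 0$, $\alpha^T H_{WW} \alpha$ should equal $\sum_k \|w_k\|^2$, and the $w_k$ should sum to zero, consistent with Remark 6. This sketch assumes the linear kernel $\kappa(x, x') = x^T x'$ so that the $w_k$ can be formed explicitly; the multipliers are arbitrary test values and helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Index = Tuple[int, int]  # (i, k): example i (from 0), category k (from 1)


def delta(a: int, b: int) -> float:
    """Kronecker delta."""
    return 1.0 if a == b else 0.0


def dot(u: Vector, v: Vector) -> float:
    """Linear kernel kappa(x, x') = x^T x' (assumption of this sketch)."""
    return sum(a * b for a, b in zip(u, v))


def h_ww(xs: Sequence[Vector], ys: Sequence[int], q: int) -> Dict[Tuple[Index, Index], float]:
    """Entries of H_WW, indexed by ((i, k), (j, l))."""
    idx = [(i, k) for i in range(len(xs)) for k in range(1, q + 1)]
    return {((i, k), (j, l)): (delta(ys[i], ys[j]) - delta(ys[i], l)
                               - delta(ys[j], k) + delta(k, l)) * dot(xs[i], xs[j])
            for (i, k) in idx for (j, l) in idx}


def w_vectors(xs: Sequence[Vector], ys: Sequence[int], q: int,
              alpha: Dict[Index, float]) -> List[List[float]]:
    """w_k = sum_i sum_l (delta_{y_i,k} - delta_{k,l}) alpha_il x_i, k = 1..Q."""
    ws = []
    for k in range(1, q + 1):
        w = [0.0] * len(xs[0])
        for (i, l), a in alpha.items():
            coef = (delta(ys[i], k) - delta(k, l)) * a
            w = [wr + coef * xr for wr, xr in zip(w, xs[i])]
        ws.append(w)
    return ws


def quadratic(h: Dict[Tuple[Index, Index], float], alpha: Dict[Index, float]) -> float:
    """alpha^T H alpha."""
    return sum(alpha[p] * alpha[r] * v for (p, r), v in h.items())


if __name__ == "__main__":
    xs, ys, q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1, 2, 3], 3
    alpha = {(i, k): 0.0 if k == ys[i] else 0.1 * (i + 1) * k
             for i in range(len(xs)) for k in range(1, q + 1)}
    ws = w_vectors(xs, ys, q, alpha)
    print(quadratic(h_ww(xs, ys, q), alpha), sum(dot(w, w) for w in ws))  # equal
```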
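M-SVM of Crammer and Singer
$\min_{\bar{h} \in \bar{\mathcal{H}}} \left\{ \frac{1}{2} \sum_{k=1}^{Q} \|w_k\|^2 + C \sum_{i=1}^{m} \xi_i \right\}$
s.t. $\langle w_{y_i} - w_k, \Phi(x_i) \rangle + \delta_{y_i,k} \ge 1 - \xi_i$, $(1 \le i \le m), (1 \le k \le Q)$
The constraint $\sum_{k=1}^{Q} \bar{h}_k = 0$ is implicit.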
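Dual problem ($\alpha_{ik}$: Lagrange multiplier of the constraint $\langle w_{y_i} - w_k, \Phi(x_i) \rangle + \delta_{y_i,k} \ge 1 - \xi_i$)
$\min_{\alpha} \left\{ \frac{1}{2} \alpha^T H_{WW} \alpha + \delta^T \alpha \right\}$, with $\delta = \left(\delta_{y_i,k}\right)_{1 \le i \le m,\, 1 \le k \le Q}$
s.t. $\alpha_{ik} \ge 0$, $(1 \le i \le m), (1 \le k \le Q)$
$\sum_{k=1}^{Q} \alpha_{ik} = C$, $(1 \le i \le m)$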
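Because the constraint for $k = y_i$ reads $\xi_i \ge 0$, the optimal slack of an example is its largest violation over all categories: a maximum where the machine of Weston and Watkins uses a sum. A sketch on output vectors $\bar{h}_k(x_i) = \langle w_k, \Phi(x_i) \rangle$ (no bias, as in the primal; hypothetical helper name):

```python
from typing import Sequence


def cs_slack(h: Sequence[float], y: int) -> float:
    """Smallest xi_i such that h_y - h_k + delta_{y,k} >= 1 - xi_i for all k
    (categories numbered 1..Q)."""
    hy = h[y - 1]
    return max(1.0 - (1.0 if k == y - 1 else 0.0) - (hy - h[k]) for k in range(len(h)))


if __name__ == "__main__":
    print(cs_slack([2.0, 0.5, -1.0], 1))  # 0.0
    print(cs_slack([0.0, 0.5, 0.5], 1))   # 1.5 (the sum-based loss would give 3.0)
```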
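M-SVM of Lee, Lin and Wahba
$\min_{h \in \mathcal{H}} \left\{ \frac{1}{2} \sum_{k=1}^{Q} \|w_k\|^2 + C \sum_{i=1}^{m} \sum_{k \ne y_i} \xi_{ik} \right\}$
s.t. $\langle w_k, \Phi(x_i) \rangle + b_k \le -\frac{1}{Q-1} + \xi_{ik}$, $(1 \le i \le m), (1 \le k \ne y_i \le Q)$
$\xi_{ik} \ge 0$, $(1 \le i \le m), (1 \le k \ne y_i \le Q)$
$\sum_{k=1}^{Q} w_k = 0$, $\sum_{k=1}^{Q} b_k = 0$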
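For fixed $(w, b)$, the optimal slacks are $\xi_{ik} = \left(h_k(x_i) + \frac{1}{Q-1}\right)_+$ for $k \ne y_i$: unlike the two previous machines, the loss only involves the outputs associated with the wrong categories. A sketch (hypothetical helper name), applied to output vectors that satisfy $\sum_k h_k = 0$:

```python
from typing import Sequence


def llw_loss(h: Sequence[float], y: int) -> float:
    """sum_{k != y} (h_k + 1/(Q-1))_+, categories numbered 1..Q."""
    q = len(h)
    return sum(max(0.0, h[k] + 1.0 / (q - 1)) for k in range(q) if k != y - 1)


if __name__ == "__main__":
    print(llw_loss([1.0, -0.5, -0.5], 1))  # 0.0: wrong outputs at -1/(Q-1)
    print(llw_loss([-0.5, 1.0, -0.5], 1))  # 1.5
```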
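Dual problem ($\alpha_{ik}$: Lagrange multiplier of the constraint $\langle w_k, \Phi(x_i) \rangle + b_k \le -\frac{1}{Q-1} + \xi_{ik}$)
$\min_{\alpha} \left\{ \frac{1}{2} \alpha^T H_{LLW} \alpha - \frac{1}{Q-1} \mathbf{1}_{Qm}^T \alpha \right\}$
s.t. $0 \le \alpha_{ik} \le C$, $(1 \le i \le m), (1 \le k \ne y_i \le Q)$
$\sum_{i=1}^{m} \sum_{l=1}^{Q} \left(\delta_{k,l} - \frac{1}{Q}\right) \alpha_{il} = 0$, $(1 \le k \le Q - 1)$
$H_{LLW} = \left(\left(\delta_{k,l} - \frac{1}{Q}\right) \kappa(x_i, x_j)\right)_{1 \le i,j \le m,\, 1 \le k,l \le Q}$
$w_k = -\sum_{i=1}^{m} \sum_{l=1}^{Q} \left(\delta_{k,l} - \frac{1}{Q}\right) \alpha_{il} \Phi(x_i)$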
min
s.t.
hH 2
t +C
m
X
i=1 k6=yi
ik 0,
kwk k t,
ik
(1 i m), (1 k 6= yi Q)
(1 i m), (1 k 6= yi Q)
(1 k Q)
et al., 2006)
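$\kappa(x, x') = x^T x'$ ($\Phi = \mathrm{Id}$)
Problem 10 ($\ell_1$-norm M-SVM)
$\min_{h \in \mathcal{H}} \sum_{i=1}^{m} \ell_{\text{M-SVM}}(y_i, h(x_i))$
s.t. $\sum_{k=1}^{Q} \|w_k\|_1 \le K$
$\sum_{k=1}^{Q} h_k = 0$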
Definition 19 (M-SVM²)
$M = \left(\left(\delta_{k,l} - \frac{1}{Q}\right) \delta_{i,j}\right)_{1 \le i,j \le m,\, 1 \le k,l \le Q}$
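Problem 11 (M-SVM²)
$\min_{h \in \mathcal{H}} \left\{ \frac{1}{2} \sum_{k=1}^{Q} \|w_k\|^2 + C \xi^T M \xi \right\}$
s.t. $\langle w_k, \Phi(x_i) \rangle + b_k \le -\frac{1}{Q-1} + \xi_{ik}$, $(1 \le i \le m), (1 \le k \ne y_i \le Q)$
$\sum_{k=1}^{Q} w_k = 0$, $\sum_{k=1}^{Q} b_k = 0$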
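Dual formulation
Problem 12 (M-SVM²)
$\min_{\alpha} \left\{ \frac{1}{2} \alpha^T \left(H_{LLW} + \frac{1}{2C} M\right) \alpha - \frac{1}{Q-1} \mathbf{1}_{Qm}^T \alpha \right\}$
s.t. $\alpha_{ik} \ge 0$, $(1 \le i \le m), (1 \le k \ne y_i \le Q)$
$\sum_{i=1}^{m} \sum_{l=1}^{Q} \left(\delta_{k,l} - \frac{1}{Q}\right) \alpha_{il} = 0$, $(1 \le k \le Q - 1)$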
Figure (plot in $\mathbb{R}^2$, axes $x_1$ and $x_2$)
Figure (plot in $\mathbb{R}^3$, axes $x_1$, $x_2$, $x_3$; labels $C_1 / C_2$ and $C_2 / C_3$)
Figure (axes $x_1$, $x_2$, $x_3$; labels $C_1 / C_2$, $C_1 / C_3$ and $C_2 / C_3$)
Theorem 6
N-dim
H,
w (X ) 2
Q
.
2
The proof
- does not hold true anymore if the operator
- calls for the use of the $\ell_\infty$-norm instead of the $\ell_2$-norm
$Q = 2$: $P_\gamma\text{-dim}(\mathcal{H}) \le \left(\frac{\Lambda_w \Lambda_{\Phi(\mathcal{X})}}{\gamma}\right)^2$
$\epsilon_n(E') = \inf\left\{\epsilon > 0 : \mathcal{N}(\epsilon, E', \rho) \le n\right\}.$
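As an illustration of this definition, the entropy numbers of a finite subset of the real line can be computed exactly. The sketch assumes $E = \mathbb{R}$ with $\rho(x, y) = |x - y|$ and closed balls centered anywhere in $\mathbb{R}$; in that setting a greedy left-to-right covering is optimal, and the infimum is reached at half the distance between two points of $E'$. Helper names are hypothetical.

```python
from typing import Optional, Sequence


def covering_number_1d(points: Sequence[float], eps: float, tol: float = 1e-12) -> int:
    """N(eps, E', |.|) for a finite E' of the real line (closed balls, centers in R)."""
    pts = sorted(points)
    count, i = 0, 0
    while i < len(pts):
        count += 1
        right = pts[i] + 2 * eps  # the ball centred at pts[i] + eps covers up to here
        while i < len(pts) and pts[i] <= right + tol:
            i += 1
    return count


def entropy_number_1d(points: Sequence[float], n: int) -> Optional[float]:
    """eps_n(E') = inf{eps > 0 : N(eps, E') <= n}; equals 0 when n is at least
    the number of distinct points, None if n < 1."""
    candidates = sorted({(b - a) / 2 for a in points for b in points if b >= a})
    for c in candidates:
        if covering_number_1d(points, c) <= n:
            return c
    return None


if __name__ == "__main__":
    pts = [0.0, 1.0, 2.0, 10.0]
    print([entropy_number_1d(pts, n) for n in (1, 2, 3, 4)])  # [5.0, 1.0, 0.5, 0.0]
```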
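The operator $S_{x^n}$ is defined as:
$S_{x^n} : \bar{\mathcal{H}} \to \mathbb{R}^{Qn}, \quad \bar{h} = (w_k)_{1 \le k \le Q} \mapsto \left(\langle w_k, \Phi(x_i) \rangle\right)_{1 \le i \le n,\, 1 \le k \le Q}$
($U = \{\bar{h} \in \bar{\mathcal{H}} : \|w\| \le 1\}$; the entropy numbers of $S_{x^n}$ are computed with $\mathbb{R}^{Qn}$ endowed with the $\ell_\infty$-norm)
Proposition 2
Let $\epsilon \in \mathbb{R}_+^*$ and $n \in \mathbb{N}^*$. Then
$\sup_{x^n \in \mathcal{X}^n} \epsilon_p(S_{x^n}) < \epsilon \implies \mathcal{N}(\epsilon, U, n) \le p.$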
$\epsilon_n(S) \le 4\|S\| n^{-1/r}.$
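Theorem 7
Let $\mathcal{H}$ be the class of functions that a $Q$-category M-SVM can implement under the hypotheses that $\Phi(\mathcal{X})$ is included in the ball of radius $\Lambda_{\Phi(\mathcal{X})}$ about the origin in $E_{\Phi(\mathcal{X})}$, that the vector $w$ satisfies $\|w\| \le \Lambda_w$, and that $b \in [-\beta, \beta]^Q$. If the dimensionality of the space $E_{\Phi(\mathcal{X})}$ is finite and equal to $d$, then for all $\gamma \in \mathbb{R}_+^*$,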
Q
Qd
64
8
w
(X )
+1
.
N (p) (/4, H, 2m) 2
1
m
!
$e_k(S) \le c \left(\frac{1}{k} \log_2\left(1 + \frac{n}{k}\right)\right)^{1/2} \|S\|,$
where the dyadic entropy number $e_k(S)$ is equal to $\epsilon_{2^{k-1}}(S)$ and $c$ is a universal constant.
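Theorem 9
Let $\mathcal{H}$ be the class of functions that a $Q$-category M-SVM can implement under the hypotheses that $\Phi(\mathcal{X})$ is included in the ball of radius $\Lambda_{\Phi(\mathcal{X})}$ about the origin in $E_{\Phi(\mathcal{X})}$, that the vector $w$ satisfies $\|w\| \le \Lambda_w$, and that $b \in [-\beta, \beta]^Q$. Then, for all $\gamma \in \mathbb{R}_+^*$,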
Q 16cw
q
(X )
2Qm
8
1
(p)
ln(2)
+1
2
N (/4, H, 2m) 2
.
!
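Rademacher complexity of a set $A \subset \mathbb{R}^n$:
$R_n(A) = \mathbb{E} \sup_{a \in A} \frac{1}{n} \sum_{i=1}^{n} \sigma_i a_i,$
where $\sigma_1, \ldots, \sigma_n$ are independent random variables uniformly distributed on $\{-1, 1\}$ (Rademacher variables).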
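For a finite set $A \subset \mathbb{R}^n$ and small $n$, the expectation over the Rademacher variables can be computed exactly by enumerating the $2^n$ sign vectors; larger $n$ would call for Monte Carlo estimation. A sketch with a hypothetical function name:

```python
from itertools import product
from typing import Sequence


def rademacher_complexity(A: Sequence[Sequence[float]]) -> float:
    """R_n(A) = E_sigma sup_{a in A} (1/n) sum_i sigma_i a_i, computed exactly
    (only practical for small n)."""
    n = len(A[0])
    total = 0.0
    for sigma in product((-1, 1), repeat=n):
        total += max(sum(s * a_i for s, a_i in zip(sigma, a)) for a in A) / n
    return total / 2 ** n


if __name__ == "__main__":
    print(rademacher_complexity([[1.0, 1.0], [-1.0, -1.0]]))  # 0.5
    print(rademacher_complexity([[1.0, 2.0]]))                # 0.0: a single vector
```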
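McDiarmid's inequality: let $T_1, \ldots, T_n$ be independent $T$-valued random variables and $f : T^n \to \mathbb{R}$ be such that
$\forall i \in [\![ 1, n ]\!], \quad \sup_{(t_i)_{1 \le i \le n} \in T^n,\, t'_i \in T} \left|f(t_1, \ldots, t_n) - f(t_1, \ldots, t_{i-1}, t'_i, t_{i+1}, \ldots, t_n)\right| \le c_i.$
Then, for all $\epsilon > 0$,
$P\left(f(T_1, \ldots, T_n) - \mathbb{E} f(T_1, \ldots, T_n) \ge \epsilon\right) \le e^{-2\epsilon^2/c}$ and $P\left(\mathbb{E} f(T_1, \ldots, T_n) - f(T_1, \ldots, T_n) \ge \epsilon\right) \le e^{-2\epsilon^2/c}$,
where $c = \sum_{i=1}^{n} c_i^2$.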
$R(h) = \mathbb{E}\left[\left(1 - h_Y(X)\right)_+\right]$
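Theorem 11
Let $\mathcal{H}$ be the class of functions that a $Q$-category M-SVM can implement under the hypotheses that $\Phi(\mathcal{X})$ is included in the closed ball of radius $\Lambda_{\Phi(\mathcal{X})}$ about the origin in $E_{\Phi(\mathcal{X})}$, that the vector $w$ satisfies $\|w\| \le \Lambda_w$, and that $b = 0$. Let $K_{\mathcal{H}} = \Lambda_w \Lambda_{\Phi(\mathcal{X})} + 1$ and $\delta \in (0, 1)$. With probability at least $1 - \delta$, the risk of any function $h$ in $\mathcal{H}$ is bounded from above by: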
R
m
R h
v
s
um
1
X
u
ln
+ 4 + 4Q(Q 1)w t
.
h
(Xi , Xi ) + KH
m
2m
m
i=1
$R(h) \le R_m(h) + O\left(\frac{1}{\sqrt{m}}\right)$
Radius-margin bound
Theorem 12 (Vapnik, 1998)
Let us consider a hard margin bi-class SVM. Let $L_m$ be the number of errors that it makes in a leave-one-out cross-validation procedure and let $\gamma = \frac{1}{\|w\|}$ denote its geometric margin. Then the following upper bound holds true:
$L_m \le \frac{D_m^2}{\gamma^2},$
where $D_m$ is the diameter of the smallest ball of the feature space containing the support vectors.
Theorem 13
Let us consider a hard margin $Q$-category M-SVM of Weston and Watkins (or Crammer and Singer) on a domain $\mathcal{X}$. Let $d_m = \{(x_i, y_i) : 1 \le i \le m\}$ be its training set, $L_m$ the number of errors resulting from applying a leave-one-out cross-validation procedure to this machine, and $D_m$ the diameter of the smallest sphere of the feature space containing the set $\{\Phi(x_i) : 1 \le i \le m\}$. Then the following upper bound holds true:
$L_m \le K_{CV} D_m^2 \sum_{k<l} \left(\frac{1 + d_{WW,kl}}{\gamma_{kl}}\right)^2$
Constant $K_{CV}$
- The value of $K_{CV}$
- For $Q = 2$, $K_{CV} = 2$
Q
Q1
Theorem 14
Let us consider a hard margin $Q$-category M-SVM of Lee, Lin and Wahba on a domain $\mathcal{X}$. Let $d_m = \{(x_i, y_i) : 1 \le i \le m\}$ be its training set, $L_m$ the number of errors resulting from applying a leave-one-out cross-validation procedure to this machine, and $D_m$ the diameter of the smallest sphere of the feature space containing the set $\{\Phi(x_i) : 1 \le i \le m\}$. Then the following upper bound holds true:
$L_m \le Q^2 D_m^2 \sum_{k<l} \left(\frac{1 + d_{LLW,kl}}{\gamma_{kl}}\right)^2$
This bound does not reduce to the bi-class one for $Q = 2$.
Conclusions
Capacity measures of the classes of functions
- The $\gamma$-$\Psi$-dimensions play for the M-SVMs (and the MLPs!) the same role as the fat-shattering dimension
Guaranteed risks
- These studies highlight the specific character of the multi-class case.
- Model selection should provide a touchstone to assess the different guaranteed risks derived.
- Integration, into the applications implementing the M-SVMs, of procedures that automatically choose the values of the hyperparameters