INTRODUCTION
TO
MACHINE
LEARNING
3RD EDITION
ETHEM ALPAYDIN
The MIT Press, 2014
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 12:
LOCAL MODELS
Introduction
$$E\left(\{m_i\}_{i=1}^k \,\middle|\, X\right) = \sum_t \sum_i b_i^t \,\lVert x^t - m_i \rVert^2$$

$$b_i^t = \begin{cases} 1 & \text{if } \lVert x^t - m_i \rVert = \min_l \lVert x^t - m_l \rVert \\ 0 & \text{otherwise} \end{cases}$$

Batch k-means:
$$m_i = \frac{\sum_t b_i^t\, x^t}{\sum_t b_i^t}$$

Online k-means:
$$\Delta m_{ij} = -\eta \frac{\partial E^t}{\partial m_{ij}} = \eta\, b_i^t \left(x_j^t - m_{ij}\right)$$
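To make the updates concrete, here is a minimal pure-Python sketch of online k-means (the function name `online_kmeans`, the learning-rate and epoch defaults, and initializing centers from random data points are my own assumptions, not from the slides):

```python
import random

def online_kmeans(X, k, eta=0.1, epochs=20, seed=0):
    """Online k-means: after each input x^t, move only the winning
    center m_i by eta * (x^t - m_i); all other centers stay put."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(X, k)]   # init from k data points
    for _ in range(epochs):
        for x in rng.sample(X, len(X)):             # visit inputs in random order
            # winner: center with minimum squared Euclidean distance
            i = min(range(k), key=lambda c: sum((xj - mj) ** 2
                                                for xj, mj in zip(x, centers[c])))
            # Delta m_ij = eta * (x_j^t - m_ij) for the winner only (b_i^t = 1)
            centers[i] = [mj + eta * (xj - mj) for xj, mj in zip(x, centers[i])]
    return centers
```

Batch k-means would instead recompute each center as the mean of the inputs it currently wins.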
Winner-take-all network
Adaptive Resonance Theory
$$\begin{cases} m_{k+1} \leftarrow x^t & \text{if } b_i > \rho \\ \Delta m_i = \eta\left(x^t - m_i\right) & \text{otherwise} \end{cases}$$

where $b_i = \lVert x^t - m_i \rVert = \min_l \lVert x^t - m_l \rVert$ and $\rho$ is the vigilance parameter.
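A minimal sketch of this incremental scheme, assuming Euclidean distance and a hypothetical `art_cluster` function; the vigilance argument `rho` plays the role of $\rho$ above:

```python
def art_cluster(X, rho, eta=0.5):
    """Adaptive resonance: if the nearest center is farther than the
    vigilance rho, create a new center at x^t; otherwise update it."""
    centers = [list(X[0])]
    for x in X[1:]:
        dists = [sum((a - b) ** 2 for a, b in zip(x, m)) ** 0.5 for m in centers]
        i = min(range(len(centers)), key=dists.__getitem__)
        if dists[i] > rho:                 # b_i > rho: recruit a new unit
            centers.append(list(x))
        else:                              # Delta m_i = eta * (x^t - m_i)
            centers[i] = [m + eta * (a - m) for a, m in zip(x, centers[i])]
    return centers
```

Note that, unlike k-means, the number of units is not fixed in advance; it grows as inputs fall outside the vigilance radius.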
Self-Organizing Maps
$$\Delta m_l = \eta\, e(l, i)\left(x^t - m_l\right)$$

$$e(l, i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(l - i)^2}{2\sigma^2}\right]$$
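One SOM step can be sketched as follows (hypothetical `som_update`, 1-D map topology; for simplicity the $1/(\sqrt{2\pi}\sigma)$ normalizer of $e(l,i)$ is absorbed into the learning rate):

```python
import math

def som_update(centers, x, winner, eta=0.5, sigma=1.0):
    """One SOM step on a 1-D map: every unit l moves toward x^t,
    weighted by a Gaussian neighborhood around the winner i."""
    new = []
    for l, m in enumerate(centers):
        e = math.exp(-(l - winner) ** 2 / (2 * sigma ** 2))   # e(l, i)
        new.append([mj + eta * e * (xj - mj) for xj, mj in zip(x, m)])
    return new
```

The winner moves most; its map neighbors move less, which is what makes neighboring units end up with similar centers.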
Radial-Basis Functions
Locally-tuned units:
$$p_h^t = \exp\left[-\frac{\lVert x^t - m_h \rVert^2}{2 s_h^2}\right]$$

$$y^t = \sum_{h=1}^{H} w_h\, p_h^t + w_0$$
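The forward pass of such a network is straightforward; a sketch with a hypothetical `rbf_forward` (single output, Gaussian units as defined above):

```python
import math

def rbf_forward(x, centers, spreads, w, w0):
    """y = sum_h w_h * p_h + w_0, where
    p_h = exp(-||x - m_h||^2 / (2 s_h^2))."""
    y = w0
    for m, s, wh in zip(centers, spreads, w):
        d2 = sum((xj - mj) ** 2 for xj, mj in zip(x, m))
        y += wh * math.exp(-d2 / (2 * s ** 2))
    return y
```

At a unit's center the unit responds maximally ($p_h = 1$) and its response decays with distance, which is the "locally tuned" behavior.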
Local vs Distributed Representation
Training RBF
Hybrid learning:
  First layer, centers and spreads: unsupervised k-means
  Second layer, weights: supervised gradient descent
Fully supervised
(Broomhead and Lowe, 1988; Moody and Darken, 1989)
Regression
$$E\left(\{m_h, s_h, w_{ih}\}_{i,h} \,\middle|\, X\right) = \frac{1}{2} \sum_t \sum_i \left(r_i^t - y_i^t\right)^2$$

$$y_i^t = \sum_{h=1}^{H} w_{ih}\, p_h^t + w_{i0}$$

$$\Delta w_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) p_h^t$$

$$\Delta m_{hj} = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) w_{ih}\right] p_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}$$

$$\Delta s_h = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) w_{ih}\right] p_h^t\, \frac{\lVert x^t - m_h \rVert^2}{s_h^3}$$
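These update rules can be sketched as one batch gradient step for a single-output network (the function names `rbf_train_step` and `rbf_sse`, and the learning-rate default, are illustrative assumptions):

```python
import math

def rbf_sse(X, r, centers, spreads, w, w0):
    """E = 1/2 sum_t (r^t - y^t)^2, for checking training progress."""
    tot = 0.0
    for x, rt in zip(X, r):
        y = w0 + sum(wh * math.exp(-sum((xj - mj) ** 2 for xj, mj in zip(x, m))
                                   / (2 * s ** 2))
                     for m, s, wh in zip(centers, spreads, w))
        tot += 0.5 * (rt - y) ** 2
    return tot

def rbf_train_step(X, r, centers, spreads, w, w0, eta=0.01):
    """One batch gradient step using the three update rules above."""
    # forward pass: p_h^t and y^t for every input
    P = [[math.exp(-sum((xj - mj) ** 2 for xj, mj in zip(x, m)) / (2 * s ** 2))
          for m, s in zip(centers, spreads)] for x in X]
    Y = [w0 + sum(wh * ph for wh, ph in zip(w, p)) for p in P]
    err = [rt - yt for rt, yt in zip(r, Y)]
    # Delta w_h = eta * sum_t (r^t - y^t) p_h^t   (likewise for the bias w_0)
    new_w = [wh + eta * sum(e * p[h] for e, p in zip(err, P))
             for h, wh in enumerate(w)]
    new_w0 = w0 + eta * sum(err)
    # Delta m_hj = eta * sum_t (r^t - y^t) w_h p_h^t (x_j^t - m_hj) / s_h^2
    new_centers = [[mj + eta * sum(e * w[h] * p[h] * (x[j] - mj) / s ** 2
                                   for e, p, x in zip(err, P, X))
                    for j, mj in enumerate(m)]
                   for h, (m, s) in enumerate(zip(centers, spreads))]
    # Delta s_h = eta * sum_t (r^t - y^t) w_h p_h^t ||x^t - m_h||^2 / s_h^3
    new_spreads = [s + eta * sum(e * w[h] * p[h]
                                 * sum((xj - mj) ** 2 for xj, mj in zip(x, m))
                                 / s ** 3
                                 for e, p, x in zip(err, P, X))
                   for h, (m, s) in enumerate(zip(centers, spreads))]
    return new_centers, new_spreads, new_w, new_w0
```

In the hybrid scheme only `new_w`/`new_w0` would be used, with centers and spreads fixed by k-means; the fully supervised scheme applies all three updates.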
Classification
$$y_i^t = \frac{\exp\left[\sum_h w_{ih}\, p_h^t + w_{i0}\right]}{\sum_k \exp\left[\sum_h w_{kh}\, p_h^t + w_{k0}\right]}$$
Rules and Exceptions
$$y^t = \underbrace{\sum_{h=1}^{H} w_h\, p_h^t}_{\text{exceptions}} + \underbrace{v^T x^t + v_0}_{\text{default rule}}$$
Rule-Based Knowledge
Normalized Basis Functions

$$g_h^t = \frac{p_h^t}{\sum_{l=1}^{H} p_l^t} = \frac{\exp\left[-\lVert x^t - m_h \rVert^2 / 2 s_h^2\right]}{\sum_l \exp\left[-\lVert x^t - m_l \rVert^2 / 2 s_l^2\right]}$$

$$y_i^t = \sum_{h=1}^{H} w_{ih}\, g_h^t$$

$$\Delta m_{hj} = \eta \sum_t \sum_i \left(r_i^t - y_i^t\right)\left(w_{ih} - y_i^t\right) g_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}$$
Competitive Basis Functions
Mixture model:
$$p\left(r^t \,\middle|\, x^t\right) = \sum_{h=1}^{H} p\left(h \,\middle|\, x^t\right) p\left(r^t \,\middle|\, h, x^t\right)$$

$$g_h^t \equiv p\left(h \,\middle|\, x^t\right) = \frac{p\left(x^t \,\middle|\, h\right) p(h)}{\sum_l p\left(x^t \,\middle|\, l\right) p(l)} = \frac{a_h \exp\left[-\lVert x^t - m_h \rVert^2 / 2 s_h^2\right]}{\sum_l a_l \exp\left[-\lVert x^t - m_l \rVert^2 / 2 s_l^2\right]}$$
Regression

$$p\left(r^t \,\middle|\, x^t\right) = \sum_h g_h^t \prod_i \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{\left(r_i^t - y_{ih}^t\right)^2}{2\sigma^2}\right]$$

$$\mathcal{L}\left(\{m_h, s_h, w_{ih}\}_{i,h} \,\middle|\, X\right) = \sum_t \log \sum_h g_h^t \exp\left[-\frac{1}{2} \sum_i \left(r_i^t - y_{ih}^t\right)^2\right]$$

where $y_{ih}^t = w_{ih}$ is the constant fit.

$$\Delta w_{ih} = \eta \sum_t \left(r_i^t - y_{ih}^t\right) f_h^t$$

$$\Delta m_{hj} = \eta \sum_t \left(f_h^t - g_h^t\right) \frac{x_j^t - m_{hj}}{s_h^2}$$

$$f_h^t = \frac{g_h^t \exp\left[-\frac{1}{2}\sum_i \left(r_i^t - y_{ih}^t\right)^2\right]}{\sum_l g_l^t \exp\left[-\frac{1}{2}\sum_i \left(r_i^t - y_{il}^t\right)^2\right]}$$

$$f_h^t = p\left(h \,\middle|\, r^t, x^t\right) = \frac{p\left(h \,\middle|\, x^t\right) p\left(r^t \,\middle|\, h, x^t\right)}{\sum_l p\left(l \,\middle|\, x^t\right) p\left(r^t \,\middle|\, l, x^t\right)}$$
Classification
$$\mathcal{L}\left(\{m_h, s_h, w_{ih}\}_{i,h} \,\middle|\, X\right) = \sum_t \log \sum_h g_h^t \prod_i \left(y_{ih}^t\right)^{r_i^t} = \sum_t \log \sum_h g_h^t \exp\left[\sum_i r_i^t \log y_{ih}^t\right]$$

$$y_{ih}^t = \frac{\exp w_{ih}}{\sum_k \exp w_{kh}}$$

$$f_h^t = \frac{g_h^t \exp\left[\sum_i r_i^t \log y_{ih}^t\right]}{\sum_l g_l^t \exp\left[\sum_i r_i^t \log y_{il}^t\right]}$$
EM for RBF (Supervised EM)
E-step:
$$f_h^t = p\left(h \,\middle|\, r^t, x^t\right)$$

M-step:
$$m_h = \frac{\sum_t f_h^t\, x^t}{\sum_t f_h^t}$$

$$S_h = \frac{\sum_t f_h^t \left(x^t - m_h\right)\left(x^t - m_h\right)^T}{\sum_t f_h^t}$$

$$w_{ih} = \frac{\sum_t f_h^t\, r_i^t}{\sum_t f_h^t}$$
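Given responsibilities $f_h^t$ from the E-step, the M-step is just responsibility-weighted averaging; a sketch for the single-output case with a hypothetical `em_m_step` (the covariance update $S_h$ is analogous and omitted for brevity):

```python
def em_m_step(X, r, F):
    """M-step: re-estimate each unit's center m_h and output weight w_h
    as responsibility-weighted means.  F[t][h] is f_h^t."""
    H = len(F[0])
    centers, weights = [], []
    for h in range(H):
        tot = sum(F[t][h] for t in range(len(X)))
        centers.append([sum(F[t][h] * X[t][j] for t in range(len(X))) / tot
                        for j in range(len(X[0]))])
        weights.append(sum(F[t][h] * r[t] for t in range(len(X))) / tot)
    return centers, weights
```

With hard (0/1) responsibilities this reduces to batch k-means plus per-cluster output averaging, which shows how the supervised EM view generalizes the hybrid scheme.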
Learning Vector Quantization
$$\Delta m_i = \begin{cases} \eta\left(x^t - m_i\right) & \text{if } x^t \text{ and the closest } m_i \text{ have the same class label} \\ -\eta\left(x^t - m_i\right) & \text{otherwise} \end{cases}$$

(Figure: $x^t$ pulls the same-class winner $m_i$ toward itself and pushes the different-class $m_j$ away.)
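A sketch of one LVQ1-style update (hypothetical `lvq_step`; Euclidean winner selection assumed):

```python
def lvq_step(centers, labels, x, y, eta=0.1):
    """LVQ1: move the closest codebook vector toward x^t if their
    class labels match, away from x^t otherwise."""
    i = min(range(len(centers)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
    sign = 1.0 if labels[i] == y else -1.0      # attract or repel the winner
    centers[i] = [m + sign * eta * (a - m) for a, m in zip(x, centers[i])]
    return centers
```

Only the winning codebook vector moves; the supervision enters solely through the sign of the update.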
Mixture of Experts
MoE as Models Combined
Radial gating:
$$g_h^t = \frac{\exp\left[-\lVert x^t - m_h \rVert^2 / 2 s_h^2\right]}{\sum_l \exp\left[-\lVert x^t - m_l \rVert^2 / 2 s_l^2\right]}$$

Softmax gating:
$$g_h^t = \frac{\exp\left[m_h^T x^t\right]}{\sum_l \exp\left[m_l^T x^t\right]}$$
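Softmax gating combined with linear experts can be sketched as follows (hypothetical `moe_predict`; bias terms dropped for brevity, and the softmax computed in a numerically stable way):

```python
import math

def moe_predict(x, experts, gates):
    """Mixture of experts: y = sum_h g_h * (v_h^T x), with softmax gating
    g_h = exp(m_h^T x) / sum_l exp(m_l^T x).  `experts` holds the v_h,
    `gates` the m_h."""
    scores = [sum(mj * xj for mj, xj in zip(m, x)) for m in gates]
    mx = max(scores)
    e = [math.exp(s - mx) for s in scores]      # shift for numerical stability
    g = [ei / sum(e) for ei in e]
    outs = [sum(vj * xj for vj, xj in zip(v, x)) for v in experts]
    return sum(gh * oh for gh, oh in zip(g, outs))
```

Where the gate strongly prefers one expert, the prediction is essentially that expert's linear fit; elsewhere the experts are smoothly blended.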
Cooperative MoE
Regression

$$E\left(\{m_h, s_h, w_{ih}\}_{i,h} \,\middle|\, X\right) = \frac{1}{2} \sum_t \sum_i \left(r_i^t - y_i^t\right)^2$$

$$\Delta v_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) g_h^t\, x^t$$

Competitive MoE: Regression

$$\mathcal{L}\left(\{m_h, s_h, w_{ih}\}_{i,h} \,\middle|\, X\right) = \sum_t \log \sum_h g_h^t \exp\left[-\frac{1}{2} \sum_i \left(r_i^t - y_{ih}^t\right)^2\right]$$

$$y_{ih}^t = w_{ih}^t = v_{ih}^T x^t$$

$$\Delta v_{ih} = \eta \sum_t \left(r_i^t - y_{ih}^t\right) f_h^t\, x^t$$

$$\Delta m_h = \eta \sum_t \left(f_h^t - g_h^t\right) x^t$$
Competitive MoE: Classification
$$\mathcal{L}\left(\{m_h, s_h, w_{ih}\}_{i,h} \,\middle|\, X\right) = \sum_t \log \sum_h g_h^t \prod_i \left(y_{ih}^t\right)^{r_i^t} = \sum_t \log \sum_h g_h^t \exp\left[\sum_i r_i^t \log y_{ih}^t\right]$$

$$y_{ih}^t = \frac{\exp w_{ih}}{\sum_k \exp w_{kh}} = \frac{\exp\left[v_{ih}^T x^t\right]}{\sum_k \exp\left[v_{kh}^T x^t\right]}$$
Hierarchical Mixture of Experts