
Lecture Slides for

INTRODUCTION
TO
MACHINE
LEARNING
3RD EDITION
ETHEM ALPAYDIN
The MIT Press, 2014

alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 12:

LOCAL MODELS
Introduction

Divide the input space into local regions and learn
simple (constant/linear) models in each patch

Unsupervised: Competitive, online clustering

Supervised: Radial-basis functions, mixture of experts

Competitive Learning

E\left(\{m_i\}_{i=1}^{k} \mid X\right) = \sum_t \sum_i b_i^t \, \|x^t - m_i\|^2

b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_l \|x^t - m_l\| \\ 0 & \text{otherwise} \end{cases}

Batch k-means:  m_i = \frac{\sum_t b_i^t x^t}{\sum_t b_i^t}

Online k-means:  \Delta m_{ij} = -\eta \frac{\partial E^t}{\partial m_{ij}} = \eta \, b_i^t (x_j^t - m_{ij})

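As a concrete illustration of the online k-means update above, a minimal NumPy sketch; the function name, learning rate eta, epoch count, and random initialization are illustrative assumptions, not part of the slides.

```python
import numpy as np

def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
    """Online k-means: move only the winning center toward each instance."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # initial centers
    for _ in range(epochs):
        for x in rng.permutation(X):
            i = np.argmin(np.linalg.norm(x - m, axis=1))  # winner (b_i^t = 1)
            m[i] += eta * (x - m[i])                      # Delta m_i = eta * (x^t - m_i)
    return m
```
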
Winner-take-all network

Adaptive Resonance Theory

Incremental; add a new cluster if the instance is not covered;
coverage is defined by the vigilance, \rho:

b_i = \|x^t - m_i\| = \min_{l=1}^{k} \|x^t - m_l\|

m_{k+1} \leftarrow x^t \quad \text{if } b_i > \rho
\Delta m_i = \eta (x^t - m_i) \quad \text{otherwise}

(Carpenter and Grossberg, 1988)

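A rough sketch of the incremental, vigilance-based clustering described above, assuming Euclidean distance; the update rate eta and the list-based storage are illustrative choices.

```python
import numpy as np

def art_like_clustering(X, rho, eta=0.5):
    """Add a new center whenever the nearest center is farther than the vigilance rho."""
    X = np.asarray(X, dtype=float)
    centers = [X[0].copy()]
    for x in X[1:]:
        dists = [np.linalg.norm(x - m) for m in centers]
        i = int(np.argmin(dists))
        if dists[i] > rho:
            centers.append(x.copy())               # m_{k+1} <- x^t
        else:
            centers[i] += eta * (x - centers[i])   # Delta m_i = eta * (x^t - m_i)
    return np.array(centers)
```
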
Self-Organizing Maps

Units have a neighborhood defined; m_i is between
m_{i-1} and m_{i+1}, and they are all updated together

One-dim map (Kohonen, 1990):

\Delta m_l = \eta \, e(l, i) \, (x^t - m_l)

e(l, i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(l - i)^2}{2\sigma^2}\right]

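A small sketch of the one-dimensional SOM update above, assuming a chain of units indexed 0..n_units-1 and the Gaussian neighborhood e(l, i); keeping sigma fixed over the epochs is a simplification.

```python
import numpy as np

def som_1d(X, n_units=10, eta=0.2, sigma=2.0, epochs=20, seed=0):
    """One-dimensional SOM: the winner and its chain neighbors are updated together."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=n_units, replace=False)].astype(float)
    idx = np.arange(n_units)
    for _ in range(epochs):
        for x in rng.permutation(X):
            i = np.argmin(np.linalg.norm(x - m, axis=1))   # winning unit
            e = np.exp(-(idx - i) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
            m += eta * e[:, None] * (x - m)                # Delta m_l = eta e(l, i)(x^t - m_l)
    return m
```
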
Radial-Basis Functions

Locally-tuned units:

p_h^t = \exp\left[-\frac{\|x^t - m_h\|^2}{2 s_h^2}\right]

y^t = \sum_{h=1}^{H} w_h p_h^t + w_0

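A minimal forward pass for the locally-tuned units above, assuming centers m (H x d), spreads s (H,), weights w (H,), and bias w0 are already given; the names are illustrative.

```python
import numpy as np

def rbf_forward(X, m, s, w, w0):
    """y^t = sum_h w_h p_h^t + w_0 with Gaussian units p_h^t."""
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)  # ||x^t - m_h||^2
    p = np.exp(-d2 / (2 * s ** 2))                           # p_h^t
    return p @ w + w0                                        # y^t
```
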
Local vs Distributed Representation
Training RBF

Hybrid learning:
  First layer (centers and spreads): unsupervised k-means
  Second layer (weights): supervised gradient descent

Fully supervised

(Broomhead and Lowe, 1988; Moody and Darken, 1989)

Regression

E\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2

y_i^t = \sum_{h=1}^{H} w_{ih} p_h^t + w_{i0}

\Delta w_{ih} = \eta \sum_t (r_i^t - y_i^t) \, p_h^t

\Delta m_{hj} = \eta \sum_t \left[\sum_i (r_i^t - y_i^t) \, w_{ih}\right] p_h^t \, \frac{x_j^t - m_{hj}}{s_h^2}

\Delta s_h = \eta \sum_t \left[\sum_i (r_i^t - y_i^t) \, w_{ih}\right] p_h^t \, \frac{\|x^t - m_h\|^2}{s_h^3}

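A sketch of the fully supervised updates above for a single output, assuming Gaussian units as in rbf_forward; the learning rate and epoch count are illustrative, and all parameters are updated by plain batch gradient descent.

```python
import numpy as np

def train_rbf_regression(X, r, m, s, eta=0.01, epochs=100):
    """Batch gradient descent on E = 1/2 sum_t (r^t - y^t)^2 for one output."""
    m = np.asarray(m, dtype=float).copy()
    s = np.asarray(s, dtype=float).copy()
    w, w0 = np.zeros(len(m)), 0.0
    for _ in range(epochs):
        diff = X[:, None, :] - m[None, :, :]        # x^t - m_h, shape (N, H, d)
        d2 = (diff ** 2).sum(axis=2)                # ||x^t - m_h||^2
        p = np.exp(-d2 / (2 * s ** 2))              # p_h^t
        err = r - (p @ w + w0)                      # r^t - y^t
        g = err[:, None] * w[None, :] * p           # (r^t - y^t) w_h p_h^t
        w += eta * p.T @ err                        # Delta w_h
        w0 += eta * err.sum()                       # Delta w_0
        m += eta * (g[:, :, None] * diff).sum(0) / s[:, None] ** 2   # Delta m_hj
        s += eta * (g * d2).sum(0) / s ** 3                          # Delta s_h
    return m, s, w, w0
```
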
Classification

E\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = -\sum_t \sum_i r_i^t \log y_i^t

y_i^t = \frac{\exp\left[\sum_h w_{ih} p_h^t + w_{i0}\right]}{\sum_k \exp\left[\sum_h w_{kh} p_h^t + w_{k0}\right]}

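For the classification case, a short sketch of computing the posteriors y_i^t by a softmax over the RBF layer; m, s, the weight matrix W (H x K), and the bias vector w0 (K,) are assumed given.

```python
import numpy as np

def rbf_classify(X, m, s, W, w0):
    """y_i^t: softmax over sum_h w_ih p_h^t + w_i0."""
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
    p = np.exp(-d2 / (2 * s ** 2))                 # p_h^t
    o = p @ W + w0                                 # sum_h w_ih p_h^t + w_i0
    o -= o.max(axis=1, keepdims=True)              # numerical stability
    e = np.exp(o)
    return e / e.sum(axis=1, keepdims=True)        # class posteriors y_i^t
```
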
Rules and Exceptions

y^t = \sum_{h=1}^{H} w_h p_h^t + v^T x^t + v_0

(the RBF units model the exceptions; the linear term v^T x^t + v_0 is the default rule)

Rule-Based Knowledge

IF (x_1 \approx a AND x_2 \approx b) OR (x_3 \approx c) THEN y = 0.1

p_1 = \exp\left[-\frac{(x_1 - a)^2}{2 s_1^2}\right] \exp\left[-\frac{(x_2 - b)^2}{2 s_2^2}\right] \quad \text{with } w_1 = 0.1

p_2 = \exp\left[-\frac{(x_3 - c)^2}{2 s_3^2}\right] \quad \text{with } w_2 = 0.1

Incorporation of prior knowledge (before training)
Rule extraction (after training) (Tresp et al., 1997)
Fuzzy membership functions and fuzzy rules

Normalized Basis Functions

g_h^t = \frac{p_h^t}{\sum_{l=1}^{H} p_l^t} = \frac{\exp\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_l \exp\left[-\|x^t - m_l\|^2 / (2 s_l^2)\right]}

y_i^t = \sum_{h=1}^{H} w_{ih} g_h^t

\Delta w_{ih} = \eta \sum_t (r_i^t - y_i^t) \, g_h^t

\Delta m_{hj} = \eta \sum_t \sum_i (r_i^t - y_i^t)(w_{ih} - y_i^t) \, g_h^t \, \frac{x_j^t - m_{hj}}{s_h^2}

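A small sketch of the normalized gating g_h^t and the resulting outputs, under the same assumed shapes as rbf_classify.

```python
import numpy as np

def normalized_rbf(X, m, s, W):
    """y_i^t = sum_h w_ih g_h^t with g_h^t = p_h^t / sum_l p_l^t."""
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
    p = np.exp(-d2 / (2 * s ** 2))           # p_h^t
    g = p / p.sum(axis=1, keepdims=True)     # normalized basis functions g_h^t
    return g @ W                             # y_i^t
```
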
Competitive Basis Functions

Mixture model:

p(r^t \mid x^t) = \sum_{h=1}^{H} p(h \mid x^t) \, p(r^t \mid h, x^t)

p(h \mid x^t) = \frac{p(x^t \mid h) \, p(h)}{\sum_l p(x^t \mid l) \, p(l)}

g_h^t = \frac{a_h \exp\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_l a_l \exp\left[-\|x^t - m_l\|^2 / (2 s_l^2)\right]}

Regression

p(r_i^t \mid h, x^t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(r_i^t - y_{ih}^t)^2}{2\sigma^2}\right]

L\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \sum_t \log \sum_h g_h^t \exp\left[-\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2\right]

y_{ih}^t = w_{ih} is the constant fit

\Delta w_{ih} = \eta \sum_t (r_i^t - y_{ih}^t) \, f_h^t

\Delta m_{hj} = \eta \sum_t (f_h^t - g_h^t) \, \frac{x_j^t - m_{hj}}{s_h^2}

f_h^t = \frac{g_h^t \exp\left[-\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2\right]}{\sum_l g_l^t \exp\left[-\frac{1}{2} \sum_i (r_i^t - y_{il}^t)^2\right]}

f_h^t \equiv p(h \mid r^t, x^t) = \frac{p(h \mid x^t) \, p(r^t \mid h, x^t)}{\sum_l p(l \mid x^t) \, p(r^t \mid l, x^t)}

Classification

L\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \sum_t \log \sum_h g_h^t \prod_i \left(y_{ih}^t\right)^{r_i^t}

  = \sum_t \log \sum_h g_h^t \exp\left[\sum_i r_i^t \log y_{ih}^t\right]

y_{ih}^t = \frac{\exp w_{ih}}{\sum_k \exp w_{kh}}

f_h^t = \frac{g_h^t \exp\left[\sum_i r_i^t \log y_{ih}^t\right]}{\sum_l g_l^t \exp\left[\sum_i r_i^t \log y_{il}^t\right]}

EM for RBF (Supervised EM)

E-step:

f_h^t \leftarrow p(h \mid r^t, x^t)

M-step:

m_h = \frac{\sum_t f_h^t x^t}{\sum_t f_h^t}

S_h = \frac{\sum_t f_h^t (x^t - m_h)(x^t - m_h)^T}{\sum_t f_h^t}

w_{ih} = \frac{\sum_t f_h^t r_i^t}{\sum_t f_h^t}

Learning Vector Quantization

H units per class, prelabeled (Kohonen, 1990)

Given x^t, m_i is the closest:

\Delta m_i = \eta (x^t - m_i) \quad \text{if label}(x^t) = \text{label}(m_i)
\Delta m_i = -\eta (x^t - m_i) \quad \text{otherwise}

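A minimal sketch of one LVQ epoch, assuming prelabeled prototypes m with labels m_labels; eta is illustrative.

```python
import numpy as np

def lvq_epoch(X, y, m, m_labels, eta=0.05):
    """Pull the closest prototype toward x if the labels match, push it away otherwise."""
    m = np.asarray(m, dtype=float)
    for x, label in zip(X, y):
        i = np.argmin(np.linalg.norm(x - m, axis=1))  # closest prototype
        sign = 1.0 if m_labels[i] == label else -1.0
        m[i] += sign * eta * (x - m[i])               # Delta m_i
    return m
```
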
Mixture of Experts

In RBF, each local fit is a constant, w_{ih}, the
second-layer weight

In MoE, each local fit is a linear function of x,
a local expert:

w_{ih}^t = v_{ih}^T x^t

(Jacobs et al., 1991)

MoE as Models Combined

Radial gating:

g_h^t = \frac{\exp\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_l \exp\left[-\|x^t - m_l\|^2 / (2 s_l^2)\right]}

Softmax gating:

g_h^t = \frac{\exp\left(m_h^T x^t\right)}{\sum_l \exp\left(m_l^T x^t\right)}

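A short sketch of a mixture-of-experts forward pass with softmax gating and linear experts, following the formulas above; the expert matrix V (H x d) and gating matrix M (H x d) are assumed given, with a single output for simplicity.

```python
import numpy as np

def moe_forward(X, V, M):
    """y^t = sum_h g_h^t (v_h^T x^t) with softmax gating g_h^t."""
    experts = X @ V.T                            # local experts w_h^t = v_h^T x^t
    scores = X @ M.T                             # m_h^T x^t
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    g = np.exp(scores)
    g /= g.sum(axis=1, keepdims=True)            # gating g_h^t
    return (g * experts).sum(axis=1)             # gated average of the experts
```
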
Cooperative MoE

Regression:

E\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2

\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t) \, g_h^t \, x^t

\Delta m_{hj} = \eta \sum_t \sum_i (r_i^t - y_i^t)(w_{ih}^t - y_i^t) \, g_h^t \, x_j^t

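A sketch of one epoch of the cooperative updates above for a single output, reusing the softmax-gating/linear-expert setup of moe_forward; eta is illustrative and V, M are float arrays modified in place.

```python
import numpy as np

def cooperative_moe_epoch(X, r, V, M, eta=0.01):
    """Gradient updates Delta v_h and Delta m_h for cooperative MoE regression."""
    experts = X @ V.T                            # w_h^t = v_h^T x^t
    scores = X @ M.T
    scores -= scores.max(axis=1, keepdims=True)
    g = np.exp(scores)
    g /= g.sum(axis=1, keepdims=True)            # g_h^t
    y = (g * experts).sum(axis=1, keepdims=True) # y^t
    err = r[:, None] - y                         # r^t - y^t
    V += eta * (err * g).T @ X                   # Delta v_h = eta (r - y) g_h x
    M += eta * (err * (experts - y) * g).T @ X   # Delta m_h = eta (r - y)(w_h - y) g_h x
    return V, M
```
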
Competitive MoE: Regression

L\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \sum_t \log \sum_h g_h^t \exp\left[-\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2\right]

y_{ih}^t = w_{ih}^t = v_{ih}^T x^t

\Delta v_{ih} = \eta \sum_t (r_i^t - y_{ih}^t) \, f_h^t \, x^t

\Delta m_h = \eta \sum_t (f_h^t - g_h^t) \, x^t

Competitive MoE: Classification

L\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \sum_t \log \sum_h g_h^t \prod_i \left(y_{ih}^t\right)^{r_i^t}

  = \sum_t \log \sum_h g_h^t \exp\left[\sum_i r_i^t \log y_{ih}^t\right]

y_{ih}^t = \frac{\exp\left(v_{ih}^T x^t\right)}{\sum_k \exp\left(v_{kh}^T x^t\right)}

Hierarchical Mixture of Experts

Tree of MoE where each MoE is an expert in a
higher-level MoE

Soft decision tree: takes a weighted (gating)
average of all leaves (experts), as opposed to
using a single path and a single leaf

Can be trained using EM (Jordan and Jacobs, 1994)
