
Lecture Slides for

INTRODUCTION
TO
MACHINE
LEARNING
3RD EDITION
ETHEM ALPAYDIN
The MIT Press, 2014

alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 12:

LOCAL MODELS
Introduction

Divide the input space into local regions and learn
simple (constant/linear) models in each patch

Unsupervised: Competitive, online clustering

Supervised: Radial-basis functions, mixture of experts

Competitive Learning

E\left(\{m_i\}_{i=1}^{k} \mid X\right) = \sum_t \sum_i b_i^t \, \|x^t - m_i\|^2

b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_l \|x^t - m_l\| \\ 0 & \text{otherwise} \end{cases}

Batch k-means:  m_i = \frac{\sum_t b_i^t x^t}{\sum_t b_i^t}

Online k-means:  \Delta m_{ij} = -\eta \frac{\partial E^t}{\partial m_{ij}} = \eta \, b_i^t (x_j^t - m_{ij})

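As a concrete illustration of the online k-means update above, a minimal NumPy sketch; the function name, learning rate eta, epoch count, and random initialization are illustrative assumptions, not part of the slides.

```python
import numpy as np

def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
    """Online k-means: move only the winning center toward each instance."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # initial centers
    for _ in range(epochs):
        for x in rng.permutation(X):
            i = np.argmin(np.linalg.norm(x - m, axis=1))  # winner (b_i^t = 1)
            m[i] += eta * (x - m[i])                      # Delta m_i = eta * (x^t - m_i)
    return m
```
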
Winner-take-all network

Adaptive Resonance Theory

Incremental; add a new cluster if the instance is not covered;
coverage is defined by the vigilance, \rho:

b_i = \|x^t - m_i\| = \min_{l=1}^{k} \|x^t - m_l\|

m_{k+1} \leftarrow x^t \quad \text{if } b_i > \rho
\Delta m_i = \eta (x^t - m_i) \quad \text{otherwise}

(Carpenter and Grossberg, 1988)

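A rough sketch of the incremental, vigilance-based clustering described above, assuming Euclidean distance; the update rate eta and the list-based storage are illustrative choices.

```python
import numpy as np

def art_like_clustering(X, rho, eta=0.5):
    """Add a new center whenever the nearest center is farther than the vigilance rho."""
    X = np.asarray(X, dtype=float)
    centers = [X[0].copy()]
    for x in X[1:]:
        dists = [np.linalg.norm(x - m) for m in centers]
        i = int(np.argmin(dists))
        if dists[i] > rho:
            centers.append(x.copy())               # m_{k+1} <- x^t
        else:
            centers[i] += eta * (x - centers[i])   # Delta m_i = eta * (x^t - m_i)
    return np.array(centers)
```
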
Self-Organizing Maps

Units have a neighborhood defined; m_i is between
m_{i-1} and m_{i+1}, and they are all updated together

One-dim map (Kohonen, 1990):

\Delta m_l = \eta \, e(l, i) \, (x^t - m_l)

e(l, i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(l - i)^2}{2\sigma^2}\right]

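A small sketch of the one-dimensional SOM update above, assuming a chain of units indexed 0..n_units-1 and the Gaussian neighborhood e(l, i); keeping sigma fixed over the epochs is a simplification.

```python
import numpy as np

def som_1d(X, n_units=10, eta=0.2, sigma=2.0, epochs=20, seed=0):
    """One-dimensional SOM: the winner and its chain neighbors are updated together."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=n_units, replace=False)].astype(float)
    idx = np.arange(n_units)
    for _ in range(epochs):
        for x in rng.permutation(X):
            i = np.argmin(np.linalg.norm(x - m, axis=1))   # winning unit
            e = np.exp(-(idx - i) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
            m += eta * e[:, None] * (x - m)                # Delta m_l = eta e(l, i)(x^t - m_l)
    return m
```
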
Radial-Basis Functions

Locally-tuned units:

p_h^t = \exp\left[-\frac{\|x^t - m_h\|^2}{2 s_h^2}\right]

y^t = \sum_{h=1}^{H} w_h p_h^t + w_0

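A minimal forward pass for the locally-tuned units above, assuming centers m (H x d), spreads s (H,), weights w (H,), and bias w0 are already given; the names are illustrative.

```python
import numpy as np

def rbf_forward(X, m, s, w, w0):
    """y^t = sum_h w_h p_h^t + w_0 with Gaussian units p_h^t."""
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)  # ||x^t - m_h||^2
    p = np.exp(-d2 / (2 * s ** 2))                           # p_h^t
    return p @ w + w0                                        # y^t
```
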
Local vs Distributed Representation
Training RBF

Hybrid learning:
  First layer (centers and spreads): unsupervised k-means
  Second layer (weights): supervised gradient descent

Fully supervised

(Broomhead and Lowe, 1988; Moody and Darken, 1989)

Regression

E\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2

y_i^t = \sum_{h=1}^{H} w_{ih} p_h^t + w_{i0}

\Delta w_{ih} = \eta \sum_t (r_i^t - y_i^t) \, p_h^t

\Delta m_{hj} = \eta \sum_t \left[\sum_i (r_i^t - y_i^t) \, w_{ih}\right] p_h^t \, \frac{x_j^t - m_{hj}}{s_h^2}

\Delta s_h = \eta \sum_t \left[\sum_i (r_i^t - y_i^t) \, w_{ih}\right] p_h^t \, \frac{\|x^t - m_h\|^2}{s_h^3}

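A sketch of the fully supervised updates above for a single output, assuming Gaussian units as in rbf_forward; the learning rate and epoch count are illustrative, and all parameters are updated by plain batch gradient descent.

```python
import numpy as np

def train_rbf_regression(X, r, m, s, eta=0.01, epochs=100):
    """Batch gradient descent on E = 1/2 sum_t (r^t - y^t)^2 for one output."""
    m = np.asarray(m, dtype=float).copy()
    s = np.asarray(s, dtype=float).copy()
    w, w0 = np.zeros(len(m)), 0.0
    for _ in range(epochs):
        diff = X[:, None, :] - m[None, :, :]        # x^t - m_h, shape (N, H, d)
        d2 = (diff ** 2).sum(axis=2)                # ||x^t - m_h||^2
        p = np.exp(-d2 / (2 * s ** 2))              # p_h^t
        err = r - (p @ w + w0)                      # r^t - y^t
        g = err[:, None] * w[None, :] * p           # (r^t - y^t) w_h p_h^t
        w += eta * p.T @ err                        # Delta w_h
        w0 += eta * err.sum()                       # Delta w_0
        m += eta * (g[:, :, None] * diff).sum(0) / s[:, None] ** 2   # Delta m_hj
        s += eta * (g * d2).sum(0) / s ** 3                          # Delta s_h
    return m, s, w, w0
```
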
Classification

E\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = -\sum_t \sum_i r_i^t \log y_i^t

y_i^t = \frac{\exp\left[\sum_h w_{ih} p_h^t + w_{i0}\right]}{\sum_k \exp\left[\sum_h w_{kh} p_h^t + w_{k0}\right]}

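For the classification case, a short sketch of computing the posteriors y_i^t by a softmax over the RBF layer; m, s, the weight matrix W (H x K), and the bias vector w0 (K,) are assumed given.

```python
import numpy as np

def rbf_classify(X, m, s, W, w0):
    """y_i^t: softmax over sum_h w_ih p_h^t + w_i0."""
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
    p = np.exp(-d2 / (2 * s ** 2))                 # p_h^t
    o = p @ W + w0                                 # sum_h w_ih p_h^t + w_i0
    o -= o.max(axis=1, keepdims=True)              # numerical stability
    e = np.exp(o)
    return e / e.sum(axis=1, keepdims=True)        # class posteriors y_i^t
```
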
Rules and Exceptions

y^t = \sum_{h=1}^{H} w_h p_h^t + v^T x^t + v_0

(the RBF units model the exceptions; the linear term v^T x^t + v_0 is the default rule)

Rule-Based Knowledge

IF (x_1 \approx a AND x_2 \approx b) OR (x_3 \approx c) THEN y = 0.1

p_1 = \exp\left[-\frac{(x_1 - a)^2}{2 s_1^2}\right] \exp\left[-\frac{(x_2 - b)^2}{2 s_2^2}\right] \quad \text{with } w_1 = 0.1

p_2 = \exp\left[-\frac{(x_3 - c)^2}{2 s_3^2}\right] \quad \text{with } w_2 = 0.1

Incorporation of prior knowledge (before training)
Rule extraction (after training) (Tresp et al., 1997)
Fuzzy membership functions and fuzzy rules

Normalized Basis Functions

g_h^t = \frac{p_h^t}{\sum_{l=1}^{H} p_l^t} = \frac{\exp\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_l \exp\left[-\|x^t - m_l\|^2 / (2 s_l^2)\right]}

y_i^t = \sum_{h=1}^{H} w_{ih} g_h^t

\Delta w_{ih} = \eta \sum_t (r_i^t - y_i^t) \, g_h^t

\Delta m_{hj} = \eta \sum_t \sum_i (r_i^t - y_i^t)(w_{ih} - y_i^t) \, g_h^t \, \frac{x_j^t - m_{hj}}{s_h^2}

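A small sketch of the normalized gating g_h^t and the resulting outputs, under the same assumed shapes as rbf_classify.

```python
import numpy as np

def normalized_rbf(X, m, s, W):
    """y_i^t = sum_h w_ih g_h^t with g_h^t = p_h^t / sum_l p_l^t."""
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
    p = np.exp(-d2 / (2 * s ** 2))           # p_h^t
    g = p / p.sum(axis=1, keepdims=True)     # normalized basis functions g_h^t
    return g @ W                             # y_i^t
```
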
Competitive Basis Functions

Mixture model:

p(r^t \mid x^t) = \sum_{h=1}^{H} p(h \mid x^t) \, p(r^t \mid h, x^t)

p(h \mid x^t) = \frac{p(x^t \mid h) \, p(h)}{\sum_l p(x^t \mid l) \, p(l)}

g_h^t = \frac{a_h \exp\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_l a_l \exp\left[-\|x^t - m_l\|^2 / (2 s_l^2)\right]}

Regression

p(r_i^t \mid h, x^t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(r_i^t - y_{ih}^t)^2}{2\sigma^2}\right]

L\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \sum_t \log \sum_h g_h^t \exp\left[-\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2\right]

y_{ih}^t = w_{ih} is the constant fit

\Delta w_{ih} = \eta \sum_t (r_i^t - y_{ih}^t) \, f_h^t

\Delta m_{hj} = \eta \sum_t (f_h^t - g_h^t) \, \frac{x_j^t - m_{hj}}{s_h^2}

f_h^t = \frac{g_h^t \exp\left[-\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2\right]}{\sum_l g_l^t \exp\left[-\frac{1}{2} \sum_i (r_i^t - y_{il}^t)^2\right]}

f_h^t \equiv p(h \mid r^t, x^t) = \frac{p(h \mid x^t) \, p(r^t \mid h, x^t)}{\sum_l p(l \mid x^t) \, p(r^t \mid l, x^t)}

Classification

L\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \sum_t \log \sum_h g_h^t \prod_i \left(y_{ih}^t\right)^{r_i^t}

  = \sum_t \log \sum_h g_h^t \exp\left[\sum_i r_i^t \log y_{ih}^t\right]

y_{ih}^t = \frac{\exp w_{ih}}{\sum_k \exp w_{kh}}

f_h^t = \frac{g_h^t \exp\left[\sum_i r_i^t \log y_{ih}^t\right]}{\sum_l g_l^t \exp\left[\sum_i r_i^t \log y_{il}^t\right]}

EM for RBF (Supervised EM)

E-step:

f_h^t \leftarrow p(h \mid r^t, x^t)

M-step:

m_h = \frac{\sum_t f_h^t x^t}{\sum_t f_h^t}

S_h = \frac{\sum_t f_h^t (x^t - m_h)(x^t - m_h)^T}{\sum_t f_h^t}

w_{ih} = \frac{\sum_t f_h^t r_i^t}{\sum_t f_h^t}

Learning Vector Quantization

H units per class, prelabeled (Kohonen, 1990)

Given x^t, m_i is the closest:

\Delta m_i = \eta (x^t - m_i) \quad \text{if label}(x^t) = \text{label}(m_i)
\Delta m_i = -\eta (x^t - m_i) \quad \text{otherwise}

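A minimal sketch of one LVQ epoch, assuming prelabeled prototypes m with labels m_labels; eta is illustrative.

```python
import numpy as np

def lvq_epoch(X, y, m, m_labels, eta=0.05):
    """Pull the closest prototype toward x if the labels match, push it away otherwise."""
    m = np.asarray(m, dtype=float)
    for x, label in zip(X, y):
        i = np.argmin(np.linalg.norm(x - m, axis=1))  # closest prototype
        sign = 1.0 if m_labels[i] == label else -1.0
        m[i] += sign * eta * (x - m[i])               # Delta m_i
    return m
```
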
Mixture of Experts

In RBF, each local fit is a constant, w_{ih}, the
second-layer weight

In MoE, each local fit is a linear function of x,
a local expert:

w_{ih}^t = v_{ih}^T x^t

(Jacobs et al., 1991)

MoE as Models Combined

Radial gating:

g_h^t = \frac{\exp\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_l \exp\left[-\|x^t - m_l\|^2 / (2 s_l^2)\right]}

Softmax gating:

g_h^t = \frac{\exp\left(m_h^T x^t\right)}{\sum_l \exp\left(m_l^T x^t\right)}

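A short sketch of a mixture-of-experts forward pass with softmax gating and linear experts, following the formulas above; the expert matrix V (H x d) and gating matrix M (H x d) are assumed given, with a single output for simplicity.

```python
import numpy as np

def moe_forward(X, V, M):
    """y^t = sum_h g_h^t (v_h^T x^t) with softmax gating g_h^t."""
    experts = X @ V.T                            # local experts w_h^t = v_h^T x^t
    scores = X @ M.T                             # m_h^T x^t
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    g = np.exp(scores)
    g /= g.sum(axis=1, keepdims=True)            # gating g_h^t
    return (g * experts).sum(axis=1)             # gated average of the experts
```
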
Cooperative MoE

Regression:

E\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2

\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t) \, g_h^t \, x^t

\Delta m_{hj} = \eta \sum_t \sum_i (r_i^t - y_i^t)(w_{ih}^t - y_i^t) \, g_h^t \, x_j^t

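A sketch of one epoch of the cooperative updates above for a single output, reusing the softmax-gating/linear-expert setup of moe_forward; eta is illustrative and V, M are float arrays modified in place.

```python
import numpy as np

def cooperative_moe_epoch(X, r, V, M, eta=0.01):
    """Gradient updates Delta v_h and Delta m_h for cooperative MoE regression."""
    experts = X @ V.T                            # w_h^t = v_h^T x^t
    scores = X @ M.T
    scores -= scores.max(axis=1, keepdims=True)
    g = np.exp(scores)
    g /= g.sum(axis=1, keepdims=True)            # g_h^t
    y = (g * experts).sum(axis=1, keepdims=True) # y^t
    err = r[:, None] - y                         # r^t - y^t
    V += eta * (err * g).T @ X                   # Delta v_h = eta (r - y) g_h x
    M += eta * (err * (experts - y) * g).T @ X   # Delta m_h = eta (r - y)(w_h - y) g_h x
    return V, M
```
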
Competitive MoE: Regression

L\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \sum_t \log \sum_h g_h^t \exp\left[-\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2\right]

y_{ih}^t = w_{ih}^t = v_{ih}^T x^t

\Delta v_{ih} = \eta \sum_t (r_i^t - y_{ih}^t) \, f_h^t \, x^t

\Delta m_h = \eta \sum_t (f_h^t - g_h^t) \, x^t

Competitive MoE: Classification

L\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \sum_t \log \sum_h g_h^t \prod_i \left(y_{ih}^t\right)^{r_i^t}

  = \sum_t \log \sum_h g_h^t \exp\left[\sum_i r_i^t \log y_{ih}^t\right]

y_{ih}^t = \frac{\exp\left(v_{ih}^T x^t\right)}{\sum_k \exp\left(v_{kh}^T x^t\right)}

Hierarchical Mixture of Experts

Tree of MoE where each MoE is an expert in a
higher-level MoE

Soft decision tree: takes a weighted (gating)
average of all leaves (experts), as opposed to
using a single path and a single leaf

Can be trained using EM (Jordan and Jacobs, 1994)
