Neural Networks
Lecture 3: Multi-Layer Perceptron
H. A. Talebi
Farzaneh Abdollahi
Winter 2011
Outline
- Linearly Nonseparable Patterns
- Error Back-Propagation Algorithm
- Universal Approximator
- Learning Factors
- Adaptive MLP
Example: XOR
- XOR is linearly nonseparable:

x1  x2  Output
 1   1     1
 1   0    -1
 0   1    -1
 0   0     1
- The set of linearly nonseparable patterns is mapped into an image space where it becomes linearly separable (see the sketch below).
- This can be done by properly selecting the weights of the first layer(s); in the next layer the patterns can then be easily classified.
- The pattern parameters and the centers of the clusters are not always known a priori.
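A minimal sketch of this idea for the table above, assuming bipolar signum activations and hand-picked (not trained) weights; the thresholds and variable names are illustrative only:

```python
import numpy as np

def sgn(x):
    # bipolar signum activation: +1 or -1
    return np.where(x >= 0, 1.0, -1.0)

# The four input patterns (x1, x2) and target outputs from the table
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]], dtype=float)
d = np.array([1, -1, -1, 1], dtype=float)

# Hand-picked first-layer weights (assumed for illustration):
# h1 fires when x1 + x2 > 0.5 (OR-like), h2 when x1 + x2 > 1.5 (AND-like)
V = np.array([[1.0, 1.0, -0.5],
              [1.0, 1.0, -1.5]])   # last column: bias weight
# Output layer separates the images: o = sgn(-h1 + h2 + 0.5)
w = np.array([-1.0, 1.0, 0.5])

for x, target in zip(X, d):
    h = sgn(V @ np.append(x, 1.0))   # image of the pattern
    o = sgn(w @ np.append(h, 1.0))   # now linearly separable
    print(x, "->", h, "->", o, "(target", target, ")")
```

In the image space the four patterns collapse to three points, (-1,-1), (+1,-1), and (+1,+1), which a single line separates.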
- Usually the mapping error is computed over the full training set.
- The EBP algorithm works in two stages: a feedforward recall stage, in which the input pattern propagates through the network and the outputs are computed, and a back-propagation stage, in which the output error propagates backward and the weights are adjusted.
Multilayer Perceptron
- Input vector $z$, output vector $o$
- Output of the first layer, input of the hidden layer: $y$
- Activation function: $\Gamma(.) = \mathrm{diag}\{f(.)\}$
Feedforward Recall
- $o = \Gamma[W \, \Gamma[V z]]$
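A minimal sketch of this recall step, assuming tanh as the (bipolar, continuous) activation $f$; the layer sizes are illustrative only:

```python
import numpy as np

def recall(V, W, z):
    """Feedforward recall o = Gamma[W Gamma[V z]] with f = tanh."""
    y = np.tanh(V @ z)   # first (hidden) layer output
    o = np.tanh(W @ y)   # output layer
    return o

# Example with assumed sizes: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
V = rng.uniform(-0.5, 0.5, size=(4, 3))
W = rng.uniform(-0.5, 0.5, size=(2, 4))
print(recall(V, W, np.array([0.2, -0.1, 0.4])))
```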
Back-Propagation Training
- $p$: the pth training pattern
- $d_p$: desired output for the pth pattern
Back-Propagation Training
- $net_k = \sum_{j=1}^{J} w_{kj} y_j$
- $o_k = f(net_k)$
- $\Delta v_{ji} = -\eta \frac{\partial E}{\partial v_{ji}}$, where $\frac{\partial E}{\partial v_{ji}} = \frac{\partial E}{\partial net_j} \frac{\partial net_j}{\partial v_{ji}}$, for $i = 1, \ldots, I$, $j = 1, \ldots, J$
- $net_j = \sum_{i=1}^{I} v_{ji} z_i$, so $\frac{\partial net_j}{\partial v_{ji}} = z_i$
- $\delta_{yj} = -\frac{\partial E}{\partial net_j}$, where $y_j = f(net_j)$, for $j = 1, \ldots, J$ is the error signal of the hidden layer
- $\delta_{yj} = -\frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial net_j}$, with $\frac{\partial y_j}{\partial net_j} = f'(net_j)$
- $-\frac{\partial E}{\partial y_j} = \sum_{k=1}^{R} (d_k - o_k) f'(net_k) \frac{\partial net_k}{\partial y_j} = \sum_{k=1}^{R} \delta_{ok} w_{kj}$, since $\frac{\partial net_k}{\partial y_j} = w_{kj}$, where $\delta_{ok} = (d_k - o_k) f'(net_k)$ is the error signal of the output layer
- Therefore
$$\frac{\partial E}{\partial v_{ji}} = -f'(net_j)\, z_i \sum_{k=1}^{R} \delta_{ok} w_{kj} \quad (1)$$
- $\Delta v_{ji} = \eta\, \delta_{yj}\, z_i$, i.e. $\Delta V = \eta\, \delta_y z^T \quad (2)$
- The delta training rule of the output layer and the generalized delta learning rule of the hidden layer have a fairly uniform formula: the weight change is the learning constant times the layer's error signal times the layer's input.
- But the error signals differ: for the output layer $\delta_{ok}$ is computed directly from the output error $(d_k - o_k)$, while for the hidden layer $\delta_{yj}$ must be back-propagated through the output weights $w_{kj}$.
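Putting rules (1) and (2) together, here is a minimal sketch of one EBP training sweep for a single-hidden-layer network, assuming tanh activations (so $f'(net) = 1 - f(net)^2$) and biases folded into the weight matrices; the names and sizes are illustrative, not from the slides:

```python
import numpy as np

def f(x):  return np.tanh(x)      # bipolar continuous activation
def fp(y): return 1.0 - y**2      # f'(net) written in terms of y = f(net)

def ebp_epoch(V, W, Z, D, eta=0.1):
    """One sweep of error back-propagation over all patterns (in place).
    V: hidden weights (J x I), W: output weights (R x J),
    Z: inputs (P x I), D: desired outputs (P x R)."""
    for z, d in zip(Z, D):
        # feedforward recall
        y = f(V @ z)                         # hidden layer output
        o = f(W @ y)                         # output layer
        # error signals
        delta_o = (d - o) * fp(o)            # output layer
        delta_y = fp(y) * (W.T @ delta_o)    # hidden layer, rule (1)
        # generalized delta rule, rule (2)
        W += eta * np.outer(delta_o, y)
        V += eta * np.outer(delta_y, z)
    return V, W
```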
Back-Propagation Training
- After all training patterns have been applied, the learning procedure stops when the final error is below the upper bound $E_{max}$.
- In the figure on the next page, the shaded path refers to the feedforward path and the blank path is the back-propagation (BP) mode.
Universal Approximator
- Assume $P$ samples $\{x_1, \ldots, x_P\}$ of the function are known.
- A staircase approximation $H(w, x)$ of the continuous function $h(x)$ is obtained.
Kolmogorov Theorem
Any continuous function $f(x_1, \ldots, x_n)$ of several variables defined on $I^n$ ($n \ge 2$), where $I = [0, 1]$, can be represented in the form
$$f(x) = \sum_{j=1}^{2n+1} \Phi_j\!\left(\sum_{i=1}^{n} \psi_{ij}(x_i)\right)$$
Hecht-Nielsen Theorem:
Given any continuous function $f : I^n \rightarrow R^m$, where $I$ is the closed unit interval $[0, 1]$, $f$ can be represented exactly by a feedforward neural network having $n$ input units, $2n + 1$ hidden units, and $m$ output units. The activation function of the jth hidden unit is
$$z_j = \sum_{i=1}^{n} \lambda_i \, \psi(x_i + \epsilon j) + j$$
Cybenko Theorem: Let $I_n$ denote the n-dimensional unit cube, $[0, 1]^n$. The space of continuous functions on $I_n$ is denoted by $C(I_n)$. Let $g$ be any continuous sigmoidal function of the form
$$g(t) \rightarrow \begin{cases} 1 & \text{as } t \rightarrow +\infty \\ 0 & \text{as } t \rightarrow -\infty \end{cases}$$
Then the finite sums of the form
$$F(x) = \sum_{i=1}^{N} v_i \, g\!\left(\sum_{j=1}^{n} w_{ij} x_j + \theta_i\right)$$
are dense in $C(I_n)$. In other words, given any $f \in C(I_n)$ and $\epsilon > 0$, there is a sum $F(x)$ of the above form for which
$$|F(x) - f(x)| < \epsilon \quad \forall x \in I_n$$
- $\theta_i$ is the bias
- $w_{ij}$ are the weights of the input layer
- $v_i$ are the output layer weights
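To see the density claim in action, here is a minimal numerical sketch, assuming a logistic sigmoid for $g$, randomly drawn inner weights $w_i, \theta_i$, and outer coefficients $v_i$ fitted by least squares; all names and values are illustrative, not from the slides:

```python
import numpy as np

def g(t):
    # continuous sigmoidal function: g -> 1 as t -> +inf, g -> 0 as t -> -inf
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(1)
N = 50                                    # number of sigmoidal terms
x = np.linspace(0.0, 1.0, 200)            # the unit interval I_1
f = np.sin(2 * np.pi * x)                 # a continuous target on I_1

w = rng.uniform(-20, 20, N)               # inner weights (assumed random)
theta = rng.uniform(-20, 20, N)           # biases
G = g(np.outer(x, w) + theta)             # G[p, i] = g(w_i * x_p + theta_i)
v, *_ = np.linalg.lstsq(G, f, rcond=None) # fit outer coefficients v_i

print("max |F(x) - f(x)| =", np.max(np.abs(G @ v - f)))
```

Increasing $N$ drives the maximum deviation down, which is exactly what density in $C(I_n)$ promises.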
The mapping can nevertheless fail due to:
- Inadequate learning
- Inadequate number of hidden neurons
- Lack of a deterministic relationship between the input and target output
Example
- Consider a three-neuron network with bipolar activation functions.
- Objective: estimating the function which computes the length of the input vector, $d = \sqrt{o_1^2 + o_2^2}$
- $o_5 = \Gamma[W \, \Gamma[V o]]$, where $o = [1 \; o_1 \; o_2]^T$
- Inputs $o_1, o_2$ are chosen such that $0 < o_i < 0.7$ for $i = 1, 2$
- $\eta = 0.2$
- The weights are $W = [0.03 \; 3.66 \; 2.73]^T$ and $V = \begin{bmatrix} 1.29 & 3.04 & 1.54 \\ 0.97 & 2.61 & 0.52 \end{bmatrix}$
- The magnitude of the error associated with each training pattern is shown on the error surface.
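As a check on the recall path, a minimal sketch evaluating $o_5 = \Gamma[W \, \Gamma[V o]]$ with the weight values as printed; tanh as the bipolar activation and the bias placement are assumptions:

```python
import numpy as np

def recall(o1, o2):
    """Recall o5 = Gamma[W Gamma[V o]] with the printed weights."""
    V = np.array([[1.29, 3.04, 1.54],
                  [0.97, 2.61, 0.52]])
    W = np.array([0.03, 3.66, 2.73])
    o = np.array([1.0, o1, o2])           # augmented input with bias entry 1
    y = np.tanh(V @ o)                    # two hidden neurons
    y_aug = np.concatenate(([1.0], y))    # augmented hidden output
    return np.tanh(W @ y_aug)             # third (output) neuron

o1, o2 = 0.3, 0.4
print("network:", recall(o1, o2), " true length:", np.hypot(o1, o2))
```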
- $\eta = 0.4$
- The weights are 0.64, 2.94, 3.51, 0.11, 0.03, 0.58, 0.01, 0.89.
Initial Weights
- They should be chosen such that they do not make the activation function or its derivative zero.
- If all weights start with equal values, the network may not train properly.
- Some improper initial weights may result in increasing errors and decreasing the quality of the mapping.
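A choice consistent with these points is small random values drawn from a symmetric range; a minimal sketch (the range 0.5 is an assumption, not from the slides):

```python
import numpy as np

def init_weights(rows, cols, spread=0.5, seed=None):
    """Small random initial weights: nonzero, mutually unequal, and far
    from the saturated region where f'(net) is near zero."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-spread, spread, size=(rows, cols))

V = init_weights(4, 3, seed=0)   # hidden layer: e.g. 4 neurons, 3 inputs
W = init_weights(2, 4, seed=1)   # output layer: 2 neurons, 4 hidden inputs
```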
Error
- In the delta rule algorithm, the cumulative error over the training set is calculated:
$$E = \frac{1}{2} \sum_{p=1}^{P} \sum_{k=1}^{R} (d_{pk} - o_{pk})^2$$
- Sometimes it is recommended to use the root-mean-square error instead:
$$E_{rms} = \frac{1}{PR} \sqrt{\sum_{p=1}^{P} \sum_{k=1}^{R} (d_{pk} - o_{pk})^2}$$
- For discrete outputs, the decision error counts the misclassified bits: $E_d = \frac{N_{err}}{PR}$, where $N_{err}$ is the number of bit errors.
- $E_{max}$ for discrete output can be zero, but for continuous output it may not be.
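A minimal sketch computing these three measures for a batch of targets and outputs; the bipolar decision threshold of 0 is an assumption:

```python
import numpy as np

def errors(D, O, threshold=0.0):
    """Cumulative, RMS, and decision errors for targets D and outputs O,
    both of shape (P, R)."""
    P, R = D.shape
    E = 0.5 * np.sum((D - O) ** 2)                    # cumulative error
    E_rms = np.sqrt(np.sum((D - O) ** 2)) / (P * R)   # root-mean-square error
    N_err = np.sum(np.sign(O - threshold) != D)       # misclassified bits
    E_d = N_err / (P * R)                             # decision error
    return E, E_rms, E_d

D = np.array([[1, -1], [-1, 1]], dtype=float)
O = np.array([[0.9, -0.8], [0.7, 0.6]], dtype=float)
print(errors(D, O))
```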
- Note that using a large number of neurons and layers may cause overfitting and decrease the generalization capability.

Number of Hidden Layers
Learning Constant
- When broad minima yield small gradient values, a larger learning constant $\eta$ makes the convergence more rapid.
Delta-Bar-Delta
- Each weight is given its own learning constant $\eta_{ij}$.
- The updating rule for the weight is: $w_{ij}(n+1) = w_{ij}(n) - \eta_{ij}(n+1) \frac{\partial E(n)}{\partial w_{ij}(n)}$
- The learning constant itself is adapted along its own gradient, using $\frac{\partial E(n)}{\partial \eta_{ij}(n)}$.
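A minimal sketch of this idea: by the chain rule, $\frac{\partial E(n)}{\partial \eta_{ij}} = -\frac{\partial E(n)}{\partial w_{ij}} \cdot \frac{\partial E(n-1)}{\partial w_{ij}}$, so the per-weight rate grows when two consecutive gradients agree in sign and shrinks when they disagree; the step $\gamma$ and the clipping range are assumptions:

```python
import numpy as np

def delta_bar_delta_step(w, eta, grad, grad_prev, gamma=0.01):
    """One update with per-weight learning constants eta (same shape as w).
    eta moves opposite to dE/d(eta) = -grad * grad_prev."""
    eta = eta + gamma * grad * grad_prev   # grow if gradients agree in sign
    eta = np.clip(eta, 1e-4, 1.0)          # keep rates in a sane range (assumed)
    w = w - eta * grad                     # per-weight gradient step
    return w, eta
```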
Momentum method
- Generally, if the training data are not accurate, the weights oscillate and cannot converge to their optimum values.
- A momentum term adds a fraction of the most recent weight change to the current weight update.
- If the gradients in two consecutive steps have the same sign, the momentum term is positive and the weight changes more.
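A minimal sketch of the rule this describes, assuming the common form $\Delta w(n) = -\eta \nabla E(n) + \alpha \Delta w(n-1)$; the symbol $\alpha$ and the default values are assumptions, not from the slides:

```python
import numpy as np

def momentum_step(w, grad, dw_prev, eta=0.1, alpha=0.9):
    """Gradient step with momentum: reuse a fraction alpha of the previous
    weight change; consecutive agreeing gradients accelerate the update."""
    dw = -eta * grad + alpha * dw_prev
    return w + dw, dw
```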
Figure: trajectory on the error surface $E$ versus weight $w_2$, starting from point A.
- Disadvantage: a network trained this way may be skewed toward the most recent patterns in the cycle.
- In cumulative (batch) updating, the weights are instead changed only once per learning cycle, by the accumulated adjustment $\Delta w = \sum_{p=1}^{P} \Delta w_p$
Normalization
Offline training:
Online training:
- It will reach the solution even if the initial values are very far off the final minimum.
- The learning rule based on the gradient descent algorithm is $\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}}$
GNA method:
- Define $x = [w^1_{11} \; w^1_{12} \; \ldots \; w^1_{nm} \; w^2_{11} \; \ldots]$ and $e = [e_{11}, \ldots, e_{PK}]^T$
- The Jacobian collects the derivatives of every pattern error with respect to every weight:
$$J(x) = \begin{bmatrix}
\frac{\partial e_{11}}{\partial w^1_{11}} & \frac{\partial e_{11}}{\partial w^1_{12}} & \cdots & \frac{\partial e_{11}}{\partial w^1_{1m}} & \cdots \\
\frac{\partial e_{21}}{\partial w^1_{11}} & \frac{\partial e_{21}}{\partial w^1_{12}} & \cdots & \frac{\partial e_{21}}{\partial w^1_{1m}} & \cdots \\
\vdots & \vdots & \ddots & \vdots & \\
\frac{\partial e_{P1}}{\partial w^1_{11}} & \frac{\partial e_{P1}}{\partial w^1_{12}} & \cdots & \frac{\partial e_{P1}}{\partial w^1_{1m}} & \cdots \\
\frac{\partial e_{12}}{\partial w^1_{11}} & \frac{\partial e_{12}}{\partial w^1_{12}} & \cdots & \frac{\partial e_{12}}{\partial w^1_{1m}} & \cdots \\
\vdots & \vdots & & \vdots & \ddots
\end{bmatrix}$$
- Gradient: $\nabla E(x) = J^T(x) e(x)$
- Hessian matrix: $\nabla^2 E(x) \simeq J^T(x) J(x)$
Marquardt-Levenberg Alg
$$\Delta x = [J^T(x) J(x) + \mu I]^{-1} J^T(x) e(x) \quad (3)$$
- $\mu$ is a scalar
- For large $\mu$ the update approaches gradient descent; for small $\mu$ it approaches the Gauss-Newton step.
- In NN training, $\mu$ is adjusted properly as learning proceeds.
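A minimal sketch of iteration (3) on a tiny least-squares problem, assuming the common schedule of shrinking $\mu$ after a successful step and growing it otherwise; the factors, the toy model, and all names are assumptions:

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x, mu=1e-2, iters=50):
    """Minimize 0.5*||e(x)||^2 with dx = (J^T J + mu I)^(-1) J^T e."""
    for _ in range(iters):
        e, J = residual(x), jacobian(x)
        dx = np.linalg.solve(J.T @ J + mu * np.eye(x.size), J.T @ e)
        x_new = x - dx
        if np.sum(residual(x_new) ** 2) < np.sum(e ** 2):
            x, mu = x_new, mu * 0.5   # accept step, trust the quadratic model more
        else:
            mu *= 2.0                 # reject step, lean toward gradient descent
    return x

# Toy problem: fit y = a * exp(b * t); the unknowns are x = [a, b]
t = np.linspace(0, 1, 20)
y = 2.0 * np.exp(-1.3 * t)
res = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t),
                                 x[0] * t * np.exp(x[1] * t)])
print(levenberg_marquardt(res, jac, np.array([1.0, 0.0])))
```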
Adaptive MLP