The basic computational unit in the nervous system is the nerve cell, or
neuron. A neuron has:
Dendrites (inputs)
Cell body
Axon (output)
Biological neuron
The information circulates from the dendrites to the axon via the cell body.
The axon connects to the dendrites of other neurons via synapses.
Biological inspiration
[Figure: a biological neuron, with dendrites receiving signals, the axon carrying them onward, and synapses where the axon of a presynaptic neuron meets the dendrites of a postsynaptic neuron.]
The information transmission happens at the synapses.
Interconnections in Brain
Brain Computation
The human brain contains about 10 billion nerve cells, or
neurons. On average, each neuron is connected to other
neurons through approximately 10,000 synapses.
Biological Neuron vs ANN
Speed: a biological neuron responds in a few ms; the speed of an ANN depends on the designer and the implementation.
Storage capacity: a biological network stores information in the strengths of its synapses; an ANN stores it in its weights.
Tolerance: a biological network tolerates the failure of individual neurons; an ANN is generally less fault tolerant.
Control mechanism: complicated in the biological neuron, involving chemical processes; simpler in an ANN.
NNs vs Computers
Digital computers:
Computation is centralized, synchronous, and serial.
Results are exact.
Connectivity is static.
Neural networks, by contrast, are characterized by their interconnections, their learning rules, and their activation functions.
Types of Learning
Supervised learning: In this kind of learning, both the inputs and the desired outputs are known and supplied to the training algorithm. Hence, whenever an input is applied, we can calculate the error, and we adjust the weights in such a way that this error is reduced.
Types of Learning
Unsupervised learning: In this type of learning, the target outputs are unknown. The inputs are applied, and the system adjusts itself based on these inputs alone: connections that support a detected pattern are strengthened, while others are weakened. In either case, the system organizes itself according to the inputs.
Types of Learning
Reinforcement learning: This type of learning is based on a reinforcement process. An input is applied and, based on the output, the system either gives some reward to the network or punishes it. Under this learning technique, the network tries to maximize the rewards and minimize the punishment. The basic block diagram is given.
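As a concrete sketch of this reward/punishment loop, the following Python example lets a system learn which of two actions yields more reward. Everything in it is an illustrative assumption (the two actions, their reward probabilities, and the exploration rate epsilon are not from the slides):

```python
import random

# Two-action bandit sketch: the system acts, is rewarded (+1) or punished (-1),
# and keeps a running average of the reward of each action.
# reward_prob, epsilon, and the step count are illustrative assumptions.
def run_bandit(steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    reward_prob = [0.2, 0.8]          # action 1 is rewarded more often
    value = [0.0, 0.0]                # estimated reward of each action
    count = [0, 0]
    for _ in range(steps):
        if rng.random() < epsilon:    # occasionally explore
            a = rng.randrange(2)
        else:                         # otherwise exploit the best estimate
            a = 0 if value[0] > value[1] else 1
        r = 1.0 if rng.random() < reward_prob[a] else -1.0  # reward or punishment
        count[a] += 1
        value[a] += (r - value[a]) / count[a]               # running average
    return value

value = run_bandit()
print(value[1] > value[0])   # the better-rewarded action ends up preferred
```

The network analogue replaces the two value estimates with weights, but the maximize-reward / minimize-punishment loop is the same.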
Models of Neuron
McCulloch-Pitts Model:
In the McCulloch-Pitts (MP) model, the activation (x) is given by a weighted sum of its M input values (a_i) and a bias term (θ). The output signal (s) is typically a nonlinear function f(x) of the activation value x. The following equations describe the operation of an MP model:

Activation:     x = Σ_{i=1}^{M} w_i * a_i − θ
Output signal:  s = f(x)

[Figure: inputs a_1, a_2, …, a_M pass through fixed weights w_1, w_2, …, w_M into the unit, which emits s = f(x).]
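The MP equations above can be sketched in Python. The AND-gate weights and threshold below are illustrative hand-picked values; in the MP model the weights are fixed by design, not learned:

```python
# Minimal McCulloch-Pitts unit: x = sum(w_i * a_i) - theta, s = f(x)
# with a step function as f. Weights and theta are chosen by hand.
def mp_neuron(inputs, weights, theta):
    x = sum(w * a for w, a in zip(weights, inputs)) - theta
    return 1 if x >= 0 else 0

# Example: a 2-input AND gate (illustrative weights and threshold).
for a in (0, 1):
    for b in (0, 1):
        print((a, b), mp_neuron([a, b], weights=[1, 1], theta=1.5))
```

Only the input pair (1, 1) clears the threshold, so the unit realizes AND.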
Perceptron Model:
In the perceptron, sensory units feed a layer of association units A_1, A_2, …, A_M, whose outputs a_i pass through adjustable weights w_1, w_2, …, w_M to produce the output s = f(x). Unlike the MP model, the weights are learned from an error signal:

Activation:     x = Σ_{i=1}^{M} w_i * a_i − θ
Output signal:  s = f(x)
Error:          δ = b − s   (b is the desired output)
Weight change:  Δw_i = η * δ * a_i   (η is the learning rate)
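A minimal sketch of this learning rule in Python, assuming an AND-gate data set and a learning rate η = 0.1 (both illustrative choices, not from the slides):

```python
# Perceptron learning rule: delta = b - s, dw_i = eta * delta * a_i.
def step(x):
    return 1 if x >= 0 else 0

def train_perceptron(samples, n_inputs, eta=0.1, epochs=50):
    w = [0.0] * n_inputs
    theta = 0.0
    for _ in range(epochs):
        for a, b in samples:               # a: inputs, b: desired output
            x = sum(wi * ai for wi, ai in zip(w, a)) - theta
            s = step(x)                    # output signal s = f(x)
            delta = b - s                  # error
            w = [wi + eta * delta * ai for wi, ai in zip(w, a)]
            theta -= eta * delta           # the bias learns like a weight on -1
    return w, theta

samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]  # AND gate
w, theta = train_perceptron(samples, n_inputs=2)
print([step(sum(wi * ai for wi, ai in zip(w, a)) - theta) for a, _ in samples])
```

Because AND is linearly separable, the rule converges to weights that classify all four patterns correctly.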
Widrow's Adaline Model:
The structure is the same (inputs a_1, …, a_M through adjustable weights w_1, …, w_M), but the output function f is linear, so the error is computed directly on the activation value:

Activation:     x = Σ_{i=1}^{M} w_i * a_i − θ
Output signal:  s = f(x)
Error:          δ = b − s = b − x
Weight change:  Δw_i = η * δ * a_i
The gain parameter affects the slope of the sigmoid function around zero: the larger the gain, the steeper the curve.
output_i = (e^{activation_i} − e^{−activation_i}) / (e^{activation_i} + e^{−activation_i})

This hyperbolic tangent has a shape similar to the sigmoid (like an S), with the difference being that the value of output_i ranges between −1 and 1.
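Both activation functions can be sketched directly from their formulas; the gain value used in the comparison below is an illustrative choice:

```python
import math

# Sigmoid with a gain (slope) parameter, and the hyperbolic tangent,
# whose output ranges over (-1, 1) instead of (0, 1).
def sigmoid(x, gain=1.0):
    # f(x) = 1 / (1 + e^(-gain*x)); larger gain -> steeper slope near 0
    return 1.0 / (1.0 + math.exp(-gain * x))

def tanh_activation(x):
    # (e^x - e^-x) / (e^x + e^-x), identical to math.tanh(x)
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

print(sigmoid(0.0))            # 0.5: the sigmoid is centred on 0.5
print(tanh_activation(0.0))    # 0.0: tanh is centred on zero
print(sigmoid(1.0, gain=5.0) > sigmoid(1.0, gain=1.0))  # higher gain is steeper
```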
A neuron combines its input values x_1, x_2, …, x_m through the weights w_1, w_2, …, w_m and a bias w_0; the summing function produces the induced field, which the activation function f(·) maps to the output y:

y = f( w_0 + Σ_{i=1}^{m} w_i * x_i )
Feedback network
When outputs are directed back as inputs to nodes of the same or a preceding layer, the result is a feedback network.
Dr R R Janghel
ADALINE Model
ADALINE Network
The ADALINE cycle has three phases: Initialize, Training, and Thinking.

Initialize
Assign random weights to all links.

Training
Feed in known inputs in random sequence.
Simulate the network.
Compute the error between the desired output and the actual output (error function).
Adjust the weights (learning function).
Repeat until the total error falls below a threshold ε.

Thinking
Simulate the network: it will respond to any input.
A correct answer is not guaranteed, even for trained inputs.
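The Initialize/Training/Thinking cycle above can be sketched for a single ADALINE unit with the Widrow-Hoff (LMS) rule, where the error is measured on the linear activation x. The data set, learning rate, and error threshold below are illustrative assumptions:

```python
import random

def train_adaline(samples, n_inputs, eta=0.05, max_epochs=200, tol=0.05):
    random.seed(0)
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]  # Initialize
    theta = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                               # Training
        random.shuffle(samples)            # feed inputs in random sequence
        total_error = 0.0
        for a, b in samples:
            x = sum(wi * ai for wi, ai in zip(w, a)) - theta  # simulate
            delta = b - x                                     # error function
            w = [wi + eta * delta * ai for wi, ai in zip(w, a)]
            theta -= eta * delta                              # learning function
            total_error += delta ** 2
        if total_error < tol:              # repeat until total error < tol
            break
    return w, theta

# AND gate with bipolar targets (illustrative data)
samples = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, theta = train_adaline(samples, n_inputs=2)
# Thinking: the trained unit responds to any input via the sign of x
for a, b in sorted(samples):
    x = sum(wi * ai for wi, ai in zip(w, a)) - theta
    print(a, 1 if x >= 0 else -1)
```

Note that LMS fits the activation to the targets in the least-squares sense, so the residual error never reaches zero here, but the signs of the activations classify all four patterns correctly.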
Multilayer Perceptron
[Figure: input layer connected through adjustable weights to the output layer, which produces the output values.]
Example (contd)
Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals; the second unit realizes the nonlinear function, called the neuron activation function. Signal e is the adder output signal, and y = f(e) is the output signal of the nonlinear element, which is also the output signal of the neuron.
Example (contd)
To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2), each assigned a corresponding target (desired output) z.
The pictures below illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. Symbol yn represents the output signal of neuron n.
Example (contd)
Propagation of signals through the hidden layer. Symbols wmn represent the weights of the connections between the output of neuron m and the input of neuron n in the next layer.
Example (contd)
Propagation of signals through the output layer.
In the next algorithm step, the output signal of the network, y, is compared with the desired output value (the target) found in the training data set. The difference is called the error signal δ of the output layer neuron.
Example (contd)
It is impossible to compute the error signal for internal neurons directly, because the output values of these neurons are unknown. The idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals were inputs to the neuron in question.
Example (contd)
The weight coefficients wmn used to propagate the errors back are the same as those used when computing the output value; only the direction of data flow is reversed.
Example (contd)
When the error signal for each neuron has been computed, the weight coefficients of each neuron's input connections may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
Example (contd)
In this example the neuron activation function is the sigmoid f(e) = 1 / (1 + e^{−e}).
[Figure: a network with input layer x1, x2, …, xp; hidden layer h1, h2, …, hm; and output layer y1, y2, …, yn.]
Derivation (contd)
Input of hidden layer:

    h_j = Σ_{k=1}^{p} w_jk * x_k                 ...(1)

Output of hidden layer:

    v_j = f(h_j) = 1 / (1 + e^{−h_j})            ...(2)
Derivation (contd)
Input of output layer:

    g_i = Σ_{j=1}^{m} w_ij * v_j                 ...(3)

Output of output layer:

    y_i = f(g_i) = 1 / (1 + e^{−g_i})            ...(4)
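Equations (1)-(4) define the forward pass, and translate directly into Python. The weights and input below are arbitrary illustrative values:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W_hidden, W_output):
    h = [sum(wjk * xk for wjk, xk in zip(row, x)) for row in W_hidden]  # eq (1)
    v = [logistic(hj) for hj in h]                                      # eq (2)
    g = [sum(wij * vj for wij, vj in zip(row, v)) for row in W_output]  # eq (3)
    y = [logistic(gi) for gi in g]                                      # eq (4)
    return h, v, g, y

W_hidden = [[0.5, -0.2], [0.3, 0.8]]   # w_jk: 2 hidden units, 2 inputs
W_output = [[1.0, -1.0]]               # w_ij: 1 output unit
_, v, _, y = forward([1.0, 0.0], W_hidden, W_output)
print(y)   # the output lies in (0, 1) because of the logistic function
```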
Derivation (contd)
Error function:

    E(t) = (1/2) Σ_{i=1}^{n} (y_i^d − y_i)^2     ...(5)

Weight update:

    w_ij(t+1) = w_ij(t) + Δw_ij(t)               ...(6)
Derivation (contd)
Updating the weights between the output layer and the hidden layer, by the chain rule (only the i-th term of E depends on w_ij):

    ∂E(t)/∂w_ij(t) = (∂E(t)/∂y_i) * (∂y_i/∂w_ij(t))          ...(8)

From equations (4) and (5):

    y_i = 1 / (1 + e^{−g_i})
    E(t) = (1/2) Σ_{i=1}^{n} (y_i^d − y_i)^2
    ∂E(t)/∂y_i = −(y_i^d − y_i)                               ...(9)
Derivation (contd)
For ∂y_i/∂w_ij, write y_i = (1 + e^{−g_i})^{−1} and apply the chain rule through g_i:

    ∂y_i/∂w_ij(t) = (∂y_i/∂g_i) * (∂g_i/∂w_ij)

Since g_i = Σ_{j=1}^{m} w_ij * v_j,

    ∂g_i/∂w_ij = v_j
Derivation (contd)
Differentiating the sigmoid:

    ∂y_i/∂g_i = −1 * (−e^{−g_i}) / (1 + e^{−g_i})^2 = e^{−g_i} / (1 + e^{−g_i})^2

so

    ∂y_i/∂w_ij(t) = [e^{−g_i} / (1 + e^{−g_i})^2] * v_j      ...(10)

Also, since y_i = 1 / (1 + e^{−g_i}):

    1 − y_i = 1 − 1/(1 + e^{−g_i}) = e^{−g_i} / (1 + e^{−g_i})
    y_i (1 − y_i) = e^{−g_i} / (1 + e^{−g_i})^2
Derivation (contd)
Substituting this value in eq. (10):

    ∂y_i/∂w_ij(t) = y_i (1 − y_i) * v_j                       ...(11)

Substituting the values of eq. (9) and (11) in eq. (8):

    ∂E(t)/∂w_ij(t) = −(y_i^d − y_i) * y_i (1 − y_i) * v_j     ...(12)
Derivation (contd)
Define the output-layer error term

    δ_i = y_i (1 − y_i)(y_i^d − y_i)                          ...(13)

so that

    ∂E(t)/∂w_ij(t) = −δ_i * v_j                               ...(14)

With gradient descent, Δw_ij(t) = −η * ∂E(t)/∂w_ij(t), so

    Δw_ij(t) = η * δ_i * v_j                                  ...(15)

Hence the updated weight will be

    w_ij(t+1) = w_ij(t) + η * δ_i * v_j                       ...(16)
Derivation (contd)
Updating the weights between the hidden layer and the input layer, the chain rule gives

    ∂E(t)/∂w_jk(t) = Σ_{i=1}^{n} (∂E(t)/∂y_i)(∂y_i/∂g_i)(∂g_i/∂v_j)(∂v_j/∂h_j)(∂h_j/∂w_jk)    ...(18)

with the individual factors, using equations (2)-(5):

    ∂E(t)/∂y_i = −(y_i^d − y_i)                               (from eq. (9))
    ∂y_i/∂g_i = y_i (1 − y_i)                                 ...(21)
    ∂g_i/∂v_j = w_ij                                          ...(22)
    ∂v_j/∂h_j = d/dh_j (1 + e^{−h_j})^{−1} = v_j (1 − v_j)    ...(23)
    ∂h_j/∂w_jk = x_k                                          ...(24)
Derivation (contd)
Substituting these factors into equation (18), we get

    ∂E(t)/∂w_jk(t) = −Σ_{i=1}^{n} (y_i^d − y_i) * y_i (1 − y_i) * w_ij * v_j (1 − v_j) * x_k
                   = −v_j (1 − v_j) * x_k * Σ_{i=1}^{n} δ_i w_ij

where δ_i = y_i (1 − y_i)(y_i^d − y_i) as in eq. (13). Defining the hidden-layer error term

    δ_j = v_j (1 − v_j) Σ_{i=1}^{n} δ_i w_ij

the change in a weight between the hidden and input layers is

    Δw_jk(t) = η * δ_j * x_k

Hence the updated weight will be

    w_jk(t+1) = w_jk(t) + η * δ_j * x_k
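The update rules derived above (output-layer deltas from eq. (13), hidden-layer deltas, and the weight changes) can be sketched as one training step. The network size, initial weights, learning rate, and training pair below are illustrative assumptions:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, target, W_hid, W_out, eta=0.5):
    # forward pass, equations (1)-(4)
    h = [sum(w * xi for w, xi in zip(row, x)) for row in W_hid]
    v = [logistic(hj) for hj in h]
    g = [sum(w * vj for w, vj in zip(row, v)) for row in W_out]
    y = [logistic(gi) for gi in g]
    # output-layer deltas: delta_i = y_i(1-y_i)(y_i^d - y_i)
    d_out = [yi * (1 - yi) * (t - yi) for yi, t in zip(y, target)]
    # hidden-layer deltas: delta_j = v_j(1-v_j) * sum_i delta_i * w_ij
    d_hid = [v[j] * (1 - v[j]) * sum(d_out[i] * W_out[i][j]
                                     for i in range(len(W_out)))
             for j in range(len(v))]
    # weight updates: w += eta * delta * (input to that weight)
    W_out = [[w + eta * d_out[i] * v[j] for j, w in enumerate(row)]
             for i, row in enumerate(W_out)]
    W_hid = [[w + eta * d_hid[j] * x[k] for k, w in enumerate(row)]
             for j, row in enumerate(W_hid)]
    return W_hid, W_out, y

W_hid = [[0.1, 0.2], [0.3, 0.4]]   # 2 hidden units, 2 inputs (illustrative)
W_out = [[0.5, 0.6]]               # 1 output unit
x, target = [1.0, 0.0], [1.0]
errs = []
for _ in range(200):
    W_hid, W_out, y = backprop_step(x, target, W_hid, W_out)
    errs.append(0.5 * (target[0] - y[0]) ** 2)
print(errs[0], errs[-1])   # the error E(t) decreases over the iterations
```

Repeating the step on a single training pair drives E(t) toward zero, which is exactly the gradient descent the derivation describes.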
Momentum
The training algorithm is iterative: at each step it moves in the direction that reduces the total error. On the error surface, a deeper valley corresponds to a lower error, and hence to a better configuration of the ANN, than a shallower valley. The deepest valley, with the least error over the entire surface, is the global minimum; a shallower valley is a local minimum.
Momentum
Because the training algorithm always moves so as to reduce the error, it can settle into a local minimum, even when continuing in the same direction would eventually reach the global minimum. Momentum keeps pushing the training algorithm to continue moving in its previous direction, making it possible to escape a local minimum. The term is analogous to momentum in the physical world: a moving ball has momentum that keeps it moving in the same direction.
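A sketch of how a momentum term α·Δw(t−1) is added to the gradient update; the learning rate η, momentum coefficient α, and the fixed gradient are illustrative choices:

```python
# Momentum update: dw(t) = -eta*grad + alpha*dw(t-1).
# The previous change keeps pulling the weights in the same direction.
def update_with_momentum(w, grad, prev_dw, eta=0.1, alpha=0.9):
    dw = [-eta * g + alpha * p for g, p in zip(grad, prev_dw)]
    w = [wi + d for wi, d in zip(w, dw)]
    return w, dw

w, prev = [0.0, 0.0], [0.0, 0.0]
for _ in range(3):
    grad = [1.0, -2.0]                 # a fixed illustrative gradient
    w, prev = update_with_momentum(w, grad, prev)
print(w)   # successive steps grow while the gradient keeps the same sign
```

Because the gradient does not change sign here, each step is larger than the last (0.1, 0.19, 0.271 on the first coordinate), which is the "keeps it moving" effect described above.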
Stopping Condition
The algorithm stops according to a stopping condition. Normally one or more of the following criteria are used as stopping conditions:
Time: The algorithm may be stopped when the execution time exceeds a threshold.
Epoch: The algorithm has a specified maximum number of epochs; upon exceeding this number, the algorithm may be stopped.
Goal: The algorithm may be stopped if the error measured by the system falls below a specific value. It may not be useful to continue training after this point.
Stopping Condition
Validation data: If the error on the validation data starts increasing, even while the error on the training data is still decreasing, it is better to stop further training.
Gradient: The gradient refers to the improvement in performance, i.e. the lowering of the error, between epochs; training may be stopped when this improvement becomes negligible.
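The validation-based criterion can be sketched as a simple check over per-epoch validation errors; the error values and the patience parameter below are illustrative stand-ins for values measured during training:

```python
# Early stopping: stop when the validation error keeps increasing,
# and report the epoch with the best (lowest) validation error.
def train_with_early_stopping(val_errors, patience=2):
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:      # error kept increasing -> stop
                break
    return best_epoch

# validation error turns upward after epoch 2, so training stops there
val = [0.9, 0.6, 0.4, 0.45, 0.5, 0.6]
print(train_with_early_stopping(val))   # -> 2
```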