The Process of Learning

Learning Tasks

The learning algorithm for a neural network depends on the learning task to be performed by the network. Such learning tasks include:

- Pattern association
- Pattern recognition
- Function approximation
- Filtering
- Beamforming
- Identification and control

Learning Methods

Supervised Learning

This is learning with a teacher.

- The environment is unknown to the neural network.
- Conceptually, the teacher has knowledge of the environment.
- As a result we have a set of input-output examples.
- These input-output examples provide the samples for training.
- Suppose we have a set of input signals (input vectors) from the environment and the teacher is capable of supplying the desired response for each; together these form a training set.
- So we have an input matrix P and an output matrix T as the training set.
- For a particular input, the network will give an output that differs from the desired output given by the teacher.
- So there is an error between the actual and the desired response.
- This error is then used to correct the free parameters of the network.
- This correction continues until the actual output matches the desired output (within a tolerance limit).
- Thus, using the matrices P and T, we train the neural network. The network is then adapted to the environment.
- Now, if we give an arbitrary input (vector) to the network, the network will supply the required response (vector). A condensed sketch of this whole procedure follows.
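This is a minimal illustration only, written for a single output neuron; the helper names `predict` and `update` stand for whichever learning rule is in use, and `P`, `T`, `tol`, and `max_epochs` are hypothetical names, not notation from this handout.

```python
import numpy as np

def train_supervised(P, T, predict, update, tol=1e-3, max_epochs=100):
    """Generic error-correction training loop for one output neuron.

    P: (m, N) matrix whose N columns are input vectors from the environment.
    T: (N,) desired responses supplied by the teacher.
    predict(w, x) -> actual response y; update(w, x, d, y) -> corrected w.
    """
    w = np.zeros(P.shape[0])            # free parameters of the network
    for _ in range(max_epochs):
        worst = 0.0
        for x, d in zip(P.T, T):
            y = predict(w, x)           # actual response of the network
            w = update(w, x, d, y)      # correct w using the error d - y
            worst = max(worst, abs(d - y))
        if worst <= tol:                # actual matches desired within tolerance
            return w
    return w
```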

Unsupervised Learning

This is learning without a teacher to oversee the learning process.

- This is self-organized learning.
- It attempts to develop a network on the basis of given sample data.
- One popular, efficient, and somewhat obvious approach is clustering.
- Clustering is nothing but mode separation or class separation.
- The objective is to design a mechanism that clusters the given sample data. This can be achieved by computing similarity; a small sketch is given after this list.
- On many occasions the data fall into easily observable groups, and the task is simple; on other occasions this is not the case.
- To perform unsupervised learning we may use a competitive learning rule. For example, consider a neural network consisting of an input layer and a competitive layer. Here what is used is a task-independent measure of quality that the network is designed to learn.
- The free parameters of the network are adapted and finally optimized on the basis of the task-independent measure mentioned above.
- Fundamentally, unsupervised learning algorithms (or laws) may be characterized by first-order differential equations. These equations describe how the network's free parameters evolve (adjust) over time (or over iterations, in the discrete case).
- Here some sort of pattern associability (similarity) is used to guide the learning process. Such an operation leads to network correlation, clustering, or competitive behavior.
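As one concrete illustration of clustering by similarity, here is a bare-bones sketch in the spirit of k-means. It is not one of the rules from this handout; the number of clusters `k`, the iteration count, and the data shape are invented for illustration.

```python
import numpy as np

def cluster_by_similarity(X, k, iters=20, seed=0):
    """Group the rows of X into k clusters by similarity to k centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # similarity measured as negative Euclidean distance to each centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):              # move each centroid to its cluster mean
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```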

Some Supervised / Unsupervised Learning Rules

1. Perceptron learning rule
2. Widrow-Hoff learning rule
3. Delta learning rule
4. Hebbian learning
5. Competitive learning

1. Rosenblatt's perceptron learning rule

The learning signal is the difference between the desired and the actual response. This learning is supervised. This type of learning can be applied only if the neuron response is binary (0 or 1) or bipolar (1 or -1). The weight adjustment in this method is obtained as
$$\Delta w_{kj}(n) = \eta\,[d_k - \operatorname{sgn}(v_k(n))]\,x_j$$

$$\varphi(v_k(n)) = \operatorname{sgn}(v_k(n)) = \begin{cases} +1 & \text{if } \mathbf{w}^T\mathbf{x} \ge 0 \\ -1 & \text{if } \mathbf{w}^T\mathbf{x} < 0 \end{cases}$$

$$w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$$

[Figure: error-correction learning for a single neuron k, with inputs x_1, x_2, ..., x_m, synaptic weights w_kj, net activity v_k, activation phi(v_k), and output y_k, which is compared with the desired response d_k to form the error e_k that drives the weight correction.]

where $n = 1, 2, \ldots$ is the iteration number, $x_j$, $j = 1, 2, \ldots, m$, are the inputs, $\eta$ is the learning-rate parameter, $v_k(n)$ is the net activity of neuron $k$, $\varphi(v_k(n)) = y_k(n)$ is the output of neuron $k$, $d_k$ is the desired response, $e_k(n)$ is the error between the output and the desired response of neuron $k$, and $\Delta w_{kj}(n)$ is the correction applied to the synaptic weight between neuron $k$ and input node $j = 1, 2, \ldots, m$. No weight correction is applied in the cases where the actual response and the desired response are equal.

Example
Consider a single perceptron with the set of input training vectors (samples) and initial weight vector

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \\ 0 \\ -1 \end{bmatrix},\quad \mathbf{x}_2 = \begin{bmatrix} 0 \\ 1.5 \\ -0.5 \\ -1 \end{bmatrix},\quad \mathbf{x}_3 = \begin{bmatrix} -1 \\ 1 \\ 0.5 \\ -1 \end{bmatrix};\qquad \mathbf{w}(1) = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0.5 \end{bmatrix}$$

Let the learning-rate parameter $\eta = 0.1$. The teacher's desired responses for $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ are $d_1 = -1$, $d_2 = -1$, and $d_3 = 1$, respectively. Learning according to the perceptron learning rule progresses as follows:
Step 1: The input is $\mathbf{x}_1$ and the desired response is $d_1$.

$$v_k(1) = \mathbf{w}(1)^T\mathbf{x}_1 = \begin{bmatrix} 1 & -1 & 0 & 0.5 \end{bmatrix}\begin{bmatrix} 1 \\ -2 \\ 0 \\ -1 \end{bmatrix} = 2.5, \qquad \operatorname{sgn}(v_k(1)) = 1 \neq d_1$$

$$\mathbf{w}(2) = \mathbf{w}(1) + 0.1\,(-1 - 1)\,\mathbf{x}_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0.5 \end{bmatrix} - 0.2\begin{bmatrix} 1 \\ -2 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.8 \\ -0.6 \\ 0 \\ 0.7 \end{bmatrix}$$

Step 2: The input is $\mathbf{x}_2$ and the desired response is $d_2$.

$$v_k(2) = \mathbf{w}(2)^T\mathbf{x}_2 = \begin{bmatrix} 0.8 & -0.6 & 0 & 0.7 \end{bmatrix}\begin{bmatrix} 0 \\ 1.5 \\ -0.5 \\ -1 \end{bmatrix} = -1.6$$

No correction is performed in this step because $d_2 = \operatorname{sgn}(v_k(2)) = -1$; hence $\mathbf{w}(3) = \mathbf{w}(2)$.

Step 3: The input is $\mathbf{x}_3$ and the desired response is $d_3$.

$$v_k(3) = \mathbf{w}(3)^T\mathbf{x}_3 = \begin{bmatrix} 0.8 & -0.6 & 0 & 0.7 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \\ 0.5 \\ -1 \end{bmatrix} = -2.1, \qquad \operatorname{sgn}(v_k(3)) = -1 \neq d_3$$

$$\mathbf{w}(4) = \mathbf{w}(3) + 0.1\,(1 - (-1))\,\mathbf{x}_3 = \begin{bmatrix} 0.8 \\ -0.6 \\ 0 \\ 0.7 \end{bmatrix} + 0.2\begin{bmatrix} -1 \\ 1 \\ 0.5 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.6 \\ -0.4 \\ 0.1 \\ 0.5 \end{bmatrix}$$

This completes one epoch of training. The training examples are now presented to the network again. As an exercise you may do this and comment on the result obtained. A short sketch of the rule follows.
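For concreteness, the three steps above (including the second epoch left as an exercise) can be reproduced with a minimal NumPy sketch of the perceptron rule; the two-epoch loop count is arbitrary.

```python
import numpy as np

eta = 0.1
X = [np.array([ 1.0, -2.0,  0.0, -1.0]),
     np.array([ 0.0,  1.5, -0.5, -1.0]),
     np.array([-1.0,  1.0,  0.5, -1.0])]
d = [-1.0, -1.0, 1.0]
w = np.array([1.0, -1.0, 0.0, 0.5])

sgn = lambda v: 1.0 if v >= 0 else -1.0     # bipolar hard limiter

for epoch in (1, 2):
    for x_i, d_i in zip(X, d):
        v = w @ x_i                         # net activity v_k
        w = w + eta * (d_i - sgn(v)) * x_i  # perceptron weight correction
    print(epoch, w)                         # w after each epoch
```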

2. Widrow-Hoff learning rule

Here the neurons are assumed to have linear activation functions, characterized by

$$y_k(n) = \varphi(v_k(n)) = v_k(n)$$

The correction in the weights at each time step n is obtained as

$$\Delta w_{kj}(n) = \eta\,e_k(n)\,x_j(n), \qquad w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$$

Remarks:

- This is learning with a teacher.
- The output of the neuron k should be directly available so that the desired response can be supplied.
- The correction applied to a synaptic weight is proportional to the product of the error signal and the input signal.
- $w_{kj}(n)$ and $w_{kj}(n+1)$ may be viewed as the past and present values of the synaptic weight $w_{kj}$. In computational terms we may write

$$w_{kj}(n) = z^{-1}[\,w_{kj}(n+1)\,]$$

where $z^{-1}$ is the unit-delay operator and represents a storage element. We see that error-correction learning is a closed-loop control system. A minimal sketch of this rule is given below.
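This sketch assumes a single linear neuron; the toy regression data, noise level, and learning rate are invented for illustration.

```python
import numpy as np

def lms_step(w, x, d, eta=0.05):
    """One Widrow-Hoff (LMS) correction for a linear neuron y = w^T x."""
    y = w @ x               # linear activation: y_k(n) = v_k(n)
    e = d - y               # error signal e_k(n)
    return w + eta * e * x  # correction proportional to error times input

# illustrative use: recover w_true from noisy linear measurements
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
w = np.zeros(3)
for _ in range(2000):
    x = rng.standard_normal(3)
    d = w_true @ x + 0.01 * rng.standard_normal()
    w = lms_step(w, x, d)
print(w)                    # approaches w_true
```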

Method of steepest descent


Consider the cost function $\mathcal{E}(\mathbf{w})$ of some unknown weight vector $\mathbf{w}$. The function $\mathcal{E}(\mathbf{w})$ maps $\mathbf{w}$ into the real numbers, and we assume it is continuously differentiable with respect to $\mathbf{w}$. The problem is to find the optimal weight vector $\mathbf{w}^*$ such that $\mathcal{E}(\mathbf{w}^*) \le \mathcal{E}(\mathbf{w})$ for all $\mathbf{w}$. This is an unconstrained optimization problem, which can be stated as follows:

Minimize the cost function $\mathcal{E}(\mathbf{w})$ with respect to the weight vector $\mathbf{w}$.

In this method the correction in weight is applied in the direction of steepest descent, that is, in a direction opposite to the gradient vector $\nabla\mathcal{E}(\mathbf{w})$, where
$$\nabla = \left[\frac{\partial}{\partial w_1}, \frac{\partial}{\partial w_2}, \ldots, \frac{\partial}{\partial w_m}\right]^T, \qquad \nabla\mathcal{E}(\mathbf{w}) = \left[\frac{\partial\mathcal{E}}{\partial w_1}, \frac{\partial\mathcal{E}}{\partial w_2}, \ldots, \frac{\partial\mathcal{E}}{\partial w_m}\right]^T$$

Now the weight correction is effected as

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \eta\,\nabla\mathcal{E}(\mathbf{w}(n))$$

that is,

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \Delta\mathbf{w}(n), \qquad \Delta\mathbf{w}(n) = -\eta\,\nabla\mathcal{E}(\mathbf{w}(n))$$

Using the first-order Taylor series expansion around $\mathbf{w}(n)$ to approximate $\mathcal{E}(\mathbf{w}(n+1))$:

$$\mathcal{E}(\mathbf{w}(n+1)) \approx \mathcal{E}(\mathbf{w}(n)) + (\nabla\mathcal{E}(\mathbf{w}(n)))^T \Delta\mathbf{w}(n) = \mathcal{E}(\mathbf{w}(n)) - \eta\,(\nabla\mathcal{E}(\mathbf{w}(n)))^T \nabla\mathcal{E}(\mathbf{w}(n)) = \mathcal{E}(\mathbf{w}(n)) - \eta\,\|\nabla\mathcal{E}(\mathbf{w}(n))\|^2$$

Thus we see that $\mathcal{E}(\mathbf{w}(n+1)) < \mathcal{E}(\mathbf{w}(n))$, i.e., the performance index decreases iteration after iteration, and finally it converges to the optimal solution $\mathbf{w}^*$. The convergence behavior depends on the learning-rate parameter $\eta$. The following points are worth noting (see the sketch after this list):
- When $\eta$ is small, the transient response of the algorithm is overdamped and the trajectory traced by $\mathbf{w}(n)$ takes a smooth but slow path in the w-plane.
- When $\eta$ is large, the transient response of the algorithm is underdamped and the trajectory traced by $\mathbf{w}(n)$ takes a fast but oscillatory path in the w-plane.
- When $\eta$ exceeds a critical value, the algorithm becomes unstable.
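These three regimes can be made concrete on the one-dimensional quadratic cost $\mathcal{E}(w) = \tfrac{1}{2}aw^2$, for which $\nabla\mathcal{E} = aw$ and the iteration becomes $w(n+1) = (1 - \eta a)\,w(n)$; the critical value is therefore $\eta = 2/a$. The sketch below, with an arbitrary $a = 1$, is illustrative only.

```python
def steepest_descent(eta, a=1.0, w0=1.0, steps=8):
    """Gradient steps on E(w) = 0.5*a*w**2, whose gradient is a*w."""
    w, path = w0, [w0]
    for _ in range(steps):
        w = w - eta * a * w      # w(n+1) = w(n) - eta * grad E(w(n))
        path.append(round(w, 4))
    return path

print(steepest_descent(0.1))     # small eta: smooth but slow (overdamped)
print(steepest_descent(1.5))     # large eta: fast but oscillatory (underdamped)
print(steepest_descent(2.5))     # eta above 2/a: unstable, diverges
```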

3. The delta learning rule

The delta learning rule is also built around a single neuron and is valid only for continuous activation functions. It can be derived by minimizing a cost function or performance index. Since we are interested in error-correction learning, this performance index can take the form

$$\mathcal{E}(\mathbf{w}) = \tfrac{1}{2}e_k^2 = \tfrac{1}{2}(d_k - y_k)^2 = \tfrac{1}{2}\big(d_k - \varphi(v_k)\big)^2$$

where $\mathbf{w} = [w_{kj}]$. The cost function $\mathcal{E}(\mathbf{w})$ denotes the instantaneous energy, which can be used to make the necessary changes in the synaptic weights. This is obviously error-correction learning. Following the steepest-descent algorithm, the minimization of the error requires the weight changes to be in the direction of the negative gradient, so we take
$$\Delta\mathbf{w} = -\eta\,\nabla\mathcal{E}(\mathbf{w})$$

where the operator

$$\nabla = \left[\frac{\partial}{\partial w_{1k}}, \frac{\partial}{\partial w_{2k}}, \ldots, \frac{\partial}{\partial w_{mk}}\right]^T \quad \text{and} \quad \nabla\mathcal{E}(\mathbf{w}) = \left[\frac{\partial\mathcal{E}}{\partial w_{1k}}, \frac{\partial\mathcal{E}}{\partial w_{2k}}, \ldots, \frac{\partial\mathcal{E}}{\partial w_{mk}}\right]^T$$

for a particular k. Now, the components of the gradient vector are

$$\frac{\partial\mathcal{E}(\mathbf{w})}{\partial w_{kj}} = -\big(d_k - \varphi(v_k)\big)\,\varphi'(v_k)\,x_j$$

Since the minimization of the error requires the changes in weight to be in the negative gradient direction, we have

$$\Delta w_{kj} = \eta\,\big(d_k - \varphi(v_k)\big)\,\varphi'(v_k)\,x_j = \eta\,e_k\,\varphi'(v_k)\,x_j$$

Example:
Consider the set of input training vectors and initial weight vector

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \\ 0 \\ -1 \end{bmatrix},\quad \mathbf{x}_2 = \begin{bmatrix} 0 \\ 1.5 \\ -0.5 \\ -1 \end{bmatrix},\quad \mathbf{x}_3 = \begin{bmatrix} -1 \\ 1 \\ 0.5 \\ -1 \end{bmatrix};\qquad \mathbf{w}^1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0.5 \end{bmatrix}$$

Let the learning-rate parameter $\eta = 0.1$. The teacher's desired responses for $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ are $d_1 = -1$, $d_2 = -1$, and $d_3 = 1$, respectively. Let the continuous bipolar activation function be

$$\varphi(v_k) = \frac{1 - e^{-v_k}}{1 + e^{-v_k}}, \qquad \varphi'(v_k) = \frac{2e^{-v_k}}{(1 + e^{-v_k})^2} = \frac{1}{2}\big(1 - \varphi^2(v_k)\big)$$

Such an activation function is continuous and bipolar. Here the slope of the activation function is expressed in terms of the output signal of the neuron. For the given learning-rate parameter, delta-rule training can be summarized as follows:

Step 1: We present the first input sample $\mathbf{x}_1$ and the initial weight vector $\mathbf{w}^1$, yielding

$$v_k^1 = (\mathbf{w}^1)^T\mathbf{x}_1 = 2.5, \qquad y_k^1 = \varphi(v_k^1) = 0.848, \qquad \varphi'(v_k^1) = \tfrac{1}{2}[1 - \varphi^2(v_k^1)] = 0.140$$

$$\mathbf{w}^2 = \mathbf{w}^1 + 0.1\,[d_1 - \varphi(v_k^1)]\,\varphi'(v_k^1)\,\mathbf{x}_1 = \begin{bmatrix} 0.974 \\ -0.948 \\ 0 \\ 0.526 \end{bmatrix}$$

Step 2: We present the second input sample $\mathbf{x}_2$ and the weight vector $\mathbf{w}^2$, yielding

$$v_k^2 = (\mathbf{w}^2)^T\mathbf{x}_2 = -1.948, \qquad y_k^2 = \varphi(v_k^2) = -0.75, \qquad \varphi'(v_k^2) = \tfrac{1}{2}[1 - \varphi^2(v_k^2)] = 0.218$$

$$\mathbf{w}^3 = \mathbf{w}^2 + 0.1\,[d_2 - \varphi(v_k^2)]\,\varphi'(v_k^2)\,\mathbf{x}_2 = \begin{bmatrix} 0.974 \\ -0.956 \\ 0.002 \\ 0.531 \end{bmatrix}$$

Step 3: We present the third input sample $\mathbf{x}_3$ and the weight vector $\mathbf{w}^3$, yielding

$$v_k^3 = (\mathbf{w}^3)^T\mathbf{x}_3 = -2.46, \qquad y_k^3 = \varphi(v_k^3) = -0.842, \qquad \varphi'(v_k^3) = \tfrac{1}{2}[1 - \varphi^2(v_k^3)] = 0.145$$

$$\mathbf{w}^4 = \mathbf{w}^3 + 0.1\,[d_3 - \varphi(v_k^3)]\,\varphi'(v_k^3)\,\mathbf{x}_3 = \begin{bmatrix} 0.947 \\ -0.929 \\ 0.016 \\ 0.505 \end{bmatrix}$$

Since the desired values are +1 or -1 while the continuous activation can never reach these values exactly, a correction is applied in every step. The algorithm has not converged, so the training samples should be presented again; the sketch below carries out the three steps.
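This minimal NumPy sketch uses the same data, activation function, and learning rate as the example above.

```python
import numpy as np

eta = 0.1
X = [np.array([ 1.0, -2.0,  0.0, -1.0]),
     np.array([ 0.0,  1.5, -0.5, -1.0]),
     np.array([-1.0,  1.0,  0.5, -1.0])]
d = [-1.0, -1.0, 1.0]
w = np.array([1.0, -1.0, 0.0, 0.5])

phi  = lambda v: (1 - np.exp(-v)) / (1 + np.exp(-v))  # bipolar continuous activation
dphi = lambda v: 0.5 * (1 - phi(v) ** 2)              # slope in terms of the output

for x_i, d_i in zip(X, d):
    v = w @ x_i                                       # net activity v_k
    w = w + eta * (d_i - phi(v)) * dphi(v) * x_i      # delta-rule correction
    print(np.round(w, 3))   # reproduces w^2, w^3, w^4 up to rounding
```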

All the learning methods we have seen so far involve a single output neuron k at a time. In what follows, we will also meet a learning method where a layer of output neurons k = 1, 2, ..., p is involved.

4. Hebbian learning

Due to Donald Hebb, in his famous book The Organization of Behavior (1949):

"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."

For more complex kinds of learning, almost every learning model that has been proposed involves both output activity and input activity in the learning rule. The essential idea is that the amount of synaptic change is a function of both pre-synaptic and post-synaptic activity. Built on this idea, Hebbian learning is the oldest and most famous of all learning rules.

Hebb's statement is made in a neurobiological context. We may expand and rephrase it as a two-part rule:

- If the two neurons on either side of a synaptic connection are activated simultaneously (synchronously), then the strength of that synapse is selectively increased.
- If the two neurons on either side of a synaptic connection are activated asynchronously, then the strength of that synapse is selectively decreased.

Hebbian learning can be applied to neurons with binary or continuous activation functions. Putting the above mathematically:

Consider the single neuron k. The net activity of the neuron k is obtained as

$$v_k(n) = \mathbf{w}^T(n)\,\mathbf{x}(n), \qquad y_k(n) = \varphi\big(\mathbf{w}^T(n)\,\mathbf{x}(n)\big)$$

in the case of a logistic (continuous) bipolar activation function, and

$$y_k(n) = \operatorname{sign}(v_k(n)) = \pm 1$$

in the case of a bipolar hard-limiting activation function.

The corresponding weight correction is effected as

$$\Delta\mathbf{w}(n) = f\big(y_k(n), \mathbf{x}(n)\big)$$

where the function f can take a variety of different forms. One such form is

$$\Delta\mathbf{w}(n) = \eta\,y_k(n)\,\mathbf{x}(n), \qquad \mathbf{w}(n+1) = \mathbf{w}(n) + \Delta\mathbf{w}(n)$$

or, componentwise,

$$\Delta w_{kj}(n) = \eta\,y_k(n)\,x_j(n), \qquad w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$$

where j denotes the input node just before (presynaptic to) neuron k.

Example
Consider a single perceptron with the set of input training vectors (samples) and initial weight vector

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \\ 1.5 \\ 0 \end{bmatrix},\quad \mathbf{x}_2 = \begin{bmatrix} 1 \\ -0.5 \\ -2 \\ -1.5 \end{bmatrix},\quad \mathbf{x}_3 = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 1.5 \end{bmatrix};\qquad \mathbf{w}^1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0.5 \end{bmatrix}$$

Assume the learning rate $\eta = 1$ and a nonlinear bipolar hard-limiting activation function with output $\pm 1$.

Step 1

$$v_k = (\mathbf{w}^1)^T\mathbf{x}_1 = \begin{bmatrix} 1 & -1 & 0 & 0.5 \end{bmatrix}\begin{bmatrix} 1 \\ -2 \\ 1.5 \\ 0 \end{bmatrix} = 3$$

The updated weight is

$$\mathbf{w}^2 = \mathbf{w}^1 + \operatorname{sgn}(v_k)\,\mathbf{x}_1 = \mathbf{w}^1 + \mathbf{x}_1 = \begin{bmatrix} 2 \\ -3 \\ 1.5 \\ 0.5 \end{bmatrix}$$

Step 2

$$v_k = (\mathbf{w}^2)^T\mathbf{x}_2 = \begin{bmatrix} 2 & -3 & 1.5 & 0.5 \end{bmatrix}\begin{bmatrix} 1 \\ -0.5 \\ -2 \\ -1.5 \end{bmatrix} = -0.25$$

$$\mathbf{w}^3 = \mathbf{w}^2 + \operatorname{sgn}(v_k)\,\mathbf{x}_2 = \mathbf{w}^2 - \mathbf{x}_2 = \begin{bmatrix} 1 \\ -2.5 \\ 3.5 \\ 2 \end{bmatrix}$$

Step 3

$$v_k = (\mathbf{w}^3)^T\mathbf{x}_3 = -3$$

$$\mathbf{w}^4 = \mathbf{w}^3 + \operatorname{sgn}(v_k)\,\mathbf{x}_3 = \mathbf{w}^3 - \mathbf{x}_3 = \begin{bmatrix} 1 \\ -3.5 \\ 4.5 \\ 0.5 \end{bmatrix}$$
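A minimal NumPy sketch reproducing the three Hebbian steps above with the bipolar hard-limiting activation:

```python
import numpy as np

eta = 1.0
X = [np.array([1.0, -2.0,  1.5,  0.0]),
     np.array([1.0, -0.5, -2.0, -1.5]),
     np.array([0.0,  1.0, -1.0,  1.5])]
w = np.array([1.0, -1.0, 0.0, 0.5])

sgn = lambda v: 1.0 if v >= 0 else -1.0   # bipolar hard limiter

for x_i in X:                 # no teacher: only input and output activity
    v = w @ x_i               # net activity of neuron k
    y = sgn(v)                # output of neuron k
    w = w + eta * y * x_i     # Hebbian correction: delta-w = eta * y * x
    print(v, w)               # reproduces w^2, w^3, w^4
```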

Exercise:
Repeat the same problem using a continuous bipolar activation function instead of the hard-limiting function.

5. Competitive learning

With no available information regarding the desired outputs, unsupervised learning networks update their weights only on the basis of the input patterns. The competitive learning network is a popular scheme for achieving this type of unsupervised data clustering or classification.
Here we have p output neurons. The output of the winning neuron k is set equal to one, and the outputs of all the other neurons are set equal to zero:

$$y_k = \begin{cases} 1 & \text{if } v_k > v_i \text{ for all } i,\ i \neq k \\ 0 & \text{otherwise} \end{cases}$$

The weights connected to neuron k are normalized as

$$\sum_j w_{kj} = 1 \quad \text{for all } k$$

The weight correction is effected as

$$\Delta w_{kj} = \begin{cases} \eta\,(x_j - w_{kj}) & \text{if neuron } k \text{ wins the competition} \\ 0 & \text{if neuron } k \text{ loses the competition} \end{cases}$$

[Figure: a single competitive layer with output neurons O_1, ..., O_k, ..., O_p fully connected to the input nodes.]
Now consider three inputs that fall into the range [0, 1]. One can see that all the activity takes place on the surface of a unit sphere. The rule has the overall effect of moving the synaptic weight vector of the winning neuron towards the input pattern $\mathbf{x}$, so that the weight vector of the winning neuron k finally orients itself towards $\mathbf{x}$.

A more general scheme of competitive learning uses the Euclidean distance as the dissimilarity measure, in which the activation of the output unit k is

$$v_k = \left[\sum_{j=1}^{3}(x_j - w_{kj})^2\right]^{0.5} = \|\mathbf{x} - \mathbf{w}_k\|$$

The weight correction is effected as

$$\Delta w_{kj}(n) = \eta\,[x_j(n) - w_{kj}(n)], \qquad w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$$

and in vector terms

$$\Delta\mathbf{w}_k(n) = \eta\,[\mathbf{x}(n) - \mathbf{w}_k(n)], \qquad \mathbf{w}_k(n+1) = \mathbf{w}_k(n) + \Delta\mathbf{w}_k(n)$$

In this case neither the data nor the weights need be of unit length. A minimal sketch of this scheme follows.
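In this sketch the winner is the neuron whose weight vector lies closest to the input; the number of neurons, input dimension, and learning rate are invented for illustration.

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """One winner-take-all update; W holds one weight row per output neuron."""
    k = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # winner: closest to x
    W[k] += eta * (x - W[k])     # move the winner towards x; losers unchanged
    return k

# illustrative use: three output neurons self-organize on random 2-D inputs
rng = np.random.default_rng(0)
W = rng.random((3, 2))           # initial weight vectors
for _ in range(500):
    competitive_step(W, rng.random(2))
print(W)                         # weight vectors settle near cluster centers
```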

Exercise

The weight correction proposed in the competitive learning method may be viewed as a steepest-descent step that minimizes some cost function. What is that cost function?
