
2.1 The Process of Learning


2.1.1 Learning Tasks
The learning algorithm for a neural network depends on the learning task the network is to perform. Such learning tasks include:

- Pattern association
- Pattern recognition
- Function approximation
- Filtering
- Beamforming
- Identification and control

2.2 Learning Methods


2.2.1 Supervised Learning
- This is learning with a teacher.
- The environment is unknown to the neural network.
- Conceptually, the teacher has knowledge of the environment, represented by a set of input-output examples.
- These input-output examples provide the samples for training.
- Suppose we have a set of input signals (input vectors) from the environment and the teacher is capable of supplying the desired response for each of them; together these form a training set.
- So we have an input matrix P and an output matrix T as the training set.
- For a particular input, the network will in general give an output that differs from the desired output given by the teacher.
- So there is an error between the actual and the desired response.
- This error is then used to correct the free parameters of the network.
- The correction continues until the actual output matches the desired output to within a tolerance limit.
- Thus, using the matrices P and T, we train the neural network, and the network becomes adapted to the environment.
- If we now give an arbitrary input (vector) to the network, the network will supply the required response (vector).
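The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the single linear neuron, the function name `train_supervised`, the learning rate, the tolerance, and the sample data are all hypothetical and are not part of the notes.

```python
# Minimal sketch of supervised (error-correction) training on a set (P, T).
# Assumptions: one linear neuron, hypothetical data, eta and tol chosen freely.

def train_supervised(P, T, eta=0.1, tol=1e-3, max_epochs=1000):
    """Adjust the weights until every error in an epoch is below `tol`."""
    w = [0.0] * len(P[0])                      # free parameters of the network
    for epoch in range(max_epochs):
        worst = 0.0
        for x, d in zip(P, T):
            y = sum(wi * xi for wi, xi in zip(w, x))       # actual response
            e = d - y                                       # error w.r.t. the teacher
            w = [wi + eta * e * xi for wi, xi in zip(w, x)]  # correct the weights
            worst = max(worst, abs(e))
        if worst < tol:                        # within the tolerance limit: stop
            return w, epoch + 1
    return w, max_epochs

# Hypothetical training set: the teacher's rule is d = 2*x1 - x2.
P = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
T = [2.0, -1.0, 1.0]
w, epochs = train_supervised(P, T)
print(w, epochs)
```

After training, the weights are close to the teacher's rule, so an input that was not in P (for example [2, 1]) would also receive approximately the required response.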
2.2.2 Unsupervised Learning
- This is learning without a teacher to oversee the learning process.
- It is self-organized learning.
- It attempts to develop a network on the basis of the given sample data.
- One popular, efficient, and intuitive approach is clustering.
- Clustering is nothing but mode separation or class separation.
- The objective is to design a mechanism that clusters the given sample data.
- This can be achieved by computing a similarity measure between samples.
- On many occasions the data fall into easily observable groups and the task is simple, but on some occasions this is not the case.
- To perform unsupervised learning we may use a competitive learning rule.
- For example, consider a neural network consisting of input nodes and a competitive layer.
- What is used here is a task-independent measure of the quality of what the network is required to learn.
- The free parameters of the network are adapted, and finally optimized, with respect to this task-independent measure.
- Fundamentally, unsupervised learning algorithms (or laws) may be characterised by first-order differential equations.
- These equations describe how the network's free parameters evolve (adjust) over time (or over iterations, in the discrete case).
- Some sort of pattern associability (similarity) is used to guide the learning process.
- Such an operation leads to network correlation, clustering, or competitive behavior.
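As a small illustration of clustering by similarity, the sketch below assigns each sample to the most similar of a set of cluster centres. Euclidean distance as the similarity measure, the function name `nearest_cluster`, the sample points, and the fixed centres are all assumptions made for the example.

```python
import math

def nearest_cluster(sample, centres):
    """Index of the centre closest to `sample` in Euclidean distance."""
    distances = [math.dist(sample, c) for c in centres]
    return distances.index(min(distances))

# Hypothetical 2-D samples that fall into two easily observable groups,
# and two assumed cluster centres.
samples = [(0.1, 0.2), (0.0, -0.1), (2.9, 3.1), (3.2, 2.8)]
centres = [(0.0, 0.0), (3.0, 3.0)]
labels = [nearest_cluster(s, centres) for s in samples]
print(labels)
```

In a learning setting the centres would not be fixed in advance; they would be adapted from the data, for instance by a competitive learning rule.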

2.3 Some Supervised / Unsupervised Learning Rules


1. Perceptron learning rule
2. Widrow-Hoff learning rule
3. Delta learning rule
4. Hebbian learning
5. Competitive learning
1. Rosenblatt's perceptron learning rule
The learning signal is the difference between the desired and the actual response, so this learning is supervised. This type of learning can be applied only if the neuron response is binary (0 or 1) or bipolar (-1 or +1). The weight adjustment in this method is obtained as

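$$\Delta w_{kj}(n) = \eta\,[d_k - \operatorname{sgn}(v_k(n))]\,x_j$$

$$\operatorname{sgn}(v_k(n)) = \begin{cases} +1 & \text{if } \mathbf{w}^T\mathbf{x} \ge 0 \\ -1 & \text{if } \mathbf{w}^T\mathbf{x} < 0 \end{cases}$$

$$w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$$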

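[Figure: perceptron neuron k with inputs x_1, x_2, ..., x_m and a fixed input -1, synaptic weights w_kj, net activity v_k, activation φ(v_k), output y_k, desired response d_k, and error e_k]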
Here $n = 1, 2, \ldots$ is the iteration number, $x_j$, $j = 1, 2, \ldots, m$, is the input, $\eta$ is the learning rate parameter, $v_k(n)$ is the net activity of neuron k, $\varphi(v_k(n)) = y_k(n)$ is the output of neuron k, $d_k$ is the desired response, $e_k(n)$ is the error between the desired response and the output of neuron k, and $\Delta w_{kj}(n)$ is the correction applied to the synaptic weight between neuron k and input node $j = 1, 2, \ldots, m$. No weight correction is applied when the actual response equals the desired response.
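A single application of the perceptron rule can be sketched as follows. The function names and the numerical values are hypothetical, and returning +1 when the net activity is exactly zero is an assumed convention.

```python
def sgn(v):
    """Bipolar hard limiter; returning +1 at v == 0 is an assumed convention."""
    return 1 if v >= 0 else -1

def perceptron_update(w, x, d, eta):
    """Return the weights after one perceptron correction for sample (x, d)."""
    v = sum(wi * xi for wi, xi in zip(w, x))   # net activity v_k
    e = d - sgn(v)                             # learning signal: 0 or +/-2
    return [wi + eta * e * xi for wi, xi in zip(w, x)]

# Hypothetical values, chosen only to show both cases.
w = [0.5, -0.5]
x = [1.0, 1.0]
corrected = perceptron_update(w, x, d=-1, eta=0.25)  # response +1, desired -1
unchanged = perceptron_update(w, x, d=1, eta=0.25)   # response already correct
print(corrected, unchanged)
```

Because the learning signal is either 0 or ±2, each correction moves the weight vector by exactly ±2η times the input vector, or not at all.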

Example
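Consider a single perceptron with the following set of input training vectors (samples) and initial weight vector:

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \\ 0 \\ -1 \end{bmatrix},\quad \mathbf{x}_2 = \begin{bmatrix} 0 \\ 1.5 \\ -0.5 \\ -1 \end{bmatrix},\quad \mathbf{x}_3 = \begin{bmatrix} -1 \\ 1 \\ 0.5 \\ -1 \end{bmatrix};\quad \mathbf{w}(1) = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0.5 \end{bmatrix}$$

Let the learning rate parameter be $\eta = 0.1$. The teacher's desired responses for $\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{x}_3$ are $d_1 = -1$, $d_2 = -1$, and $d_3 = 1$, respectively. Learning according to the perceptron learning rule proceeds as follows: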
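Step 1: The input is $\mathbf{x}_1$ and the desired response is $d_1 = -1$.

$$v_k(1) = \mathbf{w}^T(1)\,\mathbf{x}_1 = \begin{bmatrix} 1 & -1 & 0 & 0.5 \end{bmatrix} \begin{bmatrix} 1 \\ -2 \\ 0 \\ -1 \end{bmatrix} = 2.5$$

Since $\operatorname{sgn}(v_k(1)) = 1 \ne d_1$, a correction is applied:

$$\mathbf{w}(2) = \mathbf{w}(1) + 0.1\,(-1 - 1)\,\mathbf{x}_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0.5 \end{bmatrix} - 0.2 \begin{bmatrix} 1 \\ -2 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.8 \\ -0.6 \\ 0 \\ 0.7 \end{bmatrix}$$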

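Step 2: The input is $\mathbf{x}_2$ and the desired response is $d_2 = -1$.

$$v_k(2) = \mathbf{w}^T(2)\,\mathbf{x}_2 = \begin{bmatrix} 0.8 & -0.6 & 0 & 0.7 \end{bmatrix} \begin{bmatrix} 0 \\ 1.5 \\ -0.5 \\ -1 \end{bmatrix} = -1.6$$

No correction is performed in this step because $d_2 = \operatorname{sgn}(v_k(2)) = -1$; hence $\mathbf{w}(3) = \mathbf{w}(2)$.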

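Step 3: The input is $\mathbf{x}_3$ and the desired response is $d_3 = 1$.

$$v_k(3) = \mathbf{w}^T(3)\,\mathbf{x}_3 = \begin{bmatrix} 0.8 & -0.6 & 0 & 0.7 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ 0.5 \\ -1 \end{bmatrix} = -2.1$$

Since $\operatorname{sgn}(v_k(3)) = -1 \ne d_3$, a correction is applied:

$$\mathbf{w}(4) = \mathbf{w}(3) + 0.1\,(1 + 1)\,\mathbf{x}_3 = \begin{bmatrix} 0.8 \\ -0.6 \\ 0 \\ 0.7 \end{bmatrix} + 0.2 \begin{bmatrix} -1 \\ 1 \\ 0.5 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.6 \\ -0.4 \\ 0.1 \\ 0.5 \end{bmatrix}$$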

This completes one epoch of training. The training examples are now presented to the network again. As an exercise, carry out the second epoch and comment on the result obtained.
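The first epoch of this example can be checked numerically with a short script. The data are those of the example (x1 = [1, -2, 0, -1], x2 = [0, 1.5, -0.5, -1], x3 = [-1, 1, 0.5, -1], w(1) = [1, -1, 0, 0.5], desired responses -1, -1, 1); the helper names and the convention sgn(0) = +1 are assumptions of this sketch.

```python
def sgn(v):
    """Bipolar hard limiter; the value at v == 0 is an assumed convention."""
    return 1 if v >= 0 else -1

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

X = [[1, -2, 0, -1], [0, 1.5, -0.5, -1], [-1, 1, 0.5, -1]]  # training vectors
D = [-1, -1, 1]                                             # desired responses
eta = 0.1
w = [1, -1, 0, 0.5]                                         # initial weights w(1)

history = []
for x, d in zip(X, D):
    v = dot(w, x)                                           # net activity
    w = [wi + eta * (d - sgn(v)) * xi for wi, xi in zip(w, x)]
    history.append((round(v, 3), [round(wi, 3) for wi in w]))

for step, (v, w_next) in enumerate(history, start=1):
    print(f"step {step}: v = {v}, next w = {w_next}")
```

Wrapping the `for x, d in zip(X, D)` loop in an outer loop over epochs is one way to carry out the exercise of presenting the samples again.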

2. Widrow-Hoff learning rule


Here the neurons are assumed to have linear activation functions, characterized by

$$y_k(n) = \varphi(v_k(n)) = v_k(n)$$

The correction to the weights at each time step n is obtained as

$$\Delta w_{kj}(n) = \eta\,e_k(n)\,x_j(n)$$

$$w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$$
Remarks:
- This is learning with a teacher.
- The output of neuron k should be directly available so that the desired response can be supplied.
- The correction applied to the synaptic weight is proportional to the product of the error signal and the input signal.
- $w_{kj}(n)$ and $w_{kj}(n+1)$ may be viewed as the past and present values of the synaptic weight $w_{kj}$. In computational terms we may write

$$w_{kj}(n) = z^{-1}[w_{kj}(n+1)]$$

  where $z^{-1}$ is the unit delay operator and represents a storage element. We see that error-correction learning is a closed-loop control system.
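A minimal sketch of the Widrow-Hoff update for a single linear neuron is given below. The function name `lms_update` and the sample values are hypothetical; the same sample is presented three times to show the error shrinking.

```python
def lms_update(w, x, d, eta):
    """One Widrow-Hoff step for a linear neuron; returns (new weights, error)."""
    y = sum(wi * xi for wi, xi in zip(w, x))          # linear activation: y = v
    e = d - y                                         # error signal e_k(n)
    w_next = [wi + eta * e * xi for wi, xi in zip(w, x)]
    return w_next, e

# Hypothetical sample, presented repeatedly.
w = [0.0, 0.0]
x = [1.0, 2.0]
d = 1.0
errors = []
for _ in range(3):
    w, e = lms_update(w, x, d, eta=0.1)
    errors.append(e)
print(errors, w)
```

For this one-sample case each step multiplies the error by 1 - η‖x‖², which is 0.5 here, so the error halves at every presentation.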
3. The delta learning rule
The delta learning rule is also built around a single neuron and is valid only for continuous (differentiable) activation functions. The rule is obtained by minimizing a cost function, or performance index. Since we are interested in error-correction learning, this performance index can take the form

$$\mathcal{E}(\mathbf{w}) = \tfrac{1}{2}\,e_k^2 = \tfrac{1}{2}\,(d_k - y_k)^2 = \tfrac{1}{2}\,(d_k - \varphi(v_k))^2$$
2
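where $\mathbf{w} = [w_{kj}]$. The cost function $\mathcal{E}(\mathbf{w})$ denotes the instantaneous error energy, which can be used to make the necessary changes in the synaptic weights. This is clearly error-correction learning. By the steepest descent algorithm, minimizing the error requires the weight changes to be in the direction of the negative gradient, so we take

$$\Delta\mathbf{w} = -\eta\,\nabla\mathcal{E}(\mathbf{w})$$

where, for a particular k, the gradient operator and the gradient vector are

$$\nabla = \left[ \frac{\partial}{\partial w_{k1}},\ \frac{\partial}{\partial w_{k2}},\ \ldots,\ \frac{\partial}{\partial w_{km}} \right]^T
\quad\text{and}\quad
\nabla\mathcal{E}(\mathbf{w}) = \left[ \frac{\partial\mathcal{E}}{\partial w_{k1}},\ \frac{\partial\mathcal{E}}{\partial w_{k2}},\ \ldots,\ \frac{\partial\mathcal{E}}{\partial w_{km}} \right]^T$$

The components of the gradient vector are

$$\frac{\partial\mathcal{E}(\mathbf{w})}{\partial w_{kj}} = -(d_k - \varphi(v_k))\,\varphi'(v_k)\,x_j$$

Since minimizing the error requires the changes in weight to be in the negative gradient direction, we have

$$\Delta w_{kj} = \eta\,(d_k - \varphi(v_k))\,\varphi'(v_k)\,x_j = \eta\,e_k\,\varphi'(v_k)\,x_j$$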
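The gradient component used above follows from the chain rule. The intermediate step can be written out as below, assuming the net activity is the usual weighted sum $v_k = \sum_j w_{kj} x_j$ and writing the cost function as $\mathcal{E}$.

```latex
\frac{\partial \mathcal{E}}{\partial w_{kj}}
  = \frac{\partial \mathcal{E}}{\partial e_k}\,
    \frac{\partial e_k}{\partial y_k}\,
    \frac{\partial y_k}{\partial v_k}\,
    \frac{\partial v_k}{\partial w_{kj}}
  = e_k \cdot (-1) \cdot \varphi'(v_k) \cdot x_j
  = -\,(d_k - \varphi(v_k))\,\varphi'(v_k)\,x_j
```

Multiplying by $-\eta$ gives the weight change $\Delta w_{kj} = \eta\,e_k\,\varphi'(v_k)\,x_j$.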

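Example:
Consider the following set of input training vectors and initial weight vector:

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \\ 0 \\ -1 \end{bmatrix},\quad \mathbf{x}_2 = \begin{bmatrix} 0 \\ 1.5 \\ -0.5 \\ -1 \end{bmatrix},\quad \mathbf{x}_3 = \begin{bmatrix} -1 \\ 1 \\ 0.5 \\ -1 \end{bmatrix};\quad \mathbf{w}^1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0.5 \end{bmatrix}$$

Let the learning rate parameter be $\eta = 0.1$. The desired responses for the three inputs are $d_1 = -1$, $d_2 = -1$, and $d_3 = 1$, respectively.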

Let the continuous bipolar activation function be

$$\varphi(v_k) = \frac{1 - e^{-v_k}}{1 + e^{-v_k}}$$

$$\varphi'(v_k) = \frac{2\,e^{-v_k}}{(1 + e^{-v_k})^2} = \tfrac{1}{2}\,(1 - \varphi^2(v_k))$$

Such an activation function is continuous and bipolar. Here the slope of the activation function is expressed in terms of the output signal of the neuron. For the given learning rate parameter, the delta rule training can be summarized as follows:

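Step 1: Present the first input sample $\mathbf{x}_1$ with the initial weight vector $\mathbf{w}^1$, yielding

$$v_k^1 = (\mathbf{w}^1)^T \mathbf{x}_1 = 2.5$$

$$y_k^1 = \varphi(v_k^1) = 0.848$$

$$\varphi'(v_k^1) = \tfrac{1}{2}\,[1 - \varphi^2(v_k^1)] = 0.140$$

$$\mathbf{w}^2 = \mathbf{w}^1 + 0.1\,[d_1 - \varphi(v_k^1)]\,\varphi'(v_k^1)\,\mathbf{x}_1 = \begin{bmatrix} 0.974 \\ -0.948 \\ 0 \\ 0.526 \end{bmatrix}$$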

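Step 2: Present the second input sample $\mathbf{x}_2$ with the weight vector $\mathbf{w}^2$, yielding

$$v_k^2 = (\mathbf{w}^2)^T \mathbf{x}_2 = -1.948$$

$$y_k^2 = \varphi(v_k^2) = -0.75$$

$$\varphi'(v_k^2) = \tfrac{1}{2}\,[1 - \varphi^2(v_k^2)] = 0.218$$

$$\mathbf{w}^3 = \mathbf{w}^2 + 0.1\,[d_2 - \varphi(v_k^2)]\,\varphi'(v_k^2)\,\mathbf{x}_2 = \begin{bmatrix} 0.974 \\ -0.956 \\ 0.002 \\ 0.531 \end{bmatrix}$$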

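Step 3: Present the third input sample $\mathbf{x}_3$ with the weight vector $\mathbf{w}^3$, yielding

$$v_k^3 = (\mathbf{w}^3)^T \mathbf{x}_3 = -2.46$$

$$y_k^3 = \varphi(v_k^3) = -0.842$$

$$\varphi'(v_k^3) = \tfrac{1}{2}\,[1 - \varphi^2(v_k^3)] = 0.145$$

$$\mathbf{w}^4 = \mathbf{w}^3 + 0.1\,[d_3 - \varphi(v_k^3)]\,\varphi'(v_k^3)\,\mathbf{x}_3 = \begin{bmatrix} 0.947 \\ -0.929 \\ 0.016 \\ 0.505 \end{bmatrix}$$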

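Since the desired values are 1 or -1, and the continuous activation function never reaches these values exactly, the error is never zero and a correction is applied in every step. Since the algorithm has not yet converged, the training samples should be presented again.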
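The three delta-rule steps can be reproduced with the sketch below. The data are those of the example (x1 = [1, -2, 0, -1], x2 = [0, 1.5, -0.5, -1], x3 = [-1, 1, 0.5, -1], initial weights [1, -1, 0, 0.5], desired responses -1, -1, 1, learning rate 0.1); the helper names are assumptions. Because the script carries full floating-point precision between steps, its results can differ from the rounded hand calculation by about 0.001 in the last digit.

```python
import math

def phi(v):
    """Continuous bipolar activation (1 - e^-v) / (1 + e^-v)."""
    return (1 - math.exp(-v)) / (1 + math.exp(-v))

def slope_from_output(y):
    """Derivative of phi expressed through the output: 0.5 * (1 - y^2)."""
    return 0.5 * (1 - y * y)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

X = [[1, -2, 0, -1], [0, 1.5, -0.5, -1], [-1, 1, 0.5, -1]]
D = [-1, -1, 1]
eta = 0.1
w = [1, -1, 0, 0.5]

trace = []
for x, d in zip(X, D):
    v = dot(w, x)                       # net activity
    y = phi(v)                          # neuron output
    s = slope_from_output(y)            # phi'(v)
    w = [wi + eta * (d - y) * s * xi for wi, xi in zip(w, x)]
    trace.append((v, y, s, list(w)))

for step, (v, y, s, w_next) in enumerate(trace, start=1):
    print(step, round(v, 3), round(y, 3), round(s, 3),
          [round(wi, 3) for wi in w_next])
```

Unlike the perceptron rule, the weights change at every step, including step 2 where the sign of the output was already correct.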

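Example:
Consider a single perceptron with the following set of input training vectors (samples) and initial weight vector:

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ 2 \\ 1.5 \\ 0 \end{bmatrix},\quad \mathbf{x}_2 = \begin{bmatrix} 1 \\ 0.5 \\ 2 \\ 1.5 \end{bmatrix},\quad \mathbf{x}_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1.5 \end{bmatrix};\quad \mathbf{w}(1) = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0.5 \end{bmatrix}$$

Assume the learning rate $\eta = 0.5$ and a nonlinear activation given by the bipolar hyperbolic tangent function,

$$\varphi(v) = \tanh\!\left(\frac{a v}{2}\right) = \frac{1 - e^{-a v}}{1 + e^{-a v}} = \frac{2}{1 + e^{-a v}} - 1$$

This bipolar continuous activation function lies between -1 and +1, and it approaches the hard-limiting signum function as $a \to \infty$.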
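The equivalence of the hyperbolic tangent form and the exponential form, and the effect of the slope parameter a, can be checked numerically. The function names and the test points below are assumptions of this sketch.

```python
import math

def phi_tanh(v, a=1.0):
    """Bipolar activation written as tanh(a*v/2)."""
    return math.tanh(a * v / 2)

def phi_exp(v, a=1.0):
    """The same activation written with exponentials."""
    return (1 - math.exp(-a * v)) / (1 + math.exp(-a * v))

# Hypothetical test points: both forms should agree at each of them.
points = [-2.0, -0.5, 0.0, 0.5, 2.0]
agree = all(math.isclose(phi_tanh(v, 3.0), phi_exp(v, 3.0), abs_tol=1e-12)
            for v in points)

gentle = phi_tanh(0.5, a=1.0)    # small slope parameter
steep = phi_tanh(0.5, a=10.0)    # large slope parameter, closer to +1
print(agree, gentle, steep)
```

With a larger value of a, the output for the same positive input moves closer to +1, which is the sense in which the function approaches a hard limiter.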


4. Hebbian learning
In his famous book The Organization of Behavior (1949), Donald Hebb wrote:

"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."

This statement is made in a neurobiological context. For more complex kinds of learning, almost every learning model that has been proposed involves both output activity and input activity in the learning rule. The essential idea is that the amount of synaptic change is a function of both pre-synaptic and post-synaptic activity. Hebbian learning is the oldest and most famous of all learning rules.

We may expand and rephrase Hebb's statement as a two-part rule:

- If two neurons on either side of a synaptic connection are activated simultaneously (synchronously), then the strength of the synapse is selectively increased.
- If two neurons on either side of a synaptic connection are not activated simultaneously (asynchronously), then the strength of the synapse is selectively decreased.

Hebbian learning can be applied to neurons with binary or continuous activation functions. Putting the above mathematically:
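Consider a single neuron k. The net activity and the output of neuron k are obtained as

$$v_k(n) = \mathbf{w}^T(n)\,\mathbf{x}(n)$$

$$y_k(n) = \varphi(\mathbf{w}^T(n)\,\mathbf{x}(n))$$

The corresponding weight change is given by

$$\Delta\mathbf{w}(n) = f(y_k(n), \mathbf{x}(n))$$

Here the function f can take a variety of different forms. One such form is

$$\Delta\mathbf{w}(n) = \eta\,y_k(n)\,\mathbf{x}(n)$$

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \Delta\mathbf{w}(n)$$

or, component by component,

$$\Delta w_{kj}(n) = \eta\,y_k(n)\,x_j(n)$$

$$w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$$

where j denotes the neuron just before neuron k.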
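One step of this simplest Hebbian form can be sketched as follows. The function name `hebbian_update`, the choice of a bipolar hard limiter as activation, and the numerical values are hypothetical.

```python
def sgn(v):
    """Bipolar hard limiter, used here as an assumed activation function."""
    return 1 if v >= 0 else -1

def hebbian_update(w, x, eta, activation):
    """One Hebbian step: delta_w = eta * y_k * x. Returns (new weights, output)."""
    v = sum(wi * xi for wi, xi in zip(w, x))     # net activity
    y = activation(v)                            # post-synaptic output y_k
    w_next = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w_next, y

# Hypothetical weights and input.
w = [1.0, -1.0]
x = [1.0, 0.5]
w_next, y = hebbian_update(w, x, eta=1.0, activation=sgn)
print(w_next, y)
```

No desired response appears anywhere in the update: the change depends only on the input and on the neuron's own output.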

[Figure: pre-synaptic neuron j with signal x_j connected through the synaptic weight w_kj to post-synaptic neuron k with output y_k]

$w_{kj}$ is the synaptic weight between the pre-synaptic and post-synaptic neurons.
The following remarks about Hebbian learning are in order.
- It is the most natural of all types of learning.
- There is strong psychological evidence for Hebbian learning.
- Hebbian learning (memory) takes place in the area of the brain called the hippocampus.
- All types of learning (memory) can be classified as Hebbian, anti-Hebbian (both Hebbian in nature), or non-Hebbian (other types of supervised learning).
- It is unsupervised learning, but the output of the post-synaptic neuron is available in a neurobiological sense, or it can be computed from the mathematical expressions formulated for the problem at hand.
- This learning is localized in nature, since only two neurons are involved, and it refers to short-term memory. As already mentioned, it takes place in the area of the brain called the hippocampus, as shown in the figure above, which connects the left and right sides of the memory. In due course it turns into long-term memory, if required.
- In Alzheimer's disease, the hippocampus is one of the first regions of the brain to suffer damage; memory problems and disorientation appear among the first symptoms.

Another example of the function f is

$$\Delta w_{kj}(n) = \eta\,(x_j(n) - \bar{x}(n))\,(y_k(n) - \bar{y}(n))$$

where $\bar{x}(n)$ and $\bar{y}(n)$ are the time-dependent averages of the corresponding variables.
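The sketch below evaluates this second form for two hypothetical situations. The function name `covariance_update`, the averages, and the signal values are assumptions chosen for illustration.

```python
def covariance_update(x_j, y_k, x_bar, y_bar, eta):
    """Weight change eta * (x_j - x_bar) * (y_k - y_bar)."""
    return eta * (x_j - x_bar) * (y_k - y_bar)

# Both signals above their averages: the weight is strengthened.
both_above = covariance_update(1.0, 0.9, x_bar=0.5, y_bar=0.6, eta=0.5)

# Input above its average, output below its average: the weight is weakened.
mixed = covariance_update(1.0, 0.2, x_bar=0.5, y_bar=0.6, eta=0.5)
print(both_above, mixed)
```

Unlike the plain product form, this form can decrease a weight even when both signals are positive, because the change depends on deviations from the averages.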


Exercise: A generalized Hebbian rule is described by

$$\Delta w_{kj}(n) = f(y_k(n), \mathbf{x}(n)) = \alpha\,f(y_k(n))\,g(x_j(n)) - \beta\,w_{kj}(n)\,f(y_k(n))$$

where $\alpha$ and $\beta$ are positive constants and f and g are functions of their respective arguments.
Obtain the following:

(i) a plot of $\Delta w_{kj}(n)$ against $g(x_j(n))$
(ii) the balance point, where $\Delta w_{kj}(n) = 0$
(iii) the maximum depression, where $\Delta w_{kj}(n)$ is minimum

[Figure: plot of Δw_kj(n) against g(x_j(n)), marking the balance point and the maximum depression, which is proportional to -w_kj(n) f(y_k(n))]

Method of steepest descent

Consider a cost function $\mathcal{E}(\mathbf{w})$ of some unknown weight vector $\mathbf{w}$. The function $\mathcal{E}(\mathbf{w})$ maps $\mathbf{w}$ into the real numbers and is assumed to be continuously differentiable with respect to $\mathbf{w}$. The problem is to find the optimal weight vector $\mathbf{w}^*$ such that $\mathcal{E}(\mathbf{w}^*) \le \mathcal{E}(\mathbf{w})$. This is an unconstrained optimization problem, which can be stated as follows:

Minimize the cost function $\mathcal{E}(\mathbf{w})$ with respect to the weight vector $\mathbf{w}$.

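In this method the correction to the weights is applied in the direction of steepest descent, that is, in the direction opposite to the gradient vector $\nabla\mathcal{E}(\mathbf{w})$, where

$$\nabla = \left[ \frac{\partial}{\partial w_1},\ \frac{\partial}{\partial w_2},\ \ldots,\ \frac{\partial}{\partial w_m} \right]^T$$

$$\nabla\mathcal{E}(\mathbf{w}) = \left[ \frac{\partial\mathcal{E}}{\partial w_1},\ \frac{\partial\mathcal{E}}{\partial w_2},\ \ldots,\ \frac{\partial\mathcal{E}}{\partial w_m} \right]^T$$

The weight correction is then applied as

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \eta\,\nabla\mathcal{E}(\mathbf{w})$$

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \Delta\mathbf{w}(n)$$

$$\Delta\mathbf{w}(n) = -\eta\,\nabla\mathcal{E}(\mathbf{w})$$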

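Using the first-order Taylor series expansion around $\mathbf{w}(n)$ to approximate $\mathcal{E}(\mathbf{w}(n+1))$, we obtain

$$\mathcal{E}(\mathbf{w}(n+1)) \approx \mathcal{E}(\mathbf{w}(n)) + (\nabla\mathcal{E}(\mathbf{w}(n)))^T\,\Delta\mathbf{w}(n)$$

$$= \mathcal{E}(\mathbf{w}(n)) - \eta\,(\nabla\mathcal{E}(\mathbf{w}(n)))^T\,\nabla\mathcal{E}(\mathbf{w}(n))$$

$$= \mathcal{E}(\mathbf{w}(n)) - \eta\,\lVert\nabla\mathcal{E}(\mathbf{w}(n))\rVert^2$$

Thus we see that $\mathcal{E}(\mathbf{w}(n+1)) < \mathcal{E}(\mathbf{w}(n))$ whenever the gradient is nonzero and $\eta$ is small enough for the approximation to hold; that is, the performance index decreases iteration after iteration.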


For a suitable learning rate, the algorithm finally converges to the optimal solution $\mathbf{w}^*$. The convergence behavior depends on the learning rate parameter $\eta$. The following points are worth noting:
- When $\eta$ is small, the transient response of the algorithm is overdamped, and the trajectory traced by $\mathbf{w}(n)$ follows a smooth but slow path in the w-plane.
- When $\eta$ is large, the transient response of the algorithm is underdamped, and the trajectory traced by $\mathbf{w}(n)$ follows a fast but oscillatory path in the w-plane.
- When $\eta$ exceeds a critical value, the algorithm becomes unstable.
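The three regimes can be seen on a very simple cost function. The sketch below assumes the one-dimensional cost E(w) = w²/2, whose gradient is w, so each step multiplies w by (1 - η) and the critical learning rate for this particular cost is 2. The function name and the three learning rates are hypothetical choices.

```python
def descend(eta, w0=1.0, steps=20):
    """Steepest descent on the assumed cost E(w) = 0.5 * w**2 (gradient = w)."""
    w = w0
    path = [w]
    for _ in range(steps):
        w = w - eta * w          # w(n+1) = w(n) - eta * gradient
        path.append(w)
    return path

slow = descend(0.1)         # small eta: smooth but slow (overdamped)
oscillating = descend(1.8)  # large eta: sign flips each step (underdamped)
unstable = descend(2.5)     # above the critical value: |w| grows

print(slow[-1], oscillating[-1], unstable[-1])
```

With η = 0.1 the iterate stays positive and shrinks slowly, with η = 1.8 it alternates in sign while shrinking, and with η = 2.5 it alternates in sign while growing without bound.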

5. Competitive learning

Here we have p output neurons. The output of the winning neuron k is set equal to one, and the outputs of all the other neurons are set equal to zero:

$$y_k = \begin{cases} 1 & \text{if } v_k > v_i \text{ for all } i,\ i \ne k \\ 0 & \text{otherwise} \end{cases}$$

The weights connected to neuron k are normalized as

$$\sum_j w_{kj} = 1 \quad \text{for all } k$$

The weight correction is applied as

$$\Delta w_{kj} = \begin{cases} \eta\,(x_j - w_{kj}) & \text{if neuron } k \text{ wins the competition} \\ 0 & \text{if neuron } k \text{ loses the competition} \end{cases}$$

[Figure: competitive layer with output neurons O_1, ..., O_k, ..., O_p]

The rule has the overall effect of moving the synaptic weight vector of the winning neuron towards the input pattern $\mathbf{x}$. The final result is that the weight vector of the winning neuron k orients itself towards the input pattern $\mathbf{x}$.
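One competitive learning step can be sketched as below. The function name `competitive_step`, the two-neuron weight matrix, the input, and the learning rate are hypothetical; each row of the weight matrix holds the weights of one output neuron and sums to one.

```python
def competitive_step(W, x, eta):
    """Find the winning neuron and move only its weights towards x."""
    v = [sum(wkj * xj for wkj, xj in zip(row, x)) for row in W]  # net activities
    k = v.index(max(v))                                          # winner
    W_next = [list(row) for row in W]                            # losers unchanged
    W_next[k] = [wkj + eta * (xj - wkj) for wkj, xj in zip(W[k], x)]
    return W_next, k

# Hypothetical weights (rows sum to 1) and input pattern.
W = [[0.8, 0.2],
     [0.3, 0.7]]
x = [1.0, 0.0]
W_next, winner = competitive_step(W, x, eta=0.5)
print(winner, W_next)
```

In this example the input components also sum to one, so the winner's updated weights still sum to one; for general inputs an explicit renormalization step would be needed to keep the constraint.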
Example: Consider the delta learning rule and Hebb's rule, whose weight updates are given by

$$\Delta w_{ji} = \eta\,e_j\,x_i \quad\text{and}\quad \Delta w_{ji} = \eta\,y_j\,x_i$$

respectively. Distinguish between them.

Ans: (i) Both rules involve multiplication by the term $\eta\,x_i$.
(ii) The error $e_j$ of neuron j in the delta rule is replaced by the output $y_j$ of neuron j in Hebb's rule.
(iii) The delta rule requires a desired response, whereas Hebb's rule does not.
