
MODULE-IV

STATISTICAL METHODS IN ANN

Module 4 Statistical Methods: Boltzmann's Training - Cauchy training - Artificial specific heat methods - applications to general non-linear optimization problems

Statistical methods are used for:

- Training an ANN
- Producing output from a trained network


Training Methods

- Deterministic training methods
- Statistical training methods

Deterministic Training Method

Follows a step-by-step procedure. Weights are changed based on their current values, and also based on the desired output and the actual output.

E.g.: the perceptron training algorithm, the back-propagation algorithm, etc.
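As a minimal sketch of what "deterministic" means here, the classic perceptron update changes each weight by a fixed formula of the current weights, the desired output and the actual output, with no randomness involved. (The function name, learning rate and AND-gate data below are illustrative, not from the source.)

```python
def perceptron_step(weights, bias, x, target, lr=0.1):
    """One deterministic perceptron update: the weight change is a fixed
    function of the current weights, the desired output and the actual output."""
    activation = sum(wi * xi for wi, xi in zip(weights, x)) + bias
    output = 1 if activation > 0 else 0
    error = target - output                      # desired output - actual output
    new_w = [wi + lr * error * xi for wi, xi in zip(weights, x)]
    new_b = bias + lr * error
    return new_w, new_b

# learning AND: repeating the same deterministic step converges, no randomness
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = [0.0, 0.0], 0.0
for _ in range(20):
    for x, t in data:
        w, b = perceptron_step(w, b, x, t)
```

Running the same data through the trained perceptron reproduces the AND truth table, illustrating that the procedure is fully repeatable from step to step.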

Statistical Training Methods

Make pseudo-random changes in the weights and retain only those changes which result in improvements.

GENERAL PROCEDURE (FOR STATISTICAL TRAINING METHODS)

1. Apply a set of inputs and compute the resulting output.
2. Compare the result with the target and find the error. The objective of training is to minimize this error.
3. Select a weight at random and adjust it by a small random amount.
4. If the adjustment improves the objective, retain the change; otherwise return the weight to its previous value.
5. Repeat the procedure until the network is trained to the desired level.
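The procedure above can be sketched in a few lines of Python. This is a hedged illustration, not code from the source: the function name, the step size, and the toy squared-error objective are all assumptions made for the demo.

```python
import random

def statistical_train(weights, loss, step=0.1, max_iters=10000, tol=1e-4):
    """General statistical procedure: perturb one randomly chosen weight
    at a time and keep only the changes that reduce the loss."""
    best = loss(weights)
    for _ in range(max_iters):
        if best < tol:                             # trained to the desired level
            break
        i = random.randrange(len(weights))         # select a weight at random
        old = weights[i]
        weights[i] += random.uniform(-step, step)  # small random adjustment
        new = loss(weights)
        if new < best:
            best = new                             # improvement: retain the change
        else:
            weights[i] = old                       # otherwise restore previous value
    return weights, best

# toy example: drive three weights toward fixed targets (squared error)
random.seed(0)
target = [1.0, -2.0, 0.5]
sq_err = lambda w: sum((wi - ti) ** 2 for wi, ti in zip(w, target))
w, err = statistical_train([0.0, 0.0, 0.0], sq_err)
```

Note that this plain accept-only-improvements rule is exactly what gets trapped in local minima, which motivates the annealing refinements below.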

The local minima problem

The objective-function minimization can get trapped in a poor solution.

[Figure: objective function plotted against a weight, with a local minimum at point A and the global minimum at point B]

If the objective function is at A and the random weight changes are small, then the weight adjustments will be rejected. The superior weight setting at point B will never be found, and the system will be trapped in the local minimum at A instead of the global minimum at B.

If the random weight changes are large, both points A and B are visited frequently, but so is every other point. The weights change so drastically that they never settle at the desired point.

Solution & Explanation

Statistical methods overcome the local-minimum problem through their weight-adjustment strategy.

Example: let the figure represent a ball on a surface inside a box. If the box is shaken violently, the ball moves rapidly from one side to the other; the probability of occupying any point on the surface is equal for all points.

If the violence of the shaking is gradually reduced, the ball will begin to linger at both points A and B.

If the shaking is reduced further still, the ball will settle at point B, the deeper minimum.

An ANN is trained in the same way, through random weight adjustments. At first, large random adjustments are made, and weight changes that improve the objective function are retained. The average step size is then gradually reduced so that the network reaches the global minimum.

Annealing [Boltzmann Law]

Annealing: if a metal is heated above its melting point, the atoms are in violent random motion. The atoms always tend toward a minimum-energy state. As the metal is gradually cooled, the atoms enter the lowest energy state possible at each temperature.

The energy-state probabilities follow the Boltzmann distribution:

P(e) ∝ exp(-e / kT)

where P(e) is the probability that the system is in a state with energy e, k is Boltzmann's constant, and T is the temperature.
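A quick numerical check of the Boltzmann factor makes the cooling intuition concrete (k is taken as 1 here, as is usual in the artificial setting; the values are for an arbitrary energy e = 1):

```python
import math

k = 1.0  # Boltzmann's constant, taken as 1 in the artificial setting

def p(e, T):
    """Relative probability of occupying a state with energy e at temperature T."""
    return math.exp(-e / (k * T))

# High T: high-energy states remain likely; low T: they become vanishingly rare.
high = p(1.0, 10.0)   # exp(-0.1), close to 1
mid = p(1.0, 1.0)     # exp(-1)
low = p(1.0, 0.1)     # exp(-10), essentially zero
```

As T falls, the probability of sitting in a high-energy state collapses, which is exactly why slow cooling leaves the system in a minimum-energy state.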

Simulated Annealing [Boltzmann Training]

1. Define a variable T that represents an artificial temperature. (Start with T at a large value.)
2. Apply a set of inputs to the network, and calculate the outputs and the objective function.
3. Make a random weight change, then recalculate the network output and the new objective function.
4. If the objective function is reduced, retain the weight change.
5. If the weight change results in an increase in the objective function, calculate the probability of accepting the weight change:

P(c) = exp(-c / kT)

where P(c) is the probability of accepting a change of c in the objective function, k is Boltzmann's constant, and T is the temperature.

6. Select a random number r from a uniform distribution between zero and one. If P(c) is greater than r, retain the change; otherwise return the weight to its previous value. This allows the system to occasionally take a step that worsens the objective function, and hence to escape from local minima.
7. Repeat the weight-change process over each of the weights in the network, gradually reducing the temperature T, until an acceptably low value of the objective function is obtained.
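The full procedure can be sketched as follows. This is an illustrative implementation with assumptions: k is taken as 1, the objective is a toy one-dimensional function with a local and a global minimum, and geometric cooling (T multiplied by a constant each sweep) is used purely so the demo finishes quickly; the theoretical Boltzmann schedule T(t) = T0 / log(1 + t) cools far more slowly.

```python
import math
import random

def anneal(weights, loss, T0=5.0, T_min=1e-3, cool=0.99, step=1.0):
    """Boltzmann-style simulated annealing: a change that worsens the
    objective by c > 0 is still accepted with probability exp(-c/T),
    so the search can climb out of local minima while T is high."""
    random.seed(1)                               # reproducible demo
    T, current = T0, loss(weights)
    best, best_w = current, list(weights)
    while T > T_min:
        for i in range(len(weights)):            # sweep over every weight
            old = weights[i]
            weights[i] += random.uniform(-step, step)
            c = loss(weights) - current
            if c < 0 or random.random() < math.exp(-c / T):
                current += c                     # retain the change
                if current < best:
                    best, best_w = current, list(weights)
            else:
                weights[i] = old                 # return to the previous value
        T *= cool                                # gradually reduce temperature
    return best_w, best

# toy objective with a local minimum near w = +1 and the global one near w = -1
f = lambda w: (w[0] ** 2 - 1) ** 2 + 0.3 * w[0]
best_w, best = anneal([1.0], f)                  # start trapped at the local minimum
```

Starting from the local minimum at w = +1, the uphill-accepting rule lets the search cross the barrier and finish in the deeper basin near w = -1, which a pure accept-only-improvements rule could not do with small steps.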

How to Select Weights / Artificial Temperature for Training

The size of the random weight change is selected by various methods, e.g.

P(w) ∝ exp(-w² / T²)

where P(w) is the probability of a weight change of size w, and T is the artificial temperature.

To reach the global minimum reliably, the cooling rate is usually expressed as

T(t) = T0 / log(1 + t)
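A quick check of the logarithmic schedule T(t) = T0 / log(1 + t) shows how slowly it cools (T0 = 10 is an arbitrary choice for the demo):

```python
import math

T0 = 10.0

def T_boltzmann(t):
    """Logarithmic cooling schedule T(t) = T0 / log(1 + t), for t >= 1."""
    return T0 / math.log(1 + t)

# The temperature falls only logarithmically: after a million steps it is
# still above T0/14, and reaching T = 0.01 would take about e^1000 steps.
after_ten = T_boltzmann(10)        # ≈ 4.17
after_million = T_boltzmann(10 ** 6)  # ≈ 0.72
```

This logarithmic decay is the source of the impractically long training times noted for the Boltzmann machine.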

The main disadvantage of Boltzmann training is this very slow cooling rate and hence the long computation it requires; a Boltzmann machine usually takes an impractically long time to train.

Cauchy Training

Cauchy training is more rapid than Boltzmann training.

Cauchy training substitutes the Cauchy distribution for the Boltzmann distribution.

The Cauchy distribution has longer "tails", hence a higher probability of large step sizes.

The temperature-reduction rate is changed to inverse linear. (For Boltzmann training it was inverse logarithmic.)

The Cauchy distribution is

P(x) = T(t) / [T(t)² + x²]

The inverse-linear relationship for the temperature reduction shortens the training time:

T(t) = T0 / (1 + t)
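The two Cauchy-training ingredients, inverse-linear cooling and heavy-tailed step sizes, can be sketched as below. The inverse-CDF sampling formula x = T·tan(π(u − 1/2)) is a standard way to draw from a Cauchy distribution with scale T; the constants T0 = 10 and the 1000-sample check are demo assumptions.

```python
import math
import random

T0 = 10.0

def T_cauchy(t):
    """Inverse-linear cooling: T(t) = T0 / (1 + t)."""
    return T0 / (1 + t)

def cauchy_step(T):
    """Draw a weight change from a Cauchy distribution with scale T using
    the inverse-CDF method x = T * tan(pi * (u - 1/2)).  The heavy tails
    give occasional very large jumps at any temperature."""
    u = random.random()
    return T * math.tan(math.pi * (u - 0.5))

random.seed(0)
T = T_cauchy(100)                          # temperature after 100 steps
steps = sorted(abs(cauchy_step(T)) for _ in range(1000))
median_step = steps[500]                   # typical |step| is about T itself
```

The median absolute step tracks the temperature T, while the long tails still produce the occasional huge jump that lets Cauchy training explore far from the current weights even as T shrinks.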
