The Multilayer Perceptron

The artificial neuron revisited
The synthetic or artificial neuron, which is a simple model of the biological neuron,
was first proposed in 1943 by McCulloch and Pitts. It consists of a summing function
with an internal threshold, and "weighted" inputs as shown below.
For a neuron receiving n inputs, each input xi ( i ranging from 1 to n) is weighted by

multiplying it with a weight wi . The sum of the wixi products gives the net activation
of the neuron. This activation value is subjected to a transfer function to produce the
neurons output.
The weight value of the connection or link carrying signals from a neuron i to a
neuron j is termed wij..
wij
Direction of signal flow
Transfer functions
One of the design issues for ANNs is the type of transfer function used to compute the
output of a node from its net activation. Among the popular transfer functions are:
Step function
Signum function
Sigmoid function
Hyperbolic tangent function
In the step function, the neuron produces an output only when its net activation
reaches a minimum value known as the threshold. For a binary neuron i, whose
output is a 0 or 1 value, the step function can be summarised as:
0 if activationi T
outputi
1 if activationi T
337309777.doc
When the threshold T is 0, the step function is called signum function.
337309777.doc
Another type of signum function is

1 if activationi 0
output i 0 if activationi 0
1 if activation 0
i
The sigmoid transfer function produces a continuous value in the range 0 to 1. It has
the form:
1
output i
gain . activation
1 e
i
The parameter gain is determined by the system designer. It affects the slope of the
transfer around zero. The multilayer perceptron uses the sigmoid as the transfer
function.
A variant of the sigmoid transfer function is the hyperbolic tangent function. It has the
form:
output i
e activationi e activationi
activationi activationi
e
e
where u = gain . activationi. This function has a shape similar to the sigmoid (shaped
like an S), with the difference that the value of outputi ranges between 1 and 1.
337309777.doc
output i
output i
Step function
Signum
0
T
activation i
activation i
output i
output i
1
Sigmoid
0.5
0
activation i
Figure Functional form of transfer functions
-1
activation i
Hyperbolic Tangent
Introduction to the multilayer perceptron

To be able to solve nonlinearly separable problems, a number of neurons are
connected in layers to build a multilayer perceptron.
Each of the perceptrons is used to identify small linearly separable sections of the
inputs.
Outputs of the perceptrons are combined into another perceptron to produce the final
output.
The hard-limiting (step) function used for producing the output prevents information
on the real inputs flowing on to inner neurons. To solve this problem, the step function
is replaced with a continuous function- usually the sigmoid function.
337309777.doc
The Architecture of the Multilayer Perceptron

In a multilayer perceptron, the neurons are arranged into an input layer, an output
layer and one or more hidden layers.
The Generalised Delta Rule

The learning rule for the multilayer perceptron is known as "the generalised delta
rule" or the "backpropagation rule".
The generalised delta rule repetitively calculates an error function for each input and
backpropagates the error from one layer to the previous one.
The weights for a particular node are adjusted in direct proportion to the error in the
units to which it is connected.
Let
Ep
tpj
opj
wij
=
=
=
=
error function for pattern p

target output for pattern p on node j
actual output for pattern p on node j
weight from node i to node j
The error function Ep is defined to be proportional to the square of the difference tpj opj
(1)
Ep = 1/2(tpj - opj)2
j
The activation of each unit j, for pattern p, can be written as
net
337309777.doc
pj = wijopi
(2)
5
i
The output from each unit j is determined by the non-linear transfer function fj
opj = fj(netpj)
We assume fj to be the sigmoid function, f(net) = 1/(1 + e-k.net),
where k is a positive constant that controls the "spread" of the function.
The delta rule implements weight changes that follow the path of steepest descent on
a surface in weight space. The height of any point on this surface is equal to the error
measure Ep. This can be shown by showing that the derivative of the error measure
with resepect to each weight is proportional to the weight change dictated by the delta
rule, with a negative constant of proportionality, i.e.,
pwi -Ep/wij
The multilayer perceptron learning algorithm using the generalised delta

rule
1.
2.
3.
Initialise weights (to small random values) and transfer function

Present input
Adjust weights by starting from output layer and working backwards
wij(t + 1) = wij(t) + pjopi
wij(t) represents the weights from node i to node j at time t, is a gain
term, and pj is an error term for pattern p on node j.
For output layer units
pj = kopj(1 - opj)(tpj - opj)
For hidden layer units
pj = kopj(1 - opj) pkwjk
k
where the sum is over the k nodes in the following layer.
The learning rule in a multilayer perceptron is not guaranteed to produce convergence,

and it is possible for the network to fall into a situation (the so called local minima) in
which it is unable to learn the correct output.
337309777.doc
Multilayer Perceptrons as Classifiers

The single layer perceptron is limited to calculating a single line of separation
between classes.
Let us consider a two layer perceptron with two units in the input layer.
If one unit is set to respond with a 1 if the input is above its decision line, and the
other responds with a 1 if the input is below its decision line, the second layer
produces a solution in the form of a 1 if its input is above line 1 and below line 2.
line 1
line 2
Fig. A 2-layer perceptron and the resluting decesion region.

A three layer perceptron can therefore produce arbitrarily shaped decision regions,
and are capable of separating any classes. This statement is referred to as the
Kolmogorov theorem.
Considering pattern recognition as a mapping function from unknown inputs to
known classes, any function, no matter how complex, can be represented by a
multilayer perceptron of no more than three layers.
The energy landscape

The behaviour of a neural network as it attempts to arrive at a solution can be
visualised in terms of the error or energy function Ep.
The energy is a function of the input and the weights. For a given pattern, Ep can be
plotted against the weights to give the so called energy surface. The energy surface is
a landscape of hills and valleys, with points of minimum energy corresponding to
wells and maximum energy found on peaks.
The generalised delta rule aims to minimise Ep by adjusting weights so that they
correspond to points of lowest energy. It does this by the method of gradient descent
where the changes are made in the steepest downward direction.
All possible solutions are depressions in the energy surface, known as basins of
attraction.
337309777.doc
Learning Difficulties in Multilayer Perceptrons

Occasionally, the multilayer perceptron fails to settle into the global minimum of the
energy surface and instead find itself in one of the local minima. This is due to the
gradient descent strategy followed. A number of alternative approaches can be taken
to reduce this possibility:
Lowering the gain term progressively
Addition of more nodes for better representation of patterns
Introduction of a momentum term which determines the effect of past weight

changes on the current direction of the movement in weight space:
wij(t + 1) = wij(t) +
pjopi + (wij(t) - wij(t - 1))
where momentum term 0 < < 1.
Addition of random noise to perturb a system out of a local minima.
Advantages of Multilayer Perceptrons

The following two features characterise multilayer perceptrons and artificial neural
networks in general. They are mainly responsible for the "edge" these networks have
over conventional computing systems.
Generalisation
Neural networks are capable of generalisation, that is, they classify an unknown
pattern with other known patterns that share the same distinguishing features. This
means noisy or incomplete inputs will be classified because of their similarity with
pure and complete inputs.
Fault Tolerance
Neural networks are highly fault tolerant. This characteristic is also known as
"graceful degradation". Because of its distributed nature, a neural network keeps on
working even when a significant fraction of its neurons and interconnections fail.
Also, relearning after damage can be relatively quick.
337309777.doc
Applications of Multilayer Perceptrons

The multilayer perceptron with backpropagation has been applied in numerous
applications ranging from OCR (Optical Character Recognition) to medicine. Brief
accounts of a few are given below.
Speech synthesis
A very well known use of the multilayer perceptron is NETtalk [1], a text-to-speech
conversion system, developed by Sejnowski and Rosenberg in 1987.
It consists of 203 input units, 120 hidden units, and 26 output units with over 27000
synapses. Each output unit represents one basic unit of sound, known as a phoneme.
Context is utilised in training by presenting seven successive letters to the input and
the net learns to pronounce the middle letter.
90% correct pronunciation achieved with the training set (80-87% with unseen set).
Resistant to damage and displays graceful degradation.
Multilayer perceptrons are also being used for speech recognition to be used in voice
activated control systems.
Financial applications
Examples include bond rating, loan application evaluation and stock market
prediction.
Bond rating involves categorising the bond issuer's capability. There is no hard and
fast rules for determining these ratings. Statistical regression is inappropriate because
the factors to be used are not well defined. Neural networks trained with
backpropagation has consistently outperformed standard statistical techniques [2].
Pattern Recognition
337309777.doc
For many of the applications of neural networks, the underlying principle is that of
pattern recognition.
Target identification from sonar echoes has been developed. Given only a day of
training, the net produced 100% correct identification of the target, compared to 93%
scored by a Bayesian classifier.
There are many commercial applications of networks in character recognition. One
such system performs signature verification on bank cheques.
Networks have been applied to the problems of aircraft identification, and to terrain
matching for automatic navigation.
Limitations of Multilayer Perceptrons

1.
Computationally expensive learning process
Large number of iterations required for learning, not suitable for real-time
learning
2. No guaranteed solution
Remedies such as the "momentum term" add to computational cost
Other remedies:
using estimates of transfer functions
using transfer functions with easy to compute derivatives
using estimates of error values, eg., a single global error value for the
hidden layer
3. Scaling problem
Do not scale up well from small research systems to larger real systems.
Both too many and too few units slow down learning.
Biological arguments against Backpropagation

Backpropagation not used or used through different pathways in
biological systems.
Biological systems use only local information for self-adjustments.
The question one might ask at this point is - does an effective system need to
mimic nature exactly?
337309777.doc
10
REFERENCES
Beale, R., & Jackson, T., "Neural Computing: An Introduction",
Bristol : Hilger, c1990.
(Contains full derivation of the generalised delta rule. Available at Murdoch
library)
337309777.doc
11

The Multilayer Perceptron

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

The Multilayer Perceptron

Uploaded by

Copyright:

Available Formats

The artificial neuron revisited

For a neuron receiving n inputs, each input xi ( i ranging from 1 to n) is weighted by

Direction of signal flow

When the threshold T is 0, the step function is called signum function.

Another type of signum function is

Figure Functional form of transfer functions

Introduction to the multilayer perceptron

The Architecture of the Multilayer Perceptron

The Generalised Delta Rule

error function for pattern p

The multilayer perceptron learning algorithm using the generalised delta

Initialise weights (to small random values) and transfer function

The learning rule in a multilayer perceptron is not guaranteed to produce convergence,

Multilayer Perceptrons as Classifiers

Fig. A 2-layer perceptron and the resluting decesion region.

The energy landscape

Learning Difficulties in Multilayer Perceptrons

Lowering the gain term progressively

Addition of more nodes for better representation of patterns

Introduction of a momentum term which determines the effect of past weight

pjopi + (wij(t) - wij(t - 1))

where momentum term 0 < < 1.

Addition of random noise to perturb a system out of a local minima.

Advantages of Multilayer Perceptrons

Applications of Multilayer Perceptrons

Limitations of Multilayer Perceptrons

Computationally expensive learning process

Biological arguments against Backpropagation

Biological systems use only local information for self-adjustments.

You might also like