The synthetic or artificial neuron, which is a simple model of the biological neuron,
was first proposed in 1943 by McCulloch and Pitts. It consists of a summing function
with an internal threshold, and "weighted" inputs as shown below.
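As a minimal illustration, here is a sketch of such a neuron in Python; the weights, inputs, and threshold values are arbitrary examples, not part of the original formulation.

```python
def mcculloch_pitts_neuron(inputs, weights, threshold):
    """A McCulloch-Pitts style neuron: a weighted sum with an internal threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation >= threshold else 0

# Example: a neuron that fires only when both binary inputs are on (logical AND).
print(mcculloch_pitts_neuron([1, 1], [1, 1], threshold=2))  # -> 1
print(mcculloch_pitts_neuron([1, 0], [1, 1], threshold=2))  # -> 0
```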
Transfer functions
One of the design issues for ANNs is the type of transfer function used to compute the
output of a node from its net activation. Among the popular transfer functions are:
Step function
Signum function
Sigmoid function
Hyperbolic tangent function
In the step function, the neuron produces an output only when its net activation
reaches a minimum value known as the threshold. For a binary neuron i, whose
output is a 0 or 1 value, the step function can be summarised as:
$$\text{output}_i = \begin{cases} 0 & \text{if } \text{activation}_i < T \\ 1 & \text{if } \text{activation}_i \ge T \end{cases}$$
The signum function is similar, but with a threshold of zero and outputs of -1 and +1:

$$\text{output}_i = \begin{cases} -1 & \text{if } \text{activation}_i < 0 \\ +1 & \text{if } \text{activation}_i \ge 0 \end{cases}$$
The sigmoid transfer function produces a continuous value in the range 0 to 1. It has
the form:
$$\text{output}_i = \frac{1}{1 + e^{-\,gain \,\cdot\, \text{activation}_i}}$$
The parameter gain is determined by the system designer. It affects the slope of the transfer function around zero. The multilayer perceptron uses the sigmoid as its transfer function.
A variant of the sigmoid transfer function is the hyperbolic tangent function. It has the
form:
$$\text{output}_i = \frac{e^{u} - e^{-u}}{e^{u} + e^{-u}}$$

where u = gain · activation_i. This function has a shape similar to the sigmoid (shaped like an S), with the difference that the value of output_i ranges between -1 and 1.
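As a minimal sketch, the four transfer functions can be written in a few lines of Python; the default gain and threshold values used here are illustrative choices, not prescribed by the text.

```python
import math

def step(activation, T=0.0):
    """Step function: output 1 once the activation reaches the threshold T."""
    return 1 if activation >= T else 0

def signum(activation):
    """Signum function: like the step function with threshold zero, but outputs -1 and +1."""
    return 1 if activation >= 0 else -1

def sigmoid(activation, gain=1.0):
    """Sigmoid: continuous output in (0, 1); gain sets the slope around zero."""
    return 1.0 / (1.0 + math.exp(-gain * activation))

def tanh_transfer(activation, gain=1.0):
    """Hyperbolic tangent: S-shaped like the sigmoid, but output in (-1, 1)."""
    u = gain * activation
    return (math.exp(u) - math.exp(-u)) / (math.exp(u) + math.exp(-u))
```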
[Figure: the four transfer functions plotted as output_i against activation_i: the step function (threshold T), the signum function (switching at 0), the sigmoid (rising from 0 to 1 through 0.5), and the hyperbolic tangent (rising from -1 to 1).]
In the derivation that follows:
E_p = the error measure for training pattern p
t_pj = the target output of unit j for pattern p
o_pj = the actual output of unit j for pattern p
w_ij = the weight of the connection from unit i to unit j
The error function E_p is defined to be proportional to the square of the difference t_pj - o_pj, summed over the output units j:

$$E_p = \frac{1}{2}\sum_j (t_{pj} - o_{pj})^2 \qquad (1)$$
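For instance, with a single output unit where t_p1 = 1 and o_p1 = 0.8, equation (1) gives E_p = ½(1 - 0.8)² = 0.02.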
The activation of each unit j, for pattern p, can be written as

$$\text{net}_{pj} = \sum_i w_{ij}\, o_{pi} \qquad (2)$$
The output from each unit j is determined by the non-linear transfer function f_j:

$$o_{pj} = f_j(\text{net}_{pj})$$

We assume f_j to be the sigmoid function, $f(\text{net}) = 1/(1 + e^{-k \cdot \text{net}})$, where k is a positive constant that controls the "spread" of the function.
The delta rule implements weight changes that follow the path of steepest descent on a surface in weight space. The height of any point on this surface is equal to the error measure E_p. This can be shown by demonstrating that the derivative of the error measure with respect to each weight is proportional to the weight change dictated by the delta rule, with a negative constant of proportionality, i.e.,

$$\Delta_p w_{ij} \propto -\frac{\partial E_p}{\partial w_{ij}}$$
The weights are then updated by adding this change at each step:

$$w_{ij}(t + 1) = w_{ij}(t) + \Delta_p w_{ij}$$
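The following is a minimal Python sketch of one such steepest-descent update for a single layer of output units, using equations (1) and (2) and the sigmoid above. The learning rate eta and all variable names are illustrative choices of mine, and the hidden-layer case (which requires the full generalised delta rule derivation) is omitted.

```python
import math

def sigmoid(net, k=1.0):
    """The assumed transfer function f(net) = 1 / (1 + e^(-k*net))."""
    return 1.0 / (1.0 + math.exp(-k * net))

def delta_rule_update(weights, inputs, targets, eta=0.5, k=1.0):
    """One steepest-descent step for output units j fed by inputs o_pi.

    weights[j][i] holds w_ij.  The change follows
    -dE_p/dw_ij = (t_pj - o_pj) * f'(net_pj) * o_pi,
    where f'(net) = k * f(net) * (1 - f(net)) for the sigmoid above.
    """
    for j, target in enumerate(targets):
        net = sum(w * o for w, o in zip(weights[j], inputs))   # equation (2)
        out = sigmoid(net, k)
        delta = (target - out) * k * out * (1.0 - out)
        weights[j] = [w + eta * delta * o for w, o in zip(weights[j], inputs)]
    return weights

# Example: nudge a 1-output, 2-input layer towards target 1.
w = delta_rule_update([[0.1, -0.2]], inputs=[1.0, 0.5], targets=[1.0])
print(w)
```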
Fault Tolerance
Neural networks are highly fault tolerant. This characteristic is also known as
"graceful degradation". Because of its distributed nature, a neural network keeps on
working even when a significant fraction of its neurons and interconnections fail.
Also, relearning after damage can be relatively quick.
For many of the applications of neural networks, the underlying principle is that of
pattern recognition.
A system for target identification from sonar echoes has been developed. Given only a day of training, the net produced 100% correct identification of the target, compared to 93% scored by a Bayesian classifier.
There are many commercial applications of networks in character recognition. One
such system performs signature verification on bank cheques.
Networks have been applied to the problems of aircraft identification and terrain matching for automatic navigation.
Neural networks also have limitations:
1. Large number of iterations required for learning; not suitable for real-time learning.
2. No guaranteed solution.
Remedies such as the "momentum term" add to computational cost; a sketch of the momentum update follows this list.
Other remedies:
using estimates of transfer functions
using transfer functions with easy-to-compute derivatives
using estimates of error values, e.g., a single global error value for the hidden layer
3. Scaling problem: networks do not scale up well from small research systems to larger real systems, and both too many and too few units slow down learning.
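As a sketch of the momentum remedy mentioned above, assuming the usual formulation in which a fraction alpha of the previous weight change is re-applied to the current one (alpha, eta, and the names here are illustrative):

```python
def momentum_update(weight, grad, prev_change, eta=0.5, alpha=0.9):
    """One weight update with a momentum term: a fraction alpha of the previous
    change is added in, smoothing descent at the cost of storing one extra
    value per weight."""
    change = -eta * grad + alpha * prev_change
    return weight + change, change

# Example: two successive updates on a single weight.
w, c = momentum_update(0.1, grad=0.4, prev_change=0.0)
w, c = momentum_update(w, grad=0.3, prev_change=c)
```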
The question one might ask at this point is: does an effective system need to mimic nature exactly?
REFERENCES
Beale, R., & Jackson, T. (1990). Neural Computing: An Introduction. Bristol: Hilger. (Contains a full derivation of the generalised delta rule; available at the Murdoch library.)