Agenda
- Review of Neural Nets and Backpropagation
- Backpropagation: The Math
- Advantages and Disadvantages of Gradient Descent and other algorithms
- Enhancements of Gradient Descent
- Other ways of minimizing error
Review
- An approach that developed from an analysis of the human brain
- Nodes created as an analog to neurons
- Mainly used for classification problems (e.g. character recognition, voice recognition, medical applications)
Review
Neurons have weighted inputs, a threshold value, an activation function, and an output.
[Diagram: a single neuron with weighted inputs feeding an activation function that produces the output]
Review
4-Input AND
[Diagram: a single threshold unit computing AND over its inputs; all weights = 1, threshold = 1.5, output = 1 if active, 0 otherwise]
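As an aside, a threshold of 1.5 implements a two-input AND; with four inputs and unit weights an AND unit needs a threshold between 3 and 4. A minimal sketch of such a threshold unit (the code and names are ours, not the deck's):

```python
def threshold_neuron(inputs, weights, threshold):
    """Fire (output 1) if the weighted input sum exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# 4-input AND: unit weights, threshold 3.5, so the unit fires only
# when all four inputs are active.
weights = [1, 1, 1, 1]
for pattern in [(1, 1, 1, 1), (1, 1, 0, 1), (0, 0, 0, 0)]:
    print(pattern, "->", threshold_neuron(pattern, weights, 3.5))
```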
Review
Output space for the XOR gate: demonstrates the need for a hidden layer.
[Plot: XOR output over Input 1 and Input 2; (0,1) and (1,0) map to 1, (0,0) and (1,1) map to 0, and no single straight line separates the two classes]
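Because no single line separates the XOR classes, one threshold unit cannot compute it, but two hidden units can. A minimal sketch with one hand-set choice of weights (an OR unit and an AND unit feeding an "OR but not AND" output); this construction is a common textbook one, not taken from the deck:

```python
def threshold_neuron(inputs, weights, threshold):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

def xor(x1, x2):
    # Hidden layer: an OR unit and an AND unit over the same inputs.
    h_or = threshold_neuron((x1, x2), (1, 1), 0.5)
    h_and = threshold_neuron((x1, x2), (1, 1), 1.5)
    # Output fires when OR is active but AND is not: exactly one input on.
    return threshold_neuron((h_or, h_and), (1, -1), 0.5)

for a in (0, 1):
    for b in (0, 1):
        print((a, b), "->", xor(a, b))   # 1 only for (0,1) and (1,0)
```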
[Diagram: a feed-forward network; input layer nodes X0,0 … X9,0 connect through weights W0,0, W1,0, … Wi,0 to a hidden layer]
Backpropagation
Calculation of error at output node k:
d_k = f(D_k) - f(O_k)
where D_k is the desired value at node k, O_k is its actual value, and f is the activation function.
Backpropagation
Error at k: for a hidden node k, the error is found by propagating the errors of the nodes it feeds back through the connecting weights.
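A minimal sketch of how these errors drive the weight updates in a one-hidden-layer sigmoid network; the variable names and the sigmoid choice are illustrative assumptions, not the deck's notation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, target, W_hid, W_out, eta=0.5):
    """One backpropagation step for a single training pattern.

    x : input activations; target : desired outputs D_k
    W_hid[i][j] : input i -> hidden j; W_out[j][k] : hidden j -> output k
    """
    # Forward pass.
    hid = [sigmoid(sum(x[i] * W_hid[i][j] for i in range(len(x))))
           for j in range(len(W_hid[0]))]
    out = [sigmoid(sum(hid[j] * W_out[j][k] for j in range(len(hid))))
           for k in range(len(W_out[0]))]

    # Error at an output node k: raw error scaled by the sigmoid's
    # derivative, f'(net_k) = O_k * (1 - O_k).
    d_out = [(target[k] - out[k]) * out[k] * (1 - out[k])
             for k in range(len(out))]

    # Error at a hidden node j: the output errors propagated back
    # through the weights that node j feeds.
    d_hid = [hid[j] * (1 - hid[j]) *
             sum(W_out[j][k] * d_out[k] for k in range(len(d_out)))
             for j in range(len(hid))]

    # Weight update: eta * error * incoming activation.
    for j in range(len(hid)):
        for k in range(len(d_out)):
            W_out[j][k] += eta * d_out[k] * hid[j]
    for i in range(len(x)):
        for j in range(len(hid)):
            W_hid[i][j] += eta * d_hid[j] * x[i]
    return out

# Illustrative shapes: 2 inputs, 2 hidden units, 1 output.
W_hid = [[0.1, -0.2], [0.3, 0.4]]
W_out = [[0.5], [-0.6]]
print(backprop_step([1.0, 0.0], [1.0], W_hid, W_out))
```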
Advantages and Disadvantages
Back-propagation is one of the simplest and most general methods for training multilayer neural networks. Its power is that it lets us compute an effective error for each hidden unit, and thus derive a learning rule for the input-to-hidden weights. The goal is to set the interconnection weights based on the training patterns and the desired outputs. Its main disadvantage is slow convergence.
- MLP and BP are used in cognitive and computational neuroscience modelling, although the algorithm has no real neuro-physiological support
- The algorithm can be used to build encoding/decoding and compression systems, and is useful for data pre-processing operations
- The MLP with the BP algorithm is a universal approximator of functions
- The algorithm is computationally efficient: O(W) in the number of model parameters W
- The algorithm has local robustness
- Convergence of BP can be very slow, especially on large problems, depending on the method
Advantages
- A neural network can perform tasks that a linear program cannot
- When an element of the network fails, it can continue without any problem, thanks to its parallel nature
- A neural network learns, and does not need to be reprogrammed
- It can be implemented in any application without any problem
Disadvantages
- The neural network needs training before it can operate
- The architecture of a neural network differs from that of a microprocessor, so it must be emulated
- Large neural networks require long processing times
Simulated Annealing
Advantages
Disadvantages
Advantages
- Faster than simulated annealing
- Less likely to get stuck in local minima
Disadvantages
Simplex Algorithm
Advantages
Disadvantages
Momentum
- Useful for getting over small bumps in the error function
- Often finds a minimum in fewer steps
Δw(t) = -η·δ·y + α·Δw(t-1)
where Δw is the change in weight, η is the learning rate, δ is the error, y depends on which layer we are calculating, and α is the momentum parameter.
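A minimal sketch of this update for one weight, assuming δ·y has already been computed for the current pattern (names are illustrative):

```python
def momentum_update(w, grad_term, prev_dw, eta=0.1, alpha=0.9):
    """Momentum rule: dw(t) = -eta * (d * y) + alpha * dw(t-1).

    grad_term stands for d*y (error times incoming activation).
    Returns the new weight and the change, remembered for the next step.
    """
    dw = -eta * grad_term + alpha * prev_dw
    return w + dw, dw

# Repeated gradients of the same sign build up speed over the epochs,
# which is what carries the search over small bumps.
w, dw = 0.0, 0.0
for _ in range(5):
    w, dw = momentum_update(w, grad_term=0.2, prev_dw=dw)
    print(round(w, 3), round(dw, 3))
```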
Adaptive Backpropagation
- Assigns each weight its own learning rate
- Each learning rate is adjusted according to the sign of the gradient of the error function in the last iteration
- If the signs are equal, the slope is more likely shallow, so the learning rate is increased
- If the signs differ, the slope is more likely steep, so the learning rate is decreased
Possible problem:
- Since we minimize the error for each weight separately, the overall error may increase
Solution:
- Calculate the total output error after each adaptation; if it is greater than the previous error, reject that adaptation and calculate new learning rates
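A sketch of the per-weight adaptation with the rejection test just described; the grow/shrink factors and the error_fn helper are illustrative assumptions:

```python
def adapt_rates(rates, grads, prev_grads, up=1.2, down=0.5):
    """Grow a weight's rate while its gradient keeps the same sign
    (likely a shallow slope); shrink it when the sign flips (steep)."""
    for i, (g, pg) in enumerate(zip(grads, prev_grads)):
        if g * pg > 0:
            rates[i] *= up
        elif g * pg < 0:
            rates[i] *= down
    return rates

def adaptive_step(weights, rates, grads, error_fn, prev_error):
    """Apply per-weight updates, but reject the whole adaptation if the
    total output error increased, as the slide's solution prescribes."""
    trial = [w - r * g for w, r, g in zip(weights, rates, grads)]
    new_error = error_fn(trial)
    if new_error > prev_error:
        return weights, prev_error   # rejected; adapt the rates again next pass
    return trial, new_error
```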
SuperSAB
- Combines the momentum and adaptive methods: uses both so long as the sign of the gradient does not change
- The two effects are additive, giving faster traversal of gradual slopes
- When the sign of the gradient does change, the momentum term cancels the drastic drop in learning rate
- This lets the search roll up the other side of a minimum, possibly escaping local minima
Experiments show that SuperSAB converges faster than gradient descent. Overall, the algorithm is less sensitive (and so less likely to get caught in local minima).
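A sketch of one SuperSAB-style update for a single weight, combining the two mechanisms as described (the growth/shrink constants are illustrative):

```python
def supersab_update(w, g, prev_g, rate, prev_dw, up=1.05, down=0.5, alpha=0.9):
    """While the gradient keeps its sign, the rate grows and the momentum
    term adds its push; on a sign flip the rate is cut, but the momentum
    term (still pointing the old way) softens the drop and can carry the
    search up the far side of a minimum."""
    rate = rate * up if g * prev_g > 0 else rate * down
    dw = -rate * g + alpha * prev_dw
    return w + dw, rate, dw
```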
Adding neurons speeds up learning but may cause a loss in generalization; removing neurons has the opposite effect.
Resources
- Artificial Neural Networks, Backpropagation, J. Henseler
- Artificial Intelligence: A Modern Approach, S. Russell & P. Norvig
- 501 notes, J.R. Parker
- www.dontveter.com/bpr/bpr.html
- www.dse.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html
Local Minima
[Plot: an error surface with a local minimum and the global minimum marked]
The height of the hills is determined by the error, but the space has many dimensions.
Backpropagation can therefore find its way into local minima.
Random re-start: learn lots of networks
- Can take the best network
- Or can set up a committee of networks to categorise examples
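A sketch of both escapes, assuming a hypothetical train(seed) helper that returns (network, validation_error) and networks with a classify method; none of these names come from the deck:

```python
import random

def random_restart(train, n_restarts=10):
    """Train several networks from different random initialisations
    and keep the one with the lowest validation error."""
    runs = [train(seed=random.random()) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[1])   # run = (network, error)

def committee_vote(networks, example):
    """Let every trained network classify the example and return the
    majority label."""
    votes = [net.classify(example) for net in networks]
    return max(set(votes), key=votes.count)
```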
Adding Momentum
[Plots: the search trajectory without momentum vs. with momentum]
Momentum in Backpropagation
The momentum term adds a fraction of the previous weight change to the current one. If the gradient keeps pointing the same way, the movement of the search gets bigger: the extra amount is compounded in each epoch. This may mean that narrow local minima are avoided, and may also speed up the convergence rate.
Caution:
- May not have enough momentum to get out of local minima
- Also, too much momentum might carry the search past a good minimum