Evolutionary Computing
Evolutionary computing produces
high-quality partial solutions
to problems through
natural selection and
survival of the fittest
– Comparable to natural biological systems
that adapt and learn over time
Genetic Algorithm Example
Find the maximum value of the function f(x) = −x² + 15x
[Figure: plot of f(x) for 0 ≤ x ≤ 15 — (a) Chromosome initial locations.]
Genetic Algorithm Example
Determine the fitness of each chromosome; the fitness
function here is simply the original function f(x) = −x² + 15x:

Chromosome  Chromosome  Decoded  Chromosome  Fitness
label       string      integer  fitness     ratio, %
X1          1100        12       36          16.5
X2          0100         4       44          20.2
X3          0001         1       14           6.4
X4          1110        14       14           6.4
X5          0111         7       56          25.7
X6          1001         9       54          24.8
                                218         100.0
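The fitness column above can be reproduced with a short sketch (the chromosome strings are taken from the table; everything else is plain Python):

```python
# Decode each 4-bit chromosome to an integer x and score it with
# the fitness function f(x) = -x^2 + 15x.
def decode(bits):
    """Interpret a bit string, most significant bit first, as an integer."""
    return int(bits, 2)

def fitness(x):
    return -x**2 + 15*x

chromosomes = {"X1": "1100", "X2": "0100", "X3": "0001",
               "X4": "1110", "X5": "0111", "X6": "1001"}

scores = {label: fitness(decode(bits)) for label, bits in chromosomes.items()}
total = sum(scores.values())
ratios = {label: 100 * s / total for label, s in scores.items()}

for label, bits in chromosomes.items():
    print(label, decode(bits), scores[label], round(ratios[label], 1))
```

Running this reproduces the decoded integers, fitness values, and fitness ratios of the table, including the total of 218.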
Genetic Algorithm Example
Use fitness ratios to determine which
chromosomes are selected for crossover
and mutation operations:
[Figure: roulette wheel — slice sizes proportional to the fitness ratios:
X1 16.5%, X2 20.2%, X3 6.4%, X4 6.4%, X5 25.7%, X6 24.8%]
Genetic Algorithms – Step 1
Represent the problem domain as
a chromosome of fixed length
– Use a fixed number of genes to represent a solution
– Use individual bits or characters for efficient
memory use and speed
1 0 1 1 0 1 0 0 0 0 0 1 0 1 0 1
– e.g. Traveling Salesman Problem (TSP)
http://www.lalena.com/AI/Tsp/
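For the TSP, a chromosome of fixed length can be a permutation of city indices, and its quality is the length of the tour it encodes. A minimal sketch (the city coordinates are made up for illustration):

```python
import math
import random

# Illustrative city coordinates (not from the slides).
cities = [(0, 0), (1, 5), (4, 3), (6, 1), (3, 0)]

def tour_length(order):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

# A TSP chromosome: a fixed-length permutation of city indices.
random.seed(0)
chromosome = random.sample(range(len(cities)), len(cities))
print(chromosome, round(tour_length(chromosome), 2))
```

Note that for permutations, plain bit-level crossover would produce invalid tours, so TSP-specific operators (e.g. order crossover) are normally used instead.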
Genetic Algorithms – Step 2
Define a fitness function f(x) to measure
the quality of individual chromosomes
The fitness function determines
– which chromosomes carry over to the next generation
– which chromosomes are crossed over with one another
– which chromosomes are individually mutated
Genetic Algorithms – Step 3
Establish our genetic algorithm parameters:
– Choose the size of the population, N
– Set the crossover probability, pc
– Set the mutation probability, pm
Genetic Algorithms – Step 6
Mutate an offspring chromosome by applying
a mutation operation:
[Figure: mutation example — randomly selected bits are flipped in offspring
chromosomes X6'i, X2'i, X1'i, and X5'i, e.g. X1'i → X1"i and X2i → X2"i]
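One-point crossover and bit-flip mutation can be sketched as follows (`pm` is the per-bit mutation probability set in Step 3; the cut point and flipped bits are chosen at random):

```python
import random

def crossover(parent_a, parent_b, rng=random):
    """One-point crossover: cut both parents at the same random point
    and swap their tails."""
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def mutate(chromosome, pm, rng=random):
    """Flip each bit independently with probability pm."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < pm else bit
        for bit in chromosome)

random.seed(0)
print(crossover("1100", "0111"))
print(mutate("1100", pm=0.25))
```

Crossover recombines good partial solutions from two parents, while mutation injects new genetic material that selection and crossover alone could never produce.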
Genetic Algorithms – Steps 7 & 8
Step 7:
– Place all generated offspring
chromosomes in a new population
Step 8:
– Go back to Step 5 until the size of the new population
is equal to the size of the initial population, N
Genetic Algorithms – Steps 9 & 10
Step 9:
– Replace the initial (parent) chromosome population
with the new (offspring) population
Step 10:
– Go back to Step 4 and repeat the process
until termination criteria are satisfied
– Typically repeat this process for 50-5000+ generations
[Figure: crossover and mutation applied to the chromosomes of
generation i, e.g. X3i (0001, f = 14) crossed with X4i (1110, f = 14)]
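Putting Steps 1-10 together for the running example, a minimal GA sketch (the parameter values N, pc, pm, and the generation count are illustrative choices, not values specified by the slides):

```python
import random

def fitness(x):
    return -x**2 + 15*x

def decode(bits):
    return int(bits, 2)

def crossover(a, b, rng):
    """One-point crossover: swap the tails of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(c, pm, rng):
    """Flip each bit independently with probability pm."""
    return "".join(("1" if x == "0" else "0") if rng.random() < pm else x
                   for x in c)

def run_ga(n=6, pc=0.7, pm=0.1, generations=100, seed=3):
    rng = random.Random(seed)
    # Step 1: random initial population of 4-bit chromosomes.
    population = ["".join(rng.choice("01") for _ in range(n > 0 and 4))
                  for _ in range(n)]
    best = max(population, key=lambda c: fitness(decode(c)))
    for _ in range(generations):
        # Fitness-proportionate (roulette-wheel) selection weights;
        # the +1 offset keeps every weight positive when f(x) = 0.
        weights = [fitness(decode(c)) + 1 for c in population]
        offspring = []
        while len(offspring) < n:                       # Steps 5-8
            a, b = rng.choices(population, weights=weights, k=2)
            if rng.random() < pc:
                a, b = crossover(a, b, rng)
            offspring += [mutate(a, pm, rng), mutate(b, pm, rng)]
        population = offspring[:n]                      # Step 9
        best = max(population + [best], key=lambda c: fitness(decode(c)))
    return best, fitness(decode(best))

print(run_ga())
```

With 4-bit chromosomes the search space is only 16 points, so the GA quickly finds the maximum f(7) = f(8) = 56 of the example function.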
Genetic Algorithms
Advantages of genetic algorithms:
– Often outperform “brute force” approaches by
randomly jumping around the search space
– Ideal for problem domains in which near-optimal
(as opposed to exact) solutions are adequate
In the training mode, the neuron can be trained to fire (or not)
for particular input patterns.
In the using mode, when a taught input pattern is detected at
the input, its associated output becomes the current output. If
the input pattern does not belong in the taught list of input
patterns, the firing rule is used to determine whether to fire or
not.
The firing rule is an important concept in neural networks and
accounts for their high flexibility. A firing rule determines how
one calculates whether a neuron should fire for any input
pattern. It relates to all the input patterns, not only the ones on
which the node was previously trained.
Pattern Recognition
Here the top row is 2 errors away from a ‘T’ and 3 errors away
from an ‘H’, so the top output is black.
The middle row is 1 error away from both T and H, so the
output is random.
The bottom row is 1 error away from T and 2 away from H,
so the output is black.
Since the input resembles a ‘T’ more than an ‘H’, the output of
the network is in favor of a ‘T’.
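The "errors away" counts above are Hamming distances between an input row and the stored patterns. A sketch of such a distance-based firing rule, using hypothetical 4-pixel rows (the slide's actual pixel patterns are not reproduced here, so these bit patterns are assumptions):

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def firing_rule(row, t_row, h_row):
    """Fire ('black') if the input is strictly closer to the taught
    T-pattern than to the taught H-pattern; stay quiet ('white') if it
    is strictly closer to H; the output is undefined ('random') on a tie."""
    d_t, d_h = hamming(row, t_row), hamming(row, h_row)
    if d_t < d_h:
        return "black"
    if d_h < d_t:
        return "white"
    return "random"

# Hypothetical taught rows (illustrative, not from the slides):
t_row = [1, 1, 1, 0]
h_row = [1, 0, 0, 1]
print(firing_rule([1, 1, 1, 1], t_row, h_row))  # closer to T -> black
```

This is exactly why the firing rule gives the network its flexibility: it produces sensible outputs even for input patterns that were never in the taught list.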
Different types of Neural Networks
Feed-forward networks
– Feed-forward NNs allow signals to travel one way
only; from input to output. There is no feedback
(loops) i.e. the output of any layer does not affect
that same layer.
– Feed-forward NNs tend to be straightforward
networks that associate inputs with outputs. They
are extensively used in pattern recognition.
– This type of organization is also referred to as
bottom-up or top-down.
Continued
Feedback networks
– Feedback networks can have signals traveling in both
directions by introducing loops in the network.
– Feedback networks are dynamic; their 'state' is changing
continuously until they reach an equilibrium point.
– They remain at the equilibrium point until the input changes
and a new equilibrium needs to be found.
– Feedback architectures are also referred to as interactive or
recurrent, although the latter term is often used to denote
feedback connections in single-layer organizations.
Backprop algorithm
The Backprop algorithm searches for weight values that
minimize the total error of the network over the set of training
examples (training set).
Backprop consists of the repeated application of the following
two passes:
– Forward pass: in this step the network is activated on one
example and the error of (each neuron of) the output layer is
computed.
– Backward pass: in this step the network error is used for
updating the weights. Starting at the output layer, the error
is propagated backwards through the network, layer by
layer. This is done by recursively computing the local
gradient of each neuron.
Back Propagation
Back-propagation training algorithm
[Diagram: forward step — network activation; backward step — error propagation]
Backprop adjusts the weights of the NN in order to
minimize the network total mean squared error.
Neural Network in Use
Since neural networks are best at identifying patterns or trends
in data, they are well suited for prediction or forecasting
needs including:
– sales forecasting
– industrial process control
– customer research
– data validation
– risk management
Initialize the weights: The weights in the network are initialized to small random
numbers (e.g., ranging from −1.0 to 1.0, or −0.5 to 0.5). Each unit has a bias associated
with it, as explained below. The biases are similarly initialized to small random numbers.
Each training tuple, X, is processed by the following steps.
Propagate the inputs forward: First, the training tuple is fed to the input layer of the
network. The inputs pass through the input units, unchanged. That is, for an input unit,
j, its output, O j , is equal to its input value, I j . Next, the net input and output of each
unit in the hidden and output layers are computed. The net input to a unit in the hidden
or output layers is computed as a linear combination of its inputs. To help illustrate this
point, a hidden layer or output layer unit is shown in Figure 6.17. Each such unit has a
number of inputs to it that are, in fact, the outputs of the units connected to it in the
previous layer. Each connection has a weight. To compute the net input to the unit, each
input connected to the unit is multiplied by its corresponding weight, and this is summed.
Given a unit j in a hidden or output layer, the net input, Ij, to unit j is
Ij = Σi wij Oi + θj ,    (6.24)
where wij is the weight of the connection from unit i in the previous layer to unit j; Oi is
the output of unit i from the previous layer; and θj is the bias of the unit. The bias acts
as a threshold in that it serves to vary the activity of the unit.
Each unit in the hidden and output layers takes its net input and then applies an acti-
vation function to it, as illustrated in Figure 6.17. The function symbolizes the activation
Figure 6.17 A hidden or output layer unit j: The inputs to unit j are outputs from the previous layer.
These are multiplied by their corresponding weights (w1j, ..., wnj) in order to form a weighted sum,
which is added to the bias associated with unit j. A nonlinear activation function f is applied to the
net input. (For ease of explanation, the inputs to unit j are labeled y1, y2, ..., yn. If unit j were in
the first hidden layer, then these inputs would correspond to the input tuple (x1, x2, ..., xn).)
332 Chapter 6 Classification and Prediction
of the neuron represented by the unit. The logistic, or sigmoid, function is used. Given
the net input Ij to unit j, then Oj, the output of unit j, is computed as
Oj = 1 / (1 + e^(−Ij)) .    (6.25)
This function is also referred to as a squashing function, because it maps a large input
domain onto the smaller range of 0 to 1. The logistic function is nonlinear and differ-
entiable, allowing the backpropagation algorithm to model classification problems that
are linearly inseparable.
We compute the output values, O j , for each hidden layer, up to and including the
output layer, which gives the network’s prediction. In practice, it is a good idea to cache
(i.e., save) the intermediate output values at each unit as they are required again later,
when backpropagating the error. This trick can substantially reduce the amount of com-
putation required.
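Equations (6.24) and (6.25) for a single unit can be sketched as follows (the weights, bias, and inputs here are arbitrary illustrative values):

```python
import math

def net_input(outputs_prev, weights, bias):
    """Eq. (6.24): I_j = sum_i w_ij * O_i + theta_j."""
    return sum(w * o for w, o in zip(weights, outputs_prev)) + bias

def sigmoid(i_j):
    """Eq. (6.25): the logistic squashing function, mapping to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-i_j))

# An illustrative unit with two incoming connections:
o_prev = [0.6, 0.1]
i_j = net_input(o_prev, weights=[0.3, -0.2], bias=0.05)
print(sigmoid(i_j))
```

In a full forward pass the resulting outputs Oj would be cached per unit, since the backward pass reuses them when computing the errors.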
Backpropagate the error: The error is propagated backward by updating the weights and
biases to reflect the error of the network’s prediction. For a unit j in the output layer, the
error Errj is computed by
Errj = Oj (1 − Oj)(Tj − Oj) ,    (6.26)
where Oj is the actual output of unit j, and Tj is the known target value of the given
training tuple. Note that Oj (1 − Oj) is the derivative of the logistic function.
To compute the error of a hidden layer unit j, the weighted sum of the errors of the
units connected to unit j in the next layer is considered. The error of a hidden layer
unit j is
Errj = Oj (1 − Oj) Σk Errk wjk ,    (6.27)
where wjk is the weight of the connection from unit j to a unit k in the next higher layer,
and Errk is the error of unit k.
The weights and biases are updated to reflect the propagated errors. Weights are updated
by the following equations, where ∆wij is the change in weight wij:
∆wij = (l) Errj Oi    (6.28)
wij = wij + ∆wij    (6.29)
Here l is the learning rate, a constant typically having a value between 0.0 and 1.0.
The learning rate helps avoid getting stuck at a local minimum
[Footnote 8: A method of gradient descent was also used for training Bayesian belief
networks, as described in Section 6.4.4.]
6.6 Classification by Backpropagation 333
in decision space (i.e., where the weights appear to converge, but are not the optimum
solution) and encourages finding the global minimum. If the learning rate is too small,
then learning will occur at a very slow pace. If the learning rate is too large, then oscilla-
tion between inadequate solutions may occur. A rule of thumb is to set the learning rate
to 1/t, where t is the number of iterations through the training set so far.
Biases are updated by the following equations, where ∆θj is the change in bias θj:
∆θj = (l) Errj    (6.30)
θj = θj + ∆θj    (6.31)
Note that here we are updating the weights and biases after the presentation of each
tuple. This is referred to as case updating. Alternatively, the weight and bias increments
could be accumulated in variables, so that the weights and biases are updated after all
of the tuples in the training set have been presented. This latter strategy is called epoch
updating, where one iteration through the training set is an epoch. In theory, the math-
ematical derivation of backpropagation employs epoch updating, yet in practice, case
updating is more common because it tends to yield more accurate results.
Terminating condition: Training stops when
– All ∆wij in the previous epoch were so small as to be below some specified threshold, or
– The percentage of tuples misclassified in the previous epoch is below some threshold, or
– A prespecified number of epochs has expired.
Example 6.9 Sample calculations for learning by the backpropagation algorithm. Figure 6.18 shows
a multilayer feed-forward neural network. Let the learning rate be 0.9. The initial weight
and bias values of the network are given in Table 6.3, along with the first training tuple,
X = (1, 0, 1), whose class label is 1.
This example shows the calculations for backpropagation, given the first training
tuple, X. The tuple is fed into the network, and the net input and output of each unit
are computed. These values are shown in Table 6.4. The error of each unit is computed
[Figure 6.18: a multilayer feed-forward neural network — input units 1, 2, 3
(inputs x1, x2, x3), hidden units 4, 5, and output unit 6, with weights
w14, w15, w24, w25, w34, w35, w46, w56]
and propagated backward. The error values are shown in Table 6.5. The weight and bias
updates are shown in Table 6.6.
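The calculations of Example 6.9 can be reproduced in a short sketch. Table 6.3 is not included in this excerpt, so the initial weights and biases below are illustrative assumptions; the topology of Figure 6.18, the tuple X = (1, 0, 1) with class label 1, and the learning rate 0.9 come from the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Topology from Figure 6.18: inputs 1, 2, 3 -> hidden 4, 5 -> output 6.
# Initial weights and biases are illustrative (Table 6.3 is not reproduced here).
w = {(1, 4): 0.2, (2, 4): 0.4, (3, 4): -0.5,
     (1, 5): -0.3, (2, 5): 0.1, (3, 5): 0.2,
     (4, 6): -0.3, (5, 6): -0.2}
theta = {4: -0.4, 5: 0.2, 6: 0.1}
x = {1: 1, 2: 0, 3: 1}           # training tuple X = (1, 0, 1)
target, lr = 1, 0.9              # class label and learning rate l

def forward(w, theta):
    """Propagate the inputs forward: Eqs. (6.24) and (6.25)."""
    o = dict(x)                  # input units pass their values through
    for j in (4, 5):
        o[j] = sigmoid(sum(w[(i, j)] * o[i] for i in (1, 2, 3)) + theta[j])
    o[6] = sigmoid(w[(4, 6)] * o[4] + w[(5, 6)] * o[5] + theta[6])
    return o

o = forward(w, theta)
# Backpropagate the error: Eqs. (6.26) and (6.27).
err = {6: o[6] * (1 - o[6]) * (target - o[6])}
for j in (4, 5):
    err[j] = o[j] * (1 - o[j]) * err[6] * w[(j, 6)]
# Update weights and biases: Eqs. (6.28)-(6.31), case updating.
for (i, j) in w:
    w[(i, j)] += lr * err[j] * o[i]
for j in theta:
    theta[j] += lr * err[j]

print(round(o[6], 3), round(err[6], 4))
```

One such case update already nudges the network's output for X closer to the target value of 1, which is exactly what repeating the forward and backward passes over many tuples achieves at scale.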
Several variations and alternatives to the backpropagation algorithm have been pro-
posed for classification in neural networks. These may involve the dynamic adjustment of
the network topology and of the learning rate or other parameters, or the use of different
error functions.