
Combining particle swarm optimization and genetic algorithms in the supervised training of artificial neural networks

Among the many evolutionary methods used to supervise the training of artificial intelligence models, in the search for the optimal solution to a given prediction or classification problem, are Genetic Algorithms (GA) and Particle Swarm Optimization (PSO). Often, the combined use of these algorithms can produce models with greater predictive capacity and better generalization (less overfitting), while requiring shorter training time, than either of them could achieve alone.

When supervising the training of an artificial neural network (ANN), genetic algorithms apply crossovers and mutations to the network parameters: the number of neurons in each intermediate layer (and the number of layers), and the parameters of the squashing functions (as well as the choice of the functions themselves). They produce generation after generation of ANNs, always creating the next one by applying small mutations (combined with crossover of the parameters) to the networks that performed best according to a "Darwinian selection" filter based on a pre-established criterion (such as maximizing R2, for example), discarding the others in each "generation", as illustrated below (for a case of a variable, increasing population of candidate solutions).

[Figure: three panels, "First generation", "Second generation", "Third generation", illustrating the evolution of a population of candidate solutions.]

In each generation, the solutions that best meet the selection criterion (higher R2 or lower RMSE, for example), shown here in red, are used as the gene pool to populate the next generation of solutions.
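
To make the mechanism concrete, here is a minimal sketch of one GA generation in Python, assuming a candidate ANN is encoded as a small dictionary of hyperparameters (hidden-layer sizes and squashing-function choice). The encoding, the parameter ranges, and the fitness placeholder are illustrative assumptions, not the article's actual implementation.

```python
import random

SQUASHING = ["tanh", "logistic", "gaussian"]  # candidate squashing functions (assumed)

def random_candidate():
    # One candidate ANN: 1-3 hidden layers of 2-32 neurons, plus a squashing function.
    return {"layers": [random.randint(2, 32) for _ in range(random.randint(1, 3))],
            "squash": random.choice(SQUASHING)}

def crossover(a, b):
    # Mix parameters from two well-performing parents.
    return {"layers": list(random.choice([a["layers"], b["layers"]])),
            "squash": random.choice([a["squash"], b["squash"]])}

def mutate(c, rate=0.2):
    # Small random perturbations: the "mutations" described in the text.
    if random.random() < rate:
        i = random.randrange(len(c["layers"]))
        c["layers"][i] = max(1, c["layers"][i] + random.choice([-2, -1, 1, 2]))
    if random.random() < rate:
        c["squash"] = random.choice(SQUASHING)
    return c

def next_generation(population, fitness, elite_frac=0.2):
    # "Darwinian selection": the best candidates form the gene pool
    # (fitness would be, e.g., the R2 of the trained ANN on a validation set).
    ranked = sorted(population, key=fitness, reverse=True)
    pool = ranked[:max(2, int(elite_frac * len(population)))]
    return [mutate(crossover(*random.sample(pool, 2))) for _ in range(len(population))]
```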

The solutions gradually move toward a global or local optimum, in small steps taken in all directions (the mutations are random). As the generations progress, more and more candidate solutions come close to each other under the chosen selection criterion, so a larger subpopulation is used in the crossover and mutation that create the next generation of candidate solutions. This makes the ANN-GA training process slower as it evolves through many generations, although it has a significant probability of reaching a global optimum.

On the other hand, the Particle Swarm Optimization method is inspired by another behavior found in nature: flocks of birds searching for food. As an optimization algorithm it also works in generations, so it is evolutionary, but not in the genetic sense, since it does not apply mutations or "crossover" to solutions as genetic algorithms do.

Imagine a flock of birds in search of food. Initially none of them knows the exact location of the target (and there may be more than one concentration of food, one of them being the largest). However, each bird has some idea of direction and distance, through its sense of smell and visual cues. As the flock moves, some birds get closer to the food, converging faster ("diving") to reach it. Therefore, for every bird in the flock, the most effective flight strategy is to follow the bird closest to the likely position, the one proceeding fastest in the approximate direction initially sensed as the food's location.

As an optimization algorithm, PSO follows a very similar strategy (see the illustrative animation via the URL in the references at the end of the text). Each candidate solution (the parameters of an ANN, in our case) is a "bird" in the search space (the space of candidate solutions), which we call a "particle". To each particle we assign a fitness measure (R2 or RMSE, for example), the criterion to be optimized, as well as a "velocity" (magnitude and direction) that directs its "flight". The particles "fly" through the search space, adjusting speed and direction based on their own proximity to the local or global optimum and on the proximity of the "best" particles (the ones with the best fitness values) in each generation.

The algorithm initializes a "population" of random particles (candidate solutions) with positions and velocities (also random, though the velocities may be set to zero). To this set of initial random parameters, together with the population size, we give the collective name "seed". The PSO algorithm then looks for better solutions by adjusting the velocities (variations/derivatives of the ANN parameters, in our case) over successive generations. In each iteration, each particle is updated toward the "best" values found so far: its own, the whole population's and, possibly, those of the particles "flying" in its immediate neighborhood.
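
The following sketch shows the canonical PSO update for one generation, under the assumptions that particle positions are flat parameter vectors and that the inertia and attraction weights (w, c1, c2) take common textbook defaults rather than values from the article.

```python
import numpy as np

rng = np.random.default_rng(10)  # the random "seed" of the text (value illustrative)

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    # x, v: (n_particles, n_params) positions and velocities.
    # pbest: each particle's own best position so far; gbest: the swarm's best.
    # Each particle is pulled toward its own best and toward the swarm's best.
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v

# Initialization, as described above: random positions, velocities set to zero.
n_particles, n_params = 100, 8   # illustrative sizes
x = rng.uniform(-1.0, 1.0, (n_particles, n_params))
v = np.zeros((n_particles, n_params))
```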

Unlike a genetic algorithm, where the population as a whole approaches the local or global optimum without large differences between the individual solutions (in terms of the fitness criterion), PSO converges more quickly, even though in doing so it does not involve the entire population of candidate solutions (particles), performing a less cohesive move. It is therefore somewhat more susceptible than a genetic algorithm to converging on a local optimum and getting stuck there, no longer looking for the global optimum if the latter is not (and lies far from) the initial "attractor".

However, if the initial convergence of the candidate solutions is in the direction of a global optimum, or of a local optimum close to it, PSO can become an important shortcut, "a scout overflying the search space" that substantially speeds up the search for the best solution (in this case, the best set of ANN parameters) by creating several candidate solutions clustered close to the optimal one. After this rapid initial approach, the optimization method can be switched to a genetic algorithm, which makes the final approach more slowly but with a greater chance of reaching the global optimum. The following diagrams, and the sketch after them, further illustrate this hybrid PSO + GA optimization strategy, which has been used in some specific situations with surprisingly superior results.

In the initial phase of ANN training, supervised by PSO, solutions quickly converge to a local or global optimum in the search space (represented here as three-dimensional only to ease visualization).

In the next phase, after switching to GA supervision, the (not necessarily convergent) displacements are smaller, allowing a more accurate approach to a better local optimum or even the global optimum.
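
A minimal sketch of the hybrid driver follows, assuming pso_step_fn and ga_step_fn are black-box functions that each advance the population by one generation; the generation counts echo the example below and are not prescriptive.

```python
def hybrid_train(population, fitness, pso_step_fn, ga_step_fn,
                 pso_generations=2, ga_generations=3400):
    # Phase 1: a short PSO "scout" pass clusters the candidate solutions
    # near a promising region of the search space.
    for _ in range(pso_generations):
        population = pso_step_fn(population)
    # Phase 2: the GA takes over for the slower, finer final approach,
    # with a better chance of reaching the global optimum.
    for _ in range(ga_generations):
        population = ga_step_fn(population)
    # Return the best candidate under the chosen fitness criterion (e.g., R2).
    return max(population, key=fitness)
```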

At IntellISearch we have adopted this combination with some frequency, partly because we use Ward Systems' versatile AI tools. One of them, Chaos Hunter, supports the development of predictive models based on ANNs (and/or conventional analytic functions) and not only lets us choose between optimization by GA or PSO, but also allows combining the two, starting with PSO and concluding the training supervision with GA.

However, to obtain a better result than with GA alone, it is necessary to tune the optimization parameters, especially the seed, the initial population, and the switching point (generation number) from PSO to GA.

The following example illustrates the practical application of this strategy: the optimization (supervised training) of a model aimed at predicting the expected loss of a credit portfolio with high-risk debtors, using 8 independent variables, the dependent variable being the expected loss (%).

The data for the independent variables (columns 2 to 9) and the dependent variable (expected loss) to be predicted by the ANN.

Optimization by GA only, with a population size of 100 and a seed of 10. Fitting criterion: maximization of R2.

For simplicity, the model will be purely an ANN, with as many topological alternatives as possible.

The following graph shows the progress of the ANN optimization cycle, supervised by GA only.

After more than 4,600 generations, and over 7 minutes of GA-only optimization, the R2 of the ANN population (size = 100) is still below 60%.
If, on the other hand, we start with a short cycle of PSO optimization (just two generations, taking 2 seconds) and switch to GA right after, better results are achieved in a shorter time and in a smaller number of generations.

Starting the new optimization cycle with PSO (keeping the same values for the seed and the population size), interrupting it after just two generations.

Interrupting the PSO optimization phase right after the first jump in the R2 value.

After the PSO optimization is interrupted at the second generation and the process restarts with GA, in little more than 3,400 generations, and after about six minutes of GA optimization, the R2 of the ANNs (population = 100) exceeds 90%.

To ensure that this gain in speed and accuracy did not come at the price of ANN overfitting, we must apply the models to the out-of-sample data set; the result is shown in the scatter plots presented in the table below. The ANN obtained by the hybrid optimization performs much better, especially for the smaller values of expected loss.

Scatter plot for the GA-only training case (optimization interrupted after 4,613 generations).

Scatter plot for the hybrid PSO + GA training case (PSO optimization interrupted after 2 generations, followed by GA interrupted after 3,877 generations).

Of course, fine-tuning the seed parameters and the maximum population size of candidate solutions is essential to obtain the benefits of this hybrid strategy. The switching point from PSO to GA is also vital to achieving good predictive capacity. As a rule of thumb, the PSO should be interrupted immediately after the first jump in the value of the fitness criterion (R2 in our example). It is easy to understand why: that jump occurs when the population of particles (candidate solutions) "flying" through the search space has identified the location of a local or global optimum and begun to move in its direction. From then on, the more accurate approximation is the job of the genetic algorithm, which has a greater capacity to "get around" local optima toward the global one.

It is also easy to understand why the opposite hybridization (GA followed by PSO) does not work very well (usually worse than either method alone). It would be as if, after excavating the first steps leading to a pharaoh's tomb, the archaeologists interrupted the dig and resorted to an aerial photogrammetric survey (or even a LIDAR survey) to locate it.

All of this fine-tuning is as much art as science, and over the past 15 years of experience developing AI applications for many purposes, we have accumulated many of these "tricky" parameter configurations that result in high-performance ANN models.

References:

http://www.swarmintelligence.org/tutorials.php

http://www.chaoshunter.com/company_info.html

https://en.wikipedia.org/wiki/Particle_swarm_optimization#/media/File:ParticleSwarmArrowsAnimation.gif

