You are on page 1of 5

On the Usage of Differential Evolution for Function

Optimization

by Rainer Storn

Siemens AG, ZFE T SN2, Otto-Hahn Ring 6, D-81739 Muenchen, Germany, currently on leave at ICSI,
1947 Center Street, Berkeley, CA 94704, storn@icsi.berkeley.edu

Abstract assumed unless otherwise stated. Basically, DE


Differential Evolution (DE) has recently proven to generates new parameter vectors by adding the
be an efficient method for optimizing real-valued weighted difference between two population
multi-modal objective functions. Besides its good vectors to a third vector. If the resulting vector
convergence properties and suitability for yields a lower objective function value than a
parallelization, DE's main assets are its predetermined population member, the newly
conceptual simplicity and ease of use. This generated vector replaces the vector, with which
paper describes several variants of DE and it was compared, in the next generation;
elaborates on the choice of DE's control otherwise, the old vector is retained. This basic
parameters which corresponds to the application principle, however, is extended when it comes to
of fuzzy rules. Finally the design of a howling the practical variants of DE. For example an
removal unit with DE is described to provide a existing vector can be perturbed by adding more
real-world example for DE's applicability. than one weighted difference vector to it. In most
cases, it is also worthwhile to mix the
1 Introduction parameters of the old vector with those of the
Differential Evolution (DE) [1], [2] has proven to perturbed one before comparing the objective
be a promising candidate for minimizing real- function values. Several variants of DE which
valued, multi-modal objective functions. Besides have proven to be useful will be described in the
its good convergence properties DE is very following.
simple to understand and to implement. DE is
also particularly easy to work with, having only a
2 Scheme DE/rand/1
few control variables which remain fixed
For each vector x i ,G , i = 0,1,2,...,NP-1, a
throughout the entire minimization procedure.
perturbed vector v i ,G +1 is generated according
DE is a parallel direct search method which
to v i,G +1 = x r1 ,G + F ⋅ ( x r2 ,G − x r 3,G ) (2)
utilizes NP D-dimensional parameter vectors

xi,G, i = 0, 1, 2, ... , NP-1, (1) with r1 , r2 , r3 ∈ 0, NP − 1 , integer and mutually


different, and F > 0.
as a population for each generation G, i.e. for
each iteration of the minimization. NP doesn't The randomly chosen integers r1, r2 and r3 are
change during the minimization process. The also chosen to be different from the running
initial population is chosen randomly and should index i. F is a real and constant factor ∈ [0, 2]
try to cover the entire parameter space which controls the amplification of the differential
uniformly. As a rule, a uniform probability variation ( x r2 ,G − x r3 ,G ) . Note that the vector
distribution for all random decisions will be

1
x r1 ,G which is perturbed to yield v i ,G +1 has no where rand() is supposed to generate a random
relation to x i ,G but is a randomly chosen number ∈ [0,1):
population member. Fig. 1 shows a two- L = 0;
dimensional example that illustrates the different
do {
vectors which play a part in the vector-generation
L = L + 1;
scheme. The notation: DE/rand/1 specifies that
the vector to be perturbed is randomly chosen, }while(rand()< CR) AND (L < D));
and that the perturbation consists of one
Hence the probability Pr(L>=ν) = (CR)ν-1, ν > 0.
weighted difference vector.
CR is taken from the interval [0, 1] and
X1
constitutes a control variable in the design
x N P Param e ter vectors from g eneration G
N ewly generated param eter vector v process. The random decisions for both n and L
F( x r2 ,G - x r3 ,G ) are made anew for each newly generated vector
M INIM UM ui,G+1.
x x x
x
x
x To decide whether or not it should become a
x
x i,G x member of generation G+1, the new vector
x r3 ,G x r ,G
2
x
x r1 ,G x ui,G+1 is compared to x i ,G . If vector ui,G+1
x
yields a smaller objective function value than
x
v i,G + 1 = x r1 ,G
x i,G , then x i,G +1 is set to ui,G+1; otherwise, the
+ F ( x r2 ,G - x r3 ,G )
X0 old value x i,G is retained.
Fig.1: An example of a two-dimensional 3 Scheme DE/best/1
objective function showing its contour
Basically, scheme DE/best/1 works the same
lines and the process for generating
way as DE/rand/1 except that it generates the
vi,G+1 in scheme DE/rand/1.
vector vi,G+1 according to:
v i,G +1 = x best ,G + F ⋅ ( x r1 ,G − x r2 ,G ) . (5)
In order to increase the potential diversity of the
This time, the vector to be perturbed is the best
perturbed parameter vectors, crossover is
performing vector of the current generation.
introduced. To this end, the vector:
Again, the computation of ui,G+1 is defined by
u i,G +1 = (u0i,G +1 , u1i,G +1 ,..., u( D −1)i,G +1 ) (3) eq. (4). This will be also be the case for the
remaining variants.
4 Scheme DE/best/2
with
%Kv j = n D , n + 1 D ,..., n + L − 1
Scheme DE/best/2 uses two difference vectors
=&
ji, G +1 for D
as a perturbation:
K' x
u ji,G +1
ji, G for all other j ∈[ 0, D − 1] v i,G +1 = x best ,G + F ⋅ ( x r1 ,G + x r2 ,G − x r3 ,G − x r4 ,G ) . (6)
(4)
Due to the central limit theorem the random
is formed. The acute brackets denote the
D variation is shifted slightly into gaussian direction
modulo function with modulus D. The starting which seems to be beneficial for many functions.
index, n, in (4) is a randomly chosen integer from
5 Scheme DE/rand-to best/1
the interval [0,D-1]. The integer L, which denotes
the number of parameters that are going to be Scheme DE/rand-to-best/1 places the
exchanged, is drawn from the interval [1, D]. perturbation at a location between a randomly
The algorithm which determines L works chosen population member and the best
according to the following lines of pseudo code population member:

2
vi,G+1 = xi,G + λ ⋅ ( xbest,G − xi,G ) + F ⋅ ( xr2 ,G − xr3 ,G ) . (7) crucial. The more knowledge one includes, the
more likely the minimization is going to converge.
λ controls the greediness of the scheme. To
The sum of error squares is not always a good
reduce the number of control variables we
choice as it has the potential to hide the path to
usually set λ =  .
the global minimum. To minimize the maximum
error is often a better objective but seems to
6 Rules for the usage of DE yield more local minima.
Since it's invention [1], DE's has been tested
extensively against artificial and real-world
7 Design of a howling remover
minimization problems. So far, the following set
In order to demonstrate DE's applicability to real-
of linguistic rules has emerged to be useful when
world problems a howling removal unit has been
it comes to choose the control variables F, CR
designed with DE. In modern audio
and NP:
communication applications hands free
# At initialization the population should be spread environments are the current trend where
as much as possible over the objective function headsets are replaced by loudspeakers and
surface. microphones. The preferred way of audio
communication is full duplex, i.e. all
# Most often the crossover probability CR ∈ [0,
loudspeakers and microphones are active as
1] must be considerably lower than one (e.g.
opposed to half duplex or "walkie talkie" mode
0.3). If no convergence can be achieved,
where only one party is allowed to talk at a time.
however, CR ∈ [0.8, 1] often helps.
Howling is one of the problems in full duplex
# For many applications NP=10*D is a good communication and builds up due to the acoustic
choice. F is usually chosen ∈ [0.5, 1]. feedback path. One way to reduce howling is to
# The higher the population size NP is chosen, frequency-shift the signal that is picked up by a
the lower one should choose the weighting factor microphone by 10Hz to 20Hz before it is sent to
F. the other communicating parties. This shift is
usually not perceived as unnatural by the human
# watching the parameters: it's a good ear. The shifted signal appears at the destination
convergence sign if the parameters of the best loudspeakers and travels back to the originator,
population member change a lot from generation shifted by another 10Hz to 20Hz. The signal
to generation, especially at the beginning of the travels many times through this acoustic path
minimization and even if the objective function and is quickly shifted out of band, thus reducing
value of the best population member decreases the feedback problems. Fig. 2 shows the block
slowly. diagram of the howling removal unit.
# watching the objective function: it is not B and pa ss D ow nsam p ler

necessarily bad, if the objective function value of xk 4 BP LP 4 yk

the best population member exhibits plateaus


U p sam p ler Δω π n L ow pass
during the minimization process. However, it is cos(1- )
ωA 2

an indication that the minimization might take a Fig. 2: Howling removal unit.
long time or that the increase of the population
The upsampler fills in three zero samples
size NP might be beneficial for convergence.
between the adjacent signal samples xk and
# The objective function value of the best xk+1. The bandpass, which operates at four
population member shouldn't drop too fast, times the sampling frequency ωA, retains the
otherwise the optimization might get stuck in a components of the spectrum which are
local minimum. sopposed to be frequency-shifted by Δω. The
# The proper choice of the objective function is actual shift is performed via multiplication with an

3
appropriate cosine signal. The lowpass removes scheme the magnitude response of the
some artifacts which appear due to the shifting corresponding filter was sampled in the
operation, and the downsampler takes every frequency domain. The number of samples that
fourth sample of the lowpass result to yield the were used are indicated in figs. 3 and 4 which
output signal yk at the original sampling also show the final results of the design
frequency. procedure.
1.2
One of the most important features of the 10 sam ples 4 0 s am p les 10 sam ples
tra ns itio n b an d p as s ba nd transition b and
howling remover is low computational complexity
1
1.0 1
as the unit has to operate in a real-time
0.99
environment. Therefore it was crucial to design a 0.8
10 sam ples

M agnitude
20 sam ples
bandpass as well as a lowpass with minimum
stop b and sto p b an d
degree. To this end the bandpass was chosen to 0.6

be a recursive digital filter (IIR-filter) the transfer


0.4
function of which had to meet specific magnitude
constraints defined by a tolerance scheme. The 0.2
lowpass was designed as a transversal filter 0.01 0.01
(FIR-filter) without the usual linear phase 0
0 0.1 0.2 0.3 0.4 0.5
requirement. Also the lowpass had to meet Normalized Frequency
magnitude specifications defined by a tolerance
Fig. 3: Magnitude response of the bandpass after
scheme.
the design process.
The objective function in both cases was defined
to be the maximum deviation from the
1.2
corresponding tolerance scheme or to be one of 1.005
the following penalty terms pk, whichever value 1
was greater: 0.995
M agnitud e

0.8
a) For all parameters par[i]:

p1 = 20, 000 + par[i]*100 if par[i] < 0 0.6


40 sam ples 20 sam ples

pass band 10 sam ples stop band


b) For all par[i] denoting the radius of a zero [3] 0.4
transition
of the transfer function: band
0.2
100 0.01 0.001
p2 = 2 + if par[i ] < 1
par[i ] + 1. e − 7 0
0 0.1 0.2 0.3 0.4 0.5
Normalized Frequency
c) For all p[i] denoting a zero angle:
Fig. 4: Magnitude response of the lowpass after
p3 = 2 + par[i ]* 100 if par[i ] < 0. 5 the design process.
d) For all p[i] denoting a pole radius:

p4 = 2 + par[i ]* 100 if par[i ] >= 1 The bandpass filter result of fig. 3 was obtained
using strategy DE/best/2 with NP=300, F=0.5
e) For all p[i] denoting a pole angle: and CR=1. The entire design took 2,210,400
evaluations of the bandpass transfer function. A
p5 = 2 + par[i ]* 100 if
total of 24 parameters was used, 6 zero radii, 6
par[i ] ∈" stopband of filter " zero angles, 6 pole radii and 6 pole angles in the
complex z-plane [3]. The overall gain constant a0
To determine the deviation from the tolerance

4
was set to 0.005. The magnitude response of the  π "# .
filter still violates some parts of the tolerance
interval
! 2$
x ∈ 0, The strategy used was

scheme slightly, yet the design was satisfactory DE/best/1 with NP=20, F=0.9 and CR=1. It took
for the howling remover. 30,020 function evaluations to get the result of
The lowpass filter result of fig. 4 was obtained fig. 5. The final speed increase of the cosine
using strategy DE/best/2 with NP=200, F=0.5 function computed by opti(x) was 17% compared
and CR=1. The entire design took 83,800 to the library function cos(x).
evaluations of the lowpass transfer function. A
total of 16 parameters was used, 8 zero radii and Conclusion
8 zero angles in the complex z-plane [3]. The Several variants of Differential Evolution (DE)
overall gain constant a0 was set to 0.005. have been introduced and general hints about
The third optimization for the howling remover their usage have been provided. Three real-world
was concerned with the cosine function the design tasks appearing in the development of a
evaluation of which takes up a non-neglectable howling remover for audio communications have
been solved successfully by applying DE. All
0, π "#
amount of computing time. Hence an
three design tasks could have been performed
approximation of cos(x) in the interval
! 2$ with specialized design tools; the advantage of
using DE, however, was that neither specialized
was performed using a polynomial opti(x) of third
and most probably expensive tools nor expert
degree. Fig. 5 shows that opti(x) yields an
knowledge concerning the design tasks
improved approximation compared to the taylor
themselves was necessary.
polynomial taylor(x) of third degree.

1
References
0.8 opti(x)=0.9975575805 [1] Storn, R. and Price, K., Differential Evolution
cos(x), taylor(x), p(x)

+0.03400468081*x
0.6 - 0.6044035554*x^2
- a simple and efficient adaptive scheme for
0.4 +0.1129638031*x^3 global optimization over continuous spaces,
0.2 Technical Report TR-95-012, ICSI,
0 http://http.icsi.berkeley.edu/~storn/litera.html
-0.2
-0.4 [2] Storn, R. and Price, K., Minimizing the real
-0.6 cos(x) functions of the ICEC'96 contest by
taylor(x) = 1 + 0.5*x^2
-0.8 Differential Evolution, Int. Conf. on
-1
0 0.5 1 1.5 2 2.5 π Evolutionary Computation, Nagoya, Japan.
x
[2] Mitra, S.K. and Kaiser, J.F., Handbook for
Fig. 5: Approximation of cos(x) by means of a digital signal processing, John Wiley, 1993.
polynomial of third order.

The optimization of the coefficients in opti(x) was


performed by minimization of the sum of error
squares obtained by 100 sampling points in the

You might also like