
A Crossover Operator Using Independent Component Analysis for Real-Coded Genetic Algorithms

Masato Takahashi
Interdisciplinary Graduate School of Science and Engineering
Tokyo Institute of Technology
4259 Nagatsuta, Midori, Yokohama 226-8502, JAPAN
jin@ktl.dis.titech.ac.jp

Hajime Kita
Faculty of University Evaluation and Research
National Institution for Academic Degrees
3-29-1 Otsuka, Bunkyo, Tokyo 112-0012, JAPAN
kita@niad.ac.jp

Abstract: For real-coded genetic algorithms, many crossover operators have been proposed. The blend crossover (BLX-α) proposed by Eshelman and Schaffer shows good search ability for separable fitness functions. However, because of its component-wise operation, the BLX-α faces difficulties in optimizing non-separable fitness functions. The present paper proposes a novel crossover operator that combines the BLX-α with independent component analysis (ICA). That is, by applying the ICA to the population, the coordinate system of the search space is transformed so as to increase the separability of the fitness function, and then the BLX-α is applied. Computer simulation shows good search ability of the proposed method for non-separable fitness functions.

1 Introduction

Real-coded genetic algorithms (RCGAs), which use a floating-point representation, attract attention as methods for global optimization in continuous search spaces, as do the Evolution Strategies (ES) [14], and many crossover operators have been proposed for the RCGA [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. RCGAs with sophisticated crossover operators have shown excellent optimization abilities [5, 6, 7, 8, 9, 10, 11]. Further, as design guidelines for crossover operators for RCGAs, Kita et al. have proposed 'preservation of statistics' and 'diversity of offspring' based on theoretical analysis [13, 9].

Crossover operators for the RCGA and recombination operators for the ES [14] can be categorized into two groups. Operators in the first category combine parental information in a component-wise manner when generating offspring. Among these operators, the blend crossover [6, 7] and the simulated binary crossover [5] show good search ability. However, because of their component-wise operation, these crossover operators face difficulty in optimizing non-separable functions [12, 7, 8].

In the second category, a linear combination is used to mix the parental information. The unimodal normal distribution crossover [8], its multi-parental extension [9], and the simplex crossover [10] show good search ability. Because of their linear combination operation, these operators cope well with non-separability. On the other hand, they lose the fundamental concept of crossover, namely that the crossover operation combines the building blocks composing the parents when generating offspring.

This paper proposes a novel approach that copes with non-separability while using a component-wise crossover. That is, through statistical analysis of the population by means of the principal component analysis (PCA) and/or the independent component analysis (ICA) [16, 17, 18], the coordinate system of the search space is transformed so as to eliminate the non-separability of the fitness function. Then the component-wise crossover operator is applied.

2 Crossover Operators for RCGA

In this section, two crossover operators showing good search ability are introduced briefly as representatives of the aforesaid two categories.

2.1 Blend Crossover

In the BLX-α [6], offspring are generated as follows:

(1) Choose two parents $x^1, x^2$ randomly from the population.

(2) The value of each element $x_i^c$ of the offspring vector $x^c$ is chosen randomly from the interval $[X_i^1, X_i^2]$ following the uniform distribution, where

$$
\begin{aligned}
X_i^1 &= \min(x_i^1, x_i^2) - \alpha d_i \\
X_i^2 &= \max(x_i^1, x_i^2) + \alpha d_i \\
d_i &= |x_i^1 - x_i^2|
\end{aligned}
\tag{1}
$$

and $x_i^1$ and $x_i^2$ are the $i$-th elements of $x^1$ and $x^2$, respectively, and $\alpha$ is a positive parameter.

For the value of $\alpha$, Eshelman and Schaffer have used 0.5 [6]. According to our calculation, the value of $\alpha$ that preserves the variance of the parental population is 0.366.
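The sampling rule of Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name `blx_alpha` and the `rng` argument are assumptions introduced here.

```python
import numpy as np

def blx_alpha(x1, x2, alpha=0.366, rng=None):
    """Sample one offspring from parents x1 and x2 following Eq. (1).

    The name and signature are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    d = np.abs(x1 - x2)                      # d_i = |x_i^1 - x_i^2|
    lower = np.minimum(x1, x2) - alpha * d   # X_i^1
    upper = np.maximum(x1, x2) + alpha * d   # X_i^2
    # Each component is drawn independently and uniformly from its interval.
    return rng.uniform(lower, upper)
```

Because every component is sampled independently, the set of possible offspring is an axis-parallel box around the parents, which is where the dependence on the coordinate system comes from.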

©2001 IEEE
0-7803-6657-3/01/$10.00
Preliminary experiments show that the latter value achieves better performance.

Because of the component-wise nature of the BLX-α, mutual dependency among the variables is not well considered. Hence, the BLX-α faces difficulty in optimizing non-separable fitness functions [8]. Considering this problem, Eshelman et al. have proposed a method that restricts the generation of offspring to the region around the diagonal connecting the parents [7]. However, it introduces more parameters to be adjusted, and therefore tuning the parameters by trial and error becomes difficult.

2.2 Unimodal Normal Distribution Crossover

Ono et al. have proposed the unimodal normal distribution crossover (UNDX) [8], and Kita et al. have extended the original UNDX to use the information of multiple parents [9]. The extension is called the UNDX-m. The UNDX-1 corresponds to the original UNDX. The procedure for generating offspring in the UNDX-m is as follows:

• Choose $m + 1$ parents $x^1, \dots, x^{m+1}$ randomly from the population.

• Let the center of mass of these parents be $p = \frac{1}{m+1} \sum_i x^i$, and let the difference vectors between $x^i$ and $p$ be $d^i = x^i - p$.

• Choose another parent $x^{m+2}$ randomly from the population.

• Let $D$ be the length of the component of $d^{m+2} = x^{m+2} - p$ orthogonal to $d^1, \dots, d^m$.

• Let $e^1, \dots, e^{n-m}$ be an orthonormal basis of the subspace orthogonal to $d^1, \dots, d^m$.

• Generate offspring $x^c$ as follows:

$$x^c = p + \sum_{i=1}^{m} w_i d^i + \sum_{i=1}^{n-m} v_i D e^i \tag{2}$$

where $w_i$ and $v_i$ are normal random numbers following $N(0, \sigma_\xi^2)$ and $N(0, \sigma_\eta^2)$, respectively, and $\sigma_\xi$ and $\sigma_\eta$ are parameters.

For the parameters of the UNDX-m, Kita et al. have proposed

$$\sigma_\xi = 1/\sqrt{m}, \qquad \sigma_\eta = 0.35/\sqrt{n - m}$$

which are obtained from the condition of preserving the variance-covariance matrix of the parental population and from the empirical values suggested for the UNDX [8]. As for the value of the parameter $m$, they show that $m = 4 \sim 5$ gives good results empirically.

The linear combination of the parents (the first and second terms of the RHS of Eq. (2)) plays the dominant role in the UNDX-m, and therefore the UNDX-m preserves the mutual relationship among variables well. However, since it uses a relatively small $m$, the diversity of the offspring generated by the UNDX-m may not be sufficient. Further, the basic concept of crossover, namely that the crossover operation combines the building blocks of the parents, is lost in crossover operators of the linear combination type.

3 Proposed Method

3.1 Concept of the Proposed Method

In the previous section, we discussed that the BLX-α has an advantage in generating diverse offspring on the one hand, and a disadvantage in considering the mutual relationship among variables on the other. This paper pursues a method of finding the mutual relationship through statistical analysis of the population, transforming the coordinate system of the search space so as to reduce the identified relationship, and then applying the BLX-α.

Due to the selection operation in GAs, the population is distributed in a region of small fitness values (minimization problems are considered). If there exists a mutual relationship among the variables, the distribution is not parallel to the axes. Applying multivariate analysis to the population, we extract such structure and transform the coordinate system so as to reduce the mutual relationship among variables. As statistical methods, we employ the principal component analysis (PCA), a well-established method for extracting the structure of multivariate data, and the independent component analysis (ICA). The ICA is a method for the same purpose that has been studied intensively in recent years [16, 17, 18]. The reason for adopting the ICA is discussed in the Appendix.

Figure 1: Concept of the Proposed Method

3.2 Principal Component Analysis

Suppose there exist $m$ sets of data of $n$ variables, represented by an $n \times m$ matrix $X = \{x_{ij}\}$ ($i = 1, \dots, n$, $j = 1, \dots, m$). The variance-covariance matrix $S = \{s_{ij}\}$ of $X$ is given by

$$s_{ij} = \frac{1}{m} \sum_{k=1}^{m} x_{ik} x_{jk} \tag{3}$$

where the data are normalized in advance so that the mean of each variable becomes zero. Transforming the data

$X$ with an $n \times n$ matrix $A$ to $Y = AX$, the variance-covariance matrix $S' = \{s'_{ij}\}$, $s'_{ij} = \frac{1}{m} \sum_{k=1}^{m} y_{ik} y_{jk}$, of $Y = \{y_{ij}\}$ is given by

$$S' = A S A^T \tag{4}$$

The principal component analysis (PCA) finds a matrix $A$ that makes $S'$ the identity matrix, that is, a matrix that eliminates the correlation among the variables of $Y$.

Since $S$ is a real symmetric matrix, there exist an orthogonal matrix $P$ and a diagonal matrix $\Lambda$ such that $P^T S P = \Lambda$. The diagonal matrix $\Lambda$ whose diagonal elements are the eigenvalues $\lambda_1, \dots, \lambda_n$ of $S$, and the matrix $P$ whose columns are the corresponding eigenvectors, are such a pair of matrices. Hence, we obtain

$$S' = (AP) \Lambda (AP)^T. \tag{5}$$

To make $S'$ the identity matrix, a transformation matrix $A$ can be obtained as follows:

$$A = \mathrm{diag}(1/\sqrt{\lambda_1}, 1/\sqrt{\lambda_2}, \dots, 1/\sqrt{\lambda_n}) P^T \tag{6}$$

Assuming all the eigenvalues are positive, we can also obtain the inverse transformation $A^{-1}$ as follows:

$$A^{-1} = P \, \mathrm{diag}(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \dots, \sqrt{\lambda_n}) \tag{7}$$

3.3 Independent Component Analysis

As shown in the previous section, the PCA is a method of eliminating the correlation among the data. The independent component analysis (ICA) is a method that makes the p.d.f. of the variables mutually independent. Let the transformation matrix and the transformed data be $C$ and $Z = CX$, respectively.

In the fixed-point algorithm proposed by Hyvarinen et al. [17], the data are first made uncorrelated by the PCA:

$$Y = AX \tag{8}$$

Then, by obtaining a unit weight vector $b$ that minimizes or maximizes the kurtosis of $b^T y$, and with $n$ such vectors $B = (b_1, \dots, b_n)^T$, we obtain

$$Z = BY = BAX \tag{9}$$

That is, $C = BA$. Since $B$ is obtained as an orthogonal matrix, we obtain the inverse transformation as

$$C^{-1} = A^{-1} B^{-1} = A^{-1} B^T = P \, \mathrm{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_n}) B^T \tag{10}$$

In our study, obtaining $b_1, \dots, b_n$ one after another took a long computation time, and therefore we employed the method of calculating the basis vectors in parallel [18].

An example of the PCA and the ICA is shown in Fig. 2. The dimension of the space and the number of data are $n = 2$ and $m = 2000$, respectively. The data $X$ are generated uniformly in a parallelogram. With the PCA, an uncorrelated distribution $Y$ is obtained. However, it is still not parallel to the axes. With the ICA, a distribution parallel to the axes is obtained. Thus, a transformation to an independent distribution is achieved.

Figure 2: Transform of the population with the PCA and the ICA. Panels: (b) after PCA, $Y = AX$; (c) after ICA, $Z = CX = BAX$.

3.4 Crossover Combined with the PCA/ICA

Figure 3: Procedures of BLX-α, BLXPCA and BLXICA

Let $X$ be an $n \times m$ matrix representing the population of $m$ individuals in the $n$-dimensional search space. Using $X$, calculate the transformation matrices $A(X)$ and $C(X) = B(X)A(X)$ of the PCA and the ICA, respectively.
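The PCA transformation of Eqs. (3), (6) and (7) can be sketched with an eigendecomposition of the covariance matrix. This is a minimal illustration under the paper's assumption that all eigenvalues are positive; the function name `pca_transform` and its return values are choices made here, not part of the original.

```python
import numpy as np

def pca_transform(X):
    """Return (A, A_inv, mean) for an n x m data matrix X (columns are individuals).

    Illustrative helper; name and return layout are assumptions.
    """
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                           # zero-mean data, as Eq. (3) requires
    S = Xc @ Xc.T / X.shape[1]              # variance-covariance matrix S
    lam, P = np.linalg.eigh(S)              # S = P diag(lam) P^T for symmetric S
    if np.any(lam <= 0.0):
        # Degenerate population: Eqs. (6) and (7) need positive eigenvalues.
        raise ValueError("non-positive eigenvalue in covariance matrix")
    A = np.diag(1.0 / np.sqrt(lam)) @ P.T   # Eq. (6)
    A_inv = P @ np.diag(np.sqrt(lam))       # Eq. (7)
    return A, A_inv, mean
```

Applying `A` to the centered data gives a sample covariance equal to the identity matrix up to rounding, which is the property Eq. (5) is used to derive.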

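The rotation $B$ of Eq. (9) can be estimated with a kurtosis-based fixed-point iteration that updates all rows at once and re-orthogonalizes them symmetrically. The sketch below is a generic textbook form of that iteration written for illustration; it is not the FastICA package cited as [18], and the names `sym_decorrelate` and `ica_rotation`, the iteration limit and the tolerance are assumptions.

```python
import numpy as np

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which has orthonormal rows."""
    s, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W

def ica_rotation(Y, n_iter=200, tol=1e-10, rng=None):
    """Estimate an orthogonal B for whitened data Y (n x m), so that Z = B Y.

    Each row b follows the kurtosis fixed point b <- E[(b^T y)^3 y] - 3 b,
    with all rows updated in parallel.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, m = Y.shape
    B = sym_decorrelate(rng.standard_normal((n, n)))   # random orthogonal start
    for _ in range(n_iter):
        Z = B @ Y
        B_new = sym_decorrelate((Z ** 3) @ Y.T / m - 3.0 * B)
        # Converged when every row equals its previous value up to sign.
        change = np.max(np.abs(np.abs(np.sum(B_new * B, axis=1)) - 1.0))
        B = B_new
        if change < tol:
            break
    return B
```

The input `Y` is expected to be already whitened as in Eq. (8); the full transformation is then $C = BA$ and its inverse follows Eq. (10) because `B` is orthogonal.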
Then, choose parents $x^1$ and $x^2$ randomly from the population, and apply the ICA transformation:

$$z^1 = C(X) x^1 \tag{11}$$
$$z^2 = C(X) x^2 \tag{12}$$

Next, with the BLX-α applied to the transformed parents, we obtain a child in the transformed space:

$$z^c = \text{BLX-}\alpha(z^1, z^2) \tag{13}$$

Finally, applying the inverse transformation to it, we obtain a child in the original search space:

$$x^c = C^{-1}(X) z^c = A^{-1}(X) B^{-1}(X) z^c \tag{14}$$

We call this procedure 'the BLXICA.' A similar operation can be obtained using the PCA instead of the ICA. We call it 'the BLXPCA.' Figure 3 illustrates the process of the original BLX-α, the BLXPCA and the BLXICA.

The following is a realization of a genetic algorithm using the BLXICA (or BLXPCA). For generation alternation, the minimal generation gap (MGG) model, a variation of the steady-state GA proposed by Satoh et al. [15], is used. No mutation is used, because the BLX-α shows basically good search ability without mutation, and simple-minded mutation without self-adaptation has difficulty in the RCGA:

1. Generate an initial population $X$.

2. Obtain the matrices $A(X)$ and $B(X)$ by applying the PCA and the ICA to the population.

3. Choose parents $x^1$ and $x^2$ randomly and apply the ICA (or PCA) transformation to them.

4. Generate $c$ children of the parents with the BLX-α in the transformed space, where $c$ is a parameter of the MGG.

5. Apply the inverse transformation to the children.

6. Evaluate the fitness values of the children, and replace the parents with the individual having the best fitness and an individual chosen by roulette selection among the union of the parents and children.

7. If the generation reaches the prescribed value, terminate the algorithm. Otherwise, go to step 2.

It should also be noted that the proposed approach is intermediate between the conventional RCGAs and the optimization technique called 'the estimation of distribution algorithms (EDA)' in the continuous domain [19, 20, 21]. In the latter approach, first, the probability density function (p.d.f.) of good solutions is estimated through statistical analysis of the population, and second, novel solutions are sampled using the estimated p.d.f. The approach proposed in this paper shares with the EDA the idea of applying statistical analysis to the population on the one hand, and on the other hand shares with the conventional RCGAs the crossover operation for sampling novel solutions.

4 Numerical Experiments

To examine the search ability of the BLXPCA and the BLXICA, we have carried out numerical experiments. The purpose of the experiments is to capture the characteristics of the search of the proposed method. A comprehensive test to confirm the search ability of the proposed method for a broad range of test functions is a subject of future study.

4.1 Test Functions

For the experiments, three non-separable functions are chosen to examine the effects of the proposed method. The dimension of the space $n$ is 20 for all the functions. All the functions are to be minimized.

Rotated Rastrigin Function: it is a function obtained by applying a rotation by $\pi/6$ for all the pairs of the axes to the Rastrigin function:

$$f(x) = 10n + \sum_{i=1}^{n} \{x_i^2 - 10\cos(2\pi x_i)\}$$

It is highly multi-modal. Non-separability is introduced by the rotation transformation. The optimal solution is the origin of the coordinate system. The initial population is generated following the uniform distribution on $[-5.12, 5.12]^{20}$.

Rosenbrock Function: it is given by

$$f(x) = \sum_{i=2}^{n} \{100(x_1 - x_i^2)^2 + (x_i - 1)^2\}, \qquad -2.048 < x_i < 2.048$$

While this function is unimodal, it is strongly non-separable. The optimal solution is $(1, \dots, 1)$, which is located in a curved and steep-walled valley.

Ill-scaled Rosenbrock Function: it is given by

$$f(x) = \sum_{i=2}^{n} \{100(x_1 - (i x_i)^2)^2 + (i x_i - 1)^2\}, \qquad -2.048/i < x_i < 2.048/i$$

This function is a poorly scaled version of the previous Rosenbrock function.
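The test functions of Section 4.1 can be written down directly from their formulas. The sketch below is illustrative only: the function names are assumptions, and the Rastrigin function is given in its unrotated form because the text does not specify the order in which the pairwise $\pi/6$ rotations are composed.

```python
import numpy as np

def rastrigin(x):
    """Unrotated Rastrigin function; the paper applies it after a rotation."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

def rosenbrock(x):
    """Rosenbrock function of Section 4.1: every x_i (i >= 2) is coupled to x_1."""
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[0] - x[1:] ** 2) ** 2 + (x[1:] - 1.0) ** 2)

def ill_scaled_rosenbrock(x):
    """Rosenbrock function with x_i replaced by i * x_i (i = 1, ..., n)."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return rosenbrock(i * x)
```

With these definitions the minimum value 0 is attained at the origin for `rastrigin`, at $(1, \dots, 1)$ for `rosenbrock`, and at $x_i = 1/i$ for `ill_scaled_rosenbrock`.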
[Figure 4 plots: a grid of convergence curves. Rows: BLX-α, UNDX, UNDX-4, BLXPCA, BLXICA. Columns: Rotated Rastrigin, Rosenbrock, Ill-scaled Rosenbrock. Horizontal axis: generation (0 to 20000). Vertical axis: logarithmic scale, about 1e-10 to 10000.]

Figure 4: Comparison of the crossover operators. The abscissa shows the generation, and the ordinate shows the
best fitness value among the population.

4.2 Setup of Experiments

Using the aforesaid test functions, the performance of the proposed methods (BLXPCA and BLXICA) is compared with that of the conventional crossover operators.

• Compared Crossover Operators: original BLX-α, UNDX, UNDX-m, BLXPCA and BLXICA.
• Population Size: 300.
• Number of Children $c$: 200/generation.
• Trials: 10 runs with different random seeds for each test function and crossover operator.
• Initial Population: generated randomly in the prescribed region following the uniform distribution.
• Parameter value of the BLX-α: $\alpha = 0.366$.
• Parameter values of the UNDX: $\sigma_\xi = 1.0$ and $\sigma_\eta = 0.35/\sqrt{n}$.
• Parameter values of the UNDX-m: $m = 4$, $\sigma_\xi = 1.0$ and $\sigma_\eta = 0.35/\sqrt{n}$.
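With the population size and $\alpha$ listed above, steps 2 to 5 of the BLXPCA procedure (PCA transformation, BLX-α in the transformed space, inverse transformation) can be sketched as follows. This is an illustration only: the function name `blxpca_children`, the row-per-individual layout, and the explicit mean subtraction are assumptions made here, and the MGG replacement step is left out because the text does not state how fitness is scaled for roulette selection.

```python
import numpy as np

def blxpca_children(pop, i1, i2, n_children, alpha=0.366, rng=None):
    """Generate children of pop[i1] and pop[i2] by BLX-alpha in PCA coordinates.

    pop is an (m, n) array with one individual per row (an assumed layout).
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = pop.mean(axis=0)
    S = np.cov(pop, rowvar=False, bias=True)   # covariance with 1/m normalization
    lam, P = np.linalg.eigh(S)                 # assumes all eigenvalues are positive
    A = (P / np.sqrt(lam)).T                   # A = diag(1/sqrt(lam)) P^T
    A_inv = P * np.sqrt(lam)                   # A^{-1} = P diag(sqrt(lam))
    z1 = A @ (pop[i1] - mean)                  # parents in the transformed space
    z2 = A @ (pop[i2] - mean)
    d = np.abs(z1 - z2)
    lower = np.minimum(z1, z2) - alpha * d
    upper = np.maximum(z1, z2) + alpha * d
    Zc = rng.uniform(lower, upper, size=(n_children, pop.shape[1]))
    return Zc @ A_inv.T + mean                 # children in the original space

# Example call with the sizes of Section 4.2 on a random correlated population.
rng = np.random.default_rng(1)
pop = rng.uniform(-1.0, 1.0, size=(300, 20)) @ rng.standard_normal((20, 20))
children = blxpca_children(pop, 0, 1, n_children=200, rng=rng)
```

Subtracting and re-adding the population mean does not change the result of the crossover, since the BLX-α interval is built from the parents themselves; it only keeps the transformed coordinates centered.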

4.3 Results

The results of the experiments are shown in Fig. 4. For the rotated Rastrigin function, apart from a few runs that fail to find the optimum, the UNDX-4 finds the solution faster than the BLX-α and the UNDX. The performance of the BLXPCA is similar to that of the BLX-α. In contrast, quicker convergence is achieved by the BLXICA. Its performance is similar to that of the BLX-α on the original Rastrigin function, which is multi-modal but separable. This suggests the effectiveness of combining the ICA with the BLX-α for non-separable multi-modal fitness functions.

For the Rosenbrock function, which is strongly non-separable, the original BLX-α fails to find the optimum. The UNDX and the UNDX-4 search successfully; however, the convergence of the UNDX is rather slower than that of the UNDX-4. With the BLXICA and the BLXPCA, optimization performance similar to that of the UNDX-4 is achieved. This shows that, with the transformation of the coordinate system using the PCA or the ICA, the BLX-α can find the optimal solution effectively. The convergence obtained by the BLXICA fluctuates from run to run more than that of the BLXPCA. The reasons for this instability have not been clarified yet.

For the ill-scaled Rosenbrock function, not only the BLX-α but also the UNDX fails to find the optimum. Convergence with the UNDX-4 also slows down compared with the case of the Rosenbrock function. In contrast, with the BLXPCA and the BLXICA, performance similar to the case of the Rosenbrock function is obtained. This shows that the transformation with the PCA and the ICA absorbs the effect of the scale change of the coordinate system.

Thus, the BLXICA and the BLXPCA achieve good search ability for non-separable and poorly scaled fitness functions.

5 Conclusions

In this paper, novel crossover operators for real-coded genetic algorithms have been proposed that combine a transformation of the coordinate system, obtained through statistical analysis of the population, with the blend crossover in the transformed space. While the numerical experiments are preliminary, the results show that the performance of the proposed methods is promising. Subjects of future study are 1) treatment of degeneration in the PCA/ICA, 2) reduction of the computational load of the PCA/ICA, 3) refinement of the selection operation to better suit the PCA/ICA, 4) more detailed analysis of the transformation obtained by the PCA/ICA, and 5) more comprehensive numerical experiments to examine the effectiveness and limitations of the proposed methods. As for the reduction of the computational load, a preliminary study shows that applying the ICA only every five generations reduces the computational load remarkably without performance degradation.

Acknowledgments

The authors are grateful to Professor Shigenobu Kobayashi of Tokyo Institute of Technology for his valuable comments on this research. This research was supported by the 'Research for the Future' Program, 'Biologically Inspired Adaptive Systems (JSPS-RFTF96100105)', of the Japan Society for the Promotion of Science.

Bibliography

[1] L. Davis: The Handbook of Genetic Algorithms, Van Nostrand Reinhold (1990).
[2] C.Z. Janikow and Z. Michalewicz: An Experimental Comparison of Binary and Floating Point Representations in Genetic Algorithms, R.K. Belew and L.B. Booker Eds.: Proceedings of the 4th International Conference on Genetic Algorithms, pp. 31-36 (1991).
[3] A. Wright: Genetic Algorithms for Real Parameter Optimization, Foundations of Genetic Algorithms, pp. 205-218 (1991).
[4] Z. Michalewicz: Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag (1992).
[5] K. Deb and R.B. Agrawal: Simulated Binary Crossover for Continuous Search Space, Complex Systems, 9, pp. 115-148 (1995).
[6] L.J. Eshelman and J.D. Schaffer: Real-Coded Genetic Algorithms and Interval-Schemata, Foundations of Genetic Algorithms 2, pp. 187-202 (1993).
[7] L.J. Eshelman, K.E. Mathias and J.D. Schaffer: Crossover Operator Biases: Exploiting the Population Distribution, Proc. ICGA97, pp. 354-361 (1997).
[8] I. Ono and S. Kobayashi: A Real-coded Genetic Algorithm for Function Optimization Using Unimodal Normal Distribution Crossover, Proc. 7th ICGA, pp. 246-253 (1997).
[9] H. Kita, I. Ono and S. Kobayashi: Multi-Parental Extension of the Unimodal Normal Distribution Crossover for Real-Coded Genetic Algorithms, Proc. of CEC'99, 111-646-651 (1999).
[10] S. Tsutsui, M. Yamamura and T. Higuchi: Multi-parent Recombination with Simplex Crossover in Real Coded Genetic Algorithms, Proc. GECCO '99 (1999).

[11] I. Ono, H. Kita and S. Kobayashi: A Robust Real-Coded Genetic Algorithm using Unimodal Normal Distribution Crossover Augmented by Uniform Crossover: Effects of Self-Adaptation of Crossover Probabilities, Proc. of the Genetic and Evolutionary Computation Conference '99, pp. 496-503 (1999).
[12] R. Salomon: Performance Degradation of Genetic Algorithms under Coordinate Rotation, Proc. of 5th Annual Conference on Evolutionary Programming, pp. 155-161 (1996).
[13] H. Kita, I. Ono and S. Kobayashi: Theoretical Analysis of the Unimodal Normal Distribution Crossover for Real-coded Genetic Algorithms, Proc. of the ICEC'98, pp. 529-534 (1998).
[14] H.-P. Schwefel: Evolution and Optimum Seeking, Wiley (1995).
[15] H. Satoh, M. Yamamura and S. Kobayashi: Minimal Generation Gap Model for GAs Considering Both Exploration and Exploitation, Proc. IIZUKA '96, pp. 494-497 (1997).
[16] C. Jutten and J. Herault: Blind separation of sources, Part I: Adaptive algorithm based on neuromimetic architecture, Signal Processing, 24, pp. 1-20 (1991).
[17] A. Hyvarinen and E. Oja: A Fast Fixed-point Algorithm for Independent Component Analysis, Neural Computation, 9(7), pp. 1483-1492 (1997).
[18] http://www.cis.hut.fi/projects/ica/fastica/
[19] M. Pelikan, D.E. Goldberg and F. Lobo: A survey of optimization by building and using probabilistic models, IlliGALs Technical Report 99018 (1999).
[20] M. Sebag and A. Ducoulombier: Extending population-based incremental learning to continuous search spaces, Proc. PPSN V, pp. 418-427 (1998).
[21] P.A.N. Bosman and D. Thierens: Expanding from Discrete to Continuous Estimation of Distribution Algorithms: The IDEA, Proc. PPSN VI, pp. 767-776 (2000).

Appendix: Population Distribution and Independent Components

Assume that the fitness function $f(x)$ can be expressed in an additive form of functions $f_i(w_i^T x)$ of weighted sums $w_i^T x$ of the decision variables $x$:

$$f(x) = \sum_i f_i(w_i^T x)$$

As a result of selection by the GA, the population is distributed more densely where the fitness function takes smaller values (for minimization problems). Assume that the distribution of the population $P(x)$ follows the Boltzmann distribution:

$$P(x) \propto \exp\left(-\frac{f(x)}{T}\right)$$

where $T > 0$ is a parameter.

Transforming the coordinate system by the equations $y_i = w_i^T x$, we obtain

$$P(y) = C \exp\left(-\frac{1}{T} \sum_i f_i(y_i)\right) = C \prod_i \exp\left(-\frac{f_i(y_i)}{T}\right) \tag{15}$$

where $C$ is a constant. Equation (15) shows that the distribution can be expressed as a product of the marginal probability density functions of the components. Hence, if we can find such a transformation by the ICA of the population, the BLX-α, which blends the solutions in a component-wise manner, will work effectively.
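As a worked instance of the Appendix argument, consider the special case, assumed here for illustration and not taken from the paper, in which each $f_i$ is quadratic and the vectors $w_i$ form the rows of an invertible $n \times n$ matrix $W$, so that $y = Wx$ is a linear change of coordinates with constant Jacobian:

```latex
f_i(u) = \tfrac{1}{2} a_i u^2 \quad (a_i > 0)
\;\Longrightarrow\;
f(x) = \tfrac{1}{2}\, x^T W^T \mathrm{diag}(a_1, \dots, a_n)\, W x ,
\qquad
P(y) = C \prod_{i=1}^{n} \exp\!\left( -\frac{a_i y_i^2}{2T} \right).
```

In the original coordinates the population is then Gaussian with precision matrix $W^T \mathrm{diag}(a_i) W / T$, which is correlated in general, while in the $y$ coordinates each $y_i$ is an independent zero-mean Gaussian with variance $T / a_i$. In this Gaussian special case, decorrelation by the PCA already yields independent components; non-Gaussian populations are the case in which the ICA can add something beyond the PCA.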
