Statistical Model of Evolutionary Algorithm for
Feed-Forward ANN Architecture Optimization

G.V.R. Sagar (nusagar@gmail.com), Assoc. Professor, G.P.R. Engg. College, Kurnool, AP 518007, India.
Dr. S. Venkata Chalam (sv_chalam2003@yahoo.com), Professor, CVR Engg. College, Hyderabad, AP, India.

Journal: Journal of Experimental & Theoretical Artificial Intelligence
Manuscript ID: Draft
Manuscript Type: Original Article
Keywords: Artificial neural network, topology mutation, schema theory, crossover
ABSTRACT
The optimization of feed-forward architecture design drives the evolution of Artificial Neural Networks (ANNs). There is no systematic procedure to design a near-optimal architecture for a given application or task; pattern classification methods and constructive and destructive algorithms can be used for the design of architectures. The proposed work develops a statistical model of an Evolutionary Algorithm (EA) to optimize the architecture. A single-point crossover is applied with selective schemas on the network space, and evolution is introduced in the mutation stage, so that optimized ANNs are achieved.

Keywords: Artificial neural network, topology mutation, schema theory, crossover.
1 INTRODUCTION: Genetic algorithms were developed by John Holland [1], [2], [3], [4]. Owing to a growing number of day-to-day applications combined with enhancements in hardware, a variety of EAs are becoming more and more popular. A family of subsets of the search space and an appropriate process of re-encoding are two notions analogous to the familiar facts relating continuous maps to families of open sets or measurable functions. In order to apply an EA to a typical optimization problem, we need to model the problem in a suitable manner, i.e. to construct a search space $\Omega$ together with a positive-valued fitness function and a family of mating and mutation transforms. EAs can therefore be represented as an ordered 4-tuple $(\Omega, f, \mathcal{F}, \mathcal{M})$, where $\mathcal{F}$ is the family of mating transforms and $\mathcal{M}$ is the family of unary (mutation) transformations on $\Omega$. The total search space is divided into invariant subsets [3] and a crossover operation is performed on $\Omega$, while the family of mutations $\mathcal{M}$ on $\Omega$ is ergodic, i.e. it ensures that the Markov process [5] modeling the algorithm is irreducible. The schemata correspond to invariant subsets of the search space, and the schema theorem can be reformulated in this general framework. The invariant subsets of the search space play the role, in the encoding process, of the open sets, measurable functions and sigma-algebras of the continuous setting. A classical Geiringer theorem is extended to represent a class of evolutionary computation techniques with crossover and mutation.
2.0 Representation of Evolutionary Algorithm:
Building on the mathematical foundation of evolutionary-algorithm representation given in Section 1, we exploit the language of category theory [6]. To apply an evolutionary algorithm to a specific
optimization problem, we need to model the problem in a suitable manner. This requires building a search space $\Omega$ which contains the elements of all possible solutions to the problem, a computable positive-valued fitness function $f : \Omega \to (0, \infty)$, and a suitable family of mating (or crossover) and mutation transforms.
The category of heuristic 3-tuples: All the families $\mathcal{F}$ are invariant subsets [3] of $\Omega$, and all such families can be characterized in set-theoretic and sigma-algebra terms. Let $\Gamma$ denote a nonempty family of transforms from $\Omega^{m}$ to $\Omega$ for a fixed $m \geq 1$ (for $m$-fixable families). We then denote the family of invariant subsets of $\Omega$ under the family $\Gamma$ by

$$\mathcal{S}_{\Gamma} = \{\, S \subseteq \Omega \mid T(S^{m}) \subseteq S \ \ \forall\, T \in \Gamma \,\} \qquad (3.18)$$

It follows that for every element $x \in \Omega$ there is a unique smallest element of $\mathcal{S}_{\Gamma}$ containing $x$. A heuristic 3-tuple $\Omega = (\Omega, \mathcal{F}, \mathcal{M})$ is a 3-tuple such that $\Gamma = \mathcal{F} \cup \mathcal{M}$. For $x \in \Omega$ and a heuristic 3-tuple $\Omega = (\Omega, \mathcal{F}, \mathcal{M})$, we denote by $S_{x}$ the smallest element of the family of invariant subsets containing $x$.
In a similar manner, given two heuristic 3-tuples $\Omega_{1} = (\Omega_{1}, \mathcal{F}_{1}, \mathcal{M}_{1})$ and $\Omega_{2} = (\Omega_{2}, \mathcal{F}_{2}, \mathcal{M}_{2})$, we define a function $\delta : \Omega_{1} \to \Omega_{2}$ representing the reproduction transformation, called a morphism. Let $x, y \in \Omega_{1}$ and $T \in \mathcal{F}_{1}$; then there exists $F_{\delta(x), \delta(y)} \in \mathcal{F}_{2}$ such that

$$\delta\big(T(x, y)\big) = F_{\delta(x), \delta(y)}\big(\delta(x), \delta(y)\big) \qquad (3.2)$$

Similarly, for every $M \in \mathcal{M}_{1}$ there exists $H_{x} \in \mathcal{M}_{2}$ such that $\delta\big(M(x)\big) = H_{x}\big(\delta(x)\big)$. The collection of all morphisms from $\Omega_{1}$ into $\Omega_{2}$ is denoted by $\mathcal{M}(\Omega_{1}, \Omega_{2})$.
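To make the invariance condition in (3.18) concrete, here is a minimal Python sketch (an illustrative toy, not code from the paper): it checks whether a subset S of a small search space is invariant under a family of one-point-crossover transforms, with the search space and transforms chosen purely for illustration.

```python
from itertools import product

# Toy search space: all 3-bit tuples.
OMEGA = list(product((0, 1), repeat=3))

def one_point_cross(cut):
    """A binary transform T : Omega^2 -> Omega swapping tails at `cut`."""
    def T(x, y):
        return x[:cut] + y[cut:]
    return T

GAMMA = [one_point_cross(c) for c in (1, 2)]    # family of mating transforms

def is_invariant(S, gamma):
    """S is invariant iff T(S^2) is a subset of S for every T (eq. 3.18)."""
    return all(T(x, y) in S for T in gamma for x in S for y in S)

schema = {x for x in OMEGA if x[0] == 1}        # schema "1**"
print(is_invariant(schema, GAMMA))              # True: schemata are invariant
print(is_invariant({(0, 0, 0), (1, 1, 1)}, GAMMA))  # False for arbitrary subsets
```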
A Generalization of Geiringer's theorem for EAs:
A family of recombination operators (see also [7]) of a given evolutionary algorithm changes the frequency with which various elements of the search space are sampled [1], [8]. To illustrate this point, let $\Omega = \prod_{i=1}^{n} A_{i}$ denote the search space of a given evolutionary algorithm, first discussed in [9]. Fix a population $P$ consisting of $m$ individuals, with $m$ being an even number. $P$ can be thought of as an $m \times n$ matrix whose rows are the individuals of the population,

$$P = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} \qquad (1.0)$$

The elements of the $i$th column of $P$ are members of $A_{i}$. The general Geiringer theorem [10] tells us the limiting frequency with which certain elements of the search space are sampled in the long run, provided one uses the crossover operator [19] alone. Let $\Phi(h, P, i)$, for $h \in A_{i}$, denote the proportion of rows $j$ of $P$ for which $a_{ji} = h$. If one starts with a population of individuals and runs an evolutionary algorithm in the absence of selection and mutation (crossover being the only operator involved), then the frequency of occurrence of the individual $(h_{1}, h_{2}, \dots, h_{n})$ before time $t$, represented by $\Phi(h_{1}, h_{2}, \dots, h_{n}, t)$, satisfies in the long run

$$\lim_{t \to \infty} \Phi(h_{1}, h_{2}, \dots, h_{n}, t) = \prod_{i=1}^{n} \Phi(h_{i}, P, i) \qquad (1.1)$$

The limiting distributions of the frequency of occurrence of individuals belonging to a certain schema under these algorithms have also been computed in [11], [12], [13]. The classical Geiringer theorem and the proposed or modified Geiringer algorithms
are established from basic facts about Markov chains [5] and random walks on groups. This is mainly a matter of formulating the statement of the theorem in a slightly different manner. This new point of view not only recovers the existing variants of Geiringer's theorem applied to EAs, but also extends the approach to other evolutionary algorithms. Below we give a more formal description of an EA than the one given in Section 1.
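Equation (1.1) can be sanity-checked numerically. The following Python sketch is a toy simulation under assumed settings (a random 8-by-4 binary population and tail-swap crossover only, neither taken from the paper); it compares the observed long-run sampling frequency of one individual with the product of its initial column proportions.

```python
import random
from collections import Counter

random.seed(0)
m, n = 8, 4                                    # population size, genome length
P = [[random.randint(0, 1) for _ in range(n)] for _ in range(m)]

def phi(h, pop, i):
    """Proportion of rows j of the population with a_ji = h."""
    return sum(row[i] == h for row in pop) / len(pop)

target = (1, 0, 1, 0)
predicted = 1.0
for i in range(n):
    predicted *= phi(target[i], P, i)          # right-hand side of (1.1)

# Crossover only: repeatedly pick two rows and swap their tails at a random cut.
pop = [row[:] for row in P]
counts = Counter()
steps = 200000
for _ in range(steps):
    a, b = random.sample(range(m), 2)
    cut = random.randint(1, n - 1)
    pop[a][cut:], pop[b][cut:] = pop[b][cut:], pop[a][cut:]
    counts[tuple(pop[0])] += 1                 # long-run sampling of one row

print("predicted limit:", predicted)
print("observed frequency:", counts[target] / steps)
```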

Framework:
A population $P$ of size $m$ is simply an element of $\Omega^{m}$ (a column vector). An elementary step is a probabilistic rule which takes one population as input and produces another population of the same size as output. We shall consider the following types of elementary steps.
Selection: Consider a given population $P$ as input,

$$P = \begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{m} \end{pmatrix} \quad \text{with } x_{i} \in \Omega \qquad (1.2)$$

The individuals of $P$ are evaluated:

$$\begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{m} \end{pmatrix} \longrightarrow \begin{pmatrix} f(x_{1}) \\ f(x_{2}) \\ \vdots \\ f(x_{m}) \end{pmatrix} \qquad (1.3)$$

A new population

$$P_{1} = \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{pmatrix} \qquad (1.4)$$

is obtained, where the $y_{i}$'s are chosen independently $m$ times from the individuals of $P$, and $y_{i} = x_{j}$ with probability

$$\frac{f(x_{j})}{\sum_{l=1}^{m} f(x_{l})}$$

This means that the individuals of $P_{1}$ are among those of $P$, and the expected number of occurrences of any individual of $P$ in $P_{1}$ is proportional to the number of occurrences of that individual in $P$ times the individual's fitness value. In particular, the fitter an individual is, the more copies of that individual are likely to be present in $P_{1}$; on the other hand, individuals having relatively small fitness values are not likely to enter $P_{1}$ at all. This imitates the natural survival-of-the-fittest principle.
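Equations (1.2)-(1.4) describe standard fitness-proportional (roulette-wheel) selection; a minimal Python sketch follows, with the example fitness function assumed purely for illustration.

```python
import random

def proportional_selection(population, fitness):
    """Draw m individuals independently, each equal to x_j with probability
    f(x_j) / sum_l f(x_l), as in equations (1.2)-(1.4)."""
    weights = [fitness(x) for x in population]    # fitness is positive-valued
    return random.choices(population, weights=weights, k=len(population))

# Example: fitter bit strings (more ones) are sampled more often.
random.seed(1)
pop = [tuple(random.randint(0, 1) for _ in range(6)) for _ in range(10)]
new_pop = proportional_selection(pop, fitness=lambda x: 1 + sum(x))
```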
Crossover: The population $P_{1}$ is the output of the selection process. Now let the search space $\Omega$ be a set, and fix an ordered $k$-tuple of integers $q = (q_{1}, q_{2}, \dots, q_{k})$ with $q_{1} \leq q_{2} \leq \dots \leq q_{k}$. Let $K$ denote a partition of the set $\{1, 2, \dots, m\}$, $m \in \mathbb{N}$. We say that the partition $K$ is $q$-fit if $K = \{p_{1}, p_{2}, \dots, p_{k}\}$ with $|p_{i}| = q_{i}$, and we denote by $\partial_{q}^{m}$ the family of all $q$-fit partitions of $\{1, 2, \dots, m\}$.

Let $F_{q_{1}}, F_{q_{2}}, \dots, F_{q_{k}}$ be fixed families of $q_{i}$-ary operations on $\Omega$, and let $P_{1}, P_{2}, \dots, P_{k}$ be probability distributions on $(F_{q_{1}})^{q_{1}}, (F_{q_{2}})^{q_{2}}, \dots, (F_{q_{k}})^{q_{k}}$ respectively. Let $P_{m}$ be the probability distribution on the collection $\partial_{q}^{m}$ of partitions of $\{1, 2, \dots, m\}$, so that there exists a $2(k+1)$-tuple $(F_{q_{1}}, F_{q_{2}}, \dots, F_{q_{k}}, P_{1}, P_{2}, \dots, P_{k}, P_{m})$, called the reproduction $k$-tuple. The individuals of $P$ are partitioned into pairwise disjoint tuples for mating according to $P_{m}$:
$$K = \big\{ (i_{1}^{1}, i_{2}^{1}, \dots, i_{q_{1}}^{1}),\ (i_{1}^{2}, i_{2}^{2}, \dots, i_{q_{2}}^{2}),\ \dots,\ (i_{1}^{j}, i_{2}^{j}, \dots, i_{q_{j}}^{j}),\ \dots \big\}$$

Then the corresponding tuples are given by

$$Q_{1} = \begin{pmatrix} x_{i_{1}^{1}} \\ x_{i_{2}^{1}} \\ \vdots \\ x_{i_{q_{1}}^{1}} \end{pmatrix}, \quad Q_{2} = \begin{pmatrix} x_{i_{1}^{2}} \\ x_{i_{2}^{2}} \\ \vdots \\ x_{i_{q_{2}}^{2}} \end{pmatrix}, \quad \dots, \quad Q_{j} = \begin{pmatrix} x_{i_{1}^{j}} \\ x_{i_{2}^{j}} \\ \vdots \\ x_{i_{q_{j}}^{j}} \end{pmatrix} \qquad (1.5)$$

Having selected the partition, replace every one of the selected $q_{j}$-tuples

$$\begin{pmatrix} x_{i_{1}^{j}} \\ x_{i_{2}^{j}} \\ \vdots \\ x_{i_{q_{j}}^{j}} \end{pmatrix} \qquad (1.6)$$

with the $q_{j}$-tuple

$$\begin{pmatrix} T_{1}^{j}(x_{i_{1}^{j}}, x_{i_{2}^{j}}, \dots, x_{i_{q_{j}}^{j}}) \\ T_{2}^{j}(x_{i_{1}^{j}}, x_{i_{2}^{j}}, \dots, x_{i_{q_{j}}^{j}}) \\ \vdots \\ T_{q_{j}}^{j}(x_{i_{1}^{j}}, x_{i_{2}^{j}}, \dots, x_{i_{q_{j}}^{j}}) \end{pmatrix} \qquad (1.7)$$

for a $q_{j}$-tuple of transformations $(T_{1}^{j}, T_{2}^{j}, \dots, T_{q_{j}}^{j}) \in (F_{q_{j}})^{q_{j}}$ selected randomly according to the probability $P_{j}$ on $(F_{q_{j}})^{q_{j}}$. This gives a new population

$$P_{1} = \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{pmatrix} \qquad (1.8)$$

Notice that a single child does not have to be produced by exactly two parents; it is possible that a child has more than two parents. Asexual reproduction (mutation) is also allowed.

A general evolutionary search algorithm works as follows. Fix a cycle, say $C = \{S_{n}\}_{n=1}^{j}$, where the $S_{n}$ form a finite sequence of elementary steps. Start the algorithm with an initial population $P$, given as above, which may be selected randomly. To run the algorithm with cycle $C = \{S_{n}\}$, simply input $P$ into $S_{1}$, run $S_{1}$, input the output of $S_{1}$ into $S_{2}$, and so on, feeding the output of $S_{j-1}$ into $S_{j}$ to produce the new output, say $P_{1}$. Now take $P_{1}$ as the initial population and run the cycle $C$ again. Continue this loop finitely many times, depending on the circumstances. A recombination sub-algorithm is defined by a sequence of elementary steps consisting of reproduction only.
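In code, a cycle is just a composition of population-to-population maps; the following minimal driver loop is a sketch, with the elementary steps left as placeholders to be filled by the selection, crossover and mutation steps of this section.

```python
def run_cycle(population, cycle, generations):
    """Run the cycle C = (S_1, ..., S_j): feed the population into S_1,
    the output of S_1 into S_2, ..., the output of S_{j-1} into S_j,
    and repeat the whole cycle a finite number of times."""
    for _ in range(generations):
        for step in cycle:        # each elementary step: population -> population
            population = step(population)
    return population

# Placeholder elementary steps (identity maps) for illustration only.
identity_step = lambda P: P
final_pop = run_cycle([(0, 1), (1, 0)], (identity_step, identity_step), generations=3)
```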
Modified Evolutionary algorithm model:
The general structure of the EA proposed in [14] is adopted. The evolutionary algorithm uses the following operators:
a. Initialization
b. Recombination or Crossover
c. Mutation
d. Selection
The framework of the EA approach requires a floating architecture and a fixed population size. The population size, the maximum size and structure of the network, and the genetic parameters are user-specified. The weight population is initialized with a user-defined number of hidden nodes for each individual in order to create a new population, and the weights are generated randomly, matching the size of the population.
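A minimal initialization sketch under these conventions follows; the genotype used here (a list of per-layer node counts plus one weight matrix per consecutive layer pair) is an assumed representation for illustration, not necessarily the authors' exact data structure.

```python
import random

def init_population(pop_size, n_inputs, n_outputs, hidden_sizes):
    """Create pop_size individuals, each with the user-defined hidden-node
    counts and randomly generated weights for every consecutive layer pair."""
    population = []
    for _ in range(pop_size):
        layers = [n_inputs] + list(hidden_sizes) + [n_outputs]
        weights = [[[random.uniform(-1, 1) for _ in range(layers[i + 1])]
                    for _ in range(layers[i])]
                   for i in range(len(layers) - 1)]
        population.append({"layers": layers, "weights": weights})
    return population

# e.g. a parity-2 style population of 20 networks with two hidden layers:
pop = init_population(pop_size=20, n_inputs=2, n_outputs=1, hidden_sizes=[3, 2])
```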

ANN Recombination or Crossover:
In the proposed method, following the above discussion, consider a search space set $\Omega$ and a family of transformations $F_{q}$ from $\Omega^{q}$ into $\Omega$, and fix an ordered $q$-tuple of transforms $(T_{1}, T_{2}, \dots, T_{q}) \in (F_{q})^{q}$. Now consider the transformation $(T_{1}, T_{2}, \dots, T_{q}) : \Omega^{q} \to \Omega^{q}$ sending any given element

$$\begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{q} \end{pmatrix} \in \Omega^{q} \quad \text{into} \quad \begin{pmatrix} T_{1}(x_{1}, x_{2}, \dots, x_{q}) \\ T_{2}(x_{1}, x_{2}, \dots, x_{q}) \\ \vdots \\ T_{q}(x_{1}, x_{2}, \dots, x_{q}) \end{pmatrix} \qquad (1.9)$$

Let the subsequence $C = \{S_{n}\}_{n=1}^{j}$ (each elementary step $S_{n}$ being a recombination) be the recombination sub-algorithm of the proposed EA with reproduction $k$-tuple $(F_{q_{1}}, F_{q_{2}}, \dots, F_{q_{k}}, P_{1}, P_{2}, \dots, P_{k}, P_{m})$. This heuristic search algorithm yields a Markov process whose state space is the set of populations $P$ of fixed size $m$, i.e. $\Omega^{m}$. The transition probability $P_{xy}$ is simply the probability that the population $y \in \Omega^{m}$ is obtained from the population $x$ by going through the recombination cycle once. These transition probabilities have been computed, but the Markov chain obtained is difficult to analyze.

Fix an EA $\mathcal{A}$, and let $P_{x \to y}^{n} > 0$ denote the probability that a population $y$ is obtained from the population $x$ upon the completion of $n$ complete cycles of recombination. We write $X \to_{\mathcal{A}} Y$ for "$X$ leads to $Y$", and for a population $P \in \Omega^{m}$ we write $[P]_{\mathcal{A}}$ for the equivalence class of the population $P$ under the equivalence relation $\to_{\mathcal{A}}$.

Therefore the Markov chain initiated at some population $P \in \Omega^{m}$ is irreducible, and its unique stationary distribution is the uniform distribution on $[P]_{\mathcal{A}}$.

Now fix a partition $K = (P_{1}, P_{2}, \dots, P_{k}) \in \partial_{q_{n}}^{m}$, where $q_{n} = (q_{1}^{n}, q_{2}^{n}, \dots, q_{k}^{n})$, and fix a particular choice of tuples of transformations

$$(T_{1}^{i}, T_{2}^{i}, \dots, T_{q_{i}^{n}}^{i}) : \Omega^{q_{i}^{n}} \to \Omega^{q_{i}^{n}}, \quad T_{j}^{i} \in F_{q_{i}^{n}} \qquad (2.0)$$

such that $P_{i}^{n}(T_{1}^{i}, T_{2}^{i}, \dots, T_{q_{i}^{n}}^{i}) > 0$.

First notice that we can identify $\Omega^{m}$ with the set $\Omega^{q_{1}^{n}} \times \Omega^{q_{2}^{n}} \times \dots \times \Omega^{q_{k}^{n}}$ via the partition $K = (P_{1}, P_{2}, \dots, P_{k})$ as follows: given $x = (x_{1}, x_{2}, \dots, x_{m}) \in \Omega^{m}$, identify $x$ with the one-point-crossover element $\vec{u}^{x} = (u_{1}^{x}, u_{2}^{x}, \dots, u_{k}^{x})$, where $u_{i}^{x} = (x_{a_{1}}, x_{a_{2}}, \dots, x_{a_{n}})$ for $a_{1}, a_{2}, \dots, a_{q_{i}^{n}} \in P_{i}$ and $a_{1} < a_{2} < \dots < a_{n}$.

Now define a transformation $T_{K}^{T_{1}, T_{2}, \dots, T_{k}} : \Omega^{m} \to \Omega^{m}$. The output of the elementary recombination step $S_{n}$ is $Y = T_{K}^{T_{1}, T_{2}, \dots, T_{k}}(X)$, where $Y \in \Omega^{m}$ corresponds to $\vec{u}^{y} = (T_{1}(u_{1}^{x}), T_{2}(u_{2}^{x}), \dots, T_{k}(u_{k}^{x}))$. The transform $T_{K}^{T_{1}, T_{2}, \dots, T_{k}}$ is a bijection. Indeed, the two-sided inverse of $T_{K}^{T_{1}, T_{2}, \dots, T_{k}}$ is the transformation $(T_{K}^{T_{1}, T_{2}, \dots, T_{k}})^{-1}$, which sends a given $x \in \Omega^{m}$ into the $y \in \Omega^{m}$ corresponding to the element

$$\big(T_{1}^{-1}(u_{1}^{x}), T_{2}^{-1}(u_{2}^{x}), \dots, T_{k}^{-1}(u_{k}^{x})\big) \in \Omega^{q_{1}^{n}} \times \Omega^{q_{2}^{n}} \times \dots \times \Omega^{q_{k}^{n}} \qquad (2.1)$$
The set of all such transformations is denoted by

$$H_{K} = \left\{ T_{K}^{T_{1}, T_{2}, \dots, T_{k}} \;\middle|\; K \text{ is a partition in } \partial_{q_{n}}^{m} \text{ and } T_{1}, T_{2}, \dots, T_{k} \text{ are chosen for recombination} \right\} \qquad (2.2)$$

Now consider the set of transformations $H$ from $\Omega^{m}$ into itself defined as follows:

$$H = \{\, T : \Omega^{m} \to \Omega^{m} \mid T = F_{j_{1}} \circ F_{j_{2}} \circ \dots \circ F_{j_{n}},\ F_{j_{i}} \in H_{K} \,\} \qquad (2.3)$$

Any transformation $T \in H$ is therefore a composition of bijections, hence itself a bijection, so that $H \subseteq S_{\Omega^{m}}$, where $S_{\Omega^{m}}$ is the group of permutations of $\Omega^{m}$. Let $G$ denote the subgroup of $S_{\Omega^{m}}$ generated by $H$. Now, when an EA $\mathcal{A}$ runs a cycle on the input $X$, this amounts to selecting transformations from $H$ independently and applying them consecutively, so that the output of the cycle $C$ on the input $X$ is $T(X)$ for some $T \in H$ chosen with some positive probability.
We now proceed to define the random walk associated to a group action. Let $\Omega$ be a finite set and $G$ a finite group generated by $H$ ($\langle H \rangle = G$), and let $e$ denote the identity of the group $G$ ($e \in H$). Let $\mu$ be a probability distribution on $G$ which is concentrated on $H$ ($\mu(g) > 0 \iff g \in H$), and let $P_{xy}^{n}$ denote the probability that the state $y$ is reached from the state $x$ in exactly $n$ steps.

The random walk on the action of the group $G$ on the set $\Omega$ is the Markov process with transition probabilities

$$P_{xy} = \mu\big(\{\, g \mid g(x) = y \,\}\big) \qquad (2.4)$$

Since $H$ generates $G$, for $n$ large enough and for every $g \in G$ we can assure $P_{x \to g(x)}^{n} > 0$: write $g = m_{1}^{g} m_{2}^{g} \cdots m_{n_{g}}^{g}$ with $m_{i}^{g} \in H$, and let $n = \max\{\, n_{g} \mid g \in G \,\}$.
Therefore, by the definition of the group action, any $g \in G$ can be written as a word of length exactly $n$ by padding with the identity, $g = e \cdots e\, m_{1}^{g} m_{2}^{g} \cdots m_{n_{g}}^{g}$, so that the $n$-step transition probability of the chain in (2.4) satisfies

$$P_{x \to g(x)}^{n} \geq \mu(e)^{\,n - n_{g}} \prod_{i=1}^{n_{g}} \mu(m_{i}^{g}) > 0 \qquad (2.5)$$

Equation (2.5) shows that this is an irreducible Markov chain with a finite state space, so it has a unique stationary distribution, denoted by $\pi$ (also taken as the initial distribution on $x$). We then have

$$\pi(X) = \frac{1}{|[X]|} \qquad (2.6)$$

The distribution in the next generation, say $\pi'$, is then given as

$$\pi'(X) = \sum_{m \in H} \mu(m)\, \pi\big(m^{-1}(X)\big) \qquad (2.7)$$

$$= \sum_{m \in H} \mu(m)\, \frac{1}{|[X]|} \qquad (2.8)$$

$$= \frac{1}{|[X]|} \sum_{m \in H} \mu(m) = \frac{1}{|[X]|} = \pi(X) \qquad (2.9)$$
since $\sum_{m \in H} \mu(m) = 1$ and $\mu$ is concentrated on $H$. The Markov chain modeling an EA $\mathcal{A}$ is thus a random walk associated to the action of the finite group $G$ on $X$: it generates new populations within the equivalence class of $X$, sampled in the long run according to the uniform distribution $\pi$.
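The uniformity claim can be made tangible with a toy random walk (an illustrative assumption, not from the paper): with H a set of permutations containing the identity, the walk visits every state in the orbit of the start state with equal long-run frequency.

```python
import random
from collections import Counter

random.seed(2)

def swap(i, j):
    """A generator of G modeled here as a transposition of positions."""
    def g(x):
        y = list(x)
        y[i], y[j] = y[j], y[i]
        return tuple(y)
    return g

# H contains the identity (e in H) plus two transpositions, which together
# generate the full symmetric group acting on three positions.
H = [lambda x: x, swap(0, 1), swap(1, 2)]

x = ("a", "b", "c")
visits = Counter()
steps = 120000
for _ in range(steps):
    x = random.choice(H)(x)     # one step of the random walk, mu uniform on H
    visits[x] += 1

# All six orderings in the orbit appear with frequency ~ 1/6.
for state in sorted(visits):
    print(state, round(visits[state] / steps, 3))
```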
In the proposed evolutionary algorithm described above, single-point crossover is used to improve the relation between parents and offspring. Different cutting points are used for each of the two parents in the population; the cutting points are extracted independently for each parent because the genotype lengths of individuals are variable. The cutting point is taken only between one layer and the next (for two hidden layers, between the second layers of the two network parents); this means that a new evolutionary weight matrix is created to make the connection between the two layers at the cutting points in the parents, producing two offspring, so that the population size is kept constant. In each offspring, node or layer creation and deletion is possible based on the predefined genetic parameters; a sketch of this crossover is given below.
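A sketch of this layer-boundary crossover, reusing the genotype representation assumed in the initialization sketch above; the bridging weight matrix created at the cut is generated afresh, as described.

```python
import random

def new_matrix(rows, cols):
    """Freshly generated bridging weights for the connection at the cut."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def layer_crossover(p1, p2):
    """Single-point crossover at layer boundaries. The cutting points are
    extracted independently for each parent (genotype lengths vary), the
    tails are swapped, and a new weight matrix connects the two halves,
    so two parents always yield two offspring."""
    c1 = random.randrange(1, len(p1["layers"]) - 1)
    c2 = random.randrange(1, len(p2["layers"]) - 1)

    def child(head, hc, tail, tc):
        layers = head["layers"][:hc] + tail["layers"][tc:]
        weights = (head["weights"][:hc - 1]
                   + [new_matrix(layers[hc - 1], layers[hc])]   # bridge at the cut
                   + tail["weights"][tc:])
        return {"layers": layers, "weights": weights}

    return child(p1, c1, p2, c2), child(p2, c2, p1, c1)
```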

3.3 Topology Mutation:
The mutation transformations consist of the transformations

$$M_{a} : \Omega \to \Omega \qquad (3.0)$$

where $a \in \bigcup_{i \in S} A_{i}$ for $S \subseteq \{1, 2, \dots, n\}$. Write $a = (a_{i_{1}}, a_{i_{2}}, \dots, a_{i_{k}})$ for $i_{1} < i_{2} < \dots < i_{k} \in S$. For $x = (x_{1}, x_{2}, \dots, x_{n}) \in \Omega$ we have

$$M_{a}(x) = y = (y_{1}, y_{2}, \dots, y_{n}) \qquad (3.1)$$

where

$$y_{q} = \begin{cases} a_{q} & \text{if } q = i_{j} \text{ for some } j \\ x_{q} & \text{otherwise} \end{cases}$$
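In code, $M_a$ is a plain coordinate substitution; a minimal sketch:

```python
def mutate(x, alleles):
    """M_a of equation (3.1): substitute the alleles a_q at the positions
    q in S, leaving every other coordinate of x unchanged."""
    return tuple(alleles.get(q, x_q) for q, x_q in enumerate(x))

print(mutate((0, 0, 0, 0), {1: 1, 3: 1}))   # -> (0, 1, 0, 1)
```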
One way to study the global behavior of evolutionary algorithms is to consider a group or family of subsets of the search space and to predict which of these subsets (say $Q$) satisfy the property that the expected number of occurrences of elements of $Q$ increases from one generation to the next. Each such subset is called a schema. If the chromosome length is fixed to $n$, the search space is $S = \prod_{i=1}^{n} A_{i}$, where $A_{i}$ is the set of all possible alleles which may occur at the $i$th position in the chromosome. The next section describes the selection of offspring based on the fitness function.
3.4 Selection: A tournament is performed by choosing a group of offspring at random and reproducing the best individual from this group. Pick a group of $P$ challengers, where $P$ is 10% of the population size, arrange a tournament with respect to fitness between the $P$ challengers and the $r$th solution, and record the score of the $r$th solution. The scores are determined by the minimum-distance method using the fitness function [18]. This is called $P$-tournament selection. Arrange the scores of all the solutions in ascending order and pick the best half of the score positions; the best half of the scores is carried into the next generation. Repeat the process $r$ times, where $r$ is twice the population size, obtaining the scores of $r$ $P$-tournaments. The selection probability of the $r$th-ranked solution out of $N$ under $P$-tournament selection is given by
$$p_{r} = \frac{(N - r + 1)^{P} - (N - r)^{P}}{N^{P}} \qquad (3.2)$$
More selection pressures and their comparison are given in [15], [16].
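A sketch of the P-tournament step as described; since the minimum-distance scoring of [18] is not spelled out here, raw fitness is assumed as the score for illustration.

```python
import random

def p_tournament_selection(population, fitness):
    """Run r = 2*len(population) tournaments, each between a randomly chosen
    solution and a group of P challengers (P = 10% of the population size),
    then keep the best-scoring half of the tournament winners."""
    N = len(population)
    P = max(1, N // 10)
    winners = []
    for _ in range(2 * N):
        contender = random.choice(population)
        challengers = random.sample(population, P)
        winners.append(max(challengers + [contender], key=fitness))
    winners.sort(key=fitness, reverse=True)
    return winners[:N]      # best half carried into the next generation

# Example: selecting integers by value.
next_gen = p_tournament_selection(list(range(20)), fitness=lambda x: x)
```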

6. EXPERIMENTAL SETUP
The idea proposed in this work emphasizes evolving ANNs: a new evolutionary system for evolving feed-forward ANNs from the
architecture space. In this context, the evolutionary process attempts to crossover and mutate weights before performing any structural or topology crossover and mutation; that is, weight mutation is carried out before structural or topology mutation. The population size in the EA is taken as 20, and 10 independent trials were run to obtain the generalized behavior. The termination criterion is a fixed number of iterations, equal to 100 for the EA. Table 5.1 gives all the parameters of the algorithm; the default values are used for the considered problems. All the experiments are run by specifying these parameters and tuning the genetic parameters to obtain the best solution.

Table 5.1 Default parameters.

Symbol   Parameter                                           Default value
N        Population size                                     20
Seed     Previously saved population                         none
         Probability of inserting a hidden layer             0.1
         Probability of deleting a hidden layer              0.05
         Probability of inserting a neuron in a hidden layer 0.05
         Probability of deleting a neuron in a hidden layer  0.05
         Probability of crossover                            0.1
         Number of network inputs                            Problem specific
         Number of network outputs                           Problem specific
K        MSE in the range                                    10

In this work, five benchmark problems are used to check the ANN optimization:
a) N-bit (2- and 4-bit) even-parity classification
b) Pima Indians diabetes classification
c) SPECT heart disease classification
d) Breast cancer classification

Performance of N-Bit Parity (XOR) classification Problem:
For the simultaneous evolution of architecture and connection weights, only 2-bit and 4-bit parity encoders with different network sizes are considered in this section.


FIGURE 5.8 Performance of Evolutionary ANN for
2-bit parity with initial sizes of [2 3 2 1 2].


FIGURE 5.8 Performance of Evolutionary ANN for
2-bit parity with initial sizes of [2 2 2 1 2].

For parity 2/4, all networks in the space have a maximum of 10 nodes, comprising the 2/4 inputs, the hidden nodes in layer one, the hidden nodes in layer two, 1 output node, and two hidden layers, i.e. the size is [2/4 2/3 2 1 2]. This allows hidden-layer configurations of up to 5 nodes to be evolved. The average and best generation over all runs that found a solution for parity-2 using the accuracy fitness function, and the smallest architecture size found, are recorded. The mean square errors (MSE) for 10 trial runs are given in Table 5.2; the performance of 5 runs is shown in Fig. 5.8, where runs 3, 4 and 5 completed in 50 generations and runs 1 and 2 completed in 20 generations. The average number of hidden nodes over 10 successful trial runs is 2.1, and the average number of connections is 7.9.
For ten runs of the N-bit parity problems, the best individuals were found with the four genetic-operator probabilities set to 0.05, 0.05, 0.01 and 0.01.



FIGURE 5.10 Performance of Evolutionary ANN for
4-bit parity with initial size of [4 5 4 1 2].



FIGURE 5.10 Performance of Evolutionary ANN for
4-bit parity with initial size of [4 4 5 1 2].

Table 5.2 Performance of ANN shown by EA for different trials.

Trial No.   MSE ([2 3 2 1 2])   MSE ([4 5 4 1 2])
1           9.0084e-003         3.2548e-006
2           2.1219e-026         1.3548e-002
3           2.0416e-014         6.3254e-011
4           1.3406e-003         5.4856e-019
5           2.1219e-026         9.2154e-026
6           9.0084e-003         9.3554e-004
7           2.1219e-026         2.8754e-014
8           2.0416e-014         9.2365e-013
9           1.3406e-003         3.4587e-001
10          3.2323e-022         8.2657e-016


Performance of Real-Time dataset classification Problems:
For the real-time datasets, all the data applied to the training and test sets are acquired from the UCI Machine Learning Repository [17]. Each input variable should be preprocessed so that its mean value, averaged over the entire training set, is close to zero, or else is small compared to its standard deviation. Two datasets are considered:
i) Pima Indians diabetes dataset
ii) SPECT heart disease dataset

The Pima Indians diabetes dataset is composed of 8 attributes plus a binary class value indicating signs of diabetes, which corresponds to the target classification value, and includes 768 instances, as shown in Table 4.8. The dataset is divided into two sets, using 500 instances for training and 268 for testing. For the Single Photon Emission Computed Tomography (SPECT) heart dataset, only 13 attributes are used as input parameters to classify the problem, with a total of 267 instances; the target value is stored as the 14th parameter in the dataset. These datasets are normalized before being applied to the network. The dataset is divided into two sets, using 200 instances for training and 67 for testing.
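A minimal sketch of this preprocessing, standardizing each input variable with training-set statistics and applying the same statistics to the test set:

```python
def standardize(train, test):
    """Shift each input variable by its training-set mean and divide by its
    training-set standard deviation; the same statistics are applied to the
    test set so that it stays an independent measure of performance."""
    n = len(train[0])
    means = [sum(r[i] for r in train) / len(train) for i in range(n)]
    stds = [((sum((r[i] - means[i]) ** 2 for r in train) / len(train)) ** 0.5) or 1.0
            for i in range(n)]        # constant variables get a unit scale
    apply = lambda rows: [[(r[i] - means[i]) / stds[i] for i in range(n)] for r in rows]
    return apply(train), apply(test)

train_std, test_std = standardize([[1.0, 10.0], [3.0, 10.0]], [[2.0, 10.0]])
```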
The evolutionary process is initialized with all the networks in the architecture space having a predefined architecture size, for example [x y z 1 n], i.e. x inputs, y hidden nodes in the 1st hidden layer, z hidden nodes in the 2nd hidden layer, one output layer with one node, and n the number of layers. After the evolutionary ANN process, the optimized network consists of only 2 hidden nodes in a single hidden layer with a unimodal sigmoid activation function, and the results of the real-data classification problems are shown in the figures and tables below.


FIGURE 5.12 Performance of Evolutionary ANN for
Pima India diabetes with initial size of [9 4 5 1 2]

Table 5.4 Results of Pima India Diabetes.

Parameter                                             Experimental Results
Number of Runs                                        10           10
Number of Generations                                 40           61
Number of Training patterns used                      500          500
Average Training Set Accuracy                         76.0         76.5
Number of Test patterns used                          268          268
Average Test Set Accuracy                             81.5         83.5
Initial Number of Hidden layers / Nodes               2 / [4 5]    2 / [5 4]
Final Number of Hidden layers / Nodes (Resulted NN)   1 / [2]      1 / [3]
Population size                                       50           50
Number of inputs                                      09           09
Number of outputs                                     01           01


FIGURE 5.13 Performance of Evolutionary ANN for
SPECT Heart dataset with initial size of [14 4 5 1 2].


FIGURE 5.14 Performance of Evolutionary ANN for
Breast Cancer dataset with initial size of [11 4 5 1 2].


Table 5.6 Results of SPECT Heart dataset.

Parameter                                             Experimental Results
Number of Runs                                        10           10
Number of Generations                                 90           103
Number of Training patterns used                      200          200
Average Training Set Accuracy                         86.0         87.2
Number of Test patterns used                          67           67
Average Test Set Accuracy                             85.2         86.5
Initial Number of Hidden layers / Nodes               2 / [4 5]    2 / [5 4]
Final Number of Hidden layers / Nodes (Resulted NN)   1 / [3]      1 / [3]
Population size                                       50           50
Number of inputs                                      14           14
Number of outputs                                     01           01

For the Pima Indians classification, the average mean square error is 8.6214e-3. During training the network is adjusted according to its error, whereas the test process provides an independent measure of network performance during and after training. The best solution was found in less than 50 generations with the four genetic-operator probabilities set to 0.1, 0.05, 0.1 and 0.1. Results for another network size, [9 5 4 1 2], are also shown in Table 5.4, with a minimum of 3 hidden nodes in a single hidden layer. The results of the heart dataset are shown in Table 5.6 and a comparison with the literature is shown in Table 5.7. Ten runs were executed and the
average percentage error values of the training and test process are summarized in Table 5.6; 5 trial runs are shown in Fig. 5.13, with an average mean square error of 7.7264e-3. The best solutions were reached in less than 90 generations with all four genetic-operator probabilities set to 0.1. Results for another network size, [14 5 4 1 2], are also shown in the table, with a minimum of 3 hidden nodes in a single hidden layer. The results of the Breast Cancer dataset are shown in Table 5.8 and a comparison with the literature is shown in Table 5.9. Ten runs were executed and the average percentage error values of the training and test process are summarized in Table 5.8; 5 trial runs are shown in Fig. 5.14, with an average mean square error of 5.3614e-3. The best solution was reached in less than 45 generations with all four genetic-operator probabilities set to 0.05 in all runs. Results for another network size, [11 5 4 1 2], are also shown in the table, with a minimum of 3 hidden nodes in a single hidden layer.

Table 5.8 Results of Breast Cancer dataset.

Parameter                                             Experimental Results
Number of Runs                                        10           10
Number of Generations                                 45           52
Number of Training patterns used                      400          400
Average Training Set Accuracy                         97.0         97.0
Number of Test patterns used                          240          240
Average Test Set Accuracy                             98.5         98.5
Initial Number of Hidden layers / Nodes               2 / [4 5]    2 / [5 4]
Final Number of Hidden layers / Nodes (Resulted NN)   1 / [2]      1 / [2]
Population size                                       50           50
Number of inputs                                      11           11
Number of outputs                                     01           01

CONCLUSION:
The optimal architecture and weights of an ANN are determined during the learning phase using the concept of an evolutionary genetic algorithm. The proposed method of joint architecture and weight adjustment outperforms fixed-network back-propagation at every level for the 2-bit and 4-bit parity problems, and on the real dataset classification problems it reaches an excellent percentage of accuracy with an optimized network having fewer hidden nodes and layers.

REFERENCES:
[1] Michalewicz, Z. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 1996.

[2] Mühlenbein, H. and Mahnig, T. Evolutionary computation and beyond. In Y. Uesaka, P. Kanerva, and H. Asoh, editors, Foundations of Real-World Intelligence, CSLI Publications, pp. 123-188, 2001.

[3] Mitavskiy, B. Crossover invariant subsets of the search space for evolutionary algorithms. Evolutionary Computation. http://www.math.lsa.umich.edu/vbmitavsk/

[4] Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. of Michigan Press, 1975.

[5] Coffey, S. An Applied Probabilist's Guide to Genetic Algorithms. A thesis submitted to The University of Dublin for the degree of Master in Science, 1999.

[6] Mac Lane, S. Categories for the Working Mathematician. Graduate Texts in Mathematics 5, Springer-Verlag, 1971.

[7] Poli, R., Stephens, C., Wright, A., Rowe, J. A schema-theory-based extension of Geiringer's theorem for linear GP and variable-length GAs under homologous crossover, 2002.

[8] Vose, M. Generalizing the notion of a schema in genetic algorithms. Artificial Intelligence, 50(3): 385-396, 1991.

[9] Radcliffe, N. The algebra of genetic algorithms. Annals of Mathematics and Artificial Intelligence, 10: 339-384, 1994. http://users.breathemail.net/njr/papers/amai94.pdf

[10] Geiringer, H. On the probability of linkage in Mendelian heredity. Annals of Mathematical Statistics, 15: 25-57, 1944.

[11] Vose, M. and Wright, A. The simple genetic algorithm and the Walsh transform: Part II, the inverse. Evolutionary Computation, 6(3): 275-289, 1998.

[12] Stephens, C. and Waelbroeck, H. Schemata evolution and building blocks. Evolutionary Computation, 7(2): 109-124, 1999.

[13] Stephens, C. The renormalization group and the dynamics of genetic systems. To be published in Acta Physica Slovaca, 2002. http://arXiv.org/abs/cond-mat/0210271

[14] Wright, A., Rowe, J., Poli, R., and Stephens, C. A fixed point analysis of a gene pool GA with mutation. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Morgan Kaufmann, 2002. http://www.cs.umt.edu/u/wright/

[15] He, J. and Yao, X. Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 127: 57-85, 2001.

[16] Chen, T., He, J., Sun, G., Chen, G., Yao, X. A new approach to analyzing average time complexity of population-based evolutionary algorithms on unimodal problems. IEEE Trans. Syst., Man, and Cybern., Part B, 39(5): 1092-1106, 2009.

[17] Newman, D.J., Hettich, S., Blake, C.L., and Merz, C.J. UCI repository of machine learning databases, 1998.

[18] Hutter, M. and Legg, S. Fitness uniform optimization. IEEE Trans. Evol. Comput., 10(5): 568-589, 2006.

[19] Liepins, G. and Vose, M. Characterizing crossover in genetic algorithms. Annals of Mathematics and Artificial Intelligence, 5: 27-34, 1992.

