
Genetic Algorithms in Structure Identification for NARX Models

C. K. S. Ho¹, I. G. French², C. S. Cox¹ and I. Fletcher¹


¹ School of Engineering and Advanced Technology, University of Sunderland, UK.
² EPICC, University of Teesside, UK.

Abstract

Genetic algorithms have recently been applied to model both linear and non-linear systems. Different methods of coding the problem solutions have been proposed, each claimed to perform well. This paper presents a comparative study of three of these methods and highlights their strengths and weaknesses.

1 Introduction

The problem of identifying a dynamic process from experimentally derived data has received much attention in the technical literature. An inspection of this literature reveals that a primary requirement when undertaking such an identification exercise is the determination of the most appropriate model structure. For linear models the modern approach to structure selection is exhaustive search, based on a statistical measure of model quality. In the case of a NARX model (nonlinear auto-regressive with exogenous inputs), however, exhaustive search is often infeasible: whereas the search space for a single-input single-output linear model typically contains about 90 structures, for the NARX case it may contain several hundred million.

Faced with this problem, several authors [2, 3, 4] have proposed search schemes based on genetic algorithms. However, despite their common heritage, the schemes proposed differ significantly from author to author. This paper therefore presents a comparison of three such methods.

2 Coding Methods

A genetic algorithm works with a population of strings or chromosomes. In general these chromosomes are constructed from a coding (typically binary, but in essence any base set of integers or characters) of the parameters which the genetic algorithm should identify.

It follows, therefore, that a key feature of any successful genetic algorithm is the selection of an appropriate mechanism for coding the problem parameters (in this case the structure of the NARX model) into a usable chromosome. The following sections outline the three coding approaches which form the basis of the comparison study.

2.1 Coding Method 1: Binary Coding

The binary coding method [4] is based upon the definition of a data vector with a fixed structure. For example, for a single-input single-output (SISO) system of second order, with time delay 2 and expanded using second order polynomials, the data vector would contain a total of 15 terms, as shown below:

$$
\begin{aligned}
x^{T} = \big[\,& y(t-1),\; y(t-2),\; u(t-2),\; u(t-3),\; y(t-1)^2,\; y(t-1)y(t-2),\; y(t-1)u(t-2),\; y(t-1)u(t-3), \\
& y(t-2)^2,\; y(t-2)u(t-2),\; y(t-2)u(t-3),\; u(t-2)^2,\; u(t-2)u(t-3),\; u(t-3)^2,\; \text{d.c.}\,\big] \qquad (1)
\end{aligned}
$$

The chromosome is thus defined as a binary string containing 15 bits. In this string a '1' represents the inclusion of a term within the model and a '0' represents its absence. Hence the model of Equation (2), which contains the terms y(t-1), u(t-2), $u(t-2)^2$ and d.c., may be expressed by the string

$$N_{B\mathrm{exp}} = [101000000001001] \qquad (3)$$

The objective of the genetic search, therefore, is to identify those terms within the data vector that are needed to capture the salient dynamic features of the process; the relative contribution of each term is then determined using a least squares based algorithm.
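The fixed data vector and the one-bit-per-term decoding can be illustrated with a short Python sketch. It is not the implementation of [4]: the helper names `build_terms` and `decode_binary` are hypothetical, products are written with `*`, and the input lags are assumed to start at the stated time delay.

```python
from itertools import combinations_with_replacement


def build_terms(ny, nu, delay, degree=2):
    """Hypothetical helper: list the candidate terms of the data vector in the
    order used in Equation (1): linear lags, then polynomial products, then d.c."""
    linear = [f"y(t-{k})" for k in range(1, ny + 1)]
    # Input lags start at the time delay, e.g. delay 2 gives u(t-2), u(t-3), ...
    linear += [f"u(t-{k})" for k in range(delay, delay + nu)]
    terms = list(linear)
    for d in range(2, degree + 1):
        # Each unordered product of d linear terms appears exactly once.
        for combo in combinations_with_replacement(linear, d):
            terms.append("*".join(combo))
    terms.append("d.c.")
    return terms


def decode_binary(chromosome, terms):
    """Return the terms whose bit is '1'; bit i switches term i on or off."""
    if len(chromosome) != len(terms):
        raise ValueError("chromosome length must equal the number of terms")
    return [term for bit, term in zip(chromosome, terms) if bit == "1"]


terms = build_terms(ny=2, nu=2, delay=2)
print(len(terms))  # 15
print(decode_binary("101000000001001", terms))
# ['y(t-1)', 'u(t-2)', 'u(t-2)*u(t-2)', 'd.c.']
```

Under the same assumptions, a third order system with a delay of one sample gives `len(build_terms(3, 3, 1)) == 28` candidate terms, and therefore 2**28 = 268,435,456 possible term subsets, which is in line with the "several hundred million" structures mentioned in the Introduction.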

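Once a chromosome has fixed the terms, their coefficients follow from a least squares fit. The text only specifies "a least squares based algorithm", so the sketch below assumes ordinary batch least squares solved through the normal equations; the function name `least_squares` and the coefficients used to generate the noise-free test data are hypothetical.

```python
import random


def least_squares(X, y):
    """Solve min ||X theta - y||^2 through the normal equations
    (X^T X) theta = X^T y, using Gaussian elimination with partial pivoting."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * yk for row, yk in zip(X, y))]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    theta = [0.0] * n
    for i in reversed(range(n)):
        residual = A[i][n] - sum(A[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = residual / A[i][i]
    return theta


# Hypothetical noise-free data for the terms y(t-1), u(t-2), u(t-2)^2 and d.c.
random.seed(0)
levels = [-1.0, -0.5, 0.0, 0.5, 1.0]  # multi-level random input
u = [random.choice(levels) for _ in range(200)]
true_theta = [0.5, 1.0, 0.3, 0.2]
y = [0.0, 0.0]
for t in range(2, len(u)):
    y.append(true_theta[0] * y[t - 1] + true_theta[1] * u[t - 2]
             + true_theta[2] * u[t - 2] ** 2 + true_theta[3])

# One regressor row per time step, columns in chromosome order.
X = [[y[t - 1], u[t - 2], u[t - 2] ** 2, 1.0] for t in range(2, len(u))]
Y = [y[t] for t in range(2, len(u))]
print([round(v, 6) for v in least_squares(X, Y)])  # [0.5, 1.0, 0.3, 0.2]
```

Because the data are noise-free, the estimates reproduce the generating coefficients; with measured data the same routine returns the least squares estimates for whichever terms the chromosome selects.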
G. D. Smith et al., Artificial Neural Nets and Genetic Algorithms


Springer-Verlag Wien 1998

2.2 Coding Method 2: Integer Coding

The integer coding method [2] is based upon the assignment of a numerical value, an integer, to each of the terms in the data vector, as indicated in Table 1. The chromosome is then constructed as an integer string in which each element identifies one term within the model.

Table 1: Assignment of integers to data vector terms.

  Index   Term
  1       y(t-1)
  2       y(t-2)
  ...     ...
  14      u(t-3)^2
  15      d.c.

Thus, the model described by Equation (2) may be expressed by the string:

$$N_{\mathrm{exp}} = [\,1\;\;3\;\;12\;\;15\,] \qquad (4)$$

It should be noted, however, that in this implementation the string length, and hence the number of terms in the model, is fixed. Once again, the approach is to search only for the terms which contribute to the model, with the relative contribution of each term being determined via a least squares based algorithm.

2.3 Coding Method 3: Tree Structured Symbolic Coding

In a tree structured algorithm [3, 5, 6] the chromosomes, which in general are of variable length, represent expressions constructed from a base set of operators and variables. A typical base set may be:

Variables = [y(t-1) y(t-2) y(t-3) u(t-1) u(t-2) u(t-3)]
Operators = [( ) + *]

Using this base set, the terms contained in the model expressed in Equation (2) may be represented by the tree in Figure 1 or, alternatively, as the chromosome

((d.c. + (u(t-2) + y(t-1))) + (u(t-2) * u(t-2)))     (5)

Again, as in the previous codings, the genetic algorithm is used solely to establish the terms present within the model description.

Figure 1: Tree structure representation of the terms contained in Equation (2).

3 The Fitness Function

In conventional system identification approaches, it is usual to choose the model structure based on a compromise between model accuracy and model complexity. To help with this choice it is convenient to make use of one of the standard measures. In this study Young's Information Criterion (YIC) [7] is used, defined as:

$$\mathrm{YIC} = \ln\!\left(\frac{\sigma_e^2}{\sigma_y^2}\right) + \ln\!\left(\frac{\sigma_e^2}{n}\sum_{i=1}^{n}\frac{P(i,i)}{\theta(i)^2}\right) \qquad (6)$$

where $\sigma_e^2$ denotes the error variance, $\sigma_y^2$ the variance of the actual process output, $n$ the number of parameters in the model, $P$ the final covariance matrix and $\theta$ the vector of parameter estimates. The YIC statistic is essentially a trade-off between model fit and the variance of the parameter estimates, and the best model provides the smallest value of YIC. For the genetic algorithm we define

$$\text{Score} = 50 - \mathrm{YIC} \qquad (7)$$

so that the best model maximises Score.

4 Termination of the Algorithm

Since the principal aim of the genetic search is to identify those terms within the data vector which are needed to capture the process behaviour, it would seem sensible to terminate the search once these 'significant' terms are established.

Assuming a normal distribution, we can be 90% confident that a parameter is making a significant contribution to a model if its magnitude satisfies the relationship

$$|\theta(i)| > 1.95\sqrt{\operatorname{var}\,\theta(i)} \qquad (8)$$
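The significance test of Equation (8) reduces to a per-parameter comparison. The sketch below additionally assumes the usual least squares result that the variance of each estimate is approximately $\sigma_e^2 P(i,i)$, with $P = (X^T X)^{-1}$; that link, the function names and the numbers are assumptions for illustration only.

```python
import math


def parameter_variances(sigma_e2, p_diag):
    """Assumed least squares result: var(theta(i)) ~ sigma_e^2 * P(i, i),
    where P = (X^T X)^-1."""
    return [sigma_e2 * p for p in p_diag]


def significance_flags(theta, variances, z=1.95):
    """Equation (8): theta(i) counts as significant if |theta(i)| > z * sqrt(var)."""
    return [abs(th) > z * math.sqrt(v) for th, v in zip(theta, variances)]


theta = [0.52, 0.98, 0.04, 0.79]  # hypothetical estimates
variances = parameter_variances(0.01, [0.04, 0.09, 0.25, 0.16])
print(significance_flags(theta, variances))  # [True, True, False, True]
```

Here the third estimate (0.04) lies inside its bound of 1.95 × 0.05 = 0.0975, so that term would be judged insignificant.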

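Returning to the fitness function of Section 3, Equations (6) and (7) can be evaluated directly once $\sigma_e^2$, $\sigma_y^2$, the diagonal of $P$ and $\theta$ are available. The sketch below is a literal transcription of those two equations with hypothetical inputs; it compares a model with a variant that adds one small, poorly determined parameter, to show the trade-off between fit and parameter variance.

```python
import math


def yic(sigma_e2, sigma_y2, theta, p_diag):
    """Young's Information Criterion as written in Equation (6)."""
    n = len(theta)
    fit = math.log(sigma_e2 / sigma_y2)  # model fit term
    # Parameter term: (sigma_e^2 / n) * sum_i P(i, i) / theta(i)^2.
    spread = (sigma_e2 / n) * sum(p / th ** 2 for p, th in zip(p_diag, theta))
    return fit + math.log(spread)


def score(yic_value):
    """Equation (7): the genetic algorithm maximises Score = 50 - YIC."""
    return 50.0 - yic_value


# Hypothetical candidates: model B adds a small, poorly determined parameter.
yic_a = yic(0.0100, 1.0, [0.5, 1.0], [0.04, 0.01])
yic_b = yic(0.0099, 1.0, [0.5, 1.0, 0.01], [0.04, 0.01, 0.04])
print(score(yic_a) > score(yic_b))  # True
```

Although model B fits slightly better, its extra parameter inflates the second logarithm in Equation (6), so it receives the lower Score.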
However, establishing the significance of a parameter, or indeed of all parameters, within a model is not in itself sufficient for termination of the search algorithm. This is because the significance test establishes only whether a term is significant within the model in which it is tested, not whether it is a parameter of some form of optimum model.

One solution to this dilemma is first to establish the quality of a candidate model in terms of its normalised residual error variance. The search can then be terminated if the normalised error variance is below some predefined threshold and all terms contained within the model are significant. This threshold may be determined simply, based on a measure of the process noise variance:

$$\text{Threshold} = \frac{2\,\sigma_e^2\big|_{\text{steady state}}}{\sigma_y^2} \qquad (9)$$

5 Example: The Basic Algorithms

To contrast the performance of the methods we consider a SISO process described by the expression

$$y(t) = 0.5\,y(t-1) + u(t-1) + 0.8\,u(t-3) + u(t-1)^2 + \xi(t) \qquad (10)$$

The input to the process, u(t), is chosen as a multi-level random sequence and the noise term, $\xi(t)$, is chosen to be uniformly distributed with an output signal-to-noise ratio of 20:1. Five hundred data pairs were used during identification.

The results of the three search algorithms are presented in Table 2. In each case the data vector is defined assuming a third order system expanded using second order polynomials, i.e. 28 terms. The results presented are averages over 20 runs of the genetic algorithm. In the integer coding approach the string length is set to four, and in the symbolic coding the tree is assumed to have a maximum of ten branch nodes.

Inspection of the table indicates that there is little to choose between the binary and symbolic coding approaches; the integer coding approach, on the other hand, would appear to offer a significant improvement. This result, however, is misleading, since the integer coding method needs to be given the precise number of terms within the model (here four, matching Equation (10)) before the search begins, which is not always possible.

Table 2: A comparison of the number of generations required for successful identification of the test example.

6 Improved Search Procedure

In Section 4 the parameter significance test was used in the formulation of a termination criterion. However, since the primary purpose of this test is to establish the significance of terms within a given model, it may prove advantageous to use the information gathered during such a test to guide the genetic search. To this end it is proposed that a local hill climber, based on the information provided by significance testing, be added to the genetic search. This algorithm is outlined below:

1. Evaluate θ and the confidence bounds for a candidate solution.
2. Are all parameters significant (Yes/No)?
3. Yes: return the fitness and modify the solution; end.
4. No: remove the insignificant terms and go to step 1.

However, since this hill climber is very effective at rejecting non-contributing terms, it may lead to rapid convergence of the population to non-optimal "super-fit" individuals. To guard against this possibility, the bias of the population is monitored as an indication of the development of the entire population. Bias in this context is defined as the average convergence of each gene [1]. Thus, for a binary coding system, the bias will approach 0.5 for a uniformly distributed population, and premature convergence can be identified by a very low or very high value of the bias. Should this occur, the mutation rate is increased for the affected bit location.

Table 3 illustrates the effect that the inclusion of these modifications has on the genetic search for the example shown in Section 5.

As can be seen, the inclusion of the modifications has significantly reduced the number of generations required by the genetic algorithm. Basing an assessment of performance on this number alone, however, would be somewhat misleading. The reason for this is that the inclusion of the hill climber, with its
obvious iterative nature, significantly increases the computational burden incurred at each generation. A far better assessment of performance, therefore, is to compare the number of model evaluations performed during each search. Again, inspection of the table indicates that, except for the integer coding, substantial improvements have also been made in this respect. Moreover, in the case of the integer coding the results presented are for a search in which the string length is set to seven. Thus the inclusion of the suggested modifications has, in this case, allowed the previous restriction, that the exact string length be known before the search begins, to be relaxed.

Table 3: A comparison of the number of generations required for successful identification of the test example.

7 Conclusion

The paper has presented a comparison of three coding schemes, proposed by various researchers, for the identification of the optimum structure of a NARX model using genetic algorithms.

To form a common basis for this comparison, the paper has postulated the use of a fitness function and a termination criterion based on traditional system identification quality measures. In this way, the algorithms will naturally seek solutions which trade off model accuracy against model quality and, consequently, will tend to yield unbiased models.

The paper has also proposed the use of a local hill climbing scheme, based on significance testing, to provide guidance to the genetic search. Results to date indicate that the inclusion of such a scheme considerably improves the efficiency of the search algorithm.

Finally, for the results presented it would appear that there is little to choose between the binary and integer coding approaches (since the problem of the fixed string length has been overcome using the hill climber). Further, it would appear that both of these methods offer a considerable efficiency gain when compared with the tree structured approach.

References

[1] J. E. Baker. Adaptive Selection Methods for Genetic Algorithms. In J. J. Grefenstette (editor), Proceedings ICGA '85, pages 101-111, Lawrence Erlbaum Associates, 1985.
[2] C. M. Fonseca, E. M. Mendes, P. J. Fleming, and S. A. Billings. Non-linear model term selection with genetic algorithms. Technical report, University of Sheffield, 1993.
[3] C. K. S. Ho. Tree structured GA in system identification. Technical report, University of Sunderland, 1995.
[4] C. J. Li and Y. C. Jeon. Genetic algorithms in identifying nonlinear auto regressive with exogenous inputs models for nonlinear systems. In Proc. Am. Control Conf., pages 2305-2309. IEEE Press, 1993.
[5] B. McKay, M. J. Willis, and G. W. Barton. Using a tree structured genetic algorithm to perform symbolic regression. In GALESIA '95, pages 487-492. IEE, 1995.
[6] M. C. South. The Application of Genetic Algorithms to Rule Finding in Data Analysis. PhD thesis, University of Newcastle upon Tyne, UK, 1994.
[7] P. Young. Recursive Estimation and Time-series Analysis. Springer-Verlag, New York, 1984.
