HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 89
Abstract— Grammar Induction (also called Grammar Inference or Language Learning) is the process of learning a grammar
from training data consisting of positive and negative strings of the language. Genetic algorithms are among the techniques that
give successful results for grammar induction. This paper discusses an extended approach that uses a stochastic mutation
scheme based on an Adaptive Genetic Algorithm to induce the grammar for a set of four different languages, and compares it
with other reproduction operators. The algorithm produces successive generations of individuals, computing their
“fitness value” at each step and selecting the best of them when the termination condition is reached. The paper deals with
issues in the implementation of the algorithm: chromosome representation and evaluation, the selection and replacement
strategy, and the genetic operators for crossover and mutation. The model has been implemented, and results are presented for
the set of four languages, comparing three crossover operators and the stochastic mutation scheme along with three other
mutation operators.
—————————— ——————————
1 INTRODUCTION
3 CHROMOSOME STRUCTURE
A sequential structured chromosome [5], consisting of a random sequence of 0s and 1s, is used in the implementation. The decoding procedure maps the random chromosome to grammar symbols, with the bit width of each symbol chosen according to the number of terminals in the given sample data. In the mapping from bit representation to symbolic representation, “111” and “1111” are taken as the “epsilon” symbol in the 3-bit and 4-bit representations respectively. If the language contains fewer than 4 terminals, the 3-bit representation is used along with 4 variables. For example, let the language L be given by L = {(10)^n | n ≥ 0}; the symbol mapping for decoding the chromosome for L is shown in Fig. 1.
Fig. 1. Symbol mapping for L = {(10)^n | n ≥ 0}.
Note that there are two terminals, ‘a’ and ‘b’, in the language, which are represented by “100” and “101” respectively. A special operator is applied to the chromosome which masks every 5th symbol in the chromosome to ‘V’ by changing the first bit of its binary representation from ‘1’ to ‘0’, without adding any new production to the grammar derived from the chromosome. It is similar to the expansion operator used in [6], [7], which adds two extra productions to the set of derived rules. The resultant productions are then processed for left recursion removal followed by left factoring. The sample data is used to evaluate the fitness of the resultant production set by checking acceptance of the positive samples and rejection of the negative samples. The process of grammar mapping from the binary chromosome is shown in Fig. 2.
Fig. 2. Grammar mapping from the binary chromosome for L = {(10)^n | n ≥ 0}.
4 REPRODUCTION OPERATORS USED
Recombination, or sexual reproduction, is a key operator in natural evolution. It takes two chromosomes and produces two new chromosomes by mixing the genes found in the originals. In biology, the most common form of recombination is crossover: two chromosomes are cut at one point and the halves are spliced to create new chromosomes. The effect of recombination is very important because it allows characteristics from two different parents to be assorted. If the father and the mother possess different good qualities, we would expect all of those good qualities to be passed on to the child. Thus the offspring, just by combining the good features of its parents, may surpass its ancestors. The mixing of genetic material via sexual reproduction is one of the most powerful features of Genetic Algorithms. The Genetic Algorithm representation does not differentiate between male and female individuals.
Mutation is the other way to get new offspring. Mutation consists of changing the value of genes. In natural evolution, mutation mostly engenders non-viable genomes; in fact, mutation is not a very frequent operator in natural evolution. Nevertheless, in optimization, a few random changes can be a good way of exploring the search space quickly [2]. The paper focuses on experimentation with three crossover and four mutation operators.
4.1 Crossover Operators Used
Three crossover operators are used in the experimentation: the Two Point crossover method, a variant of the cyclic crossover with internal swapping, and the Uniform crossover method.
4.1.1 Two Point Crossover Method (C1)
In the two point crossover method, the parent chromosomes are cut at two random points, and child1 is created by replacing the slice between the two cuts in parent1 with the corresponding slice from parent2. The same process is carried out to generate child2. An example is shown in Fig. 3(a).
4.1.2 A Variant of the Two Point Crossover with Internal Swapping (C2)
In this crossover method, internal swapping is done with the slices, and those slices are used in a cyclic manner for generating the children. An example is shown in Fig. 3(b).
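The two point crossover (C1) of Section 4.1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `two_point_crossover`, the use of equal-length strings of '0' and '1' as chromosomes, and the assumption that child2 is built by exchanging the roles of the two parents are all introduced here for illustration only.

```python
import random


def two_point_crossover(parent1, parent2, rng=random):
    """Return (child1, child2) from two equal-length bit-string chromosomes.

    Illustrative sketch only: names and chromosome encoding are assumptions.
    """
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if len(parent1) < 2:
        raise ValueError("chromosomes need at least two genes")
    # Pick two distinct cut positions and order them so that i < j.
    i, j = sorted(rng.sample(range(len(parent1) + 1), 2))
    # child1 keeps parent1 outside the cuts and takes parent2's slice between them.
    child1 = parent1[:i] + parent2[i:j] + parent1[j:]
    # Assumption: child2 is produced the same way with the parents' roles exchanged.
    child2 = parent2[:i] + parent1[i:j] + parent2[j:]
    return child1, child2


if __name__ == "__main__":
    a, b = two_point_crossover("100101111100", "101100100111")
    print(a, b)
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when comparing operators across generations.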
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2010, ISSN 2151-9617
TABLE 1
THE LANGUAGES USED.
TABLE 3
THE RANK MATRIX FOR THE LANGUAGE L1.
TABLE 4
THE RANK MATRIX FOR THE LANGUAGE L2.
7 CONCLUSION
The proposed model has been implemented, and the results for the set of four languages have been analysed, comparing three crossover operators and the stochastic mutation scheme along with three other mutation operators. The Stochastic Mutation method (M3) and the Two Point Crossover Method (C1) are found to be the best individual operators for induction of the grammars considered in the experiment, whereas the operator combination C2-M3 is found to be the best combination for the same. MLDP is found to be more effective in the selection of the corpus. The selection of a good quality corpus (positive and negative string inputs) has resulted in the induction of good quality grammars for the languages considered. The results have shown a tendency towards convergence to local optima, which requires special attention in future work.
TABLE 6
THE RANK MATRIX FOR THE LANGUAGE L4.
TABLE 7
THE CUMULATIVE RANK MATRIX FOR REPRODUCTION
OPERATORS.
TABLE 8
THE STATISTICAL ANALYSIS FOR THE LANGUAGE L1.
TABLE 10
THE STATISTICAL ANALYSIS FOR THE LANGUAGE L3.
ACKNOWLEDGMENT
The authors thank Dr. V. M. Thakre, P. G. Department of Computer Science, Sant Gadge Baba Amravati University, Amravati, Maharashtra, for his kind support in providing the laboratory infrastructure required to conduct the experiment.
REFERENCES
[1] Guy De Pauw, “Evolutionary Computing as a Tool for Grammar Development”, CNTS – Language Technology Group, UIA – University of Antwerp, Antwerp – Belgium, GECCO 2003, LNCS 2723, pp. 549–560, Springer-Verlag Berlin Heidelberg, 2003.
[2] Sivanandam, Deepa, “Introduction to Genetic Algorithm”, Springer, 2008.
[3] Wyard, P., “Representational Issues for Context-Free Grammar Induction Using Genetic Algorithm”, Proceedings of the 2nd International Colloquium on Grammatical Inference and Applications, Vol 862, pp. 222-235, 1994.
[4] John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman, “Introduction to Automata Theory, Languages, and Computation”, 3/E, Addison-Wesley, 2007.
[5] Choubey N. S. and Kharat M. U., “Sequential Structuring Element for CFG Induction Using Genetic Algorithm”, International Journal of Futuristic Computer Application, 1(1):12-16, 2010.
[6] Ernesto Rodrigues and Heitor Silvério Lopes, “Genetic Programming with Incremental Learning for Grammatical Inference”, Proceedings of the Sixth International Conference on Hybrid Intelligent Systems (HIS), pp. 47-47, 2006. IEEE.
[7] Ernesto Rodrigues and Heitor Silvério Lopes, “Genetic Programming for Induction of Context-free Grammars”, Seventh
TABLE 9
THE STATISTICAL ANALYSIS FOR THE LANGUAGE L2.
TABLE 11
THE STATISTICAL ANALYSIS FOR THE LANGUAGE L4.
Nitin S. Choubey, BE (1995), MBA (1997), ME (2002), Ph.D. [Management] (2004), was educated at Sant Gadge Baba Amravati (SGBA) University, Maharashtra, India. He is pursuing a Ph.D. in the faculty of Computer Science & Engineering at SGBA University. He presently works at the Mukesh Patel School of Technology Management and Engineering at SVKM's NMIMS-deemed-to-be-University, Shirpur Campus, Shirpur, Dhule, Maharashtra, India, as Associate Professor and Head of the Computer Department. He has presented papers at national and international conferences and has published papers in national and international journals on various issues in Computer Engineering and Management. He has also published books on various topics in Computer Science and Management. His areas of interest include Algorithms, Theoretical Computer Science, and Computer Networks and Internet.
Fig. 8. Generation Chart for the Languages L1, L2, L3 and L4.