Professional Documents
Culture Documents
CHAPTER 5
GENETIC ALGORITHM BASED ASSOCIATION RULE MINING
5.1
61
Current
Population
Current
Population
Archive
(1)
(1)
(2) (2)
New Solutions
(a)
Archive
Passive Elitism
(1)
(1)
(2)
New Solutions
5.2.1 Elitism in GA
In association rule mining using genetic algorithm there are chances of
the selection of individuals with minimum fitness value (non-opt parent for
reproduction) over chromosomes with better fitness values(opt parent for
reproduction). To prevaricate elitism has been used in the search process of
multiobjective optimization (active elitism).
When elitism is done care should be taken to prevent being trapped by a
premature convergence if a high-elitist pressure is applied to the generation of
new solutions. In association rule mining using Elitist GA, the elitism was set
as 10% of the population size. The active elitism allows the elite (archive)
population for further reproduction. In other words the elite population is fed
62
for the selection process so as to perform the crossover and mutation operation,
if selected. Thus being trapped into local optima by premature convergence is
avoided. The block diagram of the GA with Elitism is given in Figure 5.2
below.
It was demonstrated in section 4.4 that the user defined values of the
supportmin and confidencemin for the fitness function (Eqn.(4.1)) adopted
affect the performance of the system. The actual tradeoff between these two
values is not easy to set by trial and error method (manual tuning).
63
For ARM with Elitism based GA we have defined a new fitness function
to overcome the setback. The fitness function for mining association rules with
GA is defined as.
( )
Where
( )
( ))
( )
(5.1)
( ) is
64
TABLE 5.1
Default Parameters for ARM with Elitist GA
Parameter
Value
Elitism
Crossover Probability
Mutation Probability
Selection Method
Population Size
0.5
0.2
Roulette Wheel Selection
Lenses
: 20
Habermans Survival : 300
Car Evaluation
: 700
Postoperative Patient : 75
Zoo
: 90
Association rules were mined from the specified datasets and the
predictive accuracy achieved over 50 generation is presented in the Table 5.2
given below.
TABLE 5.2
Predictive Accuracy for AR mined with Elitist GA
Lenses
Car
Evaluation
Habermans
Survival
Postoperative
Patient
Zoo
90
82.4
70
70
70.4
87.5
81.6
75
70
70.4
91.6
82.8
91.6
73.5
75
10
90
7.5
75
70.8
75
15
87.5
80
83.3
73
75
20
91.6
77.5
97
68
79
25
87.5
85
92.5
70
82
30
83.3
83.75
83.3
79
78.6
50
90
75
75
75
86
No. Of
Iterations
65
100
90
80
70
60
50
GA
40
30
Elitism
GA
20
10
0
Lenses
Habermans
Car
Survival Evaluation
Po-opert
Care
Zoo
Datasets
66
Habermans
Car
Postoperative
Survival
Evaluation
Patient
Zoo
Laplace
0.505582
0.538462
0.5
0.5024
0.5
Lift
3.263158
1.846154
2.08
5.05
1.279
Infinity
Infinity
Infinity
Infinity
Infinity
0.015661
0.076389
0.006
0.0079
0.007
Conviction
Leverage
The Laplace measure is a confidence estimator and the value away from 1
indicates that the rules generated are of value. The conviction measure being infinity
67
for all datasets indicates that the rules generated are interesting. The range of the
values of Leverage measure ranges between [-0.25, 0.25]. The measure of association
rules mined with Elitist GA are all less than 0.01 indicates that the rules generated are
compact (antecedent and consequent are correlated). Similarly the Lift measure being
close to one signifies that the antecedents and the consequent in rules generated with
Elitist GA are associated.
Number of rules generated by the Elitist GA is plotted against the ARM with
simple GA and plotted in the Figure 5.3.
180
160
No. of Rules
140
120
100
GA
80
60
GA with
Elitism
40
20
0
Lenses
Post
Operative
Patient
Zoo
Haberman's
Car
Survival
Evaluation
Datasets
5.3
68
In the traditional genetic algorithm, the crossover rate and mutation rate
are fixed values which are selected based on experience. Generally we believe
that when the crossover rate is too low, the evolutionary process can easily fall
into local optimum to result in groups of premature convergence due to
population size and the lack of diversity. When the crossover rate is too high,
the process is optimized to the vicinity of optimal point and the individual is
difficult to reach optimal point which can slow the speed of convergence
significantly though groups can ensure the diversity.
The issue of parameter setting has been relevant since the beginning of
the EA research and practice. Eiben et al. 2003; 2007 emphasize that the
parameter values greatly determinate the success of an EA in finding an
69
70
)
)
(5.2)
71
Parameter
Population Size
Value
Varies as per the
dataset
0.9
0.1
Selection Method
Roulette wheel
selection
72
The
simple GA and compared with the results of AGA. This is shown in Figure 5.5.
100
90
80
70
60
50
40
30
20
10
0
GA
SAGA
Lenses
Habermans
Car
Evaluation
Survival
Po-opert
Care
Zoo
Datasets
120
100
80
GA
60
AGA
40
GA with AGA
mutation
20
0
Lenses
Habermans
Survival
Zoo
73
Laplace
Conviction Infinity
Zoo
0.6524
Infinity
Infinity
Infinity
Infinity
Leverage
0.0324
0.0569
0.04356
0.0674
0.0765
Lift
1.89
1.89
1.89
1.89
1.89
The Laplace and Leverage values are away from 1 for all datasets which
indicates that the rules generated are of interest. The conviction value, infinity
also signifies the importance of the rules generated. Similarly the Lift measure
being close to 1 means that the antecedents and the consequents are related,
indicating the importance of the rules generated.
5.4
SUMMARY
Association rule mining was performed with two different methods
74
set based on the mutation rate history and the fitness values of the individuals.
The AGA methodology of ARM generated association rule with enhanced PA
compared to simple GA and Elitist GA. The quantitative measures of the
association rules was found to be within the specified range indicating the
importance of the rules generated.
.