
Association Rule Mining through Adaptive Parameter Control in Particle Swarm Optimization
K. Indira
Research Scholar
Department of Computer Science and Engineering
Pondicherry Engineering College
Puducherry, India
919585287662
induharini@gmail.com

S. Kanmani
Professor
Department of Information Technology
Pondicherry Engineering College
Puducherry, India
kanmanis@pec.edu

Abstract: Association rule mining is a data mining task on which a great deal of academic research has
been done and many algorithms have been proposed. Most methods treat association rule mining as a
twofold process, which increases the complexity of the system and takes up more time and space.
Evolutionary Computation (EC) methods are fast-growing search-based optimization methods for
association rule mining. Among them, Particle Swarm Optimization (PSO) is well suited for mining
association rules. The bottleneck of PSO is setting precise values for its control parameters, which is
done through either parameter tuning or parameter control. This paper proposes adaptive
methodologies for the control parameters of PSO, namely the acceleration coefficients and the inertia
weight, based on Estimation of Evolution State (EES) and fitness value respectively. When tested on
five datasets from the University of California Irvine (UCI) repository, both proposed adaptive methods
generated association rules with better accuracy and rule measures than simple GA and PSO.

Keywords: Association Rule Mining, Evolutionary Computation, Particle Swarm Optimization, Adaptive PSO, Estimation of Evolution State.

1. Introduction
Classification rule discovery and association rule mining are two important data
mining tasks. Association rule mining discovers all rules in the training
set that satisfy the minimum support and confidence thresholds, while classification
rule mining discovers a set of rules for predicting the class of unseen data. Mined
rules can be effective in uncovering unknown relationships, providing results that
can be the basis of forecasting and decision making [1]. They have proven to be very useful
tools for an enterprise as it strives to improve its competitiveness and profitability.
The application areas of association rule mining and classification rule discovery
range from market analysis to business intelligence [53], and now extend to
epidemiology, clinical medicine, fluid dynamics, astrophysics, and crime
prevention [54]. Hence the accuracy of the mined rules and the relationships
between attributes have become important issues. Standard rule mining
methods such as Apriori [2,18] and the FP-growth tree [18,19] scan the whole dataset for
each attribute match, increasing the input/output overhead of the system. The
rules generated pursue a single objective, accuracy alone, and the number
of rules generated is vast. Pruning and summarization are needed to filter the
significant rules [6].
The efficiency of association rule mining can be enhanced by
i. reducing the number of passes over the database,
ii. making the process multiobjective, and
iii. sustaining the search space efficiently.
Evolutionary Computation (EC) methods such as Particle Swarm Optimization (PSO)
provide solutions to the above three requirements while mining association rules
[12, 22, 27], generating rules with better predictive accuracy and
reduced complexity. However, one laborious aspect of all evolutionary algorithms is
setting appropriate values for the control parameters [41]. This paper proposes
mining association rules for the classification task using Adaptive Particle Swarm
Optimization (APSO).
In solving problems with PSO, its properties affect performance. The
properties of PSO depend on the parameter settings, so users need to find
apt values for the parameters to optimize performance. The interactions
between the parameters show complex behavior, and each parameter gives
a different effect depending on the values set for the others. Without prior knowledge
of the problem, parameter setting is difficult and time consuming. The two major
ways of parameter setting are parameter tuning and parameter control
[15]. Parameter tuning, the commonly practiced approach, amounts to finding
appropriate values for the parameters before running the algorithm. Parameter
control steadily modifies the control parameter values during the run. This can
be achieved through deterministic, adaptive, or self-adaptive techniques
[17].
Deterministic parameter control modifies the strategy parameters by a deterministic
rule without any feedback. This method becomes unreliable for
most problems because the parameter adjustments must rely on the current status of
the problem. In the self-adaptive approach the parameters to be controlled
are encoded into the candidate solution, which may result in a deadlock: a good
solution depends on finding a good setting of parameters, and obtaining a good
setting of parameters depends on finding a good solution. Moreover, extra bits
are required to store these strategy parameters, so the dimensionality of the search
space is increased. Thus the corresponding search space becomes larger and
the complexity of the problem increases.
The adaptive method is therefore the solution. There have been several attempts at
adapting the parameters of PSO. The role of the inertia weight is considered critical
for the convergence of PSO. An adaptive inertia weight was proposed by Eberhart
and Kennedy [13], and the inertia weight is linearly decreased over the generations in
Arumugam et al. [4]. The acceleration coefficients are adapted dynamically in
Ratnaweera et al. [39] and in Arumugam et al. [4] by balancing the cognitive and
the social components. The acceleration coefficients are adjusted based on an
evolution state estimator and the inertia weight is adapted through an evolution factor
by Zhan et al. [52] to perform a global search over the entire search space with faster
convergence. These adaptive methodologies adjust the parameters based on
strategies independent of the data used in the applications.
The literature on adaptive strategies in PSO focuses on mechanisms that
adapt the parameter values over generations, either decreasing or increasing them
independently of the data involved, to enhance performance. As association rule
mining is mainly based on the relationships or correlations between items in a dataset,
a data-dependent adaptation strategy could generate better association rules both
qualitatively and quantitatively. The performance of an adaptive strategy for
mining association rules could be enhanced if the parameter adjustments depend
on i) the data involved and ii) the fitness values used for evolution through the
generations. In this paper a parameter control mechanism based on EES is
proposed for adapting the acceleration coefficients in PSO, and the inertia weight
is adapted during the evolutionary process based on the fitness values.
The rest of the paper is organized as follows. Section 2 introduces the
preliminaries on association rules and PSO. Section 3 reviews the literature
related to the proposed methodology. Section 4 presents the proposed methodology
for mining association rules with Adaptive Particle Swarm Optimization.
Section 5 reports the experimental settings, experimental results and discussion.
Finally, Section 6 draws the conclusion.

2. Preliminaries
This section briefly discusses association rules and their related factors,
along with the basic features of multiobjective optimization and Particle Swarm
Optimization.
2.1 Association Rule

Association rules are a class of important regularities in data. Association rule
mining is commonly stated as follows [1]: Let I = {i1, i2, …, in} be a set of n binary
attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the
database. Each transaction in D has a unique transaction ID and contains a subset
of the items in I. A rule is defined as an implication of the form x ⇒ y where
x, y ⊆ I and x ∩ y = ∅. The itemsets x and y are called the antecedent (left-hand
side or LHS) and consequent (right-hand side or RHS) of the rule. Often
rules are restricted to only a single item in the consequent; the same restriction is
adopted in this paper.

There are two basic measures for association rules: support and confidence.
The support of an association rule is the fraction of records that
contain x ∪ y out of the total number of records in the database:

    support(x ⇒ y) = |{t ∈ D : x ∪ y ⊆ t}| / |D|    (1)

The confidence of an association rule is the fraction of the number of
transactions that contain x ∪ y out of the total number of records that
contain x; if this fraction exceeds the confidence threshold, an
interesting association rule can be generated:

    confidence(x ⇒ y) = support(x ∪ y) / support(x)    (2)

Confidence is a measure of the strength of the association rule; support(x ∪ y) is
the support of the antecedent and consequent occurring together in the dataset.
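These two measures can be computed directly from a transaction database. The following is a minimal sketch (not from the paper; the toy data and function names are illustrative), representing each transaction as a Python set of items:

```python
def support(itemset, transactions):
    """Eq. (1): fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Eq. (2): support of antecedent and consequent together over support of antecedent."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

# Toy database of four transactions (illustrative data).
transactions = [{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread", "butter"}, {"milk", "butter"}]
print(support({"milk", "bread"}, transactions))       # 0.5
print(confidence({"milk"}, {"bread"}, transactions))  # 0.5 / 0.75 ≈ 0.667
```

With real datasets the transactions would come from the attribute-value records of the database rather than a hand-written list.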
2.2 Multiobjective Optimization
A general minimization problem of M objectives can be stated mathematically as:
Given x = [x1, x2, …, xd], where d is the dimension of the decision variable
space,

    Minimize F(x) = [f1(x), f2(x), …, fM(x)], subject to:
    gj(x) ≤ 0, j = 1, 2, …, J, and
    hk(x) = 0, k = 1, 2, …, K,

where fi(x) is the ith objective function, M is the number of objectives,
gj(x) is the jth inequality constraint, and hk(x) is the kth equality constraint.
A solution is said to dominate another solution if it is no worse than that solution
in all objectives and is strictly better in at least one objective. The
solutions over the entire solution space that are not dominated by any other
solution are called Pareto-optimal solutions.
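The dominance test translates directly into code. A small illustrative sketch (assuming minimization and representing each solution as a tuple of objective values; the function names are ours):

```python
def dominates(a, b):
    """True if `a` dominates `b` (minimization): no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Solutions not dominated by any other solution in the set."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Objective vectors of four candidate solutions (illustrative data).
points = [(1, 4), (2, 2), (3, 1), (3, 3)]
print(pareto_front(points))  # [(1, 4), (2, 2), (3, 1)]
```

Here (3, 3) is dominated by (2, 2) and drops out; the remaining three points trade one objective against the other.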
The association rules generated are usually large in number and contain redundant
information. Much work on association rule mining has focused on
efficiency, effectiveness and redundancy; focus is also needed on the quality of
the rules mined. In this paper association rule mining using GA and PSO is treated as
a multiobjective problem where rules are evaluated quantitatively and
qualitatively based on the following five measures.

i. Predictive Accuracy (also Confidence) measures the effectiveness of the
rules mined. The mined rules must have high predictive accuracy.

    Accuracy(x ⇒ y) = support(x ∪ y) / support(x)    (3)

where x is the antecedent and y the consequent.

ii. Laplace is a confidence estimator that takes support into account,
becoming more pessimistic as the support of x decreases [11,20]. It ranges
over [0, 1] and is defined as

    Laplace(x ⇒ y) = (support(x ∪ y) + 1) / (support(x) + 2)    (4)

iii. Conviction is sensitive to rule direction and attempts to measure the
degree of implication of a rule [7]. It ranges over [0.5, ∞); values far
from 1 indicate interesting rules.

    Conviction(x ⇒ y) = (1 − support(y)) / (1 − confidence(x ⇒ y))    (5)

iv. Leverage, also known as the Piatetsky-Shapiro measure [37], measures how
much more often the antecedent and consequent co-occur than expected from
their individual supports. It ranges over [−0.25, 0.25] and is defined as

    Leverage(x ⇒ y) = support(x ∪ y) − support(x) · support(y)    (6)

v. Lift (also called interest by Brin et al. [7]) measures how far from
independence the antecedent and the consequent are. It ranges over [0, +∞).
A lift value of 1 indicates that the items co-occur in the database as
expected under independence; values greater than one indicate that the
items are associated.

    Lift(x ⇒ y) = support(x ∪ y) / (support(x) · support(y))    (7)

The objective (fitness) function is derived to meet the above objectives based on
the confidence and support values of the mined rules. Maximization of the fitness
function is the goal:

    F(x) = confidence(x) × log(support(x) × length(x) + 1)    (8)

where length(x) is the number of attributes in the rule x.
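As a rough sketch, a fitness of this form can be computed as below; the reading F = confidence × log(support × length + 1), with length(x) the number of attributes in the rule, is our interpretation of equation 8 as it appears in the paper's pseudocode, and the function name is illustrative:

```python
import math

def rule_fitness(conf, sup, length):
    """Fitness of a mined rule: confidence weighted by a logarithmic term
    that rewards frequent and longer rules (our reading of Eq. 8)."""
    return conf * math.log(sup * length + 1)

# A rule with confidence 0.8, support 0.4 and 3 attributes (illustrative values).
print(rule_fitness(0.8, 0.4, 3))  # 0.8 * log(2.2)
```

Note that a rule with zero support scores zero regardless of its confidence, which is consistent with the multiplicative form of the measure.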

2.3 Particle Swarm Optimization


Swarm Intelligence (SI) [25] is an innovative distributed intelligent paradigm for
solving optimization problems that originally took its inspiration from biological
examples of swarming, flocking and herding phenomena in
vertebrates. Particle Swarm Optimization (PSO) [24] incorporates swarming
behaviors observed in flocks of birds, schools of fish, swarms of bees, and even
human social behavior.
PSO is initialized with a group of random particles (solutions) and then searches
for optima by updating generations. In every iteration, each particle is updated
by following two best values. The first is the best solution (fitness) it
has achieved so far, called pBest. The other best value tracked by the particle
swarm optimizer is the best value obtained so far by any particle in the
population; this global best is called gBest.
After finding the two best values, each particle updates its corresponding velocity
and position.
Each particle p, at some iteration t, has a position x(t) and a displacement velocity
v(t). The personal best (pBest) and global best (gBest) positions are stored in the
associated memory. The velocity and position are updated using equations 9 and
10 respectively:

    vid(t) = ω · vid(t−1) + c1 · rand() · (pBestid − xid(t−1)) + c2 · rand() · (gBestd − xid(t−1))    (9)
    xid(t) = xid(t−1) + vid(t)    (10)

where
    ω         is the inertia weight
    vid(t−1)  is the velocity of the ith particle before the update
    vid(t)    is the velocity of the ith particle after the update
    xid       is the position of the ith (current) particle
    i         is the particle index
    d         is the dimension of the search space
    rand()    is a random number in the range (0, 1)
    c1        is the individual (cognitive) factor
    c2        is the societal (social) factor
    pBestid   is the particle best
    gBestd    is the global best
Particle velocities on each dimension are clamped to a maximum velocity Vmax,
a parameter specified by the user. If the sum of accelerations would cause the
velocity on a dimension to exceed Vmax, the velocity on that dimension is
limited to Vmax. This is called the Vmax method [49].
The inertia weight controls the impact of the velocity history on the new velocity
and balances global and local search. Suitable fine-tuning of the cognitive and
social parameters c1 and c2 results in faster convergence of the algorithm and
alleviates the risk of settling in a local minimum. The pseudocode for PSO
is given below.
For each particle
{
    Initialize particle
}
End
Do
{
    For each particle
    {
        Calculate fitness value
        If the fitness value is better than the best fitness value (pBest) in history
            Set current value as the new pBest
    }
    End
    Choose the particle with the best fitness value of all the particles as the gBest
    For each particle
    {
        Calculate particle velocity according to equation (9)
        Update particle position according to equation (10)
    }
    End
}
While maximum iterations or minimum error criterion is not attained
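The pseudocode above can be fleshed out into a short runnable sketch. The following minimal global-best PSO is our illustration, not the paper's implementation; it minimizes an arbitrary fitness function (consistent with the minimization convention used later in the paper), includes the Vmax clamping described earlier, and all parameter defaults are illustrative:

```python
import random

def pso(fitness, dim, n_particles=20, iters=100,
        w=0.7, c1=2.0, c2=2.0, vmax=1.0, bounds=(-5.0, 5.0)):
    """Minimal global-best PSO (minimization). `vmax` clamps each velocity
    component, i.e. the Vmax method described above."""
    lo, hi = bounds
    xs = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                       # personal best positions
    pbest_f = [fitness(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]         # global best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vs[i][d] = (w * vs[i][d]                                  # Eq. (9)
                            + c1 * random.random() * (pbest[i][d] - xs[i][d])
                            + c2 * random.random() * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-vmax, min(vmax, vs[i][d]))                # Vmax clamp
                xs[i][d] += vs[i][d]                                      # Eq. (10)
            f = fitness(xs[i])
            if f < pbest_f[i]:                       # update pBest
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:                      # update gBest
                    gbest, gbest_f = xs[i][:], f
    return gbest, gbest_f

# Minimize the 2-D sphere function as a smoke test.
best, best_f = pso(lambda x: sum(v * v for v in x), dim=2)
```

For rule mining, the position vector would encode a candidate rule and the fitness would be the rule measure being maximized (negated here to fit the minimization convention).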

3. Literature Review
The issue of parameter setting has been relevant since the beginning of EA
research and practice. Eiben et al. [14,18] emphasize that the parameter values
greatly determine the success of an EA in finding an optimal or near-optimal
solution to a given problem. Choosing appropriate parameter values is a
demanding task. There are many methods to control the parameter settings during
an EA run. Parameters are not independent, but trying all combinations is
impossible, and the parameter values found are appropriate only for the tested
problem instances. Nannen et al. [33] state that the major problem of parameter
tuning is a weak understanding of the effects of EA parameters on algorithm
performance.

In the recent past, many approaches have been applied to investigate how to adapt
suitable parameters. Adaptive parameter control strategies can be developed based
on the identified evolutionary state by making use of existing research results
on the inertia weight [10, 41, 42] and acceleration coefficients [38, 39, 45, 47]. The
impact of the inertia weight and maximum velocity in PSO was analyzed in [40]:
different combinations of inertia weight and maximum velocity resulted in different
performance on the Schaffer function, and a time-varying inertia weight led to
better performance. Parsopoulos et al. [36] focused on the adaptation of the most
important parameter of unified particle swarm optimization, which was previously
proposed in [35]. According to their experimental results, the self-adaptive
scheme proposed in [36] can well balance the capabilities of exploration and
exploitation. Yau-Tarng et al. [48] proposed an adaptive fuzzy particle swarm
optimization algorithm utilizing fuzzy set theory to adjust the PSO acceleration
coefficients adaptively, and was thereby able to improve the accuracy and
efficiency of searches.
The adaptive parameter control system of [43] automatically adjusts the
parameters of an evolutionary algorithm according to the entire search
history, in a parameter-less manner. For GA, the methodology automatically
controlled the crossover operator, crossover values and mutation operator; for
PSO, the two learning factors and the inertia weight were adjusted
automatically. A novel self-adaptive inertia weight particle swarm
optimization approach to determine the optimal generation schedule of a cascaded
hydroelectric system was proposed in [3]. To overcome the premature
convergence of particle swarm optimization, the evolution direction of each
particle is redirected dynamically by adjusting the two sensitive parameters, i.e.
the acceleration coefficients of PSO, in the evolution process [49]. Several
strategies [8,23] have been proposed to tune the contribution of various
sub-algorithms adaptively according to their previous performance, and have shown
good effect.
Zhan et al. [59] propose an adaptive particle swarm optimization (APSO)
consisting of two main steps: evaluating the population distribution and particle
fitness, followed by an elitist learning strategy when the evolutionary
state is classified as the convergence state, to improve search efficiency and
convergence speed. Self-learning based PSO (SLPSO) [50] simultaneously
adopted four PSO-based search strategies and then generated better-quality
solutions from past generations based on a self-adaptive method.
From the methodologies in the literature it is seen that significant improvement in
the performance of PSO can be achieved by setting appropriate values for the
control parameters. In particular, the adaptive method of parameter control has
received wide coverage. But so far the adaptation has mostly been based on
parameter analysis, independent of the application being executed. Though this
approach enhances performance, association rule mining is strongly based on the
items in the dataset and the relationships among them. Hence the methods are
expected to perform well when the adaptation is made data dependent. This paper
proposes a methodology for association rule mining based on PSO where the
parameters are set depending on the dataset and the fitness values generated.

4. Methodology
Any heuristic search is characterized by two basic behaviors, exploration and
exploitation. These concepts are often mutually conflicting: if exploitation is
increased, then exploration decreases, and vice versa. Manipulating the control
parameters of PSO based on the application, and hence the data, balances the two.
The roles of the parameters are significant in the performance of the
methodologies. This paper proposes adaptive parameter control for the parameters
of Particle Swarm Optimization. The parameters involved in PSO are given in
Table 1.
Table 1. Parameters of PSO

Parameter                      Parameter Role
Inertia weight (ω)             Controls the impact of the velocity history on the new velocity
Acceleration coefficient c1    Maintains the diversity of the swarm
Acceleration coefficient c2    Drives convergence towards the global optima

4.1 Adaptation in PSO


In the past few years, many approaches have been applied to investigate how to
adapt suitable parameters. In some early works, Shi et al. analyzed the impact of
the inertia weight and maximum velocity in PSO [41]. They revealed that
different combinations of inertia weights and maximum velocities may result in
different performance on the Schaffer function. Based on this empirical analysis,
it was concluded that a time-varying inertia weight can lead to better performance.
In [42], Shi et al. proposed a PSO variant which implements a fuzzy rule to
control the parameters. Following the same line of thinking, Parsopoulos et al.
focused on the adaptation of the most important parameter of unified particle
swarm optimization (UPSO) [35], which was previously proposed in [34].
According to the results of the experiments, the self-adaptive scheme proposed in
[35] can well balance the capabilities of exploration and exploitation.
Because most of the PSO research related to adaptive schemes is empirical,
van den Bergh et al. investigated the influence of parameters, such as the inertia
term, from the perspective of particle trajectories for general swarms [5]. While
such parameter adaptation has been experimentally proven to remarkably improve
performance, when applied to association rule mining an adaptation based on the
data involved might enhance the system further than data-independent adaptation.
Thus an adaptive Particle Swarm Optimization based on the dataset values is
proposed.
4.2 Adaptive Particle Swarm Optimization (APSO)
A swarm [16] consists of a set of particles {x1, …, xm} of size m, moving within
the search space S ⊆ Rd, each representing a potential solution of the problem:

    Find min F(x), x ∈ S, subject to appropriate constraints,

where x is the particle, S is the search space, d is the dimension of the search
space, Rd is the set of potential solutions and F is the fitness function associated
with the problem, which we consider to be a minimization problem without loss of
generality.
PSO is mainly governed by three key parameters important for the speed,
convergence and efficiency of the algorithm [53]: the inertia weight (ω) and two
positive acceleration coefficients (c1 and c2). The inertia weight controls the
impact of the velocity history on the new velocity. The acceleration parameters are
typically two positive constants, called the cognitive parameter c1 and the social
parameter c2. The inertia weight plays a key role in providing balance between
the exploration and exploitation processes. A linearly decreasing inertia weight
over time enhances the efficiency and performance of PSO [46]. Suitable
fine-tuning of the cognitive and social parameters c1 and c2 may result in faster
convergence of the algorithm and alleviate the risk of settling in a local minimum.
To adapt the parameters of PSO in the proposed adaptive PSO methodology for
association rule mining, parameter tuning is performed based on the evolutionary
state to which the data fit. The acceleration coefficients are adapted based on the
dataset from which association rules are mined, and the inertia weight is adjusted
based on the fitness values of the individuals. The pseudocode for the adaptive
PSO is given below.
/* Ns: size of the swarm, C: maximum number of iterations, Of: the final output */
i.   t = 0; randomly initialize S0
     Initialize xi, ∀i ∈ {1, …, Ns}          /* xi: the ith particle */
     Initialize vi, ∀i ∈ {1, …, Ns}          /* vi: the velocity of the ith particle */
     Initialize ω1, c1(1), c2(1)             /* ω: inertia weight; c1, c2: acceleration coefficients */
     Pbi ← xi, ∀i ∈ {1, …, Ns}               /* Pbi: the personal best of the ith particle */
     Gb ← xi                                 /* Gb: the global best particle */
ii.  for t = 1 to C                          /* C: total number of iterations */
         for i = 1 to Ns
             F(xi) = confidence(xi) × log(support(xi) × length(xi) + 1)
                                             /* F(xi): fitness of xi */
             if F(xi) < F(Pbi)
                 Pbi ← xi                    /* update particle best */
             Gb ← min(Pb1, Pb2, …, PbNs)     /* update global best */
             adjust_parameters(ωt, c1t, c2t) /* adaptive adjustment */
             vi(t) = ωt·vi(t−1) + c1t·r1·(Pbi − xi) + c2t·r2·(Gb − xi)
                                             /* velocity update; r1, r2: random values */
             xi(t) = xi(t−1) + vi(t)         /* position update */
         At ← non_dominated(St ∪ At)         /* updating the archive */
iii. Of ← At and stop                        /* Of: output */

The adjust_parameters(ωt, c1t, c2t) step in the above pseudocode is achieved
through the newly proposed adaptive mechanism. The proposed approach
distributes the evolution of the data into four states based on the estimation of
evolutionary states: Convergence, Exploration, Exploitation and Jumping out.
4.2.1 Estimation of Evolutionary State (EES)
During the PSO process, the population distribution characteristics vary not only
with the generation number but also with the evolutionary state. For example, at
an early stage the particles may be scattered in various areas and, hence, the
population distribution is dispersive. As the evolutionary process goes on,
however, particles cluster together and converge to a locally or globally
optimal area, so the population distribution information differs from that of the
early stage. The different population distribution information is therefore
detected and used to estimate the evolutionary state.


Based on the search behaviors and the population distribution characteristics of
PSO, the Estimation of Evolution State is done as follows:
a. The mean distance between particles is calculated for each particle i using the
Euclidean distance measure:

    di = (1 / (N − 1)) · Σ(j=1, j≠i to N) ‖xi − xj‖    (11)

where N is the population size and xi and xj are the ith and jth particles in the
population respectively.
b. Calculate the evolutionary state estimator e, defined as

    e = (dg − dmin) / (dmax − dmin)    (12)

where dg is the distance measure of the gBest particle, and dmax and dmin are
the maximum and minimum distance measures respectively from step a.
c. Record the evolutionary factor e for 100 generations for each dataset
individually.
d. Classify the estimator e into the state it belongs to, Exploration, Exploitation,
Convergence, or Jumping out, for the datasets based on the evolutionary states.
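Steps a and b above can be sketched as follows (an illustrative implementation; the swarm data and function names are ours):

```python
import math

def mean_distances(swarm):
    """Eq. (11): mean Euclidean distance from each particle to all others."""
    n = len(swarm)
    return [sum(math.dist(swarm[i], swarm[j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

def evolutionary_factor(swarm, gbest_index):
    """Eq. (12): e = (dg - dmin) / (dmax - dmin). Near 0 when gBest lies in
    the densest region (convergence), near 1 when it is far from the cluster."""
    d = mean_distances(swarm)
    d_min, d_max = min(d), max(d)
    return (d[gbest_index] - d_min) / (d_max - d_min)

# Three clustered particles and one outlier (illustrative positions).
swarm = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0)]
print(evolutionary_factor(swarm, gbest_index=0))  # small: gBest sits in the cluster
```

If the global best were the outlier at (2.0, 2.0), e would instead be 1, signalling the jumping-out state.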
The evolutionary state estimator e over one hundred generations is plotted for the
Zoo dataset as shown in Figure 1. The evolution states are estimated based on the
e values. The state transitions differ across the five datasets and are
non-deterministic and fuzzy. Based on the evolution state estimator values,
classification is done to determine the intervals at which state transitions occur.

[Figure: evolutionary state estimator e (y-axis, 0 to 0.5) plotted against iteration number (x-axis, 1 to 46)]
Figure 1. Evolutionary state information robustly revealed by e at run time for Zoo dataset

The intervals arrived at for the five datasets based on EES are shown in Table 2.
Table 2. Classification into States by Evolutionary Factor

State          Lenses     Car Evaluation   Haberman's Survival   Postoperative Patient   Zoo
Convergence    0.0-0.3    0.0-0.15         0.0-0.4               0.0-0.5                 0.0-0.15
Exploitation   0.1-0.4    0.15-0.25        0.3-0.7               0.2-0.6                 0.1-0.35
Exploration    0.2-0.7    0.1-0.3          0.6-0.9               0.4-0.8                 0.2-0.4
Jumping out    0.6-1.0    0.3-1.0          0.8-1.0               0.7-1.0                 0.3-1.0

The change of state as per the PSO sequence is Convergence ⇒ Exploitation ⇒
Jumping Out ⇒ Convergence.
The formulation for the numerical implementation of the classification for the
Haberman's Survival dataset is as follows.
Case (a) Exploration: A medium to large value of e represents exploration; its
membership function μExploration(e) is defined piecewise over the corresponding
interval of Table 2. (13a)
Case (b) Exploitation: A shrunk value of e represents exploitation; its
membership function μExploitation(e) is defined piecewise over the corresponding
interval of Table 2. (13b)
Case (c) Convergence: A minimal value of e represents convergence; its
membership function μConvergence(e) is defined piecewise over the corresponding
interval of Table 2. (13c)
Case (d) Jumping Out: When PSO is jumping out of a local optimum, the
globally best particle is distinctively away from the swarming cluster. Hence the
largest values of e reflect the jumping-out state; its membership function
μJumpingOut(e) is defined piecewise over the corresponding interval of
Table 2. (13d)
Therefore, at a transitional period two memberships are activated, and e can be
classified to either state. The singleton method of defuzzification is adopted.
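The classification of e into states, with overlap handling at transition periods, can be sketched as below. The intervals are those listed for the Haberman's Survival dataset in Table 2; the tie-breaking rule that keeps the previous state in an overlap region is a simplification of the singleton defuzzification, not the paper's exact procedure:

```python
# Intervals of the evolutionary factor e for the Haberman's Survival
# dataset, taken from Table 2.
INTERVALS = {
    "Convergence":  (0.0, 0.4),
    "Exploitation": (0.3, 0.7),
    "Exploration":  (0.6, 0.9),
    "Jumping out":  (0.8, 1.0),
}

def candidate_states(e):
    """States whose interval contains e; two overlap in transition regions."""
    return [s for s, (lo, hi) in INTERVALS.items() if lo <= e <= hi]

def classify(e, previous_state):
    """In an overlap region keep the previous state if still plausible,
    otherwise take the first candidate (a simplified disambiguation)."""
    cands = candidate_states(e)
    return previous_state if previous_state in cands else cands[0]

print(candidate_states(0.35))          # ['Convergence', 'Exploitation']
print(classify(0.35, "Exploitation"))  # 'Exploitation'
```

At e = 0.35 both Convergence and Exploitation are active; the disambiguation rule resolves the tie using the swarm's recent history.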
The distribution of all particles in PSO based on the Euclidean distance
calculation is illustrated in Figure 2.

[Figure: population distribution in PSO, with markers indicating particles in the Exploration, Exploitation, Convergence and Jumping Out states and the best particle of the swarm]

Figure 2. Population Distribution in PSO based on Evolutionary factor e

Based on the estimation of states using the evolutionary estimation factor e and
the fitness values, the adaptation of the acceleration coefficients and inertia
weight is performed.
4.3 Adaptive Control of Acceleration Coefficients
The acceleration coefficients are made adaptive through the classification of
evolutionary states. Parameter c1 represents the self-cognition that pulls the
particle to its own historical best position, helping the exploration of local niches
and maintaining the diversity of the swarm. Parameter c2 represents the social
influence that pushes the swarm to converge to the current globally best region,
helping with fast convergence. Both c1 and c2 are initially set to 2 [26, 31, 44]
based on the literature. The acceleration coefficients are adaptively altered during
evolution according to the evolutionary state, with the strategies developed for
the four states given in Table 3. The values of the increments δ1 and δ2 are
arrived at empirically for the datasets taken up for study.
Table 3. Control Strategies for c1 and c2

State           c1                 c2
Exploration     Increase by δ2     Decrease by δ2
Exploitation    Increase by δ1     Decrease by δ1
Convergence     Increase by δ1     Increase by δ1
Jumping out     Decrease by δ2     Increase by δ2

The increases and decreases in the values of c1 and c2 arrived at in Table 3 are
discussed below.
Exploration: During exploration, particles should be allowed to explore as many
optimal regions as possible. This avoids crowding over a single optimum, probably
a local optimum, and explores the target thoroughly. An increase in the value of c1
and a decrease in c2 facilitate this process.
Exploitation: In this state the particles group towards the historical best positions
of each particle, aided by the local information of each particle. A slight increase
in c1 advances the search around particle best (pBest) positions, while a slight
decrease in c2 avoids the deception of local optima, as the final global position
has yet to be explored.
Convergence: In this state the swarm identifies the global optimum. All the other
particles in the swarm should be led towards the global optimum region; a slight
increase in the value of c2 helps this process. To speed up convergence, a slight
increase in the value of c1 is also adopted.
Jumping Out: The global best (gBest) particle moves away from the local optima
towards the global optimum, taking it away from the crowding cluster. Once any
particle in the swarm reaches this region, all particles should follow the same
pattern rapidly. A large c2 along with a relatively small c1 helps to obtain this
goal.
4.4 Acceleration Coefficient Bounds
The adjustments to the acceleration coefficients should be small, so as to
maintain the balance between the values of c1 and c2. Hence the maximum
increase or decrease between two generations is bounded to the range [0.02, 0.1].
The value of δ1 is set to 0.02 based on trials, and the value of δ2 is similarly set
to 0.06. The c1 and c2 values are clamped to the interval [1.5, 2.5] in order to
maintain the diversity of the swarm and fast convergence [9]. The sum of the
acceleration coefficients is limited to 4.0 [51]; when the sum exceeds this limit,
both c1 and c2 are normalized based on equation 14:

    ci = 4.0 · ci / (c1 + c2), i = 1, 2    (14)
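Combining Table 3 with the bounds of Section 4.4 gives a sketch like the following. The assignment of the 0.02 and 0.06 increments to particular states is our assumption (the slight adjustments of exploitation and convergence use the smaller increment), as is the function name:

```python
DELTA_SLIGHT, DELTA_LARGE = 0.02, 0.06   # increments from Section 4.4

# (change to c1, change to c2) per state, following Table 3; which state
# takes which increment is our assumption.
STRATEGY = {
    "Exploration":  (+DELTA_LARGE, -DELTA_LARGE),
    "Exploitation": (+DELTA_SLIGHT, -DELTA_SLIGHT),
    "Convergence":  (+DELTA_SLIGHT, +DELTA_SLIGHT),
    "Jumping out":  (-DELTA_LARGE, +DELTA_LARGE),
}

def adjust_coefficients(c1, c2, state):
    """Apply the state's increments, clamp each coefficient to [1.5, 2.5],
    then normalize so that c1 + c2 does not exceed 4.0 (Eq. 14)."""
    d1, d2 = STRATEGY[state]
    c1 = min(2.5, max(1.5, c1 + d1))
    c2 = min(2.5, max(1.5, c2 + d2))
    if c1 + c2 > 4.0:
        scale = 4.0 / (c1 + c2)
        c1, c2 = c1 * scale, c2 * scale
    return c1, c2

print(adjust_coefficients(2.0, 2.0, "Exploration"))
```

Starting from the initial setting c1 = c2 = 2, an exploration step raises c1 and lowers c2 symmetrically, keeping the sum at the 4.0 limit.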

The adaptation behavior of the acceleration coefficients c1 and c2 through the
change of states in the estimation of evolution state for the Zoo dataset is shown
in Figure 3.

[Figure: coefficient values of c1 and c2 (y-axis, 1.5 to 2.4) plotted against generation number (x-axis, 1 to 61)]

Figure 3. Adaptation of Acceleration Coefficients through EES for Zoo Dataset

4.5 Inertia Weight Adaptation


The inertia weight controls the impact of previous flying experience, which is
utilized to keep the balance between exploration and exploitation. The particle
adjusts its trajectory according to its best experience and profits from the
information of its neighbors. In addition, the inertia weight is also an important
convergence factor: the smaller the inertia weight, the faster the convergence of
PSO. A gradual linear decrease in the inertia weight may swerve the particles
from their global optima [13]. Hence a nonlinear adaptation of the inertia weight,
as given in equation 15, is the solution. The global best particle is derived based
on the fitness values of the particles in the swarm. The proposed methodology for
adapting the inertia weight is based on the fitness values exhibited by the
particles.
ω = ωmax − (ωmax − ωmin) × (f − fmin) / (favg − fmin)    (15)

where f, fmin and favg are fitness values determined by the current particles in the swarm: fmin is the minimum fitness of all particles in the swarm and favg is the mean fitness. As a result, all the information gathered, past and present, is utilized to update the future velocity. This also helps to deal with the problem of falling into a local optimum. The adaptation of inertia weight for the Zoo dataset over generations is given in Figure 4.


Figure 4. Inertia Weight Adaptation for Zoo dataset

From Figure 4, it can be seen that, in general, the inertia weight decreases approximately linearly, while in detail it is adapted dynamically based on the fitness information. At later generations the inertia weight oscillates around 0.461. In this region the algorithm can balance the searching and convergence ability in the search space, avoiding the risk of being trapped in local minima and improving the diversity of PSO.
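A minimal sketch of a fitness-based inertia weight update of this kind is given below. The exact functional form of equation 15 and the bounds ωmax = 0.9 and ωmin = 0.4 are assumptions here, and the result is clamped so the weight stays within those bounds.

```python
W_MAX, W_MIN = 0.9, 0.4   # assumed inertia weight bounds

def inertia_weight(f, f_min, f_avg):
    """Fitness-based inertia weight: particles with fitness near the swarm
    average get a smaller weight (faster convergence), weaker particles a
    larger one.  The exact form of equation 15 is an assumption."""
    if f_avg == f_min:                 # degenerate swarm with no fitness spread
        return W_MAX
    w = W_MAX - (W_MAX - W_MIN) * (f - f_min) / (f_avg - f_min)
    return min(max(w, W_MIN), W_MAX)   # clamp to [W_MIN, W_MAX]
```

Because the weight is recomputed from the current swarm's fitness statistics each generation, the decrease is nonlinear and data dependent rather than a fixed linear schedule.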
5. Experimental Results and Discussions

The objective of the adaptive PSO is to enhance the performance of association rule mining by making the adaptation dependent on the dataset used. To validate the performance of the proposed methodology, nine datasets from the University of California Irvine (UCI) repository [32] have been used for the study: Lenses, Haberman's Survival, Car Evaluation, Postoperative Patient, Zoo, Iris, Nursery, Tic Tac Toe and Wisconsin Breast Cancer.

The experiments were developed using Java and run in a Windows environment. All the results were obtained on a Pentium IV PC with a clock rate of 2.4 GHz and 512 MB of main memory. Each methodology was run 10 times on the chosen dataset, and the number of generations was fixed at 100. The datasets considered for the experiments are listed in Table 4.
Table 4. Datasets Description

Dataset                      Attributes  Instances  Attribute characteristics
Lenses                       4           24         Categorical
Car Evaluation               6           1728       Categorical, Integer
Habermans Survival           3           310        Integer
Post-operative Patient Care  8           87         Categorical, Integer
Zoo                          16          101        Categorical, Binary, Integer
Iris                         4           150        Real
Nursery                      8           12960      Categorical
Tic Tac Toe                  9           958        Categorical
Wisconsin Breast Cancer      32          569        Real

The first five datasets (Lenses, Car Evaluation, Habermans Survival, Post Operative Patient and Zoo) were used for analyzing the performance of APSO in comparison with simple PSO. The parameter setting for the APSO methodology is given in Table 5.
Table 5. Parameter Setting for APSO

Parameter Name               Parameter Value
Population Size              Lenses: 20, Habermans Survival: 300,
                             Car Evaluation: 1000, Postoperative Care: 50,
                             Zoo: 100
Number of Generations        100
c1, c2 (initial)             2
Inertia weight (wmax, wmin)  0.9, 0.4

The scope of this study on association rule mining using adaptive PSO is to
- mine association rules using adaptive PSO with multiple objectives, namely the accuracy and significance of the rules generated
- compare the performance with simple PSO (the non-adaptive methodology)
- analyze the performance over generations
- analyze the convergence of APSO in terms of execution time
- compare the performance with a similar EC methodology, namely Ant Colony Optimization


The APSO methodology is applied on the five datasets to mine association rules
and the predictive accuracy is recorded. The results are compared with the simple
PSO performance [20] implemented on the same platform with same datasets.
The APSO methodology for mining association rules adapts the acceleration coefficients based on the estimation of evolution state. The estimate factor (e) determines which of the four states (Exploration, Convergence, Exploitation and Jumping Out) the particle belongs to, and the acceleration coefficients are adjusted accordingly. The inertia weight is adapted based on the fitness values. The velocity update thus depends on the state in which the particle lies, which balances exploration and exploitation and so escapes premature convergence. The predictive accuracy of the association rules mined improves through the evolution process.
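The estimation-of-evolution-state step described above can be sketched as follows. The distance-based evolutionary factor follows Zhan et al. [52]; the crisp state thresholds and the function names are illustrative assumptions (the original EES uses fuzzy membership functions rather than hard cut-offs).

```python
import math

def mean_distance(i, positions):
    """Mean Euclidean distance from particle i to all other particles."""
    xi = positions[i]
    others = [p for j, p in enumerate(positions) if j != i]
    return sum(math.dist(xi, p) for p in others) / len(others)

def evolutionary_factor(positions, gbest_index):
    """Estimate factor e in [0, 1]: relative mean distance of the gBest
    particle.  Small e means the swarm crowds around gBest (convergence);
    large e means gBest is far from the crowd (jumping out)."""
    d = [mean_distance(i, positions) for i in range(len(positions))]
    d_min, d_max, d_g = min(d), max(d), d[gbest_index]
    if d_max == d_min:
        return 0.0
    return (d_g - d_min) / (d_max - d_min)

def classify_state(e):
    """Map e to one of the four evolutionary states (thresholds assumed)."""
    if e < 0.25:
        return "convergence"
    if e < 0.5:
        return "exploitation"
    if e < 0.75:
        return "exploration"
    return "jumping_out"
```

The classified state then selects the acceleration-coefficient adjustment for that generation, while the inertia weight is updated separately from the fitness statistics.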
The predictive accuracy for the adaptive PSO methodology over 100 generations
is shown in Figure 5.
Figure 5. Predictive Accuracy of Adaptive PSO over hundred Generations

The adaptive PSO's performance on the Car Evaluation and Zoo datasets is consistent, as seen from the figure above. The performance on the Haberman's Survival and Postoperative Patient datasets reaches its maximum at the end of evolution, which avoids premature convergence at the initial stages. For the Lenses dataset, where the dataset size is small, the global search space is maintained effectively, making convergence possible even at earlier iterations.
The Laplace, Conviction, Leverage and Lift measures of the five datasets for the adaptive PSO are recorded in Table 6.
Table 6. Multiobjective Measures of APSO

Measure     Lenses    Haberman's  Car         Postoperative  Zoo
                      Survival    Evaluation  Patient
Laplace     0.52941   0.501608    0.502488    0.5028         0.5024
Conviction  Infinity  Infinity    Infinity    Infinity       Infinity
Leverage    0.026     0.003039    0.002394    0.0301         0.0249
Lift        1.548     2.0458      1.9875      2.2478         2.4558

A Laplace measure away from 1 indicates that the antecedent values are dependent on the consequent values and hence that the rules generated are significant. The conviction measure, which is infinity for all datasets, shows that the rules generated are interesting. The leverage values, being greater than zero, also point to the interestingness of the rules generated. A lift greater than 1 for all the datasets signifies the dependency of the consequent on the antecedent; based on the lift measure, the rules generated by adaptive PSO are of importance.
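The four measures above can be computed directly from a rule's contingency counts. The formulas below are the standard definitions of these measures, shown here as an illustrative sketch (the function name and argument names are ours).

```python
def rule_measures(n, n_a, n_c, n_ac):
    """Standard interestingness measures for a rule A -> C:
    n     -- total number of transactions
    n_a   -- transactions containing the antecedent A
    n_c   -- transactions containing the consequent C
    n_ac  -- transactions containing both A and C
    """
    supp_a, supp_c, supp_ac = n_a / n, n_c / n, n_ac / n
    conf = n_ac / n_a
    laplace = (n_ac + 1) / (n_a + 2)             # Laplace-corrected confidence
    conviction = (float("inf") if conf == 1.0    # infinite for 100%-confident rules
                  else (1 - supp_c) / (1 - conf))
    leverage = supp_ac - supp_a * supp_c         # > 0 means positive dependence
    lift = conf / supp_c                         # > 1 means C depends on A
    return {"laplace": laplace, "conviction": conviction,
            "leverage": leverage, "lift": lift}
```

An infinite conviction, as reported in Table 6, therefore corresponds to rules whose confidence is exactly 1.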
The predictive accuracy achieved from the adaptive PSO methodology is
compared with the simple PSO as in Figure 6. The Adaptive PSO performs better
than the simple PSO for all the five datasets when applied for association rule
mining.

Figure 6. Predictive Accuracy comparison of Adaptive PSO with PSO

The number of rules generated by APSO whose predictive accuracy lies within 0.05 percent of the maximum predictive accuracy attained is plotted against the simple PSO methodology for each dataset in Figure 7. In terms of the number of rules generated, adaptive PSO performs better for the Haberman's Survival, Zoo and Car Evaluation datasets. The number of rules generated for the Lenses and Postoperative Patient datasets is only marginally better than simple PSO; as the number of instances in both datasets is small, the increase in the number of rules is minimal.



Figure 7. Comparison of No. of Rules Generated by APSO and PSO

The execution time of APSO for mining association rules from the datasets, in comparison with the PSO methodology, is given in Figure 8. This is the CPU time taken by the system to generate association rules, recorded at the iteration at which the predictive accuracy reaches its maximum. The execution time of APSO for mining association rules is longer than that of PSO: the adaptation of the acceleration coefficients based on the evolutionary factor, together with the inertia weight adaptation, increases the complexity of the algorithm and results in higher execution time.

Figure 8. Comparison of Execution Time of APSO and PSO

When the number of instances in a dataset is small, the increase in execution time is marginal. Thus the execution time differences for mining rules from the Lenses, Postoperative Patient and Zoo datasets are minimal for APSO compared to PSO. For the Haberman's Survival dataset, with a moderate number of instances, there is a noticeable increase in execution time, whereas for the Car Evaluation dataset the difference in execution time is large. The consistent performance of APSO in terms of predictive accuracy throughout evolution, and the predictive accuracy achieved by APSO for mining association rules, compensate for the difference in execution time.


The major drawback of simple PSO is premature convergence, where the particles fix some local optimum as the target (global search area) and all particles converge to it locally. One of the objectives of adaptive PSO is to avoid this premature convergence, thereby balancing exploration and exploitation. The iteration number at which the predictive accuracy is highest is plotted for both the APSO and PSO methodologies for mining association rules on each dataset in Figure 9.

Figure 9. Comparison of Convergence of APSO and PSO

The performance of the adaptive PSO on the five datasets has been compared with simple PSO. The adaptive PSO generated association rules with better predictive accuracy and qualitative measures.

The performance of APSO (data dependent methodology) is compared with two data
independent adaptation methodologies SAPSO1 and SAPSO2 [21] as given in Figure
10 and it is noted that the data dependent methodology of mining association rules
performs better than data independent methodologies in terms of predictive accuracy.

Figure 10. Predictive Accuracy Comparison of APSO and SAPSO


To test the significance of the proposed adaptive techniques on PSO, APSO is compared with Ant Colony Optimization (ACO) methods for mining rules. Four more datasets from the UCI repository, for which association rule mining using ACO has been reported in the literature, have been chosen. The predictive accuracy of the rules mined using APSO for these datasets was recorded. The comparative results obtained by APSO and ACO are shown in Table 7.
Table 7. Predictive Accuracy Comparison of ACO and APSO

Dataset
Lenses
Habermans
Survival
Car Evaluation
Postoperative Care
Zoo
Iris
Nursery
Tic Tac Toe
Wisconsin Breast
Cancer

98.1

TACO-Miner
Özbakır
et al [29]
-

Ant
Colony
Parpinelli
et al [34]
-

99.6

99.95
99.54
99.72
98.86
99.12
98.76

98
97.24
-

75.57

76.58

98.67
97.16
97.59

98.75

96.97

94.32

Adaptive
PSO

Ant Colony
Liu et al [30]
Ant Colony
Özbakır et al [28]
-

The adaptive PSO methodology generates association rules with better predictive
accuracy when compared to other methodologies analyzed.
The adaptive Particle Swarm Optimization methodology, when applied for mining association rules, generates rules with higher predictive accuracy and a larger number of rules, along with better rule set measures. The increase in the execution time of APSO compared to the simple PSO methodology is also minimal, taking into account the complexity of the methodology and the gain in accuracy achieved over PSO.
The balance between exploration and exploitation in setting the global search space is also attained while mining association rules with APSO for all the datasets used. This is noted from the shift, relative to simple PSO, in the iteration number at which the highest predictive accuracy is observed. Adapting the acceleration coefficients c1 (cognitive factor) and c2 (social factor) through the EES methodology makes the adjustments that fix the global search space effective with the help of the local optima, while the fitness values of the particles are used to adapt the inertia weight. Together these provide a better balance between exploration and exploitation as the particles fly through the search space.
6. Conclusion

Particle Swarm Optimization is a fast and efficient search method for optimization problems. Simple PSO's performance is problem dependent, and setting its control parameters is a non-trivial task. This paper proposes adaptive methods of parameter control for association rule mining using PSO. The fitness value plays a vital role in passing data over generations, and the adaptation mainly depends on the fitness values of the particles. In adaptive PSO the acceleration coefficients were adapted through the estimation of evolutionary state: in each generation, particles were set into one of the four evolutionary states, and based on the state the acceleration coefficients were altered. The inertia weight adaptation relied on the fitness values and on the maximum and minimum values set for the inertia weight.
When tested on five datasets from UCI repository APSO generated association
rules with better predictive accuracy and good rule set measures. The execution
time for mining association rules is found to be optimum taking into account the
accuracy achieved.
References
1. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Buneman P, Jajodia S (eds) Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, pp 207-216
2. Agrawal R, Srikant R (1994) Fast Algorithms for Mining Association Rules in Large Databases. In: Proceedings of the 20th International Conference on Very Large Data Bases, Morgan Kaufmann, pp 478-499
3. Amita Mahor, Saroj Rangnekar (2012) Short term generation scheduling of cascaded hydro electric system using novel self adaptive inertia weight PSO. International Journal of Electrical Power & Energy Systems 34(1): 1-9
4. Arumugam MS, Rao MVC (2008) On the improved performances of the particle swarm optimization algorithms with adaptive parameters, crossover operators and root mean square (RMS) variants for computing optimal control of a class of hybrid systems. Applied Soft Computing 8(1): 324-336
5. Back T (1992) Self-adaptation in genetic algorithms. In: Proceedings of the 1st European Conference on Artificial Life, pp 263-271
6. Bing Liu, Wynne Hsu, Yiming Ma (1999) Pruning and Summarizing the Discovered Associations. In: Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 125-134
7. Brin S, Motwani R, Ullman JD, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. In: Peckham J (ed) Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, pp 255-264
8. Burke EK, Hyde M, Kendall G et al. (2009) A survey of hyper-heuristics. Technical Report NOTTCS-TR-SUB-0906241418-2747, School of Computer Science and Information Technology, University of Nottingham
9. Carlisle A, Dozier G (2001) An off-the-shelf PSO. In: Proceedings on Particle Swarm Optimization. Indianapolis, Purdue School of Engineering and Technology, pp 1-6
10. Chatterjee A, Siarry P (2006) Nonlinear inertia weight variation for dynamic adaptation in particle swarm optimization. Computers and Operations Research 33: 859-871
11. Clark P, Boswell R (1991) Rule Induction with CN2: Some recent improvements. In: Proceedings of the European Working Session on Learning, pp 151-163
12. Deepa Shenoy P, Srinivasa KG, Venugopal KR, Lalit M. Patnaik (2003) Evolutionary Approach for Mining Association Rules on Dynamic Databases. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, LNAI 2637, Springer-Verlag, pp 325-336

13. Eberhart R, Kennedy J (1995) A new optimizer using particle swarm theory. In: Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Piscataway, NJ, IEEE Service Center, pp 39-43
14. Eiben AE, Michalewicz Z, Schoenauer M, Smith JE (2007) Parameter Control in Evolutionary Algorithms. In: Lobo FG, Lima CF, Michalewicz Z (eds) Parameter Setting in Evolutionary Algorithms, Springer, pp 19-46
15. Eiben AE, Smith JE (2003) Introduction to Evolutionary Computing. Springer-Verlag, Berlin Heidelberg New York
16. Engelbrecht AP (2006) Fundamentals of Computational Swarm Intelligence. John Wiley & Sons, New York
17. Good IJ (1965) The Estimation of Probabilities: An Essay on Modern Bayesian Methods. MIT Press, Cambridge
18. Han J, Kamber M (2001) Data Mining: Concepts and Techniques. Morgan Kaufmann, Burlington, Massachusetts
19. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of SIGMOD '00, Dallas, TX, May 2000, pp 1-12
20. Indira K, Kanmani S et al. (2012) Population Based Search Methods in Mining Association Rules. In: Third International Conference on Advances in Communication, Network, and Computing CNC 2012, LNCS, pp 255-261
21. Indira K, Kanmani S (2012) Association Rule Mining using Self Adaptive Particle Swarm Optimization. IJCA Special Issue on Computational Intelligence and Information Security CIIS: 27-31
22. Jacinto Mata Vázquez, José Luis Álvarez Macías, José Cristóbal Riquelme Santos (2002) Discovering Numeric association rules via evolutionary algorithm. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp 40-51
23. Jasper A Vrugt, Bruce A Robinson, James M Hyman (2009) Self-adaptive multimethod search for global optimization in real-parameter spaces. IEEE Transactions on Evolutionary Computation 13(2): 243-259
24. Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, pp 1942-1948
25. Kennedy J, Eberhart RC (2001) Swarm Intelligence. Morgan Kaufmann, Burlington, Massachusetts
26. Kennedy J, Mendes R (2002) Population structure and particle swarm performance. In: Proceedings of the 2002 Congress on Evolutionary Computation, pp 1671-1676
27. Kuo RJ, Chao CM, Chiu YT (2011) Application of Particle Swarm Optimization to Association Rule Mining. Applied Soft Computing 11(1): 326-336
28. Lale Ozbakir, Adil Baykasoglu, Sinem Kulluk (2008) Rule Extraction from Neural Networks Via Ant Colony Algorithm for Data Mining Applications. Lecture Notes in Computer Science 5313, pp 177-191
29. Lale Ozbakir, Adil Baykasoglu, Sinem Kulluk, Huseyin Yapıcı (2010) TACO-miner: An ant colony based algorithm for rule extraction from trained neural networks. Expert Systems with Applications 36(10): 12295-12305
30. Liu B, Abbass HA, McKay B (2003) Classification rule discovery with ant colony optimization. In: IEEE/WIC International Conference on Intelligent Agent Technology, 13-17 October, Halifax, Canada, pp 83-88
31. Mendes R, Kennedy J, Neves J (2004) The fully informed particle swarm: simpler, maybe better. IEEE Transactions on Evolutionary Computation 8: 204-210
32. Merz CJ, Murphy P (1996) UCI Repository of Machine Learning Databases, http://www.cs.uci.edu/~mlearn/MLRepository.html


33. Nannen V, Smit SK, Eiben AE (2008) Costs and benefits of tuning parameters of evolutionary algorithms. In: Proceedings of the 10th International Conference on Parallel Problem Solving from Nature, PPSN X, Dortmund, Germany, pp 528-538
34. Parpinelli RS, Lopes HS, Freitas AA (2002) Data mining with an ant colony optimization algorithm. IEEE Transactions on Evolutionary Computation 6(4): 321-332. Special issue on Ant Colony Algorithms
35. Parsopoulos KE, Vrahatis MN (2004) Unified particle swarm optimization scheme. In: Proceedings in Lecture Series on Computational Sciences, pp 868-873
36. Parsopoulos KE, Vrahatis MN (2007) Parameter selection and adaptation in unified particle swarm optimization. Mathematical and Computer Modelling 46: 198-213
37. Piatetsky-Shapiro G (1991) Discovery, analysis, and presentation of strong rules. In: Knowledge Discovery in Databases, AAAI/MIT Press, pp 229-248
38. Ratnaweera A, Halgamuge S, Watson H (2003) Particle Swarm Optimization with Self-adaptive Acceleration Coefficients. In: Proceedings of the First International Conference on Fuzzy Systems and Knowledge Discovery, pp 264-268
39. Ratnaweera A, Halgamuge SK, Watson HC (2004) Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients. IEEE Transactions on Evolutionary Computation 8(3): 240-255
40. Shi Y, Eberhart RC (1998) Parameter selection in particle swarm optimization. In: Proceedings of the 7th Conference on Evolutionary Programming, New York, pp 591-600
41. Shi Y, Eberhart RC (1999) Empirical Study of Particle Swarm Optimization. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp 1945-1950
42. Shi Y, Eberhart RC (2001) Fuzzy Adaptive Particle Swarm Optimization. In: Proceedings of the IEEE Congress on Evolutionary Computation, vol. 1, pp 101-106
43. Shing Wa Leung, Shiu Yin Yuen, Chi Kin Chow (2012) Parameter control system of evolutionary algorithm that is aided by the entire search history. Applied Soft Computing 12: 3063-3078
44. Song MP, Gu GC (2004) Research on particle swarm optimization: A review. In: Proceedings of the IEEE International Conference on Machine Learning and Cybernetics, pp 2236-2241
45. Tripathi PK, Bandyopadhyay KS, Pal SK (2007) Adaptive Multi-objective Particle Swarm Optimization Algorithm. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp 2281-2288
46. Xin J, Chen G, Hai Y (2009) A Particle Swarm Optimizer with Multistage Linearly-Decreasing Inertia Weight. In: International Joint Conference on Computational Sciences and Optimization, IEEE, pp 505-508
47. Yamaguchi T, Yasuda K (2006) Adaptive Particle Swarm Optimization: Self-coordinating Mechanism with Updating Information. In: Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Taipei, Taiwan, pp 2303-2308
48. Yau-Tarng Juang, Shen-Lung Tung, Hung-Chih Chiu (2011) Adaptive fuzzy particle swarm optimization for global optimization of multimodal functions. Information Sciences 181: 4539-4549
49. Ying Wang, Jianzhong Zhou, Chao Zhou, Yongqiang Wang, Hui Qin, Youlin Lu (2012) An improved self-adaptive PSO technique for short-term hydrothermal scheduling. Expert Systems with Applications 39(3): 2288-2295
50. Yu Wang, Bin Li, Thomas Weise, Jianyu Wang, Bo Yuan, Qiongjie Tian. Self-adaptive Learning based Particle Swarm Optimization. Information Sciences 181: 4515-4538
51. Zhan ZH, Xiao J, Zhang J, Chen WN (2007) Adaptive control of acceleration coefficients for particle swarm optimization based on clustering analysis. In: Proceedings of the IEEE Congress on Evolutionary Computation, Singapore, pp 3276-3282
52. Zhan ZH, Zhang J, Li Y et al. (2009) Adaptive particle swarm optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 39(6): 1362-1381


53. Chang CI, Chueh HE, Lin NP (2009) Sequential Patterns Mining with Fuzzy Time-Intervals. In: Sixth International Conference on Fuzzy Systems and Knowledge Discovery, pp 165-169
54. Ceglar A, Roddick JF (2006) Association mining. ACM Computing Surveys 38(2): 5

Annexure 1
UCI Datasets
The datasets that have been used in this research work have been taken from the
UCI machine learning repository. The UCI Machine Learning Repository is a
collection of databases, domain theories, and data generators that are used by the
machine learning community for the empirical analysis of machine learning
algorithms. The archive was created as an ftp archive in 1987 by David Aha and
fellow graduate students at UC Irvine. The details of the datasets used in this
study from http://archive.ics.uci.edu/ml/about.html are presented below.

Lenses Dataset
Number of Instances  : 24
Number of Attributes : 4 (all nominal)
Attribute Information: 3 Classes
  1: the patient should be fitted with hard contact lenses,
  2: the patient should be fitted with soft contact lenses,
  3: the patient should not be fitted with contact lenses.
  age of the patient: (1) young, (2) pre-presbyopic, (3) presbyopic
  spectacle prescription: (1) myope, (2) hypermetrope
  astigmatic: (1) no, (2) yes
  tear production rate: (1) reduced, (2) normal
Number of Missing Attribute Values: 0
Haberman's Survival Dataset
Number of Instances  : 306
Number of Attributes : 4 (including the class attribute)
Attribute Information:
  1. Age of patient at time of operation (numerical)
  2. Patient's year of operation (year - 1900, numerical)
  3. Number of positive axillary nodes detected (numerical)
  4. Survival status (class attribute)
     1 = the patient survived 5 years or longer
     2 = the patient died within 5 years
Missing Attribute Values: None
Car Evaluation Dataset
Number of Instances  : 1728 (instances completely cover the attribute space)
Number of Attributes : 6
Attribute Values     :
  buying   : v-high, high, med, low
  maint    : v-high, high, med, low
  doors    : 2, 3, 4, 5-more
  persons  : 2, 4, more
  lug_boot : small, med, big
  safety   : low, med, high
Missing Attribute Values: none
Post Operative Patient Dataset
Number of Instances  : 90
Number of Attributes : 9 including the decision (class attribute)
Attribute Information:
  1. L-CORE (patient's internal temperature in C):
     high (> 37), mid (>= 36 and <= 37), low (< 36)
  2. L-SURF (patient's surface temperature in C):
     high (> 36.5), mid (>= 35 and <= 36.5), low (< 35)
  3. L-O2 (oxygen saturation in %):
     excellent (>= 98), good (>= 90 and < 98),
     fair (>= 80 and < 90), poor (< 80)
  4. L-BP (last measurement of blood pressure):
     high (> 130/90), mid (<= 130/90 and >= 90/70), low (< 90/70)
  5. SURF-STBL (stability of patient's surface temperature):
     stable, mod-stable, unstable
  6. CORE-STBL (stability of patient's core temperature):
     stable, mod-stable, unstable
  7. BP-STBL (stability of patient's blood pressure):
     stable, mod-stable, unstable
  8. COMFORT (patient's perceived comfort at discharge, measured as
     an integer between 0 and 20)
  9. decision ADM-DECS (discharge decision):
     I (patient sent to Intensive Care Unit),
     S (patient prepared to go home),
     A (patient sent to general hospital floor)
Missing Attribute Values: Attribute 8 has 3 missing values
Zoo Dataset
Number of Instances  : 101
Number of Attributes : 18 (animal name, 15 Boolean attributes, 2 numerics)
Attribute Information: (name of attribute and type of value domain)
  1. animal name : Unique for each instance
  2. hair        : Boolean
  3. feathers    : Boolean
  4. eggs        : Boolean
  5. milk        : Boolean
  6. airborne    : Boolean
  7. aquatic     : Boolean
  8. predator    : Boolean
  9. toothed     : Boolean
  10. backbone   : Boolean
  11. breathes   : Boolean
  12. venomous   : Boolean
  13. fins       : Boolean
  14. legs       : Numeric (set of values: {0,2,4,5,6,8})
  15. tail       : Boolean
  16. domestic   : Boolean
  17. catsize    : Boolean
  18. type       : Numeric (integer values in range [1,7])
Missing Attribute Values: None

Iris Dataset
Number of Instances  : 150 (50 in each of three classes)
Number of Attributes : 4 numeric, predictive attributes and the class
Attribute Information: Sepal length, Sepal width, Petal length and
                       Petal width in cm
Class Types          : Iris Setosa, Iris Versicolour and Iris Virginica
Missing Attribute Values: None
Nursery Dataset
Number of Instances  : 12960 (instances completely cover the attribute space)
Number of Attributes : 8
Attribute Values     :
  parents  : usual, pretentious, great_pret
  has_nurs : proper, less_proper, improper, critical, very_crit
  form     : complete, completed, incomplete, foster
  children : 1, 2, 3, more
  housing  : convenient, less_conv, critical
  finance  : convenient, inconv
  social   : non-prob, slightly_prob, problematic
  health   : recommended, priority, not_recom
Missing Attribute Values: none

Tic Tac Toe Dataset
Number of Instances  : 958 (legal tic-tac-toe endgame boards)
Number of Attributes : 9, each corresponding to one tic-tac-toe square
Attribute Information: (x = player x has taken, o = player o has taken,
                       b = blank)
  1. top-left-square: {x,o,b}
  2. top-middle-square: {x,o,b}
  3. top-right-square: {x,o,b}
  4. middle-left-square: {x,o,b}
  5. middle-middle-square: {x,o,b}
  6. middle-right-square: {x,o,b}
  7. bottom-left-square: {x,o,b}
  8. bottom-middle-square: {x,o,b}
  9. bottom-right-square: {x,o,b}
  10. Class: {positive,negative}
Missing Attribute Values: None

Wisconsin Breast Cancer Dataset
Number of Instances  : 699
Number of Attributes : 10 plus the class attribute
Attribute Information: Sample code number, Clump Thickness, Uniformity of
                       Cell Size, Uniformity of Cell Shape, Marginal Adhesion,
                       Single Epithelial Cell Size, Bare Nuclei, Bland
                       Chromatin, Normal Nucleoli, Mitoses and Class
                       (2 for benign, 4 for malignant)
Missing Attribute Values: 16

