
International Journal on Computational Sciences & Applications (IJCSA) Vol.5, No.3, June 2015

A SERIAL COMPUTING MODEL OF AGENT ENABLED MINING OF GLOBALLY STRONG ASSOCIATION RULES
G. S. Bhamra¹, A. K. Verma² and R. B. Patel³
¹ M. M. University, Mullana, Haryana, 133207, India
² Thapar University, Patiala, Punjab, 147004, India
³ Chandigarh College of Engineering & Technology, Chandigarh, 160019, India

ABSTRACT
The intelligent agent based model is a popular approach to constructing Distributed Data Mining (DDM)
systems that address scalable mining over large scale and ever increasing distributed data. In an agent
based distributed system, a variety of agents coordinate and communicate with each other to perform the
various tasks of the Data Mining (DM) process. In this study, a serial computing model of a multi-agent
system (MAS) called Agent enabled Mining of Globally Strong Association Rules (AeMGSAR) is presented,
based on the serial itinerary of the mobile agents. A running environment is also designed for the
implementation and performance study of the AeMGSAR system.

KEYWORDS
Knowledge Discovery, Association Rules, Intelligent Agents, Multi-Agent System

1. INTRODUCTION
Data Mining (DM) techniques are used to extract interesting and valid data patterns implicitly
stored in large databases [1], [2]. Intelligent software agent technology is an interdisciplinary
technology dealing with the development and efficient utilization of autonomous software objects
called agents, which have access to geographically distributed and heterogeneous resources. Agents
are autonomous, adaptive, reactive, pro-active, social, cooperative, collaborative and flexible.
They also support temporal continuity and mobility within the network. An intelligent agent with
the mobility feature is known as a Mobile Agent (MA). An MA migrates from node to node in a
heterogeneous network without losing its operability. On reaching a network node, the MA is
delivered to an Agent Execution Environment (AEE), where its executable parts start running.
Upon completion of the desired task, it delivers the results to the home node. A Mobile Agent
Platform (MAP), or AEE, is a server application that provides the functionality MAs need to
authenticate, execute, communicate, migrate to other platforms and use system resources in a
secure way. A Multi-Agent System (MAS) is a distributed application comprising multiple
interacting intelligent agent components [3].
Let $DB = \{T_j, j = 1 \ldots D\}$ be a transactional dataset of size $D$, where each transaction $T$ is
assigned an identifier (TID), and let $I = \{d_i, i = 1 \ldots m\}$ be the set of $m$ data items in $DB$.
A set of items in a particular transaction $T$ is called an itemset or pattern. An itemset
$P = \{d_i, i = 1 \ldots k\}$, which is a set of $k$ data items in a particular transaction $T$ with
$P \subseteq I$, is called a k-itemset. The support of an itemset,

$$s(P) = \frac{\text{No\_of\_T\_containing\_P}}{D}\,\%,$$

is the frequency of occurrence of itemset $P$ in $DB$, where No_of_T_containing_P is the support count
(sup_count) of itemset $P$. Frequent Itemsets (FIs) are the itemsets that appear in $DB$ frequently,
i.e., if $s(P) \ge$ min_th_sup (the given minimum threshold support), then $P$ is a frequent k-itemset.
Finding such FIs plays an essential role in mining the interesting relationships among itemsets.
Frequent Itemset Mining (FIM) is the task of finding the set of all the subsets of FIs in a
transactional database [2].

DOI:10.5121/ijcsa.2015.5307

Association Rules (ARs) are used to discover the associations among items in a database [4]. An AR is
an implication of the form $P \Rightarrow Q$ [support, confidence], where $P \subset I$, $Q \subset I$ and
$P \cap Q = \emptyset$. An AR is measured in terms of its support and confidence factors, where the
support of the rule, $s(P \Rightarrow Q)$, is the probability of both $P$ and $Q$ appearing in $T$, i.e.,
$p(P \cup Q)$, and the confidence of the rule, $c(P \Rightarrow Q)$, is the conditional probability of
$Q$ given $P$, i.e., $p(Q \mid P)$. An AR is said to be strong if $s(P \Rightarrow Q) \ge$ min_th_sup (the
given minimum threshold support) and $c(P \Rightarrow Q) \ge$ min_th_conf (the given minimum threshold
confidence). Association Rule Mining (ARM) is today one of the most important DM tasks. In ARM, all
the strong ARs are generated from the FIs. ARM can be viewed as a two-step process [5], [6]:
1. Find all the frequent k-itemsets ($L_k$).
2. Generate strong ARs from $L_k$:
   a. For each frequent itemset $l \in L_k$, generate all non-empty subsets of $l$.
   b. For every non-empty subset $s$ of $l$, output the rule $s \Rightarrow (l - s)$ if
      $\frac{sup\_count(l)}{sup\_count(s)} \ge$ min_th_conf.
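The definitions of support and confidence above can be illustrated with a minimal Java sketch over a small in-memory transaction list. The class name `SupportConfidenceSketch`, its method names and the toy transactions are assumptions made for this illustration; they are not taken from the AeMGSAR implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SupportConfidenceSketch {

    // sup_count(P): number of transactions that contain every item of P.
    static int supCount(List<Set<String>> db, Set<String> itemset) {
        int count = 0;
        for (Set<String> t : db) {
            if (t.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    // s(P) as a percentage of the D transactions in db.
    static double support(List<Set<String>> db, Set<String> itemset) {
        return 100.0 * supCount(db, itemset) / db.size();
    }

    // c(P => Q) = sup_count(P u Q) / sup_count(P), as a percentage.
    // The caller is assumed to pass a P that occurs in at least one transaction.
    static double confidence(List<Set<String>> db, Set<String> p, Set<String> q) {
        Set<String> union = new HashSet<>(p);
        union.addAll(q);
        return 100.0 * supCount(db, union) / supCount(db, p);
    }

    static Set<String> items(String... xs) {
        return new HashSet<>(Arrays.asList(xs));
    }

    // Hypothetical toy dataset with D = 4 transactions.
    static List<Set<String>> toyDb() {
        return Arrays.asList(
            items("a", "b", "c"), items("a", "b"), items("a", "c"), items("b", "c"));
    }

    public static void main(String[] args) {
        List<Set<String>> db = toyDb();
        System.out.println("s({a,b}) = " + support(db, items("a", "b")) + "%");
        System.out.println("c(a => b) = " + confidence(db, items("a"), items("b")) + "%");
    }
}
```

In the toy dataset, {a, b} occurs in 2 of the 4 transactions and {a} in 3, so the rule a ⇒ b has a support of 50% and a confidence of 2/3.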

Distributed Association Rule Mining (DARM) is the task of generating the globally strong
association rules from the global FIs in a distributed environment. A few preliminary notations
and definitions, required for defining DARM and for making this study self-contained, are as follows:

$S = \{S_i, i = 1 \ldots n\}$, $n$ distributed sites.

$S_{CENTRAL}$, central site.

$DB_i = \{T_j, j = 1 \ldots D_i\}$, horizontally partitioned dataset of size $D_i$ at the local site $S_i$,
where each transaction $T_j$ is assigned an identifier (TID).

$DB = \bigcup_{i=1}^{n} DB_i$, the aggregated dataset of size $D = \sum_{i=1}^{n} D_i$, with
$DB_i \cap DB_j = \emptyset$ for $i \ne j$.

$I = \{d_i, i = 1 \ldots m\}$, total $m$ data items in each $DB_i$.

$L^{FI}_{k(i)}$, local frequent k-itemsets at site $S_i$.

$L^{FISC}_{k(i)}$, list of support counts of the itemsets in $L^{FI}_{k(i)}$.

$L^{LSAR}_{i}$, list of locally strong association rules at site $S_i$.

$L^{TLSAR} = \bigcup_{i=1}^{n} L^{LSAR}_{i}$, list of total locally strong association rules.

$L^{TFI}_{k} = \bigcup_{i=1}^{n} L^{FI}_{k(i)}$, list of total frequent k-itemsets.

$L^{GFI}_{k} = \bigcap_{i=1}^{n} L^{FI}_{k(i)}$, list of global frequent k-itemsets.

$L^{GSAR}_{CENTRAL}$, list of globally strong association rules.
CENTRAL

Local Knowledge Base (LKB), at site $S_i$, comprises $L^{FI}_{k(i)}$, $L^{FISC}_{k(i)}$ and $L^{LSAR}_{i}$,
which can provide reference to the local supervisor for local decisions. Global Knowledge Base (GKB),
at $S_{CENTRAL}$, comprises $L^{TLSAR}$, $L^{TFI}_{k}$, $L^{GFI}_{k}$ and $L^{GSAR}_{CENTRAL}$ for global
decision making [7]. Like ARM, the DARM task can also be viewed as a two-step process [6]:
1. Find the global frequent k-itemsets ($L^{GFI}_{k}$) from the distributed local frequent k-itemsets
   ($L^{FI}_{k(i)}$) obtained from the partitioned datasets.
2. Generate globally strong association rules ($L^{GSAR}_{CENTRAL}$) from $L^{GFI}_{k}$.
The existing agent based systems specifically dealing with the DARM task are: Knowledge
Discovery Management System (KDMS) [8], Efficient Distributed Data Mining using Intelligent
Agents [9], Mobile Agent based Distributed Data Mining [10], An Agent based Framework for
Association Rule Mining of Distributed Data (AFARMDD) [11], [12], and Multi-Agent Distributed
Association Rule Miner (MADARM) [13]. All these systems are academic research projects.
A qualitative comparison of these DARM frameworks is provided in [14]. Most of the existing
agent based frameworks for the DARM task are only prototype models and lack an appropriate
underlying AEE, scalability, privacy preserving techniques, global knowledge generation and
implementation using real datasets.
The rest of the paper is organised as follows. Section 2 describes the running environment for the
proposed system along with the various algorithms involved. The serial computing model of AeMGSAR
is presented in Section 3, where the algorithms for all the agents involved in this system are also
discussed. Section 4 describes the implementation and performance study of the system, and finally
the article is concluded in Section 5.

2. ENVIRONMENT FOR THE PROPOSED SYSTEM

Every MAS needs an underlying AEE to provide a running infrastructure on which agents can be
deployed and tested. A running environment has been designed in Java. Various attributes of the
MA are encapsulated within a data structure known as AgentProfile. It contains the name of the MA
(AgentName), the version number (AgentVersion), the entire byte code (BC), the list of nodes to be
visited by the MA, i.e., the itinerary plan ($L^{NODES}$), the type of the itinerary (ItinType), which
can be serial or parallel, a reference to the current execution state (AObject) and an additional data
structure known as Briefcase, which acts as the result bag of the MA and stores the final resultant
knowledge (Result_Si) at a particular site. The computational time (CPUTime) taken by an MA at a
particular site is also stored in Result_Si. In addition to results, Briefcase also contains the system
time at the start of the agent journey ($TripTime_{start}$), the system time at the end of the journey
($TripTime_{end}$) and the total round trip time of the MA (TripTime), calculated as
$TripTime = TripTime_{end} - TripTime_{start}$. The stationary and mobile agents involved in the model
are discussed later. This environment consists of the following three components:


Data Mining Agent Execution Environment (DM_AEE): It is the key component and acts as a
server. DM_AEE is deployed on every distributed site $S_i$ and is responsible for
receiving, executing and migrating all the visiting DM agents. It receives the incoming
AgentProfile at site $S_i$, retrieves the entire BC of the agent and saves it as
AgentName.class in the local file system of the site $S_i$; after that, execution of the agent is
started using AObject. The steps are shown in Algorithm 1.

Agent Launcher (AL): It acts as a client at the agent launching station ($S_{CENTRAL}$) and, on
behalf of the user, launches the goal oriented DM agents through a user interface to the
DM_AEE running at the distributed sites. The Agent Pool (or Zone) at $S_{CENTRAL}$ is a repository
of all mobile as well as stationary agents (SAs). AL first reads AgentName and stores it
in AgentProfile. The entire BC of AgentName is loaded from the Agent Pool and
stored in AgentProfile. $L^{NODES}$ and ItinType are retrieved and stored in AgentProfile.
$TripTime_{start}$ is maintained in Briefcase, which is then added to AgentProfile. In the case of
the serial computing model, i.e., if ItinType = Serial, AL dispatches a single specific MA
along with $L^{NODES}$, and it travels from node to node. AgentVersion is set to 1 for this
agent. AL also contacts the Result Manager (RM) for processing the Briefcase of an agent.
Detailed steps are given in Algorithm 2.

Result Manager (RM): It manages and processes the Briefcase of all MAs. RM is contacted either
by an MA for submitting its results or by AL for processing the results of a
specific MA. On completion of its itinerary, each DM agent submits its results to RM, which
computes the total round trip time (TripTime) of that MA and saves it in the Briefcase of that
agent. If ItinType = Serial, it saves the updated AgentProfile of the agent at $S_{CENTRAL}$.
When it is contacted by AL for processing the results of a specific agent, it sends back the
AgentProfile of that agent. The steps are defined in Algorithm 3.
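As a rough illustration of the AgentProfile and Briefcase structures described above, the following Java sketch declares one field per attribute named in the text. The field types, the use of `Runnable` for AObject and the choice to key Result_Si by site address are assumptions made for this sketch; the transport of the profile between sites is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Result bag carried by a mobile agent.
class Briefcase {
    long tripTimeStart;   // TripTime_start: system time at the start of the journey
    long tripTimeEnd;     // TripTime_end: system time at the end of the journey
    long tripTime;        // TripTime = TripTime_end - TripTime_start

    // Result_Si for each visited site; keying by site address is an assumption.
    final Map<String, Object> resultsBySite = new LinkedHashMap<>();

    // Round trip time, as computed by the Result Manager.
    void computeTripTime() {
        tripTime = tripTimeEnd - tripTimeStart;
    }
}

// Attributes of a mobile agent, one field per attribute listed in the text.
class AgentProfile {
    String agentName;                              // AgentName
    int agentVersion;                              // AgentVersion
    byte[] byteCode;                               // BC, entire byte code of the agent
    final List<String> nodes = new ArrayList<>();  // L^NODES, itinerary plan (IP addresses)
    String itinType;                               // ItinType: "Serial" or "Parallel"
    Runnable agentObject;                          // AObject, current execution state (type assumed)
    final Briefcase briefcase = new Briefcase();   // result bag
}
```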

Algorithm 1 DATA MINING AGENT EXECUTION ENVIRONMENT (DM_AEE)

procedure DM_AEE( )
    while TRUE do
        AgentProfile ← listen and receive AgentProfile at $S_i$
        AgentName ← get AgentName from AgentProfile
        BC ← retrieve the BC of agent from AgentProfile
        save the BC with AgentName.class in the local file system of $S_i$
        AObject ← get AObject from AgentProfile  ▷ current state
        AObject.run()  ▷ start executing mobile agent
    end while
end procedure

Algorithm 2 AGENT LAUNCHER (AL)

procedure AL( )
    option ← read option (dispatch / result)
    switch option do
        case dispatch  ▷ dispatch the mobile agent to DM_AEE
            AgentName ← read mobile agent's name
            add AgentName to AgentProfile
            BC ← load entire byte code of AgentName from AgentPool
            add BC to AgentProfile
            $L^{NODES}$ ← read itinerary (IP addresses) of mobile agent
            ItinType ← read ItinType (Serial / Parallel)
            add ItinType to AgentProfile
            if ItinType = "Serial" then  ▷ serial itinerary
                AgentVersion ← 1
                add AgentVersion to AgentProfile
                add $L^{NODES}$ to AgentProfile
                switch AgentName do
                    case LFIGA
                        min_th_sup ← read minimum threshold support
                        AObject ← new LFIGA(AgentProfile, min_th_sup)
                    end case
                    case LKGA
                        min_th_conf ← read minimum threshold confidence
                        AObject ← new LKGA(AgentProfile, min_th_conf)
                    end case
                    case TFICA
                        AObject ← new TFICA(AgentProfile)
                    end case
                    case LKCA
                        AObject ← new LKCA(AgentProfile)
                    end case
                    case GKDA
                        $L^{GSAR}_{CENTRAL}$ ← load $L^{GSAR}_{CENTRAL}$ generated by GKGA at $S_{CENTRAL}$
                        add $L^{GSAR}_{CENTRAL}$ to Briefcase
                        add updated Briefcase to AgentProfile
                        AObject ← new GKDA(AgentProfile)
                    end case
                end switch
                add AObject to AgentProfile  ▷ current state
                transfer AgentProfile to DM_AEE at first IP address in $L^{NODES}$
            end if
        end case
        case result  ▷ process the result of mobile agent
            AgentName ← read mobile agent's name
            ItinType ← read mobile agent's ItinType
            add AgentName to $L^{AgentInfo}$
            add ItinType to $L^{AgentInfo}$
            ▷ result processing for serial itinerary agents
            if ItinType = "Serial" then
                AgentProfile ← contact RM for $L^{AgentInfo}$
                Briefcase ← retrieve Briefcase from AgentProfile
                switch AgentName do
                    case LFIGA
                        process the Briefcase of LFIGA
                    end case
                    case LKGA
                        process the Briefcase of LKGA
                    end case
                    case TFICA
                        call GFIGA(Briefcase)  ▷ stationary agent
                    end case
                    case LKCA
                        call GKGA(Briefcase)  ▷ stationary agent
                    end case
                    case GKDA
                        process the Briefcase of GKDA
                    end case
                end switch
            end if
        end case
    end switch
end procedure

Algorithm 3 RESULT MANAGER (RM)

procedure RM( )
    while TRUE do
        listen and receive the incoming request
        if contacted by a mobile agent for submitting results from site $S_i$ then
            AgentProfile ← receive the incoming AgentProfile from site $S_i$
            ItinType ← retrieve ItinType from AgentProfile
            Briefcase ← retrieve mobile agent's Briefcase from AgentProfile
            $TripTime_{start}$ ← retrieve $TripTime_{start}$ from Briefcase
            $TripTime_{end}$ ← retrieve $TripTime_{end}$ from Briefcase
            TripTime ← $TripTime_{end} - TripTime_{start}$
            add TripTime to Briefcase
            add updated Briefcase to AgentProfile
            if ItinType = "Serial" then
                save AgentProfile at $S_{CENTRAL}$
            end if
        end if
        if contacted by AL for processing the results then
            AgentName ← retrieve AgentName from incoming $L^{AgentInfo}$
            ItinType ← retrieve ItinType from incoming $L^{AgentInfo}$
            if ItinType = "Serial" then
                AgentProfile ← load AgentProfile for AgentName from $S_{CENTRAL}$
                dispatch AgentProfile to AL
            end if
        end if
    end while
end procedure

The overall working of the AeMGSAR system may be divided into the following six stages:
1. Request Stage: The request for DARM is initiated at $S_{CENTRAL}$ by AL on behalf of the user
   with the necessary credentials.
2. Preparation Stage: AL, through the user interface, reads the agent name and version number;
   obtains the itinerary for the MA's journey as the IP addresses of the distributed nodes
   to be visited; obtains any additional data specific to that MA; and loads the agent
   code for that MA from AgentPool. For a serial itinerary, AL dispatches a single specific
   MA that travels to and visits the n distributed sites one after another.
3. Local Mining Stage: The ARM process is performed locally by specific DM agents at each
   distributed site, and the results are kept as the local knowledge base at that site.
4. Result Collection Stage: Collector agents visit each site, collect the results generated
   by the DM agents and submit them to RM at $S_{CENTRAL}$.
5. Knowledge Integration and Global Knowledge Generation Stage: Knowledge or result
   integration is carried out by the RM with the help of a stationary agent, and global
   knowledge in the form of globally strong association rules may be generated with the
   help of other stationary agents at $S_{CENTRAL}$.
6. Global Knowledge Dispatching Stage: Global knowledge is dispatched to the distributed
   sites by a dispatching agent so that it can be compared with the local knowledge at each site.

Figure 1. AeMGSAR Serial Computing Model


3. SERIAL COMPUTING MODEL OF AEMGSAR

The serial computing model of the AeMGSAR system is shown in Figure 1. It consists of seven
agents in total: five are MAs dispatched from $S_{CENTRAL}$ with a serial itinerary and multi-hop
migration, and the other two are intelligent SAs running at $S_{CENTRAL}$ to perform different tasks.
The CPU time taken by an MA while processing at each site, along with some other specific
information, is carried back in the result bag to $S_{CENTRAL}$. The agents numbered 1-5 below visit the
n sites serially; other parameters are collected from different resources. The relationships among
these agents and the working behaviour of each agent are as follows:
1. Local Frequent Itemset Generator Agent (LFIGA): This is an MA that carries the
   AgentProfile and min_th_sup. LFIGA generates and stores $L^{FI}_{k(i)}$ and $L^{FISC}_{k(i)}$ at site
   $S_i$ by scanning the local $DB_i$ at that site with the constraint of min_th_sup. It carries back
   the computational time (CPUTime) at each site $S_i$ and $TripTime_{end}$. This agent is embedded
   with the Apriori algorithm [15] for generating all the frequent k-itemset lists. It may be
   equipped with decision making capability to select other FIM algorithms based on the
   density of the dataset at a particular site. More details are available in Algorithm 4.
2. Local Knowledge Generator Agent (LKGA): This is an MA that carries the
   AgentProfile and min_th_conf. LKGA applies the constraint of min_th_conf to generate and
   store $L^{LSAR}_{i}$ by using the $L^{FI}_{k(i)}$ and $L^{FISC}_{k(i)}$ lists already generated by the
   LFIGA agent at site $S_i$. The $L^{LSAR}_{i}$ list also stores the support and confidence of each
   association rule along with the site name. The agent carries back the computational time
   (CPUTime) at each site $S_i$ and $TripTime_{end}$. Detailed steps are given in Algorithm 7.
3. Total Frequent Itemset Collector Agent (TFICA): This is an MA that carries the
   AgentProfile. TFICA collects the list of local frequent k-itemsets ($L^{FI}_{k(i)}$) generated by the
   LFIGA agent and carries back the list of total frequent k-itemsets ($L^{TFI}_{k}$) in the result bag
   to RM at $S_{CENTRAL}$. In addition to this resultant knowledge, it also carries back the
   computational time (CPUTime) at each site $S_i$ and $TripTime_{end}$. It executes Algorithm 8.
4. Local Knowledge Collector Agent (LKCA): This is an MA that carries the AgentProfile.
   LKCA collects the list of locally strong association rules ($L^{LSAR}_{i}$) generated by the LKGA
   agent and carries back the list of total locally strong association rules ($L^{TLSAR}$) in the result
   bag to RM at $S_{CENTRAL}$. In addition to this resultant knowledge, it also carries back the
   computational time (CPUTime) at each site $S_i$ and $TripTime_{end}$. The steps are shown in
   Algorithm 9.
5. Global Knowledge Dispatcher Agent (GKDA): This is an MA that carries the
   AgentProfile containing the global knowledge ($L^{GSAR}_{CENTRAL}$). It dispatches the global
   knowledge to every site for further decision making and for comparison with the local knowledge
   at that site. It executes Algorithm 12.
6. Global Frequent Itemset Generator Agent (GFIGA): It is a stationary agent at $S_{CENTRAL}$,
   mainly used for processing the result bag of TFICA, i.e., the total frequent k-itemset list
   ($L^{TFI}_{k}$) generated by TFICA, to generate the global frequent itemset list $L^{GFI}_{k}$. More
   details are available in Algorithm 10.
7. Global Knowledge Generator Agent (GKGA): It is also a stationary agent at $S_{CENTRAL}$,
   mainly used for processing the $L^{GFI}_{k}$ list and the $L^{TLSAR}$ list to compile the global
   knowledge, i.e., the list of globally strong association rules, $L^{GSAR}_{CENTRAL}$. Detailed steps
   are shown in Algorithm 11.

Algorithm 4 LOCAL FREQUENT ITEMSET GENERATOR AGENT (LFIGA)
Input:
    AgentProfile, a collection of agent attributes set by the AL
    min_th_sup, the given minimum threshold support
Output: $L^{FI\&SC}$, the list of frequent itemsets and their support counts

procedure LFIGA(AgentProfile, min_th_sup)
    $CPUTime_{start}$ ← get system time
    Briefcase ← get Briefcase from AgentProfile
    $DB_i$ ← load $DB_i$ from local file system of site $S_i$
    T ← $DB_i$.get(0)  ▷ No. of records
    I ← $DB_i$.get(1)  ▷ No. of items
    DB[T][I] ← $DB_i$.get(3)  ▷ itemset data bank
    minsupcount ← (T × min_th_sup) / 100
    ▷ generate frequent-1 itemset list ($FIL_1$) and support count list ($FISC_1$)
    $CFIL_1$ ← {1, 2, 3, ..., I}  ▷ candidate frequent-1 itemset
    for i ← 1, I do  ▷ initialize the support count array $SCFIL_1$ to zero
        $SCFIL_1$[i] ← 0
    end for
    k ← 1
    for all candidate c ∈ $CFIL_1$ do  ▷ find support count for every candidate
        for all transaction t ∈ DB do
            if c ⊆ t then
                $SCFIL_1$[k] ← $SCFIL_1$[k] + 1
            end if
        end for
        k ← k + 1
    end for
    ▷ prune $CFIL_1$ to generate $FIL_1$ and $FISC_1$
    for k ← 1, I do
        if $SCFIL_1$[k] ≥ minsupcount then
            add $c_k$ ∈ $CFIL_1$ to $FIL_1$
            add $SCFIL_1$[k] to $FISC_1$
        end if
    end for
    if $FIL_1$ ≠ ∅ then
        add $FIL_1$ to $L^{FI}$
        add $FISC_1$ to $L^{FISC}$
    end if
    k ← 2
    while $FIL_{k-1}$ ≠ ∅ do
        $CFIL_k$ ← call GenerateCFIL($FIL_{k-1}$)  ▷ see Algorithm 5
        for i ← 1, $CFIL_k$.length do  ▷ initialize the array $SCFIL_k$ to zero
            $SCFIL_k$[i] ← 0
        end for
        i ← 1
        for all candidate c ∈ $CFIL_k$ do  ▷ find support count for every candidate
            for all transaction t ∈ DB do  ▷ scan DB
                if c ⊆ t then
                    $SCFIL_k$[i] ← $SCFIL_k$[i] + 1
                end if
            end for
            i ← i + 1
        end for
        ▷ prune $CFIL_k$ to generate $FIL_k$ and $FISC_k$
        for i ← 1, $SCFIL_k$.length do
            if $SCFIL_k$[i] ≥ minsupcount then
                add $c_i$ ∈ $CFIL_k$ to $FIL_k$
                add $SCFIL_k$[i] to $FISC_k$
            end if
        end for
        if $FIL_k$ ≠ ∅ then
            add $FIL_k$ to $L^{FI}$
            add $FISC_k$ to $L^{FISC}$
        end if
        k ← k + 1
    end while
    add T to $L^{FI\&SC}$
    add $L^{FI}$ to $L^{FI\&SC}$
    add $L^{FISC}$ to $L^{FI\&SC}$
    save $L^{FI\&SC}$ in the local file system of this site $S_i$
    $CPUTime_{end}$ ← get system time
    CPUTime ← $CPUTime_{end} - CPUTime_{start}$
    add CPUTime to Result_Si
    add Result_Si to Briefcase
    add updated Briefcase to AgentProfile
    $L^{NODES}$ ← get itinerary list from AgentProfile
    $L^{NODES}$ ← remove first IP address from $L^{NODES}$  ▷ visited site
    add updated $L^{NODES}$ to AgentProfile
    if $L^{NODES}$ ≠ ∅ then  ▷ itinerary not empty
        AObject ← new LFIGA(AgentProfile, min_th_sup)
        add AObject to AgentProfile
        transfer AgentProfile to DM_AEE at first IP address in $L^{NODES}$
    else
        $TripTime_{end}$ ← get system time for end of agent journey
        add $TripTime_{end}$ to Briefcase
        add updated Briefcase to AgentProfile
        transfer AgentProfile to RM at $S_{CENTRAL}$
    end if
end procedure
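
The support counting and pruning pass that Algorithm 4 repeats for each level k can be sketched in Java as follows. This is a minimal illustration under stated assumptions: the class `SupportCountingSketch`, the representation of transactions as sets of integer item identifiers and the toy dataset are hypothetical and are not taken from the authors' implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SupportCountingSketch {

    // minsupcount = (T * min_th_sup) / 100, with min_th_sup given as a percentage.
    static double minSupCount(int numTransactions, double minThSupPercent) {
        return numTransactions * minThSupPercent / 100.0;
    }

    // Scans db once per candidate and keeps the candidates whose support
    // count reaches minSupCount, together with that count.
    static Map<Set<Integer>, Integer> frequent(List<Set<Integer>> db,
                                               List<Set<Integer>> candidates,
                                               double minSupCount) {
        Map<Set<Integer>, Integer> result = new LinkedHashMap<>();
        for (Set<Integer> c : candidates) {
            int count = 0;
            for (Set<Integer> t : db) {
                if (t.containsAll(c)) {
                    count++;
                }
            }
            if (count >= minSupCount) {
                result.put(c, count);
            }
        }
        return result;
    }

    static Set<Integer> set(Integer... xs) {
        return new TreeSet<>(Arrays.asList(xs));
    }

    // Hypothetical toy dataset with T = 5 transactions over items 1..3.
    static List<Set<Integer>> toyDb() {
        return Arrays.asList(set(1, 2, 3), set(1, 2), set(1, 3), set(2, 3), set(1));
    }

    public static void main(String[] args) {
        double threshold = minSupCount(5, 60.0);
        System.out.println(frequent(toyDb(), Arrays.asList(set(1), set(2), set(3)), threshold));
    }
}
```

With a 60% threshold over the five toy transactions, the minimum support count is 3, so all three 1-itemsets survive while every 2-itemset (each with a count of 2) would be pruned.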
Algorithm 5 GENERATECFIL
Input: $L_{k-1}$, frequent (k−1)-itemsets
Output: $C_k$, candidate frequent k-itemsets

procedure GENERATECFIL($L_{k-1}$)
    for all itemset $l_1$ ∈ $L_{k-1}$ do
        for all itemset $l_2$ ∈ $L_{k-1}$ do
            if ($l_1$[1] = $l_2$[1]) ∧ ($l_1$[2] = $l_2$[2]) ∧ ... ∧ ($l_1$[k−2] = $l_2$[k−2]) ∧ ($l_1$[k−1] < $l_2$[k−1]) then
                c ← $l_1 \bowtie l_2$  ▷ join step: generate candidates
                if HASINFREQUENTSUBSET(c, $L_{k-1}$) then  ▷ see Algorithm 6
                    delete c  ▷ prune step: remove unfruitful candidate
                else
                    add c to $C_k$
                end if
            end if
        end for
    end for
    return $C_k$
end procedure

Algorithm 6 HASINFREQUENTSUBSET
Input: c, candidate k-itemset; $L_{k-1}$, frequent (k−1)-itemsets
Output: TRUE if c has an infrequent (k−1)-subset, FALSE otherwise

procedure HASINFREQUENTSUBSET(c, $L_{k-1}$)
    for all (k−1)-subset s of c do
        if s ∉ $L_{k-1}$ then
            return TRUE
        end if
    end for
    return FALSE
end procedure
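
The join and prune steps of Algorithms 5 and 6 can be sketched in Java as shown below. The sketch follows the standard Apriori candidate generation and assumes that every itemset is stored as a list of item identifiers sorted in ascending order and that all itemsets passed in have the same length; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CandidateGenerationSketch {

    // Builds candidate k-itemsets from the frequent (k-1)-itemsets in prev.
    static List<List<Integer>> generateCandidates(List<List<Integer>> prev) {
        Set<List<Integer>> prevSet = new HashSet<>(prev);
        List<List<Integer>> candidates = new ArrayList<>();
        for (List<Integer> l1 : prev) {
            for (List<Integer> l2 : prev) {
                if (joinable(l1, l2)) {
                    // Join step: l1 extended with the last item of l2.
                    List<Integer> c = new ArrayList<>(l1);
                    c.add(l2.get(l2.size() - 1));
                    // Prune step: drop c if any (k-1)-subset is not frequent.
                    if (!hasInfrequentSubset(c, prevSet)) {
                        candidates.add(c);
                    }
                }
            }
        }
        return candidates;
    }

    // True when the first k-2 items agree and the last item of l1 is smaller.
    static boolean joinable(List<Integer> l1, List<Integer> l2) {
        int last = l1.size() - 1;
        for (int i = 0; i < last; i++) {
            if (!l1.get(i).equals(l2.get(i))) {
                return false;
            }
        }
        return l1.get(last) < l2.get(last);
    }

    // True as soon as one (k-1)-subset of c is missing from prevSet.
    static boolean hasInfrequentSubset(List<Integer> c, Set<List<Integer>> prevSet) {
        for (int skip = 0; skip < c.size(); skip++) {
            List<Integer> subset = new ArrayList<>(c);
            subset.remove(skip);  // removes the element at index skip
            if (!prevSet.contains(subset)) {
                return true;
            }
        }
        return false;
    }
}
```

For the frequent 2-itemsets {1,2}, {1,3}, {2,3} and {2,4}, the join step proposes {1,2,3} and {2,3,4}; the second is pruned because its subset {3,4} is not frequent.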

Algorithm 7 LOCAL KNOWLEDGE GENERATOR AGENT (LKGA)
Input:
    AgentProfile, a collection of agent attributes set by the AL
    min_th_conf, the given minimum threshold confidence
Output: $L^{LSAR}$, the list of locally strong association rules

procedure LKGA(AgentProfile, min_th_conf)
    $CPUTime_{start}$ ← get system time
    Briefcase ← get Briefcase from AgentProfile
    $L^{FI\&SC}$ ← load $L^{FI\&SC}$ from local file system of this site $S_i$
    T ← $L^{FI\&SC}$.get(0)  ▷ No. of records
    $L^{FI}$ ← $L^{FI\&SC}$.get(1)  ▷ frequent k-itemset list
    $L^{FISC}$ ← $L^{FI\&SC}$.get(2)  ▷ support count list
    for k ← 2, $L^{FI}$.size do
        $L_k$ ← $L^{FI}$.get(k)  ▷ get frequent k-itemset list
        for all l ∈ $L_k$ do
            $l_{subsets}$ ← generate all non-empty subsets of l
            $l_{spcount}$ ← get support count of l from $L^{FISC}$
            $AR_{support}$ ← ($l_{spcount}$ / T) × 100  ▷ support of the association rule
            for all non-empty subset s ∈ $l_{subsets}$ do
                $s_{spcount}$ ← get support count of s from $L^{FISC}$
                $AR_{conf}$ ← ($l_{spcount}$ / $s_{spcount}$) × 100  ▷ confidence of the association rule
                if $AR_{conf}$ ≥ min_th_conf then
                    $AR_{strong}$ ← "s ⇒ l − s [$AR_{support}$ %, $AR_{conf}$ %]"
                    print $AR_{strong}$
                    add l to $AR_{strong}$
                    $S_{iIP}$ ← get IP address of this site $S_i$
                    add $S_{iIP}$ to $AR_{strong}$
                    add $AR_{strong}$ to $L^{LSAR}$
                end if
            end for
        end for
    end for
    save $L^{LSAR}$ in the local file system of this site $S_i$
    $CPUTime_{end}$ ← get system time
    CPUTime ← $CPUTime_{end} - CPUTime_{start}$
    add CPUTime to Result_Si
    add Result_Si to Briefcase
    add updated Briefcase to AgentProfile
    $L^{NODES}$ ← get itinerary list from AgentProfile
    $L^{NODES}$ ← remove first IP address from $L^{NODES}$  ▷ visited site
    add updated $L^{NODES}$ to AgentProfile
    if $L^{NODES}$ ≠ ∅ then  ▷ itinerary not empty
        AObject ← new LKGA(AgentProfile, min_th_conf)
        add AObject to AgentProfile
        transfer AgentProfile to DM_AEE at first IP address in $L^{NODES}$
    else
        $TripTime_{end}$ ← get system time for end of agent journey
        add $TripTime_{end}$ to Briefcase
        add updated Briefcase to AgentProfile
        transfer AgentProfile to RM at $S_{CENTRAL}$
    end if
end procedure
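
The rule generation loop at the core of Algorithm 7 can be illustrated with the following Java sketch. It is a simplified reading under stated assumptions: the class `RuleGenerationSketch` and the map from itemsets to support counts are hypothetical, and the sketch enumerates only non-empty proper subsets of l so that the consequent l − s is never empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RuleGenerationSketch {

    // Returns the rules "s => l - s [support%, confidence%]" derived from the
    // frequent itemset l whose confidence reaches minThConfPercent.
    static List<String> strongRules(Set<String> l, Map<Set<String>, Integer> supCounts,
                                    int numTransactions, double minThConfPercent) {
        List<String> rules = new ArrayList<>();
        List<String> items = new ArrayList<>(l);
        int lCount = supCounts.get(l);
        double support = 100.0 * lCount / numTransactions;
        // Each bitmask from 1 to 2^n - 2 selects a non-empty proper subset s of l.
        for (int mask = 1; mask < (1 << items.size()) - 1; mask++) {
            Set<String> s = new TreeSet<>();
            Set<String> rest = new TreeSet<>();
            for (int i = 0; i < items.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    s.add(items.get(i));
                } else {
                    rest.add(items.get(i));
                }
            }
            // confidence = sup_count(l) / sup_count(s), as a percentage
            double conf = 100.0 * lCount / supCounts.get(s);
            if (conf >= minThConfPercent) {
                rules.add(s + " => " + rest + " [" + support + "%, " + conf + "%]");
            }
        }
        return rules;
    }

    static Set<String> set(String... xs) {
        return new TreeSet<>(Arrays.asList(xs));
    }
}
```

The sketch relies on every subset of a frequent itemset having an entry in the support count map, which holds for lists produced by a level-wise miner because all subsets of a frequent itemset are themselves frequent.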

Algorithm 8 TOTAL FREQUENT ITEMSET COLLECTOR AGENT (TFICA)
Input: AgentProfile, a collection of agent attributes set by the AL
Output: $L^{FI}$, the list of locally frequent itemsets

procedure TFICA(AgentProfile)
    $CPUTime_{start}$ ← get system time
    Briefcase ← get Briefcase from AgentProfile
    $L^{FI\&SC}$ ← load $L^{FI\&SC}$ from local file system of this site $S_i$
    $L^{FI}$ ← $L^{FI\&SC}$.get(1)  ▷ frequent k-itemset list
    add $L^{FI}$ to Result_Si
    $CPUTime_{end}$ ← get system time
    CPUTime ← $CPUTime_{end} - CPUTime_{start}$
    add CPUTime to Result_Si
    add Result_Si to Briefcase
    add updated Briefcase to AgentProfile
    $L^{NODES}$ ← get itinerary list from AgentProfile
    $L^{NODES}$ ← remove first IP address from $L^{NODES}$  ▷ visited site
    add updated $L^{NODES}$ to AgentProfile
    if $L^{NODES}$ ≠ ∅ then  ▷ itinerary not empty
        AObject ← new TFICA(AgentProfile)
        add AObject to AgentProfile
        transfer AgentProfile to DM_AEE at first IP address in $L^{NODES}$
    else
        $TripTime_{end}$ ← get system time for end of agent journey
        add $TripTime_{end}$ to Briefcase
        add updated Briefcase to AgentProfile
        transfer AgentProfile to RM at $S_{CENTRAL}$
    end if
end procedure

Algorithm 9 LOCAL KNOWLEDGE COLLECTOR AGENT (LKCA)
Input: AgentProfile, a collection of agent attributes set by the AL
Output: $L^{LSAR}$, the list of locally strong association rules

procedure LKCA(AgentProfile)
    $CPUTime_{start}$ ← get system time
    Briefcase ← get Briefcase from AgentProfile
    $L^{LSAR}$ ← load $L^{LSAR}$ from local file system of this site $S_i$
    add $L^{LSAR}$ to Result_Si
    $CPUTime_{end}$ ← get system time
    CPUTime ← $CPUTime_{end} - CPUTime_{start}$
    add CPUTime to Result_Si
    add Result_Si to Briefcase
    add updated Briefcase to AgentProfile
    $L^{NODES}$ ← get itinerary list from AgentProfile
    $L^{NODES}$ ← remove first IP address from $L^{NODES}$  ▷ visited site
    add updated $L^{NODES}$ to AgentProfile
    if $L^{NODES}$ ≠ ∅ then  ▷ itinerary not empty
        AObject ← new LKCA(AgentProfile)
        add AObject to AgentProfile
        transfer AgentProfile to DM_AEE at first IP address in $L^{NODES}$
    else
        $TripTime_{end}$ ← get system time for end of agent journey
        add $TripTime_{end}$ to Briefcase
        add updated Briefcase to AgentProfile
        transfer AgentProfile to RM at $S_{CENTRAL}$
    end if
end procedure

Algorithm 10 GLOBAL FREQUENT ITEMSET GENERATOR AGENT (GFIGA)
Input: Briefcase, result bag of TFICA agent
Output: $L^{GFI}$, the list of global frequent itemsets

procedure GFIGA(Briefcase)
    $CPUTime_{start}$ ← get system time
    $L^{TFI}$ ← retrieve total frequent itemsets ($\bigcup_{i=1}^{n} L^{FI}_{i}$) from Briefcase
    $L^{GFI}$ ← retrieve global frequent itemsets ($\bigcap_{i=1}^{n} L^{FI}_{i}$) from Briefcase
    print $L^{GFI}$
    save $L^{GFI}$ in the local file system of site $S_{CENTRAL}$
    $CPUTime_{end}$ ← get system time
    CPUTime ← $CPUTime_{end} - CPUTime_{start}$
    print CPUTime
    return $L^{GFI}$
end procedure
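
The two set operations used by Algorithm 10, the union that yields the total frequent itemsets and the intersection that yields the global frequent itemsets, can be sketched in Java as follows. The class `GlobalFrequentItemsetsSketch` and the use of short strings as stand-ins for itemsets are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class GlobalFrequentItemsetsSketch {

    // L^TFI: every itemset that is frequent at one site or more.
    static <T> Set<T> union(List<Set<T>> perSite) {
        Set<T> result = new LinkedHashSet<>();
        for (Set<T> site : perSite) {
            result.addAll(site);
        }
        return result;
    }

    // L^GFI: only the itemsets that are frequent at every site.
    static <T> Set<T> intersection(List<Set<T>> perSite) {
        if (perSite.isEmpty()) {
            return new LinkedHashSet<>();
        }
        Set<T> result = new LinkedHashSet<>(perSite.get(0));
        for (Set<T> site : perSite.subList(1, perSite.size())) {
            result.retainAll(site);
        }
        return result;
    }

    static Set<String> set(String... xs) {
        return new LinkedHashSet<>(Arrays.asList(xs));
    }

    public static void main(String[] args) {
        // Hypothetical local frequent itemset lists of three sites.
        List<Set<String>> sites = Arrays.asList(
            set("ab", "ac", "bc"), set("ab", "bc"), set("ab", "bc", "cd"));
        System.out.println("total:  " + union(sites));
        System.out.println("global: " + intersection(sites));
    }
}
```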

Algorithm 11 GLOBAL KNOWLEDGE GENERATOR AGENT (GKGA)
Input: Briefcase, result bag of LKCA agent
Output: $L^{GSAR}_{CENTRAL}$, the list of globally strong association rules

procedure GKGA(Briefcase)
    $CPUTime_{start}$ ← get system time
    $L^{TLSAR}$ ← retrieve total strong rules ($\bigcup_{i=1}^{n} L^{LSAR}_{i}$) from Briefcase
    $L^{GFI}$ ← load global frequent itemsets ($L^{GFI}$) from $S_{CENTRAL}$
    for all $AR_{strong}$ ∈ $L^{TLSAR}$ do
        L ← get frequent itemset from $AR_{strong}$
        if L ∈ $L^{GFI}$ then
            print $AR_{strong}$ along with the site address ($S_{iIP}$)
            add $AR_{strong}$ to $L^{GSAR}_{CENTRAL}$
        end if
    end for
    save $L^{GSAR}_{CENTRAL}$ in the local file system of site $S_{CENTRAL}$
    $CPUTime_{end}$ ← get system time
    CPUTime ← $CPUTime_{end} - CPUTime_{start}$
    print CPUTime
    return $L^{GSAR}_{CENTRAL}$
end procedure
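
The filtering loop of Algorithm 11 keeps a locally strong rule only when the itemset it was derived from is globally frequent. A minimal Java sketch of that loop is given below; the `Rule` class, its fields and the class name `GlobalRuleFilterSketch` are hypothetical and chosen only to mirror the information that Algorithm 7 attaches to each rule (rule text, source itemset and site address).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GlobalRuleFilterSketch {

    // A locally strong rule with the itemset it was derived from and its site.
    static final class Rule {
        final Set<String> itemset;
        final String text;
        final String siteIp;

        Rule(Set<String> itemset, String text, String siteIp) {
            this.itemset = itemset;
            this.text = text;
            this.siteIp = siteIp;
        }
    }

    // Keeps only the rules whose source itemset appears in the global frequent list.
    static List<Rule> globallyStrong(List<Rule> totalLocalRules,
                                     Set<Set<String>> globalFrequent) {
        List<Rule> result = new ArrayList<>();
        for (Rule r : totalLocalRules) {
            if (globalFrequent.contains(r.itemset)) {
                result.add(r);
            }
        }
        return result;
    }

    static Set<String> items(String... xs) {
        return new HashSet<>(Arrays.asList(xs));
    }
}
```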

Algorithm 12 GLOBAL KNOWLEDGE DISPATCHER AGENT (GKDA)
Input: AgentProfile, a collection of agent attributes set by the AL
Output: Dispatch $L^{GSAR}_{CENTRAL}$ at each distributed site $S_i$

procedure GKDA(AgentProfile)
    $CPUTime_{start}$ ← get system time
    Briefcase ← get Briefcase from AgentProfile
    $L^{GSAR}_{CENTRAL}$ ← get $L^{GSAR}_{CENTRAL}$ from Briefcase
    save $L^{GSAR}_{CENTRAL}$ in the local file system of site $S_i$
    $CPUTime_{end}$ ← get system time
    CPUTime ← $CPUTime_{end} - CPUTime_{start}$
    add CPUTime to Result_Si
    add Result_Si to Briefcase
    add updated Briefcase to AgentProfile
    $L^{NODES}$ ← get itinerary list from AgentProfile
    $L^{NODES}$ ← remove first IP address from $L^{NODES}$  ▷ visited site
    add updated $L^{NODES}$ to AgentProfile
    if $L^{NODES}$ ≠ ∅ then  ▷ itinerary not empty
        AObject ← new GKDA(AgentProfile)
        add AObject to AgentProfile
        transfer AgentProfile to DM_AEE at first IP address in $L^{NODES}$
    else
        $TripTime_{end}$ ← get system time for end of agent journey
        add $TripTime_{end}$ to Briefcase
        add updated Briefcase to AgentProfile
        transfer AgentProfile to RM at $S_{CENTRAL}$
    end if
end procedure
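
All five mobile agents (Algorithms 4, 7, 8, 9 and 12) end with the same serial itinerary step: remove the site just visited from the itinerary, then either migrate to the next site or return to RM. A minimal Java sketch of that decision is shown below. The class `SerialItinerarySketch` and the use of an empty `Optional` to signal "return to RM" are assumptions of this sketch; the actual network transfer is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class SerialItinerarySketch {

    // Removes the site that has just been visited (the head of the itinerary)
    // and returns the next site to migrate to. An empty result means the
    // itinerary is exhausted and the agent should report to RM at the central site.
    static Optional<String> advance(List<String> nodes) {
        if (!nodes.isEmpty()) {
            nodes.remove(0);
        }
        return nodes.isEmpty() ? Optional.empty() : Optional.of(nodes.get(0));
    }

    public static void main(String[] args) {
        // Site addresses taken from Table 1; the visiting order is assumed.
        List<String> itinerary = new ArrayList<>(
            Arrays.asList("192.168.46.212", "192.168.46.189", "192.168.46.213"));
        Optional<String> next = advance(itinerary);
        while (next.isPresent()) {
            System.out.println("migrate to " + next.get());
            next = advance(itinerary);
        }
        System.out.println("itinerary complete, return to RM");
    }
}
```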


Figure 2. Control Panel of AeMGSAR

4.IMPLEMENTATION AND PERFORMANCE STUDY


All the agents, as well as the control panel shown in Figure 2, are implemented in Java. Synthetic
datasets ( DBi ), generated with the Transactional Data Set Generator (TDSG) tool [16], are stored
at three distributed sites S1, S2 and S3 with 3500, 3850 and 3900 transactions, respectively, each
over 10 items. Binary and transactional versions of these datasets are shown in Appendix A. The
required configuration of the system is shown in Table 1, with DM_AEE additionally deployed at
each distributed site and AL and RM deployed at SCENTRAL. The round trip time taken by the various
MAs is shown in Figure 3. The CPU time consumed by the various MAs at sites S1, S2 and S3 is shown
in Figure 4, Figure 5 and Figure 6, respectively. The CPU time for GFIGA and GKGA is 101357102
nanoseconds and 33317458 nanoseconds (about 101.4 ms and 33.3 ms), respectively. $L^{FI}_{k(i)}$
and $L^{FISC}_{k(i)}$ generated at the distributed sites by the LFIGA agent with 20% min_th_sup are
shown in Appendix B.1, B.2 and B.3. $L^{LSAR}_{i}$ generated at the distributed sites by the LKGA
agent with 50% min_th_conf are shown in Appendix B.4, B.5 and B.6. Globally frequent itemsets
generated by GFIGA at SCENTRAL are shown in Figure 7. Fifteen 2-itemsets and eight 3-itemsets are
globally frequent in the $L^{TFI}_{k}$ list, whereas the 4-, 5- and 6-itemsets, which are locally
frequent, are not globally frequent. Globally strong association rules ( $L^{GSAR}_{CENTRAL}$ )
generated by GKGA at SCENTRAL for globally frequent 3-itemsets are shown in Figure 8, and
$L^{GSAR}_{CENTRAL}$ for 2-itemsets are shown in Appendix B.7.
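
To make the 20% support threshold concrete, the following sketch converts it into a minimum support count for each of the three dataset sizes. It assumes the count is the ceiling of the threshold times the number of transactions; that rounding rule, and the names SupportThreshold and minSupportCount, are assumptions made for illustration rather than details taken from the system.

```java
// Hypothetical helper: turns a percentage support threshold into a transaction count.
final class SupportThreshold {

    // Smallest number of transactions an itemset must occur in to reach
    // the given percentage of the dataset (integer ceiling, no floating point).
    static long minSupportCount(long transactions, int percent) {
        return (transactions * percent + 99) / 100;
    }

    public static void main(String[] args) {
        long[] sizes = {3500, 3850, 3900};   // dataset sizes at S1, S2 and S3
        for (int i = 0; i < sizes.length; i++) {
            System.out.println("S" + (i + 1) + ": " + minSupportCount(sizes[i], 20));
        }
        // prints:
        // S1: 700
        // S2: 770
        // S3: 780
    }
}
```

Under this assumption an itemset would need to occur in at least 700, 770 and 780 transactions to be locally frequent at S1, S2 and S3, respectively.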

Compared with the traditional central data warehouse (DW) based approach for ARM, where the entire
data from the distributed sites is collected centrally in a DW [17], this system reduces the
storage cost because data is mined locally and only the resultant knowledge is carried to the
central site by mobile agents. Because the resultant data carried across the network by mobile
agents is small, the network communication cost is also reduced. Data mining is performed locally
by agents, so the computational cost at the central site is also minimised. AeMGSAR reflects the
global knowledge because all the strong association rules generated are also strong at each
distributed site. The system relies on Java's built-in security system. Because MAs are scalable in
nature, performance is not expected to be affected by adding more sites.

Table 1. Network Configuration

Site Name   Processor   OS       LAN Configuration
                                 IP (a)            Network
SCENTRAL    Intel (b)   MS (c)   192.168.46.5      NW (d)
S1          Intel (b)   MS (c)   192.168.46.212    NW (d)
S2          Intel (b)   MS (c)   192.168.46.189    NW (d)
S3          Intel (b)   MS (c)   192.168.46.213    NW (d)

a. IP address with Mask: 255.255.255.0 and Gateway 192.168.46.1
b. Intel Pentium Dual Core (3.40 GHz, 3.40 GHz) with 512 MB RAM
c. Microsoft Windows XP Professional ver. 2002
d. Network Speed: 100 Mbps and Network Adaptor: 82566DM-2 Gigabit NIC

Figure 3. Round Trip time taken by various MAs

Figure 4. CPU Time taken by various MAs at site S1



Figure 5. CPU Time taken by various MAs at site S2

Figure 6. CPU Time taken by various MAs at site S3

Figure 7. Lists of global frequent k-itemsets at SCENTRAL



Figure 8. Globally strong association rules for globally frequent 3-itemsets

5.CONCLUSION
Mobile agents are well suited to designing distributed applications, and the amalgamation of
DDM and agent technology gives favourable results. Most of the existing agent based
frameworks for the DARM task are only prototype models and lack an appropriate underlying
execution environment, scalability, privacy preserving techniques, global knowledge generation
and implementation using real datasets. In this study, a scalable MAS, called Agent enabled
Mining of Globally Strong Association Rules (AeMGSAR), is presented based on the serial
itinerary of the mobile agents. In this system the overall task of mining the globally strong
association rules is divided into subtasks which are handled by various mobile as well as
stationary agents. An AEE is also designed for the implementation and performance study of the
AeMGSAR system. The serial itinerary used for mobile agent migration increases the overall cost of
the DARM task, so a parallel computing model could be designed in which clones of each mobile agent
are dispatched in parallel to all distributed sites.

REFERENCES
[1] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth & R. Uthurusamy, (1996) Advances in Knowledge
Discovery and Data Mining, AAAI/MIT Press.
[2] J. Han & M. Kamber, (2006) Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann.
[3] G. S. Bhamra, R. B. Patel & A. K. Verma, (2014) Intelligent Software Agent Technology: An
Overview, International Journal of Computer Applications (IJCA), vol. 89, no. 2, pp. 19-31.
[4] R. Agrawal, T. Imielinski & A. Swami, (1993) Mining association rules between sets of items in large
databases, in Proceedings of the ACM-SIGMOD International Conference of Management of Data,
pp. 207-216.
[5] R. Agrawal & J. C. Shafer, (1996) Parallel mining of association rules, IEEE Transactions on
Knowledge and Data Engineering, vol. 8, no. 6, pp. 962-969.
[6] M. J. Zaki, (1999) Parallel and distributed association mining: a survey, IEEE Concurrency, vol. 7,
no. 4, pp. 14-25.
[7] X. Wu & S. Zhang, (2003) Synthesizing high-frequency rules from different data sources, IEEE
Transactions on Knowledge and Data Engineering, vol. 15, no. 2, pp. 353-367.
[8] Y.-L. Wang, Z.-Z. Li & H.-P. Zhu, (2003) Mobile agent based distributed and incremental techniques
for association rules, in Proceedings of the International Conference on Machine Learning and
Cybernetics (ICMLC 2003), vol. 1, pp. 266-271.
[9] C. Aflori & F. Leon, (2004) Efficient Distributed Data Mining using Intelligent Agents, in
Proceedings of the 8th International Symposium on Automatic Control and Computer Science, pp. 1-6.
[10] U. P. Kulkarni, P. D. Desai, T. Ahmed, J. V. Vadavi & A. R. Yardi, (2007) Mobile Agent Based
Distributed Data Mining, in Proceedings of the International Conference on Computational
Intelligence and Multimedia Applications (ICCIMA 2007), IEEE Computer Society, pp. 18-24.
[11] G. Hu & S. Ding, (2009a) An Agent-Based Framework for Association Rules Mining of Distributed
Data, in Software Engineering Research, Management and Applications 2009, ser. Studies in
Computational Intelligence, R. Lee and N. Ishii, Eds. Springer Berlin - Heidelberg, vol. 253,
pp. 13-26.
[12] G. Hu & S. Ding, (2009b) Mining of Association Rules from Distributed Data using Mobile
Agents, in Proceedings of the International Conference on e-Business (ICE-B 2009), pp. 21-26.
[13] A. O. Ogunde, O. Folorunso, A. S. Sodiya, J. A. Oguntuase & G. O. Ogunleye, (2011) Improved
cost models for agent based association rule mining in distributed databases, Anale. Seria
Informatica, vol. 9, no. 1, pp. 231-250, Available:
http://analeinformatica.tibiscus.ro/download/lucrari/9-1-20-Ogunde.pdf
[14] G. S. Bhamra, A. K. Verma, & R. B. Patel, (2015) Agent Based Frameworks for Distributed
Association Rule Mining: An Analysis, International Journal in Foundations of Computer Science &
Technology (IJFCST), vol. 5, no. 1, pp. 11-22.
[15] R. Agrawal & R. Srikant, (1994) Fast Algorithms for Mining Association Rules in Large Databases,
in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94). Morgan
Kaufmann Publishers Inc., pp. 487-499.
[16] G. S. Bhamra, A. K. Verma, & R. B. Patel, (2011) TDSGenerator: A Tool for generating synthetic
Transactional Datasets for Association Rules Mining, International Journal of Computer Science
Issues (IJCSI), vol. 8, no. 2, pp. 184-188.
[17] G. S. Bhamra, A. K. Verma, & R. B. Patel, (2014) An Investigation into the Central Data Warehouse
based Association Rule Mining, International Journal of Computer Applications (IJCA), vol. 96, no.
10, pp. 1-12.
AUTHORS
Gurpreet Singh Bhamra is currently working as Assistant Professor at the
Department of Computer Science and Engineering, M. M. University, Mullana,
Haryana. He received his B.Sc. (Computer Sc.) and MCA from Kurukshetra
University, Kurukshetra in 1995 and 1998, respectively. He is pursuing a Ph.D.
at the Department of Computer Science and Engineering, Thapar University,
Patiala, Punjab. He has been teaching since 1998. He has published 13 research
papers in International/National Journals and International Conferences. He
received the Best Paper Award for "An Agent enriched Distributed Data Mining on
Heterogeneous Networks", in Challenges & Opportunities in Information
Technology (COIT-2008). He is a Life Member of the Computer Society of India. His research interests are in
Distributed Computing, Distributed Data Mining, Mobile Agents and Bio-informatics.
Dr. Anil Kumar Verma is currently working as Associate Professor at the
Department of Computer Science & Engineering, Thapar University, Patiala. He
received his B.S., M.S. and Ph.D. in 1991, 2001 and 2008 respectively, majoring in
Computer Science and Engineering. He worked as Lecturer at M.M.M.
Engineering College, Gorakhpur from 1991 to 1996. He joined Thapar Institute of
Engineering & Technology in 1996 as a Systems Analyst in the Computer Centre
and is presently associated with the same Institute. He has been a visiting faculty at
many institutions. He has published over 100 papers in refereed journals and
conferences (India and abroad). He is a MISCI (Turkey), LMCSI (Mumbai),
GMAIMA (New Delhi). He is a certified software quality auditor by MoCIT,
Govt. of India. His research interests include wireless networks, routing algorithms, securing ad hoc
networks and data mining.
Dr. Ram Bahadur Patel is currently working as Professor and Head at the Department
of Computer Science & Engineering, Chandigarh College of Engineering &
Technology, Chandigarh. He received his PhD from IIT Roorkee in Computer Science &
Engineering, PDF from Highest Institute of Education, Science & Technology
(HIEST), Athens, Greece, MS (Software Systems) from BITS Pilani and B. E. in
Computer Engineering from M. M. M. Engineering College, Gorakhpur, UP. Dr.
Patel has been in teaching and research since 1991. He has supervised 36 M. Tech, 7 M.
Phil. and 8 PhD theses. He is currently supervising 6 PhD students. He has published
130 research papers in International/National Journals and Refereed International
Conferences. He has written 7 text books for engineering courses. He is a member of
ISTE (New Delhi) and IEEE (USA). He is a member of various International Technical Committees and
participates frequently in International Technical Committees in India and abroad. His current research
interests are in Mobile & Distributed Computing, Mobile Agent Security and Fault Tolerance and Sensor
Networks.

APPENDIX A SYNTHETIC DATASETS


A.1 BDS3500T10I.txt and corresponding TDS3500T10I.txt( DB1 ) at site S1
These synthetic binary and transactional datasets of 3500 records are created by the TDSG tool at
site S1. In the binary version, each column head represents an item number and each row
represents a transaction, where the integer 1 is used for a purchased item and 0 if it is not
purchased. The corresponding transactional version has a Transaction ID (TID) for each
transaction, and Itemset is the set of all the purchased items for that particular transaction.
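
The relation between the two versions can be illustrated with a short Java sketch that turns one binary row into its itemset. It assumes that a row is a whitespace-separated sequence of 0/1 flags and that items are numbered 1 to n by column position; the exact file layout produced by TDSG is not reproduced here, and the names BinaryToTransactional and toItemset are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical converter from one binary transaction row to its itemset.
final class BinaryToTransactional {

    // Converts a row such as "1 0 1" into the item numbers [1, 3]:
    // column positions (counted from 1) whose flag is 1.
    static List<Integer> toItemset(String binaryRow) {
        String[] flags = binaryRow.trim().split("\\s+");
        List<Integer> itemset = new ArrayList<>();
        for (int col = 0; col < flags.length; col++) {
            if (flags[col].equals("1")) {
                itemset.add(col + 1);
            }
        }
        return itemset;
    }

    public static void main(String[] args) {
        // A made-up row over 10 items, not taken from the actual datasets.
        System.out.println(toItemset("1 0 1 1 0 0 0 0 0 1")); // prints [1, 3, 4, 10]
    }
}
```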


A.2 BDS3850T10I.txt and corresponding TDS3850T10I.txt( DB2 ) at site S2

These synthetic binary and transactional datasets of 3850 records are created by TDSG tool at site
S2 .

A.3 BDS3900T10I.txt and corresponding TDS3900T10I.txt( DB3 ) at site S3

These synthetic binary and transactional datasets of 3900 records are created by TDSG tool at site
S3 .


APPENDIX B RESULTANT KNOWLEDGE OF AEMGSAR SYSTEM

B.1 $L^{FI}_{k(1)}$ and $L^{FISC}_{k(1)}$ at site S1

The list of frequent k-itemsets, i.e., $L^{FI}_{k(1)}$, is represented by column L, and column SC shows
the support count of the corresponding frequent k-itemset, i.e., $L^{FISC}_{k(1)}$, at site S1. These
frequent itemsets and their support counts are obtained by processing the synthetic dataset ( DB1 )
shown in Appendix A.1.

B.2 $L^{FI}_{k(2)}$ and $L^{FISC}_{k(2)}$ at site S2

These frequent itemsets and their support counts are obtained by processing the synthetic dataset
( DB2 ) as shown in Appendix A.2.


B.3 $L^{FI}_{k(3)}$ and $L^{FISC}_{k(3)}$ at site S3

These frequent itemsets and their support counts are obtained by processing the synthetic dataset
( DB3 ) as shown in Appendix A.3.


B.4 $L^{LSAR}_{1}$ at site S1

Column L represents the frequent k-itemsets and column AR(support, confidence) shows the list of
locally strong association rules, i.e., $L^{LSAR}_{1}$, at site S1. Each strong rule has its associated
support and confidence factor. The minimum threshold support is taken as 20% and the minimum
threshold confidence as 50% for generating the strong rules from the data shown in
Appendix B.1.
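
How the two thresholds act on a candidate rule X => Y can be sketched in Java using the standard definitions: support is the fraction of transactions containing all items of X and Y, and confidence is that count divided by the count of transactions containing X. The class name RuleStrength and the counts in the example are hypothetical and are not taken from the tables in this appendix.

```java
// Hypothetical check of a rule X => Y against support and confidence thresholds.
final class RuleStrength {

    // Support of the rule in percent: transactions containing X and Y, over all transactions.
    static double support(long unionCount, long transactions) {
        return 100.0 * unionCount / transactions;
    }

    // Confidence of the rule in percent: transactions containing X and Y, over those containing X.
    static double confidence(long unionCount, long antecedentCount) {
        return 100.0 * unionCount / antecedentCount;
    }

    // A rule is strong when it meets both minimum thresholds (given in percent).
    static boolean isStrong(long unionCount, long antecedentCount, long transactions,
                            double minSupport, double minConfidence) {
        return support(unionCount, transactions) >= minSupport
                && confidence(unionCount, antecedentCount) >= minConfidence;
    }

    public static void main(String[] args) {
        // Made-up counts: 3500 transactions, X in 1400 of them, X and Y together in 875.
        System.out.println(support(875, 3500));                  // prints 25.0
        System.out.println(confidence(875, 1400));               // prints 62.5
        System.out.println(isStrong(875, 1400, 3500, 20, 50));   // prints true
    }
}
```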


B.5 $L^{LSAR}_{2}$ at site S2

Column L represents the frequent k-itemsets and column AR(support, confidence) shows the list of
locally strong association rules, i.e., $L^{LSAR}_{2}$, at site S2. Each strong rule has its associated
support and confidence factor. The minimum threshold support is taken as 20% and the minimum
threshold confidence as 50% for generating the strong rules from the data shown in
Appendix B.2.


B.6 $L^{LSAR}_{3}$ at site S3

Column L represents the frequent k-itemsets and column AR(support, confidence) shows the list of
locally strong association rules, i.e., $L^{LSAR}_{3}$, at site S3. Each strong rule has its associated
support and confidence factor. The minimum threshold support is taken as 20% and the minimum
threshold confidence as 50% for generating the strong rules from the data shown in
Appendix B.3.


B.7 $L^{GSAR}_{CENTRAL}$ at site SCENTRAL

Column L represents the globally frequent k-itemsets, i.e., itemsets which are locally strong at all
the distributed sites, and column AR(support, confidence) shows the list of globally strong
association rules, i.e., $L^{GSAR}_{CENTRAL}$, for such itemsets. Each globally strong rule has its
associated support and confidence factor. The minimum threshold support is taken as 20% and the
minimum threshold confidence as 50%. Site represents the IP address of the site where the rule is
locally strong. IP address 192.168.46.212 is used for site S1, 192.168.46.189 for site S2 and
192.168.46.213 for site S3.
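
Since a globally strong rule is described here as one that is locally strong at every distributed site, the selection can be sketched as a set intersection over the per-site rule lists. This is a simplified illustration rather than the procedure followed by GFIGA and GKGA: rules are represented as plain strings, and the class name GlobalRules, the method globallyStrong and the example rules are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: keep only the rules that are locally strong at every site.
final class GlobalRules {

    // Intersects the per-site sets of locally strong rules.
    static Set<String> globallyStrong(List<Set<String>> localRuleSets) {
        if (localRuleSets.isEmpty()) {
            return new LinkedHashSet<>();
        }
        Set<String> result = new LinkedHashSet<>(localRuleSets.get(0));
        for (Set<String> siteRules : localRuleSets.subList(1, localRuleSets.size())) {
            result.retainAll(siteRules);   // drop rules missing at this site
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up rule sets for three sites, not taken from the tables in this appendix.
        Set<String> s1 = Set.of("1 => 2", "2 => 3");
        Set<String> s2 = Set.of("1 => 2");
        Set<String> s3 = Set.of("1 => 2", "3 => 4");
        System.out.println(globallyStrong(List.of(s1, s2, s3))); // prints [1 => 2]
    }
}
```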
