Data Mining Definition: - Finding Hidden Information in A Database - Similar Terms

Data Mining Definition
Finding hidden information in a database

Similar terms
Exploratory data analysis
Data driven discovery
Deductive learning
Basic Data Mining Tasks
Classification maps data into predefined groups or
classes
Supervised learning
Pattern recognition
Prediction
Regression is used to map a data item to a real valued
prediction variable.
Clustering groups similar data together into clusters.
Unsupervised learning
Segmentation
Partitioning
Basic Data Mining Tasks (contd)
Summarization maps data into subsets with
associated simple descriptions.
Characterization
Generalization
Link Analysis uncovers relationships among data.
Affinity Analysis
Association Rules
Sequential Analysis determines sequential patterns.
MULTI-OBJECTIVE BASED IMPROVED ARTIFICIAL BEE COLONY
ALGORITHM TO MINE THE BUSINESS INTELLIGENCE RULES
Proposed Work:
To propose a multi-objective optimization
algorithm to mine the business intelligence rules.
Abstract
Mining interesting relations with the objective of improving
the business plays a vital role in critical decisions in the
current scenario.
In this work, an algorithm to mine the intelligence rules from
the input data using the multi-objective optimization
algorithm is proposed.
The objectives used are profit, quality and the proposed
reliability.
Subsequently, an R-tree is modelled and solved using the
improved ABC algorithm.
Improvement to the ABC algorithm is brought about in the
initial population and in the scout bee phase.
The performance is evaluated in terms of the number of rules
mined, computation time and memory usage by varying the
threshold and support value respectively.
Introduction
One of the outstanding applications of data mining (DM) is business
intelligence (BI) and risk management, as DM has a direct and
significant influence on the decision-making process.
Implementing a BI system is not a traditional project
like an operational or transactional system; it is a
multifaceted task that requires suitable
infrastructure and resources.
To address this, multi-objective optimization has emerged as
a well-established line of research.
Classification rule mining may be formulated as a
multi-objective optimization problem with diverse commensurable and
often contradictory objectives.
In this paper, a technique to mine business intelligence rules is
proposed using multi-objective improved ABC based
rule mining.
Introduction (Ctd)
Here, the influential patterns are first found in the
input database using the objectives of profit, quality and
the proposed objective of reliability.
Subsequently, an R-tree is generated and the patterns are
solved using the improved Artificial Bee Colony (ABC)
algorithm.
The rules obtained from the improved ABC algorithm are
useful for developing business strategies.
Review of Literature
C.F. Cheung and F.L. Li [20] introduced a quantitative
correlation coefficient mining technique for business
intelligence in small and medium enterprises of the trading
business.
The approach was able to reveal hidden patterns in the sales
and the market.
A prototype business intelligence system (BIS), termed
the correlation coefficient sales data mining system
(CCSDMS), was built and successfully trialled
at a selected reference site.
Review of Literature (ctd)
Hung-Pin Chiu et al. [21] presented a clustering-based
fuzzy association rule mining approach for the electronic
commerce (EC) scenario.
A cluster-based fuzzy association rules (CBFAR) mining
architecture was introduced to address three major
objectives concurrently.
(a) An effective fuzzy association rule miner based on
cluster-based fuzzy-sets tables was presented to locate all the
large fuzzy itemsets.
(b) The approach needed minimal contrast to create
large itemsets.
(c) The fuzzy rule mining technique was employed to evaluate
the confidence values for locating the correlations between
the transaction database and the browsing information database.
Alatas and Erhan Akin [22] introduced multi-objective rule
mining employing a messy particle swarm optimization
technique.
A multi-objective messy particle swarm optimization (PSO)
approach was used as the search strategy to
extract classification rules from datasets.
The optimized PSO employed a similarity measure for
neighborhood and far-neighborhood search to store the global
best particles found by the multi-objective method.
For the bi-objective problem of mining rules of superior
precision/lucidity, the multi-objective approach was designed
to let the PSO technique return an approximation to
the upper precision/lucidity bounds.
The results demonstrated the efficiency of the technique.
Jayakrushna Sahoo et al. [23] proposed mining
association rules from high utility itemsets.
They addressed the problem of finding
association rules using the utility-confidence structure,
which represented a simplification of the amount-confidence
measure.
They were able to generate utility-based
non-redundant association rules and gave techniques for restructuring
the entire set of association rules.
The results demonstrated the efficiency of
the HUCI-Miner technique vis-à-vis popular peer methods.
Further, the test outcomes showed the superior
quality of the compact representation of the complete rule set.
Thi-Thiet Pham et al. [24] introduced an effective technique
for extracting the relevant sequential rules from an attributed
prefix-tree in two phases.
In the former phase, it constructed a prefix-tree
that stored all the sequential patterns from a
specified sequence database.
In the latter phase, it extracted the relevant
sequential rules from the corresponding prefix-tree.
Further, a pruning scheme was used to
reduce the search space and the runtime of the mining
tasks.
The test outcomes showed that the technique
outperformed contemporary approaches in extracting the
relevant sequential rules.
Wingyan Chung et al. [25] discussed the discovery of
business intelligence from online product appraisals.
They designed a novel class of BI systems
drawing on rough set theory, inductive rule learning,
and data retrieval techniques.
The outcomes showed that the technique achieved
superior precision and exposure in terms of rule
quality, and produced interesting and informative rules with
superior support and confidence values.
The investigations have significant implications for
market sentiment appraisal and e-commerce reputation
administration.
Block diagram of proposed method
[Figure: block diagram with the blocks Database, Objective Measures and Improved ABC Algorithm]
Objective measures
The objective measures taken here are profit, quality index
and reliability.
PROFIT
In order to define profit, it is essential to define quantity first.
The quantity of an event set in a space Q can be described as
the average number of events in a unit of Q. It is defined
as the total number of events of the event set divided by the
volume of the space Q. For a given space Q, the quantity of an
event set X located in each unit of Q is defined by
$$\mathrm{Quantity}(X, Q) = \frac{\left|\{\, x \mid x \in X \wedge x \text{ inside } Q \,\}\right|}{|Q|}$$
Objective measures (Ctd)
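For two event sets X and X' and a given follow (→) predicate
defined by a neighbourhood function N(x) for any event x ∈ X,
the profit of X → X' is the average density of X' in the
neighbourhoods of the events of X, relative to the density of X'
in the whole space Q, where density denotes the quantity defined above:
$$\mathrm{Profit}(X \rightarrow X') = \frac{\operatorname{avg}_{x \in X}\big(\mathrm{density}(X', N(x))\big)}{\mathrm{density}(X', Q)}$$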
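The quantity and profit definitions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes events are 2-D points, that Q and every neighbourhood N(x) are axis-aligned rectangles, and that N(x) is a square of fixed half-width `radius` around x. All function names and sample coordinates are hypothetical.

```python
from statistics import mean

# Assumption: an event is an (x, y) point and a region is an
# axis-aligned rectangle (xmin, ymin, xmax, ymax).

def inside(event, region):
    x, y = event
    xmin, ymin, xmax, ymax = region
    return xmin <= x <= xmax and ymin <= y <= ymax

def volume(region):
    xmin, ymin, xmax, ymax = region
    return (xmax - xmin) * (ymax - ymin)

def quantity(events, region):
    """Number of events that fall inside the region, per unit volume."""
    return sum(inside(e, region) for e in events) / volume(region)

def neighbourhood(event, radius=1.0):
    """Assumed N(x): a square of half-width `radius` centred on the event."""
    x, y = event
    return (x - radius, y - radius, x + radius, y + radius)

def profit(xs, xs_prime, space, radius=1.0):
    """Average quantity of X' around the events of X, divided by
    the quantity of X' in the whole space Q."""
    local = mean(quantity(xs_prime, neighbourhood(x, radius)) for x in xs)
    return local / quantity(xs_prime, space)

# Made-up sample data for illustration only.
space = (0.0, 0.0, 10.0, 10.0)
X = [(2.0, 2.0), (8.0, 8.0)]
X_prime = [(2.5, 2.5), (1.5, 2.0), (8.0, 7.5), (5.0, 5.0)]

print(quantity(X_prime, space))
print(profit(X, X_prime, space))
```

A profit well above 1 indicates that events of X' are much more concentrated near events of X than in the space as a whole.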
Quality Index:
This measure calculates the quality of a sequential
pattern with reference to the profit factor.
In a sequence of L event types, the subsequence G is considered
to be significant only if the profit of G and that of its tail event set are
significant.
The quality index of an L-sequence G is defined as:
For L = 2: $\mathrm{QualityIndex}(G) = \mathrm{Profit}(G[1] \rightarrow G[2])$
For L ≥ 3: $\mathrm{QualityIndex}(G) = \min\big(\mathrm{QualityIndex}(G[1:L-1]),\ \mathrm{Profit}(\text{tail event set}(G[1:L-1]) \rightarrow G[L])\big)$
Here the profit of the last step is the density ratio between the tail event set of G[1:L-1] and G[L].
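The recursive definition of the quality index can be sketched in Python as follows. The slides do not say how the tail event set is computed, so this sketch hides that step behind a hypothetical callback `profit_of(prefix, next_type)`, assumed to return the profit of (tail event set of `prefix`) → `next_type`. The profit values in the example are made up.

```python
def quality_index(sequence, profit_of):
    """Quality index of a sequence of event types G[1..L].

    `profit_of(prefix, next_type)` is a hypothetical callback that returns
    the profit of (tail event set of `prefix`) -> `next_type`.
    """
    if len(sequence) < 2:
        raise ValueError("a sequence needs at least two event types")
    prefix, last = tuple(sequence[:-1]), sequence[-1]
    if len(sequence) == 2:
        # Base case L = 2: Profit(G[1] -> G[2])
        return profit_of(prefix, last)
    # Recursive case L >= 3: the weakest link along the sequence
    return min(quality_index(prefix, profit_of), profit_of(prefix, last))

# Made-up profit values for illustration only.
profits = {
    (("A",), "B"): 3.0,
    (("A", "B"), "C"): 1.5,
    (("A", "B", "C"), "D"): 2.0,
}

def lookup(prefix, next_type):
    return profits[(prefix, next_type)]

print(quality_index(["A", "B", "C", "D"], lookup))
```

Because the index is a running minimum, extending a sequence can never raise its quality index, which is what allows weak prefixes to be pruned early.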
Reliability
The reliability(validity) measure is defined in terms of profit
and quality index.
The reliability is defined as:

where is the profit and is the quality index.
Using these objectives of profit, quality index and reliability,
the significant patterns are identified in the input
database.
This reduces the computation time and complexity.
Construction of R-TREE Structure
After reducing the patterns by the use of objective measures
of profit, quality index and reliability, the R-tree is built from
the remaining patterns.
Let H be a set of event types of the form $H = \{X_1, X_2, X_3, \ldots, X_L\}$, where
L is the number of event types. Each event type consists of a set
of events denoted as $X = \{x_1, x_2, x_3, \ldots, x_m\}$, where m denotes the
number of events.
An event type may appear multiple times in a sequential
pattern.
Due to the repetition of event types in the sequential
patterns, the length of the sequential pattern is bounded by
m, since m is the maximum number of events.
Generation of Event R-Tree
The event R-tree is a spatial access method defined as a tree in which
every event type is a child node of the root node.
The set of events within each event type is given as its
sub-nodes.
The events occurring within the adaptive neighborhood
(minimum bounding rectangle) of an event are represented as leaf nodes.
This event tree can be used to compute the follow ratio
without scanning the database multiple times.
Using the event R-tree, the significance measure can be
calculated without rescanning the database, which reduces the
computation time.
Before the R-tree is generated, the neighbourhood has to be
calculated.
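The three-level layout described above (root → event types → events → neighbouring events) can be sketched as nested dictionaries in Python. This is an illustration of the layout only, not a balanced R-tree with node splitting. It assumes each event is a (position, time) pair in one spatial dimension and that the neighbourhood is a fixed window `dx` in space and `dt` forward in time. All names and sample values are hypothetical.

```python
def in_neighbourhood(event, other, dx=1.0, dt=2.0):
    """Assumed neighbourhood: within dx in space and strictly later
    in time by at most dt (time only moves forward)."""
    (x, t), (x2, t2) = event, other
    return abs(x2 - x) <= dx and 0 < t2 - t <= dt

def build_event_tree(database, dx=1.0, dt=2.0):
    """root -> event type -> event -> events that fall in its neighbourhood."""
    all_events = [(etype, ev) for etype, evs in database.items() for ev in evs]
    tree = {}
    for etype, evs in database.items():
        tree[etype] = {
            ev: [(otype, other) for otype, other in all_events
                 if in_neighbourhood(ev, other, dx, dt)]
            for ev in evs
        }
    return tree

def count_followers(tree, etype, follower_type):
    """Answer a neighbourhood query from the tree alone, without
    rescanning the database."""
    return sum(
        1
        for leaves in tree[etype].values()
        for otype, _ in leaves
        if otype == follower_type
    )

# Made-up database: event type -> list of (position, time) events.
database = {
    "A": [(0.0, 0.0), (5.0, 1.0)],
    "B": [(0.5, 1.0), (5.5, 2.5), (9.0, 3.0)],
}
tree = build_event_tree(database)
print(count_followers(tree, "A", "B"))
```

The database is scanned once while building the tree; afterwards every neighbourhood count is a lookup over the stored leaves.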
Steps to calculate the neighbourhood
Consider the region with the user-given space and time limits.
Note the set of all events which occur within this region.
Keep the time constant and vary the space to some extent.
Calculate the set of events for the new region.
Set a threshold value.
Continue the process by varying the space limits.
Apply the spatial distance in the left and right directions in
the space; the temporal value can be applied only in the upward
direction (unidirectional property of time).
If the computed value < MBR, extend the unit in the
neighborhood in both directions of the space.
Terminate the process when one of the following conditions holds:
The minimum bounding condition is satisfied.
The defined number of iterations is reached.
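One possible reading of the steps above is sketched below in Python. The slides do not define the threshold or the minimum bounding condition precisely, so this sketch assumes the condition is "at least `min_events` events fall inside the region". It also assumes one spatial dimension, a fixed forward time window `dt`, and a fixed growth `step`. All names and sample values are hypothetical.

```python
def adaptive_neighbourhood(center, events, width=0.5, dt=2.0, step=0.5,
                           min_events=3, max_iters=10):
    """Grow the spatial extent symmetrically (left and right) while the
    time window stays fixed, until enough events are covered or the
    iteration limit is reached."""
    x0, t0 = center

    def members(w):
        # Space is searched in both directions, time only forward.
        return [(x, t) for x, t in events
                if abs(x - x0) <= w and 0 <= t - t0 <= dt]

    found = members(width)
    iterations = 0
    # Assumed stopping rule: threshold met, or iteration limit reached.
    while len(found) < min_events and iterations < max_iters:
        width += step
        found = members(width)
        iterations += 1
    return width, found

# Made-up events as (position, time) pairs.
events = [(0.2, 0.5), (1.0, 1.0), (-1.4, 1.5), (3.0, 1.0), (0.1, -1.0)]
width, found = adaptive_neighbourhood((0.0, 0.0), events)
print(width, found)
```

The event at time -1.0 is never included, reflecting the unidirectional property of time stated in the steps.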
Operations of R-Tree
Some of the operations in R-tree are given below:

Search Operation
Insertion Operation
Deletion Operation
Splitting the nodes
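Of the operations listed, search is the simplest to illustrate. The Python sketch below shows the general R-tree search idea (descend only into nodes whose minimum bounding rectangle intersects the query) on a small hand-built tree. Insertion, deletion and node splitting are omitted, and the `Node` class and helper names are hypothetical rather than taken from the slides.

```python
from dataclasses import dataclass, field

# Assumption: a rectangle is a tuple (xmin, ymin, xmax, ymax).

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def union(rects):
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

@dataclass
class Node:
    mbr: tuple                                      # minimum bounding rectangle
    children: list = field(default_factory=list)    # child nodes (internal node)
    entries: list = field(default_factory=list)     # (rect, payload) pairs (leaf)

def make_leaf(entries):
    return Node(mbr=union(r for r, _ in entries), entries=list(entries))

def make_internal(children):
    return Node(mbr=union(c.mbr for c in children), children=list(children))

def search(node, query):
    """Return the payloads of all entries whose rectangle intersects the query."""
    if not intersects(node.mbr, query):
        return []                                   # prune the whole subtree
    hits = [payload for rect, payload in node.entries if intersects(rect, query)]
    for child in node.children:
        hits.extend(search(child, query))
    return hits

# Hand-built two-level tree with made-up rectangles.
leaf1 = make_leaf([((0, 0, 1, 1), "e1"), ((2, 2, 3, 3), "e2")])
leaf2 = make_leaf([((8, 8, 9, 9), "e3")])
root = make_internal([leaf1, leaf2])
print(search(root, (0.5, 0.5, 2.5, 2.5)))
```

The second leaf is never visited for this query because its bounding rectangle does not intersect the query rectangle.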
RULE OPTIMIZATION USING ABC ALGORITHM
After the construction of the event R-tree, there will be some
redundant rules in the tree. Those rules are optimized using
the ABC algorithm.
The ABC algorithm is a population-based optimization
tool.
It simulates the behaviour of honey bees in finding nectar and
sharing the information about food sources with the bees in the
hive.
Agents in ABC:
The Employed Bee
The Onlooker Bee
The Scout
RULE OPTIMIZATION USING ABC ALGORITHM (ctd..)
The Employed Bee:
It randomly selects a food source (position) and determines
its nectar amount (fitness value).
The Onlooker Bee:
It gets the information about food sources from the employed
bees in the hive (dancing area) and selects one of the food
sources to gather the nectar.
The Scout:
It is responsible for finding a new food source when an existing
food source is abandoned (a position that could not be improved
after the predetermined number of cycles).
The position update is made with the use of fractional
calculus with respect to time.
The greedy selection method is employed to select the food
source with the higher nectar amount.
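The roles of the three kinds of bees can be illustrated with a minimal sketch of the standard ABC algorithm in Python. It is not the improved variant proposed in these slides: it uses the textbook neighbour update rather than the fractional-calculus update, a random initial population rather than the input patterns, and a toy cost function rather than MSE over rules. The function name, parameters and default values are all assumptions.

```python
import random

def abc_minimise(cost, dim, bounds, n_sources=10, limit=20,
                 max_cycles=200, seed=0):
    """Textbook ABC sketch: minimise `cost` over a box-bounded space."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def fitness(c):
        # Common ABC fitness transform: lower cost gives higher fitness.
        return 1.0 / (1.0 + c) if c >= 0 else 1.0 + abs(c)

    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources
    j0 = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = list(sources[j0]), costs[j0]

    def try_neighbour(j):
        # Perturb one coordinate of source j relative to another source i.
        i = rng.choice([n for n in range(n_sources) if n != j])
        k = rng.randrange(dim)
        phi = rng.uniform(-1.0, 1.0)
        candidate = list(sources[j])
        candidate[k] = sources[j][k] + phi * (sources[j][k] - sources[i][k])
        candidate[k] = min(max(candidate[k], lo), hi)
        c = cost(candidate)
        if c < costs[j]:                  # greedy selection keeps the better one
            sources[j], costs[j], trials[j] = candidate, c, 0
        else:
            trials[j] += 1

    for _ in range(max_cycles):
        for j in range(n_sources):        # employed bees: one try per source
            try_neighbour(j)
        fits = [fitness(c) for c in costs]
        # Onlooker bees: pick sources with probability proportional to fitness.
        for j in rng.choices(range(n_sources), weights=fits, k=n_sources):
            try_neighbour(j)
        for j in range(n_sources):        # scouts: replace exhausted sources
            if trials[j] > limit:
                sources[j] = random_source()
                costs[j] = cost(sources[j])
                trials[j] = 0
        jb = min(range(n_sources), key=costs.__getitem__)
        if costs[jb] < best_cost:
            best, best_cost = list(sources[jb]), costs[jb]
    return best, best_cost

def sphere(v):
    """Toy cost function with its minimum of 0 at the origin."""
    return sum(x * x for x in v)

best, best_cost = abc_minimise(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_cost)
```

The best solution is stored separately from the population, so a scout replacing an exhausted source cannot discard the best result found so far.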
Flow diagram of the ABC
a. Initialize the population. Here the input patterns form the initial
population.
b. Evaluate the fitness.
c. For every cycle (until the maximum number of cycles is reached),
repeat the steps below (d-l).
d. Form the new population (position) for the employed bee using
the formula:
$$b_{j,k} = a_{j,k} + \phi_{j,k}\,(y_{j,k} - y_{i,k}), \quad i \neq j$$
where $b_{j,k}$ is the new position, $a_{j,k}$ is the old position, $y_{j,k}$ and $y_{i,k}$ are
the k-th components of solution j and of a randomly chosen solution i, and $\phi_{j,k}$ is a
random number in the range [-1, 1] which controls the
production of a neighbor food source position around $a_{j,k}$.
e. Calculate the fitness of the new solution for the employed bee using
MSE.
f. Apply greedy search between $a_{j,k}$ and $b_{j,k}$ for the employed bee.
g. Calculate the probability value for the solution for the onlooker bee,
using the fitness value of the solution evaluated by
