Prof Shaikh and Anis 2012 Pies PDF

Determine Data Mining Using Dynamic Data Base

Mr. ABDUL JABBAR SHAIKH AZAD (A), Mr. ANIS MAYODDIN KURESHI (B), Prof. DEELIP D. JOSHI (C),
Dr. RAMESH R. MANZA (D) & Dr. ASHOK NARAYAN PATIL (E)

A & B: Department of Computer Science, P.S.G.V.P. Mandal's Arts, Commerce and Science College, Shahada.
C: Asst. Professor, Department of Computer Science, B.P. Arts, S.M.A. Science and K.K.C. Commerce College, Chalisgaon.
D: Asst. Professor, Department of Computer Science, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad.
E: Principal, Vasantrao Naik College, Shahada.

Email: abdulteli786@gmail.com (a), manzaramesh@gmail.com (c).

Abstract:
In this paper, we critically review several papers on data mining over dynamic databases. The goal is
to achieve the best possible data cleaning, data selection and data transformation. Data mining as a
whole is one step in the Knowledge Discovery Process, needed to recognize patterns and build models
from a database. In some cases the problem is known and correct data is available; in others, problems
arise from duplicate, missing or incorrect data.
Several researchers have applied data mining to dynamic databases, among them Hebah H. O.
Nasereddin, Vijay Raghavan, Alaaeldin Hafez, Fernando Crespo, Richard Weber, Yi Wang, Shi-Xia Liu
and Jianhua Feng. They used methods such as fuzzy clustering and customer segmentation within a
dynamic data mining process. Different methods have been implemented to improve accuracy when
managing real data in a database system. These techniques have been applied in research fields that
have grown with the expansion of both computer hardware and software. Data mining helps
organizations make full use of the data stored in their databases to support decision making, and this
holds for all fields.

Keywords: Customer segmentation, Fuzzy clustering, Knowledge Discovery Process (KDP) etc.

INTRODUCTION
Data mining is the task of discovering interesting and hidden patterns in large amounts of data,
where the data can be stored in databases, data warehouses, OLAP (On-Line Analytical Processing)
systems or other information repositories. The term is often used interchangeably with Knowledge
Discovery in Databases (KDD). Data mining is the main process of discovering meaningful patterns
and relationships that lie hidden within very large databases. It has also been defined as the
analysis of observational data sets to find unsuspected relationships and to summarize the data in
novel ways that are both understandable and useful to the data owner. Strictly speaking, data mining
is one part of the KDD process. This process basically consists of steps that are performed before
carrying out data mining, such as data selection, data cleaning, pre-processing and data
transformation.
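The preparatory steps named above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the function name `preprocess`, the record layout (a list of dictionaries) and the sample field names are hypothetical and do not come from any of the reviewed papers.

```python
def preprocess(records, required_fields):
    """Apply selection, cleaning and transformation to a list of dict records."""
    seen = set()
    cleaned = []
    for record in records:
        # Selection: keep only the fields relevant to the mining task.
        selected = [record.get(field) for field in required_fields]
        # Cleaning: drop records with missing values.
        if any(value is None or value == "" for value in selected):
            continue
        # Transformation: normalise strings so that equal values compare equal.
        normalised = tuple(
            value.strip().lower() if isinstance(value, str) else value
            for value in selected
        )
        # Cleaning: drop duplicates that remain after normalisation.
        if normalised in seen:
            continue
        seen.add(normalised)
        cleaned.append(dict(zip(required_fields, normalised)))
    return cleaned


raw = [
    {"customer": " Alice ", "item": "Bread", "note": "x"},
    {"customer": "alice", "item": "bread"},   # duplicate after normalisation
    {"customer": "Bob", "item": None},        # missing value
    {"customer": "Carol", "item": "Milk"},
]
prepared = preprocess(raw, ["customer", "item"])
```

Only the two distinct, complete records survive, which is the kind of input a subsequent mining step would expect.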
The major components of a data mining system are as follows: a database, data warehouse or other
information repository; a server responsible for fetching the relevant data based on the user's data
mining request; a knowledge base used to guide the search for information in the database; a data
mining engine consisting of a set of functional modules; a pattern evaluation module that interacts
with the data mining modules so as to focus the search towards interesting patterns; and a graphical
user interface that communicates between users and the data mining system, allowing the user to
interact with the system. Data mining techniques can discover information that many traditional
business analysis and statistical techniques fail to deliver. A management information system should
provide advanced capabilities that give the user the power to ask more sophisticated and pertinent
questions. It empowers the right people by providing the specific information they need.

Hebah H. O. Nasereddin and Vijay Raghavan propose an approach that dynamically updates
knowledge obtained from the previous data mining process. Transactions over a long duration are
divided into a set of consecutive episodes. In their approach, information gained during the current
episode depends on the current set of transactions and on the information discovered during the
last episode. Discovering association rules, that is, dependencies among the values of an attribute,
is an important research area in dynamic data mining. The problem of association mining, also
referred to as the market basket problem, is defined as follows.
Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items and $S = \{s_1, s_2, \ldots, s_m\}$ be a set of
transactions, where each transaction $s_i \in S$ is a set of items, that is, $s_i \subseteq I$. An
association rule, denoted by $X \Rightarrow Y$ with $X, Y \subset I$ and $X \cap Y = \emptyset$,
describes the existence of a relationship between the two item sets $X$ and $Y$. Several measures
have been introduced to define the strength of the relationship between item sets $X$ and $Y$, such
as SUPPORT, CONFIDENCE and INTEREST [1,2,5,7]. The definitions of these measures, from a
probabilistic viewpoint, are given below.
I. $\mathrm{SUPPORT}(X \Rightarrow Y) = P(X, Y)$, or the percentage of transactions in the database
that contain both $X$ and $Y$.
II. $\mathrm{CONFIDENCE}(X \Rightarrow Y) = P(X, Y) / P(X)$, or the percentage of transactions
containing $Y$ among those transactions containing $X$.
III. $\mathrm{INTEREST}(X \Rightarrow Y) = P(X, Y) / (P(X)\,P(Y))$, which represents a test of
statistical independence.
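The three measures can be computed directly from a list of transactions. The sketch below is illustrative only: the helper names (`probability`, `support`, `confidence`, `interest`) and the toy transactions are assumptions made for this example, and the code assumes that X and Y each occur in at least one transaction so that no division by zero arises.

```python
def probability(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def support(transactions, x, y):
    # SUPPORT(X => Y) = P(X, Y): both item sets occur in the same transaction.
    return probability(transactions, set(x) | set(y))


def confidence(transactions, x, y):
    # CONFIDENCE(X => Y) = P(X, Y) / P(X)
    return support(transactions, x, y) / probability(transactions, x)


def interest(transactions, x, y):
    # INTEREST(X => Y) = P(X, Y) / (P(X) * P(Y)); a value of 1 means independence.
    return support(transactions, x, y) / (
        probability(transactions, x) * probability(transactions, y)
    )


# Toy market-basket data (hypothetical).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
s = support(transactions, {"bread"}, {"milk"})
c = confidence(transactions, {"bread"}, {"milk"})
i = interest(transactions, {"bread"}, {"milk"})
```

Here bread and milk appear together in two of four transactions, so the support is 0.5, the confidence is 0.5 / 0.75 and the interest is 0.5 / (0.75 * 0.75), slightly below 1.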
SUPPORT for an item set $S$ is calculated as

$$\mathrm{SUPPORT}(S) = \frac{F(S)}{F}$$

where $F(S)$ is the number of transactions having $S$, and $F$ is the total number of transactions.
For a minimum SUPPORT value MINSUP, $S$ is a large (or frequent) item set if
$\mathrm{SUPPORT}(S) > \mathrm{MINSUP}$, or $F(S) > F \cdot \mathrm{MINSUP}$.
Suppose we have divided the transaction set $T$ into two subsets $T_1$ and $T_2$, corresponding to
two consecutive time intervals, where $F_1$ is the number of transactions in $T_1$ and $F_2$ is the
number of transactions in $T_2$ ($F = F_1 + F_2$), and $F_1(S)$ is the number of transactions having
$S$ in $T_1$ and $F_2(S)$ is the number of transactions having $S$ in $T_2$
($F(S) = F_1(S) + F_2(S)$). By calculating the SUPPORT of $S$ in each of the two subsets, we get

$$\mathrm{SUPPORT}_1(S) = \frac{F_1(S)}{F_1} \quad \text{and} \quad \mathrm{SUPPORT}_2(S) = \frac{F_2(S)}{F_2}.$$

$S$ is a large itemset if

$$\frac{F_1(S) + F_2(S)}{F_1 + F_2} > \mathrm{MINSUP},$$

or

$$F_1(S) + F_2(S) > (F_1 + F_2) \cdot \mathrm{MINSUP}.$$
In order to find out whether $S$ is a large itemset or not, we consider four cases:
- $S$ is a large itemset in $T_1$ and also a large itemset in $T_2$, i.e.,
$F_1(S) > F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) > F_2 \cdot \mathrm{MINSUP}$.
- $S$ is a large itemset in $T_1$ but a small itemset in $T_2$, i.e.,
$F_1(S) > F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) < F_2 \cdot \mathrm{MINSUP}$.
- $S$ is a small itemset in $T_1$ but a large itemset in $T_2$, i.e.,
$F_1(S) < F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) > F_2 \cdot \mathrm{MINSUP}$.
- $S$ is a small itemset in $T_1$ and also a small itemset in $T_2$, i.e.,
$F_1(S) < F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) < F_2 \cdot \mathrm{MINSUP}$.
In the first and fourth cases, $S$ is a large itemset and a small itemset in transaction set $T$,
respectively, while in the second and third cases it is not clear whether $S$ is a small itemset or a
large itemset. Formally speaking, let $\mathrm{SUPPORT}(S) = \mathrm{MINSUP} + \delta$, where
$\delta > 0$ if $S$ is a large itemset, and $\delta < 0$ if $S$ is a small itemset. Writing $\delta_1$
and $\delta_2$ for the corresponding values in $T_1$ and $T_2$, the above four cases have the
following characteristics:
- $\delta_1 > 0$ and $\delta_2 > 0$
- $\delta_1 > 0$ and $\delta_2 < 0$
- $\delta_1 < 0$ and $\delta_2 > 0$
- $\delta_1 < 0$ and $\delta_2 < 0$
$S$ is a large itemset if

$$\frac{F_1 (\mathrm{MINSUP} + \delta_1) + F_2 (\mathrm{MINSUP} + \delta_2)}{F_1 + F_2} > \mathrm{MINSUP},$$

or

$$F_1 (\mathrm{MINSUP} + \delta_1) + F_2 (\mathrm{MINSUP} + \delta_2) > \mathrm{MINSUP} \cdot (F_1 + F_2).$$

This can be written as

$$F_1 \delta_1 + F_2 \delta_2 > 0.$$
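A small worked example may help with the undecided second case. The figures are invented for illustration and are not taken from the reviewed paper: assume MINSUP = 0.2, $F_1 = 100$, $F_1(S) = 30$, $F_2 = 300$ and $F_2(S) = 45$, and write $\delta_i$ for $\mathrm{SUPPORT}_i(S) - \mathrm{MINSUP}$.

```latex
\begin{align*}
\delta_1 &= \frac{30}{100} - 0.2 = 0.10, &
\delta_2 &= \frac{45}{300} - 0.2 = -0.05, \\
F_1 \delta_1 + F_2 \delta_2 &= 100 \cdot 0.10 + 300 \cdot (-0.05) = -5 < 0 .
\end{align*}
```

So $S$ is large in $T_1$, small in $T_2$ and small over the whole transaction set, which agrees with the direct check $(30 + 45)/(100 + 300) = 0.1875 < 0.2$.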
Generally, let the transaction set $T$ be divided into $n$ transaction subsets $T_i$,
$1 \le i \le n$. $S$ is a large itemset if

$$\sum_{i=1}^{n} F_i \delta_i > 0,$$

where $F_i$ is the number of transactions in $T_i$ and
$\delta_i = \mathrm{SUPPORT}_i(S) - \mathrm{MINSUP}$, with
$-\mathrm{MINSUP} \le \delta_i \le 1 - \mathrm{MINSUP}$, $1 \le i \le n$.
For those cases where

$$\sum_{i=1}^{n} F_i \delta_i < 0,$$

there are two options: either discard $S$ as a large itemset (a small itemset with no history record
maintained), or keep it for future calculations (a small itemset with history record maintained). In
the latter case, $S$ is not reported as a large itemset, but its sum $\sum_{i=1}^{n} F_i \delta_i$
is maintained and checked through the future intervals. In their paper, the authors thus introduce a
Dynamic Data Mining approach. The proposed approach periodically performs the data mining process on
data updates during the current episode and uses the knowledge captured in the previous episode to
produce data mining rules.
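Keeping a history record for an itemset amounts to carrying a single running sum from one episode to the next. The class below is a minimal sketch of that bookkeeping, not the implementation of Nasereddin and Raghavan; the class and method names are assumptions, and it uses the identity that the product of an episode's transaction count and its support offset equals the episode's itemset count minus transaction count times MINSUP.

```python
class ItemsetHistory:
    """Running sum of F_i * delta_i for one itemset across episodes."""

    def __init__(self, minsup):
        self.minsup = minsup
        self.weighted_delta = 0.0  # sum over episodes of F_i * delta_i

    def add_episode(self, n_transactions, n_with_itemset):
        # F_i * delta_i = F_i * (F_i(S) / F_i - MINSUP) = F_i(S) - F_i * MINSUP
        self.weighted_delta += n_with_itemset - n_transactions * self.minsup
        return self.is_large()

    def is_large(self):
        # The itemset is large over all episodes seen so far if the sum is positive.
        return self.weighted_delta > 0


history = ItemsetHistory(minsup=0.2)
after_first = history.add_episode(100, 30)   # sum is about +10
after_second = history.add_episode(300, 45)  # sum is about -5
after_third = history.add_episode(100, 40)   # sum is about +15
```

The itemset is reported as large after the first episode, drops out after the second and is recovered after the third, without rescanning earlier transactions. That is the benefit of maintaining the history rather than discarding the itemset.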
Fernando Crespo and Richard Weber show that dynamic data mining is attracting increasing
attention from the respective research community. At the same time, users of installed data mining
systems are also interested in the related techniques, and will be even more so since most of these
installations will need to be updated in the future. Data mining is part of an interactive process
called KDD (Knowledge Discovery in Databases). This process basically consists of steps that are
performed before doing data mining, such as selection, pre-processing and transformation of data. If
future behaviour is very similar to past behaviour, continuing to use the initial data mining system
could be justified. Where it is not, dynamic data mining comes in: a new research area concerned
with keeping such systems and their results up to date. Several responses are possible. The user may
neglect changes in the environment and keep applying the initial system without any updating.
Alternatively, after every period of a certain length, which depends on the particular application,
a new system is developed using all the available data. Or, based on the initial system and the new
data, an update of the classifier is performed. It does not require changes in subsequent processes,
such as the design of marketing campaigns for customer segments. Its disadvantage is that current
tendencies may not be detected.
Recent developments in dynamic data mining span several areas of data mining. Various methods
have been developed in order to find useful information in a set of data. Among the most important
ones are decision trees, neural networks, association rules and clustering methods. For each of
these data mining methods, updating has different aspects, and several updating approaches have been
proposed. Crespo and Weber present a methodology for dynamic data mining using fuzzy clustering that
assigns static objects to dynamic classes, that is, classes whose structure changes over time. It
starts with a given classifier and a set of new objects, that is, objects that appeared after the
creation of the classifier. The period between the creation of a classifier and its update is called
a cycle. The length of such a cycle depends on the particular application: we may want to update the
buying behaviour of customers in a supermarket once a year, whereas a system for dynamic machine
monitoring should be updated every 5 minutes.
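Assigning a new object to the classes of an existing fuzzy classifier can be illustrated with the standard fuzzy c-means membership formula. This is a hedged sketch of that textbook formula only, not the update methodology of Crespo and Weber; the function name, the fuzzifier `m` and the sample class centers are assumptions.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership degrees of one point in each class center."""
    distances = [math.dist(point, center) for center in centers]
    # A point lying exactly on a center belongs fully to that class.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)); the degrees sum to 1.
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# Two hypothetical class centers from an earlier cycle and one new object.
centers = [(0.0, 0.0), (10.0, 0.0)]
memberships = fuzzy_memberships((2.0, 0.0), centers)
```

The new object lies much closer to the first center, so it receives a membership of 16/17 there and 1/17 in the second class. In a dynamic setting, new objects with low membership in every class would be one signal that the class structure needs updating.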

Yi Wang, Shi-Xia Liu and their co-authors, in the paper "Mining Naturally Smooth Evolution of
Clusters from Dynamic Data", note that many clustering algorithms have been proposed to partition a
set of static data points into groups. They consider instead an evolutionary clustering problem
where the input data points may move, disappear and emerge. These changes should result in a smooth
evolution of the clusters. Mining this naturally smooth evolution is valuable for providing an
aggregated view of the numerous individual behaviours. They solve this novel and generalized form of
the clustering problem by converting it into a Bayesian learning problem, analogous to the way the
EM clustering algorithm converts the problem of clustering a static data set, say X, into learning a
Gaussian mixture model of X. By utilizing characteristics of the evolutionary clustering problem,
they derive a new unsupervised learning algorithm which is more efficient than the algorithms used
to learn traditional variable-duration hidden semi-Markov models (HSMMs). Because the HSMM models
the probabilistic relationship between the dynamic data set and the corresponding evolving clusters,
the learned parameters can be interpreted intuitively as the evolving clusters using the Viterbi
filtering technique, since learning an HSMM is in fact learning an optimal Viterbi filter. They
evaluate the effectiveness of this method in experiments on both synthetic data and real data.
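To give a feel for Viterbi decoding, the sketch below recovers the most likely hidden-state sequence of a plain hidden Markov model with discrete observations. This is a deliberate simplification: the authors work with a hidden semi-Markov model with Gaussian mixture outputs, which is not reproduced here, and all names and probabilities in the example are hypothetical.

```python
import math


def safe_log(p):
    """Logarithm that maps probability 0 to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, start_p, trans_p, emit_p):
    """Most likely hidden-state path of a plain HMM, computed in log space."""
    states = range(len(start_p))
    scores = [
        safe_log(start_p[s]) + safe_log(emit_p[s][observations[0]]) for s in states
    ]
    back_pointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for s in states:
            # Best predecessor of state s at this time step.
            best = max(states, key=lambda r: scores[r] + safe_log(trans_p[r][s]))
            pointers.append(best)
            new_scores.append(
                scores[best] + safe_log(trans_p[best][s]) + safe_log(emit_p[s][obs])
            )
        scores = new_scores
        back_pointers.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Two hypothetical "cluster regimes" that tend to persist over time.
start_p = [0.5, 0.5]
trans_p = [[0.9, 0.1], [0.1, 0.9]]
emit_p = [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}]
path = viterbi(["a", "a", "b", "b"], start_p, trans_p, emit_p)
```

The decoded path stays in state 0 for the first two observations and switches to state 1 for the last two, a toy analogue of reading a smooth evolution off the learned model.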
In this paper they capture coherence over $1 \le t \le T$ by modelling the underlying stochastic
process that generates X by a hidden semi-Markov model (HSMM). Analogous to the way the EM
clustering algorithm clusters static data points by learning a Gaussian mixture model, their method
mines the evolution of the clusters from dynamic data points by learning an HSMM. They model the
output probability density function (pdf) on each hidden state of the HSMM by a Gaussian mixture
model, which describes the clusters of an $X_t \in X$. By utilizing characteristics of the
evolutionary clustering problem, they derive a new unsupervised learning algorithm which is well
suited to describing the data from dynamic databases.

CONCLUSION
In this paper we have presented a critical review of work on dynamic databases. The researchers
discussed above have applied data mining to dynamic databases and have certainly contributed a lot
to handling databases dynamically. There remains, however, a need to overcome the limitations of
this research. Our analysis finds that the approach of Hebah H. O. Nasereddin and Vijay Raghavan,
which dynamically updates knowledge obtained from the previous data mining process, is sound because
it builds on the KDD technique. The limitation of their study is that its suitability for dynamic
databases more generally remains to be established. As future work, it will be tested with different
datasets that cover a large spectrum of data mining applications, such as web site access analysis
for improvements in e-commerce advertising, fraud detection, screening and investigation, retail
site product analysis and customer segmentation. From the analysis of Fernando Crespo and Richard
Weber, we conclude that they presented a methodology for dynamic data mining based on fuzzy
clustering, which allows updates of the underlying classifier. Other methods, such as other fuzzy or
possibilistic clustering techniques, can be used as well. Yi Wang, Shi-Xia Liu and their co-authors
proposed to solve a novel and interesting clustering problem, which is quite different from the
dynamic clustering problems studied previously. They solve the problem by converting it into a
Bayesian learning problem, and they also describe accompanying visualization methods to present the
mined smooth evolution intuitively and comprehensively.

REFERENCES
[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining
Association Rules between Sets of Items in Large
Databases," Proc. of the ACM SIGMOD Int'l Conf. on
Management of Data, May 1993.
[2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining
Association Rules," Proc. of the 20th VLDB
Conference, Santiago, Chile, 1994.
[3] R. Agrawal and J. Shafer, "Parallel Mining of Association
Rules," IEEE Transactions on Knowledge and Data
Engineering, Vol. 8, No. 6, Dec. 1996.
[4] C. Aggarwal and P. Yu, "Mining Large Itemsets for
Association Rules," Bulletin of the IEEE Computer
Society Technical Committee on Data Engineering, 1997.
[5] S. Brin, R. Motwani, et al., "Dynamic Itemset Counting
and Implication Rules for Market Basket Data,"
SIGMOD Record (ACM Special Interest Group on
Management of Data), 26(2), 1997.
[6] S. Chaudhuri, "Data Mining and Database Systems:
Where is the Intersection," Bulletin of the IEEE
Computer Society Technical Committee on Data
Engineering, 1997.
[7] M. Chen, J. Han, and P. Yu, "Data Mining: An Overview
from a Database Perspective," IEEE Trans. Knowledge
and Data Engineering, 8, 1996.
[8] M. Chen, J. Park, and P. Yu, "Data Mining for Path
Traversal Patterns in a Web Environment," Proc. 16th
Intl. Conf. Distributed Computing Systems, May 1996.
[9] D. Cheung, J. Han, et al., "Maintenance of Discovered
Association Rules in Large Databases: An Incremental
Updating Technique," Proc. 12th Intl. Conf. on Data
Engineering, New Orleans, Louisiana, 1996.
[10] U. Fayyad, G. Piatetsky-Shapiro, et al., "Advances in Knowledge
Discovery and Data Mining," AAAI/MIT Press, 1996.
[11] A. Hafez, J. Deogun, and V. Raghavan, "The Item-Set
Tree: A Data Structure for Data Mining," DaWaK '99
Conference, Firenze, Italy, Aug. 1999.
[12] C. Kurzke, M. Galle, and M. Bathelt, "WebAssist: a user
profile specific information retrieval assistant," Seventh
International World Wide Web Conference, Brisbane,
Australia, April 1998.
[13] M. Langheinrich, A. Nakamura, et al., "Un-intrusive
Customization Techniques for Web Advertising," The
Eighth International World Wide Web Conference,
Toronto, Canada, May 1999.
[14] H. Mannila, H. Toivonen, and A. Verkamo, "Efficient
Algorithms for Discovering Association Rules," AAAI
Workshop on Knowledge Discovery in Databases
(KDD-94), July 1994.
[15] M. Perkowitz and O. Etzioni, "Adaptive Sites:
Automatically Learning from User Access Patterns,"
Proc. 6th Int. World Wide Web Conf., Santa Clara,
California, April 1997.
[16] J. Pitkow, "In Search of Reliable Usage Data on the
WWW," Proc. 6th Int. World Wide Web Conf., Santa
Clara, California, April 1997.
[17] G. Rossi, D. Schwabe, and F. Lyardet, "Improving Web
Information Systems with Navigational Patterns," The
Eighth International World Wide Web Conference,
Toronto, Canada, May 1999.
[18] N. Serbedzija, "The Web Supercomputing Environment,"
Seventh International World Wide Web Conference,
Brisbane, Australia, April 1998.
[19] T. Sullivan, "Reading Reader Reaction: A Proposal for
Inferential Analysis of Web Server Log Files," Proc.
3rd Conf. Human Factors & The Web, Denver, Colorado,
June 1997.
[20] C. Wills and M. Mikhailov, "Towards a Better
Understanding of Web Resources and Server Responses
for Improved Caching," The Eighth International World
Wide Web Conference, Toronto, Canada, May 1999.
[21] M. Zaki, S. Parthasarathy, et al., "New Algorithms for
Fast Discovery of Association Rules," Proc. of the 3rd
Int'l Conf. on Knowledge Discovery and Data Mining
(KDD-97), AAAI Press, 1997.
[22] C.M. Antunes and A.L. Oliveira, "Temporal data mining: an
overview," Workshop on Temporal Data Mining
(KDD2001), San Francisco, September 2001.
[23] M. Bastian, H. Kirschfink, and R. Weber, "TRIP: automatic traffic
state identification and prediction as basis for improved
traffic management services," Proc. of the Second Workshop
on Information Technology, Cooperative Research
between Chile and Germany, 15-17 January 2001,
Berlin, Germany.
[24] J.C. Bezdek, J. Keller, R. Krishnapuram, and N.R. Pal, "Fuzzy
Models and Algorithms for Pattern Recognition and
Image Processing," Kluwer, Boston, London, Dordrecht,
1999.
[25] M. Black and R.J. Hickey, "Maintaining the performance of a
learned classifier under concept drift," Intell. Data Anal. 3
(6) (1999) 453-474.
[26] R.O. Duda, P.E. Hart, and D.G. Stork, "Pattern Classification,"
2nd Edition, Wiley, New York, Chichester, 2001.
