Volume 2, Issue 8, August - 2015. ISSN 2348 4853, Impact Factor 1.317
INTRODUCTION
Data mining focuses on the automated discovery of new facts and relationships in already
existing data. The different data mining techniques include association, regression,
prediction, clustering and classification. Clustering is the division of data into groups of
similar objects. Clustering is an example of unsupervised learning, as it learns by observation [1].
Classification is a data mining function that assigns items in a collection to target categories or
classes. The goal of classification is to accurately predict the target class for each case in the
data [2]. This paper deals with the use of the integrated clustering-classification technique on
some of the free data mining tools available today. The tools on which the integrated
clustering-classification technique has been implemented are KNIME (Konstanz Information Miner),
Tanagra [3], Orange and WEKA (Waikato Environment for Knowledge Analysis) [4]. The classifiers
used for this purpose are Naïve Bayes, Support Vector Machine, K Nearest Neighbor, Zero Rule,
Decision Tree and One Rule.
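Of the classifiers listed above, Zero Rule and One Rule are the simplest and are usually treated as baselines. The sketch below assumes their textbook definitions (Zero Rule ignores every attribute and always predicts the majority class; One Rule builds one rule per attribute, mapping each attribute value to its majority class, and keeps the attribute whose rule makes the fewest training errors) and handles nominal attributes only. The names `zero_r` and `one_r` are hypothetical, not the API of any tool discussed in this paper, and ties go to the value seen first.

```python
from collections import Counter, defaultdict


def zero_r(labels):
    """Zero Rule: always predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]


def one_r(rows, labels):
    """One Rule: return (attribute index, value->class rule, training errors)
    for the single attribute whose rule misclassifies the fewest cases."""
    best = None
    for a in range(len(rows[0])):
        # Class frequencies for every value of attribute a.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[a]][y] += 1
        # Each value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Errors = cases whose class differs from the value's majority class.
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
print(zero_r(labels))       # yes
print(one_r(rows, labels))  # (0, {'sunny': 'no', 'rainy': 'yes', 'overcast': 'yes'}, 0)
```

On this toy table, One Rule picks the first attribute, whose rule makes no training errors, while Zero Rule simply answers "yes" for every case.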
Data mining is the process of automatically classifying cases based on data patterns obtained
from a dataset. A number of algorithms have been developed and implemented to extract information
and discover data patterns that may be useful for decision support. Data mining is also known as
KDD (Knowledge Discovery in Databases); data preprocessing, pattern recognition, clustering and
classification are its popular techniques. In this paper,
55 | 2015, IJAFRC All Rights Reserved
www.ijafrc.org
VI. CLUSTERING
This technique partitions the records in a database into different groups. Records in the same
group have similar properties: the differences between groups should be as large as possible, and
the differences within a group should be as small as possible. There are no predefined classes in
clustering, so it falls under unsupervised learning. Techniques included in cluster analysis are
partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based
methods, clustering of high-dimensional data, constraint-based clustering and outlier analysis.
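The criterion stated above (small differences within a group, large differences between groups) can be checked numerically. The sketch below assumes Euclidean distance as the measure of "difference" and clusters of at least two points each. It compares the mean pairwise distance inside clusters with the mean distance across clusters; `mean_distance` and `within_between` are hypothetical helpers, not part of any tool discussed in this paper.

```python
import math
from itertools import combinations


def mean_distance(a, b=None):
    """Mean Euclidean distance over all point pairs inside `a`,
    or over all cross pairs between `a` and `b` when `b` is given."""
    pairs = list(combinations(a, 2)) if b is None else [(p, q) for p in a for q in b]
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def within_between(clusters):
    """Return (mean within-cluster distance, mean between-cluster distance).
    Assumes at least two clusters, each holding at least two points."""
    within = sum(mean_distance(c) for c in clusters) / len(clusters)
    cross = list(combinations(clusters, 2))
    between = sum(mean_distance(a, b) for a, b in cross) / len(cross)
    return within, between


good = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]  # groups split along x
bad = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]   # same points, split along y
print(within_between(good))  # within 1.0, between roughly 10.0
print(within_between(bad))   # within 10.0, between roughly 5.5
```

Both partitions contain the same four points, but only the first one satisfies the criterion: its within-group distance is far smaller than its between-group distance.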
i. K-means Clustering
ii. Hierarchical clustering
iii. Density-based clustering
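The first method in this list can be sketched with the standard Lloyd iteration for k-means: assign every point to its nearest centroid, move every centroid to the mean of its assigned points, and repeat until the centroids stop moving. This is a minimal sketch assuming Euclidean distance, points given as numeric tuples, and initial centroids drawn at random from the data. `k_means` and its parameters are illustrative names, not the interface of WEKA, KNIME, Tanagra or Orange.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Lloyd-style k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved, so assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(pts, 2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

On these two well-separated groups, the iteration places each group in its own cluster. The initial centroids are random, so on less clear-cut data different seeds may give different partitions.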
VII. DATA MINING TOOLS
The following are the data mining tools on which the integrated clustering-classification technique has been implemented.
WEKA tool
WEKA (Waikato Environment for Knowledge Analysis) is a data mining/machine learning tool developed by the
Department of Computer Science, University of Waikato, New Zealand. It is an open-source collection of
many data mining and machine learning algorithms, covering data pre-processing, classification,
regression, clustering, association rule extraction and feature selection, and it supports the .arff
(Attribute-Relation File Format) file format.
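Because WEKA reads its data from ARFF files, it may help to see what such a file looks like. An ARFF file has a header, consisting of an `@RELATION` name and one `@ATTRIBUTE` line per column typed either as `NUMERIC` or as a `{...}` set of nominal values. The header is followed by `@DATA` and one comma-separated row per instance. The Python sketch below writes that layout. The `to_arff` helper is hypothetical, and it assumes names and values contain no spaces or commas and that no values are missing, so it omits the quoting and `?` handling a complete writer would need.

```python
def to_arff(relation, attributes, rows):
    """Render a small table as ARFF text.

    attributes: (name, kind) pairs, where kind is "NUMERIC" or a list of
    allowed nominal values. Assumes no spaces/commas in names or values
    and no missing values.
    """
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):  # nominal attribute: {v1,v2,...}
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines += ["", "@DATA"]
    for row in rows:  # one comma-separated line per instance
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


print(to_arff(
    "weather",
    [("outlook", ["sunny", "overcast", "rainy"]),
     ("temperature", "NUMERIC"),
     ("play", ["yes", "no"])],
    [("sunny", 85, "no"), ("overcast", 83, "yes")],
))
```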
Tanagra
Tanagra was written by Ricco Rakotomalala as an aid to education and research on data mining. All
user operations in Tanagra are based on the stream diagram paradigm. Under this paradigm, a user
builds a graph specifying the data sources and the operations on the data. Paths through the graph
describe the flow of data through manipulations and analyses. Tanagra simplifies this paradigm by
restricting the graph to a tree: each node has only one parent, and therefore each operation has
only one data source.
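The practical effect of the tree restriction is that every operation has exactly one upstream path back to a data source, so running a node means pulling data down that single path. The sketch below models this with a hypothetical `Node` class. It illustrates the paradigm only and is not Tanagra's code or API.

```python
class Node:
    """One operation in a tree-shaped stream diagram (illustrative model only)."""

    def __init__(self, name, operation, parent=None):
        self.name = name
        self.operation = operation  # callable: input data -> output data
        self.parent = parent        # a single parent, hence a single data source
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def run(self):
        # Pull data along the unique path from the root (the data source) to this node.
        data = None if self.parent is None else self.parent.run()
        return self.operation(data)


source = Node("dataset", lambda _: [3, 1, 2, 5])               # root: the data source
ranked = Node("sort", sorted, parent=source)
top_two = Node("top 2", lambda d: d[:2], parent=ranked)
mean = Node("mean", lambda d: sum(d) / len(d), parent=source)  # a sibling branch

print(top_two.run())  # [1, 2]
print(mean.run())     # 2.75
```

Because no node can have two parents, the question "which data does this operation see?" always has one answer, which is the simplification the paragraph above describes.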
KNIME
Classifier   NB       C4.5     KNN      SVM
KNIME        64.01%   68.75%   78.21%   73.01%
Classifier   NB       C4.5     KNN      SVM
KNIME        64.22%   68.59%   75.52%   73.81%
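The tables report one percentage per classifier, and this excerpt does not state the evaluation protocol behind them. If they are classification accuracies, each one would be the share of correctly classified test cases multiplied by 100. As an illustration only, the sketch below computes such a figure for one of the listed classifiers. It uses a plain k-nearest-neighbour majority vote with Euclidean distance, k = 3 and ties broken by training order. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k training points nearest to x."""
    # sorted() is stable, so equidistant points keep their training order.
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy_percent(predicted, actual):
    """Percentage of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_x = [(0.5, 0.5), (5.5, 5.5), (4, 4), (1, 1)]
test_y = ["a", "b", "b", "b"]  # the last point is deliberately labelled "b"

preds = [knn_predict(train_x, train_y, x) for x in test_x]
print(preds)                            # ['a', 'b', 'b', 'a']
print(accuracy_percent(preds, test_y))  # 75.0
```

The last test point lies next to the "a" group, so the classifier labels it "a". Three of the four predictions are therefore correct, giving 75.0%.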
[1] David Heckerman, "Bayesian Network for Data Mining and Knowledge Discovery", 1997.
[2] David Hand, Heikki Mannila and Padhraic Smyth, "Principles of Data Mining", The MIT Press, 2001.
[3] Ritu Chauhan, Harleen Kaur and M. Afshar Alam, "Data Clustering Method for Discovering Clusters in
Spatial Cancer Databases", International Journal of Computer Applications, Volume 10, No. 6,
November 2010.
[4] J. R. Quinlan, "C4.5: Programs for Machine Learning", Morgan Kaufmann, 1993.
[5]
[6] J. B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations",
Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of
California Press, pp. 281-297, 1967.
[7] S. P. Lloyd, "Least Squares Quantization in PCM", IEEE Transactions on Information Theory, Vol. 28,
pp. 129-137, 1982.
[8] Manish Verma, Mauly Srivastava, Neha Chack, Atul Kumar Diswar and Nidhi Gupta, "A
Comparative Study of Various Clustering Algorithms in Data Mining", International Journal of
Engineering Research and Applications, Vol. 2, Issue 3, 2012.
[9]
AUTHOR PROFILE
Dr. S. Prasath is currently working as an Assistant Professor in the Department of
Computer Science, Erode Arts & Science College (Autonomous), Erode,
Tamilnadu, India. He received his Ph.D. degree from Bharathiar University,
Coimbatore, Tamilnadu, India in 2015. He obtained his Master's degree in
Software Engineering from M. Kumarasamy College of Engineering, Karur, under
Anna University, Chennai in 2008, and his M.Phil. degree in Computer Science in
2009. His areas of interest include Image Processing and Data Mining. He
has presented 6 papers at national and 2 at international conferences, and he
has published 10 papers in national and international journals.