A CATEGORIZATION OF MAJOR CLUSTERING METHODS IN DATA MINING

Abstract: Clustering is the process of grouping data into classes or clusters, so that the objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Dissimilarities are assessed based on the attribute values describing the objects. Clustering has its roots in many areas, including data mining, statistics, biology, and machine learning. In this paper, we examine several clustering techniques, organized into the following categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and constraint-based clustering. Clustering can also be used for outlier detection.

Introduction

Data mining refers to 'extracting' or 'mining' knowledge from large amounts of data. It is also known as knowledge mining from data, knowledge extraction, data or pattern analysis, and data designing. A data warehouse is a repository containing a large amount of information or data.

The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. A cluster of data objects can be treated collectively as one group and so may be considered a form of data compression.

The following are typical requirements of clustering in data mining:

- Scalability
- Ability to deal with different types of attributes
- Discovery of clusters with arbitrary shape
- Ability to deal with noisy data
- Incremental clustering and insensitivity to the order of input records
- High dimensionality
- Constraint-based clustering

1. Partitioning method
The most well-known and commonly used partitioning methods are k-means and k-medoids, and their variations.

(i) Centroid-Based Technique: The K-means Method

The k-means algorithm takes the input parameter k and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. The square-error criterion is used, defined as

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - m_i|^2$$

where E is the sum of the square error for all objects in the data set, p is the point representing a given object, and $m_i$ is the mean of cluster $C_i$.

The k-means partitioning algorithm:

Algorithm: The k-means algorithm for partitioning, where each cluster's center is represented by the mean value of the objects in the cluster.
Input: k, the number of clusters; D, a data set containing n objects.
Output: A set of k clusters.
Method:
[1] Arbitrarily choose k objects from D as the initial cluster centers;
[2] repeat
[3]     (re)assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster;
[4]     update the cluster means, i.e., calculate the mean value of the objects for each cluster;
[5] until no change;

First, the algorithm randomly selects k of the objects, each of which initially represents a cluster mean or center. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges.
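The steps of the k-means method can be sketched in Python as follows. This is only an illustrative sketch, not the paper's implementation: it assumes Euclidean distance as the dissimilarity measure, and the names `k_means`, `square_error`, `max_iter`, and `seed`, as well as the sample data, are hypothetical. The `max_iter` cap is a safety limit that the algorithm description does not include.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points (assumed measure)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # [1] arbitrarily choose k objects as the initial cluster centers
    centers = rng.sample(list(data), k)
    labels = None
    for _ in range(max_iter):  # [2] repeat
        # [3] (re)assign each object to the cluster with the nearest mean
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centers[i]))
            for p in data
        ]
        if new_labels == labels:  # [5] until no change
            break
        labels = new_labels
        # [4] update the cluster means
        for i in range(k):
            members = [p for p, lab in zip(data, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = mean_point(members)
    return centers, labels

def square_error(data, centers, labels):
    """Square-error criterion E: summed squared distance to the cluster mean."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(data, labels))

# Hypothetical data: two well-separated groups of four points each.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers, labels = k_means(data, k=2)
print(sorted(centers), square_error(data, centers, labels))
```

Because the initial centers are chosen at random, different seeds can lead to different results on less clearly separated data; the fixed `seed` here only makes the sketch repeatable.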
Figure: Clustering of a set of objects based on the k-means method. (The mean of each cluster is marked by a "+".)

(ii) Representative Object-Based Technique: The K-medoids Method

The k-means algorithm is sensitive to outliers because an object with an extremely large value may substantially distort the distribution of data. This effect is particularly exacerbated by the use of the square-error function.
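A small numeric illustration of this sensitivity, using made-up one-dimensional values: a single extreme value pulls the mean far from the bulk of the data, while the most centrally located actual object (the medoid) stays put. The helper `medoid` and the sample values are assumptions introduced only for this example.

```python
from statistics import mean

# Hypothetical 1-D cluster; 100.0 plays the role of the outlier.
values = [1.0, 2.0, 3.0, 4.0, 100.0]

def medoid(points):
    """Return the object whose total absolute distance to all others is smallest."""
    return min(points, key=lambda c: sum(abs(c - p) for p in points))

cluster_mean = mean(values)      # dragged toward the outlier
cluster_medoid = medoid(values)  # remains inside the bulk of the data

print(cluster_mean, cluster_medoid)  # prints: 22.0 3.0
```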
The k-medoids method instead represents each cluster by one of its actual objects. An absolute-error criterion is used, defined as

$$E = \sum_{j=1}^{k} \sum_{p \in C_j} |p - o_j|$$

where E is the sum of the absolute error for all objects in the data set, p is the point in space representing a given object in cluster $C_j$, and $o_j$ is the representative object of $C_j$.

To decide whether a nonrepresentative object $o_{random}$ is a good replacement for a current representative object $o_j$, four cases are examined for each nonrepresentative object p:

Case 1: p currently belongs to representative object $o_j$. If $o_j$ is replaced by $o_{random}$ as a representative object and p is closest to one of the other representative objects $o_i$, $i \neq j$, then p is reassigned to $o_i$.

Case 2: p currently belongs to representative object $o_j$. If $o_j$ is replaced by $o_{random}$ as a representative object and p is closest to $o_{random}$, then p is reassigned to $o_{random}$.

Case 3: p currently belongs to representative object $o_i$, $i \neq j$. If $o_j$ is replaced by $o_{random}$ as a representative object and p is still closest to $o_i$, then the assignment does not change.

Case 4: p currently belongs to representative object $o_i$, $i \neq j$. If $o_j$ is replaced by $o_{random}$ as a representative object and p is closest to $o_{random}$, then p is reassigned to $o_{random}$.

PAM (Partitioning Around Medoids) was one of the first k-medoids algorithms. It attempts to determine k partitions for n objects. After an initial random selection of k representative objects, the algorithm repeatedly tries to make a better choice of cluster representatives. The total cost of swapping is the sum of the costs incurred by all nonrepresentative objects. The complexity of each iteration is $O(k(n-k)^2)$.
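The swap-based search behind k-medoids can be sketched as below. This is a simplified, greedy variant written for illustration and should not be read as the exact PAM procedure: it accepts any swap that lowers the absolute-error criterion rather than searching for the best swap in each round. Euclidean distance, the function names `total_cost` and `pam_sketch`, the `seed` parameter, and the sample data are all assumptions.

```python
import math
import random

def total_cost(data, medoid_idx):
    """Absolute-error criterion E: each object's distance to its nearest medoid."""
    return sum(min(math.dist(p, data[m]) for m in medoid_idx) for p in data)

def pam_sketch(data, k, seed=0):
    rng = random.Random(seed)
    # initial random selection of k representative objects (stored as indices)
    medoids = rng.sample(range(len(data)), k)
    best = total_cost(data, medoids)
    improved = True
    while improved:
        improved = False
        for j in range(k):                     # representative object o_j
            for candidate in range(len(data)):  # nonrepresentative o_random
                if candidate in medoids:
                    continue
                # tentatively replace o_j by o_random and re-evaluate E;
                # recomputing E covers all four reassignment cases at once
                trial = medoids[:j] + [candidate] + medoids[j + 1:]
                cost = total_cost(data, trial)
                if cost < best:  # keep the swap only if the total cost drops
                    medoids, best, improved = trial, cost, True
    return medoids, best

# Hypothetical data: two well-separated groups of four points each.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
medoids, cost = pam_sketch(data, k=2)
print([data[m] for m in medoids], cost)
```

Each pass tries every pairing of one of the k medoids with one of the n-k other objects, and each evaluation scans the data set, which is in line with the per-iteration cost stated above.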
Uploaded to Scribd on 01/20/2011 by Gerry Himawan San. Copyright: Attribution Non-commercial.