Nature Inspired Techniques For Data Clustering

2014 International Conference on Circuits, Systems, Communication and Information Technology Applications (CSCITA)

Sandeep U. Mane
Assistant Professor, Dept. of CSE,
Rajarambapu Institute of Technology,
Rajaramnagar, India
manesandip82@gmail.com

Pankaj G. Gaikwad
M. Tech Student, Dept. of CSE,
Rajarambapu Institute of Technology,
Rajaramnagar, India
gaikwadpankaj04@gmail.com
and creates a hierarchical structure that reflects the order in
which groups are merged or divided. The k-Means algorithm is a
partitioning clustering algorithm, whereas divisive (top-down)
and agglomerative (bottom-up) algorithms are hierarchical
clustering algorithms.
Abstract: Nature is always a source of inspiration. In the last few
decades, research on new computing paradigms has been stimulated,
and the result of this effort is the emergence of new problem
solving techniques such as Nature Inspired Computing and
Evolutionary Computing. Nature inspired problem solving techniques
are widely used to solve complex problems because of their
decentralized and self-organized behavior. Such behavior is
observed in social systems and is modeled by techniques such as the
artificial bee colony algorithm, particle swarm optimization, ant
colony optimization, the bat algorithm, the firefly algorithm and
glowworm swarm optimization. In this paper we give an overview of
nature inspired techniques used for data clustering, their
hybridization with traditional clustering techniques, and their
effectiveness.
Besides the distinction between partitioning and hierarchical
methods, clustering algorithms are also categorized into hard and
fuzzy (soft) clustering [2], [13]. A hard clustering algorithm
assigns each pattern to a single cluster during its operation. A
fuzzy clustering method assigns each input pattern degrees of
membership in several clusters. A fuzzy clustering can be
converted to a hard clustering by assigning each pattern to the
cluster in which it has the largest membership. Hard clustering
algorithms are sensitive to missing values and outliers in the
input dataset. These limitations can be overcome by using a fuzzy
clustering algorithm [13], [14].
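The conversion from a fuzzy to a hard clustering described above can be illustrated with a minimal sketch. The function name `harden` and the membership values are hypothetical, chosen only for illustration; ties are resolved in favor of the lowest cluster index, which is an assumption rather than something the cited works prescribe.

```python
def harden(memberships):
    """Return, for each pattern, the index of the cluster with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]

# Hypothetical membership degrees of three patterns in two clusters.
U = [[0.9, 0.1], [0.3, 0.7], [0.5, 0.5]]
labels = harden(U)
```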
Keywords: Data Clustering; Nature Inspired Techniques;
Traditional Clustering Techniques; Hybrid Clustering Techniques
I. INTRODUCTION
Traditional clustering algorithms pose a number of challenges.
The literature reports that partitioning algorithms perform
better than hierarchical algorithms, but they require a
predefined number of clusters. The result of such algorithms
depends heavily on the initial choice of centroids, and while
optimizing the objective function they may get stuck in a local
optimum. To overcome these limitations, researchers have
proposed nature inspired techniques. These nature inspired
techniques are decentralized and self-organized in behavior
[15].
Clustering is a data mining technique: an unsupervised
classification method in which patterns are grouped based on the
similarity of their features [1], [2]. The absence of category
information distinguishes data clustering (unsupervised) from
classification or discriminant analysis (supervised). The aim of
clustering is to find structure in patterns, and it is therefore
exploratory in nature [2]. The measure of similarity is usually
pattern dependent. Euclidean distance is one of the most commonly
used measures of similarity: a smaller Euclidean distance means
greater similarity, and vice versa. Put precisely, data
clustering is a technique in which logically similar information
is physically stored together. Clustering analysis is a
subjective process, which is the basic reason why so many
clustering algorithms exist [3]. The diversity of clustering
algorithms is due to the diversity of induction principles (the
mathematical formulations of what is believed to form a cluster)
and of clustering models. Different researchers employ different
cluster models with different algorithms. Understanding these
cluster models is the key to analyzing the various algorithms.
Widely considered cluster models are the connectivity based model
(e.g. hierarchical clustering) [4], [5], the centroid based model
(e.g. k-Means) [6], [7], the distribution based model (e.g.
Gaussian mixture) [8], [9] and the density based model (e.g.
DBSCAN) [10], [11]. Traditional clustering techniques are broadly
divided into partitioning methods and hierarchical methods [12].
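To make the role of Euclidean distance as a similarity measure concrete, the following minimal sketch computes the distance between two patterns and picks the nearest of several centroids, as a centroid based model would. The helper names `euclidean` and `nearest` and the sample points are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centroids):
    """Index of the centroid closest (most similar) to the given point."""
    return min(range(len(centroids)), key=lambda k: euclidean(point, centroids[k]))

d = euclidean((0.0, 0.0), (3.0, 4.0))
k = nearest((1.0, 1.0), [(0.0, 0.0), (5.0, 5.0)])
```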
Swarm Intelligence (SI) is one of the categories of
nature-inspired problem solving techniques. The individuals in a
swarm use their environment and resources more efficiently
through collective intelligence. Nature inspired techniques such
as the artificial bee colony algorithm (ABC), particle swarm
optimization (PSO), ant colony optimization (ACO), the bat
algorithm (BA), the firefly algorithm and glowworm swarm
optimization (GSO) are introduced in the following sections.
The theme of the clustering analysis process is shown in Fig. 1.
[Fig. 1. Theme of Clustering Analysis Process. Stages shown: Raw Data, Clustering Algorithms, Clustered Data, Clustering Analysis]
Most partitioning algorithms are based on specifying an initial
number of groups and iteratively reallocating objects among the
groups until convergence. In contrast, a hierarchical algorithm
combines or divides existing groups
In the literature, researchers have used standard datasets

for analysis of different clustering algorithms, available online
at archive.ics.uci.edu/ml/datasets.html and kdd.ics.uci.edu.
Ackerman et al. in [16] proposed theoretical proofs of
searching for food. In PSO, each particle represents a candidate
solution in a high-dimensional space. A particle is described by
four vectors: its current position, the best position it has
found so far, the best position found by its neighborhood so far,
and its velocity. Each particle adjusts its position in the
search space according to the best position it has reached
(pbest) and the best position reached by its neighborhood (gbest)
during the search process.
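The position adjustment described above is commonly written as a velocity update followed by a position update. The sketch below uses the widely used inertia-weight formulation; the parameter names and default values (`w`, `c1`, `c2`) are assumptions for illustration and are not taken from this paper.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One PSO update for a single particle; returns (new_position, new_velocity)."""
    new_v = []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        # Inertia term + pull towards pbest + pull towards gbest.
        new_v.append(w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi))
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v

# A particle already sitting on both best positions with zero velocity does not move.
pos, vel = pso_step([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0])
```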
clusterable datasets. At the second stage of Fig. 1, various
categories of algorithms, such as traditional clustering
techniques, AI techniques or nature inspired techniques, are
applied. Based on the results, the analysis is carried out at the
last stage.
The rest of the paper is organized as follows. Section II gives
an overview of nature inspired techniques. Section III presents
applications of nature inspired techniques in data clustering.
Section IV discusses prior work on the hybridization of nature
inspired techniques with traditional clustering techniques.
Section V concludes the paper.
To solve the clustering problem, the basic PSO algorithm is
modified. The process starts by initializing each particle with
K random cluster centers. Then, for each particle, the Euclidean
distance from every data vector to all cluster centroids is
computed, and each data vector is assigned to the cluster with
the nearest centroid. In the next step, the fitness value is
computed. The clusters are formed using the particle with the
best fitness value. Data vectors are reassigned to clusters as
the position and the velocity of the particles are updated [20].
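The assignment and fitness steps of PSO-based clustering can be sketched as follows. Here a particle is a list of K centroids, and the fitness is taken to be the mean distance of each data vector to its nearest centroid; this particular fitness definition is an assumption for illustration, since the text above does not fix one.

```python
import math

def assign(data, centroids):
    """Assign every data vector to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in data]

def fitness(data, centroids):
    """Mean distance of data vectors to their assigned centroid (lower is better)."""
    labels = assign(data, centroids)
    return sum(math.dist(p, centroids[k]) for p, k in zip(data, labels)) / len(data)

data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
particle = [(0.0, 1.0), (10.0, 1.0)]  # hypothetical particle holding K = 2 centroids
labels = assign(data, particle)
score = fitness(data, particle)
```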
II. NATURE INSPIRED TECHNIQUES

Various problem solving techniques exist that are inspired by
phenomena in nature. In this section we discuss a few of them:
ant colony optimization, particle swarm optimization, the
artificial bee colony algorithm, the bat algorithm, the firefly
algorithm and glowworm swarm optimization.
C. Artificial Bee Colony Algorithm

The artificial bee colony algorithm is a recently introduced
swarm based problem solving technique that simulates the foraging
behaviour of honey bees. ABC was originally proposed by Karaboga
to solve numerical optimization problems and was later modified
for clustering problems by Karaboga et al. [21], [22]. In ABC,
three groups of artificial bees are used: employed bees,
onlookers and scouts. In the modified algorithm, the first half
of the colony consists of employed bees and the second half of
onlooker bees. The number of employed bees or onlooker bees
equals the number of solutions (sets of cluster centers) in the
population.
A. Ant Colony Optimization

Ant Colony Optimization is inspired by the behaviour of an ant
colony. It was proposed by Dorigo & Di Caro [17]. Ants interact
through pheromone deposition: they lay down pheromones that
direct each other to resources while exploring their environment
in search of food. Ants select a new path by making a
probabilistic decision based on the amount of pheromone deposited
on the path. The stronger the pheromone trail on a path, the
higher the probability that the path is selected. ACO is
structured into three main functions as follows:
a) Ant Solution Construct: This process iteratively builds
solutions by moving between adjacent states of the problem.
b) Pheromone Update: This process updates the pheromone trail at
the end of every step. In addition to pheromone trail
reinforcement, ACO also includes pheromone trail evaporation.
Evaporation of the pheromone trails helps the ants to forget bad
solutions that were found in the initial stages.
c) Daemon Actions: This optional process applies additional
updates from a global perspective.
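A minimal sketch of the pheromone update (reinforcement plus evaporation) and of the probabilistic path choice is given below. It is deliberately simplified: the choice probability depends on pheromone only, whereas full ACO variants usually also weigh in heuristic information. The names `update_pheromone`, `choice_probabilities` and the evaporation rate `rho` are assumptions for illustration.

```python
def update_pheromone(tau, deposits, rho=0.1):
    """Evaporate every trail by a factor (1 - rho), then add the new deposits."""
    return {edge: (1.0 - rho) * level + deposits.get(edge, 0.0)
            for edge, level in tau.items()}

def choice_probabilities(tau, candidates):
    """Probability of choosing each candidate edge, proportional to its pheromone."""
    total = sum(tau[e] for e in candidates)
    return {e: tau[e] / total for e in candidates}

tau = {"a": 1.0, "b": 1.0}
tau = update_pheromone(tau, {"a": 0.5}, rho=0.5)  # only edge "a" is reinforced
probs = choice_probabilities(tau, ["a", "b"])
```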
Lumer and Faieta proposed the Ant Clustering Algorithm (ACA), a
modified version of ACO that is suitable for data clustering.
ACA uses a dissimilarity-based evaluation of the local density
for this purpose. Data items that are scattered within the
solution space can be picked up, transported and dropped by the
agents in a probabilistic way. The picking and dropping
operations are influenced by the similarity and density of the
data items within the ant's local neighborhood. If an ant is not
carrying an object and finds an object Xi in its neighborhood, it
picks up this object with a probability that is inversely
proportional to the number of similar objects in the
neighborhood. If, however, the ant is carrying an object x and
perceives a neighboring cell in which there are other objects,
then the ant drops the object it is carrying with a probability
that is directly proportional to the object's similarity with
the perceived ones [18].
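The pick-up and drop decisions above can be sketched with the threshold-style probability functions commonly associated with the Deneubourg basic model, combined with a dissimilarity-based local density. The exact functional forms and the constants `k1`, `k2`, `alpha` and `cells` are assumptions for illustration; the cited works may use different forms or values.

```python
import math

def local_density(item, neighbours, alpha=0.5, cells=9):
    """Dissimilarity-based density of `item` among objects in its neighbourhood."""
    total = sum(1.0 - math.dist(item, other) / alpha for other in neighbours)
    return max(0.0, total / cells)

def pick_probability(f, k1=0.1):
    """High when few similar objects are nearby (f small), low otherwise."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """Low when few similar objects are nearby, rising with the density f."""
    return (f / (k2 + f)) ** 2

isolated = local_density((0.0, 0.0), [])
crowded = local_density((0.0, 0.0), [(0.0, 0.0)] * 9)
```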
Initially, ABC generates a randomly distributed population. The
solutions are updated by the employed, onlooker and scout bees in
subsequent generations. An employed bee modifies an old solution
depending on local information and tests the fitness value of the
new solution. The employed bees then share the nectar information
about the food sources and their positions with the onlooker
bees. Each onlooker bee chooses a food source depending on the
probability value associated with that food source and, like an
employed bee, modifies it. A food source whose nectar is
abandoned by the bees is replaced with a new food source by the
scout bees. The important control parameters in ABC for
clustering are the number of food sources, which equals the
number of employed or onlooker bees; the limit, i.e. the number
of trials after which a food source is abandoned; and the
maximum cycle number.
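Two of the ABC steps described above can be sketched briefly: the probability with which an onlooker bee selects a food source, and the local modification of a solution. The sketch assumes the common formulation in which the selection probability is proportional to fitness and a single randomly chosen coordinate is perturbed relative to a partner solution; the function names are hypothetical.

```python
import random

def selection_probabilities(fitness):
    """Onlooker selection probability of each food source, proportional to fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]

def neighbour_solution(x, partner, rng=random):
    """Perturb one random coordinate of x relative to a partner solution."""
    j = rng.randrange(len(x))
    phi = rng.uniform(-1.0, 1.0)
    candidate = list(x)
    candidate[j] = x[j] + phi * (x[j] - partner[j])
    return candidate

probs = selection_probabilities([1.0, 3.0])
same = neighbour_solution([1.0, 2.0], [1.0, 2.0])  # identical partner: no change
```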
D. Bat Algorithm
The bat-inspired algorithm is another nature-inspired
optimization algorithm, developed by Xin-She Yang [23]. The
algorithm is based on the echolocation behavior of microbats,
with varying pulse rates of emission and loudness. The
echolocation process of microbats can be summarized as follows:
each virtual bat flies randomly with a velocity Vi at position
(solution) Xi with a varying frequency or wavelength and
loudness Ai. As the bats search for and find their prey, they
change frequency, loudness and pulse emission rate r. The search
is intensified by a local random walk. Selection of the best
solution continues until the stopping criteria are met. The
algorithm essentially uses a frequency-tuning mechanism to
control the dynamic behavior of a swarm of bats. The
B. Particle Swarm Optimization

Particle swarm optimization (PSO) is a swarm based global
optimization technique proposed by Kennedy and Eberhart
[19]. It is inspired by the social behavior of birds flocking
objective function domain. In the movement phase, each glowworm
moves towards a neighbor that has a luciferin value higher than
its own; the neighbor is chosen using a probabilistic mechanism.
Glowworms are thus attracted to neighbors that glow brighter
[28]. The clustering GSO algorithm was tested on standard
datasets available at the UCI repository [29].
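The luciferin update and the probabilistic choice of a brighter neighbor described for GSO can be sketched as follows. The decay constant `rho` and enhancement constant `gamma`, together with their default values, are assumptions for illustration; the neighborhood is passed in as a plain dictionary of luciferin levels.

```python
def luciferin_update(luciferin, fitness, rho=0.4, gamma=0.6):
    """Decay the previous luciferin level and add a quantity proportional to fitness."""
    return (1.0 - rho) * luciferin + gamma * fitness

def move_probabilities(own, neighbours):
    """Probability of moving towards each neighbour that glows brighter than `own`."""
    brighter = {j: level - own for j, level in neighbours.items() if level > own}
    total = sum(brighter.values())
    return {j: diff / total for j, diff in brighter.items()}

level = luciferin_update(5.0, 10.0)
probs = move_probabilities(1.0, {"a": 2.0, "b": 4.0, "c": 0.5})
```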
exploration and exploitation are balanced by tuning
algorithm-dependent control parameters [23].

The traditional bat algorithm is modified for clustering by
randomly assigning k cluster centroids to each of the N bats. In
the next stage, the fitness of the centroids in each bat is
computed. Data items are placed in the appropriate cluster based
on the fitness value of the centroids in a bat. In successive
generations, a new solution is generated by adjusting the
frequency, updating the velocity and creating new centroid
values. For each bat, the best solution is selected from the set
of best solutions of the other bats. When accepting a new
solution, an increase in frequency and a reduction in loudness
are considered. Based on the newly selected solution, the
clusters are reassigned and the centroids are updated [24].
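The frequency adjustment and velocity update mentioned above are usually written as in the standard bat algorithm; the sketch below shows that update for one bat, whose position would encode a set of centroids in the clustering variant. The frequency bounds `f_min` and `f_max` and the function name `bat_step` are assumptions for illustration, and the loudness and pulse-rate acceptance rule is omitted.

```python
import random

def bat_step(x, v, best, f_min=0.0, f_max=2.0, rng=random):
    """One frequency-tuned move of a bat relative to the current best solution."""
    beta = rng.random()
    freq = f_min + (f_max - f_min) * beta           # frequency tuning
    new_v = [vi + (xi - bi) * freq for xi, vi, bi in zip(x, v, best)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]   # position (centroid values) update
    return new_x, new_v, freq

# A bat located exactly at the best solution with zero velocity stays in place.
pos, vel, freq = bat_step([1.0, 2.0], [0.0, 0.0], [1.0, 2.0])
```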
III. APPLICATIONS OF NATURE-INSPIRED TECHNIQUES IN DATA CLUSTERING
Nature inspired algorithms known as Swarm Intelligence (SI) have
attracted many researchers from the fields of engineering and
technology. Swarm based techniques have been reported to perform
better than classical problem solving techniques on pattern
recognition and clustering problems. These algorithms are
inspired by the collective intelligence that emerges from the
behavior of a group of social insects. Such insects must find and
store food and choose materials for future use, and insect
colonies solve these problems without any kind of supervisor or
controller. This section discusses several application areas
where nature inspired clustering techniques have been applied:
image segmentation, document retrieval and web mining.
E. Firefly Algorithm
The firefly algorithm proposed by Yang [25] is a swarm-based
algorithm inspired by the flashing behavior of fireflies. The
algorithm is a population-based iterative procedure with
numerous agents acting as fireflies. The agents communicate with
each other via bioluminescent glowing, which enables them to
explore the cost function space more effectively than a standard
distributed random search. The standard firefly algorithm has
three rules as follows:
Clustering in image segmentation is defined as the process of
identifying groups of similar objects. The goal of segmentation
is to simplify and/or change the representation of an image into
something more meaningful and easier to analyze. More precisely,
image segmentation is the process of assigning a label to every
pixel in an image such that pixels with the same label share
certain visual characteristics [30].
1) All fireflies are unisex and they will move towards more
attractive and brighter ones regardless of their sex.
2) The degree of attractiveness of a firefly is proportional
to its brightness.
3) The brightness of a firefly is determined by the value of
the objective function of a given problem.
The standard firefly algorithm is modified so that it can be
applied to the clustering problem. The modified firefly
algorithm has two stages [25]:
Das et al. presented a nature inspired metaheuristic approach in
which the firefly algorithm, combined with unsupervised
learning, effectively identifies the problem region in
mammographic images [31]. Sag and Cunkas [32] compared four
different clustering algorithms, namely k-Means, fuzzy c-means
(FCM), PSO and ABC, for synthetic and satellite image
segmentation. Quantization error is used to measure the quality
of the clusters. Ouadfel and Meshoul [33] proposed a hybrid of
the fuzzy c-means algorithm and the artificial bee colony
algorithm for clustering-based image segmentation. The
limitations of the fuzzy c-means algorithm are handled by using
a new mutation strategy based on differential evolution in order
to improve the exploitation process.
a) Initialize the fireflies with random values, with the
Euclidean distance acting as the objective function.
b) The clustering process is initialized with the position
of the best firefly and then refines the centers.
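The three rules of the standard firefly algorithm lead to a movement step in which a firefly is drawn towards a brighter one with an attractiveness that decays with distance. The sketch below uses the commonly quoted exponential form; the parameters `beta0`, `gamma` and `alpha` and their default values are assumptions for illustration and are not taken from this paper.

```python
import math
import random

def attractiveness(r, beta0=1.0, gamma=1.0):
    """Attractiveness seen at distance r; largest (beta0) at r = 0."""
    return beta0 * math.exp(-gamma * r * r)

def move_towards(xi, xj, alpha=0.2, beta0=1.0, gamma=1.0, rng=random):
    """Move firefly xi towards the brighter firefly xj, plus a small random step."""
    beta = attractiveness(math.dist(xi, xj), beta0, gamma)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]

# With no randomness and no distance decay the firefly lands on the brighter one.
moved = move_towards([0.0, 0.0], [2.0, 0.0], alpha=0.0, gamma=0.0)
```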
F. Glowworm Swarm Optimization
Glowworm swarm optimization (GSO) is an effective technique for
solving optimization problems and was recently introduced for
data clustering problems [26]-[28]. The data clustering problem
is formulated as an optimization problem that finds the optimal
centroids of the clusters rather than the optimal data
partitions. GSO can find multiple optimal solutions, which may
or may not have equal values of the objective function.
Clustering the search results of an information retrieval system
helps the user to obtain more appropriate data from a search. It
also helps to improve the efficiency of search engines. Document
clustering is a fundamental operation in unsupervised document
organization, topic extraction and information retrieval [34].
Cluster-based information retrieval retrieves one or more
clusters in their entirety in response to a query. The task of
the retrieval system is to match the query against clusters of
documents instead of individual documents, and to rank the
clusters based on their similarity to the query. Any document
from a higher ranked cluster is considered more likely to be
relevant than any document from a lower ranked cluster on the
list. The PSO clustering algorithm is used to discover the
proper cluster centroids by minimizing the intra-cluster
distance as well as maximizing the distance between clusters.
The entire clustering
The clustering GSO consists of three main phases: the
initialization phase, the luciferin level update and the
glowworm movement. In the initialization phase, the number of
dimensions, the number of glowworms and the maximum number of
iterations are initialized. In addition, for each glowworm gj, a
random position vector (pj) is generated using uniform
randomization within the given search space. The luciferin
update depends on the glowworm position. During the
luciferin-update phase, each glowworm adds, to its previous
luciferin level, a luciferin quantity proportional to the
fitness of its current location in the
problem, the authors proposed a hybrid fuzzy clustering method
based on fuzzy C-means and fuzzy particle swarm optimization
(FPSO) which makes use of the strengths of both algorithms. The
Euclidean distance measure is used to calculate the distance
from an object to a cluster center. Experiments were carried out
with six well known datasets: Iris, Glass, Wisconsin breast
cancer, Wine, Contraceptive Method Choice and Vowel [42]. Dai et
al. proposed an improved ant colony based clustering algorithm
which makes use of the basic model of Deneubourg and the LF
algorithm. It enables the ants to consult historical information
when conveying objects by introducing an adjusting process and
a short-term memory. The results show improvements in the
convergence speed of the algorithm and in the efficiency of the
clustering [43].
behavior of the PSO clustering algorithm can be divided into two
stages: a global searching stage and a local refining stage. The
global searching stage ensures that the particles search the
whole problem space. The refining stage makes all particles
converge to near the optimal solution [35].
The ant colony approach is used to solve unsupervised clustering
and the data retrieval problem. The ant based clustering
algorithm performed better than the traditional partitioning
algorithm for text document clustering [36], [37]. When a
multiple ant colony approach is used for data clustering, it
outperforms the single ant colony approach as well as the
k-Means algorithm, because the multiple ant colony approach
involves the parallel engagement of several individual ant
colonies [38].
Rana et al. proposed a hybrid sequential approach for data
clustering using k-Means and the particle swarm optimization
algorithm [44], [45]. In that work, the initialization problem of
partition based clustering methods is overcome by hybridizing
PSO with the k-Means clustering algorithm. The motivation for
this hybridization is that PSO is used at the initial stage of
the clustering process because of its fast convergence speed,
and the result of PSO is then tuned by k-Means to a near-optimal
solution. Jinfeng Ding et al. used ant swarm intelligence to
obtain a global search [46]. The authors used fuzzy logic to
locate objects in clusters by updating pheromones according to
the total cluster variance. The ACO algorithm gives better
performance than genetic algorithms (GAs) and the
self-organizing map (SOM).
Web mining is one of the challenging data mining tasks. In web
mining, the web structures and the regularity and dynamics of
web contents are identified to determine web access patterns.
In web mining, data can be collected at the client side, the
server side and proxy servers, or obtained from an
organization's database. Web mining is broadly categorized into
three classes, namely web content mining, web usage mining and
web structure mining [39].
Web content mining is the process of extracting and integrating
useful data, information and knowledge from web page content.
Web content mining has two different views: the Information
Retrieval View and the Database View. Web usage mining is the
process of extracting useful information from server logs. Usage
data captures the identity or origin of web users along with
their browsing behavior at a web site. In web structure mining,
graph theory is used to analyze the node and connection
structure of a web site. Document structure mining is the
analysis of the tree-like structure of a page to describe HTML
or XML tag usage. G. Sudhamathy [39] presented a review of
different techniques used for clustering, temporal cluster
migration, fuzzy clustering and the PSO approach for web usage
mining. Karol and Mangat [40] discussed the PSO algorithm for
web usage mining, where PSO is used for finding web usage
patterns, data feature extraction and web service selection.
Abraham and Ramos [41] proposed an accurate trend prediction
model based on ANT-LGP to analyze the hourly and daily web
traffic volume.
Jiacai and Ruijun proposed an extended fuzzy k-means (xFKM)
algorithm for clustering categorical valued data, in which the
cluster centroid vectors are represented as clustering
information. To solve the problem of clustering categorical
data, Ralambondrainy presented the Conceptual Version of the
k-means (CVKM) algorithm, which converts multiple categorical
attributes into binary attributes: one represents the presence
of a category and zero its absence. These binary attributes are
used as input values in the k-means algorithm [47]. In [48],
fuzzy C-means and k-means are hybridized with PSO. These
approaches are compared with traditional approaches using the
evaluation measures Entropy and F-measure.
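The conversion of categorical attributes into binary attributes used by CVKM can be illustrated with a minimal sketch. The function name `to_binary`, the column ordering and the sample records are assumptions for illustration; the cited work may organize the binary attributes differently.

```python
def to_binary(records):
    """Turn categorical records into 0/1 vectors, one column per (attribute, category)."""
    columns = sorted({(i, value) for rec in records for i, value in enumerate(rec)})
    rows = [[1 if rec[i] == value else 0 for i, value in columns] for rec in records]
    return columns, rows

# Hypothetical records with two categorical attributes (colour, size).
columns, rows = to_binary([("red", "s"), ("blue", "s")])
```

The resulting rows are purely numeric and can be passed to a k-means implementation in place of the original categorical values.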
Rehab and Kader [49] proposed a two phase algorithm for data
clustering. In the first phase, genetically improved PSO
(GAI-PSO) combines the standard velocity and position update
rules of PSO with the ideas of selection, mutation and crossover
from GAs. The GAI-PSO algorithm searches the solution space to
find an optimum initial seed for the second phase. In the second
phase, the k-means algorithm can then converge efficiently to
the optimum solution. The experiments were conducted on the Iris
and Wine datasets.
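The second phase of such sequential hybrids, in which k-means refines seed centroids delivered by a swarm based search, can be sketched as follows. The seeds here are hypothetical values standing in for the output of the first phase; the function name `kmeans_refine` and the iteration cap are assumptions for illustration.

```python
import math

def kmeans_refine(data, seeds, iterations=20):
    """Refine seed centroids with standard k-means until they stop changing."""
    centroids = [tuple(s) for s in seeds]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in data:
            k = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            groups[k].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        updated = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[k]
                   for k, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
seeds = [(1.0, 1.0), (8.0, 1.0)]  # hypothetical output of the swarm-based first phase
refined = kmeans_refine(data, seeds)
```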
IV. HYBRIDIZATION OF NATURE INSPIRED TECHNIQUES WITH TRADITIONAL CLUSTERING TECHNIQUES
The purpose of hybridization is to overcome the limitations of
one problem solving technique by using another technique, and
thereby to strengthen the algorithm. To achieve better
performance in data clustering applications, researchers are
working on different traditional clustering techniques as well
as on nature inspired or SI techniques. Recently, researchers
have been drawn to the hybridization of traditional and nature
inspired techniques for better performance. Because the k-Means
algorithm performs well among clustering techniques, researchers
have focused on hybridizing k-Means with nature inspired
techniques. Some of these hybrid techniques are presented here.
Krishnamoorthi and Natarajan [50] proposed an algorithm which
uses a fuzzy C-means operator in the artificial bee colony
algorithm. The FCM technique is used in the scout bee phase of
ABC, where scout bees are introduced by the FCM operator.
Krishnaveni and Arumugam [51] proposed the GHSBEEK clustering
algorithm, a combination of Global best Harmony Search (GHS)
with features of the artificial bee colony (ABC) and k-means
algorithms. Mohammad Ali Shafia et al. [52] proposed population
based
Izakian and Abraham proposed combining fuzzy c-means and fuzzy
swarm to solve the fuzzy clustering problem. To solve the
clustering
be faster, more efficient and more robust. It is also observed

from literature that knowledge based clustering has not been
applied to large data sets or in domains with large knowledge
bases. From our study, we recommend that hybrid clustering
techniques can be applied to large datasets or knowledge bases.
hybrid Genetic Bee Tabu k-means Clustering algorithms (GBTKC).
GBTKC is based on the Honey Bee Algorithm and combines genetic,
tabu and k-means components, adding the benefits of the k-means
algorithm to improve efficiency. Sood and Bansal [53] proposed a
new technique in which the bat algorithm is combined with the
k-medoid clustering algorithm to overcome the limitations of the
partition based k-medoid clustering algorithm. Abshouri et al.
[54] proposed a new hybrid clustering algorithm of firefly and
k-Harmonic Mean (FFAKHM). In this work, the drawback of
k-Harmonic Mean is overcome by using the firefly clustering
algorithm. The results are compared against the KHM, PSO-KHM and
PSO-Genetic-KHM algorithms. Hashmi et al. [55] presented a
comparative analysis of swarm algorithms, namely PSO, the bat
algorithm, the bees algorithm, the cuckoo-search algorithm, the
artificial fish school algorithm and the firefly algorithm, used
to overcome the limitations of the k-means algorithm. Further
hybrid techniques, in which evolutionary approaches, heuristic
approaches and local search methods are used with traditional
techniques, are available in the literature [56], [57].
REFERENCES
Table I summarizes the data clustering techniques together with
their important features or purpose.
TABLE I. SUMMARY OF DATA CLUSTERING TECHNIQUES

| Algorithm | Purpose/feature | Author & Year |
|---|---|---|
| Improved ACO | To improve performance based on Deneubourg model & LF algorithm | Weihui Dai (2009) |
| FCM & fuzzy PSO | Hybridized to solve fuzzy clustering problem | Izakian et al. (2010) |
| Hybrid k-means & PSO | Results of PSO are tuned by the k-Means to near optimal solutions | Sandeep Rana (2010) |
| GAIPSO | For efficient convergence GAIPSO hybrid with k-means | Rehab & Kader (2010) |
| GBTKC | Hybridization of Bee Algorithm, GA, Tabu and k-means used for improving efficiency of k-means | Mohammad Ali Shafia et al. (2011) |
| GHSBEEK clustering | ABC algorithm applied to improve convergence rate of Harmony Search | V. Krishnaveni et al. (2012) |
| Firefly and KHM | To overcome drawbacks of k-Harmonic Mean | Abshouri et al. (2012) |
| FCM-PSO | Used to measure performance over Entropy and F-measures | Karol and Mangat (2013) |
| k-means-PSO | Performance measured over Entropy and F-measures | Karol and Mangat (2013) |
| FCM-ABC | FCM operator is used to introduce scout bee phase of the ABC | Krishnamoorthi et al. (2013) |
| k-medoid with Bat Algorithm | To overcome the drawbacks of partition based k-medoid algorithm | Sood and Bansal (2013) |
| Hybrid SI based approach | Hybridizing ABC, ACO, PSO, Cuckoo search, Bat Algorithm with k-means | Adil Hashmi et al. (2013) |

V. CONCLUSION
Nature inspired techniques are widely used to solve complex
optimization problems. Data clustering is a widely used data
mining tool. Clustering is a subjective process: the same set of
data items may be partitioned differently depending on the
application. Consequently, no single algorithm or approach is
adequate to solve every clustering problem. Nature inspired
techniques play an important role in finding solutions to these
problems due to their special characteristics, such as
self-organized, decentralized and collective behavior. It is
advantageous to hybridize nature inspired techniques with
traditional techniques so that the resulting techniques can
[1] Margaret H. Dunham, Handbook on Data Mining - Introductory
and Advanced Topics, Eighth Impression, Dorling Kindersley
India Pvt. Ltd, Pearson Education, 2006.
[2] Anil K. Jain, Data Clustering: 50 years beyond k-means, Pattern
Recognition Letters, 2009.
[3] Vladimir Estivill-Castro, Why so many clustering algorithms: a
position paper, SIGKDD Explorations, vol. 4, issue 1, January
2002, pp. 65-75.
[4] Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim, CURE: An
efficient clustering algorithm for large databases, Information
Systems, vol. 26, no. 1, pp. 35-58, 2001.
[5] Tian Zhang, Raghu Ramakrishnan and Miron Livny,
BIRCH: a new data clustering algorithm and its applications,
Data Mining and Knowledge Discovery, vol. 1, pp. 141-182, 1997.
[6] L. Kaufmann and P. Rousseeuw, Clustering by means of
medoids, Elsevier Science, pp. 405-416, 1987.
[7] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu,
Christine D. Piatko, Ruth Silverman, and Angela Y. Wu, An
efficient k-means clustering algorithm: analysis and
implementation, IEEE Trans. Pattern Anal. Mach. Intell., vol. 24,
no. 7, pp. 881-892, 2002.
[8] Z. Huang and M. Ng, A fuzzy k-modes algorithm for clustering
categorical data, IEEE Trans. Fuzzy Systems, pp. 446-452,
1999.
[9] Rui Xu and Donald Wunsch II, Survey of clustering algorithms,
IEEE Transactions on Neural Networks, vol. 16, issue 3,
pp. 645-678, May 2005.
[10] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu, A
density-based algorithm for discovering clusters in large spatial
databases with noise, AAAI Press, pp. 226-231, 1996.
[11] R. Agrawal, J. Gehrke, and D. Gunopulos, Automatic subspace
clustering of high dimensional data for data mining applications,
In Proceedings of the ACM-SIGMOD'98 International Conference
on Management of Data, Seattle, WA, 1998, pp. 94-105.
[12] B. Andreopoulos, A. An, and X. Wang, Hierarchical
density-based clustering of categorical data and a simplification,
In Proceedings of the 11th Pacific-Asia Conference on Knowledge
Discovery and Data Mining, Nanjing, China, Springer LNCS,
2007, pp. 11-22.
[13] Osama Abu Abbas, Comparison between data clustering
algorithm, The International Arab Journal of Information
Technology, vol. 5, no. 3, pp. 320-325, July 2008.
[14] M. Sarkar and T. Y. Leong, Fuzzy k-means clustering with
missing values, In Proc. of AMIA Symposium, 2001, pp. 588-592.
[15] Hidetomo Ichihashi and Katsuhiro Honda, Fuzzy c-means
classifier for incomplete datasets with outliers and missing values,
In Proceedings of International Conference on Computational
Intelligence for Modelling, 2005, pp. 457-464.
[16] Margareta Ackerman and Shai Ben-David, Which data sets are
clusterable - A theoretical study of clusterability, 2008, pp. 1-8.
[17] Priya Vaijayanthi, Natarajan A M and Raja Murugadoss, Ants for
document clustering, International Journal of Computer Science
Issues, vol. 9, issue 2, no. 2, pp. 493-499, March 2012.
[18] Lumer E. and Faieta B., Diversity and Adaptation in Populations
of Clustering Ants, In Proceedings of Third International
Conference on Simulation of Adaptive Behavior: From Animals to
Animats, Cambridge, Massachusetts, MIT Press, pp. 499-508.
[19] Kennedy J, Eberhart R. C. and Shi Y, Swarm intelligence, The
Morgan Kaufmann Series in Evolutionary Computation, Edition:
1, New York, 2001.
[41] Ajith Abraham and Vitorino Ramos, Web usage mining using
artificial ant colony clustering and genetic programming, In
Procedding of IEEE International Conference on Fuzzy System,
2003, pp. 1384-1391.
[42] Hesam Izakian and Ajith Abraham., Fuzzy c-means and fuzzy
swarm for fuzzy clustering problem, Expert Systems with
Applications, vol. 38, pp. 1835-1838, 2010.
[43] Weihui Dai, An improved ant colony optimization cluster
algorithm based on swarm intelligence, Journal of Software, vol.
4, no. 4, pp. 299-306, June 2009.
[44] Sandeep Rana, Sanjay Jasola and Rajesh Kumar, A hybrid
sequential approach for data clustering using k-means and particle
swarm optimization algorithm, International Journal of
Engineering, Science and Technology, vol. 2, no. 6, pp. 167-176,
2010.
[45] Sandeep Rana, Sanjay Jasola and Rajesh Kumar, A review on
particle swarm optimization algorithms and their applications to
data clustering, Springer Science Business Media, pp. 211-222,
2010.
[46] Jinfeng Ding, Jingbo Shao, Yuyan Huang, Linyang Sheng, Wei Fu
and Yingmei Li, Swarm intelligence based algorithms for data
clustering, In Proceedings of International Conference on
Computer Science and Network Technology, 2011, pp. 577-581.
[47] Wang Jiacai and Gu Ruijun, An extended fuzzy k-means
algorithm for clustering categorical valued data, In Proceedings of
IEEE International Conference on Artificial Intelligence and
Computational Intelligence, 2011, pp. 504-507.
[48] Stuti Karol and Veenu Mangat, Evaluation of text document
clustering approach based on particle swarm optimization, Cent.
Eur. J. Comp. Sci., vol. 3, no. 2, pp. 69-90, 2013.
[49] Rehab F. Abdel-Kader, Genetically improved PSO algorithm
for efficient data clustering, In Proceedings of Second IEEE
International Conference on Machine Learning and Computing,
2010, pp. 71-75.
[50] M. Krishnamoorthi and A.M. Natarajan, Artificial Bee Colony
Algorithm Integrated With Fuzzy c-Mean Operator for Data
Clustering, Journal of Computer Science, vol. 9, issue 4, pp. 404-412, 2013.
[51] V. Krishnaveni, and G. Arumugam, The Performance Analysis of
a Novel Enhanced Artificial Bee Colony Inspired Global Best
Harmony Search Algorithm for Clustering, Springer-Verlag,
Berlin Heidelberg, pp. 21-28, 2012.
[52] Mohammad Ali Shafia, Mohammad Rahimi Moghaddam, and
Rozita Tavakolian, A hybrid algorithm for data clustering using
honey bee algorithm, genetic algorithm and k-means method,
Journal of Advanced Computer Science and Technology Research,
pp. 110-125, 2011.
[53] Monica Sood and Shilpi Bansal, k-Medoids Clustering Technique
using Bat Algorithm, International Journal of Applied
Information Systems, vol. 5, no. 8, pp. 19-22, June 2013.
[54] Abshouri, Azam Amin, Bakhtiary and Alireza, A new clustering
method based on firefly and KHM, Journal of Communication
and Computer, vol. 9, issue 4, pp. 387-391, Apr. 2012.
[55] Adil Hashmi, Divya Gupta, Yash Upadhyay and Shruti Goel,
Swarm intelligence based approach for data clustering,
International Journal of Innovative Research and Studies, vol. 2,
issue 6, pp. 572-589, June 2013.
[56] Y. Kim, W. N. Street, and F. Menczer, Feature selection in
unsupervised learning via evolutionary searching, In Proceedings
of 6th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 2000, pp. 365-369.
[57] Eduardo Raul, Ricardo J. G. B. Campello, Alex A. Freitas, and
Andre C. Ponce, A survey of evolutionary algorithms for data
clustering, IEEE Transactions on Systems, Man, and
Cybernetics, Part C: Applications and Reviews, vol. 39, no. 2, pp.
133-155, March 2009.
[20] C. A. Dhote, Anuradha Thakare, and Shruti Chaudhari, Data
clustering using particle swarm optimization and bee algorithm,
Fourth International Conference on Computing, Communications
and Networking Technologies, July 2013, pp. 1-5.
[21] D. Karaboga, An idea based on honey bee swarm for numerical
optimization, Technical Report-TR06, Erciyes University,
Engineering Faculty, Computer Engineering Department, 2005.
[22] Dervis Karaboga and Celal Ozturk, A novel clustering approach:
artificial bee colony (ABC) algorithm, Applied Soft Computing,
pp. 652-657, 2011.
[23] X. S. Yang, Bat algorithm: literature review and applications,
International Journal of Bio-Inspired Computation, vol. 5, no. 3,
pp. 141-149, 2013.
[24] X. S. Yang, Handbook on Nature-Inspired Metaheuristic
Algorithms, Luniver Press, ISBN 1-905986-10-6, pp. 79-89.
[25] Rui Tang, Simon Fong, Xin-She Yang, and Suash Deb,
Integrating Nature-inspired Optimization Algorithms to k-means
Clustering, Seventh International Conference on Digital
Information Management, August 2012, pp. 116-123.
[26] Yongquan Zhou, Qifang Luo and Jiakun Liu, Glowworm swarm
optimization for optimization dispatching system of public transit
vehicles, Neural Processing Letters, Springer Science Business
Media, New York, 2013.
[27] K. Krishnanand and D. Ghose, Glowworm swarm optimization: a
new method for optimizing multi-modal functions, International
Journal of Computational Intelligence Studies, vol. 1, pp. 93-119,
2009.
[28] Ibrahim Aljarah and Simone A. Ludwig, A new clustering
approach based on glowworm swarm optimization, IEEE
Congress on Evolutionary Computation, Cancun, Mexico, June
2013, pp. 2642-2649.
[29] http://archive.ics.uci.edu/ml/datasets.html
[30] Ritu Agrawal and Manisha Sharma, A comprehensive survey of
contemporary researches on image segmentation through
clustering, Journal of Computer Science and Engineering
Research and Development, vol. 1, pp. 30-35, May-October 2011.
[31] Goutam Das, Md. Iqbal Quraishi and Manisha Barman, Firefly
algorithm based mammographic image analysis, Asian Journal of
Computer Science and Information Technology, 3-4, pp. 56-59,
2013.
[32] Tahir Sag and Mehmet Cunkas, ABC - based clustering algorithm
for image segmentation, ICCIT 2012, pp. 95-100.
[33] Salima Ouadfel and Souham Meshoul, Handling fuzzy image
clustering with a modified ABC algorithm, International Journal
of Intelligent Systems and Applications, vol. 12, pp. 65-74, 2012.
[34] Xiaohui Cui, Thomas E. Potok and Paul Palathingal, Document
clustering using particle swarm optimization, 16th IEEE
International Conference on Tools with Artificial Intelligence Boca
Raton, USA, 2004, pp. 1-7.
[35] Xiaohui Cui, Thomas E. Potok and Paul Palathingal, Document
clustering using particle swarm optimization, Westin IEEE Swarm
Intelligence Symposium, 2005, pp. 185-191.
[36] V. Ramos and J. J. Merelo, Self organized stigmergic document
maps: environment as mechanism for context learning, In
Proceedings of Evolutionary and Bio-inspired Algorithms
Conference, 2002, pp. 284-293.
[37] P. S. Shelokar, V. K. Jayaraman and B. D. Kulkarni, An ant
colony algorithm for clustering, Analytica Chimica Acta, vol. 509,
no. 2, pp. 187-195, 2004.
[38] Yan Yang and Mohamed S. Kamel, An aggregated clustering
approach using multi-ant colonies algorithms, Pattern
Recognition, vol. 39, no. 7, pp. 665-671, 2006.
[39] G. Sudhamathy, Web log clustering approaches - a survey,
International Journal on Computer Science and Engineering, vol. 3,
no. 7, pp. 2896-2903, July 2011.
[40] Stuti Karol, and Veenu Mangat, Survey on particle swarm
optimization based web mining, Journal of Information and
Operations Management, vol. 3, issue 1, pp. 273-276, 2012.