
Volume: 04, December 2014,

Pages: 418-424

International Journal of Data Mining Techniques and Applications


ISSN: 2278-2419

A Study for the Discovery of Web Usage Patterns Using Soft Computing Based Data Clustering Techniques

1Mohammed Tajuddin, 2Zahid Ahmed Ansari, 3Syed A. Sattar

1Department of CSE, Shaaz College of Engineering and Technology, Hyderabad, India
2Department of CSE, P.A. College of Engineering, Mangalore, India
3Department of CSE, Royal Institute of Technology, Hyderabad, India

1mdtaj.syd@gmail.com, 2zahid_cs@pace.edu.in, 3syedabdulsattar1965@gmail.com

Abstract
Due to the continuous proliferation of e-commerce and Web information systems, site owners face intense competition in attracting and retaining users. In today's highly competitive e-commerce environment, the success of a site depends on its ability to retain visitors and turn casual browsers into potential customers. Web servers of e-commerce sites accumulate huge volumes of user web activity logs. In this work, we present a k-Means clustering based approach for the discovery of web user session clusters. Since web usage data usually involves imperfection and uncertainty, we also review various soft computing techniques for dealing with such data. Because web usage data is usually very large, we also briefly discuss a few parallel computing options that may be used to speed up the web usage mining process.
Keywords-Web Usage Mining; k-Means Algorithm; Soft Computing Techniques; OpenMP.
I. INTRODUCTION

Due to the digital revolution and advances in computer hardware and software technologies, digitized information is easy to capture and fairly inexpensive to store. As a result, huge amounts of data have been collected and stored in databases, and the rate at which such data is stored is growing phenomenally. The fast-growing volume of data collected in large and numerous repositories has far exceeded the human ability to comprehend it without powerful tools. This abundance of data, coupled with the need for powerful data analysis tools, has been described as a data-rich but information-poor situation. Hence, there is an urgent need for a new generation of computational techniques and tools to assist humans in extracting useful knowledge from the rapidly growing volumes of data. Today, information overload is as much a problem as information shortage. In our daily activities we often deal with flows of data much larger than we can understand and use. Thus we need a way to sift those data to extract what is interesting and relevant for our activities. Knowledge discovery in large data repositories can find what is interesting in them and represent it in an understandable way [1]. Data mining refers to extracting, or mining, knowledge from large amounts of data. It is an essential process in which intelligent methods are applied in order to extract data patterns. Necessity is the mother of invention.

Integrated Intelligent Research (IIR)

The literature on data mining techniques introduces a new generation of data mining tools and shows how to use them in order to make better business decisions [2]. Data mining can be viewed as a natural evolution of information technology. The major reason that data mining has attracted a great deal of attention in the information industry in recent years is the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge. The information and knowledge gained can be used for applications ranging from business management, production control and market analysis to engineering design and science exploration [3].
Clustering is the unsupervised classification of data items into clusters [4]. Data clustering identifies sparse and crowded regions, and hence discovers the overall distribution patterns of the dataset [5]. Web usage mining [6], [7] is described as the automatic discovery and analysis of patterns in web logs and associated data collected as a result of user interactions with web resources on one or more websites. The goal of web usage mining is to capture, model and analyze the behavioural patterns and profiles of users interacting with a website. The discovered patterns are usually represented as collections of URLs that are frequently accessed by groups of users with common interests. Web usage mining has been used in a variety of applications such as i) web personalization systems [8], ii) adaptive web sites [9], [10], iii) system improvement, where an understanding of web traffic behaviour is used to decide strategies for web caching [11], load balancing and data distribution [12], iv) fraud detection, i.e. detection of unusual accesses to secured data [13], and v) business intelligence [14].
The practice of electronic commerce has been in existence since 1965, when consumers were able to withdraw money from automatic teller machines (ATMs) and make purchases through point-of-sale terminals [15]. Websites were designed as marketing tools to reach consumers outside the firm; in the past few years the web platform has transformed itself from a mere marketing presence into a platform that can support all facets of organizational work [16].

Due to the continuous proliferation of e-commerce and Web information systems, site owners face intense competition in attracting and retaining users. In today's highly competitive e-commerce environment, the success of a site depends on its ability to retain visitors and turn casual browsers into potential customers [7]. Web servers of e-commerce sites accumulate huge volumes of user web activity logs. Mining web logs is critical to discovering users' web usage patterns, which can assist in i) designing more attractive web sites, ii) providing personalized information to customers, and iii) making e-business decisions.

Web access logs involve huge data files that are rarely perfect and contain missing, inaccurate, noisy or inconsistent data. Such imperfect, and hence corrupted, data often negatively impact data interpretation, model construction and decision-making processes, resulting in inferences that are not trustworthy. Soft computing is a consortium of methodologies that work synergistically and provide, in one form or another, flexible information processing capability for handling real-life ambiguous situations [17]. Due to the high dimensionality and large data volumes of web access logs, web usage mining is a computationally intensive and time-consuming process. For this reason, several web usage mining algorithms have been implemented on parallel computing platforms to achieve high performance in the analysis of large datasets. Parallel data clustering techniques offer an efficient way to discover data clusters from very large web log data sets by exploiting parallel computing concepts [2].

The rest of the paper is organized as follows. Section II discusses the issues related to the discovery of web usage patterns from huge web logs containing irrelevant entries and outliers. Section III discusses web usage mining using the k-Means clustering algorithm. Section IV describes soft computing methodologies for dealing with data imperfection in web logs. Section V briefly discusses high performance computing techniques that may be used to speed up the web usage mining process, and conclusions are presented in Section VI.

II. ISSUES RELATED TO WEB USAGE DATA

Web usage data such as web access logs are usually semi-structured and contain imprecise or incomplete information. For a variety of reasons inherent to web browsing, outliers and incomplete data can occur in a web usage data set. The presence of significant noise and outliers requires techniques that are robust against noise. Web usage data usually do not have crisp boundaries; they are therefore described more accurately by overlapping clusters than by crisp clusters. This implies that soft computing approaches can be useful instruments for mining knowledge from such data [2]. Because the data size is huge, performance degrades; parallel data clustering techniques are therefore used.

III. WEB USAGE MINING

In order to extract common usage patterns from web access logs, data clustering techniques can be very useful tools.


Figure 1: Web Usage Mining Process


The main steps involved in web usage mining are i) web log preprocessing, ii) web usage pattern discovery and iii) pattern analysis. The web usage mining process is shown in Fig. 1. Web log data preprocessing deals with the transformation of raw web log data into numeric user session vectors. The primary data sources used in web usage mining are the server log files, which include web server access logs and application server logs. A sample web server log file entry in Extended Common Log Format (ECLF) is given in Fig. 2 and a description of the various fields is given in Table I.

Figure 2. A Sample Web Log Entry.

TABLE I. DESCRIPTION OF LOG FIELDS

| Field Value | Description |
|---|---|
| 1212265085.247 | The time of request, in coordinated universal time |
| 741 | The elapsed time for HTTP request |
| 192.168.23.62 | IP address of the client |
| TCP_MISS/200 | HTTP reply status code |
| 10858 | Bytes sent by the server in response to the request |
| GET | The requested action |
| http://www.pace.edu.in/index.php | URI of the object being requested |
| - | Client user name; if disabled, it is logged as - |
| DEFAULT_PARENT/192.168.20 | Hostname of the machine where we got the object |
|  | Content type of the object |
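As an illustration of how the fields in Table I might be pulled out of a raw log line, the following minimal Python sketch splits a whitespace-separated entry into named fields. The sample line is assembled from the field values listed in Table I; the field names, the `parse_entry` helper and the `text/html` content type are assumptions made for this example and do not come from the original log.

```python
from datetime import datetime, timezone

# Field order follows Table I; the names themselves are illustrative.
FIELDS = ["timestamp", "elapsed", "client_ip", "result_status", "bytes",
          "method", "url", "user", "peer", "content_type"]

def parse_entry(line):
    """Split one whitespace-separated log line into a dict of typed fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError("unexpected number of fields: %d" % len(parts))
    entry = dict(zip(FIELDS, parts))
    # The timestamp is seconds since the Unix epoch, in UTC.
    entry["time"] = datetime.fromtimestamp(float(entry["timestamp"]), tz=timezone.utc)
    entry["elapsed"] = int(entry["elapsed"])
    entry["bytes"] = int(entry["bytes"])
    # "TCP_MISS/200" -> result code "TCP_MISS" and HTTP status 200
    result, status = entry["result_status"].split("/")
    entry["result"] = result
    entry["http_status"] = int(status)
    return entry

# Values taken from Table I; "text/html" is an assumed placeholder because
# the content type value is not shown in the table.
sample = ("1212265085.247 741 192.168.23.62 TCP_MISS/200 10858 GET "
          "http://www.pace.edu.in/index.php - DEFAULT_PARENT/192.168.20 text/html")
entry = parse_entry(sample)
print(entry["client_ip"], entry["http_status"], entry["url"])
```

Converting the numeric fields at parse time keeps the later cleaning steps (for example, dropping failed requests by status code) simple comparisons rather than string matching.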

The main tasks involved in data preprocessing are data cleaning, user identification, user session identification and transformation of user sessions into numeric vectors [18]. Data cleaning removes noise and irrelevant data and resolves inconsistencies [19]. User identification deals with separating the requests pertaining to each individual user [20]. User session identification deals with identifying the user sessions using heuristic techniques [21].
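The session identification step can be sketched with a simple time-out heuristic. The paper does not state which heuristic it uses, so the example below rests on two assumptions: users are identified by client IP address, and a gap of more than 30 minutes between consecutive requests starts a new session. The function name `identify_sessions` and the request tuples are hypothetical.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; an assumed threshold

def identify_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (timestamp, client_ip, url) requests into per-user sessions.

    A new session starts whenever the gap between two consecutive requests
    from the same client exceeds `timeout`.
    """
    by_user = defaultdict(list)
    for ts, ip, url in sorted(requests):       # chronological order
        by_user[ip].append((ts, url))

    sessions = []
    for ip, hits in by_user.items():
        current = [hits[0][1]]
        last_ts = hits[0][0]
        for ts, url in hits[1:]:
            if ts - last_ts > timeout:         # inactivity gap: close session
                sessions.append((ip, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((ip, current))
    return sessions

requests = [
    (0, "10.0.0.1", "/index.php"),
    (60, "10.0.0.1", "/courses.php"),
    (5000, "10.0.0.1", "/contact.php"),   # more than 30 minutes later
    (30, "10.0.0.2", "/index.php"),
]
sessions = identify_sessions(requests)
for ip, urls in sessions:
    print(ip, urls)
```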
Each user session can be thought of as a single transaction of many URL references. We map the user sessions to vectors of URL references in an n-dimensional space. Let U be the set of n unique URLs appearing in the preprocessed log, $U = \{u_1, u_2, \dots, u_n\}$, and let X be the set of m user sessions discovered by preprocessing the web log data, $X = \{x_1, x_2, \dots, x_m\}$. Each user session $x_i \in X$ can be represented as a bit vector

\[ x_i = \{ w_{u_1}, w_{u_2}, \dots, w_{u_n} \}, \qquad w_{u_j} = \begin{cases} 1, & \text{if } u_j \in x_i \\ 0, & \text{otherwise.} \end{cases} \]
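The mapping from sessions to bit vectors can be sketched in a few lines of Python. The helper name `build_session_vectors` and the example URLs are hypothetical; the column order (alphabetically sorted URLs) is an arbitrary choice made here so that the result is reproducible.

```python
def build_session_vectors(sessions):
    """Return (urls, vectors): the sorted unique URLs U and one bit vector per session."""
    urls = sorted({url for session in sessions for url in session})
    index = {url: j for j, url in enumerate(urls)}
    vectors = []
    for session in sessions:
        vec = [0] * len(urls)
        for url in session:
            vec[index[url]] = 1        # w_uj = 1 if URL u_j occurs in the session
        vectors.append(vec)
    return urls, vectors

sessions = [
    ["/index.php", "/courses.php"],
    ["/index.php", "/contact.php", "/index.php"],   # repeated visits still give 1
    ["/contact.php"],
]
urls, vectors = build_session_vectors(sessions)
for vec in vectors:
    print(vec)
```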


A number of clustering algorithms have been used in web usage mining, where the data items are user sessions consisting of the sequence of page URLs accessed, together with interest scores for each URL based on characteristics of user behavior such as the time spent on a page or the bytes downloaded [21]. In this context, clustering can be used in two ways: to cluster users or to cluster items. In user-based clustering, users are grouped together based on the similarity of their web page navigational patterns. In item-based clustering, items are clustered based on the similarity of the interest scores for these items across all users [22], [23].
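One way to picture the difference between the two views, assuming the sessions are stored as a sessions-by-URLs matrix: user-based clustering groups the rows, while item-based clustering groups the columns, which is the same as clustering the rows of the transposed matrix. The matrix values below are hypothetical.

```python
def transpose(matrix):
    """Turn a sessions-by-URLs matrix into a URLs-by-sessions matrix."""
    return [list(column) for column in zip(*matrix)]

# Rows are user sessions, columns are URLs (hypothetical binary scores).
session_vectors = [
    [1, 0, 1],
    [1, 1, 0],
]
# Each row of url_vectors describes one URL across all sessions and can be
# fed to the same clustering routine used for user sessions.
url_vectors = transpose(session_vectors)
print(url_vectors)
```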
A. k-Means Clustering Algorithm
The k-Means clustering algorithm [24] is one of the most commonly used methods for partitioning data. Given a set of m user session data points $X = \{x_i \mid i = 1, \dots, m\}$, where each user session is an n-dimensional vector representing the presence or absence of n URLs, the k-means clustering algorithm aims to partition the m user session data points into k user session clusters ($k \le m$), $C = \{c_1, c_2, \dots, c_k\}$. Let $V = \{v_1, v_2, \dots, v_k\}$ be the centers of the clusters of C. k-Means clustering tries to minimize an objective function (or cost function) J(X, V) of dissimilarity [25], which is the within-cluster sum of squares. In most cases the dissimilarity measure is chosen as the Euclidean distance. The objective function J is defined in (2).

\[ J(X, V) = \sum_{j=1}^{k} J_j(x_i, v_j) = \sum_{j=1}^{k} \sum_{i=1}^{m} u_{ij} \, d^2(x_i, v_j) \tag{2} \]

where $J_j(x_i, v_j) = \sum_{i=1}^{m} u_{ij} \, d^2(x_i, v_j)$ is the objective function within cluster $c_j$, $u_{ij} = 1$ if $x_i \in c_j$ and 0 otherwise, and $d^2(x_i, v_j)$ is the squared distance between $x_i$ and $v_j$.

The Euclidean distance between the user sessions and the cluster centers can be calculated using (3).

\[ d^2(x_i, v_j) = \sum_{l=1}^{n} \left( x_l^i - v_l^j \right)^2 \tag{3} \]

where n is the number of dimensions of each data point, $x_l^i$ is the value of the l-th dimension of $x_i$, and $v_l^j$ is the value of the l-th dimension of $v_j$.
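To make the objective concrete, the following sketch evaluates the squared distance of (3) and the objective J of (2) on a tiny hand-made example. The data, the centers and the function names are hypothetical, and the membership matrix is stored compactly as one cluster index per point (`labels[i] = j` stands for $u_{ij} = 1$).

```python
def squared_distance(x, v):
    """Squared Euclidean distance between point x and center v, as in (3)."""
    return sum((xl - vl) ** 2 for xl, vl in zip(x, v))

def objective(X, V, labels):
    """Within-cluster sum of squares J(X, V), as in (2)."""
    return sum(squared_distance(x, V[j]) for x, j in zip(X, labels))

X = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]        # three session bit vectors
V = [[1.0, 0.5, 0.0], [0.0, 0.0, 1.0]]       # two cluster centers
labels = [0, 0, 1]                           # first two sessions in cluster 0

J = objective(X, V, labels)
print(J)   # 0.25 + 0.25 + 0.0 = 0.5
```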

The k-means clustering algorithm first initializes the cluster centers randomly. Then each data point $x_i$ is assigned to the cluster $c_j$ whose center $v_j$ has the minimum distance to this data point. Once all the data points have been assigned to clusters, the cluster centers are updated by taking the average of all data points in each cluster. This recalculation yields a better set of cluster centers. The process is continued until there is no change in the cluster centers.

The partitioned clusters are defined by an m × k binary membership matrix U, where the element $u_{ij}$ is 1 if the i-th data point $x_i$ belongs to cluster j, and 0 otherwise. Once the cluster centers $V = \{v_1, v_2, \dots, v_k\}$ are fixed, the membership function $u_{ij}$ that minimizes (2) can be derived as follows:

\[ u_{ij} = \begin{cases} 1, & \text{if } d^2(x_i, v_j) \le d^2(x_i, v_{j^*}) \quad \forall j^* \ne j,\ j^* = 1, \dots, k \\ 0, & \text{otherwise} \end{cases} \tag{4} \]

Equation (4) specifies that each data point $x_i$ is assigned to the cluster $c_j$ with the closest cluster center $v_j$. Once the membership matrix $U = [u_{ij}]$ is fixed, the optimal center $v_j$ that minimizes (2) is the mean of all the data point vectors in cluster j:

\[ v_j = \frac{1}{|c_j|} \sum_{i,\ x_i \in c_j} x_i \tag{5} \]

where $|c_j|$ is the size of cluster $c_j$, and also $|c_j| = \sum_{i=1}^{m} u_{ij}$.

Given an initial set of k means or cluster centers $V = \{v_1, v_2, \dots, v_k\}$, the algorithm proceeds by alternating between two steps: i) Assignment step: assign each data point to the cluster with the closest cluster center. ii) Update step: update each cluster center as the mean of all the data points in that cluster. The input to the algorithm is a set of m data points $X = \{x_i \mid i = 1, \dots, m\}$, where each data point is an n-dimensional vector; it then determines the cluster centers $v_j$ and the membership matrix U iteratively, as explained in the algorithm shown in Fig. 2.
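The alternation between the assignment step (4) and the update step (5) can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `kmeans`, the optional `init` argument, the choice of drawing random initial centers from the data points, and the rule that an empty cluster keeps its previous center are all assumptions of this sketch.

```python
import random

def squared_distance(x, v):
    return sum((a - b) ** 2 for a, b in zip(x, v))

def kmeans(X, k, init=None, seed=0, max_iter=100):
    """Return (centers, labels) after alternating assignment and update steps."""
    if init is None:
        # Random initialization: pick k distinct data points as centers.
        init = random.Random(seed).sample(X, k)
    centers = [list(map(float, c)) for c in init]
    labels = None
    for _ in range(max_iter):
        # Assignment step (4): each point joins its closest center.
        new_labels = [min(range(k), key=lambda j: squared_distance(x, centers[j]))
                      for x in X]
        if new_labels == labels:
            break          # memberships unchanged, so the centers are too
        labels = new_labels
        # Update step (5): each center becomes the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:    # assumption: an empty cluster keeps its old center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

# Six hypothetical session bit vectors over four URLs.
X = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0],
     [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1]]
# Fixed initial centers make this particular run reproducible.
centers, labels = kmeans(X, 2, init=[X[0], X[3]])
print(labels)
print(centers)
```

Stopping when the memberships no longer change is equivalent to stopping when the centers no longer change, because the centers are a function of the memberships alone.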
The k-means algorithm provides locally optimal solutions with respect to the sum of squared errors represented by the objective function. Since it is a fast iterative algorithm, it has been applied in a variety of areas [26]-[28]. The attractiveness of k-means lies in its simplicity and flexibility. However, it suffers from major shortcomings that have limited its use on large datasets. The most important among these are: i) k-means scales poorly in running time as the number of points grows; ii) the algorithm may converge to a solution that is only a local minimum of the objective function. The main disadvantage of the algorithm lies in its sensitivity to the initial positions of the cluster centroids [21]. Since the performance of the k-means algorithm depends on the initial positions of the cluster centroids, it is recommended to execute the algorithm multiple times, each with a different set of initial centroids.
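The multiple-restart recommendation can be sketched as follows: run k-means several times from different random initial centroids and keep the run with the smallest objective value. The number of restarts, the seed handling and the function names are assumptions of this sketch, and the compact `kmeans_once` helper is included only to keep the example self-contained.

```python
import random

def sqdist(x, v):
    return sum((a - b) ** 2 for a, b in zip(x, v))

def kmeans_once(X, k, rng, max_iter=100):
    """One k-means run from randomly chosen data points; returns (J, centers, labels)."""
    centers = [list(map(float, x)) for x in rng.sample(X, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sqdist(x, centers[j])) for x in X]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    J = sum(sqdist(x, centers[lab]) for x, lab in zip(X, labels))
    return J, centers, labels

def best_of_restarts(X, k, restarts=10, seed=0):
    """Run k-means `restarts` times and keep the run with the lowest J."""
    rng = random.Random(seed)   # one generator, so every restart differs
    return min((kmeans_once(X, k, rng) for _ in range(restarts)),
               key=lambda run: run[0])

X = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0],
     [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1]]
best_J, best_centers, best_labels = best_of_restarts(X, 2, restarts=5, seed=1)
print(best_J, best_labels)
```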
IV. SOFT COMPUTING TECHNIQUES

Soft computing is a collection of methodologies that provides flexible information processing capability for handling real-life ambiguous situations. Its aim is to exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth in order to achieve tractability, robustness and low-cost solutions [29]. The guiding principle is to devise methods of computation that lead to an acceptable solution at low cost by seeking an approximate solution to an imprecisely or precisely formulated problem. At present, the principal soft computing tools include fuzzy sets, rough set theory [30], neural networks and genetic algorithms. They are most widely applied in the data mining step of the overall knowledge discovery in databases (KDD) process.

Fuzzy sets are suitable for handling issues related to the understandability of patterns, incomplete data, mixed media information and human interaction, and can provide approximate solutions faster [31]. In fuzzy clustering, also referred to as soft clustering, data elements can belong to more than one cluster, and associated with each element is a set of membership levels. Fuzzy clustering is the process of assigning these membership levels and then using them to assign data elements to one or more clusters [32]. Rough sets are suitable for handling different types of uncertainty in data and have mainly been utilized for extracting knowledge in the form of clusters [2]. Rough set theory can be used to represent overlapping clusters. Rough sets provide a more flexible representation than conventional sets, while being less descriptive than fuzzy sets [33]. Neural networks are suitable in data-rich environments and are typically used for extracting embedded knowledge in the form of rules, quantitative evaluation of these rules, clustering, self-organization, classification and regression [34], [35].
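To illustrate what membership levels look like in practice, the sketch below computes the membership of one session in each cluster using the standard fuzzy c-means membership formula. The paper does not commit to a particular fuzzy clustering algorithm, so the choice of fuzzy c-means, the fuzzifier value m = 2, the function name and the example data are all assumptions of this example.

```python
def fuzzy_memberships(x, centers, m=2.0):
    """Membership of point x in each cluster (fuzzy c-means formula).

    u_j = (1 / d_j^2)^(1/(m-1)) / sum_l (1 / d_l^2)^(1/(m-1)),
    where d_j^2 is the squared distance from x to center j and m > 1.
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [j for j, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if j in zeros else 0.0
                for j in range(len(centers))]
    exponent = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** exponent for d in d2]
    total = sum(inv)
    return [w / total for w in inv]

centers = [[1, 1, 0], [0, 0, 1]]
u = fuzzy_memberships([1, 0, 0], centers)
print(u)   # closer to the first center, but partly in both clusters
```

Unlike the binary $u_{ij}$ of k-means, these memberships lie between 0 and 1 and sum to 1 for each session, which is how overlapping clusters are expressed.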
Genetic algorithms provide efficient search algorithms for selecting a model, from mixed media data, based on some preference criterion or objective function. They have been employed in regression and in discovering association rules [36], [37].
V. HIGH PERFORMANCE COMPUTING TECHNIQUES

The increasing volume of web logs generated by the web servers of popular sites requires high performance parallel processing models for robust and speedy web usage analysis. The need for parallel computing has resulted in a number of programming models being proposed for high performance computing. Popular high performance computing techniques include OpenMP, CUDA, MapReduce and MPI. A brief discussion of these techniques is given below.

OpenMP is a shared memory multiprocessing application program interface (API) for easy development of shared memory parallel programs [38]. It provides a set of compiler directives to create threads, synchronize operations and manage shared memory. Programs using OpenMP are compiled into multithreaded programs in which threads share the same memory address space, and hence communication between threads can be very efficient. OpenMP is comparatively easy to use because the compiler takes care of transforming the sequential code into parallel code according to the directives [39].

MPI is a message passing library specification that defines an extended message passing model for parallel and distributed programming in distributed computing environments [40]. In the MPI model, each process has its own address space and communicates with other processes to access data in their address spaces. MPI provides point-to-point, collective and parallel I/O communication models [41].
MapReduce is the parallel programming paradigm used with Hadoop, which is recognized as a representative big data processing framework [42]. Hadoop clusters consist of up to thousands of commodity computers and provide a distributed file system called HDFS, which can accommodate large volumes of data in a fault-tolerant way. The clusters become the computing resource that facilitates big data processing [43]. MapReduce organizes an application into a pair of Map and Reduce functions. It assumes that the input for the functions comes from HDFS files and that the output is saved into HDFS files. Data files consist of records, each of which can be treated as a key-value pair. Input data is partitioned and processed by Map processes, and their results are shaped into key-value pairs and shuffled to Reduce tasks according to key. Map processes are independent of each other and can thus be executed in parallel without collaboration among them. Reduce processes aggregate the values that share the same key. The MapReduce runtime launches Map and Reduce processes with consideration of data locality.
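The Map, shuffle and Reduce phases described above can be imitated in a few lines of single-process Python. This is only an in-memory sketch of the data flow, not the Hadoop API: the names `map_fn`, `reduce_fn` and `run_mapreduce` and the example records are hypothetical, and nothing here is distributed or fault-tolerant. The example counts how often each URL is requested.

```python
from collections import defaultdict

def map_fn(record):
    """Map: emit one (url, 1) pair per log record."""
    ip, url = record
    yield url, 1

def reduce_fn(key, values):
    """Reduce: aggregate all values that share the same key."""
    return key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: every record is processed independently of the others,
    # which is what allows real runtimes to run this phase in parallel.
    intermediate = []
    for record in records:
        intermediate.extend(map_fn(record))
    # Shuffle phase: group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one call per distinct key.
    return dict(reduce_fn(key, values) for key, values in groups.items())

records = [
    ("10.0.0.1", "/index.php"),
    ("10.0.0.2", "/index.php"),
    ("10.0.0.1", "/contact.php"),
]
counts = run_mapreduce(records, map_fn, reduce_fn)
print(counts)
```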
CUDA (Compute Unified Device Architecture) was developed in 2006 by NVIDIA as a general-purpose parallel computing programming model that runs on NVIDIA GPUs [44]. With CUDA, programs are granted access to GPU memory and are therefore able to use parallel computation not only for graphics applications but also for general-purpose processing [45].
VI. CONCLUSION

This study is an attempt to explore various techniques for the discovery of web usage clusters from web log data using soft computing and parallel computing. The conclusions drawn from this study are described below:

Web log data preprocessing is essential for the transformation of raw web log data into numeric user session vectors. The main tasks involved in data preprocessing include data cleaning, user identification, user session identification and transformation of user sessions into numeric vectors.

Since web log data involves incomplete and ambiguous information, the following soft computing techniques may be utilized for the faithful discovery of web usage clusters.
a. Fuzzy sets are suitable for handling issues related to the understandability of patterns, incomplete data, mixed media information and human interaction, and can provide approximate solutions faster.
b. Rough sets are suitable for handling different types of uncertainty in data and have mainly been utilized for extracting knowledge in the form of clusters.
c. Neural networks are suitable for extracting embedded knowledge in the form of rules, clustering, self-organization, classification and regression.
d. Genetic algorithms provide efficient search algorithms for selecting a model, from mixed media data, based on some objective function.

To deal with the massive size of web usage data, the following parallel computing methodologies may be utilized.
a. OpenMP is a shared memory multiprocessing application program interface for easy development of shared memory parallel programs.
b. MPI is a message passing library specification that defines an extended message passing model for parallel and distributed programming environments.
c. CUDA provides a general-purpose parallel computing programming model that runs on NVIDIA GPUs.
d. MapReduce is the parallel programming paradigm used with Hadoop, which is recognized as a representative big data processing framework.

REFERENCES
[1] Gordon S. Linoff and Michael J. A. Berry, Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 3rd Edition, ISBN: 978-0-470-65093-6, 888 pages, March 2011.
[2] Michael J. Berry and Gordon Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Support. John Wiley & Sons, Inc., 1997.
[3] J. Han and M. Kamber, Data Mining: Concepts and Techniques (2nd ed.). San Francisco: Morgan Kaufmann, 2006.
[4] Anil K. Jain, M. Narasimha Murty, and Patrick J. Flynn, "Data clustering: a review," ACM Computing Surveys (CSUR) 31.3 (1999): 264-323.
[5] Tian Zhang, Raghu Ramakrishnan, and Miron Livny, "BIRCH: an efficient data clustering method for very large databases," ACM SIGMOD Record, Vol. 25, No. 2, ACM, 1996.
[6] Bamshad Mobasher et al., "Improving the effectiveness of collaborative filtering on anonymous web usage data," Proceedings of the IJCAI 2001 Workshop on Intelligent Techniques for Web Personalization (ITWP01), 2001.
[7] Zahid Ansari and Amjad Khan, "Fast Global k-Means Method To Discover User Session Clusters from Web Log Data," International Journal of Computer Engineering and Applications (IJCEA), (ISSN: 2321-3469), pp. 26-35, Vol. 8, No. 3, Dec. 2014.
[8] B. Mobasher, "Data mining for web personalization," Lecture Notes in Computer Science, 4321:90, 2007.
[9] M. Perkowitz and O. Etzioni, "Adaptive web sites: Automatically synthesizing web pages," in Proceedings of the 15th National Conference on Artificial Intelligence, Madison, WI (July 1998), 727-732, 1998.
[10] M. Perkowitz and O. Etzioni, "Adaptive web sites," Communications of the ACM, 43:152-158, 2000.
[11] Edith Cohen, Balachander Krishnamurthy, and Jennifer Rexford, "Improving end-to-end performance of the web using server volumes and proxy filters," SIGCOMM Comput. Commun. Rev., 28:241-253, October 1998.
[12] G. Vigna, W. Robertson, Vishal Kher, and R. A. Kemmerer, "A stateful intrusion detection system for world-wide web servers," in Computer Security Applications Conference, 2003. Proceedings. 19th Annual, pages 34-43, 2003.


[13] Ajith Abraham, "Business intelligence from web usage mining," Journal of Information & Knowledge Management, 2(4):375-390, 2003.
[14] Alemayehu Molla and Paul S. Licker, "E-Commerce Systems Success: An Attempt to Extend and specify the Delone and MaClean Model of IS Success," J. Electron. Commerce Res. 2.4 (2001): 131-141.
[15] Tomas Isakowitz, Michael Bieber, and Fabio Vitali, "Web information systems," Communications of the ACM 41.7 (1998): 78-80.
[16] Bamshad Mobasher et al., "Improving the effectiveness of collaborative filtering on anonymous web usage data," Proceedings of the IJCAI 2001 Workshop on Intelligent Techniques for Web Personalization (ITWP01), 2001.
[17] Zahid Ansari, M. F. Azeem, A. Vinaya Babu and Waseem Ahmed, "A Fuzzy Approach for Feature Evaluation and Dimensionality Reduction to Improve the Quality of Web Usage Mining Results," Intl. J. Advances Science and IT, Vol. 2, 2012.
[18] Zahid Ansari, A. Vinaya Babu, Waseem Ahmed and Mohammad Fazle Azeem, "A Fuzzy Set Theoretic Approach to Discover User Sessions from Web Navigational Data," in International Conference on IEEE Recent Advances in Intelligent Computational Systems, Trivandrum, pp. 879-884, Sep. 22-24, 2011.
[19] Zahid Ansari, M. F. Azeem, A. V. Babu and W. Ahmed, "Preprocessing User's Web Navigational Data to Discover Usage Patterns," in Proceedings of the Seventh International Conference on Computing and Information Technology, Bangkok, Thailand, pp. 184-189, May 2011.
[20] Zahid Ansari, Mohammad Fazle Azeem, A. Vinaya Babu and Waseem Ahmed, "A Fuzzy Clustering Based Approach for Mining Usage Profiles from Web Log Data," International Journal of Computer Science and Information Security, (ISSN 1947-5500), IJCSIS Publications, pp. 70-79, Vol. 9, No. 6, June 2011.
[21] Zahid Ansari, Waseem Ahmed, M. F. Azeem and A. Vinaya Babu, "Discovery of Web Usage Profiles Using Various Clustering Techniques," International Journal of Computer Information Systems, (ISSN 2229-5208), Silicon Valley Publishers, pp. 18-27, Vol. 1, No. 3, July 2011.
[22] Zahid Ansari, "Discovery of Web User Session Clusters Using DBSCAN and Leader Clustering Techniques," International Journal of Research in Applied Science & Engineering Technology (IJRASET), (ISSN: 2321-9653), pp. 209-207, Vol. 2, Issue 12, December 2014.


[23] Zahid Ansari, A. Vinaya Babu, Waseem Ahmed and Mohammed Fazle Azeem, "A Comparative Study of Mining Web Usage Patterns Using Variants of k-Means Clustering Algorithm," International Journal of Computer Science and Information Technologies, (ISSN: 0975-9646), Tech Science Publications, pp. 1407-1413, Vol. 2, No. 4, July 2011.
[24] Zahid Ansari, Mohammed Tajuddin and Syed Ab. Sattar, "Discovery of Web User Session Clusters Using Partitioning Based Clustering Techniques," International Journal of Computer Technology and Applications (IJCTA), (ISSN: 2229-6093), pp. 2049-2056, Vol. 5, No. 6, Nov-Dec 2014.
[25] Zahid Ansari, "Web User Session Cluster Discovery Based on k-Means and k-Medoids Techniques," International Journal of Computer Science & Engineering Technology (IJCSET), (ISSN: 2229-3345), pp. 1105-1113, Vol. 5, No. 12, December 2014.
[26] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York, 1990.
[27] B. Mobasher, "Data mining for web personalization," Lecture Notes in Computer Science, 4321:90, 2007.
[28] M. A. Hasan, V. Chaoji, S. Salem and M. J. Zaki, "Robust partitional clustering by outlier and density insensitive seeding," 2009.
[29] M. L. Raymer, W. F. Punch, E. D. Goodman, and L. A. Kuhn, "Genetic programming for improved data mining: An application to the biochemistry of protein interactions," in Proc. 1st Annual Conf. Genetic Programming 1996, Stanford Univ., CA, July 28-31, 1996, pp. 375-380.
[30] Lotfi A. Zadeh, "Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic," Fuzzy Sets and Systems 90.2 (1997): 111-127.
[31] M. K. Pakhira, "A Modified k-means Algorithm to Avoid Empty Clusters," International Journal of Recent Trends in Engineering, Vol. 1, No. 1, 2009.
[32] Sushmita Mitra, Sankar K. Pal, and Pabitra Mitra, "Data mining in soft computing framework: a survey," IEEE Transactions on Neural Networks 13.1 (2002): 3-14.
[33] Barbara Chapman, Gabriele Jost, and Ruud Van Der Pas, Using OpenMP: Portable Shared Memory Parallel Programming. Vol. 10. MIT Press, 2008.
[34] Zahid Ansari, Mohammad Fazle Azeem, A. Vinaya Babu and Waseem Ahmed, "A Fuzzy Clustering Based Approach for Mining Usage Profiles from Web Log Data," International Journal of Computer Science and Information Security, pp. 70-79, Vol. 9, No. 6, June 2011.
[35] A. B. Tickle, R. Andrews, M. Golea, and J. Diederich, "The truth will come to light: Directions and challenges in extracting the knowledge embedded within trained artificial neural networks," IEEE Trans. Neural Networks, vol. 9, pp. 1057-1068, 1998.
[36] H. J. Lu, R. Setiono, and H. Liu, "Effective data mining using neural networks," IEEE Trans. Knowledge Data Eng., vol. 8, pp. 957-961, 1996.
[37] I. W. Flockhart and N. J. Radcliffe, "A genetic algorithm-based approach to data mining," in Proc. 2nd Int. Conf. Knowledge Discovery Data Mining (KDD-96), Portland, OR, Aug. 2-4, p. 299, 1996.
[38] Barbara Chapman, Gabriele Jost, and Ruud Van Der Pas, Using OpenMP: Portable Shared Memory Parallel Programming. Vol. 10. MIT Press, 2008.
[39] B. Barney, Introduction to Parallel Computing, Lawrence Livermore National Laboratory, 2007.
[40] W. Gropp, S. Huss-Lederman, A. Lumsdaine et al., MPI: The Complete Reference, the MPI-2 Extensions, vol. 2, The MIT Press, 1998.
[41] W. Gropp, S. Huss-Lederman, A. Lumsdaine et al., MPI: The Complete Reference, the MPI-2 Extensions, vol. 2, The MIT Press, 1998.
[42] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[43] Ranger et al., "Evaluating MapReduce for multi-core and multiprocessor systems," in Proceedings of the 13th IEEE International Symposium on High Performance Computer Architecture (HPCA07), pp. 13-24, Scottsdale, USA, February 2007.
[44] NVIDIA, CUDA C Programming Guide, no. July. NVIDIA Corporation, 2013.
[45] Wikipedia, "General-purpose computing on graphics processing units," 2013.
