
International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 9 Sep 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 2996



An Effective Algorithm for Mining and Grouping Online Transactions in
Online Systems
Narasimham Parimi (1), Mirza Mohsin Raza (2), Prof. S.V. Achutha Rao (3)

(1) Pursuing M.Tech (CSE), Vikas College of Engineering and Technology, Nunna, Vijayawada, JNTU-K, India.
(2) Associate Professor, Department of CSE, Vikas College of Engineering and Technology, Nunna, Vijayawada, India.
(3) Professor & Head, Department of CSE, Vikas College of Engineering and Technology, Nunna, Vijayawada, India.


Abstract: Online transaction data between online visitors and
online functionalities usually convey users' task-oriented behavior
patterns. Grouping online transactions can capture knowledge that,
in turn, can be used to create user profiles associated with
different navigational patterns. For future online applications,
such as online recommendation or personalized online services,
such knowledge is essential for delivering users their preferred
information accurately. We demonstrate the usability and
scalability of the proposed approach through experiments on two
real-world data sets. The experimental results show the method's
effectiveness in comparison with some previous studies.

Keywords: clustering, recommendations, scalability.

I. INTRODUCTION

With the popularization and spread of online applications, the
online environment has become a strong platform not only for
retrieving data but also for discovering knowledge from online
data stores. Generally, online users show different types of
behavior associated with their information needs and intended
tasks when they browse. These task-oriented behaviors are
explicitly characterized by sequences of clicks on different
online items. Consequently, those tasks can be implicitly captured
by inducing the underlying relationships among the click-stream
data. For example, imagine a website providing information about
automobiles: a variety of customer groups with different access
interests will visit such an e-commerce site. One type of
customer compares offers before purchasing; a customer who wants
to buy a specific type of car, for example a wagon, would browse
the pages of each manufacturer and compare their offers, whereas
another customer may be interested only in one specific brand,
such as Ford, rather than in one specific car category.

In online data mining research, many data mining techniques,
such as clustering, have been widely adopted to improve the
usability and scalability of online mining. Access transactions
can be expressed by two finite sets: user transactions and
hyperlinks/URLs. Let $U = \{t_1, t_2, \ldots, t_m\}$ be the set of
$m$ user transactions and $A = \{hl_1, hl_2, \ldots, hl_n\}$ the set
of $n$ distinct hyperlinks/URLs clicked by users, where every
$t_i \in U$ is a non-empty subset of $A$. The temporal order of
users' clicks within transactions has been taken into account. A
user transaction $t \in U$ is represented as a vector. A
well-known approach for clustering online transactions uses rough
set theory: De and Krishna proposed an algorithm for clustering
online transactions using rough approximation, based on the
similarity of the upper approximations of transactions for a
given threshold. However, several iterations are needed to merge
two or more clusters that have the same similarity of upper
approximations, and the authors did not show how to handle the
case where more than one transaction falls under the given
threshold. To avoid these problems, we propose another technique
for clustering online transactions. We use the concept of
similarity class proposed by [11]; however, the proposed
technique differs in how transactions are allocated to the same
cluster and in how it handles the case where more than one
transaction falls under a given threshold.
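To make the set notation concrete, the following Python sketch encodes each transaction as a 0/1 vector over $A$ and builds, for every transaction, the set of transactions whose similarity to it reaches a threshold (a similarity class). The similarity measure is an assumption: the text does not define the measure used by De and Krishna or by [11], so the sketch uses the Jaccard coefficient. The transactions and hyperlinks in the usage section are hypothetical.

```python
from typing import Dict, List, Set


def to_binary_vector(transaction: Set[str], hyperlinks: List[str]) -> List[int]:
    """Encode a transaction (a non-empty subset of A) as a 0/1 vector over A."""
    return [1 if hl in transaction else 0 for hl in hyperlinks]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity measure: |a & b| / |a | b|, or 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_classes(
    transactions: Dict[str, Set[str]], threshold: float
) -> Dict[str, Set[str]]:
    """For each transaction, collect every transaction whose similarity to it is >= threshold."""
    return {
        ti: {tj for tj, hj in transactions.items() if jaccard(hi, hj) >= threshold}
        for ti, hi in transactions.items()
    }


if __name__ == "__main__":
    # Hypothetical example: A has three hyperlinks, U has three transactions.
    A = ["hl1", "hl2", "hl3"]
    U = {"t1": {"hl1", "hl2"}, "t2": {"hl2", "hl3"}, "t3": {"hl1", "hl2", "hl3"}}
    print(to_binary_vector(U["t1"], A))  # [1, 1, 0]
    for t, cls in similarity_classes(U, threshold=0.5).items():
        print(t, sorted(cls))
```

Note that every transaction always belongs to its own similarity class, since its similarity to itself is 1.0.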
Generally, online mining techniques can be defined as methods
that extract so-called nuggets (or knowledge) from online data
repositories, such as content, linkage and usage information, by
utilizing data mining tools. Among such online data, user
click-streams, i.e. usage data, are mainly used to capture users'
navigational patterns and to identify users' intended tasks. Once
customers' navigational behaviors are effectively characterized,
they benefit further online applications and, in turn, facilitate
and improve online service quality for both online organizations
and end users. As a result, online usage mining has recently
become an increasingly active topic, and a variety of research
communities, including database management (DBMS), artificial
intelligence (AI) and information systems (IS), have addressed it
with considerable success [1-7]. Meanwhile, with the
benefits of great progress in data mining research, many data
mining techniques, such as clustering, association rule mining
and sequential pattern mining, have been widely adopted to
improve the usability and scalability of online mining.
While online shoppers are generally well satisfied, there is
room to increase their satisfaction with delivery and returns.
Although free shipping is a strong incentive that brings
customers back to store sites for repeat purchases and leads
shoppers to recommend an online retailer, purchasers are willing
to pay a fee to receive their products faster. When comparison
shopping, purchasers weigh product price and shipping charges
almost equally.
Retailers can do several other things to improve the experience
of their online customers. The first is to display the expected
delivery date of the order; customers are willing to wait for
their orders but want to know how long that will take. Timely
receipt of products encourages shoppers to recommend an online
seller. Purchasers also like to receive updates and delivery
notifications so that they know when their package will arrive.


II. RELATED WORK

In this paper, the proposed technique is compared with the
technique proposed by [11] through two examples, each using a
small data set of transactions.
The first transaction data set, given in Table 1, contains four
objects (|U| = 4) with five hyperlinks (|A| = 5). The technique
is implemented in three important steps. The first step obtains
a similarity measure that gives information about users' access
patterns related to their common areas of interest, through a
similarity relation between two transactions.


Table 1. Transaction data set (|U| = 4, |A| = 5)

U/A   hl1   hl2   hl3   hl4   hl5
t1    1     1     0     0     0
t2    0     1     1     1     0
t3    1     0     1     0     1
t4    0     1     1     0     1
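As an illustration of the similarity step on the Table 1 data, the Python sketch below computes a similarity value for every pair of transactions. The measure is an assumption (the Jaccard coefficient on binary vectors: shared 1s divided by positions where either vector has a 1); it is not necessarily the measure used by [11] or by the proposed technique.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Table 1: rows are transactions t1..t4, columns are hyperlinks hl1..hl5.
TABLE_1: Dict[str, List[int]] = {
    "t1": [1, 1, 0, 0, 0],
    "t2": [0, 1, 1, 1, 0],
    "t3": [1, 0, 1, 0, 1],
    "t4": [0, 1, 1, 0, 1],
}


def binary_jaccard(u: List[int], v: List[int]) -> float:
    """Assumed measure: shared 1s divided by positions where either vector has a 1."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def pairwise_similarity(table: Dict[str, List[int]]) -> Dict[Tuple[str, str], float]:
    """Similarity for every unordered pair of transactions."""
    return {
        (a, b): binary_jaccard(table[a], table[b])
        for a, b in combinations(sorted(table), 2)
    }


if __name__ == "__main__":
    for (a, b), s in pairwise_similarity(TABLE_1).items():
        print(f"sim({a}, {b}) = {s:.2f}")
```

Under this assumed measure, the pairs (t2, t4) and (t3, t4) are the most similar at 0.50, and (t2, t3) the least similar at 0.20; a different measure may rank the pairs differently.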

In this paper, we address these issues by proposing an
alternative approach for clustering online transactions and
generating user profiles. After data preprocessing, we produce a
user transaction collection and a pageview corpus via the user
and pageview identification processes, respectively, and then
construct the session-pageview matrix as usage data, in which
each cell holds a weight representing the contribution made by a
specific pageview to one user transaction. In this manner, we
map the relationships among the co-occurrence observations (i.e.
user transactions) into a high-dimensional space. Moreover, an
improved LSA-based clustering algorithm, named latent usage
information (LUI), is proposed to find user segments with similar
behaviors effectively and precisely from the aforementioned usage
data by using linear algebra, especially the singular value
decomposition (SVD) of the matrix, to reveal deeper relationships
among online transactions. The discovered user clusters are
exploited to generate a variety of goal-oriented user profiles by
calculating the centroid of the corresponding cluster in the form
of a weighted pageview set.
Experiments are conducted on two real-world datasets to
validate the usability and scalability of usage mining.
Meanwhile, an evaluation metric is adopted to assess the
quality of the discovered clusters, and comparisons are made with
some previous work as well. The experimental results have
shown that the proposed approach is capable of effectively
discovering user access patterns and revealing the underlying
relationships among user visiting records.
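The dimension-reduction step described above can be sketched with a standard truncated SVD, as used in LSA. This is an assumed stand-in for the LUI transform, whose exact details are not given here: each session (row of the session-pageview matrix SP) is mapped to coordinates in a k-dimensional latent space. The matrix values in the usage section are hypothetical.

```python
import numpy as np


def latent_session_vectors(sp: np.ndarray, k: int) -> np.ndarray:
    """Project each session (row of SP) into a k-dimensional latent space.

    Standard truncated SVD, as in LSA (assumed stand-in for LUI):
    sp = U @ diag(s) @ Vt. Keeping the top-k singular values gives session
    coordinates U[:, :k] * s[:k], which equal sp @ Vt[:k].T.
    """
    u, s, vt = np.linalg.svd(sp, full_matrices=False)
    k = min(k, s.size)  # cannot keep more dimensions than singular values
    return u[:, :k] * s[:k]


if __name__ == "__main__":
    # Hypothetical 4-session x 5-pageview matrix; each cell is the weight of a
    # pageview within a session (the weighting scheme is an assumption).
    SP = np.array([
        [1.0, 0.5, 0.0, 0.0, 0.0],
        [0.8, 0.6, 0.0, 0.1, 0.0],
        [0.0, 0.0, 1.0, 0.7, 0.2],
        [0.0, 0.1, 0.9, 0.8, 0.0],
    ])
    print(np.round(latent_session_vectors(SP, k=2), 3))
```

The signs of singular vectors are not unique, so individual coordinates may flip sign between libraries; similarities computed between the projected sessions are unaffected.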

III. CLUSTERING ONLINE TRANSACTIONS

Here we adopt a modified standard K-means clustering
algorithm, named MK-means clustering, to classify user sessions
based on the transformed SP matrix over the latent k-dimensional
space. This algorithm does not need a predefined value of k or k
initial centroids, whereas the standard k-means needs both to
start clustering. The algorithm is described as follows:

Algorithm: MK-means clustering
Input: usage data SP and similarity threshold $\varepsilon$
1. Choose the first user session $s_1$ as the initial cluster
$C_1$ and as the centroid of this cluster, i.e. $C_1 = \{s_1\}$ and
$Cid_1 = s_1$.

2. For each session $s_i$, calculate the similarity between
$s_i$ and the centroid of each existing cluster,
$sim(s_i, Cid_j)$.

3. If $sim(s_i, Cid_k) = \max_j sim(s_i, Cid_j) > \varepsilon$, allocate
$s_i$ into $C_k$ and recalculate the centroid of cluster $C_k$
as $Cid_k = \frac{1}{|C_k|} \sum_{s_j \in C_k} s_j$;

4. Otherwise, let $s_i$ itself construct a new cluster and be
the centroid of this cluster.

5. Repeat steps 2 to 4 until all user sessions are processed.

Output: cluster set $CS = \{C_k\}$
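The steps above can be sketched in Python as a single pass over the sessions. Two points are assumptions, not stated in the algorithm: the similarity function is taken to be cosine similarity, and each session is processed exactly once in input order. Sessions are plain lists of floats, for example rows of the latent session matrix.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed similarity measure sim(., .): cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of the vectors in a cluster."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def mk_means(sessions: List[Vector], threshold: float) -> List[List[int]]:
    """Single-pass MK-means sketch; returns clusters as lists of session indices."""
    if not sessions:
        return []
    clusters: List[List[int]] = [[0]]               # Step 1: C1 = {s1}
    centroids: List[Vector] = [list(sessions[0])]   # Cid1 = s1
    for i in range(1, len(sessions)):               # Step 5: until all sessions are processed
        s = sessions[i]
        sims = [cosine(s, c) for c in centroids]    # Step 2: similarity to each centroid
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] > threshold:
            # Step 3: join the most similar cluster and recompute its centroid.
            clusters[best].append(i)
            centroids[best] = centroid([sessions[j] for j in clusters[best]])
        else:
            # Step 4: the session starts a new cluster and is its own centroid.
            clusters.append([i])
            centroids.append(list(s))
    return clusters


if __name__ == "__main__":
    # Hypothetical sessions in a 3-dimensional latent space.
    demo = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.1], [0, 0, 1]]
    print(mk_means(demo, threshold=0.8))
```

As in the algorithm, the number of clusters is not fixed in advance; it grows whenever a session is not similar enough to any existing centroid, so the threshold controls how many clusters emerge.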


Shopping Experience and Satisfaction
Consumer satisfaction with online shopping overall is high,
at 86%. Online shoppers are most satisfied with ease of checkout
(83%), variety of brands/products (82%), and online tracking
ability (79%). Online shoppers are least satisfied with shipping
flexibility, including the ability to select a delivery date
(58%) and to re-route packages (57%), and with the ease of making
returns and exchanges (65%). Besides easing returns and
exchanges, retailers have a chance to increase purchaser
satisfaction by providing a clear returns policy. Logistics
services may directly affect 6 of the 11 aspects that
influence a customer's shopping experience. For retailers
looking to increase customer satisfaction, it is important to
look not only at how satisfied users are with various aspects of
the online shopping experience, but also at how important these
factors are. To do this, a quadrant analysis was performed,
mapping the derived importance of each factor against its
satisfaction percentage. Items in the upper-right quadrant are
those with both high importance and high satisfaction. Because of
their high priority, it is important for retailers to continue to
maintain high levels of satisfaction on these elements: ease of
checkout, the variety of brands and products offered, and the
ability to create an account to store purchase history and
personal information. The factors in the bottom half of the chart
are of lower importance in driving overall online shopping
satisfaction. While frequently cited by consumers as a must-have,
free or discounted shipping is actually less important in driving
overall satisfaction than the factors mentioned above,
specifically ease of checkout and the variety of brands and
products offered. The upper-left quadrant of the chart contains
the factors that are highly important but currently have low
satisfaction. These factors, a clear and easy-to-understand
returns policy and the ease of making returns and exchanges,
should be areas of focus for retailers looking to increase their
overall customer satisfaction.
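The quadrant analysis can be expressed as a small classification rule. The Python sketch below assumes that importance is the vertical axis, satisfaction the horizontal axis, and that each axis is split at its mean value; the text does not state the actual cut-offs or the importance scores, so the factor names and numbers in the usage section are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple


def quadrant_analysis(factors: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Label each factor by quadrant from its (importance, satisfaction) pair.

    Importance is the vertical axis and satisfaction the horizontal axis; both
    axes are split at their mean value, an assumed cut-off.
    """
    imp_cut = mean(imp for imp, _ in factors.values())
    sat_cut = mean(sat for _, sat in factors.values())
    labels: Dict[str, str] = {}
    for name, (importance, satisfaction) in factors.items():
        if importance < imp_cut:
            labels[name] = "lower importance"  # bottom half of the chart
        elif satisfaction >= sat_cut:
            labels[name] = "maintain"          # upper-right: high importance, high satisfaction
        else:
            labels[name] = "focus"             # upper-left: high importance, low satisfaction
    return labels


if __name__ == "__main__":
    # Hypothetical factors with hypothetical (importance, satisfaction) scores.
    demo = {
        "factor A": (0.9, 0.9),
        "factor B": (0.9, 0.1),
        "factor C": (0.1, 0.9),
    }
    for name, label in quadrant_analysis(demo).items():
        print(name, "->", label)
```

Splitting at the mean is only one choice; a median split or fixed thresholds would place borderline factors differently.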

Comparison Shopping
While it is important to look at what motivates customers to
return to a retailer, it is also important to look at which
factors current or prospective shoppers consider when comparison
shopping. When comparison shopping, consumers weigh product
price and shipping charges almost equally. As a result, a shopper
may choose to buy from a retailer that does not offer free or
discounted shipping if the total price including shipping is
lower than that of a retailer offering free or discounted
shipping. Item price and shipping charges were rated as the most
important factors in comparison shopping. Shipping speed,
customer reviews, retailer brand, and flexibility of delivery
time are all taken into account by consumers when comparison
shopping, but to a lesser degree than item price and shipping
charges.


Retailer Recommendation


In addition to retaining satisfied customers and attracting those
who are comparison shopping, another way retailers can increase
their business is through the recommendations of current
customers. When asked what would lead, or has led, to a
recommendation of a retailer, shoppers cited the availability of
free or discounted shipping as the main factor. Receiving
products on time and free or easy returns rate as the next most
important factors that prompt shoppers to recommend an online
retailer. Since 41% of shoppers said that "receiving my product
when expected" led them to recommend a retailer, both
communicating the delivery time and delivering reliably are
critical to a positive customer experience.
The current study focuses on three determinants that could
influence the impact of computer-mediated recommendations on
consumers' online product choices: the nature of the product
recommended, the nature of the website on which the
recommendation is proposed, and the type of recommendation
source. Prior research has shown that the type of product affects
consumers' use of personal information sources and the influence
of those sources on consumers' choices; goods can be classified
as possessing either search or experience qualities. Search
qualities are those that the consumer can determine by inspection
prior to purchase, whereas experience qualities are those that
cannot be determined prior to purchase. Because it is difficult
or even impossible to evaluate experience products before
purchase, consumers should rely more on product recommendations
for these products than for search products. In support of this
view, it has been found that consumers assessing a search product
(e.g., a 35-mm camera) are more likely to use own-based
decision-making processes than consumers assessing an experience
product, and that consumers evaluating an experience product
(e.g., a film-processing service) rely more on other-based and
hybrid decision-making processes than consumers assessing a
search product.
The nature of the website can also influence the impact of a
given recommendation. Previous website classifications suggest
that recommendation sources can be used and promoted by three
different types of websites: sellers (e.g., retailer or
manufacturer sites such as Amazon.com), commercially linked third
parties (e.g., comparison shopping sites such as MyShine.com), and
non-commercially linked third parties (e.g., product or merchant
evaluation sites such as Consumerreports.org). More independent
sites, such as non-commercially linked third parties that
facilitate consumers' external search by decreasing search costs,
are assumed to be preferred by consumers (Alba et al., 1997;
Bakos, 1997; Lynch & Ariely, 2000). By providing more alternatives
to choose from and more objective information, independent sites
should be perceived as more useful by consumers. In addition,
prior research on attribution theory suggests that consumers
discount recommendations from endorsers if they suspect that the
latter have incentives to recommend a product (for reviews, see
Folkes, 1988; Mizerski, Golden, & Kernan, 1979). According to the
discounting principle of attribution theory (Kelley, 1973), a
communicator will be perceived as biased if the recipient can
infer that the message can be attributed to personal or
situational causes; consumers would therefore attribute more
non-product-related motivations (e.g., commissions on sales) to
recommendation sources promoted by commercially linked third
parties and sellers than to those on independent third-party
websites. Consequently, consumers would follow product
recommendations in a greater proportion when shopping on more
independent than on less independent websites. In light of
research on consumers' use of relevant others in their
pre-purchase external search efforts (Olshavsky & Granbois, 1979;
Price & Feick, 1984; Rosen & Olshavsky, 1987) and considering the
emergence of online information sources providing personalized
recommendations (Ansari et al., 2000), Senecal and Nantel (2002)
assert that online recommendation sources can be sorted into
three broad categories: (1) other consumers (e.g., relatives),
(2) human experts (e.g., salespersons, independent experts), and
(3) expert systems such as recommender systems. We posit that
these online recommendation sources will have different levels of
influence on consumers' online product selection. It has been
suggested that information received from sources that have some
personal knowledge about the consumer has more influence on the
consumer than information from sources that have no such
knowledge. Thus, a recommendation source providing personalized
information to consumers (e.g., a recommender system) should be
more influential than a recommendation source providing
non-personalized information (e.g., other consumers).


Results of generated user profiles
We use the aforementioned LUI method to classify user
transactions. For comparison, we also apply the PACT approach
based on standard K-means used in [9] to generate user profiles.
The results show that the generated profiles overlap in
pageviews, since some pageviews are listed in more than one user
cluster. Table 1 depicts 2 user profiles generated from the KDD
dataset using the LUI approach. Each user profile is listed as a
sequence of pageviews ordered by weight; the greater the weight
of a pageview, the more likely it is to be visited. The first
profile in Table 1 represents the activities involved in an
online-shopping scenario, such as login, shopping_cart and
checkout, especially when purchasing leg-wear products, whereas
the second user profile reflects customers' interest in the
department store itself.
Analogously, some informative findings can be obtained from
Table 2, which is derived from the CTI dataset. In this table,
three profiles are generated: the first reflects the main concern
of international students, namely applying for admission; the
second involves the online application process for graduation;
and the final one indicates the most common activities of
students browsing the university website, especially while they
are deciding on course selection, i.e. selecting courses,
searching the syllabus list, and then going through a specific
syllabus.
First user profile generated from the KDD dataset (LUI)

Pageview #   Pageview content                  Weight
29           Main-shopping_cart                1.00
4            Products-productDetailLegwear     0.86
27           Main-Login2                       0.67
8            Main-home                         0.53
44           Check-expressCheckout             0.38
65           Main-welcome                      0.33
32           Main-registration                 0.32
45           Checkout-confirm_order            0.26

First user profile generated from the CTI dataset (LUI)

Pageview #   Pageview content                  Weight
19           Admissions-requirement            1.00
3            Admissions-costs                  0.41
15           Admissions-intrnational           0.24
13           Admissions-I20visa                0.21
387          Homepage                          0.11
0            Admission                         0.11
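Each profile above is a list of pageview-weight pairs in descending order, with a top weight of 1.00. The Python sketch below shows one way such a profile could be derived from a cluster, following the centroid description in Section II: average each pageview's weight over the cluster's sessions, scale by the largest value so the top pageview gets 1.00, drop pageviews below a cut-off, and sort. The scaling step and the `min_weight` cut-off are assumptions inferred from the tables, not stated in the text; the session data in the usage section are hypothetical.

```python
from typing import Dict, List, Tuple


def build_profile(
    cluster: List[Dict[str, float]], min_weight: float = 0.1
) -> List[Tuple[str, float]]:
    """Turn a cluster of sessions (pageview -> weight maps) into a weighted pageview profile."""
    totals: Dict[str, float] = {}
    for session in cluster:
        for page, w in session.items():
            totals[page] = totals.get(page, 0.0) + w
    # Centroid: average weight of each pageview over all sessions in the cluster.
    centroid = {p: w / len(cluster) for p, w in totals.items()}
    top = max(centroid.values(), default=0.0)
    if top == 0:
        return []
    # Assumed normalization: the heaviest pageview gets weight 1.00; weak pageviews are dropped.
    profile = [(p, round(w / top, 2)) for p, w in centroid.items() if w / top >= min_weight]
    return sorted(profile, key=lambda pw: pw[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical cluster of two sessions.
    cluster = [
        {"Main-shopping_cart": 1.0, "Main-home": 0.5},
        {"Main-shopping_cart": 1.0, "Main-Login2": 0.4},
    ]
    for page, weight in build_profile(cluster):
        print(f"{page:<20} {weight:.2f}")
```

Because profiles are built independently per cluster, the same pageview can appear in several profiles, which matches the overlap noted above.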

Delivery Timing
As seen above, 60% of online shoppers say that an estimated or
guaranteed delivery date is important at checkout. Because
online shoppers differ in how long they are willing to wait for
delivery, retailers that offer a range of delivery-time options
can appeal to a wider range of customers. While 48% of customers
stated that they are not willing to wait more than 5 days for
most of their purchases, 23% said that they would be willing to
wait 8 days or more. Just over 40% of online shoppers indicated
that they have abandoned their shopping cart because of an issue
with the estimated delivery time. Of the customers who abandoned
their cart because of the expected delivery date, one quarter
indicated that no expected delivery date was provided. Among
those who were shown an expected delivery date and abandoned
their cart, 64% saw an estimated delivery time of 5 days or more.
Displaying an expected delivery date is therefore an easy win for
retailers who are not currently doing so.

IV. CONCLUSION

We mapped the relationships among the co-occurrence
observations (i.e. user transactions) into a high-dimensional
space to construct the usage data in the form of a session-pageview
matrix. Then a dimensionality reduction algorithm (i.e. singular
value decomposition) was applied to the usage matrix to capture
the latent usage information for partitioning user transactions.
Based on the decomposed latent usage information, we proposed a
modified k-means clustering algorithm to generate user session
clusters. Moreover, the discovered user groups are used to
construct user profiles expressed in the form of weighted pageview
collections, each of which represents the common usage pattern
associated with one specific kind of visitor access interest. The
constructed user profiles corresponding to various task-oriented
behaviors are represented as sets of pageview-weight pairs, in
which each weight represents the contribution of the page.
Experiments are conducted on two real-world datasets to
validate the usability and scalability of usage mining.
Meanwhile, an evaluation metric is adopted to assess the
quality of the discovered clusters, and comparisons are made with
some previous works as well. The experimental results have
shown that the proposed approach is capable of effectively
discovering user access patterns and revealing the underlying
relationships among user visiting records as well.
Future work will focus on research issues such as performing
experiments on more datasets, broadening the comparison, and
making use of the discovered user profiles for further online
applications, for example, online recommendation and
personalization.


REFERENCES

Eytan Adar. User 4xxxxx9: Anonymizing query logs.

Roberto Baeza-Yates. Web usage mining in search engines. In Web
Mining: Applications and Techniques, 2004.

B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and
K. Talwar. Privacy, accuracy, and consistency too: A holistic
solution to contingency table.

Michael Barbaro and Tom Zeller. A face is exposed for searcher.
New York Times,
http://www.nytimes.com/2006/08/09/technology/09aol.html?ex=1312776000en=f6f61949c6da4d38ei=5090,
2006.

Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory
approach to non-interactive database privacy. In STOC, 2008.

Justin Brickell and Vitaly Shmatikov. The cost of privacy:
Destruction of data-mining utility in anonymized data publishing.
In KDD, 2008.

AUTHORS PROFILE

Parimi Narasimham is pursuing M.Tech (CSE) at Vikas College of
Engineering and Technology (VCET), Nunna, Vijayawada, JNTU-K,
India.

Mirza Mohsin Raza is working as an Asst. Professor in the CSE
department at Vikas College of Engineering and Technology (VCET),
Nunna, Vijayawada (Dist), JNTU-K, A.P, India.

Prof. S.V. Achutha Rao is working as HOD of CSE at Vikas College
of Engineering and Technology (VCET), Nunna, Vijayawada, JNTU-K,
India.
