Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing

Study and Application of Web-based Data Mining in E-Business

Yanguang Shen
College of Information and Electronic Engineering, Hebei University of Engineering, Handan, 056038, China
zhbaizh@163.com

Lili Xing
College of Information and Electronic Engineering, Hebei University of Engineering, Handan, 056038, China
baobao2005lili@yahoo.com.cn

Yiting Peng
National Natural Science Foundation of China, Beijing, 100085, China

0-7695-2909-7/07 $25.00 © 2007 IEEE
DOI 10.1109/SNPD.2007.117

Abstract

This paper discusses applications of Web mining to intelligent search engines, customer relationship management, personalized service, and commercial credit evaluation in e-business. Web mining techniques are used to analyze and reason over the mass of information in e-business. They can uncover potential patterns and predict customers' behavior, helping enterprise decision-makers adjust their marketing strategy, reduce risk, make sound decisions, and gain competitive advantage.

1. Introduction

With its advantages of low cost, high efficiency, and freedom from the restrictions of time and space, e-business is gradually prevailing around the globe. At the same time, however, it faces many problems. How can an enterprise collect information about its internal and external environment comprehensively, accurately, and promptly through the Internet, especially the hidden information that is key to its success, so as to adapt to market changes and enhance its competitiveness? How can it adjust its marketing strategy and optimize its website structure and way of service according to customers' access habits, so as to increase the efficiency of the website, improve customer relationship management, and realize the lifetime value of customers? How can enterprises win over customers, make recommendations on their own initiative, and offer personalized service? These problems are key to the success of e-business. In this new mode of business, how to organize and make full use of the abundant information on the Internet efficiently, so as to help data owners discover valuable information and then make the right business decisions, has become an urgent concern for e-business operators. The rapidly developing technology of Web-based data mining offers an effective approach to the problems that e-business faces.

2. Web mining

2.1. Overview of Web mining

Data mining is the nontrivial process of extracting previously unknown and potentially useful information and knowledge from massive, incomplete, noisy, fuzzy, and random data. Web mining applies data mining to extract interesting, potentially useful, and hidden information from the documents and activities on the Web, helping people to abstract knowledge from the WWW. It is a cross-disciplinary field spanning databases, data mining, artificial intelligence, information retrieval, natural language understanding, and so on.

2.2. Data sources of Web mining in e-business

E-business produces a large amount and a wide variety of data that can be used for data mining analysis. The types of data usually used by Web mining to produce different kinds of knowledge patterns are as follows [1]:
1. Server data. Customers leave log data on Web servers when visiting sites. These logs are usually stored on the server as files, generally including server logs, error logs, cookie logs, and so on.
2. Query data. Query data is a typical kind of data produced on e-business Web servers. For example,
customers of an online store may search for certain products and advertisement information, and this query information is linked to the server log through cookies or registration information.
3. Online market data. Most of this data concerns the e-business website, customers' purchases, merchandise, and so on, and is stored in traditional relational databases.
4. Web pages. Web pages include HTML and XML pages, which contain text, pictures, audio, video, and so on.
5. Hyperlinks between Web pages. Hyperlinks are an important resource because they indicate the relations between pages.
6. Customer registration information. This is the information that customers enter via a Web page and submit to the server; it usually describes their demographic characteristics. In Web mining, customer registration information should be integrated with visiting logs to improve the accuracy of data mining and produce more knowledge about customers.

2.3. Knowledge schemas that Web mining can acquire

Web mining techniques can be employed to dig out relevant knowledge schemas from the various data sources of websites, guiding site operators to work better and provide better services to customers. Web mining is usually used in websites to mine the following knowledge schemas.
1. Path analysis. Path analysis determines the most frequently visited paths in a website. Through path analysis, we can find important pages and improve the design of Web pages and the structure of the website.
2. Discovery of association rules. In e-business, association rules reveal the mutual relations among the various documents that customers visit on a website, as well as the correlations between pages and between purchases. With these correlations in mind, the site can be better organized and effective marketing strategies can be put into practice to increase cross-selling. Meanwhile, the customers' burden of filtering information can be greatly reduced.
3. Discovery of sequential patterns. This means finding, in time-stamp-ordered transaction sequences, internal patterns of the form "some items follow another". Sequential patterns help e-businesses predict clients' access patterns and offer targeted advertising services. The discovered patterns can help the server choose targeted pages that meet the specific requirements of visitors.
4. Classification and prediction. Classification discovers a description of the common attributes of a particular group, which can be used to classify new items. Its purpose is to map the data items in the database to one of the given classes by constructing a classification model, or classifier, that can then be used for prediction. That is, historical data records are used to build a general description of the given data automatically, so that future data can be predicted and business activities suited to a particular group of customers can be carried out.
5. Cluster analysis. Cluster analysis gathers customers with similar characteristics from Web-visiting information. Clustering customer information or data items in Web logs facilitates the development and implementation of future marketing strategies, such as automatically sending sales email to a specific customer cluster or recommending specific commodities to customers of a certain cluster. For e-business, customer clustering provides strong theoretical support for market segmentation. By extracting features from clustered customers, e-business websites can provide their customers with personalized service.
6. Anomaly detection. Anomaly detection describes the minority and extreme cases of the analyzed object and reveals their internal causes so as to reduce operating risks. In e-business, anomaly detection is applied to credit card fraud screening, unusual customer detection, network intrusion detection, and so on [2].

3. Application of Web mining to e-business

3.1. Intelligent search engine based on Web mining

Current search engines have disadvantages such as low precision and a large amount of useless returned information, so e-business enterprises cannot acquire enough crucial information to enhance their competitiveness. Web mining technology can be combined with a search engine to build an intelligent search engine that meets the needs of e-business enterprises.
The main aspects of Web mining adopted by search engines are automatic document classification, automatic abstract generation, online clustering of retrieval results, relevance ranking, and personalized search [3]. We can sort the retrieval results by document classification to help
users locate the target knowledge quickly. Most search engines automatically take the first few sentences of a document as a fixed-length abstract, so the abstract may reflect the document's content incompletely. Automatic abstract generation can resolve this problem and help users obtain the retrieved information more accurately, more conveniently, and faster. Clustering the retrieved document set groups relevant documents together and separates them from irrelevant ones. The processed information can be presented to users as a visual hierarchy of hyperlinks, and users can then choose their favorite clusters, reducing the number of Web pages to be browsed. A search engine combined with the personalization techniques of Web mining can learn the intrinsic characteristics of data objects from numerous training samples and extract information purposefully according to them; it thereby expands its store of the keywords users have searched for according to their preferences, so that the retrieval results meet users' needs more closely. Alternatively, it can build a library of user interests by analyzing the information users have browsed. In short, a personalized search engine can enhance both the recall and the precision of search.

3.2. Application of Web mining to Customer Relationship Management (CRM)

The core of CRM is, on the one hand, to discover potential markets and customers by collecting effective data about customers and their activities, and on the other hand, to meet customers' needs and realize their lifetime value by improving customer service and analyzing customers in depth. CRM provides traditional enterprises with management systems and technical means for survival in the era of the network economy. It requires enterprises to shift from a "product-oriented" model to a "customer-oriented" one.

3.2.1. Application of Web mining to CRM. Web mining can help enterprises identify customers' features, enabling them to provide targeted services. Web mining in e-business CRM covers several aspects, such as customer acquisition and retention, identification of customer value, analysis of customer satisfaction, and improvement of site structure.
With Web mining, we can understand the dynamic behavior of visitors and optimize the operation of e-business websites. We can divide the large number of acquired customers into categories and provide personalized services for each category, improving customer satisfaction and consequently retaining existing customers. By analyzing the records of the pages a new visitor browses, we can determine which category the visitor belongs to and whether he or she is potentially profitable, so that we can treat different customers accordingly, reduce the cost of sales, increase the rate of converting visitors into buyers, and hence find potential customers. Customers with similar browsing behaviors are grouped together and their common features are extracted. This customer clustering helps e-business enterprises better understand customers' interests, consuming habits, and trends, predict customers' needs, recommend specific commodities accordingly, and realize cross-selling [4]. As a result, the trading volume and the rate of successful trades increase, and distribution becomes more efficient.
In addition, the structure and content of a site are key to customers' interest. With the discovery of association rules, we can rearrange the structure and content dynamically for different customers and place together the commodities that reach a certain level of support and confidence to promote sales. By means of path analysis, we can identify the paths along which a category of customers frequently visits the site. These paths reflect the order and habits in which such customers visit the site's pages. We can then hyperlink the related documents customers have visited so that they can easily reach their favorite pages. Such a site leaves a good impression on customers, strengthens their loyalty, arouses their interest, prolongs their stay on the site, and increases their chance of returning.
Through Web mining, we can obtain reliable market feedback to evaluate the rate of return on advertisement investment and decide whether an online marketing mode is successful. According to the browsing patterns of visitors interested in a certain product, we can decide where to place advertisements, increasing their pertinence and the rate of return on advertisement investment while reducing operating costs.

3.2.2. Safeguard the privacy of customers. Safeguarding customers' privacy is a basic part of commercial operation that cannot be ignored. Therefore, an e-business enterprise should avoid mining individual customers. Customers' privacy should be protected through both technology and management. Technologically, we usually adopt encrypted identifiers and minimize data mining of individual customers. In management, many enterprises have created the position of Chief Privacy Officer, who strikes an appropriate balance between the individual
demand for privacy and the enterprise's right to use private materials in a reasonable way [5]. E-business enterprises act as the principal party in protecting customers' privacy. In addition, industry self-regulation is an effective way to protect customers' privacy. At present, e-business websites are increasingly inclined to build their image among customers through self-regulation, so that customers can submit data without worry.

3.3. Application of Web mining to the personalized service recommendation system

Web mining can help enterprises implement better CRM by setting up a personalized service recommendation system in e-business, which carries out the "customer-oriented", "one-to-one" sales principle.
The personalized service recommendation system mainly applies the ideas and methods of data mining to resources such as Web server logs and Web databases [6]. It mines users' regular visiting patterns, assigns users to particular categories, and recommends Web pages accordingly. By continuously tracking users' current access, the system can adjust the recommendation set in time to provide personalized access. It is composed of five modules: data collection, data preprocessing, data storage, offline mining, and online recommendation. The system construction model is shown in Figure 1.
[Figure 1 placeholder. Data collection module: Web database, usage log, site files. Data preprocess module: data cleaning, user identification, session identification, transaction identification, path complement. Data storage module: user transaction library. Offline mining module: data mining engine, mode discovery, mode rule set, mode analysis, useful mode set. Online recommendation module: user session, recommendation engine, recommendation rule set, Web server, personalized service, network centre, improvement of the structure of site.]

Figure 1. Construction model of personalized service recommendation system based on Web mining.
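The five modules of Figure 1 (data collection, data preprocessing, data storage, offline mining, and online recommendation) can be illustrated with a small sketch. It is a minimal, hypothetical example under stated assumptions, not the authors' system: it assumes each usage-log record carries a user ID, a timestamp, and a URL; it assumes a 30-minute inactivity timeout for session identification (a common heuristic, chosen here for illustration); it substitutes simple pairwise association rules (support and confidence) for the offline mining engine; and it recommends unseen pages online. All names (`Hit`, `identify_sessions`, `mine_rules`, `recommend`, `SESSION_TIMEOUT`) are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import NamedTuple

# Assumption: a visit gap longer than 30 minutes starts a new session.
SESSION_TIMEOUT = 30 * 60  # seconds


class Hit(NamedTuple):
    """One usage-log record (hypothetical schema)."""
    user: str       # user identifier, e.g. derived from a cookie
    timestamp: int  # seconds since the epoch
    url: str        # requested page


def identify_sessions(hits):
    """Preprocessing: user identification, then session identification."""
    by_user = defaultdict(list)
    for hit in sorted(hits, key=lambda h: (h.user, h.timestamp)):
        by_user[hit.user].append(hit)

    sessions = []
    for user_hits in by_user.values():
        current = [user_hits[0].url]
        for prev, hit in zip(user_hits, user_hits[1:]):
            if hit.timestamp - prev.timestamp > SESSION_TIMEOUT:
                sessions.append(current)  # close the session after a long gap
                current = []
            current.append(hit.url)
        sessions.append(current)
    return sessions


def mine_rules(sessions, min_support=0.3, min_confidence=0.5):
    """Offline mining: pairwise association rules lhs -> rhs over sessions."""
    page_sets = [set(s) for s in sessions]
    n = len(page_sets)
    page_counts = Counter(p for s in page_sets for p in s)
    pair_counts = Counter(
        pair for s in page_sets for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:  # support = share of sessions with both pages
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / page_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, confidence))
    return rules


def recommend(current_pages, rules, top_k=3):
    """Online recommendation: rank unseen pages implied by the current session."""
    seen = set(current_pages)
    scores = {}
    for lhs, rhs, confidence in rules:
        if lhs in seen and rhs not in seen:
            scores[rhs] = max(scores.get(rhs, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda page: (-scores[page], page))[:top_k]


if __name__ == "__main__":
    log = [
        Hit("u1", 0, "/home"), Hit("u1", 60, "/phones"), Hit("u1", 120, "/cases"),
        Hit("u1", 10_000, "/home"), Hit("u1", 10_060, "/laptops"),
        Hit("u2", 0, "/phones"), Hit("u2", 90, "/cases"),
        Hit("u3", 0, "/home"), Hit("u3", 30, "/phones"),
    ]
    rules = mine_rules(identify_sessions(log))
    print(recommend(["/phones"], rules))  # ['/cases', '/home']
```

In this toy log, `/phones` co-occurs with both `/cases` and `/home` in two of the four identified sessions, so a visitor currently on `/phones` is offered both. The pairwise rules are a deliberate simplification of general association-rule mining (for example Apriori); path completion, transaction identification, and persistent storage in the transaction library are omitted, and the in-memory lists only stand in for the data storage module.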
The data collection module collects data such as Web databases and usage logs to prepare for later mining. The data preprocessing module preprocesses the collected data through data cleaning, user identification, session identification, transaction identification, and so on. The quality of data preprocessing strongly affects the efficiency and results of mining. The data storage module stores the preprocessed data in the transaction library. The mining engine of the offline mining module applies data mining techniques from the algorithm library, such as statistical analysis, association rules, cluster analysis, and sequential patterns. It discovers users' browsing patterns, which are then analyzed and interpreted through pattern analysis. Based on practical application, it turns the statistical results,
discovered rules, and patterns into knowledge through observation and selection. The useful patterns selected in this way can guide the practice of e-business.
The online recommendation module places a recommendation engine in front of the Web server. It generates the corresponding recommendation set by combining the user's current browsing with the recommendation set of pages the user has already browsed. The recommended pages are then added to the newly requested pages, which are transmitted to the user's browser through the Web server, realizing real-time personalized service. At the same time, the recommendation results are sent to the site management centre so that it can adjust the site design, optimize the site structure, and enhance its efficiency.
In short, a personalized service recommendation system based on data mining has two stages: offline learning, and online use of the learned patterns. Feature capture for mining and online recommendation, as well as rule generation, are performed offline, whereas online services are provided through the online recommendation engine when users access the site. The online and offline modules are interrelated: the online module makes recommendations based on the rule models provided by the offline module, and the offline module uses recommendation algorithms to generate the corresponding rules from the data accumulated online. The mining algorithm and recommendation strategy can be chosen according to the needs of sites of different styles. The mining results and recommendation set are fed back to users by the recommendation engine. After users log in to the site, their access information is recorded on the server. After preprocessing, these data undergo pattern identification and pattern analysis, using a concrete mining algorithm and recommendation strategy, in the dedicated data mining module. Users' access information is also transmitted to the recommendation engine, which extracts the corresponding user's mining results and recommendation set from the mining module and presents them to the user visually to realize personalized service.

3.4. Application of Web-based data mining to commercial credit evaluation

A well-developed social credit system is an important foundation for the development of e-business. By using Web mining to compute statistics on the differences between website data and historical records and on the deviations between results and expectations, and by fully analyzing abnormal cases, enterprises can effectively prevent investment and operating risks. In addition, data mining technology can track the operation of enterprises to evaluate their assets, analyze their profitability, predict their development potential, build a sound security assurance system, monitor and control the whole process online, supervise the free press online, uphold enterprise credibility, and strengthen the security management of Internet transactions and online payment. With a credit evaluation model built by data mining, we can find the data features of customers' transactions and set customer credit levels, so as to prevent and reduce credit risks effectively and raise the level of enterprise credit discrimination and risk management by mining large amounts of historical transaction data [7].

4. Conclusions

E-business-oriented Web mining can find the hidden knowledge behind massive data, guiding e-business enterprises to increase sales, improve enterprise-customer relationships, and increase the operating efficiency of their websites. The development and application of e-business-oriented Web mining has good prospects and will attract increasing attention.

5. References

[1] Zhong Xie, “Recommendation System of Commercial Site Research Based on Web Data Mining”, Master’s Degree Thesis of Southwest Normal University, 2002(05), pp. 8-10.

[2] Fengzhao Yang and Hui Bai, “Application of Anomaly Detection in E-business”, Information Magazine, Xi’an, 2005(12), pp. 51-53.

[3] Yan Li, Xinzhong Chen, and Bingru Yang, “Research on Web Mining-based Intelligent Search Engine”, Computer Engineering and Applications, Beijing, 2002(04), pp. 34-36.

[4] Xiang Su, Weiling Jiao, and Pei Wu, “Application and Study of Web Mining”, Information Science, Theory & Application, Beijing, 2005, 28(06), pp. 651-655.

[5] Jing Hao, “Application of Data Mining in the CRM of E-business”, Master’s Degree Thesis of Wuhan University, 2005(05), pp. 47-53.

[6] Fenghui Li, “On Ec-oriented Web Data Mining”, Master’s Degree Thesis of Shandong Science and Technology University, 2004(06), pp. 35-79.

[7] Chuiwei Lu, “Study and Application of Data Mining Technology in E-business”, Market Modernization, Beijing, 2006(04), p. 87.