Proceedings of ICEEM-2016
Abstract - With the explosive growth of the Internet, a large number of users search online to satisfy their information needs. Web Usage Mining plays an important role in discovering knowledge about online users' behaviour from the available web log data. Satisfying online users' needs with a traditional web usage mining system is a challenging task, because such a system is constructed solely from the web usage data of online users. Web-page recommendation is used to capture the intuition of online users effectively. To make a Web-page recommendation system capture the intuition of users accurately, we propose two novel knowledge representation models that provide semantic enhancement to the Web-page recommender system. The first model, a semantic network of a website, represents domain knowledge by domain terms, Web-pages and the relations between them. A Web usage model generates frequent web access patterns by applying sequential pattern mining algorithms to the usage data from the web server. The second model, the Conceptual Prediction Model (CPM), integrates the semantic knowledge with the web usage model, resulting in a weighted semantic network of semantic web usage knowledge. Using Markov models, CPM constructs a weighted semantic network with the Frequently Viewed Terms as nodes, where each weight represents the probability of transition between adjacent terms.
Index terms: Web Usage mining, semantic knowledge, conceptual prediction model, semantic network, domain terms.
1. INTRODUCTION:
Web Mining is a major area of data mining applications that discovers patterns from web data in order to better understand the needs of web-based applications. Web mining can be divided into three types: web usage mining, web content mining and web structure mining. Web Usage Mining (WUM) is the process of discovering or extracting patterns from users' access data on the web. The usage data are collected from one or more Web servers. Web usage mining is very useful for understanding users' interests and their browsing behaviour. A typical application of WUM is the recommender system.
The main goal of a Web-page recommender system is to effectively forecast the Web-page(s) that will be visited next while a user navigates through the website. Web-page recommendation captures the intuition of online users from their browsing patterns and recommends items to them in the form of links to stories, books or pages of interest. There are many difficulties in developing an effective Web-page recommender system, such as how to effectively learn users' online behaviour and Web-page navigation patterns from the available historical usage data, how to discover this knowledge, and how to make online recommendations based on the discovered knowledge.
Some studies have shown that approaches based on tree structures and probabilistic models can efficiently represent Web access sequences (WAS) from the Web usage data [1]. These approaches use the historical web usage data to construct a user profile, which consists of links between the Web-pages that users are most interested in, based only on the usage data. Using this knowledge, when a user comes online the next time, these approaches predict the next Web-page(s) that the user is most likely to visit, given the current Web-page and the previously visited k Web-pages.
The performance of these approaches depends on the size of the training usage dataset: the bigger the training dataset, the higher the prediction accuracy. The main drawback of these Web-page recommendation approaches is that they are based solely on the Web access sequences learnt from the Web usage data. Therefore, if a user visits a new Web-page that is not in the training usage data, these approaches cannot offer any recommendations to this user. This is referred to as the new-item problem.
Some studies show that semantic-enhanced approaches that use a domain ontology can overcome this new-item problem [2], [3]. Integrating domain knowledge with Web usage knowledge through ontology-based Web mining techniques improves the prediction accuracy of recommender systems [4], [6]. Web usage mining enriched with semantic information has shown higher performance than classic Web usage mining algorithms [5]-[6]. However, the main issue in these approaches is the difficulty of representing and acquiring the semantic domain knowledge, and much research on domain ontologies is ongoing.
Domain ontologies are mostly used to represent the semantics of a website. They can be constructed manually by experts or automatically by learning models, such as a Bayesian network or a collocation map, for many different applications. Given the very large size of Web data in today's websites, manually building an ontology for a website is a challenging task: it is time-consuming and the result is less reusable. According to Stumme, Hotho and Berendt, it is impossible to manually discover the meaning of all Web-pages and their usage for a large-scale website [10].
www.iirdem.org
14
IIRDEM 2016
ISBN: 978-81-930654-7-5
Automatic construction of ontologies saves time, discovers all possible concepts within a website and the links between them, and produces reusable ontologies. However, the drawback of this automatic approach is the need to design and implement the learning models, which can only be done by professionals at the beginning.
This paper presents a novel method that provides better Web-page recommendation by integrating Web usage and domain knowledge. Two new knowledge representation models and a set of Web-page recommendation strategies are proposed. The first model is a semantic network that represents domain knowledge and can be constructed automatically; because it is fully automated, it can easily be integrated into the Web-page recommendation process. The second model is a conceptual prediction model, a navigation network of domain terms based on the frequently viewed Web-pages. It represents the integrated Web usage and domain knowledge that supports Web-page prediction, and it can also be constructed automatically. The proposed recommendation strategies use these two models to predict the next pages, with probabilities, for a given Web user based on his or her current Web-page navigation state. This new method automates the knowledge base construction and alleviates the new-item problem. It yields better performance than existing Web usage based Web-page recommendation systems.
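The new-item problem described in the Introduction can be illustrated with a purely usage-based predictor. The sketch below is a minimal first-order Markov model over page sessions; the page names and the helper functions `train_first_order` and `predict_next` are hypothetical. It is not the method proposed in this paper, only an example of the kind of usage-only predictor that the semantic models aim to improve on.

```python
from collections import Counter, defaultdict

def train_first_order(sessions):
    """Count page-to-page transitions in the training sessions (first-order Markov model)."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, current_page, top_n=2):
    """Return up to top_n (page, probability) pairs; empty if the page never appeared."""
    counts = transitions.get(current_page)  # .get does not create a new key
    if not counts:
        return []  # new-item problem: no usage evidence for this page
    total = sum(counts.values())
    return [(page, n / total) for page, n in counts.most_common(top_n)]

# Hypothetical training sessions.
sessions = [["home", "news", "sports"], ["home", "news", "weather"], ["home", "sports"]]
model = train_first_order(sessions)
print(predict_next(model, "home"))       # news with probability 2/3, then sports with 1/3
print(predict_next(model, "elections"))  # [] because the page is not in the training data
```

A page that never occurs in the training sessions gets no recommendation at all, however closely its content relates to known pages; this is the gap that the domain-term models are meant to fill.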
This paper is structured as follows: Section 2 discusses related work; Section 3 describes the architecture of the system and the implementation of web usage mining. Section 4 presents the first model, i.e. a semantic network of domain terms. Section 5 presents the second model, i.e. a conceptual prediction model (integrating the semantic knowledge with Web-page recommendation). For each of the models presented in Sections 4 and 5, the queries used to retrieve semantic information from the knowledge models are presented. Section 6 presents a set of recommendation strategies based on these queries to make semantic-enhanced Web-page recommendations.
2. LITERATURE SURVEY:
Research on Web-page recommender systems that combine web usage mining with semantic knowledge is very limited. It can be classified into the following two approaches:
3. ARCHITECTURE OF WEB-PAGE RECOMMENDER SYSTEM:
The recommendation system is implemented in two components: offline and online. The offline component builds the knowledge base by analysing historical data, such as the server access log files (web logs) captured from the server. These web logs are then used in the online component to capture the intuition list of the user, so that page views can be recommended whenever the user comes online the next time. In the offline phase, web usage mining proceeds in four steps: data collection, data pre-processing, pattern discovery and pattern analysis.
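As a rough illustration of the data collection and pre-processing steps, the following sketch parses server log lines and groups them into sessions. It assumes the Apache/NCSA Common Log Format, treats each client IP address as one user, keeps only successful GET requests for pages (dropping images, stylesheets and scripts), and starts a new session after 30 minutes of inactivity. These choices, the sample log lines and the helper names are assumptions for illustration, not details given in this paper.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed Common Log Format: ip ident user [time] "METHOD URL PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
IGNORED_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold

def parse_line(line):
    """Return (ip, timestamp, url) for a successful page request, else None."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    if m["method"] != "GET" or m["status"] != "200" or m["url"].endswith(IGNORED_SUFFIXES):
        return None
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return m["ip"], ts, m["url"]

def sessionize(lines):
    """Group cleaned requests per IP and split them wherever the gap exceeds the timeout."""
    by_user = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec:
            ip, ts, url = rec
            by_user[ip].append((ts, url))
    sessions = []
    for visits in by_user.values():
        visits.sort()
        current = [visits[0][1]]
        for (prev_ts, _), (ts, url) in zip(visits, visits[1:]):
            if ts - prev_ts > SESSION_TIMEOUT:
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions

# Hypothetical log lines.
log = [
    '10.0.0.1 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2016:13:56:10 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2016:13:58:02 +0000] "GET /news.html HTTP/1.1" 200 1800',
    '10.0.0.1 - - [10/Oct/2016:15:00:00 +0000] "GET /sports.html HTTP/1.1" 200 900',
    '10.0.0.2 - - [10/Oct/2016:14:00:00 +0000] "GET /index.html HTTP/1.1" 404 0',
]
print(sessionize(log))  # [['/index.html', '/news.html'], ['/sports.html']]
```

The resulting page sequences are the kind of input that the pattern discovery step (for example, frequent Web access pattern mining) would consume.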
represents concepts as domain terms and Web-pages, and the relations between these concepts. To construct the semantic network, domain terms are collected from the Web-page titles, and the relations between these terms are extracted from two aspects: (i) the collocations of terms, determined by the co-occurrence relations of terms in Web-page titles; and (ii) the associations between terms and Web-pages.
To capture how these terms are semantically related, the domain terms and co-occurrence relations are weighted. Based on these relations, we can estimate how closely Web-pages are semantically associated with each other. To infer the semantics of Web-pages, we can query the relations, including the relevant pages and key terms for a given page and the pages for given terms, thereby achieving semantic-enhanced Web-page recommendations. This semantic network is referred to as TermNetWP.
The following are the procedures to automatically
construct TermNetWP:
1) Collect the titles of visited Web pages.
2) Extract term sequences from the Web-page
titles.
3) Build the semantic network TermNetWP.
4) Implement an automatic construction of
TermNetWP.
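Steps 1 and 2 can be sketched as follows. The sketch assumes that a term is simply a lower-cased word token and that a small, hypothetical stop-word list is removed; the paper does not specify its term extraction rules, and the page IDs and titles are invented examples.

```python
import re

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with"}

def extract_term_sequence(title):
    """Lower-case the title, split it into word tokens and drop stop words,
    keeping the original order so that adjacent terms can be linked later."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]

# Step 1 (assumed already done): titles of visited Web-pages, keyed by page ID.
titles = {
    "p1": "Admission Requirements for Graduate Research",
    "p2": "Graduate Research Scholarships",
}
# Step 2: one term sequence per title.
term_sequences = {page: extract_term_sequence(t) for page, t in titles.items()}
print(term_sequences)
```

Keeping the term order matters, because the collocation relations in TermNetWP are based on which terms appear next to each other in a title.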
To be reused and shared by the Web-page recommender system, TermNetWP is implemented in OWL. The input to this network is a term sequence collection (TSC), in which each record consists of:
1) The PageID of a Web-page $d \in D$;
2) A sequence of terms $X = t_1 t_2 \ldots t_m \in TS$, $m > 0$, extracted from the title of the Web-page;
3) The URL of the Web-page.
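A minimal sketch of how TSC records might feed the two relation types described earlier (collocations of adjacent title terms, and associations between terms and Web-pages) is shown below. It uses plain Python dictionaries instead of OWL, and the record class `TSCRecord`, the URLs and the use of raw counts as weights are assumptions for illustration only; the paper's actual weighting is not reproduced here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import List

@dataclass
class TSCRecord:
    page_id: str
    terms: List[str]  # term sequence X = t1 t2 ... tm extracted from the title
    url: str

def build_term_network(records):
    """Build a simplified TermNetWP-like structure from TSC records:
    - term_freq: number of titles containing each term (a stand-in weight)
    - collocations: how often t_i is immediately followed by t_(i+1) in a title
    - term_pages / page_terms: associations between terms and Web-pages"""
    term_freq = Counter()
    collocations = Counter()
    term_pages = defaultdict(set)
    page_terms = {}
    for rec in records:
        page_terms[rec.page_id] = list(rec.terms)
        for term in set(rec.terms):
            term_freq[term] += 1
            term_pages[term].add(rec.page_id)
        for left, right in zip(rec.terms, rec.terms[1:]):
            collocations[(left, right)] += 1
    return term_freq, collocations, term_pages, page_terms

# Hypothetical TSC records.
records = [
    TSCRecord("p1", ["admission", "requirements", "graduate", "research"], "/admission.html"),
    TSCRecord("p2", ["graduate", "research", "scholarships"], "/scholarships.html"),
]
term_freq, collocations, term_pages, page_terms = build_term_network(records)
print(collocations[("graduate", "research")])  # 2
print(sorted(term_pages["research"]))          # ['p1', 'p2']
```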
4. TermNetWP ALGORITHM:
4.1 Definitions of TermNetWP
The notations used in TermNetWP are summarized as follows:
Class OutLink involves two object properties: (i) fromInstance defines one previous term instance, and (ii) toInstance defines one next term instance. Class Instance also has two object properties: (i) hasOutLink, which is the inverse of the fromInstance relation, and (ii) fromOutLink, which is the inverse of the toInstance relation.
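The OutLink and Instance classes and their inverse object properties can be mirrored by a small in-memory data structure. This is a hypothetical Python analogue, not the OWL ontology itself: the inverse relations are maintained by hand when a link is created, whereas in OWL a reasoner can infer them from inverse-property declarations.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(eq=False)
class Instance:
    """A term instance; the two lists play the roles of hasOutLink and fromOutLink."""
    term: str
    has_out_links: List["OutLink"] = field(default_factory=list)   # inverse of fromInstance
    from_out_links: List["OutLink"] = field(default_factory=list)  # inverse of toInstance

@dataclass(eq=False)
class OutLink:
    """A directed link from a previous term instance to a next term instance."""
    from_instance: Instance
    to_instance: Instance

    def __post_init__(self):
        # Register the link on both ends so the inverse properties stay consistent.
        self.from_instance.has_out_links.append(self)
        self.to_instance.from_out_links.append(self)

graduate = Instance("graduate")
research = Instance("research")
link = OutLink(graduate, research)
print([lk.to_instance.term for lk in graduate.has_out_links])     # ['research']
print([lk.from_instance.term for lk in research.from_out_links])  # ['graduate']
```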
4.3 Queries
Based on TermNetWP, we can query: (i) domain
terms for a given Web-page, and (ii) Web-pages mapped
to a given domain term.
(dj, t) = =0 (, ) + (, )
, =
=1 ,
(1)
(2)
(3)
,,
,
(4)
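The two queries of Section 4.3 can be approximated as dictionary lookups over page-to-term and term-to-page maps. The functions `query_topic` and `query_page` below are hypothetical stand-ins for the paper's Query_topic and Query_page, the data are invented, and the weights defined by the equations above are not modelled.

```python
from collections import defaultdict

def query_topic(page_terms, page_id):
    """Domain terms for a given Web-page (an empty list for an unknown page)."""
    return page_terms.get(page_id, [])

def query_page(term_pages, term):
    """Web-pages mapped to a given domain term, in sorted order."""
    return sorted(term_pages.get(term, ()))

# Hypothetical page-to-term map and its inverse.
page_terms = {
    "p1": ["admission", "requirements", "graduate", "research"],
    "p2": ["graduate", "research", "scholarships"],
    "p3": ["research", "centres"],
}
term_pages = defaultdict(set)
for page, terms in page_terms.items():
    for term in terms:
        term_pages[term].add(page)

print(query_topic(page_terms, "p2"))      # ['graduate', 'research', 'scholarships']
print(query_page(term_pages, "research"))  # ['p1', 'p2', 'p3']
```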
5. TermNavNet ALGORITHM:
In Section 4, we presented TermNetWP, which efficiently represents the semantics of Web-pages within a website but is not sufficient on its own for making effective Web-page recommendations. To overcome this issue, TermNetWP should be integrated with Web usage knowledge to obtain semantic Web usage knowledge.
The notations used to represent TermNavNet are summarized as follows:
x: Number of occurrences of $t_x$ in F;
x, y: Number of times that $t_x$ is followed by $t_y$ in F with no term between them;
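With these two counts, one natural estimate of the first-order transition probability is their ratio, and a second-order probability can be estimated analogously from counts of term triples. The formulas and sketch below are a hedged illustration under that assumption; the symbols $n_x$, $n_{x,y}$ and $n_{x,y,z}$ are introduced here only for readability, and the term sequences F are a toy example.

$$P(t_y \mid t_x) = \frac{n_{x,y}}{n_x}, \qquad P(t_z \mid t_x, t_y) = \frac{n_{x,y,z}}{n_{x,y}}$$

```python
from collections import Counter

def transition_probabilities(sequences):
    """Estimate first- and second-order term transition probabilities from term sequences F.
    Assumed estimates: P(t_y | t_x) = n_xy / n_x and P(t_z | t_x, t_y) = n_xyz / n_xy."""
    unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
    for seq in sequences:
        unigrams.update(seq)                            # n_x: occurrences of t_x
        bigrams.update(zip(seq, seq[1:]))               # n_xy: t_x immediately followed by t_y
        trigrams.update(zip(seq, seq[1:], seq[2:]))     # n_xyz: t_x, t_y, t_z in a row
    first = {(x, y): n / unigrams[x] for (x, y), n in bigrams.items()}
    second = {(x, y, z): n / bigrams[(x, y)] for (x, y, z), n in trigrams.items()}
    return first, second

# Hypothetical frequently viewed term sequences F.
F = [["graduate", "research", "scholarships"],
     ["graduate", "research", "centres"],
     ["graduate", "admission"]]
first, second = transition_probabilities(F)
print(first[("graduate", "research")])                  # 2 of 3 occurrences of "graduate"
print(second[("graduate", "research", "scholarships")])  # 0.5
```

Because $n_x$ also counts occurrences of $t_x$ at the end of a sequence, the outgoing probabilities of such a term can sum to less than 1; normalizing over observed successors instead is an alternative design choice.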
5.3 Queries
RecTerm($t_x$, $t_y$) queries the next viewed terms for a given previously viewed term prT and current viewed term curT by applying the second-order transition probabilities. If first-order transition probabilities are used, the next viewed terms for a given current viewed term curT are obtained with the query RecTerm($t_x$).
6. SEMANTIC-ENHANCED WEBPAGE RECOMMENDATION STRATEGIES
Recommendation strategy-1 uses TermNetWP and the first-order CPM:
Step 1 builds TermNetWP;
Step 2 generates FWAP using LL-Mine;
Step 3 builds FVTP;
Step 4 builds a 1st-order TermNavNet given FVTP;
Step 5 identifies a set of currently viewed terms $\{t_k\}$ using query $Query_{topic}(d_k)$ on TermNetWP;
Step 6 infers next viewed terms $\{t_{k+1}\}$ given each term in $\{t_k\}$ using query $RecTerm(t_k)$ on the 1st-order TermNavNet;
Step 7 recommends pages mapped to each term in $\{t_{k+1}\}$ using query $Query_{page}(t_{k+1})$ on TermNetWP.
Recommendation strategy-2 uses TermNetWP and the second-order CPM:
Step 1 builds TermNetWP;
Step 2 generates FWAP using LL-Mine;
Step 3 builds FVTP;
Step 4 builds a 2nd-order TermNavNet given FVTP;
Step 5 identifies a set of previously viewed terms $\{t_{k-1}\}$ and a set of currently viewed terms $\{t_k\}$ using query $Query_{topic}(d)$, $d \in \{d_{k-1}, d_k\}$, on TermNetWP;
Step 6 infers next viewed terms $\{t_{k+1}\}$ given each pair $\{t_{k-1}, t_k\}$ using query $RecTerm(t_{k-1}, t_k)$ on the 2nd-order TermNavNet;
Step 7 recommends pages mapped to each term in $\{t_{k+1}\}$ using query $Query_{page}(t_{k+1})$ on TermNetWP.
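The two strategies can be sketched end to end on toy data. This is a simplified illustration under several assumptions: TermNetWP is reduced to page-to-term and term-to-page dictionaries, the FVTP are taken as already mined (LL-Mine itself is not implemented), `rec_term` normalizes counts over the observed successors of a context, and a page's score is the highest probability of any predicted term mapped to it. All data and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy TermNetWP maps (hypothetical): page -> terms and term -> pages.
PAGE_TERMS = {"p1": ["admission"], "p2": ["graduate", "research"],
              "p3": ["scholarships"], "p4": ["research", "centres"]}
TERM_PAGES = defaultdict(set)
for page, terms in PAGE_TERMS.items():
    for term in terms:
        TERM_PAGES[term].add(page)

# Toy frequently viewed term patterns (FVTP), assumed already mined.
FVTP = [["admission", "graduate", "scholarships"],
        ["admission", "graduate", "scholarships"],
        ["admission", "research", "centres"]]

def count_transitions(sequences, order):
    """Map each context of `order` consecutive terms to a Counter of the terms that follow it."""
    table = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            table[tuple(seq[i:i + order])][seq[i + order]] += 1
    return table

FIRST = count_transitions(FVTP, 1)   # 1st-order TermNavNet
SECOND = count_transitions(FVTP, 2)  # 2nd-order TermNavNet

def rec_term(*context):
    """RecTerm(t_k) or RecTerm(t_{k-1}, t_k): next terms with transition probabilities."""
    table = FIRST if len(context) == 1 else SECOND
    counts = table.get(tuple(context), Counter())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}

def recommend(prev_page, cur_page, second_order=True):
    """Strategy-2 when second_order is True and prev_page has terms, otherwise Strategy-1."""
    scores = Counter()
    cur_terms = PAGE_TERMS.get(cur_page, [])    # Query_topic(d_k)
    prev_terms = PAGE_TERMS.get(prev_page, [])  # Query_topic(d_{k-1})
    if second_order and prev_terms:
        contexts = [(p, c) for p in prev_terms for c in cur_terms]
    else:
        contexts = [(c,) for c in cur_terms]
    for context in contexts:
        for term, prob in rec_term(*context).items():  # infer next viewed terms
            for page in TERM_PAGES.get(term, ()):      # Query_page(t_{k+1})
                scores[page] = max(scores[page], prob)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

print(recommend(None, "p1", second_order=False))
print(recommend("p1", "p2"))
```

Here `recommend(None, "p1", second_order=False)` follows strategy-1 and ranks p2 (probability 2/3) above p4 (1/3), while `recommend("p1", "p2")` follows strategy-2 and returns p3 and p4, each with probability 1.0.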
Precision = |…| / |…|   (5)
Satisfaction = |…| / |…|   (6)