A Survey on Web User Personalization Techniques

Dhanalakshmi.D¹, Graduate Student IEEE Member
Research Scholar, Department of Computer Science
S.N.R SONS College (Autonomous)
Coimbatore, Tamil Nadu, India
dhanadurairaj@gmail.com

Dr. J.Komala Lakshmi², IEEE Member
Assistant Professor, Department of Computer Science
S.N.R SONS College (Autonomous)
Coimbatore, Tamil Nadu, India
jkomalalakshmi@gmail.com

Abstract— Information on the World Wide Web (WWW) is growing every second. This rapid growth produces a huge amount of information on the web, from which seekers must find the information they require. Researchers are continuously working toward satisfying users' requirements in information retrieval. This is an emerging and interesting area in which data mining techniques are being applied. Web mining is the application of data mining to the web; it includes web content mining, web usage mining and web structure mining. Personalization is one of the applications of web usage mining and helps provide users with the information they require in an efficient manner. In this paper, we discuss various techniques used in web user personalization and conclude with our findings on the current web scenario and what is expected of future work.

Keywords—Web mining, Data Mining, WWW and Machine Learning

I. INTRODUCTION

The web is a popular and primary source for information retrieval. Information retrieval and recommendation is an emerging field in this digital era [1]. The web is overloaded with information. Information users may encounter, among others, the following problems while interacting with the web [2]. A) Finding relevant information, which is difficult because many search results are irrelevant and some relevant information is not indexed. B) Users typically pose a short query consisting of a few keywords describing their information need. Information on the web is unstructured or semi-structured; it is well understood by humans but not by machines, while the huge volume of information is well processed by machines but not by humans. Research needs to tackle these circumstances. Web mining is a research area that brings together different research communities such as databases, IR and AI, particularly machine learning and NLP [2].

II. WEB MINING

A. An Overview
Web mining is the use of data mining techniques to extract interesting, previously unknown patterns from the web. Nowadays everything is digitized. In this digital era, people mostly depend on the internet for their information needs. Web mining is one of the emerging research fields due to the vast growth of the web as well as of its users. One of the important goals for the research community is to provide the best service to users.

B. Personalization
Web usage patterns are extracted from one or a combination of sources such as server web log files, client-side cookies, and ISP or transaction databases. Most research makes use of server web log files for pattern extraction. The work in [3] presents Web Personalizer, in which data are extracted from server log files. After preprocessing, URL clusters are created from user transactions and server URLs using a multivariate k-means method. When a user requests a URL in an active session on a web site, the requested URL is compared with the URL clusters. Accordingly, a set of recommended URLs is sent along with the requested URL.

Nicolaas Matthijs and Filip Radlinski [4] proposed a personalization approach based on analyzing a user's long-term browsing behavior. The resulting user profile is used to re-rank web search results according to the user's preferences. In this work, they consider the entire visited web page instead of web snippets when extracting concepts, using the C/NC method and the Viterbi algorithm. While re-ranking the search results, the terms in the profile are compared with the terms in the web snippets, and scores are assigned to the snippets accordingly. TF, TF-IDF and personalized BM25 weighting approaches were used to compute the weight of each term.

Research paper [5] proposed mobile user personalization for information retrieval based on factors such as the time, location and interests of the user. A graph-based ontological representation is used to create a user profile from the web usage pattern of a user at a particular time and location. Past user profiles are compared with a newly submitted query to find the relevant user profile. Based on that, search results from a reputed search engine are re-ranked.

Research paper [6] is based on web user personalization with content, location and time preferences. The user-submitted query is analyzed to determine how many content and location concepts are associated with it at a particular time, and ontologies are then created for the content concepts and the location concepts. Concepts are

978-1-4799-3975-6/14/$31.00 ©2014 IEEE


2014 IEEE International Conference on Computational Intelligence and Computing Research

extracted based on the frequency of a keyword in a web snippet. From the results clicked by the user, click content entropy and click location entropy are computed to show how interested the user is in particular content or location concepts. The user profile is created based on the ontology and the entropies. This research makes use of Ranking SVM for learning user preferences.

According to [7], mobile users' needs depend on two factors: time and experience. Users' interests are not static; they change over time along with their needs. To create a dynamic profile that takes time zones and user experience into account, XML seems to be the best choice because of its extensibility and because XML schemas offer a way to standardize the profile.

Research paper [8] focuses on search engine personalization. This work is based on both the positive and negative preferences of the user. Concepts are extracted from the web snippets in the user's clickthrough data to build concept-based user profiles automatically. To achieve this, preference mining rules are applied. The user profiling strategies were evaluated and compared with the personalized query clustering method.

Research paper [9] notes that ontologies and the Semantic Web are two important research fields that are beginning to receive great attention. Semantically analyzing web content as well as the user's query is increasingly important in the personalization domain. Language modeling and question answering are two important Natural Language Processing (NLP) research areas that could lead to breakthroughs in the development of personalized search systems. New search engines based on these technologies may be able to understand users' intentions through the analysis of user-supplied natural language questions. They may be able to better understand keywords in the queries by recognizing various sentence types, analyzing syntax, and disambiguating word senses in context. As a result, search results will be more accurate, satisfactory, and reliable.

Research paper [10] also focuses on personalizing web search, by extracting concepts for the user's query and for the results the user clicked from web snippets. With this method, personalized results are displayed. Concepts are extracted only from web snippets and not from full web pages, because of the huge volume of information and the time that processing it would take.

Research paper [11] focuses on personalized travel recommendation. Freely available community-contributed photos are the primary source for this task. From them, user-specific profiles or attributes (e.g., gender, age, race) as well as travel group types (e.g., family, friends, couples) are automatically retrieved. User profile and context information are detected from mobile sensors. This work uses a probabilistic Bayesian learning framework to recommend the next travel location from the user's current location, or even to deliver context-related advertisements or services.

III. WEB USAGE MINING

The papers listed below are related to web usage mining.

A. Web page recommendation
Research work [12] recommends web pages to the users of a web site with the help of domain knowledge and user preferences. Ontologies are constructed for the domain and for user preferences.

B. Dynamic Personalization
Research paper [13] observes that users' interests may change over time; their interests are dynamic. The paper focuses on mining users' preferences to build user profiles based on temporal pattern analysis using temporal rule mining. It proposes a new algorithm called the Fuzzy-Temporal Association Rule Mining Algorithm (FTARM).

C. Personalization with Location
Research paper [14] mines user preferences and query-associated information in terms of concepts. It considers two types of concepts: content concepts and location concepts. Personalization is performed using an ontology-based approach.

D. Document Retrieval
Research paper [15] uses Ranking SVM (RSVM) for document retrieval in information retrieval systems. In IR, queries are used to retrieve documents. This research focuses on how to intensify training for top-ranked documents and how to handle training with queries that have fewer relevant documents. It uses RSVM with a hinge loss function and two optimization methods, gradient descent and quadratic programming, to obtain a maximum-margin model.

E. Document Clustering
Research paper [16] proposes a mining model consisting of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure. This work identifies the terms that best convey the semantics at the sentence, document and corpus levels, rather than relying on the traditional analysis of the document only. The similarity between documents is calculated from the extracted concepts using a new concept-based similarity measure (ctf). Documents are clustered based on that similarity measure.

Research paper [17] addresses text categorization using distributional features, which are novel values assigned to a word. The distributional features are the compactness of the appearances of the word and the position of its first appearance. In this research, a tf-idf-style equation is constructed and an ensemble learning technique is used. The approach considers the frequency of a word, i.e., how many times it occurs in a document, together with where the word first appears and how compact its appearances are. It simply combines the existing frequency feature with the distributional features to improve performance at little additional cost.

F. Association rule
Paper [18] proposes a new interactive post-processing approach, ARIPSO (Association Rule Interactive post-Processing using Schemas and Ontologies), to prune and filter discovered rules. The authors propose and use domain ontologies and a Rule Schema formalism. By integrating domain knowledge, user knowledge and rule schemas, the number of rules can be reduced, leading to a set of effective rules.

G. Path Analysis
The work in [19] performs personalization by analyzing the user's unvisited pages that contain information relevant to the user. It constructs a domain ontology and creates a user profile based on a Personalised Page View (PPV) graph from concepts extracted using tf-idf. It recommends the information the user is interested in or prefers along the shortest path.

H. Knowledge sharing and adaptation
The work in [20] focuses on how to extract information and identify new attributes in new, unseen sites by adapting previously defined information extraction knowledge, such as wrappers, using Bayesian learning and expectation-maximization (EM) techniques.

I. Page-Level Data Extraction
Paper [21] proposes an unsupervised, page-level data extraction approach that works out the structure description and templates for each individual deep web site, whose pages contain either a single data record or multiple data records. The FiVaTech extraction system applies tree matching, tree alignment and mining techniques to achieve this challenging task.

J. Knowledge based Mining
Research [22] aims to provide knowledge-based mining instead of data mining using an ontological approach. Through ontologies, the authors try to create a semantic web in which data are arranged in a structured manner, which helps in acquiring the most relevant and accurate results. However, the construction of ontologies requires manual effort.

K. Semantic Web with Ontologies
Research paper [23] gives an overall view of the semantic web and of the application of data mining to the semantic web and ontologies.

L. Analysis of Web Log
Research paper [24] discusses how to analyse web logs and how to use web analytics tools. It also provides a case study of the SDSS SkyServer and the normalized data set of the SkyServer web log.

M. Mobile Apps Classification
Hengshu Zhu et al. [25] proposed a novel method for automatically classifying mobile apps with the help of web knowledge and user contextual information.

IV. DISCUSSIONS AND FINDINGS

A. Discussions
Fig. 1 shows a simple user interaction with a search engine. A user submits the query "fruit in red or green color skin white color flush sweet taste name starts with A" in order to get images of the fruit "apple". The search engine returns approximately 50 apple images on the first page, along with images of grapes, watermelon, jackfruit, pears and tomatoes and some irrelevant results such as leaves. In this simple example, the user does not give the query "apple" directly; instead, he/she gives some keywords related to the apple. The retrieved results include relevant results as well as some irrelevant ones.

B. Findings
Information retrieval has received great attention in research communities in this digital era. In real scenarios, users often do not know the exact keywords for their information need. As a result, they get both relevant and irrelevant results, and in some cases only irrelevant results. This was illustrated with the simple example in Fig. 1. Today's search engines are more powerful. They analyze semantic relationships between words and use various glossaries to try to provide the best results to their users. Even so, there is a gap between human and computer, in the sense that analysis and understanding differ between human and machine. The machine should analyze and understand the information needs of users and provide the best service to them. From the study of twenty-five research papers and from the example in Fig. 1, we conclude that the integration of personalization, knowledge-based mining and the semantic web is bringing the traditional web scenario into Web 3.0. Current and ongoing research makes use of ontologies. Table I shows that most of the authors have preferred ontologies for personalization. With the help of ontologies, researchers are trying to make the machine think and analyze like a human being in the field of web usage mining. Concepts are very important for constructing ontologies because they clearly explain the contents of web documents. Exact concepts are used to increase the accuracy of the information retrieval process.

From this survey, we conclude that to personalize a website, the most relevant concepts should be extracted by analyzing the entire documents using NLP. When the entire documents are analyzed and domain knowledge ontologies are constructed from the extracted concepts offline, the response time of the information retrieval system is not affected. By incorporating personalization with this approach, we can provide the required and effective results to the user. In information retrieval, providing relevant and required information to the user is most important. To do this, the machine should know what the user actually means, where it occurs and how it occurs. Ontologies and Natural Language Processing (NLP) techniques are powerful and helpful for achieving this task. These are the findings from the study of twenty-five research papers.
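
As a rough illustration of the offline/online split described in the findings, the following Python sketch scores terms with plain TF-IDF as a stand-in for concept extraction, builds the concept index offline, and answers queries with a cheap lookup. TF-IDF, the small stop-word list, and the names build_concept_index and lookup are assumptions made for this example only; the surveyed papers rely on richer NLP and ontology-based methods.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_concept_index(documents, top_k=3):
    """Offline step: map each document id to its top_k TF-IDF terms."""
    tokens = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    n_docs = len(tokens)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for terms in tokens.values():
        doc_freq.update(set(terms))

    index = {}
    for doc_id, terms in tokens.items():
        counts = Counter(terms)
        scores = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        index[doc_id] = ranked[:top_k]
    return index


def lookup(index, query):
    """Online step: return ids of documents sharing a concept with the query."""
    query_terms = set(tokenize(query))
    return sorted(d for d, concepts in index.items() if query_terms & set(concepts))
```

Because the index is computed ahead of time, the per-query work in this sketch is only a set intersection per document, which is the sense in which offline concept extraction leaves response time unaffected.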

TABLE I. Various techniques for web user personalization

No. | Source of Data | Technique | Output | Ref.
1 | Server log files + referrer + agent | Multivariate k-means method | Personalized URL recommendation | 3
2 | Firefox add-on AlterEgo | C/NC, TF, TF-IDF and personalized BM25 weighting | Personalized search results from search engine | 4
3 | Analyzing documents | Conceptual term frequency | Document clustering | 5
4 | Mobile phone with sensors | Graph-based ontological representation and cosine similarity measure | Personalization for mobile users | 6
5 | Client | Ontology representation and RSVM | Personalization for mobile users | 7
6 | Mobile phone | XML Schema | Personalization for mobile users | 8
7 | Documents analyzed at sentence, document and corpus level | Natural Language Processing, verb argument structure, conceptual term frequency | Document clustering | 9
8 | Clickthrough data (positive and negative preferences) | Bipartite graph, personalized agglomerative clustering | Concept-based user profiles | 10
9 | Community-contributed photos | Bayesian learning | Recommendation of next travel location | 11
10 | User domain knowledge, domain knowledge | ARIPSO (Association Rule Interactive post-Processing using Schemas and Ontologies) | Interesting association rules | 12
11 | User's browsing behavior from client add-on in the browser | Ontology, Personalized Page View (PPV) graph | Personalization (preferred information in the shortest path) | 13
12 | Overview of various personalization approaches with examples | - | - | 14
13 | Access log files of ISP cache proxy servers | Fuzzy-Temporal Association Rule Mining | Dynamic personalization | 15
14 | Unseen sites | Bayesian learning and expectation-maximization (EM) techniques | Extracting information from unseen websites | 16
15 | Web snippets | Concepts from frequent patterns | Personalized search results from search engine | 17
16 | Analyzing documents | tfidf + distributional features | Document clustering | 18
17 | Website and individual web page | Tree templates and schema | Extracting data from web pages | 19
18 | Browsing behavior from mobile phones | Ontology-based, multi-facet (OMF) profile, RSVM | Personalized search results from search engine | 20
19 | User's search query | Ontology, semantic mining | Knowledge-based information | 21
20 | Domain knowledge and user preferences | Ontology | Recommendation of web page | 22
21 | Overview of Semantic Web and ontologies | Ontology | Recommendation of web page | 23
22 | KB Data Mining Using Semantic Web | Ontology | Personalization | 22
23 | Weblog and SQL log from SDSS SkyServer | Normalization | Web log analysis | 24
24 | Web knowledge and app usage from mobiles | VSM, LDA model, Gibbs sampling, BP-Growth algorithm, Maximum Entropy and Limited-Memory BFGS | Automatic Web App classification | 25

Fig. 1. Information retrieval with sample query. (Courtesy: Google)

V. FUTURE WORK

Ontologies are helpful for the next-generation web, i.e., the semantic web, which brings a knowledge-driven paradigm instead of a data-driven one to the web. Table I shows that research on personalization and information retrieval is effective with the help of ontologies.

According to paper [23], ontology construction includes terms, concepts, taxonomic relationships, non-taxonomic relationships and axioms. Following the survey of twenty-five research papers, our research work focuses narrowly on web page recommendation. In particular, incorporating the concept extraction methodology of [16] for domain ontology construction, together with user preferences, should lead to effective web page recommendation.

VI. CONCLUSION

From the survey of twenty-five research papers, we conclude that incorporating ontologies and natural language processing techniques into web usage mining research provides efficient results. In order to give better personalization results to the user, machines should know what the web documents and pages are about and what the users are requesting. These two facts should be understood by the machine. For this, the machine needs to think, understand and decide like a human being. Ontologies and natural language processing techniques support this goal.

ACKNOWLEDGMENT

I express my heartfelt thanks to Dr. H. Balakrishnan, Principal, S.N.R Sons College, for providing excellent infrastructure and support for my research. I thank my research guide, Dr. J. Komala Lakshmi, for her support, effort, invaluable constructive criticism and friendly advice. I also thank Dr. Anna Saro Vijendran, Head of the Department of Computer Application, for her support.

REFERENCES

[1] Dhanalakshmi. D and Dr. J. Komala Lakshmi, "A survey on data mining research trends," International Journal of Engineering and Computer Science, ISSN: 2319-7242, vol. 3, issue 10, October 2014, pp. 8911-8919.
[2] Raymond Kosala and Hendrik Blockeel, "Web Mining Survey," SIGKDD Explorations, Copyright 2000 ACM SIGKDD, July 2000, Vol. 2, Issue 1-Page.
[3] Bamshad Mobasher, "Web Personalizer: A Server Side Recommender System Based on Web Usage Mining," downloaded from internet.
[4] Nicolaas Matthijs and Filip Radlinski, "Personalizing Web Search using Long Term Browsing History," WSDM'11, February 9-12, 2011, Hong Kong, China, Copyright 2011 ACM 978-1-4503-0493-1/11/02.
[5] Ourdia Bouidghaghen, Lynda Tamine and Mohand Boughanem, "Context-Aware User's Interests for Personalizing Mobile Search," 2011 12th IEEE International Conference on Mobile Data Management, 978-0-7695-4436-6/11, DOI 10.1109/MDM.2011.51.
[6] D. Dhanalakshmi, R. Kousalya and V. Saravanan, "Time based Web User Personalization and Search," International Journal of Computer Applications (0975-8887), vol. 46, no. 23, May 2012.
[7] Christoforos Panayiotou, Maria Andreou, George Samaras and Andreas Pitsillides, "Time Based Personalization for the Moving User," Proceedings of the International Conference on Mobile Business (ICMB'05), 0-7695-2367-6/05, IEEE.
[8] Kenneth Wai-Ting Leung and Dik Lun Lee, "Deriving Concept-Based User Profiles from Search Engine Logs," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 7, July 2010.
[9] Alessandro Micarelli, Fabio Gasparetti, Filippo Sciarrone and Susan Gauch, "Personalized Search on the World Wide Web," in P. Brusilovsky, A. Kobsa, and W. Nejdl (Eds.): The Adaptive Web, LNCS 4321, pp. 195–230, 2007, © Springer-Verlag Berlin Heidelberg 2007.
[10] Jie Yu and Fangfang Liu, "Mining user context based on interactive computing for personalized web search," 2nd International Conference on Computer Engineering and Technology, 2010, vol. 2, pp. 209-214.
[11] Yan-Ying Chen, An-Jung Cheng and Winston H. Hsu, "Travel recommendation by mining people attributes and travel group types from community-contributed photos," IEEE Transactions on Multimedia, vol. 15, no. 6, October 2013.
[12] Thi Thanh Sang Nguyen, Hai Yan Lu and Jie Lu, "Web-Page Recommendation Based on Web Usage and Domain Knowledge," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 10, October 2014.
[13] Veeramalai Sankaradass and Kannan Arputharaj, "An Intelligent Recommendation System for Web User Personalization with Fuzzy Temporal Association Rules," European Journal of Scientific Research, vol. 51, no. 1, 2011, pp. 88-96.
[14] Kenneth Wai-Ting Leung, Dik Lun Lee and Wang-Chien Lee, "Personalized Web Search with Location Preferences," ICDE Conference 2010.
[15] Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang and Hsiao-Wuen Hon, "Adapting Ranking SVM to Document Retrieval," SIGIR'06, August 6–11, 2006, Seattle, Washington, U.S.A.
[16] Shady Shehata, Fakhri Karray and Mohamed S. Kamel, "An Efficient Concept-Based Mining Model for Enhancing Text Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, October 2010.
[17] Xiao-Bing Xue and Zhi-Hua Zhou, "Distributional Features for Text Categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 3, March 2009.
[18] Claudia Marinica and Fabrice Guillet, "Knowledge-Based Interactive Post mining of Association Rules Using Ontologies," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, June 2010.
[19] S. Sendhilkumar and T. V. Geetha, "Concept based personalized Web Search," Advances in Semantic Computing (Eds. Joshi, Boley & Akerkar), TMRF e-Book, Chapter 5, vol. 2, pp. 79-102, 2010.
[20] Tak-Lam Wong and Wai Lam, "Learning to Adapt Web Information Extraction Knowledge and Discovering New Attributes via a Bayesian Approach," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 4, 2010.
[21] Mohammed Kayed and Chia-Hui Chang, "FiVaTech: Page-Level Web Data Extraction from Template Pages," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 2, February 2010.
[22] Sumaiya Kabir, Shamim Ripon, Mamunur Rahman and Tanjim Rahman, "Knowledge-Based Data Mining Using Semantic Web," 2013 International Conference on Applied Computing, Computer Science, and Computer Engineering, IERI Procedia 7, 2014, pp. 113-119.
[23] Konstantin Todorov, "Data Mining, Ontologies and the Semantic Web," April 2013, online tutorial.
[24] M. Jordan Raddick, Ani R. Thakar, Alexander S. Szalay and Rafael D.C. Santos, "Ten Years of SkyServer I: Tracking Web and SQL e-Science Usage," July/August 2014, www.computer.org/cise, co-published by the IEEE CS and the AIP.
[25] Hengshu Zhu, Enhong Chen, Hui Xiong, Huanhuan Cao and Jilei Tian, "Mobile app classification with enriched contextual information," IEEE Transactions on Mobile Computing, vol. 13, no. 7, July 2014.
