
NEW APPROACHES IN WEB PREFETCHING TO

IMPROVE CONTENT ACCESS BY END-USERS

A THESIS

Submitted by

VENKETESH P

in partial fulfilment of the requirements for the award of the degree


of
DOCTOR OF PHILOSOPHY

FACULTY OF SCIENCE AND HUMANITIES


ANNA UNIVERSITY
CHENNAI 600 025

SEPTEMBER 2013

ABSTRACT

The rapid growth of the Internet, with an enormous number of users and web services, constantly demands good infrastructure to deliver web content to users with minimal delay. The tremendous increase in global traffic caused by demands from a large number of users strains servers and the network, resulting in poor quality of service (availability, reliability) and high latency perceived by users. Web caching and prefetching provide effective mechanisms to mitigate user-perceived latency. This thesis studies existing prefetching mechanisms and proposes new approaches for effective prefetching in the web environment. The goal of web prefetching is to download (prefetch) content and store it in a local cache before the user actually requests it, thereby minimizing the latency perceived by users when accessing the content.

The thesis primarily focuses on two key aspects:

· Methods to generate predictions that improve the prefetching activity

· A mechanism to effectively manage the contents of the cache (regular and prefetch) by designing a cache replacement scheme

Web predictions can be generated at the server, proxy or client using a variety of information, depending on where they are implemented. Server-based predictions consider the access history of several users stored in a log file, while client-based predictions consider the contents of the web pages accessed by a user.

The major contributions of this research work are as follows:

The first part of the thesis focuses on improving client-based predictions by designing two approaches, Naïve Bayes and Fuzzy Logic, that use the hyperlinks accessed by users to generate predictions. The hypertext information associated with each hyperlink is used to compute its priority, and the hyperlinks are then sorted (highest to lowest) to create a prediction (hint) list. Both the prediction and prefetching engines are implemented on the client machine, focusing on the browsing behavior of a single user or multiple users.

The second part of the thesis discusses a prediction algorithm designed to build a Precedence Graph by analyzing user access patterns from log files. It considers the object URI and the referrer recorded for each request to derive a precedence relation, which is used to add arcs between the nodes of the graph. Predictions are then generated by analyzing the arcs in the graph.

Finally, the third part of the thesis discusses a cache replacement scheme that plays a significant role in effectively maintaining the contents of the cache to achieve a good hit rate. The client-side cache is partitioned into a regular cache and a prefetch cache to enhance the services of web caching and prefetching. The LRU algorithm is used to manage the contents of the prefetch cache, and a Fuzzy Inference System (FIS) based algorithm is used to manage the contents of the regular cache. When web objects available in the prefetch cache are frequently accessed by the user, they are moved to the regular cache for an extended presence to serve user requests.

The performance of the proposed approaches has been evaluated using the recall and precision metrics. Experimental results indicate the effectiveness of the proposed approaches over existing schemes.



TABLE OF CONTENTS

CHAPTER No. TITLE PAGE No.

ABSTRACT v

LIST OF TABLES xiii

LIST OF FIGURES xiv

LIST OF ABBREVIATIONS xvii

1. INTRODUCTION 1

1.1 WEB PREFETCHING 3


1.1.1 Server Based Prefetching 6
1.1.2 Client Based Prefetching 7
1.1.3 Proxy Based Prefetching 7
1.2 OBJECTIVES OF THE THESIS 9
1.3 ORGANIZATION OF THE THESIS 10

2. LITERATURE SURVEY 12

2.1 INTRODUCTION 12
2.2 CONTENT BASED PREDICTION 15
2.3 ACCESS PATTERN BASED PREDICTION 18
2.3.1 Graph Models 20
2.3.1.1 Dependency Graph 21
2.3.1.2 Double Dependency Graph 22
2.3.2 Markov Models 24
2.3.3 PPM Models 26
2.3.4 Web Mining Models 26

2.4 COMMERCIAL PRODUCTS 29


2.5 PERFORMANCE EVALUATION 30
2.6 PERFORMANCE METRICS 31
2.7 CACHE REPLACEMENT 34
2.7.1 Machine Learning Techniques 36
2.8 SUMMARY 37

3. HYPERLINK BASED WEB PREDICTION 39

3.1 INTRODUCTION 39
3.2 NAÏVE BAYES APPROACH 42
3.2.1 Prediction/Prefetch Procedure 43
3.2.2 Implementation 47
3.2.2.1 Prediction Engine 47
3.2.2.1.1 Tokenizer 47
3.2.2.1.2 User-Accessed repository 49
3.2.2.1.3 Computing priority value of Hyperlinks 51
3.2.2.1.4 Prediction List 54
3.2.2.2 Prefetching Engine 55
3.3 FUZZY LOGIC APPROACH 56
3.3.1 Prediction/Prefetch Procedure 56
3.3.2 Implementation 60
3.3.2.1 Prediction Engine 60
3.3.2.1.1 Predicted-Unused Repository 61

3.3.2.1.2 Computing priority value of Hyperlinks 62
3.3.2.1.3 Prediction List 64
3.3.2.2 Prefetching Engine 64
3.4 EVALUATION 65
3.4.1 Experimental Results 67
3.4.1.1 Naïve Bayes Approach 68
3.4.1.2 Fuzzy Logic Approach 71
3.5 CONCLUSION 76

4. PRECEDENCE GRAPH BASED WEB PREDICTION 77


4.1 INTRODUCTION 77
4.2 PRECEDENCE GRAPH 80
4.2.1 Introduction 80
4.2.2 Building the Graph 82
4.2.3 Updating the Graph 84
4.2.4 Predictions from the Graph 86
4.2.5 Prefetching the web objects 88
4.2.6 Implementation Example 91
4.3 GRAPH TRIMMING 94
4.3.1 Invoking Trimming operation 95
4.3.2 Trimming Algorithm 98
4.4 EXPERIMENTAL ENVIRONMENT 104
4.4.1 Experimental Setup 104
4.4.2 Training Data 105
4.4.2.1 Preprocessing Log Files 107
4.5 RESULTS 107

4.6 CONCLUSION 113

5. CACHE REPLACEMENT SCHEME TO ENHANCE WEB PREFETCHING 115

5.1 INTRODUCTION 115
5.2 CACHE REPLACEMENT- OVERVIEW 117
5.3 FUZZY INFERENCE SYSTEM 120
5.3.1 Membership Function 121
5.3.2 Fuzzy Rules 126
5.3.3 DeFuzzification 127
5.4 PROPOSED FRAMEWORK 129
5.4.1 Fuzzy System – Input /Output 133
5.4.2 Managing Regular Cache 135
5.5 IMPLEMENTATION 137
5.5.1 Training Data 137
5.5.2 Data Preprocessing 138
5.6 PERFORMANCE EVALUATION 142
5.6.1 Performance Metrics 142
5.6.2 Experimental Results 143
5.7 CONCLUSION 147
6. CONCLUSION 149

6.1 SUGGESTIONS FOR FUTURE WORK 151

REFERENCES 153

LIST OF PUBLICATIONS 163

CURRICULUM VITAE 164



LIST OF TABLES

TABLE No. TITLE PAGE No.

3.1 User-Accessed Repository 50

4.1 Sample user requests in a session 90

4.2 Hints generated for user requests 93

4.3 Notations used in Trimming algorithm 95

4.4 Important fields in a log file entry 106

5.1 Input Parameters to FIS 133

5.2 Symbols used with their meanings 134

5.3 Preprocessed data from the log file 140

5.4 Training data created from preprocessed file 141



LIST OF FIGURES

FIGURE No. TITLE PAGE No.

1.1 Prefetching Web pages between user requests 4

1.2 Server based Prefetching 6

1.3 Client based Prefetching 7

1.4 Proxy based Prefetching 8

2.1 Page access without Prefetching 13

2.2 Page access with Prefetching 13

2.3 Dependency Graph (window size = 2) 22

2.4 Double Dependency Graph (window size = 2) 23

3.1 Prediction and Prefetching – Naïve Bayes Approach 45

3.2 Some commonly used Stop Words 49

3.3 Prediction List for Prefetching 54

3.4 Fuzzy based Prediction and Prefetching 58

3.5 Prediction List based on Fuzzy Computations 64

3.6 Pages Predicted and Prefetched in a User Session 69

3.7 Recall in Naïve Bayes Approach 70

3.8 Precision in Naïve Bayes Approach 71

3.9 Recall in Fuzzy Logic Approach 72



3.10 Precision in Fuzzy Logic Approach 73

3.11 Comparison of Recall in various approaches 74

3.12 Comparison of Precision in various approaches 75

4.1 Browser & Server interaction with Prediction and Prefetching 79

4.2 Precedence Graph 84

4.3 Sample HTTP header with referer information 88

4.4 Sample HTTP response with link to be prefetched 89

4.5 Precedence Graph built using the user requests 91

4.6 Adjacency Map for the Precedence Graph 92

4.7 Precedence Graph before Trimming 101

4.8 Precedence Graph after Trimming 104

4.9 Recall for user requests with different Thresholds 109

4.10 Precision for user requests with different Thresholds 110

4.11 Number of Arcs in different Graphs 111

4.12 Number of Nodes in PG with/without Trimming 112

4.13 Number of Arcs in PG with/without Trimming 113

5.1 Framework of Fuzzy Inference System 121



5.2 Membership Functions for Recency 123

5.3 Membership Functions for Frequency 124

5.4 Membership Functions for Delay Time 125

5.5 Membership Functions for Object Size 125

5.6 Methods to perform Defuzzification 128

5.7 Framework for managing regular/prefetch cache 129

5.8 Workflow of caching/ prefetching system 131

5.9 Sample Log File of a client used for preprocessing 138

5.10 Hit Ratio using traces of Group-A (user 1 to 5) 144

5.11 Hit Ratio using traces of Group-B (user 6 to 10) 145

5.12 Hit Ratio using traces of Group-C (user 11 to 15) 145

5.13 Byte Hit Ratio using traces of Group-A (user 1 to 5) 146

5.14 Byte Hit Ratio using traces of Group-B (user 6 to 10) 146

5.15 Byte Hit Ratio using traces of Group-C (user 11 to 15) 147

LIST OF ABBREVIATIONS

CDN - Content Distribution Networks

DDG - Double Dependency Graph

DG - Dependency Graph

HTML - Hyper Text Markup Language

HTTP - Hyper Text Transfer Protocol

PG - Precedence Graph

PPM - Prediction by Partial Match

RG - Referrer Graph

RTT - Round Trip Time

URI - Uniform Resource Identifier

URL - Uniform Resource Locator



CHAPTER 1

INTRODUCTION

The enormous growth of the World Wide Web and the development of new applications and services have dramatically increased the number of users, resulting in increased global traffic that degrades the Quality of Service (availability, reliability, security) and the latency perceived by users. Tackling web access latency remains a challenge for researchers even with the availability of high-bandwidth connections, fast processors and large amounts of storage space. Downloading a web object from a server involves two main components: a) the time taken by the request to reach the server plus the time taken by the response to reach the client (i.e. RTT, Round Trip Time) and b) the object transfer time (which depends on the bandwidth between client and server). Web caching improves web performance by reducing user-perceived latency through the use of local storage (cache) and effective management of the web objects stored in the cache. The limitations of web caching are addressed by the web prefetching technique, which fetches web documents from the server even before the user actually requests them. Web prefetching has therefore been proposed as a complementary mechanism to web caching.
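As a rough illustration of the two components above (not from the thesis; the parameter values are hypothetical), the following Python sketch estimates the download latency of a single object as the RTT plus the transfer time:

```python
# Minimal sketch: download latency ~= RTT + object transfer time.
def estimated_download_latency(rtt_s: float, object_size_bytes: int,
                               bandwidth_bps: float) -> float:
    """rtt_s: round trip time in seconds (request travel + response travel);
    object_size_bytes: size of the web object; bandwidth_bps: available
    bandwidth in bits per second."""
    transfer_time_s = (object_size_bytes * 8) / bandwidth_bps
    return rtt_s + transfer_time_s

# Example: a 200 KB page over a 2 Mbps link with an 80 ms RTT takes ~0.9 s.
print(estimated_download_latency(0.080, 200 * 1024, 2_000_000))
```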

A web request is a reference made through the HTTP protocol to a web object that is identified by its Uniform Resource Identifier (URI). A web object is a term used for all possible objects (HTML pages, images, videos, etc.) that can be transferred using the HTTP protocol and stored in a cache.

The locality of reference exhibited in user access patterns on the web has three important properties (Bestavros 1997): temporal, geographical and spatial. Web caching benefits from temporal locality, where recently referenced objects are likely to be accessed again in the near future. Replication techniques (such as CDNs, Content Distribution Networks) benefit from geographical locality, where objects referenced by a user are likely to be accessed by other users in the same geographical area. Spatial locality indicates that objects close to the currently accessed object have a high probability of being accessed in the near future. Web prefetching exploits the spatial locality property to predict future requests.

Web applications today focus increasingly on providing a personalized browsing experience to users. The web latency perceived by a user when accessing web pages is the time interval between issuing a request for a web page and the actual display of the page in the browser window. Web prefetching is a widely used technique to reduce user-perceived latency by exploiting the spatial locality inherent in the user's accesses to web objects. The prefetching mechanism implements a prediction algorithm that processes a variety of user information to generate predictions (a hint list), which is used to prefetch (download) web objects and store them in the cache before they are actually requested by the user. The benefits of prefetching were constrained in the past by the limited bandwidth available to users, since prefetching can increase network traffic when its predictions are not accurate enough. With the vast improvement in bandwidth availability, prefetching now has new opportunities to improve web performance at a reasonable cost.

1.1 WEB PREFETCHING

Web prefetching provides an effective solution to mitigate user-perceived latency. Prefetching can be termed a proactive caching scheme, since it caches web pages prior to receiving requests for those pages from users. The implementation of web prefetching in the basic web architecture requires two main components: a prediction engine and a prefetching engine. The prediction engine implements an algorithm to predict the user's next request and provides these predictions as hints to the prefetching engine. The prefetching engine, on receiving the hints, decides to prefetch (download) the web objects when it has available bandwidth and idle time. Prefetching web objects in advance reduces the latency perceived by users when these objects are actually requested. Prediction algorithms are used in various domains such as recommendation systems, e-commerce, content personalization, web prefetching and cache prevalidation. They must generate accurate predictions to avoid degrading system performance, since prefetching mispredicted web objects wastes client, server and network resources.



The prediction and prefetching engines can be located in any part of the web architecture: client, proxy or server. The prefetching engine acts independently of the prediction engine and can be placed in any element of the web architecture that receives the hint list. The common trend is to place the prefetching engine at the client to effectively reduce user-perceived latency. Commercial products (Mozilla Firefox, Google Web Accelerator) perform client-side prefetching to maximize performance. Prefetched objects are stored in the cache until they are demanded by the user or evicted from the cache. The system should predict and prefetch only web objects that can be stored in the cache. Prefetching algorithms should be designed carefully to avoid adverse effects on the network architecture and its performance. Aggressive prefetching can lower the actual cache hit rate by prefetching useless web objects and storing them in the cache, thereby evicting useful objects.

Figure 1.1 Prefetching web pages between user requests



Figure 1.1 represents the prefetching of web pages during the idle time between user requests. The value of ‘N’ depends on the page view time (VT): if the user spends more time on a page, more pages can be prefetched; otherwise only a minimal number of pages are prefetched. A web resource is prefetched if it is cacheable and its retrieval is safe for the system. Web objects that can be prefetched include documents such as jpg, gif, png, html, htm, asp, php and pdf files. A minimal sketch of such an idle-time prefetching loop is given below.
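The following Python sketch is illustrative only (not the thesis implementation); the hint list, cache interface, fetch function and average fetch time are assumptions used to show how ‘N’ can be bounded by the view time and how cacheability is checked by extension:

```python
from urllib.parse import urlparse

# Document types considered prefetchable in the text above.
PREFETCHABLE_EXTENSIONS = {".jpg", ".gif", ".png", ".html", ".htm",
                           ".asp", ".php", ".pdf"}

def is_prefetchable(url: str) -> bool:
    """A URL is treated as prefetchable if its path ends with a cacheable
    document extension."""
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in PREFETCHABLE_EXTENSIONS)

def prefetch_during_idle_time(hint_list, cache, fetch, view_time_s,
                              avg_fetch_time_s=0.5):
    """Prefetch at most N hinted objects, where N is bounded by the page
    view time (browser idle time)."""
    n = max(0, int(view_time_s / avg_fetch_time_s))   # the value of 'N'
    for url in hint_list[:n]:
        if url not in cache and is_prefetchable(url):
            cache[url] = fetch(url)    # store the prefetched object in cache
    return cache

# Usage (hypothetical): prefetch_during_idle_time(hints, {}, my_fetch, 5.0)
```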

A cached or prefetched page can be reused if it is valid at the time the user accesses it. To achieve good performance, it is important to minimize the impact of unwanted prefetching requests so that on-demand web page requests from users can still be served. Web prefetching is used to enhance several web services, such as: a) access to static and dynamically generated web objects, b) web search engines and c) Content Distribution Networks (CDN). Prefetching needs to address two important issues: a) which page to prefetch (the prediction problem) and b) when to prefetch a page (the timing problem).

The prediction approach requires in-depth knowledge about users' information needs as well as the contents of relevant web pages in order to achieve high predictive performance. Prediction algorithms are categorized based on the type of information used to generate the predictions. Depending on the location of the prediction engine in the web architecture, the amount and scope of information available to the algorithm varies. Prefetching mechanisms are accordingly classified into three types, based on the location of the prediction engine: server-based, client-based and proxy-based prefetching.

1.1.1 Server Based Prefetching

In this mechanism, the prediction engine is placed at the web server and the prefetching engine at the client, as shown in Figure 1.2. The prediction engine uses the access patterns of all users of a specific web server to generate the predictions. The prefetching engine receives the hints (predictions) from the web server and uses them to prefetch web objects during idle time. This approach has been widely explored in the literature because of its prediction accuracy and its potential use in real scenarios. The web server is able to observe every client access and provide accurate access information when proxies are not involved in the web access.

Figure 1.2 Server based Prefetching



1.1.2 Client Based Prefetching

This mechanism deploys both the prediction and prefetching engines at the client, as shown in Figure 1.3, where the prediction engine analyzes the navigational patterns of an individual user to generate the predictions. The prefetching engine, also located at the client, receives the hints (predictions) and uses them to prefetch web objects during idle time. The mechanism covers the usage behavior of a single user or a few users across different web servers. Since the client-based prediction model is built using individual client access information, it provides the opportunity to make predictions that are highly personalized and thus reflect the behavior patterns of the individual user.

Figure 1.3 Client based Prefetching

1.1.3 Proxy Based Prefetching

The proxy server that sits between the web server and the client holds both the prediction and prefetching engines to perform the following tasks: a) prefetch web objects on its own and store them in its cache, and b) provide hints to the client about the web objects it can prefetch in its idle time. The mechanism covers different users accessing different servers, taking advantage of multi-user, multi-server information to generate the predictions.

Proxy-based prefetching, as shown in Figure 1.4, provides advantages such as: a) it allows users of non-prefetching browsers to benefit from server-provided prefetch hints, and b) it can perform complex and precise predictions owing to the availability of information from several sources (clients and servers).

Figure 1.4 Proxy based Prefetching

A coordinated proxy-server prefetching mechanism effectively utilizes the access information available at both the proxy and the server to generate predictions and coordinate the prefetching activity. The access information available at proxies serves data prefetching for groups of clients sharing common surfing interests, while access information at the web server is used only when predictions cannot be generated using the information available at the proxies. Client- and proxy-side prefetching provide greater geographic and IP proximity to the client by separating caching from the HTTP server and placing it closer to the clients.

1.2 OBJECTIVES OF THE THESIS

The thesis aims to improve web prediction and prefetching to effectively mitigate user access latency. The contributions made in this thesis are:

· Designed methods that use the hypertext associated with hyperlinks to generate predictions. In the first method, predictions are generated using computations based on a Naïve Bayes classifier; in the second, using computations based on Fuzzy Logic. The performance of both approaches is compared with respect to the quality of the generated predictions.

· Designed a method to build a Precedence Graph (PG) by analyzing the user access requests stored in log files at the web server. Predictions are generated based on the information maintained in the graph.

· Designed a cache replacement scheme to manage the client-side cache, which is partitioned into two parts: a regular cache (for web caching) and a prefetch cache (for web prefetching).



1.3 ORGANIZATION OF THE THESIS

This thesis is organized as follows:

Chapter 2 discusses the related work carried out in web prefetching. It analyzes several web prediction algorithms from the literature that generate predictions based on web content and user access patterns, with a detailed discussion of the different techniques in each category. Cache replacement policies that improve caching and prefetching mechanisms are also discussed.

Chapter 3 discusses the use of hypertext to suggest a list of hyperlinks that can be prefetched during browser idle time. It presents two techniques, Naïve Bayes and Fuzzy Logic, for generating the predictions. In these schemes, both the prediction and prefetching engines are located at the client machine. The user's browsing behavior is monitored through the web browser and the predictions are generated based on it.

Chapter 4 introduces a new prediction algorithm that uses a Precedence Graph to generate the predictions. In this scheme, the prediction engine is located at the web server and the prefetching engine at the client machine. It uses the access patterns of users stored in log files at the server to build a Precedence Graph that is updated dynamically based on user requests. Graph trimming is performed periodically to manage the growth in graph size.



Chapter 5 discusses a cache replacement scheme designed to enhance the performance of web prefetching. The client-side cache is partitioned into two parts: a regular cache (to support web caching) and a prefetch cache (to support web prefetching). The contents of the regular cache are managed using a Fuzzy Inference System (FIS) based algorithm and the contents of the prefetch cache are managed using the LRU algorithm. Frequently accessed contents in the prefetch cache are moved to the regular cache to satisfy user requests more effectively.

Finally, Chapter 6 presents the conclusion and directions for future

research work.

CHAPTER 2

LITERATURE SURVEY

2.1 INTRODUCTION

A significant amount of research work has been carried out in the past to enhance the performance of web prefetching. Several techniques were designed for use at the client side, the server side, or a hybrid client/server setting to enhance the delivery of web pages to users. The browsing behavior of users has been analyzed to identify interests in specific domains to support services such as web personalization and prefetching. This chapter discusses various prediction algorithms found in the literature that support web prefetching in providing efficient service to clients. Prediction algorithms can be implemented in different parts of the web architecture: client, proxy and server. The algorithms are categorized based on the type of information used to generate the predictions. We discuss the algorithms by grouping them into two types: a) algorithms that make predictions by analyzing the content of recently visited web pages and b) algorithms that predict future accesses based on past user access patterns. The prediction algorithms discussed in the literature use different data structures and computational resources, with variations in prediction accuracy. To maintain acceptable prediction accuracy, most prefetching algorithms limit the number of pages to be prefetched to satisfy user requests.

Figure 2.1 Page access without Prefetching

Figure 2.2 Page access with Prefetching

Figures 2.1 and 2.2 represent page access without and with the prefetching mechanism. As shown in Figure 2.1, when a client requests page Pi it is retrieved from the server, which consumes time and results in a noticeable delay when the user accesses the page. In the case of page access with prefetching, Pi is prefetched in advance in anticipation that it will be accessed in the future. When the user requests Pi and it is in the cache, it is served with zero or negligible latency. This reflects the advantages of and need for applying prefetching in the web architecture. The main benefits of employing prefetching are that it prevents bandwidth underutilization and reduces user-perceived latency.

Web cache replacement is required to effectively utilize the storage space in the cache, which helps to satisfy a large number of user requests with minimal access latency using the web objects stored in the cache. We discuss several cache replacement schemes proposed in the literature for improving cache usage to satisfy user requests. The performance of web prefetching can be fine-tuned by using an efficient replacement algorithm to manage the prefetch cache in addition to the regular cache used for web caching. Prefetching is beneficial to the system only if the prefetched pages are actually requested by users before they become invalid or are purged from the cache; otherwise, resources are wasted on fetching unwanted pages from the server, which degrades overall system performance.

The prefetching mechanism can encounter two forms of interference, self-interference and cross-interference, which need to be handled effectively to obtain the benefits of prefetching. Self-interference occurs when prefetching hurts its own performance by interfering with demand requests; cross-interference occurs when prefetching hurts the performance of other applications on the client.



Cost function based prefetching approaches depend on factors such as the popularity and lifetime of web objects in deciding which objects to prefetch. Jiang et al (2002) suggested a Prefetch by Lifetime approach in which the ‘n’ objects with the longest lifetime are selected to minimize the bandwidth requirement. Object lifetime reflects the average time interval between consecutive updates to the object. A prefetching algorithm should favor objects with longer lifetimes, since they are the best candidates to minimize the extra bandwidth consumption required to keep objects residing in the cache up to date.

The long-term prefetching mechanism allows clients to subscribe to web objects in order to increase the cache hit rate and reduce user latency. The selection of objects to be prefetched is based on long-term characteristics such as access frequency, update interval and size. Web servers proactively ‘push’ fresh copies of subscribed objects into web caches whenever the objects are updated. Venkataramani et al (2002) suggested a good fetch approach that attempted to balance object popularity and object update rate to achieve a good hit rate improvement with minimal bandwidth.

2.2 CONTENT BASED PREDICTION

Content-based algorithms analyze information such as hyperlinks, the text surrounding the hyperlinks, and labels with metadata to generate the predictions. Anchor text is one of the major resources for obtaining the semantics of the target page. An automatic resource compilation system

(Chakrabarti et al 1998) performed analysis of text and links for determining the

web resources that are suitable for a particular topic. Davison (2000) conducted

an analysis that focused on examining the descriptive quality of web pages and

the presence of textual overlap in web pages. The text in and around the

hypertext anchors of selected web pages were used (Davison 2002) to determine

the user’s interest in accessing the web pages. Craswell et al (2001) indicated that

the anchor texts were highly useful in site finding based on the analysis of link

and content based ranking methods in finding the web sites. A framework that

used link analysis algorithm was designed (Chen et al 2002) to exploit the

explicit (hyperlinks embedded in web page) and implicit (imagined by end-users)

link structures.

The keyword-based semantic prefetching approach (Xu and Ibrahim

2004) used neural networks to predict the future requests based on semantic

preferences of past retrieved web documents. Topical locality assumes that pages connected by links are more likely to be about the same topic in which the user is interested. Pons (2006) proposed a methodology to prefetch web objects of slower

loading web pages by semantically bundling it with the faster loading web pages.

Semantic link prefetcher (Pons 2006a) was used to predict and prefetch the web

objects during the limited view time interval of web pages. A transparent and

speculative algorithm designed (Georgakis and Li 2006) for content based web

page prefetching indicate that the textual information in both the visited pages

and followed links were influential in determining the preferences of a user.



Eirinaki and Vazirgiannis (2005) presented a personalization algorithm

that combined usage data and link analysis techniques for ranking and

recommending the web pages to end user. Web pages of different categories

were analyzed (Chauhan and Sharma 2007) to suggest usage of cohesive and

non-cohesive text present near the anchor text for extracting information about

the target web page. Georgakis (2007) presented a client side algorithm that

learnt and predicted user requests based on user behavior profile that was built

using the user’s web surfing behavior. It used part-of-speech tagger to filter

useful user keywords. Tagging was used to identify the lexical or linguistic

category for individual words. Dutta et al (2009) proposed web page prediction

approach through linear regression that depended on the transition probability

and ranking of links in the current web page for prediction accuracy.

In Chapter 3, we discuss the generation of web predictions based on the content of the hypertext associated with hyperlinks. Naïve Bayes and Fuzzy Logic approaches are used to generate the predictions, and they proved effective in reducing access latency with low system complexity. Bayesian network classifiers are popular machine learning algorithms that have received considerable attention from scientists and engineers across fields such as medicine, military applications, forecasting, control, statistics and cognitive science. Naive Bayes, a simple Bayesian network, has been applied successfully in many domains and has gained popularity in solving various classification problems (Fan et al 2009).



In Naive Bayes, all attributes are assumed to be conditionally independent, and any correlation among them is ignored. It has been used extensively in applications such as email spam filtering, mining log files for system management, the semantic web, document ranking by text classification, and hierarchical text categorization. Naive Bayes classifiers are very fast and have very low storage requirements. They perform well in domains with many equally important features and act as a dependable baseline for text classification. Naive Bayes depends on probability estimates, called posterior probabilities, to assign a class to an observed pattern.
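As an illustration only (not the prediction engine of Chapter 3), the following minimal Python sketch shows how posterior scores with Laplace smoothing can assign a class to an observed token pattern, such as the anchor text of a hyperlink; the training data and class labels are hypothetical:

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: anchor-text tokens labelled by whether the
# user followed the link ("follow") or ignored it ("ignore").
training = [
    (["cricket", "score", "india"], "follow"),
    (["python", "tutorial"], "follow"),
    (["privacy", "policy"], "ignore"),
    (["terms", "of", "service"], "ignore"),
]

class_counts = Counter(label for _, label in training)
word_counts = defaultdict(Counter)            # per-class word frequencies
for tokens, label in training:
    word_counts[label].update(tokens)
vocab = {w for tokens, _ in training for w in tokens}

def posterior_scores(tokens):
    """log P(class) + sum of log P(token | class), with Laplace smoothing;
    the class with the highest score is assigned to the pattern."""
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        log_p = math.log(class_counts[label] / total)               # prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            log_p += math.log((word_counts[label][w] + 1) / denom)  # likelihood
        scores[label] = log_p
    return scores

print(posterior_scores(["python", "score"]))   # higher score => predicted class
```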

2.3 ACCESS PATTERN BASED PREDICTION

The path profiles of users stored in web logs provide useful

information for predicting the user’s future requests. Several techniques were

explored in the literature that predicted future requests based on the past

sequence of user requests. The information available in web access logs varies

depending on the format of logs and log data selections made by administrators.

Insufficient log data leads to inaccuracies in predictions. Page popularity ranking

is a log data analysis procedure that is used to determine the pages that are most

likely to be requested next by users.

The top-10 prefetching approach (Markatos and Chronaki 1998)

allowed servers to push popular documents to proxies at regular intervals based

on the client’s aggregated access profiles. Schechter et al (1998) built a sequence

prefix tree (path profile) based on the requests in server logs that used longest

matched most-frequent sequences to predict user’s next requests. A quantitative

model (Cooley et al 2000) based on support logic used information such as usage,

content and structure to automatically identify the interesting knowledge from

web access patterns. The n-gram model by Su et al (2000) compressed the prediction model so that it fits into main memory, with improved prediction accuracy and a moderate decrease in applicability. Web pages were clustered into

different categories based on their access patterns (Mukhopadhyay et al 2006).

Pages were categorized into levels based on their page rank and those pages at

the top levels had higher probability of being predicted and prefetched.

Fisher and Saksena (2004) designed a server-driven link prefetching

mechanism and implemented it in Mozilla web browser. It depended on the

origin server or intermediate proxy server to provide the list of web documents to

be prefetched. A coordinated proxy-server prefetching (Chen and Zhang 2005)

utilized the access information adaptively by managing prefetching at both proxy

and web servers. The access information stored in proxies served data

prefetching for clients sharing common surfing interests. Web server access

information was utilized for data objects that were not qualified for proxy based

prefetching.

The history based prefetching algorithm (Liu and Oba 2008) achieved

high prediction accuracy with limited memory by storing only the useful request

sequences and discarding those that will not yield useful predictions.

Dimopoulos et al (2010) modeled users’ navigation history and web page content

with weighted suffix trees to support web page usage prediction. Based on the

access time of user requests, the access sequences were partitioned into different

data blocks (Ban and Bao 2011). They used a decision method to select the

training data based on the prediction precision.

2.3.1 Graph Models

A server-side prefetching approach proposed by Padmanabhan and Mogul (1996) built a Dependency Graph (DG) to represent the access patterns of users. Predictions were generated from the graph, which was updated dynamically by the server based on client access patterns. The Dependency Graph achieved acceptable performance when it was proposed, but it did not consider the structure of current web pages (i.e. an HTML object with several embedded objects), which reduced its effectiveness on the current web. Domenech et al (2006a, 2010) improved web prefetching performance by designing the Double Dependency Graph (DDG), which considered the characteristics of current web sites when generating the predictions. It differentiated between dependencies among objects of the same page and objects of different pages.

These prediction algorithms (DG, DDG) learn user patterns from sequences of accesses; they consider two objects to be related if they are requested by the same user close together in time. De la Ossa et al (2007) proposed the Prediction at Prefetch mechanism, which allowed the prediction algorithm (using DG and DDG) to provide hints for both standard object requests and prefetch requests. A web prediction algorithm that built a Referrer Graph (RG) based on the object URI and its referrer was designed by De la Ossa et al (2010); this graph has a minimal number of arcs compared with the DG and DDG algorithms.

2.3.1.1 Dependency Graph

The Dependency Graph has a node for every object that has been accessed by the user. An arc is drawn from node A to node B if at some point in time the client accessed object B within ‘w’ accesses after A was accessed, where ‘w’ is the lookahead window size. The confidence of each arc is the ratio of the number of accesses to B within a window after A to the number of accesses to A; it represents the confidence level of the transition between the nodes. Figure 2.3 shows the Dependency Graph constructed with a lookahead window size of 2 using the access patterns of two users. The access sequence of user 1 is {HTML1, IMG1, HTML2, IMG2, HTML4, IMG4} and that of user 2 is {HTML1, IMG1, HTML3, IMG2}. The aggressiveness of prefetching is controlled by applying a cutoff threshold parameter to the weight of the arcs.

Each node is represented with its object and occurrence count. Each arc is represented with a pair of values {arc count, arc confidence}, e.g. {1, 0.5} between HTML1 and HTML3, where the arc confidence is computed as arc count / source node count, i.e. 1 / 2 = 0.5.


Figure 2.3 Dependency Graph (window size = 2)
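To make the construction described above concrete, the following Python sketch (an illustration under the stated definitions, not the published DG implementation; the function names, data layout and threshold value are assumptions) builds the graph from per-user access sequences and applies a confidence cutoff to generate hints:

```python
from collections import defaultdict

def build_dependency_graph(sessions, window=2):
    """node_count[X]   = number of accesses to object X
    arc_count[A][B] = times B was accessed within 'window' accesses after A
    Confidence of arc A->B = arc_count[A][B] / node_count[A]."""
    node_count = defaultdict(int)
    arc_count = defaultdict(lambda: defaultdict(int))
    for seq in sessions:
        for i, obj in enumerate(seq):
            node_count[obj] += 1
            for follower in seq[i + 1:i + 1 + window]:
                if follower != obj:
                    arc_count[obj][follower] += 1
    return node_count, arc_count

def predict(current, node_count, arc_count, threshold=0.5):
    """Return objects whose arc confidence from 'current' reaches the cutoff
    threshold (the threshold controls prefetching aggressiveness)."""
    hints = [(target, count / node_count[current])
             for target, count in arc_count[current].items()
             if count / node_count[current] >= threshold]
    return sorted(hints, key=lambda h: h[1], reverse=True)

# The two access sequences used for Figure 2.3
sessions = [["HTML1", "IMG1", "HTML2", "IMG2", "HTML4", "IMG4"],
            ["HTML1", "IMG1", "HTML3", "IMG2"]]
nodes, arcs = build_dependency_graph(sessions, window=2)
print(predict("HTML1", nodes, arcs))   # IMG1 (1.0), HTML2 (0.5), HTML3 (0.5)
```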

2.3.1.2 Double Dependency Graph

The Double Dependency Graph (DDG) is similar to the DG algorithm, but it distinguishes two classes of dependencies: dependencies on objects of the same page and dependencies on objects of another page. The graph has a node for every object that has been accessed, with an arc from node A to node B if the client accessed B within ‘w’ accesses after A. The arc is termed primary if A and B are objects of different pages, i.e. either B is an HTML object or the user accessed an HTML object between A and B. The arc is termed secondary if there are no HTML accesses between A and B. The graph has the same order of complexity as the DG, but it distinguishes the two classes of arcs.


Figure 2.4 Double Dependency Graph (window size = 2)

Figure 2.4 shows the DDG constructed with a lookahead window size of 2 using the access patterns of the same two users. The access sequence of user 1 is {HTML1, IMG1, HTML2, IMG2, HTML4, IMG4} and that of user 2 is {HTML1, IMG1, HTML3, IMG2}. Primary arcs are drawn with continuous lines and secondary arcs with dashed lines. Predictions are obtained by applying a threshold to both the primary and secondary arcs in the graph. A small sketch of the primary/secondary arc classification is given below.
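An illustrative sketch of this arc classification (an assumption about how the rule can be coded, not the published DDG implementation) is:

```python
def classify_arc(sequence, i, j, is_html):
    """sequence[i] = A and sequence[j] = B with j > i inside the lookahead
    window; is_html(obj) tells whether an object is an HTML page."""
    b = sequence[j]
    if is_html(b) or any(is_html(x) for x in sequence[i + 1:j]):
        return "primary"      # B belongs to a different page than A
    return "secondary"        # no HTML access between A and B: same page

is_html = lambda obj: obj.startswith("HTML")
seq = ["HTML1", "IMG1", "HTML2", "IMG2"]
print(classify_arc(seq, 0, 1, is_html))   # HTML1 -> IMG1 : secondary
print(classify_arc(seq, 1, 3, is_html))   # IMG1 -> IMG2  : primary (HTML2 in between)
```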

Chapter 4 discusses the proposed Precedence Graph, built from the requested object and its referrer, for generating web predictions. The Precedence Graph has fewer arcs than the DG and DDG, which helps it provide effective predictions with a lower memory requirement than the other algorithms.

2.3.2 Markov Models

Markov models have been used effectively in web prefetching by utilizing the information gathered from web logs. They focus on minimizing system latency or improving web server efficiency. The precision of Markov models comes from the consideration of consecutive orders of preceding pages. The goal is to build effective user behavioral models that can be used to predict the web pages that a user will most likely access in the future. The order of a Markov model indicates how many past user accesses are used to define the context in a node. Low-order Markov models lack web page prediction accuracy due to the minimal use of page history, while high-order Markov models suffer from high state-space complexity. A minimal sketch of a first-order model is shown below.
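For illustration (not drawn from any of the cited systems), a first-order Markov predictor can be sketched as follows, where the context is the single previously visited page and the transition counts come from the log; the page names are hypothetical:

```python
from collections import defaultdict

transitions = defaultdict(lambda: defaultdict(int))   # page -> next page -> count

def train(sessions):
    for seq in sessions:
        for prev, nxt in zip(seq, seq[1:]):
            transitions[prev][nxt] += 1

def predict_next(current, top_k=1):
    """Return the top_k most probable next pages given the current page."""
    counts = transitions[current]
    total = sum(counts.values()) or 1
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(page, count / total) for page, count in ranked[:top_k]]

train([["/index", "/sports", "/cricket"],
       ["/index", "/sports", "/football"],
       ["/index", "/news"]])
print(predict_next("/index", top_k=2))   # ('/sports', ~0.67), ('/news', ~0.33)
```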

A probabilistic sequence generation model (Sarukkai 2000) using

Markov chains predicted the next request based on the history of user access

requests. Markov predictors were used (Nanopoulos et al 2003) to design web

prefetching algorithm. Models based on Markov probabilistic techniques

(Davison 2004) used information from user access history and web page content

to accurately predict the user’s next request. Deshpande and Karypis (2004)

presented Markov model with reduced state complexity and improved prediction

accuracy. It applied different techniques to intelligently select the parts of

different order Markov models. Three pruning schemes (support, confidence and

error) were presented to prune the states of All-Kth order markov model.

Markov–Knapsack approach (Pons 2005) enhanced the web page

rendering performance by combining Multi-Markov web-application centric

prefetch model with Knapsack web object selector. Integration of semantic

information into Markov models for prediction (Mabroukeh and Ezeife 2009)

allowed low order Markov models to make intelligent accurate predictions with

less complexity than higher order models. Feng et al (2009) constructed Markov

tree using web page access patterns for effective page predictions and cache

prefetching. An integration model by Khalil et al (2009) combined clustering,

association rules and Markov models to achieve better prediction accuracy with

minimal state space complexity.

Chimphlee et al (2010) proposed an approach that combined the

strengths of Markov model, association rules and fuzzy adaptive resonance

theory to achieve higher accuracy, better coverage and overall performance while

keeping the number of computations to a minimum. Lee et al (2011) proposed

two-level prediction model (TLPM) that considered the natural hierarchical

property from web log data. TLPM could decrease the size of candidate set of

web pages and increase the prediction speed with adequate accuracy. Markov

model was used in level one to predict the categories and Bayesian model was

used in level two to predict the desired web pages in the predicted categories.

Prediction algorithms based on Markov models provide high-precision predictions, but they require intensive computation and memory consumption.

2.3.3 PPM Models

Prediction-by-Partial-Matching (PPM) models were commonly used in

web prefetching for predicting the user’s next request by extracting useful

knowledge from historical user requests. Factors such as page access frequency,

prediction feedback, context length and conditional probability influence the

performance of PPM models in prefetching.

A proxy-initiated prefetching technique (Fan et al 1999) used PPM

algorithm to generate the predictions. Chen and Zhang (2003) used popularity of

URL access patterns to build a PPM model for generating accurate predictions by

efficiently managing the storage space. Ban et al (2007) implemented an online

PPM model based on non compact suffix tree that used maximum entropy

principle to improve the prefetching performance. PPM model based on

stochastic gradient descent was designed (Ban et al 2008) to describe node’s

prediction capability using a target function. It selected the node with maximum

function value to predict the next most probable page.

2.3.4 Web Mining models

Web mining applies data mining techniques to large amount of web

data for improving the web services. Mining web access sequences helps to

discover useful knowledge from web logs that can be applied to variety of

applications such as: navigation suggestion for users, customer classification and

efficient access across related web pages. Several research efforts in web usage

mining have focused on three main paradigms (association rules, sequential

patterns and clustering) for analyzing the web data and generating desired output.

Clustering of web user access patterns helps to build user profiles by capturing

common user interests that can be applied to applications such as web caching

and prefetching. Association rules help to optimize the organization and structure

of websites.

To capture the user’s navigational behavior patterns, a model based on

data mining was designed (Borges and Levene 1999), which used high

probability strings to represent the user’s preferred trails. A web usage mining

process (Pierrakos et al 2003) analyzed the data collection, data preprocessing

and pattern discovery mechanisms for supporting web personalization. A

prediction based proxy server was designed (Huang and Hsu 2008) to effectively

improve the hit ratios of accessed documents. It used three functional

components: log file filter, access sequence miner and prediction based buffer

manager. The log file filter removes irrelevant records from the log file and feeds the cleaned file as input to the access sequence miner. The sequence miner processes the popular access sequences to generate a rule table. The buffer manager decides on caching/prefetching or buffer size adjustment based on the buffer contents and the rule table.

An integrated approach (Pallis et al 2008) that effectively combined

caching and prefetching used web navigational graph to represent the user

requests. Its efficiency was tested using the developed simulation environment.

The statistical analysis and web usage mining techniques were combined

(Heydari et al 2009) to create a powerful method for evaluating the website

usage by considering the client side data. Browsing time (statistical analysis)

helps to effectively evaluate the website, and graph mining (web usage mining)

helps to discover user access patterns through complex browsing behavior.

Lee et al (2009) designed a prefetch scheme that decided the web

objects to be prefetched by considering the memory status of the web cluster system. It comprised the following components: a) Double Prediction by Partial Match (DPS), adapted for the modern web framework, b) Adaptive Rate Controller (ARC), which determined the prefetch rate based on the dynamic memory status, and c) Memory Aware Request Distribution (MARD), which distributed requests based on the available web processes and memory.

Ahmed et al (2011) proposed novel framework for mining high utility

web access sequences that efficiently handled both forward and backward

references. It could perform both static and incremental mining of web access

sequences. A web user clustering approach (Wan et al 2012) based on Random

Indexing (RI) was used to build user profiles for applications such as web

caching and prefetching. Random Indexing is an incremental vector space

technique that allows continuous web usage mining.



2.4 COMMERCIAL PRODUCTS

Several commercial products have attempted to incorporate prefetching mechanisms in order to provide effective service to clients. In Google search, the results may sometimes include the first page of the result list as a hint embedded in the HTML code; if the web browser has prefetching capabilities, it can request that page in advance. Packeteer SkyX Accelerator, a gateway designed to accelerate connections in the local network, used an undisclosed prefetching method (it was discontinued in 2007).

Viking Server, a commercial product for Microsoft Windows operating systems, included a proxy with prefetching capabilities. Mozilla Firefox, an open source web browser, supports a prefetching mechanism. Browsers such as SeaMonkey, Netscape, Camino and Epiphany that are based on Mozilla Foundation technologies also include the prefetching capability. Google Web Accelerator (Google 05), a free web browser extension available for Mozilla Firefox and Microsoft Internet Explorer, possessed a web prefetching facility. It prefetches hints included in the HTML body; if no hints are provided, it prefetches all the links in the pages being visited.

FasterFox, an open and free extension for Mozilla web browsers (introduced in 2005), prefetches all the hyperlinks found in the current page during browser idle time. PeakJet, a commercial product for end users available around 1998, included several tools to improve user access to the web. It included a browser-independent cache with prefetching capability based on history or links; it could prefetch the links on the current web page that were visited by the user in the past, or all links on the current web page. NetAccelerator, a product commercialized between 1998 and 2005, prefetched all the links in the page being visited and stored the objects in the browser cache. It could refresh the contents of the cache in order to avoid obsolete objects.

2.5 PERFORMANCE EVALUATION

The impact of the web prefetching architecture in reducing user-perceived latency was analyzed (Domenech et al 2006) to identify the best architecture for performing prefetching and to provide insight into the efficiency of the system. A cost-benefit analysis was carried out (Domenech et al 2007) to compare prefetching algorithms from the user's viewpoint. A mathematical model of the web prefetching architecture (Balamash et al 2007) showed that prefetching is profitable even in the presence of a good caching system.

The performance of prediction algorithms is measured using metrics that quantify both the efficiency and the efficacy of the approach. Domenech et al (2006c) analyzed a large set of key metrics used by various researchers and proposed a taxonomy based on three main categories for better understanding and evaluation of prefetching systems: 1) prediction-related indexes, 2) resource usage indexes and 3) end-to-end perceived latency indexes. Prediction indexes quantify the efficiency and efficacy of prediction algorithms, resource indexes quantify the additional cost incurred due to prefetching, and end-to-end latency indexes highlight the system's performance from the user's viewpoint. A statistical analysis was performed (Domenech et al 2006d) to identify the situations that influence the outcome of the recall and byte recall indexes. Experimental results indicated that the user's available bandwidth and the server processing time significantly influence the selection of the appropriate index for evaluation.

Marquez et al (2008) proposed an intelligent web prefetching

mechanism that dynamically adjusted the aggressiveness of prediction algorithm

based on the system performance. To assist the proposed scheme, a traffic

estimation model was designed that used available information in the server to

accurately calculate the extra server load and network traffic generated by

prefetching. A global framework developed by Marquez et al (2008a) used

discrete-event based simulation for performance evaluation of caching and

prefetching in the web architecture. It also offered the flexibility to set prediction

and prefetching engine at any part (client, proxy, and server) of the web

architecture.

2.6 PERFORMANCE METRICS

Domenech et al (2006) identified various performance metrics for

evaluating the web prefetching techniques implemented in the web architecture.



Some of the metrics discussed are:

Precision (Pc)

It measures the ratio of objects that were predicted, prefetched, and then finally requested by the user (prefetch hits) to the total number of objects that were predicted and prefetched.

Pc = Prefetch Hits / Prefetches

Recall (Rc)

It measures the ratio of user-requested objects that were previously predicted and prefetched to the total number of user requests.

Rc = Prefetch Hits / User Requests
Resource Usage

Prefetching benefits are achieved at the expense of additional resource usage, which must be quantified because it can negatively impact performance.

Traffic Increase (∆TrB)

It quantifies the traffic increase (in bytes) due to unsuccessfully prefetched documents. With prefetching, network traffic usually increases due to two side effects: objects not used and network overhead. Objects that are not used waste network bandwidth because they were never requested by the user.

∆TrB = (Objects Not UsedB + Network OverheadB + User RequestsB) / User RequestsB

Object Traffic Increase (∆Trob)

It quantifies the increase (in percentage) in the number of documents that clients will get when using prefetching. The index estimates the ratio of the number of prefetched objects never used with respect to the total number of user requests.

∆Trob = (Objects Not Used + User Requests) / User Requests

Object Latency

It is obtained from the service time reported by the web server, or it is zero if the object is already in the browser cache. De la Ossa et al (2010) discussed the object latency saving and page latency saving metrics. Object latency saving is the ratio of the latency perceived using prefetching to the latency without prefetching.

∇OL = Average object latency with prefetch / Average object latency without prefetch

Page Latency Saving

It is used as the main performance index for measuring prediction and prefetching effectiveness in order to study the maximum benefit perceived by web users.

∇PL = Average page latency with prefetch / Average page latency without prefetch
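As a small illustration (not from the thesis), the prediction-related indexes above can be computed from simple counters collected during a simulation; the counter names and example values below are hypothetical:

```python
def precision(prefetch_hits, prefetches):
    """Pc: prefetched objects later requested / all prefetched objects."""
    return prefetch_hits / prefetches if prefetches else 0.0

def recall(prefetch_hits, user_requests):
    """Rc: user requests served by prefetched objects / all user requests."""
    return prefetch_hits / user_requests if user_requests else 0.0

def traffic_increase_bytes(not_used_bytes, overhead_bytes, user_request_bytes):
    """Delta TrB: bytes put on the network relative to the demand traffic."""
    return (not_used_bytes + overhead_bytes + user_request_bytes) / user_request_bytes

# Example: 40 prefetch hits out of 100 prefetches, against 200 user requests.
print(precision(40, 100))    # 0.4
print(recall(40, 200))       # 0.2
```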

2.7 CACHE REPLACEMENT

Cache replacement plays a significant role in improving the performance of web caches, and several policies have been proposed in the literature to achieve good performance of caching mechanisms. When the cache capacity reaches its maximum limit, objects already stored in the cache are purged to store newly downloaded web objects. The decision about which objects to purge from the cache is governed by the replacement policy. Cache replacement policies achieve better performance when the cache receives a stream of requests with high popularity and a small number of first-timer/one-timer requests (Benevenuto et al 2005).

Cheng and Kambayashi (2002) enhanced the functionality of web proxy caching by integrating performance tuning techniques with content management; their content-aware replacement algorithm (LRU-SP+) provided 30% better caching performance than content-blind schemes. A few cache replacement techniques depend on object grading mechanisms, such as the popularity rank of the object (Chen et al 2003), a cost function for object retrieval (Cao and Irani 2002), or a page grade (Bian and Chen 2008), to decide on the cacheability of web objects or their purging from the cache. These grading mechanisms trade off Hit Ratio (HR) against Byte Hit Ratio (BHR) while focusing on improving cache performance.

The performance of replacement policies for different document types

(applications, audio, images, text and video) was evaluated by Cañete et al

(2007) and they suggested policies that will provide good performance for each

document type. Wong (2006) suggested replacement policies for proxies with

different characteristics such as small cache, limited bandwidth and processing

power. They also analyzed policies that will be better for proxies at ISP and root

level. Romano and ElAarag (2008) considered factors such as: Frequency,

Recency and Frequency/Recency for quantitatively analyzing the performance of

cache replacement policies. The replacement algorithm chooses better victims for

eviction from the cache, when it considers several factors for making the

decision.

Geetha Krishnan et al (2011) designed a new model for client-side web

cache by fragmenting the cache into three slices: Sleep Slice (SS), Active Slice

(AS) and Trash Slice (TS). Based on the hit count, slicing was performed to

group cached pages that help in reducing the latency when they are retrieved.

One time hit pages were discriminated from other pages to ensure that hot pages

were made available when user requests them. Performance metrics such as File

Hit Ratio, Speedup, Delay Saving Ratio and Number of Evictions were used to

assess the performance of the model.

2.7.1 Machine Learning Techniques

In recent years, researchers have developed several intelligent approaches (back-propagation neural networks, fuzzy systems and evolutionary algorithms) that are smart and adaptive to the web caching environment. A nonlinear model designed by Koskela et al (2003) optimized web cache performance using object features such as the HTTP responses of the server, the access log of the cache and the HTML structure of the object. The drawbacks of the model were: a) the difficulty of collecting a comprehensive data set, b) a computationally intensive learning phase and c) a large number of inputs to the model. A neural network based web proxy cache replacement scheme was designed by Cobb and ElAarag (2006, 2008) to classify web objects into cacheable or uncacheable entities based on frequency and recency information. A sliding window mechanism introduced by Romano and ElAarag (2011) enhanced the cache replacement scheme by estimating the frequency count and recency time within the window boundary.

The use of a backpropagation neural network to decide the web objects to be evicted from the cache, based on the inputs frequency, recency, size and delay time, was discussed by Ali and Shamsuddin (2007). The effect of artificial neural networks (ANN) and particle swarm optimization (PSO) in making decisions regarding the cacheability of web objects was studied by Sulaiman et al (2008). To improve the performance of client-side web caching, Ali and Shamsuddin (2009) proposed an approach that partitioned the client cache into a short-term and a long-term cache; the short-term cache is managed using the LRU algorithm and the long-term cache using a neuro-fuzzy system. A Support Vector Machine (SVM) based approach was designed by Ali et al (2011) to predict web object classes using frequency, recency, size and object type as parameters. The metrics used for performance evaluation were Correct Classification Rate (CCR), True Positive Rate (TPR), True Negative Rate (TNR) and geometric mean (G-mean).

A class based LRU (C-LRU) algorithm designed by Haverkort et al (2003) balanced the large and small documents that existed in the cache. To support the adaptive nature of the C-LRU algorithm, Khayari et al (2009) used neural networks to determine or recompute the optimal distribution parameters for class boundaries and class fractions.

2.8 SUMMARY

This chapter discussed various prediction algorithms proposed in the literature to perform Web prefetching. The algorithms were analyzed by grouping them into two categories: a) algorithms that generate predictions based on web content and b) algorithms that generate predictions based on user access patterns. In chapter 3, we discuss content based web predictions that use Naïve Bayes and Fuzzy Logic mechanisms for computing the priority values from which the predictions are generated. A graph based prediction model that generates web predictions by building a Precedence Graph is discussed in chapter 4. Cache replacement algorithms, which play a significant role in improving cache performance, were also analyzed. A new client-side replacement scheme is explored in chapter 5, where the cache is partitioned into a regular and a prefetch cache for managing the objects; the regular cache is managed using an FIS based algorithm and the prefetch cache using the LRU algorithm.

CHAPTER 3

HYPERLINK BASED WEB PREDICTION

3.1 INTRODUCTION

The usage of the Internet has increased tremendously over the years, and users leverage its benefits to access a wide variety of services provided over the network. Due to the massive growth of the Internet, network load and access time have increased dramatically, causing substantial delay in providing services to the user. The latency perceived by the user when accessing web pages is affected by the following factors: a) bandwidth availability, b) request processing time at the server, c) round trip time and d) object size. Implementing caches either remotely (in the web server or a proxy server) or locally (in the browser's cache or a local proxy server) significantly reduces the access latency. The usefulness of the cache can be improved by applying a web prefetching mechanism that acquires web contents in anticipation that these contents will be requested by users in the near future. Web prefetching exploits the spatial locality exhibited by users when accessing web objects.

Prefetching requires predicting the list of web objects to be prefetched from the server for satisfying future user requests. Web predictions can be generated by analyzing information such as user access patterns, the contents of web pages and object popularity, depending on the location (server, proxy or client) of the implementation. The client decides to prefetch web objects based on the following factors (Mogul 1998): a) object availability in the cache and its current timestamp, b) idleness of the user for more than a threshold interval, c) network bandwidth and d) object size and user preferences. A personalized web prefetching system implemented by Zhang et al (2003) generated the set of URLs that the user will visit next using history-based and content-based predictors.

Semantic prefetching strategies analyze the contents of a web page or its metadata to predict the probable pages that will be requested in the future. These strategies consider either the hyperlinks contained in web pages or keywords attached to or extracted from web pages as input for generating the predictions. Since they are domain specific and focus on a particular topic of interest, they should provide a provision to be enabled or disabled when the user enters or exits the corresponding web services. In our work, the focus is on generating web predictions at the client machine using the information associated with the hyperlinks that are used to access web objects across different web pages. When a web page has high usability, prefetching it improves the system performance.

A user's navigation in a web site is influenced not only by his or her own interest in a topic, but also by the structure of the web pages. For example, if the user is currently viewing page Pi, then there is a high probability of using the links available in that page to visit page Pi+1. The user navigates through pages by clicking links based on the text anchored around them. It is assumed that there is a relationship between the textual content of web pages and the user's interests.

A web user's browsing behavior is often guided by the keywords in the hypertext of the URL that refers to a web object. A hyperlink (URL) represents a relation between two different web pages or two parts of the same web page, and its hypertext provides descriptive or contextual information to users about the contents referred to by the hyperlink. This chapter discusses the generation of web predictions based on the hypertext associated with hyperlinks by designing two approaches: a) a Naïve Bayes approach and b) a Fuzzy Logic approach. They are responsible for computing the priority value of hyperlinks, which is used to decide the hyperlinks to be included in the prediction list. The client is responsible for performing both web prediction and prefetching, and it prefetches the objects during browser idle time based on the generated predictions. Predictions are generated dynamically for each new web page visited by the user, based on the information maintained in repositories that are updated frequently from the user access patterns.

Web objects embedded in main html pages are requested automatically

when user requests the main page. Requests for embedded objects are separated

from regular user requests. To avoid interference between the prefetch and

demand requests, any spare resources available on servers and network should be

utilized effectively by the web prefetching system (Kokku et al 2003).

Prefetching hit ratio and bandwidth overhead are the most popular parameters

used for evaluating the performance of prefetching system.


Browser idle time varies with the behavior of the user when navigating a website; if the user navigates too quickly between pages, the web browser will not have enough time to prefetch all the hints. This occurs even if the prediction algorithm provides hints accurate enough to reach the upper bound in latency savings. To overcome this situation, it is important to provide the good hints first, in ranked order, so that prefetching them maximizes the latency savings.

3.2 NAÏVE BAYES APPROACH

The process of computing the priority value of a hyperlink involves applying a Naïve Bayes classifier to find the probability of each token in the hypertext; these token probabilities are then combined to produce the priority value of the link. The Naïve Bayes classifier is used for computing the priority value because it is fast and accurate, simple to implement, and able to dynamically learn user access patterns, and it can match or outperform more sophisticated classification methods. Text classification, an automated means of determining the metadata of a document, is applied to various problems such as spam filtering, category suggestion for document indexing and sorting of help desk requests.

The proposed approach generates predictions based on the following

steps:

a) Extract hyperlinks from the web page displayed in the browser.

b) Select tokens (keywords) from the hypertext associated with each hyperlink.

c) Compute the probability of each token in the hypertext using the Naïve Bayes classifier, and combine the token probabilities to generate the priority value of the hyperlink.

d) Use the priority value to decide the hyperlinks that will be added to the prediction list (hints).

e) Prefetch web objects using the hyperlinks in the prediction list.

User requests a web page by typing URL in the web browser or

clicking hyperlinks in the web page. When the cache contents are used to satisfy

the user requests, it significantly reduces the user access latency.

When user visits a web page and spends some time either reading or

exploring some valuable information, then the textual content of that page

reveals ‘region of interest’ that matches with the user’s interest. When users are

not visiting web pages according to a pattern, then it indicates that they are

randomly exploring the pages not looking for particular information.

3.2.1 Prediction/Prefetch Procedure

The prediction/prefetch procedure is responsible for generating the web predictions that are used to prefetch web objects, so that user requests are satisfied with minimal latency. It uses a client-side prefetching mechanism in which the client directly prefetches web objects from the server and stores them in its local cache (prefetch cache) to serve user requests. Both the prediction and the prefetching engine are located in the client machine, and the prediction engine uses components such as the Tokenizer and the Token Repository for computing the priority value of hyperlinks. An on-demand request is sent to the server only if the requested web page is not available in the cache or has become invalid in the cache.

When the user visits a web page, the hyperlinks in that page form a pool of URLs from which the user selects a hyperlink that suits his or her interest to visit the next page. The aim of the proposed approach is to efficiently identify the set of hyperlinks in a web page that reflects the user's interest and to create from it a prediction list that is used for prefetching web objects during browser idle time.

The process of collecting hyperlinks from a web page to create the prediction list used for prefetching web objects is shown in Figure 3.1. The procedural steps for performing the prediction and prefetching (as in Figure 3.1) are explained as follows:

1. User initially requests a new web page by typing its URL in the

web browser.

2. The requested web page is retrieved from server and displayed to

the user.

3. Process the displayed web page to extract hyperlinks and their

associated hypertexts for evaluation.

4. Each hypertext is processed to extract set of tokens, where ‘token’

represents meaningful word in the hypertext.


(Figure: numbered data flow between the web browser, the hypertext-to-token conversion, the user-accessed repository, the Naïve Bayes classifier computing hyperlink priorities, the prediction list, the prefetch cache and the Internet, corresponding to the procedural steps 1-12.)

Figure 3.1 Prediction and Prefetching – Naïve Bayes Approach


5. Tokens of a hypertext are added to the user-accessed repository when the user visits a new web page by clicking the corresponding hyperlink.

6. The token count is updated if the token already exists in the repository; otherwise the token is added as a new entry in the repository.

7. The probability value of each token of the hypertext is computed using the token count information maintained in the user-accessed repository, and the priority value of each hyperlink is computed by combining the probability values of its tokens using the Naïve Bayes formula.

8. Hyperlinks are arranged based on their priority values to create the prediction list (hints).

9. Web objects are prefetched during browser idle time using the hyperlinks in the prediction list and stored in the prefetch cache.

10. When the user requests a new web page, by either typing its URL in the browser or clicking a hyperlink in a page, the request is checked against the local cache (regular / prefetch cache) for availability.

11. When the requested information is available in the local cache, the web page is displayed with minimal latency.

12. When it is not available in the local cache, the web page is retrieved from the server and displayed to the user.



3.2.2 Implementation

The prediction and prefetching activities both occur at the client machine, with the prediction engine responsible for generating the predictions and the prefetching engine responsible for retrieving the web objects and storing them in the prefetch cache. A prediction is considered useless in reducing the user perceived latency when the predicted object has already been demand requested by the user and is waiting in the browser queue for a connection to the web server.

3.2.2.1 Prediction Engine

It is responsible for computing the priority value of each hyperlink by

applying Naïve Bayes classifier on the set of tokens associated with each

hypertext. The advantages of using Naïve Bayes Classifier (Rish 2001) for

computing the probability value of tokens are: a) simple mechanism to compute

value for the specified data b) requires minimal storage, since it maintains only

the token count and c) performs incremental update whenever new data is

processed.

3.2.2.1.1 Tokenizer

When the user visits a new web page, the Tokenizer parses that page to extract the hyperlinks and their associated hypertexts. Each hypertext is analyzed to identify meaningful keywords that act as tokens of that text. Hypertext refers to the text that surrounds hyperlink definitions (hrefs) in web pages (Xu and Ibrahim 2004). A hyperlink is contained in the source web page and is represented as:

<A HREF="http://www.ircache.net/"> Web Cache Information Site </A>

It refers to the target page to be visited. The link's anchor appears on the source page as underlined text:

Web Cache Information Site

When the user selects the anchor, the contents of the target page are displayed in the browser.

In our approach, the text between the tags <a> and </a> is used to compute the priority value of hyperlinks. When the user clicks a hyperlink to visit a new web page, the tokens of its hypertext are stored in the user-accessed repository. When a token is a new entry in the user-accessed repository, it is given an initial count value of 1; for tokens that already exist in the repository, the count value is incremented.

The Tokenizer analyzes each hypertext to remove stop words, since they do not provide any meaningful information for computing the priority value of hyperlinks. To remove the stop words, the tokens of a hypertext are compared with a database that contains commonly occurring stop words, such as the one shown in Figure 3.2. After removing the stop words, the tokens are further subjected to stemming, which is the process of converting words from their inflectional or derivationally related forms to a common base form. Factors to be considered when stemming words are: a) different words with the same base meaning are converted to the same form and b) words with distinct meanings are kept separate. The Porter stemming algorithm (Porter 1980) is used to perform the stemming operation; it is a simple utility that reduces English words to their word stems (without 'ing', 'ings', 's').

able, about, above, again, after, and, any, back, be, been, before,
below, but, by, came, can, can't, did, do, each, edu, eg, even, ever, far,
for, few, get, go, gone, got, has, have, her, here, how, if, in, is, isn't,
keep, kept, last, less, little, like, let's, make, may, many, miss, more,
my, name, new, next, not, now, of, often, one, once, only, over, plus,
per, please, quite, right, round, saw, say, seen, sent, shall, since, still,
take, than, that, this, there, thing, twice, two, use, us, via, want, was,
we, way, when, where, who, why, yet, you, yours, zero

Figure 3.2 Some commonly used Stop Words

The removal of stop words ensures that only meaningful words are stored as tokens in the user-accessed repository, and stemming minimizes the number of unique token entries in the repository.
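As an illustration of this tokenization pipeline, the following Python sketch shows one possible way to turn a hypertext string into tokens. It is not the thesis implementation (which was built as a C# browser add-on); the stop word set is abbreviated and a crude suffix-stripping rule stands in for the full Porter stemmer.

import re

STOP_WORDS = {"the", "and", "of", "for", "to", "in", "is", "a"}  # abbreviated set

def stem(word):
    # crude stand-in for the Porter stemmer: strip a few common suffixes
    for suffix in ("ings", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def tokenize(hypertext):
    # split the anchor text into lowercase words, drop stop words, then stem
    words = re.findall(r"[a-z]+", hypertext.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

# Example: tokenize("Admission Requirements for Graduate Courses")
# returns ['admission', 'requirement', 'graduate', 'course']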

3.2.2.1.2 User-Accessed Repository

The user-accessed repository stores the tokens of the hypertexts associated with hyperlinks that are used to visit new web pages. Each token is stored with an initial count of 1, which is incremented whenever the same token is added to the repository from the hypertext of a used hyperlink. The tokens stored in the repository reflect the user's browsing interests and are used for computing the priority value of hyperlinks. The repository information reflects user and session characteristics, where a session represents the time interval between the start and end of a user's browsing instance. During a browsing session, the user clicks several hyperlinks to visit web pages of interest, and the hypertexts associated with the used hyperlinks are stored in the repository as independent tokens (keywords). When the user has a long browsing session and surfs the web focusing on the same topic, the identification of new keywords to be added to the repository saturates.

Table 3.1 User-Accessed Repository

a) Without Stemming                 b) With Stemming

Tokens          Count               Tokens          Count
Academics           4               Academ              9
Academic            5               Academician         2
Academician         2               Admiss              3
Admissions          2               Appli              12
Admission           1               Comput             11
Applied             5               Cours               5
Apply               7               Deadlin             2
Computer            4               Engin              10
Computing           3               graduat            11
Computers           4               requir              6
Courses             2
Course              3
Deadlines           1
Deadline            1
Engineering         6
Engineer            4
Graduate            2
Graduates           4
Graduating          5
requirements        2
requirement         2
require             2

Table 3.1 shows a sample user-accessed repository (with and without stemming) used for computing the priority value of hyperlinks. Each entry in the repository contains a token with its occurrence count. When stemming is not applied, words that are closely related to each other have separate entries in the repository, each with its own occurrence count. This leads to the following problems: a) the number of entries in the repository increases and b) the occurrence count of related words is spread across multiple entries. When stemming is applied, related words are reduced to a single base form, which decreases the number of entries in the repository. It also improves the occurrence count, since the count values of related words are combined into a single value.

When the user browses without focusing on a specific topic of interest, most of the generated keywords are trivial and cannot be used for generating predictions. The repository is of fixed size, and new tokens are added by eliminating old tokens when the repository reaches its maximum limit. The repository size should therefore be carefully selected to avoid eliminating legitimate tokens while preventing trivial tokens from occupying space for too long.

3.2.2.1.3 Computing Priority Value of Hyperlinks

The hypertext associated with each hyperlink present in a web page is taken, and its tokens are compared with the tokens stored in the user-accessed repository to compute the probability value of each token. The priority value of a hyperlink is obtained by multiplying the probability values of its tokens. Hyperlinks are then sorted based on their priority values to create the prediction list.

The priority value of each hyperlink is computed by applying Naïve

Bayes classification (Pop 2006) formula shown in Equation (3.1).

Pr (U|A) = [ Pr (A|U) · Pr (U) ] / Pr (A)                (3.1)

U = User-accessed Repository

A = Hypertext associated with Hyperlink

Pr (U | A) = probability that a hypertext belongs to the user-accessed repository

Pr (A | U) = probability that for given user-accessed repository the

tokens of a hypertext appears in that repository

Pr (U) = probability of user-accessed repository

Pr (A) = probability of occurrence of a particular hypertext

The value of Pr (U) will be 1, since it is the only repository used for

computation. The value of Pr (A) that acts as a scaling factor for Pr (U | A) will

be constant and is omitted during computation. Based on these factors, the

Equation (3.1) is simplified into Equation (3.2) as:

Pr (U|A) = Pr (A|U) · 1 (3.2)



Steps for computing the probability Pr (A | U) are:

1. Hypertext of each link represented as set of tokens

Hypertext = {T1, T2, T3 …… Tm}, T1to m = Tokens

2. Compute probability value of each token using Equation (3.3).

Pr (Ti | U) = Count of Ti in U / Total count of tokens in U                (3.3)

where i = 1 to m

3. Compute probability value of hypertext using Equation (3.4) by

performing product of individual token probabilities.

Pr (A|U) = Π i=1 to m [ C + Pr (Ti | U) ]                (3.4)

The probability value Pr (A | U) of hypertext will be the priority value

of its hyperlink. While computing probability Pr (A|U), a constant value ‘C’ (C

=1) is added to each token probability value irrespective of whether the token is

available or not in the user-accessed repository. When a token is not available in

the user-accessed repository, then its probability value will be zero. Reason for

adding ‘C’ to each token probability is to achieve the following two conditions:

1) probability value of hypertext should not be less than the individual token

probability 2) probability value of hypertext should not be zero due to absence of

few tokens in the user-accessed repository. When none of the tokens of a

hypertext is present in the user-accessed repository, then its probability value will

be 1 due to addition of ‘C’ to each token probability. The probability value of



hypertext will be greater than 1, if either few tokens or all the tokens of hypertext

are present in the user-accessed repository. Based on these factors, hyperlinks

having priority value greater than 1 are considered for inclusion in the prediction

list.
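A minimal Python sketch of this computation is shown below. The repository is assumed to be a simple dictionary mapping tokens to occurrence counts, and the names used here are illustrative rather than taken from the thesis implementation.

C = 1  # smoothing constant added to every token probability, as in Equation (3.4)

def priority_value(tokens, user_accessed):
    # user_accessed: dict mapping token -> occurrence count (user-accessed repository)
    total = sum(user_accessed.values())
    priority = 1.0
    for t in tokens:
        pr_token = user_accessed.get(t, 0) / total if total else 0.0  # Equation (3.3)
        priority *= C + pr_token                                      # Equation (3.4)
    return priority

A hyperlink whose priority value exceeds 1 (that is, at least one of its tokens was found in the repository) becomes a candidate for the prediction list.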

3.2.2.1.4 Prediction List

For each web page, the prediction list is created by including the hyperlinks whose computed priority value exceeds 1. The prediction list is implemented as a priority queue that always maintains the highest priority links at the top of the queue. The prefetching engine takes links from the top of the queue to prefetch web objects during the browser's idle time.

(Figure: the prediction engine computes a priority from the hypertext of each hyperlink; the links are held in the prediction list as a priority queue, for example Link1 (1.786), Link2 (1.783), Link3 (1.69), Link4 (1.6), Link5 (1.567), Link6 (1.54) and Link7 (1.34), from which the prefetch engine takes links.)

Figure 3.3 Prediction List for Prefetching

When the user navigates to a new web page, the prediction list is cleared and filled with a new set of hyperlinks based on that page. This helps to eliminate the prefetching of irrelevant links during the user's browsing session. Figure 3.3 shows the process of adding hyperlinks to the prediction list (maintained as a priority queue) that is used for prefetching web objects.
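The prediction list can be sketched as a max-priority queue. The illustrative Python fragment below uses heapq with negated priorities; it is one possible realisation and not the exact structure used in the browser add-on.

import heapq

class PredictionList:
    def __init__(self):
        self._heap = []                  # stores (-priority, url) pairs

    def clear(self):                     # called when the user opens a new page
        self._heap = []

    def add(self, url, priority):
        if priority > 1:                 # Naive Bayes acceptance threshold (Section 3.2)
            heapq.heappush(self._heap, (-priority, url))

    def next_hint(self):
        # highest priority hyperlink, taken by the prefetching engine during idle time
        if self._heap:
            return heapq.heappop(self._heap)[1]
        return None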

3.2.2.2 Prefetching Engine

The prefetching engine is responsible for retrieving web objects from the server in advance and storing them in the prefetch cache maintained at the client machine, so that user requests are served with minimal latency. Web objects are prefetched using the hyperlinks taken from the prediction list, and prefetching is carried out only during the browser's idle time. Prefetch requests are given lower priority than regular user requests, so whenever the user makes a request the prefetching engine suspends the ongoing prefetching activity. The number of links that can be prefetched varies depending on the amount of time the user spends on each web page during the browsing session; if the user spends more time on a page, more links can be prefetched.

To eliminate the impact of caching due to temporal locality exhibited

in user access patterns, the client maintains prefetch cache separately from

browser’s in-built cache. When new web objects need to be stored in prefetch

cache and if it is full, then it selects objects not accessed for a long time to be

purged from cache to make space for storing newly downloaded objects.

Prefetching an object that expires very frequently or changes every time it is

requested will be useless and wastes resources.



3.3 FUZZY LOGIC APPROACH

Fuzzy Logic has been used over the years in several domains such as

expert systems, data mining and pattern recognition. It deals with fuzzy sets

(Zadeh 1965) that allow partial membership in a set represented by its degree of

relevance. Fuzzy Logic is capable of handling approximate or vague notions that

exist in several information retrieval (IR) tasks (Chris Tseng 2007) and helps to

establish meaningful and useful relationships among objects.

This approach also uses the information associated with hyperlinks in

a web page to predict web objects to be prefetched for satisfying user’s future

requests. The priority value of hyperlinks is computed by applying fuzzy logic

over the set of tokens associated with each hypertext and use it to generate the

prediction list. It uses information stored in two repositories: user-accessed and

predicted-unused for computing the priority value of hyperlinks. The use of

predicted-unused repository is to filter out hyperlinks that are of less or no

interest to the users.

3.3.1 Prediction/Prefetch Procedure

The prediction and prefetching engines are both implemented at the client machine to perform the prefetching activity that helps to minimize user perceived latency. The prediction engine is designed to efficiently identify the relevant set of hyperlinks in a web page that reflects the user's interest. The prefetching engine uses the prediction list to prefetch web objects and store them in the prefetch cache before the user demand requests those objects.

Figure 3.4 shows the process of applying fuzzy logic to decide the set of hyperlinks in the prediction list, which is then used by the prefetching engine to download web objects.

The procedural steps shown in Figure 3.4 are explained as follows:

1. The user initially requests a web page by typing its URL in the web browser.

2. The requested web page is displayed in the browser after its contents are downloaded from the web server.

3. All hyperlinks and their associated hypertexts are extracted from the displayed web page for computing the priority value of hyperlinks.

4. Each hypertext is processed to extract a set of tokens, where a 'token' represents a meaningful word in the hypertext.

5. When the user visits a new web page by clicking a hyperlink in the current page, the tokens of that hypertext are added to the user-accessed repository.

6. The token count is updated if the token already exists in the repository; otherwise the token is added as a new entry in the repository.


(Figure: data flow between the web browser, the hypertext-to-token conversion, the user-accessed and predicted-unused repositories, the fuzzy logic based priority computation, the prediction list, the prefetch cache and the Internet, corresponding to the procedural steps 1-14.)

Figure 3.4 Fuzzy based Prediction and Prefetching


7. The priority value of each hyperlink is computed by applying fuzzy logic over the set of tokens associated with its hypertext, with reference to the user-accessed and predicted-unused repositories. Initially the predicted-unused repository is empty; it receives tokens once the prediction activity starts.

8. Based on the computed priority values, hyperlinks are selected to form the prediction list. Hyperlinks with high priority values remain at the top of the list.

9. The prefetch engine uses the hyperlinks from the prediction list to download web objects from the server and store them in the prefetch cache.

10. When the user wishes to visit a new web page, by either clicking a hyperlink in the current page or typing a URL in the web browser, the contents of the prefetch cache are checked to find whether the request can be satisfied.

11. When the prefetch cache is able to satisfy the user request, the contents of the page are displayed in the web browser with minimal latency.

12. When the requested contents are not available in the prefetch cache, they are retrieved from the web server and displayed to the user.

13. Tokens of hyperlinks in the prediction list that are not used by the user are moved to the predicted-unused repository.

14. The count value of tokens in the predicted-unused repository is incremented whenever tokens of unused hyperlinks are moved to that repository. When the user visits a new web page, the prediction list is cleared and then populated with a new set of hyperlinks to support the prefetching activity.

3.3.2 Implementation

Similar to the Naïve Bayes approach, both prediction and prefetching

engine are implemented in the client machine. In fuzzy logic approach, two

repositories (user-accessed and predicted-unused) are used for computing the

priority of hyperlinks.

3.3.2.1 Prediction Engine

The prediction engine computes the priority value of hyperlinks by applying fuzzy logic over the set of tokens related to each hyperlink. The roles of the Tokenizer and the user-accessed repository in this approach are similar to those in the Naïve Bayes approach. The Tokenizer parses the web page to extract hyperlinks along with their hypertexts, which are analyzed to form a set of tokens from each hypertext; these tokens are stored in the user-accessed repository when the corresponding hyperlinks are used. The user-accessed repository thus stores the tokens of hypertexts associated with hyperlinks used to visit web pages. An additional repository, called the predicted-unused repository, is used to filter out unwanted predictions.

3.3.2.1.1 Predicted-Unused Repository

The tokens of hypertexts associated with hyperlinks that are predicted but not used are stored in this repository. It provides feedback to the prediction engine for tuning the generation of predictions from web pages. The repository provides two main benefits: a) it minimizes the influence of tokens in the user-accessed repository that are of little or no interest to the user when computing the priority value of hyperlinks, and b) it minimizes the number of predictions generated for each web page. When 'n' predictions are generated for a web page, only one entry in the prediction list may match the hyperlink used by the user to visit the next page; the tokens of the hyperlinks in the prediction list that do not match the user's interests are stored in this repository. Tokens of a hyperlink are subjected to stop word removal and stemming before they are stored.

The steps for adding tokens to the predicted-unused repository are as follows (a short sketch of this routing is given after the steps):

1. The prediction engine recommends a set of hyperlinks (predictions) for each web page based on the computed priority values.

2. The tokens of the recommended hyperlinks are collected and stored in a temporary buffer.

3. Check whether the hyperlink used by the user to visit the next page matches any hyperlink in the prediction list. If there is no match, go to step 5; otherwise proceed to the next step.

4. The tokens of the matched hyperlink available in the temporary buffer are moved to the user-accessed repository.

5. The tokens of the unmatched hyperlinks available in the temporary buffer are moved to the predicted-unused repository. To create a new prediction list for the next web page, go to step 1.
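The routing described in these steps can be sketched in Python as follows; the two repositories are represented as plain token-count dictionaries, and the function and parameter names are illustrative only.

def route_tokens(predicted_links, followed_url, tokenize,
                 user_accessed, predicted_unused):
    # predicted_links: dict mapping each predicted URL -> its hypertext (the temporary buffer)
    # followed_url: the hyperlink the user actually clicked to reach the next page
    for url, hypertext in predicted_links.items():
        target = user_accessed if url == followed_url else predicted_unused
        for token in tokenize(hypertext):
            target[token] = target.get(token, 0) + 1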

3.3.2.1.2 Computing Priority Value of Hyperlinks

Each hypertext is represented as a set of tokens {T1, T2, T3 . . . Tn} for computing its priority value. Fuzzy logic is applied over the set of tokens by associating it with a fuzzy set (i.e. a repository storing the tokens); each token is related to the fuzzy set with a similarity degree in the range 0 to 1. Let R1 represent the user-accessed repository and R2 the predicted-unused repository. The membership value of each token (Ti) relative to repository R1 is computed by dividing its token count in R1 by the sum of its token counts in both repositories R1 and R2, as shown in Equation (3.5).

µR1(Ti) = (TCi)R1 / [ (TCi)R1 + (TCi)R2 ]                (3.5)

Membership value of Ti relative to repository R2 is computed as shown


in Equation (3.6).

µR2(Ti) = 1 - µR1(Ti) [i = 1 to n | n = number of tokens] (3.6)



µR1(Ti) = Membership of Ti relative to repository R1

µR2(Ti) = Membership of Ti relative to repository R2

(TCi)R1 = Count of Ti in repository R1

(TCi)R2 = Count of Ti in repository R2

Membership value of Ti will be 1, if token is present only in a single

repository (i.e. R1 or R2). After computing the membership value of Ti relative to

repositories R1 and R2, they are compared to decide whether to include Ti for

computing the priority value of hyperlink. The Token Acceptance (TAi) of Ti will

be set to 1 if µR1 (Ti) > µR2 (Ti) i.e. Membership value of T i relative to R1 is

greater than that in R2; else TAi set to 0.

For Ti with its TAi set to 1, the token popularity (TPi) in repository R1

is computed by dividing its token count with maximum token count value in R1

as shown in Equation (3.7).

TPi = (TCi)R1 / max [ (TC)R1 ]                (3.7)

For Ti with its TAi =0, TPi will be set to zero.

The priority value (PV) of hyperlink is computed as shown in Equation


(3.8), where ‘n’ indicates the number of tokens in hypertext.

PV = (1/n) · Σ i=1 to n TPi                (3.8)
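A compact Python sketch of Equations (3.5) to (3.8) is shown below; R1 and R2 are assumed to be token-count dictionaries for the user-accessed and predicted-unused repositories, and the function name is illustrative.

def fuzzy_priority(tokens, R1, R2):
    # R1: user-accessed repository, R2: predicted-unused repository (token -> count)
    if not tokens or not R1:
        return 0.0
    max_count_r1 = max(R1.values())
    popularity = []
    for t in tokens:
        c1, c2 = R1.get(t, 0), R2.get(t, 0)
        if c1 + c2 == 0:
            popularity.append(0.0)           # token unseen in either repository
            continue
        mu_r1 = c1 / (c1 + c2)               # Equation (3.5)
        mu_r2 = 1 - mu_r1                    # Equation (3.6)
        if mu_r1 > mu_r2:                    # token acceptance TAi = 1
            popularity.append(c1 / max_count_r1)   # Equation (3.7)
        else:
            popularity.append(0.0)           # TAi = 0, so TPi = 0
    return sum(popularity) / len(tokens)     # Equation (3.8)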

3.3.2.1.3 Prediction List

It is implemented as a priority queue similar to that in Naïve Bayes

approach to maintain high priority links at the top of queue. The hyperlinks are

sorted based on the computed priority value that falls within the range of 0 to 1.

Figure 3.5 represents the prediction list that arranges hyperlinks based on the

computed priority values.

(Figure: the prediction engine computes a fuzzy priority in the range 0 to 1 for each hyperlink; the prediction list holds the links in priority order, for example Link1 (0.98), Link2 (0.93), Link3 (0.86), Link4 (0.85), Link5 (0.75), Link6 (0.71) and Link7 (0.65), from which the prefetching engine takes links.)

Figure 3.5 Prediction List based on Fuzzy Computations

3.3.2.2 Prefetching Engine

The hyperlinks in the prediction list are used by the prefetching engine to download web objects during browser idle time, which helps to avoid interference with regular user requests. The prefetching engine will not download all the predicted web objects, due to the following factors:

· lack of idle time caused by fast navigation between web pages by the user

· some predicted objects may already exist in the regular cache

· some predicted objects may already have been demand requested by the user.

The prediction list should contain hyperlinks that reflect the user's interests; otherwise, prefetching the objects wastes user and server resources and degrades performance. Prefetched web objects are stored in the prefetch cache and managed using the LRU algorithm. If a demand requested web object resides in the prefetch cache, it is moved to the regular cache, so a web object never resides in both caches (regular and prefetch) at the same time.
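A small sketch of this arrangement, using Python's OrderedDict as the LRU structure, is given below; it illustrates the policy rather than reproducing the actual cache code.

from collections import OrderedDict

class PrefetchCache:
    def __init__(self, capacity, regular_cache):
        self.capacity = capacity
        self.items = OrderedDict()           # insertion order doubles as LRU order
        self.regular_cache = regular_cache   # dict-like regular cache

    def store(self, url, obj):
        if url in self.items:
            self.items.move_to_end(url)
        self.items[url] = obj
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used object

    def demand_request(self, url):
        # a demand-requested object is promoted from the prefetch to the regular cache
        obj = self.items.pop(url, None)
        if obj is not None:
            self.regular_cache[url] = obj
        return obj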

3.4 EVALUATION

Trace based simulations cannot be used effectively for evaluating the performance of the Naïve Bayes and Fuzzy Logic approaches, since web traces do not provide a comprehensive view of a client's browsing interests. An effective way to gather the required information is to capture the user's interest at the client side by analyzing the browsing pattern in each session. The user visits a new web page either by typing its URL in the web browser or by clicking a hyperlink in a web page. In the proposed approaches, user interests are obtained by tracking the information associated with the hyperlinks used to visit web pages during browsing sessions.

Evaluation is carried out by performing web browsing that focuses on information related to the user's interest. An open source browser (CxBrowser) developed in C# is used for performing the browsing sessions. Both the prediction and prefetching engines are implemented as an add-on to the web browser, which allows the user to configure the prefetch settings as required. Each browsing session has active and idle periods based on the user's access pattern: the active period represents the phase where web objects are demand requested by the user, and the idle period represents the phase where the displayed web objects are viewed by the user. The retrieval of the main html file and its embedded objects initiates the active period, while the idle period is used by the prefetching engine to download web objects using the hyperlinks in the prediction list and store them in the prefetch cache. A log file records the user requests during the browsing sessions and is used to analyze the performance of the proposed approaches.

The number of links prefetched depends on the amount of time a user spends reading the displayed web page, which is considered an important attribute in predicting the user's interests (Liang and Lai 2002; Gunduz and Ozsu 2003; Guo et al 2007). User navigation time was partitioned into four discrete intervals by Xing and Shen (2004): passing, simple viewing, normal viewing and preferred viewing. If the user spends more time reading a web page (preferred viewing), the browser idle time increases, allowing many hyperlinks in the prediction list to be prefetched. When the user visits a web page whose content is irrelevant to his or her interest, the prediction list for that page contains few or no hyperlinks. When the user demand requests a web object, its availability is first checked in the local cache (regular or prefetch cache) before the request is forwarded to the proxy or web server.


When the user initially starts a browsing session, the repositories (user-accessed and predicted-unused) are empty and cannot be used immediately to compute the priority value of hyperlinks. The user-accessed repository receives tokens once the user starts using hyperlinks present in web pages to visit new pages. The predicted-unused repository receives the tokens of hyperlinks that are predicted but not utilized to prefetch web objects or that do not match the user's browsing interest.

3.4.1 Experimental Results

The performance of the proposed approaches (Naïve Bayes and Fuzzy Logic) depends on the browsing interests of individual users, and hence the results from different users are not directly comparable. Results were obtained by analyzing browsing sessions carried out over a period of six weeks. To establish user access patterns systematically, a topic of interest was specified to guide each user's daily browsing behavior. The metrics used for evaluation are Recall (hit rate) and Precision (accuracy). Recall (Rc), shown in Equation (3.9), indicates the percentage of user requests served using the contents of the prefetch cache against the total number of user requests. When recall is high, the user perceived latency is effectively minimized, since most of the user requests are served from the local cache.

Rc = Prefetch Hits / Total User Requests                (3.9)

Precision (Pc) shown in Equation (3.10) indicates the percentage of

prefetched web pages requested by users against the total number of web pages

prefetched. It reflects the effectiveness in generating useful predictions during

browsing sessions.

Pc = Prefetch Hits / Total Prefetches                (3.10)
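As a hypothetical illustration of the two metrics, if a session contains 200 user requests, 80 of which are served from the prefetch cache, and 160 objects were prefetched in total, then Rc = 80/200 = 0.4 and Pc = 80/160 = 0.5.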

3.4.1.1 Naïve Bayes Approach

The hyperlinks to be prefetched are predicted based on the priority value computed using the Naïve Bayes formula. Since the prefetch operation is carried out only during browser idle time, the number of hyperlinks prefetched varies dynamically depending on the user's access pattern. Figure 3.6 shows the hyperlinks that are predicted and prefetched for the pages accessed by users (e.g. user-A, user-B) in a browsing session. The first few pages in a session have the fewest predicted hyperlinks, because the user-accessed repository is still empty and cannot be used to make predictions. Once the user starts visiting new pages through hyperlinks in a session, the user-accessed repository is filled with tokens that improve the predictions for subsequent pages. When the user moves through pages quickly, only a few objects are predicted and prefetched; in some instances the user spends more time on a particular page, providing an opportunity to predict and prefetch a larger number of hyperlinks.

(Figure: two bar charts, one for User-A and one for User-B, showing the number of predicted and prefetched pages for each of the pages accessed in a session.)

Figure 3.6 Pages Predicted and Prefetched in a User Session


(Figure: recall for User-A and User-B plotted against the length of the user access session, from 10 to 70 minutes.)

Figure 3.7 Recall in Naïve Bayes Approach

Figure 3.7 shows the Recall (Rc) achieved during different browsing sessions based on the users' access patterns. The recall varies between users (e.g. user-A, user-B), because the prediction/prefetch activity depends on the individual browsing behavior during each session. As shown in the graph, when a user has long browsing sessions the repositories contain a large number of tokens, which helps to predict useful hyperlinks and improve performance. The number of links prefetched is governed by the browser idle time, and it varies depending on the user's navigation pattern across pages.
(Figure: precision for User-A and User-B plotted against the length of the user access session, from 10 to 70 minutes.)

Figure 3.8 Precision in Naïve Bayes Approach

Figure 3.8 shows the Precision (Pc) achieved during different browsing sessions based on the users' access patterns. When the user-accessed repository collects tokens that effectively reflect the user's interest, the prefetched pages are useful to the user. During the browsing sessions, users searched for information related to a specific topic, which helped to utilize the repository effectively for generating the predictions. If the user randomly accesses pages without looking for specific information, the usage of prefetched pages drops drastically, leading to poor performance.

3.4.1.2 Fuzzy Logic Approach

The performance of the fuzzy logic approach is also analyzed using the Recall and Precision metrics. This approach uses two repositories, user-accessed and predicted-unused, to guide the prediction engine in generating the predictions. The predicted-unused repository provides feedback to the system regarding the unused hyperlinks, which helps to fine-tune the links suggested for prefetching.

(Figure: recall for User-A and User-B plotted against the length of the user access session, from 10 to 70 minutes.)

Figure 3.9 Recall in Fuzzy Logic Approach

Figure 3.9 shows the Recall (Rc) achieved during the browsing sessions for the fuzzy logic approach. As indicated in the graph, when the user has a long browsing session the recall improves, which helps to reduce the user access latency. Since links are suggested based on inference from both repositories (user-accessed and predicted-unused), the majority of the prefetched links are useful in satisfying the user requests.


(Figure: precision for User-A and User-B plotted against the length of the user access session, from 10 to 70 minutes.)

Figure 3.10 Precision in Fuzzy Logic Approach

Figure 3.10 indicates the Precision (Pc) achieved during the browsing

sessions for fuzzy logic approach. The use of predicted-unused repository helps

to identify tokens of hypertext that are of less interest to users and eliminate them

from being used for computing the predictions. As a result, the hyperlinks

predicted and prefetched will closely match the user interests thus minimizing

the number of prefetched links left unused by users.

The performance of fuzzy logic approach is compared with various

approaches by altering the number of links to be prefetched during browser idle

time. Top-down approach by Markatos and Chronaki (1998) and Bigrams in link

approach by Georgakis and Li (2006) are used for comparison. It is also

compared with Naïve Bayes approach discussed in section 3.2.


(Figure: recall of the Top-down, Naïve Bayes, Bigrams-Link and Fuzzy Logic approaches for 2 to 10 prefetched links.)

Figure 3.11 Comparison of Recall in various approaches

Figure 3.11 compares the Recall achieved by the various approaches for different numbers of prefetched hyperlinks. The number of links to be prefetched during browser idle time is varied between 2 and 10 to analyze the effectiveness of each approach in satisfying the user requests; when more links are prefetched, user requests are satisfied more easily. As indicated in the graph, fuzzy logic performs better than the other approaches in all cases. The Naïve Bayes approach is better than the Top-down approach and comes close in performance to the Bigrams in link approach. The reason the fuzzy logic approach produces better results is that it refines the generation of predictions based on the user's browsing pattern.


(Figure: precision of the Top-down, Naïve Bayes, Bigrams-Link and Fuzzy Logic approaches for 2 to 10 prefetched links.)

Figure 3.12 Comparison of Precision in various approaches

The Precision (Pc) achieved by the various approaches is shown in Figure 3.12. As shown in the graph, when a minimal number of links is prefetched (i.e. 2 to 4) the user requests are only moderately satisfied, resulting in precision in the range 0.3 to 0.55 across the approaches. Precision in the range 0.4 to 0.65 is achieved when more than four links are prefetched, with an upper bound of about eight links. Prefetching more than eight links still satisfies the user requests, but the number of unused links grows, resulting in wasted resources and a poor precision rate. The fuzzy logic approach provides better performance in all cases when compared to the other approaches.

3.5 CONCLUSION

This chapter discussed web prefetching based on the information associated with hyperlinks present in web pages. Two approaches, Naïve Bayes and Fuzzy Logic, were designed to compute the priority value of hyperlinks using hypertext information. The Naïve Bayes approach used only the token information stored in the user-accessed repository for computing the priority of hyperlinks, while the fuzzy logic approach used the token information stored in both the user-accessed and predicted-unused repositories. This helps to generate effective predictions during browsing sessions, thereby efficiently minimizing the user perceived latency.

The predictions are generated dynamically for each new web page visited by the user. Experimental results indicate that the proposed approaches generate efficient predictions that mitigate the latency. The Recall and Precision metrics clearly demonstrate the advantage of the fuzzy logic and Naïve Bayes approaches over the existing algorithms.

CHAPTER 4

PRECEDENCE GRAPH BASED WEB PREDICTION

4.1 INTRODUCTION

Prefetch systems are designed to generate web predictions based on criteria such as access patterns, object popularity and the structure of accessed web documents. This chapter discusses a prediction algorithm that builds a Precedence Graph (PG) by analyzing user access patterns to generate predictions that reflect the user's future requests. It uses the object URI and the referrer in each user request to build a precedence relation between the requested web object and its source. The algorithm differentiates the dependencies between objects of the same web page and objects of different web pages, in order to incorporate the characteristics of current websites when building the graph. It uses a simple data structure for implementing the graph that requires minimal memory and computational resources. When the user requests a new web page, the request is satisfied from the contents of the prefetch cache whenever possible, which reduces the access latency observed by the user.

The prediction engine located at the web server is responsible for building the Precedence Graph and generating the predictions. It gathers information related to user requests for the web pages stored on the server to build the graph. The output of the prediction engine is the hint list (predictions), a set of URIs that are likely to be requested by the user in the near future. The prefetching engine located at the client receives the predictions from the server and uses them to download web objects and store them in the prefetch cache maintained at the client. The Precedence Graph is updated dynamically with new nodes and arcs when the user requests new web objects, which ensures that the predictions are generated based on the most recent information maintained in the graph. The aggressiveness of prefetching is controlled by a 'threshold' parameter that is applied to the confidence value of the arcs in the graph when generating the predictions.

Precedence Graph is implemented as a directed graph to represent the

unidirectional relationship between the predecessor and successor nodes. The

predecessor-successor relationship is integrated into the graph on a per-request

basis by either inserting a new arc (when predecessor-successor relationship is

discovered for the first time) between the nodes or updating the occurrence count

of an existing arc (when relationship is already observed) in the graph.

Figure 4.1 shows the basic interaction between a web client and a web server when prefetching is enabled, with the server holding the prediction engine and the client performing the prefetching operation. When the user clicks a URL in a displayed web page or types a new URL in the web browser, an HTTP GET request is sent by the browser to the server to fetch the object. On receiving the request, the web server establishes a connection with the prediction engine and asks it to generate the predictions. The prediction engine generates the hint list for the request, which the web server sends to the client in the HTTP response by including response headers (e.g. Link: <P2>; rel: prefetch) that indicate the links to be prefetched.

(Figure: sequence diagram between the web browser, the web server and the prediction engine. The user requests page P1 with an HTTP GET; the server asks the prediction engine, which uses the Precedence Graph to generate hints P2 and P3; the 200 OK response for P1 carries the headers Link: <P2>; rel: prefetch and Link: <P3>; rel: prefetch. During browser idle time P2 is prefetched (GET P2, 200 OK) into the prefetch cache, so that a later user request for P2 is served from the prefetch cache.)

Figure 4.1 Browser & Server interaction with Prediction and Prefetching

When the web browser is idle, it uses the hints received in the HTTP response to download the corresponding web objects and store them in the prefetch cache. If the user then requests a new web page (e.g. P2) that is available in the prefetch cache, it is served immediately without additional network latency.
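As an illustration of how a client might recover these hints, the short Python sketch below scans the Link headers of a response for rel=prefetch entries. The header format follows the example shown above, and the parsing is deliberately simplified.

import re

def extract_prefetch_hints(link_headers):
    # link_headers: list of raw Link header values, e.g. '<P2>; rel: prefetch'
    hints = []
    for header in link_headers:
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel\s*[:=]\s*"?prefetch"?', header)
        if match:
            hints.append(match.group(1))
    return hints

# extract_prefetch_hints(['<P2>; rel: prefetch', '<P3>; rel: prefetch'])
# returns ['P2', 'P3']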

The number of links that can be prefetched is bounded by the

following factors: a) availability of total bandwidth b) time the user spends in

each page and c) size of visited web pages. Prefetching algorithm should aim to

balance the cache-hit ratio and usefulness against the bandwidth overhead when

anticipating future requests.

4.2 PRECEDENCE GRAPH

The objectives of building Precedence Graph to generate the

predictions are:

· To build the graph with minimal number of arcs that will reduce

the computation time.

· To generate effective predictions in quick time.

· To manage the graph size within controllable limits by

periodically subjecting the graph to trimming of nodes and arcs.

4.2.1 Introduction

The prediction algorithm builds a Precedence Graph (PG) that represents the user access patterns, with nodes representing web objects and arcs representing the relations between them. An arc is added between two nodes when a user request reports both the requested object (successor node) and the source object (predecessor node) from which the request was generated. Each arc has a transition weight associated with it that represents the transition confidence from the predecessor to the successor node.

When the user requests a web object, the request information is used to update the graph in two ways: 1) a new node is added to represent the web object, together with a new arc representing the relationship between the new node and an existing node in the graph, or 2) the existing node and arc are updated by incrementing their occurrence counts. The updated graph generates predictions for the user request by analyzing the nodes and arcs related to the node representing the requested web object; arcs whose transition confidence exceeds the threshold value are considered when generating the predictions. The prefetching engine uses the generated predictions to download web objects from the server during browser idle time.

Web access log files are filtered to select the appropriate HTTP method (i.e. GET) and HTTP response codes (200 – OK, 304 – Not Modified, 206 – Partial Content) before they are used for building the Precedence Graph. HTTP headers offer accurate information, since they are explicitly provided by the web client and server.
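A sketch of this filtering step is shown below in Python; the assumed log line layout is the combined log format, in which the request line, referrer and user agent are quoted fields, and the helper name is chosen only for illustration.

ALLOWED_STATUS = {"200", "304", "206"}   # OK, Not Modified, Partial Content

def usable_entries(log_lines):
    # yield (uri, referrer) pairs for GET requests with acceptable response codes
    for line in log_lines:
        fields = line.split('"')
        if len(fields) < 6:
            continue                      # malformed or truncated line
        request, status_part, referrer = fields[1], fields[2], fields[3]
        parts = request.split()
        if len(parts) < 2:
            continue
        method, uri = parts[0], parts[1]
        status = status_part.split()[0]
        if method == "GET" and status in ALLOWED_STATUS:
            yield uri, referrer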



4.2.2 Building the Graph

The algorithm for building Precedence Graph is as follows:

Input:
· User requested object (URL)
· Referrer in the user request
· Object type (primary/secondary)
Output: updated Precedence Graph
Step 1: Adding new node (or) updating existing node
‘x’ → a node in the graph
Find ‘x’ that matches with the user requested object
If ‘x’ available, then
occurrence (x) ← occurrence (x) + 1 // count updated
Else
‘x’ newly created // represents the requested object
occurrence (x) =1
End if
Step 2: Adding new arc (or) updating existing arc
‘y’ → a node in the graph
Find ‘y’ that matches with the referrer in user request
If ‘y’ available, then {
Find the arc ‘yx’ // transition from node y to x (y → x)
If ‘yx’ available, then
occurrence (yx) ← occurrence (yx) + 1
Else
‘yx’ newly created // arc from y to x
occurrence (yx) = 1
End if
} Else
No arc gets added or updated in the graph
End if
Step 3: Compute transition confidence of all arcs from node ‘y’
arc transition confidence ← arc occurrence / ‘y’ occurrence
Return Precedence Graph
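The algorithm above can be rendered in Python roughly as follows; the node and arc records are plain dictionaries, and whether a requested object is primary or secondary is assumed to be known to the caller (e.g. from its file extension). This is a sketch of the update logic, not the thesis implementation.

def update_graph(graph, uri, referrer, is_primary):
    # graph: dict mapping node URI -> {"count": int, "type": str, "arcs": dict}
    node = graph.setdefault(uri, {"count": 0,
                                  "type": "primary" if is_primary else "secondary",
                                  "arcs": {}})
    node["count"] += 1                                    # Step 1: add or update node

    source = graph.get(referrer)
    if source is not None:                                # Step 2: add or update arc y -> x
        arc = source["arcs"].setdefault(uri, {"count": 0, "confidence": 0.0})
        arc["count"] += 1
        for a in source["arcs"].values():                 # Step 3: recompute confidences
            a["confidence"] = a["count"] / source["count"]
    return graph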

The algorithm analyzes the user access patterns to build the Precedence Graph used for predicting the user's future requests. Each node in the graph represents a user requested web object, with an initial count value of 1; when the user requests the same web object again, the node's occurrence count is incremented. Each arc in the graph represents a transition from one web object to another. For each user requested web object, an arc is drawn with an initial count value of 1 if its source node (predecessor) is specified in the request. The arc reflects the precedence relation that exists between the source object (predecessor node) and the newly requested object (successor node). When the user repeats the request for the same web object through the same source, the existing arc that reflects this relation is incremented.

Example: Consider a user who requests a new web object 'B' through a source object 'A'. A precedence relation is established between the objects by drawing an arc from A to B (i.e. object A precedes object B, A → B) with an initial count value of 1. When the user again requests object B through object A, the existing arc count is incremented by 1 for each reference.

The source of each requested web object is given by the HTTP referrer that is recorded for each user request in the log file. The referrer information is used to create an arc from the source object (predecessor) to the requested object (successor).

(Figure: a sample graph with primary nodes P1.html, P2.html and P3.html, secondary nodes P1.gif and P2.jpg, and primary and secondary arcs connecting them.)

Figure 4.2 Precedence Graph

Each web page contains a main object (html), referred to as the primary object, and several embedded objects (jpg, png), referred to as secondary objects. The primary nodes in the graph represent main objects that are demand requested by users, while secondary nodes represent embedded objects that are requested by the web browser.

Figure 4.2 shows a sample Precedence Graph with primary and secondary nodes. Arcs connecting two primary nodes are termed primary arcs, and those connecting a primary and a secondary node are termed secondary arcs. As shown in Figure 4.2, the objects P2.html and P1.gif are requested from P1.html, the objects P2.jpg and P3.html are requested from P2.html, and P1.html is requested from P3.html.

4.2.3 Updating the Graph

Initially the Precedence Graph is empty; it is built and updated through continuous learning of the user access patterns. Each node in the graph contains the object URI, the node type (primary/secondary), an occurrence count and a list of primary/secondary arcs. Each arc in the graph contains the destination URI, the arc type (primary/secondary), an occurrence count and a transition confidence. The node occurrence count represents the number of user requests for the web object represented by that node. The arc occurrence count represents the number of transitions from the predecessor to the successor node. The arc transition confidence is computed by dividing the arc occurrence count by the predecessor node's occurrence count.

The graph is dynamically updated whenever user requests web objects

and it involves the following steps:

a) Node occurrence count is incremented if it represents the requested

web object; else new node is created and added to the graph with

an initial occurrence count of 1 to represent the web object.

b) Arc occurrence count is incremented if it represents the transition

from source object (predecessor) to the requested object

(successor); else new arc is created between the nodes and added to

the graph with an initial occurrence count of 1 to represent the

transition.

The graph will grow in size during its learning process; this growth is controlled by removing nodes and arcs that least represent the user's interests and do not influence the prediction process in generating the hints.

4.2.4 Predictions from the Graph

When a user requests a web page through the web browser, the primary object of the web page is requested first, and then the secondary objects are requested either from the server or from the local cache. For each web page, the perfect prediction algorithm (de la Ossa et al 2009) reports three types of hints: a) the primary object of the next web page to be requested by the user, b) the secondary objects associated with the next web page, and c) further next pages. The proposed prediction algorithm generates hints for a web object by analyzing nodes in the Precedence Graph. If a node representing the requested web object is available in the graph, its associated arcs are analyzed to generate the predictions; otherwise no predictions are generated for the web object. The prediction engine needs to provide hints to the browser as a list sorted according to their probability, so that the most relevant pages are prefetched first.

Steps for generating the predictions are as follows:

1. For each user requested web object, find a primary node in the graph that represents the object.

2. If a matching primary node is available, analyze all the primary arcs associated with that node. Select the arcs having a transition confidence greater than or equal to the specified threshold.

3. Collect the object URIs stored in the primary nodes that act as successor nodes to the arcs selected in step 2. Arrange the object URIs based on their confidence value (highest to lowest) and add them to the prediction list.

4. Analyze the secondary arcs associated with the primary nodes used in step 3. Select the arcs having a transition confidence greater than or equal to the specified threshold.

5. Collect the object URIs stored in the secondary nodes that act as successor nodes to the arcs selected in step 4. Arrange the object URIs based on their confidence value (highest to lowest) and add them to the prediction list.

6. The final prediction list contains object URIs from both primary and secondary nodes of the graph. The prefetching engine uses the list to download web objects from the server. Go to step 1 if the user requests a new web object.
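A sketch of how these steps might look in code, reusing the illustrative PrecedenceGraph structure from the earlier sketch; the function name and the default threshold value are assumptions made for illustration.

def generate_predictions(graph, requested_uri, threshold=0.5):
    """Return a prediction (hint) list of (URI, confidence), sorted by confidence."""
    node = graph.nodes.get(requested_uri)
    if node is None or node.node_type != 'primary':
        return []                                    # step 1: no matching primary node

    # Steps 2-3: primary arcs whose transition confidence meets the threshold
    primary_hits = [a for a in node.arcs.values()
                    if a.arc_type == 'primary' and a.confidence >= threshold]
    primary_hits.sort(key=lambda a: a.confidence, reverse=True)
    hints = [(a.dest_uri, a.confidence) for a in primary_hits]

    # Steps 4-5: secondary arcs of the predicted primary nodes
    secondary_hits = []
    for a in primary_hits:
        succ = graph.nodes.get(a.dest_uri)
        if succ is None:
            continue
        secondary_hits.extend(s for s in succ.arcs.values()
                              if s.arc_type == 'secondary' and s.confidence >= threshold)
    secondary_hits.sort(key=lambda s: s.confidence, reverse=True)
    hints.extend((s.dest_uri, s.confidence) for s in secondary_hits)

    # Step 6: the final list is handed to the prefetching engine
    return hints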

Predictions (Hints) generated at server can be provided to the client in

three different ways:

1. In a response HTTP header:
   Link: <one.html>; rel=prefetch

2. In a 'meta' tag in the HTML header:
   <meta HTTP-EQUIV="Link" CONTENT="<one.html>; rel=prefetch">

3. In a 'link' tag in the HTML body:
   <link rel="prefetch" href="one.html">

Figure 4.3 shows a sample HTTP header that supplies the referrer information to the server. The referrer is recorded in the access log file at the server and is used by the prediction engine to build the graph.

Request URL      : http://www.rediff.com/getahead
Request Method   : GET
Status Code      : HTTP/1.0 200 OK
Accept           : text/html, application/xhtml+xml, application/xml
Host             : www.rediff.com
Proxy-Connection : keep-alive
Referer          : http://www.rediff.com/sports      ← referer of the requested URL

Figure 4.3 Sample HTTP header with referer information

Figure 4.4 shows sample HTTP response from server that includes the

link to be prefetched during idle time. It is used by the client to download the

web object.

4.2.5 Prefetching the Web Objects

A prefetching engine that is integrated into the web browser normally prefetches URLs provided over the HTTP protocol without any embedded objects. It will not prefetch URLs that contain parameters (queries). For example, Mozilla Firefox, an open source web browser with web prefetching capabilities, recognizes hints included in the response HTTP headers or embedded in the HTML file to perform prefetching (Fisher 2003).

Content-Encoding : gzip
Content-Length   : 10933
Content-Type     : text/html
Date             : Sat, 24 Nov 2012 08:16:14 GMT
Proxy-Connection : keep-alive
Server           : Apache
Vary             : Accept-Encoding
Via              : 1.0 localhost:8080 (squid/2.6.STABLE6)
X-Cache          : MISS from localhost
X-Cache-Lookup   : MISS from localhost:8080
Link             : <http://www.rediff.com/getahead/img1.jpg>; rel="prefetch"      ← link to be prefetched

Figure 4.4 Sample HTTP response with link to be prefetched

In our approach, the prefetching engine located at the client receives predictions from the server via the HTTP response header and uses them to prefetch web objects during browser idle time. This helps to avoid interference with regular user requests. The object MIME type in the HTTP response header is used to determine whether a web object is eligible for caching. The downloaded objects are stored in a prefetch cache that is maintained separately from the regular cache to improve the hit rate. A web object is not prefetched if it already exists in either the regular or the prefetch cache. When the user demand requests a new web page, any prefetching activity in progress is terminated. When a prefetching mechanism is not available in the client machine, user requests are served from either the local (regular) cache or the web server.
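A minimal sketch of this client-side prefetching loop is given below; the function names, cache dictionaries and idle-time callback are illustrative assumptions, not the actual browser integration.

import urllib.request

CACHEABLE_TYPES = ('text/html', 'text/css', 'image/', 'application/javascript')

def prefetch_hints(hints, regular_cache, prefetch_cache, browser_is_idle):
    """Prefetch hinted URIs during browser idle time (illustrative sketch)."""
    for uri, _confidence in hints:
        if not browser_is_idle():
            break                                     # a demand request arrived: stop prefetching
        if uri in regular_cache or uri in prefetch_cache:
            continue                                  # already cached, do not prefetch again
        with urllib.request.urlopen(uri) as resp:
            mime = resp.headers.get('Content-Type', '')
            if not mime.startswith(CACHEABLE_TYPES):  # MIME type decides cacheability
                continue
            prefetch_cache[uri] = resp.read()         # store in the separate prefetch cache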

The prefetching engine should not retrieve a web page that will be accessed only after a long time, since at the time of access the prefetched copy may contain stale data. User perceived latency can be significantly reduced by prefetching more pages, but prefetch accuracy diminishes if the prefetched pages are not referenced by users.

Table 4.1 Sample user requests in a session

Requested URL Referrer for the request


/P1.html -
/P1.gif /P1.html
/P1.jpg /P1.html
/P2.html /P1.html
/P2.jpg /P2.html
/P3.html /P2.html
/P3.jpg /P3.html
/P3.gif /P3.html
/P1.html /P3.html
/P1.gif /P1.html
/P1.jpg /P1.html
/P4.html /P1.html
/P4.png /P4.html
/P4.jpg /P4.html
/P5.html /P4.html
/P1.html -
/P1.gif /P1.html
/P1.jpg /P1.html
/P2.html /P1.html

4.2.6 Implementation Example

Table 4.1 shows the sample user requests in a session that are used to illustrate the working of the proposed prediction algorithm. The web requests in Table 4.1 are used to build the Precedence Graph shown in Figure 4.5. Primary and secondary nodes in the graph are represented with their object URIs and occurrence counts, and primary and secondary arcs are represented with their occurrence counts.

[Figure: Precedence Graph built from Table 4.1 — node occurrence counts: P1.html (3), P2.html (2), P3.html (1), P4.html (1), P5.html (1), P1.gif (3), P1.jpg (3), P2.jpg (1), P3.jpg (1), P3.gif (1), P4.png (1), P4.jpg (1); arcs labelled with their occurrence counts]
Figure 4.5 Precedence Graph built using the user requests

Figure 4.6 represents the adjacency map implementation of the Precedence Graph, in which the primary nodes of the graph are stored as keys in the map and the primary/secondary arcs originating from each node are stored as a list associated with the corresponding key. Secondary nodes of the graph are not stored as keys in the map, since in most cases they do not act as the source of a new web object. Each element in the list is shown with three fields: object URI, arc occurrence count and arc transition confidence.

Key            Value (object URI, arc occurrence count, arc transition confidence)
P1.html (3)    [P1.gif, 3, 1], [P1.jpg, 3, 1], [P2.html, 2, 0.6], [P4.html, 1, 0.3]
P2.html (2)    [P2.jpg, 1, 0.5], [P3.html, 1, 0.5]
P3.html (1)    [P3.jpg, 1, 1], [P3.gif, 1, 1], [P1.html, 1, 1]
P4.html (1)    [P4.png, 1, 1], [P4.jpg, 1, 1], [P5.html, 1, 1]
P5.html (1)    —

Figure 4.6 Adjacency Map for the Precedence Graph

Arc Transition Confidence = arc occurrence count / node occurrence count

Example:

The arc transition confidence of objects with reference to primary node P1.html is:
P1.gif  = 3/3 = 1
P2.html = 2/3 ≈ 0.6
P4.html = 1/3 ≈ 0.3

Table 4.2 represents the hints generated for user requests based on the

information maintained in the Precedence Graph shown in Figure 4.5.

Table 4.2 Hints generated for user requests

User Request    Hints with Confidence value
/P1.html        /P2.html (0.6), /P2.jpg (0.3), /P4.html (0.3), /P4.png (0.3), /P4.jpg (0.3)
/P2.html        /P3.html (0.5), /P3.jpg (0.5), /P3.gif (0.5)
/P3.html        /P1.html (1), /P1.gif (1), /P1.jpg (1)
/P4.html        /P5.html (1)

The prediction algorithm provides both primary and secondary objects

as hints. But compared to secondary objects, the primary objects provide more

page latency savings due to the following factors:

a) Service time of primary objects is much longer than that of

secondary objects.

b) Web browsers use a single connection to request primary objects, whereas they use two parallel connections simultaneously to request secondary objects.

c) Secondary objects are requested by the browser only after the primary object is received and parsed.



4.3 GRAPH TRIMMING

The prediction algorithm dynamically builds the Precedence Graph by constantly updating it with request information whenever users access web pages. As the graph grows over time, its demand for computational resources increases. The information stored in the graph becomes obsolete due to the following factors: a) the user's access patterns change because their topics of interest change, and b) web pages previously accessed by the users may be removed from the website or referred to by a different URL. In these cases, the occurrence counts of nodes and arcs are no longer updated and remain at their old values. When the graph contains such obsolete information, it wastes memory and computational resources of the prediction engine by generating useless predictions that degrade system performance. This problem can be solved by periodically trimming the graph to remove nodes and arcs that least represent the users' interests.

When designing the algorithm that performs the trimming operation, the following issues need to be considered: a) it should not increase the resource consumption of the prediction algorithm when trimming is performed, and b) it should not affect the prediction accuracy of the algorithm by removing useful nodes and arcs from the graph. The trimming operation analyzes the entire graph to cover all its nodes and arcs. Nodes are removed from the graph based on their popularity (number of accesses) and access time. If a node does not reach its minimum popularity or has not been accessed for a long time, it is removed from the graph. Nodes whose popularity is greater than the prescribed threshold, or that have been accessed recently, are retained in the graph. Table 4.3 lists the notations used in the trimming algorithm.

Table 4.3 Notations used in Trimming Algorithm

Variable Meaning

T_C Time Counter

T_Th Trimming Threshold


n_occ_th Node occurrence Threshold

arc_th Arc occurrence threshold

arc_occ Arc occurrence count

n_occ Node occurrence count

Time_Diff Threshold to decide removal of node from graph

n_a_t Node access time

4.3.1 Invoking Trimming Operation

The time counter and trimming threshold are used to decide when the trimming algorithm is invoked. The interval duration that is added to the time counter to obtain the trimming threshold is set by the user, and it should be chosen so that it does not affect the performance of the system. The procedure used for invoking the graph trimming algorithm is as follows:



Initialization:
T_C = node access time
T_Th = T_C + Interval Duration
Step 1: Updating the Graph and node access time
While (T_C < T_Th) {
// add new node or arc; else increment count of node or arc
Update the graph {
Increment node/arc occurrence count; (OR)
Add new node/arc;
}
T_C = node access time; // Updated with new access time
}
Step 2: Invoking Trimming operation on the Graph
If (T_C ≥ T_Th) {
// Access time greater or equal to threshold, perform trimming
Invoke Trimming algorithm;
}
Step 3: Reset the counters
After trimming is completed, the counters are reset:
T_C = 0;
T_Th =0;
Access time in Node =0; // reset in all the nodes
Go to Step 1 to restart the activity with:
T_C = New access time
T_Th = New threshold value

Time Counter (T_C)

It is used to keep track of the current access time, whenever the graph

is updated with user request information. The graph gets updated in two ways: a)

addition of new node or arc to reflect the access information b) increment the

occurrence count of existing node or arc.

Trimming Threshold (T_Th)

It is used to decide when the graph is subjected to trimming operation.

During trimming, the nodes and arcs of graph will be analyzed and based on their

occurrence count and access time, suitable action will be taken.

T_Th = T_C (Initial) + Interval Duration

The trimming threshold value is the sum of the interval duration specified by the user and the initial value that is set in the time counter before the graph updating activity starts. Whenever the time counter is initialized with a new access time, the trimming threshold takes a new value that acts as the deadline for invoking the trimming operation. The interval duration is configurable and is set by the user depending on the requirement.

The value of the interval duration needs to be selected carefully so that it does not degrade the performance of the prediction algorithm. If the interval duration is too long, the graph will contain outdated information that is of no use to the user when given as predictions. If the interval duration is too short, the graph is trimmed frequently, which wastes computational resources and may also remove useful information from the graph.

4.3.2 Trimming Algorithm

The trimming algorithm is invoked when the time counter (T_C) value reaches the trimming threshold (T_Th) value.

Each node in the graph maintains a node access time (n_a_t) that is updated with the current access time whenever the node is requested. The node access time is compared with the time counter value to determine how recently the particular node was accessed, and this difference is compared with the threshold value (Time_Diff) to decide on the action to be performed on the node.

T_C = access time (most recent value)
n_a_t = node access time (last accessed value)

If [T_C – n_a_t] is greater than or equal to the threshold (Time_Diff) value, the particular node needs to be removed from the graph since it has not been accessed for a long time.

If [T_C – n_a_t] is less than the threshold (Time_Diff) value, the arcs associated with the particular node are analyzed. Based on the arc threshold, its primary and secondary arcs are selected for removal from the graph.

The algorithm that performs trimming on the graph is as follows:

If ([T_C – n_a_t] ≥ Time_Diff) {
    // node not accessed for a long time, needs to be removed
    If (node has no outgoing links)
        Remove the particular node;
    Else {
        Remove all its secondary arcs and nodes;
        Remove all its primary arcs;
        Remove primary nodes if they meet the criteria;
        Remove the particular node;
    }
    Remove all incoming arcs to this node;
}
Else {
    // node recently accessed, analyze its arcs
    If (arc_occ < arc_th) {
        If (secondary arc)
            Remove secondary node and arc;
        Else {
            Remove primary arc;
            Remove primary node if it meets the criteria;
        }
    }
    Else
        Remove the particular node if its popularity is low;
}
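A rough Python sketch of this trimming pass over the illustrative PrecedenceGraph used earlier is shown below; the threshold names mirror Table 4.3, while the exact removal criteria for shared primary/secondary nodes are simplified assumptions.

def trim(graph, t_c, time_diff, n_occ_th, arc_th):
    """Remove stale or unpopular nodes and arcs (illustrative sketch)."""
    for uri in list(graph.nodes):
        node = graph.nodes.get(uri)
        if node is None:
            continue                                  # node already removed in this pass
        if t_c - node.access_time >= time_diff:
            # Node not accessed for a long time: drop it, its secondary successors
            # and every incoming arc that points at it
            for dest in list(node.arcs):
                succ = graph.nodes.get(dest)
                if succ is not None and succ.node_type == 'secondary':
                    del graph.nodes[dest]
            del graph.nodes[uri]
            for other in graph.nodes.values():
                other.arcs.pop(uri, None)
        else:
            # Node recently accessed: prune only weak arcs and unpopular nodes
            for dest, arc in list(node.arcs.items()):
                if arc.occurrence < arc_th:
                    del node.arcs[dest]
                    succ = graph.nodes.get(dest)
                    if succ is not None and succ.node_type == 'secondary':
                        del graph.nodes[dest]
            if node.occurrence < n_occ_th:
                del graph.nodes[uri]                  # unpopular node removed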

Example:

For the illustration, consider the following initializations to the various

variables used in the algorithm.

Time Counter (T_C) = 0;

Interval Duration = 1 hr (3600 Sec)

Trimming Threshold (T_Th) = T_C + Interval Duration

= 0 + 3600

= 3600

Time Difference (Time_Diff) = 30 min (1800 Sec)

Node occurrence Threshold (n_occ_th) = 50

Arc occurrence Threshold (arc_th) = 20

The time counter is initialized to zero and is updated with a new access time whenever the graph is accessed. The trimming operation is performed at intervals of one hour. The minimum number of node occurrences is 50 and the minimum number of arc occurrences is 20.

In the graph shown in Figure 4.7, the meaning of the following variables is:

· n_occ = node count

Incremented each time the node is accessed

· n_a_t = node access time (in seconds)

Updated with new access time when the node is accessed


[Figure: Precedence Graph before trimming — primary nodes P1.html (n_occ = 100, n_a_t = 3600), P2.html (n_occ = 70, n_a_t = 2854), P3.html (n_occ = 100, n_a_t = 2500), P4.html (n_occ = 60, n_a_t = 2850), P5.html (n_occ = 20, n_a_t = 1700); secondary nodes P1.gif, P1.jpg, P2.gif, P3.jpg (n_occ = 100 each); arcs labelled with their occurrence counts]
Figure 4.7 Precedence Graph before Trimming

Figure 4.7 represents the sample Precedence Graph considered for demonstrating the trimming operation. Each primary node in the graph is represented with information such as object URI, node occurrence count and current access time. Each secondary node is represented with information such as object URI and node occurrence count. Primary and secondary arcs are represented with their occurrence counts. The access time of secondary objects is the same as that of their primary object, because in most cases they are accessed whenever the primary object is requested by the user.



Let us consider that the time counter (T_C) reached its threshold limit;

i.e. T_C = T_Th (T_C = 3600). The graph is now subjected to trimming

operation.

The trimming done on the graph is as follows:

1. The analysis starts with node containing P1.html.

n_occ > n_occ_th // node count greater than threshold

n_a_t = T_C // node access time equal to time counter

P1.html will be retained in the graph. Now analyze its primary and

secondary arcs for further action.

2. P1.html has two secondary arcs leading to P1.gif and P1.jpg. In both

the nodes,

n_occ > n_occ_th // node count greater than threshold

arc_occ > arc_th // arc count greater than threshold

P1.gif and P1.jpg will be retained in the graph.

3. Consider primary arc from P1.html to P3.html.

arc_occ < arc_th // arc count less than threshold

Arc will be removed from the graph.

For the node P3.html, its

n_occ > n_occ_th // node count greater than threshold

(T_C – n_a_t) < Time_Diff // difference in time less than Threshold

P3.html will be retained in the graph.



4. Consider primary arc from P1.html to P5.html.

arc_occ < arc_th // arc count less than threshold

Arc will be removed from the graph.

For the node P5.html, its

n_occ < n_occ_th // node count less than threshold

(T_C – n_a_t) > Time_Diff // time difference greater than threshold

P5.html will be removed from the graph.

5. Consider primary arc from P1.html to P2.html.

arc_occ > arc_th // arc count greater than threshold

Arc will be retained in the graph.

For the node P2.html, its

n_occ > n_occ_th // node count greater than threshold

(T_C – n_a_t) < Time_Diff // difference in time less than Threshold

P2.html will be retained in the graph.

After the trimming operation is completed, the access time (n_a_t) of all the nodes is initialized to 0. The node and arc occurrence counts are reduced to 10% of their original values, so that an accurate analysis is carried out in the next interval.

Figure 4.8 represents the Precedence Graph after the trimming operation is performed, with the access times of the nodes set to 0 and the count values reduced to 10% of their original values.


[Figure: Precedence Graph after trimming — remaining nodes P1.html (n_occ = 10), P2.html (n_occ = 10), P3.html (n_occ = 7), P4.html (n_occ = 6), P1.gif (n_occ = 10), P1.jpg (n_occ = 10), P2.gif (n_occ = 10), P3.jpg (n_occ = 10); access times reset to 0 and counts reduced to 10% of their original values]
Figure 4.8 Precedence Graph after Trimming

4.4 EXPERIMENTAL ENVIRONMENT

This section discusses the experiments conducted for evaluating the

proposed algorithm and the workload characteristics used for building the graph.

4.4.1 Experimental Setup

The experimental setup comprises a web server with a prediction engine and a client with a prefetching engine. The web server builds the Precedence Graph using user access patterns and then generates predictions for user requests. The client receives predictions from the web server and uses them to download web objects during browser idle time. To simulate a group of users accessing the web server for information, real web traces are fed to a client that uses a prefetching-enabled web browser. The time interval between two successive web requests is computed using the timestamp values recorded in the log file to mimic actual client behavior. Each user request and its response are recorded in a log file during the simulation. The log file is analyzed after completion of the simulation to compute the performance metrics (Precision and Recall) of the system. The prediction algorithm constantly learns the user's access patterns during the experiments, thus guaranteeing that the knowledge of the algorithm gets updated whenever the patterns change.

4.4.2 Training Data

The training data is crucial in correctly predicting the user requests,

since most of the web prefetching techniques use part of user access sequence in

constructing the prediction model before using it to generate the predictions. If

training data has few access requests, then relevant user requests will be missed

resulting in poor representation of browsing characteristics. If training data

includes excessive user accesses, then it may contain some outdated user access

patterns and browsing information.

Web access log files record users’ access patterns during website

navigation. The log files can be maintained at client, server or proxy in the web

architecture. Several research initiatives used log files maintained at web server

as its main data source for experimentation.



The log files for experimentation are collected from our institutional web server that is maintained to provide academic related information such as news articles, admission details, course details, examination details etc. to the faculty members and students. Most web pages maintained on the server are static, with only a minimal percentage of dynamic pages.

The log file includes information such as the requested URLs, request time, object type, an identifier assigned to the IP address of the user requesting the URL, and the elapsed time for serving the request. Table 4.4 lists the important fields in a log file entry with a description of their function.

Table 4.4 Important fields in a log file entry

Field Description
192.168.10.1 Client IP address that made the request

10/Oct/2011:10:12:45 Timestamp of visit as seen by the server

GET Request method

/logo.gif Requested object

HTTP/1.1 Protocol used for request and response

200 HTTP response code - OK

1345 Bytes transferred for the request

http://www.psgtech.edu/ Referrer URL – Source page from where the request was sent

4.4.2.1 Preprocessing Log Files

Web logs are preprocessed to reformat them so that web access sessions can be identified effectively and the information can be used to build the Precedence Graph for generating the predictions. The first task in preprocessing is data cleaning: redundant and useless entries are removed from the web log file, and only valid entries related to the visited web pages are retained.

Entries that are removed from log file during data cleaning operation

are:

· Requests executed by automated programs such as web robots,

spiders and crawlers

· Requests with unsuccessful HTTP status codes

· Request methods other than “GET” method.

The second task in preprocessing is session identification, which segments the long sequence of web requests into individual user access sessions. Each user session consists of a sequence of web pages visited over a period of time. When a user remains idle for more than 30 minutes without making any request, the next request from the same user is considered the start of a new access session.
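As a concrete illustration of this 30-minute rule, the sketch below groups one user's timestamped requests into sessions; the record layout is an assumption based on the log fields described earlier.

SESSION_TIMEOUT = 30 * 60      # 30 minutes, in seconds

def split_sessions(requests):
    """Group one user's (timestamp, url, referrer) records into sessions.

    Illustrative sketch: 'requests' is assumed to be sorted by timestamp.
    """
    sessions, current, last_ts = [], [], None
    for ts, url, referrer in requests:
        if last_ts is not None and ts - last_ts > SESSION_TIMEOUT:
            sessions.append(current)      # idle gap exceeded: close the current session
            current = []
        current.append((ts, url, referrer))
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions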

4.5 RESULTS

This section discusses the performance of the Precedence Graph in generating predictions for the user requests, measured using the Recall and Precision metrics that quantify the efficiency and usefulness of the generated predictions. When the contents of the prefetch cache are used to satisfy a user request, it counts as a prefetch hit; otherwise it is a prefetch miss. The prediction algorithm needs to generate useful hints for each web request to achieve a high hit rate and a reduction in user access latency.

Recall (hit rate) represents the ratio of prefetch hits to the total number of user requests, and it measures the usefulness of the predictions. Precision (accuracy) represents the ratio of prefetched pages requested by the user from the prefetch cache to the total prefetched pages.

Recall = Prefetch Hits / User Requests          Precision = Prefetch Hits / Total Prefetches

The Recall and Precision metrics are measured by varying the prediction threshold, which controls the number of predictions generated for the user requests.

Figure 4.9 represents the Recall achieved for user requests with different prediction thresholds (0.2 to 0.5). When the threshold is 0.5, the number of links recommended as hints from the graph is minimal, resulting in a moderate number of user requests being satisfied using the contents of the cache. For a threshold of 0.2 the Recall achieved is very high, because the graph generates more predictions, allowing it to satisfy a larger number of user requests.


[Figure: Recall (0-1) vs. number of user requests (5000-20000) for prediction thresholds Th = 0.2, 0.3, 0.4 and 0.5]
Figure 4.9 Recall for user requests with different Thresholds

A smaller threshold value achieves a higher hit rate but increases bandwidth consumption by prefetching more objects. If there is adequate availability of bandwidth and hardware resources, a smaller threshold value can be used to achieve a higher hit rate when waiting time is a critical requirement for users.

Figure 4.10 represents the Precision achieved for user requests with different prediction thresholds (0.2 to 0.5). When the threshold is 0.2, more objects are predicted and prefetched to satisfy the user requests. But in real situations, some of the prefetched objects remain unused by users, resulting in low precision. A balanced performance is achieved with a threshold of 0.5, which generates an accurate and moderate number of predictions per request.


[Figure: Precision (0-0.8) vs. number of user requests (5000-20000) for prediction thresholds Th = 0.2, 0.3, 0.4 and 0.5]
Figure 4.10 Precision for user requests with different Thresholds

The performance of the Precedence Graph is compared with existing algorithms such as DG – Dependency Graph (Padmanabhan and Mogul 1996) and DDG – Double Dependency Graph (Domenech et al 2006). The Precedence Graph (PG) provides performance similar to that of DG and DDG with lower computational requirements. The resource requirement of a graph depends on the number of nodes and arcs that constitute it. In all three algorithms (PG, DG and DDG), a node is created and added to the graph for each object requested by the user; the difference lies in the number of arcs added between the nodes in the graph.

Figure 4.11 represents the number of arcs built in a graph by the different algorithms based on the user requests. It clearly indicates that the Precedence Graph (PG) has fewer arcs than the existing methods (DG and DDG). PG adds an arc between nodes based on the precedence relation inferred from the request stored in the log file and does not consider the sequence of user requests, whereas DG and DDG add arcs between nodes based on the sequence of user requests recorded in the log file.

[Figure: number of arcs (0-15000) vs. number of user requests (5000-20000) for DG, DDG and PG]
Figure 4.11 Number of Arcs in different Graphs

When implementing the Precedence Graph, only its primary nodes are added as keys in the adjacency map. Secondary nodes of the graph are added only as link elements to a particular key in the map. This avoids wasting key entries on secondary nodes that will not add any link elements.

Predictions are generated from the graph by analyzing the arcs and nodes associated with the node that represents the requested web object. The time taken by an algorithm to generate predictions for a user request depends on the total number of arcs in the graph. The Precedence Graph is able to generate predictions quickly because it has fewer arcs than DG and DDG.

The results presented above are for the Precedence Graph built normally from the user access patterns: it is not subjected to any trimming operation, and the graph grows in size as it learns the user access patterns. When the trimming algorithm is applied to the graph, significant changes can be noticed in the number of arcs and nodes the graph possesses after the operation. The size of the graph is significantly reduced after each trimming operation.

[Figure: number of nodes (0-5000) vs. number of user requests (5000-20000) for PG and PG + Trimming]
Figure 4.12 Number of Nodes in PG with/without Trimming

Figure 4.12 compares the number of nodes in the Precedence Graph with and without the trimming operation. As shown in the figure, with trimming the number of nodes is much smaller than in the graph without trimming. Similarly, Figure 4.13 indicates the reduction in the number of arcs when trimming is applied to the graph.

[Figure: number of arcs (0-5000) vs. number of user requests (5000-20000) for PG and PG + Trimming]
Figure 4.13 Number of Arcs in PG with/without Trimming

4.6 CONCLUSION

This chapter discusses a prediction algorithm that builds a Precedence Graph by learning the user access patterns to predict users' future requests. The algorithm differentiates the relationship between primary objects (HTML) and secondary objects (e.g., images) by considering two types of arcs (primary and secondary) when constructing the graph. The Precedence Graph is built with fewer arcs than the existing approaches (DG and DDG), since it considers only the precedence relation for each user request rather than the user access sequences recorded in the log file. The graph structure gets updated dynamically with new nodes/arcs based on the user requests, which ensures that the generated predictions reflect the latest requirements of the user.

Experimental results indicate that the Precedence Graph achieved good Recall and Precision with minimal resource consumption (i.e. usage of memory and computational resources). To effectively control the growth in graph size, a trimming algorithm is designed to periodically remove unwanted nodes and arcs from the graph. This allows the Precedence Graph to learn new user access patterns without its size growing unchecked.



CHAPTER 5

CACHE REPLACEMENT SCHEME TO ENHANCE WEB

PREFETCHING

5.1 INTRODUCTION

Web caching and prefetching techniques provide an effective solution to enhance the response time of end users. The web objects are stored at locations closer to end users for serving their requests with minimal delay. Web caching exploits the temporal locality and prefetching exploits the spatial locality that is inherent in the user access patterns of web objects. Web caches are categorized into client cache, proxy cache and server cache depending on the location where they are deployed in the web architecture (Zeng et al 2004). A server cache, also referred to as a reverse or inverse cache, handles the web documents of a single web server and reduces its workload. Proxy caches, which are often located near network gateways, allow several users to share resources and reduce the bandwidth required over expensive dedicated Internet connections. A client cache, also referred to as a browser cache, is located close to the web user and provides short response times if the requested object is available in the cache. It enhances web access performance and is economical to manage due to its close proximity to end users.

Web prefetching decides when and which web objects are to be fetched from the web server. Two approaches for prefetching objects from the server are: a) the online approach, which fetches web objects during the short pauses that occur while the user reads the page displayed on the screen, and b) the offline approach, which fetches web objects during off-peak periods or when the user remains idle for a certain time period. When aggressive prefetching is employed, it can create cache pollution by replacing useful data in the cache with prefetched data. Similarly, if web objects stored in the cache as part of web caching are not accessed frequently, they create cache pollution that negatively affects system performance. To utilize the limited cache capacity effectively and to avoid cache pollution, replacement algorithms are designed to manage the contents of the cache by selecting the objects to be evicted so that new objects can be stored. Cache replacement schemes should implement algorithms that do not use complicated data structures, so that they provide effective performance.

Several research works in recent years have applied intelligent techniques such as back-propagation neural networks (Cobb and ElAarag 2008), fuzzy systems (Ali and Shamsuddin 2009) and evolutionary algorithms (Sulaiman et al 2008, Ali et al 2011) to implement cache replacement schemes in web caching and prefetching environments. The techniques reported in these works indicate that replacement based on intelligent approaches is more efficient and adaptive to the web caching environment than classical replacement approaches (e.g. LRU, LFU).



This chapter discusses an efficient cache replacement scheme for managing the client-side cache, which is partitioned into a regular and a prefetch cache for handling web caching and prefetching. The regular cache stores web objects received from the following sources: a) objects that are demand requested by the users, and b) frequently accessed objects in the prefetch cache that are transferred to the regular cache. The prefetch cache stores web objects downloaded based on the predictions generated as part of web prefetching. The contents of the regular cache are managed using a replacement algorithm based on a Fuzzy Inference System (FIS), while the LRU algorithm is used to manage the contents of the prefetch cache. The proposed scheme is designed to retain useful web objects for a longer duration and to remove unwanted objects from the cache for efficient performance. Integrating prefetching into the client cache system improves the hit ratio because the prefetched objects are stored in a prefetch cache maintained independently from the regular cache.

5.2 CACHE REPLACEMENT - OVERVIEW

Cache replacement algorithms are designed to effectively decide the

web objects to be evicted from cache for satisfying the following aspects:

· Effective utilization of available cache space

· Improving hit ratio

· Reducing network traffic

· Minimizing the load on origin web server



The replacement algorithm computes the priority of the web objects stored in the cache to select the objects to be evicted. Factors considered for computing the priority of web objects are: popularity (frequency), recency, object size, popularity consistency, access latency (delay) and object type (html/text, image/video, application). Web access latency (delay) represents the time interval between sending the user request and receiving the last byte of the requested content as the response. Recency represents the time when the object was last referenced and reflects the temporal locality that exists in user access patterns. Web objects selected for eviction from the cache are those expected to have the lowest access demand in the near future. The replacement policy is applied whenever the cache reaches its maximum limit, or to evict objects that have not been used for a long duration.

Combining several factors to influence the replacement process in

deciding the web objects to be removed from cache is not an easy task as each

factor has its own significance in different situations. Locality of reference

characterizes the ability to predict future accesses to web objects based on the

past accesses to objects. Two main types of locality are: Temporal and Spatial.

Temporal locality indicates that recently accessed objects are likely to be

accessed again in the future. Spatial locality indicates that accesses to certain

objects can be used as a reference to predict future accesses to other objects.

Each web object is identified using different characteristics and among

them URL is the unique characteristic to identify the object. Most replacement

strategies use a combination of these characteristics to make their decisions.



Important characteristics of web objects are (Podlipnig et al 2003):

· Recency – Time when the object was last requested
· Frequency – Number of requests to the object
· Size – Size of the web object in bytes
· Cost – Cost involved in fetching the object from the origin server
· Request value – Benefit gained from storing the object in cache
· Expiration time – Time to Live (TTL) of the object

Factors such as object size, object type and access latency are static

and they are determined only once when the object is initially requested by the

user. Factors such as frequency, recency and popularity consistency are dynamic

and they are computed frequently till the object resides in cache.

Podlipnig et al (2003) categorized cache replacement algorithms as:

· Frequency based
The frequency (popularity) of objects is analyzed and used as the deciding factor for future actions on the web objects.
· Recency based
It exploits the temporal locality seen in web request patterns; recency is used as the main deciding factor in selecting the objects to be removed from the cache.
· Frequency / Recency based
It combines both recency and frequency factors in making decisions on the web objects stored in the cache.
· Function based
It uses a general function to compute the value of an object, based on which the decisions are taken.
· Randomized
The objects are randomly selected for removal from the cache.

Temporal locality and document popularity influence the web request sequences. Object size and the cost of fetching an object from the server, along with temporal locality and long term popularity, play a significant role in the performance of cache replacement schemes.

5.3 FUZZY INFERENCE SYSTEM

The Fuzzy Inference System (FIS) shown in Figure 5.1 is a popular computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules and fuzzy reasoning. Fuzzy inference is the process of formulating the mapping from a given input to an output using fuzzy logic. The mapping provides a basis from which decisions can be made or patterns discerned. FIS has a good function approximation capability that is reflected in various problems such as control, modeling and classification. Fuzzification transforms the crisp input into degrees of match with linguistic values. The knowledge base comprises two components: the rule base and the database. The rule base contains the various fuzzy if-then rules, and the database defines the membership functions of the fuzzy sets used in the fuzzy rules. The inference engine is responsible for performing the decision-making operation on the rules. Defuzzification transforms the fuzzy results into a crisp output.

[Figure: FIS framework — the crisp input is fuzzified into a fuzzy set, processed by the inference engine using the knowledge base (database + rule base), and the resulting fuzzy set is defuzzified into a crisp output]
Figure 5.1 Framework of Fuzzy Inference System

The time complexity of an FIS depends on the number of rules it considers when making a decision; fewer rules in the rule base result in better system performance.

5.3.1 Membership Function

A membership function provides a measure of the degree of similarity of an element to a fuzzy set. It can be chosen either arbitrarily by the user based on experience or designed using machine learning methods. Common shapes of membership functions are: triangular, trapezoidal, piecewise-linear, Gaussian and bell-shaped.
· Gaussian Membership Function

$f(x;\sigma,c) = e^{-\frac{(x-c)^2}{2\sigma^2}}$

It depends on two parameters σ and c.

· Trapezoidal Membership Function

$f(x;a,b,c,d) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & b \le x \le c \\ \dfrac{d-x}{d-c}, & c \le x \le d \\ 0, & d \le x \end{cases}$

It depends on four scalar parameters: a, b, c and d.

· Bell Function

$f(x;a,b,c) = \dfrac{1}{1+\left|\dfrac{x-c}{a}\right|^{2b}}$

It depends on three parameters: a, b and c. The parameter 'b' is usually positive. Parameter 'c' is used to locate the center of the curve.

· Triangular Function

$f(x;a,b,c) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ \dfrac{c-x}{c-b}, & b \le x \le c \\ 0, & c \le x \end{cases}$

It is a function of the vector 'x' and depends on three parameters: a, b and c.

Each membership function is assigned a linguistic term and it will map

the input parameters to the membership value in the range 0 to 1. The input space

is sometimes referred to as Universe of Discourse (Z).
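For concreteness, a small Python sketch of the generalized bell membership function and the fuzzification of one crisp input into the {low, medium, high} fuzzy sets is given below; the parameter values are illustrative assumptions, not the ones tuned in this work.

def bell(x, a, b, c):
    """Generalized bell membership function: 1 / (1 + |(x - c) / a|**(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def fuzzify(value, params):
    """Map a crisp input to its membership degree for each linguistic term."""
    return {term: bell(value, *abc) for term, abc in params.items()}

# Illustrative parameters (a = width, b = slope, c = center) for the Recency input
recency_params = {'low': (600, 2, 0), 'medium': (600, 2, 1800), 'high': (600, 2, 3600)}
print(fuzzify(1200, recency_params))    # ≈ {'low': 0.06, 'medium': 0.5, 'high': 0.004}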

Figure 5.2 Membership Functions for Recency



Consider a system that takes four input parameters: Recency, Frequency, Delay Time and Object Size. Each input parameter is associated with three membership functions {low, medium, high} that map the input values to the associated fuzzy sets with a degree between 0 and 1.

Figure 5.2 represents the Recency input being mapped to its membership functions {low, medium, high}, illustrated using a bell curve. Figure 5.3 represents the Frequency input being mapped to its membership functions {low, medium, high}.

Figure 5.3 Membership Functions for Frequency

The input parameter Delay Time is mapped to its membership

functions {low, medium and high} as shown in Figure 5.4.



Figure 5.4 Membership Functions for Delay Time

Figure 5.5 represents the mapping of input ‘Object Size’ to its

membership functions {small, medium and large}.

Figure 5.5 Membership Functions for Object Size



5.3.2 Fuzzy Rules

The rules are linguistic IF-THEN statements that constitute a key aspect of the performance of a fuzzy inference system. They describe the relationship between input and output values. The IF part is called the "antecedent" and the THEN part is called the "consequent".

Example:

IF {Frequency is low} THEN {Removal is high}
   (antecedent)              (consequent)

{Frequency, Removal} are linguistic variables.
{low, high} are linguistic terms that correspond to membership functions.

If the antecedent of a rule has more than one part, a fuzzy operator (AND) is applied to obtain a single value that represents the antecedent result for that rule. The consequent is a fuzzy set represented by a membership function, and it can be reshaped using a function associated with the antecedent.

Decisions are taken by testing all the rules in a Fuzzy Inference System, so the rules must be combined in order to generate the final output. Aggregation is the process by which the fuzzy sets that represent the output of each rule are combined into a single fuzzy set. It occurs only once for each output variable, before defuzzification is performed.



5.3.3 Defuzzification

It takes the aggregated output fuzzy set as input and produces a single

output value (crisp data).

Commonly used methods for defuzzification as shown in Figure 5.6

are:

§ Centroid of Area (COA)

It is the most commonly used technique and is considered to be the most accurate. It returns the center of the area under the curve.

$z_{COA} = \dfrac{\int_{z} \mu_A(z)\, z\, dz}{\int_{z} \mu_A(z)\, dz}$

where $\mu_A(z)$ is the aggregated output membership function.

§ Bisector of Area (BOA)

It divides the region into two sub-regions of equal area, and it sometimes coincides with the centroid line.

$\int_{a}^{z_{BOA}} \mu_A(z)\, dz = \int_{z_{BOA}}^{b} \mu_A(z)\, dz$

where $a = \min\{z;\, z \in Z\}$ and $b = \max\{z;\, z \in Z\}$.

§ Mean of Maximum (MOM)

$z_{MOM} = \dfrac{\int_{Z'} z\, dz}{\int_{Z'} dz}$, where $Z' = \{z;\, \mu_A(z) = \mu^{*}\}$

If $\max \mu_A(z)$ is attained over the interval $[z_1, z_2]$, then $z_{MOM} = \dfrac{z_1 + z_2}{2}$.

§ Smallest of Maximum (SOM)

Amongst all z that belong to $[z_1, z_2]$, the smallest is called $z_{SOM}$.

§ Largest of Maximum (LOM)

Amongst all z that belong to $[z_1, z_2]$, the largest value is called $z_{LOM}$.

Figure 5.6 Methods to perform Defuzzification
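A small numerical sketch of the centroid (COA) defuzzification used later for the regular cache decision is given below; the discretization of the output universe into sample points is an assumption made for illustration.

def centroid_defuzzify(zs, mu):
    """Discrete centroid of area: sum(mu(z) * z) / sum(mu(z)).

    zs - sample points of the output universe of discourse
    mu - aggregated membership degree at each sample point
    """
    num = sum(m * z for z, m in zip(zs, mu))
    den = sum(mu)
    return num / den if den > 0 else 0.0

# Example: output universe [0, 1] sampled in steps of 0.1
zs = [i / 10 for i in range(11)]
mu = [0.0, 0.0, 0.1, 0.3, 0.5, 0.7, 0.7, 0.5, 0.3, 0.1, 0.0]
print(centroid_defuzzify(zs, mu))    # ≈ 0.55, near the bulk of the aggregated set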



5.4 PROPOSED FRAMEWORK

The framework shown in Figure 5.7 manages the client-side cache by partitioning it into two parts: a regular cache and a prefetch cache. Each part of the cache has its own storage space and is managed independently using a separate replacement policy. The regular cache is managed using the Fuzzy Inference System (FIS) based algorithm and the prefetch cache is managed using the LRU algorithm.

[Figure: the client browser interacts with the web server; demand requested objects are stored in the regular cache (objects purged using FIS) and prefetched objects are stored in the prefetch cache (objects purged using LRU), and both caches serve user requests]
Figure 5.7 Framework for managing regular/prefetch cache

The regular cache stores web objects that are demand requested by users as well as frequently accessed objects that are transferred from the prefetch cache. The prefetch cache stores web objects that are downloaded from the server using the predictions generated as part of web prefetching. When the user frequently accesses objects stored in the prefetch cache, they are moved to the regular cache to ensure that popular objects reside in the cache for a longer duration. The scheme effectively removes useless objects to alleviate cache pollution and maximize the hit ratio.

When a user request is satisfied using the contents of either the regular or the prefetch cache, it counts as a cache hit and the request is not forwarded to the web server. In case of a cache miss, the request is forwarded to the server to acquire the required data.

When the server receives a user request, it performs the following tasks:

· Records the details of the user request in the access log file
· Fetches the requested object and generates predictions for the request
· Sends the requested object and its predictions to the client

The server analyzes the user requests stored in the access log file to generate the predictions and deliver them to the client. The client, on receiving the requested object along with the list of predictions from the server, performs the following tasks:

· Stores the received web object in the regular cache and displays it to the user
· Prefetches web objects based on the prediction list during browser idle time and stores them in the prefetch cache



[Figure: flowchart — a client request is first checked for cacheability; cacheable prefetched objects go to the prefetch cache (purged with LRU when full) and demand requested objects go to the regular cache (purged with FIS when full); frequently accessed prefetched objects are moved to the regular cache, and cached objects are made available to client access]
Figure 5.8 Workflow of caching/prefetching system



The client can also take the responsibility of generating the predictions on its own and use them to prefetch web objects from the server. These downloaded objects are then stored in the prefetch cache, while the web objects received through demand requests are stored in the regular cache.

The workflow of the caching system that also incorporates the prefetching mechanism is illustrated in Figure 5.8. When a web object needs to be stored in the cache, the caching system first verifies whether the object is cacheable. If it is cacheable, it then verifies whether it is a prefetched or a demand requested object. In the case of a prefetched object, the system checks whether the prefetch cache is full or has space before storing the object; the LRU algorithm is used to purge objects from the prefetch cache. When objects residing in the prefetch cache are accessed frequently within a short time period, they are moved to the regular cache. The regular cache is checked to see whether it can accommodate the objects coming from the prefetch cache or the objects demand requested by the user; when the regular cache is full, objects are purged based on the outcome of the FIS algorithm. The objects stored in the regular and prefetch caches are used to satisfy client requests with minimal latency. If an object is not cacheable, it is delivered directly to the client and displayed in the web browser.

The commonly used factors to determine the popularity of web objects are frequency, recency and object size. Object popularity is a good estimator for verifying the cacheability of documents, since objects that are more popular have a high probability of being referenced again by the user in the near future, resulting in an increased cache hit rate.

A request is considered cacheable based on the following factors:

· The response must have a defined size in bytes that is greater than zero.
· The request must use the GET or HEAD method, and the status code should be 200 (OK), 206 (Partial Content) or 304 (Not Modified).

Dynamic requests are not cached, since they return unique objects every time they are accessed by the user.

5.4.1 Fuzzy System - Input / Output

The input parameters to Fuzzy Inference System are labeled as {IP1 to

IP4} and the target output labeled as {OT}.

Table 5.1 Input Parameters to FIS

Variable Meaning
IP1 Recency of Web object
IP2 Frequency of Web object
IP3 Retrieval time of Web object
IP4 Size of Web object

Frequency and Recency for the objects are estimated based on the sliding window mechanism discussed in (Romano and ElAarag 2011). The sliding window of a request represents the time before and after the request is made.

Table 5.2 Symbols used with their meanings

Symbol Meaning
Oi requested object
∆Ti time period since object Oi was last requested
Fi Frequency of object Oi within sliding window
SWL Sliding Window length
OT Target Output

Recency (IP1) of object Oi is computed as:

$recency(O_i) = \begin{cases} \max(SWL, \Delta T_i), & \text{if } O_i \text{ was requested before} \\ SWL, & \text{if } O_i \text{ is requested for the first time} \end{cases}$

If an object is requested for the first time, its recency is fixed as SWL; otherwise it is the maximum of SWL and ∆Ti.

Frequency (IP2) of object Oi is computed as:

$frequency(O_i) = \begin{cases} F_i + 1, & \text{if } \Delta T_i \le SWL \\ 1, & \text{if } O_i \text{ is accessed beyond } SWL \end{cases}$

The frequency of object Oi is incremented by 1 with respect to its previous frequency value if the request for Oi falls within the backward-looking SWL, i.e. the time interval between the previous request and the new request is within the bounds of the backward-looking SWL. Otherwise, the frequency value is reinitialized to 1.

The target output (OT) is set to 1 if the object is re-requested within the forward-looking sliding window; otherwise, OT is 0. The objective is to use the information about a web object requested in the past to predict whether it will be revisited within the forward-looking sliding window.
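The following Python sketch shows one way to derive these recency, frequency and target values from a timestamped request stream; the 20-minute SWL is taken from the experimental setup described later in this chapter, while the function name and record layout are assumptions.

SWL = 20 * 60      # sliding window length: 20 minutes, in seconds

def build_training_rows(requests):
    """Derive (recency, frequency, delay, size, target) rows per request.

    'requests' is an iterable of (timestamp, url, delay_ms, size_bytes)
    sorted by timestamp. Illustrative sketch of the sliding-window rules.
    """
    last_seen, freq, rows = {}, {}, []
    entries = list(requests)
    for i, (ts, url, delay, size) in enumerate(entries):
        if url in last_seen:
            dt = ts - last_seen[url]
            recency = max(SWL, dt)
            freq[url] = freq[url] + 1 if dt <= SWL else 1
        else:
            recency = SWL                     # first request: recency fixed to SWL
            freq[url] = 1
        last_seen[url] = ts
        # Target = 1 if the same object is re-requested within the forward-looking SWL
        target = int(any(u == url and 0 < t - ts <= SWL
                         for t, u, _, _ in entries[i + 1:]))
        rows.append((recency, freq[url], delay, size, target))
    return rows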

5.4.2 Managing Regular Cache

When a user requested object or an object transferred from the prefetch cache needs to be stored in the regular cache, the system checks whether there is sufficient storage space to accommodate the object. If storage space is available, the object is stored in the cache; otherwise, objects are evicted based on the outcome generated by the Fuzzy Inference System (FIS) algorithm. The FIS takes the input parameters of an object and decides whether it can reside in the cache or should be purged. The input parameters are fuzzified by applying the bell membership function, and the aggregated output is defuzzified using the centroid of area method.

When an object has high recency and frequency, it has a good chance of remaining in the cache. If the outcome from the FIS is greater than 0.5, the object can reside in the cache; otherwise it can be purged. The algorithm used for managing the contents of the regular cache is as follows:



OP = object in prefetch cache
OR = object in regular cache
ON = new object

Begin
1. An object is to be stored in the regular cache.
2. Check whether it is ON or OP.
3. If (ON) go to step 5.
4. If (OP reference ≥ 2) within a short duration, OP is moved to the regular cache.
5. If (size of ON / OP > available free space in regular cache) {
       For each object OR in regular cache {
           // apply FIS to find the popularity of the object
           If (popularity of OR ≥ 0.5)
               OR.cache = 1;      // object resides in cache
           Else
               OR.cache = 0;      // purge the object
       }
       Do {
           Remove objects with OR.cache = 0 from regular cache
       } While (size of ON / OP > free space in regular cache)
   }
6. Store ON / OP in regular cache;
   Remove OP from prefetch cache;
End
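A condensed Python sketch of this admission/eviction logic is given below; fis_score stands in for the full fuzzy pipeline (bell fuzzification, rule evaluation, centroid defuzzification), and the cache dictionaries and field names are illustrative assumptions.

def admit_to_regular_cache(obj, regular_cache, prefetch_cache, capacity, fis_score):
    """Store 'obj' in the regular cache, evicting low-scoring objects if needed.

    obj        - dict with at least 'uri' and 'size' (bytes)
    capacity   - regular cache capacity in bytes
    fis_score  - callable returning the defuzzified FIS output (0..1) for an object
    """
    used = sum(o['size'] for o in regular_cache.values())
    if obj['size'] > capacity - used:
        # Rank cached objects by FIS outcome; purge those scoring below 0.5 first
        victims = sorted((o for o in regular_cache.values() if fis_score(o) < 0.5),
                         key=fis_score)
        for victim in victims:
            if obj['size'] <= capacity - used:
                break
            del regular_cache[victim['uri']]
            used -= victim['size']
    if obj['size'] <= capacity - used:
        regular_cache[obj['uri']] = obj
        prefetch_cache.pop(obj['uri'], None)    # promoted objects leave the prefetch cache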

5.5 IMPLEMENTATION

This section discusses the training data used for the simulation and the process involved in extracting useful information to be given as input to the Fuzzy Inference System.

5.5.1 Training Data

The BU Web trace containing the records of HTTP requests from clients in the Boston University Computer Science Department (BU Web Trace, 1995) is used for the simulation. The data collection consists of 9633 files comprising 1,143,839 requests, representing a population of about 762 different users. The trace files contain sequences of web object requests that were served either from the local cache or from the network.

Each line in the log file represents a URL requested by the user. It consists of the machine name, the timestamp when the request was made, the User_ID number, the requested URL, the size of the document and the object retrieval time in seconds. If a log entry indicates the number of bytes as zero and the retrieval delay as zero, it means that the request was satisfied using the contents of the internal cache.

From the collection of requests representing a large number of users, we randomly select the traces of 15 different users to be used in the simulation.

beaker 791129602 449542 "http://cs-www.bu.edu/" 2009 1.135154
beaker 791129603 747730 "http://cs-www.bu.edu/lib/pics/bu-logo.gif" 1805 0.301080
beaker 791129604 445737 "http://cs-www.bu.edu/lib/pics/bu-label.gif" 717 0.356171
beaker 791129611 867528 "http://cs-www.bu.edu/courses/Home.html" 3279 0.295710
beaker 791129660 367234 "http://cs-www.bu.edu/faculty/mcchen/cs320/" 721 0.470706
beaker 791129660 928492 "http://cs-www.bu.edu/icons/blank.xbm" 696 0.305217
beaker 791129661 485690 "http://cs-www.bu.edu/icons/back.xbm" 694 0.477853
beaker 791129662 205927 "http://cs-www.bu.edu/icons/text.xbm" 715 0.287997
beaker 791497224 96312 "http://cs-www.bu.edu/" 2087 0.774428
beaker 791497225 226976 "http://cs-www.bu.edu/lib/pics/bu-logo.gif" 1803 0.290951
beaker 791497225 996915 "http://cs-www.bu.edu/lib/pics/bu-label.gif" 715 0.357485
beaker 791497229 937950 "http://cs-www.bu.edu/faculty/Home.html" 1700 0.408853
beaker 791497232 451959 "http://cs-www.bu.edu/faculty/heddaya/Home.html" 1576 0.738620
beaker 791497233 294701 "http://cs-www.bu.edu/faculty/heddaya/Images/MyPhotos/recursive.gif" 13851 0.443401
beaker 791497243 48300 "http://cs-www.bu.edu/faculty/heddaya/navigation.html" 7131 0.322925
beaker 791497257 357990 "http://nearnet.gnn.com/gnn/arcade/comix/graphics/Dilbert.gif" 9893 4.110718

Figure 5.9 Sample Log File of a client used for Preprocessing

5.5.2 Data Preprocessing

The log files to be used for simulation undergo preprocessing to

extract useful information that reflects user navigational behavior. Figure 5.9

represents the sample log file that is used for preprocessing operation. The

processed file with valid information is then used for simulation.



Steps involved in preprocessing are:

· Parse the log file to identify distinct fields in each record entry and

to track the boundaries between successive records stored in the

file.

· Assign unique identifier (URL_ID) to each URL that helps to track

the events easily during simulation.

· Extract the useful fields from each line in the log file to be used for

analysis.

The output file generated after preprocessing the log file contains the

following fields for each request entry:

· Requested URL

· Unique ID assigned to each URL (URL_ID)

· Timestamp of the request

· Delay time

· Size of the requested object
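One possible implementation of these preprocessing steps is sketched below; the whitespace-delimited BU trace format shown in Figure 5.9 is assumed, and the function and field names are illustrative.

import shlex

def preprocess_log(lines):
    """Parse BU-style trace lines into (url, url_id, timestamp, delay_ms, size) records.

    Illustrative sketch: each line is assumed to look like
    'machine timestamp microseconds "url" size delay_seconds'.
    """
    url_ids, records = {}, []
    for line in lines:
        fields = shlex.split(line)            # shlex keeps the quoted URL as a single field
        if len(fields) != 6:
            continue                          # skip malformed records
        machine, ts, _us, url, size, delay = fields
        url_id = url_ids.setdefault(url, len(url_ids) + 1)    # assign a unique URL_ID
        records.append((url, url_id, int(ts), int(float(delay) * 1000), int(size)))
    return records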

Table 5.3 represents the sample preprocessed data created from the log

file that will be used to obtain the training data to be given as input to the Fuzzy

Inference System.

Table 5.3 Preprocessed data from the log file

URL                                              URL_ID   Timestamp   Delay Time (ms)   Size (bytes)
http://cs-www.bu.edu/                               1     791129602        1135             2009
http://cs-www.bu.edu/                               1     791129673         367             2009
http://www.wired.com/                               2     791129783         837              941
http://www.wired.com/Images/spacer.gif              3     791129785         503              277
http://cs-www.bu.edu/                               1     791497224         774             2087
http://cs-www.bu.edu/pics/bu-logo.gif               4     791497225         290             1803
http://cs-www.bu.edu/pics/bu-label.gif              5     791497225         357              715
http://cs-www.bu.edu/faculty/home.html              6     791497229         408             1700
http://cs-www.bu.edu/faculty/best/BestWeb.html      7     791497735         969             7966
http://cs-www.bu.edu/pics/bu-logo.gif               4     791497790         208             1803
http://cs-www.bu.edu/pics/bu-label.gif              5     791497825         317              715
http://www.wired.com/                               2     791429783         737              941

The information shown in Table 5.3 is further processed to create the training data shown in Table 5.4. The recency and frequency values are assigned based on the sliding window mechanism discussed in Section 5.4.1. The sliding window length (SWL) in both the forward and backward directions is fixed at 20 minutes (i.e. 1200 seconds) for the simulation, since users tend to change their browsing patterns often and may have short browsing sessions.

Table 5.4 Training data created from preprocessed file

                        Inputs                                    Target
Recency    Frequency   Retrieval Time (ms)   Size (bytes)
 1200          1              1135               2009               1
 1200          2               367               2009               0
 1200          1               837                941               0
 1200          1               503                277               0
 3675.5        1               774               2087               0
 1200          1               290               1803               1
 1200          1               357                715               1
 1200          1               408               1700               0
 1200          1               969               7966               0
 1200          2               208               1803               0
 1200          2               317                715               0
 3000          1               737                941               0

When an object is requested for the first time, or when it is re-requested within the SWL, its recency is set to 1200. If the time difference between a new request and the previous request to an object is greater than the SWL, its recency is set to that time difference.

The frequency of an object is set to 1 when it is requested for the first time. If the object is re-requested within the SWL, its frequency is incremented by 1; otherwise, the frequency is re-initialized to 1 irrespective of its previous value. The target output is set to 1 if the object has a future reference within the forward SWL; otherwise it is set to 0.
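
The recency, frequency and target labelling described above can be summarised by the following sketch, which assumes the preprocessed records are sorted by timestamp and uses the 1200-second SWL; the function name build_training_rows is illustrative only.

    SWL = 1200  # sliding window length in seconds (20 minutes)

    def build_training_rows(records):
        """records: list of (url_id, timestamp, delay_ms, size_bytes) sorted by timestamp."""
        last_seen, freq, rows = {}, {}, []
        for i, (url_id, ts, delay_ms, size) in enumerate(records):
            gap = ts - last_seen[url_id] if url_id in last_seen else None
            recency = SWL if gap is None or gap <= SWL else gap
            freq[url_id] = freq.get(url_id, 0) + 1 if gap is not None and gap <= SWL else 1
            last_seen[url_id] = ts
            # target = 1 if the same object is requested again within the forward window
            target = int(any(u == url_id and 0 < t - ts <= SWL
                             for (u, t, _d, _s) in records[i + 1:]))
            rows.append((recency, freq[url_id], delay_ms, size, target))
        return rows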

5.6 PERFORMANCE EVALUATION

Trace-driven simulations are used to evaluate the performance of the cache replacement policies. The storage space allocated for the cache in the client machine is distributed equally between the regular and prefetch caches, i.e. 50% of the total capacity is allocated to the regular cache and the remaining 50% to the prefetch cache. To simulate the prefetching of objects, the client-based prediction and prefetching approach discussed in Chapter 3 is used. When a user-requested web page is displayed in the browser, predictions are made for that page and the predicted objects are prefetched and stored in the prefetch cache. If objects in the prefetch cache are requested frequently, they are moved to the regular cache.
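
One possible structure for such a partitioned cache is sketched below: the prefetch partition is kept in LRU order, and objects that receive repeated hits are promoted to the regular partition, whose eviction victim is the entry with the lowest score. The class, its methods and the fis_score placeholder are illustrative; in this work the actual replacement decision for the regular cache is made by the Fuzzy Inference System described earlier.

    from collections import OrderedDict

    def fis_score(url, metadata):
        """Placeholder for the Fuzzy Inference System output (higher = more worth keeping)."""
        return 0.5

    class PartitionedCache:
        """Sketch of a client cache split 50/50 into regular and prefetch partitions."""

        def __init__(self, capacity_bytes, promote_after=2):
            self.cap = capacity_bytes // 2          # 50% of total capacity per partition
            self.promote_after = promote_after      # prefetch hits needed before promotion
            self.prefetch = OrderedDict()           # LRU order: oldest entry first
            self.regular = {}

        def add_prefetched(self, url, obj):
            """Store a prefetched object; evict least recently used entries if needed."""
            self.prefetch[url] = {"obj": obj, "hits": 0}
            self.prefetch.move_to_end(url)
            while self.prefetch and self._size(self.prefetch) > self.cap:
                self.prefetch.popitem(last=False)   # LRU eviction in the prefetch cache

        def get(self, url):
            """Serve a request; promote frequently used prefetched objects."""
            if url in self.regular:
                return self.regular[url]["obj"]
            if url in self.prefetch:
                entry = self.prefetch[url]
                entry["hits"] += 1
                self.prefetch.move_to_end(url)
                if entry["hits"] >= self.promote_after:
                    self.prefetch.pop(url)
                    self._add_regular(url, entry["obj"])
                return entry["obj"]
            return None                             # miss: object must be fetched from the network

        def _add_regular(self, url, obj):
            self.regular[url] = {"obj": obj}
            while len(self.regular) > 1 and self._size(self.regular) > self.cap:
                victim = min(self.regular, key=lambda u: fis_score(u, self.regular[u]))
                del self.regular[victim]            # evict the lowest-scoring object

        @staticmethod
        def _size(store):
            return sum(len(entry["obj"]) for entry in store.values())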

5.6.1 Performance Metrics

The effectiveness of the replacement algorithm in improving the performance of web caching and prefetching is evaluated using two metrics: Hit Rate (HR) and Byte Hit Rate (BHR). HR represents the percentage of user requests served using objects available in the cache; it characterizes the improvement in availability and the reduction in user latency. BHR represents the percentage of bytes served from the cache against the total number of bytes requested by users; it characterizes the reduction in network traffic and the easing of link congestion. An increase in HR contributes significantly to the improvement in latency savings (Zhu and Hu 2007, Shi et al 2006).

An important point to note is that Hit Rate and Byte Hit Rate cannot both be optimized at the same time (Podlipnig 2003). Strategies that optimize Hit Rate give preference to smaller objects, which tends to decrease the Byte Hit Rate by giving less preference to larger objects.
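
For reference, the two metrics are computed from the replayed trace in the straightforward way sketched below, where each replayed request is assumed to be recorded as a (was_hit, size) pair.

    def hit_rate(requests):
        """requests: list of (was_hit, size_bytes) tuples recorded during trace replay."""
        hits = sum(1 for was_hit, _ in requests if was_hit)
        return 100.0 * hits / len(requests)

    def byte_hit_rate(requests):
        hit_bytes = sum(size for was_hit, size in requests if was_hit)
        return 100.0 * hit_bytes / sum(size for _, size in requests)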

5.6.2 Experimental Results

The performance of the proposed scheme (FIS-LRU) is compared with the most common replacement policies, LRU and LFU, in terms of HR and BHR. In LRU, the least recently used objects are removed first; it is a simple and efficient scheme for uniformly sized objects. In LFU, the least frequently used objects are removed first; its main advantage is its simplicity. The proposed scheme is also compared with NNPCR-2 (Romano and ElAarag 2011), an intelligent web caching approach that uses a Back-Propagation Neural Network (BPNN) to make replacement decisions.

The algorithms are simulated by varying the cache size from 10 MB to 100 MB. The log files of the 15 users are split into three groups: users 1 to 5 in Group A, users 6 to 10 in Group B and users 11 to 15 in Group C. HR and BHR are evaluated separately for each group to analyze the behavior of the algorithms on the traces of a different set of users in each group.

Figure 5.10 shows the hit rate of the different policies using the traces of Group-A (users 1 to 5), Figure 5.11 using the traces of Group-B (users 6 to 10) and Figure 5.12 using the traces of Group-C (users 11 to 15).

[Plot omitted: Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.10 Hit Ratio using Traces of Group-A (user 1 to 5)

When the cache size increases, the HR improves for all the replacement policies, since a larger cache can store more web objects to satisfy user requests. As observed in the graphs for the different traces, the HR of the proposed scheme (FIS-LRU) is better than that of the other approaches. LFU produces the lowest HR due to cache pollution. The performance of FIS-LRU is better than NNPCR-2 in most cases, and in a few cases the results match those of NNPCR-2.

[Plot omitted: Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.11 Hit Ratio using Traces of Group-B (user 6 to 10)

[Plot omitted: Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.12 Hit Ratio using Traces of Group-C (user 11 to 15)



[Plot omitted: Byte Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.13 Byte Hit Ratio using Traces of Group-A (user 1 to 5)

[Plot omitted: Byte Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.14 Byte Hit Ratio using Traces of Group-B (user 6 to 10)

[Plot omitted: Byte Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.15 Byte Hit Ratio using Traces of Group-C (user 11 to 15)

The Byte Hit Rate of the different policies is shown in Figure 5.13 for the traces of Group-A, in Figure 5.14 for Group-B and in Figure 5.15 for Group-C. As observed in these graphs, the BHR of the proposed scheme (FIS-LRU) is better in all cases than that of the other replacement policies.

5.7 CONCLUSION

This chapter discusses a cache replacement scheme that efficiently manages the client-side cache, which is partitioned into regular and prefetch caches to handle web caching and prefetching. The proposed scheme uses a Fuzzy Inference System (FIS) based algorithm to manage the contents of the regular cache and the LRU algorithm to manage the contents of the prefetch cache. When objects stored in the prefetch cache are frequently accessed by users, they are moved to the regular cache, where they are managed based on the outcome of the FIS algorithm. The scheme helps to retain useful objects for longer periods while effectively removing unwanted objects from the cache.

The performance of the proposed scheme (FIS-LRU) in terms of HR and BHR is compared with various algorithms (LRU, LFU and NNPCR-2), where LRU and LFU are basic algorithms and NNPCR-2 is an intelligent algorithm based on a back-propagation neural network. HR and BHR for the proposed scheme are computed by considering both the regular and prefetch caches. The results clearly indicate that the proposed scheme outperforms the other algorithms in terms of HR and BHR.



CHAPTER 6

CONCLUSION

Web caching and prefetching techniques have been designed and used primarily to reduce user-perceived latency. Web prefetching employs prediction techniques to accurately anticipate future user requests and download the corresponding objects before the user actually requests them, which alleviates the problems encountered in web caching alone. Several researchers over the years have investigated various issues associated with web prefetching to provide solutions for reducing latency. It has been observed that fast and accurate prediction is crucial for improving prefetching performance. Prefetching techniques can prefetch a larger number of web objects when more bandwidth is available to users.

Contributions made in this thesis are:

In the first contribution, the focus is on using the information associated with hyperlinks embedded in web pages to generate predictions. Both the prediction and prefetching engines are deployed in the client machine, and access patterns are observed as the user views web pages in the browser. Two new approaches (Naïve Bayes and Fuzzy Logic) have been proposed to generate the predictions. When a user views a web page, the hyperlinks in that page are prioritized based on the computations of the Naïve Bayes and Fuzzy Logic approaches. Hyperlinks with high priority values form the prediction list (hints) that is used by the prefetching engine to download web objects during browser idle time. The user-accessed repository stores information about the hyperlinks used to navigate web pages. The predicted-unused repository stores information about unused hyperlinks and provides feedback to the prediction engine to fine-tune its predictions. Both approaches could generate effective predictions to minimize access latency. The approaches are most effective when the user has focused browsing patterns, looking for information related to a specific topic instead of randomly viewing unrelated web pages.
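
Purely as an illustration of this idea (and not as the implementation used in the thesis), anchor-text based hyperlink ranking with a Naïve Bayes style score could look roughly as follows; the token counters stand in for the user-accessed and predicted-unused repositories, and all names are hypothetical.

    import math
    from collections import Counter

    clicked_tokens = Counter()    # anchor-text tokens of links the user followed
    ignored_tokens = Counter()    # anchor-text tokens of predicted links never used

    def token_log_odds(token):
        """Laplace-smoothed log-odds that a link containing this token will be clicked."""
        p_click = (clicked_tokens[token] + 1) / (sum(clicked_tokens.values()) + 2)
        p_ignore = (ignored_tokens[token] + 1) / (sum(ignored_tokens.values()) + 2)
        return math.log(p_click / p_ignore)

    def rank_hyperlinks(links):
        """links: list of (url, anchor_text); returns URLs ordered from highest priority."""
        scored = [(sum(token_log_odds(t) for t in anchor.lower().split()), url)
                  for url, anchor in links]
        return [url for _score, url in sorted(scored, reverse=True)]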

The second contribution focuses on server-based predictions, where the user access patterns recorded in server log files are used to build a Precedence Graph, from which the predictions are generated. The Precedence Graph effectively records the predecessor and successor relationships between user requests, and it has fewer arcs than the graphs built by existing algorithms (DG and DDG). Graph trimming is employed to keep the size of the graph within manageable limits and to avoid useless information residing in the graph. The server conveys the predictions (hints) to the client through HTTP response headers that are easily recognized by the browser. During idle time, the client uses the predictions to download web objects and store them in the cache to serve the users with minimal latency.
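
A highly simplified sketch of this idea is shown below: precedence arcs are added from the referrer of each request to the requested URI, and the predictions for a page are its most frequent successors. This is only an illustration under assumed data structures, not the exact algorithm of the thesis.

    from collections import defaultdict

    # graph[predecessor][successor] = number of times successor followed predecessor
    graph = defaultdict(lambda: defaultdict(int))

    def add_request(uri, referrer):
        """Record a precedence relation from the referrer to the requested URI."""
        if referrer:
            graph[referrer][uri] += 1

    def predict(current_uri, max_hints=5):
        """Return up to max_hints likely successors of current_uri, most frequent first."""
        successors = graph.get(current_uri, {})
        ranked = sorted(successors.items(), key=lambda kv: kv[1], reverse=True)
        return [uri for uri, _count in ranked[:max_hints]]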



The final contribution focuses on a cache replacement scheme to effectively manage the client-side cache that supports both caching and prefetching. The cache is partitioned into two parts: a regular cache (for caching) and a prefetch cache (for prefetching). The regular cache is managed using a Fuzzy Inference System (FIS) based replacement algorithm and the prefetch cache is managed using the LRU algorithm. The objective is to retain useful objects for a longer duration and to effectively remove unwanted objects from the cache, improving both caching and prefetching. The benefits of prefetching can be fully realized only when frequently accessed prefetched objects are retained in the cache long enough to maximize the hit rate.

6.1 SUGGESTIONS FOR FUTURE WORK

o To investigate the semantic characteristics of web pages based on the new HTML standard, which could be utilized to build an effective prediction system at the client for providing better personalized services to individual users.

o Exploring different ways of effectively combining user access patterns with web page content to build simple and efficient data structures that can be used to generate predictions. Such structures could cater to the dynamically generated web pages and multimedia content that are increasingly used in the current web.



o Designing cache replacement policies with other machine learning

techniques that could effectively cater to the demands of

prefetching.

o Designing a feedback system that could inform the prediction

engine located in any part of the web architecture about the

prefetched objects being accessed by the user to improve its

performance.

o Methods discussed in the thesis could be tested for Big Data to

check the performance and adaptability of the proposed approaches.

Further, theoretical evaluation could be performed and its outcome

compared with the experimental results.



LIST OF PUBLICATIONS

INTERNATIONAL JOURNALS

1. Venketesh.P and Venkatesan.R, “A Survey on Applications of Neural


Networks and Evolutionary Techniques in Web Caching”, IETE Technical
Review, Vol.26, Issue 3, pp.171-180, 2009. (Impact Factor: 0.724)

2. Venketesh.P, Venkatesan.R and Arunprakash.L, “Semantic Web


Prefetching Scheme using Naïve Bayes Classifier”, International Journal of
Computer Science and Applications, Vol. 7, No. 1, pp. 66 – 78, 2010. (SJR-
SCImago Journal Rank: 0.029)

3. Venketesh.P and Venkatesan.R, “Graph based Prediction Model to


Improve Web Prefetching”, International Journal of Computer Applications,
Vol.36, No.10, pp.37-43, 2011.

4. Venketesh.P and Venkatesan.R, “Adaptive Web Prefetching Scheme using


Link Anchor Information”, International Journal of Applied Information
Systems, Vol.2, No.1, pp.39-46, 2012.

5. Venketesh.P and Venkatesan.R, “Effective Web Cache Replacement


Scheme to Support Caching and Prefetching”, International Journal of Web
Science, Inderscience Publishers (Communicated).
Low Latency via Redundancy

Ashish Vulimiri P. Brighten Godfrey Radhika Mittal


UIUC UIUC UC Berkeley
vulimir1@illinois.edu pbg@illinois.edu radhika@eecs.berkeley.edu

Justine Sherry Sylvia Ratnasamy Scott Shenker


UC Berkeley UC Berkeley UC Berkeley and ICSI
justine@eecs.berkeley.edu sylvia@eecs.berkeley.edu shenker@icsi.berkeley.edu

ABSTRACT

Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency — especially the tail of the latency distribution — can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result which completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks.

Categories and Subject Descriptors

C.2.0 [Computer-Communication Networks]: General

Keywords

Latency; Reliability; Performance

1. INTRODUCTION

Low latency is important for humans. Even slightly higher web page load times can significantly reduce visits from users and revenue, as demonstrated by several sites [28]. For example, injecting just 400 milliseconds of artificial delay into Google search results caused the delayed users to perform 0.74% fewer searches after 4-6 weeks [9]. A 500 millisecond delay in the Bing search engine reduced revenue per user by 1.2%, or 4.3% with a 2-second delay [28]. Human-computer interaction studies similarly show that people react to small differences in the delay of operations (see [17] and references therein).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CoNEXT'13, December 9-12, 2013, Santa Barbara, California, USA.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2101-3/13/12 ...$15.00.
http://dx.doi.org/10.1145/2535372.2535392.

Achieving consistent low latency is challenging. Modern applications are highly distributed, and likely to get more so as cloud computing separates users from their data and computation. Moreover, application-level operations often require tens or hundreds of tasks to complete — due to many objects comprising a single web page [25], or aggregation of many back-end queries to produce a front-end result [2, 14]. This means individual tasks may have latency budgets on the order of a few milliseconds or tens of milliseconds, and the tail of the latency distribution is critical. Such outliers are difficult to eliminate because they have many sources in complex systems; even in a well-provisioned system where individual operations usually work, some amount of uncertainty is pervasive. Thus, latency is a difficult challenge for networked systems: How do we make the other side of the world feel like it is right here, even under exceptional conditions?

One powerful technique to reduce latency is redundancy: Initiate an operation multiple times, using as diverse resources as possible, and use the first result which completes. Consider a host that queries multiple DNS servers in parallel to resolve a name. The overall latency is the minimum of the delays across each query, thus potentially reducing both the mean and the tail of the latency distribution. For example, a replicated DNS query could mask spikes in latency due to a cache miss, network congestion, packet loss, a slow server, and so on. The power of this technique is that it reduces latency precisely under the most challenging conditions—when delays or failures are unpredictable—and it does so without needing any information about what these conditions might be.

Redundancy has been employed to reduce latency in several networked systems: notably, as a way to deal with failures in DTNs [21], in a multi-homed web proxy overlay [5], and in limited cases in distributed job execution frameworks [4, 15, 32].

However, these systems are exceptions rather than the rule. Redundant queries are typically eschewed, whether across the Internet or within data centers. The reason is rather obvious: duplicating every operation doubles system utilization, or increases usage fees for bandwidth and computation. The default assumption in system design is that doing less work is best.

But when exactly is that natural assumption valid? Despite the fact that redundancy is a fundamental technique that has been used in certain systems to reduce latency, the conditions under which it is effective are not well understood — and we believe as a result, it is not widely used.

In this paper, we argue that redundancy is an effective general technique to achieve low latency in networked systems. Our results show that redundancy could be used much more commonly than it is, and in many systems represents a missed opportunity.

Making that argument requires an understanding of when replication improves latency and when it does not. Consider a system with a fixed set of servers, in which queries are relatively inexpensive for clients to send. If a single client duplicates its queries, its latency is likely to decrease, but it also affects other users in the system to some degree. If all clients duplicate every query, then every client has the benefit of receiving the faster of two responses (thus decreasing mean latency) but system utilization has doubled (thus increasing mean latency). It is not immediately obvious under what conditions the former or latter effect dominates.

Our first key contribution is to characterize when such global redundancy improves latency. We introduce a queueing model of query replication, giving an analysis of the expected response time as a function of system utilization and server-side service time distribution. Our analysis and extensive simulations demonstrate that assuming the client-side cost of replication is low, there is a server-side threshold load below which replication always improves mean latency. We give a crisp conjecture, with substantial evidence, that this threshold always lies between 25% and 50% utilization regardless of the service time distribution, and that it can approach 50% arbitrarily closely as variance in service time increases. Our results indicate that redundancy should have a net positive impact in a large class of systems, despite the extra load that it adds.

While our analysis only addresses mean latency, we believe (and our experimental results below will demonstrate) that redundancy improves both the mean and the tail.

Our second key contribution is to demonstrate multiple practical application scenarios in which replication empirically provides substantial benefit, yet is not generally used today. These scenarios, along with scenarios in which replication is not effective, corroborate the results of our analysis. More specifically:

• DNS queries across the wide area. Querying multiple DNS servers reduces the fraction of responses later than 500 ms by 6.5×, while the fraction later than 1.5 sec is reduced by 50×, compared with a non-replicated query to the best individual DNS server. Although this incurs added load on DNS servers, replication saves more than 100 msec per KB of added traffic, so that it is more than an order of magnitude better than an estimated cost-effectiveness threshold [29, 30]. Similarly, a simple analysis indicates that replicating TCP connection establishment packets can save roughly 170 msec (in the mean) and 880 msec (in the tail) per KB of added traffic.

• Database queries within a data center. We implement query replication in a database system similar to a web service, where a set of clients continually read objects from a set of back-end servers. Our results indicate that when most queries are served from disk and file sizes are small, replication provides substantial latency reduction of up to 2× in the mean and up to 8× in the tail. As predicted by our analysis, mean latency is reduced up to a server-side threshold load of 30-40%. We also show that when retrieved files become large or the database resides in memory, replication does not offer a benefit. This occurs across both a web service database and the memcached in-memory database, and is consistent with our analysis: in both cases (large or in-memory files), the client-side cost of replication becomes significant relative to the mean query latency.

• In-network packet replication. We design a simple strategy for switches, to replicate the initial packets of a flow but treat them as lower priority. This offers an alternate mechanism to limit the negative effect of increased utilization, and simulations indicate it can yield up to a 38% median end-to-end latency reduction for short flows.

In summary, as system designers we typically build scalable systems by avoiding unnecessary work. The significance of our results is to characterize a large class of cases in which duplicated work is a useful and elegant way to achieve robustness to variable conditions and thus reduce latency.

2. SYSTEM VIEW

In this section we characterize the tradeoff between the benefit (fastest of multiple options) and the cost (doing more work) due to redundancy from the perspective of a system designer optimizing a fixed set of resources. We analyze this tradeoff in an abstract queueing model (§2.1) and evaluate it empirically in two applications: a disk-backed database (§2.2) and an in-memory cache (§2.3). We then discuss a setting in which the cost of overhead can be eliminated: a data center network capable of deprioritizing redundant traffic (§2.4).

§3 considers the scenario where the available resources are provisioned according to payment, rather than static.

2.1 System view: Queueing analysis

Two factors are at play in a system with redundancy. Replication reduces latency by taking the faster of two (or more) options to complete, but it also worsens latency by increasing the overall utilization. In this section, we study the interaction between these two factors in an abstract queueing model.

We assume a set of N independent, identical servers, each with the same service time distribution S. Requests arrive in the system according to a Poisson process, and k copies are made of each arriving request and enqueued at k of the N servers, chosen uniformly at random. To start with, we will assume that redundancy is "free" for the clients — that it adds no appreciable penalty apart from an increase in server utilization. We consider the effect of client-side overhead later in this section.

Figures 1(a) and 1(b) show results from a simulation of this queueing model, measuring the mean response time (queueing delay + service time) as a function of load with two different service time distributions. Replication improves the mean, but provides the greatest benefit in the tail, for example reducing the 99.9th percentile by 5× under Pareto service times.
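
To make the model concrete, the following small simulation (an illustrative sketch, not the authors' code) implements the setup just described: N identical FIFO servers, Poisson arrivals, and k copies of each request enqueued at k distinct servers, with the response time taken as the earliest completion. With exponential service times it can be checked against the roughly one-third threshold derived in Theorem 1 below.

    import random

    def simulate(load, k=2, n_servers=10, n_requests=200_000,
                 service=lambda: random.expovariate(1.0)):
        """Mean response time when each request is copied to k of n_servers (FIFO queues).

        'load' is the per-server utilization without replication; mean service time is 1.
        Both copies are executed in full (no cancellation), as in the model above.
        """
        arrival_rate = load * n_servers       # aggregate Poisson arrival rate
        t, free_at, total = 0.0, [0.0] * n_servers, 0.0
        for _ in range(n_requests):
            t += random.expovariate(arrival_rate)
            chosen = random.sample(range(n_servers), k)
            finish = []
            for s in chosen:                  # FIFO: a copy starts when its server is free
                start = max(t, free_at[s])
                free_at[s] = start + service()
                finish.append(free_at[s])
            total += min(finish) - t          # response time = earliest copy to finish
        return total / n_requests

    # With exponential service times, k=2 should beat k=1 below roughly 33% load
    # and lose above it.
    for rho in (0.2, 0.3, 0.4):
        print(rho, simulate(rho, k=1), simulate(rho, k=2))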
[Figure 1 plots omitted: (a) mean response time vs. load with deterministic service time, (b) mean response time vs. load with Pareto service time, and (c) response-time CDF at load 0.2 under Pareto service time, each comparing 1 copy and 2 copies.]

Figure 1: A first example of the effect of replication, showing response times when service time distribution is deterministic and Pareto (α = 2.1)

Note the thresholding effect: in both systems, there is a threshold load below which redundancy always helps improve mean latency, but beyond which the extra load it adds overwhelms any latency reduction that it achieves. The threshold is higher — i.e., redundancy helps over a larger range of loads — when the service time distribution is more variable.

The threshold load, defined formally as the largest utilization below which replicating every request to 2 servers always helps mean response time, will be our metric of interest in this section. We investigate the effect of the service time distribution on the threshold load both analytically and in simulations of the queueing model. Our results, in brief:

1. If redundancy adds no client-side cost (meaning server-side effects are all that matter), there is strong evidence to suggest that no matter what the service time distribution, the threshold load has to be more than 25%.

2. In general, the higher the variability in the service-time distribution, the larger the performance improvement achieved.

3. Client-side overhead can diminish the performance improvement due to redundancy. In particular, the threshold load can go below 25% if redundancy adds a client-side processing overhead that is significant compared to the server-side service time.

If redundancy adds no client-side cost

Our analytical results rely on a simplifying approximation: we assume that the states of the queues at the servers evolve completely independently of each other, so that the average response time for a replicated query can be computed by taking the average of the minimum of two independent samples of the response time distribution at each server. This is not quite accurate because of the correlation introduced by replicated arrivals, but we believe this is a reasonable approximation when the number of servers N is sufficiently large. In a range of service time distributions, we found that the mean response time computed using this approximation was within 3% of the value observed in simulations with N = 10, and within 0.1% of the value observed in simulations with N = 20.

We start with a simple, analytically-tractable special case: when the service times at each server are exponentially distributed. A closed form expression for the response time CDF exists in this case, and it can be used to establish the following result.

Theorem 1. Within the independence approximation, if the service times at every server are i.i.d. exponentially distributed, the threshold load is 33%.

Proof. Assume, without loss of generality, that the mean service time at each server is 1 second. Suppose requests arrive at a rate of ρ queries per second per server.

Without replication, each server evolves as an M/M/1 queue with departure rate 1 and arrival rate ρ. The response time of each server is therefore exponentially distributed with rate 1 − ρ [6], and the mean response time is 1/(1 − ρ).

With replication, each server is an M/M/1 queue with departure rate 1 and arrival rate 2ρ. The response time of each server is exponentially distributed with rate 1 − 2ρ, but each query now takes the minimum of two independent samples from this distribution, so that the mean response time of each query is 1/(2(1 − 2ρ)).

Now replication results in a smaller response time if and only if 1/(2(1 − 2ρ)) < 1/(1 − ρ), i.e., when ρ < 1/3.

While we focus on the k = 2 case in this section, the analysis in this theorem can be easily extended to arbitrary levels of replication k.

Note that in this special case, since the response times are exponentially distributed, the fact that replication improves mean response time automatically implies a stronger distributional dominance result: replication also improves the pth percentile response time for every p. However, in general, an improvement in the mean does not automatically imply stochastic dominance.

In the general service time case, two natural (service-time independent) bounds on the threshold load exist.

First, the threshold load cannot exceed 50% load in any system. This is easy to see: if the base load is above 50%, replication would push total load above 100%. It turns out that this trivial upper bound is tight — there are families of heavy-tailed high-variance service times for which the threshold load goes arbitrarily close to 50%. See Figures 2(a) and 2(b).

Second, we intuitively expect replication to help more as the service time distribution becomes more variable. Figure 2 validates this trend in three different families of distributions. Therefore, it is reasonable to expect that the worst case for replication is when the service time is completely deterministic. However, even in this case the threshold load is strictly positive because there is still variability in the system due to the stochastic nature of the arrival process. With
[Figure 2 plots omitted: threshold load (0 to 0.5) as a function of distribution variance for (a) Weibull service times vs. inverse shape parameter γ, (b) Pareto service times vs. inverse scale parameter β, and (c) a simple two-point service time distribution vs. p.]

Figure 2: Effect of increasing variance on the threshold load in three families of unit-mean distributions: Pareto, Weibull, and a simple two-point discrete distribution (service time = 0.5 with probability p, (1 − 0.5p)/(1 − p) with probability 1 − p). In all three cases the variance is 0 at x = 0 and increases along the x-axis, going to infinity at the right edge of the plot.

the Poisson arrivals that we assume, the threshold load with 0.5
deterministic service time turns out to be slightly less than Conjectured lower bound
0.4 Uniform
26% — more precisely, ≈ 25.82% — based on simulations
Dirichlet

Threshold load
of the queueing model, as shown in the leftmost point in
Figure 2(c). 0.3
We conjecture that this is, in fact, a lower bound on the
threshold load in an arbitrary system. 0.2

Conjecture 1. Deterministic service time is the worst 0.1


case for replication: there is no service time distribution in
which the threshold load is below the (≈ 26%) threshold when 0
the service time is deterministic. 1 2 4 8 16 32 64 128 256 512
Size of distribution support
The primary difficulty in resolving the conjecture is that
general response time distributions are hard to handle an- Figure 3: Randomly chosen service time distribu-
alytically, especially since in order to quantify the effect of tions
taking the minimum of two samples we need to understand
the shape of the entire distribution, not just its first few mo-
ments. However, we have two forms of evidence that seem Myers and Vernon [23], the threshold load is minimized when
to support this conjecture: analyses based on approxima- the service time distribution is deterministic.
tions to the response time distribution, and simulations of
the queueing model. The heavy-tail approximation by Olvera-Cravioto et al. [24]
The primary approximation that we use is a recent re- applies to arbitrary regularly varying service time distribu-
sult by Myers and Vernon [23] that only depends on the tions, but for our analysis we add an additional assumption
first two moments of the service time distribution. The ap- requiring that the service time be sufficiently heavy. For-
proximation seems to perform fairly well in numerical eval- mally, we require that the service time distribution have a
uations with light-tailed service time distributions, such as higher coefficient of variation than the exponential distribu-
the Erlang and hyperexponential distributions (see Figure 2 tion, which
√ amounts to requiring that the tail index α be
in [23]), although no bounds on the approximation error are < 1 + 2. (The tail index is a measure of how heavy a
available. However, the authors note that the approxima- distribution is: lower indices mean heavier tails.)
tion is likely to be inappropriate when the service times are Theorem 3. Within the independence approximation and
heavy tailed. the approximation due to Olvera-Cravioto et al. [24], if the
As a supplement, therefore, in the heavy-tailed case, we service time
√ distribution is regularly varying with tail index
use an approximation by Olvera-Cravioto et al. [24] that α < 1 + 2, then the threshold load is > 30%.
is applicable when the service times are regularly varying1 .
Heavy-tail approximations are fairly well established in queue- Simulation results also seem to support the conjecture.
ing theory (see [26, 33]); the result due to Olvera-Cravioto We generated a range of service time distributions by, for
et al. is, to the best of our knowledge, the most recent (and various values of S, sampling from the space of all unit-mean
most accurate) refinement. discrete probability distributions with support {1, 2, ..., S}
The following theorems summarize our results for these in two different ways — uniformly at random, and using a
approximations. We omit the proofs due to space constraints. symmetric Dirichlet distribution with concentration param-
eter 0.1 (the Dirichlet distribution has a higher variance and
Theorem 2. Within the independence approximation and generates a larger spread of distributions than uniform sam-
the approximation of the response time distribution due to pling). Figure 3 reports results when we generate a 1000
1
The class of regularly varying distributions is an important different random distributions for each value of S and look
subset of the class of heavy-tailed distributions that includes at the minimum and maximum observed threshold load over
as its members the Pareto and the log-Gamma distributions. this set of samples.
0.5 ate requests according to identical Poisson processes. Each
Pareto request downloads a file chosen uniformly at random from
0.4 Exponential the entire collection. We only test read performance on a
Deterministic
Threshold load

static data set; we do not consider writes or updates.


0.3 Figure 5 shows results for one particular web-server con-
figuration, with
0.2
• Mean file size = 4 KB
0.1
• File size distribution = deterministic, 4 KB per file
0
0 0.2 0.4 0.6 0.8 1 • Cache:disk ratio = 0.1
Extra latency per request added by replication • Server/client hardware = 4 servers and 10 clients, all
(as fraction of mean service time) identical single-core Emulab nodes with 3 GHz CPU,
2 GB RAM, gigabit network interfaces, and 10k RPM
Figure 4: Effect of redundancy-induced client-side disks.
latency overhead, with different server service time
distributions. Disk is the bottleneck in the majority of our experiments –
CPU and network usage are always well below peak capacity.
The threshold load (the maximum load below which repli-
Effect of client-side overhead cation always helps) is 30% in this setup — within the 25-
50% range predicted by the queueing analysis. Redundancy
As we noted earlier, our analysis so far assumes that the reduces mean latency by 33% at 10% load and by 25% at
client-side overhead (e.g. added CPU utilization, kernel pro- 20% load. Most of the improvement comes from the tail.
cessing, network overhead) involved in processing the repli- At 20% load, for instance, replication cuts 99th percentile
cated requests is negligible. This may not be the case when, latency in half, from 150 ms to 75 ms, and reduces 99.9th
for instance, the operations in question involve large file percentile latency 2.2×.
transfers or very quick memory accesses. In both cases, the The experiments in subsequent figures (Figures 6-11) vary
client-side latency overhead involved in processing an addi- one of the above configuration parameters at a time, keeping
tional replicated copy of a request would be comparable in the others fixed. We note three observations.
magnitude to the server latency for processing the request. First, as long as we ensure that file sizes continue to re-
This overhead can partially or completely counteract the main relatively small, changing the mean file size (Figure 6)
latency improvement due to redundancy. Figure 4 quanti- or the shape of the file size distribution (Figure 7) does not
fies this effect by considering what happens when replication siginificantly alter the level of improvement that we observe.
adds a fixed latency penalty to every request. These results This is because the primary bottleneck is the latency in-
indicate that the more variable distributions are more for- volved in locating the file on disk — when file sizes are small,
giving of overhead, but client side overhead must be at least the time needed to actually load the file from disk (which
somewhat smaller than mean request latency in order for is what the specifics of the file size distribution affect) is
replication to improve mean latency. This is not surprising, negligible.
of course: if replication overhead equals mean latency, repli- Second, as predicted in our queueing model (§2.1), in-
cation cannot improve mean latency for any service time creasing the variability in the system causes redundancy to
distribution — though it may still improve the tail. perform better. We tried increasing variability in two dif-
ferent ways — increasing the proportion of access hitting
2.2 Application: disk-backed database disk by reducing the cache-to-disk ratio (Figure 8), and run-
Many data center applications involve the use of a large ning on a public cloud (EC2) instead of dedicated hardware
disk-based data store that is accessed via a smaller main- (Figure 9). The increase in improvement is relatively minor,
memory cache: examples include the Google AppEngine although still noticeable, when we reduce the cache-to-disk
data store [16], Apache Cassandra [10], and Facebook’s Haystack ratio. The benefit is most visible in the tail: the 99.9th per-
image store [7]. In this section we study a representative centile latency improvement at 10% load goes up from 2.3×
implementation of such a storage service: a set of Apache in the base configuration to 2.8× when we use the smaller
web servers hosting a large collection of files, split across the cache-to-disk ratio, and from 2.2× to 2.5× at 20% load.
servers via consistent hashing, with the Linux kernel man- The improvement is rather more dramatic when going
aging a disk cache on each server. from Emulab to EC2. Redundancy cuts the mean response
We deploy a set of Apache servers and, using a light-weight time at 10-20% load on EC2 in half, from 12 ms to 6 ms
memory-soaking process, adjust the memory usage on each (compare to the 1.3 − 1.5× reduction on Emulab). The tail
server node so that around half the main memory is avail- improvement is even larger: on EC2, the 99.9th percentile
able for the Linux disk cache (the other half being used by latency at 10-20% load drops 8× when we use redundancy,
other applications and the kernel). We then populate the from around 160 ms to 20 ms. It is noteworthy that the
servers with a collection of files whose total size is chosen worst 0.1% of outliers with replication are quite close to the
to achieve a preset target cache-to-disk ratio. The files are 12 ms mean without replication!
partitioned across servers via consistent hashing, and two Third, as also predicted in §2.1, redundancy ceases to help
copies are stored of every file: if the primary is stored on when the client-side overhead due to replication is a signif-
server n, the (replicated) secondary goes to server n + 1. We icant fraction of the mean service time, as is the case when
measure the response time when a set of client nodes gener- the file sizes are very large (Figure 10) or when the cache
[Figure 5 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 5: Base configuration

[Figure 6 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 6: Mean file size 0.04 KB instead of 4 KB

[Figure 7 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 7: Pareto file size distribution instead of deterministic

[Figure 8 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 8: Cache:disk ratio 0.01 instead of 0.1. Higher variability because of the larger proportion of accesses hitting disk. Compared to Figure 5, 99.9th percentile improvement goes from 2.3× to 2.8× at 10% load, and from 2.2× to 2.5× at 20% load.
[Figure 9 plots omitted: mean response time, 99.9th percentile response time and response-time CDF at 1000 queries/sec/node, comparing 1 copy and 2 copies, with arrival rate (queries/sec/node) on the x-axis.]

Figure 9: EC2 nodes instead of Emulab. x-axis shows unnormalised arrival rate because maximum throughput seems to fluctuate. Note the much larger tail improvement compared to Figure 5.
[Figure 10 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 10: Mean file size 400 KB instead of 4 KB

[Figure 11 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 11: Cache:disk ratio 2 instead of 0.1. Cache is large enough to store contents of entire disk

[Figure 12 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 12: memcached


1 overhead was mitigated by the response time reduction it
Fraction later than threshold 1 copy: real achieved). We now consider a setting in which this over-
0.1 2 copies: real
1 copy: stub
head can be essentially eliminated: a network in which the
0.01 2 copies: stub switches are capable of strict prioritization.
Specifically, we consider a data center network. Many
0.001 data center network architectures [2, 18] provide multiple
0.0001
equal-length paths between each source-destination pair, and
assign flows to paths based on a hash of the flow header [20].
1e-05 However, simple static flow assignment interacts poorly with
the highly skewed flow-size mix typical of data centers: the
1e-06
majority of the traffic volume in a data center comes from a
1e-05 0.0001 0.001 0.01 0.1
small number of large elephant flows [2, 3], and hash-based
Response time (s) flow assignment can lead to hotspots because of the possi-
bility of assigning multiple elephant flows to the same link,
Figure 13: memcached: stub and normal version which can result in significant congestion on that link. Re-
response times at 0.1% load cent work has proposed mitigating this problem by dynam-
ically reassigning flows in response to hotspots, in either a
centralized [1] or distributed [31] fashion.
is large enough that all the files fit in memory (Figure 11). We consider a simple alternative here: redundancy. Ev-
We study this second scenario more directly, using an in- ery switch replicates the first few packets of each flow along
memory distributed database, in the next section. an alternate route, reducing the probability of collision with
an elephant flow. Replicated packets are assigned a lower
2.3 Application: memcached (strict) priority than the original packets, meaning they can
We run a similar experiment to the one in the previous never delay the original, unreplicated traffic in the network.
section, except that we replace the filesystem store + Linux Note that we could, in principle, replicate every packet —
kernel cache + Apache web server interface setup with the the performance when we do this can never be worse than
memcached in-memory database. Figure 12 shows the ob- without replication — but we do not since unnecessary repli-
served response times in an Emulab deployment. The results cation can reduce the gains we achieve by increasing the
show that replication seems to worsen overall performance amount of queueing within the replicated traffic. We repli-
at all the load levels we tested (10-90%). cate only the first few packets instead, with the aim of reduc-
To understand why, we test two versions of our code at ing the latency for short flows (the completion times of large
a low (0.1%) load level: the “normal” version, as well as a flows depend on their aggregate throughput rather than in-
version with the calls to memcached replaced with stubs, dividual per-packet latencies, so replication would be of little
no-ops that return immediately. The performance of this use).
stub version is an estimate of how much client-side latency We evaluate this scheme using an ns-3 simulation of a
is involved in processing a query. common 54-server three-layered fat-tree topology, with a full
Figure 13 shows that the client-side latency is non-trivial. bisection-bandwidth fabric consisting of 45 6-port switches
Replication increases the mean response time in the stub ver- organized in 6 pods. We use a queue buffer size of 225
sion by 0.016 ms, which is 9% of the 0.18 ms mean service KB and vary the link capacity and delay. Flow arrivals
time. This is an underestimate of the true client-side over- are Poisson, and flow sizes are distributed according to a
head since the stub version, which doesn’t actually process standard data center workload [8], with flow sizes varying
queries, does not measure the network and kernel overhead from 1 KB to 3 MB and with more than 80% of the flows
involved in sending and receiving packets over the network. being less than 10 KB.
The client-side latency overhead due to redundancy is thus Figure 14 shows the completion times of flows smaller than
at least 9% of the mean service time. Further, the service 10 KB when we replicate the first 8 packets in every flow.
time distribution is not very variable: although there are Figure 14(a) shows the reduction in the median flow com-
outliers, more than 99.9% of the mass of the entire distri- pletion time as a function of load for three different delay-
bution is within a factor of 4 of the mean. Figure 4 in §2.1 bandwidth combinations (achieved by varying the latency
shows that when the service time distribution is completely and capacity of each link in the network). Note that in all
deterministic, a client-side overhead greater than 3% of the three cases, the improvement is small at low loads, rises un-
mean service time is large enough to completely negate the til load ≈ 40%, and then starts to fall. This is because at
response time reduction due to redundancy. very low loads, the congestion on the default path is small
In our system, redundancy does not seem to have that ab- enough that replication does not add a significant benefit,
solute a negative effect – in the “normal” version of the code, while at very high loads, every path in the network is likely
redundancy still has a slightly positive effect overall at 0.1% to be congested, meaning that replication again yields lim-
load (Figure 13). This suggests that the threshold load is ited gain. We therefore obtain the largest improvement at
positive though small (it has to be smaller than 10%: Fig- intermediate loads.
ure 12 shows that replication always worsens performance Note also that the performance improvement we achieve
beyond 10% load). falls as the delay-bandwidth product increases. This is be-
cause our gains come from the reduction in queuing delay
2.4 Application: replication in the network when the replicated packets follow an alternate, less con-
gested, route. At higher delay-bandwidth products, queue-
Replication has always added a non-zero amount of over-
ing delay makes up a smaller proportion of the total flow
head in the systems we have considered so far (even if that
[Figure 14 plots omitted: (a) % improvement in median flow completion time vs. total load for three link speed/delay settings (5 Gbps with 2 us per hop, 10 Gbps with 2 us per hop, 10 Gbps with 6 us per hop), (b) 99th percentile flow completion time vs. total load, and (c) flow completion time CDF at load 0.4, with and without replication.]

Figure 14: Median and tail completion times for flows smaller than 10 KB

completion time, meaning that the total latency savings servers and the clients against the economic value of the
achieved is correspondingly smaller. At 40% network load, latency improvement that would be achieved. In our eval-
we obtain a 38% improvement in median flow completion uation we find that the latency improvement achieved by
time (0.29 ms vs. 0.18 ms) when we use 5 Gbps links with redundancy is orders of magnitude larger than the required
2 us per-hop delay. The improvement falls to 33% (0.15 ms threshold in both the applications we consider here.
vs. 0.10 ms) with 10 Gbps links with 2 us per-hop delay,
and further to 19% (0.21 ms vs. 0.17 ms) with 10 Gbps links 3.1 Application: Connection establishment
with 6 us per-hop delay. We start with a simple example, demonstrating why repli-
Next, Figure 14(b) shows the 99th percentile flow comple- cation should be cost-effective even when the available choices
tion times for one particular delay-bandwidth combination. In general, we see a 10-20% reduction in the flow completion times, but at 70-80% load the improvement spikes to 80-90%. The reason turns out to be timeout avoidance: at these load levels, the 99th percentile unreplicated flow faces a timeout, and thus has a completion time greater than the TCP minRTO, 10 ms. With redundancy, the number of flows that face timeouts drops significantly, causing the 99th percentile flow completion time to be much smaller than 10 ms.

At loads higher than 80%, however, the number of flows facing timeouts is high even with redundancy, resulting in a narrowing of the performance gap.

Finally, Figure 14(c) shows a CDF of the flow completion times at one particular load level. Note that the improvement in the mean and median is much larger than that in the tail. We believe this is because the high latencies in the tail occur at those instants of high congestion when most of the links along the flow's default path are congested. Therefore, the replicated packets, which likely traverse some of the same links, do not fare significantly better.

Replication has a negligible impact on the elephant flows: it improved the mean completion time for flows larger than 1 MB by a statistically insignificant 0.12%.

3. INDIVIDUAL VIEW

The model and experiments of the previous section indicated that, in a range of scenarios, replication is the best way to optimize latency within a fixed set of system resources. However, settings such as the wide-area Internet are better modeled as having elastic resources: individual participants can selfishly choose whether to replicate an operation, but this incurs an additional cost (such as bandwidth usage or battery consumption). In this section, we present two examples of wide-area Internet applications in which replication achieves a substantial improvement in latency. We argue that the latency reduction in both these applications outweighs the cost of the added overhead by comparing against a benchmark that we develop in a companion article [29]. The benchmark establishes a cost-effectiveness threshold by comparing the cost of the extra overhead induced at the servers and in the network against the value to users of the corresponding reduction in latency.

3.1 Application: TCP connection establishment

Our options for direct experimentation are limited, so we use a back-of-the-envelope calculation to consider what happens when multiple copies of TCP-handshake packets are sent on the same path. It is obvious that this should help if all packet losses on the path are independent: in this case, sending two back-to-back copies of a packet would reduce the probability of it being lost from p to p². In practice, of course, back-to-back packet transmissions are likely to observe a correlated loss pattern, but Chan et al. [11] measured a significant reduction in loss probability despite this correlation. Sending back-to-back packet pairs between PlanetLab hosts, they found that the average probability of individual packet loss was ≈ 0.0048, while the probability of both packets in a back-to-back pair being dropped was only ≈ 0.0007 – much larger than the ∼ 2 × 10⁻⁵ that would be expected if the losses were independent, but still 7× lower than the individual packet loss rate. (It might be possible to do even better by spacing the transmissions of the two packets in the pair a few milliseconds apart to reduce the correlation.)

As a concrete example, we quantify the effect this loss-rate reduction would have on the time required to complete a TCP handshake. The three packets in the handshake are ideal candidates for replication: they make up an insignificant fraction of the total traffic in the network, and there is a high penalty associated with their being lost (Linux and Windows use a 3 second initial timeout for SYN packets; OS X uses 1 second [12]). We use the loss probability statistics discussed above to estimate the expected latency savings on each handshake.

We consider an idealized network model. Whenever a packet is sent on the network, we assume it is delivered successfully after RTT/2 seconds with probability 1 − p, and lost with probability p. Packet deliveries are assumed to be independent of each other. p is 0.0048 when sending one copy of each packet, and 0.0007 when sending two copies of each packet. We also assume TCP behavior as in the Linux kernel: an initial timeout of 3 seconds for SYN and SYN-ACK packets and of 3 × RTT for ACK packets, and exponential backoff on packet loss [12].
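The following minimal Python sketch simulates this idealized model and compares the expected handshake completion time with one copy of each packet (p = 0.0048) against two back-to-back copies (p = 0.0007). The 50 ms RTT, the trial count, and all function names are illustrative assumptions rather than values taken from the measurement study.

import random

def packet_delay(p, rtt, initial_timeout):
    # Time from the first transmission of a packet until some copy is delivered:
    # each transmission is lost independently with probability p, otherwise it
    # arrives after RTT/2; the sender retransmits on timeout with exponential
    # backoff, as in the idealized model described above.
    elapsed, timeout = 0.0, initial_timeout
    while random.random() < p:      # this transmission was lost
        elapsed += timeout          # wait out the timeout, then retransmit
        timeout *= 2                # exponential backoff
    return elapsed + rtt / 2        # the successful copy arrives after RTT/2

def handshake_time(p, rtt):
    # SYN, SYN-ACK and ACK delivered in sequence, with a 3 s initial timeout
    # for SYN and SYN-ACK and 3*RTT for the final ACK (Linux-like behavior).
    return (packet_delay(p, rtt, 3.0) +
            packet_delay(p, rtt, 3.0) +
            packet_delay(p, rtt, 3 * rtt))

def mean_handshake_time(p, rtt, trials=500_000):
    return sum(handshake_time(p, rtt) for _ in range(trials)) / trials

if __name__ == "__main__":
    rtt = 0.05                                      # assumed 50 ms RTT
    single = mean_handshake_time(0.0048, rtt)       # one copy of each packet
    duplicated = mean_handshake_time(0.0007, rtt)   # back-to-back duplicates
    print(f"expected saving per handshake: {(single - duplicated) * 1e3:.1f} ms")

To first order, the expected saving is simply the sum of the three initial timeouts multiplied by the reduction in loss probability, which is the calculation behind the figure quoted next; with the parameters above the simulated estimate comes out close to 25 ms.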
With this model, it can be shown that duplicating all three packets in the handshake would reduce its expected completion time by approximately (3 + 3 + 3 × RTT) × (4.8 − 0.7) ms, which is at least 25 ms. The benefit increases with RTT, and is even higher in the tail: duplication would improve the 99.9th percentile handshake completion time by at least 880 ms.

Is this improvement worth the cost of the added traffic? Qualitatively, even 25 ms is significant relative to the size of the handshake packets. Quantitatively, a cost-benefit analysis is difficult since it depends on estimating and relating the direct and indirect costs of added traffic and the value to humans of lower latency. While an accurate comparison is likely quite difficult, the study referenced at the beginning of this section [29, 30] estimated these values using the pricing of cloud services, which encompasses a broad range of costs, including those for bandwidth, energy consumption, server utilization, and network operations staff, and concluded that in a broad class of cases, reducing latency is useful as long as it improves latency by 16 ms for every KB of extra traffic. In comparison, the latency savings we obtain in TCP connection establishment is more than an order of magnitude larger than this threshold in the mean, and more than two orders of magnitude larger in the tail. Specifically, if we assume each packet is 50 bytes long, then a 25-880 ms improvement implies a savings of around 170-6000 ms/KB. We caution, however, that the analysis of [29, 30] was necessarily imprecise; a more rigorous study would be an interesting avenue of future work.
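As a quick check of this figure, under the stated assumption of 50-byte packets, duplicating the three handshake packets once adds roughly 3 × 50 B = 150 B of traffic per connection, so

\[
\frac{25\ \text{ms}}{150/1024\ \text{KB}} \approx 170\ \text{ms/KB}
\qquad \text{and} \qquad
\frac{880\ \text{ms}}{150/1024\ \text{KB}} \approx 6000\ \text{ms/KB},
\]

both comfortably above the 16 ms/KB break-even threshold.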
3.2 Application: DNS

An ideal candidate for replication is a service that involves small operations and that is replicated at multiple locations, thus providing diversity across network paths and servers, so that replicated operations are largely independent. We believe opportunities to replicate queries to such services may arise both in the wide area and in the data center. Here, we explore the case of replicating DNS queries.

We began with a list of 10 DNS servers (the default local DNS server, plus public servers from Level3, Google, Comodo, OpenDNS, DNS Advantage, Norton DNS, ScrubIT, OpenNIC, and SmartViper) and Alexa.com's list of the top 1 million website names. At each of 15 PlanetLab nodes across the continental US, we ran a two-stage experiment: (1) rank all 10 DNS servers in terms of mean response time, by repeatedly querying a random name at a random server (this ranking is specific to each PlanetLab node); (2) repeatedly pick a random name and perform one of 20 possible trials chosen at random — either querying one of the ten individual DNS servers, or querying anywhere from 1 to 10 of the best servers in parallel (e.g., if sending 3 copies of the query, we send them to the top 3 DNS servers in the ranked list). In each of the two stages, we performed one trial every 5 seconds, and we ran each stage for about a week at each of the 15 nodes. Any query that took more than 2 seconds was treated as lost, and counted as 2 sec when calculating the mean response time.

Figure 15 shows the distribution of query response times across all the PlanetLab nodes. The improvement is substantial, especially in the tail: querying 10 DNS servers reduces the fraction of queries later than 500 ms by 6.5×, and the fraction later than 1.5 sec by 50×. Averaging over all PlanetLab nodes, Figure 16 shows the average percent reduction in response times compared to the best fixed DNS server identified in stage 1. We obtain a substantial reduction with just 2 DNS servers in all metrics, improving to a 50-62% reduction with 10 servers. Finally, we compared performance to the best single server in retrospect, i.e., the server with the minimum mean response time for the queries to individual servers in stage 2 of the experiment, since the best server may change over time. Even compared with this stringent baseline, we found a result similar to Figure 16, with a reduction of 44-57% in the metrics when querying 10 DNS servers.

Figure 15: DNS response time distribution (fraction of queries later than a given response time threshold, 0-2 s, for 1, 2, 5 and 10 servers).

Figure 16: Reduction in DNS response time, averaged across 15 PlanetLab servers (% latency reduction in the mean, median, 95th and 99th percentiles versus the number of copies of each query, 2-10).

Figure 17: Incremental latency improvement from each extra server contacted (latency savings in ms/KB for the mean and 99th percentile versus the number of DNS servers, 2-10, shown against the break-even point).
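The client-side mechanism measured in these trials (send the same query to the k highest-ranked resolvers in parallel and keep whichever answer arrives first) is straightforward to implement. The sketch below is a minimal illustration in Python using the third-party dnspython package; the resolver addresses, function names, and error handling are assumptions for illustration and not part of the experimental harness, though the 2-second timeout mirrors the cutoff used above.

from concurrent.futures import ThreadPoolExecutor, as_completed
import dns.resolver   # third-party "dnspython" package (assumed available)

RESOLVERS = ["8.8.8.8", "208.67.222.222", "156.154.70.1"]   # illustrative ranked list

def query_one(server, name, timeout=2.0):
    # Query a single resolver, treating anything slower than 2 s as lost.
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [server]
    resolver.lifetime = timeout
    return resolver.resolve(name, "A")

def replicated_query(name, k=3):
    # Send the same query to the top-k resolvers and return the first answer.
    pool = ThreadPoolExecutor(max_workers=k)
    futures = [pool.submit(query_one, server, name) for server in RESOLVERS[:k]]
    try:
        for future in as_completed(futures):
            try:
                return future.result()     # first successful response wins
            except Exception:
                continue                   # that resolver failed or timed out
        raise RuntimeError("all replicated queries failed")
    finally:
        pool.shutdown(wait=False)          # do not wait for the slower copies

if __name__ == "__main__":
    answer = replicated_query("example.com", k=3)
    print([record.address for record in answer])

Issuing the copies from separate threads keeps the latency of the first response independent of the slower resolvers; the remaining in-flight queries are simply abandoned.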
How many servers should one use? Figure 17 compares the marginal increase in latency savings from each extra server against the 16 ms/KB benchmark [29, 30] discussed earlier in this section. The results show that the answer depends on the metric we care about. If we are only concerned with mean performance, it does not make economic sense to contact any more than 5 DNS servers for each query, but if we care about the 99th percentile, then it is always useful to contact 10 or more DNS servers for every query. Note also that the absolute (as opposed to the marginal) latency savings are still worthwhile, even in the mean, if we contact 10 DNS servers for every query: the absolute mean latency savings from sending 10 copies of every query is 0.1 sec / 4500 extra bytes ≈ 23 ms/KB, which is more than twice the break-even latency savings. And if the client costs are based on DSL rather than cell service, the above schemes are all more than 100× more cost-effective.

Querying multiple servers also increases caching, a side benefit which would be interesting to quantify.

Prefetching — that is, preemptively initiating DNS lookups for all links on the current web page — makes a similar tradeoff of increasing load to reduce latency, and its use is widespread in web browsers. Note, however, that redundancy is complementary to prefetching, since some names in a page will not have been present on the previous page (or there may not be a previous page).

4. RELATED WORK

Replication is used pervasively to improve reliability, and in many systems to reduce latency. Distributed job execution frameworks, for example, have used task replication to improve response time, both preemptively [4, 15] and to mitigate the impact of stragglers [32].

Within networking, replication has been explored to reduce latency in several specialized settings, including replicating DHT queries to multiple servers [22] and replicating transmissions (via erasure coding) to reduce delivery time and loss probability in delay-tolerant networks [21, 27]. Replication has also been suggested as a way of providing QoS prioritization and improving latency and loss performance in networks capable of redundancy elimination [19].

Dean and Barroso [13] discussed Google's use of redundancy in various systems, including a storage service similar to the one we evaluated in §2.2, but they studied specific systems with capabilities that are not necessarily available in general (such as the ability to cancel outstanding partially-completed requests), and did not consider the effect that total system utilization could have on the efficacy of redundancy. In contrast, we thoroughly evaluate the effect of redundancy at a range of loads, both in various configurations of a deployed system (§2.2, §2.3) and in a large space of synthetic scenarios in an abstract system model (§2.1).

The MONET system of Andersen et al. [5] proxies web traffic through an overlay network formed out of multi-homed proxy servers. While the primary focus of [5] is on adapting quickly to changes in path performance, it replicates two specific subsets of traffic: connection establishment requests are sent to multiple servers in parallel (and the first one to respond is used), and DNS queries are replicated to the local DNS server on each of the multi-homed proxy server's interfaces. We show that replication can be useful in both these contexts even in the absence of path diversity: a significant performance benefit can be obtained by sending multiple copies of TCP SYNs to the same server on the same path, and by replicating DNS queries to multiple public servers over the same access link.

In a recent workshop paper [30] we advocated using redundancy to reduce latency, but it was preliminary work that did not characterize when redundancy is helpful, and did not study the systems view of optimizing a fixed set of resources.

Most importantly, unlike all of the above work, our goal is to demonstrate the power of redundancy as a general technique. We do this by providing a characterization of when it is (and is not) useful, and by quantifying the performance improvement it offers in several use cases where it is applicable.

5. CONCLUSION

We studied an abstract characterization of the tradeoff between the latency reduction achieved by redundancy and the cost of the overhead it induces to demonstrate that redundancy should have a net positive impact in a large class of systems. We then confirmed empirically that redundancy offers a significant benefit in a number of practical applications, both in the wide area and in the data center. We believe our results demonstrate that redundancy is a powerful technique that should be used much more commonly in networked systems than it currently is. Our results will also guide the judicious application of redundancy to only those cases where it is a win in terms of performance or cost-effectiveness.

Acknowledgements

We would like to thank our shepherd Sem Borst and the anonymous reviewers for their valuable suggestions. We gratefully acknowledge the support of NSF grants 1050146, 1149895, 1117161 and 1040838.

6. REFERENCES

[1] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat. Hedera: dynamic flow scheduling for data center networks. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, NSDI'10, pages 19–19, Berkeley, CA, USA, 2010. USENIX Association.

[2] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data center TCP (DCTCP). In SIGCOMM, 2010.

[3] M. Alizadeh, S. Yang, S. Katti, N. McKeown, B. Prabhakar, and S. Shenker. Deconstructing datacenter packet transport. In Proceedings of the 11th ACM Workshop on Hot Topics in Networks, HotNets-XI, pages 133–138, New York, NY, USA, 2012. ACM.

[4] G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica. Why let resources idle? Aggressive cloning of jobs with Dolly. In USENIX HotCloud, 2012.

[5] D. G. Andersen, H. Balakrishnan, M. F. Kaashoek, and R. N. Rao. Improving web availability for clients with MONET. In USENIX NSDI, pages 115–128, Berkeley, CA, USA, 2005. USENIX Association.

[6] S. Asmussen. Applied Probability and Queues. Wiley, 1987.

[7] D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel. Finding a needle in Haystack: Facebook's photo storage. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI'10, pages 1–8, Berkeley, CA, USA, 2010. USENIX Association.
[8] T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, pages 267–280, New York, NY, USA, 2010. ACM.

[9] J. Brutlag. Speed matters for Google web search, June 2009. http://services.google.com/fh/files/blogs/google_delayexp.pdf.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] E. W. Chan, X. Luo, W. Li, W. W. Fok, and R. K. Chang. Measurement of loss pairs in network paths. In IMC, pages 88–101, New York, NY, USA, 2010. ACM.

[12] J. Chu. Tuning TCP parameters for the 21st century. http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf, July 2009.

[13] J. Dean and L. A. Barroso. The tail at scale. Commun. ACM, 56(2):74–80, Feb. 2013.

[14] P. Dixon. Shopzilla site redesign – we get what we measure, June 2009. http://www.slideshare.net/shopzilla/shopzillas-you-get-what-you-measure-velocity-2009.

[15] C. C. Foster and E. M. Riseman. Percolation of code to enhance parallel dispatching and execution. IEEE Trans. Comput., 21(12):1411–1415, Dec. 1972.

[16] Google AppEngine datastore: memcached cache. https://developers.google.com/appengine/docs/python/memcache/usingmemcache#Pattern.

[17] W. Gray and D. Boehm-Davis. Milliseconds matter: An introduction to microstrategies and to their use in describing and predicting interactive behavior. Journal of Experimental Psychology: Applied, 6(4):322, 2000.

[18] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: a scalable and flexible data center network. In ACM SIGCOMM, pages 51–62, New York, NY, USA, 2009. ACM.

[19] D. Han, A. Anand, A. Akella, and S. Seshan. RPT: re-architecting loss protection for content-aware networks. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI'12, pages 6–6, Berkeley, CA, USA, 2012. USENIX Association.

[20] C. Hopps. Computing TCP's retransmission timer (RFC 6298), 2000.

[21] S. Jain, M. Demmer, R. Patra, and K. Fall. Using redundancy to cope with failures in a delay tolerant network. In ACM SIGCOMM, 2005.

[22] J. Li, J. Stribling, R. Morris, and M. Kaashoek. Bandwidth-efficient management of DHT routing tables. In NSDI, 2005.

[23] D. S. Myers and M. K. Vernon. Estimating queue length distributions for queues with random arrivals. SIGMETRICS Perform. Eval. Rev., 40(3):77–79, Jan. 2012.

[24] M. Olvera-Cravioto, J. Blanchet, and P. Glynn. On the transition from heavy-traffic to heavy-tails for the M/G/1 queue: The regularly varying case. Annals of Applied Probability, 21:645–668, 2011.

[25] S. Ramachandran. Web metrics: Size and number of resources, May 2010. https://developers.google.com/speed/articles/web-metrics.

[26] K. Sigman. Appendix: A primer on heavy-tailed distributions. Queueing Systems, 33(1-3):261–275, 1999.

[27] E. Soljanin. Reducing delay with coding in (mobile) multi-agent information transfer. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 1428–1433. IEEE, 2010.

[28] S. Souders. Velocity and the bottom line. http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html.

[29] A. Vulimiri, P. B. Godfrey, and S. Shenker. A cost-benefit analysis of low latency via added utilization, June 2013. http://web.engr.illinois.edu/~vulimir1/benchmark.pdf.

[30] A. Vulimiri, O. Michel, P. B. Godfrey, and S. Shenker. More is less: Reducing latency via redundancy. In Eleventh ACM Workshop on Hot Topics in Networks (HotNets-XI), October 2012.

[31] X. Wu and X. Yang. DARD: Distributed adaptive routing for datacenter networks. In Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems, ICDCS '12, pages 32–41, Washington, DC, USA, 2012. IEEE Computer Society.

[32] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In USENIX OSDI, pages 29–42, Berkeley, CA, USA, 2008.

[33] A. P. Zwart. Queueing Systems With Heavy Tails. PhD thesis, Technische Universiteit Eindhoven, September 2001.

CURRICULUM VITAE

Mr. P. Venketesh was born in Coimbatore on 11th April 1979. He completed his schooling at G.D. Matriculation School, Coimbatore, in 1996.

He received his B.Sc. Computer Technology degree from P.S.G. College of Technology, Coimbatore, in May 1999. He obtained his M.Sc. Computer Technology degree from the same institution in May 2001. He also obtained his M.S. (By Research) degree from Anna University, Chennai, in April 2008. He is currently working as Assistant Professor (Senior Grade) in the Department of Computer and Information Sciences, PSG College of Technology, Coimbatore.

He is a life member of ISTE, ACCS and ISSS. His areas of interest include Content Distribution Networks, Web Caching and Prefetching, Distributed Computing and Web Mining.
