
NEW APPROACHES IN WEB PREFETCHING TO

IMPROVE CONTENT ACCESS BY END-USERS

A THESIS

Submitted by

VENKETESH P

in partial fulfilment of the requirements for the award of the degree


of
DOCTOR OF PHILOSOPHY

FACULTY OF SCIENCE AND HUMANITIES


ANNA UNIVERSITY
CHENNAI 600 025

SEPTEMBER 2013

ABSTRACT

The rapid growth of the Internet, with an enormous number of users and web services, constantly demands good infrastructure to deliver web content to users with minimal delay. The tremendous increase in global traffic caused by demands from a large number of users strains servers and the network, resulting in poor quality of service (availability, reliability) and high latency perceived by users. Web caching and prefetching provide effective mechanisms to mitigate user-perceived latency. This thesis studies existing prefetching mechanisms and proposes new approaches for effective prefetching in the web environment. The goal of web prefetching is to download (prefetch) content and store it in a local cache before the user actually requests it, thereby minimizing the latency perceived by users when accessing the content.

The thesis primarily focuses on two key aspects:

· Methods to generate predictions that improve the prefetching activity

· A mechanism to effectively manage the contents of the cache (regular and prefetch) by designing a cache replacement scheme

Web predictions can be generated at the server, proxy or client using a variety of information, depending on where they are implemented. Server-based predictions consider the access history of several users stored in a log file, while client-based predictions consider the contents of the web pages accessed by a user.

The major contributions of this research work are as follows:

The first part of the thesis focuses on improving client-based predictions by designing two approaches, Naïve Bayes and Fuzzy Logic, that use the hyperlinks accessed by users to generate predictions. The hypertext information associated with each hyperlink is used to compute its priority, and the hyperlinks are then sorted (highest to lowest) to create a prediction (hint) list. Both the prediction and prefetching engines are implemented on the client machine, focusing on the browsing behavior of a single user or multiple users.

The second part of the thesis discusses a prediction algorithm designed to build a Precedence Graph by analyzing user access patterns from log files. It considers the object URI and the referrer recorded for each request to derive a precedence relation, which is used to add arcs between the nodes of the graph. Predictions are then generated by analyzing the arcs in the graph.

Finally, the third part of the thesis discusses a cache replacement scheme that plays a significant role in effectively maintaining the contents of the cache to achieve a good hit rate. The client-side cache is partitioned into a regular cache and a prefetch cache to enhance the services of web caching and prefetching. The LRU algorithm is used to manage the contents of the prefetch cache, and a Fuzzy Inference System (FIS) based algorithm is used to manage the contents of the regular cache. When web objects available in the prefetch cache are frequently accessed by the user, they are moved to the regular cache for an extended presence to serve user requests.

The performance of the proposed approaches has been evaluated using the recall and precision metrics. Experimental results indicate the effectiveness of the proposed approaches over existing schemes.



TABLE OF CONTENTS

CHAPTER No. TITLE PAGE No.

ABSTRACT v

LIST OF TABLES xiii

LIST OF FIGURES xiv

LIST OF ABBREVIATIONS xvii

1. INTRODUCTION 1

1.1 WEB PREFETCHING 3


1.1.1 Server Based Prefetching 6
1.1.2 Client Based Prefetching 7
1.1.3 Proxy Based Prefetching 7
1.2 OBJECTIVES OF THE THESIS 9
1.3 ORGANIZATION OF THE THESIS 10

2. LITERATURE SURVEY 12

2.1 INTRODUCTION 12
2.2 CONTENT BASED PREDICTION 15
2.3 ACCESS PATTERN BASED PREDICTION 18
2.3.1 Graph Models 20
2.3.1.1 Dependency Graph 21
2.3.1.2 Double Dependency Graph 22
2.3.2 Markov Models 24
2.3.3 PPM Models 26
2.3.4 Web Mining Models 26

2.4 COMMERCIAL PRODUCTS 29


2.5 PERFORMANCE EVALUATION 30
2.6 PERFORMANCE METRICS 31
2.7 CACHE REPLACEMENT 34
2.7.1 Machine Learning Techniques 36
2.8 SUMMARY 37

3. HYPERLINK BASED WEB PREDICTION 39

3.1 INTRODUCTION 39
3.2 NAÏVE BAYES APPROACH 42
3.2.1 Prediction/Prefetch Procedure 43
3.2.2 Implementation 47
3.2.2.1 Prediction Engine 47
3.2.2.1.1 Tokenizer 47
3.2.2.1.2 User-Accessed repository 49
3.2.2.1.3 Computing priority value of Hyperlinks 51
3.2.2.1.4 Prediction List 54
3.2.2.2 Prefetching Engine 55
3.3 FUZZY LOGIC APPROACH 56
3.3.1 Prediction/Prefetch Procedure 56
3.3.2 Implementation 60
3.3.2.1 Prediction Engine 60
3.3.2.1.1 Predicted-Unused Repository 61

3.3.2.1.2 Computing priority value of Hyperlinks 62
3.3.2.1.3 Prediction List 64
3.3.2.2 Prefetching Engine 64
3.4 EVALUATION 65
3.4.1 Experimental Results 67
3.4.1.1 Naïve Bayes Approach 68
3.4.1.2 Fuzzy Logic Approach 71
3.5 CONCLUSION 76

4. PRECEDENCE GRAPH BASED WEB PREDICTION 77


4.1 INTRODUCTION 77
4.2 PRECEDENCE GRAPH 80
4.2.1 Introduction 80
4.2.2 Building the Graph 82
4.2.3 Updating the Graph 84
4.2.4 Predictions from the Graph 86
4.2.5 Prefetching the web objects 88
4.2.6 Implementation Example 91
4.3 GRAPH TRIMMING 94
4.3.1 Invoking Trimming operation 95
4.3.2 Trimming Algorithm 98
4.4 EXPERIMENTAL ENVIRONMENT 104
4.4.1 Experimental Setup 104
4.4.2 Training Data 105
4.4.2.1 Preprocessing Log Files 107
4.5 RESULTS 107

4.6 CONCLUSION 113

5. CACHE REPLACEMENT SCHEME TO ENHANCE WEB PREFETCHING 115

5.1 INTRODUCTION 115
5.2 CACHE REPLACEMENT- OVERVIEW 117
5.3 FUZZY INFERENCE SYSTEM 120
5.3.1 Membership Function 121
5.3.2 Fuzzy Rules 126
5.3.3 DeFuzzification 127
5.4 PROPOSED FRAMEWORK 129
5.4.1 Fuzzy System – Input /Output 133
5.4.2 Managing Regular Cache 135
5.5 IMPLEMENTATION 137
5.5.1 Training Data 137
5.5.2 Data Preprocessing 138
5.6 PERFORMANCE EVALUATION 142
5.6.1 Performance Metrics 142
5.6.2 Experimental Results 143
5.7 CONCLUSION 147
6. CONCLUSION 149

6.1 SUGGESTIONS FOR FUTURE WORK 151

REFERENCES 153

LIST OF PUBLICATIONS 163

CURRICULUM VITAE 164



LIST OF TABLES

TABLE No. TITLE PAGE No.

3.1 User-Accessed Repository 50

4.1 Sample user requests in a session 90

4.2 Hints generated for user requests 93

4.3 Notations used in Trimming algorithm 95

4.4 Important fields in a log file entry 106

5.1 Input Parameters to FIS 133

5.2 Symbols used with their meanings 134

5.3 Preprocessed data from the log file 140

5.4 Training data created from preprocessed file 141



LIST OF FIGURES

FIGURE No. TITLE PAGE No.

1.1 Prefetching Web pages between user requests 4

1.2 Server based Prefetching 6

1.3 Client based Prefetching 7

1.4 Proxy based Prefetching 8

2.1 Page access without Prefetching 13

2.2 Page access with Prefetching 13

2.3 Dependency Graph (window size = 2) 22

2.4 Double Dependency Graph (window size = 2) 23

3.1 Prediction and Prefetching – Naïve Bayes Approach 45

3.2 Some commonly used Stop Words 49

3.3 Prediction List for Prefetching 54

3.4 Fuzzy based Prediction and Prefetching 58

3.5 Prediction List based on Fuzzy Computations 64

3.6 Pages Predicted and Prefetched in a User Session 69

3.7 Recall in Naïve Bayes Approach 70

3.8 Precision in Naïve Bayes Approach 71

3.9 Recall in Fuzzy Logic Approach 72



3.10 Precision in Fuzzy Logic Approach 73

3.11 Comparison of Recall in various approaches 74

3.12 Comparison of Precision in various approaches 75

4.1 Browser & Server interaction with Prediction and Prefetching 79

4.2 Precedence Graph 84

4.3 Sample HTTP header with referer information 88

4.4 Sample HTTP response with link to be prefetched 89

4.5 Precedence Graph built using the user requests 91

4.6 Adjacency Map for the Precedence Graph 92

4.7 Precedence Graph before Trimming 101

4.8 Precedence Graph after Trimming 104

4.9 Recall for user requests with different Thresholds 109

4.10 Precision for user requests with different Thresholds 110

4.11 Number of Arcs in different Graphs 111

4.12 Number of Nodes in PG with/without Trimming 112

4.13 Number of Arcs in PG with/without Trimming 113

5.1 Framework of Fuzzy Inference System 121



5.2 Membership Functions for Recency 123

5.3 Membership Functions for Frequency 124

5.4 Membership Functions for Delay Time 125

5.5 Membership Functions for Object Size 125

5.6 Methods to perform Defuzzification 128

5.7 Framework for managing regular/prefetch cache 129

5.8 Workflow of caching/ prefetching system 131

5.9 Sample Log File of a client used for preprocessing 138

5.10 Hit Ratio using traces of Group-A (user 1 to 5) 144

5.11 Hit Ratio using traces of Group-B (user 6 to 10) 145

5.12 Hit Ratio using traces of Group-C (user 11 to 15) 145

5.13 Byte Hit Ratio using traces of Group-A (user 1 to 5) 146

5.14 Byte Hit Ratio using traces of Group-B (user 6 to 10) 146

5.15 Byte Hit Ratio using traces of Group-C (user 11 to 15) 147

LIST OF ABBREVIATIONS

CDN - Content Distribution Networks

DDG - Double Dependency Graph

DG - Dependency Graph

HTML - Hyper Text Markup Language

HTTP - Hyper Text Transfer Protocol

PG - Precedence Graph

PPM - Prediction by Partial Match

RG - Referrer Graph

RTT - Round Trip Time

URI - Uniform Resource Identifier

URL - Uniform Resource Locator



CHAPTER 1

INTRODUCTION

The enormous growth of the World Wide Web and the development of new applications and services have dramatically increased the number of users, resulting in increased global traffic that degrades the Quality of Service (availability, reliability, security) and the latency perceived by users. Tackling web access latency remains a challenge for researchers even with the availability of high-bandwidth connections, fast processors and large amounts of storage space. Downloading a web object from a server involves two main components: a) the time taken by the request to reach the server plus the time taken by the response to reach the client (i.e. RTT, Round Trip Time) and b) the object transfer time (which depends on the bandwidth between client and server). Web caching improves web performance by reducing user-perceived latency through the use of local storage (cache) and effective management of the web objects stored in the cache. The limitations of web caching are addressed by the web prefetching technique, which fetches web documents from the server even before the user actually requests them. Web prefetching has therefore been proposed as a complementary mechanism to web caching.
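As a rough illustration of the two components above (not from the thesis; the parameter values are hypothetical), the following Python sketch estimates the download latency of a single object as the RTT plus the transfer time:

```python
# Minimal sketch: download latency ~= RTT + object transfer time.
def estimated_download_latency(rtt_s: float, object_size_bytes: int,
                               bandwidth_bps: float) -> float:
    """rtt_s: round trip time in seconds (request travel + response travel);
    object_size_bytes: size of the web object; bandwidth_bps: available
    bandwidth in bits per second."""
    transfer_time_s = (object_size_bytes * 8) / bandwidth_bps
    return rtt_s + transfer_time_s

# Example: a 200 KB page over a 2 Mbps link with an 80 ms RTT takes ~0.9 s.
print(estimated_download_latency(0.080, 200 * 1024, 2_000_000))
```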

A web request is a reference made through the HTTP protocol to a web object that is identified by its Uniform Resource Identifier (URI). A web object is a term used for all possible objects (HTML pages, images, videos, etc.) that can be transferred using the HTTP protocol and stored in a cache.

The locality of reference exhibited in user access patterns on the web has three important properties (Bestavros 1997): temporal, geographical and spatial. Web caching benefits from temporal locality, where recently referenced objects are likely to be accessed again in the near future. Replication techniques (such as CDNs, Content Distribution Networks) benefit from geographical locality, where objects referenced by a user are likely to be accessed by other users in the same geographical area. Spatial locality indicates that objects close to the currently accessed object have a high probability of being accessed in the near future. Web prefetching exploits the spatial locality property to predict future requests.

Web applications today focus increasingly on providing a personalized browsing experience to users. The web latency perceived by a user when accessing web pages is the time interval between issuing a request for a web page and the actual display of the page in the browser window. Web prefetching is a widely used technique to reduce user-perceived latency by exploiting the spatial locality inherent in the user's accesses to web objects. The prefetching mechanism implements a prediction algorithm that processes a variety of user information to generate predictions (a hint list), which is used to prefetch (download) web objects and store them in the cache before they are actually requested by the user. The benefits of prefetching were constrained in the past by the limited bandwidth available to users, since prefetching can increase network traffic when its predictions are not accurate enough. With the vast improvement in bandwidth availability, prefetching now has new opportunities to improve web performance at a reasonable cost.

1.1 WEB PREFETCHING

Web prefetching provides an effective solution to mitigate user-perceived latency. Prefetching can be termed a proactive caching scheme, since it caches web pages prior to receiving requests for those pages from users. The implementation of web prefetching in the basic web architecture requires two main components: a prediction engine and a prefetching engine. The prediction engine implements an algorithm to predict the user's next request and provides these predictions as hints to the prefetching engine. The prefetching engine, on receiving the hints, decides to prefetch (download) the web objects when it has available bandwidth and idle time. Prefetching web objects in advance reduces the latency perceived by users when these objects are actually requested. Prediction algorithms are used in various domains such as recommendation systems, e-commerce, content personalization, web prefetching and cache prevalidation. They must generate accurate predictions to avoid degrading system performance, since prefetching mispredicted web objects wastes client, server and network resources.



The prediction and prefetching engines can be located in any part of the web architecture: client, proxy or server. The prefetching engine acts independently of the prediction engine and can be placed in any element of the web architecture that receives the hint list. The common trend is to place the prefetching engine at the client to effectively reduce user-perceived latency. Commercial products (Mozilla Firefox, Google Web Accelerator) perform client-side prefetching to maximize performance. Prefetched objects are stored in the cache until they are demanded by the user or evicted from the cache. The system should predict and prefetch only web objects that can be stored in the cache. Prefetching algorithms should be designed carefully to avoid adverse effects on the network architecture and its performance. Aggressive prefetching can lower the actual cache hit rate by prefetching useless web objects and storing them in the cache, thereby evicting useful objects.

Figure 1.1 Prefetching web pages between user requests



Figure 1.1 represents the prefetching of web pages during the idle time between user requests. The value of ‘N’ depends on the page view time (VT): if the user spends more time on a page, more pages can be prefetched; otherwise only a minimal number of pages are prefetched. A web resource is prefetched if it is cacheable and its retrieval is safe for the system. Web objects that can be prefetched include documents such as jpg, gif, png, html, htm, asp, php and pdf files. A minimal sketch of such an idle-time prefetching loop is given below.
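The following Python sketch is illustrative only (not the thesis implementation); the hint list, cache interface, fetch function and average fetch time are assumptions used to show how ‘N’ can be bounded by the view time and how cacheability is checked by extension:

```python
from urllib.parse import urlparse

# Document types considered prefetchable in the text above.
PREFETCHABLE_EXTENSIONS = {".jpg", ".gif", ".png", ".html", ".htm",
                           ".asp", ".php", ".pdf"}

def is_prefetchable(url: str) -> bool:
    """A URL is treated as prefetchable if its path ends with a cacheable
    document extension."""
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in PREFETCHABLE_EXTENSIONS)

def prefetch_during_idle_time(hint_list, cache, fetch, view_time_s,
                              avg_fetch_time_s=0.5):
    """Prefetch at most N hinted objects, where N is bounded by the page
    view time (browser idle time)."""
    n = max(0, int(view_time_s / avg_fetch_time_s))   # the value of 'N'
    for url in hint_list[:n]:
        if url not in cache and is_prefetchable(url):
            cache[url] = fetch(url)    # store the prefetched object in cache
    return cache

# Usage (hypothetical): prefetch_during_idle_time(hints, {}, my_fetch, 5.0)
```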

A cached or prefetched page can be reused if it is valid at the time the user accesses it. To achieve good performance, it is important to minimize the impact of unwanted prefetching requests so that on-demand web page requests from users can still be served. Web prefetching is used to enhance several web services, such as: a) access to static and dynamically generated web objects, b) web search engines and c) Content Distribution Networks (CDN). Prefetching needs to address two important issues: a) which page to prefetch (the prediction problem) and b) when to prefetch a page (the timing problem).

The prediction approach requires in-depth knowledge about users' information needs as well as the contents of relevant web pages in order to achieve high predictive performance. Prediction algorithms are categorized based on the type of information used to generate the predictions. Depending on the location of the prediction engine in the web architecture, the amount and scope of information available to the algorithm varies. Prefetching mechanisms are accordingly classified into three types, based on the location of the prediction engine: server-based, client-based and proxy-based prefetching.

1.1.1 Server Based Prefetching

In this mechanism, the prediction engine is placed at the web server and the prefetching engine at the client, as shown in Figure 1.2. The prediction engine uses the access patterns of all users of a specific web server to generate the predictions. The prefetching engine receives the hints (predictions) from the web server and uses them to prefetch web objects during idle time. This approach has been widely explored in the literature because of its prediction accuracy and its potential use in real scenarios. The web server is able to observe every client access and provide accurate access information when proxies are not involved in the web access.

Figure 1.2 Server based Prefetching



1.1.2 Client Based Prefetching

This mechanism deploys both the prediction and prefetching engines at the client, as shown in Figure 1.3, where the prediction engine analyzes the navigational patterns of an individual user to generate the predictions. The prefetching engine, also located at the client, receives the hints (predictions) and uses them to prefetch web objects during idle time. The mechanism covers the usage behavior of a single user or a few users across different web servers. Since the client-based prediction model is built using individual client access information, it provides the opportunity to make predictions that are highly personalized and thus reflect the behavior patterns of the individual user.

Figure 1.3 Client based Prefetching

1.1.3 Proxy Based Prefetching

The proxy server that sits between the web server and the client holds both the prediction and prefetching engines to perform the following tasks: a) prefetch web objects on its own and store them in its cache, and b) provide hints to the client about the web objects it can prefetch in its idle time. The mechanism covers different users accessing different servers, taking advantage of multi-user, multi-server information to generate the predictions.

Proxy-based prefetching, as shown in Figure 1.4, provides advantages such as: a) it allows users of non-prefetching browsers to benefit from server-provided prefetch hints, and b) it can perform complex and precise predictions owing to the availability of information from several sources (clients and servers).

Figure 1.4 Proxy based Prefetching

A coordinated proxy-server prefetching mechanism effectively utilizes the access information available at both the proxy and the server to generate predictions and coordinate the prefetching activity. The access information available at proxies serves data prefetching for groups of clients sharing common surfing interests, while access information at the web server is used only when predictions cannot be generated using the information available at the proxies. Client- and proxy-side prefetching provide greater geographic and IP proximity to the client by separating caching from the HTTP server and placing it closer to the clients.

1.2 OBJECTIVES OF THE THESIS

The thesis aims to improve web prediction and prefetching to effectively mitigate user access latency. The contributions made in this thesis are:

· Designed methods that use the hypertext associated with hyperlinks to generate predictions. In the first method, predictions are generated using computations based on a Naïve Bayes classifier; in the second, using computations based on Fuzzy Logic. The performance of both approaches is compared with respect to the quality of the generated predictions.

· Designed a method to build a Precedence Graph (PG) by analyzing the user access requests stored in log files at the web server. Predictions are generated based on the information maintained in the graph.

· Designed a cache replacement scheme to manage the client-side cache, which is partitioned into two parts: a regular cache (for web caching) and a prefetch cache (for web prefetching).



1.3 ORGANIZATION OF THE THESIS

This thesis is organized as follows:

Chapter 2 discusses the related work carried out in web prefetching. It analyzes several web prediction algorithms from the literature that generate predictions based on web content and user access patterns, with a detailed discussion of the different techniques in each category. Cache replacement policies that improve caching and prefetching mechanisms are also discussed.

Chapter 3 discusses the use of hypertext to suggest a list of hyperlinks that can be prefetched during browser idle time. It presents two techniques, Naïve Bayes and Fuzzy Logic, for generating the predictions. In these schemes, both the prediction and prefetching engines are located at the client machine. The user's browsing behavior is monitored through the web browser and the predictions are generated based on it.

Chapter 4 introduces a new prediction algorithm that uses a Precedence Graph to generate the predictions. In this scheme, the prediction engine is located at the web server and the prefetching engine at the client machine. It uses the access patterns of users stored in log files at the server to build a Precedence Graph that is updated dynamically based on user requests. Graph trimming is performed periodically to manage the growth in graph size.



Chapter 5 discusses a cache replacement scheme designed to enhance the performance of web prefetching. The client-side cache is partitioned into two parts: a regular cache (to support web caching) and a prefetch cache (to support web prefetching). The contents of the regular cache are managed using a Fuzzy Inference System (FIS) based algorithm and the contents of the prefetch cache are managed using the LRU algorithm. Frequently accessed contents in the prefetch cache are moved to the regular cache to satisfy user requests more effectively.

Finally, Chapter 6 presents the conclusion and directions for future

research work.

CHAPTER 2

LITERATURE SURVEY

2.1 INTRODUCTION

A significant amount of research work has been carried out in the past to enhance the performance of web prefetching. Several techniques were designed for use at the client side, the server side, or a hybrid client/server setting to enhance the delivery of web pages to users. The browsing behavior of users has been analyzed to identify interests in specific domains to support services such as web personalization and prefetching. This chapter discusses various prediction algorithms found in the literature that support web prefetching in providing efficient service to clients. Prediction algorithms can be implemented in different parts of the web architecture: client, proxy and server. The algorithms are categorized based on the type of information used to generate the predictions. We discuss the algorithms by grouping them into two types: a) algorithms that make predictions by analyzing the content of recently visited web pages and b) algorithms that predict future accesses based on past user access patterns. The prediction algorithms discussed in the literature use different data structures and computational resources, with variations in prediction accuracy. To maintain acceptable prediction accuracy, most prefetching algorithms limit the number of pages to be prefetched to satisfy user requests.

Figure 2.1 Page access without Prefetching

Figure 2.2 Page access with Prefetching

Figures 2.1 and 2.2 represent page access without and with the prefetching mechanism. As shown in Figure 2.1, when a client requests page Pi it is retrieved from the server, which consumes time and results in a noticeable delay when the user accesses the page. In the case of page access with prefetching, Pi is prefetched in advance in anticipation that it will be accessed in the future. When the user requests Pi and it is in the cache, it is served with zero or negligible latency. This reflects the advantages of and need for applying prefetching in the web architecture. The main benefits of employing prefetching are that it prevents bandwidth underutilization and reduces user-perceived latency.

Web cache replacement is required to effectively utilize the storage space in the cache, which helps to satisfy a large number of user requests with minimal access latency using the web objects stored in the cache. We discuss several cache replacement schemes proposed in the literature for improving cache usage to satisfy user requests. The performance of web prefetching can be fine-tuned by using an efficient replacement algorithm to manage the prefetch cache in addition to the regular cache used for web caching. Prefetching is beneficial to the system only if the prefetched pages are actually requested by users before they become invalid or are purged from the cache; otherwise, resources are wasted on fetching unwanted pages from the server, which degrades overall system performance.

The prefetching mechanism can encounter two forms of interference, self-interference and cross-interference, which need to be handled effectively to obtain the benefits of prefetching. Self-interference occurs when prefetching hurts its own performance by interfering with demand requests; cross-interference occurs when prefetching hurts the performance of other applications on the client.



Cost function based prefetching approaches depend on factors such as the popularity and lifetime of web objects in deciding which objects to prefetch. Jiang et al (2002) suggested a Prefetch by Lifetime approach in which the ‘n’ objects with the longest lifetime are selected to minimize the bandwidth requirement. Object lifetime reflects the average time interval between consecutive updates to the object. A prefetching algorithm should favor objects with longer lifetimes, since they are the best candidates to minimize the extra bandwidth consumption required to keep objects residing in the cache up to date.

The long-term prefetching mechanism allows clients to subscribe to web objects in order to increase the cache hit rate and reduce user latency. The selection of objects to be prefetched is based on long-term characteristics such as access frequency, update interval and size. Web servers proactively ‘push’ fresh copies of subscribed objects into web caches whenever the objects are updated. Venkataramani et al (2002) suggested a good fetch approach that attempted to balance object popularity and object update rate to achieve a good hit rate improvement with minimal bandwidth.

2.2 CONTENT BASED PREDICTION

Content-based algorithms analyze information such as hyperlinks, the text surrounding the hyperlinks, and labels with metadata to generate the predictions. Anchor text is one of the major resources for obtaining the semantics of the target page. An automatic resource compilation system

(Chakrabarti et al 1998) performed analysis of text and links for determining the

web resources that are suitable for a particular topic. Davison (2000) conducted

an analysis that focused on examining the descriptive quality of web pages and

the presence of textual overlap in web pages. The text in and around the

hypertext anchors of selected web pages were used (Davison 2002) to determine

the user’s interest in accessing the web pages. Craswell et al (2001) indicated that

the anchor texts were highly useful in site finding based on the analysis of link

and content based ranking methods in finding the web sites. A framework that

used link analysis algorithm was designed (Chen et al 2002) to exploit the

explicit (hyperlinks embedded in web page) and implicit (imagined by end-users)

link structures.

The keyword-based semantic prefetching approach (Xu and Ibrahim

2004) used neural networks to predict the future requests based on semantic

preferences of past retrieved web documents. Topical locality assumes that pages connected by links are more likely to be about the same topic in which the user is interested. Pons (2006) proposed a methodology to prefetch web objects of slower

loading web pages by semantically bundling it with the faster loading web pages.

Semantic link prefetcher (Pons 2006a) was used to predict and prefetch the web

objects during the limited view time interval of web pages. A transparent and

speculative algorithm designed (Georgakis and Li 2006) for content based web

page prefetching indicate that the textual information in both the visited pages

and followed links were influential in determining the preferences of a user.



Eirinaki and Vazirgiannis (2005) presented a personalization algorithm

that combined usage data and link analysis techniques for ranking and

recommending the web pages to end user. Web pages of different categories

were analyzed (Chauhan and Sharma 2007) to suggest usage of cohesive and

non-cohesive text present near the anchor text for extracting information about

the target web page. Georgakis (2007) presented a client side algorithm that

learnt and predicted user requests based on user behavior profile that was built

using the user’s web surfing behavior. It used part-of-speech tagger to filter

useful user keywords. Tagging was used to identify the lexical or linguistic

category for individual words. Dutta et al (2009) proposed web page prediction

approach through linear regression that depended on the transition probability

and ranking of links in the current web page for prediction accuracy.

In Chapter 3, we discuss the generation of web predictions based on the content of the hypertext associated with hyperlinks. Naïve Bayes and Fuzzy Logic approaches are used to generate the predictions, and they proved effective in reducing access latency with low system complexity. Bayesian network classifiers are popular machine learning algorithms that have received considerable attention from scientists and engineers across fields such as medicine, military applications, forecasting, control, statistics and cognitive science. Naive Bayes, a simple Bayesian network, has been applied successfully in many domains and has gained popularity in solving various classification problems (Fan et al 2009).



In Naive Bayes, all attributes are assumed to be conditionally independent, and any correlation among them is ignored. It has been used extensively in applications such as email spam filtering, mining log files for system management, the semantic web, document ranking by text classification, and hierarchical text categorization. Naive Bayes classifiers are very fast and have very low storage requirements. They perform well in domains with many equally important features and act as a dependable baseline for text classification. Naive Bayes depends on probability estimates, called posterior probabilities, to assign a class to an observed pattern.
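As an illustration only (not the prediction engine of Chapter 3), the following minimal Python sketch shows how posterior scores with Laplace smoothing can assign a class to an observed token pattern, such as the anchor text of a hyperlink; the training data and class labels are hypothetical:

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: anchor-text tokens labelled by whether the
# user followed the link ("follow") or ignored it ("ignore").
training = [
    (["cricket", "score", "india"], "follow"),
    (["python", "tutorial"], "follow"),
    (["privacy", "policy"], "ignore"),
    (["terms", "of", "service"], "ignore"),
]

class_counts = Counter(label for _, label in training)
word_counts = defaultdict(Counter)            # per-class word frequencies
for tokens, label in training:
    word_counts[label].update(tokens)
vocab = {w for tokens, _ in training for w in tokens}

def posterior_scores(tokens):
    """log P(class) + sum of log P(token | class), with Laplace smoothing;
    the class with the highest score is assigned to the pattern."""
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        log_p = math.log(class_counts[label] / total)               # prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            log_p += math.log((word_counts[label][w] + 1) / denom)  # likelihood
        scores[label] = log_p
    return scores

print(posterior_scores(["python", "score"]))   # higher score => predicted class
```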

2.3 ACCESS PATTERN BASED PREDICTION

The path profiles of users stored in web logs provide useful

information for predicting the user’s future requests. Several techniques were

explored in the literature that predicted future requests based on the past

sequence of user requests. The information available in web access logs varies

depending on the format of logs and log data selections made by administrators.

Insufficient log data leads to inaccuracies in predictions. Page popularity ranking

is a log data analysis procedure that is used to determine the pages that are most

likely to be requested next by users.

The top-10 prefetching approach (Markatos and Chronaki 1998)

allowed servers to push popular documents to proxies at regular intervals based

on the client’s aggregated access profiles. Schechter et al (1998) built a sequence

prefix tree (path profile) based on the requests in server logs that used longest

matched most-frequent sequences to predict user’s next requests. A quantitative

model (Cooley et al 2000) based on support logic used information such as usage,

content and structure to automatically identify the interesting knowledge from

web access patterns. The n-gram model by Su et al (2000) compressed the prediction model so that it fits into main memory, with improved prediction accuracy and a moderate decrease in applicability. Web pages were clustered into

different categories based on their access patterns (Mukhopadhyay et al 2006).

Pages were categorized into levels based on their page rank and those pages at

the top levels had higher probability of being predicted and prefetched.

Fisher and Saksena (2004) designed a server-driven link prefetching

mechanism and implemented it in Mozilla web browser. It depended on the

origin server or intermediate proxy server to provide the list of web documents to

be prefetched. A coordinated proxy-server prefetching (Chen and Zhang 2005)

utilized the access information adaptively by managing prefetching at both proxy

and web servers. The access information stored in proxies served data

prefetching for clients sharing common surfing interests. Web server access

information was utilized for data objects that were not qualified for proxy based

prefetching.

The history based prefetching algorithm (Liu and Oba 2008) achieved

high prediction accuracy with limited memory by storing only the useful request

sequences and discarding those that will not yield useful predictions.

Dimopoulos et al (2010) modeled users’ navigation history and web page content

with weighted suffix trees to support web page usage prediction. Based on the

access time of user requests, the access sequences were partitioned into different

data blocks (Ban and Bao 2011). They used a decision method to select the

training data based on the prediction precision.

2.3.1 Graph Models

A server-side prefetching approach proposed by Padmanabhan and Mogul (1996) built a Dependency Graph (DG) to represent the access patterns of users. Predictions were generated from the graph, which was updated dynamically by the server based on client access patterns. The Dependency Graph achieved acceptable performance when it was proposed, but it did not consider the structure of current web pages (i.e. an HTML object with several embedded objects), which reduced its effectiveness on the current web. Domenech et al (2006a, 2010) improved web prefetching performance by designing the Double Dependency Graph (DDG), which considered the characteristics of current web sites when generating the predictions. It differentiated between dependencies among objects of the same page and objects of different pages.

These prediction algorithms (DG, DDG) learn user patterns from sequences of accesses; they consider two objects to be related if they are requested by the same user close together in time. De la Ossa et al (2007) proposed the Prediction at Prefetch mechanism, which allowed the prediction algorithm (using DG and DDG) to provide hints for both standard object requests and prefetch requests. A web prediction algorithm that built a Referrer Graph (RG) based on the object URI and its referrer was designed by De la Ossa et al (2010); this graph has a minimal number of arcs compared with the DG and DDG algorithms.

2.3.1.1 Dependency Graph

The Dependency Graph has a node for every object that has been accessed by the user. An arc is drawn from node A to node B if at some point in time the client accessed object B within ‘w’ accesses after A was accessed, where ‘w’ is the lookahead window size. The confidence of each arc is the ratio of the number of accesses to B within a window after A to the number of accesses to A; it represents the confidence level of the transition between the nodes. Figure 2.3 shows the Dependency Graph constructed with a lookahead window size of 2 using the access patterns of two users. The access sequence of user 1 is {HTML1, IMG1, HTML2, IMG2, HTML4, IMG4} and that of user 2 is {HTML1, IMG1, HTML3, IMG2}. The aggressiveness of prefetching is controlled by applying a cutoff threshold parameter to the weight of the arcs.

Each node is represented with its object and occurrence count. Each arc is represented with a pair of values {arc count, arc confidence}, e.g. {1, 0.5} between HTML1 and HTML3, where the arc confidence is computed as arc count / source node count, i.e. 1 / 2 = 0.5.


Figure 2.3 Dependency Graph (window size = 2)
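To make the construction described above concrete, the following Python sketch (an illustration under the stated definitions, not the published DG implementation; the function names, data layout and threshold value are assumptions) builds the graph from per-user access sequences and applies a confidence cutoff to generate hints:

```python
from collections import defaultdict

def build_dependency_graph(sessions, window=2):
    """node_count[X]   = number of accesses to object X
    arc_count[A][B] = times B was accessed within 'window' accesses after A
    Confidence of arc A->B = arc_count[A][B] / node_count[A]."""
    node_count = defaultdict(int)
    arc_count = defaultdict(lambda: defaultdict(int))
    for seq in sessions:
        for i, obj in enumerate(seq):
            node_count[obj] += 1
            for follower in seq[i + 1:i + 1 + window]:
                if follower != obj:
                    arc_count[obj][follower] += 1
    return node_count, arc_count

def predict(current, node_count, arc_count, threshold=0.5):
    """Return objects whose arc confidence from 'current' reaches the cutoff
    threshold (the threshold controls prefetching aggressiveness)."""
    hints = [(target, count / node_count[current])
             for target, count in arc_count[current].items()
             if count / node_count[current] >= threshold]
    return sorted(hints, key=lambda h: h[1], reverse=True)

# The two access sequences used for Figure 2.3
sessions = [["HTML1", "IMG1", "HTML2", "IMG2", "HTML4", "IMG4"],
            ["HTML1", "IMG1", "HTML3", "IMG2"]]
nodes, arcs = build_dependency_graph(sessions, window=2)
print(predict("HTML1", nodes, arcs))   # IMG1 (1.0), HTML2 (0.5), HTML3 (0.5)
```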

2.3.1.2 Double Dependency Graph

The Double Dependency Graph (DDG) is similar to the DG algorithm, but it distinguishes two classes of dependencies: dependencies on objects of the same page and dependencies on objects of another page. The graph has a node for every object that has been accessed, with an arc from node A to node B if the client accessed B within ‘w’ accesses after A. The arc is termed primary if A and B are objects of different pages, i.e. either B is an HTML object or the user accessed an HTML object between A and B. The arc is termed secondary if there are no HTML accesses between A and B. The graph has the same order of complexity as the DG, but it distinguishes the two classes of arcs.


Figure 2.4 Double Dependency Graph (window size = 2)

Figure 2.4 shows the DDG constructed with a lookahead window size of 2 using the access patterns of the same two users. The access sequence of user 1 is {HTML1, IMG1, HTML2, IMG2, HTML4, IMG4} and that of user 2 is {HTML1, IMG1, HTML3, IMG2}. Primary arcs are drawn with continuous lines and secondary arcs with dashed lines. Predictions are obtained by applying a threshold to both the primary and secondary arcs in the graph. A small sketch of the primary/secondary arc classification is given below.
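An illustrative sketch of this arc classification (an assumption about how the rule can be coded, not the published DDG implementation) is:

```python
def classify_arc(sequence, i, j, is_html):
    """sequence[i] = A and sequence[j] = B with j > i inside the lookahead
    window; is_html(obj) tells whether an object is an HTML page."""
    b = sequence[j]
    if is_html(b) or any(is_html(x) for x in sequence[i + 1:j]):
        return "primary"      # B belongs to a different page than A
    return "secondary"        # no HTML access between A and B: same page

is_html = lambda obj: obj.startswith("HTML")
seq = ["HTML1", "IMG1", "HTML2", "IMG2"]
print(classify_arc(seq, 0, 1, is_html))   # HTML1 -> IMG1 : secondary
print(classify_arc(seq, 1, 3, is_html))   # IMG1 -> IMG2  : primary (HTML2 in between)
```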

Chapter 4 discusses the proposed Precedence Graph, built from the requested object and its referrer, for generating web predictions. The Precedence Graph has fewer arcs than the DG and DDG, which helps it provide effective predictions with a lower memory requirement than the other algorithms.

2.3.2 Markov Models

Markov models have been used effectively in web prefetching by utilizing the information gathered from web logs. They focus on minimizing system latency or improving web server efficiency. The precision of Markov models comes from the consideration of consecutive orders of preceding pages. The goal is to build effective user behavioral models that can be used to predict the web pages that a user will most likely access in the future. The order of a Markov model indicates how many past user accesses are used to define the context in a node. Low-order Markov models lack web page prediction accuracy due to the minimal use of page history, while high-order Markov models suffer from high state-space complexity. A minimal sketch of a first-order model is shown below.
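For illustration (not drawn from any of the cited systems), a first-order Markov predictor can be sketched as follows, where the context is the single previously visited page and the transition counts come from the log; the page names are hypothetical:

```python
from collections import defaultdict

transitions = defaultdict(lambda: defaultdict(int))   # page -> next page -> count

def train(sessions):
    for seq in sessions:
        for prev, nxt in zip(seq, seq[1:]):
            transitions[prev][nxt] += 1

def predict_next(current, top_k=1):
    """Return the top_k most probable next pages given the current page."""
    counts = transitions[current]
    total = sum(counts.values()) or 1
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(page, count / total) for page, count in ranked[:top_k]]

train([["/index", "/sports", "/cricket"],
       ["/index", "/sports", "/football"],
       ["/index", "/news"]])
print(predict_next("/index", top_k=2))   # ('/sports', ~0.67), ('/news', ~0.33)
```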

A probabilistic sequence generation model (Sarukkai 2000) using

Markov chains predicted the next request based on the history of user access

requests. Markov predictors were used (Nanopoulos et al 2003) to design web

prefetching algorithm. Models based on Markov probabilistic techniques

(Davison 2004) used information from user access history and web page content

to accurately predict the user’s next request. Deshpande and Karypis (2004)

presented Markov model with reduced state complexity and improved prediction

accuracy. It applied different techniques to intelligently select the parts of

different order Markov models. Three pruning schemes (support, confidence and

error) were presented to prune the states of All-Kth order markov model.

Markov–Knapsack approach (Pons 2005) enhanced the web page

rendering performance by combining Multi-Markov web-application centric

prefetch model with Knapsack web object selector. Integration of semantic

information into Markov models for prediction (Mabroukeh and Ezeife 2009)

allowed low order Markov models to make intelligent accurate predictions with

less complexity than higher order models. Feng et al (2009) constructed Markov

tree using web page access patterns for effective page predictions and cache

prefetching. An integration model by Khalil et al (2009) combined clustering,

association rules and Markov models to achieve better prediction accuracy with

minimal state space complexity.

Chimphlee et al (2010) proposed an approach that combined the

strengths of Markov model, association rules and fuzzy adaptive resonance

theory to achieve higher accuracy, better coverage and overall performance while

keeping the number of computations to a minimum. Lee et al (2011) proposed

two-level prediction model (TLPM) that considered the natural hierarchical

property from web log data. TLPM could decrease the size of candidate set of

web pages and increase the prediction speed with adequate accuracy. Markov

model was used in level one to predict the categories and Bayesian model was

used in level two to predict the desired web pages in the predicted categories.

Prediction algorithms based on Markov models provide high-precision predictions, but they require intensive computation and memory consumption.

2.3.3 PPM Models

Prediction-by-Partial-Matching (PPM) models were commonly used in

web prefetching for predicting the user’s next request by extracting useful

knowledge from historical user requests. Factors such as page access frequency,

prediction feedback, context length and conditional probability influence the

performance of PPM models in prefetching.

A proxy-initiated prefetching technique (Fan et al 1999) used PPM

algorithm to generate the predictions. Chen and Zhang (2003) used popularity of

URL access patterns to build a PPM model for generating accurate predictions by

efficiently managing the storage space. Ban et al (2007) implemented an online

PPM model based on non compact suffix tree that used maximum entropy

principle to improve the prefetching performance. PPM model based on

stochastic gradient descent was designed (Ban et al 2008) to describe node’s

prediction capability using a target function. It selected the node with maximum

function value to predict the next most probable page.

2.3.4 Web Mining models

Web mining applies data mining techniques to large amount of web

data for improving the web services. Mining web access sequences helps to

discover useful knowledge from web logs that can be applied to variety of

applications such as: navigation suggestion for users, customer classification and

efficient access across related web pages. Several research efforts in web usage

mining have focused on three main paradigms (association rules, sequential

patterns and clustering) for analyzing the web data and generating desired output.

Clustering of web user access patterns helps to build user profiles by capturing

common user interests that can be applied to applications such as web caching

and prefetching. Association rules help to optimize the organization and structure

of websites.

To capture the user’s navigational behavior patterns, a model based on

data mining was designed (Borges and Levene 1999), which used high

probability strings to represent the user’s preferred trails. A web usage mining

process (Pierrakos et al 2003) analyzed the data collection, data preprocessing

and pattern discovery mechanisms for supporting web personalization. A

prediction based proxy server was designed (Huang and Hsu 2008) to effectively

improve the hit ratios of accessed documents. It used three functional

components: log file filter, access sequence miner and prediction based buffer

manager. The log file filter removes irrelevant records from the log file and feeds the cleaned file as input to the access sequence miner. The sequence miner processes the popular access sequences to generate a rule table. The buffer manager decides on caching/prefetching or buffer size adjustment based on the buffer contents and the rule table.

An integrated approach (Pallis et al 2008) that effectively combined

caching and prefetching used web navigational graph to represent the user

requests. Its efficiency was tested using the developed simulation environment.

The statistical analysis and web usage mining techniques were combined

(Heydari et al 2009) to create a powerful method for evaluating the website

usage by considering the client side data. Browsing time (statistical analysis)

helps to effectively evaluate the website, and graph mining (web usage mining)

helps to discover user access patterns through complex browsing behavior.

Lee et al (2009) designed a prefetch scheme that decided the web

objects to be prefetched by considering the memory status of the web cluster system. It comprised the following components: a) Double Prediction by Partial Match (DPS), adapted for the modern web framework, b) Adaptive Rate Controller (ARC), which determined the prefetch rate based on the dynamic memory status, and c) Memory Aware Request Distribution (MARD), which distributed requests based on the available web processes and memory.

Ahmed et al (2011) proposed novel framework for mining high utility

web access sequences that efficiently handled both forward and backward

references. It could perform both static and incremental mining of web access

sequences. A web user clustering approach (Wan et al 2012) based on Random

Indexing (RI) was used to build user profiles for applications such as web

caching and prefetching. Random Indexing is an incremental vector space

technique that allows continuous web usage mining.



2.4 COMMERCIAL PRODUCTS

Several commercial products have attempted to incorporate prefetching mechanisms in order to provide effective service to clients. In Google search, the results may sometimes include the first page of the result list as a hint embedded in the HTML code; if the web browser has prefetching capabilities, it can request that page in advance. Packeteer SkyX Accelerator, a gateway designed to accelerate connections in the local network, used an undisclosed prefetching method (it was discontinued in 2007).

Viking Server, a commercial product for Microsoft Windows operating systems, included a proxy with prefetching capabilities. Mozilla Firefox, an open source web browser, supports a prefetching mechanism. Browsers such as SeaMonkey, Netscape, Camino and Epiphany that are based on Mozilla Foundation technologies also include the prefetching capability. Google Web Accelerator (Google 05), a free web browser extension available for Mozilla Firefox and Microsoft Internet Explorer, possessed a web prefetching facility. It prefetches hints included in the HTML body; if no hints are provided, it prefetches all the links in the pages being visited.

FasterFox, an open and free extension for Mozilla web browsers (introduced in 2005), prefetches all the hyperlinks found in the current page during browser idle time. PeakJet, a commercial product for end users available around 1998, included several tools to improve user access to the web. It included a browser-independent cache with prefetching capability based on history or links; it could prefetch the links on the current web page that were visited by the user in the past, or all links on the current web page. NetAccelerator, a product commercialized between 1998 and 2005, prefetched all the links in the page being visited and stored the objects in the browser cache. It could refresh the contents of the cache in order to avoid obsolete objects.

2.5 PERFORMANCE EVALUATION

The impact of the web prefetching architecture in reducing user-perceived latency was analyzed (Domenech et al 2006) to identify the best architecture for performing prefetching and to provide insight into the efficiency of the system. A cost-benefit analysis was carried out (Domenech et al 2007) to compare prefetching algorithms from the user's viewpoint. A mathematical model of the web prefetching architecture (Balamash et al 2007) showed that prefetching is profitable even in the presence of a good caching system.

The performance of prediction algorithms is measured using metrics that quantify both the efficiency and the efficacy of the approach. Domenech et al (2006c) analyzed a large set of key metrics used by various researchers and proposed a taxonomy based on three main categories for better understanding and evaluation of prefetching systems: 1) prediction-related indexes, 2) resource usage indexes and 3) end-to-end perceived latency indexes. Prediction indexes quantify the efficiency and efficacy of prediction algorithms, resource indexes quantify the additional cost incurred due to prefetching, and end-to-end latency indexes highlight the system's performance from the user's viewpoint. A statistical analysis was performed (Domenech et al 2006d) to identify the situations that influence the outcome of the recall and byte recall indexes. Experimental results indicated that the user's available bandwidth and the server processing time significantly influence the selection of the appropriate index for evaluation.

Marquez et al (2008) proposed an intelligent web prefetching

mechanism that dynamically adjusted the aggressiveness of prediction algorithm

based on the system performance. To assist the proposed scheme, a traffic

estimation model was designed that used available information in the server to

accurately calculate the extra server load and network traffic generated by

prefetching. A global framework developed by Marquez et al (2008a) used

discrete-event based simulation for performance evaluation of caching and

prefetching in the web architecture. It also offered the flexibility to set prediction

and prefetching engine at any part (client, proxy, and server) of the web

architecture.

2.6 PERFORMANCE METRICS

Domenech et al (2006) identified various performance metrics for

evaluating the web prefetching techniques implemented in the web architecture.



Some of the metrics discussed are:

Precision (Pc)

It measures the ratio of objects that were predicted, prefetched, and then finally requested by the user (prefetch hits) to the total number of objects that were predicted and prefetched.

Pc = Prefetch Hits / Prefetches

Recall (Rc)

It measures the ratio of user-requested objects that were previously predicted and prefetched to the total number of user requests.

Rc = Prefetch Hits / User Requests
Resource Usage

Prefetching benefits are achieved at the expense of additional resource usage, which must be quantified because it can negatively impact performance.

Traffic Increase (∆TrB)

It quantifies the traffic increase (in bytes) due to unsuccessfully prefetched documents. With prefetching, network traffic usually increases due to two side effects: objects not used and network overhead. Objects that are not used waste network bandwidth because they were never requested by the user.

∆TrB = (Objects Not UsedB + Network OverheadB + User RequestsB) / User RequestsB

Object Traffic Increase (∆Trob)

It quantifies the increase (in percentage) in the number of documents that clients will get when using prefetching. The index estimates the ratio of the number of prefetched objects never used with respect to the total number of user requests.

∆Trob = (Objects Not Used + User Requests) / User Requests

Object Latency

It is obtained from the service time reported by the web server, or it is zero if the object is already in the browser cache. De la Ossa et al (2010) discussed the object latency saving and page latency saving metrics. Object latency saving is the ratio of the latency perceived using prefetching to the latency without prefetching.

∇OL = Average object latency with prefetch / Average object latency without prefetch

Page Latency Saving

It is used as the main performance index for measuring prediction and prefetching effectiveness in order to study the maximum benefit perceived by web users.

∇PL = Average page latency with prefetch / Average page latency without prefetch
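As a small illustration (not from the thesis), the prediction-related indexes above can be computed from simple counters collected during a simulation; the counter names and example values below are hypothetical:

```python
def precision(prefetch_hits, prefetches):
    """Pc: prefetched objects later requested / all prefetched objects."""
    return prefetch_hits / prefetches if prefetches else 0.0

def recall(prefetch_hits, user_requests):
    """Rc: user requests served by prefetched objects / all user requests."""
    return prefetch_hits / user_requests if user_requests else 0.0

def traffic_increase_bytes(not_used_bytes, overhead_bytes, user_request_bytes):
    """Delta TrB: bytes put on the network relative to the demand traffic."""
    return (not_used_bytes + overhead_bytes + user_request_bytes) / user_request_bytes

# Example: 40 prefetch hits out of 100 prefetches, against 200 user requests.
print(precision(40, 100))    # 0.4
print(recall(40, 200))       # 0.2
```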

2.7 CACHE REPLACEMENT

Cache replacement plays a significant role in improving the performance of web caches, and several policies have been proposed in the literature to achieve good performance of caching mechanisms. When the cache capacity reaches its maximum limit, objects already stored in the cache are purged to store newly downloaded web objects. The decision about which objects to purge from the cache is governed by the replacement policy. Cache replacement policies achieve better performance when the cache receives a stream of requests with high popularity and a small number of first-timer/one-timer requests (Benevenuto et al 2005).

Cheng and Kambayashi (2002) enhanced the functionality of web proxy caching by integrating performance tuning techniques with content management; their content-aware replacement algorithm (LRU-SP+) provided 30% better caching performance than content-blind schemes. A few cache replacement techniques depend on object grading mechanisms, such as the popularity rank of the object (Chen et al 2003), a cost function for object retrieval (Cao and Irani 2002), or a page grade (Bian and Chen 2008), to decide on the cacheability of web objects or their purging from the cache. These grading mechanisms trade off Hit Ratio (HR) against Byte Hit Ratio (BHR) while focusing on improving cache performance.

The performance of replacement policies for different document types

(applications, audio, images, text and video) was evaluated by Cañete et al

(2007) and they suggested policies that will provide good performance for each

document type. Wong (2006) suggested replacement policies for proxies with

different characteristics such as small cache, limited bandwidth and processing

power. They also analyzed policies that will be better for proxies at ISP and root

level. Romano and ElAarag (2008) considered factors such as: Frequency,

Recency and Frequency/Recency for quantitatively analyzing the performance of

cache replacement policies. The replacement algorithm chooses better victims for

eviction from the cache, when it considers several factors for making the

decision.

Geetha Krishnan et al (2011) designed a new model for client-side web

cache by fragmenting the cache into three slices: Sleep Slice (SS), Active Slice

(AS) and Trash Slice (TS). Based on the hit count, slicing was performed to

group cached pages that help in reducing the latency when they are retrieved.

One time hit pages were discriminated from other pages to ensure that hot pages

were made available when user requests them. Performance metrics such as File

Hit Ratio, Speedup, Delay Saving Ratio and Number of Evictions were used to

assess the performance of the model.

2.7.1 Machine Learning Techniques

In recent years, researchers have developed several intelligent approaches (back-propagation neural networks, fuzzy systems and evolutionary algorithms) that are smart and adaptive to the web caching environment. A nonlinear model designed by Koskela et al (2003) optimized web cache performance using object features such as the HTTP responses of the server, the access log of the cache and the HTML structure of the object. The drawbacks of the model were: a) the difficulty of collecting a comprehensive data set, b) a computationally intensive learning phase and c) a large number of inputs to the model. A neural network based web proxy cache replacement scheme was designed by Cobb and ElAarag (2006, 2008) to classify web objects into cacheable or uncacheable entities based on frequency and recency information. A sliding window mechanism introduced by Romano and ElAarag (2011) enhanced the cache replacement scheme by estimating the frequency count and recency time within the window boundary.

The use of a backpropagation neural network to decide the web objects to be evicted from the cache, based on the inputs frequency, recency, size and delay time, was discussed by Ali and Shamsuddin (2007). The effect of artificial neural networks (ANN) and particle swarm optimization (PSO) in making decisions regarding the cacheability of web objects was studied by Sulaiman et al (2008). To improve the performance of client-side web caching, Ali and Shamsuddin (2009) proposed an approach that partitioned the client cache into a short-term and a long-term cache; the short-term cache is managed using the LRU algorithm and the long-term cache using a neuro-fuzzy system. A Support Vector Machine (SVM) based approach was designed by Ali et al (2011) to predict web object classes using frequency, recency, size and object type as parameters. The metrics used for performance evaluation were Correct Classification Rate (CCR), True Positive Rate (TPR), True Negative Rate (TNR) and geometric mean (G-mean).

A class based LRU (C-LRU) algorithm designed by Haverkort et al (2003) balanced the large and small documents that existed in the cache. To support the adaptive nature of the C-LRU algorithm, Khayari et al (2009) used neural networks to determine or recompute the optimal distribution parameters for class boundaries and class fractions.

2.8 SUMMARY

This chapter discussed various prediction algorithms proposed in the literature to perform Web prefetching. The algorithms were analyzed by grouping them into two categories: a) algorithms that generate predictions based on web content and b) algorithms that generate predictions based on user access patterns. In chapter 3, we discuss content based web predictions that use Naïve Bayes and Fuzzy Logic mechanisms for computing the priority values from which the predictions are generated. A graph based prediction model that generates web predictions by building a Precedence Graph is discussed in chapter 4. Cache replacement algorithms, which play a significant role in improving cache performance, were also analyzed. A new client-side replacement scheme is explored in chapter 5, where the cache is partitioned into a regular and a prefetch cache for managing the objects; the regular cache is managed using an FIS based algorithm and the prefetch cache using the LRU algorithm.

CHAPTER 3

HYPERLINK BASED WEB PREDICTION

3.1 INTRODUCTION

The usage of the Internet has increased tremendously over the years, and users leverage its benefits to access a wide variety of services provided over the network. Due to the massive growth of the Internet, network load and access time have increased dramatically, causing substantial delay in providing services to the user. The latency perceived by the user when accessing web pages is affected by the following factors: a) bandwidth availability, b) request processing time at the server, c) round trip time and d) object size. Implementing caches either remotely (in the web server or a proxy server) or locally (in the browser's cache or a local proxy server) significantly reduces the access latency. The usefulness of the cache can be improved by applying a web prefetching mechanism that acquires web contents in anticipation that these contents will be requested by users in the near future. Web prefetching exploits the spatial locality exhibited by users when accessing web objects.

Prefetching requires predicting the list of web objects to be prefetched from the server for satisfying future user requests. Web predictions can be generated by analyzing information such as user access patterns, the contents of web pages and object popularity, depending on the location (server, proxy or client) of the implementation. The client decides to prefetch web objects based on the following factors (Mogul 1998): a) object availability in the cache and its current timestamp, b) idleness of the user for more than a threshold interval, c) network bandwidth and d) object size and user preferences. A personalized web prefetching system implemented by Zhang et al (2003) generated the set of URLs that the user will visit next using history-based and content-based predictors.

Semantic prefetching strategies analyze the contents of a web page or its metadata to predict the probable pages that will be requested in the future. These strategies consider either the hyperlinks contained in web pages or keywords attached to or extracted from web pages as input for generating the predictions. Since they are domain specific and focus on a particular topic of interest, they should provide a provision to be enabled or disabled when the user enters or exits the corresponding web services. In our work, the focus is on generating web predictions at the client machine using the information associated with the hyperlinks that are used to access web objects across different web pages. When a web page has high usability, prefetching it improves the system performance.

A user's navigation in a web site is influenced not only by his or her own interest in a topic, but also by the structure of the web pages. For example, if the user is currently viewing page Pi, then there is a high probability of using the links available in that page to visit page Pi+1. The user navigates through pages by clicking links based on the text anchored around them. It is assumed that there is a relationship between the textual content of web pages and the user's interests.

A web user's browsing behavior is often guided by the keywords in the hypertext of the URL that refers to a web object. A hyperlink (URL) represents a relation between two different web pages or two parts of the same web page, and its hypertext provides descriptive or contextual information to users about the contents referred to by the hyperlink. This chapter discusses the generation of web predictions based on the hypertext associated with hyperlinks by designing two approaches: a) a Naïve Bayes approach and b) a Fuzzy Logic approach. They are responsible for computing the priority value of hyperlinks, which is used to decide the hyperlinks to be included in the prediction list. The client is responsible for performing both web prediction and prefetching, and it prefetches the objects during browser idle time based on the generated predictions. Predictions are generated dynamically for each new web page visited by the user, based on the information maintained in repositories that are updated frequently from the user access patterns.

Web objects embedded in main html pages are requested automatically

when user requests the main page. Requests for embedded objects are separated

from regular user requests. To avoid interference between the prefetch and

demand requests, any spare resources available on servers and network should be

utilized effectively by the web prefetching system (Kokku et al 2003).

Prefetching hit ratio and bandwidth overhead are the most popular parameters

used for evaluating the performance of prefetching system.


Browser idle time varies with the behavior of the user when navigating a website; if the user navigates too quickly between pages, the web browser will not have enough time to prefetch all the hints. This occurs even if the prediction algorithm provides hints accurate enough to reach the upper bound in latency savings. To overcome this situation, it is important to provide the good hints first, in ranked order, so that prefetching them maximizes the latency savings.

3.2 NAÏVE BAYES APPROACH

The process of computing the priority value of a hyperlink involves applying a Naïve Bayes classifier to find the probability of each token in the hypertext; these token probabilities are then combined to produce the priority value of the link. The Naïve Bayes classifier is used for computing the priority value because it is fast and accurate, simple to implement, and able to dynamically learn user access patterns, and it can match or outperform more sophisticated classification methods. Text classification, an automated means of determining the metadata of a document, is applied to various problems such as spam filtering, category suggestion for document indexing and sorting of help desk requests.

The proposed approach generates predictions based on the following

steps:

a) Extract hyperlinks from the web page displayed in the browser.

b) Select tokens (keywords) from the hypertext associated with each hyperlink.

c) Compute the probability of each token in the hypertext using the Naïve Bayes classifier, and combine the token probabilities to generate the priority value of the hyperlink.

d) Use the priority value to decide the hyperlinks that will be added to the prediction list (hints).

e) Prefetch web objects using the hyperlinks in the prediction list.

User requests a web page by typing URL in the web browser or

clicking hyperlinks in the web page. When the cache contents are used to satisfy

the user requests, it significantly reduces the user access latency.

When user visits a web page and spends some time either reading or

exploring some valuable information, then the textual content of that page

reveals ‘region of interest’ that matches with the user’s interest. When users are

not visiting web pages according to a pattern, then it indicates that they are

randomly exploring the pages not looking for particular information.

3.2.1 Prediction/Prefetch Procedure

The prediction/prefetch procedure is responsible for generating the web predictions that are used to prefetch web objects, so that user requests are satisfied with minimal latency. It uses a client-side prefetching mechanism in which the client directly prefetches web objects from the server and stores them in its local cache (prefetch cache) to serve user requests. Both the prediction and the prefetching engine are located in the client machine, and the prediction engine uses components such as the Tokenizer and the Token Repository for computing the priority value of hyperlinks. An on-demand request is sent to the server only if the requested web page is not available in the cache or has become invalid in the cache.

When the user visits a web page, the hyperlinks in that page form a pool of URLs from which the user selects a hyperlink that suits his or her interest to visit the next page. The aim of the proposed approach is to efficiently identify the set of hyperlinks in a web page that reflects the user's interest and to create from it a prediction list that is used for prefetching web objects during browser idle time.

The process of collecting hyperlinks from a web page to create the prediction list used for prefetching web objects is shown in Figure 3.1. The procedural steps for performing the prediction and prefetching (as in Figure 3.1) are explained as follows:

1. User initially requests a new web page by typing its URL in the

web browser.

2. The requested web page is retrieved from server and displayed to

the user.

3. Process the displayed web page to extract hyperlinks and their

associated hypertexts for evaluation.

4. Each hypertext is processed to extract set of tokens, where ‘token’

represents meaningful word in the hypertext.


(Figure: numbered data flow between the web browser, the hypertext-to-token conversion, the user-accessed repository, the Naïve Bayes classifier computing hyperlink priorities, the prediction list, the prefetch cache and the Internet, corresponding to the procedural steps 1-12.)

Figure 3.1 Prediction and Prefetching – Naïve Bayes Approach


5. Tokens of a hypertext are added to the user-accessed repository when the user visits a new web page by clicking the corresponding hyperlink.

6. The token count is updated if the token already exists in the repository; otherwise the token is added as a new entry in the repository.

7. The probability value of each token of the hypertext is computed using the token count information maintained in the user-accessed repository, and the priority value of each hyperlink is computed by combining the probability values of its tokens using the Naïve Bayes formula.

8. Hyperlinks are arranged based on their priority values to create the prediction list (hints).

9. Web objects are prefetched during browser idle time using the hyperlinks in the prediction list and stored in the prefetch cache.

10. When the user requests a new web page, by either typing its URL in the browser or clicking a hyperlink in a page, the request is checked against the local cache (regular / prefetch cache) for availability.

11. When the requested information is available in the local cache, the web page is displayed with minimal latency.

12. When it is not available in the local cache, the web page is retrieved from the server and displayed to the user.



3.2.2 Implementation

The prediction and prefetching activities both occur at the client machine, with the prediction engine responsible for generating the predictions and the prefetching engine responsible for retrieving the web objects and storing them in the prefetch cache. A prediction is considered useless in reducing the user perceived latency when the predicted object has already been demand requested by the user and is waiting in the browser queue for a connection to the web server.

3.2.2.1 Prediction Engine

It is responsible for computing the priority value of each hyperlink by

applying Naïve Bayes classifier on the set of tokens associated with each

hypertext. The advantages of using Naïve Bayes Classifier (Rish 2001) for

computing the probability value of tokens are: a) simple mechanism to compute

value for the specified data b) requires minimal storage, since it maintains only

the token count and c) performs incremental update whenever new data is

processed.

3.2.2.1.1 Tokenizer

When the user visits a new web page, the Tokenizer parses that page to extract the hyperlinks and their associated hypertexts. Each hypertext is analyzed to identify meaningful keywords that act as tokens of that text. Hypertext refers to the text that surrounds hyperlink definitions (hrefs) in web pages (Xu and Ibrahim 2004). A hyperlink is contained in the source web page and is represented as:

<A HREF="http://www.ircache.net/"> Web Cache Information Site </A>

It refers to the target page to be visited. The link's anchor appears on the source page as underlined text:

Web Cache Information Site

When the user selects the anchor, the contents of the target page are displayed in the browser.

In our approach, the text between the tags <a> and </a> is used to compute the priority value of hyperlinks. When the user clicks a hyperlink to visit a new web page, the tokens of its hypertext are stored in the user-accessed repository. When a token is a new entry in the user-accessed repository, it is given an initial count value of 1; for tokens that already exist in the repository, the count value is incremented.

The Tokenizer analyzes each hypertext to remove stop words, since they do not provide any meaningful information for computing the priority value of hyperlinks. To remove the stop words, the tokens of a hypertext are compared with a database that contains commonly occurring stop words, such as the one shown in Figure 3.2. After removing the stop words, the tokens are further subjected to stemming, which is the process of converting words from their inflectional or derivationally related forms to a common base form. Factors to be considered when stemming words are: a) different words with the same base meaning are converted to the same form and b) words with distinct meanings are kept separate. The Porter stemming algorithm (Porter 1980) is used to perform the stemming operation; it is a simple utility that reduces English words to their word stems (without 'ing', 'ings', 's').

able, about, above, again, after, and, any, back, be, been, before,
below, but, by, came, can, can't, did, do, each, edu, eg, even, ever, far,
for, few, get, go, gone, got, has, have, her, here, how, if, in, is, isn't,
keep, kept, last, less, little, like, let's, make, may, many, miss, more,
my, name, new, next, not, now, of, often, one, once, only, over, plus,
per, please, quite, right, round, saw, say, seen, sent, shall, since, still,
take, than, that, this, there, thing, twice, two, use, us, via, want, was,
we, way, when, where, who, why, yet, you, yours, zero

Figure 3.2 Some commonly used Stop Words

The removal of stop words ensures that only meaningful words are stored as tokens in the user-accessed repository, and stemming minimizes the number of unique token entries in the repository.
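As an illustration of this tokenization pipeline, the following Python sketch shows one possible way to turn a hypertext string into tokens. It is not the thesis implementation (which was built as a C# browser add-on); the stop word set is abbreviated and a crude suffix-stripping rule stands in for the full Porter stemmer.

import re

STOP_WORDS = {"the", "and", "of", "for", "to", "in", "is", "a"}  # abbreviated set

def stem(word):
    # crude stand-in for the Porter stemmer: strip a few common suffixes
    for suffix in ("ings", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def tokenize(hypertext):
    # split the anchor text into lowercase words, drop stop words, then stem
    words = re.findall(r"[a-z]+", hypertext.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

# Example: tokenize("Admission Requirements for Graduate Courses")
# returns ['admission', 'requirement', 'graduate', 'course']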

3.2.2.1.2 User-Accessed Repository

The user-accessed repository stores the tokens of the hypertexts associated with hyperlinks that are used to visit new web pages. Each token is stored with an initial count of 1, which is incremented whenever the same token is added to the repository from the hypertext of a used hyperlink. The tokens stored in the repository reflect the user's browsing interests and are used for computing the priority value of hyperlinks. The repository information reflects user and session characteristics, where a session represents the time interval between the start and end of a user's browsing instance. During a browsing session, the user clicks several hyperlinks to visit web pages of interest, and the hypertexts associated with the used hyperlinks are stored in the repository as independent tokens (keywords). When the user has a long browsing session and surfs the web focusing on the same topic, the identification of new keywords to be added to the repository saturates.

Table 3.1 User-Accessed Repository

a) Without Stemming                 b) With Stemming

Tokens          Count               Tokens          Count
Academics           4               Academ              9
Academic            5               Academician         2
Academician         2               Admiss              3
Admissions          2               Appli              12
Admission           1               Comput             11
Applied             5               Cours               5
Apply               7               Deadlin             2
Computer            4               Engin              10
Computing           3               graduat            11
Computers           4               requir              6
Courses             2
Course              3
Deadlines           1
Deadline            1
Engineering         6
Engineer            4
Graduate            2
Graduates           4
Graduating          5
requirements        2
requirement         2
require             2

Table 3.1 shows a sample user-accessed repository (with and without stemming) used for computing the priority value of hyperlinks. Each entry in the repository contains a token with its occurrence count. When stemming is not applied, words that are closely related to each other have separate entries in the repository, each with its own occurrence count. This leads to the following problems: a) the number of entries in the repository increases and b) the occurrence count of related words is spread across multiple entries. When stemming is applied, related words are reduced to a single base form, which decreases the number of entries in the repository. It also improves the occurrence count, since the count values of related words are combined into a single value.

When the user browses without focusing on a specific topic of interest, most of the generated keywords are trivial and cannot be used for generating predictions. The repository is of fixed size, and new tokens are added by eliminating old tokens when the repository reaches its maximum limit. The repository size should therefore be carefully selected to avoid eliminating legitimate tokens while preventing trivial tokens from occupying space for too long.

3.2.2.1.3 Computing Priority Value of Hyperlinks

The hypertext associated with each hyperlink present in a web page is taken, and its tokens are compared with the tokens stored in the user-accessed repository to compute the probability value of each token. The priority value of a hyperlink is obtained by multiplying the probability values of its tokens. Hyperlinks are then sorted based on their priority values to create the prediction list.

The priority value of each hyperlink is computed by applying Naïve

Bayes classification (Pop 2006) formula shown in Equation (3.1).

Pr (U|A) = [ Pr (A|U) · Pr (U) ] / Pr (A)                (3.1)

U = User-accessed Repository

A = Hypertext associated with Hyperlink

Pr (U | A) = probability that a hypertext belongs to the user-accessed repository

Pr (A | U) = probability that for given user-accessed repository the

tokens of a hypertext appears in that repository

Pr (U) = probability of user-accessed repository

Pr (A) = probability of occurrence of a particular hypertext

The value of Pr (U) will be 1, since it is the only repository used for

computation. The value of Pr (A) that acts as a scaling factor for Pr (U | A) will

be constant and is omitted during computation. Based on these factors, the

Equation (3.1) is simplified into Equation (3.2) as:

Pr (U|A) = Pr (A|U) · 1 (3.2)



Steps for computing the probability Pr (A | U) are:

1. Hypertext of each link represented as set of tokens

Hypertext = {T1, T2, T3 …… Tm}, T1to m = Tokens

2. Compute probability value of each token using Equation (3.3).

Pr (Ti | U) = Count of Ti in U / Total count of tokens in U                (3.3)

where i = 1 to m

3. Compute probability value of hypertext using Equation (3.4) by

performing product of individual token probabilities.

Pr (A|U) = Π i=1 to m [ C + Pr (Ti | U) ]                (3.4)

The probability value Pr (A | U) of hypertext will be the priority value

of its hyperlink. While computing probability Pr (A|U), a constant value ‘C’ (C

=1) is added to each token probability value irrespective of whether the token is

available or not in the user-accessed repository. When a token is not available in

the user-accessed repository, then its probability value will be zero. Reason for

adding ‘C’ to each token probability is to achieve the following two conditions:

1) probability value of hypertext should not be less than the individual token

probability 2) probability value of hypertext should not be zero due to absence of

few tokens in the user-accessed repository. When none of the tokens of a

hypertext is present in the user-accessed repository, then its probability value will

be 1 due to addition of ‘C’ to each token probability. The probability value of



hypertext will be greater than 1, if either few tokens or all the tokens of hypertext

are present in the user-accessed repository. Based on these factors, hyperlinks

having priority value greater than 1 are considered for inclusion in the prediction

list.
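A minimal Python sketch of this computation is shown below. The repository is assumed to be a simple dictionary mapping tokens to occurrence counts, and the names used here are illustrative rather than taken from the thesis implementation.

C = 1  # smoothing constant added to every token probability, as in Equation (3.4)

def priority_value(tokens, user_accessed):
    # user_accessed: dict mapping token -> occurrence count (user-accessed repository)
    total = sum(user_accessed.values())
    priority = 1.0
    for t in tokens:
        pr_token = user_accessed.get(t, 0) / total if total else 0.0  # Equation (3.3)
        priority *= C + pr_token                                      # Equation (3.4)
    return priority

A hyperlink whose priority value exceeds 1 (that is, at least one of its tokens was found in the repository) becomes a candidate for the prediction list.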

3.2.2.1.4 Prediction List

For each web page, the prediction list is created by including the hyperlinks whose computed priority value exceeds 1. The prediction list is implemented as a priority queue that always maintains the highest priority links at the top of the queue. The prefetching engine takes links from the top of the queue to prefetch web objects during the browser's idle time.

(Figure: the prediction engine computes a priority from the hypertext of each hyperlink; the links are held in the prediction list as a priority queue, for example Link1 (1.786), Link2 (1.783), Link3 (1.69), Link4 (1.6), Link5 (1.567), Link6 (1.54) and Link7 (1.34), from which the prefetch engine takes links.)

Figure 3.3 Prediction List for Prefetching

When the user navigates to a new web page, the prediction list is cleared and filled with a new set of hyperlinks based on that page. This helps to eliminate the prefetching of irrelevant links during the user's browsing session. Figure 3.3 shows the process of adding hyperlinks to the prediction list (maintained as a priority queue) that is used for prefetching web objects.
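The prediction list can be sketched as a max-priority queue. The illustrative Python fragment below uses heapq with negated priorities; it is one possible realisation and not the exact structure used in the browser add-on.

import heapq

class PredictionList:
    def __init__(self):
        self._heap = []                  # stores (-priority, url) pairs

    def clear(self):                     # called when the user opens a new page
        self._heap = []

    def add(self, url, priority):
        if priority > 1:                 # Naive Bayes acceptance threshold (Section 3.2)
            heapq.heappush(self._heap, (-priority, url))

    def next_hint(self):
        # highest priority hyperlink, taken by the prefetching engine during idle time
        if self._heap:
            return heapq.heappop(self._heap)[1]
        return None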

3.2.2.2 Prefetching Engine

The prefetching engine is responsible for retrieving web objects from the server in advance and storing them in the prefetch cache maintained at the client machine, so that user requests are served with minimal latency. Web objects are prefetched using the hyperlinks taken from the prediction list, and prefetching is carried out only during the browser's idle time. Prefetch requests are given lower priority than regular user requests, so whenever the user makes a request the prefetching engine suspends the ongoing prefetching activity. The number of links that can be prefetched varies depending on the amount of time the user spends on each web page during the browsing session; if the user spends more time on a page, more links can be prefetched.

To eliminate the impact of caching due to temporal locality exhibited

in user access patterns, the client maintains prefetch cache separately from

browser’s in-built cache. When new web objects need to be stored in prefetch

cache and if it is full, then it selects objects not accessed for a long time to be

purged from cache to make space for storing newly downloaded objects.

Prefetching an object that expires very frequently or changes every time it is

requested will be useless and wastes resources.



3.3 FUZZY LOGIC APPROACH

Fuzzy Logic has been used over the years in several domains such as

expert systems, data mining and pattern recognition. It deals with fuzzy sets

(Zadeh 1965) that allow partial membership in a set represented by its degree of

relevance. Fuzzy Logic is capable of handling approximate or vague notions that

exist in several information retrieval (IR) tasks (Chris Tseng 2007) and helps to

establish meaningful and useful relationships among objects.

This approach also uses the information associated with hyperlinks in

a web page to predict web objects to be prefetched for satisfying user’s future

requests. The priority value of hyperlinks is computed by applying fuzzy logic

over the set of tokens associated with each hypertext and use it to generate the

prediction list. It uses information stored in two repositories: user-accessed and

predicted-unused for computing the priority value of hyperlinks. The use of

predicted-unused repository is to filter out hyperlinks that are of less or no

interest to the users.

3.3.1 Prediction/Prefetch Procedure

The prediction and prefetching engines are both implemented at the client machine to perform the prefetching activity that helps to minimize user perceived latency. The prediction engine is designed to efficiently identify the relevant set of hyperlinks in a web page that reflects the user's interest. The prefetching engine uses the prediction list to prefetch web objects and store them in the prefetch cache before the user demand requests those objects.

Figure 3.4 shows the process of applying fuzzy logic to decide the set of hyperlinks in the prediction list, which is then used by the prefetching engine to download web objects.

The procedural steps shown in Figure 3.4 are explained as follows:

1. The user initially requests a web page by typing its URL in the web browser.

2. The requested web page is displayed in the browser after its contents are downloaded from the web server.

3. All hyperlinks and their associated hypertexts are extracted from the displayed web page for computing the priority value of hyperlinks.

4. Each hypertext is processed to extract a set of tokens, where a 'token' represents a meaningful word in the hypertext.

5. When the user visits a new web page by clicking a hyperlink in the current page, the tokens of that hypertext are added to the user-accessed repository.

6. The token count is updated if the token already exists in the repository; otherwise the token is added as a new entry in the repository.


(Figure: data flow between the web browser, the hypertext-to-token conversion, the user-accessed and predicted-unused repositories, the fuzzy logic based priority computation, the prediction list, the prefetch cache and the Internet, corresponding to the procedural steps 1-14.)

Figure 3.4 Fuzzy based Prediction and Prefetching


7. The priority value of each hyperlink is computed by applying fuzzy logic over the set of tokens associated with its hypertext, with reference to the user-accessed and predicted-unused repositories. Initially the predicted-unused repository is empty; it receives tokens once the prediction activity starts.

8. Based on the computed priority values, hyperlinks are selected to form the prediction list. Hyperlinks with high priority values remain at the top of the list.

9. The prefetch engine uses the hyperlinks from the prediction list to download web objects from the server and store them in the prefetch cache.

10. When the user wishes to visit a new web page, by either clicking a hyperlink in the current page or typing a URL in the web browser, the contents of the prefetch cache are checked to find whether the request can be satisfied.

11. When the prefetch cache is able to satisfy the user request, the contents of the page are displayed in the web browser with minimal latency.

12. When the requested contents are not available in the prefetch cache, they are retrieved from the web server and displayed to the user.

13. Tokens of hyperlinks in the prediction list that are not used by the user are moved to the predicted-unused repository.

14. The count value of tokens in the predicted-unused repository is incremented whenever tokens of unused hyperlinks are moved to that repository. When the user visits a new web page, the prediction list is cleared and then populated with a new set of hyperlinks to support the prefetching activity.

3.3.2 Implementation

Similar to the Naïve Bayes approach, both prediction and prefetching

engine are implemented in the client machine. In fuzzy logic approach, two

repositories (user-accessed and predicted-unused) are used for computing the

priority of hyperlinks.

3.3.2.1 Prediction Engine

The prediction engine computes the priority value of hyperlinks by applying fuzzy logic over the set of tokens related to each hyperlink. The roles of the Tokenizer and the user-accessed repository in this approach are similar to those in the Naïve Bayes approach. The Tokenizer parses the web page to extract hyperlinks along with their hypertexts, which are analyzed to form a set of tokens from each hypertext; these tokens are stored in the user-accessed repository when the corresponding hyperlinks are used. The user-accessed repository thus stores the tokens of hypertexts associated with hyperlinks used to visit web pages. An additional repository, called the predicted-unused repository, is used to filter out unwanted predictions.

3.3.2.1.1 Predicted-Unused Repository

The tokens of hypertexts associated with hyperlinks that are predicted but not used are stored in this repository. It provides feedback to the prediction engine for tuning the generation of predictions from web pages. The repository provides two main benefits: a) it minimizes the influence of tokens in the user-accessed repository that are of little or no interest to the user when computing the priority value of hyperlinks, and b) it minimizes the number of predictions generated for each web page. When 'n' predictions are generated for a web page, only one entry in the prediction list may match the hyperlink used by the user to visit the next page; the tokens of the hyperlinks in the prediction list that do not match the user's interests are stored in this repository. Tokens of a hyperlink are subjected to stop word removal and stemming before they are stored.

The steps for adding tokens to the predicted-unused repository are as follows (a short sketch of this routing is given after the steps):

1. The prediction engine recommends a set of hyperlinks (predictions) for each web page based on the computed priority values.

2. The tokens of the recommended hyperlinks are collected and stored in a temporary buffer.

3. Check whether the hyperlink used by the user to visit the next page matches any hyperlink in the prediction list. If there is no match, go to step 5; otherwise proceed to the next step.

4. The tokens of the matched hyperlink available in the temporary buffer are moved to the user-accessed repository.

5. The tokens of the unmatched hyperlinks available in the temporary buffer are moved to the predicted-unused repository. To create a new prediction list for the next web page, go to step 1.
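The routing described in these steps can be sketched in Python as follows; the two repositories are represented as plain token-count dictionaries, and the function and parameter names are illustrative only.

def route_tokens(predicted_links, followed_url, tokenize,
                 user_accessed, predicted_unused):
    # predicted_links: dict mapping each predicted URL -> its hypertext (the temporary buffer)
    # followed_url: the hyperlink the user actually clicked to reach the next page
    for url, hypertext in predicted_links.items():
        target = user_accessed if url == followed_url else predicted_unused
        for token in tokenize(hypertext):
            target[token] = target.get(token, 0) + 1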

3.3.2.1.2 Computing Priority Value of Hyperlinks

Each hypertext is represented as a set of tokens {T1, T2, T3 . . . Tn} for computing its priority value. Fuzzy logic is applied over the set of tokens by associating it with a fuzzy set (i.e. a repository storing the tokens); each token is related to the fuzzy set with a similarity degree in the range 0 to 1. Let R1 represent the user-accessed repository and R2 the predicted-unused repository. The membership value of each token (Ti) relative to repository R1 is computed by dividing its token count in R1 by the sum of its token counts in both repositories R1 and R2, as shown in Equation (3.5).

µR1(Ti) = (TCi)R1 / [ (TCi)R1 + (TCi)R2 ]                (3.5)

Membership value of Ti relative to repository R2 is computed as shown


in Equation (3.6).

µR2(Ti) = 1 - µR1(Ti) [i = 1 to n | n = number of tokens] (3.6)



µR1(Ti) = Membership of Ti relative to repository R1

µR2(Ti) = Membership of Ti relative to repository R2

(TCi)R1 = Count of Ti in repository R1

(TCi)R2 = Count of Ti in repository R2

Membership value of Ti will be 1, if token is present only in a single

repository (i.e. R1 or R2). After computing the membership value of Ti relative to

repositories R1 and R2, they are compared to decide whether to include Ti for

computing the priority value of hyperlink. The Token Acceptance (TAi) of Ti will

be set to 1 if µR1 (Ti) > µR2 (Ti) i.e. Membership value of T i relative to R1 is

greater than that in R2; else TAi set to 0.

For Ti with its TAi set to 1, the token popularity (TPi) in repository R1

is computed by dividing its token count with maximum token count value in R1

as shown in Equation (3.7).

TPi = (TCi)R1 / max [ (TC)R1 ]                (3.7)

For Ti with its TAi =0, TPi will be set to zero.

The priority value (PV) of hyperlink is computed as shown in Equation


(3.8), where ‘n’ indicates the number of tokens in hypertext.

PV = (1/n) · Σ i=1 to n TPi                (3.8)
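A compact Python sketch of Equations (3.5) to (3.8) is shown below; R1 and R2 are assumed to be token-count dictionaries for the user-accessed and predicted-unused repositories, and the function name is illustrative.

def fuzzy_priority(tokens, R1, R2):
    # R1: user-accessed repository, R2: predicted-unused repository (token -> count)
    if not tokens or not R1:
        return 0.0
    max_count_r1 = max(R1.values())
    popularity = []
    for t in tokens:
        c1, c2 = R1.get(t, 0), R2.get(t, 0)
        if c1 + c2 == 0:
            popularity.append(0.0)           # token unseen in either repository
            continue
        mu_r1 = c1 / (c1 + c2)               # Equation (3.5)
        mu_r2 = 1 - mu_r1                    # Equation (3.6)
        if mu_r1 > mu_r2:                    # token acceptance TAi = 1
            popularity.append(c1 / max_count_r1)   # Equation (3.7)
        else:
            popularity.append(0.0)           # TAi = 0, so TPi = 0
    return sum(popularity) / len(tokens)     # Equation (3.8)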

3.3.2.1.3 Prediction List

It is implemented as a priority queue similar to that in Naïve Bayes

approach to maintain high priority links at the top of queue. The hyperlinks are

sorted based on the computed priority value that falls within the range of 0 to 1.

Figure 3.5 represents the prediction list that arranges hyperlinks based on the

computed priority values.

(Figure: the prediction engine computes a fuzzy priority in the range 0 to 1 for each hyperlink; the prediction list holds the links in priority order, for example Link1 (0.98), Link2 (0.93), Link3 (0.86), Link4 (0.85), Link5 (0.75), Link6 (0.71) and Link7 (0.65), from which the prefetching engine takes links.)

Figure 3.5 Prediction List based on Fuzzy Computations

3.3.2.2 Prefetching Engine

The hyperlinks in the prediction list are used by the prefetching engine to download web objects during browser idle time, which helps to avoid interference with regular user requests. The prefetching engine will not download all the predicted web objects, due to the following factors:

· lack of idle time caused by fast navigation between web pages by the user

· some predicted objects may already exist in the regular cache

· some predicted objects may already have been demand requested by the user.

The prediction list should contain hyperlinks that reflect the user's interests; otherwise, prefetching the objects wastes user and server resources and degrades performance. Prefetched web objects are stored in the prefetch cache and managed using the LRU algorithm. If a demand requested web object resides in the prefetch cache, it is moved to the regular cache, so a web object never resides in both caches (regular and prefetch) at the same time.
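A small sketch of this arrangement, using Python's OrderedDict as the LRU structure, is given below; it illustrates the policy rather than reproducing the actual cache code.

from collections import OrderedDict

class PrefetchCache:
    def __init__(self, capacity, regular_cache):
        self.capacity = capacity
        self.items = OrderedDict()           # insertion order doubles as LRU order
        self.regular_cache = regular_cache   # dict-like regular cache

    def store(self, url, obj):
        if url in self.items:
            self.items.move_to_end(url)
        self.items[url] = obj
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used object

    def demand_request(self, url):
        # a demand-requested object is promoted from the prefetch to the regular cache
        obj = self.items.pop(url, None)
        if obj is not None:
            self.regular_cache[url] = obj
        return obj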

3.4 EVALUATION

Trace based simulations cannot be used effectively for evaluating the performance of the Naïve Bayes and Fuzzy Logic approaches, since web traces do not provide a comprehensive view of a client's browsing interests. An effective way to gather the required information is to capture the user's interest at the client side by analyzing the browsing pattern in each session. The user visits a new web page either by typing its URL in the web browser or by clicking a hyperlink in a web page. In the proposed approaches, user interests are obtained by tracking the information associated with the hyperlinks used to visit web pages during browsing sessions.

Evaluation is carried out by performing web browsing that focuses on information related to the user's interest. An open source browser (CxBrowser) developed in C# is used for performing the browsing sessions. Both the prediction and prefetching engines are implemented as an add-on to the web browser, which allows the user to configure the prefetch settings as required. Each browsing session has active and idle periods based on the user's access pattern: the active period represents the phase where web objects are demand requested by the user, and the idle period represents the phase where the displayed web objects are viewed by the user. The retrieval of the main html file and its embedded objects initiates the active period, while the idle period is used by the prefetching engine to download web objects using the hyperlinks in the prediction list and store them in the prefetch cache. A log file records the user requests during the browsing sessions and is used to analyze the performance of the proposed approaches.

The number of links prefetched depends on the amount of time a user spends reading the displayed web page, which is considered an important attribute in predicting the user's interests (Liang and Lai 2002; Gunduz and Ozsu 2003; Guo et al 2007). User navigation time was partitioned into four discrete intervals by Xing and Shen (2004): passing, simple viewing, normal viewing and preferred viewing. If the user spends more time reading a web page (preferred viewing), the browser idle time increases, allowing many hyperlinks in the prediction list to be prefetched. When the user visits a web page whose content is irrelevant to his or her interest, the prediction list for that page contains few or no hyperlinks. When the user demand requests a web object, its availability is first checked in the local cache (regular or prefetch cache) before the request is forwarded to the proxy or web server.


When the user initially starts a browsing session, the repositories (user-accessed and predicted-unused) are empty and cannot be used immediately to compute the priority value of hyperlinks. The user-accessed repository receives tokens once the user starts using hyperlinks present in web pages to visit new pages. The predicted-unused repository receives the tokens of hyperlinks that are predicted but not utilized to prefetch web objects or that do not match the user's browsing interest.

3.4.1 Experimental Results

The performance of the proposed approaches (Naïve Bayes and Fuzzy Logic) depends on the browsing interests of individual users, and hence the results from different users are not directly comparable. Results were obtained by analyzing browsing sessions carried out over a period of six weeks. To establish user access patterns systematically, a topic of interest was specified to guide each user's daily browsing behavior. The metrics used for evaluation are Recall (hit rate) and Precision (accuracy). Recall (Rc), shown in Equation (3.9), indicates the percentage of user requests served using the contents of the prefetch cache against the total number of user requests. When recall is high, the user perceived latency is effectively minimized, since most of the user requests are served from the local cache.

Rc = Prefetch Hits / Total User Requests                (3.9)

Precision (Pc) shown in Equation (3.10) indicates the percentage of

prefetched web pages requested by users against the total number of web pages

prefetched. It reflects the effectiveness in generating useful predictions during

browsing sessions.

Pc = Prefetch Hits / Total Prefetches                (3.10)
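As a hypothetical illustration of the two metrics, if a session contains 200 user requests, 80 of which are served from the prefetch cache, and 160 objects were prefetched in total, then Rc = 80/200 = 0.4 and Pc = 80/160 = 0.5.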

3.4.1.1 Naïve Bayes Approach

The hyperlinks to be prefetched are predicted based on the priority value computed using the Naïve Bayes formula. Since the prefetch operation is carried out only during browser idle time, the number of hyperlinks prefetched varies dynamically depending on the user's access pattern. Figure 3.6 shows the hyperlinks that are predicted and prefetched for the pages accessed by users (e.g. user-A, user-B) in a browsing session. The first few pages in a session have the fewest predicted hyperlinks, because the user-accessed repository is still empty and cannot be used to make predictions. Once the user starts visiting new pages through hyperlinks in a session, the user-accessed repository is filled with tokens that improve the predictions for subsequent pages. When the user moves through pages quickly, only a few objects are predicted and prefetched; in some instances the user spends more time on a particular page, providing an opportunity to predict and prefetch a larger number of hyperlinks.

(Figure: two bar charts, one for User-A and one for User-B, showing the number of predicted and prefetched pages for each of the pages accessed in a session.)

Figure 3.6 Pages Predicted and Prefetched in a User Session


(Figure: recall for User-A and User-B plotted against the length of the user access session, from 10 to 70 minutes.)

Figure 3.7 Recall in Naïve Bayes Approach

Figure 3.7 shows the Recall (Rc) achieved during different browsing sessions based on the users' access patterns. The recall varies between users (e.g. user-A, user-B), because the prediction/prefetch activity depends on the individual browsing behavior during each session. As shown in the graph, when a user has long browsing sessions the repositories contain a large number of tokens, which helps to predict useful hyperlinks and improve performance. The number of links prefetched is governed by the browser idle time, and it varies depending on the user's navigation pattern across pages.
(Figure: precision for User-A and User-B plotted against the length of the user access session, from 10 to 70 minutes.)

Figure 3.8 Precision in Naïve Bayes Approach

Figure 3.8 shows the Precision (Pc) achieved during different browsing sessions based on the users' access patterns. When the user-accessed repository collects tokens that effectively reflect the user's interest, the prefetched pages are useful to the user. During the browsing sessions, users searched for information related to a specific topic, which helped to utilize the repository effectively for generating the predictions. If the user randomly accesses pages without looking for specific information, the usage of prefetched pages drops drastically, leading to poor performance.

3.4.1.2 Fuzzy Logic Approach

The performance of the fuzzy logic approach is also analyzed using the Recall and Precision metrics. This approach uses two repositories, user-accessed and predicted-unused, to guide the prediction engine in generating the predictions. The predicted-unused repository provides feedback to the system regarding the unused hyperlinks, which helps to fine-tune the links suggested for prefetching.

(Figure: recall for User-A and User-B plotted against the length of the user access session, from 10 to 70 minutes.)

Figure 3.9 Recall in Fuzzy Logic Approach

Figure 3.9 shows the Recall (Rc) achieved during the browsing sessions for the fuzzy logic approach. As indicated in the graph, when the user has a long browsing session the recall improves, which helps to reduce the user access latency. Since links are suggested based on inference from both repositories (user-accessed and predicted-unused), the majority of the prefetched links are useful in satisfying the user requests.


(Figure: precision for User-A and User-B plotted against the length of the user access session, from 10 to 70 minutes.)

Figure 3.10 Precision in Fuzzy Logic Approach

Figure 3.10 indicates the Precision (Pc) achieved during the browsing

sessions for fuzzy logic approach. The use of predicted-unused repository helps

to identify tokens of hypertext that are of less interest to users and eliminate them

from being used for computing the predictions. As a result, the hyperlinks

predicted and prefetched will closely match the user interests thus minimizing

the number of prefetched links left unused by users.

The performance of fuzzy logic approach is compared with various

approaches by altering the number of links to be prefetched during browser idle

time. Top-down approach by Markatos and Chronaki (1998) and Bigrams in link

approach by Georgakis and Li (2006) are used for comparison. It is also

compared with Naïve Bayes approach discussed in section 3.2.


(Figure: recall of the Top-down, Naïve Bayes, Bigrams-Link and Fuzzy Logic approaches for 2 to 10 prefetched links.)

Figure 3.11 Comparison of Recall in various approaches

Figure 3.11 compares the Recall achieved by the various approaches for different numbers of prefetched hyperlinks. The number of links to be prefetched during browser idle time is varied between 2 and 10 to analyze the effectiveness of each approach in satisfying the user requests; when more links are prefetched, user requests are satisfied more easily. As indicated in the graph, fuzzy logic performs better than the other approaches in all cases. The Naïve Bayes approach is better than the Top-down approach and comes close in performance to the Bigrams in link approach. The reason the fuzzy logic approach produces better results is that it refines the generation of predictions based on the user's browsing pattern.


(Figure: precision of the Top-down, Naïve Bayes, Bigrams-Link and Fuzzy Logic approaches for 2 to 10 prefetched links.)

Figure 3.12 Comparison of Precision in various approaches

The Precision (Pc) achieved by the various approaches is shown in Figure 3.12. As shown in the graph, when a minimal number of links is prefetched (i.e. 2 to 4) the user requests are only moderately satisfied, resulting in precision in the range 0.3 to 0.55 across the approaches. Precision in the range 0.4 to 0.65 is achieved when more than four links are prefetched, with an upper bound of about eight links. Prefetching more than eight links still satisfies the user requests, but the number of unused links grows, resulting in wasted resources and a poor precision rate. The fuzzy logic approach provides better performance in all cases when compared to the other approaches.

3.5 CONCLUSION

This chapter discussed web prefetching based on the information associated with hyperlinks present in web pages. Two approaches, Naïve Bayes and Fuzzy Logic, were designed to compute the priority value of hyperlinks using hypertext information. The Naïve Bayes approach used only the token information stored in the user-accessed repository for computing the priority of hyperlinks, while the fuzzy logic approach used the token information stored in both the user-accessed and predicted-unused repositories. This helps to generate effective predictions during browsing sessions, thereby efficiently minimizing the user perceived latency.

The predictions are generated dynamically for each new web page visited by the user. Experimental results indicate that the proposed approaches generate efficient predictions that mitigate the latency. The Recall and Precision metrics clearly demonstrate the advantage of the fuzzy logic and Naïve Bayes approaches over the existing algorithms.

CHAPTER 4

PRECEDENCE GRAPH BASED WEB PREDICTION

4.1 INTRODUCTION

Prefetch systems are designed to generate web predictions based on criteria such as access patterns, object popularity and the structure of accessed web documents. This chapter discusses a prediction algorithm that builds a Precedence Graph (PG) by analyzing user access patterns to generate predictions that reflect the user's future requests. It uses the object URI and the referrer in each user request to build a precedence relation between the requested web object and its source. The algorithm differentiates the dependencies between objects of the same web page and objects of different web pages, in order to incorporate the characteristics of current websites when building the graph. It uses a simple data structure for implementing the graph that requires minimal memory and computational resources. When the user requests a new web page, the request is satisfied from the contents of the prefetch cache whenever possible, which reduces the access latency observed by the user.

The prediction engine located at the web server is responsible for building the Precedence Graph and generating the predictions. It gathers information related to user requests for the web pages stored on the server to build the graph. The output of the prediction engine is the hint list (predictions), a set of URIs that are likely to be requested by the user in the near future. The prefetching engine located at the client receives the predictions from the server and uses them to download web objects and store them in the prefetch cache maintained at the client. The Precedence Graph is updated dynamically with new nodes and arcs when the user requests new web objects, which ensures that the predictions are generated based on the most recent information maintained in the graph. The aggressiveness of prefetching is controlled by a 'threshold' parameter that is applied to the confidence value of the arcs in the graph when generating the predictions.

Precedence Graph is implemented as a directed graph to represent the

unidirectional relationship between the predecessor and successor nodes. The

predecessor-successor relationship is integrated into the graph on a per-request

basis by either inserting a new arc (when predecessor-successor relationship is

discovered for the first time) between the nodes or updating the occurrence count

of an existing arc (when relationship is already observed) in the graph.

Figure 4.1 shows the basic interaction between a web client and a web server when prefetching is enabled, with the server holding the prediction engine and the client performing the prefetching operation. When the user clicks a URL in a displayed web page or types a new URL in the web browser, an HTTP GET request is sent by the browser to the server to fetch the object. On receiving the request, the web server establishes a connection with the prediction engine and asks it to generate the predictions. The prediction engine generates the hint list for the request, which the web server sends to the client in the HTTP response by including response headers (e.g. Link: <P2>; rel: prefetch) that indicate the links to be prefetched.

(Figure: sequence diagram between the web browser, the web server and the prediction engine. The user requests page P1 with an HTTP GET; the server asks the prediction engine, which uses the Precedence Graph to generate hints P2 and P3; the 200 OK response for P1 carries the headers Link: <P2>; rel: prefetch and Link: <P3>; rel: prefetch. During browser idle time P2 is prefetched (GET P2, 200 OK) into the prefetch cache, so that a later user request for P2 is served from the prefetch cache.)

Figure 4.1 Browser & Server interaction with Prediction and Prefetching

When the web browser is idle, it uses the hints received in the HTTP response to download the corresponding web objects and store them in the prefetch cache. If the user then requests a new web page (e.g. P2) that is available in the prefetch cache, it is served immediately without additional network latency.
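As an illustration of how a client might recover these hints, the short Python sketch below scans the Link headers of a response for rel=prefetch entries. The header format follows the example shown above, and the parsing is deliberately simplified.

import re

def extract_prefetch_hints(link_headers):
    # link_headers: list of raw Link header values, e.g. '<P2>; rel: prefetch'
    hints = []
    for header in link_headers:
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel\s*[:=]\s*"?prefetch"?', header)
        if match:
            hints.append(match.group(1))
    return hints

# extract_prefetch_hints(['<P2>; rel: prefetch', '<P3>; rel: prefetch'])
# returns ['P2', 'P3']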

The number of links that can be prefetched is bounded by the

following factors: a) availability of total bandwidth b) time the user spends in

each page and c) size of visited web pages. Prefetching algorithm should aim to

balance the cache-hit ratio and usefulness against the bandwidth overhead when

anticipating future requests.

4.2 PRECEDENCE GRAPH

The objectives of building Precedence Graph to generate the

predictions are:

· To build the graph with minimal number of arcs that will reduce

the computation time.

· To generate effective predictions in quick time.

· To manage the graph size within controllable limits by

periodically subjecting the graph to trimming of nodes and arcs.

4.2.1 Introduction

The prediction algorithm builds a Precedence Graph (PG) that represents the user access patterns, with nodes representing web objects and arcs representing the relations between them. An arc is added between two nodes when a user request reports both the requested object (successor node) and the source object (predecessor node) from which the request was generated. Each arc has a transition weight associated with it that represents the transition confidence from the predecessor to the successor node.

When the user requests a web object, the request information is used to update the graph in two ways: 1) a new node is added to represent the web object, together with a new arc representing the relationship between the new node and an existing node in the graph, or 2) the existing node and arc are updated by incrementing their occurrence counts. The updated graph generates predictions for the user request by analyzing the nodes and arcs related to the node representing the requested web object; arcs whose transition confidence exceeds the threshold value are considered when generating the predictions. The prefetching engine uses the generated predictions to download web objects from the server during browser idle time.

Web access log files are filtered to select the appropriate HTTP method (i.e. GET) and HTTP response codes (200 – OK, 304 – Not Modified, 206 – Partial Content) before they are used for building the Precedence Graph. HTTP headers offer accurate information, since they are explicitly provided by the web client and server.
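A sketch of this filtering step is shown below in Python; the assumed log line layout is the combined log format, in which the request line, referrer and user agent are quoted fields, and the helper name is chosen only for illustration.

ALLOWED_STATUS = {"200", "304", "206"}   # OK, Not Modified, Partial Content

def usable_entries(log_lines):
    # yield (uri, referrer) pairs for GET requests with acceptable response codes
    for line in log_lines:
        fields = line.split('"')
        if len(fields) < 6:
            continue                      # malformed or truncated line
        request, status_part, referrer = fields[1], fields[2], fields[3]
        parts = request.split()
        if len(parts) < 2:
            continue
        method, uri = parts[0], parts[1]
        status = status_part.split()[0]
        if method == "GET" and status in ALLOWED_STATUS:
            yield uri, referrer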



4.2.2 Building the Graph

The algorithm for building Precedence Graph is as follows:

Input:
· User requested object (URL)
· Referrer in the user request
· Object type (primary/secondary)
Output: updated Precedence Graph
Step 1: Adding new node (or) updating existing node
‘x’ → a node in the graph
Find ‘x’ that matches with the user requested object
If ‘x’ available, then
occurrence (x) ← occurrence (x) + 1 // count updated
Else
‘x’ newly created // represents the requested object
occurrence (x) =1
End if
Step 2: Adding new arc (or) updating existing arc
‘y’ → a node in the graph
Find ‘y’ that matches with the referrer in user request
If ‘y’ available, then {
Find the arc ‘yx’ // transition from node y to x (y → x)
If ‘yx’ available, then
occurrence (yx) ← occurrence (yx) + 1
Else
‘yx’ newly created // arc from y to x
occurrence (yx) = 1
End if
} Else
No arc gets added or updated in the graph
End if
Step 3: Compute transition confidence of all arcs from node ‘y’
arc transition confidence ← arc occurrence / ‘y’ occurrence
Return Precedence Graph
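The algorithm above can be rendered in Python roughly as follows; the node and arc records are plain dictionaries, and whether a requested object is primary or secondary is assumed to be known to the caller (e.g. from its file extension). This is a sketch of the update logic, not the thesis implementation.

def update_graph(graph, uri, referrer, is_primary):
    # graph: dict mapping node URI -> {"count": int, "type": str, "arcs": dict}
    node = graph.setdefault(uri, {"count": 0,
                                  "type": "primary" if is_primary else "secondary",
                                  "arcs": {}})
    node["count"] += 1                                    # Step 1: add or update node

    source = graph.get(referrer)
    if source is not None:                                # Step 2: add or update arc y -> x
        arc = source["arcs"].setdefault(uri, {"count": 0, "confidence": 0.0})
        arc["count"] += 1
        for a in source["arcs"].values():                 # Step 3: recompute confidences
            a["confidence"] = a["count"] / source["count"]
    return graph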

The algorithm analyzes the user access patterns to build the Precedence Graph used for predicting the user's future requests. Each node in the graph represents a user requested web object, with an initial count value of 1; when the user requests the same web object again, the node's occurrence count is incremented. Each arc in the graph represents a transition from one web object to another. For each user requested web object, an arc is drawn with an initial count value of 1 if its source node (predecessor) is specified in the request. The arc reflects the precedence relation that exists between the source object (predecessor node) and the newly requested object (successor node). When the user repeats the request for the same web object through the same source, the existing arc that reflects this relation is incremented.

Example: Consider a user who requests a new web object 'B' through a source object 'A'. A precedence relation is established between the objects by drawing an arc from A to B (i.e. object A precedes object B, A → B) with an initial count value of 1. When the user again requests object B through object A, the existing arc count is incremented by 1 for each reference.

The source of each requested web object is given by the HTTP referrer that is recorded for each user request in the log file. The referrer information is used to create an arc from the source object (predecessor) to the requested object (successor).

(Figure: a sample graph with primary nodes P1.html, P2.html and P3.html, secondary nodes P1.gif and P2.jpg, and primary and secondary arcs connecting them.)

Figure 4.2 Precedence Graph

Each web page contains a main object (html), referred to as the primary object, and several embedded objects (jpg, png), referred to as secondary objects. The primary nodes in the graph represent main objects that are demand requested by users, while secondary nodes represent embedded objects that are requested by the web browser.

Figure 4.2 shows a sample Precedence Graph with primary and secondary nodes. Arcs connecting two primary nodes are termed primary arcs, and those connecting a primary and a secondary node are termed secondary arcs. As shown in Figure 4.2, the objects P2.html and P1.gif are requested from P1.html, the objects P2.jpg and P3.html are requested from P2.html, and P1.html is requested from P3.html.

4.2.3 Updating the Graph

Initially the Precedence Graph is empty; it is built and updated through continuous learning of the user access patterns. Each node in the graph contains the object URI, the node type (primary/secondary), an occurrence count and a list of primary/secondary arcs. Each arc in the graph contains the destination URI, the arc type (primary/secondary), an occurrence count and a transition confidence. The node occurrence count represents the number of user requests for the web object represented by that node. The arc occurrence count represents the number of transitions from the predecessor to the successor node. The arc transition confidence is computed by dividing the arc occurrence count by the predecessor node's occurrence count.

The graph is dynamically updated whenever user requests web objects

and it involves the following steps:

a) Node occurrence count is incremented if it represents the requested

web object; else new node is created and added to the graph with

an initial occurrence count of 1 to represent the web object.

b) Arc occurrence count is incremented if it represents the transition

from source object (predecessor) to the requested object

(successor); else new arc is created between the nodes and added to

the graph with an initial occurrence count of 1 to represent the

transition.

The graph will grow in size during its learning process; this growth is controlled by removing nodes and arcs that least represent the user's interests and do not influence the prediction process in generating the hints.

4.2.4 Predictions from the Graph

When a user requests a web page through the web browser, the primary object of the web page is requested first, and then the secondary objects are requested either from the server or from the local cache. For each web page, the perfect prediction algorithm (de la Ossa et al 2009) reports three types of hints: a) the primary object of the next web page to be requested by the user, b) the secondary objects associated with the next web page, and c) further next pages. The proposed prediction algorithm generates hints for a web object by analyzing nodes in the Precedence Graph. If a node representing the requested web object is available in the graph, its associated arcs are analyzed to generate the predictions; otherwise no predictions are generated for the web object. The prediction engine needs to provide hints to the browser as a list sorted according to their probability, so that the most relevant pages are prefetched first.

Steps for generating the predictions are as follows:

1. For each user requested web object, find a primary node in the graph that represents the object.

2. If a matching primary node is available, analyze all the primary arcs associated with that node. Select the arcs having a transition confidence greater than or equal to the specified threshold.

3. Collect the object URIs stored in the primary nodes that act as successor nodes to the arcs selected in step 2. Arrange the object URIs based on their confidence value (highest to lowest) and add them to the prediction list.

4. Analyze the secondary arcs associated with the primary nodes used in step 3. Select the arcs having a transition confidence greater than or equal to the specified threshold.

5. Collect the object URIs stored in the secondary nodes that act as successor nodes to the arcs selected in step 4. Arrange the object URIs based on their confidence value (highest to lowest) and add them to the prediction list.

6. The final prediction list contains object URIs from both primary and secondary nodes of the graph. The prefetching engine uses the list to download web objects from the server. Go to step 1 if the user requests a new web object.
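A sketch of how these steps might look in code, reusing the illustrative PrecedenceGraph structure from the earlier sketch; the function name and the default threshold value are assumptions made for illustration.

def generate_predictions(graph, requested_uri, threshold=0.5):
    """Return a prediction (hint) list of (URI, confidence), sorted by confidence."""
    node = graph.nodes.get(requested_uri)
    if node is None or node.node_type != 'primary':
        return []                                    # step 1: no matching primary node

    # Steps 2-3: primary arcs whose transition confidence meets the threshold
    primary_hits = [a for a in node.arcs.values()
                    if a.arc_type == 'primary' and a.confidence >= threshold]
    primary_hits.sort(key=lambda a: a.confidence, reverse=True)
    hints = [(a.dest_uri, a.confidence) for a in primary_hits]

    # Steps 4-5: secondary arcs of the predicted primary nodes
    secondary_hits = []
    for a in primary_hits:
        succ = graph.nodes.get(a.dest_uri)
        if succ is None:
            continue
        secondary_hits.extend(s for s in succ.arcs.values()
                              if s.arc_type == 'secondary' and s.confidence >= threshold)
    secondary_hits.sort(key=lambda s: s.confidence, reverse=True)
    hints.extend((s.dest_uri, s.confidence) for s in secondary_hits)

    # Step 6: the final list is handed to the prefetching engine
    return hints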

Predictions (Hints) generated at server can be provided to the client in

three different ways:

1. In a response HTTP header:
   Link: <one.html>; rel=prefetch

2. In a 'meta' tag in the HTML header:
   <meta HTTP-EQUIV="Link" CONTENT="<one.html>; rel=prefetch">

3. In a 'link' tag in the HTML body:
   <link rel="prefetch" href="one.html">

Figure 4.3 shows a sample HTTP header that supplies the referrer information to the server. The referrer is recorded in the access log file at the server and is used by the prediction engine to build the graph.

Request URL      : http://www.rediff.com/getahead
Request Method   : GET
Status Code      : HTTP/1.0 200 OK
Accept           : text/html, application/xhtml+xml, application/xml
Host             : www.rediff.com
Proxy-Connection : keep-alive
Referer          : http://www.rediff.com/sports      ← referer of the requested URL

Figure 4.3 Sample HTTP header with referer information

Figure 4.4 shows sample HTTP response from server that includes the

link to be prefetched during idle time. It is used by the client to download the

web object.

4.2.5 Prefetching the Web Objects

A prefetching engine that is integrated into the web browser normally prefetches URLs provided over the HTTP protocol without any embedded objects. It will not prefetch URLs that contain parameters (queries). For example, Mozilla Firefox, an open source web browser with web prefetching capabilities, recognizes hints included in the response HTTP headers or embedded in the HTML file to perform prefetching (Fisher 2003).

Content-Encoding : gzip
Content-Length   : 10933
Content-Type     : text/html
Date             : Sat, 24 Nov 2012 08:16:14 GMT
Proxy-Connection : keep-alive
Server           : Apache
Vary             : Accept-Encoding
Via              : 1.0 localhost:8080 (squid/2.6.STABLE6)
X-Cache          : MISS from localhost
X-Cache-Lookup   : MISS from localhost:8080
Link             : <http://www.rediff.com/getahead/img1.jpg>; rel="prefetch"      ← link to be prefetched

Figure 4.4 Sample HTTP response with link to be prefetched

In our approach, the prefetching engine located at the client receives predictions from the server via the HTTP response header and uses them to prefetch web objects during browser idle time. This helps to avoid interference with regular user requests. The object MIME type in the HTTP response header is used to determine whether a web object is eligible for caching. The downloaded objects are stored in a prefetch cache that is maintained separately from the regular cache to improve the hit rate. A web object is not prefetched if it already exists in either the regular or the prefetch cache. When the user demand requests a new web page, any prefetching activity in progress is terminated. When a prefetching mechanism is not available in the client machine, user requests are served from either the local (regular) cache or the web server.
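A minimal sketch of this client-side prefetching loop is given below; the function names, cache dictionaries and idle-time callback are illustrative assumptions, not the actual browser integration.

import urllib.request

CACHEABLE_TYPES = ('text/html', 'text/css', 'image/', 'application/javascript')

def prefetch_hints(hints, regular_cache, prefetch_cache, browser_is_idle):
    """Prefetch hinted URIs during browser idle time (illustrative sketch)."""
    for uri, _confidence in hints:
        if not browser_is_idle():
            break                                     # a demand request arrived: stop prefetching
        if uri in regular_cache or uri in prefetch_cache:
            continue                                  # already cached, do not prefetch again
        with urllib.request.urlopen(uri) as resp:
            mime = resp.headers.get('Content-Type', '')
            if not mime.startswith(CACHEABLE_TYPES):  # MIME type decides cacheability
                continue
            prefetch_cache[uri] = resp.read()         # store in the separate prefetch cache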

The prefetching engine should not retrieve a web page that will be accessed only after a long time, since at the time of access the prefetched copy may contain stale data. User perceived latency can be significantly reduced by prefetching more pages, but prefetch accuracy diminishes if the prefetched pages are not referenced by users.

Table 4.1 Sample user requests in a session

Requested URL Referrer for the request


/P1.html -
/P1.gif /P1.html
/P1.jpg /P1.html
/P2.html /P1.html
/P2.jpg /P2.html
/P3.html /P2.html
/P3.jpg /P3.html
/P3.gif /P3.html
/P1.html /P3.html
/P1.gif /P1.html
/P1.jpg /P1.html
/P4.html /P1.html
/P4.png /P4.html
/P4.jpg /P4.html
/P5.html /P4.html
/P1.html -
/P1.gif /P1.html
/P1.jpg /P1.html
/P2.html /P1.html

4.2.6 Implementation Example

Table 4.1 shows the sample user requests in a session that are used to illustrate the working of the proposed prediction algorithm. The web requests in Table 4.1 are used to build the Precedence Graph shown in Figure 4.5. Primary and secondary nodes in the graph are represented with their object URIs and occurrence counts, and primary and secondary arcs are represented with their occurrence counts.

[Figure: Precedence Graph built from Table 4.1 — node occurrence counts: P1.html (3), P2.html (2), P3.html (1), P4.html (1), P5.html (1), P1.gif (3), P1.jpg (3), P2.jpg (1), P3.jpg (1), P3.gif (1), P4.png (1), P4.jpg (1); arcs labelled with their occurrence counts]
Figure 4.5 Precedence Graph built using the user requests

Figure 4.6 represents the adjacency map implementation of the Precedence Graph, in which the primary nodes of the graph are stored as keys in the map and the primary/secondary arcs originating from each node are stored as a list associated with the corresponding key. Secondary nodes of the graph are not stored as keys in the map, since in most cases they do not act as the source of a new web object. Each element in the list is shown with three fields: object URI, arc occurrence count and arc transition confidence.

Key            Value (object URI, arc occurrence count, arc transition confidence)
P1.html (3)    [P1.gif, 3, 1], [P1.jpg, 3, 1], [P2.html, 2, 0.6], [P4.html, 1, 0.3]
P2.html (2)    [P2.jpg, 1, 0.5], [P3.html, 1, 0.5]
P3.html (1)    [P3.jpg, 1, 1], [P3.gif, 1, 1], [P1.html, 1, 1]
P4.html (1)    [P4.png, 1, 1], [P4.jpg, 1, 1], [P5.html, 1, 1]
P5.html (1)    —

Figure 4.6 Adjacency Map for the Precedence Graph

Arc Transition Confidence = arc occurrence count / node occurrence count

Example:

The arc transition confidence of objects with reference to primary node P1.html is:
P1.gif  = 3/3 = 1
P2.html = 2/3 ≈ 0.6
P4.html = 1/3 ≈ 0.3

Table 4.2 represents the hints generated for user requests based on the

information maintained in the Precedence Graph shown in Figure 4.5.

Table 4.2 Hints generated for user requests

User Request    Hints with Confidence value
/P1.html        /P2.html (0.6), /P2.jpg (0.3), /P4.html (0.3), /P4.png (0.3), /P4.jpg (0.3)
/P2.html        /P3.html (0.5), /P3.jpg (0.5), /P3.gif (0.5)
/P3.html        /P1.html (1), /P1.gif (1), /P1.jpg (1)
/P4.html        /P5.html (1)

The prediction algorithm provides both primary and secondary objects

as hints. But compared to secondary objects, the primary objects provide more

page latency savings due to the following factors:

a) Service time of primary objects is much longer than that of

secondary objects.

b) Web browsers use a single connection to request primary objects, whereas they use two parallel connections simultaneously to request secondary objects.

c) Secondary objects are requested by the browser only after the primary object is received and parsed.



4.3 GRAPH TRIMMING

The prediction algorithm dynamically builds the Precedence Graph by constantly updating it with request information whenever users access web pages. As the graph grows over time, its demand for computational resources increases. The information stored in the graph becomes obsolete due to the following factors: a) the user's access patterns change because their topics of interest change, and b) web pages previously accessed by the users may be removed from the website or referred to by a different URL. In these cases, the occurrence counts of nodes and arcs are no longer updated and remain at their old values. When the graph contains such obsolete information, it wastes memory and computational resources of the prediction engine by generating useless predictions that degrade system performance. This problem can be solved by periodically trimming the graph to remove nodes and arcs that least represent the users' interests.

When designing the algorithm that performs the trimming operation, the following issues need to be considered: a) it should not increase the resource consumption of the prediction algorithm when trimming is performed, and b) it should not affect the prediction accuracy of the algorithm by removing useful nodes and arcs from the graph. The trimming operation analyzes the entire graph to cover all its nodes and arcs. Nodes are removed from the graph based on their popularity (number of accesses) and access time. If a node does not reach its minimum popularity or has not been accessed for a long time, it is removed from the graph. Nodes whose popularity is greater than the prescribed threshold, or that have been accessed recently, are retained in the graph. Table 4.3 lists the notations used in the trimming algorithm.

Table 4.3 Notations used in Trimming Algorithm

Variable Meaning

T_C Time Counter

T_Th Trimming Threshold


n_occ_th Node occurrence Threshold

arc_th Arc occurrence threshold

arc_occ Arc occurrence count

n_occ Node occurrence count

Time_Diff Threshold to decide removal of node from graph

n_a_t Node access time

4.3.1 Invoking Trimming Operation

The time counter and trimming threshold are used to decide when the trimming algorithm is invoked. The interval duration that is added to the time counter to obtain the trimming threshold is set by the user, and it should be chosen so that it does not affect the performance of the system. The procedure used for invoking the graph trimming algorithm is as follows:



Initialization:
T_C = node access time
T_Th = T_C + Interval Duration
Step 1: Updating the Graph and node access time
While (T_C < T_Th) {
// add new node or arc; else increment count of node or arc
Update the graph {
Increment node/arc occurrence count; (OR)
Add new node/arc;
}
T_C = node access time; // Updated with new access time
}
Step 2: Invoking Trimming operation on the Graph
If (T_C ≥ T_Th) {
// Access time greater or equal to threshold, perform trimming
Invoke Trimming algorithm;
}
Step 3: Reset the counters
After trimming is completed, the counters are reset:
T_C = 0;
T_Th =0;
Access time in Node =0; // reset in all the nodes
Go to Step 1 to restart the activity with:
T_C = New access time
T_Th = New threshold value

Time Counter (T_C)

It is used to keep track of the current access time, whenever the graph

is updated with user request information. The graph gets updated in two ways: a)

addition of new node or arc to reflect the access information b) increment the

occurrence count of existing node or arc.

Trimming Threshold (T_Th)

It is used to decide when the graph is subjected to trimming operation.

During trimming, the nodes and arcs of graph will be analyzed and based on their

occurrence count and access time, suitable action will be taken.

T_Th = T_C (Initial) + Interval Duration

The trimming threshold value is the sum of the interval duration specified by the user and the initial value that is set in the time counter before the graph updating activity starts. Whenever the time counter is initialized with a new access time, the trimming threshold takes a new value that acts as the deadline for invoking the trimming operation. The interval duration is configurable and is set by the user depending on the requirement.

The value of the interval duration needs to be selected carefully so that it does not degrade the performance of the prediction algorithm. If the interval duration is too long, the graph will contain outdated information that is of no use to the user when given as predictions. If the interval duration is too short, the graph is trimmed frequently, which wastes computational resources and may also remove useful information from the graph.

4.3.2 Trimming Algorithm

The trimming algorithm is invoked when the time counter (T_C) value reaches the trimming threshold (T_Th) value.

Each node in the graph maintains a node access time (n_a_t) that is updated with the current access time whenever the node is requested. The node access time is compared with the time counter value to determine how recently the particular node was accessed, and this difference is compared with the threshold value (Time_Diff) to decide on the action to be performed on the node.

T_C = access time (most recent value)
n_a_t = node access time (last accessed value)

If [T_C – n_a_t] is greater than or equal to the threshold (Time_Diff) value, the particular node needs to be removed from the graph since it has not been accessed for a long time.

If [T_C – n_a_t] is less than the threshold (Time_Diff) value, the arcs associated with the particular node are analyzed. Based on the arc threshold, its primary and secondary arcs are selected for removal from the graph.

The algorithm that performs trimming on the graph is as follows:

If ([T_C – n_a_t] ≥ Time_Diff) {
    // node not accessed for a long time, needs to be removed
    If (node has no outgoing links)
        Remove the particular node;
    Else {
        Remove all its secondary arcs and nodes;
        Remove all its primary arcs;
        Remove primary nodes if they meet the criteria;
        Remove the particular node;
    }
    Remove all incoming arcs to this node;
}
Else {
    // node recently accessed, analyze its arcs
    If (arc_occ < arc_th) {
        If (secondary arc)
            Remove secondary node and arc;
        Else {
            Remove primary arc;
            Remove primary node if it meets the criteria;
        }
    }
    Else
        Remove the particular node if its popularity is low;
}
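A rough Python sketch of this trimming pass over the illustrative PrecedenceGraph used earlier is shown below; the threshold names mirror Table 4.3, while the exact removal criteria for shared primary/secondary nodes are simplified assumptions.

def trim(graph, t_c, time_diff, n_occ_th, arc_th):
    """Remove stale or unpopular nodes and arcs (illustrative sketch)."""
    for uri in list(graph.nodes):
        node = graph.nodes.get(uri)
        if node is None:
            continue                                  # node already removed in this pass
        if t_c - node.access_time >= time_diff:
            # Node not accessed for a long time: drop it, its secondary successors
            # and every incoming arc that points at it
            for dest in list(node.arcs):
                succ = graph.nodes.get(dest)
                if succ is not None and succ.node_type == 'secondary':
                    del graph.nodes[dest]
            del graph.nodes[uri]
            for other in graph.nodes.values():
                other.arcs.pop(uri, None)
        else:
            # Node recently accessed: prune only weak arcs and unpopular nodes
            for dest, arc in list(node.arcs.items()):
                if arc.occurrence < arc_th:
                    del node.arcs[dest]
                    succ = graph.nodes.get(dest)
                    if succ is not None and succ.node_type == 'secondary':
                        del graph.nodes[dest]
            if node.occurrence < n_occ_th:
                del graph.nodes[uri]                  # unpopular node removed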

Example:

For the illustration, consider the following initializations to the various

variables used in the algorithm.

Time Counter (T_C) = 0;

Interval Duration = 1 hr (3600 Sec)

Trimming Threshold (T_Th) = T_C + Interval Duration

= 0 + 3600

= 3600

Time Difference (Time_Diff) = 30 min (1800 Sec)

Node occurrence Threshold (n_occ_th) = 50

Arc occurrence Threshold (arc_th) = 20

The time counter is initialized to zero and is updated with a new access time whenever the graph is accessed. The trimming operation is performed at intervals of one hour. The minimum number of node occurrences is 50 and the minimum number of arc occurrences is 20.

In the graph shown in Figure 4.7, the meaning of the following variables is:

· n_occ = node count

Incremented each time the node is accessed

· n_a_t = node access time (in seconds)

Updated with new access time when the node is accessed


[Figure: Precedence Graph before trimming — primary nodes P1.html (n_occ = 100, n_a_t = 3600), P2.html (n_occ = 70, n_a_t = 2854), P3.html (n_occ = 100, n_a_t = 2500), P4.html (n_occ = 60, n_a_t = 2850), P5.html (n_occ = 20, n_a_t = 1700); secondary nodes P1.gif, P1.jpg, P2.gif, P3.jpg (n_occ = 100 each); arcs labelled with their occurrence counts]
Figure 4.7 Precedence Graph before Trimming

Figure 4.7 represents the sample Precedence Graph considered for demonstrating the trimming operation. Each primary node in the graph is represented with information such as object URI, node occurrence count and current access time. Each secondary node is represented with information such as object URI and node occurrence count. Primary and secondary arcs are represented with their occurrence counts. The access time of secondary objects is the same as that of their primary object, because in most cases they are accessed whenever the primary object is requested by the user.



Let us consider that the time counter (T_C) reached its threshold limit;

i.e. T_C = T_Th (T_C = 3600). The graph is now subjected to trimming

operation.

The trimming done on the graph is as follows:

1. The analysis starts with node containing P1.html.

n_occ > n_occ_th // node count greater than threshold

n_a_t = T_C // node access time equal to time counter

P1.html will be retained in the graph. Now analyze its primary and

secondary arcs for further action.

2. P1.html has two secondary arcs leading to P1.gif and P1.jpg. In both

the nodes,

n_occ > n_occ_th // node count greater than threshold

arc_occ > arc_th // arc count greater than threshold

P1.gif and P1.jpg will be retained in the graph.

3. Consider primary arc from P1.html to P3.html.

arc_occ < arc_th // arc count less than threshold

Arc will be removed from the graph.

For the node P3.html, its

n_occ > n_occ_th // node count greater than threshold

(T_C – n_a_t) < Time_Diff // difference in time less than Threshold

P3.html will be retained in the graph.



4. Consider primary arc from P1.html to P5.html.

arc_occ < arc_th // arc count less than threshold

Arc will be removed from the graph.

For the node P5.html, its

n_occ < n_occ_th // node count less than threshold

(T_C – n_a_t) > Time_Diff // time difference greater than threshold

P5.html will be removed from the graph.

5. Consider primary arc from P1.html to P2.html.

arc_occ > arc_th // arc count greater than threshold

Arc will be retained in the graph.

For the node P2.html, its

n_occ > n_occ_th // node count greater than threshold

(T_C – n_a_t) < Time_Diff // difference in time less than Threshold

P2.html will be retained in the graph.

After the trimming operation is completed, the access time (n_a_t) of all the nodes is initialized to 0. The node and arc occurrence counts are reduced to 10% of their original values, so that an accurate analysis is carried out in the next interval.

Figure 4.8 represents the Precedence Graph after the trimming operation is performed, with the access times of the nodes set to 0 and the count values reduced to 10% of their original values.


[Figure: Precedence Graph after trimming — remaining nodes P1.html (n_occ = 10), P2.html (n_occ = 10), P3.html (n_occ = 7), P4.html (n_occ = 6), P1.gif (n_occ = 10), P1.jpg (n_occ = 10), P2.gif (n_occ = 10), P3.jpg (n_occ = 10); access times reset to 0 and counts reduced to 10% of their original values]
Figure 4.8 Precedence Graph after Trimming

4.4 EXPERIMENTAL ENVIRONMENT

This section discusses the experiments conducted for evaluating the

proposed algorithm and the workload characteristics used for building the graph.

4.4.1 Experimental Setup

The experimental setup comprises a web server with a prediction engine and a client with a prefetching engine. The web server builds the Precedence Graph using user access patterns and then generates predictions for user requests. The client receives predictions from the web server and uses them to download web objects during browser idle time. To simulate a group of users accessing the web server for information, real web traces are fed to a client that uses a prefetching-enabled web browser. The time interval between two successive web requests is computed using the timestamp values recorded in the log file to mimic actual client behavior. Each user request and its response are recorded in a log file during the simulation. The log file is analyzed after completion of the simulation to compute the performance metrics (Precision and Recall) of the system. The prediction algorithm constantly learns the user's access patterns during the experiments, thus guaranteeing that the knowledge of the algorithm gets updated whenever the patterns change.

4.4.2 Training Data

The training data is crucial in correctly predicting the user requests,

since most of the web prefetching techniques use part of user access sequence in

constructing the prediction model before using it to generate the predictions. If

training data has few access requests, then relevant user requests will be missed

resulting in poor representation of browsing characteristics. If training data

includes excessive user accesses, then it may contain some outdated user access

patterns and browsing information.

Web access log files record users’ access patterns during website

navigation. The log files can be maintained at client, server or proxy in the web

architecture. Several research initiatives used log files maintained at web server

as its main data source for experimentation.



The log files for experimentation are collected from our institutional web server that is maintained to provide academic related information such as news articles, admission details, course details, examination details etc. to the faculty members and students. Most web pages maintained on the server are static, with only a minimal percentage of dynamic pages.

The log file includes information such as the requested URLs, request time, object type, an identifier assigned to the IP address of the user requesting the URL, and the elapsed time for serving the request. Table 4.4 lists the important fields in a log file entry with a description of their function.

Table 4.4 Important fields in a log file entry

Field Description
192.168.10.1 Client IP address that made the request

10/Oct/2011:10:12:45 Timestamp of visit as seen by the server

GET Request method

/logo.gif Requested object

HTTP/1.1 Protocol used for request and response

200 HTTP response code - OK

1345 Bytes transferred for the request

http://www.psgtech.edu/ Referrer URL – Source page from where the request was sent

4.4.2.1 Preprocessing Log Files

Web logs are preprocessed to reformat them so that web access sessions can be identified effectively and the information can be used to build the Precedence Graph for generating the predictions. The first task in preprocessing is data cleaning: redundant and useless entries are removed from the web log file, and only valid entries related to the visited web pages are retained.

Entries that are removed from log file during data cleaning operation

are:

· Requests executed by automated programs such as web robots,

spiders and crawlers

· Requests with unsuccessful HTTP status codes

· Request methods other than “GET” method.

The second task in preprocessing is session identification, which segments the long sequence of web requests into individual user access sessions. Each user session consists of a sequence of web pages visited over a period of time. When a user remains idle for more than 30 minutes without making any request, the next request from the same user is considered the start of a new access session.
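As a concrete illustration of this 30-minute rule, the sketch below groups one user's timestamped requests into sessions; the record layout is an assumption based on the log fields described earlier.

SESSION_TIMEOUT = 30 * 60      # 30 minutes, in seconds

def split_sessions(requests):
    """Group one user's (timestamp, url, referrer) records into sessions.

    Illustrative sketch: 'requests' is assumed to be sorted by timestamp.
    """
    sessions, current, last_ts = [], [], None
    for ts, url, referrer in requests:
        if last_ts is not None and ts - last_ts > SESSION_TIMEOUT:
            sessions.append(current)      # idle gap exceeded: close the current session
            current = []
        current.append((ts, url, referrer))
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions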

4.5 RESULTS

This section discusses the performance of the Precedence Graph in generating predictions for the user requests, measured using the Recall and Precision metrics that quantify the efficiency and usefulness of the generated predictions. When the contents of the prefetch cache are used to satisfy a user request, it counts as a prefetch hit; otherwise it is a prefetch miss. The prediction algorithm needs to generate useful hints for each web request to achieve a high hit rate and a reduction in user access latency.

Recall (hit rate) represents the ratio of prefetch hits to the total number of user requests, and it measures the usefulness of the predictions. Precision (accuracy) represents the ratio of prefetched pages requested by the user from the prefetch cache to the total prefetched pages.

Recall = Prefetch Hits / User Requests          Precision = Prefetch Hits / Total Prefetches

The Recall and Precision metrics are measured by varying the prediction threshold, which controls the number of predictions generated for the user requests.

Figure 4.9 represents the Recall achieved for user requests with different prediction thresholds (0.2 to 0.5). When the threshold is 0.5, the number of links recommended as hints from the graph is minimal, resulting in a moderate number of user requests being satisfied using the contents of the cache. For a threshold of 0.2 the Recall achieved is very high, because the graph generates more predictions, allowing it to satisfy a larger number of user requests.


[Figure: Recall (0-1) vs. number of user requests (5000-20000) for prediction thresholds Th = 0.2, 0.3, 0.4 and 0.5]
Figure 4.9 Recall for user requests with different Thresholds

A smaller threshold value achieves a higher hit rate but increases bandwidth consumption by prefetching more objects. If there is adequate availability of bandwidth and hardware resources, a smaller threshold value can be used to achieve a higher hit rate when waiting time is a critical requirement for users.

Figure 4.10 represents the Precision achieved for user requests with different prediction thresholds (0.2 to 0.5). When the threshold is 0.2, more objects are predicted and prefetched to satisfy the user requests. But in real situations, some of the prefetched objects remain unused by users, resulting in low precision. A balanced performance is achieved with a threshold of 0.5, which generates an accurate and moderate number of predictions per request.


[Figure: Precision (0-0.8) vs. number of user requests (5000-20000) for prediction thresholds Th = 0.2, 0.3, 0.4 and 0.5]
Figure 4.10 Precision for user requests with different Thresholds

The performance of the Precedence Graph is compared with existing algorithms such as DG – Dependency Graph (Padmanabhan and Mogul 1996) and DDG – Double Dependency Graph (Domenech et al 2006). The Precedence Graph (PG) provides performance similar to that of DG and DDG with lower computational requirements. The resource requirement of a graph depends on the number of nodes and arcs that constitute it. In all three algorithms (PG, DG and DDG), a node is created and added to the graph for each object requested by the user; the difference lies in the number of arcs added between the nodes in the graph.

Figure 4.11 represents the number of arcs built in a graph by the different algorithms based on the user requests. It clearly indicates that the Precedence Graph (PG) has fewer arcs than the existing methods (DG and DDG). PG adds an arc between nodes based on the precedence relation inferred from the request stored in the log file and does not consider the sequence of user requests, whereas DG and DDG add arcs between nodes based on the sequence of user requests recorded in the log file.

[Figure: number of arcs (0-15000) vs. number of user requests (5000-20000) for DG, DDG and PG]
Figure 4.11 Number of Arcs in different Graphs

When implementing the Precedence Graph, only its primary nodes are added as keys in the adjacency map. Secondary nodes of the graph are added only as link elements to a particular key in the map. This avoids wasting key entries on secondary nodes that will not add any link elements.

Predictions are generated from the graph by analyzing the arcs and nodes associated with the node that represents the requested web object. The time taken by an algorithm to generate predictions for a user request depends on the total number of arcs in the graph. The Precedence Graph is able to generate predictions quickly because it has fewer arcs than DG and DDG.

The results presented above are for the Precedence Graph built normally from the user access patterns: it is not subjected to any trimming operation, and the graph grows in size as it learns the user access patterns. When the trimming algorithm is applied to the graph, significant changes can be noticed in the number of arcs and nodes the graph possesses after the operation. The size of the graph is significantly reduced after each trimming operation.

[Figure: number of nodes (0-5000) vs. number of user requests (5000-20000) for PG and PG + Trimming]
Figure 4.12 Number of Nodes in PG with/without Trimming

Figure 4.12 compares the number of nodes in the Precedence Graph with and without the trimming operation. As shown in the figure, with trimming the number of nodes is much smaller than in the graph without trimming. Similarly, Figure 4.13 indicates the reduction in the number of arcs when trimming is applied to the graph.

[Figure: number of arcs (0-5000) vs. number of user requests (5000-20000) for PG and PG + Trimming]
Figure 4.13 Number of Arcs in PG with/without Trimming

4.6 CONCLUSION

This chapter discusses a prediction algorithm that builds a Precedence Graph by learning the user access patterns to predict users' future requests. The algorithm differentiates the relationship between primary objects (HTML) and secondary objects (e.g., images) by considering two types of arcs (primary and secondary) when constructing the graph. The Precedence Graph is built with fewer arcs than the existing approaches (DG and DDG), since it considers only the precedence relation for each user request rather than the user access sequences recorded in the log file. The graph structure gets updated dynamically with new nodes/arcs based on the user requests, which ensures that the generated predictions reflect the latest requirements of the user.

Experimental results indicate that the Precedence Graph achieved good Recall and Precision with minimal resource consumption (i.e. usage of memory and computational resources). To effectively control the growth in graph size, a trimming algorithm is designed to periodically remove unwanted nodes and arcs from the graph. This allows the Precedence Graph to learn new user access patterns without its size growing unchecked.



CHAPTER 5

CACHE REPLACEMENT SCHEME TO ENHANCE WEB

PREFETCHING

5.1 INTRODUCTION

Web caching and prefetching techniques provide an effective solution to enhance the response time of end users. The web objects are stored at locations closer to end users for serving their requests with minimal delay. Web caching exploits the temporal locality and prefetching exploits the spatial locality that is inherent in the user access patterns of web objects. Web caches are categorized into client cache, proxy cache and server cache depending on the location where they are deployed in the web architecture (Zeng et al 2004). A server cache, also referred to as a reverse or inverse cache, handles the web documents of a single web server and reduces its workload. Proxy caches, which are often located near network gateways, allow several users to share resources and reduce the bandwidth required over expensive dedicated Internet connections. A client cache, also referred to as a browser cache, is located close to the web user and provides short response times if the requested object is available in the cache. It enhances web access performance and is economical to manage due to its close proximity to end users.

Web prefetching decides when and which web objects are to be fetched from the web server. Two approaches for prefetching objects from the server are: a) the online approach, which fetches web objects during the short pauses that occur while the user reads the page displayed on the screen, and b) the offline approach, which fetches web objects during off-peak periods or when the user remains idle for a certain time period. When aggressive prefetching is employed, it can create cache pollution by replacing useful data in the cache with prefetched data. Similarly, if web objects stored in the cache as part of web caching are not accessed frequently, they create cache pollution that negatively affects system performance. To utilize the limited cache capacity effectively and to avoid cache pollution, replacement algorithms are designed to manage the contents of the cache by selecting the objects to be evicted so that new objects can be stored. Cache replacement schemes should implement algorithms that do not use complicated data structures, so that they provide effective performance.

Several research works in recent years have applied intelligent techniques such as back-propagation neural networks (Cobb and ElAarag 2008), fuzzy systems (Ali and Shamsuddin 2009) and evolutionary algorithms (Sulaiman et al 2008, Ali et al 2011) to implement cache replacement schemes in web caching and prefetching environments. The techniques reported in these works indicate that replacement based on intelligent approaches is more efficient and adaptive to the web caching environment than classical replacement approaches (e.g. LRU, LFU).



This chapter discusses an efficient cache replacement scheme for managing the client-side cache, which is partitioned into a regular and a prefetch cache for handling web caching and prefetching. The regular cache stores web objects received from the following sources: a) objects that are demand requested by the users, and b) frequently accessed objects in the prefetch cache that are transferred to the regular cache. The prefetch cache stores web objects downloaded based on the predictions generated as part of web prefetching. The contents of the regular cache are managed using a replacement algorithm based on a Fuzzy Inference System (FIS), while the LRU algorithm is used to manage the contents of the prefetch cache. The proposed scheme is designed to retain useful web objects for a longer duration and to remove unwanted objects from the cache for efficient performance. Integrating prefetching into the client cache system improves the hit ratio because the prefetched objects are stored in a prefetch cache maintained independently from the regular cache.

5.2 CACHE REPLACEMENT - OVERVIEW

Cache replacement algorithms are designed to effectively decide the

web objects to be evicted from cache for satisfying the following aspects:

· Effective utilization of available cache space

· Improving hit ratio

· Reducing network traffic

· Minimizing the load on origin web server



The replacement algorithm computes the priority of the web objects stored in the cache to select the objects to be evicted. Factors considered for computing the priority of web objects are: popularity (frequency), recency, object size, popularity consistency, access latency (delay) and object type (html/text, image/video, application). Web access latency (delay) represents the time interval between sending the user request and receiving the last byte of the requested content as the response. Recency represents the time when the object was last referenced and reflects the temporal locality that exists in user access patterns. Web objects selected for eviction from the cache are those expected to have the lowest access demand in the near future. The replacement policy is applied whenever the cache reaches its maximum limit, or to evict objects that have not been used for a long duration.

Combining several factors to influence the replacement process in

deciding the web objects to be removed from cache is not an easy task as each

factor has its own significance in different situations. Locality of reference

characterizes the ability to predict future accesses to web objects based on the

past accesses to objects. Two main types of locality are: Temporal and Spatial.

Temporal locality indicates that recently accessed objects are likely to be

accessed again in the future. Spatial locality indicates that accesses to certain

objects can be used as a reference to predict future accesses to other objects.

Each web object is identified using different characteristics and among

them URL is the unique characteristic to identify the object. Most replacement

strategies use a combination of these characteristics to make their decisions.



Important characteristics of web objects are (Podlipnig et al 2003):

· Recency – Time when the object was last requested
· Frequency – Number of requests to the object
· Size – Size of the web object in bytes
· Cost – Cost involved in fetching the object from the origin server
· Request value – Benefit gained from storing the object in cache
· Expiration time – Time to Live (TTL) of the object

Factors such as object size, object type and access latency are static

and they are determined only once when the object is initially requested by the

user. Factors such as frequency, recency and popularity consistency are dynamic

and they are computed frequently till the object resides in cache.

Podlipnig et al (2003) categorized cache replacement algorithms as:

· Frequency based
The frequency (popularity) of objects is analyzed and used as the deciding factor for future actions on the web objects.
· Recency based
It exploits the temporal locality seen in web request patterns; recency is used as the main deciding factor in selecting the objects to be removed from the cache.
· Frequency / Recency based
It combines both recency and frequency factors in making decisions on the web objects stored in the cache.
· Function based
It uses a general function to compute the value of an object, based on which the decisions are taken.
· Randomized
The objects are randomly selected for removal from the cache.

Temporal locality and document popularity influence the web request sequences. Object size and the cost of fetching an object from the server, along with temporal locality and long term popularity, play a significant role in the performance of cache replacement schemes.

5.3 FUZZY INFERENCE SYSTEM

The Fuzzy Inference System (FIS) shown in Figure 5.1 is a popular computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules and fuzzy reasoning. Fuzzy inference is the process of formulating the mapping from a given input to an output using fuzzy logic. The mapping provides a basis from which decisions can be made or patterns discerned. FIS has a good function approximation capability that is reflected in various problems such as control, modeling and classification. Fuzzification transforms the crisp input into degrees of match with linguistic values. The knowledge base comprises two components: the rule base and the database. The rule base contains the various fuzzy if-then rules, and the database defines the membership functions of the fuzzy sets used in the fuzzy rules. The inference engine is responsible for performing the decision-making operation on the rules. Defuzzification transforms the fuzzy results into a crisp output.

[Figure: FIS framework — the crisp input is fuzzified into a fuzzy set, processed by the inference engine using the knowledge base (database + rule base), and the resulting fuzzy set is defuzzified into a crisp output]
Figure 5.1 Framework of Fuzzy Inference System

The time complexity of an FIS depends on the number of rules it considers when making a decision; fewer rules in the rule base result in better system performance.

5.3.1 Membership Function

A membership function provides a measure of the degree of similarity of an element to a fuzzy set. It can be chosen either arbitrarily by the user based on experience or designed using machine learning methods. Common shapes of membership functions are: triangular, trapezoidal, piecewise-linear, Gaussian and bell-shaped.
· Gaussian Membership Function

$f(x;\sigma,c) = e^{-\frac{(x-c)^2}{2\sigma^2}}$

It depends on two parameters σ and c.

· Trapezoidal Membership Function

$f(x;a,b,c,d) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & b \le x \le c \\ \dfrac{d-x}{d-c}, & c \le x \le d \\ 0, & d \le x \end{cases}$

It depends on four scalar parameters: a, b, c and d.

· Bell Function

$f(x;a,b,c) = \dfrac{1}{1+\left|\dfrac{x-c}{a}\right|^{2b}}$

It depends on three parameters: a, b and c. The parameter 'b' is usually positive. Parameter 'c' is used to locate the center of the curve.

· Triangular Function

$f(x;a,b,c) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ \dfrac{c-x}{c-b}, & b \le x \le c \\ 0, & c \le x \end{cases}$

It is a function of the vector 'x' and depends on three parameters: a, b and c.

Each membership function is assigned a linguistic term and it will map

the input parameters to the membership value in the range 0 to 1. The input space

is sometimes referred to as Universe of Discourse (Z).
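For concreteness, a small Python sketch of the generalized bell membership function and the fuzzification of one crisp input into the {low, medium, high} fuzzy sets is given below; the parameter values are illustrative assumptions, not the ones tuned in this work.

def bell(x, a, b, c):
    """Generalized bell membership function: 1 / (1 + |(x - c) / a|**(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def fuzzify(value, params):
    """Map a crisp input to its membership degree for each linguistic term."""
    return {term: bell(value, *abc) for term, abc in params.items()}

# Illustrative parameters (a = width, b = slope, c = center) for the Recency input
recency_params = {'low': (600, 2, 0), 'medium': (600, 2, 1800), 'high': (600, 2, 3600)}
print(fuzzify(1200, recency_params))    # ≈ {'low': 0.06, 'medium': 0.5, 'high': 0.004}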

Figure 5.2 Membership Functions for Recency



Consider a system that takes four input parameters: Recency, Frequency, Delay Time and Object Size. Each input parameter is associated with three membership functions {low, medium, high} that map the input values to the associated fuzzy sets with a degree between 0 and 1.

Figure 5.2 represents the Recency input being mapped to its membership functions {low, medium, high}, illustrated using a bell curve. Figure 5.3 represents the Frequency input being mapped to its membership functions {low, medium, high}.

Figure 5.3 Membership Functions for Frequency

The input parameter Delay Time is mapped to its membership

functions {low, medium and high} as shown in Figure 5.4.



Figure 5.4 Membership Functions for Delay Time

Figure 5.5 represents the mapping of input ‘Object Size’ to its

membership functions {small, medium and large}.

Figure 5.5 Membership Functions for Object Size



5.3.2 Fuzzy Rules

The rules are linguistic IF-THEN statements that constitute a key aspect of the performance of a fuzzy inference system. They describe the relationship between input and output values. The IF part is called the "antecedent" and the THEN part is called the "consequent".

Example:

IF {Frequency is low} THEN {Removal is high}
   (antecedent)              (consequent)

{Frequency, Removal} are linguistic variables.
{low, high} are linguistic terms that correspond to membership functions.

If the antecedent of a rule has more than one part, a fuzzy operator (AND) is applied to obtain a single value that represents the antecedent result for that rule. The consequent is a fuzzy set represented by a membership function, and it can be reshaped using a function associated with the antecedent.

Decisions are taken by testing all the rules in a Fuzzy Inference System, so the rules must be combined in order to generate the final output. Aggregation is the process by which the fuzzy sets that represent the output of each rule are combined into a single fuzzy set. It occurs only once for each output variable, before defuzzification is performed.



5.3.3 Defuzzification

It takes the aggregated output fuzzy set as input and produces a single

output value (crisp data).

Commonly used methods for defuzzification as shown in Figure 5.6

are:

§ Centroid of Area (COA)

It is the most commonly used technique and is considered to be the most accurate. It returns the center of the area under the curve.

$z_{COA} = \dfrac{\int_{z} \mu_A(z)\, z\, dz}{\int_{z} \mu_A(z)\, dz}$

where $\mu_A(z)$ is the aggregated output membership function.

§ Bisector of Area (BOA)

It divides the region into two sub-regions of equal area, and it sometimes coincides with the centroid line.

$\int_{a}^{z_{BOA}} \mu_A(z)\, dz = \int_{z_{BOA}}^{b} \mu_A(z)\, dz$

where $a = \min\{z;\, z \in Z\}$ and $b = \max\{z;\, z \in Z\}$.

§ Mean of Maximum (MOM)

$z_{MOM} = \dfrac{\int_{Z'} z\, dz}{\int_{Z'} dz}$, where $Z' = \{z;\, \mu_A(z) = \mu^{*}\}$

If $\max \mu_A(z)$ is attained over the interval $[z_1, z_2]$, then $z_{MOM} = \dfrac{z_1 + z_2}{2}$.

§ Smallest of Maximum (SOM)

Amongst all z that belong to $[z_1, z_2]$, the smallest is called $z_{SOM}$.

§ Largest of Maximum (LOM)

Amongst all z that belong to $[z_1, z_2]$, the largest value is called $z_{LOM}$.

Figure 5.6 Methods to perform Defuzzification
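A small numerical sketch of the centroid (COA) defuzzification used later for the regular cache decision is given below; the discretization of the output universe into sample points is an assumption made for illustration.

def centroid_defuzzify(zs, mu):
    """Discrete centroid of area: sum(mu(z) * z) / sum(mu(z)).

    zs - sample points of the output universe of discourse
    mu - aggregated membership degree at each sample point
    """
    num = sum(m * z for z, m in zip(zs, mu))
    den = sum(mu)
    return num / den if den > 0 else 0.0

# Example: output universe [0, 1] sampled in steps of 0.1
zs = [i / 10 for i in range(11)]
mu = [0.0, 0.0, 0.1, 0.3, 0.5, 0.7, 0.7, 0.5, 0.3, 0.1, 0.0]
print(centroid_defuzzify(zs, mu))    # ≈ 0.55, near the bulk of the aggregated set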



5.4 PROPOSED FRAMEWORK

The framework shown in Figure 5.7 manages the client-side cache by partitioning it into two parts: a regular cache and a prefetch cache. Each part of the cache has its own storage space and is managed independently using a separate replacement policy. The regular cache is managed using the Fuzzy Inference System (FIS) based algorithm and the prefetch cache is managed using the LRU algorithm.

[Figure: the client browser interacts with the web server; demand requested objects are stored in the regular cache (objects purged using FIS) and prefetched objects are stored in the prefetch cache (objects purged using LRU), and both caches serve user requests]
Figure 5.7 Framework for managing regular/prefetch cache

The regular cache stores web objects that are demand requested by users as well as frequently accessed objects that are transferred from the prefetch cache. The prefetch cache stores web objects that are downloaded from the server using the predictions generated as part of web prefetching. When the user frequently accesses objects stored in the prefetch cache, they are moved to the regular cache to ensure that popular objects reside in the cache for a longer duration. The scheme effectively removes useless objects to alleviate cache pollution and maximize the hit ratio.

When a user request is satisfied using the contents of either the regular or the prefetch cache, it counts as a cache hit and the request is not forwarded to the web server. In case of a cache miss, the request is forwarded to the server to acquire the required data.

When the server receives a user request, it performs the following tasks:

· Records the details of the user request in the access log file
· Fetches the requested object and generates predictions for the request
· Sends the requested object and its predictions to the client

The server analyzes the user requests stored in the access log file to generate the predictions and deliver them to the client. The client, on receiving the requested object along with the list of predictions from the server, performs the following tasks:

· Stores the received web object in the regular cache and displays it to the user
· Prefetches web objects based on the prediction list during browser idle time and stores them in the prefetch cache



[Figure: flowchart — a client request is first checked for cacheability; cacheable prefetched objects go to the prefetch cache (purged with LRU when full) and demand requested objects go to the regular cache (purged with FIS when full); frequently accessed prefetched objects are moved to the regular cache, and cached objects are made available to client access]
Figure 5.8 Workflow of caching/prefetching system



The client can also take the responsibility of generating the predictions on its own and use them to prefetch web objects from the server. These downloaded objects are then stored in the prefetch cache, while the web objects received through demand requests are stored in the regular cache.

The workflow of the caching system that also incorporates the prefetching mechanism is illustrated in Figure 5.8. When a web object needs to be stored in the cache, the caching system first verifies whether the object is cacheable. If it is cacheable, it then verifies whether it is a prefetched or a demand requested object. In the case of a prefetched object, the system checks whether the prefetch cache is full or has space before storing the object; the LRU algorithm is used to purge objects from the prefetch cache. When objects residing in the prefetch cache are accessed frequently within a short time period, they are moved to the regular cache. The regular cache is checked to see whether it can accommodate the objects coming from the prefetch cache or the objects demand requested by the user; when the regular cache is full, objects are purged based on the outcome of the FIS algorithm. The objects stored in the regular and prefetch caches are used to satisfy client requests with minimal latency. If an object is not cacheable, it is delivered directly to the client and displayed in the web browser.

The commonly used factors to determine the popularity of web objects are frequency, recency and object size. Object popularity is a good estimator for verifying the cacheability of documents, since objects that are more popular have a high probability of being referenced again by the user in the near future, resulting in an increased cache hit rate.

A request is considered cacheable based on the following factors:

· The response must have a defined size in bytes that is greater than zero.
· The request must use the GET or HEAD method, and the status code should be 200 (OK), 206 (Partial Content) or 304 (Not Modified).

Dynamic requests are not cached, since they return unique objects every time they are accessed by the user.

5.4.1 Fuzzy System - Input / Output

The input parameters to Fuzzy Inference System are labeled as {IP1 to

IP4} and the target output labeled as {OT}.

Table 5.1 Input Parameters to FIS

Variable Meaning
IP1 Recency of Web object
IP2 Frequency of Web object
IP3 Retrieval time of Web object
IP4 Size of Web object

Frequency and Recency for the objects are estimated based on the sliding window mechanism discussed in (Romano and ElAarag 2011). The sliding window of a request represents the time before and after the request is made.

Table 5.2 Symbols used with their meanings

Symbol Meaning
Oi requested object
∆Ti time period since object Oi was last requested
Fi Frequency of object Oi within sliding window
SWL Sliding Window length
OT Target Output

Recency (IP1) of object Oi is computed as:

$recency(O_i) = \begin{cases} \max(SWL, \Delta T_i), & \text{if } O_i \text{ was requested before} \\ SWL, & \text{if } O_i \text{ is requested for the first time} \end{cases}$

If an object is requested for the first time, its recency is fixed as SWL; otherwise it is the maximum of SWL and ∆Ti.

Frequency (IP2) of object Oi is computed as:

$frequency(O_i) = \begin{cases} F_i + 1, & \text{if } \Delta T_i \le SWL \\ 1, & \text{if } O_i \text{ is accessed beyond } SWL \end{cases}$

The frequency of object Oi is incremented by 1 with respect to its previous frequency value if the request for Oi falls within the backward-looking SWL, i.e. the time interval between the previous request and the new request is within the bounds of the backward-looking SWL. Otherwise, the frequency value is reinitialized to 1.

The target output (OT) is set to 1 if the object is re-requested within the forward-looking sliding window; otherwise, OT is 0. The objective is to use the information about a web object requested in the past to predict whether it will be revisited within the forward-looking sliding window.
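The following Python sketch shows one way to derive these recency, frequency and target values from a timestamped request stream; the 20-minute SWL is taken from the experimental setup described later in this chapter, while the function name and record layout are assumptions.

SWL = 20 * 60      # sliding window length: 20 minutes, in seconds

def build_training_rows(requests):
    """Derive (recency, frequency, delay, size, target) rows per request.

    'requests' is an iterable of (timestamp, url, delay_ms, size_bytes)
    sorted by timestamp. Illustrative sketch of the sliding-window rules.
    """
    last_seen, freq, rows = {}, {}, []
    entries = list(requests)
    for i, (ts, url, delay, size) in enumerate(entries):
        if url in last_seen:
            dt = ts - last_seen[url]
            recency = max(SWL, dt)
            freq[url] = freq[url] + 1 if dt <= SWL else 1
        else:
            recency = SWL                     # first request: recency fixed to SWL
            freq[url] = 1
        last_seen[url] = ts
        # Target = 1 if the same object is re-requested within the forward-looking SWL
        target = int(any(u == url and 0 < t - ts <= SWL
                         for t, u, _, _ in entries[i + 1:]))
        rows.append((recency, freq[url], delay, size, target))
    return rows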

5.4.2 Managing Regular Cache

When a user requested object or an object transferred from the prefetch cache needs to be stored in the regular cache, the system checks whether there is sufficient storage space to accommodate the object. If storage space is available, the object is stored in the cache; otherwise, objects are evicted based on the outcome generated by the Fuzzy Inference System (FIS) algorithm. The FIS takes the input parameters of an object and decides whether it can reside in the cache or should be purged. The input parameters are fuzzified by applying the bell membership function, and the aggregated output is defuzzified using the centroid of area method.

When an object has high recency and frequency, it has a good chance of remaining in the cache. If the outcome from the FIS is greater than 0.5, the object can reside in the cache; otherwise it can be purged. The algorithm used for managing the contents of the regular cache is as follows:



OP = object in prefetch cache
OR = object in regular cache
ON = new object

Begin
1. An object is to be stored in the regular cache.
2. Check whether it is ON or OP.
3. If (ON) go to step 5.
4. If (OP reference ≥ 2) within a short duration, OP is moved to the regular cache.
5. If (size of ON / OP > available free space in regular cache) {
       For each object OR in regular cache {
           // apply FIS to find the popularity of the object
           If (popularity of OR ≥ 0.5)
               OR.cache = 1;      // object resides in cache
           Else
               OR.cache = 0;      // purge the object
       }
       Do {
           Remove objects with OR.cache = 0 from regular cache
       } While (size of ON / OP > free space in regular cache)
   }
6. Store ON / OP in regular cache;
   Remove OP from prefetch cache;
End
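A condensed Python sketch of this admission/eviction logic is given below; fis_score stands in for the full fuzzy pipeline (bell fuzzification, rule evaluation, centroid defuzzification), and the cache dictionaries and field names are illustrative assumptions.

def admit_to_regular_cache(obj, regular_cache, prefetch_cache, capacity, fis_score):
    """Store 'obj' in the regular cache, evicting low-scoring objects if needed.

    obj        - dict with at least 'uri' and 'size' (bytes)
    capacity   - regular cache capacity in bytes
    fis_score  - callable returning the defuzzified FIS output (0..1) for an object
    """
    used = sum(o['size'] for o in regular_cache.values())
    if obj['size'] > capacity - used:
        # Rank cached objects by FIS outcome; purge those scoring below 0.5 first
        victims = sorted((o for o in regular_cache.values() if fis_score(o) < 0.5),
                         key=fis_score)
        for victim in victims:
            if obj['size'] <= capacity - used:
                break
            del regular_cache[victim['uri']]
            used -= victim['size']
    if obj['size'] <= capacity - used:
        regular_cache[obj['uri']] = obj
        prefetch_cache.pop(obj['uri'], None)    # promoted objects leave the prefetch cache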

5.5 IMPLEMENTATION

This section discusses the training data used for the simulation and the process involved in extracting useful information to be given as input to the Fuzzy Inference System.

5.5.1 Training Data

The BU Web trace containing the records of HTTP requests from clients in the Boston University Computer Science Department (BU Web Trace, 1995) is used for the simulation. The data collection consists of 9633 files comprising 1,143,839 requests, representing a population of about 762 different users. The trace files contain sequences of web object requests that were served either from the local cache or from the network.

Each line in the log file represents a URL requested by the user. It consists of the machine name, the timestamp when the request was made, the User_ID number, the requested URL, the size of the document and the object retrieval time in seconds. If a log entry indicates the number of bytes as zero and the retrieval delay as zero, it means that the request was satisfied using the contents of the internal cache.

From the collection of requests representing a large number of users, we randomly select the traces of 15 different users to be used in the simulation.

beaker 791129602 449542 "http://cs-www.bu.edu/" 2009 1.135154
beaker 791129603 747730 "http://cs-www.bu.edu/lib/pics/bu-logo.gif" 1805 0.301080
beaker 791129604 445737 "http://cs-www.bu.edu/lib/pics/bu-label.gif" 717 0.356171
beaker 791129611 867528 "http://cs-www.bu.edu/courses/Home.html" 3279 0.295710
beaker 791129660 367234 "http://cs-www.bu.edu/faculty/mcchen/cs320/" 721 0.470706
beaker 791129660 928492 "http://cs-www.bu.edu/icons/blank.xbm" 696 0.305217
beaker 791129661 485690 "http://cs-www.bu.edu/icons/back.xbm" 694 0.477853
beaker 791129662 205927 "http://cs-www.bu.edu/icons/text.xbm" 715 0.287997
beaker 791497224 96312 "http://cs-www.bu.edu/" 2087 0.774428
beaker 791497225 226976 "http://cs-www.bu.edu/lib/pics/bu-logo.gif" 1803 0.290951
beaker 791497225 996915 "http://cs-www.bu.edu/lib/pics/bu-label.gif" 715 0.357485
beaker 791497229 937950 "http://cs-www.bu.edu/faculty/Home.html" 1700 0.408853
beaker 791497232 451959 "http://cs-www.bu.edu/faculty/heddaya/Home.html" 1576 0.738620
beaker 791497233 294701 "http://cs-www.bu.edu/faculty/heddaya/Images/MyPhotos/recursive.gif" 13851 0.443401
beaker 791497243 48300 "http://cs-www.bu.edu/faculty/heddaya/navigation.html" 7131 0.322925
beaker 791497257 357990 "http://nearnet.gnn.com/gnn/arcade/comix/graphics/Dilbert.gif" 9893 4.110718

Figure 5.9 Sample Log File of a client used for Preprocessing

5.5.2 Data Preprocessing

The log files to be used for simulation undergo preprocessing to

extract useful information that reflects user navigational behavior. Figure 5.9

represents the sample log file that is used for preprocessing operation. The

processed file with valid information is then used for simulation.



Steps involved in preprocessing are:

· Parse the log file to identify distinct fields in each record entry and

to track the boundaries between successive records stored in the

file.

· Assign unique identifier (URL_ID) to each URL that helps to track

the events easily during simulation.

· Extract the useful fields from each line in the log file to be used for

analysis.

The output file generated after preprocessing the log file contains the

following fields for each request entry:

· Requested URL

· Unique ID assigned to each URL (URL_ID)

· Timestamp of the request

· Delay time

· Size of the requested object
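One possible implementation of these preprocessing steps is sketched below; the whitespace-delimited BU trace format shown in Figure 5.9 is assumed, and the function and field names are illustrative.

import shlex

def preprocess_log(lines):
    """Parse BU-style trace lines into (url, url_id, timestamp, delay_ms, size) records.

    Illustrative sketch: each line is assumed to look like
    'machine timestamp microseconds "url" size delay_seconds'.
    """
    url_ids, records = {}, []
    for line in lines:
        fields = shlex.split(line)            # shlex keeps the quoted URL as a single field
        if len(fields) != 6:
            continue                          # skip malformed records
        machine, ts, _us, url, size, delay = fields
        url_id = url_ids.setdefault(url, len(url_ids) + 1)    # assign a unique URL_ID
        records.append((url, url_id, int(ts), int(float(delay) * 1000), int(size)))
    return records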

Table 5.3 represents the sample preprocessed data created from the log

file that will be used to obtain the training data to be given as input to the Fuzzy

Inference System.

Table 5.3 Preprocessed data from the log file

URL                                              URL_ID   Timestamp   Delay Time (ms)   Size (bytes)
http://cs-www.bu.edu/                               1     791129602        1135             2009
http://cs-www.bu.edu/                               1     791129673         367             2009
http://www.wired.com/                               2     791129783         837              941
http://www.wired.com/Images/spacer.gif              3     791129785         503              277
http://cs-www.bu.edu/                               1     791497224         774             2087
http://cs-www.bu.edu/pics/bu-logo.gif               4     791497225         290             1803
http://cs-www.bu.edu/pics/bu-label.gif              5     791497225         357              715
http://cs-www.bu.edu/faculty/home.html              6     791497229         408             1700
http://cs-www.bu.edu/faculty/best/BestWeb.html      7     791497735         969             7966
http://cs-www.bu.edu/pics/bu-logo.gif               4     791497790         208             1803
http://cs-www.bu.edu/pics/bu-label.gif              5     791497825         317              715
http://www.wired.com/                               2     791429783         737              941

The information shown in Table 5.3 is further processed to create the training data shown in Table 5.4. The recency and frequency values are assigned based on the sliding window mechanism discussed in Section 5.4.1. The sliding window length (SWL) in both the forward and backward directions is fixed at 20 minutes (i.e. 1200 seconds) for the simulation, since users tend to change their browsing patterns often and may have short browsing sessions.

Table 5.4 Training data created from preprocessed file

                        Inputs                                    Target
Recency    Frequency   Retrieval Time (ms)   Size (bytes)
 1200          1              1135               2009               1
 1200          2               367               2009               0
 1200          1               837                941               0
 1200          1               503                277               0
 3675.5        1               774               2087               0
 1200          1               290               1803               1
 1200          1               357                715               1
 1200          1               408               1700               0
 1200          1               969               7966               0
 1200          2               208               1803               0
 1200          2               317                715               0
 3000          1               737                941               0

When an object is requested for the first time, or when it is re-requested within the SWL, its recency is set to 1200. If the time difference between a new request and the previous request to an object is greater than the SWL, its recency is set to that time difference.

The frequency of an object is set to 1 when it is requested for the first time. If the object is re-requested within the SWL, its frequency is incremented by 1; otherwise, the frequency is re-initialized to 1 irrespective of its previous value. The target output is set to 1 if the object has a future reference within the forward SWL; otherwise it is set to 0.
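
The recency, frequency and target labelling described above can be summarised by the following sketch, which assumes the preprocessed records are sorted by timestamp and uses the 1200-second SWL; the function name build_training_rows is illustrative only.

    SWL = 1200  # sliding window length in seconds (20 minutes)

    def build_training_rows(records):
        """records: list of (url_id, timestamp, delay_ms, size_bytes) sorted by timestamp."""
        last_seen, freq, rows = {}, {}, []
        for i, (url_id, ts, delay_ms, size) in enumerate(records):
            gap = ts - last_seen[url_id] if url_id in last_seen else None
            recency = SWL if gap is None or gap <= SWL else gap
            freq[url_id] = freq.get(url_id, 0) + 1 if gap is not None and gap <= SWL else 1
            last_seen[url_id] = ts
            # target = 1 if the same object is requested again within the forward window
            target = int(any(u == url_id and 0 < t - ts <= SWL
                             for (u, t, _d, _s) in records[i + 1:]))
            rows.append((recency, freq[url_id], delay_ms, size, target))
        return rows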

5.6 PERFORMANCE EVALUATION

Trace-driven simulations are used to evaluate the performance of the cache replacement policies. The storage space allocated for the cache in the client machine is distributed equally between the regular and prefetch caches, i.e. 50% of the total capacity is allocated to the regular cache and the remaining 50% to the prefetch cache. To simulate the prefetching of objects, the client-based prediction and prefetching approach discussed in Chapter 3 is used. When a user-requested web page is displayed in the browser, predictions are made for that page and the predicted objects are prefetched and stored in the prefetch cache. If objects in the prefetch cache are requested frequently, they are moved to the regular cache.
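
One possible structure for such a partitioned cache is sketched below: the prefetch partition is kept in LRU order, and objects that receive repeated hits are promoted to the regular partition, whose eviction victim is the entry with the lowest score. The class, its methods and the fis_score placeholder are illustrative; in this work the actual replacement decision for the regular cache is made by the Fuzzy Inference System described earlier.

    from collections import OrderedDict

    def fis_score(url, metadata):
        """Placeholder for the Fuzzy Inference System output (higher = more worth keeping)."""
        return 0.5

    class PartitionedCache:
        """Sketch of a client cache split 50/50 into regular and prefetch partitions."""

        def __init__(self, capacity_bytes, promote_after=2):
            self.cap = capacity_bytes // 2          # 50% of total capacity per partition
            self.promote_after = promote_after      # prefetch hits needed before promotion
            self.prefetch = OrderedDict()           # LRU order: oldest entry first
            self.regular = {}

        def add_prefetched(self, url, obj):
            """Store a prefetched object; evict least recently used entries if needed."""
            self.prefetch[url] = {"obj": obj, "hits": 0}
            self.prefetch.move_to_end(url)
            while self.prefetch and self._size(self.prefetch) > self.cap:
                self.prefetch.popitem(last=False)   # LRU eviction in the prefetch cache

        def get(self, url):
            """Serve a request; promote frequently used prefetched objects."""
            if url in self.regular:
                return self.regular[url]["obj"]
            if url in self.prefetch:
                entry = self.prefetch[url]
                entry["hits"] += 1
                self.prefetch.move_to_end(url)
                if entry["hits"] >= self.promote_after:
                    self.prefetch.pop(url)
                    self._add_regular(url, entry["obj"])
                return entry["obj"]
            return None                             # miss: object must be fetched from the network

        def _add_regular(self, url, obj):
            self.regular[url] = {"obj": obj}
            while len(self.regular) > 1 and self._size(self.regular) > self.cap:
                victim = min(self.regular, key=lambda u: fis_score(u, self.regular[u]))
                del self.regular[victim]            # evict the lowest-scoring object

        @staticmethod
        def _size(store):
            return sum(len(entry["obj"]) for entry in store.values())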

5.6.1 Performance Metrics

The effectiveness of the replacement algorithm in improving the performance of web caching and prefetching is evaluated using two metrics: Hit Rate (HR) and Byte Hit Rate (BHR). HR represents the percentage of user requests served using objects available in the cache; it characterizes the improvement in availability and the reduction in user latency. BHR represents the percentage of bytes served from the cache against the total number of bytes requested by users; it characterizes the reduction in network traffic and the easing of link congestion. An increase in HR contributes significantly to the improvement in latency savings (Zhu and Hu 2007, Shi et al 2006).

An important point to note is that Hit Rate and Byte Hit Rate cannot both be optimized at the same time (Podlipnig 2003). Strategies that optimize Hit Rate give preference to smaller objects, which tends to decrease the Byte Hit Rate by giving less preference to larger objects.
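
For reference, the two metrics are computed from the replayed trace in the straightforward way sketched below, where each replayed request is assumed to be recorded as a (was_hit, size) pair.

    def hit_rate(requests):
        """requests: list of (was_hit, size_bytes) tuples recorded during trace replay."""
        hits = sum(1 for was_hit, _ in requests if was_hit)
        return 100.0 * hits / len(requests)

    def byte_hit_rate(requests):
        hit_bytes = sum(size for was_hit, size in requests if was_hit)
        return 100.0 * hit_bytes / sum(size for _, size in requests)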

5.6.2 Experimental Results

The performance of the proposed scheme (FIS-LRU) is compared with the most common replacement policies, LRU and LFU, in terms of HR and BHR. In LRU, the least recently used objects are removed first; it is a simple and efficient scheme for uniformly sized objects. In LFU, the least frequently used objects are removed first; its main advantage is its simplicity. The proposed scheme is also compared with NNPCR-2 (Romano and ElAarag 2011), an intelligent web caching approach that uses a Back-Propagation Neural Network (BPNN) to make replacement decisions.

The algorithms are simulated by varying the cache size from 10 MB to 100 MB. The log files of the 15 users are split into three groups: users 1 to 5 in Group A, users 6 to 10 in Group B and users 11 to 15 in Group C. HR and BHR are evaluated separately for each group to analyze the behavior of the algorithms on the traces of a different set of users in each group.

Figure 5.10 shows the hit rate of the different policies using the traces of Group-A (users 1 to 5), Figure 5.11 using the traces of Group-B (users 6 to 10) and Figure 5.12 using the traces of Group-C (users 11 to 15).

[Plot omitted: Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.10 Hit Ratio using Traces of Group-A (user 1 to 5)

When the cache size increases, the HR improves for all the replacement policies, since a larger cache can store more web objects to satisfy user requests. As observed in the graphs for the different traces, the HR of the proposed scheme (FIS-LRU) is better than that of the other approaches. LFU produces the lowest HR due to cache pollution. The performance of FIS-LRU is better than NNPCR-2 in most cases, and in a few cases the results match those of NNPCR-2.

[Plot omitted: Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.11 Hit Ratio using Traces of Group-B (user 6 to 10)

[Plot omitted: Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.12 Hit Ratio using Traces of Group-C (user 11 to 15)



[Plot omitted: Byte Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.13 Byte Hit Ratio using Traces of Group-A (user 1 to 5)

[Plot omitted: Byte Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.14 Byte Hit Ratio using Traces of Group-B (user 6 to 10)

[Plot omitted: Byte Hit Ratio (%) versus Cache Size (MB) from 10 MB to 100 MB for LRU, LFU, NNPCR-2 and FIS-LRU]

Figure 5.15 Byte Hit Ratio using Traces of Group-C (user 11 to 15)

The Byte Hit Rate of the different policies is shown in Figure 5.13 for the traces of Group-A, in Figure 5.14 for Group-B and in Figure 5.15 for Group-C. As observed in these graphs, the BHR of the proposed scheme (FIS-LRU) is better in all cases than that of the other replacement policies.

5.7 CONCLUSION

This chapter discusses a cache replacement scheme that efficiently manages the client-side cache, which is partitioned into regular and prefetch caches to handle web caching and prefetching. The proposed scheme uses a Fuzzy Inference System (FIS) based algorithm to manage the contents of the regular cache and the LRU algorithm to manage the contents of the prefetch cache. When objects stored in the prefetch cache are frequently accessed by users, they are moved to the regular cache, where they are managed based on the outcome of the FIS algorithm. The scheme helps to retain useful objects for longer periods while effectively removing unwanted objects from the cache.

The performance of the proposed scheme (FIS-LRU) in terms of HR and BHR is compared with various algorithms (LRU, LFU and NNPCR-2), where LRU and LFU are basic algorithms and NNPCR-2 is an intelligent algorithm based on a back-propagation neural network. HR and BHR for the proposed scheme are computed by considering both the regular and prefetch caches. The results clearly indicate that the proposed scheme outperforms the other algorithms in terms of HR and BHR.



CHAPTER 6

CONCLUSION

Web caching and prefetching techniques have been designed and used primarily to reduce user-perceived latency. Web prefetching employs prediction techniques to accurately anticipate future user requests and download the corresponding objects before the user actually requests them, which alleviates the problems encountered in web caching alone. Several researchers over the years have investigated various issues associated with web prefetching to provide solutions for reducing latency. It has been observed that fast and accurate prediction is crucial for improving prefetching performance. Prefetching techniques can prefetch a larger number of web objects when more bandwidth is available to users.

Contributions made in this thesis are:

In the first contribution, the focus is on using the information associated with hyperlinks embedded in web pages to generate predictions. Both the prediction and prefetching engines are deployed in the client machine, and access patterns are observed as the user views web pages in the browser. Two new approaches (Naïve Bayes and Fuzzy Logic) have been proposed to generate the predictions. When a user views a web page, the hyperlinks in that page are prioritized based on the computations of the Naïve Bayes and Fuzzy Logic approaches. Hyperlinks with high priority values form the prediction list (hints) that is used by the prefetching engine to download web objects during browser idle time. The user-accessed repository stores information about the hyperlinks used to navigate web pages. The predicted-unused repository stores information about unused hyperlinks and provides feedback to the prediction engine to fine-tune its predictions. Both approaches could generate effective predictions to minimize access latency. The approaches are most effective when the user has focused browsing patterns, looking for information related to a specific topic instead of randomly viewing unrelated web pages.
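
Purely as an illustration of this idea (and not as the implementation used in the thesis), anchor-text based hyperlink ranking with a Naïve Bayes style score could look roughly as follows; the token counters stand in for the user-accessed and predicted-unused repositories, and all names are hypothetical.

    import math
    from collections import Counter

    clicked_tokens = Counter()    # anchor-text tokens of links the user followed
    ignored_tokens = Counter()    # anchor-text tokens of predicted links never used

    def token_log_odds(token):
        """Laplace-smoothed log-odds that a link containing this token will be clicked."""
        p_click = (clicked_tokens[token] + 1) / (sum(clicked_tokens.values()) + 2)
        p_ignore = (ignored_tokens[token] + 1) / (sum(ignored_tokens.values()) + 2)
        return math.log(p_click / p_ignore)

    def rank_hyperlinks(links):
        """links: list of (url, anchor_text); returns URLs ordered from highest priority."""
        scored = [(sum(token_log_odds(t) for t in anchor.lower().split()), url)
                  for url, anchor in links]
        return [url for _score, url in sorted(scored, reverse=True)]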

The second contribution focuses on server-based predictions, where the user access patterns recorded in server log files are used to build a Precedence Graph, from which the predictions are generated. The Precedence Graph effectively records the predecessor and successor relationships between user requests, and it has fewer arcs than the graphs built by existing algorithms (DG and DDG). Graph trimming is employed to keep the size of the graph within manageable limits and to avoid useless information residing in the graph. The server conveys the predictions (hints) to the client through HTTP response headers that are easily recognized by the browser. During idle time, the client uses the predictions to download web objects and store them in the cache to serve the users with minimal latency.
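
A highly simplified sketch of this idea is shown below: precedence arcs are added from the referrer of each request to the requested URI, and the predictions for a page are its most frequent successors. This is only an illustration under assumed data structures, not the exact algorithm of the thesis.

    from collections import defaultdict

    # graph[predecessor][successor] = number of times successor followed predecessor
    graph = defaultdict(lambda: defaultdict(int))

    def add_request(uri, referrer):
        """Record a precedence relation from the referrer to the requested URI."""
        if referrer:
            graph[referrer][uri] += 1

    def predict(current_uri, max_hints=5):
        """Return up to max_hints likely successors of current_uri, most frequent first."""
        successors = graph.get(current_uri, {})
        ranked = sorted(successors.items(), key=lambda kv: kv[1], reverse=True)
        return [uri for uri, _count in ranked[:max_hints]]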



The final contribution focuses on a cache replacement scheme to effectively manage the client-side cache that supports both caching and prefetching. The cache is partitioned into two parts: a regular cache (for caching) and a prefetch cache (for prefetching). The regular cache is managed using a Fuzzy Inference System (FIS) based replacement algorithm and the prefetch cache is managed using the LRU algorithm. The objective is to retain useful objects for a longer duration and to effectively remove unwanted objects from the cache, improving both caching and prefetching. The benefits of prefetching can be fully realized only when frequently accessed prefetched objects are retained in the cache long enough to maximize the hit rate.

6.1 SUGGESTIONS FOR FUTURE WORK

o To investigate the semantic characteristics of web pages based on the new HTML standard, which could be utilized to build an effective prediction system at the client for providing better personalized services to individual users.

o Exploring different ways of effectively combining user access patterns with web page content to build simple and efficient data structures that can be used to generate predictions. Such structures could cater to the dynamically generated web pages and multimedia content that are increasingly used in the current web.



o Designing cache replacement policies with other machine learning

techniques that could effectively cater to the demands of

prefetching.

o Designing a feedback system that could inform the prediction

engine located in any part of the web architecture about the

prefetched objects being accessed by the user to improve its

performance.

o Methods discussed in the thesis could be tested for Big Data to

check the performance and adaptability of the proposed approaches.

Further, theoretical evaluation could be performed and its outcome

compared with the experimental results.



LIST OF PUBLICATIONS

INTERNATIONAL JOURNALS

1. Venketesh.P and Venkatesan.R, “A Survey on Applications of Neural


Networks and Evolutionary Techniques in Web Caching”, IETE Technical
Review, Vol.26, Issue 3, pp.171-180, 2009. (Impact Factor: 0.724)

2. Venketesh.P, Venkatesan.R and Arunprakash.L, “Semantic Web


Prefetching Scheme using Naïve Bayes Classifier”, International Journal of
Computer Science and Applications, Vol. 7, No. 1, pp. 66 – 78, 2010. (SJR-
SCImago Journal Rank: 0.029)

3. Venketesh.P and Venkatesan.R, “Graph based Prediction Model to


Improve Web Prefetching”, International Journal of Computer Applications,
Vol.36, No.10, pp.37-43, 2011.

4. Venketesh.P and Venkatesan.R, “Adaptive Web Prefetching Scheme using


Link Anchor Information”, International Journal of Applied Information
Systems, Vol.2, No.1, pp.39-46, 2012.

5. Venketesh.P and Venkatesan.R, “Effective Web Cache Replacement


Scheme to Support Caching and Prefetching”, International Journal of Web
Science, Inderscience Publishers (Communicated).
Low Latency via Redundancy

Ashish Vulimiri P. Brighten Godfrey Radhika Mittal


UIUC UIUC UC Berkeley
vulimir1@illinois.edu pbg@illinois.edu radhika@eecs.berkeley.edu

Justine Sherry Sylvia Ratnasamy Scott Shenker


UC Berkeley UC Berkeley UC Berkeley and ICSI
justine@eecs.berkeley.edu sylvia@eecs.berkeley.edu shenker@icsi.berkeley.edu

ABSTRACT

Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency — especially the tail of the latency distribution — can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result which completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks.

Categories and Subject Descriptors

C.2.0 [Computer-Communication Networks]: General

Keywords

Latency; Reliability; Performance

1. INTRODUCTION

Low latency is important for humans. Even slightly higher web page load times can significantly reduce visits from users and revenue, as demonstrated by several sites [28]. For example, injecting just 400 milliseconds of artificial delay into Google search results caused the delayed users to perform 0.74% fewer searches after 4-6 weeks [9]. A 500 millisecond delay in the Bing search engine reduced revenue per user by 1.2%, or 4.3% with a 2-second delay [28]. Human-computer interaction studies similarly show that people react to small differences in the delay of operations (see [17] and references therein).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CoNEXT'13, December 9-12, 2013, Santa Barbara, California, USA.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2101-3/13/12 ...$15.00.
http://dx.doi.org/10.1145/2535372.2535392.

Achieving consistent low latency is challenging. Modern applications are highly distributed, and likely to get more so as cloud computing separates users from their data and computation. Moreover, application-level operations often require tens or hundreds of tasks to complete — due to many objects comprising a single web page [25], or aggregation of many back-end queries to produce a front-end result [2, 14]. This means individual tasks may have latency budgets on the order of a few milliseconds or tens of milliseconds, and the tail of the latency distribution is critical. Such outliers are difficult to eliminate because they have many sources in complex systems; even in a well-provisioned system where individual operations usually work, some amount of uncertainty is pervasive. Thus, latency is a difficult challenge for networked systems: How do we make the other side of the world feel like it is right here, even under exceptional conditions?

One powerful technique to reduce latency is redundancy: Initiate an operation multiple times, using as diverse resources as possible, and use the first result which completes. Consider a host that queries multiple DNS servers in parallel to resolve a name. The overall latency is the minimum of the delays across each query, thus potentially reducing both the mean and the tail of the latency distribution. For example, a replicated DNS query could mask spikes in latency due to a cache miss, network congestion, packet loss, a slow server, and so on. The power of this technique is that it reduces latency precisely under the most challenging conditions—when delays or failures are unpredictable—and it does so without needing any information about what these conditions might be.

Redundancy has been employed to reduce latency in several networked systems: notably, as a way to deal with failures in DTNs [21], in a multi-homed web proxy overlay [5], and in limited cases in distributed job execution frameworks [4, 15, 32].

However, these systems are exceptions rather than the rule. Redundant queries are typically eschewed, whether across the Internet or within data centers. The reason is rather obvious: duplicating every operation doubles system utilization, or increases usage fees for bandwidth and computation. The default assumption in system design is that doing less work is best.

But when exactly is that natural assumption valid? Despite the fact that redundancy is a fundamental technique that has been used in certain systems to reduce latency, the conditions under which it is effective are not well understood — and we believe as a result, it is not widely used.

In this paper, we argue that redundancy is an effective general technique to achieve low latency in networked systems. Our results show that redundancy could be used much more commonly than it is, and in many systems represents a missed opportunity.

Making that argument requires an understanding of when replication improves latency and when it does not. Consider a system with a fixed set of servers, in which queries are relatively inexpensive for clients to send. If a single client duplicates its queries, its latency is likely to decrease, but it also affects other users in the system to some degree. If all clients duplicate every query, then every client has the benefit of receiving the faster of two responses (thus decreasing mean latency) but system utilization has doubled (thus increasing mean latency). It is not immediately obvious under what conditions the former or latter effect dominates.

Our first key contribution is to characterize when such global redundancy improves latency. We introduce a queueing model of query replication, giving an analysis of the expected response time as a function of system utilization and server-side service time distribution. Our analysis and extensive simulations demonstrate that assuming the client-side cost of replication is low, there is a server-side threshold load below which replication always improves mean latency. We give a crisp conjecture, with substantial evidence, that this threshold always lies between 25% and 50% utilization regardless of the service time distribution, and that it can approach 50% arbitrarily closely as variance in service time increases. Our results indicate that redundancy should have a net positive impact in a large class of systems, despite the extra load that it adds.

While our analysis only addresses mean latency, we believe (and our experimental results below will demonstrate) that redundancy improves both the mean and the tail.

Our second key contribution is to demonstrate multiple practical application scenarios in which replication empirically provides substantial benefit, yet is not generally used today. These scenarios, along with scenarios in which replication is not effective, corroborate the results of our analysis. More specifically:

• DNS queries across the wide area. Querying multiple DNS servers reduces the fraction of responses later than 500 ms by 6.5×, while the fraction later than 1.5 sec is reduced by 50×, compared with a non-replicated query to the best individual DNS server. Although this incurs added load on DNS servers, replication saves more than 100 msec per KB of added traffic, so that it is more than an order of magnitude better than an estimated cost-effectiveness threshold [29, 30]. Similarly, a simple analysis indicates that replicating TCP connection establishment packets can save roughly 170 msec (in the mean) and 880 msec (in the tail) per KB of added traffic.

• Database queries within a data center. We implement query replication in a database system similar to a web service, where a set of clients continually read objects from a set of back-end servers. Our results indicate that when most queries are served from disk and file sizes are small, replication provides substantial latency reduction of up to 2× in the mean and up to 8× in the tail. As predicted by our analysis, mean latency is reduced up to a server-side threshold load of 30-40%. We also show that when retrieved files become large or the database resides in memory, replication does not offer a benefit. This occurs across both a web service database and the memcached in-memory database, and is consistent with our analysis: in both cases (large or in-memory files), the client-side cost of replication becomes significant relative to the mean query latency.

• In-network packet replication. We design a simple strategy for switches, to replicate the initial packets of a flow but treat them as lower priority. This offers an alternate mechanism to limit the negative effect of increased utilization, and simulations indicate it can yield up to a 38% median end-to-end latency reduction for short flows.

In summary, as system designers we typically build scalable systems by avoiding unnecessary work. The significance of our results is to characterize a large class of cases in which duplicated work is a useful and elegant way to achieve robustness to variable conditions and thus reduce latency.

2. SYSTEM VIEW

In this section we characterize the tradeoff between the benefit (fastest of multiple options) and the cost (doing more work) due to redundancy from the perspective of a system designer optimizing a fixed set of resources. We analyze this tradeoff in an abstract queueing model (§2.1) and evaluate it empirically in two applications: a disk-backed database (§2.2) and an in-memory cache (§2.3). We then discuss a setting in which the cost of overhead can be eliminated: a data center network capable of deprioritizing redundant traffic (§2.4).

§3 considers the scenario where the available resources are provisioned according to payment, rather than static.

2.1 System view: Queueing analysis

Two factors are at play in a system with redundancy. Replication reduces latency by taking the faster of two (or more) options to complete, but it also worsens latency by increasing the overall utilization. In this section, we study the interaction between these two factors in an abstract queueing model.

We assume a set of N independent, identical servers, each with the same service time distribution S. Requests arrive in the system according to a Poisson process, and k copies are made of each arriving request and enqueued at k of the N servers, chosen uniformly at random. To start with, we will assume that redundancy is "free" for the clients — that it adds no appreciable penalty apart from an increase in server utilization. We consider the effect of client-side overhead later in this section.

Figures 1(a) and 1(b) show results from a simulation of this queueing model, measuring the mean response time (queueing delay + service time) as a function of load with two different service time distributions. Replication improves the mean, but provides the greatest benefit in the tail, for example reducing the 99.9th percentile by 5× under Pareto service times.
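
To make the model concrete, the following small simulation (an illustrative sketch, not the authors' code) implements the setup just described: N identical FIFO servers, Poisson arrivals, and k copies of each request enqueued at k distinct servers, with the response time taken as the earliest completion. With exponential service times it can be checked against the roughly one-third threshold derived in Theorem 1 below.

    import random

    def simulate(load, k=2, n_servers=10, n_requests=200_000,
                 service=lambda: random.expovariate(1.0)):
        """Mean response time when each request is copied to k of n_servers (FIFO queues).

        'load' is the per-server utilization without replication; mean service time is 1.
        Both copies are executed in full (no cancellation), as in the model above.
        """
        arrival_rate = load * n_servers       # aggregate Poisson arrival rate
        t, free_at, total = 0.0, [0.0] * n_servers, 0.0
        for _ in range(n_requests):
            t += random.expovariate(arrival_rate)
            chosen = random.sample(range(n_servers), k)
            finish = []
            for s in chosen:                  # FIFO: a copy starts when its server is free
                start = max(t, free_at[s])
                free_at[s] = start + service()
                finish.append(free_at[s])
            total += min(finish) - t          # response time = earliest copy to finish
        return total / n_requests

    # With exponential service times, k=2 should beat k=1 below roughly 33% load
    # and lose above it.
    for rho in (0.2, 0.3, 0.4):
        print(rho, simulate(rho, k=1), simulate(rho, k=2))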
[Figure 1 plots omitted: (a) mean response time vs. load with deterministic service time, (b) mean response time vs. load with Pareto service time, and (c) response-time CDF at load 0.2 under Pareto service time, each comparing 1 copy and 2 copies.]

Figure 1: A first example of the effect of replication, showing response times when service time distribution is deterministic and Pareto (α = 2.1)

Note the thresholding effect: in both systems, there is a threshold load below which redundancy always helps improve mean latency, but beyond which the extra load it adds overwhelms any latency reduction that it achieves. The threshold is higher — i.e., redundancy helps over a larger range of loads — when the service time distribution is more variable.

The threshold load, defined formally as the largest utilization below which replicating every request to 2 servers always helps mean response time, will be our metric of interest in this section. We investigate the effect of the service time distribution on the threshold load both analytically and in simulations of the queueing model. Our results, in brief:

1. If redundancy adds no client-side cost (meaning server-side effects are all that matter), there is strong evidence to suggest that no matter what the service time distribution, the threshold load has to be more than 25%.

2. In general, the higher the variability in the service-time distribution, the larger the performance improvement achieved.

3. Client-side overhead can diminish the performance improvement due to redundancy. In particular, the threshold load can go below 25% if redundancy adds a client-side processing overhead that is significant compared to the server-side service time.

If redundancy adds no client-side cost

Our analytical results rely on a simplifying approximation: we assume that the states of the queues at the servers evolve completely independently of each other, so that the average response time for a replicated query can be computed by taking the average of the minimum of two independent samples of the response time distribution at each server. This is not quite accurate because of the correlation introduced by replicated arrivals, but we believe this is a reasonable approximation when the number of servers N is sufficiently large. In a range of service time distributions, we found that the mean response time computed using this approximation was within 3% of the value observed in simulations with N = 10, and within 0.1% of the value observed in simulations with N = 20.

We start with a simple, analytically-tractable special case: when the service times at each server are exponentially distributed. A closed form expression for the response time CDF exists in this case, and it can be used to establish the following result.

Theorem 1. Within the independence approximation, if the service times at every server are i.i.d. exponentially distributed, the threshold load is 33%.

Proof. Assume, without loss of generality, that the mean service time at each server is 1 second. Suppose requests arrive at a rate of ρ queries per second per server.

Without replication, each server evolves as an M/M/1 queue with departure rate 1 and arrival rate ρ. The response time of each server is therefore exponentially distributed with rate 1 − ρ [6], and the mean response time is 1/(1 − ρ).

With replication, each server is an M/M/1 queue with departure rate 1 and arrival rate 2ρ. The response time of each server is exponentially distributed with rate 1 − 2ρ, but each query now takes the minimum of two independent samples from this distribution, so that the mean response time of each query is 1/(2(1 − 2ρ)).

Now replication results in a smaller response time if and only if 1/(2(1 − 2ρ)) < 1/(1 − ρ), i.e., when ρ < 1/3.

While we focus on the k = 2 case in this section, the analysis in this theorem can be easily extended to arbitrary levels of replication k.

Note that in this special case, since the response times are exponentially distributed, the fact that replication improves mean response time automatically implies a stronger distributional dominance result: replication also improves the pth percentile response time for every p. However, in general, an improvement in the mean does not automatically imply stochastic dominance.

In the general service time case, two natural (service-time independent) bounds on the threshold load exist.

First, the threshold load cannot exceed 50% load in any system. This is easy to see: if the base load is above 50%, replication would push total load above 100%. It turns out that this trivial upper bound is tight — there are families of heavy-tailed high-variance service times for which the threshold load goes arbitrarily close to 50%. See Figures 2(a) and 2(b).

Second, we intuitively expect replication to help more as the service time distribution becomes more variable. Figure 2 validates this trend in three different families of distributions. Therefore, it is reasonable to expect that the worst case for replication is when the service time is completely deterministic. However, even in this case the threshold load is strictly positive because there is still variability in the system due to the stochastic nature of the arrival process. With
[Figure 2 plots omitted: threshold load (0 to 0.5) as a function of distribution variance for (a) Weibull service times vs. inverse shape parameter γ, (b) Pareto service times vs. inverse scale parameter β, and (c) a simple two-point service time distribution vs. p.]

Figure 2: Effect of increasing variance on the threshold load in three families of unit-mean distributions: Pareto, Weibull, and a simple two-point discrete distribution (service time = 0.5 with probability p, (1 − 0.5p)/(1 − p) with probability 1 − p). In all three cases the variance is 0 at x = 0 and increases along the x-axis, going to infinity at the right edge of the plot.

the Poisson arrivals that we assume, the threshold load with 0.5
deterministic service time turns out to be slightly less than Conjectured lower bound
0.4 Uniform
26% — more precisely, ≈ 25.82% — based on simulations
Dirichlet

Threshold load
of the queueing model, as shown in the leftmost point in
Figure 2(c). 0.3
We conjecture that this is, in fact, a lower bound on the
threshold load in an arbitrary system. 0.2

Conjecture 1. Deterministic service time is the worst 0.1


case for replication: there is no service time distribution in
which the threshold load is below the (≈ 26%) threshold when 0
the service time is deterministic. 1 2 4 8 16 32 64 128 256 512
Size of distribution support
The primary difficulty in resolving the conjecture is that
general response time distributions are hard to handle an- Figure 3: Randomly chosen service time distribu-
alytically, especially since in order to quantify the effect of tions
taking the minimum of two samples we need to understand
the shape of the entire distribution, not just its first few mo-
ments. However, we have two forms of evidence that seem Myers and Vernon [23], the threshold load is minimized when
to support this conjecture: analyses based on approxima- the service time distribution is deterministic.
tions to the response time distribution, and simulations of
the queueing model. The heavy-tail approximation by Olvera-Cravioto et al. [24]
The primary approximation that we use is a recent re- applies to arbitrary regularly varying service time distribu-
sult by Myers and Vernon [23] that only depends on the tions, but for our analysis we add an additional assumption
first two moments of the service time distribution. The ap- requiring that the service time be sufficiently heavy. For-
proximation seems to perform fairly well in numerical eval- mally, we require that the service time distribution have a
uations with light-tailed service time distributions, such as higher coefficient of variation than the exponential distribu-
the Erlang and hyperexponential distributions (see Figure 2 tion, which
√ amounts to requiring that the tail index α be
in [23]), although no bounds on the approximation error are < 1 + 2. (The tail index is a measure of how heavy a
available. However, the authors note that the approxima- distribution is: lower indices mean heavier tails.)
tion is likely to be inappropriate when the service times are Theorem 3. Within the independence approximation and
heavy tailed. the approximation due to Olvera-Cravioto et al. [24], if the
As a supplement, therefore, in the heavy-tailed case, we service time
√ distribution is regularly varying with tail index
use an approximation by Olvera-Cravioto et al. [24] that α < 1 + 2, then the threshold load is > 30%.
is applicable when the service times are regularly varying1 .
Heavy-tail approximations are fairly well established in queue- Simulation results also seem to support the conjecture.
ing theory (see [26, 33]); the result due to Olvera-Cravioto We generated a range of service time distributions by, for
et al. is, to the best of our knowledge, the most recent (and various values of S, sampling from the space of all unit-mean
most accurate) refinement. discrete probability distributions with support {1, 2, ..., S}
The following theorems summarize our results for these in two different ways — uniformly at random, and using a
approximations. We omit the proofs due to space constraints. symmetric Dirichlet distribution with concentration param-
eter 0.1 (the Dirichlet distribution has a higher variance and
Theorem 2. Within the independence approximation and generates a larger spread of distributions than uniform sam-
the approximation of the response time distribution due to pling). Figure 3 reports results when we generate a 1000
1
The class of regularly varying distributions is an important different random distributions for each value of S and look
subset of the class of heavy-tailed distributions that includes at the minimum and maximum observed threshold load over
as its members the Pareto and the log-Gamma distributions. this set of samples.
0.5 ate requests according to identical Poisson processes. Each
Pareto request downloads a file chosen uniformly at random from
0.4 Exponential the entire collection. We only test read performance on a
Deterministic
Threshold load

static data set; we do not consider writes or updates.


0.3 Figure 5 shows results for one particular web-server con-
figuration, with
0.2
• Mean file size = 4 KB
0.1
• File size distribution = deterministic, 4 KB per file
0
0 0.2 0.4 0.6 0.8 1 • Cache:disk ratio = 0.1
Extra latency per request added by replication • Server/client hardware = 4 servers and 10 clients, all
(as fraction of mean service time) identical single-core Emulab nodes with 3 GHz CPU,
2 GB RAM, gigabit network interfaces, and 10k RPM
Figure 4: Effect of redundancy-induced client-side disks.
latency overhead, with different server service time
distributions. Disk is the bottleneck in the majority of our experiments –
CPU and network usage are always well below peak capacity.
The threshold load (the maximum load below which repli-
Effect of client-side overhead cation always helps) is 30% in this setup — within the 25-
50% range predicted by the queueing analysis. Redundancy
As we noted earlier, our analysis so far assumes that the reduces mean latency by 33% at 10% load and by 25% at
client-side overhead (e.g. added CPU utilization, kernel pro- 20% load. Most of the improvement comes from the tail.
cessing, network overhead) involved in processing the repli- At 20% load, for instance, replication cuts 99th percentile
cated requests is negligible. This may not be the case when, latency in half, from 150 ms to 75 ms, and reduces 99.9th
for instance, the operations in question involve large file percentile latency 2.2×.
transfers or very quick memory accesses. In both cases, the The experiments in subsequent figures (Figures 6-11) vary
client-side latency overhead involved in processing an addi- one of the above configuration parameters at a time, keeping
tional replicated copy of a request would be comparable in the others fixed. We note three observations.
magnitude to the server latency for processing the request. First, as long as we ensure that file sizes continue to re-
This overhead can partially or completely counteract the main relatively small, changing the mean file size (Figure 6)
latency improvement due to redundancy. Figure 4 quanti- or the shape of the file size distribution (Figure 7) does not
fies this effect by considering what happens when replication siginificantly alter the level of improvement that we observe.
adds a fixed latency penalty to every request. These results This is because the primary bottleneck is the latency in-
indicate that the more variable distributions are more for- volved in locating the file on disk — when file sizes are small,
giving of overhead, but client side overhead must be at least the time needed to actually load the file from disk (which
somewhat smaller than mean request latency in order for is what the specifics of the file size distribution affect) is
replication to improve mean latency. This is not surprising, negligible.
of course: if replication overhead equals mean latency, repli- Second, as predicted in our queueing model (§2.1), in-
cation cannot improve mean latency for any service time creasing the variability in the system causes redundancy to
distribution — though it may still improve the tail. perform better. We tried increasing variability in two dif-
ferent ways — increasing the proportion of access hitting
2.2 Application: disk-backed database disk by reducing the cache-to-disk ratio (Figure 8), and run-
Many data center applications involve the use of a large ning on a public cloud (EC2) instead of dedicated hardware
disk-based data store that is accessed via a smaller main- (Figure 9). The increase in improvement is relatively minor,
memory cache: examples include the Google AppEngine although still noticeable, when we reduce the cache-to-disk
data store [16], Apache Cassandra [10], and Facebook’s Haystack ratio. The benefit is most visible in the tail: the 99.9th per-
image store [7]. In this section we study a representative centile latency improvement at 10% load goes up from 2.3×
implementation of such a storage service: a set of Apache in the base configuration to 2.8× when we use the smaller
web servers hosting a large collection of files, split across the cache-to-disk ratio, and from 2.2× to 2.5× at 20% load.
servers via consistent hashing, with the Linux kernel man- The improvement is rather more dramatic when going
aging a disk cache on each server. from Emulab to EC2. Redundancy cuts the mean response
We deploy a set of Apache servers and, using a light-weight time at 10-20% load on EC2 in half, from 12 ms to 6 ms
memory-soaking process, adjust the memory usage on each (compare to the 1.3 − 1.5× reduction on Emulab). The tail
server node so that around half the main memory is avail- improvement is even larger: on EC2, the 99.9th percentile
able for the Linux disk cache (the other half being used by latency at 10-20% load drops 8× when we use redundancy,
other applications and the kernel). We then populate the from around 160 ms to 20 ms. It is noteworthy that the
servers with a collection of files whose total size is chosen worst 0.1% of outliers with replication are quite close to the
to achieve a preset target cache-to-disk ratio. The files are 12 ms mean without replication!
partitioned across servers via consistent hashing, and two Third, as also predicted in §2.1, redundancy ceases to help
copies are stored of every file: if the primary is stored on when the client-side overhead due to replication is a signif-
server n, the (replicated) secondary goes to server n + 1. We icant fraction of the mean service time, as is the case when
measure the response time when a set of client nodes gener- the file sizes are very large (Figure 10) or when the cache
[Figure 5 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 5: Base configuration

[Figure 6 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 6: Mean file size 0.04 KB instead of 4 KB

[Figure 7 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 7: Pareto file size distribution instead of deterministic

[Figure 8 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 8: Cache:disk ratio 0.01 instead of 0.1. Higher variability because of the larger proportion of accesses hitting disk. Compared to Figure 5, 99.9th percentile improvement goes from 2.3× to 2.8× at 10% load, and from 2.2× to 2.5× at 20% load.
[Figure 9 plots omitted: mean response time, 99.9th percentile response time and response-time CDF at 1000 queries/sec/node, comparing 1 copy and 2 copies, with arrival rate (queries/sec/node) on the x-axis.]

Figure 9: EC2 nodes instead of Emulab. x-axis shows unnormalised arrival rate because maximum throughput seems to fluctuate. Note the much larger tail improvement compared to Figure 5.
[Figure 10 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 10: Mean file size 400 KB instead of 4 KB

[Figure 11 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 11: Cache:disk ratio 2 instead of 0.1. Cache is large enough to store contents of entire disk

[Figure 12 plots omitted: mean response time vs. load, 99.9th percentile response time vs. load, and response-time CDF at load 0.2, each comparing 1 copy and 2 copies.]

Figure 12: memcached


1 overhead was mitigated by the response time reduction it
Fraction later than threshold 1 copy: real achieved). We now consider a setting in which this over-
0.1 2 copies: real
1 copy: stub
head can be essentially eliminated: a network in which the
0.01 2 copies: stub switches are capable of strict prioritization.
Specifically, we consider a data center network. Many
0.001 data center network architectures [2, 18] provide multiple
0.0001
equal-length paths between each source-destination pair, and
assign flows to paths based on a hash of the flow header [20].
1e-05 However, simple static flow assignment interacts poorly with
the highly skewed flow-size mix typical of data centers: the
1e-06
majority of the traffic volume in a data center comes from a
1e-05 0.0001 0.001 0.01 0.1
small number of large elephant flows [2, 3], and hash-based
Response time (s) flow assignment can lead to hotspots because of the possi-
bility of assigning multiple elephant flows to the same link,
Figure 13: memcached: stub and normal version which can result in significant congestion on that link. Re-
response times at 0.1% load cent work has proposed mitigating this problem by dynam-
ically reassigning flows in response to hotspots, in either a
centralized [1] or distributed [31] fashion.
is large enough that all the files fit in memory (Figure 11). We consider a simple alternative here: redundancy. Ev-
We study this second scenario more directly, using an in- ery switch replicates the first few packets of each flow along
memory distributed database, in the next section. an alternate route, reducing the probability of collision with
an elephant flow. Replicated packets are assigned a lower
2.3 Application: memcached (strict) priority than the original packets, meaning they can
We run a similar experiment to the one in the previous never delay the original, unreplicated traffic in the network.
section, except that we replace the filesystem store + Linux Note that we could, in principle, replicate every packet —
kernel cache + Apache web server interface setup with the the performance when we do this can never be worse than
memcached in-memory database. Figure 12 shows the ob- without replication — but we do not since unnecessary repli-
served response times in an Emulab deployment. The results cation can reduce the gains we achieve by increasing the
show that replication seems to worsen overall performance amount of queueing within the replicated traffic. We repli-
at all the load levels we tested (10-90%). cate only the first few packets instead, with the aim of reduc-
To understand why, we test two versions of our code at ing the latency for short flows (the completion times of large
a low (0.1%) load level: the “normal” version, as well as a flows depend on their aggregate throughput rather than in-
version with the calls to memcached replaced with stubs, dividual per-packet latencies, so replication would be of little
no-ops that return immediately. The performance of this use).
stub version is an estimate of how much client-side latency We evaluate this scheme using an ns-3 simulation of a
is involved in processing a query. common 54-server three-layered fat-tree topology, with a full
Figure 13 shows that the client-side latency is non-trivial. bisection-bandwidth fabric consisting of 45 6-port switches
Replication increases the mean response time in the stub ver- organized in 6 pods. We use a queue buffer size of 225
sion by 0.016 ms, which is 9% of the 0.18 ms mean service KB and vary the link capacity and delay. Flow arrivals
time. This is an underestimate of the true client-side over- are Poisson, and flow sizes are distributed according to a
head since the stub version, which doesn’t actually process standard data center workload [8], with flow sizes varying
queries, does not measure the network and kernel overhead from 1 KB to 3 MB and with more than 80% of the flows
involved in sending and receiving packets over the network. being less than 10 KB.
The client-side latency overhead due to redundancy is thus Figure 14 shows the completion times of flows smaller than
at least 9% of the mean service time. Further, the service 10 KB when we replicate the first 8 packets in every flow.
time distribution is not very variable: although there are Figure 14(a) shows the reduction in the median flow com-
outliers, more than 99.9% of the mass of the entire distri- pletion time as a function of load for three different delay-
bution is within a factor of 4 of the mean. Figure 4 in §2.1 bandwidth combinations (achieved by varying the latency
shows that when the service time distribution is completely and capacity of each link in the network). Note that in all
deterministic, a client-side overhead greater than 3% of the three cases, the improvement is small at low loads, rises un-
mean service time is large enough to completely negate the til load ≈ 40%, and then starts to fall. This is because at
response time reduction due to redundancy. very low loads, the congestion on the default path is small
In our system, redundancy does not seem to have that ab- enough that replication does not add a significant benefit,
solute a negative effect – in the “normal” version of the code, while at very high loads, every path in the network is likely
redundancy still has a slightly positive effect overall at 0.1% to be congested, meaning that replication again yields lim-
load (Figure 13). This suggests that the threshold load is ited gain. We therefore obtain the largest improvement at
positive though small (it has to be smaller than 10%: Fig- intermediate loads.
ure 12 shows that replication always worsens performance Note also that the performance improvement we achieve
beyond 10% load). falls as the delay-bandwidth product increases. This is be-
cause our gains come from the reduction in queuing delay
2.4 Application: replication in the network when the replicated packets follow an alternate, less con-
gested, route. At higher delay-bandwidth products, queue-
Replication has always added a non-zero amount of over-
ing delay makes up a smaller proportion of the total flow
head in the systems we have considered so far (even if that
[Figure 14 plots omitted: (a) % improvement in median flow completion time vs. total load for three link speed/delay settings (5 Gbps with 2 us per hop, 10 Gbps with 2 us per hop, 10 Gbps with 6 us per hop), (b) 99th percentile flow completion time vs. total load, and (c) flow completion time CDF at load 0.4, with and without replication.]

Figure 14: Median and tail completion times for flows smaller than 10 KB

completion time, meaning that the total latency savings servers and the clients against the economic value of the
achieved is correspondingly smaller. At 40% network load, latency improvement that would be achieved. In our eval-
we obtain a 38% improvement in median flow completion uation we find that the latency improvement achieved by
time (0.29 ms vs. 0.18 ms) when we use 5 Gbps links with redundancy is orders of magnitude larger than the required
2 us per-hop delay. The improvement falls to 33% (0.15 ms threshold in both the applications we consider here.
vs. 0.10 ms) with 10 Gbps links with 2 us per-hop delay,
and further to 19% (0.21 ms vs. 0.17 ms) with 10 Gbps links 3.1 Application: Connection establishment
with 6 us per-hop delay. We start with a simple example, demonstrating why repli-
Next, Figure 14(b) shows the 99th percentile flow comple- cation should be cost-effective even when the available choices
tion times for one particular delay-bandwidth combination. In general, we see a 10-20% reduction in the flow completion times, but at 70-80% load the improvement spikes to 80-90%. The reason turns out to be timeout avoidance: at these load levels, the 99th percentile unreplicated flow faces a timeout, and thus has a completion time greater than the TCP minRTO, 10 ms. With redundancy, the number of flows that face timeouts drops significantly, causing the 99th percentile flow completion time to be much smaller than 10 ms.

At loads higher than 80%, however, the number of flows facing timeouts is high even with redundancy, resulting in a narrowing of the performance gap.

Finally, Figure 14(c) shows a CDF of the flow completion times at one particular load level. Note that the improvement in the mean and median is much larger than that in the tail. We believe this is because the high latencies in the tail occur at those instants of high congestion when most of the links along the flow's default path are congested. Therefore, the replicated packets, which likely traverse some of the same links, do not fare significantly better.

Replication has a negligible impact on the elephant flows: it improved the mean completion time for flows larger than 1 MB by a statistically insignificant 0.12%.

3. INDIVIDUAL VIEW

The model and experiments of the previous section indicated that, in a range of scenarios, replication is the best way to optimize latency within a fixed set of system resources. However, settings such as the wide-area Internet are better modeled as having elastic resources: individual participants can selfishly choose whether to replicate an operation, but this incurs an additional cost (such as bandwidth usage or battery consumption). In this section, we present two examples of wide-area Internet applications in which replication achieves a substantial improvement in latency. We argue that the latency reduction in both these applications outweighs the cost of the added overhead by comparing against a benchmark that we develop in a companion article [29]. The benchmark establishes a cost-effectiveness threshold by comparing the cost of the extra overhead induced at the servers and in the network against the value to users of the corresponding reduction in latency.

3.1 Application: TCP connection establishment

Our options for direct experimentation are limited, so we use a back-of-the-envelope calculation to consider what happens when multiple copies of TCP-handshake packets are sent on the same path. It is obvious that this should help if all packet losses on the path are independent: in this case, sending two back-to-back copies of a packet would reduce the probability of it being lost from p to p². In practice, of course, back-to-back packet transmissions are likely to observe a correlated loss pattern, but Chan et al. [11] measured a significant reduction in loss probability despite this correlation. Sending back-to-back packet pairs between PlanetLab hosts, they found that the average probability of individual packet loss was ≈ 0.0048, while the probability of both packets in a back-to-back pair being dropped was only ≈ 0.0007 – much larger than the ∼ 2 × 10⁻⁵ that would be expected if the losses were independent, but still 7× lower than the individual packet loss rate. (It might be possible to do even better by spacing the transmissions of the two packets in the pair a few milliseconds apart to reduce the correlation.)

As a concrete example, we quantify the effect this loss-rate reduction would have on the time required to complete a TCP handshake. The three packets in the handshake are ideal candidates for replication: they make up an insignificant fraction of the total traffic in the network, and there is a high penalty associated with their being lost (Linux and Windows use a 3 second initial timeout for SYN packets; OS X uses 1 second [12]). We use the loss probability statistics discussed above to estimate the expected latency savings on each handshake.

We consider an idealized network model. Whenever a packet is sent on the network, we assume it is delivered successfully after RTT/2 seconds with probability 1 − p, and lost with probability p. Packet deliveries are assumed to be independent of each other. p is 0.0048 when sending one copy of each packet, and 0.0007 when sending two copies of each packet. We also assume TCP behavior as in the Linux kernel: an initial timeout of 3 seconds for SYN and SYN-ACK packets and of 3 × RTT for ACK packets, and exponential backoff on packet loss [12].
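The following minimal Python sketch simulates this idealized model and compares the expected handshake completion time with one copy of each packet (p = 0.0048) against two back-to-back copies (p = 0.0007). The 50 ms RTT, the trial count, and all function names are illustrative assumptions rather than values taken from the measurement study.

import random

def packet_delay(p, rtt, initial_timeout):
    # Time from the first transmission of a packet until some copy is delivered:
    # each transmission is lost independently with probability p, otherwise it
    # arrives after RTT/2; the sender retransmits on timeout with exponential
    # backoff, as in the idealized model described above.
    elapsed, timeout = 0.0, initial_timeout
    while random.random() < p:      # this transmission was lost
        elapsed += timeout          # wait out the timeout, then retransmit
        timeout *= 2                # exponential backoff
    return elapsed + rtt / 2        # the successful copy arrives after RTT/2

def handshake_time(p, rtt):
    # SYN, SYN-ACK and ACK delivered in sequence, with a 3 s initial timeout
    # for SYN and SYN-ACK and 3*RTT for the final ACK (Linux-like behavior).
    return (packet_delay(p, rtt, 3.0) +
            packet_delay(p, rtt, 3.0) +
            packet_delay(p, rtt, 3 * rtt))

def mean_handshake_time(p, rtt, trials=500_000):
    return sum(handshake_time(p, rtt) for _ in range(trials)) / trials

if __name__ == "__main__":
    rtt = 0.05                                      # assumed 50 ms RTT
    single = mean_handshake_time(0.0048, rtt)       # one copy of each packet
    duplicated = mean_handshake_time(0.0007, rtt)   # back-to-back duplicates
    print(f"expected saving per handshake: {(single - duplicated) * 1e3:.1f} ms")

To first order, the expected saving is simply the sum of the three initial timeouts multiplied by the reduction in loss probability, which is the calculation behind the figure quoted next; with the parameters above the simulated estimate comes out close to 25 ms.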
With this model, it can be shown that duplicating all three packets in the handshake would reduce its expected completion time by approximately (3 + 3 + 3 × RTT) × (4.8 − 0.7) ms, which is at least 25 ms. The benefit increases with RTT, and is even higher in the tail: duplication would improve the 99.9th percentile handshake completion time by at least 880 ms.

Is this improvement worth the cost of the added traffic? Qualitatively, even 25 ms is significant relative to the size of the handshake packets. Quantitatively, a cost-benefit analysis is difficult since it depends on estimating and relating the direct and indirect costs of added traffic and the value to humans of lower latency. While an accurate comparison is likely quite difficult, the study referenced at the beginning of this section [29, 30] estimated these values using the pricing of cloud services, which encompasses a broad range of costs, including those for bandwidth, energy consumption, server utilization, and network operations staff, and concluded that in a broad class of cases, reducing latency is useful as long as it improves latency by 16 ms for every KB of extra traffic. In comparison, the latency savings we obtain in TCP connection establishment is more than an order of magnitude larger than this threshold in the mean, and more than two orders of magnitude larger in the tail. Specifically, if we assume each packet is 50 bytes long, then a 25-880 ms improvement implies a savings of around 170-6000 ms/KB. We caution, however, that the analysis of [29, 30] was necessarily imprecise; a more rigorous study would be an interesting avenue of future work.
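As a quick check of this figure, under the stated assumption of 50-byte packets, duplicating the three handshake packets once adds roughly 3 × 50 B = 150 B of traffic per connection, so

\[
\frac{25\ \text{ms}}{150/1024\ \text{KB}} \approx 170\ \text{ms/KB}
\qquad \text{and} \qquad
\frac{880\ \text{ms}}{150/1024\ \text{KB}} \approx 6000\ \text{ms/KB},
\]

both comfortably above the 16 ms/KB break-even threshold.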
3.2 Application: DNS

An ideal candidate for replication is a service that involves small operations and that is replicated at multiple locations, thus providing diversity across network paths and servers, so that replicated operations are largely independent. We believe opportunities to replicate queries to such services may arise both in the wide area and in the data center. Here, we explore the case of replicating DNS queries.

We began with a list of 10 DNS servers (the default local DNS server, plus public servers from Level3, Google, Comodo, OpenDNS, DNS Advantage, Norton DNS, ScrubIT, OpenNIC, and SmartViper) and Alexa.com's list of the top 1 million website names. At each of 15 PlanetLab nodes across the continental US, we ran a two-stage experiment: (1) rank all 10 DNS servers in terms of mean response time, by repeatedly querying a random name at a random server (this ranking is specific to each PlanetLab node); (2) repeatedly pick a random name and perform one of 20 possible trials chosen at random — either querying one of the ten individual DNS servers, or querying anywhere from 1 to 10 of the best servers in parallel (e.g., if sending 3 copies of the query, we send them to the top 3 DNS servers in the ranked list). In each of the two stages, we performed one trial every 5 seconds, and we ran each stage for about a week at each of the 15 nodes. Any query that took more than 2 seconds was treated as lost, and counted as 2 sec when calculating the mean response time.

Figure 15 shows the distribution of query response times across all the PlanetLab nodes. The improvement is substantial, especially in the tail: querying 10 DNS servers reduces the fraction of queries later than 500 ms by 6.5×, and the fraction later than 1.5 sec by 50×. Averaging over all PlanetLab nodes, Figure 16 shows the average percent reduction in response times compared to the best fixed DNS server identified in stage 1. We obtain a substantial reduction with just 2 DNS servers in all metrics, improving to a 50-62% reduction with 10 servers. Finally, we compared performance to the best single server in retrospect, i.e., the server with the minimum mean response time for the queries to individual servers in stage 2 of the experiment, since the best server may change over time. Even compared with this stringent baseline, we found a result similar to Figure 16, with a reduction of 44-57% in the metrics when querying 10 DNS servers.

Figure 15: DNS response time distribution (fraction of queries later than a given response time threshold, 0-2 s, for 1, 2, 5 and 10 servers).

Figure 16: Reduction in DNS response time, averaged across 15 PlanetLab servers (% latency reduction in the mean, median, 95th and 99th percentiles versus the number of copies of each query, 2-10).

Figure 17: Incremental latency improvement from each extra server contacted (latency savings in ms/KB for the mean and 99th percentile versus the number of DNS servers, 2-10, shown against the break-even point).
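The client-side mechanism measured in these trials (send the same query to the k highest-ranked resolvers in parallel and keep whichever answer arrives first) is straightforward to implement. The sketch below is a minimal illustration in Python using the third-party dnspython package; the resolver addresses, function names, and error handling are assumptions for illustration and not part of the experimental harness, though the 2-second timeout mirrors the cutoff used above.

from concurrent.futures import ThreadPoolExecutor, as_completed
import dns.resolver   # third-party "dnspython" package (assumed available)

RESOLVERS = ["8.8.8.8", "208.67.222.222", "156.154.70.1"]   # illustrative ranked list

def query_one(server, name, timeout=2.0):
    # Query a single resolver, treating anything slower than 2 s as lost.
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [server]
    resolver.lifetime = timeout
    return resolver.resolve(name, "A")

def replicated_query(name, k=3):
    # Send the same query to the top-k resolvers and return the first answer.
    pool = ThreadPoolExecutor(max_workers=k)
    futures = [pool.submit(query_one, server, name) for server in RESOLVERS[:k]]
    try:
        for future in as_completed(futures):
            try:
                return future.result()     # first successful response wins
            except Exception:
                continue                   # that resolver failed or timed out
        raise RuntimeError("all replicated queries failed")
    finally:
        pool.shutdown(wait=False)          # do not wait for the slower copies

if __name__ == "__main__":
    answer = replicated_query("example.com", k=3)
    print([record.address for record in answer])

Issuing the copies from separate threads keeps the latency of the first response independent of the slower resolvers; the remaining in-flight queries are simply abandoned.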
How many servers should one use? Figure 17 compares the marginal increase in latency savings from each extra server against the 16 ms/KB benchmark [29, 30] discussed earlier in this section. The results show that the answer depends on the metric we care about. If we are only concerned with mean performance, it does not make economic sense to contact any more than 5 DNS servers for each query, but if we care about the 99th percentile, then it is always useful to contact 10 or more DNS servers for every query. Note also that the absolute (as opposed to the marginal) latency savings are still worthwhile, even in the mean, if we contact 10 DNS servers for every query: the absolute mean latency savings from sending 10 copies of every query is 0.1 sec / 4500 extra bytes ≈ 23 ms/KB, which is more than twice the break-even latency savings. And if the client costs are based on DSL rather than cell service, the above schemes are all more than 100× more cost-effective.

Querying multiple servers also increases caching, a side benefit which would be interesting to quantify.

Prefetching — that is, preemptively initiating DNS lookups for all links on the current web page — makes a similar tradeoff of increasing load to reduce latency, and its use is widespread in web browsers. Note, however, that redundancy is complementary to prefetching, since some names in a page will not have been present on the previous page (or there may not be a previous page).

4. RELATED WORK

Replication is used pervasively to improve reliability, and in many systems to reduce latency. Distributed job execution frameworks, for example, have used task replication to improve response time, both preemptively [4, 15] and to mitigate the impact of stragglers [32].

Within networking, replication has been explored to reduce latency in several specialized settings, including replicating DHT queries to multiple servers [22] and replicating transmissions (via erasure coding) to reduce delivery time and loss probability in delay-tolerant networks [21, 27]. Replication has also been suggested as a way of providing QoS prioritization and improving latency and loss performance in networks capable of redundancy elimination [19].

Dean and Barroso [13] discussed Google's use of redundancy in various systems, including a storage service similar to the one we evaluated in §2.2, but they studied specific systems with capabilities that are not necessarily available in general (such as the ability to cancel outstanding partially-completed requests), and did not consider the effect that total system utilization could have on the efficacy of redundancy. In contrast, we thoroughly evaluate the effect of redundancy at a range of loads, both in various configurations of a deployed system (§2.2, §2.3) and in a large space of synthetic scenarios in an abstract system model (§2.1).

The MONET system of Andersen et al. [5] proxies web traffic through an overlay network formed out of multi-homed proxy servers. While the primary focus of [5] is on adapting quickly to changes in path performance, it replicates two specific subsets of traffic: connection establishment requests are sent to multiple servers in parallel (and the first one to respond is used), and DNS queries are replicated to the local DNS server on each of the multi-homed proxy server's interfaces. We show that replication can be useful in both these contexts even in the absence of path diversity: a significant performance benefit can be obtained by sending multiple copies of TCP SYNs to the same server on the same path, and by replicating DNS queries to multiple public servers over the same access link.

In a recent workshop paper [30] we advocated using redundancy to reduce latency, but it was preliminary work that did not characterize when redundancy is helpful, and did not study the systems view of optimizing a fixed set of resources.

Most importantly, unlike all of the above work, our goal is to demonstrate the power of redundancy as a general technique. We do this by providing a characterization of when it is (and is not) useful, and by quantifying the performance improvement it offers in several use cases where it is applicable.

5. CONCLUSION

We studied an abstract characterization of the tradeoff between the latency reduction achieved by redundancy and the cost of the overhead it induces to demonstrate that redundancy should have a net positive impact in a large class of systems. We then confirmed empirically that redundancy offers a significant benefit in a number of practical applications, both in the wide area and in the data center. We believe our results demonstrate that redundancy is a powerful technique that should be used much more commonly in networked systems than it currently is. Our results will also guide the judicious application of redundancy to only those cases where it is a win in terms of performance or cost-effectiveness.

Acknowledgements

We would like to thank our shepherd Sem Borst and the anonymous reviewers for their valuable suggestions. We gratefully acknowledge the support of NSF grants 1050146, 1149895, 1117161 and 1040838.

6. REFERENCES

[1] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat. Hedera: dynamic flow scheduling for data center networks. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, NSDI'10, pages 19–19, Berkeley, CA, USA, 2010. USENIX Association.

[2] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data center TCP (DCTCP). In SIGCOMM, 2010.

[3] M. Alizadeh, S. Yang, S. Katti, N. McKeown, B. Prabhakar, and S. Shenker. Deconstructing datacenter packet transport. In Proceedings of the 11th ACM Workshop on Hot Topics in Networks, HotNets-XI, pages 133–138, New York, NY, USA, 2012. ACM.

[4] G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica. Why let resources idle? Aggressive cloning of jobs with Dolly. In USENIX HotCloud, 2012.

[5] D. G. Andersen, H. Balakrishnan, M. F. Kaashoek, and R. N. Rao. Improving web availability for clients with MONET. In USENIX NSDI, pages 115–128, Berkeley, CA, USA, 2005. USENIX Association.

[6] S. Asmussen. Applied Probability and Queues. Wiley, 1987.

[7] D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel. Finding a needle in Haystack: Facebook's photo storage. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI'10, pages 1–8, Berkeley, CA, USA, 2010. USENIX Association.
[8] T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, pages 267–280, New York, NY, USA, 2010. ACM.

[9] J. Brutlag. Speed matters for Google web search, June 2009. http://services.google.com/fh/files/blogs/google_delayexp.pdf.

[10] Apache Cassandra. http://cassandra.apache.org.

[11] E. W. Chan, X. Luo, W. Li, W. W. Fok, and R. K. Chang. Measurement of loss pairs in network paths. In IMC, pages 88–101, New York, NY, USA, 2010. ACM.

[12] J. Chu. Tuning TCP parameters for the 21st century. http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf, July 2009.

[13] J. Dean and L. A. Barroso. The tail at scale. Commun. ACM, 56(2):74–80, Feb. 2013.

[14] P. Dixon. Shopzilla site redesign – we get what we measure, June 2009. http://www.slideshare.net/shopzilla/shopzillas-you-get-what-you-measure-velocity-2009.

[15] C. C. Foster and E. M. Riseman. Percolation of code to enhance parallel dispatching and execution. IEEE Trans. Comput., 21(12):1411–1415, Dec. 1972.

[16] Google AppEngine datastore: memcached cache. https://developers.google.com/appengine/docs/python/memcache/usingmemcache#Pattern.

[17] W. Gray and D. Boehm-Davis. Milliseconds matter: An introduction to microstrategies and to their use in describing and predicting interactive behavior. Journal of Experimental Psychology: Applied, 6(4):322, 2000.

[18] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: a scalable and flexible data center network. In ACM SIGCOMM, pages 51–62, New York, NY, USA, 2009. ACM.

[19] D. Han, A. Anand, A. Akella, and S. Seshan. RPT: re-architecting loss protection for content-aware networks. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI'12, pages 6–6, Berkeley, CA, USA, 2012. USENIX Association.

[20] C. Hopps. Computing TCP's retransmission timer (RFC 6298), 2000.

[21] S. Jain, M. Demmer, R. Patra, and K. Fall. Using redundancy to cope with failures in a delay tolerant network. In ACM SIGCOMM, 2005.

[22] J. Li, J. Stribling, R. Morris, and M. Kaashoek. Bandwidth-efficient management of DHT routing tables. In NSDI, 2005.

[23] D. S. Myers and M. K. Vernon. Estimating queue length distributions for queues with random arrivals. SIGMETRICS Perform. Eval. Rev., 40(3):77–79, Jan. 2012.

[24] M. Olvera-Cravioto, J. Blanchet, and P. Glynn. On the transition from heavy-traffic to heavy-tails for the M/G/1 queue: The regularly varying case. Annals of Applied Probability, 21:645–668, 2011.

[25] S. Ramachandran. Web metrics: Size and number of resources, May 2010. https://developers.google.com/speed/articles/web-metrics.

[26] K. Sigman. Appendix: A primer on heavy-tailed distributions. Queueing Systems, 33(1-3):261–275, 1999.

[27] E. Soljanin. Reducing delay with coding in (mobile) multi-agent information transfer. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 1428–1433. IEEE, 2010.

[28] S. Souders. Velocity and the bottom line. http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html.

[29] A. Vulimiri, P. B. Godfrey, and S. Shenker. A cost-benefit analysis of low latency via added utilization, June 2013. http://web.engr.illinois.edu/~vulimir1/benchmark.pdf.

[30] A. Vulimiri, O. Michel, P. B. Godfrey, and S. Shenker. More is less: Reducing latency via redundancy. In Eleventh ACM Workshop on Hot Topics in Networks (HotNets-XI), October 2012.

[31] X. Wu and X. Yang. DARD: Distributed adaptive routing for datacenter networks. In Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems, ICDCS '12, pages 32–41, Washington, DC, USA, 2012. IEEE Computer Society.

[32] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In USENIX OSDI, pages 29–42, Berkeley, CA, USA, 2008.

[33] A. P. Zwart. Queueing Systems With Heavy Tails. PhD thesis, Technische Universiteit Eindhoven, September 2001.

CURRICULUM VITAE

Mr. P. Venketesh was born in Coimbatore on 11th April 1979. He completed his schooling at G.D. Matriculation School, Coimbatore, in 1996.

He received his B.Sc. Computer Technology degree from P.S.G. College of Technology, Coimbatore, in May 1999. He obtained his M.Sc. Computer Technology degree from the same institution in May 2001. He also obtained his M.S. (By Research) degree from Anna University, Chennai, in April 2008. He is currently working as Assistant Professor (Senior Grade) in the Department of Computer and Information Sciences, PSG College of Technology, Coimbatore.

He is a life member of ISTE, ACCS and ISSS. His areas of interest include Content Distribution Networks, Web Caching and Prefetching, Distributed Computing and Web Mining.
