
Web Prefetching and Caching using Domain Based Approach

2012-2013

Chapter 1 Introduction

SHAH AND ANCHOR KUTCHHI ENGINEERING COLLEGE


The World Wide Web is a huge information repository. When so many users access this repository, patterns emerge in the way they access web resources. Web request prediction has been implemented in the past, primarily for static content. Growing web content and Internet traffic are making web prediction models very popular. The objective of a prediction model is to identify the subsequent requests of a user, given the user's current request. The server can then prefetch and cache these pages, or pre-send them to the client. The idea is to control the load on the server and thus reduce access time. Careful implementation of this technique can reduce access time and latency, making good use of the server's computing power and the network bandwidth.

The domain based model is a machine learning technique and differs from the data mining approach to web logs. The data mining approach identifies classes of users from their attributes and predicts future actions without considering interactivity and immediate implications. Other techniques, such as prediction by partial matching and information retrieval, may be used in conjunction with Markov modeling to enhance performance and accuracy. Web prediction models, unlike other prediction models, are particularly challenging because of the many states they have to hold and the dynamic nature of the web, in terms of both user actions and continuously changing content. We therefore use the domain based approach along with the click streams or timestamps of web page accesses.

Predictive web prefetching refers to the mechanism of deducing the forthcoming page accesses of a client based on its past accesses. Web prefetching is the process of deducing a client's future requests for web documents and bringing those documents into the cache before an explicit request is made for them.
Prefetching capitalizes on the spatial locality present in request streams, that is, correlated references to different documents, and exploits the client's idle time, i.e., the time between successive requests. The main advantage of prefetching is that it prevents bandwidth underutilization and hides part of the latency. Web prefetching is complementary to caching; it can significantly improve cache performance and reduce the user-perceived latency.

Web caching aims to improve the performance of web-based systems by storing and reusing web objects that are likely to be used in the near future. It has proven to be an effective technique for reducing network traffic, decreasing access latency and lowering server load. Web caching has focused on the use of historical information about web objects to aid cache replacement policies. These policies take into account not only the web document access frequency, but also document sizes and access costs. This past information is used to estimate how often the objects saved in the cache will be accessed again in the near future, and how expensive those accesses will be.

An important advantage of the WWW is that many web servers keep an access log of their users. These logs can be used to train a prediction model for future document accesses. Based on these models, we can obtain frequent access patterns from web logs and mine association rules for path prediction. We incorporate our association-based prediction model into proxy caching and prefetching algorithms to improve their performance. Recently, a few researchers have used mining techniques to explore the browsing behaviour of users in web services.

Web prefetching involves two main steps. First, predictions are made based on previous experience of users' accesses and preferences, and the corresponding hints are provided. Second, the prefetching engine decides which objects are going to be prefetched. The prefetching engine can be located at the web browser or at an intermediate web proxy server. The predictions can be performed by the web server, by the web browser, or by an intermediate proxy. In this work, it is assumed that the web server provides hints and the web client prefetches the hinted objects.
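The prediction step described above (identify the likely next request, given the current one) can be illustrated with a minimal sketch. It is not the implementation used in this project: it assumes a simple first-order model that counts page-to-page transitions in logged sessions, and the class name `NextPagePredictor`, its methods and the sample paths are hypothetical. It also assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical first-order predictor: counts page-to-page transitions seen in access logs. */
final class NextPagePredictor {
    // transitions.get(a).get(b) = number of times page b was requested right after page a
    private final Map<String, Map<String, Integer>> transitions = new HashMap<>();

    /** Trains on one user session, given as the ordered list of requested pages. */
    void train(List<String> session) {
        for (int i = 0; i + 1 < session.size(); i++) {
            transitions
                .computeIfAbsent(session.get(i), k -> new HashMap<>())
                .merge(session.get(i + 1), 1, Integer::sum);
        }
    }

    /** Returns the most frequent successor of the current page, if any was observed. */
    Optional<String> predict(String currentPage) {
        Map<String, Integer> next = transitions.get(currentPage);
        if (next == null) {
            return Optional.empty();
        }
        return next.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        NextPagePredictor predictor = new NextPagePredictor();
        predictor.train(Arrays.asList("/home", "/news", "/sports"));
        predictor.train(Arrays.asList("/home", "/news"));
        // The page returned here is the hint a server could hand to the prefetching engine.
        System.out.println(predictor.predict("/home").orElse("(no prediction)"));
    }
}
```

In the two-step scheme, the value returned by `predict` plays the role of the hint, and the component that decides whether to actually download the hinted page plays the role of the prefetching engine.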


Chapter 2 Literature Survey

A Model of Data Warehousing Process Maturity

[1]. Even though data warehousing (DW) requires huge investments, the data warehouse market is experiencing incredible growth. However, a large number of DW initiatives end up as failures. In this paper, we argue that the maturity of a data warehousing process (DWP) could significantly mitigate such large-scale failures and ensure the delivery of consistent, high quality, single-version of truth data in a timely manner. However, unlike software development, the assessment of DWP maturity has not yet been tackled in a systematic way. In light of the critical importance of data as a corporate resource, we believe that the need for a maturity model for DWP could not be greater. In this paper, we describe the design and development of a five-level DWP maturity model (DWP-M) over a period of three years. A unique aspect of this model is that it covers processes in both data warehouse development and operations. Over 20 key DW executives from 13 different corporations were involved in the model development process. The final model was evaluated by a panel of experts; the results strongly validate the functionality, productivity, and usability of the model. We present the initial and final DWP-M model versions, along with illustrations of several key process areas at different levels of maturity.

The data warehouse maturity process has some issues:
1. It is not a foolproof process.
2. Even after spending a very long time, there is no assurance that the data warehousing process will mature.
3. It is a very expensive process.
4. It is time consuming.

Hence this topic was not taken up as the project topic.

I. Modelling Massive RFID Data Sets: A Gateway-Based Movement Graph Approach

[2]. The increasingly wide adoption of RFID technology by retailers to track containers, pallets, and even individual items as they move through the global supply chain, from factories in exporting countries, through transportation ports, and finally to stores in importing countries, creates enormous data sets containing rich multidimensional information on the movement patterns associated with objects and their characteristics. However, this information is usually hidden in terabytes of low-level RFID readings, making it difficult for data analysts to gain insight into the set of interesting patterns influencing the operation and efficiency of the procurement process. In order to realize the full benefits of detailed object tracking information, we need to develop a compact and efficient RFID cube model that provides OLAP-style operators useful to navigate through the movement data at different levels of abstraction of both spatiotemporal and item information dimensions. This is a challenging problem that cannot be efficiently solved by traditional data cube operators, as RFID data sets require the aggregation of high-dimensional graphs representing object movements, not just that of entries in a flat fact table. We propose to model the RFID data warehouse using a movement graph-centric view, which makes the warehouse conceptually clear and better organized, and obtains significantly deeper compression and performance gain over competing models in the processing of path queries. The importance of the movement graph approach to RFID data warehousing can be illustrated with an example.

Example. Consider a large retailer with a global supplier and distribution network that spans several countries and that tracks objects with RFID tags placed at the item level. Such a retailer sells millions of items per day through thousands of stores around the world, and for each such item, it records the complete set of movements between locations, starting at factories in producing countries, going through the transportation network, and finally arriving at a particular store where the item is purchased by a customer. The complete path traversed by each item can be quite long, as readers are placed at very specific locations within factories, ships, and stores (e.g., a production lane, a particular truck, or an individual shelf inside a store). Further, for each object movement, properties such as shipping cost, temperature, and humidity can be recorded.

There are some issues in :

II. Web Prefetching and Caching using Domain Based Approach


[6]. The World Wide Web is a huge information repository. When so many users access this repository, patterns emerge in the way they access web resources. Web request prediction has been implemented in the past, primarily for static content. Growing web content and Internet traffic are making web prefetching models popular. The objective of a prefetching model is to identify the subsequent requests of a user, given the user's current request. A web prediction model helps to predict user requests ahead of time, making web servers more responsive: the server can prefetch and cache the predicted pages, or pre-send them to the client, to reduce web latency. The idea is to control the load on the server and thus reduce access time. Careful implementation of this technique can reduce access time and latency, making good use of the server's computing power and the network bandwidth.

Web prefetching involves two main steps. First, predictions are made based on previous experience of users' accesses and preferences, and the corresponding hints are provided. Second, the prefetching engine decides which objects are going to be prefetched. The prefetching engine can be located at the web browser or at an intermediate web proxy server. The predictions can be performed by the web server, by the web browser, or by an intermediate proxy. In this work, it is assumed that the web server provides hints and the web client prefetches the hinted objects.
A. Link Prefetching

[9] This mechanism utilizes browser idle time to download documents that the user might visit in the future. A web page provides a set of prefetching hints to the browser; after the browser finishes loading the page, it starts prefetching the specified documents and stores them in its cache. When the user visits one of the prefetched documents, it can be served quickly out of the browser's cache. Fisher et al. proposed a server-driven approach to link prefetching. In this approach, the browser follows special directives from the web server or proxy server that instruct it to prefetch specific documents. This mechanism allows servers to control the content to be prefetched by the browser. The browser looks for either an HTML <link> tag or an HTTP Link: header to find the links to prefetch. The Link: header can also be specified within the HTML document itself by using an HTML <meta> tag. When the browser is idle, it observes these hints and queues up each unique request to be prefetched. However, in this approach only the links are prefetched. Hence, we shall not use this algorithm for our project.

B. Top 10 Approach

[8] Evangelos P. Markatos et al. propose a top 10 approach to prefetching on the web, in which the server calculates the list of its most popular documents. This approach is easy to implement in a client-server architecture. It considers only the frequency of access when predicting web objects, not the characteristics of the client. Hence, we shall not use this algorithm for our project.

C. Domain Top Approach

[13] Seung Won Shin et al. propose a domain top approach for web prefetching, which combines the proxy's active knowledge of the most popular domains and documents. In this approach, the proxy is responsible for calculating the most popular domains and the most popular documents in those domains, and then prepares a rank list for prefetching. The Domain Top Approach process is explained with the help of the diagram below.


Figure 1. Structure of rank list for Domain Top Approach

We shall implement this algorithm in our project.
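A rank list of the kind used by the Domain Top Approach can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm as published in [13] nor the code of this project: popularity is taken to be a plain request count, the domain is taken to be the host part of the URL, and the class `DomainTopRanker` with its methods is a hypothetical name. Java 8 or later is assumed.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical proxy-side ranker: most popular domains first, then their most popular documents. */
final class DomainTopRanker {
    // hits.get(domain).get(url) = number of requests seen for that document
    private final Map<String, Map<String, Integer>> hits = new HashMap<>();

    /** Records one request passing through the proxy. */
    void record(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return; // skip URLs without a host part
        }
        hits.computeIfAbsent(host, k -> new HashMap<>()).merge(url, 1, Integer::sum);
    }

    /** Builds the rank list: the top documents of each of the top domains. */
    List<String> rankList(int topDomains, int topDocsPerDomain) {
        List<String> ranked = new ArrayList<>();
        hits.entrySet().stream()
            .sorted((a, b) -> Integer.compare(total(b.getValue()), total(a.getValue())))
            .limit(topDomains)
            .forEach(domain -> domain.getValue().entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(topDocsPerDomain)
                .forEach(doc -> ranked.add(doc.getKey())));
        return ranked;
    }

    private static int total(Map<String, Integer> docs) {
        return docs.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        DomainTopRanker ranker = new DomainTopRanker();
        ranker.record("http://a.com/1");
        ranker.record("http://a.com/1");
        ranker.record("http://b.org/x");
        System.out.println(ranker.rankList(2, 1));
    }
}
```

The documents in the returned list are the candidates a proxy would prefetch, in order. Ties between equally popular domains or documents are broken arbitrarily in this sketch.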


Chapter 3 Problem Statement


The user is allowed to browse the Internet over a period of time, during which a pattern of usage is generated. Based on this pattern, we predict which web pages the user will open next. We then prefetch these web pages and store them in the cache. The web pages belonging to the same domain are also prefetched and stored in the cache.


Chapter 4 Requirement Analysis

4.1 SRS DOCUMENT


4.1.1 INTRODUCTION



4.1.1.1 Purpose


This specification document describes the capabilities that will be provided by the Web Prefetching and Caching using Domain Based Model application. It also states the constraints by which the system will abide.

4.1.1.2 Scope

The Web Prefetching and Caching product will be a simulation application used for prefetching the pages that were used most recently or that are most likely to be opened. These pages are determined from the usage pattern of the user.

4.1.1.3 Intended Audience

The intended audience for this document is the development team, the testing team and end users, who can be any person from any domain.

4.1.1.4 Definitions, Acronyms and Abbreviations

Prefetching: Loading data from the Internet before it is put into use is called prefetching.

Semantics: Words having similar meaning are said to be semantic words.

4.1.1.5 References

i. Format of Project Synopsis Report given by SAKEC
ii. IEEE Recommended Practice for Software Requirements Specifications, IEEE Std 830-1998

4.1.2 OVERALL DESCRIPTION

The World Wide Web is a huge information repository. When so many users access this information repository, it is easy to find certain patterns in the way they access web resources. Web request prediction has been implemented in the past, primarily for static content. Increasing web content and Internet traffic are making web prediction models very popular. The objective of a prefetching model is to identify the subsequent requests of a user, given the current request that the user has made. This way the server can prefetch and cache these pages, or it can pre-send them to the client. The idea is to control the load on the server and thus reduce the access time.


Careful implementation of this technique can reduce access time and latency, making optimal usage of the server's computing power and the network bandwidth.

4.1.2.1 Product Perspective

Web Prefetching

4.1.2.2 Product Functions

Domain Based Model

We will prefetch the pages that belong to the same domain or that follow the data usage pattern. This pattern is generated using the domain based model and click stream based web prediction.

4.1.2.3 Operating Environment

The operating environment required for this application to run successfully is:
1. JDK 1.3.0_02 installed on the system.
2. Java NetBeans and MySQL with the necessary connector.
3. Windows environment.

4.1.3 EXTERNAL INTERFACE REQUIREMENTS

4.1.3.1 User Interfaces

1. Here the user provides the list of web pages.
2. This list is directly associated with the database.

4.1.3.2 Hardware Interface

1. Screen resolution of at least 1280*720, required for proper and complete viewing of screens. A higher resolution would not be a problem.


2. Standalone or network based system: not a concern, as it will be possible to run the application on either.

4.1.3.3 Software Interface

1. Any Windows-based operating system (Windows 95/98/2000/NT/XP/Vista/7).
2. Java NetBeans, for coding and developing the software application.
3. Microsoft Office, to enable the user to access the presentations on web prefetching.
4. Adobe, to enable the user to access the user manual.

4.1.4 SYSTEM FEATURES

We will prefetch the pages that belong to the same domain or that follow the data usage pattern. This pattern is generated using the domain based model with click stream.
4.1.5 OTHER NON FUNCTIONAL REQUIREMENTS

4.1.5.1 Safety Requirements

For safety, we must take care that only safe and clear websites are opened. Otherwise the hardware components, and thereby the system, could be affected.

4.1.5.2 Security Requirements

None

4.1.5.3 Software Quality Attributes

i. Maintainability: This application will be designed in a maintainable manner. It will be easy to incorporate new requirements in the individual modules.

ii. Portability: This application will be easily portable to any Windows-based system that has the JDK installed.

iii. Reusability: This application aims at reusing the same modules so that improvements in web prefetching can be made easily and more algorithms can be added.

iv. Usability: This application will be used by the development team, the testing team and end users, who can be any person from any domain.

v. Testability: This application will be easy to test. Test cases and other documents relating to testing will be provided to make testing easy.

vi. Reliability: This application is expected to work and to prefetch as many websites as possible that belong to the same domain or follow the same pattern of data use.

Chapter 5 Project Design


5.1 System Flowchart

5.2 Usecase Diagram


5.3 Activity Diagram



5.4 Class Diagram


Chapter 6 Implementation Details

Module wise system implementation


Module 1

Extracting the History of the User

The software reads the user's web access history. From this history, we consider only the web pages that were viewed for longer than a particular time interval, e.g. 5 seconds.
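The filtering rule of Module 1 can be sketched as below. The report does not state how viewing time is measured, so this sketch assumes that each history entry carries the time at which the page was opened and estimates the viewing time as the gap to the next entry. The names `HistoryFilter`, `Visit` and `longVisits` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical history filter: keeps only pages viewed longer than a threshold. */
final class HistoryFilter {
    /** One history entry: the URL and the time (in milliseconds) at which it was opened. */
    static final class Visit {
        final String url;
        final long openedAtMillis;

        Visit(String url, long openedAtMillis) {
            this.url = url;
            this.openedAtMillis = openedAtMillis;
        }
    }

    /**
     * Returns the URLs whose viewing time exceeds minDwellMillis. Viewing time is estimated
     * as the gap to the next visit, so the last visit (which has no successor) is dropped.
     * The history is assumed to be sorted by opening time.
     */
    static List<String> longVisits(List<Visit> history, long minDwellMillis) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i + 1 < history.size(); i++) {
            long dwell = history.get(i + 1).openedAtMillis - history.get(i).openedAtMillis;
            if (dwell > minDwellMillis) {
                kept.add(history.get(i).url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Visit> history = Arrays.asList(
            new Visit("http://example.com/a", 0L),
            new Visit("http://example.com/b", 2_000L),
            new Visit("http://example.com/c", 10_000L));
        // Threshold of 5 seconds, as in the example interval mentioned above.
        System.out.println(longVisits(history, 5_000L));
    }
}
```

Dropping the last visit is a design choice of this sketch; an implementation with access to explicit page-close times would not need it.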

Module 2



Creating Cache


Based on the web access history, we split the pages into their specific domains. Domains are assigned according to the domain suffix of the website URL. The domains set up in our project are Commercial (.com), Organizational (.org), Information (.info), Networks (.net), Education, Governmental (.gov), and Others (.in, .ca, etc., i.e. web pages of specific countries). When the contents of the cache are deleted, the cache is cleared.
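The grouping of pages by domain suffix can be sketched as follows. This is an illustration rather than the project's code: the class `DomainClassifier` and its method are hypothetical names, and the `.edu` suffix for the Education group is an assumption, since the list above gives no suffix for it.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical classifier: maps a URL to one of the domain groups by its last host label. */
final class DomainClassifier {
    private static final Map<String, String> GROUPS = new HashMap<>();
    static {
        GROUPS.put("com", "Commercial");
        GROUPS.put("org", "Organizational");
        GROUPS.put("info", "Information");
        GROUPS.put("net", "Networks");
        GROUPS.put("edu", "Education"); // assumption: the report lists no suffix for Education
        GROUPS.put("gov", "Governmental");
    }

    /** Returns the group name, or "Others" for country suffixes and anything unrecognized. */
    static String classify(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return "Others";
        }
        String suffix = host.substring(host.lastIndexOf('.') + 1).toLowerCase(Locale.ROOT);
        return GROUPS.getOrDefault(suffix, "Others");
    }

    public static void main(String[] args) {
        System.out.println(classify("http://www.example.com/index.html"));
        System.out.println(classify("https://india.gov.in/"));
    }
}
```

Because only the last label of the host is inspected, a host such as `india.gov.in` falls under Others (country suffix) rather than Governmental, which matches the grouping listed above.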


Chapter 7 Testing

Development of test cases (functional)


| Test case ID | Objective | Steps / Description | Input | Expected Output | Actual Output | Result | Remark |
|---|---|---|---|---|---|---|---|
| 1 | Open the Cache Booster Software | Click on Ram Booster | Mouse Event | Cache Booster Software should open | Cache Booster Software opens up | Pass | Cache Booster Software is available |
| 2 | Run the Cache Booster Software | Click on Run Button on the Cache Booster Software | Mouse Event | Entire list of webpages in History to be shown | Entire list of webpages shown | Pass | The cache booster software is running successfully |
| 3 | Check the cache contents | Click on Cache Folder in the Software's Folder | Mouse Event | A list of files from the most frequently used pages should appear | A list of files from the most frequently used pages appeared | Pass | Cache contents are checked and verified |
| 4 | Delete contents of cache and re-execute the software | Click on Cache folder, delete the contents and re-execute the Cache Booster software | Mouse and Keyboard Event | Cache list should be deleted and, after successful execution, the webpages should appear in cache | Cache list deleted and, after successful execution, the webpages appeared in cache | Pass | Software working perfectly |


Chapter 8 Results and Analysis


We compared the performance of the prefetch-based LFU (least frequently used) cache with the traditional LFU cache. By grouping the users and analysing their previous access patterns, the system is able to cache pages that might be of interest to the users. By caching or prefetching the pages, the access speed of these pages increased considerably. The system also helps us use the cache efficiently, since pages are cached based on the access history of a group of users and not of individual users.
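The kind of comparison described above can be illustrated with a small LFU cache simulation. The report does not give the details of its prefetch-based LFU, so everything below is an assumption made for illustration: the class `LfuCache`, the rule that a prefetched page enters the cache with a use count of zero, and the hit ratio as the measure being compared.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical LFU cache simulation with an optional prefetch operation. */
final class LfuCache {
    private final int capacity;
    private final Map<String, Integer> useCount = new HashMap<>(); // cached URL -> access count
    private int hits;
    private int requests;

    LfuCache(int capacity) {
        this.capacity = capacity;
    }

    /** Simulates a user request; returns true on a cache hit. A miss loads the page into the cache. */
    boolean request(String url) {
        requests++;
        if (useCount.containsKey(url)) {
            useCount.merge(url, 1, Integer::sum);
            hits++;
            return true;
        }
        insert(url, 1);
        return false;
    }

    /** Loads a predicted page ahead of time; it is not counted as a user request. */
    void prefetch(String url) {
        if (!useCount.containsKey(url)) {
            insert(url, 0);
        }
    }

    double hitRatio() {
        return requests == 0 ? 0.0 : (double) hits / requests;
    }

    private void insert(String url, int initialCount) {
        if (capacity <= 0) {
            return;
        }
        if (useCount.size() >= capacity) {
            // Evict the least frequently used entry (linear scan, acceptable for a sketch).
            String victim = null;
            for (Map.Entry<String, Integer> e : useCount.entrySet()) {
                if (victim == null || e.getValue() < useCount.get(victim)) {
                    victim = e.getKey();
                }
            }
            useCount.remove(victim);
        }
        useCount.put(url, initialCount);
    }

    public static void main(String[] args) {
        LfuCache plain = new LfuCache(2);
        LfuCache withPrefetch = new LfuCache(2);
        withPrefetch.prefetch("/b"); // a prediction made before the user asks for /b
        for (String url : new String[] {"/a", "/b", "/a"}) {
            plain.request(url);
            withPrefetch.request(url);
        }
        System.out.println(plain.hitRatio() + " vs " + withPrefetch.hitRatio());
    }
}
```

On the same request trace, a correct prediction turns the first request for a page from a miss into a hit, which is the effect the comparison measures. A wrong prediction, by contrast, can evict a useful page, so the benefit depends on prediction accuracy.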


Chapter 9 Conclusion and Future Scope


In recent years, there has been considerable research into novel methods and techniques for grouping users based on the information hidden in their browsing patterns. In this project, we present our approach to grouping hosts (each host represents an organizationally related group of users) according to their web request patterns. We use the domain based prefetching algorithm with timestamps to cluster these communities of requests. We compare the performance of the prefetch-based LFU with the traditional LFU. By grouping the users and analyzing their previous access patterns, the system is able to predict pages that might be of interest to the users. By caching or prefetching the pages, the access speed of these pages can be increased considerably. The system also helps us use the cache efficiently, since pages are cached based on the access history of a group of users and not of individual users.
