
An Efficient Web Caching Algorithm Based on the LFU-K Replacement Policy

© Vladimir V. Prischepa
Chelyabinsk State University
vladimir@csu.ac.ru

PhD advisor: Leonid B. Sokolinsky

Proceedings of the Spring Young Researcher's Colloquium on Database and Information Systems (SYRCoDIS), St. Petersburg, Russia, 2004.

Abstract

In this paper we analyse the effectiveness of the LFU-K replacement policy for caching on proxy servers and present an analysis of traces taken from real proxy servers that reveals a set of properties of network traffic. On the basis of this analysis we conclude that the LFU-K policy, which uses information about the dynamic change of document popularity, is well suited to Web caching. The scheme of the LFU-K policy is given, along with the results of experiments comparing its effectiveness with the most popular replacement algorithms.

1. Introduction.

In the last decade the intensive growth of information volumes in the World-Wide Web has led to the problem of network load. The development of technical means for transmitting data and their introduction into practice do not keep pace with the growth of the Internet. Hence there is a need for other approaches to the problem of network load.

The common way to address it without additional technical means is caching of Web objects (text, images, etc.). Web documents are subdivided into two types: static and dynamic. Most Web objects are static documents, which can be stored in a data 'storage', i.e. a cache, for later use. When such a document is accessed again, the proxy server checks whether it has been modified on the source site and, if it has not, the user receives the page directly from the cache.

Caching is effective enough to be used at different levels: the Internet browser, the proxy server of a local network, and Internet proxy servers. Proxies serving a large set of clients show the highest caching effectiveness. This is connected with the fact that many Web users have correlated requests (common interests). Studies have shown that 25-40% of all requested documents account for 70% of user requests [2]. Moreover, the networks of homogeneous organisations (universities, corporations, etc.) have the highest correlation of requests.

When caching Web objects on a proxy server, as in other kinds of caching, the cache storage is finite, so room must be made for new documents. A replacement policy determines which object is to be removed from the cache. Choosing an effective policy can considerably increase caching effectiveness, reducing network traffic by 20% and more.

The difficulty in selecting a replacement policy lies in the specific characteristics of the network traffic [4]:
- The HTTP protocol gives access only to files of full size, i.e. a proxy server cache can satisfy a user's request only if the file has been stored completely (it cannot serve incomplete objects).
- Documents stored in the proxy server cache are of very different sizes, from several bytes to hundreds of megabytes.
- The stream of requests to the cache is the sum of the request streams of hundreds and thousands of users.

Because document sizes differ, a second effectiveness metric, Byte Hit Rate, is needed in addition to the basic Hit Rate metric. Byte Hit Rate is computed as the ratio of the number of bytes served from the proxy cache to the total number of bytes requested by the users.

Because of the difference in document sizes, Byte Hit Rate is the more adequate metric for judging policy effectiveness, as it is the one that reflects the savings in network traffic.
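Both metrics can be computed directly from an access log. The following minimal sketch illustrates the definitions (the function name and input format are our own, not taken from the paper); it also makes clear why a cache that serves many tiny objects can score a high Hit Rate while its Byte Hit Rate, and hence the traffic savings, stays low.

```python
def hit_rate_and_byte_hit_rate(requests):
    """requests: iterable of (served_from_cache: bool, size_in_bytes: int).

    Hit Rate      = fraction of requests served from the cache.
    Byte Hit Rate = fraction of requested bytes served from the cache.
    """
    hits = total = hit_bytes = total_bytes = 0
    for from_cache, size in requests:
        total += 1
        total_bytes += size
        if from_cache:
            hits += 1
            hit_bytes += size
    hit_rate = hits / total if total else 0.0
    byte_hit_rate = hit_bytes / total_bytes if total_bytes else 0.0
    return hit_rate, byte_hit_rate
```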

Figure 1. Percentage of requests for objects depending on their size (BU trace); size categories: less than 1 Kbyte, 1-5 Kbyte, 5-100 Kbyte, more than 100 Kbyte.
Figure 2. Percentage of requests for the 20% most popular objects, depending on their size (BU trace).
Figure 3. Percentage of traffic in the whole BU trace, depending on the size of objects.
Figure 4. Comparison of the theoretical (Zipf 80-20) distribution and the request distribution in the real BU trace (accesses vs. page number).

2. Proxy traces analysis.

We have thoroughly analysed the traces and found a set of regularities. Most of the requested objects are small, up to 5 kilobytes in size (see Figure 1). This property of the traces is the basis of most modern replacement policies for proxy servers (GD-Size [3], GDSF [4], LRV, SIZE and others). In these policies the most significant parameter in an object's rating is the document size: the smaller a document is, the longer it is kept in the cache, and the larger it is, the sooner it is removed. Such an approach, however, has its disadvantages.
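To make the size bias of these policies concrete, here is a minimal sketch in the spirit of GreedyDual-Size [3]: each cached document carries a priority H = L + cost/size (with cost = 1 here), and on eviction the running inflation value L is raised to the evicted priority, so small documents survive longer. The simulation structure and function name are our own illustration, not code from [3].

```python
def simulate_gd_size(requests, capacity):
    """Toy GreedyDual-Size simulation; `requests` is a list of (page, size)."""
    L = 0.0                 # inflation value
    cache = {}              # page -> (priority H, size in bytes)
    used = 0
    hits = 0
    for page, size in requests:
        if page in cache:
            hits += 1
            cache[page] = (L + 1.0 / size, size)      # refresh priority on a hit
            continue
        # evict lowest-priority pages until the new document fits
        while used + size > capacity and cache:
            victim = min(cache, key=lambda p: cache[p][0])
            L = cache[victim][0]                      # raise L to the evicted priority
            used -= cache.pop(victim)[1]
        if used + size <= capacity:
            cache[page] = (L + 1.0 / size, size)
            used += size
    return hits / len(requests) if requests else 0.0
```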
Firstly, Figure 2 shows that the distribution of requests for the most popular documents over size classes is practically the same as for all documents. Hence documents of different sizes are popular with roughly the same probability. This conclusion agrees with studies showing the lack of correlation between the size of a document and its popularity [2].

Secondly, the traffic generated by small documents (up to 5 Kb) is small in comparison with that generated by large documents (more than 5 Kb) (see Figure 3). This holds both for the whole trace and for the most popular objects. Thus caching small objects leads to high effectiveness in the Hit Rate metric but not in Byte Hit Rate, whereas, as pointed out above, it is a high Byte Hit Rate that implies policy effectiveness in terms of reducing network traffic volume.

Another significant characteristic is the change in the probability of access to documents over time. In relatively small parts of the trace (see Figure 4) the probability of page requests follows a Zipf-like distribution, or the '80-20' rule [2] (80% of requests are addressed to 20% of the documents). In larger parts of the trace, however, the set of the most popular objects changes significantly: some documents lose their popularity while others become more popular.
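The 80-20 property can be checked directly on a trace window by measuring what share of requests goes to the most popular fifth of the documents; applying this over successive windows also exposes how the popular set drifts over time. A small sketch follows (the function name and the window-based usage are our own).

```python
from collections import Counter

def share_of_top_fraction(requests, top_fraction=0.2):
    """Share of requests that go to the `top_fraction` most popular documents.

    For a Zipf-like ('80-20') window of a proxy trace this value is close to 0.8.
    `requests` is a list of document identifiers from one window of the trace.
    """
    counts = Counter(requests)
    ranked = [n for _, n in counts.most_common()]   # request counts, descending
    top_k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:top_k]) / len(requests)
```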
Figure 5. Byte Hit Rate on the BU trace as a function of cache size for the LRU, SIZE, GD-Size, GDSF and LFU-1 policies.
Figure 6. Byte Hit Rate on the CSU trace as a function of cache size for the LRU, GD-Size, GDSF and LFU-1 policies.

For example, the Boston University trace that we analysed contains more than 250 thousand requests and clearly shows changes among many of the popular objects. We took the 600 most popular objects in the first and in the second half of the trace and found that two thirds of them differ.

Thus we conclude that replacement policies are needed that can track changes in object popularity. The LRU policy does capture changes in request probability over time, but it is quite primitive and does not respond properly to sharp changes in popularity. Several special replacement policies for proxy servers based on changes of object popularity over time have been proposed. For example, the Pitkow/Recker policy [6] keeps the documents accessed by users within the last 24 hours; all other documents are marked as outdated and removed. It is evident, however, that the changes in document popularity are more complicated than that.
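The comparison of the two trace halves can be reproduced in a few lines. The paper does not spell out the exact procedure, so the sketch below (function name and details are our own) is one plausible reading: rank documents by request count in each half, take the top n, and measure the overlap (about one third for the top 600 of the BU trace, according to the observation above).

```python
from collections import Counter

def top_set_overlap(requests, n=600):
    """Fraction of the first half's top-n documents that are also top-n in the second half."""
    mid = len(requests) // 2
    top_first = {doc for doc, _ in Counter(requests[:mid]).most_common(n)}
    top_second = {doc for doc, _ in Counter(requests[mid:]).most_common(n)}
    return len(top_first & top_second) / n
```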
3. LFU-1 algorithm.

We have suggested using the LFU-1 algorithm for proxy caching [7]. This algorithm takes into account the specific character of the changes in object popularity, and LFU-1 turned out to be sufficient to achieve high effectiveness in Byte Hit Rate.

Consider the sequence of document requests, where a more recent request has a smaller index and ri denotes the identifier of the requested document:

r1, r2, …, rh, rh+1, …, rm, …

Let ν(i) be the number of occurrences of page i among the latest h requests and s(i) the number of occurrences of page i among the latest m requests (m > h). The rating of page i is

Rating(i) = s(i) + ν(i)·t, where t = m/h.

Using this formula one can compute the ratings of all pages in the cache. When there is no room in the cache, the document with the lowest rating (the least accessed one) is replaced.

The main advantage of this algorithm is that it determines object popularity both over a long period of user accesses (the latest m requests) and over a short one (the latest h requests), which allows it to react quickly to the appearance of new popular objects.

The parameters of the LFU-K algorithm were selected as follows: the value of m was derived from analytical estimations [7], and the value of h from empirical results. In our experiments we set h = 3000.
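The rating above can be maintained incrementally with two sliding request windows. The following sketch keeps counters over the latest h and the latest m requests and evicts the page with the lowest Rating(i) = s(i) + ν(i)·t when space is needed; the class name, data structures and the value of m are our own choices (the paper only fixes h = 3000), not the authors' implementation.

```python
from collections import deque, Counter

class LFUKCache:
    """Minimal sketch of an LFU-K-style proxy cache rating (K = 1, as in Section 3)."""

    def __init__(self, capacity_bytes, h=3000, m=30000):
        # m is not specified in the paper; 30000 is an arbitrary placeholder.
        self.capacity = capacity_bytes
        self.h, self.m = h, m
        self.t = m / h
        self.history = deque()      # the latest m request ids, most recent first
        self.count_h = Counter()    # v(i): occurrences among the latest h requests
        self.count_m = Counter()    # s(i): occurrences among the latest m requests
        self.cache = {}             # page id -> size in bytes
        self.used = 0

    def _record(self, page):
        """Slide both request windows and update the counters."""
        self.history.appendleft(page)
        self.count_h[page] += 1
        self.count_m[page] += 1
        if len(self.history) > self.h:
            self.count_h[self.history[self.h]] -= 1   # request leaving the h-window
        if len(self.history) > self.m:
            self.count_m[self.history.pop()] -= 1     # request leaving the m-window

    def _rating(self, page):
        return self.count_m[page] + self.count_h[page] * self.t

    def access(self, page, size):
        """Process one request; return True on a cache hit."""
        self._record(page)
        if page in self.cache:
            return True
        # evict the lowest-rated pages until the new document fits
        while self.used + size > self.capacity and self.cache:
            victim = min(self.cache, key=self._rating)
            self.used -= self.cache.pop(victim)
        if self.used + size <= self.capacity:
            self.cache[page] = size
            self.used += size
        return False
```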

4. Experiment results.

We have compared the LFU-1 policy with the most popular policies on real proxy server traces (see Figure 5 and Figure 6). Results for two traces are given: the Boston University (BU) trace and the Chelyabinsk State University (CSU) trace.

As Figures 5 and 6 show, the LFU-1 policy achieves higher effectiveness in Byte Hit Rate than the other policies.

5. Conclusion.

In this paper we have studied the effectiveness of the LFU-K policy for caching on a proxy server. The analysis of the traces demonstrates the disadvantages of the basic replacement policies used in real systems. In comparison with popular replacement policies, the LFU-1 algorithm shows higher efficiency. The results of the experiments support the use of the LFU-K algorithm on proxy servers in the Web.

Evaluation of the effectiveness of the LFU-2 algorithm is of special interest for future work. We also plan to try to raise the effectiveness of the caching mechanism by means of combined replacement policies.


References
[1] Arlitt M., Cherkasova L., Dilley J., Friedrich R., Jin T. Evaluating Content Management Techniques for Web Proxy Caches // SIGMETRICS '99, International Conference on Measurement and Modeling of Computer Systems, May 1-4, 1999, Atlanta, Georgia, USA, Proceedings. Performance Evaluation Review 27(1), June 1999. P. 3-11.
[2] Breslau L., Cao P., Fan L., Phillips G., Shenker S. Web Caching and Zipf-like Distributions: Evidence and Implications // IEEE INFOCOM, 1999. P. 1-9.
[3] Cao P., Irani S. Cost-Aware WWW Proxy Caching Algorithms // Proceedings of the 1997 USENIX Symposium on Internet Technologies and Systems, 1997. P. 193-206.
[4] Cherkasova L. Improving WWW Proxies Performance with Greedy-Dual-Size-Frequency Caching Policy // Technical Report HPL-98-69R1, Hewlett-Packard Laboratories, Nov. 1998.
[5] Cunha C., Bestavros A., Crovella M. Characteristics of WWW Client-Based Traces // Technical Report TR-95-010, Computer Science Department, Boston University, Boston, MA 02215, USA, April 1995.
[6] Pitkow J., Recker M. A Simple Yet Robust Caching Algorithm Based on Dynamic Access Patterns // GVU Technical Report VU-GIT-94-39, 1994.
[7] Sokolinsky L.B. LFU-K: An Effective Buffer Management Replacement Algorithm // Database Systems for Advanced Applications, 9th International Conference, DASFAA 2004, Jeju Island, Korea, March 17-19, 2004, Proceedings. Lecture Notes in Computer Science, Vol. 2973. Springer, 2004. P. 670-681.
