A. Prefetching Principles

Web prefetching is a technique for reducing web latency based on predicting the web objects most likely to be accessed next by the user and prefetching them during idle times. Thus, if the user eventually requests any of these objects, they will already be in the client cache. This technique takes advantage of the spatial locality shown by web objects [1].

The prefetching technique has two main components: the prediction engine and the prefetching engine. The prediction engine runs a prediction algorithm to predict the user's next requests and provides these predictions as hints to the prefetching engine. The prefetching engine handles the hints and decides whether or not to prefetch them depending on certain conditions, such as the available bandwidth or the idle time. Both engines can work at any component of the web architecture (clients, proxies, or servers) [1]. Figure 1 shows a generic prefetching scheme with the prediction engine located at the web server side and the prefetching engine at the client side. The prediction engine delivers the predictions to the prefetching engine as hints attached to the server response. These hints will be prefetched during idle times or whenever there are enough available resources (i.e., bandwidth) [13], [3]. This scheme can be implemented in the real world without modifying the HTTP standard, as demonstrated in [14].

B. Prefetching terminology

The Predictions (PD), also known as Hints, are the number of objects predicted by the prediction engine. Prefetch request (PR) represents the number of objects prefetched. The number of prefetched objects that are requested later by the user is the Prefetch hit (PH). The opposite of the Prefetch hit is the Prefetch miss (PM), which represents the number of prefetched objects that were never demanded by the user (i.e., extra traffic). Finally, User request (UR) refers to the total amount of objects requested by the user (prefetched or not), and the User request not prefetched (URnP) represents the number of objects demanded by the user that were not prefetched.

As shown in Figure 2, the set of Prefetch requests (PR) is a subset of the Prediction set (PD). The intersection of the User request set (UR) and the Prefetch request set is the Prefetch hit subset (PH). This subset is the main factor in reducing the perceived latency. In Figure 2, A represents a User request not prefetched (URnP), i.e., a user request neither predicted nor prefetched. B is a Prefetch request made by the prefetching engine that is requested later by the user, thus becoming a Prefetch hit. C is a Prefetch miss (PM) resulting from an unsuccessful prediction that was prefetched but never demanded by the user. This request becomes extra traffic and extra server load.

Fig. 2. Web prefetching type of requests

C. Prediction algorithms

Previous studies [15] demonstrated that the Double Dependency Graph (DDG) prediction algorithm presents, for the current web structure, a better cost-benefit relationship than other algorithms from the final user's point of view. It achieves a noticeable latency reduction over other traditional algorithms. Therefore, our proposal to control the prefetching traffic is designed for this algorithm.
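The set relations defined in the terminology section reduce to ordinary set operations. The following is an illustrative Python sketch; the object names and set contents are invented for the example and do not come from the traces:

```python
# Illustrative sketch of the prefetching terminology as Python sets.
# Only the set relations (PR ⊆ PD, PH = UR ∩ PR) follow the text;
# the object names are hypothetical.

predictions = {"a.html", "b.png", "c.css", "d.js"}     # PD: hints from the predictor
prefetch_requests = {"a.html", "b.png", "c.css"}       # PR: actually prefetched
user_requests = {"a.html", "b.png", "x.html"}          # UR: all objects the user demands

prefetch_hits = user_requests & prefetch_requests      # PH = UR ∩ PR
prefetch_misses = prefetch_requests - user_requests    # PM: prefetched, never demanded
ur_not_prefetched = user_requests - prefetch_requests  # URnP

# PR is a subset of PD, as Figure 2 shows.
assert prefetch_requests <= predictions
print(sorted(prefetch_hits))     # objects served from the client cache
print(sorted(prefetch_misses))   # extra traffic and extra server load
```

With these example sets, `a.html` and `b.png` are prefetch hits, `c.css` is a prefetch miss (extra traffic), and `x.html` is a user request that was never prefetched.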
The DDG algorithm is based on a graph that keeps track of the dependencies among the objects accessed by the user. It distinguishes two classes of dependencies: dependencies to an object of the same page and dependencies to an object of another page. The graph has a node for every object that has ever been accessed. There is an arc from one node (X) to another (Y) if, and only if, at some point in time a client accessed node Y within w accesses after node X, where w is the lookahead window size. The arc is a primary arc if X and Y are objects of different pages, that is, either Y is an HTML object or the user accessed one HTML object between X and Y. If there are no HTML accesses between X and Y, the arc is secondary. The confidence of each primary or secondary transition, i.e., the confidence of each arc, is calculated by dividing the counter of the arc by the number of appearances of the origin node, both for primary and for secondary arcs.

TABLE I
TRACES CHARACTERISTICS

Trace                     Elpais    Marca
Year                      2007      2007
No. of Accesses           505868    423559
No. of Pages              20253     29942
Avg. objects per page     24        14
No. of Users              892       1180
Bytes transferred (GB)    1.48      2.06
Avg. object size (KB)     3.08      5.10
Avg. page size (KB)       77.08     75.93
Avg. HTML size (KB)       30.55     14.52
Avg. image size (KB)      1.93      4.38

The traffic increase (ΔTraff_obj) is the number of objects transferred over the network when prefetching is employed divided by the number of objects transferred in the non-prefetching case. Nevertheless, this equation cannot be solved at the server side because in the current HTTP implementation there is no mechanism to notify the server when a prefetch hit occurs at the client side. Consequently, the number of Prefetch hits or Prefetch misses is unknown to the server. Notice that this latter term is directly related to the traffic increase and the server overload.
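The DDG construction described above can be sketched in a few lines, assuming accesses arrive as an ordered list of (object, is_html) pairs. The helper names and data layout are ours, not the paper's implementation:

```python
from collections import defaultdict

def build_ddg(accesses, w):
    """Simplified Double Dependency Graph construction.

    accesses: list of (object_id, is_html) pairs in access order.
    w: lookahead window size.
    Returns per-node appearance counters and arc counters keyed by
    (X, Y, kind), where kind is 'primary' or 'secondary'.
    """
    node_count = defaultdict(int)
    arc_count = defaultdict(int)
    for i, (x, _) in enumerate(accesses):
        node_count[x] += 1
        # Consider the next w accesses after X.
        for j in range(i + 1, min(i + 1 + w, len(accesses))):
            y, y_is_html = accesses[j]
            # Primary arc: Y is an HTML object, or an HTML access
            # occurred between X and Y (different pages); else secondary.
            html_between = any(h for _, h in accesses[i + 1:j])
            kind = "primary" if (y_is_html or html_between) else "secondary"
            arc_count[(x, y, kind)] += 1
    return node_count, arc_count

def confidence(node_count, arc_count, x, y, kind):
    # Confidence of an arc = arc counter / appearances of the origin node.
    return arc_count[(x, y, kind)] / node_count[x]
```

For example, in the access sequence `[("p1.html", True), ("a.png", False), ("b.png", False), ("p2.html", True), ("c.png", False)]` with w=2, the arc `a.png → p2.html` is primary (Y is an HTML object) while `p1.html → a.png` is secondary (same page, no HTML access in between).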
III. EXPERIMENTAL ENVIRONMENT

In this section we describe the experimental framework used to develop and test the adaptive prefetching mechanism.

The experiment was performed over the simulation framework described before, considering the DDG prediction algorithm with primary and secondary thresholds ranging from 0.1 to 0.9. The workload used is the same as described in Section III-B, but taking into account only the accesses of the first day.

Fig. 3. Traffic estimation model

Figure 3 shows, on the horizontal axis, the prefetch rate calculated at the server side using Equation 2 and, on the vertical axis, the traffic increase at the client side calculated using Equation 1. The relationship between prefetch rate and traffic increase follows a quadratic behavior.

Using a best-fit recursive least-squares method, we represent the behavior shown in Figure 3 by a mathematical expression. The traffic estimation model obtained is expressed in Equation 3 and shown graphically in Figure 3 as f(x). Table II presents the constant values obtained with the recursive fit method for this set of experiments.

ΔTraff_est = a(PrefRate)^2 + b(PrefRate) + c    (3)

TABLE II
TRAFFIC ESTIMATION VALUES

Constant    Value
a           0.90371720333486
b           1.04393086694572
c           1.00455739964836

The calculated model and its constants depend on the prediction algorithm used. If other prediction algorithms are considered, the traffic estimation must be recalculated. Once the traffic increase is estimated at the server side, the next step is to control it.

V. INTELLIGENT CONTROL MECHANISM

In the prediction algorithms, the prefetching aggressiveness can be controlled by the aforementioned threshold cutoff parameter; consequently, the number of predictions given directly depends on this parameter value. By decrementing the threshold value we increase the number of predictions and, in consequence, the number of prefetch requests.

error = ΔTraff_aim − ΔTraff_est    (4)

Threshold = Threshold + [k ∗ error]    (5)

Fig. 4. Adaptive web prefetching mechanism

The suitable value for the proportional constant parameter k in Equation 5 should be the one that takes the system to the aimed range in a reasonable period of time and keeps the system as stable as possible within that range. As a reasonable period of time we consider the time equivalent to 5,000 user requests, because the experiments showed that the system becomes stable after this period. To evaluate the stability of the system under different values of k, we consider the minimum error variance once the system has reached that period.

The frequency at which the control mechanism is applied (the control period) must be assessed based on the root mean square error (RMSE) of ΔTraff_real with respect to ΔTraff_aim. Equation 6 shows the RMSE, where ΔTraff_aim and ΔTraff_real are measured at the end of an experiment at the client side. We consider that a reasonable control period is the one that achieves the lowest RMSE.

RMSE = sqrt( (1/n) · Σ_{i=1}^{n} (ΔTraff_aim − ΔTraff_real,i)^2 )    (6)

In search of the best parameters for the adaptive prefetching mechanism, we performed a total of one hundred and ten experiments, modifying the value of k from 0.01 to 0.25 and the control period, expressed as a number of user requests, between 50 and 250.

Table III shows the results for each experiment considering the RMSE metric. To select the best k/control period combination we looked for the lowest RMSE value. Consequently, the best pair for the trace Elpais is k=0.15, control period = 125 UR; for the trace Marca it is k=0.15, control period = 175 UR. Therefore, we assume k=0.15 and a control period of 150 UR in our study.

TABLE III
ROOT MEAN SQUARE ERROR RESULTS (MARCA TRACE)

k \ Control period (UR)  50          75          100         125         150         175         200         225         250
0.01                     0.12017241  0.09391858  0.09363813  0.08443433  0.09016653  0.09107501  0.09148021  0.08965788  0.09093917
0.05                     0.13128704  0.09211483  0.08116091  0.07737675  0.07636098  0.07510443  0.08113568  0.08042448  0.08494807
0.10                     0.17200001  0.10473971  0.08889021  0.07463457  0.07316817  0.07610991  0.07295795  0.08086589  0.07832234
0.15                     0.23803850  0.14506356  0.10687394  0.08527964  0.07580759  0.07153844  0.07933723  0.07452391  0.07849942
0.20                     0.25933310  0.17036495  0.10605934  0.13956370  0.08934768  0.11475729  0.07537037  0.07815116  0.07468374
0.25                     0.34932128  0.21289296  0.18603971  0.13371420  0.11953641  0.10615474  0.12548237  0.13252638  0.08195403
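Equations 3 to 5 describe a simple proportional feedback loop around the fitted traffic model. A minimal sketch follows, using the constants from Table II; the function names and the clamping range are ours, and the sign of the correction is an assumption (the printed Equation 5 only states Threshold = Threshold + [k ∗ error], but since decrementing the threshold increases prefetching, traffic below the aim must lower the threshold):

```python
# Sketch of the traffic estimation model (Eq. 3) and the proportional
# threshold controller (Eqs. 4 and 5). Constants a, b, c are from
# Table II; names and the clamping range are illustrative.
A = 0.90371720333486
B = 1.04393086694572
C = 1.00455739964836

def estimate_traffic(pref_rate):
    # Eq. 3: quadratic fit of traffic increase vs. prefetch rate.
    return A * pref_rate**2 + B * pref_rate + C

def update_threshold(threshold, traff_aim, pref_rate, k=0.15):
    # Eq. 4: estimation error with respect to the aimed traffic.
    error = traff_aim - estimate_traffic(pref_rate)
    # Eq. 5 (sign assumed): traffic below the aim (positive error)
    # lowers the threshold, making prefetching more aggressive.
    threshold = threshold - k * error
    # Keep the threshold inside the confidence range used in the
    # experiments (0.1 to 0.9) -- an assumption.
    return min(0.9, max(0.1, threshold))
```

Each control period, the server would measure the prefetch rate, estimate the client-side traffic increase with `estimate_traffic`, and nudge the DDG cutoff threshold with `update_threshold` toward the aimed traffic.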
A. Prediction algorithm parameters

The parameter values used for the prediction algorithm are:
• Primary threshold = adaptive, but starting at 0.8 for a conservative experiment startup.
• Secondary threshold = 0.3.

B. Performance metrics

From the taxonomy of prefetching metrics presented in [18], we selected the following metrics to evaluate the performance of our proposal:
• ∇Latency_Page: the latency per page ratio, i.e., the ratio between the latency achieved with prefetching and the latency with no prefetching.
• ΔTraff_obj: defined by Equation 1.
• ΔTraff_bytes: the same as ΔTraff_obj, but measured in bytes.
• Precision: the ratio of prefetch hits to the total number of prefetched objects.
• Recall: the ratio of objects requested by the user that were previously prefetched. This metric is the prediction index that best explains the latency per page ratio.
• RMSE, or the real error, defined by Equation 6.

C. Performance Evaluation

To evaluate the effectiveness of the adaptive prefetching mechanism, we performed two sets of experiments for each trace. The first set consisted of experiments with non-adaptive prefetching for both traces, while the second set was performed with the adaptive prefetching mechanism.

In order to make a fair comparison between the non-adaptive and the adaptive prefetching mechanisms, the aimed traffic value to be achieved by the adaptive mechanism was set to the average traffic obtained in the experiment using non-adaptive prefetching. Thus, both sets of experiments generate a similar mean traffic increase, and their performance can be evaluated through the rest of the measured parameters, especially through the RMSE.

Figures 5 to 10 show the evolution over time, measured in user requests, of the experiments for the traces Elpais and Marca. In these figures the traffic increase, the threshold and the RMSE can be graphically observed and evaluated during the experiment.

[Figures 5-9: Object Traffic Increase Ratio and Threshold versus Time (User Requests).]
Fig. 6. Elpais Adaptive Object Traffic Increase
Fig. 8. Marca Object Traffic Increase

Concerning the trace Elpais, Figure 5 shows the non-adaptive prefetching traffic increase and its average, obtained from an experiment with a fixed threshold of 0.4. In order to fairly evaluate both experiments, the average traffic increase of this non-adaptive experiment becomes the aimed traffic to be reached by the adaptive prefetching experiment. On the other hand, Figure 6 shows the effect of the adaptive prefetching mechanism through the controlled traffic increase and its average; the variations of the dynamic threshold can be seen in Figure 7.

A secondary positive effect of the intelligent prefetching mechanism can be seen in Figures 8 and 9 for the Marca trace.

Finally, as we can appreciate in Table IV, the adaptive prefetching mechanism provides benefits similar to the non-adaptive one, decreasing the page latency. Since the aimed traffic value set for the adaptive mechanism is the average ΔTraff_obj from the non-adaptive one, the cost of both techniques is similar. The metric that best reflects the performance of the adaptive control mechanism is the RMSE, because it measures the deviation of the traffic increase from the aimed traffic. A higher RMSE value means that the traffic is far from the aimed traffic, while a smaller value indicates that it stays closer to it. Therefore, a smaller value is always desirable when control is applied. The adaptive prefetching experiments show an RMSE value at least three times lower than that of the non-adaptive experiments.
Fig. 10. Marca Adaptive Threshold

TABLE IV
EXPERIMENTS - PREFETCHING PERFORMANCE EVALUATION

Trace    Metrics               Pref non-adap   Pref adap
Elpais   ∇Latency_Page [%]     7.8             7.71
         ΔTraff_obj [%]        11.9            11.7
         Precision [%]         40.23           40.67
         Recall [%]            6.16            5.94
         Avg. Threshold        0.4             0.4317
         RMSE                  0.1376          0.0363
Marca    ∇Latency_Page [%]     14.1            13.49
         ΔTraff_obj [%]        53.5            52.76
         Precision [%]         28.67           28.69
         Recall [%]            21.33           21.19
         Avg. Threshold        0.2             0.2281
         RMSE                  0.1931          0.0758

Based on the mathematical and the graphical results, we can conclude that the adaptive prefetching technique presents almost the same performance as an aggressive prefetching, but with fewer peaks of traffic increase thanks to the control mechanism, thus helping to reduce the negative effects of prefetching.

The adaptive technique does not add complexity to the algorithm; consequently, it does not add time to the server response service time. Due to the simplicity of the traffic estimation and the adaptive mechanism proposed, no extra computing resources are needed to apply it.

VII. CONCLUSIONS

Even though web prefetching is a successful latency reduction technique, it risks decreasing system performance because it can generate extra requests at the server and extra traffic in the network. With the aim of reducing these negative effects, we have developed an adaptive prefetching mechanism at the server side to control the traffic increase and its impact on the system.

In order to develop the intelligent mechanism, we also developed a model to estimate the overall traffic increase generated by prefetching. Both the traffic estimation and the adaptive mechanism are performed at the server side with the scant information available at this element (i.e., prefetch requests and user requests not prefetched).

REFERENCES

[1] M. Rabinovich and O. Spatscheck, Web Caching and Replication. Addison Wesley, 2002.
[2] A. Eden, B. W. John, and T. Mudge, "Web latency reduction via client-side prefetching," in Proc. International Symposium on Performance Analysis of Systems & Software (ISPASS-2000), Austin, Texas, 2000.
[3] J. Domenech, J. Sahuquillo, J. A. Gil, and A. Pont, "The impact of the web prefetching architecture on the limits of reducing user's perceived latency," in Proc. of the 2006 IEEE/WIC/ACM International Conf. on Web Intelligence, 2006.
[4] C. Bouras, A. Konidaris, and D. Kostoulas, "Efficient reduction of web latency through predictive prefetching on a WAN," in Proc. of the 4th International Conf. on Advances in Web-Age Information Management, Chengdu, China, 2003.
[5] M. Crovella and P. Barford, "The network effects of prefetching," in Proc. of the IEEE INFOCOM'98 Conf., San Francisco, USA, 1998.
[6] L. Shi, L. Shi, B. Song, X. Ding, Z. Gu, and L. Wei, "Web prefetching control model based on prefetch-cache interaction," in Proc. First International Conf. on Semantics, Knowledge and Grid (SKG '05), 2005.
[7] L. Wang, L. Zhang, Y. Shu, M. Dong, and O. W. W. Yang, "A study of measurement-based web prefetch control," in IEEE Canadian Conf. on Electrical and Computer Engineering, Halifax, Canada, 2000.
[8] Z. Jiang and L. Kleinrock, "An adaptive network prefetch scheme," IEEE Journal on Selected Areas in Communications, vol. 16, no. 3, 1998.
[9] B. Liang and S. Drew, "Multiuser prefetching with queuing prioritization in heterogeneous wireless systems," in QShine '06: Proc. of the 3rd International Conference on Quality of Service in Heterogeneous Wired/Wireless Networks, New York, NY, USA: ACM, 2006, p. 34.
[10] Z. Jiang and L. Kleinrock, "Web prefetching in a mobile environment," IEEE Personal Communications, vol. 5, no. 5, 1998.
[11] S. Drew and B. Liang, "Mobility-aware web prefetching over heterogeneous wireless networks," in 15th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, 2004.
[12] S. Drakatos, N. Pissinou, K. Makki, and C. Douligeris, "A context-aware prefetching strategy for mobile computing environments," in IWCMC '06: Proc. of the 2006 International Conference on Wireless Communications and Mobile Computing, New York, NY, USA, 2006.
[13] T. M. Kroeger, D. D. Long, and J. C. Mogul, "Exploring the bounds of web latency reduction from caching and prefetching," in Proc. of the 1st USENIX Symposium on Internet Technologies and Systems, Monterey, USA, 1997.
[14] B. de la Ossa, J. Sahuquillo, J. A. Gil, and A. Pont, "Web prefetch performance evaluation in a real environment," in IFIP/ACM Latin America Networking Conf. 2007, 2007.
[15] J. Domenech, J. A. Gil, J. Sahuquillo, and A. Pont, "DDG: An efficient prefetching algorithm for current web generation," in Proc. of the 1st IEEE Workshop on Hot Topics in Web Systems and Technologies, 2006.
[16] J. Marquez, J. Domenech, J. Gil, and A. Pont, "A web caching and prefetching simulator," in Proc. of the 16th Int. Conf. on Software, Telecommunications and Computer Networks, Split, Croatia, 2008.
[17] D. Fisher and G. Saksena, "Link prefetching in Mozilla: A server driven approach," in Proc. of the 8th International Workshop on Web Content Caching and Distribution (WCW 2003), New York, USA, 2003.
[18] J. Domenech, J. A. Gil, J. Sahuquillo, and A. Pont, "Web prefetching performance metrics: A survey," Performance Evaluation, vol. 63, no. 9-10, 2006.