
Web Performance Insights

2011

By Catchpoint Systems


Foreword
This is a collection of our most insightful and popular web performance blog posts from 2011. We created this anthology as a way of sharing web performance knowledge. Today, as the vox populi demands universal speed and reliability across the internet, it is more important than ever for companies to meet end-user expectations. Moreover, as businesses continue to lease out their cyber real estate to third-party providers to increase advertising revenue, track and analyze user behavior, or save on infrastructure costs, it is crucial to understand the impact each one has on the overall performance of the website. Enjoy the read, and take a look at our blog for more insights on operating fast websites.

-- The Catchpoint Team

Contents

Webpages turning into Airports without Traffic Controller!
The Biggest Misconception about Google Page Speed
Free DNS Can Hurt Web Performance!
Relying on Web Performance Monitoring to Discover Release Problems
Getting the Most Out of Performance Monitoring: Setting Alert Thresholds
Three Key Steps to Successful Web Performance Benchmarking
A New Blind Spot for Web Performance Tools
My 2 cents on the AWS Failure and lessons learned from the past
Royal Wedding and the Internet
WPO Resolution for 2012!

Webpages turning into Airports without Traffic Controller!


Posted October 10, 2011

The other day I tried to book a trip on my favorite airline, Virgin America. I love everything about them: their planes, the service, the staff; the entire experience is just amazing. Plus, I was lucky enough to purchase some of their special vouchers on Gilt (another company I love).

However, this time I faced frustration on their website: for some reason I could not get the date picker to work. The entire page was visible, but I could not interact with it. No love back! Annoyed, I started digging into what was getting in the way. I fired up the Chrome dev tools and also ran tests on Catchpoint to see what was being loaded and what was failing to load. To load http://www.VirginAmerica.com my browser had to:
- Download 167 objects (images, CSS, JavaScript files)
- Connect to 51 unique hosts
- Download 1 million bytes

What are those 51 hosts? Two of the hosts were required to load the content that mattered to me on the page: static.virginamerica.com & www.virginamerica.com. The other 49 hosts did not deliver any content required for me as a user to buy my tickets (I tracked SSL and Non SSL hosts as separate hostnames). They were all tracking pixels for ads/marketing and analytics tags.


By looking at the various requests, I discovered that a call to one of the analytics tags on the page was hanging and made the page 100% unusable. After a few unsuccessful refreshes of the page, I decided to block the offending domain using OpenDNS to get the site to work again, and purchased my ticket!

I am a big believer in online advertising and in analytics. Heck, I worked for DoubleClick & Google for 11 years, and I know that many if not all of these companies spend a great deal of money on monitoring their performance. However, lately I have observed an interesting trend: webpages are becoming bloated with third party tags which often cause user experience problems, all of this culminating in a battle between the IT and Marketing teams on what actions should be taken.

For many companies the Marketing team has direct access to the site content and can place third party tags live without proper testing or asking themselves: How is this tag going to impact my performance? What will the impact be on the experience of my end users? Is the revenue generated by this tag worth the user frustration? The IT and web development teams are constantly trying to do more with less money, or fighting battles they know they will lose, and they give up. I have also found that for several companies the IT operations teams ignore problems from third party tags (even when reported by third party monitoring solutions like ours). The main reason is simple: they do not have the means to correct the problem. Yet end users are impacted, and action is not taken until someone else in IT notices performance numbers creeping up or users complain on Twitter, Facebook, or forums. This is a dangerous path.

There are several tools and techniques out there to make sure these 3rd party tags do not impact your end user experience:
- Ghostwriter
- ControlJS
- The BrightTag
- Tagman
- Outclip

Another important point to keep in mind is that even if you have optimized the tags so they do not impact performance, all these JavaScript requests might not play nice with each other inside the browser. Hence, you might want to have an approval process which includes extensive testing of the vendors' tags in actual webpages.

The Virgin America page looked like a busy airport with 167 planes from 49 different airlines landing at the same time and no Air Traffic Controller. Safe travels, and please do care about your end user experience.

Photo Credit: Artist Ho-Yeol Ryu - Check out his amazing Gallery.


The Biggest Misconception about Google Page Speed


Posted December 27, 2011

In my conversations with customers and prospects, we often talk about one of the biggest IT problems: how can we make the website faster for end users? Obviously, Web Performance Optimization (WPO) techniques are key to a faster webpage, and the Google Page Speed score is a key tool for measuring how well they are applied. However, quite often I hear comments about the score that make no sense, like "We have spent a lot of money, time, and effort to get amazing Google Page Speed scores, but our site is still slow. What did we miss?" or "Competitor X has a score of 50, we have 85, yet they load twice as fast!"

These comments clearly show a misconception of what the Google Page Speed score does. Certain people fall prey to the idea that pages with HIGH Page Speed scores should be faster than pages with LOW scores. There are probably various causes of this misconception, from the name to how certain individuals or companies sell products or services. Our goal with this post is to clarify why this belief is false, and why you should not rely on it.

Google Page Speed is a score calculated based on web optimization rules that improve the rendering and execution of the front end code (HTML, CSS, etc.) in browsers. Per Google's definition: "These rules are general front-end best practices you can apply at any stage of web development." In other words, the score does not look at your infrastructure, application performance, DB queries, datacenter distribution, load handling, content loaded in the browser, etc. Most importantly, it does not include the actual speed of the page and therefore cannot be used as a yardstick of whether Page A or Page B is faster.

To illustrate the point that there is no correlation between the Page Speed score and how fast a page loads, let's take a look at the performance data we collected over Black Friday and Cyber Monday 2011 for the top internet retailers. We measured several speed metrics and the Google Page Speed score utilizing Catchpoint synthetic agents in major US cities. We will focus on the two key metrics which are most commonly used and are the best gauges of webpage speed.


Document Complete Time & Google Page Speed Score

As you can clearly see from the chart, there is no relationship between Document Complete time and Google Page Speed score. The fastest site, J.C. Penney, and the slowest site, Target, had an identical score of 84.

Web Page Response Time (fully loaded) & Google Page Speed Score



Even when looking at the fully loaded page, we still see that there is no correlation with the Google Page Speed score.

The Google Page Speed score should be treated as a grade of how good a job your front end developers (or optimization solution) have done in making a page that renders as quickly as it can, given the content it needs to display. When you compare it with competing sites, use it to compare how good a job they have done; do not use it as a measuring stick for speed. To measure speed, use metrics like Response, Document Complete, Fully Loaded, etc. The same reasoning holds true for the YSlow score from Yahoo.

So, great: we have clarified the Page Speed misconception. Your front end team does an awesome job and gets the score into the 90s for the entire site, but somehow the site still under-performs the competition in speed. Well, take a look at the rest of your system, analyze the data, review it carefully, and I am sure there is plenty you can do to make it faster:
- Trim down bytes. You could have a well optimized page, but if you are loading 2 MB+ of data it will be slow for the average Joe. Do you really need that rather large SWF file? Should every visual effect be an image?
- Analyze application performance. Take a look at every page and process. Are there particular queries taking too long? Is there mid-tier/backend code that needs optimization? Can you achieve better caching rather than reading from disk for every pageview? Are certain processes impacting performance?
- Analyze how the application does under user load. On a developer's machine it might be fast, but get 100 users on it and maybe there are problems related to code, hardware, etc.
- Analyze your infrastructure. Take a look at routers, switches, load balancers, and the various machines: are they over-utilized? Are they misconfigured?
- Evaluate your datacenter location and ISPs. Are your users mainly on the West Coast while you serve all content from the East Coast? Maybe you need a new datacenter, or to move it to the Midwest so it has better coverage of the US.
- Evaluate your third party vendors. If you are relying on CDNs, adserving, etc., ensure they are not impacting your speed.



In conclusion, the speed of your page does not depend only on front end code or the Page Speed score; it depends on your entire system and infrastructure. The engineering and operations teams must talk and work with each other and get a better understanding of what is impacting speed. Everyone in an organization is responsible for a fast website / application.



Free DNS Can Hurt Web Performance!


Posted July 4, 2011


After working with one of our clients earlier in August, I tweeted the following: "I am just amazed how many companies use their registrar's DNS as primary DNS, not GOOD!" In reply to the tweet I received several questions, and it became clear that registrar-provided DNS needed a discussion all of its own. (I have previously talked on our blog about the importance of DNS to web performance.)

Usually a company buys a domain from a registrar (such as GoDaddy, Network Solutions, 1and1, etc.). Then they either delegate that domain to their own DNS system, rely on a 3rd party service to manage it (such as Dyn, Cotendo, Verisign, Nominum, Cloudfloor, UltraDNS, DNSmadeeasy, etc.), or rely on the registrar's DNS services.

Don't get me wrong: the DNS services offered by a registrar are more than sufficient for the great majority of websites on the internet, like blogs, personal sites, or sites with a small presence. Even if you are a medium-size website, a registrar's DNS could work just fine if you rely on long TTLs and don't need any advanced features like geographical load balancing or fast failover capabilities. On the other hand, a registrar's DNS might not be your best choice if you are a website with a global presence and web performance is key to your success, or if you are a third party service that impacts the performance of your clients (like adserving) and have SLAs. In addition, if you rely on CDNs to serve your static content, why rely on a registrar for the DNS entries pointing to the CDN? You are investing in speed; you might as well invest in all the components impacting speed, and DNS is the first one to impact it.

Registrars offer their DNS services for free, and often the price is reflected in their performance. Keep in mind that not all registrars are equal: their level of investment in their infrastructure varies, and so does their quality. Either way, the most common reasons why the DNS performance of a registrar could be poor are:

- Their DNS servers are not well distributed geographically and/or do not rely on technologies like IP Anycast to route DNS queries to the closest servers.
- Their ISP peering points might be limited.
- Their DNS servers are not fast or reliable. We have seen many timeouts as a direct result of poor performance from registrar-provided DNS.


At Catchpoint we monitor DNS performance from multiple geographical locations relying on three distinct methods:
- Measure DNS resolution as part of web performance monitoring. This relies on a DNS resolver and respects TTLs.
- Emulate a DNS resolver (performing recursive queries to resolve the domain) with a clean cache.
- Directly query a specific NS server and measure the performance of that server (a minimal sketch of this method follows below).
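The sketch below times a query against a single authoritative name server. It assumes the dnspython package is installed; the server IP and hostname are placeholders, not taken from any of the cases in this post.

import time
import dns.message
import dns.query

NS_SERVER = "198.51.100.53"    # placeholder IP of an authoritative NS server
HOSTNAME = "www.example.com"   # placeholder record to resolve

query = dns.message.make_query(HOSTNAME, "A")
start = time.time()
response = dns.query.udp(query, NS_SERVER, timeout=5)  # raises an exception on timeout
elapsed_ms = (time.time() - start) * 1000

print("%s answered for %s in %.1f ms" % (NS_SERVER, HOSTNAME, elapsed_ms))
print("Answer section:", response.answer)

Running this repeatedly, from several locations and against each NS server of a domain, gives a rough picture of how the authoritative servers themselves perform, independent of resolver caching.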

To illustrate the performance problems, let me present two actual client cases we dealt with this year. (To protect the privacy of our clients, we are not making public who they are, the domains, or the registrars.)

Example 1: A Catchpoint client observed multiple DNS failures through our IE8 browser-based monitoring. The client relied on a registrar to host the CNAME to their CDN. We analyzed which NS servers were involved in the domain resolution and ran a performance analysis for each server. The following scatterplot displays the raw data collected by the IE8 agent over a 3-day period in February/March 2011:

Each one of those red dots represents a failure to resolve DNS, and they were all caused by the registrar being used.



Example 2: An adserving company was relying on a registrar for their DNS. They were experiencing slow performance and had high impression discrepancies with other adserving solutions. The following chart shows the response time of a simple ad call alongside the DNS resolution time.

At WebPerf meetups I emphasize that when monitoring web performance it is vital to see the entire picture, and that picture includes DNS. DNS is the first, critical link between you and your customers.

And finally, some of the recommendations we give regarding DNS handling:
- Avoid short TTLs where possible (especially if you must rely on registrar DNS infrastructure).
- Avoid multiple CNAMEs.
- Use a distributed DNS infrastructure based on your user base, or use a third party that specializes in DNS resolution.
- When hosting your own DNS infrastructure, make sure you have the capacity to handle DDoS attacks and traffic surges.
- Use Catchpoint's tools to effectively and reliably monitor your complete DNS response paths.
- Make sure to keep your internal LAN DNS records separate from your production DNS.
- You can also make sure your CDNs and other 3rd parties rely on Anycast. See the article from Patrick Meenan about the importance of Anycast and its impact on web performance.



In conclusion, make sure you rely on the right DNS service based on your needs. Just like any other purchase, there is a correlation between price, features, and quality: free or cheap services do not offer the best speed and reliability, and might lack some of the features you need. If speed is key to the success of your company, invest money in a third party DNS service and make sure you configure it right.



Relying on Web Performance Monitoring to Discover Release Problems


Posted on March 25, 2011


In the 1990s websites were quite simple: served by a single server talking to a single database, JavaScript and Flash had just been introduced, AJAX was being developed, and the HTTP 1.0 protocol was prevalent across the World Wide Web. Now, years later, that same webpage has turned into a complicated web of services, servers, and applications all working together to serve content to the end user. Most websites rely on 2+ servers and services just to get the content of the base URL! Once the base URL is loaded, its HTML has calls to even more internal and third party services like adservers, CDNs, content personalization, page optimization, tracking pixels, widgets, etc.

The smallest mistake from any of these services, internal or external, and the end user pays the price in bad experience and frustration. The bad news for the company is that unlike in the 90s, when a user might not have had a choice to get the content elsewhere, today that same user can go to one of the hundreds of competitors out there in the blink of an eye. Therefore, optimizing webpages and services for faster website performance and better fallback in case of failure has become very important; however, it is not enough. Continuous performance monitoring of all the services involved in delivering your website has become a must for all companies. Any unexpected performance degradation needs to be analyzed carefully and action taken before there is any impact to the business.

Case Study: New Website Release Impacts Web Performance for IE Users

We recently observed a major performance degradation with a very popular website in the US, which we were monitoring. The website performed a release on the night of March 22nd, during which time it was down for about 2 hours. The day after the release, the performance of the webpage slowed down by 100%, going from 4.5 seconds to 9 seconds.

Response for the Base URL and the Webpage (Hourly Average)



Not only did the response time for the entire webpage double, but the base URL response also slowed down by 80%. Looking at the requests and connections the webpage made, there was a jump in the number of connections, but no increase in the number of items loaded on the page.

HTTP Connections and Hosts (Hourly Average)

Number of Items Requested (Hourly Average)

This was a clear sign that the hosts on the webpage were closing connections on every request. We also confirmed the cause by looking at the waterfall charts, which showed that 11 requests (including the base URL) used HTTP 1.0 and resulted in 11 different connections.

Number of Requests and Connections by Host



The issue is also clear from the HTTP headers of the request and the response; we can clearly see that the site is responding with HTTP 1.0 and closing the connection with the Connection: close HTTP header:
GET /layout/css/style-8194-1234.css?v=1234 HTTP/1.1
Accept: */*
Referer: https://www.SomeSite.com/
Accept-Language: en-us
User-Agent: Mozilla/5.0 (Windows; MSIE 9.0; Windows NT 6.1; Trident/5.0; BOIE9;ENUS)
UA-CPU: x86
Accept-Encoding: gzip, deflate
Host: www.SomeSite.com
Connection: Keep-Alive
Cookie: VisitorId=002.......

HTTP/1.0 200 OK
Date: Fri, 25 Mar 2011 14:26:18 GMT
Server: Apache-Coyote/1.1
Last-Modified: Fri, 25 Mar 2011 08:13:57 GMT
Content-Type: text/css
Vary: Accept-Encoding
Content-Encoding: gzip
Expires: Thu, 15 Apr 2020 20:00:00 GMT
Cache-Control: private
Connection: close

The use of Connection: close had an even bigger impact on the website's performance because the site was using HTTPS. As a result, on every HTTP request the browser not only had to open a TCP connection, but also had to complete an SSL handshake. The other interesting fact we noticed was that the problem occurred only on Catchpoint's Internet Explorer agent, and not on the other agents we were testing from! The same requests were made by all agents; however, for IE the site used HTTP 1.0 while for the other browsers it used HTTP 1.1. We repeated the test on the IE agent and modified the user agent to exclude the MSIE string, and voilà, the server went back to using HTTP 1.1:
GET /layout/css/style-8194-1234.css?v=1234 HTTP/1.1
Accept: */*
Referer: www.SomeSite.com
Accept-Language: en-us
User-Agent: Mozilla/5.0 (Windows; Windows NT 6.1; Trident/5.0; BOIE9;ENUS)
UA-CPU: x86
Accept-Encoding: gzip, deflate
Host: www.SomeSite.com
Connection: Keep-Alive
Cookie: VisitorId=002.......

HTTP/1.1 200 OK
Date: Fri, 25 Mar 2011 14:35:10 GMT
Server: Apache-Coyote/1.1
Last-Modified: Fri, 25 Mar 2011 05:32:33 GMT
Content-Type: text/css
Vary: Accept-Encoding
Content-Encoding: gzip
Expires: Thu, 15 Apr 2020 20:00:00 GMT
Cache-Control: private
Keep-Alive: timeout=15, max=94
Connection: Keep-Alive
Transfer-Encoding: chunked

The issue was caused by an old Apache configuration setting which, by default, forced HTTP 1.0 and turned off Keep-Alive for browsers containing MSIE in the user agent string (an illustrative form of that directive is shown after the summary).

Summary

Websites have become more and more complicated, relying on multiple services, servers, and applications, both managed by the website owner and outsourced to third parties. These internal and external dependencies have a direct impact on the web performance of the pages, with varying severity. Monitoring the web performance of the website continuously is key to ensuring its reliability.
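For reference, many Apache SSL configurations of that era shipped with a BrowserMatch directive along these lines; it is shown here as an illustration of the kind of setting to look for, not as the exact line used by the site above:

# Legacy directive commonly found in default Apache SSL configs:
# downgrade responses to HTTP/1.0 and disable keep-alive for any
# user agent containing "MSIE".
BrowserMatch ".*MSIE.*" \
    nokeepalive ssl-unclean-shutdown \
    downgrade-1.0 force-response-1.0

Narrowing or removing such a rule restores HTTP 1.1 and persistent connections for IE users.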



Getting the Most Out of Performance Monitoring: Setting Alert Thresholds


Posted July 4, 2011


A common question from our customers is, "What's the best way to choose an alert threshold for analyzing my webpage response time?" It is a tricky question, and one whose answer varies case by case. Set the threshold too low and you'll be distracted by, or worse, dismissive of alerts as they fill up your inbox. But set it too high, and you might not know when some of your end users are having an unacceptably slow site experience. Choosing response time alerts is very much a balancing act.

To illustrate this point, let's look at a case from an actual Catchpoint customer who recently went through the exercise of setting alert thresholds. First, they looked at their site's average response times over the course of a week. A common practice is to take the average, add a little extra as a buffer, and presto, alerts are set!

For this customer, the average (Chart 1) was a little under 7 seconds (6,834 ms, to be exact). Adding a little buffer, they set the alert threshold at 11 seconds. Unfortunately, and unexpectedly, the 11-second threshold yielded about a gazillion alerts for our customer. So what happened?

The problem in this case has to do with the variability of site usage and deviation from the mean. If you look carefully at Chart 1, you will see that the valleys occur during off-business hours, and the peaks occur during the day. What the chart is not showing is that during business hours there is significant variability in response time. Looking at Chart 2, a scatterplot of the values measured over the same period, you can see that the distribution of response times is far wider than Chart 1 would have you believe. In fact, the averages in Chart 1 never exceed 18,000 ms, whereas in Chart 2 we plainly see that there are dozens of instances of response times in excess of 20,000 ms.



It's obvious from Chart 2 that an 11-second alert threshold will trigger a lot of alerts. When you're using a simple average over a period of time to set alerts, you're ignoring the fact that the average is only an average. To set an alert you have to understand the data better, and you need to dig deeper. In Chart 3, we see the 95th percentile, meaning that 5% of the samples had response times that slow or slower. This is where you can look to get a better picture of a site's worst-case performance. In the worst cases, the page is taking 24 seconds to load! So, what would you do? Would you set the alert level at 24,000 ms? 20,000 ms? 15,000 ms? It's a balancing act.

An alternative to the 95th percentile is to rely on a moving average, which uses a subset of the data based on a time frame. Catchpoint alerts support the ability to specify a dynamic threshold based on the average of a previous window of time, for example, alert if the response is 50% above the last 15-minute average. This approach allows you to take recent data into consideration when determining whether application performance has degraded (a small sketch of both approaches follows below).

At the end of the day, it's going to be a judgment call. Only you can decide what the proper level is for your alert threshold, but we can tell you one thing for sure: you won't find the answer by just looking at your averages.
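To make the two approaches concrete, here is a rough sketch, assuming you already have raw samples as (timestamp, response_ms) pairs; the 50% tolerance and 15-minute window simply mirror the example above, and the function names are illustrative.

from statistics import mean

def percentile(values, pct):
    # Static threshold: e.g. percentile(samples_ms, 95) is the value
    # that 5% of samples are as slow as or slower than.
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(round(pct / 100.0 * (len(ordered) - 1))))
    return ordered[index]

def should_alert(samples, window_secs=15 * 60, tolerance=1.5):
    # Dynamic threshold: alert if the newest sample is 50% above the
    # average of the preceding 15 minutes. samples is a list of
    # (unix_timestamp, response_ms) tuples, oldest first.
    if len(samples) < 2:
        return False
    latest_ts, latest_ms = samples[-1]
    window = [ms for ts, ms in samples[:-1] if latest_ts - ts <= window_secs]
    if not window:
        return False
    return latest_ms > mean(window) * tolerance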



Three Key Steps to Successful Web Performance Benchmarking


Posted May 26, 2011


One of the frequent questions I receive from clients is "How do I benchmark my performance against the competition?" There are different approaches to benchmarking, some better than others. The key to a successful benchmark is to plan it carefully and collect the right data points. I recommend companies follow these 3 steps:

1. Define the end goal of the benchmark. Ask yourself: what will you do with this data? Are you trying to improve your website, a webpage, or a process? Are you trying to build a business case for a project or initiative?

2. Determine which areas of the site/application/system you will need to benchmark. If you are benchmarking to figure out your infrastructure distribution, you might care more about DNS and performance in the geographical areas of your end users. If you are planning on redesigning/rebuilding the site, you might care about the full performance of key webpages or key processes like a shopping cart.

3. Determine what tests to run, from what locations, and at what frequency. Based on the purpose of the benchmark and the benchmark areas, you can determine how and where to test from. For example, you might decide to benchmark DNS performance from key US states that account for the majority of your end users. You might decide to run the tests every 10 minutes if you are planning major changes, or every 30 minutes if you are simply using the data for a business case.

Over the years I have come across several benchmarks that failed for various reasons. Some of the major pitfalls are:

- Comparing apples and oranges. Sadly, one of the biggest mistakes is not comparing the correct items. If you are benchmarking DNS performance, you can't simply average the DNS time of HTTP requests. If you have a DNS TTL of 5 minutes, and your competitor has a TTL of 15 minutes, the averages will lie.
- Looking only at averages. If you are looking at an average across different cities, you might lose sight of issues.
- Looking at a metric without understanding what it means. Quite often people just pay attention to the full load time of a webpage and ignore the rest of the data. However, webpages are different, and the full time to load the page might not be comparable across them, especially when pages dynamically modify their content.



- Looking only at one metric. You collected all this data, but looking only at one metric is not going to help you. Dig deeper into the data so you can understand why others are better or worse. Learn from your competitors' successes and failures, so you can improve.

Case Study: E-commerce Benchmark

Recently we assisted an e-commerce customer that had created a benchmark in Catchpoint to compare how the homepages of key competitors ranked. The benchmark included the homepages of BestBuy, Amazon, Apple, and Newegg. The goal was to understand where their homepage ranked relative to their competitors, and to determine the steps to improve their web performance.

Based on the data collected, they came to the conclusion that the homepage of Apple.com was the fastest. There are several reasons why Apple's homepage is fast:
- The Response of the URL, the total time from issuing the request to receiving the entire HTML of the page, was really fast.
- Fewer downloaded bytes: Apple's homepage was 30-50% lighter.
- Fewer requests and hosts on the page.



This might seem like a successful benchmark; however, there was one little issue that made it inaccurate. The goal of the client was to compare the homepages of the competing e-commerce sites. But in the case of Apple, they were testing the corporate homepage, which had a different business goal and therefore a different design and implementation. The homepage of the e-commerce site for Apple is store.apple.com and not www.apple.com. When benchmarking against the correct e-commerce site for Apple, the picture changed: Apple was not much faster than the rest of the stores. (I kept the homepage of Apple in the charts to show the differences.)





To get a better look at the impact on user experience, we also looked at other metrics like time to title and render start time.

Visually, this is what it looked like loading those 5 sites from a node located in New York on the Verizon Backbone (using a 400 ms timer, the blink of an eye is 400 ms).



We also implemented the use of Apdex, an excellent way to score and compare numbers from diverse pages. Apdex normalizes the data based on target goals, which vary from webpage to webpage (as we saw with Apple). For demonstration purposes I used an Apdex target response of 5,000 ms (5 seconds) for all the tests above.
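For readers who have not used it, the standard Apdex formula counts samples at or below the target T as satisfied, samples between T and 4T as tolerating, and everything slower as frustrated; the score is (satisfied + tolerating/2) divided by the total sample count. A minimal sketch, using the same 5,000 ms target as above:

def apdex(response_times_ms, target_ms=5000):
    # Satisfied: at or under the target; tolerating: up to 4x the target.
    satisfied = sum(1 for t in response_times_ms if t <= target_ms)
    tolerating = sum(1 for t in response_times_ms if target_ms < t <= 4 * target_ms)
    return (satisfied + tolerating / 2.0) / len(response_times_ms)

# Example: apdex([3200, 4800, 9000, 21000]) -> (2 + 0.5) / 4 = 0.625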

To sum it up, a successful benchmark depends on clear end goals; everything else follows from them. Happy benchmarking!

Methodology: all sites were measured using 26 US Nodes, every 10 minutes with Internet Explorer 8



A New Blind Spot for Web Performance Tools


Posted on March 2, 2011



Web performance tools rely on actual browsers to capture the performance data of a webpage and any requests made by it. Monitoring from the browser comes with some limitations due to the complexity of the browser and the internet, causing at times what we call blind spots. A blind spot occurs when the data provided by a tool lacks clarity. The main blind spot with external monitoring is that you cannot always distinguish between network and application performance. At Catchpoint we introduced Catchpoint Insight and other features to remove this limitation and facilitate the understanding of the performance factors.

Recently we came across another blind spot related to monitoring tools built on top of Internet Explorer. We internally refer to it as "Objects in IE might be faster than they appear." It all started when a client of ours engaged us in a performance study regarding the impact of their Iframe tag on the pages of one of their clients. Their client was observing network times on the Iframe call that were quite high on an IE7-based performance monitoring service. We were able to reproduce the problem on the webpage with various tools like HTTPWatch, DynaTrace Ajax, Webpagetest, IE9 Developer Tools, and even the Catchpoint IE monitor.

The performance numbers we observed made no sense! The response content of the Iframe URL was less than 1,000 bytes (it fits in a single TCP packet), yet the tools were displaying 500 ms+ for the time from the 1st byte to the last byte of the HTTP content. The only way this could happen is if there was something wrong at the TCP level and packets were fragmented or lost. To rule out an issue at the TCP level, we used Wireshark to capture the TCP packets while monitoring the pages with the other tools, and mapped the data from Wireshark to the metrics displayed in the tools. The data confirmed that the URL content was always delivered in a single packet and the URL response was less than 100 ms. However, the monitoring tools built on top of IE still showed that 1st to last byte was 500 ms or more for the same request! Clearly a new blind spot with IE!

Since we proved it was not the network, the only other possibility was that something happened during the browser execution. We looked through the 20+ JavaScript files referenced on the webpage and determined that the page executed JavaScript code when the DOMContentLoaded event was reached. The event is not native in pre-IE9 browsers, so the page relied on one of two solutions, doScroll() or script defer, to approximate when the event had been reached. Once the event fired, JavaScript on the page made DOM modifications that were time consuming. However, this JavaScript execution time was not being displayed in the tools as a gap.


To test what was happening, we created several simple pages that contained an Iframe pointing to a URL. The pages also contained JavaScript that created 10,000 spans and appended them to a DIV on the page. The JavaScript execution varied on each page and relied on:

1. the doScroll() method to detect DOMContentLoaded and execute
2. the script defer method to detect DOMContentLoaded and execute
3. the native DOMContentLoaded event in IE9 to execute the script
4. inline execution below the Iframe tag

In all four test cases we observed that all the tools, including the IE9 developer tools, always included the time of the JavaScript execution in the network time of the Iframe request! We replicated the test cases with an image in place of the Iframe and were unable to reproduce the same results. Interestingly, the issue did not occur in Firefox or Chrome on Windows, but both clearly showed there was JavaScript executing and delaying the rendering of the Iframe content.

Dynatrace IE7 Inline

HTTPWatch IE7 Script Defer



IE9 Developer Toolbar IE9 DOMContentLoaded

Webpagetest IE8 doScroll()

We believe the problem occurs because the browser executes JavaScript in a single-threaded mode, and the JavaScript takes precedence over the Iframe creation. The monitoring tools rely on the browser to tell them when the Iframe is complete, but IE does not mark the Iframe complete until the JavaScript execution is complete. Hence, the JavaScript execution time is included in the Iframe response time!

This means that monitoring tools relying on Internet Explorer might append the time to execute JavaScript to the Iframe request time, if the JavaScript executes right after the Iframe request starts. It does not mean that the server serving the Iframe is slow, and it does not mean that the Iframe slowed down the page. It simply means the JavaScript time was incorrectly attached to the Iframe request. So the next time you see a very slow request in a monitoring tool, try the request standalone to make sure it is the request, and not something else on the page, that is slow.

At Catchpoint we understand such blind spots have an impact on our users; therefore we have already started development work to address this issue in our waterfall charts. Our IE-based monitor will be able to clearly distinguish between the network request time and the JavaScript execution time.

- Catchpoint Team


My 2 cents on the AWS Failure and lessons learned from the past
Posted April 25, 2011


A lot has been published already about the AWS EC2 failure; I wanted to add my 2 cents on the issue as it reminded me of a notorious event that happened to DoubleClick in August 2000. What AWS and their customers experienced is unfortunate, but it will and can happen to anyone! In IT we are dealing with various complex systems of hardware, software, and people; things are bound to break at some point. Failure is not limited to IT: human history is full of such failures, with automobile recalls, bank failures, nuclear disasters, collapsing bridges. What people should understand is that failure is bound to happen; be ready for it, and learn from it to avoid it in the future.

Let's be real: very few companies out there have the money and resources to have redundant transactional systems running in parallel which can act as a backup. For most companies, you just have to fail nicely. You should have plans and processes to deal with everything from troubleshooting the failure, to recovering from it, to notifying customers of it, and most importantly you should architect your application and systems so they fail nicely and can recover from such failures. Companies that have websites or web applications must be able to redirect all requests to a "Service is down" webpage. Mobile or desktop applications relying on APIs might need to have special logic built in for such failures. However, if you are a company delivering services to other websites via tags, like adserving or widgets, things get a little more complicated. You cannot remove the tags from the webpages unless your clients build that capability into their pages. You need to ensure you can deliver from another location, enough to ensure your tags do not impact the web performance and usability of your clients' websites!

Back at DoubleClick we ran a fairly large infrastructure delivering billions of impressions; the DART tags were present on almost every major website. One day in 2000 we had a really bad outage and our tags stopped working because the adserving system experienced a catastrophic meltdown. Customers were not happy, but they understood that technology fails sometimes, and they had SLAs to protect them. What they were most unhappy about was that the DoubleClick ad tag had such an incredible impact on the performance of their sites. Webpages came to a crawl or stopped loading; the user experience was horrible! Our clients couldn't recover from our failure: some were able to remove the tags via their Content Management Systems, but others just had to suffer from our failure. So we went back to the drawing board and built a complete secondary system capable of handling the billions of ad calls, but that would only deliver 1x1 pixels or empty JavaScript. In case of a major outage the ads would not work, but at least we would not take down the entire customer's site, and their user experience, with us. That "Dot" system was never used in real life, but it was always there in case we needed it.

The first lesson for companies that provide services to other websites is to not rely on a single vendor for hosting; spare a few hundred dollars and get a backup plan. So next time AWS or anyone else goes down, you will not have impacted the user experience of the folks visiting your customers' sites. And once you have that backup system in place, test it frequently! Make sure the right folks know when to pull the trigger and that the system can handle the capacity.

The second lesson is about diversification; do not put all your eggs in one basket. If you go with vendor A for hosting, choose vendor B for DNS, and choose vendor C for CDN.

Lastly, if you are a website relying on 3rd party vendors, make sure you monitor them. Also learn about their technology and their vendors: who are they relying on for hosting, who is their DNS provider, and most importantly, what are their backup plans in case that tag comes to a crawl?

The cloud is great, it is the future of IT, but do not drink too much of the kool-aid or cloud-aid; be ready for outages and failures!

Mehdi - one of the guys who handled those angry customer phone calls in 2000.

For more about the AWS issue: The Big List of Articles on the Amazon Outage



Royal Wedding and the Internet


Posted April 29, 2011


I read an article yesterday predicting that the Royal Wedding was going to be a big stress test for the Internet. To observe the impact of the Royal Wedding on the Internet, I decided to monitor some of the popular websites that provided coverage of it: Yahoo, CNN, Youtube, the Official Site, Facebook, Twitter, BBC, MSNBC & the Telegraph.

We monitored the web performance of each website from all of our global agents, using Internet Explorer 8 as the browser. The goal was not to compare the performance of the sites, but to see how well each website handled the traffic during the ceremony. Based on the collected data, BBC and Yahoo experienced slow performance and had availability issues. Youtube and Facebook also experienced slowness later in the day, starting around 6 am ET, when the US East Coast woke up.

News Outlet Category:

Social Media Category:



Other Sites:

BBC Scatterplot view:

Yahoo Scatterplot view:



We also monitored the performance of the major CDNs (Edgecast, Cotendo, Akamai, Limelight, CDNetworks), but the data did not reflect any major impact on their performance.

URLs monitored:
BBC: http://www.bbc.co.uk/news/uk-11767495
CNN: http://edition.cnn.com/SPECIALS/2011/royal.wedding/live/
MSNBC: http://windsorknot.today.com/
Facebook: http://www.facebook.com/event.php?eid=101946883225381
Twitter: http://twitter.com/#!/ClarenceHouse
Youtube: http://www.youtube.com/user/TheRoyalChannel
Official Site: http://www.officialroyalwedding2011.org/
Telegraph: http://www.telegraph.co.uk/news/uknews/royal-wedding/
Yahoo: http://royalwedding.yahoo.com/

Definitions:
Response Time: The time from the request being issued by the browser to the last byte received from the server for the primary URL.
Web Page Response Time: The time from the request being issued to receiving the last byte of the final element on the page. It reflects the impact of all the requests in the webpage.
Wait Time: The time from the connection to the server being established and the request sent, to the first byte of the response from the server. It reflects the performance of the server in processing the request.



WPO Resolution for 2012!


Posted December 19, 2011


As I look back at the state of 2011 web operations, the thing that impressed me the most was the success of the Web Performance Optimization (WPO) movement. Comparing it to recent world events, I think this movement is the Arab Spring of the Web Development and Operations community. Web Performance meetups launched everywhere around the world, the Velocity Conference got a passport and travelled to Europe and China, the number of people interested and invested in this subject has exploded, and so have the number of companies and investments in this field.

The success of this movement is mostly due to the hard work of several individuals and companies over the last 5 years: Steve Souders, Patrick Meenan, O'Reilly, the Google speed initiatives, Joshua Bixby, Sergey Chernyshev, Stoyan Stefanov, Alexander Podelko, and so many others. Thanks to them, WPO and web performance monitoring are no longer reserved for the 1% who can afford fast servers and bright engineers. The techniques to speed up websites have been documented (YSlow, Google Page Speed), books have been published (High Performance Web Sites: Essential Knowledge for Front-End Engineers by Steve Souders, Web Performance Tuning: Speeding Up the Web, Building Scalable Web Sites: Building, Scaling, and Optimizing the Next Generation of Web Applications, Even Faster Web Sites: Performance Best Practices for Web Developers), and automated optimization tools like aiCache, Strangeloop, Blaze.io, etc. have flourished. This movement has been amazing, as it makes the web experience for end users faster and better. Another positive development has been the sharing taking place in this industry: thanks to sites like PerfPlanet and Twitter, engineering teams like Wayfair, Netflix, and Etsy are quick to share their latest experiments and results.

But as 2011 winds down, I am still perplexed by the lack of implementation of 2 major WPO best practices that give the biggest performance boost without any development effort or new hardware: HTTP compression and persistent HTTP connections. I am not sure if this is by choice or negligence, or a combination of the two.

In regards to compression, when there is a CDN involved I think it's mostly negligence, because CDNs do not automatically turn on compression. I have been on way too many calls where I have heard "oh, we forgot to turn that on." Please compress HTTP on your own servers and ensure your CDN has compression enabled for your account. While we monitored the top 50 retailers on Cyber Monday, the Sony.com homepage downloaded around 2.6 MB of data, of which 1.2 MB were un-compressed CSS and JS! In this case their CDN is Akamai. (See link to HTTP ARCHIVE 11/15/2011.) Compression at the CDN level must be ON by default, not the other way around.

Persistent Connections, or Keep-Alive, is a feature of HTTP 1.1 which allows a browser to re-utilize an existing connection with the server. Today almost all web servers and browsers support HTTP 1.1 Keep-Alive, and there is no reason why so many sites still do not have it enabled. The biggest advantage is that it eliminates the need to establish a new connection for every request to the server, which can quickly add up.

So on the 2nd of January 2012, please take the time to make a call, send an email, fax, or even send pigeons with these 2 instructions to your operations or devops team and your CDN account rep: turn on compression for all HTML, scripts, CSS and text resources, and make sure Keep-Alive is on (an illustrative server configuration sketch follows below)! You will save money on bandwidth, your end users will be happier, and your systems and network will be less bloated!
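As a starting point, and only as a sketch (your web server, application stack, and CDN control panel will each have their own knobs), enabling these two practices on an Apache server with mod_deflate might look roughly like this:

# Compress text-based responses (requires mod_deflate).
AddOutputFilterByType DEFLATE text/html text/plain text/css text/xml application/javascript application/x-javascript

# Keep connections open so the browser can reuse them for subsequent requests.
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5

Whatever your stack, verify the result from the outside: check that responses carry Content-Encoding: gzip and that the server is not sending Connection: close.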



Make Users Happy, Have a Fast Website.

Don't let slowness and poor performance impact your business. Measure and monitor the performance of your website to ensure it stays fast and highly available. Get a free demo and trial of the Catchpoint Web Performance solution.

www.catchpoint.com/trial

