
Improving Web Application Performance Using Six Sigma

Mukesh Jain
Principal Quality Manager
Global Online Services
Microsoft Corporation

1
We will talk about…
• What is Quality?
• Types of Performance Measurement
• How to Measure Web App Performance?
• Traps we fall into and how Six Sigma showed us the right path
• Overview of Six Sigma
• Using Six Sigma to Improve Performance
• 5 Steps to Improve Web App Performance
• Ensuring problems do not come back
• Questions
2
What’s the biggest problem today?

Today’s market demands the best of everything:

• Quality
• Cost
• Schedule

Reliable, responsive and secure service 24x7 is the demand of the global customer.

3
Defects are inevitable

• Fact: No software can be guaranteed 100% defect-free
• Action: No action
• Result: We make it horribly true
• Ask: Why did the defect happen?
• Do: Analyze data and improve the process to prevent it
And sustain it → Six Sigma
4
What is Quality?
• Meets expectations
• Serves the purpose / needs
• Intuitive / usable

• Reliable
• Responsive / performance
• Security / privacy

• High quality / low defects

• Getting it right the first time, every time

• Think global
5
Why is Performance important?

• More online activities
• Competitors are faster
• Users have more options
• Attracting and retaining users
• We cannot overcome the speed of light
• "Release it and then fix it" is no longer an option – we may lose mind-share and we may not get a second chance

6
Solving Performance Problems
Your app is having performance issues
• Traditional way
  • Put more people on it
  • Do more performance testing
  • Buy performance testing tools
• Outcome/Results
  • We often march on a journey to solve a problem without understanding the problem
  • If we don’t know what we are improving, how would we know if we have improved?
• Structured way of solving the problem
  • Using Six Sigma, we can define the problem, measure the right thing, analyze the root cause and then solve the right problem – the right way…
7
What is Six Sigma?
• Structured problem-solving methodology
• Solving the right problem – the right way
• Focus on finding and fixing the root cause
• Ensuring the problem does not come back
• Drive continuous improvement
• DMAIC

8
Web Page Performance Improvement – the DMAIC cycle:
1. DEFINE: Identify major user scenarios & performance goals
2. MEASURE: Use the right set of tools to measure & baseline current performance – Page Load Time (PLT1 & PLT2), Time to First Byte, # of round trips, average bytes downloaded
3. ANALYZE: Analyze data, find reasons / root cause of slow performance; consult perf experts
4. IMPROVE: Fix the root cause of perf problems in your code / architecture / deployment
5. CONTROL (SUSTAIN): Monitor to ensure perf does not degrade with future changes to the code
Six Sigma – DMAIC

• Define (D): Zero in on a specific problem with a defined return on effort
• Measure (M): Determine current performance of the process
• Analyze (A): Validate key drivers of performance (root cause of the problem)
• Improve (I): Improve performance and validate realized results
• Control (C): Implement controls to ensure continued performance


Project Phases and Deliverables

• Define: Project selection; Project charter; Critical to Customer (CTQ) needs; High-level process map
• Measure: Key output variables (metrics or Y’s); Possible causes of defects (X’s); Data collection and presentation plan; Current performance; Internal/external benchmarking
• Analyze: Key causes (vital few) of defects (X’s); Financial impact of the project
• Improve: Improvement strategy; Prioritize solutions; Tested & measured solutions; Final solutions
• Control: Lock in the results (control plan) – mistake proofing, control points, monitoring plan, positive hand-off of control plan

10
The Six Sigma Project – challenges

• Bottom-up approach
• Difficult to get buy-in
  • Myth: Six Sigma cannot be used for software
  • "We don’t need a Six Sigma project to understand this problem and to fix it"
  • "It’s common sense – we can fix it"
  • "Do it if you have time – on your personal time"
• Measuring the right things
• Avoiding the temptation of jumping to a solution

11
Six Sigma – Define Phase
• Project Selection
  • The right project (strategic, ROI, etc.)
• Project Charter
  • Why are we doing this? Business case
  • Goal, scope, time-line
  • Exec sponsors, team & resources
  • Web App XYZ (which drives 40% of revenue to the company) has performance problems (based on survey). This is impacting a significant % of users and we are seeing a decline in users to the site.
• Customer Focus
  • Voice of Customer (VOC) – what do customers need
  • Critical to Quality (CTQ)
  • Customers have repeatedly complained about the performance of the XYZ app. They are not able to complete transactions in a timely manner.
• Current Situation
  • Process map
  • Current performance

12
The Define Phase: Problem Statement

• Testers did not do performance testing
• Performance bugs were not fixed
• Performance goals/expectations were not clear

Performance of any product/feature should be managed in a better way

13
The Define Phase: Goal

• Ship zero performance bugs
• Find twice the # of performance bugs
• Performance goals should be clear
• Improve the test team’s ability to find 95% of the performance bugs before Beta1
14
Define Perf User Scenarios
• Understand the User (Voice of the Customer)
  • First-time visitor / guest user / authenticated user
  • Returning users with cache / no cache
  • User on the same site/session
  • User from other MS sites/domains (w/ Passport)
  • User demographics (geo, home/office, machine config, consumer/info worker/social, connection speed)
  • Typical user transactions; Back/Forward/Refresh usage
• Perf goal
  • Regional competitor performance
  • What is acceptable performance?
  • Do not use "it should be fast"; try "JP broadband users should be able to get the page in 4 seconds (75th percentile) when they visit for the first time (PLT1), 2 seconds for PLT2"

15
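A measurable goal like "4 seconds at the 75th percentile" implies percentile math over collected page-load samples. A minimal sketch (the sample values and function name are illustrative, not from the deck; this uses the simple nearest-rank method):

```javascript
// Compute the p-th percentile of page-load-time samples (nearest-rank method).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based nearest-rank index
  return sorted[Math.max(rank - 1, 0)];
}

// Hypothetical PLT1 samples (ms) from first-time JP broadband visitors.
const plt1 = [2100, 3900, 4200, 2800, 3500, 5100, 3000, 3700];
const p75 = percentile(plt1, 75);
console.log(`75th percentile PLT1: ${p75} ms, goal met: ${p75 <= 4000}`);
// → 75th percentile PLT1: 3900 ms, goal met: true
```

Stating the goal as a percentile (rather than an average) keeps a few very slow outliers from hiding, or dominating, the typical user's experience.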
The Define Phase: Voice of Customer
• Performance at least as good as the last release

• Comparable to the competitors

• UI responsiveness

• Notification in case of delays

• If it appears slow – it is slow, irrespective of what the data says

16
The Define Phase: Critical To Quality (CTQ)

Examples:

• For common scenarios, improve performance by at least 10% (compared to the last release)

• Any action taking more than 3 seconds should have a progress bar/notification

• Similar/predictable performance in any connection mode (speed, latency)

• Handle failures/major issues gracefully

17
International Internet Routes

Notes:
• Minimum bandwidth to be seen on this map is 14 Gigs
• Does not report bandwidth within a country

Latency and the impact on page load times based on number of round trips (RT = round trips):
• Red line: 34 RT
• Green line: 21 RT
• Dotted green: 10 RT
• Blue line: 3 RT

Source data for timings is the 75th percentile for the country in question, from: http://msncore/performance/netsmart/Netstats.asp
Microsoft Confidential 19
Network Round trip delays…

20
Performance/Load/Stress Testing
• Performance testing
  • User-scenario testing (typical case & best case)
  • Establish a baseline & perform trend analysis
  • Detect performance issues
  • Tools: WebRunner, WANSim, HttpWatch, etc.

• Load testing (volume/longevity/endurance)
  • Expected MAX # of concurrent users
  • Volume of data
  • Very long active sessions

• Stress testing (negative testing)
  • What happens when the load exceeds expectations significantly OR the system goes through resource constraints/failures
  • Does the system gracefully recover from failure? (e.g. Taiwan earthquake in Dec 2006)
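The load-testing idea above can be sketched with a tiny harness: fire N concurrent requests and summarize the latencies. Here `fakeRequest` is a simulated stand-in for a real HTTP call (so the sketch is self-contained); in a real load test it would be replaced by an actual request to the app under test:

```javascript
// Minimal load-test harness: N concurrent "requests", latency summary.
// fakeRequest simulates a 50-150 ms server response; swap in a real call.
function fakeRequest() {
  return new Promise((resolve) => {
    const latency = 50 + Math.random() * 100;
    setTimeout(() => resolve(latency), latency);
  });
}

async function loadTest(concurrentUsers) {
  const latencies = await Promise.all(
    Array.from({ length: concurrentUsers }, fakeRequest)
  );
  latencies.sort((a, b) => a - b);
  return {
    count: latencies.length,
    max: latencies[latencies.length - 1],
    p75: latencies[Math.ceil(0.75 * latencies.length) - 1],
  };
}

loadTest(20).then((stats) => console.log(stats));
```

A longevity (endurance) run is the same loop repeated for hours, watching whether latency or resource use creeps upward over time.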
Six Sigma – Measure
• Measurement Process – Data Collection Plan
  • Testing, monitoring, measurement system analysis, sampling
  • Clear, concise definition of the variables and process involved
• Identify key measures/drivers of performance
  • Y = F(X1, X2, X3, X4, …)
  • Ishikawa (fishbone) diagram – cause & effect diagram
• Internal / external benchmarking
• Baseline current performance & impact
  • Identify and measure current performance and its impact on the customer; collect more data if required

Example: On a 300 kbps connection with 300 ms round-trip time, it takes 6 seconds to load the page for the PLT1 case, 2.5 seconds for PLT2. 20% of our users abandon the page before it loads. Web Page X has 2 HTML files, 3 .js files, 3 .css files and 5 images; the web app opens 2 parallel TCP connections to download them.

22
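The Y = F(X1, X2, …) idea can be made concrete with a crude first-order model of page load time. This is an illustrative sketch, not a measured formula: the round-trip count is an assumed input, and real PLT has further terms (DNS, rendering, parallelism) the model ignores:

```javascript
// Rough page-load-time model: Y = F(rtt, roundTrips, bytes, bandwidth).
function estimatePltMs({ rttMs, roundTrips, totalBytes, bandwidthBps }) {
  const networkChatter = roundTrips * rttMs;                  // serialized round trips
  const transferMs = (totalBytes * 8 / bandwidthBps) * 1000;  // payload transfer time
  return networkChatter + transferMs;
}

// The deck's Measure example: 300 kbps link, 300 ms RTT.
// roundTrips and totalBytes here are assumed values for illustration.
const est = estimatePltMs({
  rttMs: 300,
  roundTrips: 15,
  totalBytes: 56375,
  bandwidthBps: 300000,
});
console.log(`Estimated PLT: ${Math.round(est)} ms`);
// → Estimated PLT: 6003 ms
```

Note how the round-trip term (4500 ms) dwarfs the transfer term (~1500 ms) at high latency, which is why the deck keeps returning to reducing round trips rather than just shrinking bytes.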
Types of Performance Measurements
• Client UI response time
• Server response time
• Load/stress
• Bytes over wire / throughput
• Availability
• Latency (anywhere in the world)
• Browser processing & rendering time
• User machine – resource utilization
• Perceived performance

If the user feels the product is slow, your product is slow – no matter what our data says…

• PLT1: The user visiting the site with NO CACHE
• PLT2: Returning user with CACHE
23
What is Performance

• PLT1: The user visiting the site with NO CACHE
• PLT2: Returning user with CACHE

24
Measure
• Business measurements
  • # of unique users
  • # of page views
  • Click-Thru-Rate (CTR)
  • Revenue
  • Market share
  • Net Promoter
• Other measurements
  • % of people on PLT1 / PLT2
  • Errors
  • Abandon rate (incomplete/closed/click-away)

25
Measure: Key variables for Performance
• Factors contributing to web page performance:
  • # of files, static/dynamic content
  • Page load time
  • Bytes downloaded
  • DNS lookup time
  • Peak hours / load & stress
  • User spread / global?
  • Data centers / CDN / redirects
  • Multiple versions of the app
  • Web page architecture (parallel/sequential download)
  • Compression
  • Expiry dates
  • Keep-Alive
  • PLT1 / PLT2 (caching?)
26
Performance – Cause-Effect Diagram
Effect: End-user page load time. Causes, grouped fishbone-style:
• Application / Web Page: # of files, JS/CSS processing, # of TCP ports, caching/Keep-Alive, static/dynamic content, page architecture, total bytes / compression
• End-User: machine config, bandwidth, latency, geo location
• Server/Infrastructure: DNS setup, data center/CDN, load/stress at peak hours, # of servers, app versioning

Best Practices @ http://MSNPerf 27



Six Sigma – Analyze
• Find the root cause of the problems
• Analyze the data from the Measure phase
  • Identify the vital few variables (X’s)
  • Perform correlation and regression analysis
  • Data stratification
  • Use the 5 Whys technique
  • Hypothesis testing
  • Sources of variation
  • Use the cause-effect diagram
  • Plot data on graphs (trends, releases)
  • Special cause / common cause

Example: The # of files on the site is 10; some of these files can be combined, and 2 files are not compressed. The majority of users who abandon the site are from the UK (latency) and dial-up users from the US (slow connection). The problem started to happen from March 01 (when release 2.2 went live).

28
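The correlation step above can be sketched directly: a Pearson coefficient quantifies how strongly a suspected X (say, page bytes) drives Y (page load time). The sample data below is hypothetical, purely to show the mechanics:

```javascript
// Pearson correlation: how strongly does a suspected X drive Y?
function pearson(xs, ys) {
  const n = xs.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(xs), my = mean(ys);
  let num = 0, dx2 = 0, dy2 = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx, dy = ys[i] - my;
    num += dx * dy;
    dx2 += dx * dx;
    dy2 += dy * dy;
  }
  return num / Math.sqrt(dx2 * dy2);
}

// Hypothetical samples: total page bytes vs. measured load time (ms).
const bytes = [30000, 45000, 56000, 80000, 120000];
const pltMs = [2100, 2900, 3400, 4600, 6800];
console.log(`r = ${pearson(bytes, pltMs).toFixed(3)}`);
```

An r near +1 supports the hypothesis that the X is a key driver; an r near 0 tells you to look elsewhere before "improving" that variable.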
PLT1/PLT2 User Distribution

Analysis:
• From the data, about 17% of the users are in PLT1 and 83% in PLT2
• There are several users in a middle category (16K to 21K)

29
Impact of slow performance on users

As it takes more time to display the page, users STOP the page. Typically, after 5 seconds of wait, 15% of users stop the page from loading, and the % grows with time.

30
User count & Performance across states

31
Bytes & Page Load Time distribution

32
Hourly Users & PLT distribution

33
Six Sigma – Improve
• Improvement strategy & plan
• Improvement solution selection
  • Generate ideas – involve a diverse team
  • Identify and rank solution alternatives
  • Pilot solutions and select the final solution
  • FMEA
  • Design of experiments
• Test and implement the final solution
  • Communication plan
  • To-Be process map
• Track improvement, monitor trends
• Share success/failure stories → Best Practices

34
Six Sigma – Control (Sustain improvements)

• Process monitoring/control plan
  • Mistake proofing (Poka-Yoke)
  • Control chart
  • Response plan
• Document standard process/procedure
  • Train resources
  • Share learning
• Project closure
  • Positive handoff
  • Measure benefits (financial / customer sat)
  • Celebrate
35
Sample Control Plan
• We will get user feedback on the Beta1 build and evaluate whether we need any more improvements

• We will continue to hold a meeting (every other week) to discuss any perf-test-related issues

• Person ABC will do the measurements and monitor the app performance until Beta1, and analyze/report user feedback

• A monthly status mail will be sent (on the 10th of each month) to the project team, Product Manager and VP, reporting the stats related to the project (perf for various categories of users, improvement over time, survey/feedback from users, etc.)

• If we find that we are deviating from our goal, we will call a meeting (within 2 days), analyze the problem and develop a solution
36
Lessons Learned / Suggestions

Lessons learned:
• Six Sigma process measurements helped us uncover problems in a structured way and come up with solutions to eliminate them or minimize their impact.
• Implementing improvements requires involvement from everyone at all levels and all disciplines.

Suggestions:
• By getting all disciplines (dev/pm/test) involved, we can focus on preventing the problems rather than relying on finding and fixing them.
• Integrate new practices into the development cycle once the improvements have been validated.
37
Results
Note: Specific %s & #s are purposely removed from this presentation

• Product benefits:
  • Page size drastically reduced
  • Site performance improved
  • User satisfaction improved
  • Increased click-thru-rate

• Overall impact:
  • Increased focus on performance
  • More investment in performance (expanding the performance group)
38
Visual Round Trip Analyzer

39
Visual Round Trip Analyzer

40
Visual Round-Trip Analyzer (VRTA)
The VRTA chart shows, per client port (browser-to-server connection): the servers contacted, bandwidth utilization, and file transfer duration, color-coded by file type, plotted against time in seconds.

http://msncore/performance/netsmart/VRTA_animation/sample1/main.html
Use more parallel TCP ports

(Side-by-side VRTA traces: Bad vs. Good)

42
Unblock JavaScript

• Standard JS downloads serially and creates bandwidth bottlenecks

• Use a binding methodology to get around this issue

43
Unblock JS
Solution I
This method has been successfully used by the Windows Live Hotmail team:
==================
function AsyncLoad()
{
    var l = arguments.length;
    for (var i = 0; i < l; i++)
    {
        // Writing the closing tag in two pieces keeps the HTML parser
        // from treating the literal "</script>" as the end of this
        // script block.
        document.write("<script src='" + arguments[i] + "'></" + "script>");
    }
}

AsyncLoad(
    "file1.js",
    "file2.js",
    "file3.js");
=====================

From: WR-Client VRTA 44


Expiration dates
304 = Bad!

Relative Time | URI | Content Len | Status Code
0.00  | http://groups.msn.com/people/ | 27659 | 200 -- OK
0.70  | http://c.msn.com/c.gif | 42 | 200 -- OK
4.53  | http://groups.msn.com/global/css.htm | 0 | 304 -- Not Modified
4.88  | http://groups.msn.com/spacer.gif | 0 | 304 -- Not Modified
4.89  | http://groups.msn.com/home_icons_chat_48x40.gif | 0 | 304 -- Not Modified
4.90  | http://www.match.com/msnprofile/profile.aspx | 2481 | 200 -- OK
5.22  | http://groups.msn.com/home_icons_IM_48x40.gif | 0 | 304 -- Not Modified
5.55  | http://groups.msn.com/msnmess_themes_65x60.gif | 0 | 304 -- Not Modified
5.57  | http://groups.msn.com/home_icons_heart_42x39.gif | 0 | 304 -- Not Modified
5.81  | http://www.match.com/lib.msnprofiles.master.style.css | 383 | 200 -- OK
5.81  | http://www.match.com/lib.msnprofiles.style1.css | 214 | 200 -- OK
5.89  | http://groups.msn.com/home_icons_MD_48x40.gif | 0 | 304 -- Not Modified
5.91  | http://groups.msn.com/home_icons_groups_48x40.gif | 0 | 304 -- Not Modified
6.20  | http://www.match.com/libraries/lib.template.globaljs.js | 0 | 304 -- Not Modified
6.22  | http://groups.msn.com/msn_phone3n_48x40.gif | 0 | 304 -- Not Modified
6.24  | http://groups.msn.com/whitepages/msnwhitepages.htm | 0 | 304 -- Not Modified
6.57  | http://groups.msn.com/home_icons_msnbutterfly_48x40_c.gif | 0 | 304 -- Not Modified
6.61  | http://xml.eshop.msn.com/xmlbuddy/eShopOffer.aspx | 1227 | 200 -- OK
6.86  | http://images.match.com/match/matchscene/articles/spotlight1617.jpg | 8708 | 200 -- OK
7.35  | http://groups.msn.com/match_com_header_blue_matte.gif | 0 | 304 -- Not Modified
7.35  | http://groups.msn.com/spacer.gif | 0 | 304 -- Not Modified
7.36  | http://view.atdmt.com/MSN/iview/msnnkhac001300x250xWBCK4000109msn/direct;wi.300;hi.250/01 | 320 | 200 -- OK
7.43  | http://rad.msn.com/ADSAdClient31.dll | 489 | 200 -- OK
7.69  | http://groups.msn.com/1195B_goBtn.gif | 0 | 304 -- Not Modified
7.71  | http://groups.msn.com/gfol_180x150_survey_express_jan03_2.gif | 0 | 304 -- Not Modified
8.91  | http://att.atdmt.com/b/MSMSNMATCVON/Harmonics_WorstNightmare_2499_300x250.gif | 14852 | 200 -- OK
10.58 | (total) | 56375 |

Max-Age: Supersedes Expiration

45
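Each 304 above still costs a full round trip just to learn the file hasn't changed; a far-future expiration (Cache-Control: max-age, which supersedes Expires) lets returning visitors skip those requests entirely. A sketch of the header-selection logic (the function name and URL patterns are illustrative):

```javascript
// Pick caching headers per resource type: static assets get a far-future
// max-age (no revalidation round trip); HTML stays revalidated.
function cacheHeaders(url) {
  const staticAsset = /\.(gif|jpg|png|css|js)$/.test(url);
  return staticAsset
    ? { "Cache-Control": "public, max-age=31536000" } // 1 year; supersedes Expires
    : { "Cache-Control": "no-cache" };               // always revalidate
}

console.log(cacheHeaders("/spacer.gif")["Cache-Control"]);
// → public, max-age=31536000
```

The trade-off: a long max-age means you cannot force an update to that URL, so cache-busting (versioned file names) is usually paired with it.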
Use Compression

• Types of compression: static, dynamic
• Levels 0 through 10
• Example: 26 KB / 5.5 ≈ 5 KB; ~20 KB bandwidth savings

46
Keep-Alive TCP ports

47
Questions?

Mukesh Jain
Mukesh.Jain@microsoft.com

48
