
Lean Six Sigma Green Belt Training

ANALYSE PHASE

TCS Internal

DMAIC Roadmap
Define: Identify project CTQs; Develop project charter; Prepare high-level process map (SIPOC); Complete stakeholder analysis
Measure: Establish performance standard; Understand as-is process; Sampling & data collection; Assess measurement system variation; Estimate current capability
Analyze: Identify potential causes; Identify variation using graphical analysis; Prioritize & validate causes
Improve: Define y = f(x); Identify solutions; Prioritize and implement solutions; Measure improvements
Control: Optimize & refine solutions; Control X's & monitor Y's; Measure actual benefits; Close & hand over project

Copyright 2013 Tata Consultancy Services limited

22 November 2013

Why Analyze?
To understand the problem and identify root causes
To avoid actions based on intuition, preconceived ideas and symptoms
To develop sustainable process improvements for long-term benefits
To recalibrate the project scope
To establish performance goals for the process
To find the Xs that affect Y most

Identify The Vital Few
[Figure: Input measures (Xs) → Process, with process measures (Xs) → Outputs (Ys)]
Variation in the output Y depends on the process variables as well as the input variables (Xs).

The Funnel Effect
$Y = f(x_1, x_2, x_3, x_4, \ldots, x_n)$
[Figure: the Analyse funnel, narrowing from 30+ variables to 15-20, then 10-15, then 5-10, and finally 3-5 variables]
Root-cause identification is a process of elimination.



Analyze Phase Flow:
Identify variation using graphical tools (box plot, scatter plot, Pareto analysis) → Validate causes (hypothesis testing)


Why Graphical Analysis
Graphs help us understand the nature of variation.
Graphs make the nature of data more accessible to the human mind.
Graphs help display the context of the data.
Graphs should be the primary presentation tool in data analysis.
If you can't show it graphically, you probably don't have a good conclusion.

Source: Donald Wheeler, Understanding Variation



Box Plot
Purpose:
To begin an understanding of the distribution of the data
To get a quick, graphical comparison of two or more processes
When:
In the first stages of data analysis
Elements of a box plot, from top to bottom:
Outlier: any point outside the lower or upper limit
Upper whisker: maximum observation that falls within the upper limit, $Q_3 + 1.5\,(Q_3 - Q_1)$
75th percentile ($Q_3$)
Median (50th percentile)
25th percentile ($Q_1$)
Lower whisker: minimum observation that falls within the lower limit, $Q_1 - 1.5\,(Q_3 - Q_1)$
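The limits above can be computed directly. The sketch below is a minimal Python illustration, not Minitab's implementation; it assumes quartiles are interpolated at position (n + 1)p of the sorted data (Python's `statistics.quantiles` with `method="exclusive"`), which we believe matches Minitab's default but have not verified. The turnaround times are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(data):
    """Quartiles, 1.5 x IQR limits, whisker ends and outliers for one box plot."""
    # method="exclusive" interpolates Q1, median and Q3 at positions (n + 1)p
    q1, med, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    return {
        "Q1": q1,
        "median": med,
        "Q3": q3,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        # Whiskers stop at the most extreme observations inside the limits
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Anything beyond the limits is plotted individually as an outlier
        "outliers": sorted(x for x in data if x < lower_limit or x > upper_limit),
    }


# Hypothetical turnaround times (minutes), for illustration only
tat = [12, 14, 15, 15, 16, 18, 19, 21, 22, 40]
for name, value in box_plot_summary(tat).items():
    print(f"{name:>12}: {value}")
```

For this data Q1 = 14.75 and Q3 = 21.25, so the upper limit is 31.0: the value 40 is plotted as an outlier and the upper whisker stops at 22.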


Box Plot
Things to look for in a box plot:
Are the boxes about equal or different?
Do the groups appear normal (symmetrical box halves and whiskers) or skewed?
Are there outliers?
[Figure: Boxplots of Op1 Cycl and Op2 Cycl, means indicated by solid circles; y-axis from 0 to 20]

Box Plot Example
Minitab commands: Graph > Boxplot; Graph > Histogram


Box Plot - Example
[Figure: Histogram of TAT - Agent 1 (TAT about 30 to 70), Boxplot of TAT - Agent 1 and TAT - Agent 2, and Histogram of TAT - Agent 2 (TAT about 20 to 60); histogram y-axes show Frequency]
Can you now interpret box plots?


Scatter Plot
A scatter plot can be used when:
Both X and Y are continuous
We want to associate Y with a single X
We want to judge the strength of the relationship between Y and X
The strength of that relationship is quantified by the coefficient of correlation, r.


Correlation
r always lies between -1 and +1.
A positive value of r means both variables move in the same direction.
A negative value of r means the variables move in opposite directions.
A value of r equal to zero means there is no linear correlation between the two variables.
The higher the absolute value of r, the stronger the correlation between Y and X.
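The coefficient r described above is Pearson's correlation, $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$. A minimal Python sketch, with made-up data chosen to hit the extreme cases (on Python 3.10+, `statistics.correlation` computes the same quantity):

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # perfect positive: 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))   # perfect negative: -1.0
# y = x**2 on a symmetric range: a clear relationship, yet r = 0.0
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```

The last case is the "no linear correlation" pattern: y depends exactly on x, but r is zero because the relationship is not a straight line. (The function divides by zero if either variable is constant.)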


Types of Correlations
[Figure: six scatter plots of y (effect) against x (cause), each with n = 30: r = 0.9 "Positive Correlation"; r = 0.6 "Positive Correlation May Be Present"; r = -0.6 "Negative Correlation May Be Present"; r = -0.9 "Negative Correlation"; r = 0.0 "No Correlation"; r = 0.0 "No Linear Correlation"]
Correlation measures the linear association between the output (Y) and one input variable (X) only.


Scatter Plot & Correlation - Example


Minitab Command: Stat > Basic Statistics > Correlation
Variables: On-boarding Test score & Floor Performance Score


Scatter Plot & Correlation - Example


Minitab Command: Graph > Scatterplot
Y variable: Floor Performance Score; X variable: On-boarding Test Score


Scatter Plot & Correlation - Example
Minitab output:
Correlations: On-boarding Test Score, Floor Performance Score
Pearson correlation (r) of On-boarding Test Score and Floor Performance Score = 0.786
[Figure: Scatterplot of Floor Performance Score (about 70 to 100) vs On-boarding Test Score (about 50 to 75)]
The r value indicates a reasonably strong positive correlation.



Scatter Plot vs Correlation Analysis
A scatter plot suggests a relationship between two variables but does not quantify it.
Correlation analysis quantifies the strength or degree of the relationship in terms of the coefficient of correlation, r.


Pareto
What is it?
The Pareto Principle states that only a "vital few" factors are responsible for producing most of the problems. Applied to quality improvement, it means that a great majority of problems (about 80%) are produced by a few key causes (about 20%). If we correct these few key causes, we have a greater probability of success.
Why use it?
So that the team can quickly focus its efforts on the key causes of a problem.
When to use it?
When the data is discrete, i.e., classified into types with a frequency for each type.


Pareto - Example
Minitab Command: Stat > Quality Tools > Pareto Chart
Chart defects table: Labels in: Query Type; Frequencies in: Total received


Pareto - Example
[Figure: Pareto Chart of No. of Queries rec'd by query sub-type; left axis Total (0 to 20000), right axis Percent (0 to 100); sub-type labels are not legible]
Sub-type (rank)  1     2     3     4     5     6     7     8     9     10    Other
Total            4116  3431  2709  2685  1749  800   506   269   247   234   864
Percent          23.4  19.5  15.4  15.2  9.9   4.5   2.9   1.5   1.4   1.3   4.9
Cum %            23.4  42.9  58.2  73.5  83.4  88.0  90.8  92.4  93.8  95.1  100.0

Which factors do you consider vital from the above Pareto chart?
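The Percent and Cum % rows are simple arithmetic on the counts. This Python sketch recomputes them from the totals above; since the sub-type names are not legible, the labels S1 to S10 are hypothetical placeholders. The `vital_few` helper applies the common 80% rule of thumb and is an illustrative assumption, not a Minitab feature.

```python
def pareto_table(counts, other=None):
    """Rows of (label, count, percent, cumulative percent) in Pareto order.

    counts: (label, count) pairs, sorted here from largest to smallest.
    other:  optional count for a combined "Other" bar, which always goes last.
    """
    rows = sorted(counts, key=lambda item: item[1], reverse=True)
    if other is not None:
        rows.append(("Other", other))
    total = sum(count for _, count in rows)
    table, running = [], 0
    for label, count in rows:
        running += count
        table.append((label, count,
                      round(100 * count / total, 1),
                      round(100 * running / total, 1)))
    return table


def vital_few(table, threshold=80.0):
    """Labels up to the bar where the cumulative percent first reaches threshold."""
    labels = []
    for label, _, _, cum_pct in table:
        labels.append(label)
        if cum_pct >= threshold:
            break
    return labels


# Totals from the chart above; S1..S10 are placeholder labels for the sub-types
query_counts = [("S1", 4116), ("S2", 3431), ("S3", 2709), ("S4", 2685), ("S5", 1749),
                ("S6", 800), ("S7", 506), ("S8", 269), ("S9", 247), ("S10", 234)]
table = pareto_table(query_counts, other=864)
for row in table:
    print("{:<6}{:>6}{:>7}{:>7}".format(*row))
print("Vital few (80% rule):", vital_few(table))
```

Running it reproduces the Percent and Cum % rows of the chart. The 80% cut-off in `vital_few` is a rule of thumb, not a statistical test, so the team still has to judge where the vital few end.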



What is Hypothesis Testing
Measurements are organized into statistics that provide insight into the spread, shape, consistency and location of the process.
A hypothesis test simply compares reality to an assumption and asks, "Did things change?"
A hypothesis test checks whether real data fit the model.
A hypothesis test compares a sample statistic to a hypothesized value.


Core of Hypothesis Testing
Statistical inference relies on sampling from a population of data.
Sampling saves cost and time.
Sampling provides a good alternative to collecting all the data.
Identifying a specific confidence level allows us to make reasonable business decisions.

Sampling From a Population
[Figure: Entire population of data → Sample → Analysis → Statistical inference about the population]
Parameters (population): $\mu$ = mean of the population; $\sigma$ = standard deviation of the population
Statistics (sample): $\bar{x}$ = mean of the sample; $s$ = standard deviation of the sample


Common terms in Hypothesis Testing
The Null Hypothesis ($H_0$)
States that there is no difference (no change).
It is assumed to be true unless the data provide strong evidence otherwise.
You never prove it; you only fail to reject it.
The Alternative Hypothesis ($H_a$)
The statement that we would like to show is true.
It usually defines the direction of desirable change. The alternative hypothesis can be $>$, $<$, or $\neq$.
You gather data to show that this difference really exists.
Statistical Significance:
If the probability value (p-value) is as small as or smaller than a preset level $\alpha$, we say that the data are statistically significant at level $\alpha$.
Level of Significance ($\alpha$):
The probability of rejecting the null hypothesis when the null hypothesis is really true. The level of significance is always set before the hypothesis test is done. It is most often set to $\alpha = 0.05$.
Decision Rule:
We reject the null hypothesis in favor of the alternative hypothesis if our p-value is less than $\alpha$.
If P is low, $H_0$ must go!


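Hypothesis Testing P Value
Alpha ($\alpha$) is the maximum acceptable probability of making a Type I error, i.e. rejecting $H_0$ when it is actually true (in other words, the USL for Type I error).
The p-value is the probability, assuming $H_0$ is true, of getting a result at least as extreme as the one observed. A small p-value means the observed data would be unlikely if $H_0$ were true.
For most decisions, the acceptable level of Type I error is set at $\alpha = 0.05$.
Thus, any p-value less than 0.05 means we reject the null hypothesis.

P < $\alpha$: Reject $H_0$
P > $\alpha$: Fail to reject $H_0$ (we do not prove $H_0$; we only lack evidence against it)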
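To see what $\alpha$ means in practice, the sketch below repeatedly tests a null hypothesis that is true by construction and counts how often it is (wrongly) rejected; the rejection rate should settle near $\alpha$. For simplicity it uses a one-sample z-test with known $\sigma$ rather than a t-test, and the sample size, number of trials and random seed are arbitrary assumptions.

```python
import math
import random


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value of a one-sample z-test (population sigma assumed known)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def type_i_error_rate(alpha=0.05, n=30, trials=10_000, seed=1):
    """Fraction of tests that reject a null hypothesis which is actually TRUE."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 (mean = 0) is true by construction: data come from N(0, 1)
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / trials


print(type_i_error_rate())   # a value close to 0.05
```

Every rejection counted here is a Type I error, so the printed fraction is an estimate of the Type I error rate, which is exactly what $\alpha$ controls.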


Hypothesis Testing Road Map
Hypothesis testing: determining statistical differences within and between populations
Continuous data
  Comparing means
    One sample: 1-sample t-test
    Two samples: 2-sample t-test
    Multiple samples: ANOVA
  Comparing variances
    Two samples: test of equal variances
Discrete data
  Comparing proportions: 1-proportion test, 2-proportion test
  Chi-square test


Process Scenarios for Hypothesis Tests
Tool: Process scenarios
1-Sample t-test: To compare a team's performance against a target. Data set containing performance scores, such as daily/weekly scores. Sample size can be less than 30, but larger is better.
2-Sample t-test: To compare one team's performance against another's, or to compare a team's performance before and after an improvement. Data set containing performance scores, such as daily/weekly scores. Sample size can be less than 30, but larger is better.
ANOVA: To compare the performance of multiple teams on a metric such as quality score. Data set containing performance scores, such as daily/weekly scores of multiple teams.
Test of equal variances: To compare the variance or standard deviation of one team's performance with another's. Data set containing performance scores, such as daily/weekly scores.
1-Proportion test: To compare the proportion of defects/defectives of a team against a target.
2-Proportion test: To compare the proportion of defects/defectives of one team against another team.
Chi-square test: To check association between variables, for example whether there is any association between two teams and their error types.


Hypothesis Testing t-Test Procedure
The t-test is mainly used to test for differences in means.
Theoretically, the t-test can be used even for small sample sizes (as small as 10) when the data are normally distributed.
The null hypothesis is that the averages of the two groups are the same:
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$ (or $\mu_1 > \mu_2$, or $\mu_1 < \mu_2$)


One Sample T-test
A one-sample t-test is used to compare the performance of a process with a set standard, historical data or a target.
Example: the historical average CSI of a process is 4.35. The process manager wants to understand whether the present CSI, based on the data collected in the last 15 days, differs from it.


One-sample T-test
Example:
Organization ABC measures the number of days it takes to receive payment from XYZ after invoices are sent. Historical data suggest that payments used to be received within 25 days; since then, some improvement actions were implemented. The process owner wants to check whether the improvement plans had any impact on performance.
Sample data were collected. The times taken to receive the payments are: 22, 23, 22, 25, 28, 27, 28, 25, 23, 21 days.
Establish, with 95% confidence, whether we get the money in 25 days on average.
Instructions:
Stat > Basic Statistics > 1-Sample t
Enter data as: Variable C1 Days
Test mean: 25; Alternative: not equal


One-sample T-test using Minitab
Stat > Basic Statistics > 1-Sample t
Since P > 0.05, _________ the null hypothesis.

Minitab output:
T-Test of the Mean
Test of mu = 25.000 vs mu not = 25.000
Variable  N   Mean    StDev  SE Mean  T      P
Days      10  24.400  2.591  0.819    -0.73  0.48

Interpretation: Since p > 0.05, we fail to reject $H_0$: the data show no statistically significant difference between the current average payment time and 25 days, i.e. no evidence that the improvement plan changed process performance.
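The T and P values above can be reproduced by hand: $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ with $n - 1$ degrees of freedom. The Python sketch below is illustrative only; to stay within the standard library it integrates the t density numerically (Simpson's rule) instead of calling a statistics package, and the helper names are our own.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_central_area(t, df, steps=2000):
    """P(0 <= T <= |t|), integrated with Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def one_sample_t(data, mu0):
    """t statistic and two-sided p-value for H0: mu = mu0."""
    n = len(data)
    se_mean = stdev(data) / math.sqrt(n)     # standard error of the mean
    t = (mean(data) - mu0) / se_mean
    p_two_sided = 1 - 2 * t_central_area(t, n - 1)
    return t, p_two_sided


days = [22, 23, 22, 25, 28, 27, 28, 25, 23, 21]
t, p = one_sample_t(days, mu0=25)
print(f"Mean = {mean(days):.3f}  StDev = {stdev(days):.3f}  T = {t:.2f}  P = {p:.2f}")
```

It gives a mean of 24.400, StDev 2.591 and T = -0.73 with a two-sided P of about 0.48, in line with the Minitab output. In practice a library routine such as SciPy's `scipy.stats.ttest_1samp` would be the usual choice.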

Hypothesis Testing 2 Sample T Test
The 2-sample t-test is used to compare the averages of two sets of readings.
The test is used when the dependent variable (response, Y) is continuous and the independent variable (factor, X) is discrete.
The test can be performed on data from independent samples stacked in a single column, with a second, discrete variable in another column identifying the group.
Variances may be equal or unequal.
The null hypothesis is that the population means are not different:
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$ (or $\mu_1 > \mu_2$, or $\mu_1 < \mu_2$)


Hypothesis Testing 2 Sample T Test
Example:
The time required to install a software package by new and experienced engineers is given below. Establish whether experienced engineers are better (faster).
Experienced: 15.80, 14.19, 15.32, 14.65, 12.25, 15.42, 12.92, 13.98, 16.28, 14.53
New: 16.10, 17.24, 17.65, 16.80, 18.42, 18.12, 15.24, 16.14, 15.26, 14.65


Hypothesis Testing 2 Sample T Test
Stat > Basic Statistics > 2-Sample t...


Hypothesis Testing 2 Sample T Test
Two-Sample T-Test and CI: Experienced, New
Two-sample T for Experienced vs New
             N   Mean   StDev  SE Mean
Experienced  10  14.53  1.26   0.40
New          10  16.56  1.29   0.41
Difference = mu (Experienced) - mu (New)
Estimate for difference: -2.028
95% CI for difference: (-3.234, -0.822)
T-Test of difference = 0 (vs not =): T-Value = -3.55  P-Value = 0.002  DF = 17
Alternative hypothesis: $H_a: \mu_1 \neq \mu_2$
Interpretation: Since p < 0.05, the mean installation time of experienced engineers differs significantly from that of new engineers.

Two-Sample T-Test and CI: Experienced, New
Two-sample T for Experienced vs New
             N   Mean   StDev  SE Mean
Experienced  10  14.53  1.26   0.40
New          10  16.56  1.29   0.41
Difference = mu (Experienced) - mu (New)
Estimate for difference: -2.028
95% upper bound for difference: -1.034
T-Test of difference = 0 (vs <): T-Value = -3.55  P-Value = 0.001  DF = 17
Alternative hypothesis: $H_a: \mu_1 < \mu_2$
Interpretation: Since p < 0.05, experienced engineers take significantly less time than new engineers.
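The DF = 17 in the output suggests a test that does not assume equal variances (Welch's test; a pooled test would have 18 df), whose fractional Welch-Satterthwaite df here is about 17.99 and appears to be rounded down by Minitab. The Python sketch below reproduces T = -3.55 under that assumption. It keeps the fractional df and integrates the t density numerically (Simpson's rule), so its P-values may differ from Minitab's in the last digit; the helper names are our own.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_central_area(t, df, steps=2000):
    """P(0 <= T <= |t|), integrated with Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def welch_t_test(a, b):
    """Welch two-sample t-test (variances not assumed equal)."""
    va = variance(a) / len(a)        # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    area = t_central_area(t, df)
    p_not_equal = 1 - 2 * area                        # Ha: mu_a != mu_b
    p_less = 0.5 - area if t < 0 else 0.5 + area      # Ha: mu_a < mu_b
    return t, df, p_not_equal, p_less


experienced = [15.80, 14.19, 15.32, 14.65, 12.25, 15.42, 12.92, 13.98, 16.28, 14.53]
new = [16.10, 17.24, 17.65, 16.80, 18.42, 18.12, 15.24, 16.14, 15.26, 14.65]
t, df, p_ne, p_lt = welch_t_test(experienced, new)
print(f"T = {t:.2f}, df = {df:.2f}, P (not =) = {p_ne:.4f}, P (<) = {p_lt:.4f}")
```

The two-sided P comes out near 0.002 and the one-sided P near 0.001, as in the two Minitab runs. With SciPy, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.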


Analysis Of Variance (ANOVA)
One-way ANOVA is used to compare the means of two or more levels of a single factor (groups of data). In this sense, it is an extension of the two-sample t-test.
Comparing all groups at once with ANOVA is preferable to comparing two groups at a time with two-sample t-tests (pooled variance), because running many pairwise tests inflates the overall chance of a Type I error.

Hypothesis:
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots$
versus
$H_a$: at least one mean is different


ANOVA Assumption
The purpose of one-way ANOVA is to compare means. The standard one-way ANOVA compares group means under the assumption that the variances within each group are statistically the same.
ANOVA has two assumptions:
The data for each group should be normal.
The data sets have equal variances:
$H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \cdots$
versus
$H_a$: at least one variance is different
The ANOVA test is fairly robust and usually gives good results even if these assumptions are not fully met, particularly when the groups are of equal size.


ANOVA: Example
A contact centre receives calls for different processes within the organization. The contact centre head wanted to understand whether the response time is affected by the process.
Response time data were collected for the 3 processes (Process A, Process B, Process C) for ANOVA.
[Table: response times by process, 13 observations each; legible values, column order not recoverable: 3.5, 6.5, 3.5, 5.5, 4.5, 5.5, 5.5, 7.5, 6.5, 4.5, 6.5, 5.5, 5.5, 4.4, 6.5, 5.4, 6.5]

ANOVA: Assumptions Testing
Stat > Basic Statistics > Graphical Summary
[Figure: Minitab graphical summaries (histogram, box plot and 95% confidence intervals for mean and median) of response time, titled Summary for Process B and Summary for Process C]

Anderson-Darling normality test and summary statistics:
Statistic            Process B          Process C
A-Squared            0.46               0.59
P-Value              0.212              0.101
Mean                 5.9615             6.3462
StDev                0.8282             0.6253
Variance             0.6859             0.3910
Skewness             -1.02716           0.387879
Kurtosis             1.44419            -0.844201
N                    13                 13
Minimum              4.0000             5.5000
1st Quartile         5.5000             6.0000
Median               6.0000             6.0000
3rd Quartile         6.5000             7.0000
Maximum              7.0000             7.5000
95% CI for Mean      5.4611 to 6.4620   5.9683 to 6.7240
95% CI for Median    5.5000 to 6.5000   6.0000 to 7.0000
95% CI for StDev     0.5939 to 1.3671   0.4484 to 1.0322

All three processes' response time data pass the normality test (Anderson-Darling p-value > 0.05).
Even if the data are not normal, one can go ahead with the test of equal variances (Levene's test does not assume normality).


ANOVA: Assumptions Testing
Stat > ANOVA > Test for Equal Variances
Assumption testing: the test for equal variances requires stacked data (all responses in one column, with the process name in another).


ANOVA: Example
Test for Equal Variances: Stacked versus Process
95% Bonferroni confidence intervals for standard deviations
Process    N   Lower     StDev    Upper
Process A  13  0.717000  1.07094  2.00301
Process B  13  0.554475  0.82819  1.54898
Process C  13  0.418654  0.62532  1.16955
Bartlett's Test (Normal Distribution): Test statistic = 3.25, p-value = 0.197
Levene's Test (Any Continuous Distribution): Test statistic = 1.84, p-value = 0.173
[Figure: Test for Equal Variances for Stacked, showing the 95% Bonferroni confidence intervals for the StDevs of Process A, B and C on a scale from 0.50 to 2.00]

Since the p-value of Bartlett's test is greater than 0.05, the data pass the equal variances assumption.
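Bartlett's statistic depends only on the group sizes and standard deviations, so it can be checked against the output above. A minimal Python sketch, assuming the textbook form of Bartlett's test (Levene's test needs the raw data, which is not fully shown here, so it is not reproduced):

```python
import math


def bartlett_from_summary(sizes, std_devs):
    """Bartlett's test statistic for equal variances from group sizes and StDevs."""
    k = len(sizes)
    dof = [n - 1 for n in sizes]
    total_dof = sum(dof)                                  # N - k
    variances = [s * s for s in std_devs]
    pooled = sum(d * v for d, v in zip(dof, variances)) / total_dof
    numerator = (total_dof * math.log(pooled)
                 - sum(d * math.log(v) for d, v in zip(dof, variances)))
    correction = 1 + (sum(1 / d for d in dof) - 1 / total_dof) / (3 * (k - 1))
    return numerator / correction


stat = bartlett_from_summary([13, 13, 13], [1.07094, 0.82819, 0.62532])
# With k = 3 groups the statistic is compared with a chi-square distribution
# on k - 1 = 2 df, whose upper tail has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"Test statistic = {stat:.2f}, p-value = {p_value:.3f}")
```

With the StDevs above it gives a statistic of about 3.25 and a p-value of about 0.197, matching Minitab. The exp(-x/2) shortcut is specific to 2 degrees of freedom (three groups); other group counts need a general chi-square distribution function.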


ANOVA: Example
Stat > ANOVA > One-Way (Unstacked)...


ANOVA: Example
One-way ANOVA: Process A, Process B, Process C
Source  DF  SS      MS     F     P
Factor   2  12.043  6.022  8.12  0.001
Error   36  26.686  0.741
Total   38  38.729
S = 0.8610   R-Sq = 31.10%   R-Sq(adj) = 27.27%
Level      N   Mean    StDev
Process A  13  5.0231  1.0709
Process B  13  5.9615  0.8282
Process C  13  6.3462  0.6253
[Figure: individual 95% CIs for each process mean, based on the pooled StDev, on a scale from 4.80 to 6.60]
Pooled StDev = 0.8610

Interpretation:
Since p < 0.05, the difference in mean response time between the processes is statistically significant, so the process can be called a significant factor.
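Because the group sizes, means and StDevs are all shown above, the ANOVA table can be rebuilt from them: SS Factor = $\sum n_i(\bar{x}_i - \bar{x})^2$, SS Error = $\sum (n_i - 1)s_i^2$, and F = MS Factor / MS Error. This Python sketch does that; working from rounded summary values, it matches the Minitab table to about three significant figures. The closed-form p-value at the end holds only when the factor has 2 degrees of freedom (three groups).

```python
def one_way_anova_from_summary(sizes, means, std_devs):
    """One-way ANOVA table computed from group sizes, means and standard deviations."""
    k, n_total = len(sizes), sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    ss_factor = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_error = sum((n - 1) * s ** 2 for n, s in zip(sizes, std_devs))
    df_factor, df_error = k - 1, n_total - k
    ms_factor, ms_error = ss_factor / df_factor, ss_error / df_error
    f_stat = ms_factor / ms_error
    result = {
        "SS Factor": ss_factor, "SS Error": ss_error,
        "MS Factor": ms_factor, "MS Error": ms_error,
        "F": f_stat, "R-Sq": ss_factor / (ss_factor + ss_error),
    }
    if df_factor == 2:
        # Upper tail of F(2, d) has the closed form (1 + 2F/d) ** (-d/2)
        result["P"] = (1 + 2 * f_stat / df_error) ** (-df_error / 2)
    return result


anova = one_way_anova_from_summary(
    sizes=[13, 13, 13],
    means=[5.0231, 5.9615, 6.3462],
    std_devs=[1.0709, 0.8282, 0.6253],
)
for name, value in anova.items():
    print(f"{name:>9}: {value:.4f}")
```

The F of about 8.12, R-Sq of about 31.1% and a p-value near 0.001 line up with the Minitab table.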


Proportion Testing
Proportion testing is used to understand whether a proportion (for example, of defectives) differs significantly from a target, or between factor levels.
It can be of 2 types:
One Proportion Test:
$H_0: p_A = p_0$
$H_a: p_A \neq p_0$ (or $>$, or $<$)
Two Proportion Test:
$H_0: p_A = p_B$
$H_a: p_A \neq p_B$ (or $>$, or $<$)


Proportion Testing Example
1 Proportion Test:
An HR Services complaint resolution process resolves the complaints raised by associates. The table gives the % of complaints resolved by the process within the 8-hour timeline each day. The process manager claims that the process resolves at least 30% of the complaints on more than 80% of the occasions (days). Is it possible to use a 1-proportion test to validate the process manager's claim?


1 Proportion Test: Example
Day                    1   2   3   4   5   6   7   8   9  10
% complaints resolved 25  35  30  36  32  33  34  36  28  30
Day                   11  12  13  14  15  16  17  18  19  20
% complaints resolved 29  32  31  28  35  25  35  30  36  32
Day                   21  22  23  24  25  26  27  28  29  30
% complaints resolved 33  34  36  28  30  29  32  31  28  35

Data suggests:
Total no. of trials (days): 30
No. of days with >= 30% of complaints resolved: 22

One Proportion Test:
$H_0: p_A = 0.8$
$H_a: p_A > 0.8$


1 Proportion Test: Example


Stat > Basic Statistics > 1 Proportion


1 Proportion Test: Example
Minitab Project Report
Test and CI for One Proportion
Test of p = 0.8 vs p > 0.8
Sample  X   N   Sample p  95% Lower Bound  Exact P-Value
1       22  30  0.733333  0.570066         0.871

Interpretation:
Since p > 0.05 in the 1-proportion test, we cannot conclude that the team resolves at least 30% of complaints per day on more than 80% of the days. The process manager's claim of providing resolution on more than 80% of the occasions is not supported by the data.
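The exact P-value above is a binomial tail probability: if the true proportion were 0.8, how likely would 22 or more successful days out of 30 be? A minimal Python sketch, assuming Minitab's exact method for this one-sided test is this binomial calculation (the numbers agree):

```python
from math import comb


def one_proportion_exact_p_greater(x, n, p0):
    """Exact p-value for H0: p = p0 vs Ha: p > p0, i.e. P(X >= x) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(x, n + 1))


x, n = 22, 30          # days with >= 30% of complaints resolved, total days
p_value = one_proportion_exact_p_greater(x, n, 0.8)
print(f"Sample p = {x / n:.6f}, exact P-value = {p_value:.3f}")
```

It returns a P-value of about 0.871. Since the sample proportion (0.733) is already below 0.8, the data cannot provide evidence for $H_a: p > 0.8$.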


2 Proportion Test: Example
2 Proportion Tests:
In an invoice processing process, the process manager is thinking of promoting one of two team members, A or B. For this, he wants to look at the invoices they processed in the last 7 days to get a feel for who performs better. Can you use a 2-proportion test to identify the better performer?

Data suggests:
Team Member A:
Total no. of invoices processed: 60
Total no. of invoices without error: 32
Team Member B:
Total no. of invoices processed: 65
Total no. of invoices without error: 48

2 Proportion Test: Example
Stat > Basic Statistics > 2 Proportions
Test and CI for Two Proportions
Sample  X   N   Sample p
1       32  60  0.533333
2       48  65  0.738462
Difference = p (1) - p (2)
Estimate for difference: -0.205128
95% upper bound for difference: -0.0663404
Test for difference = 0 (vs < 0): Z = -2.43  P-Value = 0.008
Fisher's exact test: P-Value = 0.014

Interpretation:
Since p < 0.05 in the 2-proportion test, Team Member A's proportion of error-free invoices is significantly lower than Team Member B's. Hence the process manager can select member B for promotion.
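The Z value above can be reproduced with the normal approximation. In this Python sketch the standard error uses each sample's own proportion (unpooled), which is the variant that reproduces Z = -2.43 here; a pooled standard error would give a slightly different Z. Fisher's exact P-value is not reproduced.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, confidence=0.95):
    """Z test of H0: p1 = p2 vs Ha: p1 < p2, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    p_value = NormalDist().cdf(z)                         # lower-tail probability
    upper_bound = diff + NormalDist().inv_cdf(confidence) * se
    return diff, upper_bound, z, p_value


# Error-free invoices: member A 32 of 60, member B 48 of 65
diff, upper, z, p = two_proportion_z(32, 60, 48, 65)
print(f"Estimate = {diff:.6f}, upper bound = {upper:.7f}, Z = {z:.2f}, P = {p:.3f}")
```

The estimate, one-sided upper bound, Z and P-value agree with the Minitab output above.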

2 Proportion Test - Exercise
On auditing two pizza outlets, 7 deliveries out of 155 were late at the first outlet and 22 out of 200 were late at the second. Determine, with 99% confidence, whether the two proportions are different.


Contingency Table
A contingency table is used when both the output and input variables are attribute (discrete) data. It uses the chi-square test to reach a conclusion.

Chi Square Test:
$H_0$: Y is independent of X
$H_a$: Y is not independent of X


Contingency Table : Example
During a project looking into recruitment, the Personnel Department wanted to understand whether the chances of being hired depend on the age of the person. Can the link between age and the chances of being hired be statistically validated?

Hypothesis:
$H_0$: Hiring of a person is independent of his/her age
$H_a$: Hiring of a person is not independent of his/her age


Contingency Table : Example
Data were collected for all the candidates who were taken through the recruitment process in the last one year.
         Hired  Not Hired  Total
Old      30     150        180
Young    45     230        275
Total    75     380        455
Old: > 35 years; Young: <= 35 years

Contingency Table : Example
Stat > Tables > Chi-Square Test
Each cell should have an expected count of at least 5 before going ahead with the test.


Contingency Table: Analysis in Minitab
Chi-Square Test
Expected counts are printed below observed counts
         Hired   Not Hired  Total
Old      30      150        180
         29.67   150.33
Young    45      230        275
         45.33   229.67
Total    75      380        455
Chi-Sq = 0.004 + 0.001 + 0.002 + 0.000 = 0.007
DF = 1, P-Value = 0.932

The chi-square test compares the observed counts in the contingency table with the counts expected if the variables were independent. Under independence we expect the differences to be close to 0; the further away they are, the more likely it is that the variables are dependent. To make that decision, we only need to look at the p-value.

Interpretation:
Since p > 0.05, we fail to reject $H_0$: there is no evidence that the hiring of a candidate depends on his/her age.
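The chi-square statistic is $\sum (O - E)^2 / E$, with each expected count $E$ = row total × column total / grand total. A Python sketch for the hiring table; it assumes no continuity correction (the Minitab output above matches the uncorrected statistic), and its closed-form p-value, erfc(√(x/2)), is valid only for 1 degree of freedom, i.e. a 2 × 2 table.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic, df and expected counts for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


observed = [[30, 150],   # Old:   Hired, Not Hired
            [45, 230]]   # Young: Hired, Not Hired
stat, df, expected = chi_square_independence(observed)
assert df == 1, "the erfc shortcut below only holds for 1 degree of freedom"
p_value = math.erfc(math.sqrt(stat / 2))      # upper tail of chi-square with 1 df
print("Expected:", [[round(e, 2) for e in row] for row in expected])
print("Smallest expected count:", round(min(min(row) for row in expected), 2))
print(f"Chi-Sq = {stat:.3f}, DF = {df}, P-Value = {p_value:.3f}")
```

The expected counts, Chi-Sq = 0.007 and P-Value = 0.932 agree with Minitab, and every expected count is well above 5.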


Chi Square Test : Exercise
Are ladies more likely to be right-handed than gentlemen?

Hypothesis:
$H_0$: There is no relationship between gender and handedness
$H_a$: There is a relationship between gender and handedness


End of Analyse Phase
