
FF2613

Inferential Statistics, T Test, ANOVA & Proportionate Test
Assoc. Prof. Dr Azmi Mohd Tamil
Dept of Community Health
Universiti Kebangsaan Malaysia
drtamil@gmail.com 2012


Inferential Statistics
Basic Hypothesis Testing


Inferential Statistics
• When we conduct a study, we want to make an inference from the data collected. For example: "drug A is better than drug B in treating disease D".

Drug A Better Than Drug B?
• Drug A has a higher cure rate than drug B (cured/not cured).
• For controlling BP, the mean drop in BP for drug A is larger than for drug B (continuous data, mm Hg).

Null Hypothesis
• Null hypothesis: "there is no difference in effectiveness between drug A and drug B in treating disease D".

Null Hypothesis
• H0 is assumed TRUE unless the data indicate otherwise:
  - The experiment is trying to reject the null hypothesis.
  - We can reject, but cannot prove, a hypothesis.
  - e.g. "all swans are white": one black swan suffices to reject it (not all swans are white), but no number of white swans can prove the hypothesis, since the next swan could still be black.

Can reindeer fly?
• You believe reindeer can fly.
• Null hypothesis: reindeer cannot fly.
• Experimental design: throw reindeer off the roof.
• Implementation: they all go splat on the ground.
• Evaluation: null hypothesis not rejected.
  - This does not prove reindeer cannot fly. What you have shown is that from this roof, on this day, under these weather conditions, these particular reindeer either could not, or chose not to, fly.
• It is possible, in principle, to reject the null hypothesis: by exhibiting a flying reindeer!

Significance
• Inferential statistics determine whether a significant difference in effectiveness exists between drug A and drug B.
• If there is a significant difference (p < 0.05), the null hypothesis is rejected.
• Otherwise, if there is no significant difference (p > 0.05), the null hypothesis is not rejected.
• The usual levels of significance used to reject or not reject the null hypothesis are 0.05 and 0.01. In the above example, the level was set at 0.05.

Confidence interval
• Confidence level = 1 - level of significance.
• If the level of significance is 0.05, then the confidence level is 95%: CI = 1 - 0.05 = 0.95 = 95%.
• If CI = 99%, then the level of significance is 0.01.

What is the level of significance? Chance?
[Figure: two-tailed test with a rejection region ("Reject H0") of 0.025 in each tail; critical values marked at ±1.96 and ±2.0639]
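The ±1.96 cut-offs come from the standard normal distribution: 0.025 in each tail gives a total level of significance of 0.05. A minimal Python sketch (Python is our choice here; the slides use printed tables) shows how the cut-off and a two-tailed p-value relate. It covers only the normal case, since the ±2.0639 cut-off belongs to a t distribution, which Python's standard library does not provide. The function names are hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """|z| beyond which H0 is rejected, leaving alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def two_tailed_p(z):
    """Two-tailed p-value for an observed z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject H0 if |z| > {two_tailed_critical_z(alpha):.3f}")
```

With alpha = 0.05 the cut-off is 1.960; with alpha = 0.01 it widens to 2.576.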

Fisher's Use of p-Values
• R.A. Fisher called the probability used to declare significance the p-value.
• It is common practice to judge a result significant if it is of such magnitude that it would be produced by chance no more frequently than once in 20 trials: 1/20 = 0.05.
• If the p-value is less than 0.05, then, assuming H0 is true, the probability of obtaining an effect at least as large as the one detected purely by chance is less than 5%.
• We then treat the effect detected as a real effect rather than chance, accepting up to a 5% risk of wrongly rejecting H0.

Error
• Although we have determined the level of significance and confidence level, there is still a chance of error.
• There are 2 types:
  - Type I error
  - Type II error

Error
| DECISION | REALITY: Treatments are not different | REALITY: Treatments are different |
| Conclude treatments are not different | Correct decision (cell a) | Type II error, β error (cell b) |
| Conclude treatments are different | Type I error, α error (cell c) | Correct decision (cell d) |

Error
| Test of significance | Correct null hypothesis (H0 should not be rejected) | Incorrect null hypothesis (H0 should be rejected) |
| Null hypothesis not rejected | Correct conclusion | Type II error |
| Null hypothesis rejected | Type I error | Correct conclusion |

Type I Error
• Type I error: rejecting the null hypothesis although the null hypothesis is correct.
• e.g. when we compare the means/proportions of 2 groups, the difference is small but is found to be significant, so the null hypothesis is rejected.
• It may occur due to an inappropriate choice of alpha (level of significance).

Type II Error
• Type II error: not rejecting the null hypothesis although the null hypothesis is wrong.
• e.g. when we compare the means/proportions of 2 groups, the difference is big but is not significant, so the null hypothesis is not rejected.
• It may occur when the sample size is too small.

Example of Type II Error
Data of a clinical trial on 30 patients comparing pain control between two modes of treatment.

Type of treatment * Pain (2 hrs post-op) Crosstabulation: count (% within type of treatment)
| Type of treatment | No pain | In pain | Total |
| Pethidine | 8 (53.3%) | 7 (46.7%) | 15 (100.0%) |
| Cocktail | 4 (26.7%) | 11 (73.3%) | 15 (100.0%) |
| Total | 12 (40.0%) | 18 (60.0%) | 30 (100.0%) |

Chi-square = 2.222, p = 0.136

p = 0.136, which is bigger than 0.05: no significant difference, and the null hypothesis was not rejected.
There was a large difference between the rates (53.3% vs 26.7% pain-free), but it was not significant. Type II error?
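The chi-square value above can be checked by hand from the crosstab. A minimal Python sketch, assuming the plain Pearson chi-square without continuity correction (which is what reproduces 2.222); the function name is hypothetical. For 1 degree of freedom the upper-tail probability has the closed form erfc(sqrt(x/2)), so no statistics library is needed.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]], with its p-value on 1 degree of freedom."""
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: Pethidine, Cocktail; columns: no pain, in pain
chi2, p_value = chi_square_2x2(8, 7, 4, 11)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")  # chi-square = 2.222, p = 0.136
```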

The result was not significant because the power of the study is well below the conventional 80%.
Power is only 32%!
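Power can be approximated without special software. A minimal Python sketch, assuming a two-sided two-proportion z test with the normal approximation and no continuity correction; PS may use a different method, so expect a figure in the same region rather than exactly 32%. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def approx_power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-proportion z test using the
    normal approximation without continuity correction (an assumption)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))  # SE if H0 is true
    se_alt = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)     # SE under the observed rates
    diff = abs(p1 - p2)
    # Power = probability that the observed difference lands beyond either critical value.
    upper = 1 - std_normal.cdf((z_crit * se_null - diff) / se_alt)
    lower = std_normal.cdf((-z_crit * se_null - diff) / se_alt)
    return upper + lower

# Pain-free rates: Pethidine 8/15, Cocktail 4/15
power = approx_power_two_proportions(8 / 15, 4 / 15, 15, 15)
print(f"approximate power = {power:.2f}")
```

This gives about 0.31, far short of the 0.80 usually aimed for, which is why a real difference of this size can easily fail to reach significance with 15 patients per group.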

Check for the errors
• You can check your own data analysis for type II errors by checking the power of the respective analysis.
• This can easily be done using software such as Power & Sample Size (PS2) from the website of Vanderbilt University.

Determining the appropriate statistical test

Data Analysis
• Descriptive: summarising data
• Test of association
• Multivariate: controlling for confounders

Test of Association
• To study the relationship between one or more risk variables (independent) and an outcome variable (dependent).
• For example: does ethnicity affect the suicidal/para-suicidal tendencies of psychiatric patients?

Problem Flow Chart
[Figure: independent variables (Ethnicity, Marital Status) → dependent variable (Suicidal Tendencies)]

Multivariate
• Studies the association between multiple causative factors/variables (independent variables) and the outcome (dependent).
• For example: the association between risk factors such as parental care, practice of religion and education level of parents, and disciplinary problems of their child (outcome).

Hypothesis Testing
• Distinguish parametric & non-parametric procedures
• Test two or more populations using parametric & non-parametric procedures:
  - Means
  - Medians
  - Variances

Hypothesis Testing Procedures

Parametric Test Procedures
• Involve population parameters
  - Example: population mean
• Require interval or ratio scale
  - Whole numbers or fractions
  - Example: height in inches: 72, 60.5, 54.7
• Have stringent assumptions
  - Example: normal distribution
• Examples: Z test, t test

Nonparametric Test Procedures
• The statistic does not depend on the population distribution
• Data may be nominally or ordinally scaled
  - Example: male/female
• May involve population parameters such as the median
• Example: Wilcoxon rank sum test

Parametric Analysis
| Variable 1 | Variable 2 | Criteria | Type of Test |
| Qualitative, dichotomous | Quantitative | Normally distributed data | Student's t test |
| Qualitative, polytomous (more than two categories) | Quantitative | Normally distributed data | ANOVA |
| Quantitative | Quantitative | Repeated measurement of the same individual & item (e.g. Hb level before & after treatment); normally distributed data | Paired t test |
| Quantitative, continuous | Quantitative, continuous | Normally distributed data | Pearson correlation & linear regression |

Non-parametric Tests
| Variable 1 | Variable 2 | Criteria | Type of Test |
| Qualitative, dichotomous | Qualitative, dichotomous | Sample size < 20, or < 40 with at least one expected value < 5 | Fisher test |
| Qualitative, dichotomous | Quantitative | Data not normally distributed | Wilcoxon rank sum test or Mann-Whitney U test |
| Qualitative, polytomous (more than two categories) | Quantitative | Data not normally distributed | Kruskal-Wallis one-way ANOVA test |
| Quantitative | Quantitative | Repeated measurement of the same individual & item | Wilcoxon signed-rank test |
| Quantitative, continuous | Quantitative, continuous | Data not normally distributed | Spearman/Kendall rank correlation |

Statistical Tests - Qualitative
| Variable 1 | Variable 2 | Criteria | Type of Test |
| Qualitative | Qualitative | Sample size > 20 and no expected value < 5 | Chi-square test (χ²) |
| Qualitative, dichotomous | Qualitative, dichotomous | Sample size > 30 | Proportionate test |
| Qualitative, dichotomous | Qualitative, dichotomous | Sample size > 40 but with at least one expected value < 5 | χ² test with Yates correction |
| Qualitative, dichotomous | Qualitative, dichotomous | Sample size < 20, or < 40 with at least one expected value < 5 | Fisher test |
| Qualitative, dichotomous | Quantitative | Normally distributed data | Student's t test |
| Qualitative, dichotomous | Quantitative | Data not normally distributed | Wilcoxon rank sum test |

Data Analysis
• Using SPSS: http://161.142.92.104/spss/
• Using Excel: http://161.142.92.104/excel/


T Test, ANOVA & Proportionate Test

T-Test
• Independent t-test (Student's t-test)
• Paired t-test
• ANOVA

Student's t-test
William Sealy Gosset (pen name "Student"), 1908. The Probable Error of a Mean. Biometrika.

Student's t-Test
• To compare the means of two independent groups. For example: comparing the mean Hb between cases and controls. 2 variables are involved here, one quantitative (i.e. Hb) and the other a dichotomous qualitative variable (i.e. case/control).

Examples: Student's t-test
• Comparing the level of blood cholesterol (mg/dL) between hypertensive and normotensive patients.
• Comparing the HAMD score of two groups of psychiatric patients treated with two different drugs (i.e. Fluoxetine & Sertraline).

Example
Group Statistics (DHAMAWK6)
| DRUG | N | Mean | Std. Deviation |
| F | 35 | 4.2571 | 3.12808 |
| S | 32 | 3.8125 | 4.39529 |

Independent Samples Test (DHAMAWK6, equal variances assumed)
| t | df | Sig. (2-tailed) | Mean Difference |
| .48 | 65 | .633 | .4446 |
• p = 0.633 > 0.05: no significant difference in the week-6 HAMD score between the two drug groups.

Assumptions of T test
• Observations are normally distributed in each population. (Check with Explore)
• The population variances are equal. (Levene's test)
• The 2 groups are independent of each other. (Design of study)

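Manual Calculation
• Sample size > 30:
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$
• Small sample size, equal variance:
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_0\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_0^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} $$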
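The small-sample (pooled variance) formula can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS implementation; the function name is hypothetical. As a check, it is applied to the summary statistics of the week-6 HAMD example (DHAMAWK6, drug groups F and S), where SPSS reported t = .48 with df = 65 for "equal variances assumed".

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups assuming equal variances.
    Returns (t, df, s0), where s0 is the pooled standard deviation."""
    df = (n1 - 1) + (n2 - 1)
    s0 = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    t = (mean1 - mean2) / (s0 * math.sqrt(1 / n1 + 1 / n2))
    return t, df, s0

# Week-6 HAMD score (DHAMAWK6), drug groups F and S, from the SPSS output
t, df, s0 = pooled_t(4.2571, 3.12808, 35, 3.8125, 4.39529, 32)
print(f"t = {t:.2f}, df = {df}")  # t = 0.48, df = 65
```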

Example: compare cholesterol level
| | Hypertensive | Normal |
| Mean | 214.92 | 182.19 |
| s.d. | 39.22 | 37.26 |
| n | 64 | 36 |
• Comparing the cholesterol level between hypertensive and normal patients.
• The difference is (214.92 - 182.19) = 32.73 mg%.
• H0: There is no difference in cholesterol level between hypertensive and normal patients.
• n > 30 (64 + 36 = 100), therefore use the first formula.

Calculation
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{214.92 - 182.19}{\sqrt{\dfrac{39.22^2}{64} + \dfrac{37.26^2}{36}}} = 4.137 $$
• df = n1 + n2 - 2 = 64 + 36 - 2 = 98
• Refer to the t table: with t = 4.137, p < 0.001
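The same arithmetic as a minimal Python sketch (the function name is hypothetical). It also computes a two-tailed p-value from the normal distribution, which mirrors the Table A1 route and is only an approximation for df = 98.

```python
import math
from statistics import NormalDist

def large_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """t = (X1 - X2) / sqrt(s1^2/n1 + s2^2/n2), the large-sample formula."""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

t = large_sample_t(214.92, 39.22, 64, 182.19, 37.26, 36)  # hypertensive vs normal
df = 64 + 36 - 2
# Normal approximation to the two-tailed p-value (the Table A1 route)
p_approx = 2 * (1 - NormalDist().cdf(t))
print(f"t = {t:.3f}, df = {df}, approximate p = {p_approx:.6f}")
```

This prints t = 4.137 with df = 98, and the approximate p falls below 0.00006, matching the table lookup.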

If df > 100, we can refer to Table A1 (the normal distribution); df = 98 is close enough for an approximation.
• We don't have 4.137 in the table, so we use 3.99 instead. If t = 3.99, then p = 0.00003 × 2 = 0.00006. Therefore if t = 4.137, p < 0.00006.

Or we can refer to Table A3 (the t distribution).
• We don't have df = 98, so we use df = 60 instead. t = 4.137 > 3.46 (p = 0.001). Therefore if t = 4.137, p < 0.001.

Conclusion
• Therefore p < 0.05, and the null hypothesis is rejected.
• There is a significant difference in cholesterol level between hypertensive and normal patients.
• Hypertensive patients have a significantly higher cholesterol level than normotensive patients.

Exercise (try it)
• Comparing the mini test 1 (2012) results between UKM and ACMS students.
• The difference is 11.255.
• H0: There is no difference in marks between UKM and ACMS students.
• n > 30, therefore use the first formula.

Exercise (answer)
• Null hypothesis rejected.
• There is a difference in marks between UKM and ACMS students: UKM marks are higher than ACMS marks.

T-Test In SPSS
• For this exercise, we will be using the data from the CD, under Chapter 7, sga-bab7.sav. This data came from a case-control study on factors affecting SGA in Kelantan.
• Open the data & select Analyse > Compare Means > Independent-Samples T Test.

T-Test in SPSS
• We want to see whether there is any association between the mother's weight and SGA. So select the risk factor (weight2) into Test Variable & the outcome (SGA) into Grouping Variable.
• Now click on the Define Groups button. Enter 0 (Control) for Group 1 and 1 (Case) for Group 2.
• Click the Continue button & then click the OK button.

T-Test Results
Group Statistics (Weight at first ANC)
| SGA | N | Mean | Std. Deviation | Std. Error Mean |
| Normal | 108 | 58.666 | 11.2302 | 1.0806 |
| SGA | 109 | 51.037 | 9.3574 | .8963 |
• Compare the mean ± sd of both groups:
  - Normal: 58.7 ± 11.2 kg
  - SGA: 51.0 ± 9.4 kg
• Apparently there is a difference in weight between the two groups.

Results & Homogeneity of Variances
Independent Samples Test (Weight at first ANC)
| | Levene's F | Levene's Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
| Equal variances assumed | 1.862 | .174 | 5.439 | 215 | .000 | 7.629 | 1.4028 | 4.8641 | 10.3940 |
| Equal variances not assumed | | | 5.434 | 207.543 | .000 | 7.629 | 1.4039 | 4.8612 | 10.3969 |
• Look at the p value of Levene's test. If p is not significant, equal variances are assumed (use the top row). If it is significant, equal variances are not assumed (use the bottom row).
• Here Levene's p = 0.174, so we use the top row: the t value is 5.439 and p < 0.0005. The difference is significant. Therefore there is an association between the mother's weight and SGA.
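Both rows of the SPSS table can be rebuilt from the group summary statistics. A minimal Python sketch, assuming the standard pooled-variance t for the top row and Welch's t with the Welch-Satterthwaite degrees of freedom for the bottom row (which reproduces df = 207.543); the function name is hypothetical.

```python
import math

def t_tests_from_summary(m1, s1, n1, m2, s2, n2):
    """Both rows of an independent-samples t test, from summary statistics.
    Each row is (t, df, standard error of the difference)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    diff = m1 - m2
    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    df_pooled = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df_pooled
    se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch's t with Welch-Satterthwaite df
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return {
        "assumed": (diff / se_pooled, df_pooled, se_pooled),
        "not_assumed": (diff / se_welch, df_welch, se_welch),
    }

# Weight at first ANC (kg): Normal vs SGA
rows = t_tests_from_summary(58.666, 11.2302, 108, 51.037, 9.3574, 109)
for label, (t, df, se) in rows.items():
    print(f"{label}: t = {t:.3f}, df = {df:.3f}, SE = {se:.4f}")
```

The two rows differ only slightly here because the group variances are similar, which agrees with Levene's non-significant result.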

How to present the result?
| Group | n | Mean ± sd | Test | p |
| Normal | 108 | 58.7 ± 11.2 kg | T test, t = 5.439 | < 0.0005 |
| SGA | 109 | 51.0 ± 9.4 kg | | |

Paired t-test
Repeated measurement on the same individual


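Formula
$$ t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}, \qquad s_d = \sqrt{\frac{\sum_i (d_i - \bar{d})^2}{n - 1}}, \qquad df = n_p - 1 $$
where d_i is the difference for pair i, d̄ is the mean of the differences and n = n_p is the number of pairs.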

Examples of paired t-test
• Comparing the HAMD score between week 0 and week 6 of treatment with Sertraline for a group of psychiatric patients.
• Comparing the haemoglobin level of anaemic pregnant women before and after 6 weeks of treatment with haematinics.

Example
Paired Samples Statistics
| Pair 1 | Mean | N | Std. Deviation |
| DHAMAWK0 | 13.9688 | 32 | 6.48315 |
| DHAMAWK6 | 3.8125 | 32 | 4.39529 |

Paired Samples Test
| Pair 1 | Mean (paired differences) | Std. Deviation | t | df | Sig. (2-tailed) |
| DHAMAWK0 - DHAMAWK6 | 10.1563 | 6.75903 | 8.500 | 31 | .000 |

Manual Calculation
• The systolic and diastolic blood pressures were measured twice, 10 minutes apart. You want to determine whether there was any difference between the two measurements.
• H0: There is no difference in systolic blood pressure between the first (time 0) and second (time 10 minutes) measurement.

Calculation
• Calculate the difference between the first & second measurement and square it. Total up the differences and the squares.

Calculation (n = 36)
• Σd = 112, Σd² = 1842
• Mean d = 112/36 = 3.11
$$ s_d = \sqrt{\frac{\sum d_i^2 - (\sum d_i)^2/n}{n - 1}} = \sqrt{\frac{1842 - 112^2/36}{35}} = 6.53 $$
$$ t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{3.11}{6.53/6} = 2.858 $$
• df = n_p - 1 = 36 - 1 = 35
• Refer to the t table.
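The paired calculation needs only the totals Σd and Σd². A minimal Python sketch of the computational formula (function names hypothetical). A second helper takes the mean and SD of the differences directly, and reproduces the SPSS HAMD output (t = 8.500, df = 31).

```python
import math

def paired_t_from_sums(sum_d, sum_d2, n):
    """Paired t from the total of the differences and of the squared differences."""
    mean_d = sum_d / n
    sd_d = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))
    t = (mean_d - 0) / (sd_d / math.sqrt(n))
    return t, n - 1, mean_d, sd_d

def paired_t_from_summary(mean_d, sd_d, n):
    """The same statistic when the mean and SD of the differences are known."""
    return mean_d / (sd_d / math.sqrt(n)), n - 1

t, df, mean_d, sd_d = paired_t_from_sums(112, 1842, 36)
print(f"mean d = {mean_d:.2f}, s_d = {sd_d:.2f}, t = {t:.3f}, df = {df}")

# SPSS HAMD example (DHAMAWK0 - DHAMAWK6): mean 10.1563, SD 6.75903, 32 pairs
t_hamd, df_hamd = paired_t_from_summary(10.1563, 6.75903, 32)
```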

Refer to Table A3.
• We don't have df = 35, so we use df = 30 instead. t = 2.858 is larger than 2.75 (p = 0.01) but smaller than 3.03 (p = 0.005): 3.03 > t > 2.75. Therefore if t = 2.858, 0.005 < p < 0.01.

Conclusion
• With t = 2.858, 0.005 < p < 0.01, so p < 0.01 and therefore p < 0.05: the null hypothesis is rejected.
• Conclusion: There is a significant difference in systolic blood pressure between the first and second measurement. The mean of the first reading is significantly higher than that of the second reading.

Paired T-Test In SPSS
• For this exercise, we will be using the data from the CD, under Chapter 7, sgapair.sav. This data came from a controlled trial on the effect of haematinics on Hb.
• Open the data & select Analyse > Compare Means > Paired-Samples T Test.

Paired T-Test In SPSS
• We want to see whether there is any association between the prescription of haematinics to anaemic pregnant mothers and Hb.
• We are comparing the Hb before & after treatment, so pair the two measurements (Hb2 & Hb3) together.
• Click the OK button.

Paired T-Test Results
Paired Samples Statistics
| Pair 1 | Mean | N | Std. Deviation | Std. Error Mean |
| HB2 | 10.247 | 70 | .3566 | .0426 |
| HB3 | 10.594 | 70 | .9706 | .1160 |
• This shows the mean & standard deviation of the two measurements.

Paired T-Test Results
Paired Samples Test
| Pair 1 | Mean | Std. Deviation | Std. Error Mean | 95% CI Lower | 95% CI Upper | t | df | Sig. (2-tailed) |
| HB2 - HB3 | -.347 | .9623 | .1150 | -.577 | -.118 | -3.018 | 69 | .004 |
• This shows that the mean difference in Hb before & after treatment is only 0.347 g% (Hb rose after treatment, hence the negative HB2 - HB3 difference).
• Yet t = -3.018 and p = 0.004 show that the difference is statistically significant.

How to present the result?
| Group | n | Mean D (Diff.) ± sd | Test | p |
| Before treatment (HB2) vs after treatment (HB3) | 70 | 0.35 ± 0.96 | Paired t test, t = 3.018 | 0.004 |

ANOVA


ANOVA
Analysis of Variance
• An extension of the independent-samples t test
• Compares the means of groups of independent observations
  - Don't be fooled by the name: ANOVA does not test whether the variances differ. It compares means, by comparing the variation between groups with the variation within groups.
• Can compare more than two groups

One-Way ANOVA F-Test
• Tests the equality of 2 or more population means
• Variables
  - One nominal scaled independent variable with 2 or more treatment levels or classifications (e.g. race: Malay, Chinese, Indian & Others)
  - One interval or ratio scaled dependent variable (e.g. weight, height, age)
• Used to analyse completely randomized experimental designs

Examples
• Comparing the blood cholesterol levels of bus drivers, bus conductors and taxi drivers.
• Comparing the mean systolic pressure between Malays, Chinese, Indians & Others.

One-Way ANOVA F-Test Assumptions
• Randomness & independence of errors
  - Independent random samples are drawn
• Normality
  - Populations are normally distributed
• Homogeneity of variance
  - Populations have equal variances

Example
Descriptives: Birth weight
| | N | Mean | Std. Deviation | Minimum | Maximum |
| Housewife | 151 | 2.7801 | .52623 | 1.90 | 4.72 |
| Office work | 23 | 2.7643 | .60319 | 1.60 | 3.96 |
| Field work | 44 | 2.8430 | .55001 | 1.90 | 3.79 |
| Total | 218 | 2.7911 | .53754 | 1.60 | 4.72 |

ANOVA: Birth weight
| | Sum of Squares | df | Mean Square | F | Sig. |
| Between Groups | .153 | 2 | .077 | .263 | .769 |
| Within Groups | 62.550 | 215 | .291 | | |
| Total | 62.703 | 217 | | | |

Manual Calculation: ANOVA
• Not expected to be calculated manually by medical students.

Example: Time To Complete Analysis
• 45 samples were analysed using 3 different blood analysers (Mach1, Mach2 & Mach3), with 15 samples placed into each analyser. The time in seconds was measured for each sample analysis.
• The overall mean of the entire sample was 22.71 seconds. This is called the grand mean, and is often denoted by X̄.
• If H0 were true, we'd expect the group means to be close to the grand mean.
• The ANOVA test is based on the combined distances from X̄. If the combined distances are large, that indicates we should reject H0.

The ANOVA Statistic
To combine the differences from the grand mean we
• Square the differences
• Multiply by the numbers of observations in the groups
• Sum over the groups
$$ SSB = 15(\bar{X}_{Mach1} - \bar{X})^2 + 15(\bar{X}_{Mach2} - \bar{X})^2 + 15(\bar{X}_{Mach3} - \bar{X})^2 $$
where the X̄_* are the group means.
SSB = Sum of Squares Between groups
Note: This looks a bit like a variance.

Sum of Squares Between
$$ SSB = 15(\bar{X}_{Mach1} - \bar{X})^2 + 15(\bar{X}_{Mach2} - \bar{X})^2 + 15(\bar{X}_{Mach3} - \bar{X})^2 $$
• Grand mean = 22.71
• Mean Mach1 = 24.93; (24.93 - 22.71)² = 4.9284
• Mean Mach2 = 22.61; (22.61 - 22.71)² = 0.01
• Mean Mach3 = 20.59; (20.59 - 22.71)² = 4.4944
• SSB = (15 × 4.9284) + (15 × 0.01) + (15 × 4.4944)
• SSB = 141.492

How big is big?
• For the Time to Complete, SSB = 141.492.
• Is that big enough to reject H0?
• As with the t test, we compare the statistic to the variability of the individual observations.
• In ANOVA the variability is estimated by the Mean Square Error, or MSE.

MSE: Mean Square Error
The Mean Square Error is a measure of the variability after the group effects have been taken into account:
$$ MSE = \frac{1}{N - K} \sum_{j} \sum_{i} (x_{ij} - \bar{X}_j)^2 $$
where x_ij is the ith observation in the jth group, X̄_j is the mean of group j, N is the total number of observations and K is the number of groups.

| Mach1 | (x - mean)² | Mach2 | (x - mean)² | Mach3 | (x - mean)² |
| 23.73 | 1.4400 | 21.5 | 1.2321 | 19.74 | 0.7225 |
| 23.74 | 1.4161 | 21.6 | 1.0201 | 19.75 | 0.7056 |
| 23.75 | 1.3924 | 21.7 | 0.8281 | 19.76 | 0.6889 |
| 24.00 | 0.8649 | 21.7 | 0.8281 | 19.9 | 0.4761 |
| 24.10 | 0.6889 | 21.8 | 0.6561 | 20 | 0.3481 |
| 24.20 | 0.5329 | 21.9 | 0.5041 | 20.1 | 0.2401 |
| 25.00 | 0.0049 | 22.75 | 0.0196 | 20.3 | 0.0841 |
| 25.10 | 0.0289 | 22.75 | 0.0196 | 20.4 | 0.0361 |
| 25.20 | 0.0729 | 22.75 | 0.0196 | 20.5 | 0.0081 |
| 25.30 | 0.1369 | 23.3 | 0.4761 | 20.5 | 0.0081 |
| 25.40 | 0.2209 | 23.4 | 0.6241 | 20.6 | 0.0001 |
| 25.50 | 0.3249 | 23.4 | 0.6241 | 20.7 | 0.0121 |
| 26.30 | 1.8769 | 23.5 | 0.7921 | 22.1 | 2.2801 |
| 26.31 | 1.9044 | 23.5 | 0.7921 | 22.2 | 2.5921 |
| 26.32 | 1.9321 | 23.6 | 0.9801 | 22.3 | 2.9241 |
| SUM | 12.8380 | | 9.4160 | | 11.1262 |

• Note that the variation of the group means (SSB = 141.492) seems quite large compared to the variation of the observations within groups (12.8380 + 9.4160 + 11.1262 = 33.3802), which hints that the difference may be significant.
• MSE = 33.3802/(45 - 3) = 0.7948

Notes on MSE
• If there are only two groups, the MSE is equal to the pooled estimate of variance used in the equal-variance t test.
• ANOVA assumes that all the group variances are equal.
• Other options should be considered if the group variances differ by a factor of 2 or more.
• Here the within-group sums of squares are similar (12.8380 ~ 9.4160 ~ 11.1262); since every group has 15 observations, so are the group variances.

ANOVA F Test
• The ANOVA F test is based on the F statistic
$$ F = \frac{SSB/(K - 1)}{MSE} $$
where K is the number of groups.
• Under H0 the F statistic has an F distribution with K - 1 and N - K degrees of freedom (N is the total number of observations).


Time to Analyse: F test p-value
• To get a p-value we compare our F statistic to an F(2, 42) distribution.
• In our example:
$$ F = \frac{141.492/2}{33.3802/42} = 89.015 $$
• We cannot mark the observed F on a plot of the F(2, 42) distribution, since the F value is so large; the p value is correspondingly tiny.

Refer to the F distribution table (α = 0.01).
• We don't have df = 2;42, so we use df = 2;40 instead. F = 89.015 is larger than 5.18 (p = 0.01). Therefore if F = 89.015, p < 0.01.

Why use df = 2;42?
• We have 3 groups, so K - 1 = 2.
• We have 45 samples, therefore N - K = 42.

The exact p-value is P(F(2,42) > 89.015) ≈ 8 × 10^-16, far below 0.01.
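The whole Time To Complete analysis (SSB, MSE, F and p) can be reproduced from the raw timings in the table. A minimal Python sketch; function names are hypothetical. For the p-value it uses the identity P(F(2, d2) > f) = (1 + 2f/d2)^(-d2/2), which is exact but holds only when df1 = 2, i.e. exactly three groups; for other designs a statistics library would be needed.

```python
def one_way_anova(groups):
    """One-way ANOVA from raw observations: returns SSB, SSW, df1, df2, F."""
    all_obs = [x for g in groups for x in g]
    n_total, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df1, df2 = k - 1, n_total - k
    f_stat = (ssb / df1) / (ssw / df2)  # mean square between / MSE
    return ssb, ssw, df1, df2, f_stat

def f_pvalue_df1_2(f_stat, df2):
    """P(F(2, df2) > f_stat); this closed form is valid only when df1 = 2."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)

mach1 = [23.73, 23.74, 23.75, 24.00, 24.10, 24.20, 25.00, 25.10,
         25.20, 25.30, 25.40, 25.50, 26.30, 26.31, 26.32]
mach2 = [21.5, 21.6, 21.7, 21.7, 21.8, 21.9, 22.75, 22.75,
         22.75, 23.3, 23.4, 23.4, 23.5, 23.5, 23.6]
mach3 = [19.74, 19.75, 19.76, 19.9, 20.0, 20.1, 20.3, 20.4,
         20.5, 20.5, 20.6, 20.7, 22.1, 22.2, 22.3]

ssb, ssw, df1, df2, f_stat = one_way_anova([mach1, mach2, mach3])
p_value = f_pvalue_df1_2(f_stat, df2)
print(f"SSB = {ssb:.3f}, SSW = {ssw:.4f}, MSE = {ssw / df2:.4f}")
print(f"F({df1}, {df2}) = {f_stat:.3f}, p = {p_value:.1e}")
```

This reproduces SSB = 141.492, MSE = 0.7948 and F = 89.015, with p on the order of 10^-16.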

ANOVA Table
Results are often displayed using an ANOVA table:
| | Sum of Squares | df | Mean Square | F | Sig. |
| Between Groups | 141.492 | 2 | 70.746 | 89.015 | .0000000 |
| Within Groups | 33.380 | 42 | .795 | | |
| Total | 174.872 | 44 | | | |

Pop Quiz!: Where are the following quantities presented in this table?
• Sum of Squares Between (SSB)
• Mean Square Error (MSE)
• F Statistic
• p value

Answer: the Sum of Squares Between (SSB) is the Between Groups sum of squares (141.492); the Mean Square Error (MSE) is the Within Groups mean square (.795); the F statistic is 89.015; and the p value is in the Sig. column (.0000000).

ANOVA In SPSS
• For this exercise, we will be using the data from the CD, under Chapter 7, sga-bab7.sav. This data came from a case-control study on factors affecting SGA in Kelantan.
• Open the data & select Analyse > Compare Means > One-Way ANOVA.

ANOVA in SPSS
• We want to see whether there is any association between the babies' weight and the mother's type of work. So select the risk factor (typework) into Factor & the outcome (birthwgt) into Dependent.
• Now click on the Post Hoc button and select Bonferroni.
• Click the Continue button, then click on the Options button.

ANOVA in SPSS
• Select Descriptive, Homogeneity of variance test and Means plot.
• Click Continue and then OK.

ANOVA Results
Descriptives: Birth weight
| | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean: Lower Bound | 95% CI for Mean: Upper Bound | Minimum | Maximum |
| Housewife | 151 | 2.7801 | .52623 | .04282 | 2.6955 | 2.8647 | 1.90 | 4.72 |
| Office work | 23 | 2.7643 | .60319 | .12577 | 2.5035 | 3.0252 | 1.60 | 3.96 |
| Field work | 44 | 2.8430 | .55001 | .08292 | 2.6757 | 3.0102 | 1.90 | 3.79 |
| Total | 218 | 2.7911 | .53754 | .03641 | 2.7193 | 2.8629 | 1.60 | 4.72 |
• Compare the mean ± sd of all groups.
• Apparently there is not much difference in the babies' weight between the groups.

Results & Homogeneity of Variances
Test of Homogeneity of Variances: Birth weight
| Levene Statistic | df1 | df2 | Sig. |
| .757 | 2 | 215 | .470 |
• Look at the p value of Levene's test. If p is not significant, equal variances are assumed. Here p = 0.470, so equal variances are assumed.

ANOVA Results
ANOVA: Birth weight
| | Sum of Squares | df | Mean Square | F | Sig. |
| Between Groups | .153 | 2 | .077 | .263 | .769 |
| Within Groups | 62.550 | 215 | .291 | | |
| Total | 62.703 | 217 | | | |
• So the F value here is 0.263 and p = 0.769. The difference is not significant. Therefore there is no significant association between the babies' weight and the mother's type of work.
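The ANOVA table can also be rebuilt from the Descriptives alone, because each group's within-group sum of squares equals (n - 1) × SD². A minimal Python sketch; function names are hypothetical, and the p-value shortcut P(F(2, d2) > f) = (1 + 2f/d2)^(-d2/2) is valid only for three groups (df1 = 2).

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA rebuilt from each group's n, mean and SD."""
    n_total, k = sum(ns), len(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ssw = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))  # (n - 1) * variance per group
    df1, df2 = k - 1, n_total - k
    f_stat = (ssb / df1) / (ssw / df2)
    return ssb, ssw, df1, df2, f_stat

def f_pvalue_df1_2(f_stat, df2):
    """P(F(2, df2) > f_stat); valid only for df1 = 2 (three groups)."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)

# Birth weight (kg) by mother's type of work: housewife, office work, field work
ssb, ssw, df1, df2, f_stat = anova_from_summary(
    [151, 23, 44], [2.7801, 2.7643, 2.8430], [0.52623, 0.60319, 0.55001])
p_value = f_pvalue_df1_2(f_stat, df2)
print(f"SSB = {ssb:.3f}, SSW = {ssw:.3f}, F({df1}, {df2}) = {f_stat:.3f}, p = {p_value:.3f}")
# SSB = 0.153, SSW = 62.550, F(2, 215) = 0.263, p = 0.769
```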

How to present the result?
| Type of Work | Mean ± sd (kg) | Test | p |
| Office | 2.76 ± 0.60 | ANOVA, F = 0.263 | 0.769 |
| Housewife | 2.78 ± 0.53 | | |
| Field work | 2.84 ± 0.55 | | |

Proportionate Test


Proportionate Test
• Qualitative data use rates, e.g. the rate of anaemia among males & females.
• To compare such rates, statistical tests such as the Z-test and chi-square can be used.

Formula
$$ z = \frac{p_1 - p_2}{\sqrt{p_0 q_0 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad p_0 = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}, \qquad q_0 = 1 - p_0 $$
where p1 = a1/n1 is the rate for event 1, p2 = a2/n2 is the rate for event 2, and a1 and a2 are the frequencies of events 1 and 2.
We refer to the normal distribution table to decide whether or not to reject the null hypothesis.

Conditions for the test (source: http://stattrek.com/hypothesistest/proportion.aspx)
• The sampling method is simple random sampling.
• Each sample point can result in just two possible outcomes. We call one of these outcomes a success and the other a failure.
• The sample includes at least 10 successes and 10 failures.
• The population size is at least 10 times as big as the sample size.

Example
• Comparison of worm infestation rate between male and female medical students in Year 2.
• Rate for males: p1 = 29/96 = 0.302
• Rate for females: p2 = 24/104 = 0.231
• H0: There is no difference in worm infestation rate between male and female medical students in Year 2.


Cont.
$$ p_0 = \frac{(29/96 \times 96) + (24/104 \times 104)}{96 + 104} = \frac{29 + 24}{200} = 0.265 $$
$$ q_0 = 1 - 0.265 = 0.735 $$

Cont.
$$ z = \frac{0.302 - 0.231}{\sqrt{0.735 \times 0.265 \times \left(\dfrac{1}{96} + \dfrac{1}{104}\right)}} = 1.1367 $$
• From the normal distribution table (A1), a z value is significant at p = 0.05 if it is above 1.96. Since this value is less than 1.96, there is no significant difference in the rate of worm infestation between male and female students.

Refer to Table A1.
• We don't have 1.1367, so we use 1.14 instead. If z = 1.14, then p = 0.1271 × 2 = 0.2542. Therefore if z = 1.14, p = 0.2542: H0 not rejected.
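The proportionate (z) test is short enough to sketch directly in Python; the function name is hypothetical, and the two-tailed p-value uses the standard normal distribution in place of Table A1. Working from the raw counts rather than the rounded rates 0.302 and 0.231 gives z ≈ 1.14, which is the value looked up in the table. The same helper can be used to check the exercise that follows (UKM 42/196 vs ACMS 35/70).

```python
import math
from statistics import NormalDist

def two_proportion_z(a1, n1, a2, n2):
    """z test comparing two proportions using the pooled rate p0."""
    p1, p2 = a1 / n1, a2 / n2
    p0 = (p1 * n1 + p2 * n2) / (n1 + n2)  # equals (a1 + a2) / (n1 + n2)
    q0 = 1 - p0
    z = (p1 - p2) / math.sqrt(p0 * q0 * (1 / n1 + 1 / n2))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Worm infestation: males 29/96 vs females 24/104
z, p = two_proportion_z(29, 96, 24, 104)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")  # z = 1.14, two-tailed p = 0.254

# Exercise: UKM 42/196 vs ACMS 35/70 failing minitest 1
z_ex, p_ex = two_proportion_z(42, 196, 35, 70)
```

The sign of z only shows which group has the higher rate; for a two-tailed test, compare |z| with 1.96.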

Exercise (try it)
• Comparison of failure rate between ACMS and UKM medical students in Year 2 for minitest 1 (MS2 2012).
• Rate for UKM: p1 = 42/196 = 0.214
• Rate for ACMS: p2 = 35/70 = 0.5

Answer
• p1 = 0.214, p2 = 0.5, p0 = 0.289, q0 = 0.711
• n1 = 196, n2 = 70, |z| = 20.47^0.5 = 4.52
• p < 0.00006
