
Last Update: December 4, 2017

Part I
ANALYSIS OF VARIANCE (ANOVA)
Analysis of variance (ANOVA) is an extension and generalization of Student's t test, but is far more
powerful and can be applied to two or more groups simultaneously. It also helps in minimizing
experimental errors, because experiments have to be designed more rigorously to fulfill its assumptions.
It tests the difference between the variances (S²) of two or more groups. It analyzes different
components of the total variance (S²t) of the sample to estimate the relative magnitudes of the within-groups
variance (S²w), due to uncontrolled random factors, and the between-groups variance (S²b), which may have
been influenced by the applied independent variable. It aims at finding out whether S²b can be explained
away by the null hypothesis that S²b contains no variance component other than what is present in S²w and
does not differ significantly from the latter.
Only if the probability (P) of correctness of the null hypothesis (H0) is found to be too low (P ≤ α) is the
independent variable considered to have produced significant changes in the scores of the dependent
variable.
Classification of ANOVA :-
The method of ANOVA differs according to the number of independent variables used in the
experiment.
1) A single classification or one-way-ANOVA is used to investigate the effects of a single independent
variable on the dependent variable. The number of the applied levels of the independent variable determines
the number of groups in the experiment.
Example :- A one way ANOVA may be applied to the scores of blood glucose (Dependent variable)
measured in three groups of 20 animals each, after the animals of each group have been injected with one of
the three chosen doses (levels) of insulin (Independent variable) to find if insulin changes blood sugar
significantly.
2) Higher orders of ANOVA, such as two-way and three-way ANOVA, are used in a factorial
experiment, where the simultaneous effects of more than one independent variable are being investigated.
Assumptions of Anova :-
a) Random assignment :
i) The experimental design should provide for random sampling so that each individual of the
population has an equal probability of being chosen for a group, and so that the choice of one
individual is independent of the choice of others.
ii) Randomization of treatment should also be ensured for the different levels of the independent
variable(s). Such randomized treatment eliminates the order effect which may result if different
levels of the independent variable are applied in the same sequence to all the individuals.
b) Normal distribution :
The dependent variable should have a normal distribution in the population, i.e., the deviations of
individual scores from the respective group means are distributed normally. Non-linear transformations
may change the mean, SD, shape, skewness, and kurtosis of the original raw score distribution to convert
non-normal or heteroscedastic distributions into normal and homoscedastic ones.
i) Logarithmic transformation : converts the raw scores into their logarithms (Xs = log X).
ii) Square root transformation : the square root of the raw score is the transformed score here (Xs = √X).
iii) Reciprocal transformation : the transformed score is the reciprocal of the raw score (Xs = 1/X).
iv) Arc sine transformation : the transformed score is the angle (in degrees or radians) whose sine is the
square root of the raw score, a proportion P (Xs = sin⁻¹ √P).
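The four transformations above can be sketched in a few lines of code. This is a minimal illustration; the function names are my own, not a standard API, and the arc sine result is returned in radians.

```python
import math

# Normalizing transformations of a raw score X (illustrative sketch).
def log_transform(x):
    return math.log10(x)            # Xs = log X

def sqrt_transform(x):
    return math.sqrt(x)             # Xs = sqrt(X)

def reciprocal_transform(x):
    return 1.0 / x                  # Xs = 1/X

def arcsine_transform(p):
    # Xs = arcsin(sqrt(P)); P must be a proportion in [0, 1]
    return math.asin(math.sqrt(p))  # result in radians

print(log_transform(100))                # 2.0
print(sqrt_transform(16))                # 4.0
print(reciprocal_transform(4))           # 0.25
print(round(arcsine_transform(0.5), 4))  # 0.7854
```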
c) Homoscedasticity :- Homoscedasticity or homogeneity of variances implies that the groups in an
experiment should possess equal variances.
d) Independence of errors :- The deviations of individual scores from the group mean should be
independent of each other.
e) Additivity :- Different factors, including the independent variables used, produce independent bits of
variation in the dependent variable, and these variations add up to give the total variation of the latter. This
additive property of variations due to different factors enables the analysis of the total variance (S²t) of the
dependent variable into various components.
Computation of F ratio
The ANOVA statistic F or variance ratio is a measure of the ratio of variations of scores between the
groups to their variations within the groups in an experiment.
F = S²b / S²w ;   S²b = SSb / dfb ;   S²w = SSw / dfw.

SSb = (Σx1)²/n1 + (Σx2)²/n2 + … − (Σx1 + Σx2 + …)²/N.
dfb = K − 1.

SSt = Σx1² + Σx2² + … − (Σx1 + Σx2 + …)²/N.
dft = N − 1.

dfw = dft − dfb.
SSw = SSt − SSb.
N = (n1 + n2 + …).

S²b : Between-groups variance.
S²w : Within-groups variance.
S²t : Total variance.
dfb : Degrees of freedom of S²b.
dfw : Degrees of freedom of S²w.
dft : Degrees of freedom of S²t.
K : Total number of groups.
The between-groups variance (S²b) is also known as the greater mean square. The within-groups
variance (S²w) is also called the lesser mean square.
The computed F is compared with the critical Fα(dfb, dfw) from the table, using the df of the two variances. The
computed F is considered significant only if it exceeds or equals the critical F score for the chosen
α (P ≤ α).
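The computation of F from the sums of squares can be sketched as a short program. This is a minimal illustration with hypothetical data; the function name is mine, not from the text.

```python
# Sketch: one-way ANOVA F ratio from raw scores, following the
# sum-of-squares formulas above.
def one_way_anova(groups):
    N = sum(len(g) for g in groups)           # total number of scores
    K = len(groups)                           # number of groups
    grand_sum = sum(sum(g) for g in groups)
    correction = grand_sum ** 2 / N           # (sum of all x)^2 / N
    ss_t = sum(x * x for g in groups for x in g) - correction
    ss_b = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_w = ss_t - ss_b                        # SSw = SSt - SSb
    df_b, df_w = K - 1, N - K
    return (ss_b / df_b) / (ss_w / df_w), df_b, df_w

F, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(F, 2), df_b, df_w)  # 13.0 2 6
```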
In a one-way anova between two groups only, the two group means are straightway considered to
differ significantly if the computed F is found to be significant. This need not be followed by multiple
comparison tests.
But in a one-way anova between more than two groups, a significant F ratio simply indicates that there
are significant differences between some or all of the groups; it must then be followed by multiple
comparison tests to find which of the groups differ significantly from each other.
* Omega squared (ω²) should be computed if a model I anova has yielded a significant F ratio. It
thereby estimates the strength of association between the two variables.

ω² = (K − 1)(F − 1) / [(K − 1)(F − 1) + N].
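The omega squared formula above is a one-liner in code; a minimal sketch with illustrative numbers (not taken from any example in the text):

```python
# Sketch: strength of association (omega squared) after a
# significant model I ANOVA, per the formula above.
def omega_squared(K, F, N):
    num = (K - 1) * (F - 1)
    return num / (num + N)

print(omega_squared(3, 10, 30))  # 0.375
```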
* A significant F ratio in a model II anova indicates the presence of an added variance component (S²a)
between the groups, due to the effect of the uncontrolled classification variable acting as an independent
variable. Where all the groups are of equal size (n):

S²a = (S²b − S²w) / n.
But if the K groups differ in size from each other:

S²a = (S²b − S²w) / n0, where n0 = [1/(K − 1)] [(n1 + n2 + n3) − (n1² + n2² + n3²)/(n1 + n2 + n3)].
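Both the equal- and unequal-size cases can be handled by one helper, since the effective group size n0 reduces to the common n when all groups are equal. A sketch under that observation (names are mine):

```python
# Sketch: added variance component S2a in a model II anova.
def added_variance(s2_b, s2_w, ns):
    K = len(ns)
    n_total = sum(ns)
    # effective group size n0; equals the common n for equal groups
    n0 = (n_total - sum(n * n for n in ns) / n_total) / (K - 1)
    return (s2_b - s2_w) / n0

# equal groups of 5: n0 = 5, so S2a = (10 - 4) / 5
print(added_variance(10.0, 4.0, [5, 5, 5]))  # 1.2
```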
* Multiple comparison tests :-
If the F ratio is found to be significant in a one-way model I anova with more than two groups, it
should be followed by a multiple comparison test between pairs of group means to find which of the group
means differ significantly from each other. There are two types of tests:
i) A priori tests.
ii) A posteriori tests.
A priori test :- In some cases, even before commencing the experiment and collecting the data, the
investigator may plan which group means should be subjected to a test for the significance of the
differences between those means. Statistical tests for such pre-planned comparisons of chosen group
means are called a priori tests. They are of two types.
A) Multiple comparison t test :- For this test, the difference between the group means of
every chosen pair, say (x̄1 − x̄2), is first converted into a t score:

t = (x̄1 − x̄2) / s(x̄1−x̄2) ;  s(x̄1−x̄2) = √(s²w/n1 + s²w/n2) ;  df = N − K.

If the computed t exceeds or equals the critical t score (df = N − K) for the chosen level of significance (α),
the relevant group means are considered to differ significantly (P ≤ α). This procedure is repeated for each
such chosen pair.
Bonferroni adjustment :- The significance level (P) obtained is then multiplied by the total number (k) of
pairs of groups being compared, to find the adjusted probability (P′). If this P′ is lower than or equal to a
chosen α, the relevant group means are considered to differ significantly from each other at or beyond that
P′ (P′ ≤ α).
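The pairwise t score and the Bonferroni adjustment above can be sketched as two small helpers. The values below are hypothetical summary statistics, not from the text.

```python
import math

# Sketch: a priori multiple-comparison t test with Bonferroni adjustment.
def pair_t(mean1, mean2, s2_w, n1, n2):
    se = math.sqrt(s2_w / n1 + s2_w / n2)  # s.e. of the mean difference
    return (mean1 - mean2) / se

def bonferroni(p, k_pairs):
    # adjusted probability P' = k * P, capped at 1
    return min(1.0, p * k_pairs)

print(pair_t(10.0, 8.0, 4.0, 8, 8))  # 2.0
print(bonferroni(0.02, 2))           # 0.04
```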
B) Scheffé's F test :- For this test, an F score is computed from the difference between the means
of each pair of groups, say (x̄1 − x̄2):

F = (x̄1 − x̄2)² / s²(x̄1−x̄2) ;  s²(x̄1−x̄2) = s²w/n1 + s²w/n2.

Each computed F score is then compared with adjusted critical F′ scores for the 0.05 and 0.01 levels of
significance:

F′.05 = (K − 1) F.05(dfb, dfw) ;  F′.01 = (K − 1) F.01(dfb, dfw).

If the computed F of any pair of group means exceeds or equals the critical F′ score, the difference between
those group means is significant at or beyond that level (P ≤ α). This procedure is repeated for all the other
chosen pairs.
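Scheffé's pairwise F and its adjusted critical value translate directly into code; a minimal sketch with illustrative values (the critical F of 3.35 is only an example input, not looked up here):

```python
# Sketch: Scheffé's multiple comparison F test, per the formulas above.
def scheffe_f(mean1, mean2, s2_w, n1, n2):
    return (mean1 - mean2) ** 2 / (s2_w / n1 + s2_w / n2)

def scheffe_critical(K, f_crit):
    # adjusted critical value F' = (K - 1) * F_alpha(df_b, df_w)
    return (K - 1) * f_crit

print(scheffe_f(10.0, 8.0, 4.0, 8, 8))  # 4.0
print(scheffe_critical(3, 3.35))        # 6.7
```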
A posteriori test :- The sum of squares simultaneous test procedure (SS-STP) may be used as an a posteriori
multiple comparison test.
a) First, the critical sum of squares (SS) is computed:

SS = (K − 1) s²w Fα(dfb, dfw).

b) Then, the group means are computed from the obtained data and scrutinized to identify those group
means which look so close as to raise doubts about any significant difference between them.
For example, suppose the means of groups 2, 3 and 5 are close in the data obtained in the experiment:

SSb = (Σx2)²/n2 + (Σx3)²/n3 + (Σx5)²/n5 − (Σx2 + Σx3 + Σx5)²/(n2 + n3 + n5).

There is a significant difference between the chosen group means (x̄2, x̄3, x̄5) in this case only if the
computed SSb equals or exceeds the critical SS (P ≤ α).
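The two steps of the SS-STP test above can be sketched as follows. The data in the usage line are hypothetical, and the function names are mine.

```python
# Sketch: SS-STP a posteriori multiple comparison test.
def ss_critical(K, s2_w, f_crit):
    # critical SS = (K - 1) * s2_w * F_alpha(df_b, df_w)
    return (K - 1) * s2_w * f_crit

def ss_between_subset(groups):
    # SSb computed over the chosen subset of groups only
    n_tot = sum(len(g) for g in groups)
    total = sum(sum(g) for g in groups)
    return sum(sum(g) ** 2 / len(g) for g in groups) - total ** 2 / n_tot

print(ss_critical(3, 2.0, 3.35))                  # 13.4
print(ss_between_subset([[1, 2, 3], [4, 5, 6]]))  # 13.5
```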
Example 1 :
The blood sugar levels (mg dl⁻¹) were estimated in three groups of animals one hour after injecting
groups 1, 2 and 3 with, respectively, the placebo, 100 g of an anti-diabetic substance, and 150 g of the
latter. The estimated blood sugar scores are recorded in the first three columns of Table 1. Is there any
significant difference between the means of groups 1 and 2, and between those of groups 2 and 3? If yes,
estimate the strength of association between blood sugar and the anti-diabetic substance.
Solution :
First, one-way model I anova is worked out to find whether or not there is any significant difference
between the three group means.
a) Using each score of the data entered in the first three columns of Table 1, the sum of the scores of
each group is worked out and used in computing the group mean. Each score is also squared, and the sum of
the squared scores of each group is computed and entered in Table 1.

X̄1 = ΣX1/n1 = 1267/10 = 126.7 mg;  X̄2 = ΣX2/n2 = 1185/10 = 118.5 mg;  X̄3 = ΣX3/n3 = 892/10 = 89.2 mg.
b) The total sum of squares (SSt) is computed and partitioned into the between-groups SSb and the
within-groups SSw.

SSt = ΣX1² + ΣX2² + ΣX3² − (ΣX1 + ΣX2 + ΣX3)²/N
    = 160865 + 140827 + 80086 − (1267 + 1185 + 892)²/30 = 9033.47.
dft = N − 1 = 30 − 1 = 29.

SSb = (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 − (ΣX1 + ΣX2 + ΣX3)²/(n1 + n2 + n3)
    = 1267²/10 + 1185²/10 + 892²/10 − (1267 + 1185 + 892)²/(10 + 10 + 10) = 7773.27.
dfb = k − 1 = 3 − 1 = 2.

SSw = SSt − SSb = 9033.47 − 7773.27 = 1260.20.
dfw = N − k = 30 − 3 = 27.
Table 1 : Table for computing anova of blood sugar data.

   Blood sugar scores          Squared scores
  X1      X2      X3        X1²      X2²      X3²
 128     118      88      16384    13924     7744
 124     115      80      15376    13225     6400
 129     122      84      16641    14884     7056
 135     128      96      18225    16384     9216
 132     125      86      17424    15625     7396
 128     117      87      16384    13689     7569
 118     110      96      13924    12100     9216
 123     116      78      15129    13456     6084
 117     108      98      13689    11664     9604
 133     126      99      17689    15876     9801
Σ: 1267  1185    892     160865   140827    80086
c) The between-groups variance (s²b) and the within-groups variance (s²w) are computed using SSb and SSw
(Table 2). The F ratio is computed using s²b and s²w.

s²b = SSb/dfb = 7773.27/2 = 3886.64;  s²w = SSw/dfw = 1260.20/27 = 46.67;  F = s²b/s²w = 3886.64/46.67 = 83.28.
df of computed F : (dfb, dfw) = (2, 27).
Table 2 : Anova table for blood sugar data.

Sources of variation   Sums of squares   df   Variances      F
Between groups             7773.27        2    3886.64    83.28
Within groups              1260.20       27      46.67
Total                      9033.47       29
Critical F scores (df = 2, 27) are quoted below from Table F of Appendix.
F.05(2, 27) = 3.35;  F.01(2, 27) = 5.49.
As the computed F is found to be higher than the critical F for the 0.01 level, the computed F is
significant beyond the 0.01 level (P < 0.01). Hence, there is a significant added treatment component between
the groups.
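As a check, the anova of Table 1 can be re-computed directly from the raw scores. This is a plain-Python sketch; the F here comes out as 83.3 at full precision, while the text's 83.28 results from rounding the variances before dividing.

```python
# Re-computing Example 1's sums of squares and F from the Table 1 data.
g1 = [128, 124, 129, 135, 132, 128, 118, 123, 117, 133]
g2 = [118, 115, 122, 128, 125, 117, 110, 116, 108, 126]
g3 = [88, 80, 84, 96, 86, 87, 96, 78, 98, 99]
groups = [g1, g2, g3]

N = sum(len(g) for g in groups)
c = sum(sum(g) for g in groups) ** 2 / N           # correction term
ss_t = sum(x * x for g in groups for x in g) - c
ss_b = sum(sum(g) ** 2 / len(g) for g in groups) - c
ss_w = ss_t - ss_b
F = (ss_b / 2) / (ss_w / 27)                       # df_b = 2, df_w = 27
print(round(ss_t, 2), round(ss_b, 2), round(ss_w, 2), round(F, 1))
# 9033.47 7773.27 1260.2 83.3
```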
d) Scheffé's multiple comparison F test may be done to find out whether (X̄1 − X̄2) and (X̄2 − X̄3) are
significant. As three group means are being compared, k = 3.

F = (X̄1 − X̄2)² / [s²w (1/n1 + 1/n2)] = (126.7 − 118.5)² / [46.67 (1/10 + 1/10)] = 7.20.

F = (X̄2 − X̄3)² / [s²w (1/n2 + 1/n3)] = (118.5 − 89.2)² / [46.67 (1/10 + 1/10)] = 91.97.

Critical F′.01 = (k − 1) F.01(2, 27) = (3 − 1) × 5.49 = 10.98.
Critical F′.05 = (k − 1) F.05(2, 27) = (3 − 1) × 3.35 = 6.70.

As the F ratio for (X̄1 − X̄2) exceeds the critical F′.05 but not the critical F′.01, the difference between
X̄1 and X̄2 is significant beyond the 0.05 level only (P < 0.05). But as the F ratio for (X̄2 − X̄3) also exceeds
the critical F′.01, the difference between X̄2 and X̄3 is significant beyond the 0.01 level (P < 0.01).
(Alternatively, the multiple comparison t test may be worked out with the Bonferroni modification.)

t = (X̄1 − X̄2) / √(s²w/n1 + s²w/n2) = (126.7 − 118.5) / √(46.67/10 + 46.67/10) = 2.684.

t = (X̄2 − X̄3) / √(s²w/n2 + s²w/n3) = (118.5 − 89.2) / √(46.67/10 + 46.67/10) = 9.590.

df = N − k = 30 − 3 = 27.
Critical t scores from Table B of Appendix :
t.02(27) = 2.473;  t.01(27) = 2.771;  t.001(27) = 3.690.

Comparing the computed t scores with the critical t scores, it is found that (X̄1 − X̄2) is significant
beyond the 0.02 level (P < 0.02) while (X̄2 − X̄3) is significant beyond the 0.001 level (P < 0.001).
To apply the Bonferroni modification, each P obtained by the t test is multiplied by the number k of
paired comparisons to get the corrected probability P′ of H0 being correct.
k = 2; P′ = kP;
so, for (X̄1 − X̄2), P′ = kP = 2 × 0.02 = 0.04.
Similarly, for (X̄2 − X̄3), P′ = kP = 2 × 0.001 = 0.002.
Thus, (X̄1 − X̄2) is significant beyond the 0.04 level (P < 0.04) while (X̄2 − X̄3) is significant beyond the
0.002 level (P < 0.002).
e) Omega squared is computed to estimate the strength of association between blood sugar and the
anti-diabetic factor, using the F ratio computed in the model I anova and the number k of groups.

ω² = (k − 1)(F − 1) / [(k − 1)(F − 1) + N] = (3 − 1)(83.28 − 1) / [(3 − 1)(83.28 − 1) + 30] = 0.85.

So, a proportion of about 0.85 of the total variance of blood sugar is associated with the anti-diabetic
factor used as the treatment variable.
Example 2 :
The scores obtained by three groups of students in an abstract reasoning test are given in the first
three columns of Table 3. Apply anova to find whether an added variance component, over and above that
due to random factors, is present in the variance between the groups (α = 0.01).
Solution :
The data, arranged in Table 3, are subjected to a one-way model II anova.
Table 3 : Anova for abstract reasoning scores.

  X1   X2   X3   X1−X̄1  (X1−X̄1)²   X2−X̄2  (X2−X̄2)²   X3−X̄3  (X3−X̄3)²
  26   30   34    −2        4        −1        1        +2        4
  27   34   35    −1        1        +3        9        +3        9
  25   28   28    −3        9        −3        9        −4       16
  26   29   27    −2        4        −2        4        −5       25
  28   32   34     0        0        +1        1        +2        4
  30   31   33    +2        4         0        0        +1        1
  26   32   32    −2        4        +1        1         0        0
  29   34   33    +1        1        +3        9        +1        1
  31   29   29    +3        9        −2        4        −3        9
  32        34    +4       16                           +2        4
            33                                          +1        1
Σ: 280 279  352            52                 38                 74
a) Group means are computed and then used in computing the squared deviations of scores from the
respective group means (Table 3).

X̄1 = ΣX1/n1 = 280/10 = 28.  X̄2 = ΣX2/n2 = 279/9 = 31.  X̄3 = ΣX3/n3 = 352/11 = 32.

The grand mean X̄ is also computed.

X̄ = (ΣX1 + ΣX2 + ΣX3)/(n1 + n2 + n3) = (280 + 279 + 352)/(10 + 9 + 11) = 30.4.

b) SSb is computed using the squared deviations of the group means from the grand mean.

SSb = n1(X̄1 − X̄)² + n2(X̄2 − X̄)² + n3(X̄3 − X̄)²
    = 10(28 − 30.4)² + 9(31 − 30.4)² + 11(32 − 30.4)² = 89.0.
dfb = k − 1 = 3 − 1 = 2.

c) SSw is computed using the squared deviations of scores from their respective group means.

SSw = Σ(X1 − X̄1)² + Σ(X2 − X̄2)² + Σ(X3 − X̄3)² = 52 + 38 + 74 = 164.0.
dfw = N − k = 30 − 3 = 27.
Alternatively, SSb and SSw may be computed in the following way, omitting steps (b) and (c). For
this, the abstract reasoning scores are entered in Table 4, each score is squared and the sum of the squared
scores is worked out for each group.
Table 4 : Alternative table for computing the anova of abstract reasoning scores.

  X1    X2    X3     X1²     X2²     X3²
  26    30    34     676     900    1156
  27    34    35     729    1156    1225
  25    28    28     625     784     784
  26    29    27     676     841     729
  28    32    34     784    1024    1156
  30    31    33     900     961    1089
  26    32    32     676    1024    1024
  29    34    33     841    1156    1089
  31    29    29     961     841     841
  32          34    1024            1156
              33                    1089
Σ: 280  279  352    7892    8687   11338
SSt = ΣX1² + ΣX2² + ΣX3² − (ΣX1 + ΣX2 + ΣX3)²/N
    = 7892 + 8687 + 11338 − (280 + 279 + 352)²/30 = 252.97.
dft = N − 1 = 30 − 1 = 29.

SSb = (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 − (ΣX1 + ΣX2 + ΣX3)²/(n1 + n2 + n3)
    = 280²/10 + 279²/9 + 352²/11 − (280 + 279 + 352)²/(10 + 9 + 11) = 88.97.
dfb = k − 1 = 3 − 1 = 2.

SSw = SSt − SSb = 252.97 − 88.97 = 164.0.
dfw = N − k = 30 − 3 = 27.
d) Variances and the F ratio are then computed using SSb and SSw.

s²b = SSb/dfb = 88.97/2 = 44.49;  s²w = SSw/dfw = 164.0/27 = 6.07;  F = s²b/s²w = 44.49/6.07 = 7.33.
Table 5 : Anova table for abstract reasoning scores.

Sources of variation   Sums of squares   df   Variances     F
Between groups              88.97         2     44.49     7.33
Within groups              164.0         27      6.07
Total                      252.97        29
Critical F.01(2, 27) = 5.49. As the computed F is higher than the critical F for the 0.01 level of
significance, it is significant (P < 0.01); there is a significant added variance component (s²a) between the
groups.

s²a = (s²b − s²w) / n0, where n0 = [1/(k − 1)] [(n1 + n2 + n3) − (n1² + n2² + n3²)/(n1 + n2 + n3)].

s²a = (44.49 − 6.07) / {[1/(3 − 1)] [(10 + 9 + 11) − (10² + 9² + 11²)/(10 + 9 + 11)]} = 3.85.

The proportionate variation due to s²a is given by:

s²a / (s²w + s²a) = 3.85 / (6.07 + 3.85) = 0.39.

So, a proportion of about 0.39 of the total variance is due to the added variance component.
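As a check, the model II components of Example 2 can be re-computed from the Table 3 scores; a plain-Python sketch (minor differences from the text come only from rounding):

```python
# Re-computing Example 2's model II components from the Table 3 data.
g1 = [26, 27, 25, 26, 28, 30, 26, 29, 31, 32]
g2 = [30, 34, 28, 29, 32, 31, 32, 34, 29]
g3 = [34, 35, 28, 27, 34, 33, 32, 33, 29, 34, 33]
groups = [g1, g2, g3]
ns = [len(g) for g in groups]

N = sum(ns)
c = sum(sum(g) for g in groups) ** 2 / N
ss_b = sum(sum(g) ** 2 / len(g) for g in groups) - c
ss_w = sum(x * x for g in groups for x in g) - c - ss_b
s2_b, s2_w = ss_b / 2, ss_w / 27               # df_b = 2, df_w = 27
n0 = (N - sum(n * n for n in ns) / N) / 2      # effective group size
s2_a = (s2_b - s2_w) / n0                      # added variance component
prop = s2_a / (s2_w + s2_a)                    # proportionate variation
print(round(ss_b, 2), round(s2_a, 2), round(prop, 2))
# 88.97 3.85 0.39
```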
