
Analysis of Variance

In practice it is often necessary to compare a large number of independent random samples in terms of their level. We are interested in the hypothesis:

$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_i = \dots = \mu_m$
$H_1: \mu_i \neq \mu$ for at least one $i$ ($i = 1, 2, \dots, m$)

for $m > 2$, where $\mu_i$, $i = 1, 2, \dots, m$, are the mean values of normally distributed populations with equal variances $\sigma^2$, i.e. $N(\mu_i, \sigma^2)$.
This hypothesis is tested with an important statistical method called analysis of variance, abbreviated ANOVA (or AV).

In practice AV is used to examine the impact of one or more factors (treatments) on a statistical attribute.
Factors are labeled A, B, ...; in AV they are regarded as qualitative attributes with different variations, called the levels of the factor.
The result is a quantitative statistical attribute, denoted Y.
AV is frequently used in the evaluation of biological experiments.
The simplest case is AV with a single factor, called one-factor analysis of variance.

A level of the factor refers to:
a certain amount of a quantitative factor, e.g. the amount of pure nutrients in manure, or different income groups of households;
a certain kind of a qualitative factor, e.g. different varieties of the same crop, or methods of placing products in stores.
AV is a generalization of Student's t-test for independent samples.
AV also examines the impact of qualitative factors on a resulting quantitative attribute -> it analyzes the relationships between attributes.

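Scheme of a single-factor experiment (balanced design)

Columns 1, 2, ..., n are the repetitions.

| Levels of factor A | 1 | 2 | ... | j | ... | n | Row sum $Y_{i.}$ | Row average $\bar{y}_{i.}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | $y_{11}$ | $y_{12}$ | ... | $y_{1j}$ | ... | $y_{1n}$ | $Y_{1.}$ | $\bar{y}_{1.}$ |
| 2 | $y_{21}$ | $y_{22}$ | ... | $y_{2j}$ | ... | $y_{2n}$ | $Y_{2.}$ | $\bar{y}_{2.}$ |
| ... | | | | | | | | |
| i | $y_{i1}$ | $y_{i2}$ | ... | $y_{ij}$ | ... | $y_{in}$ | $Y_{i.}$ | $\bar{y}_{i.}$ |
| ... | | | | | | | | |
| m | $y_{m1}$ | $y_{m2}$ | ... | $y_{mj}$ | ... | $y_{mn}$ | $Y_{m.}$ | $\bar{y}_{m.}$ |
| Total | | | | | | | Total sum $Y_{..}$ | Overall average $\bar{y}_{..}$ |

Row sum:
$$Y_{i.} = \sum_{j=1}^{n} y_{ij}$$

Total sum:
$$Y_{..} = \sum_{i=1}^{m} \sum_{j=1}^{n} y_{ij}$$

Row average:
$$\bar{y}_{i.} = \frac{1}{n} \sum_{j=1}^{n} y_{ij} = \frac{1}{n} Y_{i.}$$

Overall average:
$$\bar{y}_{..} = \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n} y_{ij}, \qquad N = m \cdot n$$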
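
The row and overall summaries of a balanced single-factor experiment can be sketched in a few lines of Python. This is only an illustration: the data values and all variable names are hypothetical and do not come from the slides.

```python
# Hypothetical balanced data: m = 3 levels of factor A, n = 4 repetitions each.
data = [
    [12.0, 14.0, 11.0, 13.0],
    [15.0, 17.0, 16.0, 18.0],
    [10.0, 9.0, 11.0, 10.0],
]

m = len(data)        # number of factor levels
n = len(data[0])     # repetitions per level (same for every level)
N = m * n            # total number of observations

row_sums = [sum(row) for row in data]    # Y_i.
row_means = [s / n for s in row_sums]    # row averages
total_sum = sum(row_sums)                # Y_..
overall_mean = total_sum / N             # overall average

print(row_sums, row_means, total_sum, overall_mean)
```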

Model for the resulting observed value:

$$y_{ij} = \mu + \alpha_i + e_{ij}$$

where $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, n$
$\mu$ - expected value common to all levels of the factor and all observed values
$\alpha_i$ - impact of the i-th level of factor A
$e_{ij}$ - random error that affects every measurement, i.e. the impact of random factors

$$y_{ij} = \mu_i + e_{ij} \quad \text{or} \quad y_{ij} = \mu + \alpha_i + e_{ij}$$

Then we can formulate the null hypothesis:
$H_0: \alpha_1 = \alpha_2 = \dots = \alpha_i = \dots = \alpha_m = 0$
-> the effects of all levels of factor A are zero (insignificant),
against the alternative hypothesis
$H_1: \alpha_i \neq 0$ for at least one $i$ ($i = 1, 2, \dots, m$)
-> the effect $\alpha_i$ of at least one level of the factor is significant, i.e. significantly different from zero.

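The parameters are estimated by sample characteristics:

est $\mu = \bar{y}_{..}$, est $\mu_i = \bar{y}_{i.}$, est $\alpha_i = \bar{y}_{i.} - \bar{y}_{..}$, est $e_{ij} = y_{ij} - \bar{y}_{i.}$

Substituting these estimates into the model

$$y_{ij} = \mu_i + e_{ij}, \qquad y_{ij} = \mu + \alpha_i + e_{ij}$$

gives an identity that can be rewritten as:

$$(y_{ij} - \bar{y}_{..}) = (\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})$$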

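[Figure: comparison of two experiments with three levels of the factor; each panel shows the group averages $\bar{y}_{1.}$, $\bar{y}_{2.}$, $\bar{y}_{3.}$ relative to the overall average $\bar{y}_{..}$]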

Principle of the ANOVA

The basic principle of the analysis of variance is the decomposition of the total variability of the investigated attribute:

$$\sum_{i=1}^{m} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = n \sum_{i=1}^{m} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{m} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$$

$$S_c = S_1 + S_r$$

$S_c$ - total variability
$S_1$ - variability between the levels of the factor, caused by the action of factor A (variability between groups)
$S_r$ - random (residual) variability, i.e. variability within groups
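
The decomposition $S_c = S_1 + S_r$ can be checked numerically. The following is a minimal Python sketch for a balanced layout; the function name `decompose` and the data are hypothetical, chosen only for illustration.

```python
def decompose(data):
    """Return (Sc, S1, Sr) for a balanced one-factor layout.

    `data` is a list of m rows (factor levels), each with n observations.
    """
    m, n = len(data), len(data[0])
    overall = sum(sum(row) for row in data) / (m * n)
    row_means = [sum(row) / n for row in data]
    # Total variability: squared deviations from the overall average.
    s_total = sum((y - overall) ** 2 for row in data for y in row)
    # Between-group variability: row averages against the overall average.
    s_between = n * sum((rm - overall) ** 2 for rm in row_means)
    # Within-group (residual) variability: observations against their row average.
    s_within = sum((y - rm) ** 2 for row, rm in zip(data, row_means) for y in row)
    return s_total, s_between, s_within


# Hypothetical data: 3 levels, 4 repetitions each.
data = [
    [12.0, 14.0, 11.0, 13.0],
    [15.0, 17.0, 16.0, 18.0],
    [10.0, 9.0, 11.0, 10.0],
]
Sc, S1, Sr = decompose(data)
print(Sc, S1, Sr)
```

For these values the between-group part dominates, which is the situation in which the F test tends to reject the null hypothesis.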

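ANOVA table (balanced design); the bracketed numbers identify the columns

| Variability | Sum of squares (SS) [1] | Degrees of freedom [2] | Mean square (MS) = [1]/[2] [3] | F criterion [4] |
|---|---|---|---|---|
| Variability between groups | $S_1 = n \sum_{i=1}^{m} (\bar{y}_{i.} - \bar{y}_{..})^2$ | $m - 1$ | $s_1^2$ | $F = \dfrac{s_1^2}{s_r^2}$ |
| Variability within groups | $S_r = \sum_{i=1}^{m} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$ | $m \cdot n - m$ | $s_r^2$ | |
| Total variability | $S_c = \sum_{i=1}^{m} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$ | $N - 1 = m \cdot n - 1$ | | |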

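The test statistic for one-factor ANOVA can be written as:

$$F = \frac{s_1^2}{s_r^2} = \frac{\dfrac{n \sum_{i=1}^{m} (\bar{y}_{i.} - \bar{y}_{..})^2}{m - 1}}{\dfrac{\sum_{i=1}^{m} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2}{N - m}}$$

The F value is compared with the appropriate table value of the F distribution:
$F_\alpha$ with $(m - 1)$ and $(m \cdot n - m)$ degrees of freedom.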
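
Given the two sums of squares, the mean squares and the F statistic follow directly. Here is a small Python sketch; the helper name `f_statistic` and the input numbers are hypothetical, not taken from the slides.

```python
def f_statistic(s_between, s_within, m, N):
    """Return (ms_between, ms_within, F) for one-factor ANOVA.

    s_between, s_within: sums of squares S1 and Sr
    m: number of factor levels, N: total number of observations
    """
    ms_between = s_between / (m - 1)   # s1^2, with m - 1 degrees of freedom
    ms_within = s_within / (N - m)     # sr^2, with N - m degrees of freedom
    return ms_between, ms_within, ms_between / ms_within


# Hypothetical sums of squares for m = 3 levels and N = 12 observations.
ms1, msr, F = f_statistic(86.0, 12.0, m=3, N=12)
print(ms1, msr, F)
```

The resulting F would then be compared with the tabulated $F_\alpha$ for $(m - 1)$ and $(N - m)$ degrees of freedom; the sketch deliberately does not compute that critical value.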

Decision about the test result:

If $F_{calc} > F_\alpha((m-1), (N-m))$, we reject $H_0$. In that case the effect of at least one level of the factor is significant, so the average level of the indicator at that level differs significantly from the others. => At least one effect $\alpha_i$ is significantly different from zero.

If $F_{calc} \le F_\alpha$, we do not reject $H_0$.

[Figure: acceptance region of $H_0$ below $F_\alpha$ and rejection region of $H_0$ above $F_\alpha$]

If the null hypothesis is rejected:

We have found only that the effect of the factor on the examined attribute is significant.
It is also necessary to identify which levels of the factor differ significantly; tests of contrasts are used for this purpose.
Tests of contrasts: Duncan test, Scheffé test, Tukey test and others.

Conditions for using AV:

The samples come from normally distributed populations; a violation of this assumption has a significant effect on the results of AV.
The random errors $e_{ij}$ are statistically independent.
The residual variances are identical:
$\sigma_1^2 = \sigma_2^2 = \dots = \sigma_m^2 = \sigma^2$, i.e. $D(e_{ij}) = \sigma^2$ for all $i = 1, 2, \dots, m$, $j = 1, 2, \dots, n$;
this assumption is more serious and can be verified by the Cochran or the Bartlett test.
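
As an illustration of checking the equal-variance assumption, the sketch below computes Cochran's C statistic, the largest sample variance divided by the sum of all sample variances, which applies to groups of equal size. The function name and data are hypothetical, and the critical value still has to be looked up in tables; the slides do not show the calculation itself.

```python
from statistics import variance


def cochran_c(groups):
    """Cochran's C: largest sample variance over the sum of sample variances.

    Assumes a balanced design (every group has the same number of observations).
    """
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("Cochran's C requires equal group sizes")
    variances = [variance(g) for g in groups]   # sample variances (n - 1 in the denominator)
    return max(variances) / sum(variances)


# Hypothetical data: 3 levels, 4 repetitions each.
groups = [
    [12.0, 14.0, 11.0, 13.0],
    [15.0, 17.0, 16.0, 18.0],
    [10.0, 9.0, 11.0, 10.0],
]
c_value = cochran_c(groups)
print(round(c_value, 4))
```

A value of C close to 1/m suggests similar variances, while a value close to 1 indicates that one group dominates the total variance.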

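Scheme of a single-factor experiment (unbalanced design, different numbers of repetitions)

| Levels of factor A | 1 | 2 | ... | j | ... | Number of repetitions $n_i$ | Row sum $Y_{i.}$ | Row average $\bar{y}_{i.}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | $y_{11}$ | $y_{12}$ | ... | $y_{1j}$ | ... | $n_1$ | $Y_{1.}$ | $\bar{y}_{1.}$ |
| 2 | $y_{21}$ | $y_{22}$ | ... | $y_{2j}$ | ... | $n_2$ | $Y_{2.}$ | $\bar{y}_{2.}$ |
| ... | | | | | | | | |
| i | $y_{i1}$ | $y_{i2}$ | ... | $y_{ij}$ | ... | $n_i$ | $Y_{i.}$ | $\bar{y}_{i.}$ |
| ... | | | | | | | | |
| m | $y_{m1}$ | $y_{m2}$ | ... | $y_{mj}$ | ... | $n_m$ | $Y_{m.}$ | $\bar{y}_{m.}$ |
| Total | | | | | | $N$ | $Y_{..}$ | Overall average $\bar{y}_{..}$ |

where
$$N = \sum_{i=1}^{m} n_i$$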

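ANOVA table (unbalanced design); the bracketed numbers identify the columns

| Variability | Sum of squares (SS) [1] | Degrees of freedom [2] | Mean square (MS) = [1]/[2] [3] | F criterion [4] |
|---|---|---|---|---|
| Variability between groups | $S_1 = \sum_{i=1}^{m} n_i (\bar{y}_{i.} - \bar{y}_{..})^2$ | $m - 1$ | $s_1^2$ | $F = \dfrac{s_1^2}{s_r^2}$ |
| Variability within groups | $S_r = \sum_{i=1}^{m} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2$ | $N - m$ | $s_r^2$ | |
| Total variability | $S_c = \sum_{i=1}^{m} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2$ | $N - 1$ | | |

where
$$N = \sum_{i=1}^{m} n_i$$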
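
The unbalanced case differs only in that each level contributes its own $n_i$. A minimal Python sketch of the whole ANOVA table follows; the function name `one_way_anova`, the dictionary keys and the data are hypothetical.

```python
def one_way_anova(groups):
    """One-factor ANOVA for groups that may have different sizes n_i."""
    m = len(groups)
    N = sum(len(g) for g in groups)
    overall = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares, each level weighted by its own n_i.
    s_between = sum(len(g) * (mean - overall) ** 2 for g, mean in zip(groups, means))
    # Within-group (residual) sum of squares.
    s_within = sum((y - mean) ** 2 for g, mean in zip(groups, means) for y in g)
    ms_between = s_between / (m - 1)
    ms_within = s_within / (N - m)
    return {
        "S1": s_between,
        "Sr": s_within,
        "Sc": s_between + s_within,
        "df": (m - 1, N - m),
        "F": ms_between / ms_within,
    }


# Hypothetical data with n_1 = 3, n_2 = 2, n_3 = 5 repetitions.
groups = [[5.0, 7.0, 6.0], [9.0, 11.0], [2.0, 3.0, 2.0, 3.0, 2.0]]
result = one_way_anova(groups)
print(result)
```

With equal group sizes the same function reproduces the balanced formulas, since $n_i = n$ for every level.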

Two-factor analysis of variance with one observation in each subclass (TAV)

Consider the effect of factor A, which we investigate at m levels, i = 1, 2, ..., m.
Then consider the effect of factor B, which is observed at n levels, j = 1, 2, ..., n.
For every i-th level of factor A and j-th level of factor B we have only one observation (repetition) $y_{ij}$.
=> We are verifying two null hypotheses.

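Scheme of a two-factor experiment with one observation in each subclass (TAV)

Rows are the m levels of factor A, columns are the n levels of factor B.

| A \ B | 1 | 2 | ... | j | ... | n | Row sum $Y_{i.}$ | Row average $\bar{y}_{i.}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | $y_{11}$ | $y_{12}$ | ... | $y_{1j}$ | ... | $y_{1n}$ | $Y_{1.}$ | $\bar{y}_{1.}$ |
| 2 | $y_{21}$ | $y_{22}$ | ... | $y_{2j}$ | ... | $y_{2n}$ | $Y_{2.}$ | $\bar{y}_{2.}$ |
| ... | | | | | | | | |
| i | $y_{i1}$ | $y_{i2}$ | ... | $y_{ij}$ | ... | $y_{in}$ | $Y_{i.}$ | $\bar{y}_{i.}$ |
| ... | | | | | | | | |
| m | $y_{m1}$ | $y_{m2}$ | ... | $y_{mj}$ | ... | $y_{mn}$ | $Y_{m.}$ | $\bar{y}_{m.}$ |
| Column sum | $Y_{.1}$ | $Y_{.2}$ | ... | $Y_{.j}$ | ... | $Y_{.n}$ | $Y_{..}$ | |
| Column average | $\bar{y}_{.1}$ | $\bar{y}_{.2}$ | ... | $\bar{y}_{.j}$ | ... | $\bar{y}_{.n}$ | | Overall average $\bar{y}_{..}$ |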

We can write the model for the examined attribute as follows:

$$y_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$$

We are verifying the validity of two null hypotheses.

Hypothesis for factor A:
$H_{01}: \alpha_1 = \alpha_2 = \dots = \alpha_i = \dots = \alpha_m = 0$
i.e. all effects of the levels of factor A are equal to zero, thus insignificant, against the alternative hypothesis
$H_{11}: \alpha_i \neq 0$ for at least one $i$ ($i = 1, 2, \dots, m$)
i.e. the effect $\alpha_i$ of at least one level of factor A is significant, significantly different from zero.

Hypothesis for factor B:

$H_{02}: \beta_1 = \beta_2 = \dots = \beta_j = \dots = \beta_n = 0$
=> all effects of the levels of factor B are equal to zero, thus insignificant, against the alternative hypothesis
$H_{12}: \beta_j \neq 0$ for at least one $j$ ($j = 1, 2, \dots, n$)
i.e. the effect $\beta_j$ of at least one level of factor B is significant, significantly different from zero.

doc.Ing. Zlata Sojkov,CSc.


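ANOVA table for TAV; the bracketed numbers identify the columns

| Variability | Sum of squares (SS) [1] | Degrees of freedom [2] | Mean square (MS) = [1]/[2] [3] | F criterion [4] |
|---|---|---|---|---|
| Variability between rows | $S_1$ | $m - 1$ | $s_1^2$ | $F_1 = \dfrac{s_1^2}{s_r^2}$ |
| Variability between columns | $S_2$ | $n - 1$ | $s_2^2$ | $F_2 = \dfrac{s_2^2}{s_r^2}$ |
| Residual variability | $S_r$ | $(m - 1)(n - 1)$ | $s_r^2$ | |
| Total variability | $S_c$ | $m \cdot n - 1$ | | |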

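Decomposition of the total variability

$$S_c = S_1 + S_2 + S_r$$

Variability between rows, effect of factor A:
$$S_1 = n \sum_{i=1}^{m} (\bar{y}_{i.} - \bar{y}_{..})^2$$

Variability between columns, effect of factor B:
$$S_2 = m \sum_{j=1}^{n} (\bar{y}_{.j} - \bar{y}_{..})^2$$

Residual variability:
$$S_r = \sum_{i=1}^{m} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$$

Total variability:
$$S_c = \sum_{i=1}^{m} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$$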
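
The two-factor decomposition with one observation per subclass can be sketched as follows. The function name `two_way_anova`, the returned keys and the 3 x 3 table are hypothetical and serve only to make the formulas concrete.

```python
def two_way_anova(table):
    """Two-factor ANOVA with one observation per cell.

    `table[i][j]` is the observation at level i of factor A (rows)
    and level j of factor B (columns).
    """
    m, n = len(table), len(table[0])
    overall = sum(sum(row) for row in table) / (m * n)
    row_means = [sum(row) / n for row in table]
    col_means = [sum(table[i][j] for i in range(m)) / m for j in range(n)]

    s1 = n * sum((r - overall) ** 2 for r in row_means)   # between rows (factor A)
    s2 = m * sum((c - overall) ** 2 for c in col_means)   # between columns (factor B)
    sr = sum(
        (table[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(m)
        for j in range(n)
    )                                                     # residual
    sc = sum((y - overall) ** 2 for row in table for y in row)  # total

    ms_r = sr / ((m - 1) * (n - 1))
    f1 = (s1 / (m - 1)) / ms_r   # tests the effect of factor A
    f2 = (s2 / (n - 1)) / ms_r   # tests the effect of factor B
    return {"S1": s1, "S2": s2, "Sr": sr, "Sc": sc, "F1": f1, "F2": f2}


# Hypothetical 3 x 3 table of observations.
table = [
    [4.0, 6.0, 8.0],
    [6.0, 8.0, 10.0],
    [8.0, 13.0, 9.0],
]
anova = two_way_anova(table)
print(anova)
```

Each F value would be compared with its own table value: $F_1$ with $(m-1)$ and $(m-1)(n-1)$ degrees of freedom, $F_2$ with $(n-1)$ and $(m-1)(n-1)$.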

Investigating the relationships between statistical attributes

Investigating the relationship between qualitative attributes, e.g. A and B, is called measurement of the association.
Investigating the relationship between quantitative attributes is the subject of regression and correlation analysis.

Investigating the association

The analysis is based on association tables, also called pivot tables.
To test whether a significant relationship exists between qualitative attributes we use the $\chi^2$ test of independence:
$H_0$: the two attributes A and B are independent
$H_1$: the attributes A and B are dependent
Attribute A has m levels (variations).
Attribute B has k levels (variations).

Hypotheses formulation

Dependence of the attributes shows up as differences in the frequencies.
E.g. we examine whether the choice of package size is affected by the size of the family.
$H_0$: the choice of package size does not depend on the number of family members
$H_1$: the choice of package size is affected by the size of the family
The procedure compares the empirical frequencies with the theoretical frequencies, i.e. the frequencies we would expect if the attributes A and B were independent.

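Simultaneous frequencies, frequencies of the second order $(a_i b_j)$

| Package size | Family size 1-2 ($b_1$) | Family size 3-4 ($b_2$) | Family size 5 and more ($b_3$) | Total, marginal frequencies $(a_i)$ |
|---|---|---|---|---|
| up to 100 g ($a_1$) | 25 $(a_1 b_1)$ | 37 $(a_1 b_2)$ | 8 | 70 |
| 100-150 g ($a_2$) | 10 | 62 | 53 | 125 |
| 250 g and more ($a_3$) | 5 | 41 | 59 $(a_3 b_3)$ | 105 |
| Total, marginal frequencies $(b_j)$ | 40 | 140 | 120 | 300 (total count of respondents n) |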

Determination of theoretical frequencies

Based on the theorem on the independence of random events A and B:
$P(A \cap B) = P(A) \cdot P(B)$; thus, if the attributes A and B are independent, then:
$P(a_i b_j) = P(a_i) \cdot P(b_j)$
Estimate based on the relative frequencies:

$$\frac{(a_i b_j)_o}{n} = \frac{(a_i)}{n} \cdot \frac{(b_j)}{n} \quad \Rightarrow \quad (a_i b_j)_o = \frac{(a_i) \cdot (b_j)}{n}$$

where $(a_i b_j)_o$ are the theoretical frequencies.

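Calculation of theoretical frequencies

$(a_1 b_1)_o = 70 \cdot 40 / 300 = 9.33$

Each cell shows the empirical frequency followed by the theoretical frequency in parentheses.

| Package size | Family size 1-2 ($b_1$) | Family size 3-4 ($b_2$) | Family size 5 and more ($b_3$) | Total |
|---|---|---|---|---|
| up to 100 g ($a_1$) | 25 (9.33) | 37 (32.67) | 8 (28.00) | 70 |
| 100-150 g ($a_2$) | 10 (16.67) | 62 (58.33) | 53 (50.00) | 125 |
| 250 g and more ($a_3$) | 5 (14.00) | 41 (49.00) | 59 (42.00) | 105 |
| Total | 40 | 140 | 120 | 300 (total count of respondents n) |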
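
The theoretical frequencies of the package-size example can be reproduced with a short Python sketch. The observed counts are those of the example; the variable names are hypothetical.

```python
# Observed (empirical) frequencies: rows = package size a1..a3, columns = family size b1..b3.
observed = [
    [25, 37, 8],
    [10, 62, 53],
    [5, 41, 59],
]

row_totals = [sum(row) for row in observed]          # marginal frequencies (a_i)
col_totals = [sum(col) for col in zip(*observed)]    # marginal frequencies (b_j)
n_total = sum(row_totals)                            # total count of respondents

# Theoretical frequency under independence: (a_i) * (b_j) / n
expected = [[r * c / n_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(value, 2) for value in row])
```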

Calculation of the test criterion and decision:

$$\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{k} \frac{\left((a_i b_j) - (a_i b_j)_o\right)^2}{(a_i b_j)_o}$$

If $\chi^2_{calc} > \chi^2_\alpha$ for the significance level $\alpha$ and $(m-1)(k-1)$ degrees of freedom, $H_0$ is rejected => the attributes A and B are dependent.
In our case this means that the number of family members significantly affects the choice of package size. Next, we should measure the strength of the dependence.
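
To make the decision concrete, the sketch below evaluates the $\chi^2$ criterion for the package-size example. The significance level 0.05 is an assumption (the slides do not state one), and the critical value 9.488 is the standard table value for 4 degrees of freedom at that level; all names are hypothetical.

```python
# Observed frequencies: rows = package size a1..a3, columns = family size b1..b3.
observed = [
    [25, 37, 8],
    [10, 62, 53],
    [5, 41, 59],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n_total   # theoretical frequency
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)       # (m - 1)(k - 1)

# Assumed significance level 0.05; table value of chi-square for df = 4.
CHI2_CRITICAL_05_DF4 = 9.488
reject_h0 = chi2 > CHI2_CRITICAL_05_DF4

print(round(chi2, 2), df, reject_h0)
```

The computed criterion is far above the table value, which agrees with the conclusion that package size depends on family size.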
