12 Chi-Square Distribution

EXPERIMENT WITH MORE THAN TWO OUTPUTS
For an experiment with two outcomes (heads or tails in a coin toss, yes or no, etc.), the normal distribution can be used to judge whether the observed results differ significantly from the expected frequencies.
For an experiment with more than two outcomes (say k outcomes), the normal distribution cannot be used; the chi-square distribution is used instead.
OBSERVED FREQUENCIES (k in total): $o_1, o_2, o_3, \ldots, o_k$
EXPECTED FREQUENCIES (k in total): $e_1, e_2, e_3, \ldots, e_k$
CHI-SQUARE FORMULA:
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
If $\chi^2 = 0$, the observed and expected frequencies agree exactly.
EXAMPLE:
A die is tossed 120 times, with the results shown in the table below (side 1 comes up 13 times, side 2 comes up 28 times, etc.). Assuming the die is fair, so that each side is expected 120/6 = 20 times, find the $\chi^2$ value.

Side      o      e    (o - e)   (o - e)^2   (o - e)^2/e
1        13     20      -7         49          2.45
2        28     20       8         64          3.20
3        16     20      -4         16          0.80
4        10     20     -10        100          5.00
5        32     20      12        144          7.20
6        21     20       1          1          0.05
Total   120    120       0                    18.70   (chi-square value)
The chi-square value is 18.70. It is a discrete variable here, which means that $\chi^2$ takes fixed values, and it is never negative.
In this example, if the observed values were 21, 19, 20, etc., the $\chi^2$ value would drop to about $\chi^2 = 0.10$.
In this case, for v = (6 - 1) = 5 degrees of freedom (a die has 6 sides) and $\alpha$ = 5%, the table value is $\chi^2$ = 11.07. Since 18.70 > 11.07, the result is significant and we reject $H_0$.
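The dice calculation can be sketched in a few lines of Python. The function name `chi_square_statistic` and the constant `CRITICAL_5PCT_V5` are illustrative names, not part of the original slides; the critical value 11.07 is the standard 5% table entry for v = 5.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (o - e)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Dice example: 120 tosses, so a fair die gives 120 / 6 = 20 per side.
observed = [13, 28, 16, 10, 32, 21]
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
CRITICAL_5PCT_V5 = 11.07  # table value for v = 5, alpha = 5%

print(f"chi-square = {chi2:.2f}")
print("reject H0" if chi2 > CRITICAL_5PCT_V5 else "do not reject H0")
```

The same function applies to any list of observed and expected frequencies of equal length.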
In chi-square tests we use the table for a given number of degrees of freedom v. As v becomes larger, the chi-square distribution approaches the normal distribution.
[Figure: chi-square density curves, y against $\chi^2$ (axis from 0 to 20), for v = 1, 3, 5 and 15]
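One way to see the curves becoming more symmetric as v grows is to look at the skewness. The sketch below uses the standard moments of a chi-square distribution (mean v, variance 2v, skewness sqrt(8/v)); these formulas are textbook facts rather than something stated in the slides, and the function name `chi_square_moments` is only illustrative.

```python
import math


def chi_square_moments(v):
    """Mean, variance and skewness of a chi-square distribution with v degrees of freedom."""
    return v, 2 * v, math.sqrt(8 / v)


# The same degrees of freedom as in the figure.
for v in (1, 3, 5, 15):
    mean, variance, skewness = chi_square_moments(v)
    print(f"v = {v:2d}: mean = {mean}, variance = {variance}, skewness = {skewness:.2f}")
```

The skewness shrinks toward 0, the value for a normal distribution, as v increases.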
Application to Genetics
The chi-square test is useful in genetics. For example, according to Mendelian theory, when two kinds of peas are crossed, four types of seeds, A, B, C, and D, are expected to occur in the ratio 9:3:3:1. In such an experiment, an experimenter obtains 102 seeds of type A, 30 of type B, 42 of type C, and 15 of type D. Are these results consistent with the theory at the 5% level of significance?
The total number of seeds in this experiment is 189, so the number of type A seeds expected under the hypothesis of a 9:3:3:1 segregation ratio is (9/16) x 189 = 106.3. The other expected values are found in the same way, which gives the following table:
Type      o       e     (o - e)   (o - e)^2   (o - e)^2/e
A       102   106.3      -4.3       18.49        0.17
B        30    35.4      -5.4       29.16        0.82
C        42    35.4       6.6       43.56        1.23
D        15    11.8       3.2       10.24        0.87
Total   189   188.9       0                      3.09
The result, $\chi^2$ = 3.09, has to be compared with $\chi^2_{0.05}$ = 7.82 for v = 3 at the 5% level. Since 3.09 < 7.82, the result is not significant, and there is a good indication that observation agrees with expectation.
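The genetics example can be reproduced as a minimal sketch that derives the expected counts directly from the 9:3:3:1 ratio. Variable names are illustrative. Because the code keeps unrounded expected values, it gives about 3.08 rather than the 3.09 obtained from the one-decimal values in the table.

```python
ratio = [9, 3, 3, 1]             # Mendelian segregation ratio for types A, B, C, D
observed = [102, 30, 42, 15]     # seeds counted by the experimenter

total = sum(observed)            # 189 seeds
expected = [total * r / sum(ratio) for r in ratio]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1   # v = 4 - 1 = 3
CRITICAL_5PCT_V3 = 7.82                  # table value for v = 3, alpha = 5%

print([round(e, 1) for e in expected])
print(f"chi-square = {chi2:.2f}, v = {degrees_of_freedom}")
print("significant" if chi2 > CRITICAL_5PCT_V3 else "not significant")
```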
Application to Contingency Tables
A contingency table is an arrangement in which a set of objects is classified according to two criteria of classification, one criterion being entered in rows, the other in columns.
Such a table is referred to as a j x k table, where j is the number of rows and k is the number of columns.
Example: There are two rock types, biotite-granite with 55 samples and pyroxene-granite with 34 samples. Each sample is classified as containing gold or silver, as shown below:
                    Gold   Silver   Total of samples
Biotite-granite      24      31           55
Pyroxene-granite      8      26           34
Total                32      57           89
Do these data indicate that pyroxene-granite contains gold less often than biotite-granite, at the 5% level of significance?
We find the expected value for each cell. The ratio of the expected number of gold samples in biotite-granite to the total for this rock must equal the ratio of all gold samples to all samples:
x : 55 = 32 : 89, so x = 19.8, and the expected value for silver is 55 - 19.8 = 35.2
Then for all data:
  o       e     (o - e)   (o - e)^2   (o - e)^2/e
 24    19.8       4.2       17.64        0.89
 31    35.2      -4.2       17.64        0.50
  8    12.2      -4.2       17.64        1.45
 26    21.8       4.2       17.64        0.81
 89    89.0       0                      3.65
For a 2 x 2 contingency table, the number of degrees of freedom is 1, and from the $\chi^2$ table for $\alpha$ = 5% and v = 1, the value is $\chi^2$ = 3.84. Since 3.65 < 3.84, the result is not significant: these data do not indicate that biotite-granite contains gold more often than pyroxene-granite.
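The proportion rule used for the expected values (row total times column total, divided by the grand total) can be sketched as follows. The function names `expected_counts` and `chi_square` are illustrative. With unrounded expected values the statistic comes out near 3.69 instead of the 3.65 from the rounded table, which is still below 3.84.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Chi-square statistic summed over every cell of a contingency table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: biotite-granite, pyroxene-granite; columns: gold, silver.
rocks = [[24, 31], [8, 26]]

chi2 = chi_square(rocks)
CRITICAL_5PCT_V1 = 3.84  # table value for v = 1, alpha = 5%

print([[round(e, 1) for e in row] for row in expected_counts(rocks)])
print(f"chi-square = {chi2:.2f}")
print("significant" if chi2 > CRITICAL_5PCT_V1 else "not significant")
```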
To see why the number of degrees of freedom for a 2 x 2 table equals 1, consider the general form below:
                        Total
          a       b       A
          c       d     N - A
Total     B     N - B     N
For the observed frequencies a, b, c, d, the following relations hold:
a + b = A,   c + d = N - A,   a + c = B,   b + d = N - B
Only three of these relations are independent; the fourth is not, since it follows from the other three.
Consequently, the number of degrees of freedom is v = (4 - 3) = 1.
In general, for a j x k table, the number of degrees of freedom is v = (j - 1)(k - 1).
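As a worked check of the dependence, the fourth relation can be derived from the first three. The second line is the usual counting argument for the general case, stated here as a standard result rather than something derived in the slides: the j row totals and k column totals share one common grand total, so only j + k - 1 of the constraints are independent.

```latex
\begin{align*}
b + d &= (a + b) + (c + d) - (a + c) = A + (N - A) - B = N - B \\
v &= jk - (j + k - 1) = (j - 1)(k - 1)
\end{align*}
```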
EXAMPLE
Assume that oil and non-oil exports for 1998 (in million USD), reported by quarter (three months), are as follows:
           Quarter 1   Quarter 2   Quarter 3   Quarter 4   Total
Oil           29          19          12          18         78
Non-oil       13          17          20          20         70
Total         42          36          32          38        148
What conclusion can be drawn from these data at the 5% level of significance?
This is a 2 x 4 contingency table, and we test whether the two criteria of classification are independent. We calculate the expected values:
x : 78 = 42 : 148, so x = 22.1
y : 78 = 36 : 148, so y = 19.0
z : 78 = 32 : 148, so z = 16.9, etc.
Since this is a 2 x 4 contingency table, the number of degrees of freedom is v = (1)(3) = 3.
  o       e     (o - e)   (o - e)^2   (o - e)^2/e
 29    22.1       6.9       47.61        2.15
 19    19.0       0.0        0.00        0.00
 12    16.9      -4.9       24.01        1.42
 18    20.0      -2.0        4.00        0.20
 13    19.9      -6.9       47.61        2.39
 17    17.0       0.0        0.00        0.00
 20    15.1       4.9       24.01        1.59
 20    18.0       2.0        4.00        0.22
Total                                    7.97
From the chi-square distribution table for v = 3 and $\alpha$ = 5%, the value is $\chi^2$ = 7.82. Since 7.97 > 7.82, the result is significant.
These data can also be tested as semester data (grouped by six months):
           1st semester   2nd semester
Oil             48             30
Non-oil         30             40
The result becomes $\chi^2$ = 5.18, which is greater than the table value $\chi^2$ = 3.84 (v = 1, $\alpha$ = 5%), so the result is again significant.
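The regrouping from quarters to semesters can be sketched as below, testing both tables with the same routine. The helper names are illustrative, and the critical values 7.82 and 3.84 are the 5% table entries quoted in the text. With unrounded expected values the quarterly statistic is about 7.90 (7.97 in the table, which uses one-decimal expected values) and the semester statistic about 5.17; both still exceed their critical values.

```python
def chi_square(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )


def degrees_of_freedom(table):
    """v = (j - 1)(k - 1) for a j x k table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Rows: oil, non-oil; columns: quarters 1 to 4 of 1998 (million USD).
quarterly = [[29, 19, 12, 18], [13, 17, 20, 20]]
# Merge quarters 1-2 and quarters 3-4 into two semesters.
semester = [[row[0] + row[1], row[2] + row[3]] for row in quarterly]

critical_5pct = {1: 3.84, 3: 7.82}  # table values, alpha = 5%

for name, table in (("quarterly", quarterly), ("semester", semester)):
    v = degrees_of_freedom(table)
    chi2 = chi_square(table)
    verdict = "significant" if chi2 > critical_5pct[v] else "not significant"
    print(f"{name}: v = {v}, chi-square = {chi2:.2f}, {verdict}")
```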
Application to Testing Normality
One important use of the chi-square distribution is to test the normality of a set of data, that is, whether the data are normally distributed or follow some other form. The data are grouped into several classes to form a frequency distribution, and the mean and the standard deviation are found.
We test the data under the hypothesis that their distribution is normal. With the data grouped into classes (5, say), we have to find the expected value for each class.
Since the total number of data, the mean, and the standard deviation are fixed in a test of normality, the number of degrees of freedom is v = (h - 3), where h is the number of classes.
EXAMPLE:
We test the rain data for region A, recorded over 90 days, at the 5% level of significance:
18.6 13.8 10.4 15.0 16.0 22.1 16.2 36.1 11.6 7.8
22.6 17.9 25.3 32.8 16.6 13.6 8.5 23.7 14.2 22.9
17.7 26.3 9.2 24.9 17.9 26.5 26.6 16.5 18.1 24.8
16.6 32.3 14.0 11.6 20.0 33.8 15.8 15.2 24.0 16.4
24.1 23.2 17.3 10.5 15.0 20.2 20.2 17.3 16.6 16.9
22.0 23.9 24.0 12.2 21.8 12.2 22.0 9.6 8.0 20.4
17.2 18.3 13.0 10.6 17.2 8.9 16.8 14.2 15.7 8.0
17.7 16.1 17.8 11.6 10.4 13.6 8.4 12.6 8.1 11.6
21.1 20.5 19.8 24.8 9.7 25.1 31.8 24.9 20.0 17.6
We arrange these data in the table below, with population mean m = 18.13 and standard deviation 6.28.
We calculate the expected frequency for each class. For the second class (10.5 to 13.5):
$$z_1 = \frac{10.5 - 18.13}{6.28} = -1.21 \quad \text{and} \quad z_2 = \frac{13.5 - 18.13}{6.28} = -0.74$$
From the normal distribution table, the area between $z_1$ and $z_2$ can be calculated. For the second class it is A = 0.3869 - 0.2704 = 0.1165, so the expected frequency for this class is $e_2$ = (0.1165)(90) = 10.5.
[Figure: normal curve with the area between $z_1$ and $z_2$ marked]
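The table lookup for one class can be mimicked with the error function from Python's standard library. This is a sketch for the second class only, assuming the mean 18.13 and standard deviation 6.28 used in the z computation; `normal_cdf` is an illustrative helper. Because it does not round z to two decimals, it gives roughly 10.6 rather than the 10.5 read from the table.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


MEAN, SD, N = 18.13, 6.28, 90   # values used in the worked example
lower, upper = 10.5, 13.5       # boundaries of the second class

z1 = (lower - MEAN) / SD
z2 = (upper - MEAN) / SD
area = normal_cdf(z2) - normal_cdf(z1)   # probability of falling in the class
expected = area * N                      # expected frequency e2

print(f"z1 = {z1:.2f}, z2 = {z2:.2f}")
print(f"area = {area:.4f}, expected frequency = {expected:.1f}")
```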
Class boundaries     o       e     (o - e)   (o - e)^2   (o - e)^2/e
 7.5 - 10.5         12    10.2       1.8        3.24        0.32
10.5 - 13.5         10    10.5      -0.5        0.25        0.02
13.5 - 16.5         15    15.1      -0.1        0.01        0.00
16.5 - 19.5         19    17.1       1.9        3.61        0.21
19.5 - 22.5         12    15.4      -3.4       11.56        0.75
22.5 - 25.5         14    10.9       3.1        9.61        0.88
25.5 - 37.5          8    10.9      -2.9        8.41        0.77
Total               90    90.1      -0.1                    2.95
Since the result is $\chi^2$ = 2.95 and the table value is $\chi^2$ = 9.49 (for v = (7 - 3) = 4 and $\alpha$ = 5%), we accept the hypothesis that the data are normally distributed.
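The whole normality test can be sketched end to end from the class boundaries and observed counts. Two assumptions are made here and flagged in the comments: the mean 18.13 and standard deviation 6.28 from the z computation are used, and the first and last classes are treated as open-ended tails, which appears to be how the tabulated expected values 10.2 and 10.9 were obtained. Function names are illustrative. Without table rounding the statistic comes out near 2.9, close to the 2.95 in the table.

```python
import math


def normal_cdf(x, mean, sd):
    """Area under the normal curve with the given mean and sd to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def expected_frequencies(inner_edges, mean, sd, n):
    """Expected counts per class; the first and last classes are open-ended (assumption)."""
    cdf = [0.0] + [normal_cdf(x, mean, sd) for x in inner_edges] + [1.0]
    return [n * (high - low) for low, high in zip(cdf, cdf[1:])]


observed = [12, 10, 15, 19, 12, 14, 8]               # counts per class, from the table
inner_edges = [10.5, 13.5, 16.5, 19.5, 22.5, 25.5]   # boundaries between classes
MEAN, SD = 18.13, 6.28                               # assumed, as in the z computation

expected = expected_frequencies(inner_edges, MEAN, SD, sum(observed))
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
v = len(observed) - 3        # h - 3 degrees of freedom
CRITICAL_5PCT_V4 = 9.49      # table value for v = 4, alpha = 5%

print([round(e, 1) for e in expected])
print(f"chi-square = {chi2:.2f}, v = {v}")
print("reject normality" if chi2 > CRITICAL_5PCT_V4 else "accept normality")
```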
