
CHI-SQUARE DISTRIBUTION

Chi-Square Distribution 1
EXPERIMENT WITH MORE THAN TWO OUTCOMES

For an experiment with two outcomes (head or tail in coin tossing, yes or
no, etc.), the normal distribution can be used to judge whether the observed
results differ significantly from the expected frequencies.

For an experiment with more than two outcomes (say k outcomes), the
normal distribution cannot be used; the chi-square distribution is used
instead.

OBSERVED FREQUENCIES (k in total): o1, o2, o3, ..., ok

EXPECTED FREQUENCIES (k in total): e1, e2, e3, ..., ek

CHI-SQUARE FORMULA:

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

If χ² = 0, the observed and expected frequencies are identical.
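The chi-square formula can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square` and the small frequency lists are assumptions made for the example and do not come from the slides.

```python
def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over all k classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical frequencies: identical lists give chi-square = 0.
same = chi_square([20, 20, 20], [20, 20, 20])
# Any departure from the expected frequencies gives a positive value.
different = chi_square([25, 15, 20], [20, 20, 20])
print(same, different)
```

The statistic can never be negative, because every term is a square divided by a positive expected frequency.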

EXAMPLE:
A die is tossed 120 times and the results are shown in the table below
(side 1 occurs 13 times, side 2 occurs 28 times, etc.). If the die is fair,
find the χ² value.

Side      o      e    (o - e)   (o - e)²   (o - e)²/e

1        13     20      -7         49         2.45
2        28     20       8         64         3.20
3        16     20      -4         16         0.80
4        10     20     -10        100         5.00
5        32     20      12        144         7.20
6        21     20       1          1         0.05
Total   120    120       0                   18.70   (chi-square value)
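The columns of the dice table can be reproduced step by step. The sketch below assumes only the observed counts from the table and a fair die (expected frequency 120/6 = 20 per side); the variable names are illustrative.

```python
observed = [13, 28, 16, 10, 32, 21]
n_tosses = sum(observed)                # 120 tosses in total
expected = [n_tosses / 6] * 6           # 20 per side for a fair die

rows = []
for side, (o, e) in enumerate(zip(observed, expected), start=1):
    diff = o - e
    # One table row: side, o, e, (o - e), (o - e)^2, (o - e)^2 / e
    rows.append((side, o, e, diff, diff ** 2, diff ** 2 / e))

chi2 = sum(row[5] for row in rows)      # total of the last column

for row in rows:
    print("%d %4d %6.1f %6.1f %7.1f %6.2f" % row)
print("chi-square = %.2f" % chi2)
```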

The chi-square value is 18.70. It is computed from discrete counts, so
for a given set of observations χ² is a fixed, non-negative number.
In this example, if the observed values were 21, 19, 20, etc., the χ²
value would fall to about χ² = 0.10.
For degrees of freedom v = (6 - 1) = 5 (the die has 6 sides) and α = 5%,
the table value is χ² = 11.07. Since 18.70 > 11.07, the result is
significant and we reject H0.
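The decision step can be written as a simple comparison with the table value. In the sketch below the dictionary holds 5% critical values taken from a standard chi-square table (11.07 is the entry for v = 5); the names `CRITICAL_5_PERCENT` and `is_significant` are assumptions for illustration.

```python
# 5% critical values from a chi-square table, keyed by degrees of freedom v.
CRITICAL_5_PERCENT = {1: 3.84, 3: 7.82, 4: 9.49, 5: 11.07}

def is_significant(chi2, dof):
    """True when chi2 exceeds the 5% table value, i.e. H0 is rejected."""
    return chi2 > CRITICAL_5_PERCENT[dof]

dice_dof = 6 - 1                              # six sides, v = 5
reject_h0 = is_significant(18.70, dice_dof)   # dice example
print(reject_h0)
```

A value such as χ² = 0.10 with the same degrees of freedom would not lead to rejection.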

With chi-square we use the table at a given number of degrees of freedom v.
As v becomes larger, the chi-square distribution approaches the normal
distribution.

[Figure: chi-square density y plotted against χ² (axis from 0 to 20) for
v = 1, v = 3, v = 5 and v = 15]
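The approach to the normal distribution can be checked numerically. The sketch below relies on facts that are not stated on the slide and are brought in as assumptions: the standard chi-square density formula, and the fact that a chi-square variable with v degrees of freedom has mean v and variance 2v. The helper names and the grid are illustrative choices.

```python
import math

def chi2_pdf(x, v):
    """Chi-square density with v degrees of freedom, for x > 0."""
    return x ** (v / 2 - 1) * math.exp(-x / 2) / (2 ** (v / 2) * math.gamma(v / 2))

def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_gap(v, step=0.05, upper=200.0):
    """Largest pointwise gap between the chi-square density and the
    normal density with the same mean (v) and variance (2v)."""
    n = int(upper / step)
    xs = [step * i for i in range(1, n + 1)]
    return max(abs(chi2_pdf(x, v) - normal_pdf(x, v, 2 * v)) for x in xs)

for v in (3, 5, 15, 50):
    print(v, round(max_gap(v), 4))
```

The printed gap shrinks as v grows, which is the behaviour the curves for v = 1, 3, 5 and 15 illustrate.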
Application to Genetics

The chi-square test is useful in genetics. For example, according to
Mendelian theory, in crossing two kinds of peas, four types of seeds, A, B,
C and D, are expected to occur in the ratio 9:3:3:1. In such an experiment,
an experimenter obtains 102 seeds of type A, 30 of type B, 42 of type C and
15 of type D. Are these results consistent with the theory at the 5% level
of significance?

The total number of seeds in this experiment is 189, so the number of type
A seeds expected under the hypothesis of a 9:3:3:1 segregation ratio is
(9/16) x 189 = 106.3. The other expected values follow in the same way, and
we can make the table as follows:

Type      o       e     (o - e)   (o - e)²   (o - e)²/e
A       102   106.3      -4.3      18.49        0.17
B        30    35.4      -5.4      29.16        0.82
C        42    35.4       6.6      43.56        1.23
D        15    11.8       3.2      10.24        0.87
Total   189   188.9       0.1                   3.09

The result χ² = 3.09 has to be compared with the table value
χ²(0.05) = 7.82 for v = 3 at the 5% level. Since 3.09 < 7.82, the result
is not significant, and there is a good indication that observation agrees
with expectation.
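The genetics example can be retraced as a short calculation. The sketch below uses the seed counts and the 9:3:3:1 ratio from the text; the dictionary layout and variable names are illustrative assumptions.

```python
observed = {"A": 102, "B": 30, "C": 42, "D": 15}
ratio = {"A": 9, "B": 3, "C": 3, "D": 1}

total = sum(observed.values())          # 189 seeds
parts = sum(ratio.values())             # 9 + 3 + 3 + 1 = 16
expected = {t: total * ratio[t] / parts for t in observed}

chi2 = sum((observed[t] - expected[t]) ** 2 / expected[t] for t in observed)
dof = len(observed) - 1                 # four seed types, v = 3
critical = 7.82                         # 5% table value for v = 3

print({t: round(e, 1) for t, e in expected.items()})
print(round(chi2, 2), chi2 > critical)
```

Because the code keeps the expected values unrounded, the statistic comes out slightly below the 3.09 obtained from the rounded table entries; the conclusion is the same.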

Application to Contingency Tables
A contingency table is an arrangement in which a set of objects is classified
according to two criteria of classification, one criterion being entered in
rows, the other in columns.
Such a table is referred to as a j x k table, where j is the number of rows
and k is the number of columns.

Example: There are two rocks, biotite-granite with 55 samples and
pyroxene-granite with 34 samples. Each sample contains gold or silver, as
shown below:

                    Gold   Silver   Total of samples

Biotite-granite      24      31          55
Pyroxene-granite      8      26          34
Total                32      57          89

Do the 34 pyroxene-granite samples indicate that this rock contains gold
less often than biotite-granite, at the 5% level of significance?

We find the expected value for each cell. The ratio of the expected number
of gold samples in biotite-granite to the total for this rock must equal
the ratio of all gold samples to all samples:
x : 55 = 32 : 89, so x = 19.8, and silver is 55 - 19.8 = 35.2
Then for all data:

o       e     (o - e)   (o - e)²   (o - e)²/e
24    19.8      4.2      17.64        0.89
31    35.2     -4.2      17.64        0.50
8     12.2     -4.2      17.64        1.45
26    21.8      4.2      17.64        0.81
89    89.0      0                     3.65

For a 2 x 2 contingency table the degree of freedom is 1, and from the
χ² table for α = 5% and v = 1 the value is χ² = 3.84. Since 3.65 < 3.84,
the result is not significant: these data do not indicate that
biotite-granite contains gold more often than pyroxene-granite.
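The expected values of the rock example follow from the row and column totals (expected = row total x column total / grand total, which is the proportion x : 55 = 32 : 89 written generally). The sketch below is a minimal illustration with assumed variable names.

```python
table = [[24, 31],    # biotite-granite: gold, silver
         [8, 26]]     # pyroxene-granite: gold, silver

row_totals = [sum(row) for row in table]            # 55, 34
col_totals = [sum(col) for col in zip(*table)]      # 32, 57
n = sum(row_totals)                                 # 89

# Expected cell value = row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(table, expected)
           for o, e in zip(o_row, e_row))

print([[round(e, 1) for e in row] for row in expected])
print(round(chi2, 2))
```

With unrounded expected values the statistic is about 3.69 rather than the 3.65 of the hand calculation; it still lies below the table value 3.84.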

To see why the degree of freedom for a 2 x 2 table is equal to 1, consider
the general form below:

                        Total
          a       b       A
          c       d     N - A
Total     B     N - B     N

For the observed frequencies a, b, c, d, we have the following relations:

a + b = A ,  c + d = N - A ,  a + c = B ,  b + d = N - B

Only three of these relations are independent; the fourth is not
independent, since it can be derived from the other three.
Consequently, the degree of freedom is v = (4 - 3) = 1.
In general, for a j x k table, the degree of freedom is v = (j - 1)(k - 1).
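The counting argument can be made concrete: once the totals A, B and N are fixed, a single cell determines the other three. The sketch below uses the gold and silver counts from the rock example as input; the function names are assumptions for illustration.

```python
def fill_2x2(a, A, B, N):
    """Recover b, c, d from one cell and the marginal totals."""
    b = A - a            # from a + b = A
    c = B - a            # from a + c = B
    d = (N - A) - c      # from c + d = N - A
    return b, c, d

def degrees_of_freedom(j, k):
    """Degrees of freedom of a j x k contingency table."""
    return (j - 1) * (k - 1)

b, c, d = fill_2x2(a=24, A=55, B=32, N=89)
print(b, c, d)
print(degrees_of_freedom(2, 2), degrees_of_freedom(2, 4))
```

The fourth relation b + d = N - B then holds automatically, which is why it adds no further constraint.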

EXAMPLE
Assume the oil and non-oil exports for 1998 (in million USD), given by
quarter (three months), are as follows:

           Quarter 1   Quarter 2   Quarter 3   Quarter 4   Total

Oil           29          19          12          18         78
Non-oil       13          17          20          20         70
Total         42          36          32          38        148

What conclusion can be drawn from these data at the 5% level of
significance?

This is a 2 x 4 contingency table, and we test the hypothesis that the
two criteria of classification are independent. We calculate the
expected values:

x : 78 = 42 : 148, so x = 22.1
y : 78 = 36 : 148, so y = 19.0
z : 78 = 32 : 148, so z = 16.9, etc.

Since this is a 2 x 4 contingency table, the degree of freedom
is v = (1)(3) = 3.

o       e     (o - e)   (o - e)²   (o - e)²/e
29    22.1      6.9      47.61        2.15
19    19.0      0.0       0.00        0.00
12    16.9     -4.9      24.01        1.42
18    20.0     -2.0       4.00        0.20
13    19.9     -6.9      47.61        2.39
17    17.0      0.0       0.00        0.00
20    15.1      4.9      24.01        1.59
20    18.0      2.0       4.00        0.22
Total                                 7.97

From the chi-square distribution table for v = 3 and α = 5%, the value is
χ² = 7.82. Since 7.97 > 7.82, the result is significant.
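The same steps work for a table of any size. The sketch below wraps them in a helper, `contingency_chi2`, which is a name assumed for this example, and applies it to the quarterly export table.

```python
def contingency_chi2(table):
    """Return (chi-square, degrees of freedom) for a j x k table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, r in zip(table, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n                   # expected value under independence
            chi2 += (o - e) ** 2 / e
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

quarterly = [[29, 19, 12, 18],    # oil
             [13, 17, 20, 20]]    # non-oil

chi2, dof = contingency_chi2(quarterly)
print(round(chi2, 2), dof, chi2 > 7.82)
```

Keeping the expected values unrounded gives about 7.9 instead of 7.97; the statistic still exceeds the table value 7.82.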

These data can also be tested in the form of semester data
(grouped by six months):

            1st semester   2nd semester

Oil              48             30
Non-oil          30             40

The result becomes χ² = 5.18, which is greater than the table value
χ² = 3.84 (v = 1, α = 5%), so the result is again significant.
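Grouping the quarters into semesters amounts to adding pairs of columns before the test is repeated. The sketch below is illustrative only; the variable names are assumptions.

```python
quarterly = [[29, 19, 12, 18],    # oil
             [13, 17, 20, 20]]    # non-oil

# Quarters 1 + 2 form the first semester, quarters 3 + 4 the second.
semester = [[row[0] + row[1], row[2] + row[3]] for row in quarterly]

row_totals = [sum(row) for row in semester]
col_totals = [sum(col) for col in zip(*semester)]
n = sum(row_totals)

chi2 = sum((o - r * c / n) ** 2 / (r * c / n)
           for row, r in zip(semester, row_totals)
           for o, c in zip(row, col_totals))
dof = (len(semester) - 1) * (len(semester[0]) - 1)

print(semester, round(chi2, 2), dof)
```

The unrounded statistic is close to the 5.18 of the hand calculation and lies above 3.84.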

Application to Testing Normality

One important use of the chi-square distribution is testing the normality
of a set of data, that is, whether the data are normally distributed or
follow some other form. The data are grouped into several classes; we build
their frequency distribution and find the mean and the standard deviation.
We test the data under the hypothesis that their distribution is normal.
The data are grouped into classes (5 classes, say) and we find the
expected value for each class.
Since the total number of data, the mean and the standard deviation are
fixed in a normality test, the degrees of freedom of this distribution are
v = (h - 3), where h is the number of classes.

EXAMPLE:
We test the rainfall data of region A, recorded over 90 days, at the 5%
level of significance.

18.6 13.8 10.4 15.0 16.0 22.1 16.2 36.1 11.6 7.8
22.6 17.9 25.3 32.8 16.6 13.6 8.5 23.7 14.2 22.9
17.7 26.3 9.2 24.9 17.9 26.5 26.6 16.5 18.1 24.8
16.6 32.3 14.0 11.6 20.0 33.8 15.8 15.2 24.0 16.4
24.1 23.2 17.3 10.5 15.0 20.2 20.2 17.3 16.6 16.9
22.0 23.9 24.0 12.2 21.8 12.2 22.0 9.6 8.0 20.4
17.2 18.3 13.0 10.6 17.2 8.9 16.8 14.2 15.7 8.0
17.7 16.1 17.8 11.6 10.4 13.6 8.4 12.6 8.1 11.6
21.1 20.5 19.8 24.8 9.7 25.1 31.8 24.9 20.0 17.6

We arrange these data in the next table, with a mean m = 18.13 and a
standard deviation of 6.28.
We calculate the expected frequency for each class. For the second class
(boundaries 10.5 and 13.5):

$$z_1 = \frac{10.5 - 18.13}{6.28} = -1.21 \quad\text{and}\quad z_2 = \frac{13.5 - 18.13}{6.28} = -0.74$$

From the normal distribution table the area between z1 and z2 can be
calculated. For the second class it is A = 0.3869 - 0.2704 = 0.1165, so the
expected frequency for this class is e2 = (0.1165)(90) = 10.5.

[Figure: normal curve with z1 and z2 marked]
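The table lookup for one class can be imitated with the error function from the Python standard library. The sketch below is an assumption-laden illustration: it uses the mean 18.13 and standard deviation 6.28 that appear in the z calculation, and the helper name `normal_cdf` is introduced here only for the example.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd, n = 18.13, 6.28, 90     # values used in the z calculation
lower, upper = 10.5, 13.5         # boundaries of the second class

z1 = (lower - mean) / sd
z2 = (upper - mean) / sd
area = normal_cdf(z2) - normal_cdf(z1)    # probability of falling in the class
e2 = area * n                             # expected frequency of the class

print(round(z1, 2), round(z2, 2), round(area, 4), round(e2, 1))
```

Because z1 and z2 are not rounded to two decimals before the area is taken, the expected frequency differs slightly from the table-based 10.5.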

Class boundaries     o       e     (o - e)   (o - e)²   (o - e)²/e

 7.5 - 10.5         12    10.2      1.8       3.24        0.32
10.5 - 13.5         10    10.5     -0.5       0.25        0.02
13.5 - 16.5         15    15.1     -0.1       0.01        0.00
16.5 - 19.5         19    17.1      1.9       3.61        0.21
19.5 - 22.5         12    15.4     -3.4      11.56        0.75
22.5 - 25.5         14    10.9      3.1       9.61        0.88
25.5 - 37.5          8    10.9     -2.9       8.41        0.77

Total               90    90.1     -0.1                   2.95

Since the result is χ² = 2.95 and the table value is χ² = 9.49 (for
v = (7 - 3) = 4 and α = 5%), we accept the hypothesis that the data are
normally distributed.
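The final step of the normality test can be retraced from the class table. The sketch below takes the observed and expected frequencies as listed in the table and the table value 9.49; the variable names are illustrative assumptions.

```python
observed = [12, 10, 15, 19, 12, 14, 8]
expected = [10.2, 10.5, 15.1, 17.1, 15.4, 10.9, 10.9]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

h = len(observed)         # number of classes
dof = h - 3               # total, mean and standard deviation are fixed
critical = 9.49           # 5% table value for v = 4

accept_normality = chi2 <= critical
print(round(chi2, 2), dof, accept_normality)
```

Summing unrounded terms gives a value within about 0.01 of the 2.95 in the table, far below 9.49.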
