
Tutorial: Chi-Square Distribution
Presented by: Nikki Natividad
Course: BIOL 5081 - Biostatistics

Purpose
To analyze discontinuous (categorical or binned) data in which a number of subjects fall into categories
We want to compare our observed data to what we expect to see. Are the differences due to chance? Due to association?
When can we use the Chi-Square Test?
Testing the outcome of Mendelian crosses
Testing independence: is one factor associated with another?
Testing a population for expected proportions

Assumptions:
2 or more categories
Independent observations
A sample size of at least 10
Random sampling
All observations must be used
For the test to be accurate, the expected frequency in each category should be at least 5

Conducting Chi-Square Analysis
1) Make a hypothesis based on your basic biological question
2) Determine the expected frequencies
3) Create a table with observed frequencies, expected frequencies, and chi-square values using the formula $\frac{(O-E)^2}{E}$ for each category. The sum of these terms is your calculated chi-square value: $\chi^2 = \sum \frac{(O-E)^2}{E}$
4) Find the degrees of freedom: (c-1)(r-1). For a single row of categories this is (c-1)
5) Find the chi-square statistic (the critical value) in the Chi-Square Distribution table
6) If the chi-square statistic from the table > your calculated chi-square value, you do not reject your null hypothesis, and vice versa.
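The steps above can be sketched in Python. The slides themselves work in SAS, so this is only an illustration. The function name `goodness_of_fit` and the 3:1 cross counts are hypothetical and not taken from the slides, and the degrees of freedom are assumed to be (number of categories - 1) for a single row of categories.

```python
def goodness_of_fit(observed, proportions):
    """Return (expected, contributions, statistic, df) for one row of categories.

    observed    -- list of observed counts, one per category
    proportions -- expected proportion for each category (should sum to 1)
    """
    if len(observed) != len(proportions):
        raise ValueError("observed and proportions must have the same length")
    n = sum(observed)
    # Step 2: expected frequency = total count * expected proportion
    expected = [n * p for p in proportions]
    # Step 3: per-category chi-square value, (O - E)^2 / E
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    statistic = sum(contributions)
    # Step 4: one row of categories, so df = categories - 1
    df = len(observed) - 1
    return expected, contributions, statistic, df


# Hypothetical 3:1 Mendelian cross (illustrative counts, not from the slides)
expected, contributions, statistic, df = goodness_of_fit([290, 110], [0.75, 0.25])
print(expected, contributions, statistic, df)
```

The calculated statistic would then be compared with the table value for the same degrees of freedom (steps 5 and 6).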

Example 1: Testing for Proportions
HO: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants.
HA: Horned lizards eat more of one species of ant than the others.

                Leaf Cutter Ants   Carpenter Ants   Black Ants   Total
Observed               25                18             17         60
Expected               20                20             20         60
O-E                     5                -2             -3
(O-E)^2/E             1.25              0.2            0.45      $\chi^2 = 1.90$

Example 1: Testing for Proportions
Chi-Square Distribution table: $\chi^2 = 5.991$ at $\alpha = 0.05$ (df = 2)


Chi-square statistic (critical value from the table): $\chi^2 = 5.991$
Our calculated value: $\chi^2 = 1.90$
*If the chi-square statistic > your calculated value, then you do not reject your null hypothesis: the differences between the observed and expected counts can be attributed to chance.
5.991 > 1.90, so we do not reject our null hypothesis.
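As a rough check of the arithmetic in Example 1, the comparison can be reproduced in a few lines of Python. The counts and the critical value 5.991 come from the slides; the variable names are illustrative only.

```python
# Observed counts from Example 1: leaf cutter, carpenter, black ants
observed = [25, 18, 17]

# Under HO the lizards eat equal amounts, so each expected count is total / 3
expected = [sum(observed) / len(observed)] * len(observed)

# (O - E)^2 / E for each category, then the sum
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

CRITICAL_VALUE = 5.991  # chi-square table, alpha = 0.05, df = 2

# Reject HO only when the calculated value exceeds the table value
reject_null = chi_square > CRITICAL_VALUE
print(contributions, round(chi_square, 2), reject_null)
```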

SAS: Example 1
[SAS code screenshot not reproduced. Callouts on the code:]
Included to format the table
Define your data
Indicate what you want in your output


SAS: What does the p-value mean?
The exact p-value for a nondirectional test is the sum of probabilities for the tables having a test statistic greater than or equal to the value of the observed test statistic.
High p-value: high probability of a test statistic >= the observed test statistic when the null hypothesis is true. Do not reject the null hypothesis.
Low p-value: low probability of a test statistic >= the observed test statistic when the null hypothesis is true. Reject the null hypothesis.
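To make the idea of a p-value concrete, here is a minimal Python sketch of the upper-tail probability of the chi-square distribution. It is not the exact p-value described above, which sums over tables, but the usual chi-square approximation. The function name `chi_square_p_value` is hypothetical, and only df = 1 and df = 2 are handled because those two cases have simple closed forms.

```python
import math


def chi_square_p_value(statistic, df):
    """Probability that a chi-square variable with `df` degrees of freedom
    is greater than or equal to `statistic` (df = 1 or 2 only)."""
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    if df == 1:
        # Chi-square with 1 df is a squared standard normal
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        # Chi-square with 2 df is exponential with mean 2
        return math.exp(-statistic / 2)
    raise NotImplementedError("this sketch only covers df = 1 and df = 2")


# Calculated value from Example 1 (df = 2): a high p-value, so HO is not rejected
p_example_1 = chi_square_p_value(1.90, 2)

# At the table value the p-value is alpha = 0.05, up to rounding of the table
p_at_critical = chi_square_p_value(5.991, 2)
print(round(p_example_1, 4), round(p_at_critical, 4))
```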

SAS: Example 1
High probability that the Chi-Square statistic > our calculated chi-square statistic.
We do not reject our null hypothesis.


Example 2: Testing Association
HO: Gender and eye colour are not associated with each other.
HA: Gender and eye colour are associated with each other.
Options (from the SAS code callouts):
cellchi2 = displays how much each cell contributes to the overall chi-squared value
nocol = do not display column percentages
norow = do not display row percentages
chisq = display chi-square statistics

Example 2: More SAS Examples
Degrees of freedom: (2-1)(3-1) = 1*2 = 2
High probability that the Chi-Square statistic > our calculated chi-square statistic (78.25%).
We do not reject our null hypothesis.

Example 2: More SAS Examples
If there were an association, we could check which interactions describe it by looking at how much each cell contributes to the overall Chi-square value.

Limitations
No expected category should be less than 1
No more than 1/5 of the expected categories should be less than 5
To correct for this, collect larger samples or combine your data for the smaller expected categories until their combined value is 5 or more
Yates Correction*
When there is only 1 degree of freedom, the regular chi-square test should not be used
Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O-E term, then continue as usual with the new corrected values
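The Yates correction described above changes only the numerator of each term. A small Python sketch, with hypothetical function names and an illustrative cell that does not come from the slides:

```python
def uncorrected_term(observed, expected):
    """Ordinary chi-square value for one cell: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


def yates_corrected_term(observed, expected):
    """Yates-corrected value for one cell: (|O - E| - 0.5)^2 / E."""
    return (abs(observed - expected) - 0.5) ** 2 / expected


# Illustrative cell: observed 12, expected 8
plain = uncorrected_term(12, 8.0)          # 4^2 / 8
corrected = yates_corrected_term(12, 8.0)  # 3.5^2 / 8
print(plain, corrected)
```

The corrected term is smaller than the uncorrected one whenever |O - E| is larger than 0.25, so the correction makes the test more conservative in those cases.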

What do these mean?

Likelihood Ratio Chi-Square

Continuity-Adjusted Chi-Square Test

Mantel-Haenszel Chi-Square Test
$Q_{MH} = (n-1)r^2$
$r$ is the Pearson correlation coefficient between the row and column variables (which also measures the linear association between row and column); $r^2$ is its square
http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm
Tests the alternative hypothesis that there is a linear association between the row and column variables
Follows a Chi-square distribution with 1 degree of freedom
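The formula above can be sketched in Python as follows. This is an illustration under stated assumptions, not SAS's implementation: rows and columns are scored 1, 2, 3, ... by default, the function name `mantel_haenszel_chi_square` is hypothetical, and the 2 x 2 counts are illustrative.

```python
import math


def mantel_haenszel_chi_square(table, row_scores=None, col_scores=None):
    """Return (Q_MH, r) where Q_MH = (n - 1) * r^2 and r is the Pearson
    correlation between the row scores and column scores, weighted by counts."""
    n_rows, n_cols = len(table), len(table[0])
    row_scores = row_scores or list(range(1, n_rows + 1))
    col_scores = col_scores or list(range(1, n_cols + 1))
    cells = [(row_scores[i], col_scores[j], table[i][j])
             for i in range(n_rows) for j in range(n_cols)]
    n = sum(count for _, _, count in cells)
    mean_x = sum(x * count for x, _, count in cells) / n
    mean_y = sum(y * count for _, y, count in cells) / n
    # Sums of squares and cross-products, each cell weighted by its count
    sxx = sum(count * (x - mean_x) ** 2 for x, _, count in cells)
    syy = sum(count * (y - mean_y) ** 2 for _, y, count in cells)
    sxy = sum(count * (x - mean_x) * (y - mean_y) for x, y, count in cells)
    r = sxy / math.sqrt(sxx * syy)
    return (n - 1) * r ** 2, r


# Illustrative 2 x 2 counts
q_mh, r = mantel_haenszel_chi_square([[15, 7], [8, 10]])
print(round(q_mh, 3), round(r, 3))
```

The result would be compared with a chi-square distribution with 1 degree of freedom, as stated above.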

Phi Coefficient

Contingency Coefficient

Cramér's V
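The slide content for these three measures did not survive, so the sketch below uses the standard textbook definitions rather than anything stated in the slides. All three rescale the chi-square value by the sample size n. The function name `association_measures` and the example numbers are hypothetical, and the phi returned here is a magnitude only (software may report a signed phi for 2 x 2 tables).

```python
import math


def association_measures(chi_square, n, n_rows, n_cols):
    """Return (phi, contingency_coefficient, cramers_v) from a chi-square value.

    phi                     = sqrt(chi2 / n)
    contingency coefficient = sqrt(chi2 / (chi2 + n))
    Cramer's V              = sqrt(chi2 / (n * min(rows - 1, cols - 1)))
    """
    phi = math.sqrt(chi_square / n)
    contingency = math.sqrt(chi_square / (chi_square + n))
    cramers_v = math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))
    return phi, contingency, cramers_v


# Hypothetical 3 x 4 table with chi-square 10.0 from 100 observations
phi, contingency, cramers_v = association_measures(10.0, 100, 3, 4)
print(round(phi, 3), round(contingency, 3), round(cramers_v, 3))
```

For a 2 x 2 table min(rows - 1, cols - 1) is 1, so Cramér's V equals phi.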

Yates & 2 x 2 Contingency Tables
HO: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.

                    High Cholesterol   Low Cholesterol   Total
Heart Disease              15                  7           22
  Expected                12.65               9.35         22
  Chi-Square               0.44               0.59        1.03
No Heart Disease            8                 10           18
  Expected                10.35               7.65         18
  Chi-Square               0.53               0.72        1.25
TOTAL                      23                 17           40
Chi-Square Total                                          2.28

Calculate degrees of freedom: (c-1)(r-1) = 1*1 = 1
We need to use the YATES CORRECTION

Yates & 2 x 2 Contingency Tables
Applying the Yates correction to each cell, for example Heart Disease with High Cholesterol:
$\frac{(|15 - 12.65| - 0.5)^2}{12.65} = 0.27$

Chi-Square Distribution table: $\chi^2 = 3.841$ at $\alpha = 0.05$ (df = 1)

Yates & 2 x 2 Contingency Tables
HO: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.

                    High Cholesterol   Low Cholesterol   Total
Heart Disease              15                  7           22
  Expected                12.65               9.35         22
  Chi-Square               0.27               0.37        0.64
No Heart Disease            8                 10           18
  Expected                10.35               7.65         18
  Chi-Square               0.33               0.45        0.78
TOTAL                      23                 17           40
Chi-Square Total                                          1.42

3.841 > 1.42, so we do not reject our null hypothesis.
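The whole 2 x 2 calculation can be retraced with a short Python sketch. The counts 15 and 10 and all totals are from the slides; the counts 7 and 8 are inferred from the row and column totals, and the variable names are illustrative.

```python
# Rows: heart disease, no heart disease. Columns: high, low cholesterol.
# 7 and 8 are inferred from the totals (22, 18, 23, 17) on the slide.
observed = [[15, 7], [8, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell = row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

pairs = [(o, e) for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row)]

uncorrected = sum((o - e) ** 2 / e for o, e in pairs)
yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in pairs)

CRITICAL_VALUE = 3.841  # chi-square table, alpha = 0.05, df = 1
reject_null = yates > CRITICAL_VALUE
print(expected, round(uncorrected, 2), round(yates, 2), reject_null)
```

Summing the unrounded cell values gives about 1.41 for the corrected total; the 1.42 in the table is the sum of the cell values after each was rounded to two decimals. The conclusion is the same either way.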

Fisher's Exact Test
Left: Use this one-sided test when the alternative to independence is negative association between the variables. These observations tend to lie in the lower left and upper right cells of the table. Small p-value = likely negative association.
Right: Use this one-sided test when the alternative to independence is positive association between the variables. These observations tend to lie in the upper left and lower right cells of the table. Small p-value = likely positive association.
Two-Tail: Use this when there is no prior alternative.
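The three p-values can be sketched for a 2 x 2 table using the hypergeometric distribution. This is a minimal illustration, not SAS's implementation: the function name `fisher_exact_2x2` and the example counts are hypothetical, and the two-tail value is assumed to be the sum of the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (left, right, two_tail) p-values for the table [[a, b], [c, d]],
    holding the row and column totals fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denominator = comb(n, col1)

    def prob(x):
        # Probability that the upper left cell equals x given the margins
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)   # smallest possible upper left count
    highest = min(row1, col1)      # largest possible upper left count
    p_observed = prob(a)

    left = sum(prob(x) for x in range(lowest, a + 1))
    right = sum(prob(x) for x in range(a, highest + 1))
    # Small relative tolerance guards against floating-point ties
    two_tail = sum(prob(x) for x in range(lowest, highest + 1)
                   if prob(x) <= p_observed * (1 + 1e-9))
    return left, right, two_tail


# Hypothetical table with most counts on the upper left / lower right diagonal
left, right, two_tail = fisher_exact_2x2(3, 1, 1, 3)
print(round(left, 4), round(right, 4), round(two_tail, 4))
```

Here the right-sided p-value is the smaller of the two one-sided values, which matches the description above for observations concentrated in the upper left and lower right cells.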

Yates & 2 x 2 Contingency Tables
HO: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.

Conclusion
The Chi-square test is important in testing the association between variables and/or checking if one's expected proportions meet the reality of one's experiment
There are multiple chi-square tests, each suited to a specific sample size, degrees of freedom, and number of categories
We can use SAS to conduct Chi-square tests on our data by utilizing the command proc freq

References
Chi-Square Test Descriptions:
http://www.enviroliteracy.org/pdf/materials/1210.pdf
http://129.123.92.202/biol1020/Statistics/Appendix%206%20%20The%20Chi-Square%20TEst.pdf
Ozdemir T and Eyduran E. 2005. Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research. 1(2):242-244.
SAS Support website: http://www.sas.com/index.html (FREQ procedure)
YouTube Chi-square SAS Tutorial (user: mbate001): http://www.youtube.com/watch?v=ACbQ8FJTq7k
