
Two-Way Tables and the Chi-Square Test

When analysis of categorical data involves more than one variable, two-way tables (also
known as contingency tables) are employed. These tables provide a foundation for statistical inference,
where statistical tests assess the relationship between the variables on the basis of the observed data.

Example
In the dataset "Popular Kids," students in grades 4-6 were asked whether good grades, athletic ability,
or popularity was most important to them. A two-way table separating the students by grade and by
choice of most important factor is shown below:
Grade
Goals | 4 5 6 Total
---------------------------------
Grades | 49 50 69 168
Popular | 24 36 38 98
Sports | 19 22 28 69
---------------------------------
Total | 92 108 135 335

To investigate possible differences among the students' choices by grade, it is useful to compute the
column percentages for each choice, as follows:
Grade
Goals | 4 5 6
---------------------------
Grades | 53 46 51
Popular | 26 33 28
Sports | 21 20 21
---------------------------
Total | 100 100 100

The percentages in the second column sum to 99 rather than 100 because of rounding. From the
column percentages, there does not appear to be much variation in preference across the three grades.
Data source: Chase, M.A. and Dummer, G.M. (1992), "The Role of Sports as a Social Determinant for
Children," Research Quarterly for Exercise and Sport, 63, 418-424. Dataset available through the
Statlib Data and Story Library (DASL).

The chi-square test provides a method for testing the association between the row and column
variables in a two-way table. The null hypothesis H0 assumes that there is no association between the
variables (in other words, one variable does not vary according to the other variable), while the
alternative hypothesis Ha claims that some association does exist. The alternative hypothesis does not
specify the type of association, so close attention to the data is required to interpret the information
provided by the test.
The chi-square test is based on a test statistic that measures the divergence of the observed data
from the values that would be expected under the null hypothesis of no association. This requires
calculation of the expected values based on the data. The expected value for each cell in a two-
way table is equal to (row total*column total)/n, where n is the total number of observations
included in the table.

Example
Continuing from the above example with the two-way table for students' choice of grades, athletic
ability, or popularity by grade, the expected values are calculated as shown below:
Original Table:
                Grade
Goals   |    4     5     6   Total
---------------------------------
Grades  |   49    50    69    168
Popular |   24    36    38     98
Sports  |   19    22    28     69
---------------------------------
Total   |   92   108   135    335

Expected Values:
                Grade
Goals   |     4      5      6
-----------------------------
Grades  |  46.1   54.2   67.7
Popular |  26.9   31.6   39.5
Sports  |  18.9   22.2   27.8

The first cell in the expected values table, Grade 4 with "grades" chosen to be most important, is
calculated to be 168*92/335 = 46.1, for example.
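
For readers who want to verify the arithmetic, here is a minimal Python sketch (standard library only; the variable names are ours, not part of the original example) that reproduces the expected-value table from the row and column totals:

# Expected cell counts for the "Popular Kids" table:
# expected = (row total * column total) / n for each cell.
observed = [
    [49, 50, 69],   # Grades
    [24, 36, 38],   # Popular
    [19, 22, 28],   # Sports
]

row_totals = [sum(row) for row in observed]        # [168, 98, 69]
col_totals = [sum(col) for col in zip(*observed)]  # [92, 108, 135]
n = sum(row_totals)                                # 335

expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
# [46.1, 54.2, 67.7]
# [26.9, 31.6, 39.5]
# [18.9, 22.2, 27.8]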

Once the expected values have been computed (done automatically in most software packages), the
chi-square test statistic is computed as

X² = Σ (O - E)²/E,

where O is the observed count and E the expected count in a cell; the square of the difference between
the observed and expected values in each cell, divided by the expected value, is summed across all of
the cells in the table.
The distribution of the statistic X² is chi-square with (r-1)(c-1) degrees of freedom, where r represents
the number of rows in the two-way table and c represents the number of columns. The distribution is
denoted χ²(df), where df is the number of degrees of freedom.
The chi-square distribution is defined for all positive values. The P-value for the chi-square test is
P(χ² > X²), the probability of observing a value at least as extreme as the test statistic for a chi-square
distribution with (r-1)(c-1) degrees of freedom.

Example
The chi-square statistic for the above example is computed as follows:
X² = (49 - 46.1)²/46.1 + (50 - 54.2)²/54.2 + (69 - 67.7)²/67.7 + .... + (28 - 27.8)²/27.8
= 0.18 + 0.33 + 0.03 + .... + 0.01
= 1.51
The degrees of freedom are equal to (3-1)(3-1) = 2*2 = 4, so we are interested in the probability
P(χ² > 1.51) = 0.8244 on 4 degrees of freedom. This large P-value indicates no evidence of an
association between the choice of most important factor and the grade of the student -- the differences
between the observed and expected values are negligible under the null hypothesis.
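
The whole test can be checked in software. Here is a short sketch using SciPy (assuming the scipy package is available; chi2_contingency computes the expected counts, the statistic, and the P-value in one call):

from scipy.stats import chi2_contingency

observed = [[49, 50, 69], [24, 36, 38], [19, 22, 28]]
stat, p, df, expected = chi2_contingency(observed)
print(round(stat, 2), df, round(p, 4))   # 1.51 4 0.8244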
Example
The "Popular Kids" dataset also divided the students' responses into "Urban," "Suburban," and "Rural"
school areas. Is there an association between the type of school area and the students' choice of good
grades, athletic ability, or popularity as most important?
A two-way table for student goals and school area appears as follows:
School Area
Goals | Rural Suburban Urban Total
--------------------------------------------
Grades | 57 87 24 168
Popular | 50 42 6 98
Sports | 42 22 5 69
--------------------------------------------
Total | 149 151 35 335

The corresponding column percentages are the following:


School Area
Goals | Rural Suburban Urban
-----------------------------------
Grades | 38 58 69
Popular | 34 28 17
Sports | 28 14 14
-----------------------------------
Total | 100 100 100

Barplots comparing the percentages of students' choices by school area accompanied the original analysis (not reproduced here).

From the table and corresponding graphs, it appears that the emphasis on grades increases as the school
areas become more urban, while the emphasis on popularity decreases. Is this association significant?
Using the MINITAB "CHIS" command to perform a chi-square test on the tabular data gives the
following results:
Chi-Square Test

Expected counts are printed below observed counts

            Rural  Suburban  Urban  Total
    1          57        87     24    168
            74.72     75.73  17.55

    2          50        42      6     98
            43.59     44.17  10.24

    3          42        22      5     69
            30.69     31.10   7.21

Total         149       151     35    335

Chi-Sq =  4.203 + 1.679 + 2.369 +
          0.943 + 0.107 + 1.755 +
          4.168 + 2.663 + 0.677 = 18.564
DF = 4, P-Value = 0.001

The result is highly significant (P = 0.001), indicating that some association between the variables is
present. We can conclude that the urban students' increased emphasis on grades is not due to random variation.
Data source: Chase, M.A. and Dummer, G.M. (1992), "The Role of Sports as a Social Determinant for
Children," Research Quarterly for Exercise and Sport, 63, 418-424. Dataset available through the
Statlib Data and Story Library (DASL).
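
The same analysis can be reproduced outside MINITAB. Here is a sketch using SciPy's chi2_contingency; the per-cell (O - E)²/E contributions mirror the Chi-Sq breakdown printed above:

from scipy.stats import chi2_contingency

observed = [[57, 87, 24],   # Grades
            [50, 42, 6],    # Popular
            [42, 22, 5]]    # Sports

stat, p, df, expected = chi2_contingency(observed)

# Per-cell contributions (O - E)^2 / E, as in the MINITAB breakdown
contributions = [[(o - e) ** 2 / e for o, e in zip(orow, erow)]
                 for orow, erow in zip(observed, expected)]

print(round(stat, 3), df, round(p, 4))   # 18.564 4 0.001
print([[round(c, 3) for c in row] for row in contributions])
# [[4.203, 1.679, 2.369], [0.943, 0.107, 1.755], [4.168, 2.663, 0.677]]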

The chi-square index in the Statlib Data and Story Library (DASL) provides several other examples of
the use of the chi-square test in categorical data analysis.
Hypothesis Testing - Chi-Square Test

Introduction
This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis
is generated about a population parameter, and sample statistics are used to assess the likelihood that
the hypothesis is true. The hypothesis is based on available information and the investigator's belief
about the population parameters. The specific tests considered here are called chi-square tests and are
appropriate when the outcome is discrete (dichotomous, ordinal or categorical). For example, in some
clinical trials the outcome is a classification such as hypertensive, pre-hypertensive or normotensive.
We could use the same classification in an observational study such as the Framingham Heart Study to
compare men and women in terms of their blood pressure status, again using the classification of
hypertensive, pre-hypertensive or normotensive status.
The technique to analyze a discrete outcome uses what is called a chi-square test. Specifically, the test
statistic follows a chi-square probability distribution. We will consider chi-square tests here with one,
two and more than two independent comparison groups.

Learning Objectives
After completing this module, the student will be able to:
1. Perform chi-square tests by hand
2. Appropriately interpret results of chi-square tests
3. Identify the appropriate hypothesis testing procedure based on type of outcome variable and
number of samples
Tests with One Sample, Discrete Outcome
Here we consider hypothesis testing with a discrete outcome variable in a single population. Discrete
variables are variables that take on more than two distinct responses or categories and the responses can
be ordered or unordered (i.e., the outcome can be ordinal or categorical). The procedure we describe
here can be used for dichotomous (exactly 2 response options), ordinal or categorical discrete outcomes
and the objective is to compare the distribution of responses, or the proportions of participants in each
response category, to a known distribution. The known distribution is derived from another study or
report and it is again important in setting up the hypotheses that the comparator distribution specified in
the null hypothesis is a fair comparison. The comparator is sometimes called an external or a historical
control.
In one sample tests for a discrete outcome, we set up our hypotheses against an appropriate comparator.
We select a sample and compute descriptive statistics on the sample data. Specifically, we compute the
sample size (n) and the proportions of participants in each response category (p̂1, p̂2, ..., p̂k), where
k represents the number of response categories. We then determine the appropriate test statistic for the
hypothesis test. The formula for the test statistic is given below.
Test Statistic for Testing H0: p1 = p10, p2 = p20, ..., pk = pk0

χ² = Σ (O - E)²/E

We find the critical value in a table of probabilities for the chi-square distribution with df=k-1.
In the test statistic, O = observed frequency and E=expected frequency in each of the response
categories. The observed frequencies are those observed in the sample and the expected frequencies are
computed as described below. χ2 (chi-square) is another probability distribution and ranges from 0 to
∞. The test statistic formula above is appropriate for large samples, defined as expected
frequencies of at least 5 in each of the response categories.
When we conduct a χ2 test, we compare the observed frequencies in each response category to the
frequencies we would expect if the null hypothesis were true. These expected frequencies are
determined by allocating the sample to the response categories according to the distribution specified in
H0. This is done by multiplying the observed sample size (n) by the proportions specified in the null
hypothesis (p10, p20, ..., pk0). To ensure that the sample size is appropriate for the use of the test
statistic above, we need to check the following: min(np10, np20, ..., npk0) > 5.
The test of hypothesis with a discrete outcome measured in a single sample, where the goal is to assess
whether the distribution of responses follows a known distribution, is called the χ2 goodness-of-fit test.
As the name indicates, the idea is to assess whether the pattern or distribution of responses in the
sample "fits" a specified population (external or historical) distribution. In the next example we
illustrate the test. As we work through the example, we provide additional details related to the use of
this new test statistic.
Example
A University conducted a survey of its recent graduates to collect demographic and health information
for future planning purposes as well as to assess students' satisfaction with their undergraduate
experiences. The survey revealed that a substantial proportion of students were not engaging in regular
exercise, many felt their nutrition was poor and a substantial number were smoking. In response to a
question on regular exercise, 60% of all graduates reported getting no regular exercise, 25% reported
exercising sporadically and 15% reported exercising regularly as undergraduates. The next year the
University launched a health promotion campaign on campus in an attempt to increase health behaviors
among undergraduates. The program included modules on exercise, nutrition and smoking cessation.
To evaluate the impact of the program, the University again surveyed graduates and asked the same
questions. The survey was completed by 470 graduates and the following data were collected on the
exercise question:
                      No Regular   Sporadic   Regular
                      Exercise     Exercise   Exercise   Total
Number of Students       255          125        90       470

Based on the data, is there evidence of a shift in the distribution of responses to the exercise question
following the implementation of the health promotion campaign on campus? Run the test at a 5% level
of significance.
In this example, we have one sample and a discrete (ordinal) outcome variable (with three response
options). We specifically want to compare the distribution of responses in the sample to the distribution
reported the previous year (i.e., 60%, 25%, 15% reporting no, sporadic and regular exercise,
respectively). We now run the test using the five-step approach.

Step 1. Set up hypotheses and determine level of significance.


The null hypothesis again represents the "no change" or "no difference" situation. If the health
promotion campaign has no impact then we expect the distribution of responses to the exercise
question to be the same as that measured prior to the implementation of the program.
H0: p1=0.60, p2=0.25, p3=0.15 or equivalently
H0: Distribution of responses is 0.60, 0.25, 0.15

H1: H0 is false. α =0.05


Notice that the research hypothesis is written in words rather than in symbols. The research hypothesis
as stated captures any difference in the distribution of responses from that specified in the null
hypothesis. We do not specify a specific alternative distribution, instead we are testing whether the
sample data "fit" the distribution in H0 or not. With the χ2 goodness-of-fit test there is no upper or
lower tailed version of the test.
Step 2. Select the appropriate test statistic.
The test statistic is:

χ² = Σ (O - E)²/E.
We must first assess whether the sample size is adequate. Specifically, we need to check
min(np10, np20, ..., npk0) > 5. The sample size here is n=470 and the proportions specified in the null
hypothesis are 0.60, 0.25 and 0.15. Thus, min(470(0.60), 470(0.25), 470(0.15)) = min(282, 117.5, 70.5)
= 70.5. The sample size is more than adequate so the formula can be used.
Step 3. Set up decision rule.
The decision rule for the χ2 test depends on the level of significance and the degrees of freedom,
defined as df=k-1 (where k is the number of response categories). If the null hypothesis is true, the
observed and expected frequencies will be close in value and the χ2 statistic will be close to zero. If the
null hypothesis is false, then the χ2 statistic will be large. The rejection region for the χ2 test is always
in the upper (right-hand) tail as shown below.
Rejection Region for χ2 Test with α=0.05 and df=2

Critical values can be found in a table of probabilities for the χ2 distribution. Here we have df=k-1=3-
1=2 and a 5% level of significance. The appropriate critical value is 5.99, and the decision rule is as
follows: Reject H0 if χ2 > 5.99.
Step 4. Compute the test statistic.
We now compute the expected frequencies using the sample size and the proportions specified in the
null hypothesis. We then substitute the sample data (observed frequencies) and the expected
frequencies into the formula for the test statistic identified in Step 2. The computations can be
organized as follows.
                           No Regular   Sporadic    Regular
                           Exercise     Exercise    Exercise    Total
Observed Frequencies (O)      255          125          90       470
Expected Frequencies (E)   470(0.60)    470(0.25)   470(0.15)    470
                             =282         =117.5      =70.5

Notice that the expected frequencies are taken to one decimal place and that the sum of the observed
frequencies is equal to the sum of the expected frequencies. The test statistic is computed as follows:

χ² = (255 - 282)²/282 + (125 - 117.5)²/117.5 + (90 - 70.5)²/70.5
   = 2.59 + 0.48 + 5.39 = 8.46.


Step 5. Conclusion.
We reject H0 because 8.46 > 5.99. We have statistically significant evidence at α=0.05 to show that H0
is false, or that the distribution of responses is not 0.60, 0.25, 0.15. The p-value is between 0.01 and
0.025 (p ≈ 0.015).
In the χ2 goodness-of-fit test, we conclude that either the distribution specified in H0 is false (when we
reject H0) or that we do not have sufficient evidence to show that the distribution specified in H0 is
false (when we fail to reject H0). Here, we reject H0 and conclude that the distribution of responses to
the exercise question following the implementation of the health promotion campaign was not the same
as the distribution prior. The test itself does not provide details of how the distribution has shifted. A
comparison of the observed and expected frequencies will provide some insight into the shift (when the
null hypothesis is rejected). Does it appear that the health promotion campaign was effective?
Consider the following:
                           No Regular   Sporadic   Regular
                           Exercise     Exercise   Exercise   Total
Observed Frequencies (O)      255          125        90       470
Expected Frequencies (E)      282          117.5      70.5     470

If the null hypothesis were true (i.e., no change from the prior year) we would have expected more
students to fall in the "No Regular Exercise" category and fewer in the "Regular Exercise" categories.
In the sample, 255/470 = 54% reported no regular exercise and 90/470=19% reported regular exercise.
Thus, there is a shift toward more regular exercise following the implementation of the health
promotion campaign. There is evidence of a statistical difference, but is this a meaningful difference? Is
there room for improvement?
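
For readers following along in software, here is a sketch of this goodness-of-fit test using SciPy's chisquare function; the expected counts are formed from the null proportions exactly as in Step 4:

from scipy.stats import chisquare

observed = [255, 125, 90]            # no, sporadic, regular exercise
n = sum(observed)                    # 470
p0 = [0.60, 0.25, 0.15]              # distribution under H0
expected = [n * p for p in p0]       # [282.0, 117.5, 70.5]

result = chisquare(observed, f_exp=expected)
print(round(result.statistic, 2), round(result.pvalue, 4))   # 8.46 0.0146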

Example
The National Center for Health Statistics (NCHS) provided data on the distribution of weight (in
categories) among Americans in 2002. The distribution was based on specific values of body mass
index (BMI) computed as weight in kilograms over height in meters squared. Underweight was defined
as BMI< 18.5, Normal weight as BMI between 18.5 and 24.9, overweight as BMI between 25 and 29.9
and obese as BMI of 30 or greater. Americans in 2002 were distributed as follows: 2% Underweight,
39% Normal Weight, 36% Overweight, and 23% Obese. Suppose we want to assess whether the
distribution of BMI is different in the Framingham Offspring sample. Using data from the n=3,326
participants who attended the seventh examination of the Offspring in the Framingham Heart Study we
created the BMI categories as defined and observed the following:
                     Underweight   Normal Weight   Overweight      Obese      Total
                     BMI < 18.5    BMI 18.5-24.9   BMI 25.0-29.9   BMI ≥ 30
# of Participants         20            932            1374         1000       3326

Step 1. Set up hypotheses and determine level of significance.


H0: p1=0.02, p2=0.39, p3=0.36, p4=0.23 or equivalently
H0: Distribution of responses is 0.02, 0.39, 0.36, 0.23

H1: H0 is false. α=0.05


Step 2. Select the appropriate test statistic.
The formula for the test statistic is:

χ² = Σ (O - E)²/E.
We must assess whether the sample size is adequate. Specifically, we need to check
min(np10, np20, ..., npk0) > 5. The sample size here is n=3,326 and the proportions specified in the null
hypothesis are 0.02, 0.39, 0.36 and 0.23. Thus, min(3326(0.02), 3326(0.39), 3326(0.36), 3326(0.23)) =
min(66.5, 1297.1, 1197.4, 765.0) = 66.5. The sample size is more than adequate, so the formula can be used.
Step 3. Set up decision rule.
Here we have df=k-1=4-1=3 and a 5% level of significance. The appropriate critical value is 7.81 and
the decision rule is as follows: Reject H0 if χ2 > 7.81.
Step 4. Compute the test statistic.
We now compute the expected frequencies using the sample size and the proportions specified in the
null hypothesis. We then substitute the sample data (observed frequencies) into the formula for the test
statistic identified in Step 2. We organize the computations in the following table.
                           Underweight   Normal Weight   Overweight      Obese     Total
                           BMI < 18.5    BMI 18.5-24.9   BMI 25.0-29.9   BMI ≥ 30
Observed Frequencies (O)        20            932            1374         1000      3326
Expected Frequencies (E)        66.5          1297.1         1197.4        765.0    3326

The test statistic is computed as follows:

χ² = (20 - 66.5)²/66.5 + (932 - 1297.1)²/1297.1 + (1374 - 1197.4)²/1197.4 + (1000 - 765.0)²/765.0
   = 32.52 + 102.77 + 26.05 + 72.19 = 233.53.


Step 5. Conclusion.
We reject H0 because 233.53 > 7.81. We have statistically significant evidence at α=0.05 to show that
H0 is false or that the distribution of BMI in Framingham is different from the national data reported in
2002, p < 0.005.
Again, the χ2 goodness-of-fit test allows us to assess whether the distribution of responses "fits" a
specified distribution. Here we show that the distribution of BMI in the Framingham Offspring Study is
different from the national distribution. To understand the nature of the difference we can compare
observed and expected frequencies or observed and expected proportions (or percentages). The
frequencies are large because of the large sample size; the observed percentages of patients in the
Framingham sample are as follows: 0.6% underweight, 28% normal weight, 41% overweight and 30%
obese. In the Framingham Offspring sample there are higher percentages of overweight and obese
persons (41% and 30% in Framingham as compared to 36% and 23% in the national data), and lower
proportions of underweight and normal weight persons (0.6% and 28% in Framingham as compared to
2% and 39% in the national data). Are these meaningful differences?
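
The same check can be run for the BMI example; a sketch follows. Note that the exact statistic (233.58) differs slightly from the 233.53 above, which was computed from expected counts rounded to one decimal place:

from scipy.stats import chisquare

observed = [20, 932, 1374, 1000]     # under-, normal, over-weight, obese
n = sum(observed)                    # 3326
expected = [n * p for p in [0.02, 0.39, 0.36, 0.23]]

result = chisquare(observed, f_exp=expected)
print(round(result.statistic, 2))    # 233.58 (233.53 with rounded E's)
print([round(100 * o / n, 1) for o in observed])
# [0.6, 28.0, 41.3, 30.1] -- the observed percentages quoted above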
In the module on hypothesis testing for means and proportions, we discussed hypothesis testing
applications with a dichotomous outcome variable in a single population. We presented a test using a
test statistic Z to test whether an observed (sample) proportion differed significantly from a historical
or external comparator. The chi-square goodness-of-fit test can also be used with a dichotomous
outcome and the results are mathematically equivalent.
In the prior module, we considered the following example. Here we show the equivalence to the chi-
square goodness-of-fit test.

Example
The NCHS report indicated that in 2002, 75% of children aged 2 to 17 saw a dentist in the past year. An
investigator wants to assess whether use of dental services is similar in children living in the city of
Boston. A sample of 125 children aged 2 to 17 living in Boston was surveyed, and 64 reported seeing a
dentist over the past 12 months. Is there a significant difference in use of dental services between
children living in Boston and the national data?
We presented the following approach to the test using a Z statistic.
Step 1. Set up hypotheses and determine level of significance
H0: p = 0.75
H1: p ≠ 0.75 α=0.05
Step 2. Select the appropriate test statistic.
We must first check that the sample size is adequate. Specifically, we need to check min(np0, n(1-p0)) =
min(125(0.75), 125(1-0.75)) = min(93.75, 31.25) = 31.25. The sample size is more than adequate so the
following formula can be used:

Z = (p̂ - p0) / sqrt[ p0(1 - p0)/n ].
Step 3. Set up decision rule.
This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H0 if Z < -1.960 or if
Z > 1.960.
Step 4. Compute the test statistic.
We now substitute the sample data into the formula for the test statistic identified in Step 2. The sample
proportion is p̂ = 64/125 = 0.512, and the test statistic is:

Z = (0.512 - 0.75) / sqrt[ 0.75(1 - 0.75)/125 ] = -0.238/0.0387 = -6.15.
Step 5. Conclusion.
We reject H0 because -6.15 < -1.960. We have statistically significant evidence at α=0.05 to show that
there is a statistically significant difference in the use of dental service by children living in Boston as
compared to the national data. (p < 0.0001).
We now conduct the same test using the chi-square goodness-of-fit test. First, we summarize our
sample data as follows:
                     Saw a Dentist        Did Not See a Dentist
                     in Past 12 Months    in Past 12 Months       Total
# of Participants          64                    61                125

Step 1. Set up hypotheses and determine level of significance.


H0: p1=0.75, p2=0.25 or equivalently
H0: Distribution of responses is 0.75, 0.25

H1: H0 is false. α=0.05


Step 2. Select the appropriate test statistic.
The formula for the test statistic is:

χ² = Σ (O - E)²/E.
We must assess whether the sample size is adequate. Specifically, we need to check
min(np10, np20, ..., npk0) > 5. The sample size here is n=125 and the proportions specified in the null hypothesis are
0.75, 0.25. Thus, min( 125(0.75), 125(0.25))=min(93.75, 31.25)=31.25. The sample size is more than
adequate so the formula can be used.
Step 3. Set up decision rule.
Here we have df=k-1=2-1=1 and a 5% level of significance. The appropriate critical value is 3.84, and
the decision rule is as follows: Reject H0 if χ² > 3.84. (Note that 1.96² = 3.84, where 1.96 was the
critical value used in the Z test for proportions shown above.)
Step 4. Compute the test statistic.
We now compute the expected frequencies using the sample size and the proportions specified in the
null hypothesis. We then substitute the sample data (observed frequencies) into the formula for the test
statistic identified in Step 2. We organize the computations in the following table.
                           Saw a Dentist        Did Not See a Dentist
                           in Past 12 Months    in Past 12 Months       Total
Observed Frequencies (O)         64                    61                125
Expected Frequencies (E)         93.75                 31.25             125

The test statistic is computed as follows:

χ² = (64 - 93.75)²/93.75 + (61 - 31.25)²/31.25 = 9.44 + 28.32 = 37.8.
(Note that (-6.15)² = 37.8, where -6.15 was the value of the Z statistic in the test for proportions shown
above.)
Step 5. Conclusion.
We reject H0 because 37.8 > 3.84. We have statistically significant evidence at α=0.05 to show that
there is a statistically significant difference in the use of dental service by children living in Boston as
compared to the national data. (p < 0.0001). This is the same conclusion we reached when we
conducted the test using the Z test above. With a dichotomous outcome, Z² = χ²! In statistics, there
are often several approaches that can be used to test hypotheses.
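
The equivalence is easy to verify numerically. Here is a minimal sketch for the dental-services example, using only the standard library (p_hat, x2, and the other names are ours):

import math

n, x, p0 = 125, 64, 0.75
p_hat = x / n                                      # 0.512

# One-sample Z statistic for a proportion
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Chi-square goodness-of-fit statistic for the same data
observed = [x, n - x]                              # [64, 61]
expected = [n * p0, n * (1 - p0)]                  # [93.75, 31.25]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(z, 2), round(z ** 2, 2), round(x2, 2))   # -6.15 37.76 37.76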

Tests for Two or More Independent Samples, Discrete Outcome
Here we extend the application of the chi-square test to the case of two or more independent
comparison groups. Specifically, the outcome of interest is discrete with two or more responses and the
responses can be ordered or unordered (i.e., the outcome can be dichotomous, ordinal or categorical).
We now consider the situation where there are two or more independent comparison groups and the
goal of the analysis is to compare the distribution of responses to the discrete outcome variable among
several independent comparison groups.
The test is called the χ2 test of independence and the null hypothesis is that there is no difference in the
distribution of responses to the outcome across comparison groups. This is often stated as follows: The
outcome variable and the grouping variable (e.g., the comparison treatments or comparison groups) are
independent (hence the name of the test). Independence here implies homogeneity in the distribution of
the outcome among comparison groups.
The null hypothesis in the χ2 test of independence is often stated in words as: H0: The distribution of
the outcome is independent of the groups. The alternative or research hypothesis is that there is a
difference in the distribution of responses to the outcome variable among the comparison groups (i.e.,
that the distribution of responses "depends" on the group). In order to test the hypothesis, we measure
the discrete outcome variable in each participant in each comparison group. The data of interest are the
observed frequencies (or number of participants in each response category in each group). The formula
for the test statistic for the χ2 test of independence is given below.

Test Statistic for Testing H0: Distribution of outcome is independent of groups

χ² = Σ (O - E)²/E

and we find the critical value in a table of probabilities for the chi-square distribution with
df = (r-1)*(c-1).
Here O = observed frequency, E=expected frequency in each of the response categories in each group, r
= the number of rows in the two-way table and c = the number of columns in the two-way table. r and
c correspond to the number of comparison groups and the number of response options in the outcome
(see below for more details). The observed frequencies are the sample data and the expected
frequencies are computed as described below. The test statistic is appropriate for large samples, defined
as expected frequencies of at least 5 in each of the response categories in each group.
The data for the χ2 test of independence are organized in a two-way table. The outcome and grouping
variable are shown in the rows and columns of the table. The sample table below illustrates the data
layout. The table entries (blank below) are the numbers of participants in each group responding to
each response category of the outcome variable.
                              Outcome Variable
Grouping       Response   Response   ...   Response   Row
Variable       Option 1   Option 2         Option c   Totals
Group 1
Group 2
...
Group r
Column Totals                                          N

In the table above, the grouping variable is shown in the rows of the table; r denotes the number of
independent groups. The outcome variable is shown in the columns of the table; c denotes the number
of response options in the outcome variable. Each combination of a row (group) and column (response)
is called a cell of the table. The table has r*c cells and is sometimes called an r x c ("r by c") table. For
example, if there are 4 groups and 5 categories in the outcome variable, the data are organized in a 4 X
5 table. The row and column totals are shown along the right-hand margin and the bottom of the table,
respectively. The total sample size, N, can be computed by summing the row totals or the column
totals. Similar to ANOVA, N does not refer to a population size here but rather to the total sample size
in the analysis. The sample data can be organized into a table like the above. The numbers of
participants within each group who select each response option are shown in the cells of the table and
these are the observed frequencies used in the test statistic.
The test statistic for the χ2 test of independence involves comparing observed (sample data) and
expected frequencies in each cell of the table. The expected frequencies are computed assuming that
the null hypothesis is true. The null hypothesis states that the two variables (the grouping variable and
the outcome) are independent. The definition of independence is as follows:

Two events, A and B, are independent if P(A|B) = P(A), or equivalently, if P(A and B) = P(A)
P(B).

The second statement indicates that if two events, A and B, are independent then the probability of their
intersection can be computed by multiplying the probability of each individual event. To conduct the χ2
test of independence, we need to compute expected frequencies in each cell of the table. Expected
frequencies are computed by assuming that the grouping variable and outcome are independent (i.e.,
under the null hypothesis). Thus, if the null hypothesis is true, using the definition of independence:

P(Group 1 and Response Option 1) = P(Group 1) P(Response Option 1).

The above states that the probability that an individual is in Group 1 and their outcome is Response
Option 1 is computed by multiplying the probability that person is in Group 1 by the probability that a
person is in Response Option 1. To conduct the χ2 test of independence, we need expected frequencies
and not expected probabilities. To convert the above probability to a frequency, we multiply by N.
Consider the following small example.
           Response 1   Response 2   Response 3   Total
Group 1        10            8            7         25
Group 2        22           15           13         50
Group 3        30           28           17         75
Total          62           51           37        150

The data shown above are measured in a sample of size N=150. The frequencies in the cells of the table
are the observed frequencies. If Group and Response are independent, then we can compute the
probability that a person in the sample is in Group 1 and Response category 1 using:
P(Group 1 and Response 1) = P(Group 1) P(Response 1),
P(Group 1 and Response 1) = (25/150) (62/150) = 0.069.
Thus if Group and Response are independent we would expect 6.9% of the sample to be in the top left
cell of the table (Group 1 and Response 1). The expected frequency is 150(0.069) = 10.4. We could do
the same for Group 2 and Response 1:

P(Group 2 and Response 1) = P(Group 2) P(Response 1),


P(Group 2 and Response 1) = (50/150) (62/150) = 0.138.
The expected frequency in Group 2 and Response 1 is 150(0.138) = 20.7.
Thus, the formula for determining the expected cell frequencies in the χ2 test of independence is as
follows:
Expected Cell Frequency = (Row Total * Column Total)/N.
The above computes the expected frequency in one step rather than computing the expected probability
first and then converting to a frequency.
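
A small helper function makes the one-step formula concrete. Applied to the 3x3 example above, this sketch reproduces the expected counts (the top-left cell comes out as 10.3 rather than 10.4 because it avoids rounding the probability to 0.069 first):

def expected_counts(observed):
    """Expected frequencies under independence:
    (row total * column total) / N for each cell."""
    row_tot = [sum(r) for r in observed]
    col_tot = [sum(c) for c in zip(*observed)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]

table = [[10, 8, 7], [22, 15, 13], [30, 28, 17]]
for row in expected_counts(table):
    print([round(e, 1) for e in row])
# [10.3, 8.5, 6.2]
# [20.7, 17.0, 12.3]
# [31.0, 25.5, 18.5]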

Example
In a prior example we evaluated data from a survey of university graduates which assessed, among
other things, how frequently they exercised. The survey was completed by 470 graduates. In the prior
example we used the χ2 goodness-of-fit test to assess whether there was a shift in the distribution of
responses to the exercise question following the implementation of a health promotion campaign on
campus. We specifically considered one sample (all students) and compared the observed distribution
to the distribution of responses the prior year (a historical control). Suppose we now wish to assess
whether there is a relationship between exercise on campus and students' living arrangements. As part
of the same survey, graduates were asked where they lived their senior year. The response options were
dormitory, on-campus apartment, off-campus apartment, and at home (i.e., commuted to and from the
university). The data are shown below.
                        No Regular   Sporadic   Regular
                        Exercise     Exercise   Exercise   Total
Dormitory                   32           30         28       90
On-Campus Apartment         74           64         42      180
Off-Campus Apartment       110           25         15      150
At Home                     39            6          5       50
Total                      255          125         90      470

Based on the data, is there a relationship between exercise and students' living arrangement? Do you
think where a person lives affects their exercise status? Here we have four independent comparison
groups (living arrangement) and a discrete (ordinal) outcome variable with three response options. We
specifically want to test whether living arrangement and exercise are independent. We will run the test
using the five-step approach.
Step 1. Set up hypotheses and determine level of significance.
H0: Living arrangement and exercise are independent
H1: H0 is false. α=0.05
The null and research hypotheses are written in words rather than in symbols. The research hypothesis
is that the grouping variable (living arrangement) and the outcome variable (exercise) are dependent or
related.
Step 2. Select the appropriate test statistic.
The formula for the test statistic is:

χ² = Σ (O - E)²/E.
The condition for appropriate use of the above test statistic is that each expected frequency is at least 5.
In Step 4 we will compute the expected frequencies and we will ensure that the condition is met.
Step 3. Set up decision rule.
The decision rule depends on the level of significance and the degrees of freedom, defined as df = (r-1)
(c-1), where r and c are the numbers of rows and columns in the two-way data table. The row variable
is the living arrangement and there are 4 arrangements considered, thus r=4. The column variable is
exercise and 3 responses are considered, thus c=3. For this test, df=(4-1)(3-1)=3(2)=6. Again, with χ2
tests there are no upper, lower or two-tailed tests. If the null hypothesis is true, the observed and
expected frequencies will be close in value and the χ2 statistic will be close to zero. If the null
hypothesis is false, then the χ2 statistic will be large. The rejection region for the χ2 test of
independence is always in the upper (right-hand) tail of the distribution. For df=6 and a 5% level of
significance, the appropriate critical value is 12.59 and the decision rule is as follows: Reject H0 if χ² >
12.59.
Step 4. Compute the test statistic.
We now compute the expected frequencies using the formula,
Expected Frequency = (Row Total * Column Total)/N.
The computations can be organized in a two-way table. The top number in each cell of the table is the
observed frequency and the bottom number is the expected frequency. The expected frequencies are
shown in parentheses.
                        No Regular   Sporadic   Regular
                        Exercise     Exercise   Exercise   Total
Dormitory                   32           30         28       90
                          (48.8)       (23.9)     (17.2)
On-Campus Apartment         74           64         42      180
                          (97.7)       (47.9)     (34.5)
Off-Campus Apartment       110           25         15      150
                          (81.4)       (39.9)     (28.7)
At Home                     39            6          5       50
                          (27.1)       (13.3)      (9.6)
Total                      255          125         90      470

Notice that the expected frequencies are taken to one decimal place and that the sums of the observed
frequencies are equal to the sums of the expected frequencies in each row and column of the table.
Recall in Step 2 a condition for the appropriate use of the test statistic was that each expected frequency
is at least 5. This is true for this sample (the smallest expected frequency is 9.6) and therefore it is
appropriate to use the test statistic.
The test statistic is computed as follows:

χ² = (32 - 48.8)²/48.8 + (30 - 23.9)²/23.9 + ... + (5 - 9.6)²/9.6
   = 5.78 + 1.56 + 6.78 + 5.75 + 5.41 + 1.63 + 10.05 + 5.56 + 6.54 + 5.23 + 4.01 + 2.20 = 60.5.
Step 5. Conclusion.
We reject H0 because 60.5 > 12.59. We have statistically significant evidence at α=0.05 to show that
H0 is false or that living arrangement and exercise are not independent (i.e., they are dependent or
related), p < 0.005.
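
A sketch verifying this analysis with SciPy follows; the exact statistic (60.44) differs slightly from the hand calculation's 60.5, which used expected counts rounded to one decimal place:

from scipy.stats import chi2_contingency

observed = [[32, 30, 28],    # Dormitory
            [74, 64, 42],    # On-Campus Apartment
            [110, 25, 15],   # Off-Campus Apartment
            [39, 6, 5]]      # At Home

stat, p, df, expected = chi2_contingency(observed)
print(round(stat, 2), df)   # 60.44 6; p is roughly 4e-11, far below 0.005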
Again, the χ2 test of independence is used to test whether the distribution of the outcome variable is
similar across the comparison groups. Here we rejected H0 and concluded that the distribution of
exercise is not independent of living arrangement, or that there is a relationship between living
arrangement and exercise. The test provides an overall assessment of statistical significance. When the
null hypothesis is rejected, it is important to review the sample data to understand the nature of the
relationship. Consider again the sample data.
                        No Regular   Sporadic   Regular
                        Exercise     Exercise   Exercise   Total
Dormitory                   32           30         28       90
On-Campus Apartment         74           64         42      180
Off-Campus Apartment       110           25         15      150
At Home                     39            6          5       50
Total                      255          125         90      470

Because there are different numbers of students in each living situation, comparisons of exercise
patterns are difficult on the basis of the frequencies alone. The following table displays the
percentages of students in each exercise category by living arrangement. The percentages sum to 100%
in each row of the table. For comparison purposes, percentages are also shown for the total sample
along the bottom row of the table.
                        No Regular   Sporadic   Regular
                        Exercise     Exercise   Exercise
Dormitory                   36%          33%       31%
On-Campus Apartment         41%          36%       23%
Off-Campus Apartment        73%          17%       10%
At Home                     78%          12%       10%
Total                       54%          27%       19%


From the above, it is clear that higher percentages of students living in dormitories and in on-campus
apartments reported regular exercise (31% and 23%) as compared to students living in off-campus
apartments and at home (10% each).

Test Yourself

Pancreaticoduodenectomy (PD) is a procedure that is associated with considerable morbidity. A study
was recently conducted on 553 patients who had a successful PD between January 2000 and December
2010 to determine whether their Surgical Apgar Score (SAS) is related to 30-day perioperative
morbidity and mortality. The table below gives the number of patients experiencing no, minor, or major
morbidity by SAS category.

Q: What would be an appropriate statistical test to examine whether there is an association between
Surgical Apgar Score and patient outcome? Using 14.13 as the value of the test statistic for these data,
carry out the appropriate test at a 5% level of significance. Show all parts of your test.

                                      Patient Outcome
Surgical Apgar Score   No morbidity   Minor morbidity   Major morbidity or mortality
0-4                         21              20                     16
5-6                        135              71                     35
7-10                       158              62                     35

In the module on hypothesis testing for means and proportions, we discussed hypothesis testing
applications with a dichotomous outcome variable and two independent comparison groups. We
presented a test using a test statistic Z to test for equality of independent proportions. The chi-square
test of independence can also be used with a dichotomous outcome and the results are mathematically
equivalent.
In the prior module, we considered the following example. Here we show the equivalence to the chi-
square test of independence.

Example
A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever
designed to reduce pain in patients following joint replacement surgery. The trial compares the new
pain reliever to the pain reliever currently in use (called the standard of care). A total of 100 patients
undergoing joint replacement surgery agreed to participate in the trial. Patients were randomly assigned
to receive either the new pain reliever or the standard pain reliever following surgery and were blind to
the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain
on a scale of 0-10 with higher scores indicative of more pain. Each patient was then given the assigned
treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary
outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically
meaningful reduction). The following data were observed in the trial.
Treatment Group           n    Number with Reduction   Proportion with Reduction
                               of 3+ Points            of 3+ Points
New Pain Reliever         50           23                      0.46
Standard Pain Reliever    50           11                      0.22

We tested whether there was a significant difference in the proportions of patients reporting a
meaningful reduction (i.e., a reduction of 3 or more scale points) using a Z statistic, as follows.
Step 1. Set up hypotheses and determine level of significance
H0: p1 = p2
H1: p1 ≠ p2 α=0.05
Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2.
Step 2. Select the appropriate test statistic.
We must first check that the sample size is adequate. Specifically, we need to ensure that we have at
least 5 successes and 5 failures in each comparison group, or that:

min(n1 p̂1, n1(1 - p̂1), n2 p̂2, n2(1 - p̂2)) ≥ 5.

In this example, we have min(50(0.46), 50(1-0.46), 50(0.22), 50(1-0.22)) = min(23, 27, 11, 39) = 11.
The sample size is adequate, so the following formula can be used:

Z = (p̂1 - p̂2) / sqrt[ p̂(1 - p̂)(1/n1 + 1/n2) ],

where p̂ is the proportion of successes in the two samples combined.
Step 3. Set up decision rule.
Reject H0 if Z < -1.960 or if Z > 1.960.
Step 4. Compute the test statistic.
We now substitute the sample data into the formula for the test statistic identified in Step 2. We first
compute the overall proportion of successes:

p̂ = (23 + 11)/(50 + 50) = 34/100 = 0.34.

We now substitute to compute the test statistic:

Z = (0.46 - 0.22) / sqrt[ 0.34(1 - 0.34)(1/50 + 1/50) ] = 0.24/0.0947 = 2.53.
Step 5. Conclusion.
We reject H0 because 2.53 > 1.960. We have statistically significant evidence at α=0.05 to show that
there is a difference in the proportions of patients on the new pain reliever reporting a meaningful
reduction (i.e., a reduction of 3 or more scale points) as compared to patients on the standard pain
reliever.
We now conduct the same test using the chi-square test of independence.

Step 1. Set up hypotheses and determine level of significance.


H0: Treatment and outcome (meaningful reduction in pain) are independent
H1: H0 is false. α=0.05
Step 2. Select the appropriate test statistic.
The formula for the test statistic is:

χ² = Σ (O - E)²/E.
The condition for appropriate use of the above test statistic is that each expected frequency is at least 5.
In Step 4 we will compute the expected frequencies and we will ensure that the condition is met.
Step 3. Set up decision rule.
For this test, df=(2-1)(2-1)=1. At a 5% level of significance, the appropriate critical value is 3.84 and
the decision rule is as follows: Reject H0 if χ² > 3.84. (Note that 1.96² = 3.84, where 1.96 was the
critical value used in the Z test for proportions shown above.)
Step 4. Compute the test statistic.
We now compute the expected frequencies using:
Expected Frequency = (Row Total * Column Total)/N.
The computations can be organized in a two-way table. The top number in each cell of the table is the
observed frequency and the bottom number is the expected frequency. The expected frequencies are
shown in parentheses.
                          # with Reduction   # with Reduction
Treatment Group           of 3+ Points       of <3 Points       Total
New Pain Reliever               23                 27             50
                              (17.0)             (33.0)
Standard Pain Reliever          11                 39             50
                              (17.0)             (33.0)
Total                           34                 66            100

A condition for the appropriate use of the test statistic was that each expected frequency is at least 5.
This is true for this sample (the smallest expected frequency is 17.0) and therefore it is appropriate to
use the test statistic.
The test statistic is computed as follows:

χ² = (23 - 17.0)²/17.0 + (27 - 33.0)²/33.0 + (11 - 17.0)²/17.0 + (39 - 33.0)²/33.0
   = 2.12 + 1.09 + 2.12 + 1.09 = 6.42.
(Note that (2.53)² = 6.4, where 2.53 was the value of the Z statistic in the test for proportions shown
above.)
Step 5. Conclusion.
We reject H0 because 6.42 > 3.84. We have statistically significant evidence at α=0.05 to show that H0
is false or that treatment and outcome are not independent (i.e., they are dependent or related). This is
the same conclusion we reached when we conducted the test using the Z test above. With a
dichotomous outcome and two independent comparison groups, Z² = χ²! Again, in statistics there are
often several approaches that can be used to test hypotheses.
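
A sketch of the same 2x2 test in SciPy follows. One caution: chi2_contingency applies a Yates continuity correction to 2x2 tables by default, so correction=False is needed to match the uncorrected hand calculation (and the Z test):

from scipy.stats import chi2_contingency

observed = [[23, 27],   # new pain reliever: reduction 3+, <3
            [11, 39]]   # standard pain reliever

stat, p, df, expected = chi2_contingency(observed, correction=False)
print(round(stat, 2), df, round(p, 4))   # 6.42 1 0.0113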

Tutorial: Pearson's Chi-square Test for Independence


Ling 300, Fall 2008

What is the Chi-square test for?

The Chi-square test is intended to test how likely it is that an observed distribution is due to chance. It
is also called a "goodness of fit" statistic, because it measures how well the observed distribution of
data fits with the distribution that is expected if the variables are independent.

A Chi-square test is designed to analyze categorical data. That means that the data has been counted
and divided into categories. It will not work with continuous measurement data (such as height in
inches). For example, if you want to test whether attending class influences how students perform on an
exam, using test scores (from 0-100) as data would not be appropriate for a Chi-square test. However,
arranging students into the categories "Pass" and "Fail" would. Additionally, the data in a Chi-square
grid should not be in the form of percentages, or anything other than frequency (count) data. Thus, by
dividing a class of 54 into groups according to whether they attended class and whether they passed the
exam, you might construct a data set like this:

Pass Fail
Attended 25 6
Skipped 8 15

IMPORTANT: Be very careful when constructing your categories! A Chi-square test can tell you
information based on how you divide up the data. However, it cannot tell you whether the categories
you constructed are meaningful. For example, if you are working with data on groups of people, you
can divide them into age groups (18-25, 26-40, 41-60...) or income level, but the Chi-square test will
treat the divisions between those categories exactly the same as the divisions between male and
female, or alive and dead! It's up to you to assess whether your categories make sense, and whether the
difference (for example) between age 25 and age 26 is enough to make the categories 18-25 and 26-40
meaningful. This does not mean that categories based on age are a bad idea, but only that you need to
be aware of the control you have over organizing data of that sort.
Another way to describe the Chi-square test is that it tests the null hypothesis that the variables are
independent. The test compares the observed data to a model that distributes the data according to the
expectation that the variables are independent. Wherever the observed data doesn't fit the model, the
likelihood that the variables are dependent becomes stronger, providing evidence against the null
hypothesis!

The following table would represent a possible input to the Chi-square test, using 2 variables to divide
the data: gender and party affiliation. 2x2 grids like this one are often the basic example for the Chi-
square test, but in actuality any size grid would work as well: 3x3, 4x2, etc.

Democrat Republican
Male 20 30
Female 30 20

This shows the basic 2x2 grid. However, this is actually incomplete, in a sense; generally, the data table
should include "marginal" information giving the total counts for each column and row, as well as for
the whole data set:

Democrat Republican Total


Male 20 30 50
Female 30 20 50
Total 50 50 100

We now have a complete data set on the distribution of 100 individuals into categories of gender
(Male/Female) and party affiliation (Democrat/Republican). A Chi-square test would allow you to test
how likely it is that gender and party affiliation are completely independent; or in other words, how
likely it is that the distribution of males and females in each party is due to chance.

So, as implied, the null hypothesis in this case would be that gender and party affiliation are
independent of one another. To test this hypothesis, we need to construct a model which estimates how
the data should be distributed if our hypothesis of independence is correct. This is where the totals we
put in the margins will come in handy: later on, I'll show how you can calculate your estimated data
using the marginals. Meanwhile, however, I've constructed an example which will allow very easy
calculations. Assuming that there's a 50/50 chance of males or females being in either party, we get the
very simple distribution shown below.

Democrat Republican Total


Male 25 25 50
Female 25 25 50
Total 50 50 100

This is the information we would need to calculate the likelihood that gender and party affiliation are
independent. I will discuss the next steps in calculating a Chi square value later, but for now I'll focus
on the background information.

Note: you can assume a different null hypothesis for a Chi-square test. Using the scenario suggested
above, you could test the hypothesis that women are twice as likely as men to register as Democrats,
and a Chi-square test would tell you how likely it is that the observed data reflects that relationship
between your variables. In this case, you would simply run the test using a model of expected data built
under the assumption that this hypothesis is true, and the formula will (as before) test how well that
distribution fits the observed data. I will not discuss this in more detail, but it is important to know that
the null hypothesis is not some abstract "fact" about the test, but rather a choice you make when
calculating your model.

What is the Chi-square test NOT for?

This is also an important question to tackle, of course. Using a statistical test without having a good
idea of what it can and cannot do means not only that you may misuse the test, but also that you won't have a
clear grasp of what your results really mean. Even if you don't understand the detailed mathematics
underlying the test, it is not difficult to have a good comprehension of where it is or isn't appropriate to
use. I mentioned some of this above, when contrasting types of data and so on. This section will
consider other things that the Chi-square test is not meant to do.

First of all, the Chi-square test is only meant to test the probability of independence of a distribution
of data. It will NOT tell you any details about the relationship between the variables. If you want to calculate
how much more likely it is that a woman will be a Democrat than a man, the Chi-square test is not
going to be very helpful. However, once you have determined the probability that the two variables are
related (using the Chi-square test), you can use other methods to explore their interaction in more
detail. For a fairly simple way of discussing the relationship between variables, I recommend the odds
ratio.

Some further considerations are necessary when selecting or organizing your data to run a Chi-square
test. The categories you consider must be mutually exclusive; participation in one category should not
entail or allow participation in another. In other words, the data from all of your cells should add up to
the total count, and no item should be counted twice.

You should also never exclude some part of your data set. If your study examined males and females
registered as Republican, Democrat, and Independent, then excluding one category from the grid might
conceal critical data about the distribution of your data.

It is also important that you have enough data to perform a viable Chi-square test. If the estimated
data in any given cell is below 5, then there is not enough data to perform a Chi-square test. In a
case like this, you should research some other techniques for smaller data sets: for example, there is a
correction for the Chi-square test to use with small data sets, called the Yates correction. There are also
tests written specifically for smaller data sets, like the Fisher Exact Test.
Degrees of Freedom


The degrees of freedom (often abbreviated as df or d) tell you how many numbers in your grid are
actually independent. For a Chi-square grid, the degrees of freedom can be said to be the number of
cells you need to fill in before, given the totals in the margins, you can fill in the rest of the grid using a
formula. You can see the idea intended; if you have a given set of totals for each column and row, then
you don't have unlimited freedom when filling in the cells. You can only fill in a certain amount of cells
with "random" numbers before the rest just becomes dependent on making sure the cells add up to the
totals. Thus, the number of cells that can be filled in independently tell us something about the actual
amount of variation permitted by the data set.

The degrees of freedom for a Chi-square grid are equal to the number of rows minus one times the
number of columns minus one: that is, (R-1)*(C-1). In our simple 2x2 grid, the degrees of
freedom are therefore (2-1)*(2-1), or 1! Note that once you have put a number into one cell of a
2x2 grid, the totals determine the rest for you.

Degrees of freedom are important in a Chi-square test because they factor into your calculations of the
probability of independence. Once you calculate a Chi-square value, you use this number and the
degrees of freedom to decide the probability, or p-value, of independence. This is the crucial result of a
Chi-square test, which means that knowing the degrees of freedom is crucial!

Building a Model of Expected Data

Earlier, I showed a simple example of observed vs. expected data, using an artificial data set on the
party affiliations of males and females. I show them again below.
Observed

Democrat Republican Total


Male 20 30 50
Female 30 20 50
Total 50 50 100

Expected (assuming independence)

Democrat Republican Total


Male 25 25 50
Female 25 25 50
Total 50 50 100
We will focus on models based on the null hypothesis that the distribution of data is due to chance --
that is, our models will reflect the expected distribution of data when that hypothesis is assumed to be
true. But as I mentioned before, the ease of dividing up this data is due to the simplicity of the
distribution I chose. How do we calculate the expected distribution of a more complicated data set?

Pass Fail Total


Attended 25 6 31
Skipped 8 15 23
Total 33 21 54

Here is the grid for an earlier example I discussed, showing how students who attended or skipped class
performed on an exam. The numbers for this example are not so clean! Fortunately, we have a formula
to guide us.

The estimated value for each cell is the total for its row multiplied by the total for its column, then
divided by the total for the table: that is, (RowTotal*ColTotal)/GridTotal. Thus, in our table above,
the expected count in cell (1,1) is (33*31)/54, or 18.94. Don't be afraid of decimals for your expected
counts; they're meant to be estimates!

I'll show a different method for notating observed versus expected counts below: the expected
frequency appears in parentheses below the observed frequency. This allows you to show all your data
in one clean table.

             Pass       Fail      Total
Attended       25          6        31
            (18.94)    (12.06)
Skipped         8         15        23
            (14.06)     (8.94)
Total          33         21        54

We have now calculated the distribution of our totals based on the assumption that attending class will
have absolutely no effect on your test performance. Let's all hope we can prove this null hypothesis
wrong.

The Chi-square Formula

It's finally time to put our data to the test. You can find many programs that will calculate a Chi-square
value for you, and later I will show you how to do it in Excel. For now, however, let's start by trying to
understand the formula itself:

χ² = Σi (Oi - Ei)² / Ei

What does this mean?? Actually, it's a fairly simple relationship. The variables in this formula are not
simply symbols, but actual concepts that we've been discussing all along. O stands for the Observed
frequency. E stands for the Expected frequency. You subtract the expected count from the observed
count to find the difference between the two (also called the "residual"). You calculate the square of
that number to get rid of positive and negative values (because the squares of 5 and -5 are, of course,
both 25). Then, you divide the result by the expected frequency to normalize bigger and smaller counts
(because we don't want a formula that will give us a bigger Chi-square value just because you're
working with a bigger set of data). The huge sigma sitting in front of all that is asking for the sum of
every i for which you calculate this relationship - in other words, you calculate this for each cell in the
table, then add it all together. And that's it!

Using this formula, we find that the Chi-square value for our gender/party example is ((20-25)^2/25) +
((30-25)^2/25) + ((30-25)^2/25) + ((20-25)^2/25), or (25/25) + (25/25) + (25/25) + (25/25), or 1 + 1 +
1 + 1, which comes out to 4.

Okay, but what does THAT mean?? In a sense, not much yet. The Chi-square value serves as input
for the more interesting piece of information: the p-value. Calculating a p-value is less intuitive than a
Chi-square value, so I will not discuss the actual formula here, but simply tools to use in calculating
this data. We will need the following to get a p-value for our data:

(1) The Chi-square value.


(2) The degrees of freedom.

Once you have this information, there are a couple of methods you can use to get your p-value. For
example, charts like this one or even Javascript programs like the one on this site will take the Chi-
square value and degrees of freedom as input, and simply return a p-value. In the chart, you choose
your degrees of freedom (df) value on the left, follow along its row to the closest number to your Chi-
square value, and then check the corresponding number in the top row to see the approximate
probability ("Significance Level") for that value. The Javascript program is more direct, as you simply
input your numbers and click "calculate." Later, I will also show you how to make Excel do the work
for you.

So, for our example, we take a Chi-square value of 4 and a df of 1, which gives us a p-value of 0.0455.
This is often loosely described as a 4.6% likelihood that the null hypothesis is correct; more precisely, if the
distribution of this data is due entirely to chance, then you have a 4.6% chance of finding a
discrepancy between the observed and expected distributions that is at least this extreme.
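
If you would rather not squint at a chart, a couple of lines of Python (assuming SciPy is installed) do the same lookup; chi2.sf gives the upper-tail probability:

from scipy.stats import chi2

x2 = 4.0   # Chi-square value from the gender/party example
df = 1
print(round(chi2.sf(x2, df), 4))   # 0.0455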

By convention, the "cutoff" point for a p-value is 0.05; anything below that can be considered a very
low probability, while anything above it is considered a reasonable probability. However, that does not
mean that we should take our 0.046 value and say, "Eureka! They're dependent!" Actually, 0.046 is so
close to 0.05 that there's really not much we can say from this example; it is teetering right on the brink
of chance. This is a very good thing to realize, because from this we discover that although the
distribution seems to have fairly clear tendencies in certain directions when you just look at it, the data
shows that it's not so unlikely that this would show up just by chance.
So, let's try our other data set, and see if attending class really does affect your exam performance.

             Pass       Fail      Total
Attended       25          6        31
            (18.94)    (12.06)
Skipped         8         15        23
            (14.06)     (8.94)
Total          33         21        54

I'm going to skip the specific formula this time, and use the javascript program on this site to do the
calculation for me. It returns a value of 11.686. We still only have 1 degree of freedom, so our p-value
is calculated as 0.0006. In other words, if this distribution were due to chance, we would see a
discrepancy at least this extreme only 0.06% of the time! A value of 0.0006 is a much lower probability
than a value of 0.05. We can thus reject the null hypothesis with considerable confidence; attending
class and passing the exam appear to be dependent on one another. (Of course, if you are testing a null hypothesis that you are
expecting to be correct, then you would want a very high p-value. The reason we want a low one in
this case is because we are trying to disprove the hypothesis that the variables are independent.)
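
The same shortcut runs the whole attendance test in one call (a sketch; as before, correction=False matches the uncorrected Pearson statistic used in this tutorial):

from scipy.stats import chi2_contingency

observed = [[25, 6],    # attended: pass, fail
            [8, 15]]    # skipped

stat, p, df, expected = chi2_contingency(observed, correction=False)
print(round(stat, 3), round(p, 4))   # 11.686 0.0006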

This is all you need to know to calculate and understand Pearson's Chi-square test for independence.
It's a widely popular test because once you know the formula, it can all be done on a pocket calculator,
and then compared to simple charts to give you a probability value. You can also use this spreadsheet to
play around with all the steps of the test (spreadsheet created by Bill Labov, with some small additions
by Joel Wallenberg). The Chi-square test will prove to be a handy tool for analyzing all kinds of
relationships; once you know the basics for a 2x2 grid, expanding to a larger set of values is easy. Good
luck!
