
Two-Way Tables.

ANOVA

LEARNING GOAL
Interpret and carry out hypothesis tests for
independence of variables with data organized in two-way tables.
Interpret and carry out hypothesis tests using the
method of one-way analysis of variance.

Copyright 2009 Pearson Education, Inc.

Identifying the Hypotheses with Two Variables
Suppose that administrators at a college are concerned that
there may be bias in the way degrees are awarded to men and
women in different departments. They therefore collect data
on the number of degrees awarded to men and women in
different departments.
These data concern two variables: major and gender.
To test whether there is bias in the awarding of degrees, the
administrators ask the following question:
Do the data suggest a relationship between
the two variables?

Slide 10.2- 2

Null and Alternative Hypotheses with Two Variables
The null hypothesis, H0, states that the variables are
independent (there is no relationship between them).
The alternative hypothesis, Ha, states that there is a
relationship between the two variables.


Slide 10.2- 3

Displaying the Data in Two-Way Tables
With the hypotheses identified, the next step in the hypothesis
test is to examine the data set to see if it supports rejecting or
not rejecting the null hypothesis.
We can display the data efficiently with a two-way table
(also called a contingency table), so named because it
displays two variables.


Slide 10.2- 4

Note: One variable is displayed along the columns and the other along the
rows. Here, there are only two rows because gender can be only either
male or female. There are many columns for the majors, with just the first
few shown here.


Slide 10.2- 5

Two-Way Tables
A two-way table shows the relationship between two
variables by listing one variable in the rows and the
other variable in the columns.
The entries in the table's cells are called frequencies (or
counts).


Slide 10.2- 6

Here, to simplify the calculations, let's focus on just two
majors, biology and business.
Does a person's gender influence whether he or she chooses
to major in biology or business?
Table 10.3 shows the biology and business data extracted
from Table 10.2, along with row and column totals.


Slide 10.2- 7

EXAMPLE 1 A Two-Way Table for a Survey


Table 10.4 shows the results of a pre-election survey on gun
control. Use the table to answer the following questions.

a. Identify the two variables displayed in the table.
b. What percentage of Democrats favored stricter laws?
c. What percentage of all voters favored stricter laws?
d. What percentage of those who opposed stricter laws are
Republicans?

Slide 10.2- 8

EXAMPLE 1 A Two-Way Table for a Survey

Solution: Note that the total of the row totals and the total of the
column totals are equal.
a. The rows show the variable survey response, which can be
either favor stricter laws, oppose stricter laws, or
undecided. The columns show the variable party affiliation,
which in this table can be either Democrat or Republican.
b. Of the 622 Democrats polled, 456 favored stricter laws. The
percentage of Democrats favoring stricter laws is 456/622 =
0.733, or 73.3%.

Slide 10.2- 9

EXAMPLE 1 A Two-Way Table for a Survey

Solution: (cont.)
c. Of the 1,421 people polled, 788 favored stricter laws. The
percentage of all respondents favoring stricter laws is 788/1,421 =
0.555, or 55.5%.
d. Of the 569 people polled who opposed stricter laws, 446 are
Republicans. Since 446/569 = 0.784 (rounded), 78.4% of those opposed to
stricter laws are Republicans.
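
Each percentage in Example 1 is a count divided by the appropriate total. The sketch below repeats that arithmetic using only the counts quoted in the solution. The helper name `percent` is illustrative and does not come from the text.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Counts quoted in the solution to Example 1
democrats_favor, democrats_total = 456, 622
all_favor, all_total = 788, 1421
republicans_oppose, oppose_total = 446, 569

pct_b = percent(democrats_favor, democrats_total)   # part (b)
pct_c = percent(all_favor, all_total)               # part (c)
# part (d): 446/569 = 0.7838..., about 78.4% to one decimal place
pct_d = percent(republicans_oppose, oppose_total)

print(pct_b, pct_c, pct_d)
```

Note that the denominator changes with the question: all Democrats in (b), all respondents in (c), and all those opposed in (d).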


Slide 10.2- 10

Carrying Out the Hypothesis Test


The basic idea of the hypothesis test is the same as always:
to decide whether the data provide enough evidence to reject
the null hypothesis.
For the case of a test with a two-way table, the specific steps
are as follows:
As always, we start by assuming that the null hypothesis is
true, meaning there is no relationship between the two
variables. In that case, we would expect the frequencies (the
numbers in the individual cells) in the two-way table to be
those that would occur by pure chance.
Our first step, then, is to find a way to calculate the
frequencies we would expect by chance.

Slide 10.2- 11

Carrying Out the Hypothesis Test (cont.)
We next compare the frequencies expected by chance to the
observed frequencies from the sample, which are the
frequencies displayed in the table.
We do this by calculating something called the chi-square
statistic (pronounced ky-square) for the sample data,
which here plays a role similar to the role of the standard
score z in the hypothesis tests we carried out in Chapter 9 or
the role of the t test statistic in Section 10.1.


Slide 10.2- 12

Carrying Out the Hypothesis Test (cont.)
Recall that for the hypothesis tests in Chapter 9, we made
the decision about whether to reject or not reject the null
hypothesis by comparing the computed value of the
standard score for the sample data to critical values given in
tables; similarly, in Section 10.1 we compared computed
values of the t test statistic to values found in a table.
Here, we do the same thing, except rather than using critical
values for the standard score or t, we use critical values for
the chi-square statistic.


Slide 10.2- 13

Carrying Out the Hypothesis Test


Finding the Frequencies Expected by Chance
As an example of the process, let's work through these steps
with the data in Table 10.3.
Our first step is to find the frequencies we would expect in
Table 10.3 if there were no relationship between the variables,
which is equivalent to the frequency expected by chance alone.
Let's start by finding the frequency we would expect by chance
for male business majors.


Slide 10.2- 14

To do this, we first calculate the fraction of all students in the
sample who received business degrees:
(total business degrees) / (total degrees) = 197/250
As discussed in Chapter 6, we can interpret this result as a
relative frequency probability. That is, if we select a student at
random from the sample, the probability that he or she earned a
business degree is 197/250.
Using the notation for probability, we write
P(business) = 197/250


Slide 10.2- 15

Similarly, if we select a student at random from the sample, the
probability that this student is a man is
P(man) = 108/250
Recall from Section 6.5 that if two events A and B are
independent (the outcome of one does not affect the
probability of the other), then
P(A and B) = P(A) × P(B)
We can apply this rule to determine the probability that a student
is both a man and a business major (assuming the null hypothesis
that gender is independent of major):
P(man and business) = P(man) × P(business) = (108/250) × (197/250) ≈ 0.3404

Slide 10.2- 16

This probability is equivalent to the fraction of the total students
whom we expect to be male business majors if there is no
relationship between gender and major.
We therefore multiply this probability by the total number of
students in the sample (250) to find the number (or frequency) of
male business majors that we expect by chance:
(108/250) × (197/250) × 250 = 85.104
We call this value the expected frequency for the number of
male business majors.
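
The calculation above can be checked with exact fractions. This is a minimal sketch that uses only the totals quoted in the text (250 students, 108 men, 197 business degrees). The variable names are illustrative.

```python
from fractions import Fraction

total = 250      # all students in the sample
men = 108        # total number of men
business = 197   # total number of business degrees

p_man = Fraction(men, total)
p_business = Fraction(business, total)

# Multiplication rule for independent events (assumes the null hypothesis)
p_man_and_business = p_man * p_business

# Expected frequency = cell probability * sample size
expected_men_business = p_man_and_business * total

print(float(p_man_and_business))     # about 0.3404
print(float(expected_men_business))  # 85.104
```

Because one factor of 250 cancels, the expected frequency reduces to (108 × 197) / 250, that is, the two totals multiplied together and divided by the sample size.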

Slide 10.2- 17

Definition
The expected frequencies in a two-way table
are the frequencies we would expect by chance if
there were no relationship between the row and
column variables.


Slide 10.2- 18

EXAMPLE 2 Expected Frequencies for Table 10.3


Find the frequencies expected by chance for female business
majors in Table 10.3.
Solution:
P(woman and business) = P(woman) × P(business) = (142/250) × (197/250) ≈ 0.4476
We now find the expected frequency by multiplying the cell
probability by the total number of students (250):
250 × (142/250) × (197/250) = 111.896

Slide 10.2- 19

Solution: (cont.)
The calculations for men biology majors and women biology majors
are shown below.
Expected frequency of men biology majors =
250 × (108/250) × (53/250) = 22.896
Expected frequency of women biology majors =
250 × (142/250) × (53/250) = 30.104


Slide 10.2- 20

Table 10.5 repeats the data from Table 10.3, but this time it also
shows the expected frequency for each cell (in parentheses).
To check that we did our work correctly, we confirm that the total
of all four expected frequencies equals the total of 250 students
in the sample:
85.104 + 111.896 + 22.896 + 30.104 = 250.000
Notice also that the values in the Total row and Total column
are the same for both the observed frequencies and the frequencies
expected by chance. This should always be the case, providing
another good check on your work.
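
Both checks can be automated. The sketch below computes all four expected frequencies from the totals quoted in the text. Placing gender in the rows and major in the columns is an assumption about the layout of Table 10.3, and the function name is illustrative.

```python
def expected_frequencies(row_totals, col_totals):
    """Expected count per cell: row total * column total / grand total."""
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Totals quoted in the text (layout assumed: rows = gender, columns = major)
row_totals = [108, 142]   # men, women
col_totals = [53, 197]    # biology, business

expected = expected_frequencies(row_totals, col_totals)

# Check 1: the four expected frequencies add up to the sample size
grand = sum(sum(row) for row in expected)

# Check 2: expected row and column totals match the observed totals
expected_row_totals = [sum(row) for row in expected]
expected_col_totals = [sum(col) for col in zip(*expected)]

print(round(grand, 3))
print([round(t, 3) for t in expected_row_totals])
print([round(t, 3) for t in expected_col_totals])
```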

Slide 10.2- 21

Carrying Out the Hypothesis Test


Finding the Frequencies Expected by Chance
Computing the Chi-Square Statistic


Slide 10.2- 22

Finding the Chi-Square Statistic


Step 1. For each cell in the two-way table, identify O as
the observed frequency and E as the expected
frequency if the null hypothesis is true (no
relationship between the variables).
Step 2. Compute the value (O - E)²/E for each cell.
Step 3. Sum the values from step 2 to get the chi-square
statistic:
χ² = sum of all values (O - E)²/E
The larger the value of χ², the greater the average
difference between the observed and expected
frequencies in the cells.
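
The three steps translate directly into a short function. This is a sketch only: the function name is illustrative, and the numbers used to exercise it are toy values, not data from any table in the text.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over all cells, given flat lists of O and E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Toy numbers for illustration only (NOT from the text)
toy_observed = [10, 20]
toy_expected = [15, 15]

# Each cell contributes (5)^2 / 15, so the sum is 25/15 + 25/15
toy_chi2 = chi_square(toy_observed, toy_expected)
print(toy_chi2)
```

When every observed frequency equals its expected frequency, each term is zero and the statistic is 0, the smallest value it can take.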

Slide 10.2- 23

To do this calculation in an organized way, it's best to make a
table such as Table 10.6, with a row for each of the cells in
the original two-way table. As shown in the lower right cell,
the result for the gender/major data is χ² = 0.350.


Slide 10.2- 24

Carrying Out the Hypothesis Test


Finding the Frequencies Expected by Chance
Computing the Chi-Square Statistic
Making the Decision
The value of χ² gives us a way of testing the null hypothesis of
no relationship between the variables.
If χ² is small, then the average difference between the observed
and expected frequencies is small and we should not reject the
null hypothesis.
If χ² is large, then the average difference between the observed
and expected frequencies is large and we have reason to reject
the null hypothesis of independence.

Slide 10.2- 25

To quantify what we mean by "small" or "large," we compare
the χ² value found for the sample data to critical values:

If the calculated value of χ² is less than the critical value, the
differences between the observed and expected values are
small and there is not enough evidence to reject the null
hypothesis.

If the calculated value of χ² is greater than or equal to the
critical value, then there is enough evidence in the sample to
reject the null hypothesis (at the given level of significance).


Slide 10.2- 26

Table 10.7 gives the critical values of χ² for two significance
levels, 0.05 and 0.01. Notice that the critical values differ for
different table sizes, so you must make sure you read the
critical values for a data set from the appropriate table size row.
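
Table 10.7 is not reproduced in this text. For the 2 × 2 row only, the two critical values used in this section (3.841 and 6.635) can be cross-checked with the Python standard library. The sketch rests on an assumption not stated in the slides: the chi-square statistic for a 2 × 2 table has one degree of freedom, so its critical value is the square of the two-tailed standard normal critical score. It does not apply to larger tables.

```python
from statistics import NormalDist

def critical_value_2x2(alpha):
    """Critical chi-square value for a 2 x 2 table at significance level alpha.

    Assumes one degree of freedom, in which case the critical value is the
    square of the two-tailed standard normal critical score.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2

print(round(critical_value_2x2(0.05), 3))
print(round(critical_value_2x2(0.01), 3))
```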


Slide 10.2- 27

For the gender/major data we have been studying in Tables 10.3
(slide 7) and 10.5 (slide 22), there are two rows and two
columns (do not count the total rows or columns), which
means a table size of 2 × 2.
Looking in the first row of Table 10.7 (previous slide), we see
that the critical value of χ² for significance at the 0.05 level is
3.841.
The chi-square value that we found for the gender/major data is
χ² = 0.350; because this is less than the critical value of 3.841,
we cannot reject the null hypothesis.
Of course, failing to reject the null hypothesis does not prove
that major and gender are independent. It simply means that
we do not have enough evidence to justify rejecting the null
hypothesis of independence.
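
The decision rule is a single comparison. A minimal sketch follows, using the two numbers quoted above for the gender/major data. The function name and the wording of the returned strings are illustrative.

```python
def decide(chi2, critical_value):
    """Reject H0 when chi2 is greater than or equal to the critical value."""
    if chi2 >= critical_value:
        return "reject H0"
    return "do not reject H0"

# Values quoted in the text for the gender/major data at the 0.05 level
decision = decide(0.350, 3.841)
print(decision)  # do not reject H0
```

The result says "do not reject" rather than "accept": the test can fail to find evidence against independence, but it cannot establish independence.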

Slide 10.2- 28

EXAMPLE 3 Vitamin C Test


A (hypothetical) study seeks to determine whether vitamin
C has an effect in preventing colds. Among a sample of
220 people, 105 randomly selected people took a vitamin
C pill daily for a period of 10 weeks and the remaining 115
people took a placebo daily for 10 weeks. At the end of 10
weeks, the number of people who got colds was recorded.
Table 10.8 summarizes the results. Determine whether
there is a relationship between taking vitamin C and
getting colds.


Slide 10.2- 29

Solution: We begin by stating the null and alternative hypotheses.


H0 (null hypothesis): There is no relationship between taking
vitamin C and getting colds; that is, vitamin C has no more
effect on colds than the placebo.
Ha (alternative hypothesis): There is a relationship between taking
vitamin C and getting colds; that is, the numbers of colds in
the two groups are not what we would expect if vitamin C and
the placebo were equally effective (or equally ineffective).
As always, we assume that the null hypothesis is true and
calculate the expected frequency for each cell in the table.
Noting that the sample size is 220 and proceeding as in Example
2, we find the following expected frequencies (next slide):


Slide 10.2- 30

Solution: (cont.)
Vitamin C and cold: 220 × (105/220) × (120/220) = 57.273
Vitamin C and no cold: 220 × (105/220) × (100/220) = 47.727
Placebo and cold: 220 × (115/220) × (120/220) = 62.727
Placebo and no cold: 220 × (115/220) × (100/220) = 52.273

Table 10.9 shows the two-way table with the expected
frequencies in parentheses.
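
These four values follow from the group sizes (105 and 115), the outcome totals (120 and 100), and the sample size (220). A short sketch that reproduces them is given below. The dictionary keys are illustrative labels.

```python
total = 220
group_totals = {"vitamin C": 105, "placebo": 115}
outcome_totals = {"cold": 120, "no cold": 100}

# Expected count per cell = group total * outcome total / sample size
expected = {
    (group, outcome): g * o / total
    for group, g in group_totals.items()
    for outcome, o in outcome_totals.items()
}

for cell, value in expected.items():
    print(cell, round(value, 3))
```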


Slide 10.2- 31

Solution: (cont.)
We now compute the chi-square statistic for the sample data.
Table 10.10 shows how we organize the work; you should
confirm all the calculations shown.


Slide 10.2- 32

To make the decision about whether to reject the null hypothesis,
we compare the value of chi-square for the sample data, χ² =
11.069, to the critical values from Table 10.7.
We look in the row for a table size of 2 × 2, because the original
data in Table 10.8 have two rows and two columns (not counting
the total values). We see that the critical value of χ² for
significance at the 0.01 level is 6.635.
Because our sample value of χ² = 11.069 is greater than this
critical value, we reject the null hypothesis and conclude that
there is a relationship between vitamin C and colds.
That is, based on the data from this sample, there is reason to
believe that vitamin C does have more effect on colds than a
placebo.
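
The whole test can be sketched end to end, with one important caveat: Tables 10.8 and 10.10 are not reproduced in this text, so the observed counts below are an ASSUMPTION. They are the whole-number counts consistent with the group sizes (105 and 115), the outcome totals (120 and 100), and the reported χ² of about 11.069. Check them against Table 10.8 before relying on them.

```python
# ASSUMED observed counts (Table 10.8 is not reproduced here); chosen to match
# the totals quoted in the text and the reported chi-square value.
observed = {
    ("vitamin C", "cold"): 45,
    ("vitamin C", "no cold"): 60,
    ("placebo", "cold"): 75,
    ("placebo", "no cold"): 40,
}
groups = ["vitamin C", "placebo"]
outcomes = ["cold", "no cold"]

total = sum(observed.values())
group_totals = {g: sum(observed[(g, o)] for o in outcomes) for g in groups}
outcome_totals = {o: sum(observed[(g, o)] for g in groups) for o in outcomes}

# Chi-square statistic: sum of (O - E)^2 / E over the four cells
chi2 = 0.0
for g in groups:
    for o in outcomes:
        e = group_totals[g] * outcome_totals[o] / total
        chi2 += (observed[(g, o)] - e) ** 2 / e

critical_value = 6.635   # 2 x 2 table, 0.01 level, as quoted in the text
reject = chi2 >= critical_value
print(round(chi2, 2), reject)
```

Computed without intermediate rounding, the statistic comes out near 11.068. The small difference from 11.069 is consistent with rounding each cell's contribution to three decimal places before summing.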

Slide 10.2- 33
