Chi Square Test

CHI-SQUARE TEST
by Dr. M. Supriya
Moderator: Dr. B. Aruna, M.D. (H).
PLAN OF STUDY
NEED FOR STUDY
INTRODUCTION
APPLICATION
REQUIREMENTS
CHI-SQUARE DISTRIBUTION
CHI-SQUARE TEST
EXAMPLES
CONCLUSION
INTRODUCTION
It is a non-parametric test.
It is useful for assessing the association between
discrete (categorical) variables.
DEVELOPED BY
Karl Pearson
APPLICATIONS
Proportion
Association
Goodness of fit
REASONS FOR CALLING IT A DISTRIBUTION-FREE STATISTIC
Rigid assumptions are not necessary regarding the type of population distribution.
Calculation of the mean and SD is not needed; the distribution depends only on the degrees of freedom (Df).
It is simple to understand.
It can also be used with simple rankings of values.
It can be used where the data are not exact.
It can be used with small samples (up to about 50), provided the expected frequencies are not too small.
Chi-Square Test Requirements

Quantitative data.
One or more categories.
Independent observations.
Adequate sample size (at least 10).
Simple random sample.
Data in frequency form.
All observations must be used.
Chi-Square Distribution
The distribution of the chi-square statistic is called the chi-square distribution.
The Chi-Square Statistic

Suppose we select a random sample of size n from a normal population having a standard deviation equal to σ. We find that the standard deviation in our sample is equal to s. Given these data, we can define a statistic, called chi-square, using the following equation:
$$\chi^2 = \frac{(n - 1)\, s^2}{\sigma^2}$$
Properties of the chi-square distribution:
The mean of the distribution is equal to the number of degrees of freedom: $\mu = v$.
The variance is equal to two times the number of degrees of freedom: $\sigma^2 = 2v$.
When the degrees of freedom are greater than or equal to 2, the maximum value of the density Y occurs when $\chi^2 = v - 2$.
As the degrees of freedom increase, the chi-square curve approaches a normal distribution.
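The properties above can be checked numerically. The following is a minimal sketch that is not part of the original slides. It assumes Python 3 with only the standard library. It writes the chi-square density from its textbook formula and approximates the mean, the variance and the mode (the peak of Y) on a fine grid. The names `chi2_pdf` and `summarize` are hypothetical.

```python
import math


def chi2_pdf(x: float, df: int) -> float:
    """Chi-square density with df degrees of freedom (textbook formula)."""
    if x <= 0:
        return 0.0
    k = df / 2
    # Work in logs so that larger df values do not overflow.
    log_pdf = (k - 1) * math.log(x) - x / 2 - k * math.log(2) - math.lgamma(k)
    return math.exp(log_pdf)


def summarize(df: int, upper: float = 150.0, steps: int = 50_000):
    """Approximate (mean, variance, mode) with a midpoint rule on (0, upper)."""
    h = upper / steps
    total = m1 = m2 = 0.0
    mode, best = 0.0, -1.0
    for i in range(steps):
        x = (i + 0.5) * h
        p = chi2_pdf(x, df)
        total += p * h
        m1 += x * p * h
        m2 += x * x * p * h
        if p > best:  # remember the grid point where the density Y is highest
            mode, best = x, p
    mean = m1 / total
    variance = m2 / total - mean ** 2
    return mean, variance, mode


for v in (2, 4, 6, 10):
    mean, variance, mode = summarize(v)
    print(f"v={v}: mean={mean:.3f}, variance={variance:.3f}, mode={mode:.2f}")
```

For each v, the printed mean should be close to v, the variance close to 2v and the mode close to v - 2. The small residual differences come from the grid approximation, not from the properties themselves.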
PROBLEM
The Acme Battery Company has developed a
new cell phone battery. On average, the
battery lasts 60 minutes on a single
charge. The standard deviation is 4
minutes. Suppose the manufacturing
department runs a quality control test.
They randomly select 7 batteries. The
standard deviation of the selected batteries
is 6 minutes. What would be the chi-square
statistic represented by this test?
SOLUTION
The standard deviation of the population is 4 minutes.
The standard deviation of the sample is 6 minutes.
The number of sample observations is 7.
To compute the chi-square statistic, plug these values into the chi-square equation:
$$\chi^2 = \frac{(n - 1)\, s^2}{\sigma^2} = \frac{(7 - 1) \cdot 6^2}{4^2} = \frac{216}{16} = 13.5$$
where χ² is the chi-square statistic, n is the sample size, s is the standard deviation of the sample, and σ is the standard deviation of the population.
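Here is the same arithmetic as a short function. This is a sketch assuming Python 3; `variance_chi_square` is a hypothetical helper name, not part of any library.

```python
def variance_chi_square(n: int, sample_sd: float, population_sd: float) -> float:
    """Chi-square statistic (n - 1) * s**2 / sigma**2 for a sample from a normal population."""
    if n < 2:
        raise ValueError("need at least two observations")
    return (n - 1) * sample_sd ** 2 / population_sd ** 2


# Acme battery example: n = 7 batteries, s = 6 minutes, sigma = 4 minutes.
print(variance_chi_square(7, 6, 4))  # 13.5
```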
CALCULATION OF CHI-SQUARE
The calculation consists of four steps:
(1) state the hypotheses
(2) formulate an analysis plan
(3) analyze sample data
(4) interpret results.
State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false, and vice versa.
For a chi-square goodness-of-fit test, the hypotheses take the following form.
H0: The data are consistent with a specified distribution.
Ha: The data are not consistent with a specified distribution.
Typically, the null hypothesis specifies the proportion of observations at each level of the categorical variable. The alternative hypothesis is that at least one of the specified proportions is not true.
"The null hypothesis in a chi-square goodness-of-fit test states that the sample of observed frequencies supports the claim about the expected frequencies. The alternative hypothesis states that there is no support for the claim pertaining to the expected frequencies."
This deviates from our normal approach of placing our expected (preferred) outcome in the alternative hypothesis. Just be aware of this.
Formulate an Analysis Plan
The analysis plan describes how to use sample data to accept or reject the null hypothesis.
COMPONENTS
Significance level
Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10, but any value between 0 and 1 can be used.
Test method
Use the chi-square test to determine whether observed sample frequencies differ significantly from the expected frequencies specified in the null hypothesis.
Analyze Sample Data
DEGREES OF FREEDOM
For a goodness-of-fit test, the degrees of freedom (DF) are equal to the number of levels (k) of the categorical variable minus 1:
DF = k - 1
For a test involving two categorical variables (a contingency table), the degrees of freedom are equal to:
DF = (r - 1) * (c - 1)
where r is the number of levels for one categorical variable, and c is the number of levels for the other categorical variable.
Expected frequency counts

The expected frequency count at each level of the categorical variable is equal to the sample size times the hypothesized proportion from the null hypothesis:
$$E_i = n\, p_i$$
where $E_i$ is the expected frequency count for the i-th level of the categorical variable, $n$ is the total sample size, and $p_i$ is the hypothesized proportion of observations in level i.
Test statistic
The test statistic is a chi-square random variable (χ²) defined by the following equation:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency count for the i-th level of the categorical variable, and $E_i$ is the expected frequency count for the i-th level of the categorical variable.
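The last three steps (degrees of freedom, expected counts and the test statistic) are sketched below in Python 3 using only the standard library. All function names are hypothetical helpers chosen for illustration. The example values are the baseball-card counts worked through later in these slides.

```python
from typing import List, Sequence


def degrees_of_freedom_gof(k: int) -> int:
    """Goodness of fit with k levels of one categorical variable: DF = k - 1."""
    return k - 1


def degrees_of_freedom_table(r: int, c: int) -> int:
    """Two categorical variables with r and c levels: DF = (r - 1) * (c - 1)."""
    return (r - 1) * (c - 1)


def expected_counts(n: int, proportions: Sequence[float]) -> List[float]:
    """E_i = n * p_i, using the proportions stated in the null hypothesis."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [n * p for p in proportions]


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum over all levels of (O_i - E_i)**2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = expected_counts(100, [0.30, 0.60, 0.10])
statistic = chi_square_statistic([50, 45, 5], expected)
print(degrees_of_freedom_gof(3), expected, round(statistic, 2))  # 2 [30.0, 60.0, 10.0] 19.58
```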
P-value
The P-value is the probability of observing a sample statistic at least as extreme as the test statistic, assuming the null hypothesis is true. Since the test statistic is a chi-square, use the Chi-Square Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.
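The slides read the P-value from a calculator. As an illustration only, the upper-tail probability can also be computed directly. The chi-square CDF with DF degrees of freedom, evaluated at x, equals the regularized lower incomplete gamma function P(DF/2, x/2). The sketch below assumes Python 3 and sums a plain power series; the helper names are hypothetical. For very large statistics, a library routine such as SciPy's `scipy.stats.chi2.sf` is the safer choice.

```python
import math


def regularized_lower_gamma(a: float, x: float, tol: float = 1e-15, max_terms: int = 10_000) -> float:
    """P(a, x) = gamma(a, x) / Gamma(a), summed from its power series.

    The series converges for every x > 0 but needs roughly x terms and can
    overflow for very large x, so it is only meant for moderate statistics.
    """
    if x <= 0:
        return 0.0
    term = 1.0 / a  # n = 0 term of sum x**n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_p_value(statistic: float, df: int) -> float:
    """Upper-tail probability P(X >= statistic) for X ~ chi-square(df)."""
    p_lower = regularized_lower_gamma(df / 2, statistic / 2)
    return max(0.0, 1.0 - p_lower)  # clamp tiny negative rounding errors


for stat, df in [(3.418, 1), (19.58, 2), (18.564, 4)]:
    print(f"chi-square = {stat}, DF = {df}: P = {chi2_p_value(stat, df):.4f}")
```

For even DF the result can be cross-checked against a closed form. For example, with DF = 2 the upper tail is exactly e^(-x/2).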
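PROBABILITY LEVEL (alpha): critical values of chi-square

Df | 0.50  | 0.10  | 0.05   | 0.02   | 0.01   | 0.001
---+-------+-------+--------+--------+--------+-------
1  | 0.455 | 2.706 | 3.841  | 5.412  | 6.635  | 10.827
2  | 1.386 | 4.605 | 5.991  | 7.824  | 9.210  | 13.815
3  | 2.366 | 6.251 | 7.815  | 9.837  | 11.345 | 16.268
4  | 3.357 | 7.779 | 9.488  | 11.668 | 13.277 | 18.465
5  | 4.351 | 9.236 | 11.070 | 13.388 | 15.086 | 20.517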
Interpret Results
Reject H0 if calculated χ² > critical χ².
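The decision rule can be written as a lookup against the critical values in the table above. This is a sketch assuming Python 3. `CRITICAL` and `reject_null` are hypothetical names, and the numbers are copied from the slide's table, so only DF 1 to 5 and the six tabulated alpha levels are supported.

```python
# Critical values copied from the table above; CRITICAL[df][alpha] -> value.
ALPHAS = (0.50, 0.10, 0.05, 0.02, 0.01, 0.001)
TABLE = {
    1: (0.455, 2.706, 3.841, 5.412, 6.635, 10.827),
    2: (1.386, 4.605, 5.991, 7.824, 9.210, 13.815),
    3: (2.366, 6.251, 7.815, 9.837, 11.345, 16.268),
    4: (3.357, 7.779, 9.488, 11.668, 13.277, 18.465),
    5: (4.351, 9.236, 11.070, 13.388, 15.086, 20.517),
}
CRITICAL = {df: dict(zip(ALPHAS, row)) for df, row in TABLE.items()}


def reject_null(statistic: float, df: int, alpha: float = 0.05) -> bool:
    """Reject H0 when the calculated chi-square exceeds the critical value."""
    try:
        critical = CRITICAL[df][alpha]
    except KeyError:
        raise ValueError(f"no tabulated critical value for df={df}, alpha={alpha}") from None
    return statistic > critical


print(reject_null(19.58, 2))  # True
print(reject_null(3.418, 1))  # False
```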
Types of Data:
2 x 2 Contingency Table

              | Variable 2
              | Data type 1 | Data type 2 | Totals
--------------+-------------+-------------+------------------
Category 1    | a           | b           | a+b
Category 2    | c           | d           | c+d
Total         | a+c         | b+d         | a+b+c+d = N
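FORMULAE
For a 2 x 2 table with cells a, b, c, d and N = a+b+c+d:
$$\chi^2 = \frac{N\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$$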
EXAMPLE
Suppose you run a drug trial on a group of animals and hypothesize that the animals receiving the drug will show increased heart rates compared with those that did not receive the drug.
Ho: The proportion of animals whose heart rate increased is independent of drug treatment.
Ha: The proportion of animals whose heart rate increased is associated with drug treatment.
Hypothetical drug trial results.

            | Heart rate increased | No heart rate increase | Total
------------+----------------------+------------------------+------
Treated     | 36                   | 14                     | 50
Not treated | 30                   | 25                     | 55
Total       | 66                   | 39                     | 105
FORMULAE
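$$\chi^2 = \frac{105\,[(36)(25) - (14)(30)]^2}{(50)(55)(39)(66)} = 3.418$$
With DF = (2 - 1) * (2 - 1) = 1, the critical value at the 0.05 level is 3.841. Since 3.418 < 3.841, the null hypothesis is not rejected at the 0.05 level.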
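This sketch (Python 3, standard library) checks that the 2 x 2 shortcut formula gives the same value as the general sum of (O - E)² / E with E = row total * column total / N. The function names are hypothetical. Neither function applies Yates' continuity correction, which matches the formula used on the slide.

```python
from typing import Sequence


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Shortcut N (ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)] for a 2 x 2 table."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


def chi_square_general(table: Sequence[Sequence[int]]) -> float:
    """Sum of (O - E)^2 / E with E = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total


# Drug trial: rows = treated / not treated, columns = increased / no increase.
print(round(chi_square_2x2(36, 14, 30, 25), 3))            # 3.418
print(round(chi_square_general([[36, 14], [30, 25]]), 3))  # 3.418
```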
Chi-Square Goodness of Fit

This test allows us to compare a
collection of categorical data with
some theoretical expected
distribution.
This test is often used in genetics to
compare the results of a cross with
the theoretical distribution based on
genetic theory.
problem
Acme Toy Company prints baseball cards.
The company claims that 30% of the
cards are rookies, 60% veterans, and
10% are All-Stars. The cards are sold in
packages of 100.
Suppose a randomly selected package of
cards has 50 rookies, 45 veterans, and 5
All-Stars. Is this consistent with Acme's
claim? Use a 0.05 level of significance.
State the hypotheses

Null hypothesis: The proportion of rookies,
veterans, and All-Stars is 30%, 60% and 10%,
respectively.
Alternative hypothesis: At least one of the proportions in the null hypothesis is false.
Formulate an analysis plan
The significance level is 0.05.
Analyze Sample Data
DF = k - 1 = 3 - 1 = 2
E_i = n * p_i
E_1 = 100 * 0.30 = 30
E_2 = 100 * 0.60 = 60
E_3 = 100 * 0.10 = 10
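$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
$$\chi^2 = \frac{(50 - 30)^2}{30} + \frac{(45 - 60)^2}{60} + \frac{(5 - 10)^2}{10}$$
$$\chi^2 = \frac{400}{30} + \frac{225}{60} + \frac{25}{10} = 13.33 + 3.75 + 2.50 = 19.58$$
With DF = 2, the critical value at the 0.05 level is 5.991. Since 19.58 > 5.991, we reject the null hypothesis: the package is not consistent with Acme's claim.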
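Here is the baseball-card calculation as a minimal Python 3 sketch. `goodness_of_fit` is a hypothetical helper, and the critical value 5.991 is taken from the critical-value table above.

```python
from typing import Dict, Tuple


def goodness_of_fit(observed: Dict[str, int], proportions: Dict[str, float]) -> Tuple[float, int]:
    """Return (chi-square statistic, DF) for observed counts vs. hypothesized proportions."""
    n = sum(observed.values())
    statistic = 0.0
    for level, count in observed.items():
        expected = n * proportions[level]  # E_i = n * p_i
        statistic += (count - expected) ** 2 / expected
    return statistic, len(observed) - 1  # DF = k - 1


observed = {"rookies": 50, "veterans": 45, "all_stars": 5}
claimed = {"rookies": 0.30, "veterans": 0.60, "all_stars": 0.10}
statistic, df = goodness_of_fit(observed, claimed)
critical = 5.991  # DF = 2, alpha = 0.05, from the table above
print(f"chi-square = {statistic:.2f}, DF = {df}")  # chi-square = 19.58, DF = 2
print("reject H0" if statistic > critical else "do not reject H0")  # reject H0
```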
Results of a monohybrid cross between two heterozygotes for the 'a' gene.

       | A  | a  | Totals
-------+----+----+-------
A      | 10 | 42 | 52
a      | 33 | 15 | 48
Totals | 43 | 57 | 100
The phenotypic ratio is 85 of the A type (10 + 42 + 33) and 15 of the a type (homozygous recessive). In a monohybrid cross between two heterozygotes, however, we would have predicted a 3:1 ratio of phenotypes. In other words, we would have expected to get 75 A-type and 25 a-type. Are our results different?
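       | Observed | Expected | (O - E) | (O - E)^2 | (O - E)^2 / E
-------+----------+----------+---------+-----------+--------------
A-type | 85       | 75       | 10      | 100       | 1.33
a-type | 15       | 25       | -10     | 100       | 4.00
Total  | 100      | 100      |         |           | 5.33

With DF = 2 - 1 = 1, the critical value at the 0.05 level is 3.841. Since 5.33 > 3.841, the observed phenotypes differ significantly from the expected 3:1 ratio at the 0.05 level (but not at the 0.01 level, where the critical value is 6.635).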
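This Python 3 sketch runs the same test and derives the expected counts from a theoretical ratio such as 3:1. `ratio_test` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def ratio_test(observed: Sequence[int], ratio: Sequence[float]) -> Tuple[List[float], float]:
    """Expected counts from a theoretical ratio, and the chi-square statistic."""
    total = sum(observed)
    parts = sum(ratio)
    expected = [total * r / parts for r in ratio]  # 3:1 of 100 -> 75, 25
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic


expected, statistic = ratio_test([85, 15], [3, 1])  # A-type, a-type
print(expected)             # [75.0, 25.0]
print(round(statistic, 2))  # 5.33
```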
Chi-Square Test of Independence
Applied when you have two categorical variables from a single population.
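              | Category I | Category II | Category III | Row Totals
--------------+------------+-------------+--------------+----------------------
Sample A      | a          | b           | c            | a+b+c
Sample B      | d          | e           | f            | d+e+f
Sample C      | g          | h           | i            | g+h+i
Column Totals | a+d+g      | b+e+h       | c+f+i        | a+b+c+d+e+f+g+h+i = N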
PROBLEM
A public opinion poll surveyed a
simple random sample of 1000
voters. Respondents were
classified by gender (male or
female) and by voting preference
(Republican, Democrat, or
Independent).
Voting Preferences

             | Republican | Democrat | Independent | Row total
-------------+------------+----------+-------------+----------
Male         | 200        | 150      | 50          | 400
Female       | 250        | 300      | 50          | 600
Column total | 450        | 450      | 100         | 1000
DF = (r - 1) * (c - 1) = (2 - 1) * (3 - 1) = 2
Er,c = (nr * nc) / n, where nr is the total for row r, nc is the total for column c, and n is the total sample size.
E1,1 = (400 * 450) / 1000 = 180000/1000 = 180
E1,2 = (400 * 450) / 1000 = 180000/1000 = 180
E1,3 = (400 * 100) / 1000 = 40000/1000 = 40
E2,1 = (600 * 450) / 1000 = 270000/1000 = 270
E2,2 = (600 * 450) / 1000 = 270000/1000 = 270
E2,3 = (600 * 100) / 1000 = 60000/1000 = 60
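$$\chi^2 = \sum_{r,c} \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}}$$
$$\chi^2 = \frac{(200 - 180)^2}{180} + \frac{(150 - 180)^2}{180} + \frac{(50 - 40)^2}{40} + \frac{(250 - 270)^2}{270} + \frac{(300 - 270)^2}{270} + \frac{(50 - 60)^2}{60}$$
$$\chi^2 = \frac{400}{180} + \frac{900}{180} + \frac{100}{40} + \frac{400}{270} + \frac{900}{270} + \frac{100}{60}$$
$$\chi^2 = 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67 = 16.2$$
With DF = 2, the critical value at the 0.05 level is 5.991. Since 16.2 > 5.991, we reject the null hypothesis of no association: voting preference is associated with gender.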
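Below is a minimal Python 3 sketch of the test of independence for any r x c table, applied to the voting data. `independence_test` is a hypothetical helper that follows the formulas on these slides: Er,c = nr * nc / n and DF = (r - 1)(c - 1).

```python
from typing import List, Sequence, Tuple


def independence_test(table: Sequence[Sequence[float]]) -> Tuple[float, int, List[List[float]]]:
    """Chi-square test of independence for an r x c table of observed counts.

    Expected counts use E(r,c) = (row total * column total) / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Rows: male, female; columns: Republican, Democrat, Independent.
voting = [[200, 150, 50], [250, 300, 50]]
statistic, df, expected = independence_test(voting)
print(expected)                                    # [[180.0, 180.0, 40.0], [270.0, 270.0, 60.0]]
print(f"chi-square = {statistic:.1f}, DF = {df}")  # chi-square = 16.2, DF = 2
```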
          | South America | Asia | Africa | Totals
----------+---------------+------+--------+-------
Malaria A | 31            | 14   | 45     | 90
Malaria B | 2             | 5    | 53     | 60
Malaria C | 53            | 45   | 2      | 100
Totals    | 86            | 64   | 100    | 250
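Observed | Expected | |O - E| | (O - E)^2 | (O - E)^2 / E
---------+----------+---------+-----------+--------------
31       | 30.96    | 0.04    | 0.0016    | 0.0000516
14       | 23.04    | 9.04    | 81.72     | 3.546
45       | 36.00    | 9.00    | 81.00     | 2.25
2        | 20.64    | 18.64   | 347.45    | 16.83
5        | 15.36    | 10.36   | 107.33    | 6.99
53       | 24.00    | 29.00   | 841.00    | 35.04
53       | 34.40    | 18.60   | 345.96    | 10.06
45       | 25.60    | 19.40   | 376.36    | 14.70
2        | 40.00    | 38.00   | 1444.00   | 36.10

Total: χ² ≈ 125.52, with DF = (3 - 1) * (3 - 1) = 4. This is far above the critical value of 18.465 at the 0.001 level, so the null hypothesis of no association between the row and column variables is rejected.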
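This block repeats the hypothetical `independence_test` sketch so that it runs on its own, and applies it to the malaria table. It reproduces the expected counts above and the sum of the last column.

```python
from typing import List, Sequence, Tuple


def independence_test(table: Sequence[Sequence[float]]) -> Tuple[float, int, List[List[float]]]:
    """Chi-square statistic, DF and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, (len(row_totals) - 1) * (len(col_totals) - 1), expected


malaria = [
    [31, 14, 45],  # Malaria A: South America, Asia, Africa
    [2, 5, 53],    # Malaria B
    [53, 45, 2],   # Malaria C
]
statistic, df, expected = independence_test(malaria)
print(f"chi-square = {statistic:.2f}, DF = {df}")  # chi-square = 125.52, DF = 4
```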
Chi-Square Test of Homogeneity
Applied to a single categorical variable from two different populations.
problem
Viewing Preferences

             | Lone Ranger | Sesame Street | The Simpsons | Row total
-------------+-------------+---------------+--------------+----------
Boys         | 50          | 30            | 20           | 100
Girls        | 50          | 80            | 70           | 200
Column total | 100         | 110           | 90           | 300
Null hypothesis: The null hypothesis states that the proportion of boys who prefer the Lone Ranger is identical to the proportion of girls. Similarly for the other programs. Thus,
H0: P(boys who prefer Lone Ranger) = P(girls who prefer Lone Ranger)
H0: P(boys who prefer Sesame Street) = P(girls who prefer Sesame Street)
H0: P(boys who prefer The Simpsons) = P(girls who prefer The Simpsons)
Alternative hypothesis: At least one of the null hypothesis statements is false.
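DF = (r - 1) * (c - 1) = (2 - 1) * (3 - 1) = 2
Er,c = (nr * nc) / n
E1,1 = (100 * 100) / 300 = 10000/300 = 33.3
E1,2 = (100 * 110) / 300 = 11000/300 = 36.7
E1,3 = (100 * 90) / 300 = 9000/300 = 30.0
E2,1 = (200 * 100) / 300 = 20000/300 = 66.7
E2,2 = (200 * 110) / 300 = 22000/300 = 73.3
E2,3 = (200 * 90) / 300 = 18000/300 = 60.0
$$\chi^2 = \sum_{r,c} \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}}$$
$$\chi^2 = \frac{(50 - 33.3)^2}{33.3} + \frac{(30 - 36.7)^2}{36.7} + \frac{(20 - 30)^2}{30} + \frac{(50 - 66.7)^2}{66.7} + \frac{(80 - 73.3)^2}{73.3} + \frac{(70 - 60)^2}{60}$$
$$\chi^2 = \frac{(16.7)^2}{33.3} + \frac{(-6.7)^2}{36.7} + \frac{(-10.0)^2}{30} + \frac{(-16.7)^2}{66.7} + \frac{(6.7)^2}{73.3} + \frac{(10)^2}{60}$$
$$\chi^2 = 8.38 + 1.22 + 3.33 + 4.18 + 0.61 + 1.67 = 19.39$$
With DF = 2, the critical value at the 0.05 level is 5.991. Since 19.39 > 5.991, we reject the null hypothesis: viewing preferences differ between boys and girls.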
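Here the same hypothetical `independence_test` sketch is applied to the viewing-preference table. The code keeps the expected counts unrounded (for example 100/3 rather than 33.3). It therefore gives about 19.32, rather than the 19.39 obtained above with rounded expected counts; the conclusion is the same.

```python
from typing import List, Sequence, Tuple


def independence_test(table: Sequence[Sequence[float]]) -> Tuple[float, int, List[List[float]]]:
    """Chi-square statistic, DF and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, (len(row_totals) - 1) * (len(col_totals) - 1), expected


# Rows: boys, girls; columns: Lone Ranger, Sesame Street, The Simpsons.
viewing = [[50, 30, 20], [50, 80, 70]]
statistic, df, expected = independence_test(viewing)
print(f"chi-square = {statistic:.2f}, DF = {df}")  # chi-square = 19.32, DF = 2
```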
         | Heads | Tails | Total
---------+-------+-------+------
Observed | 108   | 92    | 200
Expected | 100   | 100   | 200
Total    | 208   | 192   | 400
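$$\chi^2 = \frac{(108 - 100)^2}{100} + \frac{(92 - 100)^2}{100} = \frac{8^2}{100} + \frac{(-8)^2}{100} = 0.64 + 0.64 = 1.28$$
With DF = 2 - 1 = 1, the critical value at the 0.05 level is 3.841. Since 1.28 < 3.841, H0 is not rejected: the observed counts are consistent with a fair coin.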
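Here is the coin example as a short Python 3 sketch. The helper name `chi_square` is hypothetical, and 3.841 is the DF = 1, alpha = 0.05 value from the critical-value table above.

```python
from typing import Sequence


def chi_square(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of (O - E)**2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [108, 92]   # heads, tails
expected = [100, 100]  # fair coin: 200 tosses * 0.5
statistic = chi_square(observed, expected)
print(round(statistic, 2))  # 1.28
print("reject H0" if statistic > 3.841 else "do not reject H0")  # do not reject H0
```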
(Observed counts)

                      | Colours
                      | Red | Yellow | Green | Blue | Totals
----------------------+-----+--------+-------+------+-------
Introvert personality | 20  | 6      | 30    | 44   | 100
Extrovert personality | 180 | 34     | 50    | 36   | 300
Totals                | 200 | 40     | 80    | 80   | 400
H0: Colour preference is not associated with personality, and
H1: Colour preference is associated with personality.
(Expected counts)

                      | Colours
                      | Red | Yellow | Green | Blue | Totals
----------------------+-----+--------+-------+------+-------
Introvert personality | 50  | 10     | 20    | 20   | 100
Extrovert personality | 150 | 30     | 60    | 60   | 300
Totals                | 200 | 40     | 80    | 80   | 400
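The chi-squared test statistic is 71.20. With DF = (2 - 1) * (4 - 1) = 3, this far exceeds the critical value of 16.268 at the 0.001 level, so H0 is rejected: colour preference is associated with personality.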
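This Python 3 sketch reproduces the expected counts and the statistic 71.20 for the colour and personality table. `independence_test` is a hypothetical helper, repeated here so that the block is self-contained.

```python
from typing import List, Sequence, Tuple


def independence_test(table: Sequence[Sequence[float]]) -> Tuple[float, int, List[List[float]]]:
    """Chi-square statistic, DF and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, (len(row_totals) - 1) * (len(col_totals) - 1), expected


# Rows: introvert, extrovert; columns: red, yellow, green, blue.
colours = [[20, 6, 30, 44], [180, 34, 50, 36]]
statistic, df, expected = independence_test(colours)
print(expected)                                    # [[50.0, 10.0, 20.0, 20.0], [150.0, 30.0, 60.0, 60.0]]
print(f"chi-square = {statistic:.2f}, DF = {df}")  # chi-square = 71.20, DF = 3
```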
School Area
Goals   | Rural  Suburban  Urban  Total
-------------------------------------------
Grades  |    57        87     24    168
Popular |    50        42      6     98
Sports  |    42        22      5     69
-------------------------------------------
Total   |   149       151     35    335
[Figure: Barplots comparing the percentages of students' choices by school area]
H0 assumes that there is no association between the variables (in other words, one variable does not vary according to the other variable), while the alternative hypothesis Ha claims that some association does exist.
Observed counts (expected counts are printed below observed counts)

              Rural   Suburban   Urban   Total
1 (Grades)       57         87      24     168
              74.72      75.73   17.55
2 (Popular)      50         42       6      98
              43.59      44.17   10.24
3 (Sports)       42         22       5      69
              30.69      31.10    7.21
Total           149        151      35     335
Chi-Sq = 4.203 + 1.679 + 2.369 + 0.943 + 0.107 + 1.755 + 4.168 + 2.663 + 0.677 = 18.564
DF = 4, P-Value = 0.001
Since the P-value is below 0.05, H0 is rejected: students' goals are associated with school area.
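This Python 3 sketch reproduces the statistic and the P-value for the school-area table. It uses the hypothetical `independence_test` helper again. For the P-value, it relies on the closed form of the chi-square upper tail for even DF: P = e^(-x/2) Σ_{j=0}^{DF/2 - 1} (x/2)^j / j!. This is an illustration only, not necessarily how the original output was produced.

```python
import math
from typing import List, Sequence, Tuple


def independence_test(table: Sequence[Sequence[float]]) -> Tuple[float, int, List[List[float]]]:
    """Chi-square statistic, DF and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, (len(row_totals) - 1) * (len(col_totals) - 1), expected


def upper_tail_even_df(x: float, df: int) -> float:
    """P(chi-square_df >= x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)**j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


# Rows: grades, popular, sports; columns: rural, suburban, urban.
goals = [[57, 87, 24], [50, 42, 6], [42, 22, 5]]
statistic, df, expected = independence_test(goals)
p_value = upper_tail_even_df(statistic, df)
print(f"Chi-Sq = {statistic:.3f}, DF = {df}, P-Value = {p_value:.3f}")
```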
Applications of the χ² Statistic in Epidemiology
Cohort study (2 samples)
Case-control study (2 samples)
Matched case-control study (paired cases and controls)
The chi-squared statistic provides a test of the association between two or more groups, populations, or criteria.
The chi-square test can be used to test whether there is a significant association between exposure and disease in a cohort study, an unmatched case-control study, or a cross-sectional study.
CONCLUSION
The chi-square test of significance is
useful as a tool to determine whether
or not it is worth the researcher's
effort to interpret a contingency
table.
A significant result of this test
means that the cells of a contingency
table should be interpreted.
A non-significant test means that no effects were discovered and chance could explain the observed differences in the cells. In this case, an interpretation of the cell frequencies is not useful.
BIBLIOGRAPHY
Mahajan BK. Methods in Biostatistics.
Bhaskara Rao T. Methods in Biostatistics.
Stat Trek. http://stattrek.com