by Dr. M. Supriya
Moderator: Dr. B. Aruna, M.D.(H).
Page 1
PLAN OF STUDY
NEED FOR STUDY
INTRODUCTION
APPLICATION
REQUIREMENTS
CHI-SQUARE DISTRIBUTION
CHI-SQUARE TEST
EXAMPLES
CONCLUSION
Page 2
INTRODUCTION
The chi-square test is a non-parametric test.
It is useful for assessing the association between discrete (categorical) variables.
Page 3
DEVELOPED BY
KARL
PEARSON
Page 4
APPLICATIONS
Proportion
Association
Goodness of fit
Page 5
Page 6
Chi-Square Distribution
Page 8
Page 9
Properties of the chi-square distribution:
The mean of the distribution is equal to the number of degrees of freedom: $\mu = v$.
The variance is equal to two times the number of degrees of freedom: $\sigma^2 = 2v$.
When the degrees of freedom are greater than or equal to 2, the maximum value for Y occurs when $\chi^2 = v - 2$.
As the degrees of freedom increase, the chi-square curve approaches a normal distribution.
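The mean and variance properties above can be checked numerically. The sketch below is only an illustration, not part of the original slides: it assumes the standard construction of a chi-square variable as a sum of v squared standard normal draws, and the function name, seed and number of draws are arbitrary choices.

```python
import random
import statistics


def simulate_chi_square(v, draws=20000, seed=0):
    """Simulate chi-square variates with v degrees of freedom.

    Each variate is the sum of v squared standard normal draws
    (illustrative helper, not from the slides).
    """
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v)) for _ in range(draws)]


v = 4
samples = simulate_chi_square(v)
sample_mean = statistics.fmean(samples)      # should be close to v
sample_var = statistics.pvariance(samples)   # should be close to 2 * v

print(f"v = {v}: mean ~ {sample_mean:.2f} (theory {v}), "
      f"variance ~ {sample_var:.2f} (theory {2 * v})")
```

Because the values are simulated, they only approximate v and 2v; the agreement tightens as the number of draws grows.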
Page 11
PROBLEM
The Acme Battery Company has developed a
new cell phone battery. On average, the
battery lasts 60 minutes on a single
charge. The standard deviation is 4
minutes. Suppose the manufacturing
department runs a quality control test.
They randomly select 7 batteries. The
standard deviation of the selected batteries
is 6 minutes. What would be the chi-square
statistic represented by this test?
Page 12
SOLUTION
The standard deviation of the population is 4 minutes.
The standard deviation of the sample is 6 minutes.
The number of sample observations is 7.
To compute the chi-square statistic, use the equation below, where $\chi^2$ is the chi-square statistic, $n$ is the sample size, $s$ is the standard deviation of the sample, and $\sigma$ is the standard deviation of the population.
$\chi^2 = [(n - 1) \cdot s^2] / \sigma^2$
$\chi^2 = [(7 - 1) \cdot 6^2] / 4^2 = 13.5$
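The same arithmetic can be written as a small function. This is a minimal sketch: the function and parameter names are illustrative and do not come from the slides.

```python
def chi_square_for_variance(n, sample_sd, population_sd):
    """Chi-square statistic for a sample standard deviation.

    Implements chi2 = (n - 1) * s^2 / sigma^2.
    """
    return (n - 1) * sample_sd ** 2 / population_sd ** 2


# Acme battery problem: n = 7, s = 6 minutes, sigma = 4 minutes.
battery_stat = chi_square_for_variance(n=7, sample_sd=6, population_sd=4)
print(battery_stat)  # 13.5
```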
Page 13
CALCULATION OF CHI-SQUARE:
The test procedure consists of four steps:
(1) State the hypotheses.
(2) Formulate an analysis plan.
(3) Analyze sample data.
(4) Interpret results.
Page 14
Page 17
COMPONENTS
Significance level.
Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
Test method.
Use the chi-square test to determine whether observed sample frequencies differ significantly from expected frequencies specified in the null hypothesis.
Page 18
Page 19
DEGREES OF FREEDOM
Degrees of freedom (goodness of fit). The degrees of freedom (DF) is equal to the number of levels (k) of the categorical variable minus 1:
DF = k - 1
Degrees of freedom (two categorical variables). The degrees of freedom (DF) is equal to:
DF = (r - 1) * (c - 1)
where r is the number of levels for one categorical variable, and c is the number of levels for the other categorical variable.
Page 20
Test statistic
The test statistic is a chi-square random variable ($\chi^2$) defined by the following equation.
$\chi^2 = \sum [(O_i - E_i)^2 / E_i]$
where $O_i$ is the observed frequency count for the ith level of the categorical variable, and $E_i$ is the expected frequency count for the ith level of the categorical variable.
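The sum of (O - E)^2 / E over all levels translates directly into code. The sketch below uses an illustrative function name, and it is tried on the coin-toss counts that appear later in this presentation (108 heads and 92 tails observed, 100 of each expected).

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all levels of the categorical variable."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Coin-toss counts: 108 heads and 92 tails against an expected 100 each.
coin_stat = chi_square_statistic([108, 92], [100, 100])
print(round(coin_stat, 2))  # 1.28
```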
Page 22
P-value
The P-value is the probability of observing a sample statistic at least as extreme as the test statistic, assuming the null hypothesis is true. Since the test statistic is a chi-square, use the Chi-Square Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.
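When no calculator or table is at hand, the upper-tail probability can also be computed directly. The sketch below is one possible approach, not the method used by the calculator mentioned above: it assumes integer degrees of freedom and uses the closed forms for DF = 1 and DF = 2 together with the standard recurrence Q(a + 1, z) = Q(a, z) + z^a e^(-z) / Gamma(a + 1) for the regularized upper incomplete gamma function. The function name is illustrative.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for X ~ chi-square(df), integer df."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    z = statistic / 2.0
    if df % 2 == 1:
        tail = math.erfc(math.sqrt(z))  # closed form for df = 1
        a = 0.5
    else:
        tail = math.exp(-z)             # closed form for df = 2
        a = 1.0
    # Step up two degrees of freedom at a time until a == df / 2.
    while 2 * a < df:
        tail += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return tail


# Critical values usually tabulated for the 0.05 level; each result should be near 0.05.
for critical_value, df in [(3.841, 1), (5.991, 2), (7.815, 3)]:
    print(df, round(chi_square_p_value(critical_value, df), 3))
```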
Page 23
PROBABILITY LEVEL
Df |  0.5     0.10    0.05    0.02    0.01    0.001
-----------------------------------------------------
1  | 0.455   2.706   3.841   5.412   6.635  10.827
2  | 1.386   4.605   5.991   7.824   9.210  13.815
3  | 2.366   6.251   7.815   9.837  11.345  16.268
4  | 3.357   7.779   9.488
5  | 4.351   9.236
Page 24
Interpret Results
Page 25
Types of Data:
Page 26
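2 x 2 Contingency Table
Variable 2 | Data type 1   Data type 2   Totals
-------------------------------------------------
Category 1 |      a             b         a+b
Category 2 |      c             d         c+d
-------------------------------------------------
Total      |     a+c           b+d        a+b+c+d = N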
Page 27
FORMULAE
Page 28
EXAMPLE
Suppose you ran a drug trial on a group of animals and hypothesized that the animals receiving the drug would show increased heart rates compared to those that did not receive the drug.
Ho: The proportion of animals whose heart
rate increased is independent of drug
treatment.
Ha: The proportion of animals whose heart
rate increased is associated with drug
treatment.
Page 29
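            | Heart Rate Increase   No Heart Rate Increase   Total
--------------------------------------------------------------------
Treated     |         36                      14               50
Not treated |         30                      25               55
--------------------------------------------------------------------
Total       |         66                      39              105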
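For a 2 x 2 table the statistic can be computed with the standard shortcut $\chi^2 = N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]$, with cells labelled a, b, c, d as in the 2 x 2 contingency table. The sketch below rests on two assumptions: that this shortcut is the intended formula, and that the first count column (36 and 30) holds the animals whose heart rate increased. The function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2 x 2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Drug trial: treated 36 / 14, not treated 30 / 25.
drug_stat = chi_square_2x2(a=36, b=14, c=30, d=25)
print(round(drug_stat, 3))  # 3.418
```

With DF = (2 - 1) * (2 - 1) = 1, the tabulated value at the 0.05 level is 3.841, so a statistic of about 3.418 would not be significant at that level.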
Page 30
FORMULAE
Page 31
Page 32
PROBLEM
Acme Toy Company prints baseball cards.
The company claims that 30% of the
cards are rookies, 60% veterans, and
10% are All-Stars. The cards are sold in
packages of 100.
Suppose a randomly-selected package of
cards has 50 rookies, 45 veterans, and 5
All-Stars. Is this consistent with Acme's
claim? Use a 0.05 level of significance.
Page 34
Page 35
Page 36
Page 37
DF = k - 1 = 3 - 1 = 2
$E_i = n \cdot p_i$
$E_1 = 100 \cdot 0.30 = 30$
$E_2 = 100 \cdot 0.60 = 60$
$E_3 = 100 \cdot 0.10 = 10$
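The expected counts above lead directly to the test statistic for the baseball card problem. The following is a minimal sketch with illustrative variable names; the critical value 5.991 is the tabulated entry for DF = 2 at the 0.05 level.

```python
claimed = {"rookies": 0.30, "veterans": 0.60, "all_stars": 0.10}
observed = {"rookies": 50, "veterans": 45, "all_stars": 5}

n = sum(observed.values())                           # 100 cards in the package
expected = {k: n * p for k, p in claimed.items()}    # E_i = n * p_i

card_stat = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in claimed
)
card_df = len(claimed) - 1                           # DF = k - 1

critical_value = 5.991  # table entry for DF = 2, probability level 0.05
print(round(card_stat, 2), card_df, card_stat > critical_value)  # 19.58 2 True
```

Since 19.58 exceeds 5.991, the sample package is not consistent with Acme's claimed proportions at the 0.05 level.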
Page 38
Page 39
Page 40
       |               Totals
-------------------------------
       |  10     42      52
       |  33     15      48
-------------------------------
Totals |  43     57     100
Page 41
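       | Observed  Expected  (O - E)  (O - E)^2  (O - E)^2 / E
----------------------------------------------------------------
A-type |    85        75       10       100         1.33
a-type |    15        25      -10       100         4.0
----------------------------------------------------------------
Total  |   100       100                            5.33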
Page 43
Page 44
Sample A      |   a      b      c     a+b+c
Sample B      |   d      e      f     d+e+f
Sample C      |   g      h      i     g+h+i
---------------------------------------------------------------
Column Totals | a+d+g  b+e+h  c+f+i   a+b+c+d+e+f+g+h+i = N
Page 46
PROBLEM
A public opinion poll surveyed a simple random sample of 1000 voters. Respondents were classified by gender (male or female) and by voting preference (Republican, Democrat, or Independent).
Page 47
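Voting Preferences
             | Republican  Democrat  Independent  Row total
--------------------------------------------------------------
Male         |    200        150         50          400
Female       |    250        300         50          600
--------------------------------------------------------------
Column total |    450        450        100         1000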
Page 48
DF = (r - 1) * (c - 1) = (2 - 1) * (3 - 1) = 2
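The expected counts and the statistic for the voting table can be sketched as follows. The sketch assumes the usual rule for a contingency table, E = (row total * column total) / n, and the column order Republican, Democrat, Independent; the function names are illustrative.

```python
def expected_counts(table):
    """Expected count for each cell: (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: male, female. Columns: Republican, Democrat, Independent.
votes = [[200, 150, 50], [250, 300, 50]]
vote_stat, vote_df = chi_square_independence(votes)
print(expected_counts(votes))        # [[180.0, 180.0, 40.0], [270.0, 270.0, 60.0]]
print(round(vote_stat, 1), vote_df)  # 16.2 2
```

With DF = 2, a statistic of about 16.2 is above even the 0.001-level table entry of 13.815.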
Page 49
Page 50
Page 52
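          | Asia  Africa  South America  Totals
-------------------------------------------------
Malaria A |  31     14         45           90
Malaria B |   2      5         53           60
Malaria C |  53     45          2          100
-------------------------------------------------
Totals    |  86     64        100          250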
Page 53
Observed  Expected  |O - E|  (O - E)^2  (O - E)^2 / E
-------------------------------------------------------
   31       30.96     0.04      0.0016    0.0000516
   14       23.04     9.04     81.72      3.546
   45       36.00     9.00     81.00      2.25
    2       20.64    18.64    347.45     16.83
    5       15.36    10.36    107.33      6.99
   53       24.00    29.00    841.00     35.04
   53       34.40    18.60    345.96     10.06
   45       25.60    19.40    376.36     14.70
    2       40.00    38.00   1444.00     36.10
Page 54
Page 55
Chi-Square Test of Homogeneity
The test is applied to a single categorical variable from two different populations.
Page 56
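PROBLEM
Viewing Preferences
             | Lone Ranger  Sesame Street  The Simpsons  Row total
---------------------------------------------------------------------
Boys         |     50            30             20          100
Girls        |     50            80             70          200
---------------------------------------------------------------------
Column total |    100           110             90          300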
Page 57
Page 58
DF = (r - 1) * (c - 1) = (2 - 1) * (3 - 1) = 2
$E_{r,c} = (n_r \cdot n_c) / n$
$E_{1,1} = (100 \cdot 100) / 300 = 10000/300 = 33.3$
$E_{1,2} = (100 \cdot 110) / 300 = 11000/300 = 36.7$
$E_{1,3} = (100 \cdot 90) / 300 = 9000/300 = 30.0$
$E_{2,1} = (200 \cdot 100) / 300 = 20000/300 = 66.7$
$E_{2,2} = (200 \cdot 110) / 300 = 22000/300 = 73.3$
$E_{2,3} = (200 \cdot 90) / 300 = 18000/300 = 60.0$
$\chi^2 = \sum [(O_{r,c} - E_{r,c})^2 / E_{r,c}]$
$\chi^2 = (50 - 33.3)^2/33.3 + (30 - 36.7)^2/36.7 + (20 - 30)^2/30 + (50 - 66.7)^2/66.7 + (80 - 73.3)^2/73.3 + (70 - 60)^2/60$
$\chi^2 = (16.7)^2/33.3 + (-6.7)^2/36.7 + (-10.0)^2/30 + (-16.7)^2/66.7 + (6.7)^2/73.3 + (10)^2/60$
$\chi^2 = 8.38 + 1.22 + 3.33 + 4.18 + 0.61 + 1.67 = 19.39$
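The hand calculation rounds each expected count to one decimal place before squaring, which shifts the result slightly. The sketch below recomputes the statistic both ways to show the size of that effect; variable names are illustrative.

```python
# Rows: boys, girls. Columns: Lone Ranger, Sesame Street, The Simpsons.
viewing = [[50, 30, 20], [50, 80, 70]]

row_totals = [sum(row) for row in viewing]
col_totals = [sum(col) for col in zip(*viewing)]
n = sum(row_totals)


def statistic(decimals=None):
    """Chi-square statistic; optionally round expected counts first."""
    total = 0.0
    for r, row in enumerate(viewing):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            if decimals is not None:
                expected = round(expected, decimals)
            total += (observed - expected) ** 2 / expected
    return total


rounded_stat = statistic(decimals=1)  # mirrors the hand calculation
exact_stat = statistic()              # no intermediate rounding

print(round(rounded_stat, 2), round(exact_stat, 2))  # 19.39 19.32
```

Either value is far above the table entry of 5.991 for DF = 2 at the 0.05 level, so the rounding does not change the conclusion here.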
Page 59
         | Heads  Tails  Total
--------------------------------
Observed |  108     92    200
Expected |  100    100    200
--------------------------------
Total    |  208    192    400
Page 60
$\chi^2 = (108 - 100)^2/100 + (92 - 100)^2/100 = 0.64 + 0.64 = 1.28$
Page 61
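(Observed counts)
Colours               | Red  Yellow  Green  Blue  Totals
-----------------------------------------------------------
Introvert personality |  20     6      30    44     100
Extrovert personality | 180    34      50    36     300
-----------------------------------------------------------
Totals                | 200    40      80    80     400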
Page 62
Page 63
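(Expected counts)
Colours               | Red  Yellow  Green  Blue  Totals
-----------------------------------------------------------
Introvert personality |  50    10      20    20     100
Extrovert personality | 150    30      60    60     300
-----------------------------------------------------------
Totals                | 200    40      80    80     400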
Page 64
Page 65
School Area
Goals   | Rural  Suburban  Urban  Total
-------------------------------------------
Grades  |   57      87       24    168
Popular |   50      42        6     98
Sports  |   42      22        5     69
-------------------------------------------
Total   |  149     151       35    335
Page 66
Page 67
Page 69
Page 71
CONCLUSION
The chi-square test of significance is
useful as a tool to determine whether
or not it is worth the researcher's
effort to interpret a contingency
table.
A significant result of this test
means that the cells of a contingency
table should be interpreted.
Page 73
Page 74
BIBLIOGRAPHY
METHODS IN BIOSTATISTICS, B. K. MAHAJAN.
METHODS IN BIOSTATISTICS, T. BHASKARA RAO.
http://stattrek.com
Page 75