single sample. Since the Kolmogorov-Smirnov test for two independent samples represents
a nonparametric alternative to the t test for two independent samples (Test 11), the most
common situation in which a researcher might elect to employ the Kolmogorov-Smirnov test
to evaluate a hypothesis about two independent samples (where the dependent variable represents interval/ratio measurement) is when there is reason to believe that the normality and/or
homogeneity of variance assumptions of the t test have been saliently violated. The
Kolmogorov-Smirnov test for two independent samples is based on the following
assumptions: a) All of the observations in the two samples are randomly selected and
independent of one another; and b) The scale of measurement is at least ordinal.
II. Example
Example 13.1 is identical to Examples 11.1/12.1 (which are evaluated with the t test for two
independent samples and the Mann-Whitney U test (Test 12)).
Example 13.1 In order to assess the efficacy of a new antidepressant drug, ten clinically
depressed patients are randomly assigned to one of two groups. Five patients are assigned to
Group 1, which is administered the antidepressant drug for a period of six months. The other
five patients are assigned to Group 2, which is administered a placebo during the same six-month period. Assume that prior to introducing the experimental treatments, the experimenter
confirmed that the level of depression in the two groups was equal. After six months elapse, all
ten subjects are rated by a psychiatrist (who is blind with respect to a subject's experimental
condition) on their level of depression. The psychiatrist's depression ratings for the five
subjects in each group follow (the higher the rating, the more depressed a subject): Group 1:
11, 1, 0, 2, 0; Group 2: 11, 11, 5, 8, 4. Do the data indicate that the antidepressant drug is
effective?
Null hypothesis     H0: F1(X) = F2(X) for all values of X
(The distribution of data in the population that Sample 1 is derived from is consistent with the
distribution of data in the population that Sample 2 is derived from. Another way of stating
the null hypothesis is as follows: At no point is the greatest vertical distance between the cumulative probability distribution for Sample 1 (which is assumed to be the best estimate of the
cumulative probability distribution of the population from which Sample 1 is derived) and the
cumulative probability distribution for Sample 2 (which is assumed to be the best estimate of the
cumulative probability distribution of the population from which Sample 2 is derived) larger
than what would be expected by chance, if the two samples are derived from the same
population.)
Copyright 2004 by Chapman & Hal/CRC
Test 13
Alternative hypothesis     H1: F1(X) ≠ F2(X) for at least one value of X
(The distribution of data in the population that Sample 1 is derived from is not consistent with
the distribution of data in the population that Sample 2 is derived from. Another way of stating
this alternative hypothesis is as follows: There is at least one point where the greatest vertical
distance between the cumulative probability distribution for Sample 1 (which is assumed to be
the best estimate of the cumulative probability distribution of the population from which Sample
1 is derived) and the cumulative probability distribution for Sample 2 (which is assumed to be
the best estimate of the cumulative probability distribution of the population from which Sample
2 is derived) is larger than what would be expected by chance, if the two samples are derived
from the same population. At the point of maximum deviation separating the two cumulative
probability distributions, the cumulative probability for Sample 1 is either significantly greater
or less than the cumulative probability for Sample 2. This is a nondirectional alternative hypothesis and it is evaluated with a two-tailed test.)
Table 13.1 Calculation of Test Statistic for Kolmogorov-Smirnov Test
for Two Independent Samples for Example 13.1

Row   A: X (Group 1)   B: S1(X)      C: X (Group 2)   D: S2(X)     E: S1(X) - S2(X)
1     0, 0             2/5 = .40                      0/5 = 0      .40 - 0 = .40
2     1                3/5 = .60                      0/5 = 0      .60 - 0 = .60
3     2                4/5 = .80                      0/5 = 0      .80 - 0 = .80 = M
4                      4/5 = .80     4                1/5 = .20    .80 - .20 = .60
5                      4/5 = .80     5                2/5 = .40    .80 - .40 = .40
6                      4/5 = .80     8                3/5 = .60    .80 - .60 = .20
7     11               5/5 = 1.00    11, 11           5/5 = 1.00   1.00 - 1.00 = .00
The values represented in the columns of Table 13.1 are summarized below.
The values of the psychiatrist's depression ratings for the subjects in Group 1 are recorded
in Column A. Note that there are five scores recorded in Column A, and that if the same score
is assigned to more than one subject in Group 1, each of the scores of that value is recorded in
the same row in Column A.
Each value in Column B represents the cumulative proportion associated with the value
of the X score recorded in Column A. The notation S1(X) is commonly employed to represent
the cumulative proportions for Group/Sample 1 recorded in Column B. The value in Column
B for any row is obtained as follows: a) The Group 1 cumulative frequency for the score in that
row (i.e., the frequency of occurrence of all scores in Group 1 equal to or less than the score in
that row) is divided by the total number of scores in Group 1 (n1 = 5). To illustrate, in the case
of Row 1, the score 0 is recorded twice in Column A. Thus, the cumulative frequency is equal
to 2, since there are 2 scores in Group 1 that are equal to 0 (a depression rating score cannot be
less than 0). Thus, the cumulative frequency 2 is divided by n1 = 5, yielding 2/5 = .40. The
value .40 in Column B represents the cumulative proportion in Group 1 associated with a score
of 0. It means that the proportion of scores in Group 1 that is equal to 0 is .40. The proportion
of scores in Group 1 that is larger than 0 is .60 (since 1 - .40 = .60). In the case of Row 2, the
score 1 is recorded in Column A. The cumulative frequency is equal to 3, since there are 3
scores in Group 1 that are equal to or less than 1 (2 scores of 0 and a score of 1). Thus, the
cumulative frequency 3 is divided by n1 = 5, yielding 3/5 = .60. The value .60 in Column B
represents the cumulative proportion in Group 1 associated with a score of 1. It means that the
proportion of scores in Group 1 that is equal to or less than 1 is .60. The proportion of scores
in Group 1 that is larger than 1 is .40 (since 1 - .60 = .40). In the case of Row 3, the score 2
is recorded in Column A. The cumulative frequency is equal to 4, since there are 4 scores in
Group 1 that are equal to or less than 2 (two scores of 0, a score of 1, and a score of 2). Thus,
the cumulative frequency 4 is divided by n1 = 5, yielding 4/5 = .80. The value .80 in Column
B represents the cumulative proportion in Group 1 associated with a score of 2. It means that
the proportion of scores in Group 1 that is equal to or less than 2 is .80. The proportion of
scores in Group 1 that is larger than 2 is .20 (since 1 - .80 = .20). Note that the value of the
cumulative proportion in Column B remains .80 in Rows 4, 5, and 6, since until a new score is
recorded in Column A, the cumulative proportion recorded in Column B will remain the same.
In the case of Row 7, the score 11 is recorded in Column A. The cumulative frequency is equal
to 5, since there are 5 scores in Group 1 that are equal to or less than 11 (i.e., all of the scores
in Group 1 are equal to or less than 11). Thus, the cumulative frequency 5 is divided by n1 = 5,
yielding 5/5 = 1. The value 1 in Column B represents the cumulative proportion in Group 1
associated with a score of 11. It means that the proportion of scores in Group 1 that is equal to
or less than 11 is 1. The proportion of scores in Group 1 that is larger than 11 is 0 (since 1 - 1 = 0).
The values of the psychiatrist's depression ratings for the subjects in Group 2 are recorded
in Column C. Note that there are five scores recorded in Column C, and if the same score is
assigned to more than one subject in Group 2, each of the scores of that value is recorded in the
same row in Column C.
Each value in Column D represents the cumulative proportion associated with the value
of the X score recorded in Column C. The notation S2(X) is commonly employed to represent
the cumulative proportions for Group/Sample 2 recorded in Column D. The value in Column
D for any row is obtained as follows: a) The Group 2 cumulative frequency for the score in that
row (i.e., the frequency of occurrence of all scores in Group 2 equal to or less than the score in
that row) is divided by the total number of scores in Group 2 (n2 = 5). To illustrate, in the case
of Rows 1, 2, and 3, no score is recorded in Column C. Thus, the cumulative frequencies for
each of those rows are equal to 0, since up to that point in the analysis there are no scores
recorded for Group 2. Consequently, for each of the first three rows, the cumulative frequency
0 is divided by n2 = 5, yielding 0/5 = 0. In each of the first three rows, the value 0 in Column
D represents the cumulative proportion for Group 2 up to that point in the analysis. For each
of those rows, the proportion of scores in Group 2 that remain to be analyzed is 1 (since 1 - 0
= 1). In the case of Row 4, the score 4 is recorded in Column C. The cumulative frequency is
equal to 1, since there is 1 score in Group 2 that is equal to or less than 4 (i.e., the score 4 in that
row). Thus, the cumulative frequency 1 is divided by n2 = 5, yielding 1/5 = .20. The value .20
in Column D represents the cumulative proportion in Group 2 associated with a score of 4. It
means that the proportion of scores in Group 2 that is equal to or less than 4 is .20. The
proportion of scores in Group 2 that is larger than 4 is .80 (since 1 - .20 = .80). In the case of
Row 5, the score 5 is recorded in Column C. The cumulative frequency is equal to 2, since
there are 2 scores in Group 2 that are equal to or less than 5 (the scores of 4 and 5). Thus, the
cumulative frequency 2 is divided by n2 = 5, yielding 2/5 = .40. The value .40 in Column D
represents the cumulative proportion in Group 2 associated with a score of 5. It means that the
proportion of scores in Group 2 that is equal to or less than 5 is .40. The proportion of scores
in Group 2 that is larger than 5 is .60 (since 1 - .40 = .60). In the case of Row 6, the score 8
is recorded in Column C. The cumulative frequency is equal to 3, since there are 3 scores in
Group 2 that are equal to or less than 8 (the scores of 4, 5, and 8). Thus, the cumulative
frequency 3 is divided by n2 = 5, yielding 3/5 = .60. The value .60 in Column D represents
the cumulative proportion in Group 2 associated with a score of 8. It means that the proportion
of scores in Group 2 that is equal to or less than 8 is .60. The proportion of scores in Group 2
that is larger than 8 is .40 (since 1 - .60 = .40). In the case of Row 7, the score 11 is recorded
twice in Column C. The cumulative frequency is equal to 5, since there are 5 scores in Group
2 that are equal to or less than 11 (i.e., all of the scores in Group 2 are equal to or less than 11).
Thus, the cumulative frequency 5 is divided by n2 = 5, yielding 5/5 = 1. The value 1 in
Column D represents the cumulative proportion in Group 2 associated with a score of 11. It
means that the proportion of scores in Group 2 that is equal to or less than 11 is 1. The
proportion of scores in Group 2 that is larger than 11 is 0 (since 1 - 1 = 0).
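The column-by-column arithmetic above amounts to evaluating the two empirical cumulative distribution functions at every distinct score and taking the largest absolute difference. A minimal Python sketch (not part of the original text; names are illustrative) reproduces the values in Table 13.1:

```python
# Sketch: reproduce the Table 13.1 computations for Example 13.1.
# S1(X) and S2(X) are the cumulative proportions for Groups 1 and 2.

group1 = [11, 1, 0, 2, 0]   # antidepressant group
group2 = [11, 11, 5, 8, 4]  # placebo group

def cum_prop(sample, x):
    """Cumulative proportion S(X): fraction of scores <= x."""
    return sum(score <= x for score in sample) / len(sample)

# Evaluate both empirical distributions at every distinct score (Column E).
points = sorted(set(group1 + group2))
diffs = {x: cum_prop(group1, x) - cum_prop(group2, x) for x in points}

# Test statistic M = largest absolute vertical distance between the ECDFs.
M = max(abs(d) for d in diffs.values())
print(M)   # 0.8, matching Row 3 of Table 13.1
```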
greater than or equal to the tabled critical one-tailed values M.05 = .600 and M.01 = .800.
Additionally, since in Row 3 of Table 13.1 [S1(X) = .80] > [S2(X) = 0], the data are
consistent with the alternative hypothesis H1: F1(X) > F2(X). In other words, in computing the
value of M, the cumulative proportion for Sample 1 is larger than the cumulative proportion for
Sample 2 (which results in a positive sign for the value of M).
c) If the directional alternative hypothesis H1: F1(X) < F2(X) is employed, the null
hypothesis cannot be rejected, since in order for the latter alternative hypothesis to be supported,
in computing the value of M, the cumulative proportion for Sample 2 must be larger than the
cumulative proportion for Sample 1 (which would result in a negative sign for the value of M,
which is not the case in Row 3 of Table 13.1).
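The sign convention just described (M positive when Sample 1's cumulative proportion is on top, negative when Sample 2's is) can be made concrete with a short sketch; the helper names are illustrative, not from the text:

```python
# Sketch of the directional analysis: the largest signed deviations
# determine which directional alternative hypothesis the data favor.

group1 = [11, 1, 0, 2, 0]   # Example 13.1, antidepressant group
group2 = [11, 11, 5, 8, 4]  # placebo group

def cum_prop(sample, x):
    return sum(score <= x for score in sample) / len(sample)

points = sorted(set(group1 + group2))
# Largest deviation with S1(X) above S2(X): supports H1: F1(X) > F2(X)
m_plus = max(cum_prop(group1, x) - cum_prop(group2, x) for x in points)
# Largest deviation with S2(X) above S1(X): supports H1: F1(X) < F2(X)
m_minus = max(cum_prop(group2, x) - cum_prop(group1, x) for x in points)

print(m_plus)   # 0.8 -> consistent with H1: F1(X) > F2(X)
print(m_minus)  # 0.0 -> no point where Sample 2's proportion exceeds Sample 1's
```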
A summary of the analysis of Example 13.1 with the Kolmogorov-Smirnov test for two
independent samples follows: It can be concluded that there is a high likelihood the two
groups are derived from different populations. More specifically, the data indicate that the
depression ratings for Group 1 (i.e., the group that receives the antidepressant medication) are
significantly less than the depression ratings for Group 2 (the placebo group).
When the same set of data is evaluated with the t test for two independent samples and
the Mann-Whitney U test (i.e., Examples 11.1/12.1), in the case of both of the latter tests, the
null hypothesis can only be rejected (and only at the .05 level) if the researcher employs a
directional alternative hypothesis which predicts a lower level of depression for Group 1. The
latter result is consistent with the result obtained with the Kolmogorov-Smirnov test, in that
the directional alternative hypothesis H1: F1(X) > F2(X) is supported. Note, however, that the
latter directional alternative hypothesis is supported at both the .05 and .01 levels when the
Kolmogorov-Smirnov test is employed. In addition, the nondirectional alternative hypothesis
is supported at both the .05 and .01 levels with the Kolmogorov-Smirnov test, but is not
supported when the t test and Mann-Whitney U test are used. Although the results obtained
with the Kolmogorov-Smirnov test for two independent samples are not identical with the
results obtained with the t test for two independent samples and the Mann-Whitney U test,
they are reasonably consistent.
It should be noted that in most instances in which the Kolmogorov-Smirnov test for two
independent samples and the t test for two independent samples are employed to evaluate the
same set of data, the Kolmogorov-Smirnov test will provide a less powerful test of an
alternative hypothesis. Thus, although it did not turn out to be the case for Examples 11.1/13.1,
if a significant difference is present, the t test will be the more likely of the two tests to detect
it. Siegel and Castellan (1988) note that when compared with the t test for two independent
samples, the Kolmogorov-Smirnov test has a power efficiency (which is defined in Section VII
of the Wilcoxon signed-ranks test (Test 6)) of .95 for small sample sizes, and a slightly lower
power efficiency for larger sample sizes.
1999) employs a graphical method for computing the Kolmogorov-Smirnov test statistic that
is based on the same logic as the graphical method which is briefly discussed for computing the
test statistic for the Kolmogorov-Smirnov goodness-of-fit test for a single sample. The
method involves constructing a graph of the cumulative probability distribution for each sample
and measuring the point of maximum distance between the two cumulative probability
distributions. The latter graph is similar to the one depicted in Figure 7.1. Daniel (1990)
describes a graphical method that employs a graph referred to as a pair chart as an alternative
way of computing the Kolmogorov-Smirnov test statistic. The latter method is attributed to
Hodges (1958) and Quade (1973) (who cites Drion (1952) as having developed the pair chart).
2. Computing sample confidence intervals for the Kolmogorov-Smirnov test for two independent samples The same procedure that is described for computing a confidence interval
for cumulative probabilities for the sample distribution that is evaluated with the Kolmogorov-Smirnov goodness-of-fit test for a single sample can be employed to compute a confidence
interval for cumulative probabilities for either one of the samples that are evaluated with the
Kolmogorov-Smirnov test for two independent samples. Specifically, Equation 7.1 is
employed to compute the upper and lower limits for each of the points in a confidence interval.
Thus, for each sample, Mα is added to and subtracted from each of the S(X) values. Note that
the value of Mα employed in constructing a confidence interval for each of the samples is
derived from Table A21 (Table of Critical Values for the Kolmogorov-Smirnov Goodness-of-Fit Test for a Single Sample) in the Appendix. Thus, if one is computing a 95% confidence interval for each of the samples, the tabled critical two-tailed value M.05 = .563 for
n1 = n2 = 5 is employed to represent Mα in Equation 7.1.
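Under the procedure just described, the band computation might be sketched as follows (an illustration, not from the text; it assumes the tabled value M.05 = .563 quoted above, and clips the limits to [0, 1] so they remain interpretable as proportions):

```python
# Sketch: 95% confidence band for one sample's cumulative proportions,
# using S(X) +/- M_alpha (Equation 7.1 in the text). The value .563 is
# the tabled two-tailed M.05 for n = 5 quoted in the text.

M_ALPHA = .563

def confidence_band(cum_props):
    """Return (lower, upper) limits for each cumulative proportion,
    clipped to the [0, 1] range of a probability."""
    return [(max(0.0, s - M_ALPHA), min(1.0, s + M_ALPHA)) for s in cum_props]

# Cumulative proportions for Group 1 in Example 13.1 (Column B of Table 13.1)
s1 = [.40, .60, .80, 1.00]
for s, (lo, hi) in zip(s1, confidence_band(s1)):
    print(f"S(X) = {s:.2f}: [{lo:.3f}, {hi:.3f}]")
```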
Note the notation Sj(X) is used to represent the points on a cumulative probability distribution for the Kolmogorov-Smirnov test for two independent samples, while the notation
S(Xi) is used to represent the points on the cumulative probability distribution for the sample
evaluated with the Kolmogorov-Smirnov goodness-of-fit test for a single sample. In the case
of the latter test, there is only one sample for which a confidence interval can be computed,
while in the case of the Kolmogorov-Smirnov test for two independent samples, a confidence
interval can be constructed for each of the independent samples.
3. Large sample chi-square approximation for a one-tailed analysis of the Kolmogorov-Smirnov test for two independent samples Siegel and Castellan (1988) note that Goodman
(1954) has shown that Equation 13.1 (which employs the chi-square distribution) can provide
a good approximation for large sample sizes when a one-tailed/directional alternative hypothesis
is evaluated.

χ² = 4M²[(n1 n2)/(n1 + n2)]     (Equation 13.1)
The computed value of chi-square is evaluated with Table A4 (Table of the Chi-Square
Distribution) in the Appendix. The degrees of freedom employed in the analysis will always
be df = 2. The tabled critical one-tailed .05 and .01 chi-square values in Table A4 for df = 2
are χ².05 = 5.99 and χ².01 = 9.21. If the computed value of chi-square is equal to or greater than
either of the aforementioned values, the null hypothesis can be rejected at the appropriate level
of significance (i.e., the directional alternative hypothesis that is consistent with the data will
be supported). Although our sample size is too small for the large sample approximation, for
purposes of illustration we will use it. When the appropriate values for Example 13.1 are substituted in Equation 13.1, the value χ² = 6.4 is computed. Since χ² = 6.4 is larger than
χ².05 = 5.99 but less than χ².01 = 9.21, the null hypothesis can be rejected, but only at the .05
level. Thus, the directional alternative hypothesis H1: F1(X) > F2(X) is supported at the .05
level. Note that when the tabled critical values in Table A23 are employed, the latter
alternative hypothesis is also supported at the .01 level. The latter is consistent with the fact that
Siegel and Castellan (1988) note that when Equation 13.1 is employed with small sample sizes,
it tends to yield a conservative result (i.e., it is less likely to reject a false null hypothesis).
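Equation 13.1 and its df = 2 evaluation can be checked numerically. Since a chi-square variable with 2 degrees of freedom has survival function exp(-x/2), no statistics library is needed; the sketch below (names illustrative, not from the text) reproduces χ² = 6.4 for Example 13.1:

```python
import math

# Goodman's large-sample chi-square approximation (Equation 13.1):
# chi2 = 4 * M^2 * (n1 * n2) / (n1 + n2), evaluated with df = 2.

def ks_chi_square(M, n1, n2):
    return 4 * M**2 * (n1 * n2) / (n1 + n2)

chi2 = ks_chi_square(M=.80, n1=5, n2=5)   # Example 13.1
print(round(chi2, 4))                     # 6.4

# For df = 2 the chi-square survival function is exp(-x/2), so the
# approximate one-tailed probability of a value this large is:
p = math.exp(-chi2 / 2)
print(round(p, 4))                        # 0.0408 -> significant at .05 but not .01
```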
VIII. Additional Examples Illustrating the Use of the Kolmogorov-Smirnov Test for Two Independent Samples
Since Examples 11.4 and 11.5 in Section VIII of the t test for two independent samples
employ the same data as Example 13.1, the Kolmogorov-Smirnov test for two independent
samples will yield the same result if employed to evaluate the latter two examples. In addition,
the Kolmogorov-Smirnov test can be employed to evaluate Examples 11.2 and 11.3. Since
different data are employed in the latter examples, the result obtained with the Kolmogorov-Smirnov test will not be the same as that obtained for Example 13.1. Example 11.2 is evaluated
below with the Kolmogorov-Smirnov test for two independent samples. Table 13.2 summarizes the analysis.
Table 13.2 Calculation of Test Statistic for Kolmogorov-Smirnov Test
for Two Independent Samples for Example 11.2
The obtained value of the test statistic is M = .60, since .60 is the largest absolute value for
a difference score recorded in Column E of Table 13.2. Since n1 = 5 and n2 = 5, we
employ the same critical values used in evaluating Example 13.1. If the nondirectional
alternative hypothesis H1: F1(X) ≠ F2(X) is employed, the null hypothesis cannot be
rejected at the .05 level, since M = .60 is less than the tabled critical two-tailed value M.05 =
.800. The data are consistent with the directional alternative hypothesis H1: F1(X) < F2(X),
since in Row 3 of Table 13.2 [S1(X) = .40] < [S2(X) = 1]. In other words, in computing
the value of M, the cumulative proportion for Sample 2 is larger than the cumulative
proportion for Sample 1 (resulting in a negative sign for the computed value of M). The
directional alternative hypothesis H1: F1(X) < F2(X) is supported at the .05 level, since
M = .60 is equal to the tabled critical one-tailed value M.05 = .600. It is not, however,
supported at the .01 level, since M = .60 is less than the tabled critical one-tailed value M.01
= .800. The directional alternative hypothesis H1: F1(X) > F2(X) is not supported, since it
is not consistent with the data (i.e., the sign of the value computed for M is not positive).
When the null hypothesis H0: μ1 = μ2 is evaluated with the t test for two independent
samples, the only alternative hypothesis which is supported (but only at the .05 level) is the
directional alternative hypothesis H1: μ1 > μ2. The latter result (indicating higher scores in
Group 1) is consistent with the result that is obtained when the Kolmogorov-Smirnov test for
two independent samples is employed to evaluate the same set of data.
References
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley
& Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley
& Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Drion, E. F. (1952). Some distribution-free tests for the difference between two empirical
cumulative distribution functions. Annals of Mathematical Statistics, 23, 563-574.
Goodman, L. A. (1954). Kolmogorov-Smirnov tests for psychological research. Psychological
Bulletin, 51, 160-168.
Hodges, J. L., Jr. (1958). The significance probability of the Smirnov two-sample test. Arkiv
för Matematik, 3, 469-486.
Hollander, M. and Wolfe, D. A. (1999). Nonparametric statistical methods. New York: John
Wiley & Sons.
Khamis, H. J. (1990). The δ corrected Kolmogorov-Smirnov test for goodness-of-fit. Journal
of Statistical Planning and Inference, 24, 317-355.
Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione.
Giornale dell'Istituto Italiano degli Attuari, 4, 83-91.
Marascuilo, L. A. and McSweeney, M. (1977). Nonparametric and distribution-free methods
for the social sciences. Monterey, CA: Brooks/Cole Publishing Company.
Massey, F. J., Jr. (1952). Distribution tables for the deviation between two sample cumulatives.
Annals of Mathematical Statistics, 23, 435-441.
Noether, G. E. (1963). Note on the Kolmogorov statistic in the discrete case. Metrika, 7,
115-116.
Noether, G. E. (1967). Elements of nonparametric statistics. New York: John Wiley & Sons.
Quade, D. (1973). The pair chart. Statistica Neerlandica, 27, 29-45.
Siegel, S. and Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral
sciences (2nd ed.). New York: McGraw-Hill Book Company.
Smirnov, N. V. (1936). Sur la distribution de ω² (critérium de M. R. v. Mises). Comptes
Rendus (Paris), 202, 449-452.
Endnotes
1. Marascuilo and McSweeney (1977) employ a modified protocol that can result in a larger
absolute value for M in Column E than the one obtained in Table 13.1. The latter protocol
employs a separate row for the score of each subject when the same score occurs more than
once within a group. If the latter protocol is employed in Table 13.1, the first two rows of
the table will have the score of 0 in Column A for the two subjects in Group 1 who obtain
that score. The first 0 will be in the first row, and have a cumulative proportion in Column
B of 1/5 = .20. The second 0 will be in the second row, and have a cumulative proportion
in Column B of 2/5 = .40. In the same respect, the first of the two scores of 11 (obtained by
two subjects in Group 2) will be in a separate row in Column C, and have a cumulative
proportion in Column D of 4/5 = .80. The second score of 11 will be in the last row of the
table, and have a cumulative proportion in Column D of 5/5 = 1. In the case of Example
13.1, the outcome of the analysis will not be affected if the aforementioned protocol is
employed. In some instances, however, it can result in a larger M value. The protocol
employed by Marascuilo and McSweeney (1977) is used by sources who argue that when
there are ties present in the data (i.e., the same score occurs more than once within a group),
the protocol described in this chapter (which is used in most sources) results in an overly
conservative test (i.e., makes it more difficult to reject a false null hypothesis).
2. When the values of n1 and n2 are small, some of the .05 and .01 critical values listed in
Table A23 are identical to one another.
3. The last row in Table A23 can also be employed to compute a critical M value for large
sample sizes.