
Research & Reviews: Journal of Statistics. Volume 2, Issue 1. ISSN: 2278-2273. RRJoS (2013) 1-13. © STM Journals 2013. All Rights Reserved.

An Exact Kolmogorov-Smirnov Test for the Negative Binomial Distribution with Unknown Probability of Success

Arnab Hazra*
Indian Statistical Institute, 203, Barrackpore Trunk Road, Kolkata, India

Abstract
In this paper, we develop an exact Kolmogorov-Smirnov goodness-of-fit test for the negative binomial distribution with an unknown probability of success. This test is conditional, with the test statistic being the maximum absolute difference between the empirical distribution function and its conditional expectation given the sample total. The results are not asymptotic, but exact. We illustrate the test with three examples for the case in which the size parameter equals one, i.e., the geometric distribution. We also include some simulations in order to check the power of the procedures. The new test seems to be the first exact negative binomial goodness-of-fit test for which critical values are available without simulation or exhaustive enumeration.

Keywords: conditional test, Cramér-von Mises statistics, Anderson-Darling statistics, goodness of fit

*Author for Correspondence: E-mail: arnabhazra09@gmail.com

INTRODUCTION
Let X_1, ..., X_n be a random sample from a distribution with cumulative distribution function (CDF) F. In its simplest form, the Kolmogorov-Smirnov test is an exact test of H_0: F = F_0 against H_1: F ≠ F_0, where F_0 is a fully specified continuous CDF. The test statistic is

$$D=\sup_x\big|\hat F_n(x)-F_0(x)\big|,$$

where $\hat F_n(x)$ is the empirical distribution function obtained from the data X_1, ..., X_n, and the level-$\alpha$ version of the test consists of rejecting H_0 when $D>d_{n,\alpha}$, where $d_{n,\alpha}$ satisfies $P_G(D>d_{n,\alpha})=\alpha$ for all continuous CDFs G. If F_0 is not continuous, then rejecting H_0 when $D>d_{n,\alpha}$ leads to a conservative test. A less conservative exact test can be obtained by tailoring the critical value to the particular discrete F_0 [1, 2]. When testing the composite null hypothesis that F comes from a particular parametric family, additional modifications are needed. If the parametric family is a location, scale, or location-scale family and if the parameters of the family are estimated in an appropriate way, then an exact test can still be obtained [3]. However, the critical values vary from one parametric family to another. Simulation studies were used to obtain appropriate critical values for testing whether F is normal [4], and the same was done for testing whether F is exponential [5]. The procedure for testing H_0: F is normal against H_1: F is not normal consists of estimating $\mu$ and $\sigma$ using $\hat\mu=\bar X$ and $\hat\sigma=s$, computing the test statistic

$$D=\sup_x\big|\hat F_n(x)-\Phi\big((x-\hat\mu)/\hat\sigma\big)\big|,$$

where $\Phi(\cdot)$ is the standard normal CDF, and then rejecting H_0 if $D>d_{n,\alpha}$, where $d_{n,\alpha}$ satisfies $P_G(D>d_{n,\alpha})=\alpha$ for all normal CDFs G [4]. The negative binomial families of distributions are not location, scale, or location-scale families. Thus, like the Poisson family, the procedure suggested in [3] does not apply. An approximate Kolmogorov-Smirnov goodness-of-fit test for the Poisson distribution with unknown mean was developed [6]. That procedure for testing H_0: F is Poisson against H_1: F is not Poisson consists of estimating $\lambda$ by $\hat\lambda=\bar X$, computing

$$D=\sup_x\big|\hat F_n(x)-F_{\hat\lambda}(x)\big|,$$

where $F_{\hat\lambda}(x)$ is the CDF for the Poisson($\hat\lambda$) distribution, and then rejecting H_0 if D


exceeds the critical value available in the five tables of [6]. To the best of our knowledge, no such literature is available for the negative binomial distribution, and even if a similar approach were available and tables had been prepared, one could eliminate the need for tables by doing a bootstrap Kolmogorov-Smirnov test [7]. But neither of these is an exact test, and simulations are required to obtain appropriate critical values. An exact Kolmogorov-Smirnov goodness-of-fit test for the Poisson distribution with unknown mean was developed, together with an algorithm for obtaining p-values and exact critical levels, so that critical values are available without simulation or exhaustive enumeration [8]. When X_1, ..., X_n is a random sample from the negative binomial (M, p) distribution, the sample total T = X_1 + ... + X_n is a sufficient statistic for p. For M = 1, the negative binomial (1, p) distribution is the same as the geometric (p) distribution, so the theory developed in this article also covers the geometric distribution with unknown probability of success. As a result, a variety of conditional exact tests of H_0: F is negative binomial with size M against H_1: F is not negative binomial with size M (M is known in our case) can be created using the conditional distribution of (X_1, ..., X_n) given T. A general method for creating conditional exact goodness-of-fit tests was described in [9], and an interesting conditional exact test for the binomial, Poisson, geometric and negative binomial cases was developed in [10]. However, conducting these tests still requires either simulation or exhaustive enumeration. In this paper, we develop an exact Kolmogorov-Smirnov goodness-of-fit test for the negative binomial distribution with unknown probability of success. We obtain this test by conditioning on the sample total T as described above, and our conditional test statistic is

$$D=\max_x\big|\hat F_n(x)-E[\hat F_n(x)\mid T]\big|.$$

Critical values for this test could be obtained using the simulation ideas in [9]. However, we instead obtain p-values and exact critical values using an algorithm proposed for the Poisson distribution [8]. As a result, we obtain what seems to be the first exact negative binomial goodness-of-fit test for which critical

values are available without simulation or exhaustive enumeration. We explain the test in detail in Section 2, and we develop the algorithm needed for computing critical values in Section 3. We explore some properties of the test in Section 4, and we illustrate the test with three examples in Section 5. We compare the power of the test to that of other conditional tests via a simulation study in Section 6, and we give our conclusions in Section 7.

The Conditional Kolmogorov-Smirnov Test
Suppose that X_1, ..., X_n is a random sample from the negative binomial (M, p) distribution. Let T = X_1 + ... + X_n be the sample total. It then follows that, regardless of the value of p, the conditional distribution of (X_1, ..., X_n) given T = t has probability mass function (PMF)

$$P(X_1=x_1,\ldots,X_n=x_n\mid T=t)=\prod_{i=1}^{n}\binom{x_i+M-1}{x_i}\Big/\binom{t+nM-1}{t},\qquad x_1+\cdots+x_n=t,$$

which is a Dirichlet-multinomial density with parameters (M, ..., M) and n classes; that is, (X_1, ..., X_n) given T = t follows a multinomial distribution with parameters (t; q_1, ..., q_n), where (q_1, ..., q_n) follows a Dirichlet distribution with parameters (M, ..., M).

Suppose now that T = t. For 0 ≤ i ≤ t, define N_i to be the number of times i appears in the list X_1, ..., X_n. The vector (N_0, ..., N_t) then satisfies N_0 + ... + N_t = n and 0N_0 + 1N_1 + ... + tN_t = t. Using the relationship between the Dirichlet-multinomial distribution and the negative hypergeometric distribution, we have that for 0 ≤ j ≤ t,

$$P(X_1=j\mid T=t)=\binom{M+j-1}{j}\binom{(n-1)M+t-j-1}{t-j}\Big/\binom{nM+t-1}{t}.$$

Consequently, we have that for 0 ≤ x ≤ t,

$$F_t(x):=\sum_{j=0}^{x}\binom{M+j-1}{j}\binom{(n-1)M+t-j-1}{t-j}\Big/\binom{nM+t-1}{t}.\qquad\ldots(1)$$

Let $\hat F_n(x)$ be the empirical distribution function obtained from the data X_1, ..., X_n. We then have, for integers 0 ≤ x ≤ t, that

$$\hat F_n(x)=\frac{1}{n}\sum_{i=1}^{n}I(X_i\le x)=\frac{1}{n}\sum_{j=0}^{x}N_j,$$

where I(A) is the indicator for the event A.
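As a concrete illustration (a minimal sketch in Python rather than the R code used later in the paper; the function names are ours), the conditional cell probabilities and the conditional expectation in Eq. (1) can be evaluated directly with exact binomial coefficients:

```python
from math import comb

def cond_pmf(j, M, n, t):
    # P(X_1 = j | T = t) for a negative binomial (M, p) sample: a negative
    # hypergeometric probability, free of the unknown p
    return comb(M + j - 1, j) * comb((n - 1) * M + t - j - 1, t - j) / comb(n * M + t - 1, t)

def cond_mean_edf(x, M, n, t):
    # F_t(x) = E[F_hat_n(x) | T = t], the conditional expectation in Eq. (1)
    return sum(cond_pmf(j, M, n, t) for j in range(x + 1))

# For M = 1 (the geometric case) and j = 0 this reduces to (n - 1)/(n + t - 1).
```

For n = 150 and t = 172 (the red mites data of Section 5), cond_pmf(0, 1, 150, 172) equals 149/321 ≈ 0.46417, which matches the first "Expected" entry in Table 3.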


Thus, by Eq. (1),

$$E[\hat F_n(x)\mid T=t]=\frac{1}{n}\sum_{i=1}^{n}P(X_i\le x\mid T=t)=F_t(x),$$

and the conditional Kolmogorov-Smirnov test statistic is given by

$$D=\max_{0\le x\le t}\big|\hat F_n(x)-F_t(x)\big|.$$

Consider computing $P(D\ge d\mid T=t)$ for some value $d>0$. This is equivalent to computing $1-P(D<d\mid T=t)$, and having $D<d$ requires that $n\hat F_n(i)$ be within $nd$ of its conditional expected value $nF_t(i)$ at each point $i\in\{0,1,\ldots,t-1\}$. Thus,

$$P(D<d\mid T=t)=P\big(a_i\le N_0+\cdots+N_i\le b_i \text{ for } i=0,\ldots,t-1 \mid T=t\big),\qquad\ldots(2)$$

where, for $0\le i\le t-1$, $a_i$ is the smallest integer strictly larger than $n(F_t(i)-d)$ and $b_i$ is the largest integer strictly smaller than $n(F_t(i)+d)$. Probabilities of the form (2) can be computed using the algorithm described in Section 3. If d is the observed value of the test statistic, then $P(D\ge d\mid T=t)$ is the p-value for the conditional test. Thus, the test can be carried out by computing $P(D\ge d\mid T=t)$, comparing this value to $\alpha$, and rejecting H_0: F is negative binomial with fixed number of successes M if and only if $P(D\ge d\mid T=t)\le\alpha$. To obtain the critical value $d_\alpha$ corresponding to a particular choice of M, n, t and $\alpha$, we use the bisection root-finding algorithm to find a value $d_\alpha$ (not unique) such that $P(D\ge d_\alpha\mid T=t)$ is as close as possible to $\alpha$ without exceeding $\alpha$. Since the test statistic D has a discrete distribution, only a discrete set of $\alpha$-levels can be achieved exactly.

Computation of p-value
As in Section 2, suppose that T = t and define N_i to be the number of times i appears in the list X_1, ..., X_n. The probability that the list is a specific sequence in which i appears c_i times for i = 0, ..., t is

$$\prod_{i=0}^{t}\binom{M+i-1}{i}^{c_i}\Big/\binom{nM+t-1}{t},$$

and the number of sequences in which i appears c_i times for i = 0, ..., t is $n!/(c_0!\cdots c_t!)$. Thus

$$P\big((N_0,\ldots,N_t)=(c_0,\ldots,c_t)\big)=\frac{n!}{c_0!\cdots c_t!}\,\prod_{i=0}^{t}\binom{M+i-1}{i}^{c_i}\Big/\binom{nM+t-1}{t}.\qquad\ldots(3)$$

Now we wish to compute $P(a_i\le N_0+\cdots+N_i\le b_i \text{ for } i=0,\ldots,t-1)$, where the $a_i$ and $b_i$ are integer bounds that satisfy $0\le a_i\le b_i\le n$ for $0\le i\le t-1$. We perform this using the expressions obtained here with Frey (2012)'s algorithm [8].

Frey (2012)'s Algorithm
For $0\le k\le t$, $0\le m\le n$, and $0\le w\le t$, we define $A_k(m,w)$ to be the set of all vectors $(c_0,\ldots,c_k)$ such that (i) $c_i\ge 0$ for $0\le i\le k$, (ii) $a_i\le c_0+\cdots+c_i\le b_i$ for $0\le i\le\min(k,t-1)$, (iii) $c_0+\cdots+c_k=m$, and (iv) $0c_0+1c_1+\cdots+kc_k=w$. If we then set

$$g_k(m,w)=\sum_{(c_0,\ldots,c_k)\in A_k(m,w)}\ \prod_{i=0}^{k}\frac{1}{c_i!}\binom{M+i-1}{i}^{c_i},$$

it follows from Equation (3) that

$$P\big(a_i\le N_0+\cdots+N_i\le b_i \text{ for } i=0,\ldots,t-1\big)=n!\,g_t(n,t)\Big/\binom{nM+t-1}{t},$$

and we can obtain the value $g_t(n,t)$ through recursive calculations. When k = 0, condition (iv) forces w = 0, and

$$g_0(m,w)=\begin{cases}1/m! & \text{if } w=0 \text{ and } a_0\le m\le b_0,\\ 0 & \text{otherwise.}\end{cases}\qquad\ldots(4)$$

Given the values $g_{k-1}(m,w)$ for some fixed $k\ge 1$, we can compute the values $g_k(m,w)$. To see this, note that to obtain a vector $(c_0,\ldots,c_k)\in A_k(m,w)$, we must start with a vector $(c_0,\ldots,c_{k-1})$ that belongs to $A_{k-1}(m-c_k,\,w-kc_k)$ for some $c_k\ge 0$ and then add the element $c_k$. If the added element is $c_k$, then $(c_0,\ldots,c_{k-1})$ must satisfy $c_0+\cdots+c_{k-1}=m-c_k$ and $0c_0+\cdots+(k-1)c_{k-1}=w-kc_k$. In addition, $g_{k-1}(m-c_k,\,w-kc_k)$ is non-zero only when $a_{k-1}\le m-c_k\le b_{k-1}$, and only when $w-kc_k\ge 0$. Thus, letting $\lfloor\cdot\rfloor$ be the greatest integer function, we have that

$$g_k(m,w)=\sum_{c_k=\max(0,\,m-b_{k-1})}^{\min(m-a_{k-1},\,\lfloor w/k\rfloor)}\frac{1}{c_k!}\binom{M+k-1}{k}^{c_k}\,g_{k-1}(m-c_k,\,w-kc_k),\qquad\ldots(5)$$

where the sum is taken to be zero if the lower bound exceeds the upper bound. Combining this recursive equation with our earlier observations yields the following algorithm.

Algorithm 1: Compute $P(a_i\le N_0+\cdots+N_i\le b_i \text{ for } i=0,\ldots,t-1)$ as follows.
Step 1. Compute the values $g_0(m,w)$ using Equation (4).
Step 2. For $k=1,\ldots,t$, compute the values $g_k(m,w)$ from the values $g_{k-1}(m,w)$ using Equation (5).
Step 3. Obtain $P(a_i\le N_0+\cdots+N_i\le b_i \text{ for } i=0,\ldots,t-1)$ as $n!\,g_t(n,t)\big/\binom{nM+t-1}{t}$.
In this paper, we use Frey (2012)'s algorithm only to compute exact p-values and critical values for the conditional Kolmogorov-Smirnov goodness-of-fit test. However, the algorithm could also be used to obtain exact p-values and critical values for other conditional negative binomial goodness-of-fit tests. The sole requirement for doing this is that it must be possible to write the exact p-value for the test in the form required by Algorithm 1. The algorithm was originally proposed for the Poisson distribution, where the expressions involved are less complicated [8]. Thus, one may wonder about its accuracy when applied to the negative binomial distribution. To assess this accuracy, we ran an experiment that involved computing known probabilities.

The experiment consisted of using Algorithm 1, implemented in R, to compute the probability P(0 ≤ N_0 + ... + N_i ≤ n for i = 0, ..., t-1), which is known to be 1, for many different choices of M, n and t. We consider this choice because it requires the maximum number of calculations for a specific choice of M, n and t. Some representative results are given in Table 1, which shows the absolute errors (in scientific notation) that resulted when this probability was computed for M = 1, 5 and 20; n = 50, 100, 200 and 400; and t = n/2, n, 2n and 4n. We considered the case M = 1 because it describes the geometric distribution, and the other cases cover a wide range of values of M. The choices of n and t also cover a wide range of values. Table 1 shows that increasing M, n or t tends to reduce accuracy, but it also shows that all of the errors are quite small. Thus, it seems clear that p-values obtained from Algorithm 1 will typically be accurate to many more decimal places than the 3 or 4 that would usually be reported. While studying the accuracy of Algorithm 1, we also studied the speed of the algorithm. Table 2 reports the times (in seconds) required to compute each of the probabilities involved in creating Table 1. These calculations were done in R on a Dell desktop. We see that the time required for the algorithm increases dramatically as n and t increase, but not with M, with the longest calculation requiring around 4 min of computing time for M = 20


with the largest values of n and t considered. Note, however, that the probabilities computed here are the most time-consuming ones we would ever encounter for a given choice of M, n and t. The time required to compute a typical p-value would usually be significantly smaller.

Table 1: Absolute numerical errors when Algorithm 1 is used to compute the probability P(0 ≤ N_0 + ... + N_i ≤ n for i = 0, ..., t-1), which is known to be 1, for different values of M, n and t.

M = 1
n   | t=n/2    | t=n      | t=2n     | t=4n
50  | 2.02E-14 | 4.34E-14 | 3.22E-13 | 2.99E-13
100 | 1.62E-14 | 8.83E-14 | 6.11E-13 | 1.64E-13
200 | 1.14E-13 | 1.57E-13 | 1.84E-13 | 1.41E-12
400 | 5.68E-13 | 2.58E-13 | 2.80E-13 | 3.67E-12

M = 5
n   | t=n/2    | t=n      | t=2n     | t=4n
50  | 2.91E-13 | 8.62E-14 | 2.37E-13 | 4.49E-13
100 | 1.93E-13 | 1.01E-13 | 2.19E-12 | 1.50E-12
200 | 7.44E-14 | 3.40E-13 | 9.96E-12 | 4.76E-12
400 | 1.95E-12 | 1.36E-12 | 1.81E-11 | 9.68E-13

M = 20
n   | t=n/2    | t=n      | t=2n     | t=4n
50  | 9.28E-13 | 4.96E-13 | 8.00E-14 | 1.87E-13
100 | 2.57E-13 | 1.62E-12 | 3.12E-14 | 2.55E-13
200 | 3.01E-12 | 4.44E-12 | 2.72E-13 | 6.35E-14
400 | 7.19E-13 | 6.69E-12 | 3.45E-13 | 3.94E-13

Note: The errors are reported in scientific notation, so that 2.02E-14, for example, is 2.02 × 10^-14.

Properties of the Test
Since the test statistic has a discrete distribution, it is usually not possible to obtain an exact conditional level-α test. Instead, we use the conditional test with the largest level not exceeding α. The critical value for this test is the value d_α from Section 2, and the exact conditional level is P(D ≥ d_α | T = t). In this section, we examine how the values d_α change with M, n and t, and we also examine how close the exact levels come, both conditionally and unconditionally, to achieving the desired level. The exact conditional level for a given test is computed using Algorithm 1 from Section 3, and the unconditional level of the test for a fixed M, for a given true negative binomial probability p, is obtained by averaging the conditional levels over the distribution of T. We did calculations for several different choices of α, M, n and t, and we present a representative subset of our results here. Figure 1 shows critical values for conditional level-0.05 Kolmogorov-Smirnov tests as a function of the sample total t. In each figure, the left-hand plots are for n = 20, and the right-hand plots are for n = 40. We see from the figure that when the sample mean is small, the critical values are also small. As t increases, the critical values tend to increase, but the rate of increase is slow once the sample mean exceeds 1. We also see the impact of discreteness in Figure 1. In particular, we see that the critical values


Table 2: Times in seconds required for performing the calculations reported in Table 1.

M = 1
n   | t=n/2 | t=n   | t=2n  | t=4n
50  | 0.01  | 0.03  | 0.09  | 0.25
100 | 0.05  | 0.12  | 0.42  | 1.67
200 | 0.20  | 0.82  | 4.18  | 21.59
400 | 1.95  | 11.81 | 56.04 | 236.87

M = 5
n   | t=n/2 | t=n   | t=2n  | t=4n
50  | 0.01  | 0.03  | 0.06  | 0.22
100 | 0.03  | 0.11  | 0.40  | 1.65
200 | 0.21  | 0.79  | 3.90  | 21.45
400 | 1.92  | 11.44 | 55.86 | 244.89

M = 20
n   | t=n/2 | t=n   | t=2n  | t=4n
50  | 0.02  | 0.04  | 0.08  | 0.22
100 | 0.05  | 0.12  | 0.40  | 1.64
200 | 0.20  | 0.81  | 3.82  | 20.75
400 | 1.95  | 11.65 | 55.33 | 242.98

do not increase smoothly with t, but instead follow a jagged pattern. By comparing the left-hand and right-hand plots in Figure 1, we see that the critical value corresponding to a given t tends to be larger when n = 20 than when n = 40 for all three values of M considered. Figure 2 shows exact conditional levels for level-0.05 Kolmogorov-Smirnov tests as a function of t, and Figure 3 shows exact unconditional levels for level-0.05 Kolmogorov-Smirnov tests as a function of the true negative binomial failure probability 1-p. We see from Figure 2 that when t is small, many of the conditional levels are much less than 0.05. However, once t is large, most of the conditional levels are 0.04 or higher. We see from Figure 3 that the conservatism of the conditional tests makes the test also conservative in an unconditional sense. Indeed, Figure 3 may be thought of as a smoothed, somewhat rescaled version of Figure 2 for large values of M. We see from Figure 3 that when the population mean is small, the true level of the test tends to be much less than 0.05. However, when the population mean is larger, the level of the test tends to be 0.04 or greater. Thus, in that case, the test is only mildly conservative.

Data Examples
In this section, we apply the new conditional test for M = 1 (i.e., the geometric distribution) to data on counts of the number of European red mites on apple leaves [11], data on counts of the number of borers in each plot under a specific treatment [12], and data on the number of accidents experienced by


Fig. 1: Critical Values for Conditional Level-0.05 KolmogorovSmirnov Tests as a Function of the Sample Total t. The Left Plots are for n = 20 and Right Plots for n = 40 (M=1, 5, 20).

Fig. 2: Conditional exact levels for conditional level-0.05 KolmogorovSmirnov tests as a function of the sample total t . The left plots are for n = 20 and right plots for n = 40 (M=1,5, 20).


Fig. 3: Unconditional Exact Levels for Conditional Level-0.05 Kolmogorov-Smirnov Tests as a Function of the True Negative Binomial Failure Probability 1-p. The Left Plots are for n = 20 and Right Plots for n = 40 (M = 1, 5, 20).

machinists [13]. For each data set, we computed a p-value using the technique described in Section 2. We also computed the conditional critical value appropriate for testing significance at level 0.05. The three examples include one case where we find no reason to doubt the negative binomial hypothesis, one borderline case, and one case where the hypothesis can be rejected at level 0.05. The data collected in [11] are counts of the number of European red mites on a total of 150 apple leaves; 25 leaves were selected at random from each of the McIntosh trees in a single orchard receiving the same spray treatment, and the number of adult females was counted on each leaf. The counts appear in Table 3, which also gives F̂(x) and, since t = 172, the conditional expectation E[F̂(x) | T = 172]. The maximum absolute difference between F̂(x) and its conditional expected value occurs when x = 3, and it gives D ≈ 0.0189. This corresponds to a p-value of 0.8416616. Thus, there is very little evidence that the data are not geometric. The cut-off for significance at level 0.05 when n = 150 and t = 172 is d_0.05 = 0.0597. The time required for calculating the p-value is 0.21 s.


Table 3: Counts of red mites on apple leaves [11].

Value | Count | Observed | Expected  | Absolute difference
0     | 70    | 0.4667   | 0.4641745 | 0.0025255
1     | 38    | 0.72     | 0.7136682 | 0.0063318
2     | 17    | 0.8333   | 0.8474094 | 0.0141094
3     | 10    | 0.9      | 0.9189063 | 0.0189063
4     | 9     | 0.96     | 0.9570229 | 0.0029771
5     | 3     | 0.98     | 0.9772874 | 0.0027126
6     | 2     | 0.9933   | 0.9880308 | 0.0052692
7     | 1     | 1        | 0.9937105 | 0.0062895

Notes: The table gives the observed F̂(x), the conditional expected value E[F̂(x) | T = t], and the absolute difference between the two. The maximum, D ≈ 0.0189, occurs at x = 3.

The data obtained in [12] are counts of the number of borers per hill, where the total land was divided into 15 randomized blocks and eight hills of corn were selected at random from each block. The counts appear in Table 4, which also gives F̂(x) and, since t = 380, the conditional expectation E[F̂(x) | T = 380]. The maximum absolute difference between F̂(x) and its conditional expected value occurs when x = 2, and it gives D ≈ 0.0925. This corresponds to a p-value of 0.01586. Thus, there is some evidence that the data are not geometric, enough to reject the geometric hypothesis at level 0.05, but not enough for us to reject at level 0.01. The cut-off for significance at level 0.05 when n = 120 and t = 380 is d_0.05 = 0.0788, and the cut-off for significance at level 0.01 is d_0.01 = 0.0962. The time required for calculating the p-value is 0.78 s. The data obtained in [13] are counts of the number of accidents experienced by 414 machinists in three months. The counts appear in Table 5, which also gives F̂(x) and, since t = 200, the conditional expectation E[F̂(x) | T = 200]. The maximum absolute difference between F̂(x) and its conditional expected value occurs when x = 0, and it gives D ≈ 0.04124. This corresponds to a p-value of 0.001247. Thus, we have strong evidence that the data did not come from a geometric distribution. The cut-off for significance at level 0.05 when n = 414 and t = 200 is d_0.05 = 0.02639. The time required for calculating the p-value is less than 0.64 s.
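The red mites calculation can be reproduced in a few lines (a Python sketch, not the paper's R code; for M = 1 the conditional pmf has the simple closed form used below):

```python
from math import comb

def geom_cond_cdf(x, n, t):
    # E[F_hat(x) | T = t] when M = 1: P(X_1 = j | T = t) = C(n+t-j-2, t-j) / C(n+t-1, t)
    return sum(comb(n + t - j - 2, t - j) for j in range(x + 1)) / comb(n + t - 1, t)

counts = [70, 38, 17, 10, 9, 3, 2, 1]          # red mites per leaf, values 0, ..., 7 (Table 3)
n = sum(counts)                                 # 150 leaves
t = sum(i * c for i, c in enumerate(counts))    # sample total: 172 mites
cum, D = 0, 0.0
for x, c in enumerate(counts):
    cum += c
    D = max(D, abs(cum / n - geom_cond_cdf(x, n, t)))
# values above 7 contribute only |1 - F_t(x)| < 0.0063, so the observed range suffices
print(round(D, 4))  # 0.0189, attained at x = 3
```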

Table 4: Counts of borers per hill [12].

Value | Count | Observed  | Expected  | Absolute difference
0     | 24    | 0.2       | 0.238477  | 0.038477
1     | 16    | 0.3333333 | 0.4204473 | 0.087114
2     | 16    | 0.4666667 | 0.5592135 | 0.0925468
3     | 18    | 0.6166667 | 0.6649667 | 0.0483
4     | 15    | 0.7416667 | 0.74551   | 0.0038433
5     | 9     | 0.8166667 | 0.8068143 | 0.0098524
6     | 6     | 0.8666667 | 0.8534453 | 0.0132214
7     | 5     | 0.9083333 | 0.8888925 | 0.0194408
8     | 3     | 0.9333333 | 0.9158208 | 0.0175125
9     | 4     | 0.9666667 | 0.9362643 | 0.0304024
10    | 3     | 0.9916667 | 0.9517746 | 0.0398921
11    | 0     | 0.9916667 | 0.9635345 | 0.0281322
12    | 1     | 1         | 0.972445  | 0.027555

Note: The table gives the observed F̂(x), the conditional expected value E[F̂(x) | T = t], and the absolute difference between the two. The maximum, D ≈ 0.0925, occurs at x = 2.


Table 5: Counts of accidents experienced by machinists [13].

Value | Count | Observed  | Expected  | Absolute difference
0     | 296   | 0.7149758 | 0.6737357 | 0.0412401
1     | 74    | 0.8937198 | 0.8939108 | 0.000191
2     | 26    | 0.9565217 | 0.9656208 | 0.0090991
3     | 8     | 0.9758454 | 0.9888972 | 0.0130518
4     | 4     | 0.9855072 | 0.9964267 | 0.0109195
5     | 4     | 0.9951691 | 0.998854  | 0.0036849
6     | 1     | 0.9975845 | 0.9996337 | 0.0020492
7     | 0     | 0.9975845 | 0.9998833 | 0.0022988
8     | 1     | 1         | 0.999963  | 3.7E-05

Note: The table gives the observed F̂(x), the conditional expected value E[F̂(x) | T = t], and the absolute difference between the two. The maximum, D ≈ 0.0412, occurs at x = 0.

Power comparisons
To assess the power of the new conditional test, we did a simulation study in which we compared the power of the test to that of tests based on three statistics: analogues of the Cramér-von Mises and Anderson-Darling statistics and a modified Cramér-von Mises statistic studied in [14]. We used level 0.05 and sample sizes 20 and 40 for three choices of M: 1, 5 and 20, and we ran all tests as conditional tests in order to make the power comparison as fair as possible. Power-based arguments for using these sorts of conditional tests were given in [15]. The three alternative statistics were defined as follows. As in Section 2, we let N_j be the number of observed values equal to j for 0 ≤ j ≤ t. For 0 ≤ j ≤ t, we let p_j be the conditional probability that a particular observed value is equal to j, and we let H_j be the conditional probability that a particular observed value is less than or equal to j; by our work in Section 2,

$$p_j=\binom{M+j-1}{j}\binom{(n-1)M+t-j-1}{t-j}\Big/\binom{nM+t-1}{t}.$$

We then, following Choulakian et al. [14], set $Z_j=\sum_{i=0}^{j}(N_i-np_i)$ for $0\le j\le t$, and, following Equations (1) and (3) and a modified version of Equation (1) in Section 4 of [14], define

$$W^2=\frac{1}{n}\sum_{j\in S}Z_j^2\,p_j,\qquad A^2=\frac{1}{n}\sum_{j\in S}\frac{Z_j^2\,p_j}{H_j(1-H_j)},$$

and the modified Cramér-von Mises statistic $W_m^2$, where the set S includes all j such that $0\le j\le t$ and $H_j(1-H_j)>0$. Each test rejects the negative binomial hypothesis when the test statistic is large, and we obtained appropriate conditional critical values for the tests based on W², A² and W_m² via simulation. For M = 1, the negative binomial distribution and the equal mixtures of negative binomial distributions are highly over-dispersed when the population mean is 5, which affects the relevant values of t; for each relevant value of t, with M = 1, 5, 20 and n = 20, 40, we generated 10,000 samples from the conditional distribution of (X_1, ..., X_n) given T = t. We then took the conditional critical value to be the smallest value that gave a simulated conditional level of 0.05 or less. Conditional critical values for the new conditional test (based on D) were obtained using the method from Sections 2 and 3. Modelling our power study on the dispersion test suggested by Fisher [16], we considered over-dispersed, under-dispersed and equally-dispersed (variance-to-mean ratio equal to one) distributions. The over-dispersed distributions that we considered were mixtures of Poisson distributions and mixtures of negative binomial distributions. We used the notation 0.5P(μ_1) + 0.5P(μ_2) to indicate an equal mixture of Poisson distributions with means μ_1 and μ_2, and we used the notation 0.5NB(M, p_1) + 0.5NB(M, p_2) to indicate an equal mixture of two negative binomial distributions, both having size M, with probabilities of success p_1 and p_2, respectively. The equally dispersed distributions that we considered were Poisson


distributions and Beta-binomial distributions. We used the notation P(μ) to indicate the Poisson distribution with mean μ, and we used the notation Beta-Bin(m, α, β) to indicate the distribution in which p is marginally Beta(α, β) and X given p is Bin(m, p), where the notation Bin(m, p) indicates the binomial distribution with size m and success probability p. The under-dispersed distributions that we considered were binomial distributions. For each choice of a distribution and a sample size, we simulated 10,000 samples and computed the simulated power for each of the four tests. Results for distributions with mean 1 are given in Table 6, and results for distributions with mean 5 are given in Table 7. We see from Tables 6 and 7 that, though the power of the conditional Kolmogorov-Smirnov test is less than that of the three other tests, in most cases it is comparable. One reason for this lower power is that the Kolmogorov-Smirnov test has an unconditional exact level smaller than that of the other three tests. Indeed, we see in both tables that the test based on D has smaller power than the other three tests when the data are actually negative binomial.
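The conditional simulation machinery of the power study can be sketched as follows (Python; the Pólya-urn draw is one standard way of sampling the Dirichlet-multinomial conditional law of Section 2, and all function names here are ours, not the paper's):

```python
import random
from math import comb

def sample_given_total(M, n, t, rng):
    # Draw (X_1, ..., X_n) given T = t by a Polya urn: the (s+1)-th of the t
    # units joins slot i with probability (M + x_i) / (nM + s)
    x = [0] * n
    for s in range(t):
        u = rng.random() * (n * M + s)
        acc = 0.0
        for i in range(n):
            acc += M + x[i]
            if u < acc:
                x[i] += 1
                break
        else:
            x[n - 1] += 1  # guard against floating-point round-off
    return x

def w2_statistic(sample, M, n, t):
    # discrete Cramer-von Mises analogue W^2 = (1/n) sum_j Z_j^2 p_j, using the
    # conditional cell probabilities p_j of Section 2
    denom = comb(n * M + t - 1, t)
    p = [comb(M + j - 1, j) * comb((n - 1) * M + t - j - 1, t - j) / denom
         for j in range(t + 1)]
    counts = [0] * (t + 1)
    for v in sample:
        counts[v] += 1
    Z = w2 = 0.0
    for j in range(t + 1):
        Z += counts[j] - n * p[j]   # cumulative deviation Z_j
        w2 += Z * Z * p[j] / n
    return w2

def simulated_critical_value(M, n, t, level=0.05, reps=2000, seed=1):
    # smallest simulated value whose exceedance frequency is `level` or less
    rng = random.Random(seed)
    stats = sorted(w2_statistic(sample_given_total(M, n, t, rng), M, n, t)
                   for _ in range(reps))
    return stats[int((1 - level) * reps)]
```

The same loop, with W² replaced by A², W_m² or D, reproduces the critical-value step of the power study; the test based on D instead uses the exact method of Sections 2 and 3.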

Table 6: Simulated powers for conditional exact tests based on W², A², W_m² and D for distributions with mean 1.

M = 1
Distribution                  | R    | W²(n=20) | A²(n=20) | W_m²(n=20) | D(n=20) | W²(n=40) | A²(n=40) | W_m²(n=40) | D(n=40)
0.5P(0.293)+0.5P(1.707)       | 1.50 | 0.0552 | 0.0516 | 0.0528 | 0.0412 | 0.0787 | 0.0760 | 0.0749 | 0.0649
0.5P(0)+0.5P(2)               | 2.00 | 0.1141 | 0.1103 | 0.1148 | 0.0858 | 0.2445 | 0.2399 | 0.2587 | 0.1908
0.5NB(1,0.667)+0.5NB(1,0.4)   | 2.50 | 0.0789 | 0.0864 | 0.0847 | 0.0573 | 0.1040 | 0.1108 | 0.1147 | 0.0841
0.5NB(1,0.773)+0.5NB(1,0.369) | 3.00 | 0.1563 | 0.1717 | 0.1712 | 0.1179 | 0.2519 | 0.2670 | 0.2704 | 0.2136
P(1)                          | 1.00 | 0.3337 | 0.3201 | 0.3142 | 0.2705 | 0.6546 | 0.6491 | 0.6283 | 0.6022
Beta-Bin(3,1,2)               | 1.00 | 0.1979 | 0.2049 | 0.2232 | 0.1528 | 0.4637 | 0.4986 | 0.5298 | 0.3943
Beta-Bin(4,2,6)               | 1.00 | 0.2689 | 0.2590 | 0.2594 | 0.2074 | 0.5716 | 0.5783 | 0.5707 | 0.5104
Bin(3,1/3)                    | 0.67 | 0.7276 | 0.7244 | 0.7300 | 0.6367 | 0.9735 | 0.9771 | 0.9759 | 0.9602
Bin(2,1/2)                    | 0.50 | 0.9171 | 0.9290 | 0.9331 | 0.8590 | 0.9992 | 0.9997 | 0.9997 | 0.9975
NB(1,1/2)                     | 2.00 | 0.0449 | 0.0458 | 0.0459 | 0.0329 | 0.0484 | 0.0481 | 0.0481 | 0.0384

M = 5
Distribution                  | R    | W²(n=20) | A²(n=20) | W_m²(n=20) | D(n=20) | W²(n=40) | A²(n=40) | W_m²(n=40) | D(n=40)
0.5P(0.293)+0.5P(1.707)       | 1.50 | 0.1197 | 0.1303 | 0.1281 | 0.1021 | 0.2088 | 0.2231 | 0.2170 | 0.1844
0.5P(0)+0.5P(2)               | 2.00 | 0.6080 | 0.6054 | 0.6120 | 0.5617 | 0.9041 | 0.9004 | 0.9008 | 0.8815
0.5NB(5,0.909)+0.5NB(5,0.769) | 1.50 | 0.0922 | 0.1100 | 0.1038 | 0.0800 | 0.1287 | 0.1535 | 0.1434 | 0.1132
0.5NB(5,0.965)+0.5NB(5,0.734) | 2.00 | 0.3383 | 0.3784 | 0.3645 | 0.3113 | 0.6061 | 0.6362 | 0.6251 | 0.5796
P(1)                          | 1.00 | 0.0562 | 0.0488 | 0.0544 | 0.0512 | 0.0863 | 0.0791 | 0.0821 | 0.0733
Beta-Bin(3,1,2)               | 1.00 | 0.0462 | 0.0384 | 0.0433 | 0.0327 | 0.0605 | 0.0636 | 0.0644 | 0.0519
Beta-Bin(4,2,6)               | 1.00 | 0.0415 | 0.0345 | 0.0390 | 0.0338 | 0.0558 | 0.0532 | 0.0556 | 0.0476
Bin(3,1/3)                    | 0.67 | 0.2283 | 0.2148 | 0.2302 | 0.2175 | 0.5200 | 0.5261 | 0.5343 | 0.4730
Bin(2,1/2)                    | 0.50 | 0.5062 | 0.4875 | 0.5272 | 0.4737 | 0.8910 | 0.9297 | 0.9210 | 0.8577
NB(5,5/6)                     | 1.20 | 0.0477 | 0.0463 | 0.0484 | 0.0396 | 0.0480 | 0.0488 | 0.0478 | 0.0396

M = 20
Distribution                    | R    | W²(n=20) | A²(n=20) | W_m²(n=20) | D(n=20) | W²(n=40) | A²(n=40) | W_m²(n=40) | D(n=40)
0.5P(0.293)+0.5P(1.707)         | 1.50 | 0.2029 | 0.2476 | 0.2301 | 0.2068 | 0.3641 | 0.4050 | 0.3971 | 0.3442
0.5P(0)+0.5P(2)                 | 2.00 | 0.7163 | 0.7460 | 0.7385 | 0.7293 | 0.9608 | 0.9642 | 0.9653 | 0.9592
0.5NB(20,0.973)+0.5NB(20,0.933) | 1.25 | 0.0771 | 0.1008 | 0.0890 | 0.0761 | 0.0984 | 0.1242 | 0.1133 | 0.0904
0.5NB(20,0.983)+0.5NB(20,0.924) | 1.50 | 0.1825 | 0.2369 | 0.2143 | 0.1832 | 0.3269 | 0.3759 | 0.3627 | 0.3114
P(1)                            | 1.00 | 0.0405 | 0.0362 | 0.0387 | 0.0317 | 0.0514 | 0.0479 | 0.0487 | 0.0403
Beta-Bin(3,1,2)                 | 1.00 | 0.0715 | 0.0641 | 0.0660 | 0.0515 | 0.1018 | 0.0971 | 0.1002 | 0.0695
Beta-Bin(4,2,6)                 | 1.00 | 0.0430 | 0.0371 | 0.0413 | 0.0350 | 0.0461 | 0.0431 | 0.0455 | 0.0332
Bin(3,1/3)                      | 0.67 | 0.1454 | 0.1134 | 0.1366 | 0.1192 | 0.3019 | 0.3085 | 0.3122 | 0.2859
Bin(2,1/2)                      | 0.50 | 0.3570 | 0.3225 | 0.3570 | 0.3090 | 0.7520 | 0.8184 | 0.8108 | 0.7130
NB(20,20/21)                    | 1.05 | 0.0464 | 0.0470 | 0.0464 | 0.0403 | 0.0464 | 0.0467 | 0.0468 | 0.0375


Table 7: Simulated powers for conditional exact tests based on W², A², W_m² and D for distributions with mean 5.

M = 1
Distribution                  | R    | W²(n=20) | A²(n=20) | W_m²(n=20) | D(n=20) | W²(n=40) | A²(n=40) | W_m²(n=40) | D(n=40)
0.5P(3.419)+0.5P(6.581)       | 1.50 | 0.9516 | 0.9509 | 0.9233 | 0.8510 | 1.0000 | 1.0000 | 0.9999 | 0.9985
0.5P(2.764)+0.5P(7.236)       | 2.00 | 0.7385 | 0.7425 | 0.6740 | 0.5648 | 0.9851 | 0.9869 | 0.9778 | 0.9299
0.5NB(1,0.205)+0.5NB(1,0.140) | 6.50 | 0.0593 | 0.0606 | 0.0618 | 0.0521 | 0.0621 | 0.0602 | 0.0668 | 0.0586
0.5NB(1,0.226)+0.5NB(1,0.132) | 7.00 | 0.0685 | 0.0732 | 0.0739 | 0.0615 | 0.0839 | 0.0868 | 0.0929 | 0.0757
P(5)                          | 1.00 | 0.9981 | 0.9979 | 0.9967 | 0.9878 | 1.0000 | 1.0000 | 1.0000 | 1.0000
Beta-Bin(7,1,0.4)             | 1.00 | 0.4284 | 0.4568 | 0.4564 | 0.3921 | 0.7287 | 0.7630 | 0.7626 | 0.7392
Beta-Bin(8,2,1.2)             | 1.00 | 0.0472 | 0.0451 | 0.0444 | 0.0353 | 0.0497 | 0.0463 | 0.0455 | 0.0426
Bin(15,1/3)                   | 0.67 | 1.0000 | 1.0000 | 1.0000 | 0.9999 | 1.0000 | 1.0000 | 1.0000 | 1.0000
Bin(10,1/2)                   | 0.50 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
NB(1,1/6)                     | 6.00 | 0.0463 | 0.0450 | 0.0478 | 0.0419 | 0.0483 | 0.0482 | 0.0482 | 0.0433

M = 5
Distribution                  | R    | W²(n=20) | A²(n=20) | W_m²(n=20) | D(n=20) | W²(n=40) | A²(n=40) | W_m²(n=40) | D(n=40)
0.5P(3.419)+0.5P(6.581)       | 1.50 | 0.0593 | 0.0422 | 0.0481 | 0.0485 | 0.0978 | 0.0896 | 0.0936 | 0.0741
0.5P(2.764)+0.5P(7.236)       | 2.00 | 0.0628 | 0.0646 | 0.0665 | 0.0548 | 0.0792 | 0.0813 | 0.0847 | 0.0728
0.5NB(5,0.584)+0.5NB(5,0.437) | 2.50 | 0.0974 | 0.1282 | 0.1215 | 0.0890 | 0.1293 | 0.1623 | 0.1544 | 0.1111
0.5NB(5,0.628)+0.5NB(5,0.415) | 3.00 | 0.1974 | 0.2657 | 0.2477 | 0.1730 | 0.3367 | 0.4176 | 0.3987 | 0.2831
P(5)                          | 1.00 | 0.3029 | 0.2707 | 0.2785 | 0.2114 | 0.6189 | 0.6388 | 0.6332 | 0.4693
Beta-Bin(7,1,0.4)             | 1.00 | 0.3237 | 0.3630 | 0.3681 | 0.3284 | 0.5694 | 0.6078 | 0.6090 | 0.5917
Beta-Bin(8,2,1.2)             | 1.00 | 0.0454 | 0.0469 | 0.0450 | 0.0386 | 0.0480 | 0.0466 | 0.0452 | 0.0394
Bin(15,1/3)                   | 0.67 | 0.7059 | 0.6921 | 0.7004 | 0.5372 | 0.9739 | 0.9826 | 0.9810 | 0.8981
Bin(10,1/2)                   | 0.50 | 0.9189 | 0.9187 | 0.9229 | 0.7791 | 0.9993 | 0.9998 | 0.9998 | 0.9902
NB(5,1/2)                     | 2.00 | 0.0469 | 0.0498 | 0.0486 | 0.0447 | 0.0496 | 0.0500 | 0.0505 | 0.0448

M = 20
Distribution                    | R    | W²(n=20) | A²(n=20) | W_m²(n=20) | D(n=20) | W²(n=40) | A²(n=40) | W_m²(n=40) | D(n=40)
0.5P(3.419)+0.5P(6.581)         | 1.50 | 0.0956 | 0.1316 | 0.1229 | 0.0883 | 0.1208 | 0.1604 | 0.1540 | 0.1089
0.5P(2.764)+0.5P(7.236)         | 2.00 | 0.3203 | 0.4381 | 0.4191 | 0.2811 | 0.5712 | 0.6828 | 0.6660 | 0.5083
0.5NB(20,0.837)+0.5NB(20,0.767) | 1.50 | 0.0793 | 0.1090 | 0.1004 | 0.0763 | 0.0983 | 0.1445 | 0.1332 | 0.0920
0.5NB(20,0.865)+0.5NB(20,0.744) | 2.00 | 0.2473 | 0.3677 | 0.3411 | 0.2247 | 0.4357 | 0.5753 | 0.5479 | 0.3920
P(5)                            | 1.00 | 0.0672 | 0.0481 | 0.0543 | 0.0478 | 0.0849 | 0.0758 | 0.0803 | 0.0632
Beta-Bin(7,1,0.4)               | 1.00 | 0.2738 | 0.3084 | 0.3116 | 0.2685 | 0.4951 | 0.5239 | 0.5319 | 0.4938
Beta-Bin(8,2,1.2)               | 1.00 | 0.0431 | 0.0430 | 0.0425 | 0.0370 | 0.0469 | 0.0462 | 0.0474 | 0.0401
Bin(15,1/3)                     | 0.67 | 0.2375 | 0.2090 | 0.2240 | 0.1434 | 0.4937 | 0.5238 | 0.5262 | 0.3551
Bin(10,1/2)                     | 0.50 | 0.4928 | 0.4760 | 0.4953 | 0.3133 | 0.8672 | 0.9009 | 0.8992 | 0.7169
NB(20,0.8)                      | 1.25 | 0.0429 | 0.0461 | 0.0470 | 0.0421 | 0.0510 | 0.0495 | 0.0493 | 0.0482

DISCUSSION
We have developed an exact Kolmogorov-Smirnov goodness-of-fit test for the negative binomial distribution with unknown probability of success. The test involves conditioning on the sample total, and exact


critical values may be obtained without simulation or exhaustive enumeration. The test is slightly conservative, but the true level is close to the nominal level except when the unknown population mean is very small. Our power study in Section 6 suggests that, whether the mean of the unknown distribution is small or large, the power of the exact Kolmogorov-Smirnov test falls short of that provided by tests such as the conditional version of the Cramér-von Mises test recommended in [14], because discreteness keeps the test from attaining the desired level; in most cases, however, its power is comparable. The advantage of the new test over such alternative tests is that one can compute exact p-values using the method from Sections 2 and 3. Thus, it is not necessary to simulate or to consult tables of critical values.

REFERENCES
1. Conover W. J. Journal of the American Statistical Association. 1972; 67: 591–596.
2. Pettitt A. N., Stephens M. A. Technometrics. 1977; 19: 205–210.
3. David F. N., Johnson N. L. Biometrika. 1948; 35: 182–190.
4. Lilliefors H. W. Journal of the American Statistical Association. 1967; 62: 399–402.
5. Lilliefors H. W. Journal of the American Statistical Association. 1969; 64: 387–389.
6. Campbell D. B., Oprian C. A. Biometrical Journal. 1979; 21: 17–24.
7. Henze N. Canadian Journal of Statistics. 1996; 24: 81–93.
8. Frey J. Journal of Statistical Computation and Simulation. 2012; 82: 1023–1033.
9. Lockhart R. A. et al. Biometrika. 2007; 94: 992–998.
10. González-Barrios J. M. et al. Metrika. 2006; 64: 77–94.
11. Bliss C. I., Fisher R. A. Biometrics. 1953; 9: 176–200.
12. Stirrett et al. Scientific Agriculture. 1937; 17: 587–591.
13. Greenwood M., Yule G. U. Journal of the Royal Statistical Society. 1920; 83: 255–289.
14. Choulakian V. et al. Canadian Journal of Statistics. 1994; 22: 125–137.
15. O'Reilly F., Gracia-Medrano L. Communications in Statistics – Theory and Methods. 2006; 35: 541–549.
16. Kendall M. G., Stuart A. The Advanced Theory of Statistics. Vol. 2. 4th Edn. 1979.
