METHODS
$$D = \sum_{i=1}^{k} \frac{2(p_{1i} - p_{2i})^2}{p_{1i} + p_{2i}}. \tag{1}$$
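As a minimal sketch, the distance in Equation 1 can be computed for a single locus as follows, assuming the reading $D = \sum_i 2(p_{1i} - p_{2i})^2/(p_{1i} + p_{2i})$. The function name `genetic_distance` is hypothetical; the example frequencies are the three HinfI values listed for Tanabe Bay and Tomogashima Channel in Table 1. Skipping alleles absent from both samples is an assumption made here to avoid division by zero.

```python
def genetic_distance(p1, p2):
    """Sum over alleles of 2 * (p1i - p2i)^2 / (p1i + p2i) for one locus."""
    if len(p1) != len(p2):
        raise ValueError("both populations need the same number of alleles")
    total = 0.0
    for a, b in zip(p1, p2):
        if a + b > 0:  # assumption: alleles missing from both samples contribute nothing
            total += 2.0 * (a - b) ** 2 / (a + b)
    return total


# HinfI frequencies from Table 1 (single locus only)
tanabe = [0.458, 0.375, 0.167]
tomogashima = [0.411, 0.442, 0.147]

d12_hinfi = genetic_distance(tanabe, tomogashima)
print(round(d12_hinfi, 4))
```

This is a point calculation from sample frequencies for one locus, so it is not expected to reproduce the posterior means reported later in Table 2.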
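$$f(\mathbf{p}) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} p_i^{\alpha_i - 1}. \tag{2}$$
Here, $(\alpha_1, \ldots, \alpha_k)$ are regarded as hyperparameters specifying the prior distribution. We use this distribution as a prior for allele frequencies.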
The posterior distribution is obtained by multiplying the likelihood function, which is a multinomial distribution in this case, by the prior. The posterior distribution of p is then given by
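$$P(\mathbf{p} \mid \mathbf{n}) = \frac{\Gamma\left(\sum_{i=1}^{k} (\alpha_i + n_i)\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i + n_i)} \prod_{i=1}^{k} p_i^{\alpha_i + n_i - 1}.$$
The marginal likelihood of the counts is
$$\frac{\left(\sum_{i=1}^{k} n_i\right)!}{\prod_{i=1}^{k} n_i!} \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \int \prod_{i=1}^{k} p_i^{\alpha_i + n_i - 1}\, d\mathbf{p} = \frac{\left(\sum_{i=1}^{k} n_i\right)!}{\prod_{i=1}^{k} n_i!} \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\Gamma\left(\sum_{i=1}^{k} \alpha_i + \sum_{i=1}^{k} n_i\right)} \prod_{i=1}^{k} \frac{\Gamma(\alpha_i + n_i)}{\Gamma(\alpha_i)},$$
and for $J$ loci, with samples indexed by $h$,
$$L = \prod_{j=1}^{J} \prod_{h} C_{jh} \frac{\Gamma\left(\sum_{i=1}^{k_j} \alpha_{ij}\right)}{\Gamma\left(\sum_{i=1}^{k_j} \alpha_{ij} + \sum_{i=1}^{k_j} n_{hij}\right)} \prod_{i=1}^{k_j} \frac{\Gamma(\alpha_{ij} + n_{hij})}{\Gamma(\alpha_{ij})}. \tag{3}$$
The dispersion parameter is
$$\phi^2 = \frac{\sum_{i=1}^{k} n_i + \sum_{i=1}^{k} \alpha_i}{1 + \sum_{i=1}^{k} \alpha_i}. \tag{5}$$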
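We estimate $\sum_{j=1}^{J}(k_j - 1) + 1$ free parameters numerically, including $\phi^2$, which is assumed to be the same among loci, and $\alpha_{1j}, \ldots, \alpha_{k_j-1,j}$ for locus $j$, and $\alpha_{k_j j}$ is estimated from $\phi^2$ and $\sum_{i=1}^{k_j-1} \alpha_{ij}$.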
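The two quantities used in the estimation can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the likelihood is the single-locus, single-sample Dirichlet-multinomial term with the multinomial coefficient dropped (it does not depend on the hyperparameters), and the dispersion parameter is taken to be $\phi^2 = (n + \sum_i \alpha_i)/(1 + \sum_i \alpha_i)$ evaluated at the mean sample size. The HinfI hyperparameters and the four sample sizes come from Table 1.

```python
from math import exp, lgamma


def dirichlet_multinomial_loglik(counts, alpha):
    """Log marginal likelihood of one sample at one locus (coefficient omitted)."""
    n = sum(counts)
    a = sum(alpha)
    loglik = lgamma(a) - lgamma(a + n)
    for n_i, a_i in zip(counts, alpha):
        loglik += lgamma(a_i + n_i) - lgamma(a_i)
    return loglik


def dispersion(n, alpha):
    """phi^2 = (n + sum(alpha)) / (1 + sum(alpha)); n is the number of genes sampled."""
    a = sum(alpha)
    return (n + a) / (1.0 + a)


# HinfI hyperparameters and sample sizes from Table 1 (mtDNA, so n genes = n fish)
alpha_hinfi = [43.855, 43.872, 18.826]
mean_n = (72 + 95 + 93 + 90) / 4  # assumption: the mean sample size is used

phi2 = dispersion(mean_n, alpha_hinfi)

# Sanity check: one observation of allele 1 has probability alpha_1 / sum(alpha)
p_first = exp(dirichlet_multinomial_loglik([1, 0, 0], alpha_hinfi))
print(round(phi2, 2), round(p_first, 4))
```

In practice the log-likelihood would be summed over loci and samples and maximized numerically over the free parameters; the optimizer is left out here.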
The binomial and multinomial counts are assumed to be taken by simple random sampling, so the dispersion parameter $\phi^2$ is considered to indicate how far the variance exceeds that of a simple random sample. McCullagh and Nelder (1983) stated that "The simplest and perhaps the most common mechanism of overdispersion is clustering in the populations." Kitada et al. (1994) estimated the dispersion parameter for fish tag recovery data and showed that the variances of the estimated mortality rates were $\phi^2$ (= 14.73) times larger than those assuming the multinomial model, which was considered to be caused by the aggregation of the tagged fish in the fishing ground. In genetic data analysis, overdispersion corresponds to the variance of an
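$$I = \sqrt{\frac{2 n_{1\cdot} n_{2\cdot} \sum_{j=1}^{J} D_j}{(n_{1\cdot} + n_{2\cdot})\,\phi^2}} - \sqrt{2}\,\frac{\Gamma\left(\left(\sum_{j=1}^{J}(k_j - 1) + 1\right)/2\right)}{\Gamma\left(\sum_{j=1}^{J}(k_j - 1)/2\right)}. \tag{7}$$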
$$F_k = \frac{1}{k - 1} \sum_{i=1}^{k} \frac{2(p_{0i} - p_{ti})^2}{p_{0i} + p_{ti}}. \tag{8}$$
(9)
$$\hat{N}_e = \frac{t}{2\left[F_k - \phi^2/(2n_0) - \phi^2/(2n_t) + 1/N\right]}. \tag{10}$$
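The temporal estimator can be sketched as follows, assuming Equation 8 reads $F_k = \frac{1}{k-1}\sum_i 2(p_{0i} - p_{ti})^2/(p_{0i} + p_{ti})$ and Equation 10 reads $N_e = t / \{2[F_k - \phi^2/(2n_0) - \phi^2/(2n_t) + 1/N]\}$. The function names are hypothetical, and returning infinity when the denominator is not positive is a convention chosen for this sketch. The example plugs in the northern pike values quoted in the case study (mean $F_k$ = 0.0479, t = 4, sample sizes 110 and 72, 1/N omitted); the paper substitutes the whole posterior distribution of $F_k$, so a single plug-in value is only illustrative.

```python
import math


def f_k(p0, pt):
    """Standardized variance of allele frequency change for one locus."""
    k = len(p0)
    total = sum(2.0 * (a - b) ** 2 / (a + b) for a, b in zip(p0, pt) if a + b > 0)
    return total / (k - 1)


def ne_temporal(fk, t, n0, nt, phi2=1.0, census_n=None):
    """Effective size from Fk with sampling variances inflated by phi2."""
    denom = fk - phi2 / (2.0 * n0) - phi2 / (2.0 * nt)
    if census_n is not None:
        denom += 1.0 / census_n
    if denom <= 0:
        return math.inf  # assumption: a non-positive denominator is reported as infinite Ne
    return t / (2.0 * denom)


# Northern pike, 1977 (n = 110) versus 1993 (n = 72), plug-in posterior mean of Fk
ne_plain = ne_temporal(0.0479, t=4, n0=110, nt=72, phi2=1.0)
ne_over = ne_temporal(0.0479, t=4, n0=110, nt=72, phi2=5.51)
print(round(ne_plain, 1), ne_over)
```

With $\phi^2$ = 5.51 the inflated sampling variance exceeds the plug-in $F_k$, so the sketch returns infinity, which is the behavior discussed later under sample size.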
CASE STUDIES
TABLE 1
Allele frequencies of the mtDNA D-loop region from Tabata and Mizuta (1997) and estimated hyperparameters for four populations of red sea bream from eastern Japan

| | Tanabe Bay | Tomogashima Channel | Sea of Japan | Bingo Nada | Hyperparameter |
|---|---|---|---|---|---|
| Sample size | 72 | 95 | 93 | 90 | |
| HinfI | 0.458 | 0.411 | 0.376 | 0.378 | 43.855 |
| | 0.375 | 0.442 | 0.409 | 0.456 | 43.872 |
| | 0.167 | 0.147 | 0.215 | 0.166 | 18.826 |
| MspI | 0.903 | 0.863 | 0.882 | 0.867 | 92.864 |
| | 0.097 | 0.137 | 0.118 | 0.133 | 13.688 |
| TaqI | 0.958 | 0.811 | 0.806 | 0.856 | 91.248 |
| | 0.042 | 0.189 | 0.194 | 0.144 | 15.305 |
| RsaI | 0.097 | 0.137 | 0.215 | 0.178 | 17.015 |
| | 0.264 | 0.305 | 0.312 | 0.266 | 30.253 |
| | 0.306 | 0.316 | 0.312 | 0.289 | 32.871 |
| | 0.180 | 0.116 | 0.075 | 0.078 | 11.802 |
| | 0.153 | 0.126 | 0.086 | 0.189 | 14.613 |
TABLE 2
Means and SDs of the posterior distribution of the genetic distance for the red sea bream populations

| | Mean | SD |
|---|---|---|
| $D_{12}$ | 0.0585 | 0.0207 |
| $D_{13}$ | 0.0523 | 0.0196 |
| $D_{14}$ | 0.0530 | 0.0192 |
| $D_{23}$ | 0.0229 | 0.0112 |
| $D_{24}$ | 0.0241 | 0.0116 |
| $D_{34}$ | 0.0244 | 0.0118 |
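A posterior distribution of the distance like the one summarized in Table 2 can be approximated by simulation. The sketch below is an assumption-laden illustration rather than the authors' procedure: it treats one locus only (HinfI), takes the posterior for each population as Dirichlet with parameters $\alpha_i + n_i$, recovers the counts $n_i$ as frequency times sample size, and uses the single-locus distance $\sum_i 2(p_{1i} - p_{2i})^2/(p_{1i} + p_{2i})$. All function names are hypothetical.

```python
import random
import statistics


def sample_dirichlet(params, rng):
    """Draw one frequency vector from a Dirichlet via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def genetic_distance(p1, p2):
    """Single-locus distance: sum of 2 * (p1i - p2i)^2 / (p1i + p2i)."""
    return sum(2.0 * (a - b) ** 2 / (a + b) for a, b in zip(p1, p2) if a + b > 0)


def posterior_distance(alpha, freq1, n1, freq2, n2, n_draws=2000, seed=1):
    """Simulate the posterior of the distance between two populations."""
    rng = random.Random(seed)
    post1 = [a + f * n1 for a, f in zip(alpha, freq1)]  # Dirichlet(alpha + counts)
    post2 = [a + f * n2 for a, f in zip(alpha, freq2)]
    return [
        genetic_distance(sample_dirichlet(post1, rng), sample_dirichlet(post2, rng))
        for _ in range(n_draws)
    ]


# HinfI values from Table 1: Tanabe Bay (n = 72) and Tomogashima Channel (n = 95)
alpha_hinfi = [43.855, 43.872, 18.826]
draws = posterior_distance(
    alpha_hinfi, [0.458, 0.375, 0.167], 72, [0.411, 0.442, 0.147], 95
)
d_mean = statistics.mean(draws)
d_sd = statistics.stdev(draws)
print(round(d_mean, 4), round(d_sd, 4))
```

Because only one of the four loci is used, the simulated mean and SD are not comparable with the Table 2 entries.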
two alleles and two had three alleles. Following the data
processing techniques of Williamson and Slatkin
(1999), we also combined the two least common allelic
classes for the two loci with three alleles and created a
diallelic frequency set. To estimate the hyperparameters, we allocated sample sizes of 86 for 1961, 110 for
1977, and 72 for 1993 according to the diallelic frequencies for each locus to obtain the number of individuals
corresponding to the frequencies. Because the data had
one sample for each year, it was not possible to estimate
the hyperparameters for each locus. Therefore, we assumed that the seven loci were independent of each
other and had common hyperparameters. The two common hyperparameters and the dispersion parameter
were estimated at 16.232, 3.089, and 9.74, with $2n_{\cdot} = (172 + 220 + 144)/3 = 178.7$.
$F_k$ between 1977 and 1993 ranged from 0.0087 to 0.1264, with the mean and SD being 0.0479 and 0.0150, respectively. The posterior distribution of $F_k$ is given on the left side of Figure 5. The posterior distribution of the $N_e$ estimate was obtained by substituting the posterior distribution of $F_k$ into Equation 10, eliminating the term $1/N$, with $t = 4$, as given in Miller
TABLE 3
Means and 95% credibility regions of the posterior distribution of the standardized genetic distance for the red sea bream populations

| | Mean ($\phi^2 = 1.80^{a}$) | SD | 95% CR | Mean ($\phi^2 = 1.72^{b}$) | SD | 95% CR |
|---|---|---|---|---|---|---|
| $I_{12}$ | 0.4671 | 0.5774 | [−0.4819, 1.4232] | 0.5448 | 0.5914 | [−0.4272, 1.5240] |
| $I_{13}$ | 0.2706 | 0.5747 | [−0.6706, 1.2169] | 0.3435 | 0.5886 | [−0.6205, 1.3127] |
| $I_{14}$ | 0.2735 | 0.5567 | [−0.6259, 1.1977] | 0.3464 | 0.5702 | [−0.5747, 1.2931] |
| $I_{23}$ | −0.6241 | 0.5284 | [−1.4582, 0.2767] | −0.5729 | 0.5411 | [−1.4272, 0.3497] |
| $I_{24}$ | −0.5866 | 0.5332 | [−1.4226, 0.3163] | −0.5345 | 0.5461 | [−1.3907, 0.3903] |
| $I_{34}$ | −0.5768 | 0.5276 | [−1.4047, 0.3319] | −0.5244 | 0.5404 | [−1.3723, 0.4062] |
Figure 3. Posterior distributions of the standardized genetic distance ($I$) for the red sea bream populations, taking the point estimate of the dispersion parameter, 1.80 (left), and the 95% lower limit, 1.72 (right), into account.
1996b
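| | March | April | Male | Female | Hyperparameter |
|---|---|---|---|---|---|
| Sample size | 103 | 107 | 84 | 74 | |
| | 0.282 | 0.243 | 0.191 | 0.149 | 28.630 |
| | 0.718 | 0.757 | 0.809 | 0.851 | 102.326 |
| | 0.447 | 0.393 | 0.321 | 0.324 | 48.993 |
| | 0.553 | 0.607 | 0.679 | 0.676 | 81.963 |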
Figure 4. Posterior distributions of the standardized variance of allele frequency change $F_k$ and effective population size $N_e$ of herring for the 95% lower limit of the dispersion parameter ($\phi^2 = 1$) using data in Table 4.
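TABLE 5

| $\phi^2$ | Mean$^{a}$ | 95% CR | ∞/10,000$^{b}$ |
|---|---|---|---|
| 1.00$^{c}$ | 350 | [20–∞] | 1,221 |
| 2.39$^{d}$ | 16,841 | [35–∞] | 6,832 |
| 7.51$^{e}$ | | [] | 10,000 |

TABLE 6

| $\phi^2$ | Mean$^{a}$ | 95% CR | ∞/10,000$^{b}$ |
|---|---|---|---|
| 1.00$^{c}$ | 73 | [29–190] | 6 |
| M and K$^{d}$ | 72 | [17–258] | |
| 5.51$^{e}$ | 1,065 | [123–∞] | 8,480 |
| 9.74$^{f}$ | 20,606 | [] | 9,998 |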
Figure 5. Posterior distributions of the standardized variance of allele frequency change $F_k$ and effective population size $N_e$ of northern pike for the 95% lower limit of the dispersion parameter ($\phi^2 = 5.51$) and that with no overdispersion ($\phi^2 = 1.00$) using 1977–1993 data from Miller and Kapuscinski (1997).
1997
1997
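| | Male | Female | Male | Female | Male | Female | Hyperparameter |
|---|---|---|---|---|---|---|---|
| Sample size | 95 | 105 | 64 | 65 | 50 | 50 | |
| | 0.184 | 0.200 | 0.141 | 0.192 | 0.290 | 0.230 | 6.341 |
| | 0.816 | 0.800 | 0.859 | 0.808 | 0.710 | 0.770 | 22.235 |
| | 0.521 | 0.400 | 0.422 | 0.315 | 0.280 | 0.300 | 10.794 |
| | 0.479 | 0.600 | 0.578 | 0.685 | 0.720 | 0.700 | 17.783 |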
TABLE 8
Means and 95% credibility regions of the posterior distribution of the effective population size of ayu for the 95% lower limit of the dispersion parameter (3.04) obtained using data in Table 7

| Period | Estimator | $t^{a}$ | $F_k$ mean | $F_k$ SD | $N_e$ mean$^{b}$ | $N_e$ SD | 95% CR | ∞/10,000$^{c}$ |
|---|---|---|---|---|---|---|---|---|
| 1996–1997 | ($F_1$) | 1 | 0.0263 | 0.0121 | 350 | 4,024 | [32–∞] | 8,453 |
| 1997–1998 | ($F_2$) | 1 | 0.0379 | 0.0189 | 240 | 1,388 | [17–∞] | 8,094 |
| 1997–1999 | ($F_3$) | 2 | 0.0474 | 0.0191 | 796 | 18,538 | [22–∞] | 4,927 |
| 1997–1999 | ($F_{mean}$) | 1 | 0.0321 | 0.0120 | 491 | 3,294 | [35–∞] | 8,677 |
| 1997–1999 | ($F_{mean}$)$^{d}$ | | | | 136 | 3,720 | [13–589]$^{e}$ | 377 |

a Number of generations.
b Mean of positive $N_e$.
c Number of $N_e$ that took ∞ in 10,000 simulations.
d No overdispersion is assumed ($\phi^2 = 1$).
e 90% credibility region.
Figure 6. Posterior distributions for the four estimators of the standardized variance of allele frequency change $F_k$ and effective population size $N_e$ of ayu broodstock using 1996–1998 data in Table 7.
| | 0.5 | Original | | | |
|---|---|---|---|---|---|
| | 106.844 | 106.553 | 106.168 | 106.119 | 106.075 |
| $\phi^2$ | 1.40 | 1.80 | 2.62 | 3.44 | 4.25 |
| HinfI | 43.956 | 43.855 | 43.603 | 44.301 | 43.683 |
| | 43.989 | 43.872 | 43.887 | 43.932 | 43.986 |
| | 18.899 | 18.826 | 18.679 | 17.886 | 18.406 |
| MspI | 93.049 | 92.864 | 93.007 | 92.027 | 93.033 |
| | 13.795 | 13.688 | 13.161 | 14.093 | 13.042 |
| TaqI | 90.981 | 91.248 | 90.853 | 91.949 | 91.253 |
| | 15.863 | 15.305 | 15.315 | 14.170 | 14.822 |
| RsaI | 16.961 | 17.015 | 16.759 | 16.886 | 16.383 |
| | 30.155 | 30.253 | 30.091 | 29.762 | 30.310 |
| | 32.976 | 32.871 | 33.032 | 32.855 | 32.791 |
| | 11.841 | 11.802 | 11.777 | 11.732 | 11.910 |
| | 14.910 | 14.613 | 14.509 | 14.885 | 14.682 |
$$\frac{p(1 - p)}{2n}\{F_{ST}(2n - 1) + 1\}, \tag{12}$$
$$\phi^2 = F_{ST}(2n - 1) + 1, \tag{13}$$
from which we can see that a larger $F_{ST}$ gives larger overdispersion. From Equation 13, we also have the relationship
$$F_{ST} = \frac{\phi^2 - 1}{2n - 1}. \tag{14}$$
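The correspondence between $F_{ST}$ and the dispersion parameter is a one-line conversion in each direction. The sketch below assumes the readings $\phi^2 = F_{ST}(2n - 1) + 1$ and $F_{ST} = (\phi^2 - 1)/(2n - 1)$; the function names are hypothetical. The example uses the northern pike values quoted earlier ($\phi^2$ = 9.74 with $2n$ = 178.7) purely as an arithmetic illustration, not as a result reported in the article.

```python
def phi2_from_fst(fst, n):
    """Dispersion parameter implied by FST for samples of n diploid individuals."""
    return fst * (2.0 * n - 1.0) + 1.0


def fst_from_phi2(phi2, n):
    """Inverse relationship: FST implied by the dispersion parameter."""
    return (phi2 - 1.0) / (2.0 * n - 1.0)


n_pike = 178.7 / 2.0  # mean number of individuals, so that 2n = 178.7
fst_pike = fst_from_phi2(9.74, n_pike)
print(round(fst_pike, 4))

# Round trip: converting back recovers the dispersion parameter
phi2_back = phi2_from_fst(fst_pike, n_pike)
```

With no overdispersion ($\phi^2$ = 1) the implied $F_{ST}$ is zero, which matches the simple random sampling case.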
Rannala and Hartigan (1996) proposed the pseudomaximum-likelihood method (PMLE) for estimating the rate of gene flow into island populations using the distribution of alleles in samples from a number of islands. We confirmed that their likelihood function for multiple loci (p. 149, Equation 10) coincides with Equation 3 by using the relationship $\Gamma(n) = (n - 1)!$. In PMLE, $\alpha_i$ is treated by $p_i$. Here, $p_i$ is a
TABLE 10
Estimated hyperparameters and the dispersion parameter from the mtDNA haplotype distribution among islands for Channel Island foxes (Table 2 of Rannala and Hartigan 1996), using the full-likelihood function (Equation 3)

| Parameter | Rannala and Hartigan (PMLE) | This article (MLE) |
|---|---|---|
| | 0.41 (0.35)$^{a}$ | 0.45 [0.18, 0.84]$^{b}$ |
| $\alpha_1$ | 0.1189 | 0.0945 |
| $\alpha_2$ | 0.0574 | 0.0428 |
| $\alpha_3$ | 0.0082 | 0.0382 |
| $\alpha_4$ | 0.0779 | 0.0992 |
| $\alpha_5$ | 0.1476 | 0.1760 |
| $\phi^2$ | 18.38$^{c}$ | 17.89 |

a SD.
b 95% confidence interval estimated by the likelihood-ratio test.
c Estimated by Equation 5.
lations exist, overdispersion arises and affects the estimation of the effective population size. It is then important
to collect data on the spatial variation. At the same time,
when many isolated subpopulations exist, the effective
population size is considered to be close to the size of
a subpopulation. When this occurs, it seems dangerous
to dismiss the variation between generations as overdispersion. This issue needs further consideration.
Practical considerations on estimating Ne: From the
approximate variance formula of Ne estimate (Pollak
1983, Equations 28 and 29; Waples 1989, Equation 17),
it is clear that increasing the sample size, the number
of loci, and the number of generations t simultaneously
ensures greater precision for the estimate of Ne (Waples
1989). Miller and Kapuscinski (1997) stated that if
Ne is expected to be moderately large, the sample size,
the number of loci, and the number of generations
should all be as large as possible. To improve the precision of the estimate of Ne, it is essential to reduce the
sampling variance and increase information on genetic
drift.
Sample size: The idea of the temporal method is to
estimate Ne from the genetic change over time described
by F-statistics estimated from the sample allele frequencies. F-statistics, then, consist of the genetic drift and
the sampling variance. To evaluate the genetic drift, we
have to subtract the sampling variances from the
F-statistics. The second and third terms in the denominator of Equation 10 are the sampling variances at generations 0 and t. If Ne is large, the genetic drift may be
small, so the denominator of Equation 10 would take
a negative value, which leads to an infinite Ne for small
sample sizes n0 and nt. If overdispersion arises, the effect
of subtracting the sampling variances becomes $\phi^2$ times
larger, which is why we failed to estimate the upper limit
of the credibility region of Ne. As pointed out by Waples
(1989), the temporal method should be useful for cases
of small Ne, where larger genetic drift is expected. Even
in the case of a small Ne, the problem of an infinite Ne
estimate can occur due to large sampling variance, as
shown in the ayu studies, because of the small sample
sizes. When one uses the temporal method, reducing
the sampling variance is indispensable. The sample size
should be kept as large as possible. A larger sample size
also provides greater information on the genetic drift.
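The sample-size argument can be made concrete with a small calculation. Assuming equal sample sizes $n_0 = n_t = n$ and dropping the $1/N$ term, the denominator of Equation 10 is $F_k - \phi^2/n$, which is positive only when $n > \phi^2/F_k$. The helper below is a hypothetical illustration of that threshold, using the northern pike values quoted in the case study ($F_k$ = 0.0479, $\phi^2$ = 5.51).

```python
import math


def min_sample_size(fk, phi2=1.0):
    """Smallest equal sample size n for which Fk - phi2/n is strictly positive."""
    return math.floor(phi2 / fk) + 1


def drift_signal(fk, n, phi2=1.0):
    """Denominator term of the temporal estimator with n0 = nt = n and no 1/N."""
    return fk - phi2 / (2.0 * n) - phi2 / (2.0 * n)


n_needed_plain = min_sample_size(0.0479, phi2=1.0)
n_needed_over = min_sample_size(0.0479, phi2=5.51)
print(n_needed_plain, n_needed_over)
```

Under these assumptions, overdispersion raises the sample size needed for a finite estimate roughly in proportion to $\phi^2$.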
Number of loci: Williamson and Slatkin (1999) developed a maximum-likelihood temporal method to estimate Ne and compared estimates with those derived with
the F-statistic method. The simulation result in their
Table 1 showed that increasing the number of loci reduced the variance and bias in both estimators, although
when the number of loci was 50, the corresponding
reduction of variance and bias was not large, and the
total information on allele frequency changes did not
increase much. The results of Williamson and Slatkin
(1999) suggest that information on genetic drift was
not improved much even if the number of loci was
males taken in 1996 were analyzed as independent samples even though they were the same sample, leading
to a smaller value of Ne, which caused the rate of inbreeding to be overestimated. The mean for northern pike
was smallest with the narrowest credibility region. However, these values may be underestimated because of an
overestimated dispersion parameter of northern pike,
which was the largest among our four case studies. There
was only one sample for one sampling year, and we
assumed that the seven loci had common hyperparameters, so the estimated dispersion parameter may include
the change of the allele frequencies.
Multistage sampling in hatcheries: All existing methods assume that Ne is drawn from a gamete pool by a
simple random sampling. This is an appropriate assumption for the reproduction of a wild population. However,
for broodstocks cultured over generations in hatcheries,
candidates of the next broodstock are sampled from
the progenies produced by the broodstock. Therefore,
Ne is drawn from the progenies by a two-stage sampling.
If artificial fertilization using a part of the candidates is
performed, as in the case study of ayu, Ne is drawn from
the progenies by a three-stage sampling and the sample
is drawn from the candidates to estimate the allele frequencies, which is therefore a two-stage sampling of the
progenies. The multistage sampling must lead to a form of $V(x - y)$ different from that given in Waples (1989). This is a problem that needs further research, but it should be noted that the variances corresponding to the two-stage and three-stage sampling become small when the sample sizes are large. In the case of ayu, a total of 3000–4000 candidates were sampled from the progenies and cultured in rearing tanks, and 1500 adult fish from the candidates were used for artificial fertilization. Hence the sample allele frequencies of ayu were expected to represent those of the progenies produced by the broodstock. However, if the sample sizes are small, $V(x - y)$ is seriously affected.
We thank Zhao-Bang Zeng and two anonymous referees for their
comments on an earlier version of this article. We also thank Ray Timm
for critical review of the manuscript, Fumio Tajima for important
suggestions made during our research, Kazutomo Yoshizawa for biological information on ayu broodstocks including unpublished data,
and Masashi Yokota for helpful discussions.
LITERATURE CITED
Ando, T., and N. Ohkubo, 1997 A study for estimating effective population size from allozyme allele frequencies, Heisei 8 nendo seitaikei hozengata syubyo seisan gijyutu kaihatsu kenkyu seika no gaiyo, pp. 74–79. Fishery Agency of Japan, Tokyo (in Japanese).
Bartley, D. M., 1999 Marine ranching, a global perspective, pp. 79–90 in Stock Enhancement and Sea Ranching, edited by B. R. Howell, E. Moksness and T. Svasand. Blackwell, Oxford.
Bartley, D. M., D. B. Kent and M. A. Drawbridge, 1995 Conservation of genetic diversity in a white seabass hatchery enhancement program in southern California. Am. Fish. Soc. Symp. 15: 249–258.
Blankenship, H. L., and K. M. Leber, 1995 A responsible approach to marine stock enhancement. Am. Fish. Soc. Symp. 15: 165–175.
Nei, M., and F. Tajima, 1981 Genetic drift and estimation of effective population size. Genetics 98: 625–640.
Peterman, R. M., 1990 Statistical power analysis can improve fisheries research and management. Can. J. Fish. Aquat. Sci. 47: 2–15.
Pollak, E., 1983 A new method for estimating the effective population size from allele frequency changes. Genetics 104: 531–548.
Rannala, B., and J. A. Hartigan, 1996 Estimating gene flow in island populations. Genet. Res. 67: 147–158.
Ritter, J. A., 1997 The contribution of Atlantic salmon Salmo salar L. enhancement to a sustainable resource. ICES J. Mar. Sci. 54: 1177–1187.
Roff, D. A., and P. Bentzen, 1989 The statistical analysis of mitochondrial DNA polymorphisms: χ² and the problem of small samples. Mol. Biol. Evol. 6: 539–545.
Sanghvi, L. D., 1953 Comparison of genetical and morphological methods for a study of biological differences. Am. J. Phys. Anthropol. 11: 385–404.
Snedecor, G. W., and W. G. Cochran, 1967 Statistical Methods, Ed. 6. Iowa State University Press, Ames, IA.
Tabata, K., and A. Mizuta, 1997 RFLP analysis of the mtDNA D-loop region in red sea bream Pagrus major population from four locations of western Japan. Fish. Sci. 63: 211–217.
Tajima, F., 1992 Statistical method for estimating the effective population size in Pacific salmon. J. Hered. 83: 309–311.
Utter, F., 1998 Genetic problems of hatchery-reared progeny released into the wild, and how to deal with them. Bull. Mar. Sci. 62(2): 623–640.
Walters, C. J., 1988 Mixed-stock fisheries and the sustainability of enhancement production for chinook and coho salmon, pp. 109–115 in Salmon Production, Management, and Allocation, edited by W. J. McNeil. Oregon State University Press, Corvallis, OR.
Waples, R. S., 1989 A general approach for estimating effective population size from temporal changes in allele frequencies. Genetics 121: 379–391.
Waples, R. S., 1990 Conservation genetics of Pacific salmon: III. Estimating effective population size. J. Hered. 81: 277–289.
Waples, R. S., 1991 Genetic interactions between hatchery and wild salmonids: lessons from the Pacific Northwest. Can. J. Fish. Aquat. Sci. 48(Suppl. 1): 124–133.
Waples, R. S., 1999 Dispelling some myths about hatcheries. Fisheries 24(2): 12–21.
Weir, B. S., 1996 Genetic Data Analysis II. Sinauer Associates, Sunderland, MA.
Williamson, E. G., and M. Slatkin, 1999 Using maximum likelihood to estimate population size from temporal changes in allele frequencies. Genetics 152: 755–761.
Wright, S., 1931 Evolution in Mendelian populations. Genetics 16: 97–159.
Wright, S., 1945 The differential equation of the distribution of gene frequencies. Proc. Natl. Acad. Sci. USA 31: 383–389.
Wright, S., 1951 The general structure of populations. Ann. Eugen. 15: 323–354.
Wright, S., 1969 Evolution and Genetics of Populations: The Theory of Gene Frequencies, Vol. 2. University of Chicago Press, Chicago.
Yoshizawa, K., 1997 Inbreeding of cultured ayu from allozyme allele frequencies, Heisei 8 nendo seitaikei hozengata syubyo seisan gijyutu kaihatsu kenkyu seika no gaiyo, pp. 30–38. Fishery Agency of Japan, Tokyo (in Japanese).
Communicating editor: Z-B. Zeng
APPENDIX
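$$E(X) = \frac{1}{2^{k/2}\,\Gamma(k/2)} \int_0^{\infty} x^{(k+1)/2 - 1} e^{-x/2}\, dx.$$
Putting $x = 2y$, this becomes
$$\frac{1}{2^{k/2}\,\Gamma(k/2)} \int_0^{\infty} (2y)^{(k+1)/2 - 1} e^{-y}\, 2\, dy.$$
Since
$$\int_0^{\infty} y^{(k+1)/2 - 1} e^{-y}\, dy = \Gamma\left(\frac{k+1}{2}\right),$$
finally we have
$$E(X) = \sqrt{2}\,\frac{\Gamma((k+1)/2)}{\Gamma(k/2)}.$$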
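The closed form can be checked numerically. The sketch below assumes that $X$ is the square root of a chi-square variate with $k$ degrees of freedom, so that $E(X) = \sqrt{2}\,\Gamma((k+1)/2)/\Gamma(k/2)$, and compares this with a Monte Carlo average. The function names are hypothetical, and $k = 8$ is chosen because it equals $\sum_j (k_j - 1)$ for the four red sea bream loci in Table 1.

```python
import random
from math import exp, lgamma, pi, sqrt


def mean_chi(k):
    """Closed form: sqrt(2) * Gamma((k + 1) / 2) / Gamma(k / 2)."""
    return sqrt(2.0) * exp(lgamma((k + 1) / 2.0) - lgamma(k / 2.0))


def simulated_mean_chi(k, n_draws=20000, seed=7):
    """Monte Carlo mean of the square root of a chi-square(k) variate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        chi_sq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        total += sqrt(chi_sq)
    return total / n_draws


closed_form = mean_chi(8)
simulated = simulated_mean_chi(8)
print(round(closed_form, 4), round(simulated, 4))
```

For $k = 1$ the formula reduces to $\sqrt{2/\pi}$, the mean absolute value of a standard normal variate, which gives a quick independent check.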