
Principles and

Procedures of QTL
Mapping

Zhiqiu Hu & Shizhong Xu
4/14/2010









The correct bibliographic citation for this program is
Zhiqiu Hu and Shizhong Xu (2009). PROC QTL - A SAS Procedure for Mapping
Quantitative Trait Loci. International Journal of Plant Genomics 2009: 3
doi:10.1155/2009/141234.
PROC QTL Version 2.0
Copyright 2008, University of California, Riverside, CA, USA
All rights reserved.
University of California, Riverside
900 University Ave., Riverside, CA 92521


Contents
INTRODUCTION TO QTL MAPPING
    Quantitative traits
    Mapping populations
    Molecular markers
    Linkage map of markers
    Interval mapping
    Multiple QTL mapping
    Association mapping
MULTIPOINT METHOD FOR QTL GENOTYPE INFERENCE
    Mapping function
    Markov chain property
    Virtual map
MAXIMUM LIKELIHOOD METHOD
    Likelihood function
    Newton-Raphson algorithm
    Information matrix and estimation errors
    Likelihood ratio test statistics
    Wald test statistics
BAYESIAN METHOD
    Introduction to Bayesian method
    Markov chain Monte Carlo algorithm
    Diagnoses of convergence of Markov chain
    Post MCMC analysis
INTERVAL MAPPING FOR NORMALLY DISTRIBUTED TRAITS
    Simple least squares method
    Weighted least squares
    Fisher scoring algorithm
    Maximum likelihood method

    Hypothesis testing
    Remarks on the four methods of interval mapping
INTERVAL MAPPING FOR DISCRETE TRAITS
    Generalized linear model for ordinal traits
    Expectation substitution method
    Fisher scoring method
    Approximate mixture model
    Mixture model maximum likelihood method
    Variance-covariance matrix for estimated parameters
    Hypothesis testing
    Extension to other traits
MAPPING QUANTITATIVE TRAIT LOCI UNDER SEGREGATION DISTORTION
    The likelihood of markers
    The likelihood of phenotypes
    Joint likelihood of markers and phenotypes
    EM algorithm for the joint analysis
    Hypothesis testing
    Standard errors of the estimated parameters
INTERVAL MAPPING FOR MULTIPLE TRAITS
    Multivariate model
    Least square method
    Maximum likelihood method
    Hypothesis testing
BAYESIAN SHRINKAGE METHOD FOR QTL MAPPING
    Multiple QTL model
    Prior, likelihood and posterior
    Fixed interval
    Random walk
    Moving interval
    Summary of the MCMC sampling process
    Post MCMC analysis

    Bayesian mapping for ordinal traits
    Sampling missing phenotypic values
    Permutation
BAYESIAN MAPPING FOR DISCRETE TRAITS
    Generalized linear model
    Binary data
    Binomial data
    Poisson data
EMPIRICAL BAYESIAN METHOD
    Main QTL effect model
    Epistatic QTL effect model
    Simplex algorithm
BAYESIAN MAPPING FOR MULTIPLE TRAITS
    Multiple continuous traits
    Multiple binary traits
    Mixture of continuous and binary traits
    Missing values
REFERENCE

INTRODUCTION TO QTL MAPPING
QUANTITATIVE TRAITS
A quantitative trait is a trait that varies quantitatively: the phenotypic
values of individuals vary by degree rather than by kind. These traits are
usually controlled by the segregation of multiple genes plus environmental
factors. Some genes have large effects and some have small effects. Some
traits are influenced more by genetic effects than by environmental effects,
and some are influenced more by the environment than by genetic effects.
Genes that control the genetic variation of quantitative traits are called
quantitative trait loci (QTL) (TANKSLEY 1993). Because of their polygenic
nature and sensitivity to environmental changes, these traits must be studied
in large populations using sophisticated statistical tools to dissect the
genetic architecture.
Finding the genome locations of the QTL and estimating the effects of the
QTL using molecular markers as anchors is called QTL mapping (TANKSLEY
1993). QTL mapping almost exclusively uses the linear model to describe
the relationship between the phenotypic value and the putative QTL. The
most commonly used method is the maximum likelihood method. The
likelihood ratio test (WILKS 1938) is often used as the test statistic.
Some traits have a discrete distribution, e.g., disease resistance traits,
where the phenotype is measured by kind, e.g., affected and normal (XU
and ATCHLEY 1996). Very few disease resistance traits are controlled by a
single gene (TURNPENNY and ELLARD 2005). Most traits, however, are
controlled by multiple genes plus environmental effects. These traits,
although phenotypically very simple, are genetically complicated. They are
polygenic traits and thus are often defined as complex traits (LANDER and
SCHORK 2006). QTL mapping also covers this kind of complex trait. The
way to handle these traits is to hypothesize an underlying continuously
distributed liability under each discrete trait (WRIGHT 1934). The connection
between the unobserved liability and the observed phenotype is through a
threshold. Below the threshold, the individual will have the normal
phenotype. Above the threshold, it will show the abnormal (disease)
phenotype. Using the threshold model, we can map QTL controlling the
unobserved liability (RAO and XU 1998; XU et al. 2005b; XU and ATCHLEY
1996). The QTL parameters are estimated on the scale of the liability.
Because we are mapping QTL for the liability and the liability is not
observable, we often use the generalized linear model. A generalized linear
model is, as the name suggests, a generalization of the linear model, and
all the technical tools developed for the general linear model apply to it.
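The liability threshold idea can be illustrated with a few lines of code. The sketch below is purely illustrative (not part of PROC QTL; the function name and all parameter values are invented): it simulates a liability as a QTL effect plus normal error and thresholds it into a binary phenotype.

```python
import random

random.seed(1)

def simulate_binary_trait(n=1000, a=0.5, threshold=0.0):
    """Simulate a binary trait under the threshold (liability) model.

    Each individual gets a QTL genotype code x in {1, 0, -1}
    (frequencies 1/4, 1/2, 1/4 as in an F2 family), a liability
    l = x*a + e with e ~ N(0, 1), and shows the affected phenotype (1)
    when the liability exceeds the threshold.
    """
    phenotypes = []
    for _ in range(n):
        x = random.choices([1, 0, -1], weights=[1, 2, 1])[0]
        liability = x * a + random.gauss(0.0, 1.0)
        phenotypes.append(1 if liability > threshold else 0)
    return phenotypes

y = simulate_binary_trait()
print(sum(y) / len(y))  # near 0.5 when the threshold sits at the liability mean
```

Only the 0/1 phenotypes would be observed in practice; the liability itself is the latent variable the QTL model operates on.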

MAPPING POPULATIONS
Quantitative genetic theory largely deals with allelic effects and allele
frequencies. The accuracy of parameter estimation depends on the allelic
frequencies. In wild populations, we cannot control the allelic frequencies of
the population. Therefore, estimation of genetic effects cannot be
guaranteed with the optimal accuracy. In highly designed experiments, we
can control the allelic frequencies and thus can design an optimal
experiment to ensure that the genetic parameters are estimated with high
accuracy. The current version of PROC QTL can handle the following
mating designs: BC (backcross), FW (four-way cross), RIL (recombinant
inbred lines) and DH (double haploid).
F2 mating design

The most popular design of experiments in QTL mapping is the F2 mating
design. An F2 design is generated through a line crossing experiment
involving two inbred lines, called P1 and P2. The F1 hybrid from the cross
of P1 and P2 is then selfed to generate a segregating F2 family. QTL
mapping can be performed using the F2 family. In terms of allelic
frequencies of segregating loci, F2 families are optimal because each
parent contributes an equal number of alleles. Let A1A1 and A2A2 be the
genotypes of the two parents, respectively, and A1A2 be the genotype of
the hybrid. There are three possible genotypes in the progeny of the F2
family: A1A1, A1A2 and A2A2. The ratio of the three genotypes is 1:2:1.
Let G11, G12 and G22 be the genotypic values of the three genotypes,
respectively. The additive and dominance effects of the locus are defined
as

    a = G11 - (1/2)(G11 + G22) = (1/2)(G11 - G22)    (1.1)

and

    d = G12 - (1/2)(G11 + G22)    (1.2)

respectively. Rather than estimating the genotypic values, we actually
estimate and test a and d in QTL mapping. The linear model for a single
QTL is given as

    y_j = μ + X_j a + W_j d + e_j    (1.3)

where y_j is the phenotypic value of individual j, μ is the population mean
(or intercept), X_j is an indicator variable (for the additive effect)
assigned a value of 1, 0 or -1, respectively, for the three genotypes
A1A1, A1A2 and A2A2, W_j is an indicator variable (for the dominance
effect) assigned a value of 0, 1 or 0, respectively, for the three
genotypes A1A1, A1A2 and A2A2, and e_j is the residual error following an
N(0, σ²) distribution. The genotype indicator variables, X_j and W_j, can
be defined on many different scales. The scales are usually chosen for
statistical convenience rather than for biological meaningfulness, because
the scales only affect the estimation of the genetic effects and do not
affect the results of statistical tests. PROC QTL actually estimates the
genotypic values, not the genetic effects. Users are asked to provide
scales of their choice in the ESTIMATE statement of PROC QTL. If there is
no segregation distortion, the following scales for X_j and W_j are
recommended (YANG et al. 2006): X_j = {√2, 0, -√2} and W_j = {-1, 1, -1}
for the three genotypes A1A1, A1A2 and A2A2. This scale choice leads to
var(X_j) = var(W_j) = 1 and cov(X_j, W_j) = 0, and thus

    var(y_j) = a² + d² + σ²    (1.4)

which is mathematically more attractive than any other scale.
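The variance and covariance claims for this scale are easy to verify numerically. The following sketch (illustrative only, not part of PROC QTL) computes the moments of the recommended codes under the 1:2:1 F2 genotype frequencies:

```python
import math

# Genotype frequencies for A1A1, A1A2, A2A2 in an F2 family
p = [0.25, 0.5, 0.25]
# Recommended scales (Yang et al. 2006) for additive and dominance codes
X = [math.sqrt(2), 0.0, -math.sqrt(2)]
W = [-1.0, 1.0, -1.0]

def mean(v):
    return sum(pi * vi for pi, vi in zip(p, v))

def var(v):
    m = mean(v)
    return sum(pi * (vi - m) ** 2 for pi, vi in zip(p, v))

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum(pi * (ui - mu) * (vi - mv) for pi, ui, vi in zip(p, u, v))

print(var(X), var(W), cov(X, W))  # approximately 1.0, 1.0, 0.0
```

With both codes standardized and uncorrelated, the variance decomposition var(y_j) = a² + d² + σ² in equation (1.4) follows immediately.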
BC mating design

Starting from the two parents and the F1 hybrid, a BC family is generated
by crossing the F1 individual back to one of the two parents. If P1 is the
backcrossed parent, the BC family is called BC1. If P2 is the backcrossed
parent, the BC is called BC2. Take BC1 for example: there are two possible
genotypes, A1A1 and A1A2. It is impossible to estimate the dominance
effect because there are not enough degrees of freedom to do so. The QTL
effect is defined as

    a' = G11 - G12 = a - d    (1.5)

Apparently, the QTL effect defined this way is the difference between the
additive effect and the dominance effect. This effect is equivalent to the
additive effect only if the dominance effect is absent. Using a BC family
for QTL mapping is not as powerful as using the F2 family because: (1) for
the same sample size, a BC family only carries half the number of meioses
of the F2 family; (2) the additive and dominance effects are confounded.
When the dominance effect is absent, we need double the sample size for a
BC design to achieve the same statistical power as the F2 design. Under
the assumption of no dominance, the model appears as

    y_j = μ + X_j a + e_j    (1.6)

where X_j = {1, 0} for the two genotypes A1A1 and A1A2. The fact that the
BC design is not as powerful as the F2 design can be shown by looking at
the variances of X_j in the two different families. The scale of X_j in
the F2 design must be X_j = {1, 0, -1} for the three genotypes in order to
compare the powers of the two designs. For the BC design var(X_j) = 1/4,
but var(X_j) = 1/2 for the F2 design. The design with a larger var(X_j)
has more power than the design with a smaller var(X_j), which explains why
the F2 design is more powerful than the BC design.
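The power comparison boils down to these two variances, which a few lines of code can confirm (an illustrative check, not part of PROC QTL):

```python
# Indicator-variable variances under each design's genotype frequencies:
# BC: X in {1, 0} with frequencies 1/2, 1/2
# F2: X in {1, 0, -1} with frequencies 1/4, 1/2, 1/4
def var(values, freqs):
    m = sum(f * v for f, v in zip(freqs, values))
    return sum(f * (v - m) ** 2 for f, v in zip(freqs, values))

var_bc = var([1, 0], [0.5, 0.5])             # 0.25
var_f2 = var([1, 0, -1], [0.25, 0.5, 0.25])  # 0.5
print(var_bc, var_f2)
```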
RIL mating design

The recombinant inbred line design also involves two inbred parents, the
F1 hybrid and the F2 family. Each F2 progeny undergoes many generations of
continuous selfing until all loci are fixed. This may take about 20
generations to reach 99% homozygosity. Eventually, each line descended
from the cross is an inbred line, but carries genes from different parents
across loci. In other words, within each locus, an RIL carries the same
allele from one parent, but between loci, the contributing parents may
alternate. Therefore, an RIL carries a mosaic genome of the two parents.
The two homozygotes are A1A1 and A2A2, with the QTL effect at this locus
defined as

    a' = G11 - G22 = a - (-a) = 2a    (1.7)

Therefore, using the RIL mating design is more powerful than a BC or F2
design because the QTL effect so defined is doubled. If we define
X_j = {1, -1} for the two homozygotes A1A1 and A2A2, so that var(X_j) = 1,
the genetic variance at this locus is

    var(X_j)(a')² = var(X_j)(2a)² = 4a²    (1.8)

which is much larger than the corresponding genetic variance for the BC
design (a²/4) and the genetic variance for the F2 design (a²/2) under the
same scale of variable X_j. In addition, a sample of size n from an F2
family carries 2n meioses, but the same number of RIL individuals will
accumulate many more meioses. The genetic material has been shuffled many
times across the genome (loci), leading to a high frequency of
recombination between loci. Therefore, the RIL design has the advantages
of both high power and high resolution (fine mapping) over the F2 design.
Let r be the recombination frequency between two loci per meiosis (in a
BC, for example); after many generations of accumulated meioses, the
recombination frequency in the RIL becomes

    c1 = 2r / (1 + 2r)    (1.9)

This multiple-meiosis-corrected recombination fraction is larger than the
original recombination fraction. Therefore, we expect to see many more
crossovers between two loci in the RIL design than in a BC family. The
genome essentially gets longer and thus allows fine mapping. Recombinant
inbred lines generated through selfing are called RIL1. In animals, where
selfing does not happen, recombinant inbred lines can be generated through
continuous brother-sister mating. The sib-mating approach takes more
generations to reach the same homozygosity as the selfing approach. This
type of RIL is called RIL2. The corresponding correction for the
recombination frequency is

    c2 = 4r / (1 + 6r)    (1.10)

Statistical methods of QTL mapping for RIL and BC are identical once r in
BC is replaced by c in RIL.
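Equations (1.9) and (1.10) are simple enough to tabulate. The sketch below (illustrative helper functions, not part of PROC QTL) shows that the corrected recombination fraction always exceeds the single-meiosis value:

```python
def ril_selfing_c(r):
    """Recombination fraction observed in selfing-derived RILs (RIL1), eq. (1.9)."""
    return 2 * r / (1 + 2 * r)

def ril_sibmating_c(r):
    """Recombination fraction observed in sib-mating RILs (RIL2), eq. (1.10)."""
    return 4 * r / (1 + 6 * r)

# Tabulate the corrections for a few per-meiosis recombination fractions
for r in (0.01, 0.05, 0.10, 0.20):
    print(r, ril_selfing_c(r), ril_sibmating_c(r))
```

For example, r = 0.1 expands to c1 ≈ 0.167 under selfing and c2 = 0.25 under sib-mating, which is the "longer genome" effect described above.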
DH mating design

A double haploid (DH) individual is created by duplicating a gamete via
chemical treatment. A DH individual is a diploid homozygote at all loci.
Starting from an F2 progeny derived from the cross of two inbred lines,
each of the two gametes of an F2 individual is duplicated, with the two
copies fusing into a diploid homozygote, so that n F2 progeny produce 2n
independent DH individuals. Like the RIL design, there are two possible
genotypes in a DH population, A1A1 and A2A2, with the QTL effect defined
as

    a' = G11 - G22 = a - (-a) = 2a    (1.11)

Therefore, the DH design should provide the same power as an RIL design.
However, the resolution of DH mapping is equivalent to that of an F2
design because the number of meioses is the same. The mating designs
discussed so far all involve only two parents. Therefore, these mating
designs are called bi-parental mating designs.
FW mating design

The four-way cross mating design requires four inbred lines and two rounds
of crossing (XU 1996). In the first round of crossing, P1 × P2 → F1(12)
and P3 × P4 → F1(34), two independent F1 hybrids are generated. In the
second round of crossing, F1(12) × F1(34) → FW, the two different F1
hybrids are crossed to generate a four-way cross family. The mating design
is more clearly described using the genotypic labels. In the first round
of crossing, we have A1A1 × A2A2 → A1A2 and A3A3 × A4A4 → A3A4. In the
second round of crossing, we get

    A1A2 × A3A4 → {A1A3, A1A4, A2A3, A2A4}    (1.12)

There are four possible genotypes in the four-way cross family. The labels
of the alleles need to be changed again in order to describe the genetic
model for the FW cross design. In the first round of crossing, we now
write A1^p A1^p × A2^p A2^p → A1^p A2^p and
A1^m A1^m × A2^m A2^m → A1^m A2^m. Note that the four alleles involved in
the FW progeny, A1, A2, A3 and A4, have been relabeled as A1^p, A2^p, A1^m
and A2^m, respectively, where the superscripts p and m indicate the
paternal and maternal origins of the progeny and the subscripts 1 and 2
indicate the paternal and maternal origins of the parents. With this new
notation, the FW cross family is

    A1^p A2^p × A1^m A2^m → {A1^p A1^m, A1^p A2^m, A2^p A1^m, A2^p A2^m}    (1.13)

We now assign a value to each allele, say a1^p, a2^p, a1^m and a2^m for
the four alleles. The corresponding genotypic values are defined as

    | G11 |   | μ + a1^p + a1^m + d11 |
    | G12 | = | μ + a1^p + a2^m + d12 |    (1.14)
    | G21 |   | μ + a2^p + a1^m + d21 |
    | G22 |   | μ + a2^p + a2^m + d22 |

where d_ij is the interaction effect between the two alleles involved in
the genotype. The model is over-parameterized because we cannot estimate
nine parameters from four genotypes. Therefore, some restrictions are
required to reduce the number of parameters. Many different schemes of
restriction can be used, but one particular scheme leads to the following
reduced parameters (XU 1998b),

    | α^p |   | (1/2)(a1^p - a2^p)           |
    | α^m | = | (1/2)(a1^m - a2^m)           |    (1.15)
    | δ   |   | (1/4)(d11 - d12 - d21 + d22) |

Including μ, we have four estimable parameters that are expressed as
linear contrasts (combinations) of the genotypic values,

        | μ   |   | (1/4)(G11 + G12 + G21 + G22) |
    β = | α^p | = | (1/4)(G11 + G12 - G21 - G22) |    (1.16)
        | α^m |   | (1/4)(G11 - G12 + G21 - G22) |
        | δ   |   | (1/4)(G11 - G12 - G21 + G22) |

The reverse relationship is

    | G11 |   | μ + α^p + α^m + δ |
    | G12 | = | μ + α^p - α^m - δ |    (1.17)
    | G21 |   | μ - α^p + α^m - δ |
    | G22 |   | μ - α^p - α^m + δ |

Let us define

        | 1  1  1  1 |
    H = | 1  1 -1 -1 |    (1.18)
        | 1 -1  1 -1 |
        | 1 -1 -1  1 |

We can see that

             | 1/4  1/4  1/4  1/4 |
    H^(-1) = | 1/4  1/4 -1/4 -1/4 |    (1.19)
             | 1/4 -1/4  1/4 -1/4 |
             | 1/4 -1/4 -1/4  1/4 |

We now give the model expressed as a function of only the estimable
parameters,

    y_j = X_j β + e_j    (1.20)

where

          | H1 for A1^p A1^m |                | μ   |
    X_j = | H2 for A1^p A2^m |    and    β =  | α^p |    (1.21)
          | H3 for A2^p A1^m |                | α^m |
          | H4 for A2^p A2^m |                | δ   |

and H_k is the k-th row of matrix H. Under this scale of definition for
X_j, the total phenotypic variance can be partitioned into four
components,

    var(y_j) = (α^p)² + (α^m)² + δ² + σ²    (1.22)

where σ² is the residual error variance.
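The algebra connecting equations (1.16) through (1.19) can be verified directly: H is symmetric with HH = 4I, so H⁻¹ = H/4, and β = H⁻¹G recovers the estimable parameters from the genotypic values. The following sketch checks this with made-up parameter values (illustrative only, not part of PROC QTL):

```python
# Contrast matrix H from eq. (1.18) and the claim H^(-1) = H/4 (eq. 1.19),
# checked with plain 4x4 arithmetic (no external libraries).
H = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1,  1, -1],
    [1, -1, -1,  1],
]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

H_inv = [[x / 4 for x in row] for row in H]  # H is symmetric and H*H = 4I
print(matmul(H, H_inv))  # the 4x4 identity matrix

# Recover beta = (mu, alpha_p, alpha_m, delta) from hypothetical genotypic
# values G = H * beta, e.g. beta = (10, 2, 1, 0.5):
beta = [10, 2, 1, 0.5]
G = [sum(H[i][j] * beta[j] for j in range(4)) for i in range(4)]
beta_back = [sum(H_inv[i][j] * G[j] for j in range(4)) for i in range(4)]
print(G, beta_back)
```

The round trip G → β → G is exact, which is what makes the four contrasts in (1.16) estimable from the four genotypic means.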

Full-sib family

A full-sib family is a family of individuals generated by repeated matings
between two parents. Individuals within the family are called full
siblings and they all share the same father and the same mother. The
genetic model described for the FW cross design directly applies to QTL
mapping in the full-sib family. The father (paternal parent) of the
full-sib family is equivalent to the F1 hybrid used as the paternal parent
of a FW cross. The mother (maternal parent) of the full-sib family is
equivalent to the F1 hybrid used as the maternal parent of a FW cross. We
can only estimate the allelic difference between the two alleles of the
father (α^p), the allelic difference between the two alleles of the mother
(α^m) and the interaction effect (δ). The full-sib family design differs
from a FW cross in that we need to infer the linkage phases of the markers
prior to QTL mapping, because we do not necessarily have the genotypic
information of the grandparents in the full-sib family. Once the linkage
phases are inferred for the full-sib family, the statistical model and
method of the FW cross apply to the full-sib family.
Half-sib family

Each member of the family has a different mother but all share the same
father. This type of family is common in large animals such as beef
cattle. Half-sib families can also be found in forest trees, but there the
common parent of each half-sib family is the female parent. PROC QTL can
handle half-sib families using the same mating design as BC. The common
parent in the half-sib family is treated as the F1 hybrid in the BC. The
other parents (all independent) in the half-sib family are treated as the
backcrossed parent. This comparison may be hard to understand, but it
holds from the statistical model point of view. In a BC family of the
mating type A1A2 × A1A1, we estimate the difference between the two
genotypes of the progeny, A1A1 and A1A2. In fact, we are estimating the
difference between the two alleles carried by the F1 hybrid (A1A2). The
common parent (A1A1) plays no role other than providing a background for
evaluation of the two alleles of the F1 hybrid. In half-sib QTL mapping,
we are estimating the difference between the two alleles of the common
parent. The background alleles are provided by all the other independent
parents. The difference between the two designs lies in the background
alleles: a BC design has a uniform or homogeneous background allele, while
a half-sib family has a heterogeneous background allelic array. The
background alleles play no role in the statistical model. You need to
manipulate the data a little bit to fool the program. First, infer the
linkage phases of all markers for the common parent and label the paternal
allele of the common parent A1 and the maternal allele A2. Secondly,
recode the genotypes of the progeny as A = A1_ and B = A2_, where the
underscore is a wild card representing the background alleles. You have
now relabeled the genotypes of the progeny so that there are only two
possible genotypes in the progeny. This half-sib family can now be mapped
using the BC mating design.
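The recoding step can be sketched as a small helper. Everything below is hypothetical (the function name, the tuple representation of genotypes and the background-allele labels are invented for illustration); it simply maps any progeny genotype carrying the common parent's paternal allele to A and any carrying the maternal allele to B:

```python
def recode_half_sib(progeny_genotype, paternal_allele="A1", maternal_allele="A2"):
    """Recode a half-sib progeny genotype for BC-style analysis.

    After phasing the common parent (paternal allele labeled A1,
    maternal allele A2), any progeny genotype carrying A1 is coded
    'A' (i.e., A1_) and any carrying A2 is coded 'B' (i.e., A2_);
    the background allele from the other parent is ignored.
    """
    if paternal_allele in progeny_genotype:
        return "A"
    if maternal_allele in progeny_genotype:
        return "B"
    return None  # uninformative: the common parent's allele is not identifiable

print([recode_half_sib(g) for g in [("A1", "A7"), ("A2", "A9"), ("A5", "A6")]])
```

After this recoding the data look exactly like a BC family with two genotype classes, which is why the BC machinery applies unchanged.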
MOLECULAR MARKERS
QTL mapping requires two sets of data, one is the phenotype data of a
quantitative trait and the other is the marker genotype data. Depending on
the method of QTL mapping, a marker map may be needed. We assume
that the markers are already mapped in the genome. What we are going to
do in QTL mapping is to put detected QTL in the genome with positions
defined relative to the positions of markers. What are molecular markers
and what are the differences between a marker and a gene? A molecular
marker is a piece of labeled DNA in the genome. The alleles of a marker
are inherited following Mendel's laws of inheritance. A marker acts like a
gene because of its Mendelian behavior, but it has no known function on
any trait. If the function were known, it would be called a gene. However,
the genotype of a marker in an individual can be observed or measured
using some molecular technique. This is in contrast to a gene, whose
genotype is rarely observed. Because the genotypes of markers can be
observed, their relative locations in the genome can be inferred. Genes
have functions on a trait of interest, but their genotypes cannot be
observed. Through linkage analysis, we can find the association of markers
with the phenotype of interest, from which the relative positions of the
genes (QTL) can be inferred. This process is called QTL mapping.
There are two kinds of molecular markers: dominant markers and
co-dominant markers. A dominant marker has only two observed states,
presence and absence. One allele is said to be dominant over other alleles
if one copy of the allele is sufficient to suppress the expression of all
other alleles. For example, if the A1 allele is dominant over the A2
allele, you cannot tell the difference between A1A1 and A1A2 because the
A2 allele will not be expressed. In terms of the A1 allele, an individual
has only two observed states: presence (A1_) and absence (_ _) of the A1
allele. A co-dominant marker is a marker in which each allele is expressed
(observed), so that you can directly see the alleles of a genotype.
Dominant markers provide less information than co-dominant markers, but
very often dominant markers occur at much higher density along the genome
than co-dominant markers.

LINKAGE MAP OF MARKERS
Special software packages, e.g., MapMaker (LANDER et al. 1987), are
required to put markers in different linkage groups and order the markers
within each linkage group. The linear arrangement of the markers in the
genome is called the linkage map of markers. The marker map is usually
stored in a separate file with three columns and m rows, where m is the
total number of markers in the map. The first column stores the names of
the markers, the second column gives the positions (cM) of the markers
within the chromosomes, and the third column shows the chromosome
identifications of the markers. The position of each marker is measured in
cM relative to the position of the first marker of the chromosome. The
distance in cM between two consecutive markers is converted from the
recombination fraction between the two markers using either the Haldane
(1919) mapping function or the Kosambi (1944) mapping function. The marker
map is required for interval mapping, but not for individual marker
analysis.
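The two mapping functions referred to above have closed forms: Haldane gives d = -50 ln(1 - 2r) and Kosambi gives d = 25 ln[(1 + 2r)/(1 - 2r)], both in cM. A minimal sketch (illustrative helper functions, not part of PROC QTL):

```python
import math

def haldane_cm(r):
    """Haldane (1919) map distance in cM from recombination fraction r (0 <= r < 0.5)."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi (1944) map distance in cM from recombination fraction r (0 <= r < 0.5)."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

print(haldane_cm(0.1), kosambi_cm(0.1))
```

For r = 0.1 the two functions give roughly 11.2 cM and 10.1 cM; Haldane assumes no crossover interference, while Kosambi allows for some, so Haldane distances are always slightly larger.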
INTERVAL MAPPING
Interval mapping was originally developed by Lander and Botstein (1989).
Prior to interval mapping, QTL mapping already existed but it was called
individual marker analysis (SOLLER et al. 1976), which is simply a linear
regression analysis or t-test repeatedly used for every marker of the
genome. The problem with the individual marker analysis occurs when a
QTL is located between two markers. In such a situation, part of the effect
of the QTL is absorbed by the marker in the left and part absorbed by the
marker in the right. The true location of the QTL and its effect is never
estimated precisely. Lander and Botstein (1989) realized that a putative
position between two markers with known positions can be evaluated for its
association with the phenotype of a quantitative trait. When the position is
fixed, the distances of this putative position from the two flanking markers
are automatically given. The genotype of an individual for that putative locus
is, of course, missing but it probabilistic distribution can be inferred from the
genotypes of the flanking markers. Lander and Botstein (1989) then used a
mixture model to fit the data so that the QTL effect of that position can be
estimated and test for statistical significance. We can evaluate every
possible position within the interval and the position that has the highest test
statistic is a candidate of QTL in that interval. We then search for the QTL in
another interval using different flanking markers. All intervals within a
chromosome must be searched. Eventually, all chromosomes in the
genome are searched. The putative position of the entire genome that has
the highest test statistic is a candidate QTL. If the test statistic passes a
critical value, the candidate QTL can be safely claimed as a QTL.
Interval mapping, by definition, refers to the method of using two markers
only each time to infer the genotype of an internal locus. If one or both of
the two flanking markers are missing (have missing genotypes), the nearest
non-missing markers must be used in place of the missing markers. This
makes interval mapping complicated because the interval so defined varies
from one individual to another, depending on the missing pattern of the
flanking markers. With the advent of the multipoint method (JIANG and ZENG
1997), all markers can be used simultaneously to infer the genotype of any
putative locus in the genome. This multipoint approach makes the name
interval mapping no longer appropriate: there is no such thing as an
interval because every putative position is evaluated using markers of the
entire genome. It is better to call the multipoint implementation of interval
mapping genome scanning. In any case, the so-called interval mapping
performed by PROC QTL is multipoint genome scanning.
MULTIPLE QTL MAPPING
Interval mapping is designed for mapping a single QTL per chromosome
because the statistical model only contains a single QTL. If more than one
QTL are present in a chromosome, interval mapping can still detect multiple
QTL if these QTL are not too close to each other. If two QTL are closely
linked, interval mapping may only show a single large peak in the test
statistic profile. The estimated QTL effects under the interval mapping
strategy will be biased if multiple QTL exist in the same linkage groups. The
best model to map multiple QTL is the multiple regression model. For
marker analysis, one can evaluate the entire genome by fitting all markers
to a single model. Because the number of markers may be huge, a model
selection algorithm may be applied to select important markers. Forward
selection, backward selection or step-wise regression can be used to
perform variable selection. Existing SAS procedures are available for that
purpose. Interval mapping has also been extended to a multiple-QTL model,
the so-called multiple interval mapping (KAO et al. 1999). Once the
important markers or intervals are found, the QTL effects are estimated via
the ordinary least squares method or the classical maximum likelihood
method by fitting all the detected markers or intervals in a single linear
model. The Bayesian method is the state-of-the-art method for handling
multiple QTL. Two approaches of Bayesian variable selection are currently
used for multiple QTL mapping. One is the reversible jump Markov chain
Monte Carlo method (SILLANPÄÄ and ARJAS 1998). The other is the
Bayesian shrinkage method of Wang et al. (2005b). In the Bayesian
shrinkage method, all markers are fit into a single model. Each regression
coefficient (marker effect) is assigned a normal prior distribution with mean
zero and a specific variance component. The regression coefficient specific
prior variance is further assigned a scaled inverse chi-square hyper prior
distribution with hyper parameters in the hyper prior provided by the users
or set at some values as vague as possible. The current version of PROC
QTL handles multiple QTL using the Bayesian shrinkage approach. The
reversible jump MCMC algorithm will be added later.
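As a rough sketch of the normal/scaled inverse chi-square hierarchy described above (our own illustration, not PROC QTL internals; `nu` and `s2` stand in for the user-supplied hyperparameters), one prior draw per marker effect looks like:

```python
import random

def draw_effect_variance(nu, s2, rng=random):
    """Draw sigma_k^2 from a scaled inverse chi-square prior: nu*s2 / chi2_nu."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square deviate with nu d.f.
    return nu * s2 / chi2

def draw_marker_effect(nu, s2, rng=random):
    """Draw b_k ~ N(0, sigma_k^2), with its own prior variance sigma_k^2."""
    sigma2 = draw_effect_variance(nu, s2, rng)
    return rng.gauss(0.0, sigma2 ** 0.5)
```

In the actual Gibbs sampler each variance is updated from its conditional posterior rather than drawn from the prior; the point of the sketch is only the two-level normal/scaled-inverse-chi-square structure.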
ASSOCIATION MAPPING
In contrast to linkage mapping, where a designed line-crossing experiment
is required, association mapping performs QTL mapping on a random
sample of a population. Association mapping assumes that cumulative
historical recombination events have destroyed the linkage disequilibrium
between a QTL and any nearby marker that does not overlap with the QTL.
It is a simple individual marker analysis applied to a random sample of a
target population. If a marker is strongly associated with the trait phenotype,
this marker is the QTL because if it is not, the association would have been
destroyed by the cumulative historical recombination events. PROC QTL
can perform association mapping by individual marker analysis or multiple
marker analysis. The difference between the association mapping and
linkage mapping conducted by PROC QTL is that the data for association
mapping must be pretreated by removing any influence of population
structures (PRITCHARD et al. 2000) and hidden genetic relatedness among
individuals (HANSEN et al. 1997). The processed data are then treated as
the input data of PROC QTL for further analysis.
MULTIPOINT METHOD FOR QTL GENOTYPE
INFERENCE
The key difference between individual marker analysis and interval mapping
is the ability of the latter to estimate the QTL effect for a putative position that
does not overlap with a marker. In interval mapping, a putative position
bracketed by two markers has missing genotypes. The probability
distribution of the genotype, however, is inferred from marker information.
This probability distribution provides the foundation for the mixture model
maximum likelihood method. The current method for the probability
inference is the multipoint method where all markers are used
simultaneously rather than using two markers at a time. PROC QTL takes
the most current multipoint method for such probability inference.
MAPPING FUNCTION
The marker distances in the linkage map are almost always measured by
the expected numbers of crossovers (additive distances) in centiMorgan
(cM). The multipoint method, however, takes the distances measured in
recombination fractions as the input data. Therefore, the additive distances
between consecutive markers must be converted into recombination
fractions prior to the multipoint analysis. There are two mapping functions
commonly used in QTL mapping, the Haldane (1919) mapping function and
the Kosambi (1944) mapping function. PROC QTL uses the Haldane
mapping function only. Let $d_{ij}$ and $r_{ij}$ be the additive distance and the
recombination fraction between loci $i$ and $j$, respectively. The Haldane
mapping function is

$$r_{ij} = \frac{1}{2}\left[1 - \exp(-2 d_{ij})\right] \qquad (2.1)$$
where the additive distance is measured in Morgan not in centiMorgan (1 M
= 100 cM). If you provide additive distances measured in centiMorgan, you
should convert them into Morgan prior to applying the Haldane
mapping function. The Kosambi (1944) mapping function takes into
consideration the crossover interference between consecutive intervals and
thus it is more realistic than the Haldane (1919) mapping function. However,
Haldane mapping function is mathematically more attractive than the
Kosambi mapping function. For your convenience, we present the Kosambi
mapping function below, although it is not used by PROC QTL:

$$r_{ij} = \frac{1}{2} \cdot \frac{1 - \exp(-4 d_{ij})}{1 + \exp(-4 d_{ij})} \qquad (2.2)$$
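The two conversions can be sketched in a few lines (an illustration of equations (2.1) and (2.2), not PROC QTL code; the function names are ours):

```python
import math

def haldane(d):
    """Haldane (1919): recombination fraction from additive distance d in Morgan."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def kosambi(d):
    """Kosambi (1944): allows for crossover interference; d in Morgan."""
    return 0.5 * (1.0 - math.exp(-4.0 * d)) / (1.0 + math.exp(-4.0 * d))

# A 10 cM distance must first be converted to Morgan (1 M = 100 cM).
r10_haldane = haldane(10.0 / 100.0)
r10_kosambi = kosambi(10.0 / 100.0)
```

Both functions return values below 0.5 and approach 0.5 as the distance grows, as a recombination fraction must.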
MARKOV CHAIN PROPERTY
The multipoint method of genotype probability inference was developed
based on the Markov chain properties (JIANG and ZENG 1997). A
chromosome is considered as a Markov chain. You can treat the left end of
the chromosome as the starting point of the chain and the right end as the
ending point of the chain or vice versa. A marker or any locus in the
chromosome is a time point within the chain. For each locus, the genotypes
are the states of the point. The transition probabilities between two
consecutive loci are functions of the recombination fraction between the two
loci. Each Mendelian locus occupies a specific point on a chromosome. A
linkage analysis requires two or more Mendelian loci, and thus involves two
or more points. When a linkage analysis involves two Mendelian loci, the
analysis is called two-point analysis. When more than two Mendelian loci
are analyzed simultaneously, the method is called multipoint analysis.
Multipoint analysis can extract more information from the data if markers
are not fully informative, e.g., when there are missing genotypes, dominant
alleles, and so on.
When there is no interference between the crossovers of two consecutive
chromosome segments, the joint distribution of genotypes of marker loci is
Markovian. We can imagine that the entire chromosome behaves like a
Markov chain, in which the genotype of one locus depends only on the
genotype of the previous locus. A Markov chain has a direction, but a
chromosome has no meaningful direction. Its direction is defined in an
arbitrary fashion. Therefore, we can use either a forward Markov chain or a
backward Markov chain to define a chromosome, and the result will be
identical regardless of which direction is taken.
A Markov chain is used to derive the joint distribution of all marker
genotypes. The joint distribution is eventually used to construct a likelihood
function for estimating multiple recombination fractions. Given the
recombination fractions, one can derive the conditional distribution of the
genotype of a locus bracketed by two marker loci given the genotypes of
the markers. The conditional distribution is fundamentally important in
genetic mapping for complex traits.
Joint distribution of multiple locus genotype
When three loci are considered jointly, the method is called three-point
analysis. Theory developed for three-point analysis applies to an arbitrary
number of loci. Let A, B and C be three ordered loci on the same chromosome
with pairwise recombination fractions denoted by $r_{AB}$, $r_{BC}$ and $r_{AC}$. We can
imagine that these loci form a Markov chain as either $A \to B \to C$ or
$A \leftarrow B \leftarrow C$. The direction is arbitrary. Each locus represents a discrete
variable with two or more distinct values (states). For an individual from a
four-way (FW) cross, each locus takes one of four possible genotypes, and
thus four states. Let $A_1A_3$, $A_1A_4$, $A_2A_3$ and $A_2A_4$ be the four possible
genotypes for locus A; $B_1B_3$, $B_1B_4$, $B_2B_3$ and $B_2B_4$ the four possible
genotypes for locus B; and $C_1C_3$, $C_1C_4$, $C_2C_3$ and $C_2C_4$ the four possible
genotypes for locus C. For convenience, each state is assigned a numerical
value. For example, $A = 1$ or $A = 2$ indicates that an individual takes
genotype $A_1A_3$ or $A_1A_4$. Let us take $A \to B \to C$ as the Markov chain; the
joint distribution of the three-locus genotype is

$$\Pr(A, B, C) = \Pr(A)\,\Pr(B \mid A)\,\Pr(C \mid B), \qquad (2.3)$$
where $\Pr(A=1) = \Pr(A=2) = \Pr(A=3) = \Pr(A=4) = 1/4$, assuming that there is
no segregation distortion. The conditional probabilities, $\Pr(B \mid A)$ and
$\Pr(C \mid B)$, are called the transition probabilities between loci A and B and
between loci B and C, respectively. The transition probabilities depend on
the genotypes of the two loci and the recombination fraction between the
two loci. The transition probabilities from locus A to locus B can be found
from the following $4 \times 4$ transition matrix,

$$T_{AB} = \begin{bmatrix}
(1-r_{AB})^2 & (1-r_{AB})r_{AB} & r_{AB}(1-r_{AB}) & r_{AB}^2 \\
(1-r_{AB})r_{AB} & (1-r_{AB})^2 & r_{AB}^2 & r_{AB}(1-r_{AB}) \\
r_{AB}(1-r_{AB}) & r_{AB}^2 & (1-r_{AB})^2 & (1-r_{AB})r_{AB} \\
r_{AB}^2 & r_{AB}(1-r_{AB}) & (1-r_{AB})r_{AB} & (1-r_{AB})^2
\end{bmatrix} \qquad (2.4)$$
The transition matrix from locus B to locus C is denoted by $T_{BC}$, which is
equivalent to $T_{AB}$ except that the subscript AB is replaced by BC. Note
that this transition matrix is the Kronecker square (denoted by a superscript
[2]) of a $2 \times 2$ transition matrix,

$$H_{AB} = \begin{bmatrix} 1-r_{AB} & r_{AB} \\ r_{AB} & 1-r_{AB} \end{bmatrix}, \qquad (2.5)$$

that is,

$$T_{AB} = H_{AB}^{[2]} = \begin{bmatrix} 1-r_{AB} & r_{AB} \\ r_{AB} & 1-r_{AB} \end{bmatrix} \otimes \begin{bmatrix} 1-r_{AB} & r_{AB} \\ r_{AB} & 1-r_{AB} \end{bmatrix}.$$
The $4 \times 4$ transition matrix (2.4) may be called the zygotic transition matrix
and the $2 \times 2$ transition matrix (2.5) may be called the gametic transition
matrix. That the zygotic transition matrix is the Kronecker square of the
gametic transition matrix is very intuitive because a zygote is the product of
two gametes. Let $T_{AB}(k, l)$ be the element in the $k$th row and $l$th column of
the $4 \times 4$ transition matrix $T_{AB}$, $k, l = 1, \dots, 4$. The joint probability of the
three-locus genotype is expressed as

$$\Pr(A, B, C) = \frac{1}{4} T_{AB}(A, B)\, T_{BC}(B, C). \qquad (2.6)$$
Consider a single locus, say locus A. A FW progeny can take one of the
four genotypes: $A_1A_3$, $A_1A_4$, $A_2A_3$ and $A_2A_4$. Let $A = 1, \dots, 4$ denote the
numerical code for each of the four genotypes. The diagonal matrices $D_A$,
$D_B$ and $D_C$ are each defined as a $4 \times 4$ matrix. The numerical code $A = k$ is
translated into a $D_A$ matrix whose elements are all zero except that the
element in the $k$th row and $k$th column is unity. Having defined these
diagonal matrices for all loci, we can rewrite the joint distribution of the
three-locus genotype as

$$\Pr(A, B, C) = \frac{1}{4} J' D_A T_{AB} D_B T_{BC} D_C J, \qquad (2.7)$$
where $J$ is a $4 \times 1$ vector of unity. For example, the joint probability that
$A = 3$, $B = 1$ and $C = 4$ is

$$\Pr(A=3, B=1, C=4) = \frac{1}{4} J' D_A T_{AB} D_B T_{BC} D_C J
= \frac{1}{4} T_{AB}(3, 1)\, T_{BC}(1, 4)
= \frac{1}{4} r_{AB}(1-r_{AB})\, r_{BC}^2.$$
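The calculations in equations (2.4)-(2.6) can be checked with a short sketch (our own code, not PROC QTL): build the zygotic matrix as the Kronecker square of the gametic matrix and evaluate the joint probability.

```python
def gametic(r):
    """2 x 2 gametic transition matrix H of equation (2.5)."""
    return [[1.0 - r, r], [r, 1.0 - r]]

def zygotic(r):
    """4 x 4 zygotic transition matrix of equation (2.4): the Kronecker
    square of H, rows indexed by gamete pairs (i, k), columns by (j, l)."""
    H = gametic(r)
    return [[H[i][j] * H[k][l] for j in (0, 1) for l in (0, 1)]
            for i in (0, 1) for k in (0, 1)]

def joint_prob(a, b, c, r_ab, r_bc):
    """Pr(A=a, B=b, C=c) = (1/4) T_AB(a, b) T_BC(b, c), equation (2.6);
    the genotype codes a, b, c are 1-based as in the text."""
    T_ab, T_bc = zygotic(r_ab), zygotic(r_bc)
    return 0.25 * T_ab[a - 1][b - 1] * T_bc[b - 1][c - 1]
```

For example, `joint_prob(3, 1, 4, r1, r2)` reproduces $\frac{1}{4} r_1 (1 - r_1) r_2^2$ from the worked example above.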
The FW cross design described earlier represents a situation where all
four genotypes in the progeny are distinguishable. In reality, not all
genotypes are distinguishable, e.g., in the presence of dominant alleles. This
may happen when two or more of the grandparents carry the same allele at
the locus of interest. The consequence is that the $F_1$ hybrid initiated by the
first level of the cross may be homozygous, or the two $F_1$ parents may have
the same genotype. Assume that $F_1^{(34)}$ has genotype $A_3A_3$, which is
homozygous. This may be caused by a cross between two parents, both of
which are fixed for the $A_3$ allele. Regardless of the reason for the
homozygosity of the $F_1$ hybrid, let us focus on the genotypes of the two
parents and consider the four possible genotypes of the FW progeny.
Assume that $F_1^{(12)}$ and $F_1^{(34)}$ have genotypes $A_1A_2$ and $A_3A_3$, respectively.
The four possible genotypes of the progeny are $A_1A_3$, $A_1A_3$, $A_2A_3$ and $A_2A_3$.
The first and the second genotypes are not distinguishable, although the $A_3$
allele carried by the two genotypes has different origins. The same applies
to the third and fourth genotypes. Considering the allelic origins, we
have four ordered genotypes, but we only observe two distinguishable
genotypes. This phenomenon is called incomplete information for the
genotype. Such a genotype is called a partially informative genotype. If we
observe genotype $A_1A_3$, the numerical code for the genotype is $A = (1, 2)$. In
matrix notation, it is represented by

$$D_A = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{bmatrix}.$$

If an observed genotype is $A_2A_3$, the numerical code becomes $A = (3, 4)$,
represented by

$$D_A = \begin{bmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix}.$$

If both parents are homozygous and fixed for the same allele, say $A_1$, then
all four genotypes of the progeny have the same observed form, $A_1A_1$.
The numerical code for the genotype is $A = (1, 2, 3, 4)$, a situation called no
information. Such a locus is called an uninformative locus and is usually
excluded from the analysis. The diagonal matrix representing the genotype
is simply a $4 \times 4$ identity matrix.
The following is an example showing how to calculate a three-locus joint
genotype probability using the FW cross approach with partial information.
Let $A_1A_3\,B_2B_3\,C_1C_1$ and $A_4A_4\,B_2B_3\,C_1C_2$ be the three-locus genotypes of the two
parents. The linkage phases of the markers in the parents are assumed to be
known, so that the order of the two alleles within a locus is meaningful. In
fact, the phase-known genotypes of the parents are better denoted by
$\frac{A_1 B_2 C_1}{A_3 B_3 C_1}$ and $\frac{A_4 B_2 C_1}{A_4 B_3 C_2}$, respectively, for the two parents. Assume that a
progeny has a genotype of $A_3A_4\,B_2B_2\,C_1C_1$. We want to calculate the
probability of observing such a progeny given the genotypes of the parents.
First, we examine each single-locus genotype to see which of the four
possible genotypes this individual matches. For locus A, the parental
genotypes are $A_1A_3$ and $A_4A_4$. The four possible ordered genotypes of a
progeny are $A_1A_4$, $A_1A_4$, $A_3A_4$ and $A_3A_4$, respectively. The single-locus
genotype of the progeny is $A_3A_4$, matching the third and fourth genotypes,
and thus $A = (3, 4)$. For locus B, the parental genotypes are $B_2B_3$ and $B_2B_3$.
The four possible genotypes of a progeny are $B_2B_2$, $B_2B_3$, $B_3B_2$ and $B_3B_3$,
respectively. The single-locus genotype $B_2B_2$ of the progeny matches the
first genotype, and thus $B = 1$. For locus C, the parental genotypes are $C_1C_1$
and $C_1C_2$. The four possible genotypes of a progeny are $C_1C_1$, $C_1C_2$, $C_1C_1$
and $C_1C_2$, respectively. The single-locus genotype of the progeny, $C_1C_1$,
matches the first and the third genotypes, and thus $C = (1, 3)$. In summary,
the numerical codes for the three loci are $A = (3, 4)$, $B = 1$ and $C = (1, 3)$,
respectively. We now convert the three single-locus genotypes into their
corresponding diagonal matrices,

$$D_A = \begin{bmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix}, \quad
D_B = \begin{bmatrix} 1&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{bmatrix} \quad \text{and} \quad
D_C = \begin{bmatrix} 1&0&0&0 \\ 0&0&0&0 \\ 0&0&1&0 \\ 0&0&0&0 \end{bmatrix}.$$

Substituting these matrices into equation (2.7), we have

$$\Pr\left[A = (3,4),\, B = 1,\, C = (1,3)\right]
= \frac{1}{4} J' D_A T_{AB} D_B T_{BC} D_C J
= \frac{1}{4}\left[T_{AB}(3,1) + T_{AB}(4,1)\right]\left[T_{BC}(1,1) + T_{BC}(1,3)\right]
= \frac{1}{4} r_{AB}\,(1 - r_{BC}).$$
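Partially informative genotypes fit the same machinery once each observed genotype is encoded as a diagonal indicator matrix. A sketch (our own code, not PROC QTL) that reproduces the worked example:

```python
def zygotic(r):
    """4 x 4 FW transition matrix (equation 2.4) as the Kronecker square."""
    H = [[1.0 - r, r], [r, 1.0 - r]]
    return [[H[i][j] * H[k][l] for j in (0, 1) for l in (0, 1)]
            for i in (0, 1) for k in (0, 1)]

def diag_indicator(codes, n=4):
    """Diagonal D matrix with unity at each listed (1-based) genotype code."""
    D = [[0.0] * n for _ in range(n)]
    for k in codes:
        D[k - 1][k - 1] = 1.0
    return D

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def joint_prob_partial(D_a, D_b, D_c, T_ab, T_bc):
    """Pr = (1/4) J' D_A T_AB D_B T_BC D_C J, with J a vector of ones."""
    M = matmul(matmul(matmul(matmul(D_a, T_ab), D_b), T_bc), D_c)
    return 0.25 * sum(sum(row) for row in M)
```

With the indicator matrices for A = (3, 4), B = 1 and C = (1, 3), the result equals (1/4) r_AB (1 - r_BC), as derived in the text.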
The populations handled by PROC QTL also include $F_2$, BC, DH and RILs.
The four-way cross design is a general design, of which the BC, DH and $F_2$
designs are special cases with partial information. For example, the two
parents of the $BC_1$ design have genotypes $A_1A_2$ and $A_1A_1$, respectively. If
we treat the BC progeny as a special FW progeny, the four possible
genotypes are $A_1A_1$, $A_1A_1$, $A_2A_1$ and $A_2A_1$, with only two distinguishable
observed types. If a progeny has genotype $A_1A_1$, the numerical code of the
genotype in terms of a FW cross is $A = (1, 2)$. If a progeny has genotype
$A_2A_1$, its numerical code becomes $A = (3, 4)$. The two parents of the $BC_2$
design have genotypes $A_1A_2$ and $A_2A_2$, respectively. In terms of a FW
cross, the four possible genotypes are $A_1A_2$, $A_1A_2$, $A_2A_2$ and $A_2A_2$.
Again, there are only two distinguishable genotypes.
The two parents of the $F_2$ design both have genotype $A_1A_2$. If we treat the
$F_2$ progeny as a special FW progeny, the four possible genotypes are $A_1A_1$,
$A_1A_2$, $A_2A_1$ and $A_2A_2$, with only three distinguishable genotypes. The
numerical codes for the two types of homozygote are $A = 1$ and $A = 4$,
respectively, whereas the numerical code for the heterozygote is $A = (2, 3)$.
In summary, when the general FW design is applied to a BC design, only two
of the four possible genotypes are distinguishable and the numerical codes
are $A = (1, 2)$ for one observed genotype and $A = (3, 4)$ for the other. When
the general FW design is applied to the $F_2$ design, the two forms of the
heterozygote are not distinguishable. When coding the genotype, we use
$A = (2, 3)$ to represent the heterozygote, and $A = 1$ and $A = 4$ to represent
the two types of homozygote, respectively. The transition matrices remain
the same as those used in the FW cross design. When using the FW design
for the BC problem, we have combined the first and second genotypes to
form the first observable genotype, and combined the third and fourth
genotypes to form the second observable genotype of the BC design. It can
be shown that the joint probability calculated by the Markov chain with two
states (using the $2 \times 2$ transition matrix) and that calculated by the Markov
chain with four states (the $4 \times 4$ transition matrix) are identical. The $F_2$
design described earlier can be handled by combining the second and third
genotypes into the observed heterozygote. The $4 \times 4$ transition matrix is
then converted into a $3 \times 3$ transition matrix,

$$T_{AB} = \begin{bmatrix}
(1-r_{AB})^2 & 2 r_{AB}(1-r_{AB}) & r_{AB}^2 \\
r_{AB}(1-r_{AB}) & (1-r_{AB})^2 + r_{AB}^2 & r_{AB}(1-r_{AB}) \\
r_{AB}^2 & 2 r_{AB}(1-r_{AB}) & (1-r_{AB})^2
\end{bmatrix}.$$



The joint probability of multiple locus genotype for an
2
F individual can be
calculated using a Markov chain with the 3 3 transition matrix. The
numerical code for a genotype must be redefined in the following way. The
three defined genotypes
1 1
A A ,
1 2
A A and
2 2
A A , are numerically coded by
1 A= , 2 A= and 3 A= , respectively. In matrix notation, the three genotypes
are denoted by

1 0 0 0 0 0 0 0
, a
0
0 0 0 0 1 0 0 0 0
0
nd .
0 0 0 0 0 0 0 1
A A A
D D D
( ( (
( ( (
= = =
( ( (
( ( (


Recombinant inbred lines (RILs) are another widely used mapping
population, produced by repeatedly selfing or sib-mating the progeny of
individual members of an $F_2$ population until complete homozygosity is
achieved. RILs share the same genetic structure as a DH population, and the
model defined for the BC population can also be applied to RILs after a small
modification. However, more recombinant individuals will be observed in
RILs because of the multiple cycles of meiosis. Therefore, recombination
fractions calculated via the Haldane mapping function need to be further
converted; see equations (2.1) and (2.2) given in the mapping function
section.
The general FW design using a Markov chain with four states is
computationally more intensive when applied to BC and $F_2$ designs than
the specialized BC (with the $2 \times 2$ transition matrix) and $F_2$ (with the
$3 \times 3$ transition matrix) algorithms. However, the difference in computing
time is probably unnoticeable given current computing power. In addition,
the $3 \times 3$ transition matrix is not symmetric, a factor that can easily cause a
programming error. Therefore, the general FW design is recommended and
is actually used in PROC QTL for all the simple line-crossing experiments.
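The claimed equivalence can be verified numerically: collapsing states 2 and 3 of the four-state chain (the two heterozygote orderings) recovers the 3 × 3 F2 matrix above. A sketch (our own code, not the PROC QTL implementation):

```python
def zygotic4(r):
    """4 x 4 FW transition matrix (Kronecker square of the gametic matrix)."""
    H = [[1.0 - r, r], [r, 1.0 - r]]
    return [[H[i][j] * H[k][l] for j in (0, 1) for l in (0, 1)]
            for i in (0, 1) for k in (0, 1)]

def f2_matrix(r):
    """3 x 3 F2 transition matrix given in the text."""
    return [[(1 - r) ** 2, 2 * r * (1 - r), r ** 2],
            [r * (1 - r), (1 - r) ** 2 + r ** 2, r * (1 - r)],
            [r ** 2, 2 * r * (1 - r), (1 - r) ** 2]]

def collapse(T4):
    """Merge ordered states 2 and 3 into the single observed heterozygote.
    Rows: average over the merged source states (they are equally likely);
    columns: sum the probabilities of the merged destination states."""
    groups = [(0,), (1, 2), (3,)]
    return [[sum(T4[i][j] for i in gr for j in gc) / len(gr)
             for gc in groups] for gr in groups]
```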
Conditional distribution of genotype for a putative position
The joint distribution described above is not used very often in QTL
mapping. It is mainly used for further calculating the conditional probabilities
of QTL genotypes. Consider the following five loci denoted by ABCDE in
that order. Assume that ABDE are markers and C is a putative position in
the center of the four markers. The conditional probability of genotype for
locus C is

$$\Pr(C = k \mid ABDE) = \frac{\Pr(A)\,\Pr(B \mid A)\,\Pr(C = k \mid B)\,\Pr(D \mid C = k)\,\Pr(E \mid D)}{\sum_{k'=1}^{4} \Pr(A)\,\Pr(B \mid A)\,\Pr(C = k' \mid B)\,\Pr(D \mid C = k')\,\Pr(E \mid D)} \qquad (2.8)$$
In matrix notation, this conditional probability is expressed as

$$\Pr(C = k \mid ABDE) = \frac{J' D_A T_{AB} D_B T_{BC} D^{(k)} T_{CD} D_D T_{DE} D_E J}{\sum_{k'=1}^{4} J' D_A T_{AB} D_B T_{BC} D^{(k')} T_{CD} D_D T_{DE} D_E J} \qquad (2.9)$$

where $D^{(k)}$ is a diagonal matrix with all elements being zero except that the
element in the $k$th row and $k$th column is unity.
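Equations (2.8) and (2.9) can be sketched with a brute-force sum over chain paths (our own illustration; for brevity it uses the two-state gametic chain with fully informative markers given as 1-based state codes — the FW case substitutes the 4 × 4 matrix and the D indicator matrices):

```python
def H2(r):
    """Two-state transition matrix for recombination fraction r."""
    return [[1.0 - r, r], [r, 1.0 - r]]

def conditional_qtl_probs(states, rec_fracs, qtl_index, n_states=2):
    """Pr(C = k | markers) for a putative locus in an ordered chain of loci.

    `states` holds the observed marker states, with None at position
    `qtl_index` for the putative QTL; rec_fracs[i] is the recombination
    fraction between locus i and locus i + 1.
    """
    weights = []
    for k in range(1, n_states + 1):
        s = list(states)
        s[qtl_index] = k
        p = 1.0 / n_states            # uniform probability of the first state
        for i in range(len(s) - 1):
            p *= H2(rec_fracs[i])[s[i] - 1][s[i + 1] - 1]
        weights.append(p)
    total = sum(weights)
    return [w / total for w in weights]
```

Only the two intervals adjacent to the putative locus affect the result; all other factors cancel between numerator and denominator.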
VIRTUAL MAP
A virtual map is required for the implementation of interval mapping in
PROC QTL. A virtual map is a map that contains more or less evenly
distributed putative positions over the entire genome. The distance
between two putative positions is 1 cM by default, but users can define
their own virtual map with a different distance. The conditional
probabilities of the genotypes of the putative positions are calculated using
the above multipoint method prior to execution of the interval mapping
procedure. Separating the calculation of the conditional genotype
probabilities from the interval mapping is a cost-saving strategy. PROC QTL
allows users to select among three different ways to construct the virtual
map, which are described below.
1. Variable increment
Users can provide a maximum increment that forces markers to be included
in the virtual map. For example, if you put the STEP = d option in the PROC
QTL statement, the procedure will create a virtual map in which the distance
between two consecutive putative positions is equal to or less than d cM and
all markers are included in the virtual map. If the distance between two
markers is an integer multiple of d cM, the increment within the interval is
exactly d cM; otherwise, the increment within the interval is

$$d^{*} = \frac{x_{AB}}{\operatorname{int}(x_{AB}/d) + 1} \qquad (2.10)$$

where $x_{AB}$ is the distance measured in cM between locus A and locus B.
With this option, the increment can vary from one marker interval to another
but within a marker interval, the increment is the same.
2. Soft fixed increment
The soft fixed increment option sets each increment to exactly d cM, except
that the distance between a marker and the putative position on either side
of the marker may be slightly less than d cM. All markers are forced to be
included in the virtual map. This option is turned on when STEP = d/soft is
used in the PROC QTL statement.
3. Hard fixed increment
The hard fixed increment option sets each increment to exactly d cM
throughout the entire genome. Markers are not necessarily included in the
virtual map; a marker will be included if and only if its distance from the
first marker of the chromosome is an integer multiple of d cM. This option is
turned on when STEP = d/hard is used in the PROC QTL statement.
The following example shows the differences among the three options. In
this example, there are three linked markers M1, M2 and M3 in a
chromosome of 20 cM long. The two intervals are 9.8 cM and 10.2 cM,
respectively. User specified d = 2.0 cM as the step size for QTL mapping.
The virtual maps generated by the three different options are shown in
Figure 1 given below.

Figure 1. Virtual maps generated by the three options with d = 2.0: variable (left;
increments of 1.96 cM and 1.70 cM within the two intervals), soft fixed (middle; 2.00 cM
increments with shortened final steps of 1.80 cM and 0.20 cM before M2 and M3) and
hard fixed (right; uniform 2.00 cM increments, with M2 not included in the virtual map).
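The three options can be sketched as follows (our own function names; distances in cM). With markers at 0, 9.8 and 20 cM and d = 2.0, they reproduce the increments shown in Figure 1.

```python
def variable_increment(markers, d):
    """STEP = d: per-interval step from equation (2.10); markers included."""
    pos = [markers[0]]
    for a, b in zip(markers, markers[1:]):
        x = b - a
        exact = abs(x / d - round(x / d)) < 1e-9
        n = int(round(x / d)) if exact else int(x / d) + 1
        pos += [a + (x / n) * i for i in range(1, n)] + [b]
    return pos

def soft_fixed(markers, d):
    """STEP = d/soft: exact d steps, shortened just before each marker."""
    pos = [markers[0]]
    for a, b in zip(markers, markers[1:]):
        p = a + d
        while p < b - 1e-9:
            pos.append(p)
            p += d
        pos.append(b)
    return pos

def hard_fixed(markers, d):
    """STEP = d/hard: uniform grid; markers not necessarily included."""
    start, end = markers[0], markers[-1]
    n = int((end - start) / d + 1e-9)
    return [start + d * i for i in range(n + 1)]
```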
MAXIMUM LIKELIHOOD METHOD
Two major statistical methods are used to estimate QTL parameters: the
maximum likelihood method and the Bayesian method. PROC QTL uses
these two methods, and some variants or special cases of them, to perform
QTL mapping. The maximum likelihood method is used for interval mapping
while the Bayesian method is used for mapping multiple QTL. This chapter
briefly describes the concept of the maximum likelihood method and the
likelihood ratio test statistic in general. The basic framework of the
maximum likelihood method will be customized in later chapters when
details of the QTL mapping procedures are discussed.
LIKELIHOOD FUNCTION
PROC QTL deals with line crossing data. All line crossing experiments
share a common property: different individuals within a family are
independent conditional on the parameters. This property makes the log
likelihood function the sum of the individual log likelihoods. Let $y_j$ be the
data point from the $j$th individual for $j = 1, \dots, n$, where $n$ is the sample size.
Let $\theta$ be an $m \times 1$ vector of parameters. The log likelihood function for
individual $j$ is

$$L_j(\theta) = \ln f(y_j \mid \theta) \qquad (3.1)$$

where $f(y_j \mid \theta)$ is the probability density. The overall log likelihood function
is

$$L(\theta) = \sum_{j=1}^{n} L_j(\theta) = \sum_{j=1}^{n} \ln f(y_j \mid \theta). \qquad (3.2)$$
The log likelihood function of the parameter and the logarithm of the
probability density of the data differ only by a constant, which is a function
of the data but not of the parameter. That constant is irrelevant to the
maximum likelihood solution and thus is always ignored. The maximum
likelihood estimate (MLE) of $\theta$ is the value that maximizes $L(\theta)$ and is
usually denoted by $\hat\theta$. A local maximum likelihood solution can be obtained
by solving the following simultaneous equations,

$$\frac{\partial L(\theta)}{\partial \theta} = \sum_{j=1}^{n} \frac{\partial L_j(\theta)}{\partial \theta} = 0 \qquad (3.3)$$
In a very few situations, an explicit solution may exist, but most often there
is no explicit solution. Therefore, a numerical solution must be found with
some iteration schemes. Two numerical algorithms have been implemented
in PROC QTL. One is the Newton-Raphson algorithm, including the Fisher
scoring algorithm as an improved version of the Newton-Raphson algorithm.
The other is the Nelder and Mead (1965) simplex algorithm. The simplex
algorithm is a derivative free algorithm because it does not require the
partial derivative of the likelihood function with respect to the parameter
vector. We only describe the Newton-Raphson and the Fisher scoring
algorithms, leaving the simplex algorithm to the original paper (NELDER and
MEAD 1965).
NEWTON-RAPHSON ALGORITHM
The Newton-Raphson algorithm requires both the first and the second
partial derivatives. Let $\theta^{(t)}$ be the parameter value at iteration $t$; the
parameter at iteration $t + 1$ is given by

$$\theta^{(t+1)} = \theta^{(t)} - \left[\sum_{j=1}^{n} \frac{\partial^2 L_j(\theta^{(t)})}{\partial\theta\,\partial\theta^T}\right]^{-1} \sum_{j=1}^{n} \frac{\partial L_j(\theta^{(t)})}{\partial\theta} \qquad (3.4)$$
We often call

$$S(\theta) = \sum_{j=1}^{n} S_j(\theta) = \sum_{j=1}^{n} \frac{\partial L_j(\theta)}{\partial \theta} \qquad (3.5)$$
the score vector and

$$H(\theta) = \sum_{j=1}^{n} H_j(\theta) = \sum_{j=1}^{n} \frac{\partial^2 L_j(\theta)}{\partial \theta\, \partial \theta^T} \qquad (3.6)$$
the Hessian matrix. Therefore, the Newton-Raphson algorithm can be
rewritten as

$$\theta^{(t+1)} = \theta^{(t)} - H^{-1}(\theta^{(t)})\, S(\theta^{(t)}) \qquad (3.7)$$
The Newton-Raphson algorithm is fast in the sense that it takes only a few
iterations to converge. Unfortunately, it is sensitive to the initial value of the
parameter. If the initial value is not close to the true solution, the algorithm
may fail to converge to the correct solution. The iteration process often
stops before convergence because $H^{-1}(\theta^{(t)})$ does not exist. One
modification of the algorithm is called the Newton-Raphson-Ridge
algorithm. This algorithm adds a small positive number to the diagonal of
the Hessian matrix to make the matrix invertible, an idea borrowed from the
ridge regression method (HOERL and KENNARD 2000; TYCHONOFF 1943).
This modification is minor and thus the name of the algorithm still preserves
the Newton-Raphson prefix.
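A one-parameter sketch of update (3.7) with the ridge safeguard (our own toy example, not PROC QTL internals): the MLE of a Poisson rate, whose closed form is the sample mean, so the iteration is easy to check.

```python
def newton_raphson_poisson(y, lam0=1.0, ridge=1e-8, tol=1e-10, max_iter=100):
    """Newton-Raphson for the Poisson rate lambda.

    Score: S(lam) = sum(y)/lam - n; Hessian: H(lam) = -sum(y)/lam^2.
    Update: lam <- lam - S/H. When H is nearly zero, a small ridge is
    added to its magnitude, mimicking the Newton-Raphson-Ridge safeguard.
    """
    n, s = len(y), float(sum(y))
    lam = lam0
    for _ in range(max_iter):
        score = s / lam - n
        hess = -s / lam ** 2
        if abs(hess) < ridge:
            hess -= ridge          # keep the Hessian safely away from zero
        step = score / hess
        lam -= step
        if abs(step) < tol:
            break
    return lam
```

For data with mean 3 and a starting value of 1, the iterations converge to 3, the analytical MLE.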
INFORMATION MATRIX AND ESTIMATION ERRORS
A further improvement of the Newton-Raphson algorithm is the Fisher
scoring algorithm (HAN and XU 2008). Because the modification is
substantial, the Newton-Raphson name is no longer preserved in this
algorithm, but the idea of the new algorithm still comes from the original
Newton-Raphson algorithm. The Fisher scoring algorithm first defines the
information matrix, called Fisher's information,

$$I(\theta) = -E\left[H(\theta)\right] \qquad (3.8)$$

the negative of the expectation of the Hessian matrix. The expectation is
taken with respect to the observed data $y$. The Fisher scoring algorithm
simply replaces the Hessian matrix by its expectation. You may be
confused by this "expectation with respect to $y$" because the data do not
explicitly appear in the notation of the Hessian matrix. Depending on the
properties of the problem in question, sometimes $H(\theta)$ is a function of the
data and sometimes it is not. When $H(\theta)$ is not a function of the data,
$I(\theta) = -H(\theta)$ holds and thus the Fisher scoring algorithm is the same as the
Newton-Raphson algorithm. The iteration equation of the Fisher scoring
algorithm is

$$\theta^{(t+1)} = \theta^{(t)} - \left\{E\left[H(\theta^{(t)})\right]\right\}^{-1} S(\theta^{(t)}) = \theta^{(t)} + I^{-1}(\theta^{(t)})\, S(\theta^{(t)}) \qquad (3.9)$$

The Fisher scoring algorithm is much more stable than the Newton-
Raphson algorithm because $I^{-1}(\theta^{(t)})$ almost always exists. Several other


properties make the Fisher scoring algorithm more appealing than the
original Newton-Raphson algorithm. First, the inverse of the information
matrix asymptotically represents the variance-covariance matrix of the
estimated parameters,

$$\operatorname{var}(\hat\theta) \approx I^{-1}(\hat\theta) \qquad (3.10)$$

This means that $\operatorname{var}(\hat\theta)$ is a by-product of the Fisher scoring iteration
algorithm. Secondly, although

$$H(\theta) \neq -S(\theta)\, S^T(\theta) \qquad (3.11)$$
the following relationship holds,

$$E\left[H(\theta)\right] = -E\left[S(\theta)\, S^T(\theta)\right] \qquad (3.12)$$

This means that we do not have to know the Hessian matrix to find its
expectation. This way of finding the expectation of the Hessian matrix can
be substantially easier than deriving the Hessian matrix itself, because we
can use

$$E\left[H(\theta)\right] = -\sum_{j=1}^{n} \operatorname{var}\left[S_j(\theta)\right] - \sum_{j=1}^{n} E\left[S_j(\theta)\right] E\left[S_j^T(\theta)\right] \qquad (3.13)$$

to calculate the expectation.
LIKELIHOOD RATIO TEST STATISTICS
The null hypothesis can often be expressed by

$$H_0: K\theta = C \qquad (3.14)$$

where $K$ is a known $p \times m$ matrix for $p \le m$ and $C$ is a known $p \times 1$ vector.
Under the null hypothesis, $\theta$ can be estimated via maximizing the log
likelihood function under the constraint $K\theta = C$. Let $\tilde{\theta}$ be the MLE of the
parameter under the constraint $K\theta = C$. The likelihood ratio test statistic is
given by

$$\Lambda = -2\left[L_0(\tilde{\theta}) - L_1(\hat{\theta})\right] \qquad (3.15)$$

where $\hat{\theta}$ is the MLE of the parameter obtained via maximizing the full log
likelihood function given in equation (3.2) and $L_1(\hat{\theta})$ is the log likelihood
value evaluated at $\theta = \hat{\theta}$. $L_1(\hat{\theta})$ is also called the log likelihood value for the
full model. Accordingly, $L_0(\tilde{\theta})$ is called the log likelihood value for the
restricted (or reduced) model. $L_0(\tilde{\theta})$ is simply the value of equation (3.2)
evaluated at $\theta = \tilde{\theta}$. Note that $K\theta$ is often a subset of $\theta$. In such a case, $\tilde{\theta}$
can be estimated by maximizing the so-called reduced log likelihood
function, which has the same form as equation (3.2) except that only the
subset of parameter $\theta$ appears in the function. Most hypothesis tests in
PROC QTL are of this kind. A more general method to find $\tilde{\theta}$ is through
maximization of the following quantity

$$Q = L(\theta) + \lambda^{T}(K\theta - C) \qquad (3.16)$$

where $\lambda$ is a $p \times 1$ vector of Lagrange multipliers. Both $\theta$ and $\lambda$ are treated
as unknowns when maximizing $Q$. A more rigorous expression of the
solution is

$$\tilde{\theta} = \arg\max_{\theta,\,\lambda}\left[L(\theta) + \lambda^{T}(K\theta - C)\right] \qquad (3.17)$$

where the arguments include both $\theta$ and $\lambda$. Under the null hypothesis, the
likelihood ratio test statistic $\Lambda$ asymptotically follows a chi-square
distribution with $p$ degrees of freedom. This asymptotic property allows us
to use Piepho's (2001) simple method to calculate the genome-wide
threshold value of the test statistic used for the significance test of QTL.
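A numeric sketch of the likelihood ratio statistic (3.15) for the simplest case: a normal model under the constraint that the mean is zero, so the constraint picks out one element of $\theta$ and $p = 1$. Everything here is illustrative: the data are simulated, and 3.84 is the 5% quantile of the chi-square distribution with one degree of freedom.

```python
# Likelihood ratio test of H0: mu = 0 for y ~ N(mu, sigma^2), using the
# closed-form MLEs of the full and restricted models.
import numpy as np

def loglik_normal(y, mu, s2):
    n = len(y)
    return -0.5 * n * np.log(2.0 * np.pi * s2) - np.sum((y - mu) ** 2) / (2.0 * s2)

rng = np.random.default_rng(2)
y = rng.normal(loc=0.8, scale=1.0, size=200)

# Full model: both mu and sigma^2 are estimated
L1 = loglik_normal(y, np.mean(y), np.var(y))
# Restricted model (H0: mu = 0): only sigma^2 is estimated
L0 = loglik_normal(y, 0.0, np.mean(y ** 2))

lrt = -2.0 * (L0 - L1)     # equation (3.15); one constraint -> 1 df
significant = lrt > 3.84   # 5% threshold of chi-square with 1 df
```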

WALD TEST STATISTICS
The Wald test (WALD 1957) is an alternative to the likelihood ratio test. The
test statistic is defined as

$$W = (K\hat{\theta} - C)^{T}\left[\mathrm{var}(K\hat{\theta} - C)\right]^{-1}(K\hat{\theta} - C) = (K\hat{\theta} - C)^{T}\left[K\,\mathrm{var}(\hat{\theta})\,K^{T}\right]^{-1}(K\hat{\theta} - C) \qquad (3.18)$$

The Wald test statistic has the same asymptotic property as the likelihood
ratio test, that is, the chi-square distribution under the null hypothesis. When
the sample size is small, the likelihood ratio test is preferable to the Wald
test. However, an obvious advantage of the Wald test over the likelihood
ratio test is the avoidance of evaluating multiple log likelihood functions.
Only the full log likelihood function is maximized with the Wald test.
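The Wald statistic (3.18) needs only the full-model estimate and its covariance matrix. Below is a hedged sketch for an ordinary least squares fit; the design matrix and the hypothesis matrices $K$ and $C$ are illustrative assumptions, not part of PROC QTL.

```python
# Wald statistic W = (K b - C)' [K V K']^{-1} (K b - C) for an OLS fit.
import numpy as np

rng = np.random.default_rng(3)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(size=n)

# MLE (OLS) of beta and its covariance matrix
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s2 = np.sum((y - X @ beta_hat) ** 2) / n
V = s2 * XtX_inv                          # var(beta_hat)

# H0: K beta = C, here testing that the two slopes are (0.5, 0)
K = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
C = np.array([0.5, 0.0])
d = K @ beta_hat - C
W = d @ np.linalg.inv(K @ V @ K.T) @ d    # ~ chi-square with 2 df under H0
```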

BAYESIAN METHOD
INTRODUCTION TO BAYESIAN METHOD
Parameters $\theta$ and data $y$ are the two items required by the maximum
likelihood analysis. The Bayesian method requires a distribution for the
parameters $\theta$, in addition to the parameters themselves and the data.
Therefore, the parameters are treated as random variables in the Bayesian
analysis. The purpose of Bayesian analysis is to infer the posterior
distribution of $\theta$, also called the conditional distribution of $\theta$ given the data.
The posterior distribution of $\theta$ contains much more information than the
point estimate of $\theta$ from the maximum likelihood analysis. Let $p(y|\theta)$ be
the probability density of the data given the parameters; it is also called the
likelihood function of the parameters. Let $p(\theta)$ be the probability density of
the parameters. In the Bayesian framework, it is called the prior distribution.
The posterior distribution of the parameters is

$$p(\theta|y) = \frac{p(y|\theta)\,p(\theta)}{p(y)} \propto p(y|\theta)\,p(\theta) \qquad (4.1)$$
which is proportional to the joint distribution of the data and the parameters.
The marginal distribution of the data in the denominator plays no role in the
Bayesian analysis and thus is often ignored. The posterior distribution does
not seem to be more complicated than the likelihood function except that a
prior distribution of the parameters is needed. This is true for a single
parameter problem. When the dimensionality of $\theta$ is more than one, the
posterior distribution required in the Bayesian analysis is the marginal
posterior distribution for each element of the parameter vector. The
posterior distribution given by equation (4.1) is called the joint posterior
distribution, which is not what we want. Let us partition the parameter vector
into $\theta_k$ and $\theta_{-k}$, where $\theta_k$ is the $k$th element of $\theta$ and $\theta_{-k}$ is a vector
containing the remaining elements of $\theta$. The original parameter vector can
be expressed as $\theta = (\theta_k, \theta_{-k})$. The marginal posterior distribution of element
$k$ of vector $\theta$ is

$$p(\theta_k|y) = \int \cdots \int p(\theta_k, \theta_{-k}|y)\,d\theta_{-k} \qquad (4.2)$$

This marginal posterior distribution is what we need in Bayesian analysis.
To find this marginal posterior distribution, multiple integrations are required.
In most situations, an explicit multiple integral does not exist and numerical
multiple integration is needed. This explains why Bayesian analysis was not
as popular in the past as it is today: numerical multiple integrations were
then computationally unmanageable. What is the
Bayesian estimate of a parameter? The marginal posterior distribution is the
Bayesian estimate of a parameter. It is a distribution rather than a single
point estimate. The marginal posterior distribution can be summarized by a
few quantities of the distribution. Therefore, people often use the posterior
mean or the posterior mode as the Bayesian estimate of the parameter. A
more informative representation of the Bayesian estimate is the posterior
mean accompanied by the posterior standard deviation. The most
frequently used summaries of the Bayesian estimate are the posterior
mean, the equal-tail interval and the highest posterior density interval. In any
case, the posterior distribution contains all the information we need and is
the most informative representation of the Bayesian estimate.
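For a single-parameter conjugate case, the posterior (4.1) is available in closed form, which makes a useful sanity check before resorting to sampling. The sketch below assumes a normal likelihood with known variance and a normal prior on the mean; all names and values are our illustrative choices.

```python
# Closed-form posterior for mu when y_i ~ N(mu, sigma2) with mu ~ N(m0, v0):
# the posterior is normal with precision n/sigma2 + 1/v0 (conjugate pair).
import numpy as np

def posterior_normal_mean(y, sigma2, prior_mean, prior_var):
    n = len(y)
    post_var = 1.0 / (n / sigma2 + 1.0 / prior_var)
    post_mean = post_var * (np.sum(y) / sigma2 + prior_mean / prior_var)
    return post_mean, post_var

rng = np.random.default_rng(4)
y = rng.normal(loc=2.0, scale=1.0, size=100)
post_mean, post_var = posterior_normal_mean(y, sigma2=1.0,
                                            prior_mean=0.0, prior_var=10.0)
```

With a vague prior the posterior mean is close to the sample mean, and the posterior variance summarizes the uncertainty, the kind of information a point estimate alone cannot carry.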
MARKOV CHAIN MONTE CARLO ALGORITHM
The Markov chain Monte Carlo (MCMC) algorithm is a special numerical
method for multiple integration. It is more efficient than other numerical
methods for multiple integration when the dimensionality is high. The
MCMC algorithm is a sampling based algorithm in which parameters are
repeatedly sampled from their conditional posterior distributions. Recall that
the purpose of Bayesian analysis is to infer the marginal posterior
distribution for each parameter. The MCMC algorithm, however, does not
sample parameters from their marginal distributions because we do not
have explicit forms of the marginal distributions. If we had known the
marginal posterior distributions, the problem would have been solved. The
target distribution from which parameters are sampled is the joint posterior
distribution, but MCMC does not sample parameters from the joint posterior
distribution either. The MCMC algorithm actually samples each parameter
from its fully conditional posterior distribution. In the end, we collect the
posterior sample of the random draws, and the posterior sample is
considered to be drawn from the joint posterior distribution. When only one
parameter is considered in the posterior sample and all other parameters
are ignored, the sample of this parameter is drawn from the marginal
posterior distribution. The verbal description of MCMC may be confusing.
Let us describe the MCMC sampling process using equations and symbols.
Let $p(y|\theta_k, \theta_{-k})$ be the likelihood of the parameters and $p(\theta_k, \theta_{-k})$ be the
prior density of the parameters. The joint posterior distribution of the
parameters is

$$p(\theta_k, \theta_{-k}|y) \propto p(y|\theta_k, \theta_{-k})\,p(\theta_k, \theta_{-k}) \qquad (4.3)$$
Let $\theta_{-k}^{(t)}$ be the value of $\theta_{-k}$ at the $t$th iteration. The fully conditional posterior
distribution of $\theta_k$ is obtained simply by replacing $\theta_{-k}$ with $\theta_{-k}^{(t)}$ in the above
equation. Therefore, the fully conditional posterior distribution of $\theta_k$ is

$$p(\theta_k|\theta_{-k}^{(t)}, y) \propto p(\theta_k, \theta_{-k}^{(t)}|y) \propto p(y|\theta_k, \theta_{-k}^{(t)})\,p(\theta_k, \theta_{-k}^{(t)}) \qquad (4.4)$$

This distribution usually has a simple form, e.g., normal, Bernoulli, chi-square
or another explicit form of distribution. As a result, $\theta_k$ can be sampled
directly from that distribution. Each and every element of vector $\theta$ is sampled
from its own fully conditional posterior distribution. This particular MCMC
sampling algorithm is called the Gibbs sampler (CASELLA and GEORGE 1992;
GEMAN and GEMAN 1984). Let $\theta^{(t)}$ be the values of all parameters at
iteration $t$. The sequence $\{\theta^{(0)}, \theta^{(1)}, \ldots, \theta^{(T)}\}$ forms a Markov chain. This
means that $\theta^{(t+1)}$ depends on $\theta^{(t)}$. Since the initial value $\theta^{(0)}$ is arbitrarily
chosen, the Markov chain depends strongly on the initial values of the
parameters in the early stage. The change of parameter values in the early
stage is usually chaotic. When the chain is sufficiently long, the parameters
will reach their stationary distribution, i.e., the change tends to be stabilized.
Parameter values in the early stage (before the chain reaches the stationary
distribution) should not be counted. This early stage of the Markov chain is
called the burn-in period. After the burn-in period, $\theta^{(t+1)}$ may still highly
depend on $\theta^{(t)}$. To remove the autocorrelation, the chain needs to be
trimmed or thinned. Depending on the autocorrelation, the thinning rate may
vary. By default, PROC QTL saves one observation in every 10 iterations.
The thinning rate is 1:10. The posterior sample contains only the saved
observations after the burn-in and thinning. For example, suppose that the
burn-in period is 1000 and the thinning rate is 1:10. The total number of
iterations is 11000. The posterior sample contains only
$(11000 - 1000)/10 = 1000$ observations, as shown in the following table.
Table 1. Posterior sample for five parameters with a burn-in period of 1000 and
a thinning rate of 1:10.

Iteration   theta_1          theta_2          theta_3          theta_4          theta_5
1010        theta_1^(1010)   theta_2^(1010)   theta_3^(1010)   theta_4^(1010)   theta_5^(1010)
1020        theta_1^(1020)   theta_2^(1020)   theta_3^(1020)   theta_4^(1020)   theta_5^(1020)
1030        theta_1^(1030)   theta_2^(1030)   theta_3^(1030)   theta_4^(1030)   theta_5^(1030)
1040        theta_1^(1040)   theta_2^(1040)   theta_3^(1040)   theta_4^(1040)   theta_5^(1040)
1050        theta_1^(1050)   theta_2^(1050)   theta_3^(1050)   theta_4^(1050)   theta_5^(1050)
...
10990       theta_1^(10990)  theta_2^(10990)  theta_3^(10990)  theta_4^(10990)  theta_5^(10990)
11000       theta_1^(11000)  theta_2^(11000)  theta_3^(11000)  theta_4^(11000)  theta_5^(11000)

Once the Markov chain reaches its stationary distribution, the parameter
vector $\theta^{(t)}$ is considered to be sampled from the joint posterior distribution,
i.e., each row of Table 1 is an observation from the joint posterior
distribution. What is the marginal posterior distribution of a parameter? We
do not have such a distribution yet, but we have a sample from that
distribution. Take the first parameter for example: if we look at the column
headed by $\theta_1$ in Table 1 and ignore all other parameters, the posterior
sample formed by this column is a sample drawn from the marginal
posterior distribution. This is equivalent to the situation where we collect a
sample with two variables, $X$ and $Y$. It is a joint sample if both $X$ and $Y$
are considered, but if we only look at $X$ and ignore $Y$, the sample is a
marginal sample of $X$. The MCMC algorithm does not explicitly infer the
marginal distribution; rather, it provides a sample of observations drawn from
the target distribution.
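A minimal Gibbs sampler in the sense described above: for a standard bivariate normal with correlation $\rho$, each fully conditional distribution is itself normal, $\theta_1|\theta_2 \sim N(\rho\theta_2, 1-\rho^2)$, so both coordinates can be drawn directly. The burn-in of 1000 and thinning rate of 1:10 mirror the defaults quoted earlier; the target distribution itself is our illustrative choice, not a QTL model.

```python
# Gibbs sampler for a standard bivariate normal with correlation rho.
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=11000, burn_in=1000, thin=10, seed=5):
    rng = np.random.default_rng(seed)
    t1, t2 = 0.0, 0.0                    # arbitrary initial values
    sd = np.sqrt(1.0 - rho ** 2)
    sample = []
    for t in range(1, n_iter + 1):
        t1 = rng.normal(rho * t2, sd)    # draw theta1 | theta2
        t2 = rng.normal(rho * t1, sd)    # draw theta2 | theta1
        if t > burn_in and t % thin == 0:
            sample.append((t1, t2))      # keep 1 in 10 after burn-in
    return np.array(sample)

sample = gibbs_bivariate_normal(rho=0.8)
```

Each saved row is a draw from the joint distribution; either column alone is a draw from the corresponding marginal, exactly as argued above for Table 1.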
The MCMC algorithm described above is called the Gibbs sampler. In order
to perform the Gibbs sampler, we must know explicitly what distribution
$p(\theta_k|\theta_{-k}^{(t)}, y)$ is and know how to draw a random observation from that
distribution. Sometimes we may not know what $p(\theta_k|\theta_{-k}^{(t)}, y)$ is and do not
know an easy way to draw a variable from that distribution. In this case, we
must use an alternative method to draw a variable. This method is called
the Metropolis-Hastings algorithm (HASTINGS 1970; METROPOLIS et al. 1953).
This algorithm is an accept-reject method of drawing random variables. With
this method, we first draw a variable from a proposal distribution $q(\theta_k)$. This
proposal distribution should be as close to $p(\theta_k|\theta_{-k}^{(t)}, y)$ as possible. We
then decide whether the draw should be accepted or rejected. If we reject
the draw, the old value of $\theta_k$ will be carried over to the next iteration;
otherwise, the old value is replaced by the new draw. Let $\theta_k^{(t)}$ be the old
value of parameter $\theta_k$ and $\theta_k^{*}$ be the new value drawn from the proposal
distribution $q(\theta_k^{*})$. The acceptance probability is defined as

$$\alpha = \min\left[1,\ \frac{p(\theta_k^{*}|\theta_{-k}^{(t)}, y)\,q(\theta_k^{(t)})}{p(\theta_k^{(t)}|\theta_{-k}^{(t)}, y)\,q(\theta_k^{*})}\right] \qquad (4.5)$$

If $\theta_k^{*}$ is accepted, we let $\theta_k^{(t+1)} = \theta_k^{*}$; otherwise, $\theta_k^{(t+1)} = \theta_k^{(t)}$. Depending on
the acceptance rate, the autocorrelation may be high. The ideal acceptance
rate should be around 0.6. It can be shown that if the proposal distribution is
the fully conditional posterior distribution, then $\alpha = 1$ and the acceptance
rate is 100%. Therefore, the Gibbs sampler is a special case of the
Metropolis-Hastings algorithm (TIERNEY 1994).
Another special case of the Metropolis-Hastings algorithm is called the
Metropolis algorithm or random walk algorithm. With the random walk
algorithm, the proposal distribution is symmetric, i.e., $q(\theta_k^{*}|\theta_k^{(t)}) = q(\theta_k^{(t)}|\theta_k^{*})$.
Therefore, the acceptance probability is simply the posterior ratio. For
example, let $\Delta\theta_k \sim U(-\delta, \delta)$ be a uniform random variable, where $\delta$ is a
small constant, and let $\theta_k^{*} = \theta_k^{(t)} + \Delta\theta_k$, so that $\theta_k^{*}$ is uniform over
$(\theta_k^{(t)} - \delta,\ \theta_k^{(t)} + \delta)$, centered at the old value of the parameter. The proposal
density is $q(\theta_k^{*}|\theta_k^{(t)}) = 1/(2\delta)$. The counterpart of the proposal density is
$q(\theta_k^{(t)}|\theta_k^{*}) = 1/(2\delta)$, and thus $q(\theta_k^{*}|\theta_k^{(t)}) = q(\theta_k^{(t)}|\theta_k^{*})$. This random walk
MCMC can be used for some parameters (e.g., regression coefficients) but
not others (e.g., the residual variance).
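A sketch of the random-walk Metropolis update just described: a symmetric uniform proposal around the current value, so the $q$ terms cancel and the acceptance probability reduces to the posterior ratio. The target here is a standard normal log density; $\delta$ and all names are illustrative choices of ours.

```python
# Random-walk Metropolis sampler with a symmetric uniform proposal.
import numpy as np

def log_post(theta):                      # target: standard normal density
    return -0.5 * theta ** 2

def rw_metropolis(n_iter=20000, delta=1.5, seed=6):
    rng = np.random.default_rng(seed)
    theta = 0.0
    draws, accepted = [], 0
    for _ in range(n_iter):
        prop = theta + rng.uniform(-delta, delta)   # symmetric proposal
        # q(prop|theta) = q(theta|prop) = 1/(2 delta), so q cancels and
        # alpha = min(1, posterior ratio); compare on the log scale.
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta, accepted = prop, accepted + 1
        draws.append(theta)               # rejected moves repeat the old value
    return np.array(draws), accepted / n_iter

draws, rate = rw_metropolis()
```

Tuning $\delta$ trades acceptance rate against step size: tiny steps are almost always accepted but mix slowly, while huge steps are rarely accepted.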
PROC QTL uses the Gibbs sampler, the random walk algorithm or the
general Metropolis-Hastings algorithm to sample a parameter. This is
different from PROC MCMC, which uses the random walk algorithm to
sample all parameters. Because of this, PROC MCMC is so general that it
can be used for almost any problem, including QTL mapping. However,
PROC MCMC is not effective for QTL mapping because the sample size
and the number of markers are usually too large for the MCMC procedure to
complete the analysis within a reasonable time frame. The majority of the
time spent by the MCMC procedure is for tuning the parameters of the
proposal distribution to get the optimal acceptance rate rather than for the
actual sampling process. PROC QTL, however, does not manipulate the
tuning parameters and is thus much faster than the MCMC procedure.
DIAGNOSES OF CONVERGENCE OF MARKOV CHAIN
The concept of convergence in MCMC differs from convergence in other
numerical iteration algorithms. The iteration of MCMC sampling converges
to a joint distribution rather than to a constant vector. We cannot collect the
posterior sample before the chain converges to the stationary distribution.
Therefore, an assessment of convergence is needed.
The simplest method for convergence assessment is to visualize the trace
plots for a few parameters. The trace plot for a parameter is the plot of the
sampled value against the iteration number. Before the chain reaches its
stationary distribution, the trace varies chaotically. Once it reaches the
stationary distribution, the change becomes stabilized.
Figure 2 shows the trace plot for a particular parameter. The chain appears
to have converged after 100 iterations. Therefore, a burn-in period of 100
seems to be sufficient for this parameter. Strictly speaking, all parameters
(not just a few) must reach their stationary distributions before we can
collect observations to form the posterior sample. In practice, this may not
be realistic because some parameters may never converge or may be
trapped at some fixed values. As long as these parameters are not
important, the results can be used for inference. Bayesian theoreticians
may not agree with this statement, but our experience with simulated data
analyses does support it. A better assessment of convergence is to run
multiple independent chains and monitor the trace plots of a few
parameters for all chains. Figure 3 shows the trace plots of a parameter for
three independent chains. The convergence appears to happen after 200
iterations.

Figure 2. The posterior TAD panels (trace, autocorrelation and density) for
parameter beta.

Figure 3. The trace plot of a parameter for three independent chains.
The multiple-chain MCMC algorithm also gives us a chance to statistically
test the convergence, using the Gelman-Rubin diagnostic test (GELMAN and
RUBIN 1992). The test statistic is analogous to ANOVA, where the between-chain
variance is compared with the within-chain variance. A substantially
larger between-chain variance than within-chain variance indicates that
convergence has not been reached. Geweke's (1992) diagnostic test can
also be used to assess the convergence. This test does not require multiple
chains but a single long chain. We select x% of the posterior sample in the
early stage of the chain and x% of the posterior sample in the late stage of
the chain. The means of the two subsamples are compared. If the means
are not significantly different, the chain may have converged. PROC QTL
does not provide these convergence diagnostic tests, but it does provide the
entire posterior sample, from which the diagnostic test statistics can be
obtained by the investigators using other programs.
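A rough sketch of a Geweke-style comparison of early and late segments of a single chain. This naive version uses a two-sample z-score and ignores autocorrelation in the standard errors, which Geweke's (1992) actual diagnostic handles with spectral density estimates, so treat it only as an illustration of the idea.

```python
# Compare the mean of the early part of a chain with the late part.
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    n = len(chain)
    a = chain[: int(first * n)]            # early segment
    b = chain[int((1 - last) * n):]        # late segment
    se = np.sqrt(np.var(a) / len(a) + np.var(b) / len(b))
    return (np.mean(a) - np.mean(b)) / se  # naive z-score

rng = np.random.default_rng(7)
converged = rng.normal(size=5000)                           # stationary chain
drifting = rng.normal(size=5000) + np.linspace(3, 0, 5000)  # early transient
z_ok, z_bad = geweke_z(converged), geweke_z(drifting)
```

A large |z| flags a chain whose early segment has not yet reached the same distribution as the late segment.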
Two additional diagnostic analyses are commonly used to evaluate the
mixing properties of the sampler. They are the autocorrelation and the
effective posterior sample size. Autocorrelation is a measure of the
dependency of consecutive observations of the posterior sample. A high
autocorrelation indicates high dependency of the observations within the
posterior sample. The autocorrelation of lag $h$ ($h < n$) is defined as

$$\rho(h|\theta_k) = \frac{\mathrm{cov}(h|\theta_k)}{\mathrm{cov}(0|\theta_k)}, \quad h < n \qquad (4.6)$$

where $n$ is the posterior sample size and

$$\mathrm{cov}(h|\theta_k) = \frac{1}{n-h}\sum_{t=1}^{n-h}\left(\theta_k^{(t)} - \bar{\theta}_k\right)\left(\theta_k^{(t+h)} - \bar{\theta}_k\right), \quad 0 \le h < n \qquad (4.7)$$

is the covariance between the $t$th observation and the $(t+h)$th observation
for parameter $\theta_k$. The denominator of equation (4.6) is the covariance at
$h = 0$, equivalent to the variance of the posterior sample.
The effective sample size is determined by the autocorrelation. A high
autocorrelation will reduce the effective sample size. The functional
relationship between the effective sample size and the autocorrelation is

$$\mathrm{ESS}(\theta_k) = \frac{n}{1 + 2\sum_{h=1}^{\infty}\rho(h|\theta_k)} = \frac{n}{\tau} \qquad (4.8)$$

where $\tau$ is referred to as the autocorrelation time. In practice, the infinity is
replaced by a cutoff point of $h$, beyond which $\rho(h|\theta_k)$ is essentially zero.
You can choose the cutoff point at a value of $h$ such that $\rho(h|\theta_k) < 0.05$.
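Equations (4.6)-(4.8) translate almost line-for-line into code. The sketch below applies the 0.05 cutoff suggested above to truncate the infinite sum; the AR(1) chain is simulated purely to show how correlation shrinks the effective sample size relative to an independent sample.

```python
# Lag-h autocorrelation (4.6)-(4.7) and effective sample size (4.8).
import numpy as np

def autocorr(chain, h):
    n = len(chain)
    c = chain - np.mean(chain)
    cov_h = np.sum(c[: n - h] * c[h:]) / (n - h)   # equation (4.7)
    cov_0 = np.sum(c * c) / n                      # variance, h = 0
    return cov_h / cov_0                           # equation (4.6)

def effective_sample_size(chain, cutoff=0.05):
    n = len(chain)
    s = 0.0
    for h in range(1, n):
        r = autocorr(chain, h)
        if r < cutoff:                             # truncate the sum here
            break
        s += r
    return n / (1.0 + 2.0 * s)                     # equation (4.8)

rng = np.random.default_rng(8)
iid = rng.normal(size=2000)                        # independent draws
ar = np.empty(2000)                                # AR(1) chain, phi = 0.9
ar[0] = 0.0
for t in range(1, 2000):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()
ess_iid = effective_sample_size(iid)
ess_ar = effective_sample_size(ar)
```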
POST MCMC ANALYSIS
By default, PROC QTL generates a posterior sample that contains the
observations kept after burn-in and thinning. Users can use other SAS
procedures to perform post-MCMC analysis on the sampled parameters.
Alternatively, users may run PROC QTL again using the posterior sample
as the input dataset to report the summary statistics of the posterior sample.
The summary statistics reported by PROC QTL only contain the posterior
sample size, the posterior mean and the posterior standard deviation. More
summary statistics will be added in a later version of PROC QTL. These
additional summary statistics include the $\alpha$ equal-tail credible interval and
the $\alpha$ highest posterior density credible interval. The $\alpha$ equal-tail credible
interval $(a, b)$ is defined as

$$\Pr(\theta_k < a) = \Pr(\theta_k > b) = \alpha/2 \qquad (4.9)$$

while the highest posterior density credible interval is defined as

$$\Pr(a < \theta_k < b) = 1 - \alpha \qquad (4.10)$$

such that $b - a$ is the shortest interval among all choices of $a$ and $b$.
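From a stored posterior sample, both interval summaries (4.9) and (4.10) can be computed directly: the equal-tail interval from quantiles, and the highest posterior density interval as the shortest window containing a $1-\alpha$ fraction of the sorted draws. This is a generic sketch of ours, not PROC QTL output.

```python
# Equal-tail (4.9) and highest posterior density (4.10) credible intervals.
import numpy as np

def equal_tail_interval(sample, alpha=0.05):
    return (np.quantile(sample, alpha / 2.0),
            np.quantile(sample, 1.0 - alpha / 2.0))

def hpd_interval(sample, alpha=0.05):
    s = np.sort(sample)
    n = len(s)
    m = int(np.ceil((1.0 - alpha) * n))      # number of draws inside
    widths = s[m - 1:] - s[: n - m + 1]      # width of every m-draw window
    i = np.argmin(widths)                    # shortest such window
    return s[i], s[i + m - 1]

rng = np.random.default_rng(9)
sample = rng.normal(loc=4.5, scale=0.2, size=10000)
et = equal_tail_interval(sample)
hpd = hpd_interval(sample)
```

For a symmetric unimodal posterior the two intervals nearly coincide; for a skewed posterior the HPD interval is the shorter of the two.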

INTERVAL MAPPING FOR NORMALLY
DISTRIBUTED TRAITS
Interval mapping was originally developed by Lander and Botstein (1989)
and further modified by numerous authors. Interval mapping has
revolutionized genetic mapping because it allows us to pinpoint the location
of a QTL. In each of the four sections that follow, we will introduce one
specific statistical method of interval mapping based on an F2 design.
The maximum likelihood (ML) method of interval mapping (Lander and
Botstein 1989) is the optimal method for interval mapping. The least
squares (LS) method (Haley and Knott 1992) is a simplified approximation
of Lander and Botstein (1989). The iteratively reweighted least squares
(IRLS) method (Xu 1998b) is a further improvement over the least squares
method. Recently, Feenstra et al. (2006) developed an estimating equation
(EE) method for QTL mapping, which is an extension of the IRLS with
improved performance. Han and Xu (2008) developed a Fisher scoring
algorithm (FISHER) for QTL mapping. Both the EE and FISHER algorithms
maximize the same likelihood function and thus generate identical results.
A special comment on interval mapping is presented here in the introduction.
Interval mapping is efficient for mapping a single QTL. However, it is
frequently used for mapping multiple QTL, and the results are not too bad
as long as the QTL are not too closely linked (Wang et al. 2005b). With the
advent of more advanced statistical methods for mapping multiple QTL,
such as composite interval mapping, multiple interval mapping and
Bayesian mapping, interval mapping seems to be out of date. What is the
justification for using interval mapping while multiple-QTL models and
methodology are already available? Our experience indicates that interval
mapping is still the most reliable method for QTL mapping in most
situations. Compared with interval mapping, advanced statistical methods
tend to be more sensitive to the experimental designs, the data structures,
the initial values of the parameters and other factors such as the
assumptions of the models. The advanced methods require much
experience from the investigators to make them work properly, while
interval mapping does not. When applying advanced statistical methods,
e.g., the Bayesian method, to analyze real data, we almost always compare
the results with those of interval mapping to eliminate any suspicious
artifacts. If the results are drastically different from those of interval
mapping, we always double check the models, the methods and the
programs. In most situations, the advanced methods are supposed to
improve on interval mapping, not to change the result in a fundamental
way. Having said that, we are not discouraging the use of advanced models
and methodologies; we simply recommend using advanced methods with
extra caution to prevent any potential artifacts.

In this chapter, we introduce the methods based on their simplicity rather
than their chronological order of development. Therefore, the methods will
be introduced in the following order: LS, IRLS, FISHER and ML.
SIMPLE LEAST SQUARES METHOD
The LS method was introduced by Haley and Knott (1992) with the aim of
improving the computational speed. The statistical model for the phenotypic
value of the $j$th individual is

$$y_j = X_j\beta + Z_j\gamma + \varepsilon_j \qquad (5.1)$$

where $\beta$ is a $p \times 1$ vector of model effects that are irrelevant to the QTL
effects, $X_j$ is a $1 \times p$ known design vector, $\gamma = (a, d)^T$ is a $2 \times 1$ vector of
QTL effects of a putative locus ($a$ for the additive effect and $d$ for the
dominance effect), and $Z_j$ is a $1 \times 2$ vector of genotype indicator variables
defined as

$$Z_j = \begin{cases} H_1 & \text{for } A_1A_1 \\ H_2 & \text{for } A_1A_2 \\ H_3 & \text{for } A_2A_2 \end{cases} \qquad (5.2)$$

where $H_k$ for $k = 1, 2, 3$ is the $k$th row of matrix

$$H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & 0 \end{bmatrix} \qquad (5.3)$$

The residual error $\varepsilon_j$ is assumed to be a $N(0, \sigma^2)$ variable. Although a
normal distribution for $\varepsilon_j$ is not a required assumption for the LS method, it
is required for the ML method. It is important to include the non-QTL effects
$\beta$ in the model to keep the residual error variance as small as possible. For
example, location and year effects are common in replicated experiments.
These effects are not related to the QTL but will contribute to the residual
error if not included in the model. If there is no such non-QTL effect to
consider, as in a well-designed experiment, $\beta$ will be a single parameter
(the intercept) and $X_j$ will be unity across all $j = 1, \ldots, n$.
With interval mapping, the QTL genotype is never known unless the
putative QTL position overlaps with a fully informative marker. Therefore,
Haley and Knott (1992) suggested replacing the unknown $Z_j$ by the
expectation of $Z_j$ conditional on the flanking marker genotypes. Let $p_j(1)$,
$p_j(0)$ and $p_j(-1)$ be the conditional probabilities of the three genotypes given

flanking marker information. The LS model of Haley and Knott (1992) is

$$y_j = X_j\beta + U_j\gamma + e_j \qquad (5.4)$$

where

$$U_j = E(Z_j) = p_j(1)H_1 + p_j(0)H_2 + p_j(-1)H_3 \qquad (5.5)$$

is the conditional expectation of $Z_j$. The residual error $e_j$ (different from $\varepsilon_j$)
remains normal with mean zero and variance $\sigma^2$, although this assumption
has been violated (see the next section). The least squares estimates of
$\beta$ and $\gamma$ are

$$\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{n} X_j^T X_j & \sum_{j=1}^{n} X_j^T U_j \\ \sum_{j=1}^{n} U_j^T X_j & \sum_{j=1}^{n} U_j^T U_j \end{bmatrix}^{-1} \begin{bmatrix} \sum_{j=1}^{n} X_j^T y_j \\ \sum_{j=1}^{n} U_j^T y_j \end{bmatrix} \qquad (5.6)$$

and the estimated residual error variance is

$$\hat{\sigma}^2 = \frac{1}{n-p-2}\sum_{j=1}^{n}\left(y_j - X_j\hat{\beta} - U_j\hat{\gamma}\right)^2 \qquad (5.7)$$

The variance-covariance matrix of the estimated parameters is

$$\mathrm{var}\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{n} X_j^T X_j & \sum_{j=1}^{n} X_j^T U_j \\ \sum_{j=1}^{n} U_j^T X_j & \sum_{j=1}^{n} U_j^T U_j \end{bmatrix}^{-1}\hat{\sigma}^2 \qquad (5.8)$$

which is a $(p+2) \times (p+2)$ matrix. Let

$$V = \mathrm{var}(\hat{\gamma}) = \begin{bmatrix} \mathrm{var}(\hat{a}) & \mathrm{cov}(\hat{a}, \hat{d}) \\ \mathrm{cov}(\hat{a}, \hat{d}) & \mathrm{var}(\hat{d}) \end{bmatrix} \qquad (5.9)$$

be the $2 \times 2$ lower diagonal block of matrix (5.8). The standard errors of
the estimated additive and dominance effects are the square roots of the
diagonal elements of matrix (5.9).
We can use either the F-test or the W-test statistic to test the hypothesis
$H_0: \gamma = 0$. The W-test statistic is

$$W = \hat{\gamma}^T V^{-1}\hat{\gamma} = \begin{bmatrix} \hat{a} & \hat{d} \end{bmatrix}\begin{bmatrix} \mathrm{var}(\hat{a}) & \mathrm{cov}(\hat{a}, \hat{d}) \\ \mathrm{cov}(\hat{a}, \hat{d}) & \mathrm{var}(\hat{d}) \end{bmatrix}^{-1}\begin{bmatrix} \hat{a} \\ \hat{d} \end{bmatrix} \qquad (5.10)$$

The corresponding F-test statistic is $F = W/2$ in the F2 design. The
likelihood ratio test statistic can also be applied if we assume that
$e_j \sim N(0, \sigma^2)$ for all $j = 1, \ldots, n$. The log likelihood function for the full
model is

$$L_1 = -\frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{j=1}^{n}\left(y_j - X_j\beta - U_j\gamma\right)^2 \approx -\frac{n}{2}\left[\ln(\hat{\sigma}^2) + 1\right] \qquad (5.11)$$

The reduced model under $H_0: \gamma = 0$ gives

$$L_0 = -\frac{n}{2}\ln(\tilde{\sigma}^2) - \frac{1}{2\tilde{\sigma}^2}\sum_{j=1}^{n}\left(y_j - X_j\tilde{\beta}\right)^2 \approx -\frac{n}{2}\left[\ln(\tilde{\sigma}^2) + 1\right] \qquad (5.12)$$

where

$$\tilde{\beta} = \left[\sum_{j=1}^{n} X_j^T X_j\right]^{-1}\sum_{j=1}^{n} X_j^T y_j \qquad (5.13)$$

and

$$\tilde{\sigma}^2 = \frac{1}{n-p}\sum_{j=1}^{n}\left(y_j - X_j\tilde{\beta}\right)^2 \qquad (5.14)$$

The likelihood ratio test statistic is

$$\Lambda = -2(L_0 - L_1) \qquad (5.15)$$
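The whole Haley-Knott machinery (5.4)-(5.7) can be exercised on simulated data in a few lines. In the sketch below the genotype probabilities $p_j(\cdot)$ are faked by mixing the true genotype with the F2 prior ratios, rather than computed from flanking markers as in a real analysis, so the numbers are purely illustrative; the $H$ coding follows (5.3).

```python
# Toy Haley-Knott regression: phenotype regressed on the conditional
# expectation U_j of the genotype indicator (equations 5.4-5.7).
import numpy as np

rng = np.random.default_rng(10)
n, a, d = 500, 1.0, 0.5                   # additive and dominance effects
H = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])   # coding (5.3)

geno = rng.choice(3, size=n, p=[0.25, 0.5, 0.25])     # F2 ratios 1:2:1
Z = H[geno]                                           # true (unobserved) Z_j
y = 2.0 + Z @ np.array([a, d]) + rng.normal(size=n)   # model (5.1), beta = 2

# Fake conditional probabilities: mostly the truth, blurred toward the prior
probs = 0.9 * np.eye(3)[geno] + 0.1 * np.array([0.25, 0.5, 0.25])
U = probs @ H                                         # equation (5.5)

T = np.column_stack([np.ones(n), U])                  # [X_j, U_j]
est = np.linalg.solve(T.T @ T, T.T @ y)               # equation (5.6)
beta_hat, a_hat, d_hat = est
sigma2_hat = np.sum((y - T @ est) ** 2) / (n - 1 - 2)  # equation (5.7)
```

Because the blurred probabilities attenuate $U_j$ toward its mean, the fitted effects differ slightly from the simulated $a$ and $d$, an artifact of this toy construction rather than of the method itself.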
WEIGHTED LEAST SQUARES
Xu (1995) realized that the LS method is flawed because the residual
variance is heterogeneous after replacing $Z_j$ by its conditional expectation
$U_j$. The conditional variance of $Z_j$ given the marker information varies from
one individual to another, and it will contribute to the residual variance. Xu
(1998a, b) rewrote the exact model

$$y_j = X_j\beta + Z_j\gamma + \varepsilon_j \qquad (5.16)$$

as

$$y_j = X_j\beta + U_j\gamma + (Z_j - U_j)\gamma + \varepsilon_j \qquad (5.17)$$

which differs from Haley and Knott's (1992) model by the term $(Z_j - U_j)\gamma$.
Since $Z_j$ is not observable, this additional term is merged into the residual
error. Let

$$e_j = (Z_j - U_j)\gamma + \varepsilon_j \qquad (5.18)$$

be the new residual error. Haley and Knott's (1992) model can be rewritten
as

$$y_j = X_j\beta + U_j\gamma + e_j \qquad (5.19)$$

Although we assume $\varepsilon_j \sim N(0, \sigma^2)$, this does not validate the normality
assumption for $e_j$. The expectation of $e_j$ is

$$E(e_j) = E\left[(Z_j - U_j)\gamma\right] + E(\varepsilon_j) = 0 \qquad (5.20)$$

The variance of $e_j$ is

$$\mathrm{var}(e_j) = \gamma^T\Sigma_j\gamma + \sigma^2 = \left(\frac{1}{\sigma^2}\gamma^T\Sigma_j\gamma + 1\right)\sigma^2 \qquad (5.21)$$

where $\Sigma_j = \mathrm{var}(Z_j)$, which is defined as a conditional variance matrix given
the flanking marker information. The explicit form of $\Sigma_j$ is

$$\Sigma_j = E(Z_j^T Z_j) - E^T(Z_j)E(Z_j) \qquad (5.22)$$

where

$$E(Z_j^T Z_j) = p_j(1)H_1^T H_1 + p_j(0)H_2^T H_2 + p_j(-1)H_3^T H_3$$

and

$$E(Z_j) = U_j = p_j(1)H_1 + p_j(0)H_2 + p_j(-1)H_3 \qquad (5.23)$$

Let

$$\mathrm{var}(e_j) = \left(\frac{1}{\sigma^2}\gamma^T\Sigma_j\gamma + 1\right)\sigma^2 = W_j^{-1}\sigma^2 \qquad (5.24)$$

where

$$W_j = \left(\frac{1}{\sigma^2}\gamma^T\Sigma_j\gamma + 1\right)^{-1} \qquad (5.25)$$

is the weight variable for the $j$th observation. The weighted least squares
estimates of the parameters are


1 1 1
1 1 1
1

n n n
T T T
j j j j j j j j j
j j j
n n n
T T T
j j j j j j j j j
j j j
W W W X X X U X y
U X U U U y W W W
|

= = =
= = =

( (
( (
(
( (
=
(
( (

( (



(5.26)
and

2 2
1
1

( )
2
n
j j j j
j
W y X U
n p
o |
=
=


(5.27)
Since
j
W is a function of
2
o , iterations are required. The iteration process is
demonstrated as below.
1. Initialize and
2
o
2. Update | and using equation (5.26)
3. Update
2
o using equation (5.27)
4. Repeat step 2 to step 3 until a certain criterion of convergence is
satisfied.
The iteration process is very fast, usually taking less than 5 iterations to
converge. Since the weight is not a constant (it is a function of the
parameters), repeatedly updating the weight is required. Therefore, the
weighted least squares method is also called iteratively reweighted least
squares (IRLS). The few cycle of iterations make the results of IRLS very
close to that of the maximum likelihood method (to be introduced later). A
nice property of the IRLS is that the variance-covariance matrix of the
estimated parameters is automatically given as a by-product of the iteration
process. This matrix is

1
1 1
2
1 1
v

ar
n n
T T
j j j j j j
j j
n n
T T
j j j j j j
j j
X X X U
U X U
W
W W U
W
|
o

= =
= =
(
(
(
(
=
(
(

(



(5.28)
As a result, the F- or W-test statistic can be used for significance test. Like
the least squares method, a likelihood ratio test statistic can also be
established for significance test. The
0
L under the null model is the same
as that described in the section of least squares method. The
1
L under the
alternative model is
42


2 2
1 2
1 1
2
1
1 1

ln( ) ln( ) ( )
2 2 2
1
ln( ) 1 ln(
2 2
)
n n
j j j j
j j
n
j
j
n
W W y X U L
n
W
o |
o
o
= =
=
+
( ~ +

+

=

(5.29)
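The four IRLS steps listed above can be sketched as below. $\Sigma_j$ is built from the genotype probabilities via (5.22)-(5.23), and the loop alternates between the weighted normal equations (5.26) and the variance update (5.27). As in the previous sketch, the probabilities are simulated stand-ins, not multipoint probabilities computed from real markers.

```python
# Toy IRLS for the weighted least squares model (5.19) with weights (5.25).
import numpy as np

rng = np.random.default_rng(11)
n = 400
H = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
geno = rng.choice(3, size=n, p=[0.25, 0.5, 0.25])
y = 2.0 + H[geno] @ np.array([1.0, 0.5]) + rng.normal(size=n)

probs = 0.9 * np.eye(3)[geno] + 0.1 * np.array([0.25, 0.5, 0.25])
U = probs @ H                                      # E(Z_j), equation (5.5)
EZZ = np.einsum('jk,ka,kb->jab', probs, H, H)      # E(Z_j' Z_j), eq (5.23)
Sigma = EZZ - np.einsum('ja,jb->jab', U, U)        # var(Z_j), equation (5.22)

X = np.ones((n, 1))
T = np.hstack([X, U])                              # columns [X_j, U_j]
gamma, s2 = np.zeros(2), 1.0                       # step 1: initialize
for _ in range(20):
    # step 2: weights W_j = (gamma' Sigma_j gamma / s2 + 1)^(-1), eq (5.25)
    W = 1.0 / (np.einsum('a,jab,b->j', gamma, Sigma, gamma) / s2 + 1.0)
    est = np.linalg.solve(T.T @ (W[:, None] * T), T.T @ (W * y))  # eq (5.26)
    gamma = est[1:]
    r = y - T @ est
    s2 = np.sum(W * r ** 2) / (n - 1 - 2)          # step 3, equation (5.27)

beta_hat, a_hat, d_hat = est
```

On the first pass $\gamma = 0$ gives $W_j = 1$, so the loop starts from the plain Haley-Knott solution and then reweights, usually stabilizing within a handful of cycles.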
FISHER SCORING ALGORITHM
The weighted least squares solution described in the previous section does
not maximize the log likelihood function (5.29). We can prove that it actually
maximizes equation (5.29) but with $W_j$ treated as a constant. The fact that
$W_j$ is a function of the parameters makes the above weighted least squares
estimates suboptimal. The optimal solution should be obtained by
maximizing (5.29) fully, without assuming that $W_j$ is a constant.
Recall that the linear model for $y_j$ is

$$y_j = X_j\beta + U_j\gamma + e_j$$

where the residual error $e_j = (Z_j - U_j)\gamma + \varepsilon_j$ has zero mean and variance

$$\mathrm{var}(e_j) = \left(\frac{1}{\sigma^2}\gamma^T\Sigma_j\gamma + 1\right)\sigma^2 = W_j^{-1}\sigma^2 \qquad (5.30)$$

If we assume that $e_j \sim N(0, W_j^{-1}\sigma^2)$, we can construct the following log
likelihood function,

$$L(\theta) = -\frac{n}{2}\ln(\sigma^2) + \frac{1}{2}\sum_{j=1}^{n}\ln(W_j) - \frac{1}{2\sigma^2}\sum_{j=1}^{n} W_j\left(y_j - X_j\beta - U_j\gamma\right)^2 \qquad (5.31)$$

where $\theta = \{\beta, \gamma, \sigma^2\}$ is the vector of parameters. The maximum likelihood
solution for the above likelihood function is hard to obtain because $W_j$ is
not a constant but a function of the parameters. The Newton-Raphson
algorithm may be adopted, but it requires the second partial derivatives of
the log likelihood function with respect to the parameters, which are very
complicated. In addition, the Newton-Raphson algorithm often misbehaves
when the dimensionality of $\theta$ is high. We now introduce the Fisher scoring
algorithm for finding the MLE of $\theta$ (Han and Xu 2008). The method
requires the first partial derivatives of $L(\theta)$ with respect to the parameters,
called the score vector and denoted by $S(\theta)$, and the information matrix,
denoted by $I(\theta)$. The score vector has the following form,


$$S(\theta) = \begin{bmatrix} \dfrac{\partial L}{\partial \beta} \\[1ex] \dfrac{\partial L}{\partial \gamma} \\[1ex] \dfrac{\partial L}{\partial \sigma^2} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\sigma^2}\displaystyle\sum_{j=1}^{n} X_j^T W_j (y_j - \mu_j) \\[1ex] \dfrac{1}{\sigma^2}\displaystyle\sum_{j=1}^{n}\left[U_j^T W_j (y_j - \mu_j) - W_j\Sigma_j\gamma\right] + \dfrac{1}{\sigma^4}\displaystyle\sum_{j=1}^{n} W_j^2 (y_j - \mu_j)^2\,\Sigma_j\gamma \\[1ex] -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\displaystyle\sum_{j=1}^{n} W_j\left[(y_j - \mu_j)^2 + \gamma^T\Sigma_j\gamma\right] - \dfrac{1}{2\sigma^6}\displaystyle\sum_{j=1}^{n} W_j^2 (y_j - \mu_j)^2\,\gamma^T\Sigma_j\gamma \end{bmatrix} \qquad (5.32)$$

where

$$\mu_j = X_j\beta + U_j\gamma \qquad (5.33)$$

The information matrix is given below

$$I(\theta) = \begin{bmatrix} \dfrac{1}{\sigma^2}\displaystyle\sum_{j=1}^{n} X_j^T W_j X_j & \dfrac{1}{\sigma^2}\displaystyle\sum_{j=1}^{n} X_j^T W_j U_j & 0 \\[1ex] \dfrac{1}{\sigma^2}\displaystyle\sum_{j=1}^{n} U_j^T W_j X_j & \dfrac{1}{\sigma^2}\displaystyle\sum_{j=1}^{n} U_j^T W_j U_j + \dfrac{2}{\sigma^4}\displaystyle\sum_{j=1}^{n} W_j^2\,\Sigma_j\gamma\gamma^T\Sigma_j & \dfrac{1}{\sigma^4}\displaystyle\sum_{j=1}^{n} W_j^2\,\Sigma_j\gamma \\[1ex] 0 & \dfrac{1}{\sigma^4}\displaystyle\sum_{j=1}^{n} W_j^2\,\gamma^T\Sigma_j & \dfrac{1}{2\sigma^4}\displaystyle\sum_{j=1}^{n} W_j^2 \end{bmatrix} \qquad (5.34)$$
The Fisher scoring algorithm is implemented using the following iteration equation,
\[
\theta^{(t+1)} = \theta^{(t)} + I^{-1}(\theta^{(t)})\,S(\theta^{(t)}) \quad (5.35)
\]
where θ^(t) is the parameter value at iteration t and θ^(t+1) is the updated value. Once the iteration process converges, the variance-covariance matrix of the estimated parameters is automatically given, which is
\[
\mathrm{var}(\hat\theta) = I^{-1}(\hat\theta) \quad (5.36)
\]
The detailed expression of this matrix is
\[
\mathrm{var}\begin{bmatrix}\hat\beta\\ \hat\gamma\\ \hat\sigma^2\end{bmatrix}
=
\begin{bmatrix}
\dfrac{1}{\sigma^2}\sum_{j=1}^{n} X_j^T W_j^{-1} X_j & \dfrac{1}{\sigma^2}\sum_{j=1}^{n} X_j^T W_j^{-1} U_j & 0 \\[1ex]
\dfrac{1}{\sigma^2}\sum_{j=1}^{n} U_j^T W_j^{-1} X_j & \dfrac{1}{\sigma^2}\sum_{j=1}^{n} U_j^T W_j^{-1} U_j + \dfrac{2}{\sigma^4}\sum_{j=1}^{n} W_j^{-2}\Sigma_j\gamma\gamma^T\Sigma_j & \dfrac{1}{\sigma^4}\sum_{j=1}^{n} W_j^{-2}\Sigma_j\gamma \\[1ex]
0 & \dfrac{1}{\sigma^4}\sum_{j=1}^{n} W_j^{-2}\gamma^T\Sigma_j & \dfrac{1}{2\sigma^4}\sum_{j=1}^{n} W_j^{-2}
\end{bmatrix}^{-1}
\quad (5.37)
\]
which can be compared with the variance-covariance matrix of the iteratively reweighted least squares estimate given in the previous section.
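As a numerical sanity check on the log likelihood (5.31) and the score vector (5.32), the sketch below compares the analytic score with a central finite-difference gradient of the log likelihood. It is a hypothetical scalar case (one covariate effect β, one QTL effect γ, simulated data), not part of PROC QTL.

```python
import numpy as np

def loglik(theta, y, X, U, Sg):
    # (5.31): L = -(n/2) ln s2 - (1/2) sum ln Wj - 1/(2 s2) sum Wj^-1 (yj - Xj b - Uj g)^2
    b, g, s2 = theta
    W = Sg * g**2 / s2 + 1.0              # Wj for scalar gamma: (1/s2) g Sigma_j g + 1
    r = y - X * b - U * g
    return (-0.5 * y.size * np.log(s2) - 0.5 * np.sum(np.log(W))
            - np.sum(r**2 / W) / (2.0 * s2))

def score(theta, y, X, U, Sg):
    # (5.32) specialized to scalar beta and gamma
    b, g, s2 = theta
    W = Sg * g**2 / s2 + 1.0
    r = y - X * b - U * g
    s_b = np.sum(X * r / W) / s2
    s_g = (np.sum(U * r / W) / s2 - np.sum(Sg * g / W) / s2
           + np.sum(r**2 * Sg * g / W**2) / s2**2)
    s_v = np.sum(r**2 / W**2) / (2.0 * s2**2) - np.sum(1.0 / W) / (2.0 * s2)
    return np.array([s_b, s_g, s_v])

def num_grad(theta, *args, h=1e-6):
    # central finite differences of (5.31), used only as a check
    gr = np.zeros(3)
    for i in range(3):
        e = np.zeros(3)
        e[i] = h
        gr[i] = (loglik(theta + e, *args) - loglik(theta - e, *args)) / (2 * h)
    return gr
```

With simulated data the two gradients agree to several decimal places, which confirms that the score expression is the exact gradient of (5.31).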

MAXIMUM LIKELIHOOD METHOD
The maximum likelihood method is the optimal one compared with all other methods described in this chapter. Recall that the linear model for the phenotypic value y_j is
\[
y_j = X_j\beta + Z_j\gamma + \varepsilon_j \quad (5.38)
\]
where \varepsilon_j \sim N(0, \sigma^2) is assumed. The genotype indicator variable Z_j is a missing value because we cannot observe the genotype of a putative QTL. Rather than replacing Z_j by U_j, as done in the least squares and weighted least squares methods, the maximum likelihood method takes into consideration the mixture distribution of y_j. When the genotype of the putative QTL is observed, the probability density of y_j is
\[
f_k(y_j) = \Pr(y_j \mid Z_j = H_k) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2\sigma^2}\left(y_j - X_j\beta - H_k\gamma\right)^2\right], \quad k = 1, 2, 3 \quad (5.39)
\]
When flanking marker information is used, the conditional probabilities that Z_j = H_k are p_j(1) = Pr(Z_j = H_1), p_j(0) = Pr(Z_j = H_2) and p_j(-1) = Pr(Z_j = H_3), respectively, for the three genotypes (A_1A_1, A_1A_2, A_2A_2). These probabilities are different from the Mendelian segregation ratio (0.25, 0.5, 0.25). They are the conditional probabilities given marker information and thus vary from one individual to another because different individuals may have different marker genotypes. Using the conditional probabilities as weights, we get the mixture distribution

\[
f(y_j) = \sum_{k=1}^{3} p_j(2-k)\, f_k(y_j) \quad (5.40)
\]
where
\[
p_j(2-k) = \begin{cases} p_j(1) & \text{for } k = 1 \\ p_j(0) & \text{for } k = 2 \\ p_j(-1) & \text{for } k = 3 \end{cases} \quad (5.41)
\]
is a special notation for the conditional probability and should not be interpreted as p_j times (2 - k). The log likelihood function is

\[
L(\theta) = \sum_{j=1}^{n} L_j(\theta) = \sum_{j=1}^{n} \ln f(y_j) \quad (5.42)
\]
where L_j(θ) = ln f(y_j).
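To make the mixture likelihood (5.40)-(5.42) concrete, here is a minimal sketch (hypothetical data layout, not PROC QTL code) that evaluates it for one putative position. P holds the conditional genotype probabilities p_j(2-k), H holds the three genotype codes, and the QTL effect is scalar.

```python
import numpy as np

def mixture_loglik(y, Xb, P, H, gamma, s2):
    # (5.39)-(5.42): sum_j ln sum_k p_j(2-k) f_k(y_j)
    # y:  phenotypes, shape (n,); Xb: fixed-effect part X_j*beta, shape (n,)
    # P:  conditional genotype probabilities, shape (n, 3); H: genotype codes, shape (3,)
    mu = Xb[:, None] + H[None, :] * gamma                       # n x 3 genotype means
    dens = np.exp(-(y[:, None] - mu)**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    return float(np.sum(np.log(np.sum(P * dens, axis=1))))
```

When each row of P puts all mass on one genotype, the mixture collapses to an ordinary normal log likelihood, which gives a quick way to verify the code.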

The EM algorithm
The MLE of θ can be obtained using any numerical algorithm, but the EM algorithm is generally preferable to the others because we can take advantage of the mixture distribution. Derivation of the EM algorithm is not provided in this manual; here we simply give the result. If the genotypes of all individuals were observed, the maximum likelihood estimates of the parameters would be

\[
\begin{bmatrix}\beta\\ \gamma\end{bmatrix}
=
\begin{bmatrix}
\sum_{j=1}^{n} X_j^T X_j & \sum_{j=1}^{n} X_j^T Z_j \\
\sum_{j=1}^{n} Z_j^T X_j & \sum_{j=1}^{n} Z_j^T Z_j
\end{bmatrix}^{-1}
\begin{bmatrix}
\sum_{j=1}^{n} X_j^T y_j \\
\sum_{j=1}^{n} Z_j^T y_j
\end{bmatrix}
\quad (5.43)
\]
and
\[
\sigma^2 = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - X_j\beta - Z_j\gamma\right)^2 \quad (5.44)
\]
The EM algorithm takes advantage of the above explicit solutions by substituting all entities containing the missing value Z_j with their posterior expectations, i.e.,
\[
\begin{bmatrix}\beta\\ \gamma\end{bmatrix}
=
\begin{bmatrix}
\sum_{j=1}^{n} X_j^T X_j & \sum_{j=1}^{n} X_j^T E(Z_j) \\
\sum_{j=1}^{n} E(Z_j^T) X_j & \sum_{j=1}^{n} E(Z_j^T Z_j)
\end{bmatrix}^{-1}
\begin{bmatrix}
\sum_{j=1}^{n} X_j^T y_j \\
\sum_{j=1}^{n} E(Z_j^T) y_j
\end{bmatrix}
\quad (5.45)
\]
and
\[
\sigma^2 = \frac{1}{n}\sum_{j=1}^{n} E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right] \quad (5.46)
\]
where the expectations are taken using the posterior probabilities of the QTL genotypes. Let
\[
p_j^*(2-k) = \frac{p_j(2-k)\, f_k(y_j)}{\sum_{k'=1}^{3} p_j(2-k')\, f_{k'}(y_j)} \quad (5.47)
\]
be the posterior probability of the k-th genotype for k = 1, 2, 3. The posterior expectations are


\[
\begin{aligned}
E(Z_j) &= \sum_{k=1}^{3} p_j^*(2-k)\, H_k \\
E(Z_j^T Z_j) &= \sum_{k=1}^{3} p_j^*(2-k)\, H_k^T H_k \\
E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right] &= \sum_{k=1}^{3} p_j^*(2-k)\left(y_j - X_j\beta - H_k\gamma\right)^2
\end{aligned}
\quad (5.48)
\]
Since f_k(y_j) is a function of the parameters, p_j^*(2-k) is also a function of the parameters. However, the parameters are unknown; they are the very quantities we want to find. Therefore, iterations are required. Here is the iteration process:
1. Initialize θ^(t) for t = 0.
2. Calculate the posterior expectations using equations (5.47) and (5.48).
3. Update the parameters using equations (5.45) and (5.46).
4. Increment t by 1 and repeat steps 2 and 3 until a certain criterion of convergence is satisfied.
Once the iteration converges, the MLE of the parameters is θ̂ = θ^(t), where t is the number of iterations required for convergence.
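The iteration above can be sketched as follows. This is a toy scalar version (intercept-only X, additive genotype codes, illustrative variable names), not the PROC QTL implementation.

```python
import numpy as np

def em_qtl(y, P, H=np.array([1.0, 0.0, -1.0]), n_iter=100):
    # EM for the mixture model: E-step (5.47)-(5.48), M-step (5.45)-(5.46)
    n = y.size
    beta, gamma, s2 = y.mean(), 0.1, y.var()
    for _ in range(n_iter):
        # E-step: posterior genotype probabilities (5.47)
        mu = beta + gamma * H[None, :]                 # 1 x 3, broadcasts against y
        dens = np.exp(-(y[:, None] - mu)**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
        post = P * dens
        post /= post.sum(axis=1, keepdims=True)
        # posterior moments (5.48)
        EZ, EZ2 = post @ H, post @ H**2
        # M-step: normal equations (5.45) with E(Z) and E(Z'Z)
        A = np.array([[n, EZ.sum()], [EZ.sum(), EZ2.sum()]])
        b = np.array([y.sum(), (EZ * y).sum()])
        beta, gamma = np.linalg.solve(A, b)
        # residual variance (5.46), expectation over genotypes
        s2 = np.sum(post * (y[:, None] - beta - gamma * H[None, :])**2) / n
    return beta, gamma, s2
```

When the genotypes are fully observed (each row of P is one-hot), the posterior probabilities equal the priors and the M-step reproduces the ordinary least squares solution in a single pass, which is a convenient consistency check.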
Variance-covariance matrix of θ̂
Unlike the weighted least squares and Fisher scoring algorithms, where the variance-covariance matrix of the estimated parameters is automatically given as a by-product of the iteration process, the EM algorithm requires an additional step to calculate this matrix. The method was developed by Louis (1982), and it requires the score vector and the Hessian matrix of the complete-data log likelihood function rather than of the actual observed log likelihood function. The complete-data log likelihood function is the log likelihood function as if Z_j were observed, which is
\[
L(\theta, Z) = \sum_{j=1}^{n} L_j(\theta, Z) \quad (5.49)
\]
where
\[
L_j(\theta, Z) = -\frac{1}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\left(y_j - X_j\beta - Z_j\gamma\right)^2 \quad (5.50)
\]
The score vector is
\[
S(\theta, Z) = \sum_{j=1}^{n} S_j(\theta, Z) \quad (5.51)
\]
where
\[
S_j(\theta, Z) =
\begin{bmatrix}
\partial L_j(\theta, Z)/\partial\beta \\
\partial L_j(\theta, Z)/\partial\gamma \\
\partial L_j(\theta, Z)/\partial\sigma^2
\end{bmatrix}
=
\begin{bmatrix}
\dfrac{1}{\sigma^2} X_j^T \left(y_j - X_j\beta - Z_j\gamma\right) \\[1ex]
\dfrac{1}{\sigma^2} Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right) \\[1ex]
-\dfrac{1}{2\sigma^2} + \dfrac{1}{2\sigma^4}\left(y_j - X_j\beta - Z_j\gamma\right)^2
\end{bmatrix}
\quad (5.52)
\]
The second partial derivatives (Hessian matrix) are
\[
H(\theta, Z) = \sum_{j=1}^{n} H_j(\theta, Z) \quad (5.53)
\]
where
\[
H_j(\theta, Z) =
\begin{bmatrix}
\dfrac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\beta^T} & \dfrac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\gamma^T} & \dfrac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\sigma^2} \\[1.5ex]
\dfrac{\partial^2 L_j(\theta,Z)}{\partial\gamma\,\partial\beta^T} & \dfrac{\partial^2 L_j(\theta,Z)}{\partial\gamma\,\partial\gamma^T} & \dfrac{\partial^2 L_j(\theta,Z)}{\partial\gamma\,\partial\sigma^2} \\[1.5ex]
\dfrac{\partial^2 L_j(\theta,Z)}{\partial\sigma^2\,\partial\beta^T} & \dfrac{\partial^2 L_j(\theta,Z)}{\partial\sigma^2\,\partial\gamma^T} & \dfrac{\partial^2 L_j(\theta,Z)}{\partial\sigma^2\,\partial\sigma^2}
\end{bmatrix}
\quad (5.54)
\]
The six different blocks of the above matrix are
\[
\begin{aligned}
\frac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\beta^T} &= -\frac{1}{\sigma^2} X_j^T X_j \\
\frac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\gamma^T} &= -\frac{1}{\sigma^2} X_j^T Z_j \\
\frac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\sigma^2} &= -\frac{1}{\sigma^4} X_j^T \left(y_j - X_j\beta - Z_j\gamma\right) \\
\frac{\partial^2 L_j(\theta,Z)}{\partial\gamma\,\partial\gamma^T} &= -\frac{1}{\sigma^2} Z_j^T Z_j \\
\frac{\partial^2 L_j(\theta,Z)}{\partial\gamma\,\partial\sigma^2} &= -\frac{1}{\sigma^4} Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right) \\
\frac{\partial^2 L_j(\theta,Z)}{\partial\sigma^2\,\partial\sigma^2} &= \frac{1}{2\sigma^4} - \frac{1}{\sigma^6}\left(y_j - X_j\beta - Z_j\gamma\right)^2
\end{aligned}
\quad (5.55)
\]
We now have the score vector and the Hessian matrix available for the complete-data log likelihood function. The Louis information matrix is
\[
I(\theta) = -E\left[H(\theta, Z)\right] - E\left[S(\theta, Z)\, S^T(\theta, Z)\right] \quad (5.56)
\]
where the expectations are taken with respect to the missing value Z_j using the posterior probabilities of the QTL genotypes. At the MLE of the parameters, E[S(θ, Z)] = 0. Therefore,
\[
E\left[S(\theta,Z)\, S^T(\theta,Z)\right] = \mathrm{var}\left[S(\theta,Z)\right] + E\left[S(\theta,Z)\right] E\left[S^T(\theta,Z)\right] = \mathrm{var}\left[S(\theta,Z)\right] \quad (5.57)
\]
As a result, an alternative expression of the Louis information matrix is
\[
I(\theta) = -E\left[H(\theta,Z)\right] - \mathrm{var}\left[S(\theta,Z)\right] = -\sum_{j=1}^{n} E\left[H_j(\theta,Z)\right] - \sum_{j=1}^{n}\mathrm{var}\left[S_j(\theta,Z)\right] \quad (5.58)
\]
The expectations and variances in the above information matrix are given below.
\[
E\left[H_j(\theta,Z)\right] =
\begin{bmatrix}
E\left(\dfrac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\beta^T}\right) & E\left(\dfrac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\gamma^T}\right) & E\left(\dfrac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\sigma^2}\right) \\[1.5ex]
E\left(\dfrac{\partial^2 L_j(\theta,Z)}{\partial\gamma\,\partial\beta^T}\right) & E\left(\dfrac{\partial^2 L_j(\theta,Z)}{\partial\gamma\,\partial\gamma^T}\right) & E\left(\dfrac{\partial^2 L_j(\theta,Z)}{\partial\gamma\,\partial\sigma^2}\right) \\[1.5ex]
E\left(\dfrac{\partial^2 L_j(\theta,Z)}{\partial\sigma^2\,\partial\beta^T}\right) & E\left(\dfrac{\partial^2 L_j(\theta,Z)}{\partial\sigma^2\,\partial\gamma^T}\right) & E\left(\dfrac{\partial^2 L_j(\theta,Z)}{\partial\sigma^2\,\partial\sigma^2}\right)
\end{bmatrix}
\quad (5.59)
\]
The six different blocks of the above matrix are
\[
\begin{aligned}
E\left(\frac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\beta^T}\right) &= -\frac{1}{\sigma^2} X_j^T X_j \\
E\left(\frac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\gamma^T}\right) &= -\frac{1}{\sigma^2} X_j^T E(Z_j) \\
E\left(\frac{\partial^2 L_j(\theta,Z)}{\partial\beta\,\partial\sigma^2}\right) &= -\frac{1}{\sigma^4} X_j^T \left[y_j - X_j\beta - E(Z_j)\gamma\right] \\
E\left(\frac{\partial^2 L_j(\theta,Z)}{\partial\gamma\,\partial\gamma^T}\right) &= -\frac{1}{\sigma^2} E(Z_j^T Z_j) \\
E\left(\frac{\partial^2 L_j(\theta,Z)}{\partial\gamma\,\partial\sigma^2}\right) &= -\frac{1}{\sigma^4} E\left[Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right)\right] \\
E\left(\frac{\partial^2 L_j(\theta,Z)}{\partial\sigma^2\,\partial\sigma^2}\right) &= \frac{1}{2\sigma^4} - \frac{1}{\sigma^6} E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right]
\end{aligned}
\quad (5.60)
\]

Again, all the expectations are taken with respect to the missing value Z_j, not the observed phenotype y_j. This is very different from the information matrix of the Fisher scoring algorithm. The variance-covariance matrix of the score vector is
\[
\mathrm{var}\left[S(\theta,Z)\right] = \sum_{j=1}^{n}\mathrm{var}\left[S_j(\theta,Z)\right] \quad (5.61)
\]
where
\[
\mathrm{var}\left[S_j(\theta,Z)\right] =
\begin{bmatrix}
\mathrm{var}\left(\dfrac{\partial L_j}{\partial\beta}\right) & \mathrm{cov}\left(\dfrac{\partial L_j}{\partial\beta}, \dfrac{\partial L_j}{\partial\gamma^T}\right) & \mathrm{cov}\left(\dfrac{\partial L_j}{\partial\beta}, \dfrac{\partial L_j}{\partial\sigma^2}\right) \\[1.5ex]
\mathrm{cov}\left(\dfrac{\partial L_j}{\partial\gamma}, \dfrac{\partial L_j}{\partial\beta^T}\right) & \mathrm{var}\left(\dfrac{\partial L_j}{\partial\gamma}\right) & \mathrm{cov}\left(\dfrac{\partial L_j}{\partial\gamma}, \dfrac{\partial L_j}{\partial\sigma^2}\right) \\[1.5ex]
\mathrm{cov}\left(\dfrac{\partial L_j}{\partial\sigma^2}, \dfrac{\partial L_j}{\partial\beta^T}\right) & \mathrm{cov}\left(\dfrac{\partial L_j}{\partial\sigma^2}, \dfrac{\partial L_j}{\partial\gamma^T}\right) & \mathrm{var}\left(\dfrac{\partial L_j}{\partial\sigma^2}\right)
\end{bmatrix}
\quad (5.62)
\]
where L_j denotes L_j(θ, Z).
The variances are calculated with respect to the missing value using the posterior probabilities of the QTL genotypes. The blocks of the above matrix are expanded as follows (L_j denotes L_j(θ, Z)):
\[
\begin{aligned}
\mathrm{cov}\left(\frac{\partial L_j}{\partial\beta}, \frac{\partial L_j}{\partial\gamma^T}\right) &= E\left[\frac{\partial L_j}{\partial\beta}\frac{\partial L_j}{\partial\gamma^T}\right] - E\left[\frac{\partial L_j}{\partial\beta}\right] E\left[\frac{\partial L_j}{\partial\gamma^T}\right] \\
\mathrm{cov}\left(\frac{\partial L_j}{\partial\beta}, \frac{\partial L_j}{\partial\sigma^2}\right) &= E\left[\frac{\partial L_j}{\partial\beta}\frac{\partial L_j}{\partial\sigma^2}\right] - E\left[\frac{\partial L_j}{\partial\beta}\right] E\left[\frac{\partial L_j}{\partial\sigma^2}\right] \\
\mathrm{var}\left(\frac{\partial L_j}{\partial\gamma}\right) &= E\left[\frac{\partial L_j}{\partial\gamma}\frac{\partial L_j}{\partial\gamma^T}\right] - E\left[\frac{\partial L_j}{\partial\gamma}\right] E\left[\frac{\partial L_j}{\partial\gamma^T}\right] \\
\mathrm{cov}\left(\frac{\partial L_j}{\partial\gamma}, \frac{\partial L_j}{\partial\sigma^2}\right) &= E\left[\frac{\partial L_j}{\partial\gamma}\frac{\partial L_j}{\partial\sigma^2}\right] - E\left[\frac{\partial L_j}{\partial\gamma}\right] E\left[\frac{\partial L_j}{\partial\sigma^2}\right] \\
\mathrm{var}\left(\frac{\partial L_j}{\partial\sigma^2}\right) &= E\left[\left(\frac{\partial L_j}{\partial\sigma^2}\right)^2\right] - E\left[\frac{\partial L_j}{\partial\sigma^2}\right]^2
\end{aligned}
\]
where
\[
\begin{aligned}
E\left[\frac{\partial L_j(\theta,Z)}{\partial\beta}\right] &= \frac{1}{\sigma^2} X_j^T \left[y_j - X_j\beta - E(Z_j)\gamma\right] \\
E\left[\frac{\partial L_j(\theta,Z)}{\partial\gamma}\right] &= \frac{1}{\sigma^2} E\left[Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right)\right]
\end{aligned}
\]

\[
\begin{aligned}
E\left[\frac{\partial L_j(\theta,Z)}{\partial\sigma^2}\right] ={}& -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4} E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right] \\
E\left[\frac{\partial L_j(\theta,Z)}{\partial\beta}\frac{\partial L_j(\theta,Z)}{\partial\beta^T}\right] ={}& \frac{1}{\sigma^4} X_j^T E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right] X_j \\
E\left[\frac{\partial L_j(\theta,Z)}{\partial\beta}\frac{\partial L_j(\theta,Z)}{\partial\gamma^T}\right] ={}& \frac{1}{\sigma^4} X_j^T E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2 Z_j\right] \\
E\left[\frac{\partial L_j(\theta,Z)}{\partial\beta}\frac{\partial L_j(\theta,Z)}{\partial\sigma^2}\right] ={}& -\frac{1}{2\sigma^4} X_j^T \left[y_j - X_j\beta - E(Z_j)\gamma\right] + \frac{1}{2\sigma^6} X_j^T E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^3\right] \\
E\left[\frac{\partial L_j(\theta,Z)}{\partial\gamma}\frac{\partial L_j(\theta,Z)}{\partial\gamma^T}\right] ={}& \frac{1}{\sigma^4} E\left[Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right)^2 Z_j\right] \\
E\left[\frac{\partial L_j(\theta,Z)}{\partial\gamma}\frac{\partial L_j(\theta,Z)}{\partial\sigma^2}\right] ={}& -\frac{1}{2\sigma^4} E\left[Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right)\right] + \frac{1}{2\sigma^6} E\left[Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right)^3\right] \\
E\left[\left(\frac{\partial L_j(\theta,Z)}{\partial\sigma^2}\right)^2\right] ={}& \frac{1}{4\sigma^4} - \frac{1}{2\sigma^6} E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right] + \frac{1}{4\sigma^8} E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^4\right]
\end{aligned}
\]
The expectations in the above equations are calculated using the posterior probabilities of the QTL genotypes,
\[
\begin{aligned}
E(Z_j) &= \sum_{k=1}^{3} p_j^*(2-k)\, H_k \\
E\left[Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right)\right] &= \sum_{k=1}^{3} p_j^*(2-k)\, H_k^T \left(y_j - X_j\beta - H_k\gamma\right) \\
E\left[Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right)^2 Z_j\right] &= \sum_{k=1}^{3} p_j^*(2-k)\, H_k^T \left(y_j - X_j\beta - H_k\gamma\right)^2 H_k \\
E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^n\right] &= \sum_{k=1}^{3} p_j^*(2-k)\left(y_j - X_j\beta - H_k\gamma\right)^n, \quad n = 2, 3, 4 \\
E\left[Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right)^3\right] &= \sum_{k=1}^{3} p_j^*(2-k)\, H_k^T \left(y_j - X_j\beta - H_k\gamma\right)^3
\end{aligned}
\]
When calculating the information matrix, the parameter θ is substituted by θ̂, the MLE of θ. Therefore, the observed information matrix is
\[
I(\hat\theta) = -E\left[H(\hat\theta, Z)\right] - \mathrm{var}\left[S(\hat\theta, Z)\right] \quad (5.63)
\]
and the variance-covariance matrix of the estimated parameters is
\[
\mathrm{var}(\hat\theta) = I^{-1}(\hat\theta) \quad (5.64)
\]
HYPOTHESIS TESTING
The hypothesis H_0: γ = 0 can be tested in several different ways. If var(θ̂) is already calculated, we can use the F- or W-test statistic, which requires var(γ̂), the variance-covariance matrix of the estimated QTL effects. It is a submatrix of var(θ̂). The W-test statistic is
\[
W = \hat\gamma^T\, \mathrm{var}^{-1}(\hat\gamma)\, \hat\gamma \quad (5.65)
\]
Alternatively, the likelihood ratio test statistic can be applied to test H_0. We have presented two log likelihood functions: the complete-data log likelihood function, denoted by L(θ, Z), and the observed log likelihood function, denoted by L(θ). The log likelihood function used to construct the likelihood ratio test statistic is L(θ), not L(θ, Z). The complete-data log likelihood function, L(θ, Z), is only used to derive the EM algorithm and the observed information matrix. The likelihood ratio test statistic is
\[
\lambda = -2(L_0 - L_1) \quad (5.66)
\]
where L_1 = L(θ̂) is the observed log likelihood function evaluated at θ̂ = {β̂, γ̂, σ̂²} and L_0 is the log likelihood function evaluated at θ̃ = {β̃, 0, σ̃²} under the restricted model. The estimated parameter θ̃ under the restricted model and L_0 are the same as those given in the least squares section (5.12).
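As a small illustration of the two statistics (generic code, not tied to PROC QTL):

```python
import numpy as np

def wald(gamma_hat, var_gamma):
    # (5.65): W = gamma' var^-1(gamma) gamma
    g = np.atleast_1d(np.asarray(gamma_hat, dtype=float))
    V = np.atleast_2d(np.asarray(var_gamma, dtype=float))
    return float(g @ np.linalg.solve(V, g))

def lrt(L0, L1):
    # (5.66): lambda = -2 (L0 - L1), restricted vs. full log likelihoods
    return -2.0 * (L0 - L1)
```

Under H_0 both statistics are asymptotically chi-square distributed with degrees of freedom equal to the number of QTL effects tested.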
REMARKS ON THE FOUR METHODS OF INTERVAL MAPPING
The LS method (Haley and Knott 1992) is an approximation of the ML method, aiming to improve the computational speed. Because of this speed, the method remains popular, even though computer power has increased by many orders of magnitude since LS was developed. In some literature (e.g., Feenstra et al. 2006), the LS method is also called the H-K method in honor of the authors, Haley and Knott (1992). Xu (1995) noticed that the LS method, although a good approximation to ML in terms of the estimates of QTL effects and the test statistic, may lead to a biased (inflated) estimate of the residual error variance. Based on this work, Xu (1998a, b) eventually developed the iteratively reweighted least squares (IRLS) method. In those works (Xu 1998a, b), iteratively reweighted least squares was abbreviated IRWLS. Xu (1998b) compared LS, IRLS and ML in a variety of situations and concluded that IRLS is always better than LS and as efficient as ML. When the residual error does not have a normal distribution, which is required by the ML method, LS and IRLS can be better than ML. In other words, LS and IRLS are more robust than ML to departures from normality. Kao (2000) and Feenstra et al. (2006) conducted more comprehensive investigations of LS, IRLS and ML and found that when epistatic effects exist, LS can generate unsatisfactory results, but IRLS and ML usually map QTL better than LS. In addition, Feenstra et al. (2006) modified the weighted least squares method by using the estimating equations (EE) algorithm. This algorithm further improved the efficiency of the weighted least squares method by maximizing an approximate likelihood function. Most recently, Han and Xu (2008) developed a Fisher scoring (FISHER) algorithm to maximize the approximate likelihood function. Both the EE and FISHER algorithms maximize the same likelihood function, and thus they produce identical results.
The LS method ignores the uncertainty of the QTL genotype. The IRLS, FISHER (or EE) and ML methods use different ways to extract information from this uncertainty. If the putative QTL location overlaps with a fully informative marker, all four methods produce identical results. Therefore, if the marker density is sufficiently high, there is virtually no difference among the four methods. For low marker density, when the putative position is far away from either flanking marker, the four methods will show some differences, and these differences are magnified by large QTL. Han and Xu (2008) compared the four methods in a simulation experiment. When the putative QTL position was fixed in the middle of a 10 cM interval, the four methods generated almost identical results. However, when the interval was expanded to 20 cM, the differences among the four methods became noticeable.
A final remark on interval mapping concerns the way the QTL genotype is inferred from markers. If only the flanking markers are used to infer the genotype of a putative position bracketed by the two markers, the method is called interval mapping. Strictly speaking, interval mapping only applies to fully informative markers because we always use the flanking markers to infer the QTL genotype. However, almost all datasets obtained from real-life experiments contain missing, uninformative or partially informative markers. To extract maximum information from the markers, people now use the multipoint method (Jiang and Zeng 1997) to infer a QTL genotype. The multipoint method uses more markers, or even all markers of the entire chromosome (not just the flanking markers), to infer the genotype of a putative position. With multipoint analysis, we no longer have the notion of an interval, and thus interval mapping is no longer an appropriate phrase to describe QTL mapping. Unfortunately, a more appropriate phrase has not been proposed, and people are used to the phrase interval mapping. Therefore, so-called interval mapping in the current literature means QTL mapping under a single-QTL model, regardless of whether the genotype of a putative QTL position is inferred from the flanking markers or from all markers.

INTERVAL MAPPING FOR DISCRETE TRAITS
Many disease resistance traits in agricultural crops are measured in ordered categories. The generalized linear model (GLM) methodology (McCullagh and Nelder 1989) is an ideal tool to analyze these traits. In QTL mapping for continuously distributed traits, the mixture model (Lander and Botstein 1989) is the most efficient way to take advantage of marker information. The least squares method of Haley and Knott (1992) is the simplest way to incorporate linked markers. The performances of the weighted least squares method of Xu (1998a, b) and the estimating equations (EE) algorithm of Feenstra et al. (2006) are usually between those of the least squares and mixture model methods. These methods have been successfully applied to QTL mapping for continuous traits, but they had not been investigated for ordinal trait QTL mapping. In the current version of PROC QTL, there are four GLM methods for mapping ordinal trait QTL: ML under homogeneous variance (the expectation substitution method, LS), ML under heterogeneous variance (heterogeneous residual variance, Fisher scoring), ML under an approximate mixture distribution (quasi-likelihood method) and ML via the EM algorithm. The algorithms for QTL mapping under the generalized linear model used in PROC QTL can be found in Xu and Hu (2010a).
GENERALIZED LINEAR MODEL FOR ORDINAL TRAITS
Suppose that the disease phenotype of individual j (j = 1, …, n) is measured by an ordinal variable denoted by S_j = 1, …, p + 1, where p + 1 is the total number of disease classes and n is the sample size. Let Y_j = {Y_jk}, k = 1, …, p + 1, be a (p + 1) × 1 vector indicating the disease status of individual j. The k-th element of Y_j is defined as
\[
Y_{jk} = \begin{cases} 1 & \text{if } S_j = k \\ 0 & \text{if } S_j \neq k \end{cases} \quad (6.1)
\]
Using the probit link function, the expectation of Y_jk is defined as
\[
\mu_{jk} = E(Y_{jk}) = \Phi(\alpha_k + X_j\beta + Z_j\gamma) - \Phi(\alpha_{k-1} + X_j\beta + Z_j\gamma) \quad (6.2)
\]
where α_k (with α_0 = -∞ and α_{p+1} = +∞) is the intercept, β is a q × 1 vector of systematic effects (not related to the effects of quantitative trait loci), and γ is an r × 1 vector of the effects of a quantitative trait locus. The symbol Φ(·) is the standardized cumulative normal function. The design matrix X_j is assumed to be known, but Z_j may not be fully observable
because it is determined by the genotype of individual j at the locus of interest. Because the link function is probit, this type of analysis is called probit analysis. Let μ_j = {μ_jk} be a (p + 1) × 1 vector. The expectation of vector Y_j is E(Y_j) = μ_j, and the variance matrix of Y_j is
\[
V_j = \mathrm{var}(Y_j) = \Psi_j - \mu_j\mu_j^T \quad (6.3)
\]
where Ψ_j = diag(μ_j). The method to be developed requires the inverse of matrix V_j. However, V_j is not of full rank. We can use a generalized inverse of V_j, such as V_j^- = Ψ_j^{-1}, in place of V_j^{-1}. The parameter vector is θ = {α, β, γ} with a dimensionality of (p + q + r) × 1. Binary data is a special case of ordinal data in which p = 1, so that there are only two categories, S_j = {1, 2}. The expectation of Y_jk is

\[
\mu_{jk} = \begin{cases}
\Phi(\alpha_1 + X_j\beta + Z_j\gamma) - \Phi(\alpha_0 + X_j\beta + Z_j\gamma) & \text{for } k = 1 \\
\Phi(\alpha_2 + X_j\beta + Z_j\gamma) - \Phi(\alpha_1 + X_j\beta + Z_j\gamma) & \text{for } k = 2
\end{cases} \quad (6.4)
\]
Because α_0 = -∞ and α_2 = +∞ in the binary case, we have
\[
\mu_{jk} = \begin{cases}
\Phi(\alpha_1 + X_j\beta + Z_j\gamma) & \text{for } k = 1 \\
1 - \Phi(\alpha_1 + X_j\beta + Z_j\gamma) & \text{for } k = 2
\end{cases} \quad (6.5)
\]
We can see that μ_{j2} = 1 - μ_{j1} and
\[
\Phi^{-1}(\mu_{j1}) = \alpha_1 + X_j\beta + Z_j\gamma \quad (6.6)
\]
The link function is Φ^{-1}(·), and thus it is called the probit link function. Once we take the probit transformation, the model becomes a linear model. Therefore, this type of model is called a generalized linear model (GLM). The usual linear model we deal with for continuous traits is a special case of the GLM because the link function is simply the identity, i.e.,
\[
I(\mu_{j1}) = \alpha_1 + X_j\beta + Z_j\gamma \quad (6.7)
\]
or simply
\[
\mu_{j1} = \alpha_1 + X_j\beta + Z_j\gamma \quad (6.8)
\]
Let us first assume that the genotypes of the QTL are observed for all individuals. In this case, the variable Z_j is not missing. The log likelihood function under the probit model is
\[
L(\theta) = \sum_{j=1}^{n} L_j(\theta) \quad (6.9)
\]
where
\[
L_j(\theta) = \sum_{k=1}^{p+1} Y_{jk}\ln\left[\Phi(\alpha_k + X_j\beta + Z_j\gamma) - \Phi(\alpha_{k-1} + X_j\beta + Z_j\gamma)\right] \quad (6.10)
\]
and θ = {α, β, γ} is the vector of parameters. This is the simplest GLM problem, and the classical iteratively weighted least squares approach for GLM (Nelder and Wedderburn 1972; Wedderburn 1974) can be used without any modification. The iterative equation under the classical GLM is given below,
given below,

( 1) ( ) 1 ( ) ( )
( ) ( )
t t t t
I S u u u u
+
= + (6.11)
where
( ) t
u is the value of parameter at the current iteration,
( )
( ) t
I u is the
information matrix and
( )
( ) t
S u is the score vector, both evaluated at
( ) t
u .
We can interpret

1 ( ) ( )
( ) ( )
t t
I S u u u

A = (6.12)
in equation (6.11) as the adjustment for
( ) t
u to improve the solution in the
direction that leads to the ultimate maximum likelihood estimate of u .
Equation (6.3) shows that the variance of
j
Y is a function of the expectation
of
j
Y . This special relationship leads to a convenient way to calculate the
information matrix and the score vector, as given by Wedderburn (1974),

1
( )
n
j
j
T
j j
W D I D u
=
=

(6.13)
and

1
( ( ) )
n
T
j j j j
j
D W Y S u
=
=

(6.14)
where
1
j j
W

= . Therefore, the increment (adjustment) of the parameter can


be estimated using the following iteratively weighted least squares
approach,

1
1 1
( )
n n
T T
j j j j j j j
j j
D D W D W Y u

= =
( (
A =


(6.15)
where
j
D is a ( 1) ( ) p p q r + + + matrix for the first partial derivatives of
j

with respect to the parameters and
1
j j j
W V

= = is the weight matrix.
Matrix
j
D can be partitioned into three blocks,
\[
D_j = \frac{\partial\mu_j}{\partial\theta^T} = \begin{bmatrix}\dfrac{\partial\mu_j}{\partial\alpha^T} & \dfrac{\partial\mu_j}{\partial\beta^T} & \dfrac{\partial\mu_j}{\partial\gamma^T}\end{bmatrix} \quad (6.16)
\]
The first block ∂μ_j/∂α^T = {∂μ_jk/∂α_l} is a (p + 1) × p matrix with
\[
\begin{aligned}
\frac{\partial\mu_{jk}}{\partial\alpha_k} &= \phi(\alpha_k + X_j\beta + Z_j\gamma) \\
\frac{\partial\mu_{jk}}{\partial\alpha_{k-1}} &= -\phi(\alpha_{k-1} + X_j\beta + Z_j\gamma) \\
\frac{\partial\mu_{jk}}{\partial\alpha_l} &= 0, \quad l \notin \{k-1, k\}
\end{aligned}
\quad (6.17)
\]
The second block ∂μ_j/∂β^T = {∂μ_jk/∂β^T} is a (p + 1) × q matrix with
\[
\frac{\partial\mu_{jk}}{\partial\beta^T} = \left[\phi(\alpha_k + X_j\beta + Z_j\gamma) - \phi(\alpha_{k-1} + X_j\beta + Z_j\gamma)\right] X_j \quad (6.18)
\]
The third block ∂μ_j/∂γ^T = {∂μ_jk/∂γ^T} is a (p + 1) × r matrix with
\[
\frac{\partial\mu_{jk}}{\partial\gamma^T} = \left[\phi(\alpha_k + X_j\beta + Z_j\gamma) - \phi(\alpha_{k-1} + X_j\beta + Z_j\gamma)\right] Z_j \quad (6.19)
\]
where φ(·) is the probability density of the standardized normal variable. In all the above partial derivatives, the range of k is k = 1, …, p + 1. The sequence of parameter values during the iteration process converges to a local maximum likelihood estimate, denoted by θ̂. The variance-covariance matrix of θ̂ is approximately var(θ̂) = I^{-1}(θ̂), which is a by-product of the iteration process. Here, we are actually dealing with a situation where the QTL overlaps with a fully informative marker, because the observed marker genotypes represent the genotypes of the disease locus.
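The classical GLM machinery above can be sketched in a few lines. This is a hypothetical, minimal version (Phi is the standard normal CDF built from the error function; the increment follows (6.15)), not the PROC QTL implementation.

```python
import numpy as np
from math import erf, sqrt, inf

def Phi(x):
    # standard normal CDF, with the conventions Phi(-inf) = 0 and Phi(+inf) = 1
    if x == inf:
        return 1.0
    if x == -inf:
        return 0.0
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def category_probs(alpha, eta):
    # (6.2): mu_k = Phi(alpha_k + eta) - Phi(alpha_{k-1} + eta),
    # with alpha_0 = -inf and alpha_{p+1} = +inf; eta = X_j beta + Z_j gamma
    cuts = [-inf] + list(alpha) + [inf]
    return [Phi(cuts[k + 1] + eta) - Phi(cuts[k] + eta) for k in range(len(cuts) - 1)]

def irls_increment(D_list, W_list, Y_list, mu_list):
    # (6.15): Delta = (sum D'WD)^-1 sum D'W(Y - mu)
    A = sum(D.T @ W @ D for D, W in zip(D_list, W_list))
    b = sum(D.T @ W @ (Y - mu) for D, W, Y, mu in zip(D_list, W_list, Y_list, mu_list))
    return np.linalg.solve(A, b)
```

The category probabilities telescope to one, and the increment reduces to a weighted least squares solve, which matches the interpretation of (6.15) as one reweighted regression per iteration.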
EXPECTATION SUBSTITUTION METHOD
This method is also called the homogeneous variance model. If the QTL of interest does not overlap with any marker, the genotype of the QTL is not observable, i.e., Z_j is missing. The classical GLM does not apply directly to such a situation. The missing value Z_j still carries some information due to linkage with the markers. Again, we use an F_2 population as an example to show how to handle the missing value of Z_j. The ML estimation of the parameters under the homogeneous variance model is obtained simply by substituting Z_j with the conditional expectation of Z_j given flanking marker

information. Let

(2 ) Pr( | marker), 1, 2,3
j j
g Z Hg g p = = =
(6.20)
be the conditional probability of the QTL genotype given marker information,
where the marker information can be either drawn from two flanking
markers (interval mapping, LANDER and BOTSTEIN 1989) or multiple markers
(multipoint analysis, JIANG and ZENG 1997). Vector
g
H is the g th row of
matrix H ,

1 0
0 1
1 0
H
(
(
=
(
(

(6.21)
which has been defined early in the mixture model maximum likelihood
method of interval mapping for continuous traits. Using marker information,
we can calculate the expectation of
j
Z , which is

3
1
( ) (2 )
j j j g
g
E Z p g H U
=
= =

(6.22)
The method is called ML under the homogeneous residual variance model because, when we substitute Z_j by U_j, the residual error variance is no longer equal to unity; rather, it is inflated and varies across individuals. The homogeneous variance model, however, assumes that the residual variance is constant across individuals. This method is analogous to Haley and Knott's (1992) method of QTL mapping for continuous traits. As a result, it is invoked by the "LS" method when the trait is ordinal. In other words, if you choose the method option in the PROC QTL statement as method = "LS", this expectation substitution algorithm will be used to estimate the QTL effects. The method is exactly the same as that described in the generalized linear model section except that Z_j is replaced by U_j. In summary, the expectation of the data is μ_j = {μ_jk}, where
\[
\mu_{jk} = E(Y_{jk}) = \Phi(\alpha_k + X_j\beta + U_j\gamma) - \Phi(\alpha_{k-1} + X_j\beta + U_j\gamma) \quad (6.23)
\]
The weight matrix for individual j is
\[
W_j = V_j^- = \mathrm{diag}^{-1}(\mu_j) \quad (6.24)
\]
The partial derivative matrix is
\[
D_j = \frac{\partial\mu_j}{\partial\theta^T} = \begin{bmatrix}\dfrac{\partial\mu_j}{\partial\alpha^T} & \dfrac{\partial\mu_j}{\partial\beta^T} & \dfrac{\partial\mu_j}{\partial\gamma^T}\end{bmatrix} \quad (6.25)
\]
The first block ∂μ_j/∂α^T = {∂μ_jk/∂α_l} is a (p + 1) × p matrix with
\[
\begin{aligned}
\frac{\partial\mu_{jk}}{\partial\alpha_k} &= \phi(\alpha_k + X_j\beta + U_j\gamma) \\
\frac{\partial\mu_{jk}}{\partial\alpha_{k-1}} &= -\phi(\alpha_{k-1} + X_j\beta + U_j\gamma) \\
\frac{\partial\mu_{jk}}{\partial\alpha_l} &= 0, \quad l \notin \{k-1, k\}
\end{aligned}
\quad (6.26)
\]
The second block ∂μ_j/∂β^T = {∂μ_jk/∂β^T} is a (p + 1) × q matrix with
\[
\frac{\partial\mu_{jk}}{\partial\beta^T} = \left[\phi(\alpha_k + X_j\beta + U_j\gamma) - \phi(\alpha_{k-1} + X_j\beta + U_j\gamma)\right] X_j \quad (6.27)
\]
The third block ∂μ_j/∂γ^T = {∂μ_jk/∂γ^T} is a (p + 1) × r matrix with
\[
\frac{\partial\mu_{jk}}{\partial\gamma^T} = \left[\phi(\alpha_k + X_j\beta + U_j\gamma) - \phi(\alpha_{k-1} + X_j\beta + U_j\gamma)\right] U_j \quad (6.28)
\]
where φ(·) is the probability density of the standardized normal variable.
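For instance, with the H matrix in (6.21), the expectation (6.22) is just a probability-weighted average of the genotype codes. A small sketch with made-up probabilities:

```python
import numpy as np

H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, 0.0]])   # (6.21): rows H_1, H_2, H_3

def expected_genotype(p):
    # (6.22): U_j = sum_g p_j(2-g) H_g
    return np.asarray(p) @ H
```

With the Mendelian F_2 ratio (0.25, 0.5, 0.25) the additive code averages to zero while the dominance code averages to 0.5; with a genotype known for certain, U_j equals that genotype's code exactly.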
FISHER SCORING METHOD
The homogeneous variance model described above is only a first-moment approximation because the uncertainty of the estimated Z_j has been ignored. Let
\[
\Sigma_j = \mathrm{var}(Z_j) = \sum_{g=1}^{3} p_j(2-g)\, H_g^T H_g - U_j^T U_j \quad (6.29)
\]
be the conditional covariance matrix of Z_j. Note that model (6.2) with Z_j substituted by U_j is
\[
\mu_{jk} = E(Y_{jk}) = \Phi(\alpha_k + X_j\beta + U_j\gamma) - \Phi(\alpha_{k-1} + X_j\beta + U_j\gamma) \quad (6.30)
\]
An underlying assumption of this probit model is that the residual error variance for the "underlying liability" of the disease trait is unity across individuals. Once U_j is used in place of Z_j, the residual error variance becomes
\[
\sigma_j^2 = 1 + \gamma^T\Sigma_j\gamma \quad (6.31)
\]
This is an inflated variance, and it is heterogeneous across individuals. In order to adjust for the inflation of the residual error variance, we need to rescale the model effects as follows,

\[
\mu_{jk} = E(Y_{jk}) = \Phi\!\left[\frac{1}{\sigma_j}(\alpha_k + X_j\beta + U_j\gamma)\right] - \Phi\!\left[\frac{1}{\sigma_j}(\alpha_{k-1} + X_j\beta + U_j\gamma)\right] \quad (6.32)
\]
This modification leads to a change in the partial derivatives of μ_j with respect to the parameters. The corresponding changes in the derivatives are given below.
\[
\begin{aligned}
\frac{\partial\mu_{jk}}{\partial\alpha_k} &= \frac{1}{\sigma_j}\phi\!\left[\frac{1}{\sigma_j}(\alpha_k + X_j\beta + U_j\gamma)\right] \\
\frac{\partial\mu_{jk}}{\partial\alpha_{k-1}} &= -\frac{1}{\sigma_j}\phi\!\left[\frac{1}{\sigma_j}(\alpha_{k-1} + X_j\beta + U_j\gamma)\right] \\
\frac{\partial\mu_{jk}}{\partial\alpha_l} &= 0, \quad l \notin \{k-1, k\}
\end{aligned}
\quad (6.33)
\]

\[
\frac{\partial\mu_{jk}}{\partial\beta^T} = \frac{1}{\sigma_j}\left\{\phi\!\left[\frac{1}{\sigma_j}(\alpha_k + X_j\beta + U_j\gamma)\right] - \phi\!\left[\frac{1}{\sigma_j}(\alpha_{k-1} + X_j\beta + U_j\gamma)\right]\right\} X_j \quad (6.34)
\]
and
\[
\begin{aligned}
\frac{\partial\mu_{jk}}{\partial\gamma^T} ={}& \phi\!\left[\frac{1}{\sigma_j}(\alpha_k + X_j\beta + U_j\gamma)\right]\left[\frac{1}{\sigma_j}U_j - \frac{1}{\sigma_j^3}(\alpha_k + X_j\beta + U_j\gamma)\,\gamma^T\Sigma_j\right] \\
&- \phi\!\left[\frac{1}{\sigma_j}(\alpha_{k-1} + X_j\beta + U_j\gamma)\right]\left[\frac{1}{\sigma_j}U_j - \frac{1}{\sigma_j^3}(\alpha_{k-1} + X_j\beta + U_j\gamma)\,\gamma^T\Sigma_j\right]
\end{aligned}
\quad (6.35)
\]
The iteration formula remains the same as (6.11), except that the modified weight and partial derivatives are used under the heterogeneous residual variance model. To invoke this method, the method option in the PROC QTL statement must be set as method = "fisher".
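A sketch of the heterogeneous-variance correction (illustrative only; Phi is the standard normal CDF and the genotype coding follows (6.21)):

```python
import numpy as np
from math import erf, sqrt, inf

H = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])   # (6.21)

def Phi(x):
    # standard normal CDF with Phi(-inf) = 0 and Phi(+inf) = 1
    if x == inf:
        return 1.0
    if x == -inf:
        return 0.0
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def genotype_moments(p):
    # (6.22) and (6.29): U_j = sum_g p H_g, Sigma_j = sum_g p H_g'H_g - U_j'U_j
    p = np.asarray(p)
    U = p @ H
    Sigma = (H.T * p) @ H - np.outer(U, U)
    return U, Sigma

def rescaled_probs(alpha, eta, gamma, Sigma):
    # (6.31)-(6.32): divide all effects by sigma_j = sqrt(1 + gamma' Sigma_j gamma)
    s = sqrt(1.0 + float(gamma @ Sigma @ gamma))
    cuts = [-inf] + list(alpha) + [inf]
    return [Phi((cuts[k + 1] + eta) / s) - Phi((cuts[k] + eta) / s)
            for k in range(len(cuts) - 1)]
```

When the genotype is known with certainty, Sigma_j vanishes, sigma_j = 1, and the rescaled model collapses back to the homogeneous model, as the text implies.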
APPROXIMATE MIXTURE MODEL
In this approximate model, we define genotype-specific expectations and then combine them using the probabilities of QTL genotypes inferred from markers as the weights. Let
\[
\mu_{jk}(g) = E(Y_{jk}) = \Phi(\alpha_k + X_j\beta + H_g\gamma) - \Phi(\alpha_{k-1} + X_j\beta + H_g\gamma) \quad (6.36)
\]
be the genotype-specific expectation of Y_jk when individual j takes the g-th genotype, for g = 1, 2, 3. The weighted expectation is
\[
\mu_{jk} = \sum_{g=1}^{3} p_j(2-g)\,\mu_{jk}(g) \quad (6.37)
\]
Let μ_j = {μ_jk} be a column vector containing the μ_jk's. The corresponding weight for individual j is
\[
W_j = V_j^- = \Psi_j^{-1} = \mathrm{diag}^{-1}(\mu_j) \quad (6.38)
\]
Let
\[
D_j(g) = \frac{\partial\mu_j(g)}{\partial\theta^T} = \begin{bmatrix}\dfrac{\partial\mu_j(g)}{\partial\alpha^T} & \dfrac{\partial\mu_j(g)}{\partial\beta^T} & \dfrac{\partial\mu_j(g)}{\partial\gamma^T}\end{bmatrix} \quad (6.39)
\]
be the matrix of partial derivatives of the genotype-specific expectation with respect to the parameters. The corresponding values of D_j(g) are

\[
\begin{aligned}
\frac{\partial\mu_{jk}(g)}{\partial\alpha_k} &= \phi(\alpha_k + X_j\beta + H_g\gamma) \\
\frac{\partial\mu_{jk}(g)}{\partial\alpha_{k-1}} &= -\phi(\alpha_{k-1} + X_j\beta + H_g\gamma) \\
\frac{\partial\mu_{jk}(g)}{\partial\alpha_l} &= 0, \quad l \notin \{k-1, k\}
\end{aligned}
\quad (6.40)
\]
\[
\frac{\partial\mu_{jk}(g)}{\partial\beta^T} = \left[\phi(\alpha_k + X_j\beta + H_g\gamma) - \phi(\alpha_{k-1} + X_j\beta + H_g\gamma)\right] X_j \quad (6.41)
\]
and
\[
\frac{\partial\mu_{jk}(g)}{\partial\gamma^T} = \left[\phi(\alpha_k + X_j\beta + H_g\gamma) - \phi(\alpha_{k-1} + X_j\beta + H_g\gamma)\right] H_g \quad (6.42)
\]
The weighted average of D_j(g) is
\[
D_j = \sum_{g=1}^{3} p_j(2-g)\, D_j(g) \quad (6.43)
\]
We have now defined μ_j, W_j and D_j, which are all we need to establish the following iteration equation for estimating the parameters,
\[
\theta^{(t+1)} = \theta^{(t)} + \left[\sum_{j=1}^{n} D_j^T W_j D_j\right]^{-1}\left[\sum_{j=1}^{n} D_j^T W_j (Y_j - \mu_j)\right] \quad (6.44)
\]
This approximate model is invoked by turning on the method = irls option in the PROC QTL statement.
MIXTURE MODEL MAXIMUM LIKELIHOOD METHOD
The exact mixture model approach defines a genotype-specific expectation, variance matrix and all derivatives for each individual. Let
\[
\mu_{jk}(g) = E(Y_{jk}) = \Phi(\alpha_k + X_j\beta + H_g\gamma) - \Phi(\alpha_{k-1} + X_j\beta + H_g\gamma) \quad (6.45)
\]
be the genotype-specific expectation of Y_jk when individual j takes the g-th genotype, for g = 1, 2, 3. The corresponding genotype-specific weight matrix is
\[
W_j(g) = V_j^-(g) = \mathrm{diag}^{-1}\left[\mu_j(g)\right] = \Psi_j^{-1}(g) \quad (6.46)
\]
Let D_j(g) be the genotype-specific matrix of partial derivatives of the expectation with respect to the parameters, whose elements are defined as
\[
\begin{aligned}
\frac{\partial\mu_{jk}(g)}{\partial\alpha_k} &= \phi(\alpha_k + X_j\beta + H_g\gamma) \\
\frac{\partial\mu_{jk}(g)}{\partial\alpha_{k-1}} &= -\phi(\alpha_{k-1} + X_j\beta + H_g\gamma) \\
\frac{\partial\mu_{jk}(g)}{\partial\alpha_l} &= 0, \quad l \notin \{k-1, k\}
\end{aligned}
\quad (6.47)
\]

\[
\frac{\partial\mu_{jk}(g)}{\partial\beta^T} = \left[\phi(\alpha_k + X_j\beta + H_g\gamma) - \phi(\alpha_{k-1} + X_j\beta + H_g\gamma)\right] X_j \quad (6.48)
\]
and
\[
\frac{\partial\mu_{jk}(g)}{\partial\gamma^T} = \left[\phi(\alpha_k + X_j\beta + H_g\gamma) - \phi(\alpha_{k-1} + X_j\beta + H_g\gamma)\right] H_g \quad (6.49)
\]
Let us define the posterior probability of the QTL genotype after incorporating the disease phenotype of individual j as
\[
p_j^*(2-g) = \frac{p_j(2-g)\, Y_j^T \mu_j(g)}{\sum_{g'=1}^{3} p_j(2-g')\, Y_j^T \mu_j(g')} \quad (6.50)
\]
The increment for parameter updating under the mixture model is
\[
\Delta\theta = \left[\sum_{j=1}^{n} E\left(D_j^T W_j D_j\right)\right]^{-1}\left[\sum_{j=1}^{n} E\left(D_j^T W_j (Y_j - \mu_j)\right)\right] \quad (6.51)
\]
where

3
*
1
) ( ) ( ) ( ) ( ) (
T T
j j j j j j j
g
W D p g D g W g D g E D
=
=

(6.52)
63

( ) ( )
3
*
1
( ) (2 ) ( ) ( ) ( )
T T
j j j j j j j j j
g
Y p g D g W Y g E D g W
=
=

(6.53)
and

1
( ) ( )
j j
g W g

= (6.54)
This is actually an EM algorithm: calculating the posterior probabilities of the QTL genotypes and using them to compute \(E(D_j^T W_j D_j)\) and \(E[D_j^T W_j (Y_j - \mu_j)]\) constitute the E-step, while calculating the increment of the parameters with the weighted least squares formula makes up the M-step.
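The E- and M-steps above can be sketched numerically. The helper below (an illustrative sketch, not PROC QTL code) computes the increment of equation (6.51) from the posterior genotype probabilities and the genotype-specific quantities of equations (6.52)-(6.53):

```python
import numpy as np

def em_increment(post, D, W, Y, mu):
    """Increment of eq. (6.51).

    post[j][g]: posterior probability of genotype g for individual j;
    D[j][g], W[j][g], mu[j][g]: genotype-specific derivative matrix,
    weight matrix and expectation; Y[j]: observed phenotype vector.
    """
    lhs, rhs = 0.0, 0.0
    for j in range(len(Y)):
        for g in range(3):
            # posterior-weighted D' W for this (individual, genotype) pair
            DtW = post[j][g] * D[j][g].T @ W[j][g]
            lhs = lhs + DtW @ D[j][g]
            rhs = rhs + DtW @ (Y[j] - mu[j][g])
    return np.linalg.solve(lhs, rhs)
```

When the posteriors are degenerate (genotypes known with certainty), this collapses to the ordinary weighted least squares step.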
VARIANCE-COVARIANCE MATRIX FOR ESTIMATED PARAMETERS
A problem with this EM algorithm is that \(\mathrm{var}(\hat\theta)\) is not a by-product of the iteration process. For simplicity, if the markers are sufficiently close to the trait locus of interest, we can use
\[
\mathrm{var}(\hat\theta) \approx \Big[\sum_{j=1}^{n} E(D_j^T W_j D_j)\Big]^{-1} \tag{6.55}
\]
to approximate the covariance matrix of the estimated parameters. This expression underestimates the variance. A more precise method for calculating \(\mathrm{var}(\hat\theta)\) is to adjust the above equation for the information loss due to the uncertainty of the QTL genotype. Let
\[
S(\theta \mid Z) = \sum_{j=1}^{n} D_j^T W_j (Y_j - \mu_j) \tag{6.56}
\]
be the score vector as if \(Z\) were observed. Louis (1982) showed that the information loss is the variance-covariance matrix of the score vector,
\[
\sum_{j=1}^{n} \mathrm{var}\big[S_j(\theta \mid Z)\big] = \sum_{j=1}^{n} \mathrm{var}\big[D_j^T W_j (Y_j - \mu_j)\big] \tag{6.57}
\]
The variance is taken with respect to the missing value \(Z\) using the posterior probabilities of the QTL genotypes. The information matrix after adjusting for the information loss is
\[
I(\hat\theta) = \sum_{j=1}^{n} E(D_j^T W_j D_j) - \sum_{j=1}^{n} \mathrm{var}\big[D_j^T W_j (Y_j - \mu_j)\big] \tag{6.58}
\]
The variance-covariance matrix of the estimated parameters is then approximated by \(\mathrm{var}(\hat\theta) = I^{-1}(\hat\theta)\). The first term in the above expression is
\[
\sum_{j=1}^{n} E(D_j^T W_j D_j) = \sum_{j=1}^{n}\sum_{g=1}^{3} p_j^*(g-2)\, D_j^T(g) W_j(g) D_j(g) \tag{6.59}
\]
which is the expected value of the negative Hessian matrix. The second term of equation (6.58) is
\[
\sum_{j=1}^{n} \mathrm{var}\big[S_j(\theta \mid Z)\big] = \sum_{j=1}^{n}\sum_{g=1}^{3} p_j^*(g-2)\big[S_j(\theta \mid g) - \bar S_j(\theta)\big]\big[S_j(\theta \mid g) - \bar S_j(\theta)\big]^T \tag{6.60}
\]
which is the variance matrix of the score vector, where
\[
S_j(\theta \mid g) = D_j^T(g) W_j(g) \big[Y_j - \mu_j(g)\big] \tag{6.61}
\]
and
\[
\bar S_j(\theta) = \sum_{g=1}^{3} p_j^*(g-2)\, S_j(\theta \mid g) \tag{6.62}
\]
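The information-loss term of equations (6.60)-(6.62) is simply the posterior variance of the genotype-specific score vectors. A hedged sketch with illustrative names:

```python
import numpy as np

def score_variance(post, S):
    """Posterior variance of the score for one individual, eq. (6.60).

    post[g]: posterior probability of genotype g; S[g]: genotype-specific
    score vector S_j(theta | g) of eq. (6.61).
    """
    Sbar = sum(p * s for p, s in zip(post, S))        # eq. (6.62)
    return sum(p * np.outer(s - Sbar, s - Sbar) for p, s in zip(post, S))
```

Summing this quantity over individuals gives the adjustment subtracted in equation (6.58); when a genotype is known with certainty, the loss for that individual is zero.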
The mixture model maximum likelihood method is invoked by turning on the
method = "ml" option in the PROC QTL statement.
HYPOTHESIS TESTING
The two statistical tests introduced in QTL mapping for continuous traits are also applicable to the analysis of ordinal traits. The variance-covariance matrix of the QTL effects is easily obtained from a subset of \(\mathrm{var}(\hat\theta)\). The Wald test for the QTL effects under the hypothesis \(H_0: \gamma = 0\) is
\[
W = \hat\gamma^T \big[\mathrm{var}(\hat\gamma)\big]^{-1} \hat\gamma \tag{6.63}
\]
Alternatively, the likelihood ratio test statistic can be applied to test \(H_0\),
\[
\lambda = -2(L_0 - L_1) \tag{6.64}
\]
where
\[
L_1 = \sum_{j=1}^{n} \log(Y_j^T \hat\mu_j) \quad\text{and}\quad L_0 = \sum_{j=1}^{n} \log(Y_j^T \tilde\mu_j) \tag{6.65}
\]
In the above equation, \(\hat\mu_j\) and \(\tilde\mu_j\) are the expectations of \(Y_j\) evaluated at \(\hat\theta = \{\hat\alpha, \hat\beta, \hat\gamma\}\) and \(\tilde\theta = \{\tilde\alpha, \tilde\beta, 0\}\), respectively.
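Computationally, the Wald statistic of equation (6.63) is a quadratic form in the estimated QTL effects; a one-line sketch (illustrative, not PROC QTL code):

```python
import numpy as np

def wald(gamma_hat, var_gamma):
    """Wald statistic of eq. (6.63): W = gamma' [var(gamma)]^{-1} gamma."""
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    # solve avoids forming the explicit inverse of var(gamma)
    return float(gamma_hat @ np.linalg.solve(var_gamma, gamma_hat))
```

Under \(H_0\) the statistic is typically compared against a chi-square distribution with degrees of freedom equal to the number of QTL effects tested.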
EXTENSION TO OTHER TRAITS
Ordinal traits are the most commonly observed discrete traits in QTL mapping experiments. Other discrete traits commonly seen in QTL mapping experiments are binary traits, binomial traits and Poisson traits.
This section is dedicated to these commonly observed discrete traits. The
mixture model algorithm and the heterogeneous variance approximation
apply to all traits as long as the traits can be analyzed by the generalized
linear model. To apply the algorithms to any specific trait, we only need to
find: (1) the distribution of trait (probability density of the data point), (2) the
expectation of the data point, (3) the weight (inverse of the variance) of the
data point and (4) the partial derivative of the expectation with respect to the
parameter. We now introduce these discrete traits and provide details of the
formulas for interested readers.
Binary traits
Binary traits can be treated as a special case of ordinal traits with \(p = 1\). Without any modification, the method developed for ordinal traits can be applied to binary traits with \(Y_j = [Y_{j1}\; Y_{j2}]^T\) defined as a \(2 \times 1\) vector. Each of the two components is a binary variable, and the two components are perfectly correlated. Here we simplify the problem by defining \(Y_j\) as a univariate binary trait. This univariate treatment not only saves computing time but also simplifies the notation. We now use the univariate definition to define the binary phenotype,
\[
Y_j = \begin{cases} 1 & \text{for trait presence} \\ 0 & \text{for trait absence} \end{cases} \tag{6.66}
\]
The expectation and variance of the phenotype given the parameter values are
\[
\mu_j = E(Y_j) = \Phi(X_j\beta + Z_j\gamma) \tag{6.67}
\]
and
\[
\mathrm{var}(Y_j) = V_j = \mu_j(1 - \mu_j) \tag{6.68}
\]
The probability density is
\[
p(Y_j) = \mu_j^{Y_j}(1 - \mu_j)^{1 - Y_j} \tag{6.69}
\]
We now give the details for the mixture model and the heterogeneous variance model. Let \(g = 1, 2, 3\) index the three genotypes and
\[
\mu_j(g) = E[Y_j(g)] = \Phi(X_j\beta + H_g\gamma) \tag{6.70}
\]
The \(D\) matrix for genotype \(g\) is defined as
\[
D_j(g) = \left[\frac{\partial \mu_j(g)}{\partial \beta^T} \;\; \frac{\partial \mu_j(g)}{\partial \gamma^T}\right] \tag{6.71}
\]
where
\[
\frac{\partial \mu_j(g)}{\partial \beta^T} = \phi(X_j\beta + H_g\gamma)\, X_j \tag{6.72}
\]
and
\[
\frac{\partial \mu_j(g)}{\partial \gamma^T} = \phi(X_j\beta + H_g\gamma)\, H_g \tag{6.73}
\]
We now describe binary data under the heterogeneous variance model. Let the expectation of \(Y_j\) be
\[
\mu_j = E(Y_j) \approx \Phi\Big[\frac{1}{\sigma_j}(X_j\beta + U_j\gamma)\Big] \tag{6.74}
\]
The \(D\) matrix is defined as
\[
D_j = \left[\frac{\partial \mu_j}{\partial \beta^T} \;\; \frac{\partial \mu_j}{\partial \gamma^T}\right] \tag{6.75}
\]
where
\[
\frac{\partial \mu_j}{\partial \beta^T} = \frac{1}{\sigma_j}\,\phi\Big[\frac{1}{\sigma_j}(X_j\beta + U_j\gamma)\Big] X_j \tag{6.76}
\]
and
\[
\frac{\partial \mu_j}{\partial \gamma^T} = \phi\Big[\frac{1}{\sigma_j}(X_j\beta + U_j\gamma)\Big]\Big[\frac{1}{\sigma_j} U_j + (X_j\beta + U_j\gamma)\frac{\partial \sigma_j^{-1}}{\partial \gamma^T}\Big] \tag{6.77}
\]
Binomial traits
Let \(Y_j = 0/n_j, 1/n_j, \ldots, n_j/n_j\) be the phenotype of a binomial trait (expressed as a ratio or fraction) with \(n_j\) trials. The expectation and variance of the phenotype are
\[
\mu_j = E(Y_j) = \Phi(X_j\beta + Z_j\gamma) \tag{6.78}
\]
and
\[
\mathrm{var}(Y_j) = V_j = \frac{1}{n_j}\mu_j(1 - \mu_j) \tag{6.79}
\]
The weight is
\[
W_j = V_j^{-1} = \frac{n_j}{\mu_j(1 - \mu_j)} \tag{6.80}
\]
The probability density is
\[
p(Y_j) = \frac{n_j!}{(n_j Y_j)!\,(n_j - n_j Y_j)!}\, \mu_j^{\,n_j Y_j} (1 - \mu_j)^{\,n_j - n_j Y_j} \tag{6.81}
\]
The \(D\) matrix for the binomial data is exactly the same as that of the binary data.
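The binomial ingredients in equations (6.79)-(6.81) are easy to check numerically; the sketch below (illustrative names, not PROC QTL code) returns the variance, weight and density for a fraction-valued phenotype:

```python
import math

def binomial_pieces(mu, n):
    """Variance (6.79), weight (6.80) and density (6.81) of a binomial
    trait expressed as a fraction Y = y/n with n trials and mean mu."""
    V = mu * (1.0 - mu) / n
    W = 1.0 / V
    def density(Y):
        k = round(n * Y)              # n*Y recovers the observed count
        return math.comb(n, k) * mu**k * (1.0 - mu)**(n - k)
    return V, W, density
```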
Poisson traits
Let \(Y_j = 0, 1, \ldots\) be the phenotype of a Poisson trait. The expectation and the variance of the phenotype are equal, \(E(Y_j) = \mathrm{var}(Y_j) = \mu_j\), where
\[
\mu_j = V_j = \exp(X_j\beta + Z_j\gamma) \tag{6.82}
\]
The weight is
\[
W_j = V_j^{-1} = \exp\big[-(X_j\beta + Z_j\gamma)\big] \tag{6.83}
\]
The probability density is
\[
p(Y_j) = \frac{\mu_j^{Y_j} \exp(-\mu_j)}{Y_j!} \tag{6.84}
\]
Let the expectation of \(Y_j\) given genotype \(g\) be
\[
\mu_j(g) = E(Y_j \mid g) = \exp(X_j\beta + H_g\gamma), \quad g = 1, 2, 3 \tag{6.85}
\]
Define
\[
D_j(g) = \left[\frac{\partial \mu_j(g)}{\partial \beta^T} \;\; \frac{\partial \mu_j(g)}{\partial \gamma^T}\right] \tag{6.86}
\]
where
\[
\frac{\partial \mu_j(g)}{\partial \beta^T} = \exp(X_j\beta + H_g\gamma)\, X_j, \qquad
\frac{\partial \mu_j(g)}{\partial \gamma^T} = \exp(X_j\beta + H_g\gamma)\, H_g \tag{6.87}
\]
Under the heterogeneous variance model, the expectation of \(Y_j\) is
\[
\mu_j = E(Y_j) \approx \exp\Big[\frac{1}{\sigma_j}(X_j\beta + U_j\gamma)\Big] \tag{6.88}
\]
The \(D\) matrix is
\[
D_j = \left[\frac{\partial \mu_j}{\partial \beta^T} \;\; \frac{\partial \mu_j}{\partial \gamma^T}\right] \tag{6.89}
\]
where, using \(\partial \mu_j/\partial \theta = \mu_j\, \partial \ln(\mu_j)/\partial \theta\),
\[
\frac{\partial \mu_j}{\partial \beta^T} = \frac{1}{\sigma_j}\mu_j X_j, \qquad
\frac{\partial \mu_j}{\partial \gamma^T} = \mu_j\Big[\frac{1}{\sigma_j} U_j + (X_j\beta + U_j\gamma)\frac{\partial \sigma_j^{-1}}{\partial \gamma^T}\Big] \tag{6.90}
\]
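For the Poisson case the three ingredients of equations (6.82)-(6.84) are equally compact; a small sketch with illustrative names:

```python
import math

def poisson_pieces(eta):
    """Mean/variance (6.82), weight (6.83) and density (6.84) of a
    Poisson trait, where eta is the linear predictor X_j*beta + Z_j*gamma."""
    mu = math.exp(eta)          # mean equals variance for the Poisson
    W = 1.0 / mu                # weight is the inverse variance
    def density(y):
        return mu**y * math.exp(-mu) / math.factorial(y)
    return mu, W, density
```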


MAPPING QUANTITATIVE TRAIT LOCI UNDER
SEGREGATION DISTORTION
Segregation distortion is a phenomenon in which the genotypic frequencies of a locus do not follow the typical Mendelian ratio. Depending on the population under investigation, the Mendelian ratio of a locus varies from 1:1 for a backcross to 1:2:1 for an \(F_2\) and to 1:1:1:1 for a four-way cross. For various reasons, a marker may not follow the typical Mendelian ratio. Such a marker is called a distorted marker. For a long time, the effects of distorted markers on the results of QTL mapping were not known. As a precaution, people simply discarded all distorted markers in QTL mapping. Recently, we found that distorted markers can be safely used for QTL mapping with no detrimental effect on the result (XU 2008). This finding can help QTL mappers save tremendous resources by using all available markers, regardless of whether they are Mendelian or not. We also found that if distorted markers are handled properly, they can be beneficial to QTL mapping.
Marker segregation distortion is only a phenomenon. The distortion is caused by one or more segregation distortion loci (SDL). These loci are subject to gametic selection (FARIS et al. 1998) or zygotic selection (KRKKINEN et al. 1996), and their (unobservable) distorted segregation causes the observed markers to distort. Several investigators (FU and RITLAND 1994; LORIEUX et al. 1995a; LORIEUX et al. 1995b; LUO and XU 2003; LUO et al. 2005; VOGL and XU 2000; WANG et al. 2005a; ZHU and ZHANG 2007) have attempted to map these segregation distortion loci using molecular markers. It is natural to consider mapping QTL and SDL jointly in the same population. Agricultural scientists are interested in mapping QTL for economically important traits, while evolutionary biologists are interested in mapping SDL that respond to natural selection. Combining the two mapping strategies into one is beneficial to both communities. Performing such a joint mapping strategy is the main objective of this study. Since the theory of segregation distortion has been introduced and discussed in previous studies (LORIEUX et al. 1995a; LORIEUX et al. 1995b) and in our own research (XU 2008), this chapter only presents the EM (expectation-maximization) implementation of the statistical method. The variance-covariance matrix of the estimated parameters under the EM algorithm is also derived and presented in this chapter. Details of the method can be found in our recent publication (XU and HU 2010b).
THE LIKELIHOOD OF MARKERS
Let \(M\) and \(N\) be the left and right flanking markers bracketing the QTL (denoted by \(G\) for short). The interval of the genome carrying the three loci is labeled as segment \(MGN\). The three genotypes of the QTL are denoted by \(G_1G_1\), \(G_1G_2\) and \(G_2G_2\), respectively. Similar notation also applies to the genotypes of the flanking markers. The interval defined by markers \(M\) and \(N\) is divided into two segments. Let \(r_1\) and \(r_2\) be the recombination fractions for segment \(MG\) and segment \(GN\), respectively. The joint distribution of the marker genotypes conditional on the QTL genotype can be derived using the Markov chain property under the assumption of no segregation interference between consecutive loci. Let us order the three genotypes \(G_1G_1\), \(G_1G_2\) and \(G_2G_2\) as genotypes 1, 2 and 3, respectively. If individual \(j\) takes the \(\kappa\)th genotype of the QTL, we denote the event by \(G_j = \kappa\), \(\kappa = 1, 2, 3\). The joint probability of the two markers conditional on the genotype of the QTL is

\[
\Pr(M_j = \kappa', N_j = \ell \mid G_j = \kappa) = \Pr(M_j = \kappa' \mid G_j = \kappa)\Pr(N_j = \ell \mid G_j = \kappa) \tag{7.1}
\]
for all \(\kappa, \kappa', \ell = 1, 2, 3\), where \(\Pr(M_j = \kappa' \mid G_j = \kappa) = \Gamma_1(\kappa, \kappa')\) and \(\Pr(N_j = \ell \mid G_j = \kappa) = \Gamma_2(\kappa, \ell)\). We use \(\Gamma_i(\kappa, \ell)\) to denote the \(\kappa\)th row and \(\ell\)th column of the following transition matrix,
\[
\Gamma_i = \begin{bmatrix}
(1-r_i)^2 & 2r_i(1-r_i) & r_i^2 \\
r_i(1-r_i) & (1-r_i)^2 + r_i^2 & r_i(1-r_i) \\
r_i^2 & 2r_i(1-r_i) & (1-r_i)^2
\end{bmatrix}, \quad i = 1, 2 \tag{7.2}
\]
For example,
\[
\Pr(M_j = 1, N_j = 2 \mid G_j = 3) = \Pr(M_j = 1 \mid G_j = 3)\Pr(N_j = 2 \mid G_j = 3) = \Gamma_1(3,1)\Gamma_2(3,2) = r_1^2 \cdot 2r_2(1-r_2) \tag{7.3}
\]
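Equation (7.2) and the worked example (7.3) can be verified numerically; a minimal sketch:

```python
def transition(r):
    """F2 transition matrix Gamma_i of eq. (7.2): entry [k][l] is the
    probability of marker genotype l+1 given QTL genotype k+1."""
    return [
        [(1 - r)**2, 2 * r * (1 - r), r**2],
        [r * (1 - r), (1 - r)**2 + r**2, r * (1 - r)],
        [r**2, 2 * r * (1 - r), (1 - r)**2],
    ]
```

Each row sums to one, and the product \(\Gamma_1(3,1)\Gamma_2(3,2)\) reproduces the value in equation (7.3).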
Let \(\omega_\kappa = \Pr(G_j = \kappa)\), \(\kappa = 1, 2, 3\), be the probability that a randomly sampled individual from the \(F_2\) family has genotype \(\kappa\). We use a generic notation \(p\) for probability, so that \(p(G_j = \kappa)\) represents \(\Pr(G_j = \kappa)\) and \(p(M_j, N_j \mid G_j = \kappa)\) stands for \(\Pr(M_j, N_j \mid G_j = \kappa)\). The log likelihood function of the flanking-marker genotypes in the \(F_2\) population is

\[
L(\omega \mid m) = \sum_{j=1}^{n} \ln\Big[\sum_{\kappa=1}^{3} p(G_j = \kappa)\, p(M_j, N_j \mid G_j = \kappa)\Big]
= \sum_{j=1}^{n} \ln\Big[\sum_{\kappa=1}^{3} \omega_\kappa \Gamma_1(\kappa, M_j) \Gamma_2(\kappa, N_j)\Big] \tag{7.4}
\]
where \(\omega = [\omega_1\; \omega_2\; \omega_3]^T\) is a vector of parameters with constraint \(\sum_{\kappa=1}^{3}\omega_\kappa = 1\), and \(m\) in \(L(\omega \mid m)\) stands for marker information. Note that, without any prior information, \(p(G_j = \kappa) = \omega_\kappa\) for \(j = 1, \ldots, n\). Under the assumption of Mendelian segregation, \(\omega = \pi\), where \(\pi = [\pi_1\; \pi_2\; \pi_3]^T = [1/4\; 1/2\; 1/4]^T\). However, we treat \(\omega\) as unknown parameters. We postulate that deviation of \(\omega\) from \(\pi\) will cause a marker linked to locus \(G\) to show distorted segregation. This likelihood function is the one used in mapping viability loci (LUO et al. 2005).
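The marker log likelihood of equation (7.4) can be accumulated directly; a hedged sketch (names are illustrative):

```python
import math

def marker_loglik(omega, markers, G1, G2):
    """Log likelihood of eq. (7.4).

    omega: genotype frequencies [w1, w2, w3]; markers: list of (M_j, N_j)
    genotype pairs coded 1..3; G1, G2: transition matrices for the two
    marker-QTL segments.
    """
    L = 0.0
    for M, N in markers:
        L += math.log(sum(omega[k] * G1[k][M - 1] * G2[k][N - 1]
                          for k in range(3)))
    return L
```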
THE LIKELIHOOD OF PHENOTYPES
Let \(y_j\) be the phenotypic value of a quantitative trait measured on individual \(j\). The probability density of \(y_j\) conditional on the genotype of individual \(j\) is normal with mean
\[
\mu_\kappa = X_j\beta + H_\kappa\gamma \tag{7.5}
\]
and variance \(\sigma^2\), i.e.,
\[
p(y_j \mid G_j = \kappa) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big[-\frac{1}{2\sigma^2}(y_j - X_j\beta - H_\kappa\gamma)^2\Big] \tag{7.6}
\]
where \(H_\kappa\) is the \(\kappa\)th row of matrix \(H\) and
\[
H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & 0 \end{bmatrix} \tag{7.7}
\]
Vector \(\gamma = [a\; d]^T\) contains the additive and dominance effects. The design matrix \(X_j\) and the regression coefficients \(\beta\) capture non-QTL effects, e.g., location effects, year effects and so on. The likelihood function of the phenotypic values in the \(F_2\) population is

\[
L(\beta, \gamma, \sigma^2, \omega \mid y) = \sum_{j=1}^{n} \ln\Big[\sum_{\kappa=1}^{3} p(G_j = \kappa)\, p(y_j \mid G_j = \kappa)\Big]
= -\frac{n}{2}\ln(\sigma^2) + \sum_{j=1}^{n} \ln\Big\{\sum_{\kappa=1}^{3} \omega_\kappa \exp\Big[-\frac{1}{2\sigma^2}(y_j - X_j\beta - H_\kappa\gamma)^2\Big]\Big\} \tag{7.8}
\]
where the letter \(y\) in \(L(\beta, \gamma, \sigma^2, \omega \mid y)\) stands for the phenotype. This likelihood function is the one used in segregation analysis of quantitative traits (ELSTON and STEWART 1973) because no marker information is required.

JOINT LIKELIHOOD OF MARKERS AND PHENOTYPES
Let \(\theta = [\beta^T\; \gamma^T\; \sigma^2\; \omega^T]^T\) be the vector of all parameters in the joint analysis. The likelihood function is obtained by combining equations (7.4) and (7.8),
\[
L(\theta \mid m, y) = \sum_{j=1}^{n} \ln\Big[\sum_{\kappa=1}^{3} p(G_j = \kappa)\, p(y_j \mid G_j = \kappa)\, p(M_j, N_j \mid G_j = \kappa)\Big]
= -\frac{n}{2}\ln(\sigma^2) + \sum_{j=1}^{n} \ln\Big\{\sum_{\kappa=1}^{3} \omega_\kappa \exp\Big[-\frac{1}{2\sigma^2}(y_j - X_j\beta - H_\kappa\gamma)^2\Big] \Gamma_1(\kappa, M_j)\Gamma_2(\kappa, N_j)\Big\} \tag{7.9}
\]
For QTL mapping under segregation distortion, this log likelihood function is
the one that is subject to maximization. The previous two likelihood
functions (for markers and for phenotypes) were presented as background
information to introduce this joint log likelihood function.
EM ALGORITHM FOR THE JOINT ANALYSIS
The MLE (maximum likelihood estimate) of the parameters can be solved
via an EM algorithm (DEMPSTER et al. 1977). We need to rewrite the
likelihood function in a form of complete data. Let us define a delta function
as

\[
\delta(G_j, \kappa) = \begin{cases} 1 & \text{if } G_j = \kappa \\ 0 & \text{if } G_j \neq \kappa \end{cases} \tag{7.10}
\]
If the genotypes of all individuals were known, i.e., given \(\delta(G_j, \kappa)\) for all \(j = 1, \ldots, n\) and \(\kappa = 1, 2, 3\), the complete-data log likelihood would be
\[
L(\theta, \delta) = \sum_{j=1}^{n} \log\big[p(y_j \mid G_j)\, p(M_j, N_j \mid G_j)\, p(G_j)\big] \tag{7.11}
\]
where
\[
p(y_j \mid G_j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big[-\frac{1}{2\sigma^2}\sum_{\kappa=1}^{3}\delta(G_j, \kappa)(y_j - X_j\beta - H_\kappa\gamma)^2\Big] \tag{7.12}
\]

\[
p(M_j, N_j \mid G_j) = \prod_{\kappa=1}^{3} \big[p(M_j, N_j \mid G_j = \kappa)\big]^{\delta(G_j, \kappa)}
= \prod_{\kappa=1}^{3} \big[\Gamma_1(\kappa, M_j)\Gamma_2(\kappa, N_j)\big]^{\delta(G_j, \kappa)} \tag{7.13}
\]
and
\[
p(G_j) = \prod_{\kappa=1}^{3} \omega_\kappa^{\delta(G_j, \kappa)} \tag{7.14}
\]

The delta variables are missing values. Therefore, we take the expectation of the likelihood with respect to \(\delta\). The expected likelihood function is
\[
L(\theta \mid \theta^{(t)}) = E_\delta\big[L(\theta, \delta) \mid \theta^{(t)}\big] = \Omega_0(\theta^{(t)}) + \Omega_1(\theta) + \Omega_2(\theta) \tag{7.15}
\]
Note that \(E_\delta[L(\theta, \delta) \mid \theta^{(t)}]\) stands for the expectation of \(L(\theta, \delta)\) with respect to \(\delta\), conditional on the parameters at the current state \(\theta^{(t)}\) and the data (the symbol for the data is suppressed for simplicity). The three components of equation (7.15) are
\[
\Omega_0(\theta^{(t)}) = \sum_{j=1}^{n}\sum_{\kappa=1}^{3} E\big[\delta(G_j, \kappa) \mid \theta^{(t)}\big]\big[\log\Gamma_1(\kappa, M_j) + \log\Gamma_2(\kappa, N_j)\big] \tag{7.16}
\]
3
2 ( ) 2
1
2
1 1
1
( ) ln( ) [ ( , ) | ]( )
2 2
n
t
j j
j
n
E G y
o k
k
u o o k u
o
= =
=

(7.17)
and

3
( )
2
1 1
( ) [ ( , ) | ]ln
n
t
j
j
E G
o k
k
u o k u e
= =
=

. (7.18)
The first component
0
is a function of
( ) t
u but not a function of u .
Therefore, it is considered as a constant.
Expectation (E-step): The expectation step of the EM algorithm requires
computing the expectation of o conditional on the data and
( ) t
u . Because
o is a Bernoulli variable, the expectation is simply the probability of 1 o = ,
i.e.,

2
2
( ) ( )
3
1
2
1
1 2
2
3
2
1
1 2
2 1
( , ) | Pr[ ( , ) 1| , , ]
( ) ( | ) ( , | )
( ) ( | ) ( , | )
exp ( ) ( , ) ( , )
exp ( ) ( , ) ( , )
t t
j j
j j j j j j
j j j j j j
j j j
j j j
E G G m y
p G p y G p M N G
p G p y G p M N G
y M N
y M N
o

k k
o

o
o k u o k u
k k k

e k k
e
=
=
( = =

= = =
=
= = =
( I I

=
( I I

(7.19)
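The E-step posterior of equation (7.19) combines the three information sources; a numerical sketch with illustrative names (not PROC QTL code):

```python
import math

def genotype_posterior(y, M, N, omega, xb, Hg, sigma2, G1, G2):
    """Posterior genotype probabilities of eq. (7.19).

    xb = X_j * beta (scalar); Hg[k] = H_{k+1} * gamma for the three
    genotypes; M, N are flanking-marker genotypes coded 1..3.
    """
    # unnormalized posterior weights: prior x phenotype density x markers
    w = [omega[k] * math.exp(-(y - xb - Hg[k])**2 / (2.0 * sigma2))
         * G1[k][M - 1] * G2[k][N - 1] for k in range(3)]
    s = sum(w)
    return [wk / s for wk in w]
```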
Maximization (M-step): The maximization step of the EM algorithm requires taking the partial derivatives of \(L(\theta \mid \theta^{(t)})\) with respect to \(\theta\), setting the partial derivatives to zero, and solving for the parameters,
\[
\frac{\partial L(\theta \mid \theta^{(t)})}{\partial \theta} = \frac{\partial \Omega_1(\theta)}{\partial \theta} + \frac{\partial \Omega_2(\theta)}{\partial \theta} = 0 \tag{7.20}
\]
The solutions of the parameters are
\[
\begin{aligned}
\beta &= \Big[\sum_{j=1}^{n} X_j^T X_j\Big]^{-1} \sum_{j=1}^{n} X_j^T \Big[y_j - \sum_{\kappa=1}^{3} E[\delta(G_j, \kappa)] H_\kappa\gamma\Big] \\
\gamma &= \Big[\sum_{j=1}^{n}\sum_{\kappa=1}^{3} E[\delta(G_j, \kappa)] H_\kappa^T H_\kappa\Big]^{-1} \sum_{j=1}^{n}\sum_{\kappa=1}^{3} E[\delta(G_j, \kappa)] H_\kappa^T (y_j - X_j\beta) \\
\sigma^2 &= \frac{1}{n}\sum_{j=1}^{n}\sum_{\kappa=1}^{3} E[\delta(G_j, \kappa)](y_j - X_j\beta - H_\kappa\gamma)^2 \\
\omega_\kappa &= \frac{1}{n}\sum_{j=1}^{n} E[\delta(G_j, \kappa)], \quad \kappa = 1, 2, 3
\end{aligned} \tag{7.21}
\]
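One pass of the M-step in equation (7.21) can be sketched as follows. This is an illustrative sketch, not PROC QTL code; since the \(\beta\) and \(\gamma\) updates depend on each other, \(\beta\) is updated here given the current \(\gamma\), followed by \(\gamma\), \(\sigma^2\) and \(\omega\):

```python
import numpy as np

def m_step(post, y, X, H, gamma):
    """One pass of eq. (7.21).

    post: n x 3 matrix of E[delta(G_j, kappa)]; y: n-vector of phenotypes;
    X: n x p design matrix; H: 3 x 2 genotype coding matrix; gamma: the
    current QTL effects used in the beta update.
    """
    n = len(y)
    EH = post @ H                                   # E[H_kappa] per individual
    beta = np.linalg.solve(X.T @ X, X.T @ (y - EH @ gamma))
    r = y - X @ beta                                # residuals without QTL term
    lhs = sum(post[j, k] * np.outer(H[k], H[k]) for j in range(n) for k in range(3))
    rhs = sum(post[j, k] * H[k] * r[j] for j in range(n) for k in range(3))
    gamma_new = np.linalg.solve(lhs, rhs)
    sigma2 = sum(post[j, k] * (r[j] - H[k] @ gamma_new)**2
                 for j in range(n) for k in range(3)) / n
    omega = post.mean(axis=0)
    return beta, gamma_new, sigma2, omega
```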
HYPOTHESIS TESTING
Hypothesis tests are complicated when QTL segregate in a non-Mendelian fashion. There are many different hypotheses we can test here. Although the Wald test can be performed for testing the presence of QTL, such a test is not justified for testing the null hypothesis of Mendelian segregation. Therefore, likelihood ratio tests are more justifiable here. Regardless of which hypothesis is tested, the full-model joint log likelihood function given in equation (7.9) is required. Let us reintroduce this joint log likelihood function using a different notation so that the different likelihood ratio tests are easily interpreted. The joint likelihood is rewritten as
\[
L_{QS}(\gamma, \omega) = -\frac{n}{2}\log(\sigma^2) + \sum_{j=1}^{n} \ln\Big\{\sum_{\kappa=1}^{3} \omega_\kappa \exp\Big[-\frac{1}{2\sigma^2}(y_j - X_j\beta - H_\kappa\gamma)^2\Big]\Gamma_1(\kappa, M_j)\Gamma_2(\kappa, N_j)\Big\} \tag{7.22}
\]
where \(\gamma\) indicates the QTL effect and \(\omega\) represents non-Mendelian segregation. The null hypothesis for QTL detection is \(H_{QTL}: \gamma = 0\), while the null hypothesis for detecting segregation distortion is \(H_{SDL}: \omega = \pi\).
Testing the presence of QTL:
The null hypothesis is \(H_{QTL}: \gamma = 0\). The likelihood ratio test statistic is
\[
\lambda_{QTL} = -2\big[L_S(0, \hat\omega) - L_{QS}(\hat\gamma, \hat\omega)\big] \tag{7.23}
\]
where \(L_S(0, \hat\omega)\) is the log likelihood value under the null model \(\gamma = 0\), which is
\[
L_S(0, \hat\omega) = L(\hat\beta, \hat\sigma^2 \mid y) + L(\hat\omega \mid m) \tag{7.24}
\]
where
\[
L(\hat\beta, \hat\sigma^2 \mid y) = -\frac{n}{2}\ln(\hat\sigma^2) - \frac{1}{2\hat\sigma^2}\sum_{j=1}^{n}(y_j - X_j\hat\beta)^2 = -\frac{n}{2}\big[\ln(\hat\sigma^2) + 1\big] \tag{7.25}
\]
and
\[
L(\hat\omega \mid m) = \sum_{j=1}^{n} \ln\Big[\sum_{\kappa=1}^{3} \hat\omega_\kappa \Gamma_1(\kappa, M_j)\Gamma_2(\kappa, N_j)\Big] \tag{7.26}
\]
The estimated parameters in (7.25) and (7.26) are obtained by maximizing
the corresponding likelihood functions.
Testing non-Mendelian segregation:
The null hypothesis is \(H_{SDL}: \omega = \pi\). The likelihood ratio test statistic is
\[
\lambda_{SDL} = -2\big[L_Q(\hat\gamma, \pi) - L_{QS}(\hat\gamma, \hat\omega)\big] \tag{7.27}
\]
where
\[
L_Q(\hat\gamma, \pi) = -\frac{n}{2}\ln(\hat\sigma^2) + \sum_{j=1}^{n} \ln\Big\{\sum_{\kappa=1}^{3} \pi_\kappa \exp\Big[-\frac{1}{2\hat\sigma^2}(y_j - X_j\hat\beta - H_\kappa\hat\gamma)^2\Big]\Gamma_1(\kappa, M_j)\Gamma_2(\kappa, N_j)\Big\} \tag{7.28}
\]
Again, the MLEs of the parameters in equation (7.28) are obtained by maximizing this likelihood function.
Testing both QTL and SDL:
The null hypothesis is \(H_0: \gamma = 0 \text{ and } \omega = \pi\). The likelihood ratio test statistic is
\[
\lambda_{QS} = -2\big[L(0, \pi) - L_{QS}(\hat\gamma, \hat\omega)\big] \tag{7.29}
\]
where
\[
L(0, \pi) = L(\hat\beta, \hat\sigma^2 \mid y) + L(\pi \mid m) \tag{7.30}
\]
The two components of (7.30) are
\[
L(\hat\beta, \hat\sigma^2 \mid y) = -\frac{n}{2}\ln(\hat\sigma^2) - \frac{1}{2\hat\sigma^2}\sum_{j=1}^{n}(y_j - X_j\hat\beta)^2 = -\frac{n}{2}\big[\ln(\hat\sigma^2) + 1\big] \tag{7.31}
\]
and
\[
L(\pi \mid m) = \sum_{j=1}^{n} \ln\Big[\sum_{\kappa=1}^{3} \pi_\kappa \Gamma_1(\kappa, M_j)\Gamma_2(\kappa, N_j)\Big] \tag{7.32}
\]
This hypothesis is rejected if either \(\gamma \neq 0\) or \(\omega \neq \pi\) or both. The QTL effect and the segregation distortion are confounded. This hypothesis may be useful in the following situation. Suppose that, for some reason, we know for sure that the population from which the sample is drawn is a Mendelian population, but the sample is selected based on extreme phenotypes (selective genotyping). The sample is then non-Mendelian with respect to the QTL that control the trait subject to phenotypic selection. Rejecting this hypothesis is equivalent to rejecting the null hypothesis of no QTL, because segregation distortion in the sample is solely caused by selective genotyping. Therefore, this joint test can be used to detect QTL under selective genotyping.
STANDARD ERRORS OF THE ESTIMATED PARAMETERS
Let us define the individual-wise complete-data log likelihood for individual \(j\) as
\[
L_j(\theta, \delta) = -\frac{1}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{\kappa=1}^{3}\delta(G_j, \kappa)(y_j - X_j\beta - H_\kappa\gamma)^2
+ \sum_{\kappa=1}^{3}\delta(G_j, \kappa)\big[\ln\Gamma_1(\kappa, M_j) + \ln\Gamma_2(\kappa, N_j)\big]
+ \sum_{\kappa=1}^{3}\delta(G_j, \kappa)\ln\omega_\kappa \tag{7.33}
\]
where \(\omega_3 = 1 - \omega_1 - \omega_2\), so that \(\omega_3\) is excluded from the parameter vector. To derive the variance-covariance matrix of the estimated parameters, we need to define the score vector \(S_j(\theta, \delta)\) and the Hessian matrix \(H_j(\theta, \delta)\) of the individual-wise complete-data log likelihood function. The Louis (1982) information matrix of the parameters under the EM algorithm is then
\[
I(\hat\theta) = -\sum_{j=1}^{n} E\big[H_j(\theta, \delta)\big] - \sum_{j=1}^{n} \mathrm{var}\big[S_j(\theta, \delta)\big] \tag{7.34}
\]
where the expectation and variance are taken with respect to the missing values \(\delta\). Once the information matrix is defined, the variance matrix of the estimated parameters is simply
\[
\mathrm{var}(\hat\theta) \approx I^{-1}(\hat\theta) \tag{7.35}
\]
The standard error of each parameter simply takes the square root of each
diagonal element of the above matrix.
We now present the score vector and the Hessian matrix. The score vector is denoted by \(S_j(\theta, \delta) = \partial L_j(\theta, \delta)/\partial\theta\), which consists of five blocks, as shown below,
\[
\begin{aligned}
S_j(\beta, \delta) &= \frac{\partial L_j(\theta, \delta)}{\partial \beta} = \frac{1}{\sigma^2} X_j^T \sum_{\kappa=1}^{3}\delta(G_j, \kappa)(y_j - X_j\beta - H_\kappa\gamma) \\
S_j(\gamma, \delta) &= \frac{\partial L_j(\theta, \delta)}{\partial \gamma} = \frac{1}{\sigma^2}\sum_{\kappa=1}^{3}\delta(G_j, \kappa) H_\kappa^T (y_j - X_j\beta - H_\kappa\gamma) \\
S_j(\sigma^2, \delta) &= \frac{\partial L_j(\theta, \delta)}{\partial \sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{\kappa=1}^{3}\delta(G_j, \kappa)(y_j - X_j\beta - H_\kappa\gamma)^2 \\
S_j(\omega_1, \delta) &= \frac{\partial L_j(\theta, \delta)}{\partial \omega_1} = \frac{1}{\omega_1}\delta(G_j, 1) - \frac{1}{1 - \omega_1 - \omega_2}\delta(G_j, 3) \\
S_j(\omega_2, \delta) &= \frac{\partial L_j(\theta, \delta)}{\partial \omega_2} = \frac{1}{\omega_2}\delta(G_j, 2) - \frac{1}{1 - \omega_1 - \omega_2}\delta(G_j, 3)
\end{aligned} \tag{7.36}
\]
Concatenating the above five scores vertically, we obtain the score vector,
\[
S_j(\theta, \delta) = \begin{bmatrix} S_j(\beta, \delta) \\ S_j(\gamma, \delta) \\ S_j(\sigma^2, \delta) \\ S_j(\omega_1, \delta) \\ S_j(\omega_2, \delta) \end{bmatrix} \tag{7.37}
\]
The Hessian matrix is denoted by \(H_j(\theta, \delta) = \partial^2 L_j(\theta, \delta)/\partial\theta\,\partial\theta^T\), which is block diagonal with the non-zero blocks given below,
\[
\begin{aligned}
H_j(\beta\beta) &= -\frac{1}{\sigma^2} X_j^T X_j \\
H_j(\beta\gamma) &= -\frac{1}{\sigma^2} X_j^T \sum_{\kappa=1}^{3}\delta(G_j, \kappa) H_\kappa \\
H_j(\beta\sigma^2) &= -\frac{1}{\sigma^4} X_j^T \sum_{\kappa=1}^{3}\delta(G_j, \kappa)(y_j - X_j\beta - H_\kappa\gamma) \\
H_j(\gamma\gamma) &= -\frac{1}{\sigma^2}\sum_{\kappa=1}^{3}\delta(G_j, \kappa) H_\kappa^T H_\kappa \\
H_j(\gamma\sigma^2) &= -\frac{1}{\sigma^4}\sum_{\kappa=1}^{3}\delta(G_j, \kappa) H_\kappa^T (y_j - X_j\beta - H_\kappa\gamma) \\
H_j(\sigma^2\sigma^2) &= \frac{1}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{\kappa=1}^{3}\delta(G_j, \kappa)(y_j - X_j\beta - H_\kappa\gamma)^2 \\
H_j(\omega_1\omega_1) &= -\frac{1}{\omega_1^2}\delta(G_j, 1) - \frac{1}{(1 - \omega_1 - \omega_2)^2}\delta(G_j, 3) \\
H_j(\omega_1\omega_2) &= -\frac{1}{(1 - \omega_1 - \omega_2)^2}\delta(G_j, 3) \\
H_j(\omega_2\omega_2) &= -\frac{1}{\omega_2^2}\delta(G_j, 2) - \frac{1}{(1 - \omega_1 - \omega_2)^2}\delta(G_j, 3)
\end{aligned} \tag{7.38}
\]
The full Hessian matrix is assembled as
\[
H_j(\theta, \delta) = \begin{bmatrix}
H_j(\beta\beta) & H_j(\beta\gamma) & H_j(\beta\sigma^2) & 0 & 0 \\
H_j^T(\beta\gamma) & H_j(\gamma\gamma) & H_j(\gamma\sigma^2) & 0 & 0 \\
H_j^T(\beta\sigma^2) & H_j^T(\gamma\sigma^2) & H_j(\sigma^2\sigma^2) & 0 & 0 \\
0 & 0 & 0 & H_j(\omega_1\omega_1) & H_j(\omega_1\omega_2) \\
0 & 0 & 0 & H_j^T(\omega_1\omega_2) & H_j(\omega_2\omega_2)
\end{bmatrix} \tag{7.39}
\]
The expectation of the Hessian matrix, \(E[H_j(\theta, \delta)]\), and the variance matrix of the score vector, \(\mathrm{var}[S_j(\theta, \delta)]\), can be expressed explicitly because both are linear functions of the missing values
\[
\delta_j = \big[\delta(G_j, 1)\;\; \delta(G_j, 2)\;\; \delta(G_j, 3)\big]^T \tag{7.40}
\]
Therefore, \(E[H_j(\theta, \delta)]\) and \(\mathrm{var}[S_j(\theta, \delta)]\) can eventually be expressed as functions of the expectation and variance of \(\delta_j\), which have simple forms because \(\delta_j\) is a multinomial variable. The Hessian matrix \(H_j(\theta, \delta)\) is already expressed as a linear function of \(\delta_j\), and thus its expectation is obtained straightforwardly by replacing \(\delta_j\) with \(E(\delta_j)\). An explicit linearity for the score function is not obvious. The following gives the linear relationship using matrix notation. Let us define the matrices

\[
C_{j1} = \frac{1}{\sigma^2}\big[X_j^T(y_j - X_j\beta - H_1\gamma)\;\; X_j^T(y_j - X_j\beta - H_2\gamma)\;\; X_j^T(y_j - X_j\beta - H_3\gamma)\big]
\]
\[
C_{j2} = \frac{1}{\sigma^2}\big[H_1^T(y_j - X_j\beta - H_1\gamma)\;\; H_2^T(y_j - X_j\beta - H_2\gamma)\;\; H_3^T(y_j - X_j\beta - H_3\gamma)\big]
\]
\[
C_{j3} = \frac{1}{2\sigma^4}\big[(y_j - X_j\beta - H_1\gamma)^2\;\; (y_j - X_j\beta - H_2\gamma)^2\;\; (y_j - X_j\beta - H_3\gamma)^2\big]
\]
\[
C_{j4} = \Big[\frac{1}{\omega_1}\;\; 0\;\; -\frac{1}{1 - \omega_1 - \omega_2}\Big], \qquad
C_{j5} = \Big[0\;\; \frac{1}{\omega_2}\;\; -\frac{1}{1 - \omega_1 - \omega_2}\Big] \tag{7.41}
\]
The score vector in matrix notation is
\[
S_j(\theta, \delta) = \begin{bmatrix} 0 \\ 0 \\ -\dfrac{1}{2\sigma^2} \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} C_{j1} \\ C_{j2} \\ C_{j3} \\ C_{j4} \\ C_{j5} \end{bmatrix}\delta_j = \begin{bmatrix} 0 \\ 0 \\ -\dfrac{1}{2\sigma^2} \\ 0 \\ 0 \end{bmatrix} + C_j\delta_j \tag{7.42}
\]
As a result,
\[
\mathrm{var}\big[S_j(\theta, \delta)\big] = C_j\,\mathrm{var}(\delta_j)\,C_j^T \tag{7.43}
\]
where
\[
\mathrm{var}(\delta_j) = \mathrm{diag}\big[E(\delta_j)\big] - E(\delta_j)E(\delta_j)^T \tag{7.44}
\]
is the variance-covariance matrix of the multinomial variable \(\delta_j\).
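Equation (7.44) is the standard multinomial covariance; a two-line sketch:

```python
import numpy as np

def multinomial_var(p):
    """var(delta_j) of eq. (7.44) for posterior genotype probabilities p."""
    p = np.asarray(p, dtype=float)
    return np.diag(p) - np.outer(p, p)
```

Its rows sum to zero, reflecting the constraint that the three genotype indicators sum to one.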

INTERVAL MAPPING FOR MULTIPLE TRAITS
Multiple traits are measured in virtually all line-crossing experiments of QTL mapping. Yet almost all data collected for multiple traits are analyzed separately, trait by trait. Joint analysis of multiple traits has shed new light on QTL mapping by improving the statistical power of QTL detection and increasing the accuracy of QTL localization when the different traits segregating in the mapping population are genetically related. Joint analysis of multiple traits is defined as a method that includes all traits simultaneously in a single model, rather than analyzing one trait at a time and reporting the results in a format that merely appears to be a multiple-trait analysis. In addition to the increased power and resolution of QTL detection, joint mapping can provide insights into fundamental genetic mechanisms underlying trait relationships, such as pleiotropy versus close linkage and genotype-by-environment (GE) interaction, which would otherwise be difficult to address if traits were analyzed separately. The current version of PROC QTL can perform interval mapping for multiple continuously distributed traits. If one or more traits in the multiple-trait set are ordinal, you must use the Bayesian method to perform the multivariate analysis, which will be introduced later in the manual. Details of the multivariate QTL mapping can be found in Xu et al. (2005a).
MULTIVARIATE MODEL
Let \(n\) be the sample size of a mapping population, say an \(F_2\) cross, and \(m\) be the number of traits. Let \(y_j = [y_{j1} \cdots y_{jm}]^T\) be an \(m \times 1\) column vector of the phenotypic values of the \(m\) traits measured on individual \(j\), for \(j = 1, \ldots, n\). The linear model for \(y_j\) can be described as
\[
y_j = \beta X_j + \gamma Z_j + \epsilon_j \tag{8.1}
\]
where \(\beta\) is an \(m \times p\) matrix of effects not related to the QTL, \(X_j\) is a \(p \times 1\) design vector connecting the non-QTL effects to the phenotypic values, \(\gamma\) is an \(m \times 2\) matrix of QTL effects, \(Z_j\) is a \(2 \times 1\) vector determined by the QTL genotype of individual \(j\), and \(\epsilon_j\) is an \(m \times 1\) vector of residual errors. The QTL effect matrix is defined as
\[
\gamma = \begin{bmatrix} a_1 & d_1 \\ a_2 & d_2 \\ \vdots & \vdots \\ a_m & d_m \end{bmatrix} \tag{8.2}
\]
where \(a_i\) and \(d_i\) are the additive and dominance effects of the \(i\)th trait, \(i = 1, \ldots, m\). The vector \(Z_j = [Z_{j1}\; Z_{j2}]^T\) is defined as
\[
Z_{j1} = \begin{cases} 1 & \text{for } A_1A_1 \\ 0 & \text{for } A_1A_2 \\ -1 & \text{for } A_2A_2 \end{cases}
\qquad
Z_{j2} = \begin{cases} 0 & \text{for } A_1A_1 \\ 1 & \text{for } A_1A_2 \\ 0 & \text{for } A_2A_2 \end{cases} \tag{8.3}
\]
In matrix notation, \(Z_j\) can take one of the three columns of matrix \(H\),
\[
H = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix} \tag{8.4}
\]
In other words, \(Z_j\) is defined as
\[
Z_j = \begin{cases} H_1 & \text{for } A_1A_1 \\ H_2 & \text{for } A_1A_2 \\ H_3 & \text{for } A_2A_2 \end{cases} \tag{8.5}
\]
where \(H_k\) is the \(k\)th column of matrix \(H\). The residual error \(\epsilon_j\) is assumed to be multivariate normal,
\[
\epsilon_j \sim N(0, \Sigma) \tag{8.6}
\]
where \(\Sigma\) is an \(m \times m\) positive definite covariance matrix.
LEAST SQUARE METHOD
There are two methods that users can select for multivariate QTL mapping. The first is the so-called least squares (LS) method, which is the multivariate version (KNOTT and HALEY 2000) of the Haley-Knott method (1992). Under the LS method, the model is
\[
y_j = \beta X_j + \gamma U_j + \epsilon_j \tag{8.7}
\]
where
\[
U_j = \sum_{k=1}^{3} p_j(k-2) H_k \tag{8.8}
\]
is the expectation of \(Z_j\) conditional on the marker information. The LS estimates of the parameters are obtained as follows. Let \(\theta = [\beta\;\; \gamma]\) be the horizontal concatenation of matrices \(\beta\) and \(\gamma\), i.e., \(\theta\) is an \(m \times (p+2)\) matrix combining \(\beta\) and \(\gamma\) side by side. Let \(W_j = [X_j^T\;\; U_j^T]^T\) be the vertical concatenation of vectors \(X_j\) and \(U_j\). The linear model given in equation (8.7) is rewritten as
\[
y_j = \theta W_j + \epsilon_j \tag{8.9}
\]
This model provides an easy expression for the LS estimates of the parameters,
\[
\hat\theta = \Big[\sum_{j=1}^{n} y_j W_j^T\Big]\Big[\sum_{j=1}^{n} W_j W_j^T\Big]^{-1} \tag{8.10}
\]
and
\[
\hat\Sigma = \frac{1}{n}\sum_{j=1}^{n}(y_j - \hat\theta W_j)(y_j - \hat\theta W_j)^T \tag{8.11}
\]
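Equations (8.10)-(8.11) amount to an ordinary multivariate regression of \(y_j\) on \(W_j\); an illustrative sketch (not PROC QTL code):

```python
import numpy as np

def ls_estimates(Y, W):
    """LS estimates of eqs. (8.10)-(8.11).

    Y: n x m matrix of phenotypes (rows are y_j^T); W: n x (p+2) matrix of
    regressors (rows are W_j^T). Returns theta (m x (p+2)) and Sigma (m x m).
    """
    theta = np.linalg.solve(W.T @ W, W.T @ Y).T
    R = Y - W @ theta.T                       # residual matrix
    Sigma = R.T @ R / len(Y)
    return theta, Sigma
```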
Since the variance-covariance matrix of the estimated parameters \(\hat\theta\) requires a complicated rearrangement of the elements within \(\hat\theta\), we did not provide the Wald test statistic for the QTL effects; rather, we only give the likelihood ratio test statistic, which is defined as
\[
\lambda = -2(L_0 - L_1) \tag{8.12}
\]
where
\[
L_1 = -\frac{n}{2}\ln|\hat\Sigma| - \frac{1}{2}\sum_{j=1}^{n}(y_j - \hat\theta W_j)^T\hat\Sigma^{-1}(y_j - \hat\theta W_j) \tag{8.13}
\]
and
\[
L_0 = -\frac{n}{2}\ln|\tilde\Sigma| - \frac{1}{2}\sum_{j=1}^{n}(y_j - \tilde\beta X_j)^T\tilde\Sigma^{-1}(y_j - \tilde\beta X_j) \tag{8.14}
\]
The estimated parameters under the reduced model are
\[
\tilde\beta = \Big[\sum_{j=1}^{n} y_j X_j^T\Big]\Big[\sum_{j=1}^{n} X_j X_j^T\Big]^{-1} \tag{8.15}
\]
and
\[
\tilde\Sigma = \frac{1}{n}\sum_{j=1}^{n}(y_j - \tilde\beta X_j)(y_j - \tilde\beta X_j)^T \tag{8.16}
\]
MAXIMUM LIKELIHOOD METHOD
Let us denote the probability density of \(y_j\) given the genotype of the QTL by
\[
f(y_j \mid k) = \frac{1}{|\Sigma|^{1/2}} \exp\Big[-\frac{1}{2}(y_j - \beta X_j - \gamma H_k)^T\Sigma^{-1}(y_j - \beta X_j - \gamma H_k)\Big] \tag{8.17}
\]
The log likelihood function of the parameters is
\[
L(\theta, \Sigma) = \sum_{j=1}^{n} \ln\Big[\sum_{k=1}^{3} p_j(k-2) f(y_j \mid k)\Big] \tag{8.18}
\]
The MLEs of the parameters are obtained via the EM algorithm described below. In the maximization step, the parameters are updated using the following equations,
\[
\beta = \Big[\sum_{j=1}^{n}\big(y_j - \gamma E(Z_j)\big)X_j^T\Big]\Big[\sum_{j=1}^{n} X_j X_j^T\Big]^{-1} \tag{8.19}
\]
\[
\gamma = \Big[\sum_{j=1}^{n}(y_j - \beta X_j)E(Z_j)^T\Big]\Big[\sum_{j=1}^{n} E(Z_j Z_j^T)\Big]^{-1} \tag{8.20}
\]
\[
\Sigma = \frac{1}{n}\sum_{j=1}^{n} E\big[(y_j - \beta X_j - \gamma Z_j)(y_j - \beta X_j - \gamma Z_j)^T\big] \tag{8.21}
\]
In the expectation step, we calculate the posterior probabilities of the QTL genotypes and the posterior expectations involved in the above three equations. The posterior probability of the QTL genotype is
\[
p_j^*(k-2) = \frac{p_j(k-2) f(y_j \mid k)}{\sum_{k'=1}^{3} p_j(k'-2) f(y_j \mid k')} \tag{8.22}
\]
The posterior expectations are
\[
E(Z_j) = \sum_{k=1}^{3} p_j^*(k-2) H_k \tag{8.23}
\]
\[
E(Z_j Z_j^T) = \sum_{k=1}^{3} p_j^*(k-2) H_k H_k^T \tag{8.24}
\]
and
\[
E\big[(y_j - \beta X_j - \gamma Z_j)(y_j - \beta X_j - \gamma Z_j)^T\big] = \sum_{k=1}^{3} p_j^*(k-2)(y_j - \beta X_j - \gamma H_k)(y_j - \beta X_j - \gamma H_k)^T \tag{8.25}
\]
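For one individual, the posterior expectations of equations (8.23)-(8.24) are simple probability-weighted sums over the columns of \(H\); a sketch with illustrative names:

```python
import numpy as np

def posterior_expectations(pstar, H):
    """E(Z_j) of eq. (8.23) and E(Z_j Z_j^T) of eq. (8.24).

    pstar: length-3 posterior genotype probabilities; H: 2 x 3 matrix
    whose columns are H_1, H_2, H_3.
    """
    EZ = H @ np.asarray(pstar)
    EZZ = sum(pstar[k] * np.outer(H[:, k], H[:, k]) for k in range(3))
    return EZ, EZZ
```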
HYPOTHESIS TESTING
PROC QTL only reports the likelihood ratio test statistic for multivariate mapping. We have described the likelihood ratio test statistic when the LS method is used. The log likelihood value under the null model, \(L_0\), is calculated using exactly the same formula as in the LS method. The likelihood value under the full model for the maximum likelihood method is
\[
L_1 = L(\hat\theta, \hat\Sigma) = \sum_{j=1}^{n} \ln\Big[\sum_{k=1}^{3} p_j(k-2) \hat f(y_j \mid k)\Big] \tag{8.26}
\]
where
\[
\hat f(y_j \mid k) = \frac{1}{|\hat\Sigma|^{1/2}} \exp\Big[-\frac{1}{2}(y_j - \hat\beta X_j - \hat\gamma H_k)^T\hat\Sigma^{-1}(y_j - \hat\beta X_j - \hat\gamma H_k)\Big] \tag{8.27}
\]
The likelihood ratio test statistic is then given by
\[
\lambda = -2(L_0 - L_1) \tag{8.28}
\]
The QTL procedure only provides the likelihood ratio test for the overall QTL effect; there is no separate test for the additive or dominance effects.

BAYESIAN SHRINKAGE METHOD FOR QTL
MAPPING
Methods of interval mapping include the least square method (HALEY and
KNOTT 1992), the weighted least square method (XU 1998a, b), the
maximum likelihood method (LANDER and BOTSTEIN 1989) and the Fisher
scoring method (HAN and XU 2008). All these methods were originally
developed based on the single QTL model. Although interval mapping
(under the single QTL model) can detect multiple QTL by evaluating the
number of peaks in the test statistic profile, it cannot provide accurate
estimates of QTL effects. The best way to handle multiple QTL is to use a
multiple QTL model. Such a model requires knowledge of the number of
QTL. Most QTL mappers consider that the number of QTL is an important
parameter and should be estimated in QTL analysis. Therefore, model
selection is often conducted to determine the number of QTL (YI et al. 2003).
Under the Bayesian framework, model selection is implemented through the
reversible jump MCMC algorithm (YI and XU 2002). We (WANG et al. 2005b;
XU 2003), however, believed that the number of QTL is not an important
parameter. As a result, we proposed a model that includes as many QTL as
the model can handle. Such a model is called an oversaturated model
(WANG et al. 2005b). Some of the proposed QTL may be real, but most of
them are spurious. As long as we can force the spurious QTL to have
estimated effects at or near zero, the oversaturated model is considered
satisfactory. The selective shrinkage Bayesian method produces exactly this
result: spurious QTL effects are shrunken to zero, whereas true QTL effects
are subject to virtually no shrinkage. In this chapter, we describe the
Bayesian method for multiple QTL mapping, but only for a single trait. The
multiple-trait Bayesian method will be given in the next chapter.
MULTIPLE QTL MODEL
The multiple QTL model can be described as

    y_j = \sum_{i=1}^{p} X_{ji}\beta_i + \sum_{k=1}^{q} Z_{jk}\gamma_k + \epsilon_j    (9.1)

where y_j is the phenotypic value of a trait for individual j, for
j = 1, \dots, n, and n is the sample size. The non-QTL effects are included
in the vector \beta = \{\beta_1, \dots, \beta_p\}, with
X_j = \{X_{j1}, \dots, X_{jp}\} being the design vector that connects \beta
to y_j. The effect of the k th QTL is denoted by \gamma_k for
k = 1, \dots, q, where q is the proposed number of QTL in the model. The
variables Z_j = \{Z_{j1}, \dots, Z_{jq}\} are determined by the genotypes of
the proposed QTL in the model. The residual errors \epsilon_j are assumed to
be i.i.d. N(0, \sigma^2). Let us use a BC population as an example. For the
k th QTL, Z_{jk} = 1 for one genotype and Z_{jk} = -1 for the other
genotype. Extension to an F_2 population and inclusion of dominance effects
are straightforward (they only require adding more QTL effects and
increasing the model dimension). The proposed number of QTL, q, must be
larger than the true number of QTL to make sure that no QTL will be missed.
The optimal strategy is to put one QTL in every d cM of the genome, where
d can be any value between 5 and 50. If d < 5, the model may be
ill-conditioned due to multicollinearity. If d > 50, some genome regions may
not be visited by the proposed QTL even if true QTL are located in those
regions. Of course, a larger sample size is required to handle a larger
model (more QTL).
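The allocation rule just described (one proposed QTL in roughly every d cM of each chromosome) can be sketched as follows. The function `place_qtl` and the chromosome lengths are hypothetical, invented purely for illustration.

```python
def place_qtl(chrom_lengths_cM, d=20.0):
    """Return (chromosome, position) pairs with proposed QTL spaced
    roughly d cM apart, covering each chromosome evenly."""
    positions = []
    for c, length in enumerate(chrom_lengths_cM, start=1):
        q_c = max(1, int(round(length / d)))  # QTL allocated to chromosome c
        step = length / q_c
        # center each QTL in its own segment so the chromosome is covered
        positions += [(c, step * (k + 0.5)) for k in range(q_c)]
    return positions

# two hypothetical chromosomes of 100 cM and 60 cM, one QTL per 20 cM
qtl = place_qtl([100.0, 60.0], d=20.0)
print(len(qtl))  # 5 + 3 = 8 proposed QTL
```

The returned positions then define the oversaturated model; the shrinkage prior described below decides which of them carry real effects.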
PRIOR, LIKELIHOOD AND POSTERIOR
The data involved in QTL mapping include the phenotypic values of the trait
and the marker genotypes for all individuals in the mapping population.
Unlike Wang et al. (2005b), who expressed marker genotypes explicitly as
data in the likelihood, here we suppress the marker genotypes from the data
to simplify the notation. The linkage map of the markers and the marker
genotypes only affect the way the QTL genotypes are calculated. We first use
the multipoint method to calculate the genotype probabilities for all
putative loci of the genome. These probabilities are then treated as the
prior probabilities of the QTL genotypes, from which the posterior
probabilities are calculated by incorporating the phenotype and the current
parameter values. Therefore, the data used to construct the likelihood are
represented by y = \{y_1, \dots, y_n\}.
The vector of parameters is denoted by \theta, which consists of the
positions of the proposed QTL, \lambda = \{\lambda_1, \dots, \lambda_q\},
the effects of the QTL, \gamma = \{\gamma_1, \dots, \gamma_q\}, the non-QTL
effects, \beta = \{\beta_1, \dots, \beta_p\}, and the residual error
variance \sigma^2. Therefore,
\theta = \{\lambda, \gamma, \beta, \sigma^2, \sigma_\gamma^2\}, where
\sigma_\gamma^2 = \{\sigma_1^2, \dots, \sigma_q^2\} will be defined later.
The QTL genotypes Z_j = \{Z_{j1}, \dots, Z_{jq}\} are not parameters but
missing values. The missing genotypes can be redundantly expressed as
\delta_j = \{\delta_{j1}, \dots, \delta_{jq}\}, where
\delta_{jk} = \delta(G_{jk}, g) is the \delta function. If G_{jk} = g, then
\delta(G_{jk}, g) = 1; otherwise \delta(G_{jk}, g) = 0, where G_{jk} is the
genotype of the k th QTL for individual j and g = 1, 2 for a BC population
(two possible genotypes per locus). The probability density of \delta_j is

    p(\delta_j \mid \lambda) = \prod_{k=1}^{q} p(\delta_{jk} \mid \lambda_k)    (9.2)
The independence of the QTL genotypes across loci is due to the fact that
these are conditional probabilities given the marker information. The marker
information has thus already entered the model to infer the QTL genotypes.
The prior for \beta is

    p(\beta) = \prod_{i=1}^{p} p(\beta_i) = \text{constant}    (9.3)

This is a uniform prior or, more appropriately, an uninformative prior. The
reason for choosing an uninformative prior for \beta is that the
dimensionality of \beta is usually very low, so \beta can be precisely
estimated from the data alone without resorting to any prior knowledge. The
prior for the QTL effects is

    p(\gamma \mid \sigma_\gamma^2) = \prod_{k=1}^{q} p(\gamma_k \mid \sigma_k^2) = \prod_{k=1}^{q} N(\gamma_k \mid 0, \sigma_k^2)    (9.4)
where \sigma_k^2 is the variance of the prior distribution for the k th QTL
effect. Collectively, these variances are denoted by
\sigma_\gamma^2 = \{\sigma_1^2, \dots, \sigma_q^2\}. This is a highly
informative prior because of the zero expectation of the prior distribution.
The variance of the prior distribution determines the relative weights of
the prior information and the data. If \sigma_k^2 is very small, the prior
will dominate the data, and the estimated \gamma_k will be shrunken towards
the prior expectation, that is, zero. If \sigma_k^2 is large, the data will
dominate the prior, so the estimated \gamma_k will be largely unaltered
(subject to no shrinkage). The key difference between this prior and the
prior commonly used in Bayesian regression analysis is that each regression
coefficient has a different prior variance and thus a different level of
shrinkage. Therefore, this method is also called the selective shrinkage
method. The classical Bayesian regression method, however, often uses a
common prior for all regression coefficients, i.e.,
\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_q^2. The problem with the
selective shrinkage method is that there are too many prior variances, and
it is hard to choose appropriate values for them. There are two approaches
to choosing the prior variances: empirical Bayes (XU 2007b) and hierarchical
modeling (GELMAN 2006). The empirical Bayes approach estimates the prior
variances under the mixed model methodology by treating each regression
coefficient as a random effect. The hierarchical modeling approach treats
the prior variances as parameters and assigns a higher level prior to each
variance component. By treating the variances as parameters, rather than as
hyper-parameters, we can estimate the variances along with the regression
coefficients. Here, we take the hierarchical model approach and assign each
\sigma_k^2 a prior distribution. The scaled inverse chi-square distribution
is chosen for each variance component,

    p(\sigma_k^2) = \text{Inv-}\chi^2(\sigma_k^2 \mid \tau, \omega), \quad k = 1, \dots, q    (9.5)

The degree of freedom \tau and the scale parameter \omega are
hyper-parameters, and their influence on the estimated regression
coefficients is much weaker because it is channeled through the
\sigma_k^2. It is now easy to choose \tau and \omega.
The degree of freedom \tau is also called the prior belief. Although a
proper prior should have \tau > 0 and \omega > 0, our past experience showed
that an improper prior works better than a proper one. Therefore, we choose
\tau = \omega = 0 as the default, which leads to

    p(\sigma_k^2) \propto \frac{1}{\sigma_k^2}, \quad k = 1, \dots, q    (9.6)

Users do have an option to choose a different set of hyper-parameters. The
joint prior for all the \sigma_k^2 is

    p(\sigma_\gamma^2) = \prod_{k=1}^{q} p(\sigma_k^2)    (9.7)

The residual error variance is also assigned the improper prior,

    p(\sigma^2) \propto \frac{1}{\sigma^2}    (9.8)
The positions of the QTL depend on the number of QTL proposed, the number of
chromosomes and the size of each chromosome. Based on the average coverage
per QTL (e.g., 30 cM per QTL), the number of QTL allocated to each
chromosome can be easily calculated. Let q_c be the number of QTL proposed
for chromosome c. These q_c QTL should be placed evenly along the
chromosome. We can keep the positions fixed throughout the MCMC process, so
that the positions are simply constants (not parameters of interest). In
this case, more QTL should be proposed to make sure that the genome is well
covered by QTL. The alternative, and also more efficient, approach is to
allow the QTL positions to move along the genome during the MCMC process.
There is a restriction on the moving range of each QTL: the positions are
disjoint along the chromosome. The first QTL must move between the first
marker and the second QTL. The last QTL must move between the last marker
and the second-to-last QTL. All other QTL must move between the QTL to the
left and the QTL to the right of the current QTL, i.e., the QTL that flank
the current QTL. Based on this search strategy, the joint prior probability
of the QTL positions is

    p(\lambda) = p(\lambda_1)\, p(\lambda_2 \mid \lambda_1) \cdots p(\lambda_{q_c} \mid \lambda_{q_c - 1})    (9.9)

Given the positions of all other QTL, the conditional probability of the
position of QTL k is

    p(\lambda_k \mid \cdots) = \frac{1}{\lambda_{k+1} - \lambda_{k-1}}    (9.10)

If QTL k is located at either end of a chromosome, the above prior needs to
be modified by replacing either \lambda_{k-1} or \lambda_{k+1} by the
position of the nearest end marker. We now have a situation where the prior
probability of one variable depends on the values of other variables. This
type of prior is called an adaptive prior.
Since the marker information has been used to calculate the prior
probabilities of the QTL genotypes, the markers are no longer expressed as
data. The only data appearing explicitly in the model are the phenotypic
values of the trait. Conditional on all parameters and the missing values,
the probability density of y_j is normal. Therefore, the joint probability
density of all the y_j's (called the likelihood) is

    p(y \mid \theta, \delta) = \prod_{j=1}^{n} p(y_j \mid \theta, \delta_j) = \prod_{j=1}^{n} N\left(y_j \,\Big|\, \sum_{i=1}^{p} X_{ji}\beta_i + \sum_{k=1}^{q} Z_{jk}\gamma_k, \; \sigma^2\right)    (9.11)

The fully conditional posterior of each variable is defined as

    p(\theta_i \mid y, \theta_{-i}, \delta) \propto p(y, \theta_i, \theta_{-i}, \delta)    (9.12)

where \theta_i is a single element of the parameter vector \theta and
\theta_{-i} is the collection of the remaining elements. The symbol \propto
means that an ignored constant irrelevant to the parameter \theta_i has been
dropped. The joint probability density
p(y, \theta_i, \theta_{-i}, \delta) = p(y, \theta, \delta) is expressed as

    p(y, \theta, \delta) = p(y \mid \theta, \delta)\, p(\delta \mid \theta)\, p(\theta) = p(y \mid \theta, \delta)\, p(\delta \mid \lambda)\, p(\lambda)\, p(\gamma \mid \sigma_\gamma^2)\, p(\sigma_\gamma^2)\, p(\beta)\, p(\sigma^2)    (9.13)

The fully conditional posterior probability density for each variable is
simply derived by treating all other variables as constants and comparing
the kernel of the density with a standard distribution. After some algebraic
manipulation, we obtain the fully conditional distributions for most of the
unknown variables (including parameters and missing values).
The fully conditional posterior for the non-QTL effect is

    p(\beta_i \mid \cdots) = N(\beta_i \mid \hat{\beta}_i, \sigma_{\beta_i}^2)    (9.14)

The special notation p(\beta_i \mid \cdots) is used to express the fully
conditional probability density. The three dots after the conditioning bar
mean everything else except the variable of interest. The posterior mean and
posterior variance are calculated using

    \hat{\beta}_i = \left( \sum_{j=1}^{n} X_{ji}^2 \right)^{-1} \sum_{j=1}^{n} X_{ji} \left( y_j - \sum_{i' \neq i}^{p} X_{ji'}\beta_{i'} - \sum_{k=1}^{q} Z_{jk}\gamma_k \right)    (9.15)

and

    \sigma_{\beta_i}^2 = \left( \sum_{j=1}^{n} X_{ji}^2 \right)^{-1} \sigma^2    (9.16)
The fully conditional posterior for the k th QTL effect is

    p(\gamma_k \mid \cdots) = N(\gamma_k \mid \hat{\gamma}_k, \sigma_{\gamma_k}^2)    (9.17)

where

    \hat{\gamma}_k = \left( \sum_{j=1}^{n} Z_{jk}^2 + \frac{\sigma^2}{\sigma_k^2} \right)^{-1} \sum_{j=1}^{n} Z_{jk} \left( y_j - \sum_{i=1}^{p} X_{ji}\beta_i - \sum_{k' \neq k}^{q} Z_{jk'}\gamma_{k'} \right)    (9.18)

and

    \sigma_{\gamma_k}^2 = \left( \sum_{j=1}^{n} Z_{jk}^2 + \frac{\sigma^2}{\sigma_k^2} \right)^{-1} \sigma^2    (9.19)
Comparing the conditional posterior distribution of \beta_i with that of
\gamma_k, we notice the difference between a normal prior and a uniform
prior with respect to their effects on the posterior distributions. When a
normal prior is used, a shrinkage factor, \sigma^2 / \sigma_k^2, is added to
\sum_{j=1}^{n} Z_{jk}^2. If \sigma_k^2 is very large, the shrinkage factor
disappears, meaning no shrinkage. On the other hand, if \sigma_k^2 is small,
the shrinkage factor will dominate \sum_{j=1}^{n} Z_{jk}^2 and, in the
limit, the denominator becomes infinitely large, leading to zero expectation
and zero variance for the conditional posterior distribution of \gamma_k. As
such, the estimated \gamma_k is completely shrunken to zero. The conditional
posterior distribution for each variance component \sigma_k^2 is scaled
inverse chi-square with probability density

    p(\sigma_k^2 \mid \cdots) = \text{Inv-}\chi^2(\sigma_k^2 \mid \tau + 1, \omega + \gamma_k^2) = \text{Inv-}\chi^2(\sigma_k^2 \mid 1, \gamma_k^2)    (9.20)

This conditional posterior is proper, regardless of whether \tau = \omega = 0
or not, and thus Gibbs sampling can be performed. The conditional posterior
density for the residual error variance is

    p(\sigma^2 \mid \cdots) = \text{Inv-}\chi^2(\sigma^2 \mid \tau + n, \omega + SS) = \text{Inv-}\chi^2(\sigma^2 \mid n, SS)    (9.21)

where

    SS = \sum_{j=1}^{n} \left( y_j - \sum_{i=1}^{p} X_{ji}\beta_i - \sum_{k=1}^{q} Z_{jk}\gamma_k \right)^2    (9.22)
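The conditional draws (9.14)-(9.22) can be sketched as one Gibbs sweep over a simulated BC data set. This is a hedged numpy sketch, not PROC QTL's implementation; a scaled Inv-\chi^2(df, S) draw is taken as S divided by a \chi^2_{df} deviate, and all names and the simulated data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def gibbs_pass(y, X, Z, beta, gamma, sig2_k, sig2, rng):
    """One sweep of the conditional draws (9.14)-(9.22).
    X: n x p non-QTL covariates; Z: n x q QTL genotypes coded +/-1."""
    n, p = X.shape
    q = Z.shape[1]
    resid = y - X @ beta - Z @ gamma
    for i in range(p):                      # beta_i, eqs. (9.14)-(9.16)
        resid += X[:, i] * beta[i]          # remove beta_i's contribution
        sxx = X[:, i] @ X[:, i]
        bhat = (X[:, i] @ resid) / sxx
        beta[i] = rng.normal(bhat, np.sqrt(sig2 / sxx))
        resid -= X[:, i] * beta[i]
    for k in range(q):                      # gamma_k with shrinkage, eqs. (9.17)-(9.19)
        resid += Z[:, k] * gamma[k]
        szz = Z[:, k] @ Z[:, k] + sig2 / sig2_k[k]   # shrinkage factor added
        ghat = (Z[:, k] @ resid) / szz
        gamma[k] = rng.normal(ghat, np.sqrt(sig2 / szz))
        resid -= Z[:, k] * gamma[k]
        # sigma_k^2 ~ Inv-chi^2(1, gamma_k^2): drawn as gamma_k^2 / chi^2_1, eq. (9.20)
        sig2_k[k] = gamma[k]**2 / rng.chisquare(1)
    ss = resid @ resid                      # eq. (9.22)
    sig2 = ss / rng.chisquare(n)            # Inv-chi^2(n, SS), eq. (9.21)
    return beta, gamma, sig2_k, sig2

n, p, q = 100, 1, 5
X = np.ones((n, p))                         # intercept only
Z = rng.choice([-1.0, 1.0], size=(n, q))
y = 1.0 * Z[:, 0] + rng.normal(0, 1, n)     # only the first QTL is real
beta, gamma = np.zeros(p), np.zeros(q)
sig2_k, sig2 = np.ones(q), 1.0
for _ in range(200):
    beta, gamma, sig2_k, sig2 = gibbs_pass(y, X, Z, beta, gamma, sig2_k, sig2, rng)
print(gamma.round(2))
```

After a few hundred sweeps the draw for the real QTL stays near its simulated value of 1, while the four spurious effects are shrunken toward zero.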
The next step is to sample the QTL genotypes, which determine the values of
Z_j. Let us again use a BC population as an example and consider sampling
the k th QTL genotype given that every other variable is known. There are
two sources of information available to infer the probability of each of the
two genotypes of the QTL. One source comes from the markers, denoted by
p_j(+1) and p_j(-1) for the two genotypes, where p_j(+1) + p_j(-1) = 1.
These two probabilities are calculated from the multipoint method (JIANG
and ZENG 1997). The other source of information comes from the phenotypic
value. The connection between the phenotypic value and the QTL genotype is
through the probability density of y_j given the QTL genotype. For the two
alternative genotypes of the QTL, i.e., Z_{jk} = +1 and Z_{jk} = -1, the two
probability densities are

    p(y_j \mid Z_{jk} = +1, \cdots) = N\left(y_j \,\Big|\, \sum_{i=1}^{p} X_{ji}\beta_i + \sum_{k' \neq k}^{q} Z_{jk'}\gamma_{k'} + \gamma_k, \; \sigma^2\right)
    p(y_j \mid Z_{jk} = -1, \cdots) = N\left(y_j \,\Big|\, \sum_{i=1}^{p} X_{ji}\beta_i + \sum_{k' \neq k}^{q} Z_{jk'}\gamma_{k'} - \gamma_k, \; \sigma^2\right)    (9.23)

Therefore, the conditional posterior probabilities of the two genotypes of
the QTL are

    p_j^*(+1) = \frac{p_j(+1)\, p(y_j \mid Z_{jk} = +1, \cdots)}{p_j(+1)\, p(y_j \mid Z_{jk} = +1, \cdots) + p_j(-1)\, p(y_j \mid Z_{jk} = -1, \cdots)}
    p_j^*(-1) = \frac{p_j(-1)\, p(y_j \mid Z_{jk} = -1, \cdots)}{p_j(+1)\, p(y_j \mid Z_{jk} = +1, \cdots) + p_j(-1)\, p(y_j \mid Z_{jk} = -1, \cdots)}    (9.24)

where p_j^*(+1) = p(Z_{jk} = +1 \mid \cdots) and
p_j^*(-1) = p(Z_{jk} = -1 \mid \cdots). The genotype of the QTL is then
Z_{jk} = 2u - 1, where u is sampled from a Bernoulli distribution with
probability p_j^*(+1).
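The genotype draw of equations (9.23)-(9.24) can be sketched for a single individual and locus as follows. The function name and the numerical example are invented for illustration; the normalizing constant of the normal density cancels in the ratio, so only the exponential kernels are needed.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_genotype(y_j, mu_rest, gamma_k, sig2, p_plus, rng):
    """Draw Z_jk in {-1, +1} from eq. (9.24).
    mu_rest = sum_i X_ji beta_i + sum_{k' != k} Z_jk' gamma_k' (all other terms);
    p_plus = multipoint prior Pr(Z_jk = +1)."""
    dens = lambda m: np.exp(-(y_j - m)**2 / (2 * sig2))  # shared constant cancels
    w_plus = p_plus * dens(mu_rest + gamma_k)            # eq. (9.23), Z = +1
    w_minus = (1 - p_plus) * dens(mu_rest - gamma_k)     # eq. (9.23), Z = -1
    p_star = w_plus / (w_plus + w_minus)                 # eq. (9.24)
    u = rng.random() < p_star                            # Bernoulli(p_star)
    return 2 * int(u) - 1                                # Z = 2u - 1

# With a phenotype far above the '+1' genotype mean, the posterior should put
# almost all mass on Z = +1 even under an uninformative prior of 0.5.
draws = [sample_genotype(y_j=2.0, mu_rest=0.0, gamma_k=1.0,
                         sig2=1.0, p_plus=0.5, rng=rng) for _ in range(500)]
print(np.mean(draws))
```

The average of the draws is close to +1, reflecting a posterior probability p_j^*(+1) of about 0.98 in this configuration.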
FIXED INTERVAL
So far we have completed the sampling process for all variables except the
QTL positions. If we place a large number of QTL evenly along the genome,
say one QTL in every 20 cM, we can keep the positions fixed (not moving)
across the entire MCMC process. Although this fixed-position approach does
not generate the most accurate results, it does provide general information
about the regions where the QTL are located. Suppose that the trait of
interest is controlled by only 5 QTL and we place 100 QTL evenly on the
genome; then the majority of the assumed QTL are spurious. The Bayesian
shrinkage method allows the effects of the spurious QTL to be shrunken to
zero. This is why the Bayesian shrinkage method does not need variable
selection. A QTL with an estimated effect close to zero is equivalent to one
excluded from the model. When the assumed QTL positions are fixed,
investigators usually prefer to put the QTL at the marker positions, because
the marker positions contain the maximum information. This multiple marker
analysis is recommended before conducting a detailed fully Bayesian analysis
with moving QTL positions. The result of the detailed analysis will be more
or less the same as that of the multiple marker analysis, so the detailed
analysis is best conducted after the investigators have obtained a general
picture of the result.
RANDOM WALK
We now discuss several different ways to allow the QTL positions to move
across the genome. If the purpose of QTL mapping is to find the regions of
the genome that most likely carry QTL, the number of QTL is irrelevant, and
so are the QTL identities. If we allow the QTL positions to move, the most
important information we want to capture is how many times a particular
segment (position) of the genome is hit, or visited, by non-spurious QTL. If
a position is visited many times by different QTL but all these QTL have
negligible effects, that position is not of interest. We are interested in
positions that are visited repeatedly by large QTL. Keeping this in mind, we
propose the first strategy of QTL moving: the random walk strategy. We start
with a "sufficient" number of QTL evenly placed on the genome. How
sufficient is sufficient? This perhaps depends on the marker density and the
sample size of the mapping population. Putting one QTL in every 20 cM seems
to work well. Each QTL is allowed to travel freely between the QTL to its
left and the QTL to its right. In other words, the QTL are distributed along
the genome in a disjoint manner. The positions of the QTL move, but the
order of the QTL is preserved. This is the simplest method of QTL travel.
Take the k th QTL for example, and let the current position of the QTL be
denoted by \lambda_k. The new position is sampled as

    \lambda_k^* = \lambda_k + \delta    (9.25)

where \delta \sim U(-\Delta, \Delta) and \Delta is the maximum distance (in
cM) that the QTL is allowed to move away from the current position at each
step. The restriction \lambda_{k-1} < \lambda_k^* < \lambda_{k+1} is
enforced to make sure that the order of the QTL does not change.
Empirically, \Delta = 2 cM seems to work well. The new position is always
accepted, regardless of whether it is more or less likely to carry a true
QTL than the current position. The Markov chain should be sufficiently long
to make sure all putative positions are visited a number of times.
Theoretically, there is no need to enforce the disjoint distribution of the
QTL positions. The only reason for the restriction is the convenience of
programming when the order is preserved. With the random walk strategy of
QTL moving, the frequency of hits by QTL at a position is not of interest;
instead, the average effect of all the QTL hitting the position is the
important information. The random walk approach does not distinguish between
"hot regions" (regions containing QTL) and "cold regions" (regions without
QTL) of the genome. All regions are visited with equal frequency. The hot
regions, however, should ideally be visited more often than the cold regions
to obtain a more accurate estimate of the average QTL effects in those
regions. Because the random walk approach does not discriminate against the
cold regions, it needs a very long Markov chain to make sure the hot regions
are sufficiently visited for accurate estimation of the QTL effects.
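The random walk move of equation (9.25), with the order restriction on neighboring QTL, can be sketched as follows. Re-drawing a step that would violate the restriction is one simple way to enforce it; the function name and positions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_walk_step(lam, k, delta_max, rng):
    """Propose a new position for QTL k by a uniform step of at most
    delta_max cM, re-drawing any move that would violate the order
    restriction lam[k-1] < lam[k] < lam[k+1]."""
    left, right = lam[k - 1], lam[k + 1]
    while True:
        new = lam[k] + rng.uniform(-delta_max, delta_max)  # eq. (9.25)
        if left < new < right:        # keep the QTL order intact
            return new

# three interior QTL flanked by the chromosome ends at 0 and 100 cM
lam = [0.0, 25.0, 50.0, 75.0, 100.0]
for _ in range(1000):
    for k in (1, 2, 3):
        lam[k] = random_walk_step(lam, k, delta_max=2.0, rng=rng)
print(lam[0] < lam[1] < lam[2] < lam[3] < lam[4])
```

After any number of sweeps the three interior positions remain ordered and inside the chromosome, which is exactly the disjointness the text requires.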
MOVING INTERVAL
The optimal strategy for QTL moving is to allow the QTL to visit the hot
regions more often than the cold regions. This sampling strategy cannot be
accomplished using the Gibbs sampler (GEMAN and GEMAN 1984), because the
conditional posterior of the position of a QTL does not have a well-known
form of distribution. Therefore, the Metropolis-Hastings algorithm
(HASTINGS 1970; METROPOLIS et al. 1953) is adopted here to sample the QTL
positions. Again, a new position is randomly generated in the neighborhood
of the old position, using the same approach as in the random walk, but the
new position \lambda_k^* is only accepted with a certain probability. The
acceptance probability is determined by the Metropolis-Hastings rule,
denoted by \min[1, \alpha(\lambda_k, \lambda_k^*)]. The new position
\lambda_k^* has a 1 - \min[1, \alpha(\lambda_k, \lambda_k^*)] chance of
being rejected, where

    \alpha(\lambda_k, \lambda_k^*) = \frac{\left[\prod_{j=1}^{n} \sum_{l=-1}^{+1} \Pr(Z_{jk} = l \mid \lambda_k^*)\, p(y_j \mid Z_{jk} = l, \theta)\right] p(\theta, \lambda_k^*)\, q(\lambda_k \mid \lambda_k^*)}{\left[\prod_{j=1}^{n} \sum_{l=-1}^{+1} \Pr(Z_{jk} = l \mid \lambda_k)\, p(y_j \mid Z_{jk} = l, \theta)\right] p(\theta, \lambda_k)\, q(\lambda_k^* \mid \lambda_k)}    (9.26)
If the new position is rejected, the QTL remains at the current position,
i.e., we set \lambda_k^* = \lambda_k. If the new position is accepted, the
old position is replaced by the new position, i.e., \lambda_k = \lambda_k^*.
Whether the new position is accepted or not, all other variables are updated
based on the information from position \lambda_k^*. In equation (9.26),
\Pr(Z_{jk} = -1 \mid \lambda_k) and \Pr(Z_{jk} = +1 \mid \lambda_k) are the
conditional probabilities that Z_{jk} = -1 and Z_{jk} = +1, respectively,
calculated from the multipoint method. These probabilities depend on the
position \lambda_k. Previously, these probabilities were denoted by
p_j(-1) = \Pr(Z_{jk} = -1 \mid \lambda_k) and
p_j(+1) = \Pr(Z_{jk} = +1 \mid \lambda_k), respectively. For the new
position \lambda_k^*, the probabilities are
\Pr(Z_{jk} = -1 \mid \lambda_k^*) and \Pr(Z_{jk} = +1 \mid \lambda_k^*),
respectively. The prior distributions of the parameters \theta at the two
positions are denoted by p(\theta, \lambda_k^*) and p(\theta, \lambda_k),
respectively. These two probability densities usually cancel each other out.
The proposal probabilities q(\lambda_k^* \mid \lambda_k) and
q(\lambda_k \mid \lambda_k^*) are normally equal to 1/(2\Delta) and thus
also cancel each other out. However, when \lambda_k or \lambda_k^* is near a
boundary, these two probabilities may not be the same. Since the new
position is always restricted to the interval where the old position occurs,
the proposal density q(\lambda_k^* \mid \lambda_k) and its reverse partner
q(\lambda_k \mid \lambda_k^*) may be different. Let us denote the positions
of the left and right QTL by \lambda_{k-1} and \lambda_{k+1}, respectively.
If \lambda_k is close to the left QTL so that
\lambda_k - \lambda_{k-1} < \Delta, then the new position must be sampled
from \lambda_k^* \sim U(\lambda_{k-1}, \lambda_k + \Delta) to make sure that
the new position is within the required sample space. Similarly, if
\lambda_k is close to the right QTL so that
\lambda_{k+1} - \lambda_k < \Delta, then the new position must be sampled
from \lambda_k^* \sim U(\lambda_k - \Delta, \lambda_{k+1}). In either case,
the proposal density should be modified.
The general formula of the proposal density after incorporating the
modification is

    q(\lambda_k^* \mid \lambda_k) = \begin{cases} \dfrac{1}{\Delta + (\lambda_k - \lambda_{k-1})} & \text{if } \lambda_k - \lambda_{k-1} < \Delta \\[2mm] \dfrac{1}{\Delta + (\lambda_{k+1} - \lambda_k)} & \text{if } \lambda_{k+1} - \lambda_k < \Delta \\[2mm] \dfrac{1}{2\Delta} & \text{otherwise} \end{cases}    (9.27)

The assumption underlying this proposal density is that the distance between
any two QTL is larger than \Delta. The reverse partner of this proposal
density is

    q(\lambda_k \mid \lambda_k^*) = \begin{cases} \dfrac{1}{\Delta + (\lambda_k^* - \lambda_{k-1})} & \text{if } \lambda_k^* - \lambda_{k-1} < \Delta \\[2mm] \dfrac{1}{\Delta + (\lambda_{k+1} - \lambda_k^*)} & \text{if } \lambda_{k+1} - \lambda_k^* < \Delta \\[2mm] \dfrac{1}{2\Delta} & \text{otherwise} \end{cases}    (9.28)

The differences between sampling \lambda_k and sampling the other variables
are: (1) the proposed new position may or may not be accepted, while the new
values of all other variables are always accepted; and (2) when calculating
the acceptance probability for a new position, the likelihood does not
depend on the QTL genotype, while the conditional posterior probabilities of
all other variables depend on the sampled QTL genotypes.
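The piecewise proposal densities (9.27)-(9.28) can be written compactly as a uniform density of half-width \Delta truncated to the flanking interval; under the text's assumption that adjacent QTL are more than \Delta apart, the two forms coincide. The sketch below is a hypothetical illustration of why the Hastings correction q(\lambda_k \mid \lambda_k^*) / q(\lambda_k^* \mid \lambda_k) is not 1 near a flank.

```python
def q_density(new, old, left, right, delta):
    """Proposal density q(new | old) of eqs. (9.27)-(9.28): a uniform of
    half-width delta around `old`, truncated to the flanking positions."""
    lo, hi = max(left, old - delta), min(right, old + delta)
    return 1.0 / (hi - lo) if lo < new < hi else 0.0

# Near the left flank the forward and reverse densities differ.
left, right, delta = 10.0, 40.0, 2.0
old, new = 11.0, 12.5
fwd = q_density(new, old, left, right, delta)
# fwd = 1/(delta + (old - left)) = 1/3, since old is within delta of the flank
rev = q_density(old, new, left, right, delta)
# rev = 1/(2*delta) = 1/4, since new is more than delta from the flank
print(fwd, rev, rev / fwd)
```

The ratio rev/fwd (here 0.75) multiplies the likelihood and prior ratios in equation (9.26); away from the flanks it equals 1 and drops out.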
SUMMARY OF THE MCMC SAMPLING PROCESS
The MCMC sampling process is summarized as follows.
1. Choose the number of QTL to be placed in the model, q
2. Initialize the parameters and missing values, \theta = \theta^{(0)} and Z_{jk} = Z_{jk}^{(0)}
3. Sample \beta_i from N(\beta_i \mid \hat{\beta}_i, \sigma_{\beta_i}^2)
4. Sample \gamma_k from N(\gamma_k \mid \hat{\gamma}_k, \sigma_{\gamma_k}^2)
5. Sample \sigma_k^2 from \text{Inv-}\chi^2(\sigma_k^2 \mid 1, \gamma_k^2)
6. Sample \sigma^2 from \text{Inv-}\chi^2(\sigma^2 \mid n, SS)
7. Sample Z_{jk} from its conditional posterior distribution
8. Sample \lambda_k using the Metropolis-Hastings algorithm
9. Repeat step (3) to step (8) until the Markov chain is sufficiently long
The length of the chain should be sufficient to ensure that, after burn-in
deletion and chain trimming, the posterior sample size is large enough to
allow accurate estimation of the posterior means (modes or medians) of all
QTL parameters. Methods and computer programs are available to check whether
the chain has converged to the stationary distribution (BROOKS and GELMAN
1998; GELMAN and RUBIN 1992; GEWEKE et al. 1992; RAFTERY and LEWIS 1992).
Our past experience shows that the burn-in period may only need to contain a
few thousand observations. A trimming frequency of saving one in every 20
observations is usually sufficient. A posterior sample size of 1000 usually
works well. However, if the model is not very large, it is always a good
idea to delete more observations for the burn-in and trim more observations
to make the chain thinner.
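The burn-in and thinning bookkeeping described above reduces to simple slicing. The numbers below follow the text's defaults (a few thousand burn-in iterations, save one in every 20, posterior sample size 1000); the "chain" is just a placeholder list standing in for saved parameter draws.

```python
n_iter, burn_in, thin = 25000, 5000, 20
chain = list(range(n_iter))          # stand-in for the saved parameter draws
posterior = chain[burn_in::thin]     # delete the burn-in, keep every 20th draw
print(len(posterior))                # (25000 - 5000) / 20 = 1000
```

Thinning more aggressively (a larger `thin`) reduces the autocorrelation of the retained draws, which matters for the standard deviations used in the profiles of the next section.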
POST MCMC ANALYSIS
The MCMC sampling process is much like running an experiment: it only
generates data for further analysis. The Bayesian estimates only become
available after summarizing these data (the posterior sample). The parameter
vector \theta is very long, but not all parameters are of interest. Unlike
other methods, in which the number of QTL is an important parameter, the
Bayesian shrinkage method uses a fixed number of QTL, and thus q is not a
parameter of interest. Although the variance component of the k th QTL,
\sigma_k^2, is a parameter, it is not a parameter of interest either. It
only serves as a factor to shrink the estimated QTL effect. Since the
marginal posterior of \sigma_k^2 does not exist, the empirical posterior
mean or posterior mode of \sigma_k^2 does not have any biological meaning.
In some observations the sampled \sigma_k^2 can be very large, and in others
it may be very small. The residual error variance \sigma^2 is meaningful
only if the number of QTL placed in the model is small to intermediate. When
q is very large, the residual error variance will be absorbed by the very
large number of spurious QTL. The only parameters of interest are the QTL
effects and QTL positions. However, the QTL identity, k, is also not of
interest. Since the k th QTL may travel far from the place on the chromosome
where it was originally positioned, the average effect \gamma_k does not
have any meaningful biological interpretation. What remains of interest are
the positions of the genome that are hit frequently by QTL with large
effects. Let us consider a fixed position of the genome. A position of the
genome is only a point, or locus. Since the QTL position is a continuous
variable, the probability that a particular point of the genome is hit by a
QTL is zero. Therefore, we define a genome position by a bin with a width of
d cM, where d can be 1, 2, or any other suitable value. The mid value of the
bin represents the genome location. For example, if d = 2 cM, the genome
location 15 cM actually represents the bin covering the region of the genome
from 14 cM to 16 cM, where 14 = 15 - d/2 and 16 = 15 + d/2. Once we define
the bin width of a genome location, we can count the number of QTL hits in
the bin. For each hit, we record the effect of that hit. The same location
may be hit many times by QTL with the same or different identities. The
average effect of the QTL hitting a bin is the most important parameter in
the Bayesian shrinkage analysis. Each and every bin of the genome has an
average QTL effect. We can then plot the effect against the genome location
to form a QTL (effect) profile. This profile represents the overall result
of the Bayesian mapping. In the BC example of the Bayesian analysis, the
k th QTL effect is denoted by \gamma_k. Since the QTL identity k is
irrelevant, it is now replaced by the average QTL effect at position
\lambda, which is a continuous variable. The \lambda without a subscript
indicates a genome location. The average QTL effect at position \lambda can
be expressed as \gamma(\lambda) to indicate that the effect is a function of
the genome location. The QTL effect profile is now represented by
\gamma(\lambda). If we use \gamma(\lambda) to denote the posterior mean of
the QTL effect at position \lambda, we may use \sigma^2(\lambda) to denote
the posterior variance of the QTL effect at position \lambda. If the QTL
moving is not random but guided by the Metropolis-Hastings rule, the
posterior sample size at position \lambda is a useful piece of information
indicating how often position \lambda is hit by a QTL. Let n(\lambda) be the
posterior sample size at \lambda; the standard error of the QTL effect at
\lambda is then \sigma(\lambda)/\sqrt{n(\lambda)}. Therefore, another useful
profile is the so-called t-test statistic profile, expressed as

    t(\lambda) = \sqrt{n(\lambda)}\, \frac{\gamma(\lambda)}{\sigma(\lambda)}    (9.29)

The corresponding F-test statistic profile is

    F(\lambda) = n(\lambda)\, \frac{\gamma^2(\lambda)}{\sigma^2(\lambda)}    (9.30)

The t-test statistic profile is more informative than the F-test statistic
profile because it also indicates the direction of the QTL effect (positive
or negative), while the F-test statistic profile is always positive. On the
other hand, the F-test statistic can be extended to multiple effects per
locus, e.g., additive and dominance effects in an F_2 design. Both the
t-test and F-test statistic profiles can be interpreted as kinds of weighted
QTL effect profiles because they incorporate the posterior frequency of the
genome location. The current version of PROC QTL only reports the posterior
sample size n(\lambda), the posterior mean \gamma(\lambda) and the posterior
standard deviation \sigma(\lambda). Other posterior statistics, such as the
equal-tail credibility interval, the highest posterior density interval and
the t-test statistic, will be added later.
Note that the t-test and F-test statistics depend strongly on the posterior
standard deviations, which may be deflated by autocorrelation of the
posterior samples. Therefore, the Markov chain should be heavily thinned
(trimmed) to reduce the autocorrelation. In other words, you should delete
more observations from the posterior sample.
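The binning of posterior draws into an effect profile, with the t statistic of equation (9.29), can be sketched as follows. The function `effect_profile` and the simulated posterior sample (a real QTL near 15 cM, shrunken noise elsewhere) are hypothetical, invented for illustration.

```python
import numpy as np

def effect_profile(positions, effects, chrom_len, d=2.0):
    """Bin posterior draws (position, effect) into d-cM bins and return,
    per bin: hit count n(lam), mean effect gamma(lam), and the t statistic
    sqrt(n) * gamma / sigma of eq. (9.29)."""
    edges = np.arange(0.0, chrom_len + d, d)
    idx = np.digitize(positions, edges) - 1
    n_bins = len(edges) - 1
    n, mean, t = np.zeros(n_bins), np.zeros(n_bins), np.zeros(n_bins)
    for b in range(n_bins):
        hits = effects[idx == b]
        n[b] = hits.size
        if hits.size > 0:
            mean[b] = hits.mean()
        if hits.size > 1 and hits.std(ddof=1) > 0:
            t[b] = np.sqrt(hits.size) * mean[b] / hits.std(ddof=1)
    return n, mean, t

rng = np.random.default_rng(0)
pos = np.concatenate([rng.uniform(14, 16, 300), rng.uniform(0, 100, 700)])
eff = np.concatenate([rng.normal(1.0, 0.1, 300), rng.normal(0.0, 0.05, 700)])
n, mean, t = effect_profile(pos, eff, chrom_len=100.0, d=2.0)
peak = int(np.argmax(np.abs(mean)))
print(peak)  # bin 7, i.e. the 14-16 cM bin centered at 15 cM
```

Plotting `mean` (or `t`) against the bin midpoints gives exactly the \gamma(\lambda) and t(\lambda) profiles described above.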
BAYESIAN MAPPING FOR ORDINAL TRAITS
The Bayesian method for ordinal trait QTL mapping is different from the
generalized linear model described under interval mapping. Here, we assume
an underlying variable, called the liability, for each individual. This
latent variable is connected to the ordinal phenotype by a series of
thresholds. An individual falls into a particular category if its liability
is within the interval defined by a particular pair of thresholds. Suppose
that the disease phenotype of individual j (j = 1, \dots, n) is measured by
an ordinal variable denoted by S_j = 1, \dots, r + 1, where r + 1 is the
total number of disease classes and n is the sample size. Let y_j be the
underlying liability for individual j. Let \alpha_k be the k th threshold
for k = 0, \dots, r + 1, where \alpha_k < \alpha_{k+1},
\alpha_0 = -\infty and \alpha_{r+1} = +\infty. The connection between y_j
and S_j is S_j = k if \alpha_{k-1} < y_j \leq \alpha_k for
k = 1, \dots, r + 1. The underlying liability y_j is treated as a regular
quantitative trait and described by the usual linear model

    y_j = \sum_{i=1}^{p} X_{ji}\beta_i + \sum_{k=1}^{q} Z_{jk}\gamma_k + \epsilon_j    (9.31)

where \epsilon_j \sim N(0, 1) is assumed. The parameter vector now contains
all the parameters described for regular quantitative trait QTL mapping,
plus the r thresholds of the liability,
\alpha = [\alpha_1 \cdots \alpha_r]^T. The Bayesian method for such an
ordinal trait requires sampling y_j conditional on
\{S_j, \alpha, \beta, \gamma\} and sampling \alpha conditional on all the
y_j's.
Given \{S_j, \alpha, \beta, \gamma\}, the liability y_j follows a truncated
normal distribution between \alpha_{k-1} and \alpha_k with mean \mu_j and
variance 1, where

    \mu_j = \sum_{i=1}^{p} X_{ji}\beta_i + \sum_{k=1}^{q} Z_{jk}\gamma_k    (9.32)

This truncated normal distribution is denoted by

    p(y_j \mid \cdots) = N(y_j \mid \mu_j, 1), \quad \alpha_{k-1} < y_j \leq \alpha_k    (9.33)
The algorithm we use to sample y_j works through the inverse distribution
function, as described below. First, we sample a uniform variable

    u_j \sim U(a, b)    (9.34)

where

    a = \Phi\left( \alpha_{k-1} - \sum_{i=1}^{p} X_{ji}\beta_i - \sum_{k=1}^{q} Z_{jk}\gamma_k \right)    (9.35)

and

    b = \Phi\left( \alpha_k - \sum_{i=1}^{p} X_{ji}\beta_i - \sum_{k=1}^{q} Z_{jk}\gamma_k \right)    (9.36)

and \Phi is the standard normal distribution function. We then assign y_j a
value using

    y_j = \mu_j + \Phi^{-1}(u_j)    (9.37)

A y_j sampled this way is guaranteed to follow the truncated normal
distribution.
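The inverse-CDF draw of equations (9.34)-(9.37) can be sketched with the standard library's normal distribution. The function name and the numerical example are invented for illustration.

```python
import random
from statistics import NormalDist

rng = random.Random(5)
std = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}

def sample_liability(mu, lower, upper, rng):
    """Inverse-CDF draw from N(mu, 1) truncated to (lower, upper],
    following eqs. (9.34)-(9.37)."""
    a = std.cdf(lower - mu)       # eq. (9.35)
    b = std.cdf(upper - mu)       # eq. (9.36)
    u = rng.uniform(a, b)         # eq. (9.34)
    return mu + std.inv_cdf(u)    # eq. (9.37)

# liability for an individual in the class bounded by thresholds 0 and 1,
# with linear predictor mu_j = 0.5
draws = [sample_liability(0.5, 0.0, 1.0, rng) for _ in range(1000)]
print(min(draws) >= 0.0 and max(draws) <= 1.0)
```

Every draw falls inside the threshold interval, which is the defining property of the truncated distribution.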
Given all the sampled y_j's, we can sample the thresholds. First, we
classify all individuals into r + 1 groups based on their observed ordinal
phenotypes. Let

    y_{\min}(k) = \min_{\{j:\, S_j = k\}} y_j    (9.38)

be the minimum value of the y_j's for all individuals with S_j = k, and

    y_{\max}(k) = \max_{\{j:\, S_j = k\}} y_j    (9.39)

be the maximum value of the y_j's for all individuals with S_j = k. The
posterior distribution of \alpha_k given all the y_j's is uniform,

    p(\alpha_k \mid y) = U\left[\alpha_k \mid y_{\max}(k), y_{\min}(k+1)\right]    (9.40)

from which \alpha_k is sampled. Once all the \alpha_k's and y_j's are
sampled, the other parameters are sampled in the same way as in Bayesian QTL
mapping for regular quantitative traits.
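The threshold draw of equations (9.38)-(9.40) can be sketched as follows, assuming (as the sampler guarantees) that the liabilities are consistent with the class ordering, so that y_max(k) < y_min(k+1). The function name and the liability values are invented for illustration.

```python
import random

rng = random.Random(9)

def sample_thresholds(y, S, r, rng):
    """Draw each threshold alpha_k uniformly between the largest liability
    in class k and the smallest liability in class k+1, eq. (9.40)."""
    alpha = []
    for k in range(1, r + 1):
        y_max_k = max(yj for yj, s in zip(y, S) if s == k)       # eq. (9.39)
        y_min_k1 = min(yj for yj, s in zip(y, S) if s == k + 1)  # eq. (9.38)
        alpha.append(rng.uniform(y_max_k, y_min_k1))             # eq. (9.40)
    return alpha

# hypothetical liabilities already consistent with 3 ordinal classes
y = [-1.2, -0.4, 0.1, 0.6, 1.3, 2.0]
S = [1, 1, 2, 2, 3, 3]
alpha = sample_thresholds(y, S, r=2, rng=rng)
print(-0.4 <= alpha[0] <= 0.1 and 0.6 <= alpha[1] <= 1.3)
```

Each sampled threshold lands strictly between adjacent classes, so the ordering \alpha_1 < \alpha_2 is preserved automatically.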
SAMPLING MISSING PHENOTYPIC VALUES
Missing phenotypic values can also be handled by the Bayesian method of QTL
mapping. For single trait analysis, individuals with missing phenotypic
values may simply be deleted from the analysis, because they do not provide
any additional information. Alternatively, you may replace the missing
values by the average of the trait in the mapping population. Another
approach is to replace the missing values by their expectations, i.e.,
replace a missing y_j by \mu_j, where

    \mu_j = \sum_{i=1}^{p} X_{ji}\beta_i + \sum_{k=1}^{q} Z_{jk}\gamma_k    (9.41)

Since the expectation is a function of the parameters, it varies throughout
the MCMC process. The Bayesian method of PROC QTL handles missing phenotypic
values through sampling. Let y_j be a missing phenotypic value and y_j^* be
the sampled value that replaces y_j. The distribution used to sample y_j^*
is N(\mu_j, \sigma^2), where \sigma^2 is the residual error variance. All
parameters take their current values in the MCMC process.
Although deleting individuals with missing phenotypic values does not hurt
the result for single trait QTL mapping, it does reduce the efficiency of
multiple trait QTL mapping (to be described later).
PERMUTATION
The permutation option in the PROC QTL statement allows users to perform permutation analysis for the Bayesian method. This permutation analysis is different from the permutation test in interval mapping (CHURCHILL and DOERGE 1994). You need to analyze the data using the Bayesian method first to provide the posterior means for all the QTL. The permutation analysis will then draw the QTL effects from the null model (assuming there are no QTL effects). To generate the null distributions, you need to permute the data in every cycle of the MCMC sampling process. This is why you cannot conduct the permutation outside the QTL procedure. Once the permutation option is turned on, the phenotypic data are reshuffled before the parameters are sampled. After every parameter is drawn, the phenotypic data are reshuffled again. This process continues until a desired length of the Markov chain has been reached. The posterior sample will then contain all the QTL effects drawn from the null distribution. You can calculate the (\alpha/2) \times 100\% and (1 - \alpha/2) \times 100\% quantiles for each QTL effect of the posterior sample, where \alpha = 0.05 or another value of the user's choice. PROC QTL only helps you draw the posterior sample from the null model. You must perform the post-MCMC analysis using other methods available either in SAS or in other software packages. If the posterior mean of a particular marker or QTL from the original (unshuffled) data is beyond the (\alpha/2) \times 100\% and (1 - \alpha/2) \times 100\% interval, the QTL can be declared "significant" at the \alpha level. Details of the Bayesian permutation analysis can be found in Che and Xu (2010). In summary, you need to perform two MCMC runs to complete a permutation-tested Bayesian mapping. One MCMC run is the analysis of the original data and the other is the MCMC run for the repeatedly permuted data. The original data analysis provides the estimated QTL effects. The permuted data analysis provides the (\alpha/2) \times 100\% and (1 - \alpha/2) \times 100\% interval drawn from the null distribution.
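Conceptually, the post-MCMC step is just an empirical quantile computation on the null posterior sample. A schematic Python sketch (not PROC QTL code; `null_effects` and `observed_effect` are stand-in values, not real output):

```python
import random

random.seed(3)

# Stand-in posterior sample of one QTL effect drawn under the null model.
null_effects = [random.gauss(0.0, 1.0) for _ in range(2000)]

def empirical_interval(sample, alpha):
    """Empirical (alpha/2) and (1 - alpha/2) quantiles of the null sample."""
    s = sorted(sample)
    lo = s[int(len(s) * alpha / 2)]
    hi = s[int(len(s) * (1 - alpha / 2)) - 1]
    return lo, hi

lo, hi = empirical_interval(null_effects, 0.05)

# A posterior mean (from the unshuffled data) outside [lo, hi] is "significant".
observed_effect = 3.2   # stand-in value
is_significant = observed_effect < lo or observed_effect > hi
```

The two MCMC runs described above supply, respectively, `observed_effect` and `null_effects` for every QTL effect in the model.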
[Figure 4 appears here: estimated QTL effect (vertical axis, -6 to 6) plotted against genome position (horizontal axis, 0-360 centiMorgans).]

Figure 4. Estimated QTL effects for the entire genome and the empirical thresholds drawn from permutation within the Markov chain analysis at \alpha = 0.05 (2.5%-97.5%, wider interval) and \alpha = 0.10 (5%-95%, narrower interval).
BAYESIAN MAPPING FOR DISCRETE TRAITS
The Bayesian method for QTL mapping described so far covers quantitative traits and ordinal traits. Ordinal traits, however, belong to the category of discrete traits. Under the liability model, ordinal traits are simply discrete observations of underlying quantitative traits; therefore, we discussed ordinal traits in the chapter on Bayesian mapping for quantitative traits. There are many other discrete traits with a polygenic background. Three of them are very important in experiments and thus are introduced in this chapter. These traits can be mapped using the so-called generalized linear model (GLM). Although the generalized linear model has already been used for mapping discrete traits in interval mapping, the GLM under the Bayesian mapping framework is slightly different from the GLM in interval mapping. The GLM used here takes advantage of the normal distribution, while the GLM described before does not require a normal distribution. The basic idea of the GLM here is to find a functional transformation of a discrete trait so that the transformed trait is normally distributed conditional on the parameters. Since the parameters are sampled during the MCMC sampling process, the transformation is constantly updated using the newly sampled parameters. Because the transformed trait is conditionally normal, the conditional posterior distribution of each QTL effect is also normal provided that the prior is also normal. Therefore, we can fully take advantage of the sampling algorithms learned previously in Bayesian mapping for continuous traits.
GENERALIZED LINEAR MODEL
Let w_j be the value of a trait for individual j in a population of size n. The trait does not have to be continuously distributed; it can be of binary, binomial, Poisson or some other type of distribution in the exponential family. We can analyze such traits using the generalized linear model, which consists of the following three components: (1) a linear predictor, (2) a monotonic mapping between the mean of the data and the linear predictor and (3) a response distribution in the exponential family of distributions (GELMAN 2005). Let

\xi_j = \sum_{i=1}^{p} X_{ji}\beta_i + \sum_{k=1}^{q} Z_{jk}\gamma_k   (10.1)

be the linear predictor for individual j. The QTL and non-QTL effects along with the design matrices are defined exactly the same as those in Bayesian mapping for quantitative traits. Therefore, no further discussion will be given for the sampling of these effects. The model can be written in the compact form

\xi_j = X_j\beta + Z_j\gamma   (10.2)
where \beta = \{\beta_i\} and \gamma = \{\gamma_k\} are vectors rather than single elements. Let E(w_j \mid \theta) = \mu_j be the expectation of w_j so that

\mu_j = \eta^{-1}(X_j\beta + Z_j\gamma)   (10.3)

or

\eta(\mu_j) = \xi_j = X_j\beta + Z_j\gamma   (10.4)

where \eta(\cdot) is called the link function that connects the expectation of the trait to the linear predictor. The parameters \theta = \{\beta, \gamma\} include the non-QTL effects \beta (nuisance parameters) and the QTL effects \gamma. Wolfinger and O'Connell (1993) showed that the likelihood function of the original datum w_j can be approximated by a normal likelihood function of a pseudo datum y_j, where the pseudo datum is a function of the parameters evaluated at \theta = \hat{\theta}. Let

\xi_j = X_j\beta + Z_j\gamma   (10.5)

and

\Delta_j = \left.\frac{\partial \mu_j}{\partial \xi_j}\right|_{\xi_j = X_j\beta + Z_j\gamma}   (10.6)

The pseudo datum y_j(\theta) is defined as

y_j(\theta) = \Delta_j^{-1}(w_j - \mu_j) + X_j\beta + Z_j\gamma   (10.7)

which is approximately normal with mean X_j\beta + Z_j\gamma and variance

s_j^2 = \Delta_j^{-1}\,\mathrm{var}(w_j \mid \theta)\,\Delta_j^{-1}   (10.8)

The quantity inside the above sandwich expression of s_j^2 is \mathrm{var}(w_j \mid \theta), the variance of the observed data point given \theta = \hat{\theta}. The above mean and variance of the pseudo datum y_j(\theta) are the conditional mean and conditional variance given \xi_j = X_j\beta + Z_j\gamma.
We now write the linear model for the pseudo data y(\theta) in vector form,

y = X\beta + Z\gamma + \varepsilon   (10.9)

which is distributed as

y \sim N(X\beta,\ ZGZ^T + R)   (10.10)

where G = \mathrm{var}(\gamma) = \mathrm{diag}[\mathrm{var}(\gamma_1), \ldots, \mathrm{var}(\gamma_q)] = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_q^2) and R = \mathrm{diag}(s_1^2, \ldots, s_n^2). The typical mixed-model Bayesian analysis can now be performed using the pseudo data y(\theta). The pseudo data and the matrix R, however, are functions of the parameters. Therefore, the MCMC-implemented Bayesian analysis involves an extra step within each cycle of the sampling, namely recalculation of the pseudo data and R after all parameters have been sampled.
The non-QTL effect vector \beta is p \times 1; it is not subject to shrinkage and, therefore, a uniform prior is assigned to \beta. The QTL effects are denoted by the q \times 1 vector \gamma, with the kth component representing the genetic effect of locus k. The prior distribution for \gamma_k is N(0, \sigma_k^2) and the variance in the prior is further described by

p(\sigma_k^2) = \mathrm{Inv}\text{-}\chi^2(\tau, \omega)   (10.11)

The Bayesian shrinkage analysis of Xu (2003) actually adopted the Jeffreys prior p(\sigma_k^2) = 1/\sigma_k^2, which is a special case of the scaled inverse chi-square with (\tau, \omega) = (0, 0). This prior is improper but usually generates highly sparse models. ter Braak, Boer and Bink (2005) proposed to use (\tau, \omega) = (-2\delta, 0), where 0 < \delta \leq 0.5; this choice guarantees that a proper posterior distribution exists. The current version of PROC QTL simply takes (\tau, \omega) = (0, 0). More options will be added later. We now provide three special cases of the GLM.
BINARY DATA
The first example is a binary trait, for which the probit link function is used. The trait is defined as w_j \in \{0, 1\} with E(w_j) = \mu_j, and the link function is

\eta(\mu_j) = \Phi^{-1}(\mu_j) = X_j\beta + Z_j\gamma   (10.12)

This link function leads to

\Delta_j = \frac{\partial \mu_j}{\partial \xi_j} = \phi(\xi_j)   (10.13)

and

\mathrm{var}(w_j \mid \theta) = \Phi(\xi_j)\left[1 - \Phi(\xi_j)\right]   (10.14)

Therefore, the pseudo datum is

y_j(\theta) = \frac{w_j - \Phi(\xi_j)}{\phi(\xi_j)} + \xi_j   (10.15)

The mean and variance of the pseudo datum are

E[y_j(\theta)] = X_j\beta + Z_j\gamma   (10.16)

and

s_j^2 = \Delta_j^{-1}\,\mathrm{var}(w_j \mid \theta)\,\Delta_j^{-1} = \frac{\Phi(\xi_j)\left[1 - \Phi(\xi_j)\right]}{\phi^2(\xi_j)}   (10.17)

respectively.
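The probit pseudo-data construction (10.15)-(10.17) can be sketched in a few lines of Python (a toy illustration, not the PROC QTL implementation; `xi` stands for the current value of the linear predictor \xi_j):

```python
from math import exp, pi, sqrt
from statistics import NormalDist

std = NormalDist()

def phi(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def probit_pseudo_datum(w, xi):
    """Pseudo datum y_j(theta) and its variance s_j^2 for a binary trait."""
    mu = std.cdf(xi)                       # mean of w_j under the probit link
    y = xi + (w - mu) / phi(xi)            # eq. (10.15)
    s2 = mu * (1.0 - mu) / phi(xi) ** 2    # eq. (10.17)
    return y, s2

y1, s2 = probit_pseudo_datum(1, 0.0)
y0, _ = probit_pseudo_datum(0, 0.0)
```

At \xi_j = 0 the two possible observations produce pseudo data symmetric about zero, and the pseudo variance equals \pi/2, as (10.17) predicts.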
BINOMIAL DATA
The trait is defined as w_j = m_j / n_j, where m_j is called the number of events and n_j is called the number of trials. Let E(w_j) = \mu_j; the link function is again the probit,

\eta(\mu_j) = \Phi^{-1}(\mu_j) = X_j\beta + Z_j\gamma   (10.18)

This link function leads to

\Delta_j = \frac{\partial \mu_j}{\partial \xi_j} = \phi(\xi_j)   (10.19)

and

\mathrm{var}(w_j \mid \theta) = \frac{1}{n_j}\Phi(\xi_j)\left[1 - \Phi(\xi_j)\right]   (10.20)

Therefore, the pseudo datum is

y_j(\theta) = \frac{w_j - \Phi(\xi_j)}{\phi(\xi_j)} + \xi_j   (10.21)

The mean and variance of the pseudo datum are

E[y_j(\theta)] = X_j\beta + Z_j\gamma   (10.22)

and

s_j^2 = \Delta_j^{-1}\,\mathrm{var}(w_j \mid \theta)\,\Delta_j^{-1} = \frac{\Phi(\xi_j)\left[1 - \Phi(\xi_j)\right]}{n_j\,\phi^2(\xi_j)}   (10.23)

respectively.
The binary trait is a special case of the binomial trait with n_j = 1 for all j = 1, \ldots, n. Therefore, PROC QTL only provides two options for the generalized linear model: the binomial option (which includes binary as a special case) and the Poisson option.
POISSON DATA
Another example is Poisson data, w_j \in \{0, 1, 2, \ldots\}. The probability mass function of the Poisson distribution is

p(w_j) = \frac{\mu_j^{w_j}}{w_j!}\exp(-\mu_j)   (10.24)

where E(w_j) = \mathrm{var}(w_j) = \mu_j. The log link function is used for Poisson data,

\eta(\mu_j) = \ln(\mu_j) = X_j\beta + Z_j\gamma   (10.25)

This link function leads to

\Delta_j = \frac{\partial \mu_j}{\partial \xi_j} = \exp(\xi_j)   (10.26)

Therefore, the pseudo datum and the variance of the pseudo datum are

y_j(\theta) = \frac{w_j - \exp(\xi_j)}{\exp(\xi_j)} + \xi_j   (10.27)

and

s_j^2 = \Delta_j^{-1}\,\mathrm{var}(w_j \mid \theta)\,\Delta_j^{-1} = \frac{1}{\exp(\xi_j)}   (10.28)

respectively.
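For the log link the pseudo datum (10.27) and its variance (10.28) reduce to especially simple expressions, sketched below (toy code with hypothetical names, not the PROC QTL implementation):

```python
from math import exp, log

def poisson_pseudo_datum(w, xi):
    """Pseudo datum and variance for Poisson data under the log link."""
    mu = exp(xi)                 # E(w_j) = exp(xi_j), eq. (10.25)
    y = xi + (w - mu) / mu       # eq. (10.27)
    s2 = 1.0 / mu                # eq. (10.28)
    return y, s2

# When the observed count equals its expectation, the pseudo datum equals xi.
xi = log(4.0)
y, s2 = poisson_pseudo_datum(4, xi)
```

Note how a large expected count exp(\xi_j) makes the pseudo datum very precise (small s_j^2), mirroring the Poisson mean-variance relationship.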
EMPIRICAL BAYESIAN METHOD
MAIN QTL EFFECT MODEL
Empirical Bayes is still a Bayesian method, but the hyper-parameters (the parameters of the prior distributions) are not preselected by the investigator; instead, they are estimated from the same data as used in the Bayesian analysis (XU 2007b). Once the hyper-parameters are estimated, they are used in the Bayesian analysis as if they were the true hyper-parameters of the prior distributions. The data are actually used twice, once for estimating the hyper-parameters and once for estimating the Bayesian posterior means. In the QTL mapping problem, the flat prior for \beta does not have any hyper-parameters. If a uniform prior is used for the residual variance \sigma^2, there is also no hyper-parameter for that prior. The prior for each QTL effect is independent normal,

p(\gamma_k \mid \sigma_k^2) = N(\gamma_k \mid 0, \sigma_k^2)   (11.1)

where \sigma_k^2 is the variance of the prior distribution. It is a hyper-parameter, which is assigned another, higher-level prior. We use

p(\sigma_k^2) = \mathrm{Inv}\text{-}\chi^2(\sigma_k^2 \mid \tau, \omega)   (11.2)

as the prior, where \tau and \omega are hyper-parameters at the higher level. For q model effects, the number of \sigma_k^2's is q, which can be a large number. In the fully Bayesian method under the hierarchical model, \sigma_k^2 is estimated simultaneously along with \gamma_k. In the empirical Bayes method, we estimate \sigma_k^2 first, independent of \gamma_k, from the same data set. Recall that the linear model for y_j is

y_j = \sum_{i=1}^{p} X_{ji}\beta_i + \sum_{k=1}^{q} Z_{jk}\gamma_k + \varepsilon_j   (11.3)

where \varepsilon_j \sim N(0, \sigma^2) is normally distributed. The compact matrix notation of this model is

y = X\beta + Z\gamma + \varepsilon   (11.4)

When \gamma_k is treated as a random effect, the expectation of \gamma_k is zero. Therefore,

E(y) = X\beta   (11.5)

The variance-covariance matrix of y is

\mathrm{var}(y) = V = \sum_{k=1}^{q} Z_k Z_k^T \sigma_k^2 + I\sigma^2   (11.6)

Let \psi = \{\sigma_1^2, \ldots, \sigma_q^2, \sigma^2\} be the vector of variance components. The distribution of y is multivariate normal,

p(y \mid \beta, \psi) = N(y \mid X\beta, V)   (11.7)

The log likelihood function for the parameters \theta = \{\beta, \psi\} is

L(\theta) = -\frac{1}{2}\ln|V| - \frac{1}{2}(y - X\beta)^T V^{-1}(y - X\beta) - \frac{1}{2}(\tau + 2)\ln|D| - \frac{1}{2}\omega\,\mathrm{tr}(D^{-1})   (11.8)

where D = \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_q^2\}. This likelihood is not a function of the QTL effects. We can maximize this log likelihood function with respect to the parameters to obtain the MLE of \theta = \{\beta, \psi\}, denoted by \hat{\theta} = \{\hat{\beta}, \hat{\psi}\}. These parameters are then treated as known quantities and used to derive the posterior distribution of the QTL effects \gamma_k. The posterior mean of each \gamma_k, given \theta = \hat{\theta}, is the empirical Bayes estimate \hat{\gamma}_k. We need a conditional updating algorithm to find \hat{\theta} = \{\hat{\beta}, \hat{\psi}\}, such as the algorithm given by Xu (2007b).
EPISTATIC QTL EFFECT MODEL
One advantage of the empirical Bayesian method over the fully Bayesian method is its fast computational speed, because MCMC sampling is not required for empirical Bayes. As a result, the method can handle even larger models, e.g., the epistatic QTL effect model. Let q be the number of loci included in the model; the total number of QTL effects is then q(q+1)/2, including q main effects and q(q-1)/2 pairwise interaction effects. Higher-order interactions may also be included in the model if q is not too large. The epistatic effect model is

y = X\beta + \sum_{k=1}^{q} Z_k\gamma_k + \sum_{k=1}^{q-1}\sum_{k'=k+1}^{q}(Z_k \# Z_{k'})\gamma_{kk'} + \varepsilon   (11.9)

where Z_k \# Z_{k'} represents the direct (element-wise) multiplication of the two vectors and \gamma_{kk'} is the epistatic effect between loci k and k'. Parameters of both the main effect model and the epistatic effect model are estimated using the same algorithm. Therefore, we only describe the updating algorithm for the main effect model.
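The # operation in (11.9) simply builds one new regressor per locus pair by element-wise multiplication of the main-effect columns. A minimal sketch of constructing the epistatic design columns (hypothetical helper code, not PROC QTL):

```python
def epistatic_columns(Z):
    """Given main-effect columns Z[k] (each a list of length n), return all
    pairwise element-wise products Z[k] # Z[k'] for k < k'."""
    q = len(Z)
    cols = {}
    for k in range(q - 1):
        for kp in range(k + 1, q):
            cols[(k, kp)] = [a * b for a, b in zip(Z[k], Z[kp])]
    return cols

# Three loci, four individuals, genotypes coded -1/0/1.
Z = [[1, -1, 0, 1],
     [0, 1, 1, -1],
     [1, 1, -1, 0]]
pairs = epistatic_columns(Z)
```

With q = 3 loci this yields 3 interaction columns, and the q main effects plus the pairwise columns together give q(q+1)/2 = 6 effects, as stated above.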
SIMPLEX ALGORITHM
The high dimensionality of the model (p + q + 1 parameters) prohibits a simultaneous search for all parameters. Instead, we adopt a conditional updating algorithm that searches for one parameter at a time, given the remaining parameters. We then take turns updating each of the other parameters. The iteration continues for many cycles until convergence is reached. Updating a single parameter is a one-dimensional problem, and usually an explicit solution is possible. First, we denote the parameter values at the tth iteration by \theta^{(t)} = \{\beta^{(t)}, \psi^{(t)}\}. Let

V^{(t)} = \sum_{k=1}^{q} Z_k Z_k^T \sigma_k^{2(t)} + I\sigma^{2(t)}   (11.10)

We can write the conditional log likelihood function for each parameter. For the non-QTL effects, we have the following conditional log likelihood function,

L(\beta \mid \theta^{(t)}) = -\frac{1}{2}\ln|V^{(t)}| - \frac{1}{2}(y - X\beta)^T [V^{(t)}]^{-1}(y - X\beta) - \frac{1}{2}(\tau + 2)\ln|D^{(t)}| - \frac{1}{2}\omega\,\mathrm{tr}\left([D^{(t)}]^{-1}\right)   (11.11)

Setting \partial L(\beta \mid \theta^{(t)})/\partial\beta = 0 and solving for \beta, we obtain an updated \beta, denoted by

\beta^{(t+1)} = \left[X^T [V^{(t)}]^{-1} X\right]^{-1} X^T [V^{(t)}]^{-1} y   (11.12)
(11.12)
The conditional log likelihood function for the residual variance is

( ) ( ) ( )
2 ( ) ( ) 2( ) ( ) 2
1
( ) 2( ) ( ) ( )
2
1 1
( | , ) ln ln( )
1
2
2 2
t t t t
T
t t t t
L V
y X V y X
o | o o
| o |
o

=

(11.13)
Setting
2 ( ) ( )
2
( | , ) 0
t t
L o |
o
c
=
c
and solving for
2
o yields

2( )
2( 1) ( ) ( ) 1 ( )
( ) ( ) ( )
t
t t T t t
y X V y X
n
o
o | |
+
= (11.14)
The conditional log likelihood function for \sigma_k^2 is

L(\sigma_k^2 \mid \beta^{(t)}, \psi^{(t)}) = -\frac{1}{2}\ln\left[Z_k^T [V^{(t)}]^{-1} Z_k \sigma_k^2 + 1\right] + \frac{1}{2}\,\frac{\left[(y - X\beta^{(t)})^T [V^{(t)}]^{-1} Z_k\right]^2 \sigma_k^2}{Z_k^T [V^{(t)}]^{-1} Z_k \sigma_k^2 + 1} - \frac{1}{2}(\tau + 2)\ln\sigma_k^2 - \frac{\omega}{2\sigma_k^2}   (11.15)

An explicit solution does not exist in general, and thus we use the simplex algorithm of Nelder and Mead (1965) to search for the solution. When \tau = -2 and \omega = 0, there is an explicit solution. We set \partial L(\sigma_k^2 \mid \beta^{(t)}, \psi^{(t)})/\partial\sigma_k^2 = 0 and solve for \sigma_k^2, yielding

\sigma_k^{2(t+1)} = \left[\sigma_k^{2(t)}(y - X\beta^{(t)})^T [V^{(t)}]^{-1} Z_k\right]^2 + \sigma_k^{2(t)}\left[1 - Z_k^T [V^{(t)}]^{-1} Z_k \sigma_k^{2(t)}\right]   (11.16)
The following initial values are used by PROC QTL,

\beta^{(0)} = (X^T X)^{-1} X^T y, \quad \sigma^{2(0)} = \frac{1}{n}(y - X\beta^{(0)})^T(y - X\beta^{(0)}), \quad \sigma_k^{2(0)} = 1, \ k = 1, \ldots, q   (11.17)

The MLE \hat{\theta} = \{\hat{\beta}, \hat{\psi}\} is then used, as if it were the true value of the parameters, to derive the posterior mean of each QTL effect,

E(\gamma_k \mid y) = \sigma_k^2 Z_k^T V^{-1}(y - X\beta)   (11.18)

Such a posterior mean is called the best linear unbiased prediction (BLUP). The posterior variance is

\mathrm{var}(\gamma_k \mid y) = \sigma_k^2\left(1 - Z_k^T V^{-1} Z_k \sigma_k^2\right)   (11.19)

Let \hat{\gamma}_k = E(\gamma_k \mid y) and S_k^2 = \mathrm{var}(\gamma_k \mid y); the t-test statistic is

t_k = \frac{\hat{\gamma}_k}{S_k}   (11.20)

which can be plotted against the genome location to produce a visual presentation of the QTL effects. Users have the option to choose (\tau, \omega). The default setting is (\tau, \omega) = (0, 0), corresponding to the Jeffreys prior (FIGUEIREDO 2003).
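As an illustration of the BLUP formulas (11.18)-(11.20), here is a single-locus, intercept-only toy in Python (hypothetical helper and data, not the PROC QTL implementation). With one locus, V = \sigma_k^2 Z Z^T + \sigma^2 I, and the Woodbury identity gives Z^T V^{-1} r = Z^T r / (\sigma^2 + \sigma_k^2 Z^T Z), so no matrix inversion is needed:

```python
def eb_blup_single_locus(y, Z, beta0, sk2, s2):
    """BLUP (11.18), posterior variance (11.19) and t-statistic (11.20)
    for a one-locus model; V = sk2 * Z Z^T + s2 * I, and by Woodbury
    Z^T V^{-1} r = Z^T r / (s2 + sk2 * Z^T Z)."""
    r = [yj - beta0 for yj in y]              # residuals y - X*beta (X = 1)
    ztz = sum(z * z for z in Z)
    ztr = sum(z * rj for z, rj in zip(Z, r))
    gamma_hat = sk2 * ztr / (s2 + sk2 * ztz)  # eq. (11.18)
    post_var = sk2 * s2 / (s2 + sk2 * ztz)    # eq. (11.19)
    t_stat = gamma_hat / post_var ** 0.5      # eq. (11.20)
    return gamma_hat, post_var, t_stat

Z = [1, 1, -1, -1, 0, 0]
y = [2.1, 1.9, -2.0, -1.8, 0.1, -0.1]
gamma_hat, post_var, t_stat = eb_blup_single_locus(y, Z, beta0=0.0,
                                                   sk2=10.0, s2=1.0)
```

The shrinkage behavior is visible in the formula: a large prior variance sk2 leaves the effect close to its least-squares value, while sk2 near zero shrinks it toward zero.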
BAYESIAN MAPPING FOR MULTIPLE TRAITS
MULTIPLE CONTINUOUS TRAITS
Let y_j = [y_{j1} \ldots y_{jm}]^T, for j = 1, \ldots, n, be an m \times 1 vector of the phenotypic values of m quantitative traits measured on the jth individual of an F_2 mapping population, where n is the sample size. The vector of phenotypic values is described by the following multivariate linear model,

y_j = \mu + \sum_{k=1}^{p} X_{jk}\alpha_k + \sum_{k=1}^{p} Z_{jk}\beta_k + e_j   (12.1)

where \mu = [\mu_1 \ldots \mu_m]^T is an m \times 1 vector of the population means (or intercepts) for the m traits, \alpha_k = [\alpha_{1k} \ldots \alpha_{mk}]^T and \beta_k = [\beta_{1k} \ldots \beta_{mk}]^T are the additive and dominance effects, respectively, of locus k (k = 1, \ldots, p), and p is the number of loci included in the model. Both \alpha_k and \beta_k are m \times 1 vectors because there are m traits involved in the model. The residual error e_j = [e_{j1} \ldots e_{jm}]^T is an m \times 1 vector with an assumed multivariate normal distribution N(0, \Sigma), where \Sigma is an m \times m positive definite variance-covariance matrix. The independent variables, X_{jk} and Z_{jk}, are defined as follows. Let A_1A_1, A_1A_2 and A_2A_2 be the three genotypes at locus k. The two variables are

X_{jk} = \begin{cases} +1 & \text{for } A_1A_1 \\ 0 & \text{for } A_1A_2 \\ -1 & \text{for } A_2A_2 \end{cases} \quad\text{and}\quad Z_{jk} = \begin{cases} 0 & \text{for } A_1A_1 \\ 1 & \text{for } A_1A_2 \\ 0 & \text{for } A_2A_2 \end{cases}   (12.2)

The scales of these independent variables are arbitrary. Alternative scales have been used by other investigators, e.g., Yang et al. (2006). Under the assumption of a normal distribution for the residual errors, the conditional probability density of y_j is

p(y_j \mid \mu, \alpha, \beta, \Sigma) = N\left(y_j \,\Big|\, \mu + \sum_{k=1}^{p} X_{jk}\alpha_k + \sum_{k=1}^{p} Z_{jk}\beta_k,\ \Sigma\right)   (12.3)

The likelihood function of the parameters is proportional to

L(\mu, \alpha, \beta, \Sigma) = \prod_{j=1}^{n} p(y_j \mid \mu, \alpha, \beta, \Sigma)   (12.4)

The parameter vector is \theta = \{\mu, \alpha, \beta, \Sigma, \lambda\}, where \lambda = \{\lambda_k\} is the array of QTL positions. The QTL genotype arrays, X = \{X_j\}_{j=1}^{n} and Z = \{Z_j\}_{j=1}^{n}, are
not parameters of interest but missing values in QTL mapping. They are
interesting quantities when marker assisted selection is considered after
QTL mapping. The likelihood function serves as a link between the data, the
parameters and the missing values. Combined with the prior distribution of
the parameters, the likelihood function is used to derive the posterior
distribution of the parameters. The number of QTL is p , which is supposed
to be a parameter of interest in the classical QTL mapping experiment, but
in the Bayesian shrinkage analysis, it is a preset constant. We set p as the
number of marker intervals. If an interval does not contain a QTL, the QTL
effects will be shrunken to zero. Therefore, a QTL with effect of zero is
equivalent to being excluded from the model. With this shrinkage analysis,
model selection is not conducted explicitly but implicitly via shrinkage.
Each of the parameters is assigned a prior distribution. The population mean can be estimated accurately from the data, and thus a flat prior is given to \mu, i.e., p(\mu) = \text{constant}. Each of the QTL effect vectors is assigned a normal prior, p(\alpha_k) = N(\alpha_k \mid 0, A_k) and p(\beta_k) = N(\beta_k \mid 0, B_k), where A_k and B_k are unknown variance-covariance matrices of dimension m \times m. The above notation for probability distributions is adopted from GELMAN (2005); these are equivalently expressed as \alpha_k \sim N(0, A_k) and \beta_k \sim N(0, B_k). The key difference between the shrinkage analysis and the usual Bayesian regression analysis is that these prior variance-covariance matrices are effect specific, i.e., they vary across loci. Another difference between the two is that the hyper-parameters (parameters of the priors), A_k and B_k, are not known a priori but estimated from the data. To do this, we give each of them a prior distribution. Once we assign a prior distribution to a hyper-parameter, there is a multilevel prior assignment. This is called hierarchical modeling (LINDLEY and SMITH 1972). We assign the variance-covariance matrices the following inverse Wishart distributions, p(A_k) = \text{Inv-Wishart}(A_k \mid \tau, \Gamma) and p(B_k) = \text{Inv-Wishart}(B_k \mid \tau, \Gamma), where \tau > m - 1 and |\Gamma| > 0 are the prior degrees of freedom and the prior scale matrix. These hyper-parameters are already remote from \alpha_k and \beta_k, and thus they can be preset with some convenient values (constant across loci) without affecting the posterior inference of the QTL effects. To reflect the lack of knowledge, \tau and \Gamma are set with values as small as possible, e.g., \tau = m and \Gamma = 0.1 I_m, where I_m is an m \times m identity matrix. The residual variance-covariance matrix is also assigned the same inverse Wishart distribution, p(\Sigma) = \text{Inv-Wishart}(\Sigma \mid \tau, \Gamma). Although \Sigma is a parameter of interest, the data are usually sufficient to provide an accurate estimate of \Sigma, and thus the hyper-parameters \tau and \Gamma will have little influence on the estimated \Sigma. Finally, a uniform prior distribution is chosen for \lambda_k. Since we assume that each marker interval contains one and only one QTL, the uniform distribution for \lambda_k is p(\lambda_k) = U(\lambda_k \mid \lambda_k^L, \lambda_k^R) = 1/(\lambda_k^R - \lambda_k^L). All these priors are independent across loci. Therefore, the joint prior distribution of the parameters is

p(\mu, \alpha, \beta, \Sigma, \lambda) = p(\mu)\,p(\Sigma)\prod_{k=1}^{p} p(\alpha_k)\,p(\beta_k)\,p(\lambda_k)   (12.5)

The distribution of the QTL genotype array is

p(X \mid \lambda) = \prod_{j=1}^{n} p(X_j \mid \lambda)   (12.6)
To simplify the sampling process, it is easier to sample one variable at a time conditional on the values of all other variables. A single variable here means a vector of variables of the same type; for example, \alpha_k is defined as one variable even though it is a vector containing the additive effects for all traits. The conditional posterior distribution of each variable usually has an explicit form, making Monte Carlo simulation easy. We now provide the posterior distribution for each of the parameters.

The conditional posterior distribution of \mu is multivariate normal with mean and variance given by

E(\mu \mid \cdots) = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - \sum_{k=1}^{p} X_{jk}\alpha_k - \sum_{k=1}^{p} Z_{jk}\beta_k\right)   (12.7)

and

\mathrm{var}(\mu \mid \cdots) = \frac{1}{n}\Sigma   (12.8)

respectively, where the special notation (\,\cdot \mid \cdots) means conditioning on all other variables.
The conditional posterior for \alpha_k is multivariate normal with the following mean and variance,

E(\alpha_k \mid \cdots) = \left(\sum_{j=1}^{n} X_{jk}^2\,\Sigma^{-1} + A_k^{-1}\right)^{-1}\Sigma^{-1}\sum_{j=1}^{n} X_{jk}\left(y_j - \mu - \sum_{k' \neq k}^{p} X_{jk'}\alpha_{k'} - \sum_{k'=1}^{p} Z_{jk'}\beta_{k'}\right)   (12.9)

and

\mathrm{var}(\alpha_k \mid \cdots) = \left(\sum_{j=1}^{n} X_{jk}^2\,\Sigma^{-1} + A_k^{-1}\right)^{-1}   (12.10)

Similarly, the conditional posterior for \beta_k is also multivariate normal with mean and variance of

E(\beta_k \mid \cdots) = \left(\sum_{j=1}^{n} Z_{jk}^2\,\Sigma^{-1} + B_k^{-1}\right)^{-1}\Sigma^{-1}\sum_{j=1}^{n} Z_{jk}\left(y_j - \mu - \sum_{k'=1}^{p} X_{jk'}\alpha_{k'} - \sum_{k' \neq k}^{p} Z_{jk'}\beta_{k'}\right)   (12.11)

and

\mathrm{var}(\beta_k \mid \cdots) = \left(\sum_{j=1}^{n} Z_{jk}^2\,\Sigma^{-1} + B_k^{-1}\right)^{-1}   (12.12)

respectively. The conditional posterior means of \alpha_k and \beta_k are called the shrinkage estimates. Derivation of the shrinkage estimates can be found in a recent note by Xu (2007a).
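For m = 2 traits the shrinkage mean (12.9) and variance (12.10) only involve 2-by-2 matrices, so the computation can be spelled out directly. A toy sketch with hypothetical inputs (here `Sigma_inv` and `Ak_inv` are the inverses of \Sigma and A_k, and `resid[j]` is y_j with the mean and all other model terms already removed):

```python
def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def alpha_posterior(x, resid, Sigma_inv, Ak_inv):
    """Posterior mean (12.9) and covariance (12.10) of alpha_k, m = 2 traits.
    x[j] is X_jk; resid[j] is y_j minus all other model terms."""
    sx2 = sum(xj * xj for xj in x)
    # Precision = sum_j X_jk^2 * Sigma^{-1} + A_k^{-1}
    prec = [[sx2 * Sigma_inv[i][l] + Ak_inv[i][l] for l in range(2)]
            for i in range(2)]
    cov = inv2(prec)                          # eq. (12.10)
    rhs = [0.0, 0.0]
    for xj, rj in zip(x, resid):
        srj = matvec(Sigma_inv, rj)
        rhs[0] += xj * srj[0]
        rhs[1] += xj * srj[1]
    mean = matvec(cov, rhs)                   # eq. (12.9)
    return mean, cov

x = [1, -1, 0, 1]
resid = [[1.0, 0.5], [-1.1, -0.4], [0.1, 0.0], [0.9, 0.6]]
Sigma_inv = [[1.0, 0.0], [0.0, 1.0]]   # identity residual covariance (toy)
Ak_inv = [[0.1, 0.0], [0.0, 0.1]]      # weak prior precision (toy)
mean, cov = alpha_posterior(x, resid, Sigma_inv, Ak_inv)
```

The prior precision A_k^{-1} is what shrinks the posterior mean toward zero when the data provide little information about locus k.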
The hierarchical model also requires sampling of A_k and B_k from their conditional posterior distributions. The inverse Wishart prior is conjugate, and thus the conditional posteriors of A_k and B_k are also inverse Wishart,

p(A_k \mid \cdots) = \text{Inv-Wishart}(A_k \mid \tau + 1, \Gamma + \alpha_k\alpha_k^T)   (12.13)

and

p(B_k \mid \cdots) = \text{Inv-Wishart}(B_k \mid \tau + 1, \Gamma + \beta_k\beta_k^T)   (12.14)

The conditional posterior for the residual variance-covariance matrix is inverse Wishart due to the conjugate nature of the prior,

p(\Sigma \mid \cdots) = \text{Inv-Wishart}(\Sigma \mid n + \tau, \Gamma + SS)   (12.15)

where

SS = \sum_{j=1}^{n}\left(y_j - \mu - \sum_{k=1}^{p} X_{jk}\alpha_k - \sum_{k=1}^{p} Z_{jk}\beta_k\right)\left(y_j - \mu - \sum_{k=1}^{p} X_{jk}\alpha_k - \sum_{k=1}^{p} Z_{jk}\beta_k\right)^T   (12.16)
The distribution of X_{jk} is discrete, and thus its conditional posterior distribution can be obtained from Bayes' theorem. Let

g = [g_1\ g_2\ g_3]^T = [+1\ 0\ -1]^T

be the three genotype codes for variable X_{jk} and

h = [h_1\ h_2\ h_3]^T = [0\ 1\ 0]^T

be the three genotype codes for variable Z_{jk}. Assume that m^L = g_u (u = 1, 2, 3) and m^R = g_v (v = 1, 2, 3) are the observed genotypes of the two flanking markers. The conditional posterior probability of X_{jk} = g_w (w = 1, 2, 3) is calculated using the following Bayes' theorem,

p(X_{jk} = g_w \mid \cdots) = \frac{p(X_{jk} = g_w)\,H_{km^L}(w, u)\,H_{km^R}(w, v)\,N(y_j^* \mid \mu + g_w\alpha_k + h_w\beta_k, \Sigma)}{\sum_{w'=1}^{3} p(X_{jk} = g_{w'})\,H_{km^L}(w', u)\,H_{km^R}(w', v)\,N(y_j^* \mid \mu + g_{w'}\alpha_k + h_{w'}\beta_k, \Sigma)}   (12.17)

where

p(X_{jk} = g_1) = p(X_{jk} = g_3) = \tfrac{1}{2}\,p(X_{jk} = g_2) = \tfrac{1}{4}

is the Mendelian segregation ratio. The other terms in the above Bayes' theorem are defined as follows:

y_j^* = y_j - \sum_{k' \neq k}^{p} X_{jk'}\alpha_{k'} - \sum_{k' \neq k}^{p} Z_{jk'}\beta_{k'}   (12.18)

is the adjusted phenotypic value of individual j obtained by removing the effects of all QTL other than k; H_{km^L} is the transition matrix between QTL k and the left flanking marker m^L, and H_{km^R} is the transition matrix between QTL k and the right flanking marker m^R.
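The genotype update (12.17) is a normalized product of the Mendelian prior, the two marker transition probabilities and the phenotype likelihood. A single-trait toy sketch (hypothetical transition rows and effect values; the population mean is absorbed into the adjusted phenotype here for simplicity):

```python
from math import exp, pi, sqrt

def genotype_posterior(prior, hL, hR, y_star, alpha, beta, sigma2):
    """Posterior over the three genotypes, single-trait analogue of (12.17).
    prior: Mendelian probabilities; hL, hR: transition probabilities to the
    observed left/right marker genotypes; y_star: adjusted phenotype."""
    g = [1.0, 0.0, -1.0]          # additive codes
    h = [0.0, 1.0, 0.0]           # dominance codes
    def dens(w):
        m = g[w] * alpha + h[w] * beta
        return exp(-0.5 * (y_star - m) ** 2 / sigma2) / sqrt(2 * pi * sigma2)
    weights = [prior[w] * hL[w] * hR[w] * dens(w) for w in range(3)]
    total = sum(weights)
    return [wt / total for wt in weights]

post = genotype_posterior(prior=[0.25, 0.5, 0.25],
                          hL=[0.9, 0.08, 0.02],   # made-up transition rows
                          hR=[0.85, 0.1, 0.05],
                          y_star=0.9, alpha=1.0, beta=0.2, sigma2=1.0)
```

With both flanking markers pointing strongly at the first genotype and a phenotype near +alpha, the posterior concentrates on A1A1, as expected.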
The conditional posterior distribution of the position of QTL k, p(\lambda_k \mid \cdots), has no explicit form due to the complexity of the model. Therefore, \lambda_k must be sampled using a Metropolis-Hastings algorithm (HASTINGS 1970; METROPOLIS et al. 1953). The algorithm presented by Wang et al. (2005b) for univariate QTL mapping can be directly adopted here for multivariate QTL mapping.
Finally, when a marker genotype is missing, it must be sampled from its
conditional posterior distribution.
The MCMC sampling process is summarized as follows:
1. Initialize all variables, including parameters and missing values, with some values in their legal domains;
2. Sample \mu from its conditional posterior distribution (multivariate normal);
3. Sample \alpha_k and \beta_k from their conditional posterior distributions (multivariate normal);
4. Sample A_k and B_k from their conditional posterior distributions (inverse Wishart);
5. Sample \Sigma from its conditional posterior distribution (inverse Wishart);
6. Sample the QTL genotypes from their conditional posterior distributions (derived from Bayes' theorem);
7. Sample the genotypes of missing markers from their conditional posterior distributions (derived from Bayes' theorem);
8. Sample the QTL positions from their conditional posterior distributions using the Metropolis-Hastings algorithm;
9. Repeat steps (2)-(8) until the Markov chain is sufficiently long.
Steps (2)-(7) are called the Gibbs sampler steps, while step (8) is called the M-H step.
How long is sufficiently long for the Markov chain? Users can use the algorithm of Gelman et al. (2005) to check the convergence of the chain. The product of MCMC sampling is a realized sample of all unknown variables from the joint posterior distribution. The MCMC does not result in a significance test but serves as a process of creating the empirical posterior distributions of the parameters, from which all the information about the QTL is inferred. The most important parameters are the locations and the effects of the QTL, while the covariance matrices are not of immediate interest but assist in the estimation of the effects. In the conventional Bayesian mapping analysis (SILLANPÄÄ and ARJAS 1998; XU 2002), the marginal posterior distribution of QTL position was graphically summarized by plotting the number of hits by a QTL in a short region against the location where that short region occurs in the genome. The curve produced is called the QTL intensity profile. In this study, we assume that each marker interval is associated with a QTL, and thus all intervals are hit by a QTL the same number of times. If an interval contains a real QTL, the QTL intensity profile within the interval is expected to show a peak. Otherwise, the intensity profile will be flat (uniform). Such a QTL intensity profile is denoted by f(\lambda), where \lambda now denotes a particular location in the genome.
The QTL intensity profile itself is not the best indicator of QTL location under the Bayesian shrinkage analysis. We propose to weight the intensity profile by the QTL effects and use the weighted QTL intensity profile to indicate the locations of the QTL. The majority of the genome segments have negligible QTL effects, and thus only the areas with nontrivial QTL effects will show clear peaks. Let \alpha(\lambda) and \beta(\lambda) be m \times 1 vectors of additive and dominance effects, respectively, of the QTL collected at position \lambda of the genome. There are many ways to present the QTL effects as functions of genome location. We choose the following profile to present the QTL effects,

g(\lambda) = \frac{1}{2}\alpha^T(\lambda)\alpha(\lambda) + \frac{1}{4}\beta^T(\lambda)\beta(\lambda)   (12.19)

The coefficients 1/2 and 1/4 in front of the quadratic terms are the expected variances of X_{jk} and Z_{jk} across individuals within the F_2 population (assuming no segregation distortion). This QTL effect profile g(\lambda) reduces to g(\lambda) = \frac{1}{2}\alpha^2(\lambda) + \frac{1}{4}\beta^2(\lambda) in the special case of single-trait analysis, which is the QTL variance at location \lambda. If desired, one can also draw a QTL effect profile for each trait or for each effect of a trait (additive or dominance). The QTL effect profile presented here is the overall effect across the entire genome.

The weighted QTL intensity profile is defined as

w(\lambda) = f(\lambda)\,g(\lambda)   (12.20)

The intensity profile f(\lambda) does not tell much about QTL across marker intervals because each interval is hit by the same number of QTL, but if an interval contains a QTL, f(\lambda) is able to show a peak within that interval. The QTL effect profile g(\lambda), on the other hand, can pick up the intervals with large-effect QTL, but it is not sensitive to the change of location within an interval. Therefore, the weighted intensity profile w(\lambda) can pick up the intervals with QTL and also show sharp peaks within intervals.
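Computing the profiles (12.19)-(12.20) from the posterior sample is straightforward arithmetic; a toy single-position sketch in Python (hypothetical effect values):

```python
def effect_profile(alpha, beta):
    """QTL effect profile g(lambda), eq. (12.19): (1/2) a'a + (1/4) b'b."""
    return 0.5 * sum(a * a for a in alpha) + 0.25 * sum(b * b for b in beta)

def weighted_intensity(f, alpha, beta):
    """Weighted QTL intensity w(lambda) = f(lambda) * g(lambda), eq. (12.20)."""
    return f * effect_profile(alpha, beta)

# Two traits; a position with sizable additive effects versus a null position.
g_qtl = effect_profile([1.2, 0.8], [0.4, 0.1])
g_null = effect_profile([0.01, 0.02], [0.0, 0.01])
w_qtl = weighted_intensity(2.5, [1.2, 0.8], [0.4, 0.1])
```

The null position contributes essentially nothing to the weighted profile, which is exactly why the weighting suppresses the flat background intensity.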
In practice, not all traits are measured on the same scale. The profile of the overall QTL effect may then be dominated by the traits with large variances. Two approaches may be taken to eliminate this problem. One is to standardize all traits before the analysis so that they all have roughly the same variance (XU 2002). Alternatively, g(\lambda) may be modified as

g(\lambda) = \frac{1}{2}\alpha^T(\lambda)\Sigma^{-1}\alpha(\lambda) + \frac{1}{4}\beta^T(\lambda)\Sigma^{-1}\beta(\lambda)   (12.21)

where \Sigma is the residual covariance matrix.
Pleiotropic effects can be visualized by comparing the weighted QTL intensity profiles of the individual traits. Let \alpha(\lambda) = [\alpha_1(\lambda) \ldots \alpha_m(\lambda)]^T be the additive effects of the QTL at location \lambda, where \alpha_i(\lambda) is the effect on the ith trait (i = 1, \ldots, m). A pleiotropic effect occurs at position \lambda if more than one component of \alpha(\lambda) is noticeably large.
MULTIPLE BINARY TRAITS
With little effort, the method can be extended to handle binary traits. A binary trait is a categorical trait with two states: presence and absence. Recall that y_j = [y_j1 ... y_jm]^T is the vector of phenotypic values for the m quantitative traits. If the i-th trait is binary, its phenotype is denoted by w_ji ∈ {0, 1}, with 0 representing absence and 1 representing presence. Under the threshold model for a binary trait (XU et al. 2005a), trait i is still treated as a quantitative trait, but y_ji cannot be observed. This latent quantitative trait, however, determines the observed binary phenotype. We postulate a hypothetical threshold t_i = 0 so that w_ji = 0 for y_ji ≤ t_i and w_ji = 1 for y_ji > t_i. The latent variable is still described by the usual linear model with a normal residual error, except that the residual error variance is set to 1 because it is not estimable. Under this threshold model, we can derive the conditional posterior distribution of y_ji given w_ji, the phenotypic values of all other traits, and the current parameter values. This conditional posterior distribution happens to be a truncated normal distribution, from which y_ji is sampled. A detailed algorithm for sampling y_ji has been given by Xu et al. (2005a). Korsgaard et al. (2003) provided a general method for sampling the liability for ordered categorical traits, although their method was developed for classical quantitative trait analysis rather than for QTL mapping. Once y_ji is sampled, y_j becomes a full vector of quantitative trait values, and the MCMC sampling schemes described earlier apply. Therefore, mapping multiple traits with one or more binary trait components requires only one extra step: sampling the missing phenotype of the underlying quantitative trait.
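The truncated-normal sampling step can be illustrated with a minimal sketch. The liability mean mu (which would come from the current linear model) and the random seed are hypothetical, and a simple rejection sampler stands in for the more efficient samplers used in practice:

```python
import numpy as np

def sample_liability(w, mu, rng, t=0.0):
    """Draw the latent liability y_ji given the binary phenotype w_ji.
    Under the threshold model, y ~ N(mu, 1) truncated to y <= t when
    w == 0 and to y > t when w == 1 (residual variance fixed at 1)."""
    while True:
        y = rng.normal(mu, 1.0)  # propose from the untruncated normal
        if (w == 0 and y <= t) or (w == 1 and y > t):
            return y             # keep only draws on the correct side of t

rng = np.random.default_rng(2010)
# liabilities for individuals scored "present" (w = 1) all exceed t = 0
ys1 = [sample_liability(1, mu=0.3, rng=rng) for _ in range(500)]
# liabilities for individuals scored "absent" (w = 0) never exceed t = 0
ys0 = [sample_liability(0, mu=0.3, rng=rng) for _ in range(500)]
```

Rejection sampling is adequate when mu is close to the threshold; an inverse-CDF truncated-normal sampler is preferable when the truncated region has small probability.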
MIXTURE OF CONTINUOUS AND BINARY TRAITS
PROC QTL can also perform QTL mapping for a trait set consisting of
multiple continuous and binary traits. The key here is to sample the latent
liability for the binary trait components. Recall that
1
[ ... ]
T
j j jm
y y y = is a
vector of continuous trait phenotypes. If a trait is binary, say the i th trait,
the corresponding
ji
y is missing and it must be sampled from a truncated
normal distribution. This truncated normal distribution is conditional on all
other components of
j
y except
ji
y . This conditional distribution is truncated
normal with the direction of truncation determined by the binary phenotype
of the trait of concern.
MISSING VALUES
Missing values of a phenotype are sampled using the same approach as sampling the liability, except that a missing continuous phenotype is sampled from the conditional normal distribution. If the missing trait is binary, it is sampled from the truncated conditional normal distribution.
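The conditional normal distribution used here follows from the standard partitioning formulas for a multivariate normal. The sketch below, with hypothetical values, computes the conditional mean and variance of one trait given the others:

```python
import numpy as np

def conditional_normal(mu, Sigma, i, y):
    """Conditional mean and variance of component i of a multivariate
    normal N(mu, Sigma), given the observed values of all other
    components in y (entry i of y is ignored)."""
    idx = [k for k in range(len(mu)) if k != i]
    S_oo = Sigma[np.ix_(idx, idx)]     # covariance among observed traits
    S_io = Sigma[i, idx]               # cross-covariance with trait i
    resid = y[idx] - mu[idx]
    cmean = mu[i] + S_io @ np.linalg.solve(S_oo, resid)
    cvar = Sigma[i, i] - S_io @ np.linalg.solve(S_oo, S_io)
    return cmean, cvar

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
# trait 0 is missing for this individual; trait 1 was observed as 2.0
m, v = conditional_normal(mu, Sigma, 0, np.array([np.nan, 2.0]))
```

Drawing from N(m, v) fills in a missing continuous phenotype; truncating the same distribution at the threshold handles a binary component.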
REFERENCES
BROOKS, S., and A. GELMAN, 1998 General methods for monitoring convergence of
iterative simulations. J. Comput. Graph. Stat. 7: 434-455.
CASELLA, G., and E. I. GEORGE, 1992 Explaining the Gibbs sampler. Am. Stat. 46: 167-
174.
CHE, X., and S. XU, 2010 Significance test and genome selection in Bayesian shrinkage
analysis. Int. J. Plant Genomics (in press).
CHURCHILL, G. A., and R. W. DOERGE, 1994 Empirical threshold values for quantitative
trait mapping. Genetics 138: 963-971.
DEMPSTER, A. P., N. M. LAIRD and D. B. RUBIN, 1977 Maximum likelihood from
incomplete data via the EM algorithm. J. Roy. Stat. Soc. B 39: 1-38.
ELSTON, R. C., and J. STEWART, 1973 The analysis of quantitative traits for simple
genetic models from parental, F1 and backcross data. Genetics 73: 695-711.
FARIS, J. D., B. LADDOMADA and B. S. GILL, 1998 Molecular mapping of segregation
distortion loci in Aegilops tauschii. Genetics 149: 319-327.
FEENSTRA, B., I. M. SKOVGAARD and K. W. BROMAN, 2006 Mapping quantitative trait loci
by an extension of the Haley-Knott regression method using estimating
equations. Genetics 173: 2269-2282.
FIGUEIREDO, M., 2003 Adaptive sparseness for supervised learning. IEEE T. Pattern.
Anal. 25: 1150-1159.
FU, Y. B., and K. RITLAND, 1994 On estimating the linkage of marker genes to viability
genes-controlling inbreeding depression. Theor. Appl. Genet. 88: 925-932.
GELMAN, A., 2005 Analysis of variance: why it is more important than ever. Ann. Stat. 33:
1-31.
GELMAN, A., 2006 Prior distributions for variance parameters in hierarchical models
(Comment on Article by Browne and Draper). Bayesian Anal. 1: 515-534.
GELMAN, A., and D. RUBIN, 1992 Inference from iterative simulation using multiple
sequences. Stat. Sci. 7: 457-472.
GEMAN, S., and D. GEMAN, 1984 Stochastic relaxation, Gibbs distributions, and the
Bayesian restoration of images. IEEE T. Pattern. Anal. 6: 721-741.
GEWEKE, J., 1992 Evaluating the accuracy of sampling-based approaches to the
calculation of posterior moments, in Bayesian Statistics 4, edited by J. M.
BERNARDO, J. O. BERGER, A. P. DAWID and A. F. M. SMITH. Oxford Univ. Press, Oxford.
HALDANE, J. B. S., 1919 The combination of linkage values and the calculation of
distances between the loci of linked factors. J. Genet. 8: 299-309.
HALEY, C. S., and S. A. KNOTT, 1992 A simple regression method for mapping
quantitative trait loci in line crosses using flanking markers. Heredity 69: 315-324.
HAN, L., and S. XU, 2008 A Fisher scoring algorithm for the weighted regression method
of QTL mapping. Heredity 101: 453-464.
HANSEN, M. M., E. E. NIELSEN and K.-L. D. MENSBERG, 1997 The problem of sampling
families rather than populations: relatedness among individuals in samples of
juvenile brown trout Salmo trutta L. Mol. Ecol. 6: 469-474.
HASTINGS, W. K., 1970 Monte Carlo sampling methods using Markov chains and their
applications. Biometrika 57: 97-109.
HOERL, A. E., and R. W. KENNARD, 2000 Ridge regression: biased estimation for
nonorthogonal problems. Technometrics 42: 80-86.
JIANG, C., and Z. B. ZENG, 1997 Mapping quantitative trait loci with dominant and
missing markers in various crosses from two inbred lines. Genetica 101: 47-58.
KAO, C. H., 2000 On the differences between maximum likelihood and regression
interval mapping in the analysis of quantitative trait loci. Genetics 156: 855-865.
KAO, C. H., Z. B. ZENG and R. D. TEASDALE, 1999 Multiple interval mapping for
quantitative trait loci. Genetics 152: 1203-1216.
KÄRKKÄINEN, K., V. KOSKI and O. SAVOLAINEN, 1996 Geographical variation in the
inbreeding depression of Scots pine. Evolution 50: 111-119.
KNOTT, S. A., and C. S. HALEY, 2000 Multitrait least squares for quantitative trait loci
detection. Genetics 156: 899-911.
KORSGAARD, I., M. LUND, D. SORENSEN, D. GIANOLA, P. MADSEN et al., 2003 Multivariate
Bayesian analysis of Gaussian, right censored Gaussian, ordered categorical
and binary traits using Gibbs sampling. Genet. Sel. Evol. 35: 159 - 183.
KOSAMBI, D. D., 1944 The estimation of map distances from recombination values. Ann.
Eugenics 12: 172-175.
LANDER, E. S., and D. BOTSTEIN, 1989 Mapping mendelian factors underlying
quantitative traits using RFLP linkage maps. Genetics 121: 185-199.
LANDER, E. S., P. GREEN, J. ABRAHAMSON, A. BARLOW, M. J. DALY et al., 1987
MAPMAKER: an interactive computer package for constructing primary genetic
linkage maps of experimental and natural populations. Genomics 1: 174-181.
LANDER, E. S., and N. J. SCHORK, 2006 Genetic dissection of complex traits. Focus 4:
442-458.
LINDLEY, D., and A. SMITH, 1972 Bayes estimates for the linear model. J. Roy. Stat. Soc.
B 34: 1-41.
LORIEUX, M., B. GOFFINET, X. PERRIER, D. GONZÁLEZ DE LEÓN and C. LANAUD, 1995a
Maximum-likelihood models for mapping genetic markers showing segregation
distortion. 1. Backcross populations. Theor. Appl. Genet. 90: 73-80.
LORIEUX, M., X. PERRIER, B. GOFFINET, C. LANAUD and D. GONZÁLEZ DE LEÓN, 1995b
Maximum-likelihood models for mapping genetic markers showing segregation
distortion. 2. F2 populations. Theor. Appl. Genet. 90: 81-89.
LOUIS, T., 1982 Finding the observed information matrix when using the EM algorithm. J.
Roy. Stat. Soc. B 44: 226-233.
LUO, L., and S. XU, 2003 Mapping viability loci using molecular markers. Heredity 90:
459-467.
LUO, L., Y. M. ZHANG and S. XU, 2005 A quantitative genetics model for viability
selection. Heredity 94: 347-355.
MCCULLAGH, P., and J. A. NELDER, 1989 Generalized linear models. Chapman and
Hall, London.
METROPOLIS, N., A. W. ROSENBLUTH, M. N. ROSENBLUTH, A. H. TELLER and E. TELLER,
1953 Equation of state calculations by fast computing machines. J. Chem. Phys.
21: 1087-1092.
NELDER, J. A., and R. MEAD, 1965 A simplex method for function minimization. Comput.
J. 7: 308-313.
NELDER, J. A., and R. W. M. WEDDERBURN, 1972 Generalized linear models. J. Roy.
Stat. Soc. A Sta. 135: 370-384.
PIEPHO, H. P., 2001 A quick method for computing approximate thresholds for
quantitative trait loci detection. Genetics 157: 425-432.
PRITCHARD, J. K., M. STEPHENS, N. A. ROSENBERG and P. DONNELLY, 2000 Association
mapping in structured populations. Am. J. Hum. Genet. 67: 170-181.
RAFTERY, A. E., and S. M. LEWIS, 1992 How many iterations in the Gibbs sampler?
Bayesian Stat. 4: 763-773.
RAO, S., and S. XU, 1998 Mapping quantitative trait loci for ordered categorical traits in
four-way crosses. Heredity 81: 214-224.
SILLANPÄÄ, M. J., and E. ARJAS, 1998 Bayesian mapping of multiple quantitative trait loci
from incomplete inbred line cross data. Genetics 148: 1373-1388.
SOLLER, M., T. BRODY and A. GENIZI, 1976 On the power of experimental designs for the
detection of linkage between marker loci and quantitative loci in crosses between
inbred lines. Theor. Appl. Genet. 47: 35-39.
TANKSLEY, S. D., 1993 Mapping polygenes. Annu. Rev. Genet. 27: 205-233.
TER BRAAK, C. J. F., M. P. BOER and M. C. A. M. BINK, 2005 Extending Xu's Bayesian
model for estimating polygenic effects using markers of the entire genome.
Genetics 170: 1435-1438.
TIERNEY, L., 1994 Markov chains for exploring posterior distributions. Ann. Stat. 22:
1701-1728.
TURNPENNY, P., and S. ELLARD, 2005 Emery's elements of medical genetics. Elsevier,
Churchill Livingstone.
TYCHONOFF, A. N., 1943 On the stability of inverse problems. Dok. Akad. Nauk SSSR
39: 195-198.
VOGL, C., and S. Z. XU, 2000 Multipoint mapping of viability and segregation distorting
loci using molecular markers. Genetics 155: 1439-1447.
WALD, A., 1957 Tests of statistical hypotheses concerning several parameters when the
number of observations is large. Selected papers in statistics and probability 54:
323.
WANG, C., C. ZHU, H. ZHAI and J. WAN, 2005a Mapping segregation distortion loci and
quantitative trait loci for spikelet sterility in rice (Oryza sativa L.). Genet. Res. 86:
97-106.
WANG, H., Y. ZHANG, X. LI, G. L. MASINDE, S. MOHAN et al., 2005b Bayesian shrinkage
estimation of quantitative trait loci parameters. Genetics 170: 465-480.
WEDDERBURN, R. W. M., 1974 Generalized linear models specified in terms of
constraints. J. Roy. Stat. Soc. B 36: 449-454.
WILKS, S. S., 1938 The large-sample distribution of the likelihood ratio for testing
composite hypotheses. Ann. Math. Stat. 9: 60-62.
WOLFINGER, R., and M. O'CONNELL, 1993 Generalized linear mixed models: a pseudo-
likelihood approach. J. Stat. Comput. Sim. 48: 233-243.
WRIGHT, S., 1934 An analysis of variability in number of digits in an inbred strain of
guinea pigs. Genetics 19: 506-536.
XU, C., Z. LI and S. XU, 2005a Joint mapping of quantitative trait loci for multiple binary
characters. Genetics 169: 1045-1059.
XU, C., Y. ZHANG and S. XU, 2005b An EM algorithm for mapping quantitative resistance
loci. Heredity 94: 119-128.
XU, S., 1995 A comment on the simple regression method for interval mapping.
Genetics 141: 1657-1659.
XU, S., 1996 Mapping quantitative trait loci using four-way crosses. Genet. Res. 68: 175-
181.
XU, S., 1998a Further investigation on the regression method of mapping quantitative
trait loci. Heredity 80: 364-373.
XU, S., 1998b Iteratively reweighted least squares mapping of quantitative trait loci.
Behav. Genet. 28: 341-355.
XU, S., 2002 QTL analysis in plants, pp. 283-310 in Quantitative Trait Loci: Methods and
Protocols, edited by N. J. CAMP and A. COX. Humana Press, Totowa, NJ.
XU, S., 2003 Estimating polygenic effects using markers of the entire genome. Genetics
163: 789-801.
XU, S., 2007a Derivation of the shrinkage estimates of quantitative trait locus effects.
Genetics 177: 1255-1258.
XU, S., 2007b An empirical Bayes method for estimating epistatic effects of quantitative
trait loci. Biometrics 63: 513-521.
XU, S., 2008 Quantitative trait locus mapping can benefit from segregation distortion.
Genetics 180: 2201-2208.
XU, S., and W. R. ATCHLEY, 1996 Mapping quantitative trait loci for complex binary
diseases using line crosses. Genetics 143: 1417-1424.
XU, S., and Z. HU, 2010a Generalized linear model for interval mapping of quantitative
trait loci. Theor. Appl. Genet. doi: 10.1007/s00122-010-1290-0.
XU, S., and Z. HU, 2010b Mapping quantitative trait loci using distorted markers. Int. J.
Plant Genomics 2009: 11 doi:10.1155/2009/410825.
YANG, R., Q. TIAN and S. XU, 2006 Mapping quantitative trait loci for longitudinal traits in
line crosses. Genetics 173: 2339-2356.
YI, N., and S. XU, 2002 Linkage analysis of quantitative trait loci in multiple line crosses.
Genetica 114: 217-230.
YI, N., S. XU and D. B. ALLISON, 2003 Bayesian model choice and search strategies for
mapping interacting quantitative trait Loci. Genetics 165: 867-883.
ZHU, C., and Y. M. ZHANG, 2007 An EM algorithm for mapping segregation distortion
loci. BMC Genet. 8: 82.
