QTL Mapping
Violeta I. Bartolome
Senior Associate Scientist-Biometrics
Crop Research Informatics Laboratory
International Rice Research Institute

Quantitative Traits
• Vary continuously (e.g., yield, quality, stress tolerance)
• Usually governed by a number of genes
• Loci involved in the inheritance of quantitative traits are called QTL (quantitative trait loci)
Mapping Populations
QTL Mapping
The objective is to identify QTLs that affect the quantitative trait of interest.
Data Needed for QTL Mapping
• A trait value assigned to each mapping population member
• Allele scores for a set of marker loci distributed throughout the genome

Methods to Detect QTLs
• Single marker analysis
• Interval mapping
• Composite interval mapping
• Multiple QTL mapping
Single Marker Analysis (SMA)

Model for SMA
$$y = \mu + MG_i + e$$
where y = the phenotype
µ = the overall mean
MG_i = the effect of the i-th marker genotype (MG = marker genotype)
e = the random error
Single Marker Analysis
• A significant difference between the phenotypic means of the groups indicates that the marker locus used to partition the mapping population is linked to a QTL controlling the trait.
• The QTL and the marker are usually inherited together, so the mean of the group carrying the tightly linked marker allele will be significantly different from the mean of the group without it.
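To make the single-marker logic concrete, here is a minimal base-R sketch (no R/qtl calls) of the model y = µ + MG_i + e for a simulated backcross. The sample size, the QTL effect, and the recombination fraction of 0.05 between marker and QTL are assumptions chosen for illustration. The LOD line uses the relation LOD = (n/2) log10(RSS_null / RSS_marker) for a normal model.

```r
# Single marker analysis sketched as a one-way ANOVA in base R.
# Hypothetical backcross: 200 individuals, marker at r = 0.05 from a QTL
# with an additive effect of 1 phenotypic unit (all values are assumptions).
set.seed(42)
n <- 200
r <- 0.05
qtl <- rbinom(n, 1, 0.5)                      # QTL genotype (0/1) in a backcross
crossover <- rbinom(n, 1, r)                  # 1 if a crossover separates marker and QTL
marker <- ifelse(crossover == 1, 1 - qtl, qtl)
y <- 10 + 1 * qtl + rnorm(n)                  # phenotype = mu + QTL effect + error

# y = mu + MG_i + e: marker genotype class fitted as a factor
fit <- lm(y ~ factor(marker))
p_value <- anova(fit)[["Pr(>F)"]][1]
group_means <- tapply(y, marker, mean)

# Equivalent LOD score from the residual sums of squares
rss_null <- sum((y - mean(y))^2)
rss_marker <- sum(resid(fit)^2)
lod <- n / 2 * log10(rss_null / rss_marker)

print(group_means)
print(c(p_value = p_value, lod = lod))
```

In a backcross the expected difference between the marker classes is (1 - 2r) times the QTL effect, so increasing r in the sketch shrinks the difference. This is why a QTL far from the marker is harder to detect and its effect is underestimated.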

• Advantages
 o Simple
 o Easily incorporates covariates
 o Does not require a complete genetic map
• Disadvantages
 o Must exclude individuals with missing genotype data
 o Less precise about the location of the QTL
 o The farther a QTL is from a marker, the less likely it is to be detected, so the QTL effect may be underestimated
 o Only considers one QTL at a time

R/qtl
Data Entry, Data Quality Check, and Single Marker Analysis
Data analyzable by R/qtl
• F2
• Backcross
• RILs, after setting the cross class:
 – class(mydata)[1] <- "riself"  # if by selfing
 – class(mydata)[1] <- "risib"   # if by sibling mating
Data not analyzable by R/qtl
• Outcross data
• Half-sib families
• Advanced intercross lines
Sample Data
Input files
• csv format: text file (comma delimited)
• Mapmaker format
• QTL Cartographer format

Sample Data
• Mapmaker format – genotype data
• Mapmaker format – phenotype data
• QTL Cartographer format – Rcross data
• QTL Cartographer format – Rmap data
Reading cross data
library(qtl)
• csv file
 mydata <- read.cross("csv", file="csvfile.csv", genotypes=c("A","H","B","D","C"))
• Mapmaker-style data with genotypes and phenotypes in separate csv files
 mydata <- read.cross("csvs", genfile="mapmaker_gen.csv", phefile="mapmaker_phe.csv")
• QTL Cartographer
 mydata <- read.cross("qtlcart", file="qtlcart.cro", mapfile="qtlcart.map")
Data Quality Check
plot.missing()
• Plots a grid showing which genotypes are missing
Note: Genotypes with missing data are denoted by black pixels.
Drop markers deviating from the hypothesized ratio using the following statement:
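The check behind dropping such markers is a goodness-of-fit test of each marker's genotype counts against the expected ratio. The following is a base-R sketch for a backcross (expected 1:1). It is not the R/qtl statement itself. The marker names, the simulated data, and the Bonferroni cut-off are assumptions. Within R/qtl, geno.table() reports per-marker segregation P-values and drop.markers() removes markers.

```r
# Segregation-distortion check for backcross markers (expected 1:1 ratio).
# Simulated data: m1 and m2 segregate normally, m3 is distorted (assumption).
set.seed(9)
n <- 120
geno <- cbind(m1 = rbinom(n, 1, 0.5),
              m2 = rbinom(n, 1, 0.5),
              m3 = rbinom(n, 1, 0.8))
geno[sample(length(geno), 15)] <- NA          # some missing genotypes

seg_p <- apply(geno, 2, function(g) {
  counts <- as.vector(table(factor(g, levels = 0:1)))   # NAs are not counted
  chisq.test(counts, p = c(0.5, 0.5))$p.value
})

alpha <- 0.05 / ncol(geno)                    # Bonferroni-adjusted cut-off (assumption)
geno_clean <- geno[, seg_p >= alpha, drop = FALSE]
print(round(seg_p, 4))
print(colnames(geno_clean))
```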
plot.map()
• Plots the genetic map of marker locations for all chromosomes
plot.pheno()
• Plots a histogram or barplot of the data for a phenotype from an experimental cross
Note: pheno.col indicates the column number of the data to be plotted.
plot()
• Plots all graphs together
est.rf()
• Estimates the sex-averaged recombination fraction between all pairs of genetic markers
• For a backcross, one can simply count recombination events. For an intercross or 4-way cross, recombination fractions must be estimated.
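Counting works for a backcross because each recombinant individual is visible directly from its two marker genotypes. This is a minimal base-R sketch that assumes genotypes are coded 0/1 and missing values are NA. It illustrates the counting idea and is not how est.rf() is implemented. The LOD compares the estimated r with free recombination (r = 0.5).

```r
# Recombination fraction between two backcross markers by counting recombinants,
# with the LOD score for linkage (estimated r versus free recombination, r = 0.5).
xlog10y <- function(x, y) if (x == 0) 0 else x * log10(y)  # treats 0 * log(0) as 0

rf_backcross <- function(m1, m2) {
  ok <- !is.na(m1) & !is.na(m2)              # use individuals typed at both markers
  n <- sum(ok)
  k <- sum(m1[ok] != m2[ok])                 # number of recombinant individuals
  r <- k / n
  loglik_r <- xlog10y(k, r) + xlog10y(n - k, 1 - r)
  loglik_free <- n * log10(0.5)
  c(rf = r, lod = loglik_r - loglik_free)
}

m1 <- c(0, 0, 1, 1, 0, 1, 0, 1, 1, 0, NA)
m2 <- c(0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1)
res <- rf_backcross(m1, m2)
print(res)   # rf = 0.2 (2 recombinants out of 10 typed individuals), lod about 0.837
```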
plot.rf()
• Plots a grid showing the recombination fractions for all pairs of markers, and/or the LOD scores for tests of linkage between pairs of markers
• If both are plotted, the recombination fractions are in the upper left triangle and the LOD scores are in the lower right triangle. Red corresponds to a large LOD or a small recombination fraction, while blue is the reverse. Missing values appear in light gray.
Figures: plot both rf and LOD; plot rf and LOD for Chr 1 only; plot LOD only for Chr 2 and 3
scanone()
• Genome scan with a single QTL model, with possible allowance for covariates, using any of several possible models for the phenotype and any of several possible numerical methods
scanone(cross, chr, pheno.col=1, model=c("normal","binary","2part","np"), method=c("em","imp","hk","ehk","mr","mr-imp","mr-argmax"), addcovar=NULL, n.perm)
 o cross – object to be analyzed
 o chr – optional vector indicating the chromosomes for which LOD scores should be calculated
 o pheno.col – column number of the phenotype data
 o addcovar – additive covariates, allowed only for the normal and binary models
 o n.perm – the number of permutations
model=
• normal – the standard model for QTL mapping. The residual phenotypic variation is assumed to follow a normal distribution.
• binary – for a binary phenotype, which must have values 0 and 1. Available for the em and mr methods only.
• 2part – when there is a spike in the phenotype distribution
• np (non-parametric) – an extension of the Kruskal-Wallis test is used
method=
• mr – single marker regression
 o mr – deletes individuals with missing genotypes
 o mr-imp – fills in missing data using single imputation
 o mr-argmax – fills in missing data using the Viterbi algorithm
• em – maximum likelihood using the expectation-maximization (EM) algorithm
• hk – Haley-Knott regression
• imp – multiple imputation (Sen and Churchill, 2001); uses a Monte Carlo algorithm instead of EM
• ehk – extended Haley-Knott method (Feenstra et al., 2006); an improvement of hk, especially when epistasis exists between QTLs

Single marker ANOVA
Figures: threshold LOD = 3; threshold from a permutation test
Estimating heritability for each marker

Interval Mapping (IM)
• Used for estimating the position of a QTL between two markers
• Statistically more powerful than single marker analysis
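Methods used in IM
• Maximum Likelihood (standard interval mapping)
• Haley-Knott Regression
• Extended Haley-Knott Regression
Note:
• All methods estimate three parameters: mean, genetic effects and residual variance.
• All methods compute the conditional probabilities for each QTL genotype at a position between markers.

Probabilities of a putative QTL for a backcross
$$\text{Prob}(Q \mid M_1M_2) = \frac{(1-r_1)(1-r_2)}{1-r_{12}}$$
$$\text{Prob}(Q \mid M_1m_2) = \frac{(1-r_1)\,r_2}{r_{12}}$$
$$\text{Prob}(Q \mid m_1M_2) = \frac{r_1(1-r_2)}{r_{12}}$$
$$\text{Prob}(Q \mid m_1m_2) = \frac{r_1 r_2}{1-r_{12}}$$
where r1 and r2 are the recombination fractions between the QTL and markers M1 and M2, and r12 is the recombination fraction between the two markers.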
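The four expressions can be checked numerically. This is a minimal base-R sketch. It assumes no crossover interference, so that r12 = r1 + r2 - 2 r1 r2; the slide does not state this.

```r
# Conditional probability that a backcross individual carries QTL allele Q,
# given its flanking marker genotypes. Assumes no crossover interference,
# so r12 = r1 + r2 - 2 * r1 * r2.
qtl_cond_prob <- function(r1, r2) {
  r12 <- r1 + r2 - 2 * r1 * r2
  c(M1M2 = (1 - r1) * (1 - r2) / (1 - r12),
    M1m2 = (1 - r1) * r2 / r12,
    m1M2 = r1 * (1 - r2) / r12,
    m1m2 = r1 * r2 / (1 - r12))
}

p <- qtl_cond_prob(r1 = 0.1, r2 = 0.1)   # QTL midway between the markers
print(round(p, 4))
```

When the QTL is midway (r1 = r2 = 0.1), individuals that are recombinant between the markers get probability 0.5. Those individuals carry no information about the QTL genotype. The non-recombinant classes are close to 1 or 0.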
LOD Scores
• Logarithm of the odds (LOD) – used to identify the most likely position for a QTL in relation to the linkage map
• Tests of significance
 o LOD > 3 is the conventional significance threshold: the data are 1,000 times more likely if the loci are linked than if they are not
 o Permutation test

Odds
$$\text{Odds} = \frac{\text{prob. of success}}{\text{prob. of failure}} = \frac{p}{1-p}$$
• Odds = 1: equal chance of success and failure
• Odds < 1: lower chance of success
• Odds > 1: higher chance of success
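A few conversions in base R show the scale concretely. The function names are illustrative.

```r
# Odds and LOD conversions (base R; function names are illustrative).
odds <- function(p) p / (1 - p)          # probability of success / probability of failure
lod_to_ratio <- function(lod) 10^lod     # LOD is the log10 of a likelihood ratio

odds(0.5)          # 1: equal chance of success and failure
odds(0.2)          # 0.25: lower chance of success
odds(0.8)          # 4: higher chance of success
lod_to_ratio(3)    # 1000: the data are 1000 times more likely under linkage
```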
Maximum Likelihood
• The likelihood of a given set of parameters (QTL position and QTL effect), given the observed data on phenotypes and marker genotypes
• The estimates of the parameters are those where the likelihood is highest
• The expectation-maximization (EM) method is used in the estimation procedure
• A test statistic for this method is:
$$LR = -2\ln\frac{\max L(\text{reduced model})}{\max L(\text{full model})}$$
The reduced model refers to the null hypothesis of no QTL effect.
• The LOD score for a QTL at position c is:
$$LOD(c) = \frac{LR(c)}{2\ln 10} \approx \frac{LR(c)}{4.61}$$
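The likelihood, EM, and LR/LOD steps above can be sketched for a single test position in a backcross. This is a minimal base-R sketch and not R/qtl's implementation. It assumes normal residuals with a common variance and treats the QTL genotype probabilities p as already computed from the markers. The data are simulated.

```r
# EM sketch for interval mapping at one position in a backcross.
# p[i] = prior P(QTL genotype = Q | flanking markers) for individual i.
# Full model: y is a mixture of N(mu1, sigma^2) and N(mu0, sigma^2) with weights p, 1 - p.
em_interval_lod <- function(y, p, tol = 1e-8, max_iter = 1000) {
  n <- length(y)
  mu0 <- mean(y) - sd(y) / 2                  # crude starting values (assumption)
  mu1 <- mean(y) + sd(y) / 2
  sigma <- sd(y)
  for (iter in seq_len(max_iter)) {
    # E step: posterior probability that each individual carries Q
    f1 <- p * dnorm(y, mu1, sigma)
    f0 <- (1 - p) * dnorm(y, mu0, sigma)
    w <- f1 / (f1 + f0)
    # M step: weighted genotype means and pooled residual SD
    new_mu1 <- sum(w * y) / sum(w)
    new_mu0 <- sum((1 - w) * y) / sum(1 - w)
    new_sigma <- sqrt(sum(w * (y - new_mu1)^2 + (1 - w) * (y - new_mu0)^2) / n)
    change <- max(abs(c(new_mu0 - mu0, new_mu1 - mu1, new_sigma - sigma)))
    mu0 <- new_mu0; mu1 <- new_mu1; sigma <- new_sigma
    if (change < tol) break
  }
  loglik_full <- sum(log(p * dnorm(y, mu1, sigma) + (1 - p) * dnorm(y, mu0, sigma)))
  sigma_null <- sqrt(mean((y - mean(y))^2))   # ML SD under "no QTL"
  loglik_reduced <- sum(dnorm(y, mean(y), sigma_null, log = TRUE))
  lr <- -2 * (loglik_reduced - loglik_full)   # LR = -2 ln(L_reduced / L_full)
  list(mu0 = mu0, mu1 = mu1, sigma = sigma, LR = lr, LOD = lr / (2 * log(10)))
}

# Hypothetical data: test position at r = 0.1 from a marker, so P(Q | M) = 0.9, P(Q | m) = 0.1
set.seed(3)
n <- 400
q <- rbinom(n, 1, 0.5)
m <- ifelse(rbinom(n, 1, 0.1) == 1, 1 - q, q)
p <- ifelse(m == 1, 0.9, 0.1)
y <- 10 + 1 * q + rnorm(n)
fit <- em_interval_lod(y, p)
print(unlist(fit))
```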
Haley-Knott (HK) Regression
• For two markers, the model is:
$$y = \mu + \alpha x + e$$
where y is the observed phenotype, x = P(Q | mg1, mg2, r1, r12) is the probability of the QTL genotype given the flanking marker genotypes and recombination fractions, and α is the QTL effect
• For each QTL position, the residual sum of squares (SSE) is determined.
• The QTL position is estimated where the SSE is minimum.
• HK regression estimates an approximate likelihood ratio:
$$LR = n\ln\frac{SSE_{reduced}}{SSE_{full}}$$
Extended HK Regression
• An improvement of the HK regression
• The correct variance for each genotype is used instead of the constant variance used in the HK regression

Which IM method to use
• ML provides better estimates, but the analysis is complex and computationally expensive
• HK regression is computationally faster, but its estimate of the residual variance is biased and the power of QTL detection may be affected (Kao et al 1999)
• Extended HK regression is not as fast as HK but provides improved approximations and is still faster than ML
• Results are hardly different in practical mapping
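The HK regression steps can be sketched in base R for one marker interval in a simulated backcross. At each candidate position, the sketch regresses y on x = P(Q | marker genotypes), records the SSE, and converts it to LR and LOD. The recombination fractions, the effect size, and the grid are assumptions, and no crossover interference is assumed. This is an illustration and not R/qtl's hk code.

```r
# Haley-Knott regression sketch: scan positions between two flanking markers
# in a hypothetical backcross, regress y on x = P(Q | marker genotypes) at each
# position, and convert the SSE ratio to LR and LOD.
set.seed(1)
n <- 300
r1_true <- 0.1; r2_true <- 0.1                     # QTL midway (assumption)
r12 <- r1_true + r2_true - 2 * r1_true * r2_true   # no interference
q  <- rbinom(n, 1, 0.5)                            # unobserved QTL genotype
m1 <- ifelse(rbinom(n, 1, r1_true) == 1, 1 - q, q) # left marker
m2 <- ifelse(rbinom(n, 1, r2_true) == 1, 1 - q, q) # right marker
y  <- 5 + 1 * q + rnorm(n)

# x = P(Q | m1, m2) at a position with recombination fraction r1 to the left marker
prob_Q <- function(r1) {
  r2 <- (r12 - r1) / (1 - 2 * r1)                  # keeps r12 fixed
  ifelse(m1 == 1 & m2 == 1, (1 - r1) * (1 - r2) / (1 - r12),
  ifelse(m1 == 1 & m2 == 0, (1 - r1) * r2 / r12,
  ifelse(m1 == 0 & m2 == 1, r1 * (1 - r2) / r12,
                            r1 * r2 / (1 - r12))))
}

sse_reduced <- sum((y - mean(y))^2)                # model with no QTL
grid <- seq(0, r12, length.out = 19)               # candidate positions (in r1)
lod <- sapply(grid, function(r1) {
  x <- prob_Q(r1)
  sse_full <- sum(resid(lm(y ~ x))^2)              # y = mu + alpha * x + e
  lr <- n * log(sse_reduced / sse_full)
  lr / (2 * log(10))
})
best <- which.max(lod)                             # minimum SSE = maximum LOD
cat("best r1 =", grid[best], " max LOD =", round(lod[best], 2), "\n")
```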
Interval Mapping
• Advantages
 o Takes proper account of missing data
 o Allows examination of positions between markers
 o Gives improved estimates of QTL effects
• Disadvantages
 o Increased computation time
 o Requires specialized software
 o Difficult to generalize
 o Only considers one QTL at a time

Multiple Imputation Method
• Another method available for IM
• Fills in all missing genotype data, then uses single marker ANOVA to identify significant QTLs
• More robust than ML, but has little advantage over the extended HK for single QTL mapping
• Intensive in both computation time and memory use
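The following is a minimal base-R sketch of the imputation idea at a single test position midway between two markers in a simulated backcross. Each imputation draws the unobserved QTL genotype from its conditional probability and runs a single-marker ANOVA. Two choices are assumptions made for this sketch: the per-imputation LOD scores are combined as the log10 of the mean likelihood ratio, and 64 imputations are used (an arbitrary number). It is not R/qtl's imp code.

```r
# Multiple-imputation sketch at one test position midway between two markers
# in a hypothetical backcross: draw the unobserved QTL genotype from its
# conditional probability, run a single-marker ANOVA on each draw, combine.
set.seed(5)
n <- 200
r1 <- 0.1; r2 <- 0.1
r12 <- r1 + r2 - 2 * r1 * r2                        # no interference (assumption)
q  <- rbinom(n, 1, 0.5)
m1 <- ifelse(rbinom(n, 1, r1) == 1, 1 - q, q)
m2 <- ifelse(rbinom(n, 1, r2) == 1, 1 - q, q)
y  <- 10 + 1 * q + rnorm(n)

# P(Q | flanking markers) for each individual
p <- ifelse(m1 == 1 & m2 == 1, (1 - r1) * (1 - r2) / (1 - r12),
     ifelse(m1 == 1 & m2 == 0, (1 - r1) * r2 / r12,
     ifelse(m1 == 0 & m2 == 1, r1 * (1 - r2) / r12,
                               r1 * r2 / (1 - r12))))

# LOD for a one-way ANOVA with two genotype classes: -n/2 * log10(1 - R^2)
anova_lod <- function(y, g) -length(y) / 2 * log10(1 - cor(y, g)^2)

n_imp <- 64
lods <- replicate(n_imp, anova_lod(y, rbinom(n, 1, p)))   # one LOD per imputation
lod_combined <- log10(mean(10^lods))   # mean likelihood ratio, back on the log10 scale
cat("combined LOD =", round(lod_combined, 2), "\n")
```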
IM sample output (red – EM, blue – EHK)

R/qtl
Interval Mapping
EM, HK, and EHK
Interval mapping
• Maximum likelihood
Permutation test can also be used to get the threshold value for LOD scores.

calc.genoprob()
• Calculates QTL genotype probabilities conditional on the available marker data
• Needed in most mapping functions
 o step – step size in cM at which the probabilities are to be calculated
 o error.prob – assumed genotyping error rate
Note: a genotyping error occurs when the observed genotype of an individual does not correspond to the true genotype.
Combining IM results
Interval mapping
• Extended Haley-Knott Regression
Permutation test can also be used to get the threshold value for LOD scores.
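A permutation threshold can be sketched in base R. Shuffle the phenotypes to break any genotype-phenotype association, rescan, keep the maximum LOD of each permutation, and take the 95th percentile. The following are all assumptions for illustration: 20 unlinked markers, an effect at marker 5, single-marker LOD scores, and 1000 permutations. R/qtl applies the same idea through the n.perm argument of scanone().

```r
# Permutation test for a LOD threshold (base-R sketch).
# Hypothetical data: 20 unlinked backcross markers, one of which (marker 5)
# has a QTL effect; single-marker LOD scores are used for simplicity.
set.seed(7)
n <- 150
n_markers <- 20
geno <- matrix(rbinom(n * n_markers, 1, 0.5), n, n_markers)
y <- rnorm(n) + 1 * geno[, 5]

# LOD per marker: -n/2 * log10(1 - r^2), r = correlation of y with genotype
scan_lod <- function(y) as.vector(-n / 2 * log10(1 - cor(y, geno)^2))

observed <- scan_lod(y)
n_perm <- 1000
max_lod_perm <- replicate(n_perm, max(scan_lod(sample(y))))  # shuffle phenotypes
threshold <- quantile(max_lod_perm, 0.95)                   # 95th percentile of max LOD
cat("threshold =", round(threshold, 2), "\n")
print(which(observed > threshold))
```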
Plot of combined results (red – EM, blue – EHK)

Composite Interval Mapping
• Performs interval mapping using a subset of marker loci as covariates
• Markers serve as proxies for other QTLs, to account for linked QTLs and reduce the residual variance
• Gives greater power in identifying key QTL
• More statistically complicated and requires more computational power
Sample CIM output (blue – EM, red – CIM)

Steps in CIM
• Selects a set of markers to serve as covariates
• Performs interval mapping with these markers as covariates
• Excludes markers within a fixed distance of the test position
• Calculates a LOD score comparing the model with the putative QTL in the presence of the covariates to the model with just the covariates
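The CIM steps above can be sketched in base R as marker regression with marker covariates. The sketch rests on several assumptions. It simulates one backcross chromosome with markers every 10 cM and QTL at 20 and 70 cM. The covariate markers are fixed by hand, whereas cim() chooses them itself through n.marcovar. Covariates within 10 cM of the test position are excluded. Tests are made only at marker positions, not between them.

```r
# Composite interval mapping sketched as marker regression with marker
# covariates (base R). Hypothetical chromosome: markers every 10 cM,
# QTL at 20 cM and 70 cM; covariates are fixed by hand here.
set.seed(11)
n <- 400
pos <- seq(0, 100, by = 10)                        # marker positions in cM
r_adj <- 0.5 * (1 - exp(-2 * diff(pos) / 100))     # Haldane map function
geno <- matrix(0L, n, length(pos))
geno[, 1] <- rbinom(n, 1, 0.5)
for (j in 2:length(pos)) {                         # simulate linked backcross markers
  swap <- rbinom(n, 1, r_adj[j - 1])
  geno[, j] <- ifelse(swap == 1, 1L - geno[, j - 1], geno[, j - 1])
}
y <- 1.0 * geno[, 3] + 0.8 * geno[, 8] + rnorm(n)

covar <- c(3, 8)                                   # marker covariates (assumed chosen)
window <- 10                                       # cM; covariates this close are dropped
rss <- function(X) sum(qr.resid(qr(X), y)^2)

cim_lod <- sapply(seq_along(pos), function(j) {
  keep <- covar[abs(pos[covar] - pos[j]) > window] # exclude covariates near the test position
  X0 <- cbind(1, geno[, keep, drop = FALSE])       # model with covariates only
  X1 <- cbind(X0, geno[, j])                       # covariates + putative QTL
  n / 2 * log10(rss(X0) / rss(X1))
})
print(round(setNames(cim_lod, pos), 2))
```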
Problem with CIM
• The estimated position of the first QTL can be influenced by the second QTL and vice versa, especially for linked QTLs.
• The choice of covariates is critical: if too many or too few markers are chosen, there will be a loss of power to detect QTL.

R/qtl
Composite Interval Mapping
Composite interval mapping
cim()
• cim(cross, pheno.col=1, n.marcovar=3, method=c("em", "imp", "hk", "ehk"), imp.method=c("imp", "argmax"), error.prob=0.0001, n.perm, window=10)
 o n.marcovar – number of marker covariates to use
 o imp.method – method used to impute any missing marker genotype data
 o window – window size in cM; marker covariates within this window around the test position are omitted
• add.cim.covar() – adds dots at the locations of the selected marker covariates to a plot of composite interval mapping results
Composite interval mapping
CIM using permutation test (blue – EM, red – CIM)
Multiple QTL Mapping
• Extension of interval mapping to multiple QTLs
• Infers the location of QTLs to positions between markers
• Investigates interactions between QTLs (epistasis)
• More powerful and precise in detecting QTL (Kao et al 1999)

Sample Multiple QTL Mapping output
Other Methods used in Interval Mapping
• Bayesian method – uses probability theory in parameter estimation, based on prior knowledge about the data (R/qtlbim)
• Mixed model regression – available in R/ASReml

R/qtl
Multiple QTL Mapping
Multiple QTL Mapping
• sim.geno() is used to impute genotypes with missing data, to minimize loss of information
• makeqtl() is used to create a qtl object. It pulls out the imputed genotypes at the selected positions.
• n.gen is the number of possible genotypes at each QTL
Figure: displays the QTL on the genetic map
Multiple QTL Mapping
Figure: model output, with one QTL term marked "Not significant and may be dropped from the model"
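The idea behind dropping non-significant terms can be sketched in base R as a drop-one analysis. The sketch fits the full multiple-QTL model, refits it without each QTL in turn, and expresses the loss of fit as a LOD score. The three simulated QTL genotypes are assumptions, and only q1 and q2 have effects. The sketch shows the idea only and is not the R/qtl analysis shown in the figure.

```r
# Drop-one-term analysis for an additive multiple-QTL model (base-R sketch).
# Hypothetical backcross genotypes at three putative QTL; q3 has no effect.
set.seed(21)
n <- 300
q1 <- rbinom(n, 1, 0.5); q2 <- rbinom(n, 1, 0.5); q3 <- rbinom(n, 1, 0.5)
y <- 1 * q1 + 0.8 * q2 + rnorm(n)
dat <- data.frame(y, q1, q2, q3)

qtl_terms <- c("q1", "q2", "q3")
rss <- function(fit) sum(resid(fit)^2)
full <- lm(reformulate(qtl_terms, response = "y"), data = dat)

# LOD for each term: compare the full model with the model lacking that term
drop_one <- sapply(qtl_terms, function(term) {
  reduced <- lm(reformulate(setdiff(qtl_terms, term), response = "y"), data = dat)
  n / 2 * log10(rss(reduced) / rss(full))
})
print(round(drop_one, 2))   # a small LOD marks a term that may be dropped
```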
Multiple QTL Mapping
refineqtl() – Iteratively scans the positions for QTL in the context of a multiple QTL model, to try to identify the positions with maximum likelihood for a fixed QTL model.
