Fred van Eeuwijk, Marcos Malosetti, Hans Jansen & Martin Boer
Wageningen, June 2011
Phenotype, Genotype and Environment
Phenotype = function of (Genotype [genes, QTLs], Environment)
Is this a sufficient description?
Which genotype-to-phenotype (G2P) function transforms the genotype into
the phenotype (additive, multiplicative, crop growth model, network
model)?
What is a genotype (single/multi-locus, interactions within and between
loci)?
How does the environment enter the G2P function (GxE, environmental
characterization)?
What about intermediate omics levels between genotype and phenotype?
An ambitious definition
Task of statistical modeling in (plant) genetics:
Predict phenotypic expression
from molecular marker variation, genomic information and environmental inputs
for various types of (offspring) populations
across a range of environmental conditions
for multiple traits
over developmental time
Before QTL mapping
Recombination
Types of breeding populations
Phenotypic and genotypic values
Genetic variance, heritability
Genetic architecture
Recombination in F2
[Figure: three-generation pedigree (grandparents, parent, gametes produced by the parent) with Marker 1 and Marker 2 on the same chromosome]
Molecular marker = short DNA segment at a point on one of the chromosomes = locus (plural: loci)
Allele = DNA variant; the figure distinguishes the grandmaternal variant from the grandpaternal variant
Recombination in F2
[Figure: the same three-generation pedigree, showing the types of gametes produced by the parent at Marker 1 and Marker 2]
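Frequencies of gametes produced by the parent:
two parental (non-recombinant) gamete types, each with frequency $\frac{1-r}{2}$
two recombinant gamete types, each with frequency $\frac{r}{2}$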
r = recombination frequency
probability that the grandparental origin of
the allele of marker 1 is different from the
grandparental origin of the allele of marker 2
Recombination in F2 and DHs
[Figure: three-generation pedigree (grandparents, parent, gametes produced by the parent) with Marker 1 and Marker 2]
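Problem:
we cannot observe gametes (haploid genotypes);
we observe only combinations of gametes (diploid genotypes).
Doubled haploids, obtained using anther/ovary culture, carry two identical copies of a single gamete, so the gamete can be scored directly at Marker 1 and Marker 2.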
Estimation of recombination frequency
JoinMap format
             Doubled haploid
             1   2   3   4
Marker 1     a   b   b   a
Marker 2     a   b   a   b
In a doubled-haploid population we have four different types of marker
data for any pair of markers:
Type of doubled haploid   Observed number   Probability
a-a                       $n_{aa}$          $p_{aa} = \frac{1-r}{2}$
b-b                       $n_{bb}$          $p_{bb} = \frac{1-r}{2}$
b-a                       $n_{ba}$          $p_{ba} = \frac{r}{2}$
a-b                       $n_{ab}$          $p_{ab} = \frac{r}{2}$
Likelihood:
$L(r) = \frac{N!}{n_{aa}!\,n_{bb}!\,n_{ba}!\,n_{ab}!}\, p_{aa}^{n_{aa}}\, p_{bb}^{n_{bb}}\, p_{ba}^{n_{ba}}\, p_{ab}^{n_{ab}}$
$\quad = \frac{N!}{n_{aa}!\,n_{bb}!\,n_{ba}!\,n_{ab}!} \left(\frac{1}{2}\right)^{N} (1-r)^{n_{aa}+n_{bb}}\, r^{n_{ba}+n_{ab}} = C\, r^{R} (1-r)^{N-R}$
where $N = n_{aa}+n_{bb}+n_{ba}+n_{ab}$ is the number of doubled haploids, $R = n_{ba}+n_{ab}$ is the number of recombinants, and $C = \frac{N!}{n_{aa}!\,n_{bb}!\,n_{ba}!\,n_{ab}!}\left(\frac{1}{2}\right)^{N}$ does not depend on $r$.
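The bookkeeping behind the table can be sketched in a few lines of Python. This is a minimal illustration, not JoinMap's own code: it assumes each marker is given as a string of 'a'/'b' scores, one character per doubled haploid, and it simply skips any other symbol (for example '-' for a missing score). The function names are hypothetical.

```python
def type_probabilities(r):
    """Probabilities of the four doubled-haploid types for recombination frequency r."""
    return {"aa": (1 - r) / 2, "bb": (1 - r) / 2, "ba": r / 2, "ab": r / 2}

def count_types(marker1, marker2):
    """Count n_aa, n_bb, n_ba, n_ab for a pair of markers scored as 'a'/'b' strings.

    Assumption: any symbol other than 'a' or 'b' (e.g. '-') is treated as
    missing, and that doubled haploid is skipped for this marker pair.
    """
    counts = {"aa": 0, "bb": 0, "ba": 0, "ab": 0}
    for x, y in zip(marker1, marker2):
        key = x + y              # marker 1 score followed by marker 2 score
        if key in counts:
            counts[key] += 1
    return counts

# Scores of the four doubled haploids in the JoinMap-format example
counts = count_types("abba", "abab")
print(counts)  # {'aa': 1, 'bb': 1, 'ba': 1, 'ab': 1}

R = counts["ba"] + counts["ab"]   # number of recombinants
N = sum(counts.values())          # number of scored doubled haploids
print(R, N, type_probabilities(0.2))
```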
Estimation of recombination frequency
Likelihood: $L(r) = C\, r^{R} (1-r)^{N-R}$
The maximum likelihood estimator of r is obtained by maximizing the likelihood L with respect to r.
Easier: maximize the natural logarithm of the likelihood instead; because the logarithm is monotone increasing, it has the same maximizer.
Log-likelihood: $\ln L(r) = \ln C + R \ln r + (N-R) \ln(1-r)$
Take the derivative of the log-likelihood and set the result equal to zero:
$\frac{\partial \ln L}{\partial r} = \frac{R}{r} - \frac{N-R}{1-r} = \frac{R(1-r) - (N-R)\,r}{r(1-r)} = \frac{R - rN}{r(1-r)} = 0$
Maximum likelihood estimator: $\hat{r} = \frac{R}{N}$
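A quick numerical check that the closed form agrees with direct maximization: the sketch below evaluates the log-likelihood, dropping the constant $\ln C$ because it does not affect the maximizer, on a fine grid of r values between 0 and 1. The counts R = 18, N = 100 are made up for illustration.

```python
import math

def log_likelihood(r, R, N):
    """ln L(r) up to the constant ln C: R ln r + (N - R) ln(1 - r), for 0 < r < 1."""
    return R * math.log(r) + (N - R) * math.log(1 - r)

def r_hat(R, N):
    """Closed-form maximum likelihood estimator for doubled haploids."""
    return R / N

R, N = 18, 100  # hypothetical: 18 recombinant DH lines out of 100

# Grid search over 0.001, 0.002, ..., 0.999 (avoids log(0) at the endpoints)
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda r: log_likelihood(r, R, N))

print(r_hat(R, N), best)  # 0.18 0.18
```

The grid search is only a sanity check; because the log-likelihood is strictly concave in r, the closed form $R/N$ is the unique maximizer.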
Estimation of recombination frequency
Fisher information: the expectation, with respect to R, of minus the second derivative of the log-likelihood
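1st derivative: $\frac{\partial \ln L}{\partial r} = \frac{R}{r} - \frac{N-R}{1-r}$
2nd derivative: $\frac{\partial^2 \ln L}{\partial r^2} = -\frac{R}{r^2} - \frac{N-R}{(1-r)^2}$
Fisher information, using $E(R) = Nr$:
$I(r) = E\left(-\frac{\partial^2 \ln L}{\partial r^2}\right) = \frac{Nr}{r^2} + \frac{N - Nr}{(1-r)^2} = \frac{N}{r} + \frac{N}{1-r} = \frac{N}{r(1-r)}$
$\mathrm{var}(\hat{r}) = \frac{1}{I(r)} = \frac{r(1-r)}{N}$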
Variance: inverse of the Fisher information.
The variance is zero if $r = 0$; the variance attains a maximum of $1/(4N)$ if $r = \frac{1}{2}$.
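The variance formula $r(1-r)/N$ can be evaluated directly, and checked against a small Monte Carlo experiment. This is a sketch under stated assumptions: each of the N doubled haploids is independently recombinant with probability r (so R is binomial and the formula is exact here), and the standard error plugs in $\hat{r}$ for the unknown r. The function names and simulation settings are hypothetical.

```python
import math
import random
import statistics

def var_r_hat(r, N):
    """Variance of r_hat = R/N: inverse Fisher information r(1 - r)/N."""
    return r * (1 - r) / N

def se_r_hat(R, N):
    """Plug-in standard error: the variance evaluated at r_hat = R/N."""
    return math.sqrt(var_r_hat(R / N, N))

def simulated_var(r, N, reps=20_000, seed=2):
    """Monte Carlo variance of r_hat over `reps` simulated DH populations of size N."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        R = sum(rng.random() < r for _ in range(N))  # number of recombinant DHs
        estimates.append(R / N)
    return statistics.pvariance(estimates)

print(var_r_hat(0.5, 100))   # 0.0025, the maximum 1/(4N) for N = 100
print(se_r_hat(18, 100))     # standard error for the hypothetical R = 18, N = 100
print(simulated_var(0.2, 100), var_r_hat(0.2, 100))
```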
Testing for linkage
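Maximum likelihood: $L(\hat{r}) = C\, \hat{r}^{R} (1-\hat{r})^{N-R}$
Likelihood under the assumption of no linkage ($r = \frac{1}{2}$): $L(\tfrac{1}{2}) = C \left(\tfrac{1}{2}\right)^{R} \left(\tfrac{1}{2}\right)^{N-R} = C \left(\tfrac{1}{2}\right)^{N}$
Likelihood ratio: $LR = \frac{C\, \hat{r}^{R} (1-\hat{r})^{N-R}}{C \left(\frac{1}{2}\right)^{N}} = \frac{\hat{r}^{R} (1-\hat{r})^{N-R}}{\left(\frac{1}{2}\right)^{N}}$
$\log_{10}(LR)$ is called the LOD score for linkage.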
In geneticists' terms: a LOD of 3 means that the data are 1000 times as likely under the estimated recombination frequency $\hat{r}$ as under $r = \frac{1}{2}$.
Under the null hypothesis of no linkage ($r = \frac{1}{2}$), $2\ln(LR)$ approximately follows a chi-square distribution with 1 degree of freedom.
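A minimal Python sketch of the linkage test for doubled haploids, assuming $\hat{r} = R/N \le \frac{1}{2}$ (the usual situation for linked markers). It uses $\log_{10}(LR) = R\log_{10}\hat{r} + (N-R)\log_{10}(1-\hat{r}) + N\log_{10}2$, treats $0 \cdot \log 0$ as 0, and converts the LOD to $2\ln(LR) = 2\ln(10)\cdot\mathrm{LOD}$. The p-value uses the chi-square(1) tail $P(X > x) = \mathrm{erfc}(\sqrt{x/2})$. The function names are hypothetical.

```python
import math

def lod_score(R, N):
    """LOD = log10 of the likelihood ratio L(r_hat) / L(1/2) for DH data."""
    r = R / N
    lod = N * math.log10(2)               # -log10((1/2)^N), the denominator
    if R > 0:                             # 0 * log(0) is taken as 0
        lod += R * math.log10(r)
    if R < N:
        lod += (N - R) * math.log10(1 - r)
    return lod

def linkage_p_value(R, N):
    """Approximate p-value: 2 ln(LR) ~ chi-square with 1 df under r = 1/2."""
    x = max(2 * math.log(10) * lod_score(R, N), 0.0)  # guard tiny negative rounding
    return math.erfc(math.sqrt(x / 2))    # P(chi2_1 > x) = P(|Z| > sqrt(x))

R, N = 18, 100  # hypothetical counts
lod = lod_score(R, N)
print(lod, 10 ** lod, linkage_p_value(R, N))  # LOD, likelihood ratio, p-value
```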
Some population types
F2
Backcross
Population types
[Figure: single seed descent from F1 through F2, F3, F4, ... to F8, giving recombinant inbred lines (RIL)]
Recombinant inbred lines (RIL)
Single seed descent
Population types
Recombinant inbred lines
[Figure: two-locus genotype classes AA, Aa, aa by BB, Bb, bb]
Per locus: every generation, the proportion of heterozygotes is halved:
$He(g) = \left(\frac{1}{2}\right)^{g-1}, \quad g = 1, 2, \ldots$
$He(8) = \left(\frac{1}{2}\right)^{7} \approx 0.008$
[Figure axes: genotype codes a, h, b at each of the two loci]
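The halving of heterozygosity under single seed descent can be checked by simulation. The sketch below is a minimal illustration, assuming one locus, an F1 plant that is heterozygous (Aa), and strict selfing in which each of the two gametes independently carries one of the parent's two alleles with equal probability; the function names are hypothetical.

```python
import random

def heterozygosity(g):
    """Expected proportion of heterozygotes at one locus in generation F_g: (1/2)^(g-1)."""
    return 0.5 ** (g - 1)

def simulate_ssd(g, lines=50_000, seed=3):
    """Fraction of single-seed-descent lines still heterozygous in generation F_g."""
    rng = random.Random(seed)
    het = 0
    for _ in range(lines):
        genotype = ("A", "a")                 # every F1 plant is heterozygous
        for _ in range(g - 1):                # selfing from F1 up to F_g
            # each gamete receives a random one of the parent's two alleles
            genotype = (rng.choice(genotype), rng.choice(genotype))
        het += genotype[0] != genotype[1]
    return het / lines

for g in (1, 2, 3, 8):
    print(g, heterozygosity(g), simulate_ssd(g))
```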
Population types
Recombinant inbred lines
Number of recombinant individuals (expected): $E(R) = N\,\frac{2r}{1+2r}$
Intensity of recombination: $\frac{2r/(1+2r)}{r} = \frac{2}{1+2r} \approx 2$ if r is small
Estimate of the recombination frequency: $\hat{r} = \frac{R}{2(N-R)}$
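For recombinant inbred lines, the expected fraction of recombinant lines is $\frac{2r}{1+2r}$ rather than r; setting $R/N$ equal to it and solving gives $\hat{r} = \frac{R}{2(N-R)}$. The sketch below assumes RILs produced by selfing to (near) homozygosity and requires $R < N$. The counts are hypothetical and the function names are illustrative.

```python
def ril_recombinant_fraction(r):
    """Expected fraction of recombinant RILs for recombination frequency r."""
    return 2 * r / (1 + 2 * r)

def ril_r_hat(R, N):
    """Estimate r from R recombinant lines among N RILs (requires 0 <= R < N)."""
    if not 0 <= R < N:
        raise ValueError("need 0 <= R < N")
    return R / (2 * (N - R))

for r in (0.01, 0.1, 0.3):
    frac = ril_recombinant_fraction(r)
    print(r, frac, frac / r)   # frac / r = 2/(1 + 2r), close to 2 for small r

# Hypothetical data: 20 recombinant lines among 120 RILs
print(ril_r_hat(20, 120))      # 0.1
print(20 / 120)                # DH-style estimate R/N, which would overestimate r here
```

Using the doubled-haploid estimator $R/N$ on RIL data would roughly double the estimated recombination frequency for tightly linked markers, which is why the population type has to be known before estimation.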