
2 Some basic concepts in genetics

Fred van Eeuwijk, Marcos Malosetti, Hans Jansen & Martin Boer
Wageningen, June 2011
Phenotype, Genotype and Environment
Phenotype = Function of Genotype (genes, QTLs) and Environment
Is this a sufficient description?
Which genotype-to-phenotype (G2P) function transforms the genotype into
the phenotype (additive, multiplicative, crop growth model, network
model)?
What is a genotype (single/multi-locus, interactions within and between
loci)?
How does the environment enter the G2P function (GxE, environmental
characterization)?
What about intermediate omics-levels between genotype and phenotype?
An ambitious definition
Task of statistical modeling in (plant) genetics:

Predict phenotypic expression
from molecular marker variation, genomic information and environmental inputs
for various types of (offspring) populations
across a range of environmental conditions
for multiple traits
over developmental time
Before QTL mapping
Recombination
Types of breeding populations
Phenotypic and genotypic values
Genetic variance, heritability
Genetic architecture
Recombination in F2
[Figure: three-generation pedigree showing the grandparents, the parent, and the gametes produced by the parent, with Marker 1 and Marker 2 indicated on each chromosome]
Molecular marker = short DNA segment
= point on one of the chromosomes
= locus (plural: loci)
Allele = DNA variant
(grandmaternal variant or grandpaternal variant)

Frequencies of gametes produced by parent:
non-recombinant gametes: $(1 - r)/2$ each (two types)
recombinant gametes: $r/2$ each (two types)
$r$ = recombination frequency
= probability that the grandparental origin of the allele of marker 1 is different from the grandparental origin of the allele of marker 2
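As a small illustration of these gamete frequencies, the sketch below returns the expected frequency of each gamete type for a given r. The function name and the labels "MM", "PP", "MP" and "PM" are assumptions made for this example and do not come from the slides.

```python
def gamete_frequencies(r):
    """Expected frequencies of the four gamete types for two markers.

    Labels are hypothetical: the first letter gives the grandparental origin
    (M = grandmaternal, P = grandpaternal) of the marker 1 allele, the second
    letter that of the marker 2 allele.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination frequency must lie in [0, 0.5]")
    non_recombinant = (1.0 - r) / 2.0  # same grandparental origin at both markers
    recombinant = r / 2.0              # different grandparental origin
    return {"MM": non_recombinant, "PP": non_recombinant,
            "MP": recombinant, "PM": recombinant}


if __name__ == "__main__":
    for r in (0.0, 0.1, 0.5):
        print(r, gamete_frequencies(r))
```

With r = 0.5 (no linkage) all four gamete types are equally frequent; with r = 0 only the two non-recombinant types occur.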
Recombination in F2 and DHs
[Figure: three-generation pedigree (grandparents, parent, gametes produced by parent) with Marker 1 and Marker 2]
Problem:
we cannot observe gametes (haploid genotypes)
we observe combinations of gametes (diploid genotypes)
Doubled haploids
obtained using anther/ovary culture
Estimation of recombination frequency
JoinMap format

           Doubled     Doubled     Doubled     Doubled
           haploid 1   haploid 2   haploid 3   haploid 4
Marker 1   a           b           b           a
Marker 2   a           b           a           b

In a doubled-haploid population we have four different types of marker data for any pair of markers:

type of doubled haploid   observed number   probability
a-a                       $n_{aa}$          $p_{aa} = (1-r)/2$
b-b                       $n_{bb}$          $p_{bb} = (1-r)/2$
b-a                       $n_{ba}$          $p_{ba} = r/2$
a-b                       $n_{ab}$          $p_{ab} = r/2$
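The four observed numbers can be tallied directly from the marker scores. The sketch below does this for the four doubled haploids of the JoinMap example; the function name and the string encoding of the scores are assumptions chosen for illustration.

```python
from collections import Counter


def count_marker_types(marker1, marker2):
    """Tally the four two-marker types (a-a, b-b, b-a, a-b) over individuals."""
    if len(marker1) != len(marker2):
        raise ValueError("both markers must be scored on the same individuals")
    counts = Counter(f"{m1}-{m2}" for m1, m2 in zip(marker1, marker2))
    return {t: counts.get(t, 0) for t in ("a-a", "b-b", "b-a", "a-b")}


if __name__ == "__main__":
    # Scores of the four doubled haploids shown in the JoinMap example.
    counts = count_marker_types("abba", "abab")
    n_total = sum(counts.values())                  # total number of individuals
    n_recombinant = counts["b-a"] + counts["a-b"]   # recombinant types
    print(counts, n_total, n_recombinant)
```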
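Likelihood:

$$
\begin{aligned}
L &= \frac{N!}{n_{aa}!\, n_{bb}!\, n_{ba}!\, n_{ab}!}\; p_{aa}^{n_{aa}}\, p_{bb}^{n_{bb}}\, p_{ba}^{n_{ba}}\, p_{ab}^{n_{ab}} \\
  &= \frac{N!}{n_{aa}!\, n_{bb}!\, n_{ba}!\, n_{ab}!} \left(\frac{1-r}{2}\right)^{n_{aa}} \left(\frac{1-r}{2}\right)^{n_{bb}} \left(\frac{r}{2}\right)^{n_{ba}} \left(\frac{r}{2}\right)^{n_{ab}} \\
  &= \frac{N!}{n_{aa}!\, n_{bb}!\, n_{ba}!\, n_{ab}!} \left(\frac{1}{2}\right)^{N} (1-r)^{n_{aa}+n_{bb}}\, r^{\,n_{ba}+n_{ab}} \\
  &= C\,(1-r)^{N-R}\, r^{R}
\end{aligned}
$$

with $N = n_{aa} + n_{bb} + n_{ba} + n_{ab}$, $R = n_{ba} + n_{ab}$ the number of recombinants, and $C = \frac{N!}{n_{aa}!\, n_{bb}!\, n_{ba}!\, n_{ab}!} \left(\frac{1}{2}\right)^{N}$ a constant that does not depend on $r$.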
Estimation of recombination frequency
Likelihood: $L = C\,(1-r)^{N-R}\, r^{R}$
The maximum likelihood estimator of r is obtained by maximizing the likelihood L with regard to r.

Easier: take the natural logarithm of the likelihood rather than the likelihood itself.
Log-Likelihood:
$$\ln L = \ln C + (N-R)\,\ln(1-r) + R\,\ln r$$
Take derivative of log-likelihood and put result equal to zero:
$$0 = -\frac{N-R}{1-r} + \frac{R}{r} = \frac{-(N-R)\,r + R\,(1-r)}{r\,(1-r)} = \frac{R - rN}{r\,(1-r)}$$
Maximum likelihood estimator:
$$\hat{r} = \frac{R}{N}$$
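The closed-form estimator can be cross-checked numerically. The following sketch, with hypothetical counts and function names, computes R/N and compares it with a simple grid search over the log-likelihood (the constant ln C is dropped because it does not depend on r).

```python
import math


def log_likelihood(r, n_total, n_recombinant):
    """Log-likelihood (N - R) ln(1 - r) + R ln(r), dropping the constant ln C."""
    return ((n_total - n_recombinant) * math.log(1.0 - r)
            + n_recombinant * math.log(r))


def estimate_r(n_total, n_recombinant):
    """Closed-form maximum likelihood estimate R / N."""
    return n_recombinant / n_total


if __name__ == "__main__":
    n_total, n_recombinant = 100, 10  # hypothetical counts
    r_hat = estimate_r(n_total, n_recombinant)
    # Grid search over 0 < r < 1 as a numerical cross-check.
    grid = [i / 1000 for i in range(1, 1000)]
    r_grid = max(grid, key=lambda r: log_likelihood(r, n_total, n_recombinant))
    print(r_hat, r_grid)
```

The grid search is only a check; in practice the closed form R/N is all that is needed for this model.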
Estimation of recombination frequency
Fisher information: Expectation with regard to R of minus the second derivative of the log-likelihood
1st derivative:
$$\frac{\partial \ln L}{\partial r} = -\frac{N-R}{1-r} + \frac{R}{r}$$
2nd derivative:
$$\frac{\partial^2 \ln L}{\partial r^2} = -\frac{N-R}{(1-r)^2} - \frac{R}{r^2}$$
Using $E(R) = Nr$:
$$E\left(-\frac{\partial^2 \ln L}{\partial r^2}\right) = \frac{N - Nr}{(1-r)^2} + \frac{Nr}{r^2} = \frac{N}{1-r} + \frac{N}{r} = \frac{N}{r\,(1-r)}$$
Variance: Inverse of Fisher information
$$\operatorname{var}(\hat{r}) = \frac{r\,(1-r)}{N}$$
The variance is zero if r = 0; the variance attains a maximum of 1/(4N) if r = 1/2.
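A minimal sketch of how this variance might be used in practice is given below. Evaluating the variance at the estimate R/N to obtain a standard error (a plug-in approach) is an assumption of this example, as are the function names and counts.

```python
import math


def var_r_hat(r, n_total):
    """Asymptotic variance r (1 - r) / N of the estimator of r."""
    return r * (1.0 - r) / n_total


def standard_error(n_total, n_recombinant):
    """Plug-in standard error: the variance is evaluated at r = R / N."""
    r_hat = n_recombinant / n_total
    return math.sqrt(var_r_hat(r_hat, n_total))


if __name__ == "__main__":
    # Hypothetical counts: 10 recombinants among 100 doubled haploids.
    print(standard_error(100, 10))
    # The variance is largest at r = 1/2, where it equals 1 / (4 N).
    print(var_r_hat(0.5, 100), 1 / (4 * 100))
```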
Testing for linkage
Maximum likelihood:
$$L(\hat{r}) = C\,(1-\hat{r})^{N-R}\,\hat{r}^{R}$$
Likelihood under assumption of no linkage (r = 1/2):
$$L(1/2) = C \left(\tfrac{1}{2}\right)^{N-R} \left(\tfrac{1}{2}\right)^{R} = C \left(\tfrac{1}{2}\right)^{N}$$
Likelihood ratio:
$$LR = \frac{C\,(1-\hat{r})^{N-R}\,\hat{r}^{R}}{C \left(\tfrac{1}{2}\right)^{N}} = \frac{(1-\hat{r})^{N-R}\,\hat{r}^{R}}{\left(\tfrac{1}{2}\right)^{N}}$$
$\log_{10}(LR)$ is called the LOD score for linkage.
In geneticists' language: a LOD of 3 means that the observed value of the recombination frequency is
1000 times as likely as the value 1/2.

$2\ln(LR)$ follows (approximately) a chi-square distribution with 1 degree of freedom

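The LOD score can be computed on the log scale, which avoids very small likelihood values. The sketch below assumes the doubled-haploid likelihood given above; the function names and the example counts are hypothetical.

```python
import math


def lod_score(n_total, n_recombinant):
    """log10 of the likelihood ratio of r = R/N against r = 1/2."""
    r_hat = n_recombinant / n_total

    def term(count, prob):
        # Convention 0 * log(0) = 0, needed when R = 0 or R = N.
        return count * math.log10(prob) if count > 0 else 0.0

    return (term(n_total - n_recombinant, 1.0 - r_hat)
            + term(n_recombinant, r_hat)
            + n_total * math.log10(2.0))


def likelihood_ratio_statistic(n_total, n_recombinant):
    """2 ln(LR), obtained from the LOD score as 2 ln(10) * LOD."""
    return 2.0 * math.log(10.0) * lod_score(n_total, n_recombinant)


if __name__ == "__main__":
    # Hypothetical counts: 10 recombinants among 100 doubled haploids.
    print(lod_score(100, 10), likelihood_ratio_statistic(100, 10))
```

When R = N/2 the estimate equals 1/2 and the LOD score is zero, so there is no evidence for linkage.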

Some population types
F2
Back cross
Population types
Recombinant inbred lines (RIL)
Single seed descent
[Figure: single seed descent from F1 through F2, F3, F4, ... to F8, giving the RILs]
Population types
Recombinant inbred lines
[Figure: genotype classes at two loci (AA, Aa, aa by BB, Bb, bb), with marker scores a, h, b]
Per locus: every generation the proportion of heterozygotes is halved
$$He(g) = \left(\tfrac{1}{2}\right)^{g-1}, \quad g = 1, 2, \ldots$$
$$He(8) = \left(\tfrac{1}{2}\right)^{7} \approx 0.008$$
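The halving of heterozygosity under selfing is easy to tabulate. In the sketch below the function name is hypothetical, and generation g = 1 is taken to be the F1, as in the formula for He(g).

```python
def heterozygote_proportion(generation):
    """Expected proportion of heterozygotes per locus in generation F_g."""
    if generation < 1:
        raise ValueError("generation must be 1 or larger")
    return 0.5 ** (generation - 1)


if __name__ == "__main__":
    for g in range(1, 9):
        print(f"F{g}: {heterozygote_proportion(g):.4f}")
```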
Population types
Recombinant inbred lines
Number of recombinant individuals:
$$R = \frac{2}{1+2r}\,Nr$$
The factor $2/(1+2r)$ is the intensity of recombination; it is approximately 2 if r is small.
Estimate of the recombination frequency:
$$\hat{r} = \frac{R}{2\,(N-R)}$$
By the process of repeated selfing we obtain (nearly) twice as many recombinations as in a single meiosis.
We are able to produce denser maps
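The relation between r and the number of recombinant RILs can be checked with a short sketch. The function names and counts are hypothetical; the two formulas are the ones given above for RILs obtained by repeated selfing.

```python
def ril_recombinant_fraction(r):
    """Expected fraction R / N of recombinant RILs: 2 r / (1 + 2 r)."""
    return 2.0 * r / (1.0 + 2.0 * r)


def estimate_r_from_rils(n_total, n_recombinant):
    """Estimate r from RIL counts: R / (2 (N - R))."""
    if n_recombinant >= n_total:
        raise ValueError("the number of recombinants must be smaller than N")
    return n_recombinant / (2.0 * (n_total - n_recombinant))


if __name__ == "__main__":
    r = 0.1
    n_total = 1200                                    # hypothetical population size
    expected = ril_recombinant_fraction(r) * n_total  # expected number of recombinants
    print(expected, estimate_r_from_rils(n_total, expected))
    # Intensity of recombination relative to a single meiosis.
    print(ril_recombinant_fraction(0.01) / 0.01)
```

For small r the ratio printed last is close to 2, which is the "nearly twice as many recombinations" statement in numerical form.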
Accumulated recombination history
BC and DH populations have less accumulated
recombination than RILs and for that reason mapping in
RILs can be more accurate
The accumulated recombination history of a population
determines how accurately QTLs can be mapped and the
power to identify QTLs
Another factor determining the power to pick up QTLs is
the number of different genetic effects that can be picked
up at a particular locus
Additive + Dominance effects in F2 (Advanced Intercross Lines)
Additive effects in DHs and RILs
What about BCs?
Accumulated recombination history for complex populations?
Transforming genotype into phenotype (multi-locus)
Genes segregating in populations affect
phenotypic distribution of trait
For a quantitative (complex) trait there are many
genes, whose effects may in addition depend on
the environment (gene by environment
interaction)
Linear model
P = G + E + GxE + error (field trials in plant genetics)
Variance decomposition
$$V_p = V_G + V_E + V_{G\times E} + V_{\text{error}}$$
Genetic effects (Table 4.3 Lynch & Walsh)
Additive and dominance effects for alleles
Intrinsic property of allele, but may differ with genetic background
(population)
Average (additive) effect due to substitution of alleles
Property of alleles in a particular population, function of intrinsic
additive and dominance effects & genotype frequencies
Breeding value
Property of individual in particular reference population, sum of
average effects of an individual's alleles
Additive genetic variance
Variance of breeding values of individuals in particular population

Mean and genetic variance for trait under single gene in HW
Additive and dominance variance (single and multiple genes)
For multiple non-interacting genes:
Average substitution effect
Wu, Ma & Casella, 1.7
Genotypic values and frequencies for 2 genes in F2
Back cross
Two locus epistatic model (F2)
Genotypic values and frequencies for two locus epistatic model (F2)
Broad sense heritability
General
$$H^2 = \frac{V_G}{V_G + V_E + V_{G\times E} + V_{\text{error}}}$$

In plant breeding, heritability comparisons are made between genotypes within environments (eliminate $V_E$ from total variation)
For homozygous population types (DH, RIL), $H^2$ can be manipulated by increasing the number of environments ($n_E$) and replicates within environments ($n_r$) at which genotypes are evaluated
$$H^2 = \frac{V_G}{V_G + V_{G\times E}/n_E + V_{\text{error}}/(n_E\, n_r)}$$
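The effect of adding environments and replicates can be illustrated with a small sketch. The function name and the variance components used in the example are hypothetical values chosen only to show the direction of the effect.

```python
def broad_sense_heritability(v_g, v_gxe, v_error, n_env=1, n_rep=1):
    """H^2 = V_G / (V_G + V_GxE / n_E + V_error / (n_E * n_r))."""
    if n_env < 1 or n_rep < 1:
        raise ValueError("n_env and n_rep must be at least 1")
    return v_g / (v_g + v_gxe / n_env + v_error / (n_env * n_rep))


if __name__ == "__main__":
    # Hypothetical variance components: V_G = 1, V_GxE = 1, V_error = 2.
    print(broad_sense_heritability(1.0, 1.0, 2.0))                  # one environment, one replicate
    print(broad_sense_heritability(1.0, 1.0, 2.0, n_env=4, n_rep=2))  # four environments, two replicates
```

More environments reduce both the GxE and the error term, while more replicates within environments reduce only the error term.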

Narrow sense heritability
What is the use of broad and narrow sense heritability?
Estimating genetic variances and heritabilities in practice
Create a set of crosses following specific mating
designs and grow the offspring populations (for
example: back cross, F2, RIL, DH, etc) in field
experiments using appropriate experimental
design to minimize environmental disturbances in
the estimation of quantitative genetic parameters.
Example: Grow Parent 1 ($P_1$), Parent 2 ($P_2$), $F_1$ and $F_2$ together in a trial
Then


Genetic architecture of quantitative trait
Fairly large number of loci (50 or more)
Additive, dominance, epistatic action & interaction
with environment
Magnitude of effects at different loci can vary
considerably
Same gene may act on different phenotypic traits:
pleiotropy
Genes for a trait are distributed over the genome
at random or following a particular pattern
Estimation of gene number
Assume two contrasting parental lines, both homozygous,
one containing all + alleles and the other all - alleles,
and $m_e$ unlinked effective genes
with the same effect (a), purely additive






Summary
Recombination
Genotypic and average allele effect due to substitution
Genetic, environmental, GxE variances
Broad sense and narrow sense heritability
Estimating number of genes underlying a trait
